Re: strange guest slowness after some time

2009-06-16 Thread Tomasz Chmielewski

Felix Leimbach wrote:


It's exactly the same CPU I have.
Interesting: For two months now I've been running on 2 Shanghai Quad-Cores 
instead and the problem is definitely gone.
The rest of the hardware as well as the whole software-stack remained 
unchanged.


That should confirm what we assumed already.


For me, it turned out that the KVM I was running (shipped with Proxmox VE) 
had a fairsched patch (OpenVZ-related) which caused this broken behaviour.



--
Tomasz Chmielewski
http://wpkg.org
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: strange guest slowness after some time

2009-06-08 Thread Felix Leimbach

Tomasz Chmielewski wrote:

Felix Leimbach wrote:


BTW, what CPU do you have?

One dual core Opteron 2212
Note: I will upgrade to two Shanghai Quad-Cores in 2 weeks and test 
with those as well.


processor   : 1
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 2212
stepping: 2
cpu MHz : 1994.996
cache size  : 1024 KB


It's exactly the same CPU I have.
Interesting: For two months now I've been running on 2 Shanghai Quad-Cores 
instead and the problem is definitely gone.
The rest of the hardware as well as the whole software-stack remained 
unchanged.


That should confirm what we assumed already.


Re: strange guest slowness after some time

2009-05-31 Thread Avi Kivity

Tomasz Chmielewski wrote:

By accident, I made an interesting discovery.

This ~2 MB video shows a kvm-86 guest being rebooted and GRUB started:

http://syneticon.net/kvm/kvm-slowness.ogg


GRUB has its timeout set to 50 seconds, and is supposed to show it on 
the screen by decreasing the number of seconds shown, every second.


Here, GRUB decreases the second counter very fast by 2 seconds, then 
waits 2 seconds, then again decreases the number of seconds by 2 
very fast, and so on.


Perhaps my wording does not describe it very well though, so just try 
to download the video and open it, e.g. in mplayer.


Weird, weird.

Can you run kvmtrace on this guest while this is happening and post the 
results somewhere?


--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-05-28 Thread Tomasz Chmielewski

Avi Kivity wrote:

Tomasz Chmielewski wrote:

Maybe virtio is racy and a loaded host exposes the race.


I see it happening with virtio on 2.6.29.x guests as well.

So, what would you do if you saw it on your systems as well? ;)

Add some debug routines into virtio_* modules?



I'm no virtio expert.  Maybe I'd insert tracepoints to record interrupts 
and kicks.


By accident, I made an interesting discovery.

This ~2 MB video shows a kvm-86 guest being rebooted and GRUB started:

http://syneticon.net/kvm/kvm-slowness.ogg


GRUB has its timeout set to 50 seconds, and is supposed to show it on 
the screen by decreasing the number of seconds shown, every second.


Here, GRUB decreases the second counter very fast by 2 seconds, then 
waits 2 seconds, then again decreases the number of seconds by 2 
very fast, and so on.


Perhaps my wording does not describe it very well though, so just try to 
download the video and open it, e.g. in mplayer.
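
For readers who cannot play the video, the described pattern can be pictured with a toy model in Python. It assumes, purely for illustration (the thread has not established a cause), that one timer tick becomes due every second but pending ticks are only delivered in batches every `burst_period_s` seconds. The function name and its parameters are hypothetical.

```python
def countdown_timeline(start, duration_s, burst_period_s):
    """Return (wall_clock_second, displayed_value) pairs for a countdown.

    Toy model (an assumption, not a diagnosis): a tick becomes due every
    second, but pending ticks are only delivered when the wall clock
    reaches a multiple of burst_period_s.
    """
    timeline = []
    displayed = start
    pending = 0
    for t in range(1, duration_s + 1):
        pending += 1                     # one tick became due this second
        if t % burst_period_s == 0:      # delivery window: flush pending ticks
            displayed -= pending
            pending = 0
        timeline.append((t, displayed))
    return timeline


if __name__ == "__main__":
    # burst_period_s=1 is the healthy case; 2 mimics the behaviour in the video.
    for period in (1, 2):
        print(period, countdown_timeline(50, 6, period))
```

With `burst_period_s=2` the displayed value stays put for a second and then drops by two, which is roughly what the GRUB counter does in the recording.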



Comments?


--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-05-26 Thread Tomasz Chmielewski

Rusty Russell wrote:

On Tuesday 07 April 2009 00:49:17 Tomasz Chmielewski wrote:

Tomasz Chmielewski wrote:


As I mentioned, it was using virtio net.

Guests running with e1000 (and virtio_blk) don't have this problem.

Also, virtio_console seems to be affected by this slowness issue.


I'm pretty sure this is different.  Older virtio_console code ignored
interrupts and polled, and used a heuristic to back off on polling (this was
because we used the generic hvc infrastructure which hacked support).

You'll find a delay on the first keystroke after idle, but none on the
second.


I still observe this slowness with kvm-86 after the guest is running 
for some time (virtio_net and virtio_console seem to be affected; guest 
restart doesn't fix it).


--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-05-26 Thread Avi Kivity

Tomasz Chmielewski wrote:


I still observe this slowness with kvm-86 after the guest is running 
for some time (virtio_net and virtio_console seem to be affected; 
guest restart doesn't fix it).




Anything in guest dmesg?

--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-05-26 Thread Tomasz Chmielewski

Avi Kivity wrote:

Tomasz Chmielewski wrote:


I still observe this slowness with kvm-86 after the guest is running 
for some time (virtio_net and virtio_console seem to be affected; 
guest restart doesn't fix it).




Anything in guest dmesg?


No.
No hints in syslog, dmesg...


Can it be that this is more likely to happen on busy hosts?

It happens for me on a host where I have 16 guests running.


Also, when I booted the host almost 2 days ago, 2 or 3 guests didn't start 
properly (16 guests were starting at the same time), with their kernel 
saying:


Kernel panic - not syncing: IO-APIC + timer doesn't work!

Can it be related?

After I restarted these failed guests, they started properly.


--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-05-26 Thread Avi Kivity

Tomasz Chmielewski wrote:

Avi Kivity wrote:

Tomasz Chmielewski wrote:


I still observe this slowness with kvm-86 after the guest is 
running for some time (virtio_net and virtio_console seem to be 
affected; guest restart doesn't fix it).




Anything in guest dmesg?


No.
No hints in syslog, dmesg...


Can it be that this is more likely to happen on busy hosts?



We'll only know once we fix it...


It happens for me on a host where I have 16 guests running.


Also, when I booted the host almost 2 days ago, 2 or 3 guests didn't 
start properly (16 guests were starting at the same time), with their 
kernel saying:


Kernel panic - not syncing: IO-APIC + timer doesn't work!

Can it be related?

After I restarted these failed guests, they started properly.



This is timing related.  On a busy host you can get timeouts and thus 
the panics.  It's unrelated.


Maybe virtio is racy and a loaded host exposes the race.

--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-05-26 Thread Tomasz Chmielewski

Avi Kivity wrote:

Tomasz Chmielewski wrote:

Avi Kivity wrote:

Tomasz Chmielewski wrote:


I still observe this slowness with kvm-86 after the guest is 
running for some time (virtio_net and virtio_console seem to be 
affected; guest restart doesn't fix it).




Anything in guest dmesg?


No.
No hints in syslog, dmesg...


Can it be that this is more likely to happen on busy hosts?



We'll only know once we fix it...


(...)


Maybe virtio is racy and a loaded host exposes the race.


I see it happening with virtio on 2.6.29.x guests as well.

So, what would you do if you saw it on your systems as well? ;)

Add some debug routines into virtio_* modules?


--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-05-26 Thread Avi Kivity

Tomasz Chmielewski wrote:

Maybe virtio is racy and a loaded host exposes the race.


I see it happening with virtio on 2.6.29.x guests as well.

So, what would you do if you saw it on your systems as well? ;)

Add some debug routines into virtio_* modules?



I'm no virtio expert.  Maybe I'd insert tracepoints to record interrupts 
and kicks.



--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-04-07 Thread Rusty Russell
On Tuesday 07 April 2009 00:49:17 Tomasz Chmielewski wrote:
 Tomasz Chmielewski wrote:
 
  As I mentioned, it was using virtio net.
  
  Guests running with e1000 (and virtio_blk) don't have this problem.
 
 Also, virtio_console seems to be affected by this slowness issue.

I'm pretty sure this is different.  Older virtio_console code ignored
interrupts and polled, and used a heuristic to back off on polling (this was
because we used the generic hvc infrastructure which hacked support).

You'll find a delay on the first keystroke after idle, but none on the
second.

Hope that helps,
Rusty.


Re: strange guest slowness after some time

2009-04-07 Thread Tomasz Chmielewski

Rusty Russell wrote:

On Tuesday 07 April 2009 00:49:17 Tomasz Chmielewski wrote:

Tomasz Chmielewski wrote:


As I mentioned, it was using virtio net.

Guests running with e1000 (and virtio_blk) don't have this problem.

Also, virtio_console seems to be affected by this slowness issue.


I'm pretty sure this is different.  Older virtio_console code ignored
interrupts and polled, and used a heuristic to back off on polling (this was
because we used the generic hvc infrastructure which hacked support).


By older you mean guest drivers?
I have 2.6.27.x on guests and see this issue.
If you meant host, I use kvm-84.



You'll find a delay on the first keystroke after idle, but none on the
second.


I'm not sure.
Press a seven times fast, and 7 characters will be printed a second later.

But: wait one second more, it will be unresponsive again. You won't see 
the characters as you type.



Also, these symptoms are very similar to the virtio_net issue:
- it happens only on some guests (even if they have the same kernel and 
userspace) after a random period of time
- it used to happen for me _always_ when network got slow with 
virtio_net driver

- it doesn't go away with guest restart initiated from guest's system
- it goes away with kvm process stop/start (i.e. new kvm process), but 
can appear later with no apparent cause




--
Tomasz Chmielewski
http://wpkg.org



Re: strange guest slowness after some time

2009-04-06 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:


As I mentioned, it was using virtio net.

Guests running with e1000 (and virtio_blk) don't have this problem.


Also, virtio_console seems to be affected by this slowness issue.

Am I correct to think that if:

* on guest lsmod outputs:

virtio_console  6828  0 [permanent]

* on guest, /etc/inittab contains:

6:2345:respawn:/sbin/mingetty ttyS0

* on host, I start the guest with a parameter:

-serial unix:/var/run/qemu-server/103.serial,server,nowait


then the guest's ttyS0 console is virtio_console?



If my thinking is correct, then I have a slow serial console on some 
of the guests using virtio_pci and virtio_console driver.



By slow serial console I mean any character typed shows up after a 
second or so.


It can also be cured the same way as with virtio_net - just run:

dd if=/dev/vda of=/dev/null

And the console reacts normally. Stop dd, console is slow again.


I have this issue on two guests with e1000 network, which use virtio_blk 
(and virtio_console...).

I never saw this issue with guests which don't use virtio.



--
Tomasz Chmielewski
http://wpkg.org



Re: strange guest slowness after some time

2009-04-01 Thread Tomasz Chmielewski

David S. Ahern wrote:


Could you add a (unused) e1000 interface to your virtio guests?
As this issue happens rarely for me, maybe you could help to
reproduce it as well (i.e. if network gets slow on virtio interface,
give e1000 an IP address, and check if network is also slow on e1000 on
the very same guest).

Will do and report

BTW, what CPU do you have?

One dual core Opteron 2212
Note: I will upgrade to two Shanghai Quad-Cores in 2 weeks and test
with those as well.

I have this slowness on an Intel CPU as well, after about 10 days of
guest uptime (using virtio net):

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU3050  @ 2.13GHz



For the Intel server, is the guest using the e1000 NIC or virtio or
other? I have a few DL320G5s with this processor; I have not hit this
problem running rhel3 and rhel4 guests using e1000/scsi devices.


As I mentioned, it was using virtio net.

Guests running with e1000 (and virtio_blk) don't have this problem.


--
Tomasz Chmielewski
http://wpkg.org



Re: strange guest slowness after some time

2009-03-31 Thread Tomasz Chmielewski

Felix Leimbach wrote:

Tomasz Chmielewski wrote:

Felix Leimbach wrote:

Out of 3 e1000 guests none has ever been hit.

Observed with kvm-83 and kvm-84 with the host running in-kernel KVM 
code (linux 2.6.25.7)

Could you add a (unused) e1000 interface to your virtio guests?
As this issue happens rarely for me, maybe you could help to reproduce 
it as well (i.e. if network gets slow on virtio interface, give e1000 
an IP address, and check if network is also slow on e1000 on the very 
same guest).

Will do and report


BTW, what CPU do you have?

One dual core Opteron 2212
Note: I will upgrade to two Shanghai Quad-Cores in 2 weeks and test with 
those as well.


I have this slowness on an Intel CPU as well, after about 10 days of 
guest uptime (using virtio net):


processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU3050  @ 2.13GHz
stepping: 6
cpu MHz : 2133.410
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall lm constant_tsc arch_perfmon pebs bts rep_good pni monitor 
ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm

bogomips: 4266.87
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:


--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-31 Thread David S. Ahern

Tomasz Chmielewski wrote:
 Felix Leimbach wrote:
 Tomasz Chmielewski wrote:
 Felix Leimbach wrote:
 Out of 3 e1000 guests none has ever been hit.

 Observed with kvm-83 and kvm-84 with the host running in-kernel KVM
 code (linux 2.6.25.7)
 Could you add a (unused) e1000 interface to your virtio guests?
 As this issue happens rarely for me, maybe you could help to
 reproduce it as well (i.e. if network gets slow on virtio interface,
 give e1000 an IP address, and check if network is also slow on e1000 on
 the very same guest).
 Will do and report

 BTW, what CPU do you have?
 One dual core Opteron 2212
 Note: I will upgrade to two Shanghai Quad-Cores in 2 weeks and test
 with those as well.
 
 I have this slowness on an Intel CPU as well, after about 10 days of
 guest uptime (using virtio net):
 
 processor   : 1
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 15
 model name  : Intel(R) Xeon(R) CPU3050  @ 2.13GHz
 stepping: 6
 cpu MHz : 2133.410
 cache size  : 2048 KB
 physical id : 0
 siblings: 2
 core id : 1
 cpu cores   : 2
 fpu : yes
 fpu_exception   : yes
 cpuid level : 10
 wp  : yes
 flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
 syscall lm constant_tsc arch_perfmon pebs bts rep_good pni monitor
 ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
 bogomips: 4266.87
 clflush size: 64
 cache_alignment : 64
 address sizes   : 36 bits physical, 48 bits virtual
 power management:
 
 

For the Intel server, is the guest using the e1000 NIC or virtio or
other? I have a few DL320G5s with this processor; I have not hit this
problem running rhel3 and rhel4 guests using e1000/scsi devices.

david



Re: strange guest slowness after some time

2009-03-19 Thread Tomasz Chmielewski

David S. Ahern wrote:


David S. Ahern wrote:

Rusty Russell wrote:

On Wednesday 18 March 2009 16:59:36 Avi Kivity wrote:

Tomasz Chmielewski wrote:

virtio_net virtio0: id 64 is not a head!

This means that qemu said "I've finished with buffer 64" and the guest didn't
know anything about buffer 64.

We should not lock up, though networking is toast: I think that qemu got upset
and that caused this, as well as causing it to chew 100% CPU.

I'll see if I can reproduce with kvm-84 userspace and 2.6.27 guests, 32-bit
guests on a 64-bit AMD host.  What's your kvm/qemu command line?


I've hit this as well.

Intel host, running RHEL5.3, x86_64 with KVM-81.

Guest is RHEL4.7, 32-bit, with the virtio drivers from RHEL4.8 beta.

Happens pretty darn quickly for me.

david



Like I said, pretty darn quickly.


Can you reproduce it also with e1000 instead of virtio?


--
Tomasz Chmielewski
http://wpkg.org



Re: strange guest slowness after some time

2009-03-19 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:

Note how _time_ is different (similar timings are to other unaffected 
guests):


This is also pretty interesting:

# ping -c 10 unaffected_guest
PING 192.168.4.4 (192.168.4.4) 56(84) bytes of data.
64 bytes from 192.168.4.4: icmp_seq=1 ttl=64 time=1.25 ms
64 bytes from 192.168.4.4: icmp_seq=2 ttl=64 time=1.58 ms


(...)


--- 192.168.4.4 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9091ms
rtt min/avg/max/mdev = 1.031/2.059/3.894/1.045 ms



How probable is it that so many pings returned with exactly 1000 ms?

# ping -c 10 affected_guest
PING 192.168.4.5 (192.168.4.5) 56(84) bytes of data.
64 bytes from 192.168.4.5: icmp_seq=1 ttl=64 time=1009 ms
64 bytes from 192.168.4.5: icmp_seq=2 ttl=64 time=9.61 ms
64 bytes from 192.168.4.5: icmp_seq=3 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=4 ttl=64 time=1000 ms


(...)

The same thing as above just happened to me again.
This time, I equipped the guest with one virtio card and one e1000 card.

00:03.0 Ethernet controller: Qumranet, Inc. Device 1000
00:04.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 03)

Pinging e1000 card on affected guest - replies are fast:

# ping 10.1.1.1
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=64 time=5.86 ms
64 bytes from 10.1.1.1: icmp_seq=2 ttl=64 time=3.40 ms
64 bytes from 10.1.1.1: icmp_seq=3 ttl=64 time=0.791 ms

Pinging virtio on affected guest - slow:

# ping 192.168.113.83
PING 192.168.113.83 (192.168.113.83) 56(84) bytes of data.
64 bytes from 192.168.113.83: icmp_seq=1 ttl=64 time=21.6 ms
64 bytes from 192.168.113.83: icmp_seq=2 ttl=64 time=1000 ms
64 bytes from 192.168.113.83: icmp_seq=3 ttl=64 time=2.73 ms
64 bytes from 192.168.113.83: icmp_seq=4 ttl=64 time=243 ms


(this is same network, guests on the same host, so latencies are not caused by packets 
travelling around the globe).
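
To count how many replies sit suspiciously close to a whole second, the ping output above can be fed through a small script. This is a minimal sketch in Python; the 15 ms tolerance and the helper names `parse_rtts` and `near_second_multiples` are arbitrary choices for illustration, not anything used elsewhere in this thread.

```python
import re

# Matches the per-reply lines printed by ping, e.g. "icmp_seq=3 ttl=64 time=1000 ms".
RTT_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def parse_rtts(ping_output):
    """Return a list of (icmp_seq, rtt_ms) tuples found in ping output."""
    return [(int(seq), float(rtt)) for seq, rtt in RTT_RE.findall(ping_output)]


def near_second_multiples(rtts, tolerance_ms=15.0):
    """Keep replies whose RTT is within tolerance_ms of a non-zero multiple of 1000 ms."""
    hits = []
    for seq, rtt in rtts:
        nearest = round(rtt / 1000.0) * 1000.0
        if nearest >= 1000.0 and abs(rtt - nearest) <= tolerance_ms:
            hits.append((seq, rtt))
    return hits


# Reply lines copied from the affected guest above.
SAMPLE = """\
64 bytes from 192.168.4.5: icmp_seq=1 ttl=64 time=1009 ms
64 bytes from 192.168.4.5: icmp_seq=2 ttl=64 time=9.61 ms
64 bytes from 192.168.4.5: icmp_seq=3 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=4 ttl=64 time=1000 ms
"""

if __name__ == "__main__":
    rtts = parse_rtts(SAMPLE)
    suspicious = near_second_multiples(rtts)
    print(f"{len(suspicious)} of {len(rtts)} replies are within 15 ms of a whole second")
```

Round-trip times that cluster at whole seconds look more like a reply waiting for some periodic event than like ordinary network jitter, but the script only counts them; it does not say what that event is.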



--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-19 Thread David S. Ahern


Tomasz Chmielewski wrote:
 David S. Ahern wrote:

 David S. Ahern wrote:
 Rusty Russell wrote:
 On Wednesday 18 March 2009 16:59:36 Avi Kivity wrote:
 Tomasz Chmielewski wrote:
 virtio_net virtio0: id 64 is not a head!
 This means that qemu said "I've finished with buffer 64" and the guest
 didn't know anything about buffer 64.

 We should not lock up, though networking is toast: I think that qemu got
 upset and that caused this, as well as causing it to chew 100% CPU.

 I'll see if I can reproduce with kvm-84 userspace and 2.6.27 guests, 32-bit
 guests on a 64-bit AMD host.  What's your kvm/qemu command line?

 I've hit this as well.

 Intel host, running RHEL5.3, x86_64 with KVM-81.

 Guest is RHEL4.7, 32-bit, with the virtio drivers from RHEL4.8 beta.

 Happens pretty darn quickly for me.

 david


 Like I said, pretty darn quickly.
 
 Can you reproduce it also with e1000 instead of virtio?
 
 

I have not had a problem with the e1000 nic. This seems to be strictly a
virtio bug; I get the same messages. These are 2 separate runs, one from
March 11:

kernel: virtio_net virtio0: id 98 is not a head!

and the other last night:

kernel: virtio_net virtio0: id 6 is not a head!
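
For anyone who wants to check their own guest logs for the same message, here is a minimal Python sketch that pulls the driver, device and buffer id out of such lines. The regular expression is written against the two lines quoted above; other kernels may word the message differently, and the function name `find_not_a_head` is hypothetical.

```python
import re

# Written against lines like: "kernel: virtio_net virtio0: id 98 is not a head!"
NOT_A_HEAD_RE = re.compile(r"(virtio\w*) (virtio\d+): id (\d+) is not a head!")


def find_not_a_head(log_lines):
    """Return (driver, device, buffer_id) for every matching log line."""
    results = []
    for line in log_lines:
        match = NOT_A_HEAD_RE.search(line)
        if match:
            results.append((match.group(1), match.group(2), int(match.group(3))))
    return results


# The two messages reported above, plus one unrelated line.
SAMPLE_LINES = [
    "kernel: virtio_net virtio0: id 98 is not a head!",
    "kernel: eth0: link up",
    "kernel: virtio_net virtio0: id 6 is not a head!",
]

if __name__ == "__main__":
    for driver, device, buffer_id in find_not_a_head(SAMPLE_LINES):
        print(f"{driver} on {device}: unexpected buffer id {buffer_id}")
```

In practice the lines would come from the guest's syslog or `dmesg` output rather than from a hard-coded list.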


david



Re: strange guest slowness after some time

2009-03-18 Thread Avi Kivity

Felix Leimbach wrote:

Avi Kivity wrote:

Felix Leimbach wrote:
Is TSC breakage still something to watch out for after I've upgraded to the 
Shanghai quad-cores?


No, should be gone.

Will you have the old server around so we can test things?


No, I'll be upgrading the existing server. If you have specific tests in
mind I can perform them in the next two weeks before the upgrade. But I
cannot restart the server because a few VMs are in production use.

If a developer is interested in the old CPU (Opteron 2212) then I can
have it mailed to him/her.



Thanks for the offer; I can probably find a similar cpu, the main 
difficulty is replicating the problem.


Since there are now at least two reports, maybe it won't be that 
difficult.  If you can figure out a way to reliably reproduce this, that 
would be most helpful.



--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:

Avi Kivity wrote:


I'm guessing there's a problem with timers or timer interrupts.

What is the host cpu?


4 entries like this in /proc/cpuinfo:

processor   : 3
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 2212



That's probably the kvmclock issue that hit older AMDs.  It was fixed 
in kvm-84, please try that.
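
To check several hosts for the same CPU generation, /proc/cpuinfo can be parsed with a short Python sketch like the one below. Treating "older AMDs" as vendor `AuthenticAMD` with `cpu family` 15 is an assumption made here for illustration (it matches the Opteron 2212 shown above); the thread does not say exactly which models are affected. The helper names are hypothetical.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per processor entry."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates processor entries.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def looks_like_family_15_amd(cpu):
    """Assumption for illustration: 'older AMD' means AuthenticAMD, cpu family 15."""
    return cpu.get("vendor_id") == "AuthenticAMD" and cpu.get("cpu family") == "15"


# Excerpt quoted above.
SAMPLE = """\
processor   : 3
vendor_id   : AuthenticAMD
cpu family  : 15
model       : 65
model name  : Dual-Core AMD Opteron(tm) Processor 2212
"""

if __name__ == "__main__":
    for cpu in parse_cpuinfo(SAMPLE):
        print(cpu["model name"], "->", looks_like_family_15_amd(cpu))
```

On a real host the text would come from `open("/proc/cpuinfo").read()` instead of `SAMPLE`.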


I've been running it for about a week now with kvm-84 and no guest got 
slow.


Can it be related to using cpufreq and the ondemand governor?


Something fishy here :(

After a week or so, network in one guest got slow with kvm-84 and no 
cpufreq.



--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-17 Thread Avi Kivity

Tomasz Chmielewski wrote:


After a week or so, network in one guest got slow with kvm-84 and no 
cpufreq.




This is virtio, right?  What about e1000?

(I realize it takes a week to reproduce, but maybe you have some more 
experience)



--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Avi Kivity wrote:

Tomasz Chmielewski wrote:


After a week or so, network in one guest got slow with kvm-84 and no 
cpufreq.




This is virtio, right?  What about e1000?

(I realize it takes a week to reproduce, but maybe you have some more 
experience)


Yes, all affected had virtio. Probably because I didn't have many guests 
with e1000 interface.


After a guest gets slow, I stop it and add another interface, e1000.


If it gets slow again, I'll check if e1000 interface is slow as well.

Will keep you updated.


--
Tomasz Chmielewski
http://wpkg.org
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: strange guest slowness after some time

2009-03-17 Thread Felix Leimbach

Tomasz Chmielewski wrote:

Avi Kivity wrote:

Tomasz Chmielewski wrote:
After a week or so, network in one guest got slow with kvm-84 and no 
cpufreq.

This is virtio, right?  What about e1000?

(I realize it takes a week to reproduce, but maybe you have some more 
experience)


Yes, all affected had virtio. Probably because I didn't have many 
guests with e1000 interface.


After a guest gets slow, I stop it and add another interface, e1000.


If it gets slow again, I'll check if e1000 interface is slow as well.

Will keep you updated.

I see similar behavior: After a week one of my guests' network totally 
stops responding. Only guests using virtio networking get hit. Both 
Windows and Linux guests are affected.

My guests in production use e1000 and have never been hit.
While that can be a coincidence it seems very unlikely: Out of 3 virtio 
guests 2 have been hit, one repeatedly.

Out of 3 e1000 guests none has ever been hit.

Observed with kvm-83 and kvm-84 with the host running in-kernel KVM code 
(linux 2.6.25.7)
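
As a rough check on the coincidence argument, the 2-of-3 versus 0-of-3 split can be put through a one-sided hypergeometric calculation (the same arithmetic as Fisher's exact test). This Python sketch assumes a null model in which each of the six guests is equally likely to be hit regardless of NIC type; that model, and the function name `hypergeom_tail`, are illustrative assumptions. It counts guests only and ignores that one guest was hit repeatedly.

```python
from math import comb


def hypergeom_tail(hit_in_group, group_size, total_hit, total):
    """P(at least hit_in_group of the total_hit affected guests are in the group),
    assuming the affected guests are a uniformly random subset of all guests."""
    other = total - group_size
    upper = min(group_size, total_hit)
    favourable = sum(
        comb(group_size, k) * comb(other, total_hit - k)
        for k in range(hit_in_group, upper + 1)
    )
    return favourable / comb(total, total_hit)


if __name__ == "__main__":
    # 6 guests in total, 3 of them virtio, 2 guests hit, both of them virtio.
    p = hypergeom_tail(hit_in_group=2, group_size=3, total_hit=2, total=6)
    print(f"chance of both hits landing on virtio guests: {p:.2f}")  # prints 0.20
```

With only six guests the split alone is suggestive rather than conclusive, so further reports of e1000 guests staying healthy would strengthen the case.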



Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Avi Kivity wrote:

Felix Leimbach wrote:
I see similar behavior: After a week one of my guests' network totally 
stops responding. Only guests using virtio networking get hit. Both 
Windows and Linux guests are affected.

My guests in production use e1000 and have never been hit.
While that can be a coincidence it seems very unlikely: Out of 3 
virtio guests 2 have been hit, one repeatedly.

Out of 3 e1000 guests none has ever been hit.

Observed with kvm-83 and kvm-84 with the host running in-kernel KVM 
code (linux 2.6.25.7)


Might it be that some counter overflowed?  What are the packet counts on 
long running guests?


I don't think so.

I just made both counters (TX, RX) of ifconfig for virtio interfaces 
overflow several times and everything is still as fast as it should be.



(output of ifconfig, even on an unaffected e1000 guest, might help)



--
Tomasz Chmielewski
http://wpkg.org



Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Felix Leimbach wrote:

Yes, all affected had virtio. Probably because I didn't have many 
guests with e1000 interface.


After a guest gets slow, I stop it and add another interface, e1000.


If it gets slow again, I'll check if e1000 interface is slow as well.

Will keep you updated.

I see similar behavior: After a week one of my guests' network totally 
stops responding. Only guests using virtio networking get hit. Both 
Windows and Linux guests are affected.


Also, does guest reboot help for you (for me, it doesn't)?

Or do you have to halt the guest and start it again (i.e. stop the kvm/qemu 
process and start a new one) to make the network work properly again?



--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-17 Thread Avi Kivity

Felix Leimbach wrote:


BTW, what CPU do you have?

One dual core Opteron 2212


Does idle=poll help things?  It can cause tsc breakage similar to cpufreq.

--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-03-17 Thread Felix Leimbach

Tomasz Chmielewski wrote:

Avi Kivity wrote:
Might it be that some counter overflowed?  What are the packet counts 
on long running guests?

I don't think so.

I just made both counters (TX, RX) of ifconfig for virtio interfaces 
overflow several times and everything is still as fast as it should be.

I had overflows on the counters as well (32 bit guests) without a problem.
Here is the current ifconfig output of a machine which suffered the 
problem before:


eth0  Link encap:Ethernet  HWaddr 52:54:00:74:01:01
 inet addr:10.75.13.1  Bcast:10.75.255.255  Mask:255.255.0.0
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX packets:3542104 errors:0 dropped:0 overruns:0 frame:0
 TX packets:412546 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000
 RX bytes:682285568 (650.6 MiB)  TX bytes:2907586796 (2.7 GiB)


(output of ifconfig, even on an unaffected e1000 guest, might help)
Currently I have e1000 only on Windows guests. Is there a way to gather 
relevant statistics there too?



Re: strange guest slowness after some time

2009-03-17 Thread Avi Kivity

Felix Leimbach wrote:

Tomasz Chmielewski wrote:

Avi Kivity wrote:
Might it be that some counter overflowed?  What are the packet 
counts on long running guests?

I don't think so.

I just made both counters (TX, RX) of ifconfig for virtio interfaces 
overflow several times and everything is still as fast as it should be.
I had overflows on the counters as well (32-bit guests) without a 
problem.
Here is the current ifconfig output of a machine which suffered the 
problem before:


eth0  Link encap:Ethernet  HWaddr 52:54:00:74:01:01
 inet addr:10.75.13.1  Bcast:10.75.255.255  Mask:255.255.0.0
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX packets:3542104 errors:0 dropped:0 overruns:0 frame:0
 TX packets:412546 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000
 RX bytes:682285568 (650.6 MiB)  TX bytes:2907586796 (2.7 GiB)


Packet counters are well within 32-bit limits.  Byte counters are not so 
interesting.
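
To put numbers on that: a 32-bit counter wraps at 2^32 = 4294967296, so the RX packet count above has used well under 0.1% of its range. A minimal sketch of the arithmetic, assuming a shell with 64-bit integer arithmetic (bash and dash on Linux qualify); headroom_32bit is a hypothetical helper name, not an existing tool:

```shell
# headroom_32bit COUNT: print how many more increments a 32-bit counter
# currently at COUNT can take before it wraps at 2^32.
# Hypothetical helper for illustration only.
headroom_32bit() {
    echo $(( 4294967296 - $1 ))
}

headroom_32bit 3542104   # RX packets from the ifconfig output
headroom_32bit 412546    # TX packets from the ifconfig output
```

The byte counters, by contrast, pass 2^32 after only 4 GiB of traffic, so a wrap there says little about long uptime.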


currently I have e1000 only on windows guests. Is there a way to 
gather relevant statistics there too?


Sure, right-click on the adapter icon, it's there somewhere.

Do you experience the slowdown on Windows guests?

--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-03-17 Thread Avi Kivity

Felix Leimbach wrote:
I noticed that you Tomasz are also running kvm-83. Maybe kvm-84 fixed 
the issue already?


kvm-84 fixes a serious problem with kvmclock on AMDs, but does not fix 
the problem with c1e, so it may not have fixed the problem completely.


--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Felix Leimbach wrote:

I have not tried rebooting; always stopped and restarted the qemu 
instance. Will try on the next occasion.


Before I wrote that I tested on kvm-83 and 84 but it turns out the 
kvm-84 part was wrong: Since the upgrade 4 days ago I have not yet had a 
hang.
I noticed that you Tomasz are also running kvm-83. Maybe kvm-84 fixed 
the issue already?


No, I run kvm-84.
With kvm-83 I had this issue much more frequently. With kvm-84, it seems 
less frequent. Or maybe that's just what I'd like to believe ;)



--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Felix Leimbach wrote:


BTW, what CPU do you have?

One dual core Opteron 2212
Note: I will upgrade to two Shanghai Quad-Cores in 2 weeks and test with 
those as well.


processor   : 1
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 2212
stepping: 2
cpu MHz : 1994.996
cache size  : 1024 KB


It's exactly the same CPU I have.

Almost. Mine is 5.004 MHz faster ;)


model name  : Dual-Core AMD Opteron(tm) Processor 2212
stepping: 2
cpu MHz : 2000.000




--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Avi Kivity wrote:

Felix Leimbach wrote:

Tomasz Chmielewski wrote:

Avi Kivity wrote:
Might it be that some counter overflowed?  What are the packet 
counts on long running guests?


Here is the current ifconfig output of a machine which suffered the 
problem before:


eth0  Link encap:Ethernet  HWaddr 52:54:00:74:01:01
 inet addr:10.75.13.1  Bcast:10.75.255.255  Mask:255.255.0.0
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX packets:3542104 errors:0 dropped:0 overruns:0 frame:0
 TX packets:412546 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000
 RX bytes:682285568 (650.6 MiB)  TX bytes:2907586796 (2.7 GiB)


Packet counters are well within 32-bit limits.  Byte counters are not so 
interesting.


Ah OK.
I did only byte overflow.

Packet overflow will take much longer. It's one of those very rare cases 
where setting a very small MTU is useful...
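
A rough estimate of how much the MTU matters here, as a sketch only: it assumes every packet carries a full MTU worth of bytes and a steady throughput of 10 MB/s. Both are illustrative assumptions rather than figures from this thread, and secs_to_wrap is a hypothetical helper name.

```shell
# secs_to_wrap BYTES_PER_SEC MTU: approximate seconds until a 32-bit packet
# counter wraps, assuming every packet carries MTU bytes (a simplification
# that ignores headers and small packets). Needs 64-bit shell arithmetic.
secs_to_wrap() {
    echo $(( 4294967296 * $2 / $1 ))
}

secs_to_wrap 10000000 1500   # about 7.5 days at the assumed 10 MB/s
secs_to_wrap 10000000 100    # about 12 hours at the same assumed rate
```

In practice throughput usually drops with a tiny MTU, so the real gain is likely smaller than the factor of 15 this suggests.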



--
Tomasz Chmielewski
http://wpkg.org



Re: strange guest slowness after some time

2009-03-17 Thread Felix Leimbach

Avi Kivity wrote:
Does idle=poll help things?  It can cause tsc breakage similar to 
cpufreq.

On the host, right? Can't test that as I cannot reboot the server.
Is TSC breakage still something to watch out for after I've upgraded to the 
Shanghai quad-cores?



Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:

Avi Kivity wrote:

Felix Leimbach wrote:

Tomasz Chmielewski wrote:

Avi Kivity wrote:
Might it be that some counter overflowed?  What are the packet 
counts on long running guests?


Here is the current ifconfig output of a machine which suffered the 
problem before:


eth0  Link encap:Ethernet  HWaddr 52:54:00:74:01:01
 inet addr:10.75.13.1  Bcast:10.75.255.255  Mask:255.255.0.0
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX packets:3542104 errors:0 dropped:0 overruns:0 frame:0
 TX packets:412546 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000
 RX bytes:682285568 (650.6 MiB)  TX bytes:2907586796 (2.7 GiB)


Packet counters are well within 32-bit limits.  Byte counters are not so 
interesting.


Ah OK.
I did only byte overflow.

Packet overflow will take much longer. It's one of these very rare cases 
where setting very small MTU is useful...


OK, another bug found.

Set your MTU to 100.

On two hosts, do:

HOST1_MTU1500# dd if=/dev/zero | ssh mana...@host2 dd of=/dev/null
HOST2_MTU100# dd if=/dev/zero | ssh mana...@host1 dd of=/dev/null

HOST2 with MTU 100 will crash after 10-15 minutes (with the packet count 
still not overflowed).



--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-17 Thread Felix Leimbach

Avi Kivity wrote:

Felix Leimbach wrote:

eth0  Link encap:Ethernet  HWaddr 52:54:00:74:01:01
 inet addr:10.75.13.1  Bcast:10.75.255.255  Mask:255.255.0.0
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX packets:3542104 errors:0 dropped:0 overruns:0 frame:0
 TX packets:412546 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000
 RX bytes:682285568 (650.6 MiB)  TX bytes:2907586796 (2.7 GiB)


Packet counters are well within 32-bit limits.  Byte counters are not so 
interesting.

ah right, I checked the byte counters only.
Testing packet counter overflow now (takes a while).


Do you experience the slowdown on Windows guests?

Both Linux and Windows Server 2003. All 32-bit.
But with me it is not a slowdown but a complete loss of network in the 
guest. It can't be pinged anymore. There might be a slowdown period 
before that, though; I've heard hints in that direction from users.



Re: strange guest slowness after some time

2009-03-17 Thread Felix Leimbach

Tomasz Chmielewski wrote:

Tomasz Chmielewski wrote:

Avi Kivity wrote:
Packet counters are well within 32-bit limits.  Byte counters are not so 
interesting.


Ah OK.
I did only byte overflow.

Packet overflow will take much longer. It's one of these very rare 
cases where setting very small MTU is useful...


OK, another bug found.

Set your MTU to 100.

On two hosts, do:

HOST1_MTU1500# dd if=/dev/zero | ssh mana...@host2 dd of=/dev/null
HOST2_MTU100# dd if=/dev/zero | ssh mana...@host1 dd of=/dev/null

HOST2 with MTU 100 will crash after 10-15 minutes (with packet count 
still not overflown).



Interesting. What are the packet counters at crash time (roughly)?

My - currently running - test is:

Guest 1 (Linux):
MTU 150
# cat /dev/zero | nc guest2ip 

Guest 2 (Windows 2003 Server):
MTU: 1500
# nc -l -p   NUL

My packet counters are currently at 63 million without a problem - yet.



Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Felix Leimbach wrote:


OK, another bug found.

Set your MTU to 100.

On two hosts, do:

HOST1_MTU1500# dd if=/dev/zero | ssh mana...@host2 dd of=/dev/null
HOST2_MTU100# dd if=/dev/zero | ssh mana...@host1 dd of=/dev/null

HOST2 with MTU 100 will crash after 10-15 minutes (with packet count 
still not overflown).



Interesting. What are the packet counters at crash time (roughly)?

My - currently running - test is:

Guest 1 (Linux):
MTU 150
# cat /dev/zero | nc guest2ip 

Guest 2 (Windows 2003 Server):
MTU: 1500
# nc -l -p   NUL

My packet counters are currently at 63 million without a problem - yet.


I have it running with MTU 1500. And one of the guests (the one which 
was crashing with MTU=100) froze.


On a VNC console I can see:

virtio_net virtio0: id 64 is not a head!
BUG: soft lockup - CPU#0 stuck for 61s! [ssh:2265]

And the soft lockup message is printed periodically. VNC and the serial console 
do not react to any key press. The guest does not react to ACPI events (shutdown).

kvm/qemu process is using 100% CPU.

See this screenshot:

http://www1.wpkg.org/lockup.png


Guest that locks up is running Debian Lenny with 2.6.26 kernel.
Guest that does not lock up runs Mandriva 2009.0 with 2.6.27.x kernel.
(data is being transferred in both directions between these hosts).



--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:


See this screenshot:

http://www1.wpkg.org/lockup.png


Guest that locks up is running Debian Lenny with 2.6.26 kernel.
Guest that does not lock up runs Mandriva 2009.0 with 2.6.27.x kernel.
(data is being transferred in both directions between these hosts).


Sorry, both machines run Debian Lenny and 2.6.26 kernel.
The only difference is that machine which crashes (with MTU=100) or 
locks up (with MTU=1500) runs a 2.6.26-1-686 kernel and the one which 
doesn't lock up runs 2.6.26-1-486 kernel (both are Debian's kernels).



--
Tomasz Chmielewski
http://wpkg.org





Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:


Sorry, both machines run Debian Lenny and 2.6.26 kernel.
The only difference is that machine which crashes (with MTU=100) or 
locks up (with MTU=1500) runs a 2.6.26-1-686 kernel and the one which 
doesn't lock up runs 2.6.26-1-486 kernel (both are Debian's kernels).


Some more tries and I got this one. Serial console died, but SSH is still 
working.

Note the S tainted flag.
According to Documentation/oops-tracing.txt, it means:

 3: 'S' if the oops occurred on an SMP kernel running on hardware that
hasn't been certified as safe to run multiprocessor.
Currently this occurs only on various Athlons that are not
SMP capable.


And this is a difference between 2.6.26-1-686 and 2.6.26-1-486 kernels.

# grep -i smp /boot/config-2.6.26-1-686
CONFIG_X86_SMP=y
CONFIG_X86_32_SMP=y
CONFIG_SMP=y


# grep -i smp /boot/config-2.6.26-1-486
CONFIG_BROKEN_ON_SMP=y
# CONFIG_SMP is not set
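
The S flag discussed above corresponds to bit 2 (value 4) of the kernel taint bitmask that Linux exposes in /proc/sys/kernel/tainted. A minimal sketch for checking that bit on a running guest; has_taint_s is a hypothetical helper name, not an existing tool:

```shell
# has_taint_s VALUE: succeed if bit 2 (value 4, the 'S' flag) is set in the
# taint bitmask VALUE. Hypothetical helper for illustration only.
has_taint_s() {
    [ $(( $1 & 4 )) -ne 0 ]
}

# Example (not run here):
#   has_taint_s "$(cat /proc/sys/kernel/tainted)" && echo "S taint flag set"
```

This only reports that the flag is set; it says nothing about why the kernel set it inside a KVM guest.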



[10942.216450] BUG: soft lockup - CPU#0 stuck for 760s! [postgres:1802]
[10942.216450] Modules linked in: ipv6 loop joydev virtio_balloon virtio_net 
parport_pc parport snd_pcsp serio_raw snd_pcm snd_timer psmouse snd soundcore 
snd_page_alloc i2c_piix4 i2c_core button usbhid hid ff_memless evdev ext3 jbd 
mbcache virtio_blk ide_cd_mod cdrom ide_pci_generic floppy virtio_pci uhci_hcd 
usbcore piix ide_core ata_generic libata scsi_mod dock thermal processor fan 
thermal_sys
[10942.216450]
[10942.216450] Pid: 1802, comm: postgres Tainted: G S(2.6.26-1-686 #1)
[10942.216450] EIP: 0060:[c011d5a0] EFLAGS: 0206 CPU: 0
[10942.216450] EIP is at finish_task_switch+0x25/0x99
[10942.216450] EAX: c1208fa0 EBX: c03bafa0 ECX: c1208fa0 EDX: ce0be4a0
[10942.216450] ESI:  EDI: ce0be4a0 EBP: 0001 ESP: ce7f9afc
[10942.216450]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[10942.216450] CR0: 8005003b CR2: 080f3a10 CR3: 0eaeb000 CR4: 06d0
[10942.216450] DR0:  DR1:  DR2:  DR3: 
[10942.216450] DR6: 0ff0 DR7: 0400
[10942.216450]  [c02b82ee] ? schedule+0x60c/0x66f
[10942.216450]  [c0129ab0] ? lock_timer_base+0x19/0x35
[10942.216450]  [c0129bc3] ? __mod_timer+0x99/0xa3
[10942.216450]  [c02b8549] ? schedule_timeout+0x6b/0x86
[10942.216450]  [c01297ec] ? process_timeout+0x0/0x5
[10942.216450]  [c02b8544] ? schedule_timeout+0x66/0x86
[10942.216450]  [c017f2c6] ? do_select+0x364/0x3bd
[10942.216450]  [c017f7ca] ? __pollwait+0x0/0xac
[10942.216450]  [d08e74c4] ? start_xmit+0x9f/0xa5 [virtio_net]
[10942.216450]  [c025895c] ? dev_hard_start_xmit+0x1eb/0x24f
[10942.216450]  [c02669f2] ? __qdisc_run+0xcc/0x17c
[10942.216450]  [c025abbf] ? dev_queue_xmit+0x287/0x2bc
[10942.216450]  [c02762cd] ? ip_finish_output+0x1c5/0x1fc
[10942.216450]  [c0115403] ? pvclock_clocksource_read+0x4b/0xd0
[10942.216450]  [c0275e5b] ? ip_local_out+0x15/0x17
[10942.216450]  [c013604c] ? getnstimeofday+0x37/0xbc
[10942.216450]  [c01344c2] ? ktime_get_ts+0x22/0x49
[10942.216450]  [c01344f6] ? ktime_get+0xd/0x21
[10942.216450]  [c01190e6] ? hrtick_start_fair+0xeb/0x12c
[10942.216450]  [c011b39f] ? task_rq_lock+0x3b/0x5e
[10942.216450]  [c02531ab] ? skb_checksum+0x52/0x272
[10942.216450]  [c017f5a1] ? core_sys_select+0x282/0x29f
[10942.216450]  [c0129ccb] ? mod_timer+0x19/0x36
[10942.216450]  [c0252345] ? sock_def_readable+0xf/0x58
[10942.216450]  [c0283cf4] ? tcp_rcv_established+0x51d/0x7b1
[10942.216450]  [c0288d9f] ? tcp_v4_do_rcv+0x262/0x3e8
[10942.216450]  [c028ab5d] ? tcp_v4_rcv+0x5b6/0x609
[10942.216450]  [c0272ec3] ? ip_local_deliver_finish+0xe8/0x183
[10942.216450]  [c0272dbe] ? ip_rcv_finish+0x286/0x2a3
[10942.216450]  [c025837a] ? netif_receive_skb+0x2d6/0x343
[10942.216450]  [d08e7aa9] ? virtnet_poll+0x21d/0x258 [virtio_net]
[10942.216450]  [c017f915] ? sys_select+0x9f/0x180
[10942.216450]  [c0103853] ? sysenter_past_esp+0x78/0xb1
[10942.216450]  ===



--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-17 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:

Tomasz Chmielewski wrote:


Sorry, both machines run Debian Lenny and 2.6.26 kernel.
The only difference is that machine which crashes (with MTU=100) or 
locks up (with MTU=1500) runs a 2.6.26-1-686 kernel and the one 
which doesn't lock up runs 2.6.26-1-486 kernel (both are Debian's 
kernels).


Some more tries and I got this one. Serial console died, but SSH is 
still working.


Note the S tainted flag.
According to Documentation/oops-tracing.txt, it means:

 3: 'S' if the oops occurred on an SMP kernel running on hardware that
hasn't been certified as safe to run multiprocessor.
Currently this occurs only on various Athlons that are not
SMP capable.


And this is a difference between 2.6.26-1-686 and 2.6.26-1-486 kernels.

# grep -i smp /boot/config-2.6.26-1-686
CONFIG_X86_SMP=y
CONFIG_X86_32_SMP=y
CONFIG_SMP=y


# grep -i smp /boot/config-2.6.26-1-486
CONFIG_BROKEN_ON_SMP=y
# CONFIG_SMP is not set


BTW, it was the machine with /boot/config-2.6.26-1-486 kernel (non-SMP) 
which got slow for me today.



--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-09 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:


The host is running kvm-83.
Affected guests are running 2.6.27.14 kernels and use virtio drivers.
The problem happens only _sometimes_. Out of 9 guests I have running on 
this host, I saw this problem only on 3 guests. I never saw this 
happening on more than one guest at a time.

All three have 512 MB memory assigned, other guests have less memory.


I upgraded ~2 days ago to kvm-84 and the same just happened for a guest with 
256 MB memory.

Note how _time_ differs (timings to the other unaffected guests are similar):

# ping -f -c 10000 unaffected_guest

10000 packets transmitted, 10000 received, 0% packet loss, time 12313ms
rtt min/avg/max/mdev = 0.432/1.164/96.163/1.934 ms, pipe 7, ipg/ewma 1.231/1.111 ms


# ping -f -c 10000 affected_guest

10000 packets transmitted, 10000 received, 0% packet loss, time 135625ms
rtt min/avg/max/mdev = 0.807/14.228/55.569/5.779 ms, pipe 4, ipg/ewma 13.563/8.601 ms


Running dd if=/dev/vda of=/dev/null on the affected guest reduces that a bit:

# ping -f -c 10000 affected_guest

10000 packets transmitted, 10000 received, 0% packet loss, time 50469ms
rtt min/avg/max/mdev = 0.616/4.881/54.357/3.847 ms, pipe 5, ipg/ewma 5.047/7.783 ms



Anyone? Is it a known bug?


--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-09 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:

I upgraded ~2 days ago to kvm-84 and the same just happened for a guest 
with 256 MB memory.


Note how _time_ is different (similar timings are to other unaffected 
guests):


This is also pretty interesting:

# ping -c 10 unaffected_guest
PING 192.168.4.4 (192.168.4.4) 56(84) bytes of data.
64 bytes from 192.168.4.4: icmp_seq=1 ttl=64 time=1.25 ms
64 bytes from 192.168.4.4: icmp_seq=2 ttl=64 time=1.58 ms
64 bytes from 192.168.4.4: icmp_seq=3 ttl=64 time=3.53 ms
64 bytes from 192.168.4.4: icmp_seq=4 ttl=64 time=1.43 ms
64 bytes from 192.168.4.4: icmp_seq=5 ttl=64 time=3.89 ms
64 bytes from 192.168.4.4: icmp_seq=6 ttl=64 time=3.43 ms
64 bytes from 192.168.4.4: icmp_seq=7 ttl=64 time=1.03 ms
64 bytes from 192.168.4.4: icmp_seq=8 ttl=64 time=1.36 ms
64 bytes from 192.168.4.4: icmp_seq=9 ttl=64 time=1.28 ms
64 bytes from 192.168.4.4: icmp_seq=10 ttl=64 time=1.78 ms

--- 192.168.4.4 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9091ms
rtt min/avg/max/mdev = 1.031/2.059/3.894/1.045 ms



How likely is it that so many pings return in almost exactly 1000 ms?

# ping -c 10 affected_guest
PING 192.168.4.5 (192.168.4.5) 56(84) bytes of data.
64 bytes from 192.168.4.5: icmp_seq=1 ttl=64 time=1009 ms
64 bytes from 192.168.4.5: icmp_seq=2 ttl=64 time=9.61 ms
64 bytes from 192.168.4.5: icmp_seq=3 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=4 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=5 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=6 ttl=64 time=992 ms
64 bytes from 192.168.4.5: icmp_seq=7 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=8 ttl=64 time=1001 ms
64 bytes from 192.168.4.5: icmp_seq=9 ttl=64 time=1000 ms
64 bytes from 192.168.4.5: icmp_seq=10 ttl=64 time=998 ms

--- 192.168.4.5 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 10025ms
rtt min/avg/max/mdev = 9.610/901.198/1009.161/297.222 ms, pipe 2
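
Round-trip times that cluster this tightly around 1000 ms may suggest that the replies are tied to some periodic event (for instance ping's own 1-second send interval) rather than to ordinary network jitter. A small sketch for counting such replies in saved ping output; count_timer_like_rtts and its default 10 ms tolerance are illustrative choices, not something from this thread:

```shell
# count_timer_like_rtts [TARGET_MS] [TOLERANCE_MS]
# Read ping output on stdin and print how many replies have a round-trip
# time within TOLERANCE_MS of TARGET_MS (defaults: 1000 and 10).
# Hypothetical helper for illustration only.
count_timer_like_rtts() {
    target=${1:-1000}
    tol=${2:-10}
    awk -v target="$target" -v tol="$tol" '
        match($0, /time=[0-9.]+/) {
            # Strip the leading "time=" (5 characters) and convert to a number.
            rtt = substr($0, RSTART + 5, RLENGTH - 5) + 0
            d = rtt - target
            if (d < 0) d = -d
            if (d <= tol) n++
        }
        END { print n + 0 }
    '
}

# Example (not run here):
#   ping -c 10 affected_guest | count_timer_like_rtts
```

Fed the ten replies shown above, this counts nine within 10 ms of one second; only the 9.61 ms reply falls outside.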


This one is with dd if=/dev/vda of=/dev/null running on the affected guest:

# ping -c 10 affected_guest
PING 192.168.4.5 (192.168.4.5) 56(84) bytes of data.
64 bytes from 192.168.4.5: icmp_seq=1 ttl=64 time=29.4 ms
64 bytes from 192.168.4.5: icmp_seq=2 ttl=64 time=4.56 ms
64 bytes from 192.168.4.5: icmp_seq=3 ttl=64 time=4.05 ms
64 bytes from 192.168.4.5: icmp_seq=4 ttl=64 time=4.20 ms
64 bytes from 192.168.4.5: icmp_seq=5 ttl=64 time=3.82 ms
64 bytes from 192.168.4.5: icmp_seq=6 ttl=64 time=2.47 ms
64 bytes from 192.168.4.5: icmp_seq=7 ttl=64 time=2.16 ms
64 bytes from 192.168.4.5: icmp_seq=8 ttl=64 time=3.89 ms
64 bytes from 192.168.4.5: icmp_seq=9 ttl=64 time=5.98 ms
64 bytes from 192.168.4.5: icmp_seq=10 ttl=64 time=9.16 ms

--- 192.168.4.5 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9107ms
rtt min/avg/max/mdev = 2.169/6.978/29.439/7.714 ms



--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-09 Thread Avi Kivity

Tomasz Chmielewski wrote:
I have a strange slowness which affects some guests after they are 
running for some time. Slowness can happen a few hours after guest 
start, or, a couple of days after guest start.


What do I mean by slowness?

This is how long it takes to log in via SSH to an unaffected guest - 
below a second:


$ time ssh backupu...@normal_guest exit
0.02user 0.01system 0:00.67elapsed 4%CPU (0avgtext+0avgdata 0maxresident)

Now, let's try to log in to the affected guest running on the same 
host - more than 12 seconds:


$ time ssh backupu...@slow_guest exit
0.02user 0.01system 0:12.56elapsed 0%CPU (0avgtext+0avgdata 0maxresident)

If I log in via SSH to the affected guest, any key presses lag a 
second or two.



This is actually weird - if I run something IO intensive on the guest, 
the login is much faster (running CPU-intensive tasks makes no 
difference):


guest# dd if=/dev/vda of=/dev/null

$ time ssh backupu...@slow_guest exit
0.02user 0.00system 0:00.70elapsed 2%CPU (0avgtext+0avgdata 0maxresident)

Also, running ping -f slow_guest helps a lot and SSH logins are fast.


Look at the difference here - 7470ms vs 139183ms (and packet losses):

# ping -f -c 10000 normal_guest

10000 packets transmitted, 10000 received, 0% packet loss, time 7470ms
rtt min/avg/max/mdev = 0.443/0.709/6.487/0.112 ms, ipg/ewma 0.747/0.716 ms


# ping -f -c 10000 slow_guest

10000 packets transmitted, 9934 received, 0% packet loss, time 139183ms
rtt min/avg/max/mdev = 0.470/14.337/50.455/5.409 ms, pipe 4, ipg/ewma 13.919/14.788 ms



CPU-intensive tasks are as fast as on unaffected guests.
Reading from /dev/vda is as fast as on unaffected guests.

So the only thing broken seems to be the network.


Rebooting the guest does not help - it is still slow.
The only thing that helps is stopping the guest and starting it again 
(i.e., stopping kvm process and starting a new one).



Is there an explanation for this phenomenon? It looks like a problem with 
the virtio drivers somewhere, doesn't it?




The host is running kvm-83.
Affected guests are running 2.6.27.14 kernels and use virtio drivers.
The problem happens only _sometimes_. Out of 9 guests I have running 
on this host, I saw this problem only on 3 guests. I never saw this 
happening on more than one guest at a time.

All three have 512 MB memory assigned, other guests have less memory.



I'm guessing there's a problem with timers or timer interrupts.

What is the host cpu?

Does the problem occur if you pin a guest to a cpu with taskset?

--
error compiling committee.c: too many arguments to function




Re: strange guest slowness after some time

2009-03-09 Thread Tomasz Chmielewski

Avi Kivity wrote:


I'm guessing there's a problem with timers or timer interrupts.

What is the host cpu?


4 entries like this in /proc/cpuinfo:

processor   : 3
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 2212
stepping: 2
cpu MHz : 2000.000
cache size  : 1024 KB
physical id : 1
siblings: 2
core id : 1
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext 
fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy 
svm extapic cr8_legacy

bogomips: 3993.03
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc



Does the problem occur if you pin a guest to a cpu with taskset?


Like this?

# taskset -p 01 22906

(doesn't help)


# taskset -p 02 22906

(doesn't help)


But if I do:

# taskset -p 03 22906

or

# taskset -p 04 22906

it fixes it _rarely_ for the first few seconds, then it's broken again, 
until I switch the CPUs again (look at ping 9 and 10; other pings are 
also slow, unaffected guests are around 1 ms):


# ping -c 10 192.168.113.85
PING 192.168.113.85 (192.168.113.85) 56(84) bytes of data.
64 bytes from 192.168.113.85: icmp_seq=1 ttl=64 time=22.0 ms
64 bytes from 192.168.113.85: icmp_seq=2 ttl=64 time=23.7 ms
64 bytes from 192.168.113.85: icmp_seq=3 ttl=64 time=2.96 ms
64 bytes from 192.168.113.85: icmp_seq=4 ttl=64 time=51.3 ms
64 bytes from 192.168.113.85: icmp_seq=5 ttl=64 time=22.2 ms
64 bytes from 192.168.113.85: icmp_seq=6 ttl=64 time=1.60 ms
64 bytes from 192.168.113.85: icmp_seq=7 ttl=64 time=49.8 ms
64 bytes from 192.168.113.85: icmp_seq=8 ttl=64 time=23.3 ms
64 bytes from 192.168.113.85: icmp_seq=9 ttl=64 time=999 ms
64 bytes from 192.168.113.85: icmp_seq=10 ttl=64 time=822 ms
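
A note on the masks used above: taskset reads its mask argument as a hexadecimal bitmask, so 03 selects CPUs 0 and 1 together and 04 selects CPU 2 alone. A minimal sketch of a helper that turns a CPU index into such a mask; cpu_mask is a made-up name for illustration, not part of taskset:

```shell
# cpu_mask N: print the hex affinity mask that selects only CPU N.
# Hypothetical helper for illustration; taskset itself has no such command.
cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# Example (not run here): pin PID 22906 to CPU 3 only.
#   taskset -p "$(cpu_mask 3)" 22906
```

On this four-CPU host the single-CPU masks are therefore 1, 2, 4 and 8.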


--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-09 Thread Avi Kivity

Tomasz Chmielewski wrote:

Avi Kivity wrote:


I'm guessing there's a problem with timers or timer interrupts.

What is the host cpu?


4 entries like this in /proc/cpuinfo:

processor   : 3
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 2212



That's probably the kvmclock issue that hit older AMDs.  It was fixed in 
kvm-84, please try that.



Does the problem occur if you pin a guest to a cpu with taskset?


Like this?

# taskset -p 01 22906



I meant 'taskset 01 qemu ...' but it wouldn't have helped if it's kvmclock.

--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-03-09 Thread Tomasz Chmielewski

Avi Kivity wrote:

Tomasz Chmielewski wrote:

Avi Kivity wrote:


I'm guessing there's a problem with timers or timer interrupts.

What is the host cpu?


4 entries like this in /proc/cpuinfo:

processor   : 3
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 2212



That's probably the kvmclock issue that hit older AMDs.  It was fixed in 
kvm-84, please try that.


It is kvm-84; I have been running it since Saturday (but I had this issue 
with kvm-83 as well).


# dmesg | grep kvm
(...)
loaded kvm module (kvm-84)

# modinfo kvm
filename:   /lib/modules/2.6.24-2-pve/kernel/arch/x86/kvm/kvm.ko
version:kvm-84

# kvm -h
QEMU PC emulator version 0.9.1 (kvm-84), Copyright (c) 2003-2008 Fabrice 
Bellard




Does the problem occur if you pin a guest to a cpu with taskset?


Like this?

# taskset -p 01 22906



I meant 'taskset 01 qemu ...' but it wouldn't have helped if it's kvmclock.


It can be done on a running process as well (22906 is the PID of the 
affected guest).
And the issue is hard to reproduce (shows up after 1-7 days on a random 
guest).



--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-09 Thread Avi Kivity

Tomasz Chmielewski wrote:

Avi Kivity wrote:

Tomasz Chmielewski wrote:

Avi Kivity wrote:


I'm guessing there's a problem with timers or timer interrupts.

What is the host cpu?


4 entries like this in /proc/cpuinfo:

processor   : 3
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 2212



That's probably the kvmclock issue that hit older AMDs.  It was fixed 
in kvm-84, please try that.


It is kvm-84, I have it running since Saturday (but I had this issue 
with kvm-83 as well).




And the problem continues?

What's your current clocksource (in the guest)?  Does changing it help?

See /sys/devices/system/clocksource/clocksource0/*.



I meant 'taskset 01 qemu ...' but it wouldn't have helped if it's 
kvmclock.


It can be done on a running process as well (22906 is the PID of the 
affected gue


Right, but if the guest is poisoned somehow, this won't help.

--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-03-09 Thread Tomasz Chmielewski

Avi Kivity wrote:


I'm guessing there's a problem with timers or timer interrupts.

What is the host cpu?


4 entries like this in /proc/cpuinfo:

processor   : 3
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 2212



That's probably the kvmclock issue that hit older AMDs.  It was fixed 
in kvm-84, please try that.


It is kvm-84, I have it running since Saturday (but I had this issue 
with kvm-83 as well).




And the problem continues?

What's your current clocksource (in the guest)?  Does changing it help?

See /sys/devices/system/clocksource/clocksource0/*.


It was kvm-clock.
I tried changing it to acpi_pm, jiffies, tsc, but it made no difference.


I meant 'taskset 01 qemu ...' but it wouldn't have helped if it's 
kvmclock.


It can be done on a running process as well (22906 is the PID of the 
affected gue


Right, but if the guest is poisoned somehow, this won't help.


Yep, it seems poisoned.
I'll start the guest again in the evening and add an e1000 card to it.

If the problem reappears, it would be good to see whether it affects only the 
virtio card (so far, I've never seen this issue on a guest which doesn't 
use virtio drivers).



--
Tomasz Chmielewski
http://wpkg.org



Re: strange guest slowness after some time

2009-03-09 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:

Avi Kivity wrote:


I'm guessing there's a problem with timers or timer interrupts.

What is the host cpu?


4 entries like this in /proc/cpuinfo:

processor   : 3
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 2212



That's probably the kvmclock issue that hit older AMDs.  It was 
fixed in kvm-84, please try that.


It is kvm-84; I have had it running since Saturday (but I had this issue 
with kvm-83 as well).




And the problem continues?

What's your current clocksource (in the guest)?  Does changing it help?

See /sys/devices/system/clocksource/clocksource0/*.


It was kvm-clock.
I tried changing it to acpi_pm, jiffies, tsc, but it made no difference.


Actually, I don't think that I checked tsc, because when I changed to jiffies, 
the time has stopped:

# echo jiffies > /sys/devices/system/clocksource/clocksource0/current_clocksource
# date
Mon Mar  9 12:29:00 CET 2009
# date
Mon Mar  9 12:29:00 CET 2009
# date
Mon Mar  9 12:29:00 CET 2009
# date
Mon Mar  9 12:29:00 CET 2009
# date
Mon Mar  9 12:29:00 CET 2009

And I couldn't change to anything else any more:

# echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
jiffies
# echo kvm-clock > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
jiffies


So I had to kill the guest and start it again (the above was reproduced on 
another, non-poisoned guest).
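
The session above writes a name into current_clocksource and only notices afterwards that later writes were ignored. A minimal sketch of a more defensive switch follows. It assumes the guest kernel exposes available_clocksource next to current_clocksource in the same sysfs directory; the function name switch_clocksource and the CLOCKSOURCE_DIR override are hypothetical and exist only for this example.

```sh
# Switch the clocksource only if the kernel lists it as available, then
# read the value back to confirm the write actually took effect.
# CLOCKSOURCE_DIR (hypothetical) lets the function be tried against a
# scratch directory; the default is the sysfs path used in this thread.
switch_clocksource() {
    dir=${CLOCKSOURCE_DIR:-/sys/devices/system/clocksource/clocksource0}
    want=$1
    # available_clocksource holds a space-separated list of names.
    for name in $(cat "$dir/available_clocksource"); do
        if [ "$name" = "$want" ]; then
            echo "$want" > "$dir/current_clocksource" || return 1
            # A write can be silently ignored, as seen above, so verify it.
            [ "$(cat "$dir/current_clocksource")" = "$want" ]
            return
        fi
    done
    echo "clocksource '$want' is not listed as available" >&2
    return 2
}
```

Run as root inside the guest, for example `switch_clocksource acpi_pm`; a non-zero exit status means the clocksource was not listed or the kernel kept the old one.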


--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-09 Thread Avi Kivity

Tomasz Chmielewski wrote:


It was kvm-clock.
I tried changing it to acpi_pm, jiffies, tsc, but it made no difference.


Actually, I don't think that I checked tsc, because when I changed to 
jiffies, the time has stopped:


# echo jiffies > /sys/devices/system/clocksource/clocksource0/current_clocksource

# date
Mon Mar  9 12:29:00 CET 2009
# date
Mon Mar  9 12:29:00 CET 2009
# date
Mon Mar  9 12:29:00 CET 2009


Can you post some /proc/interrupts dumps from the guest?  I guess the 
timer interrupt isn't working.


Does -no-kvm-irqchip help?

--
error compiling committee.c: too many arguments to function



Re: strange guest slowness after some time

2009-03-09 Thread Tomasz Chmielewski

Avi Kivity wrote:

Tomasz Chmielewski wrote:


It was kvm-clock.
I tried changing it to acpi_pm, jiffies, tsc, but it made no difference.


Actually, I don't think that I checked tsc, because when I changed to 
jiffies, the time has stopped:


# echo jiffies > /sys/devices/system/clocksource/clocksource0/current_clocksource

# date
Mon Mar  9 12:29:00 CET 2009
# date
Mon Mar  9 12:29:00 CET 2009
# date
Mon Mar  9 12:29:00 CET 2009


Can you post some /proc/interrupts dumps from the guest?  I guess the 
timer interrupt isn't working.


We're touching on a different issue from my original one (guest slowness) 
here, I suppose.


But interrupts are still arriving here when I set the clocksource to 
jiffies (setting it to jiffies also kills my serial console connection 
- no key presses get through to the guest any more):


# cat /proc/interrupts
           CPU0
  0:        104   IO-APIC-edge      timer
  1:          6   IO-APIC-edge      i8042
  4:        480   IO-APIC-edge      serial
  6:          2   IO-APIC-edge      floppy
  7:          0   IO-APIC-edge      parport0
  8:          2   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 10:       4400   IO-APIC-fasteoi   virtio0, virtio2, virtio4
 11:       1550   IO-APIC-fasteoi   uhci_hcd:usb1, virtio1, virtio3
 12:         89   IO-APIC-edge      i8042
 14:          0   IO-APIC-edge      ide0
 15:         30   IO-APIC-edge      ide1
NMI:          0   Non-maskable interrupts
LOC:      85231   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0

# cat /proc/interrupts
           CPU0
  0:        104   IO-APIC-edge      timer
  1:          6   IO-APIC-edge      i8042
  4:        486   IO-APIC-edge      serial
  6:          2   IO-APIC-edge      floppy
  7:          0   IO-APIC-edge      parport0
  8:          2   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 10:       4461   IO-APIC-fasteoi   virtio0, virtio2, virtio4
 11:       1590   IO-APIC-fasteoi   uhci_hcd:usb1, virtio1, virtio3
 12:         89   IO-APIC-edge      i8042
 14:          0   IO-APIC-edge      ide0
 15:         30   IO-APIC-edge      ide1
NMI:          0   Non-maskable interrupts
LOC:     108361   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0
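
Comparing the two dumps by eye: IRQ 0 (timer) stays at 104, while LOC moves from 85231 to 108361, serial from 480 to 486, and the virtio lines from 4400 to 4461 and from 1550 to 1590. A small sketch that computes these deltas from two saved snapshots is given below. It assumes a single-CPU guest as in the dumps, and the function name irq_delta is hypothetical.

```sh
# Print how far each /proc/interrupts counter advanced between two
# saved snapshots (single-CPU layout: "label: count description").
# Usage sketch:
#   cat /proc/interrupts > before; sleep 10; cat /proc/interrupts > after
#   irq_delta before after
irq_delta() {
    awk '
        # Split a line into label (text before the first colon) and count
        # (first field after it). Returns 0 for lines without a counter.
        function parse(line,    i, rest, f) {
            sub(/^[ \t]+/, "", line)
            i = index(line, ":")
            if (i == 0) return 0            # header line such as "CPU0"
            label = substr(line, 1, i - 1)
            rest = substr(line, i + 1)
            split(rest, f, " ")
            if (f[1] !~ /^[0-9]+$/) return 0
            count = f[1]
            return 1
        }
        # First file: remember every counter by label.
        NR == FNR { if (parse($0)) before[label] = count; next }
        # Second file: print the label and the difference.
        parse($0) && (label in before) {
            printf "%s %d\n", label, count - before[label]
        }
    ' "$1" "$2"
}
```

A line such as `0 0` next to `LOC 23130` would show that the local timer counter keeps advancing while the IRQ 0 counter does not move.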




Does -no-kvm-irqchip help?


Nope, it doesn't - with jiffies, time always stops.


--
Tomasz Chmielewski
http://wpkg.org



strange guest slowness after some time

2009-03-07 Thread Tomasz Chmielewski
I am seeing a strange slowness which affects some guests after they have 
been running for some time. It can appear a few hours after guest 
start, or a couple of days after guest start.


What do I mean by slowness?

This is how long it takes to log in via SSH to an unaffected guest - 
below a second:


$ time ssh backupu...@normal_guest exit
0.02user 0.01system 0:00.67elapsed 4%CPU (0avgtext+0avgdata 0maxresident)

Now, let's try to log in to the affected guest running on the same host 
- more than 12 seconds:


$ time ssh backupu...@slow_guest exit
0.02user 0.01system 0:12.56elapsed 0%CPU (0avgtext+0avgdata 0maxresident)

If I log in via SSH to the affected guest, any key presses lag a second 
or two.



This is actually weird - if I run something IO intensive on the guest, 
the login is much faster (running CPU-intensive tasks makes no difference):


guest# dd if=/dev/vda of=/dev/null

$ time ssh backupu...@slow_guest exit
0.02user 0.00system 0:00.70elapsed 2%CPU (0avgtext+0avgdata 0maxresident)

Also, running ping -f slow_guest helps a lot and SSH logins are fast.


Look at the difference here - 7470ms vs 139183ms (and packet losses):

# ping -f -c 10000 normal_guest

10000 packets transmitted, 10000 received, 0% packet loss, time 7470ms
rtt min/avg/max/mdev = 0.443/0.709/6.487/0.112 ms, ipg/ewma 0.747/0.716 ms

# ping -f -c 10000 slow_guest

10000 packets transmitted, 9934 received, 0% packet loss, time 139183ms
rtt min/avg/max/mdev = 0.470/14.337/50.455/5.409 ms, pipe 4, ipg/ewma 13.919/14.788 ms
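
The slow_guest summary reports 0% packet loss even though only 9934 replies came back, most likely because this ping prints the loss as a whole-number percentage. The sketch below works out the fractional loss. The sent count of 10000 is an assumption, inferred from the elapsed time divided by the interpacket gap (7470 ms / 0.747 ms and 139183 ms / 13.919 ms both come to roughly 10000); the function name loss_percent is hypothetical.

```sh
# loss_percent SENT RECEIVED
# Print the packet loss as a percentage with two decimal places.
loss_percent() {
    awk -v sent="$1" -v recv="$2" \
        'BEGIN { printf "%.2f\n", (sent - recv) * 100 / sent }'
}

# Assumed 10000 packets sent, 9934 received (from the output above).
loss_percent 10000 9934
```

Under that assumption the loss on the slow guest is well below one percent, so the latency increase (about 14 ms average round trip versus about 0.7 ms) is the more visible symptom.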



CPU-intensive tasks are as fast as on unaffected guests.
Reading from /dev/vda is as fast as on unaffected guests.

So the only thing broken seems to be the network.


Rebooting the guest does not help - it is still slow.
The only thing that helps is stopping the guest and starting it again 
(i.e., stopping kvm process and starting a new one).



Is there an explanation for this phenomenon? It looks like a problem 
somewhere in the virtio drivers, doesn't it?




The host is running kvm-83.
Affected guests are running 2.6.27.14 kernels and use virtio drivers.
The problem happens only _sometimes_. Out of 9 guests I have running on 
this host, I saw this problem only on 3 guests. I never saw this 
happening on more than one guest at a time.

All three have 512 MB memory assigned, other guests have less memory.


--
Tomasz Chmielewski
http://wpkg.org


Re: strange guest slowness after some time

2009-03-07 Thread Johannes Baumann
Are your nameservers OK?
ssh does a reverse lookup of your IP; if your nameserver is not
available, login may take some time.

johannes



Re: strange guest slowness after some time

2009-03-07 Thread Tomasz Chmielewski

Johannes Baumann wrote:

Are your nameservers OK?
ssh does a reverse lookup of your IP; if your nameserver is not
available, login may take some time.


Nameservers were fine.
If they were broken, it would affect all the other guests too, wouldn't it?

Also, to my knowledge, nameservers normally do not affect ping losses 
and/or ping roundtrip times ;)


dd if=/dev/vda of=/dev/null curing the problem also excludes the 
nameserver idea.


--
Tomasz Chmielewski
http://wpkg.org