Re: strange guest slowness after some time

2009-03-07 Thread Tomasz Chmielewski

Johannes Baumann wrote:

Are your nameservers OK?
ssh does a reverse lookup of your IP; if your nameserver is not
available, login may take some time.


The nameservers were fine.
If they were the problem, it would affect all the other guests too, wouldn't it?

Also, as far as I know, nameservers do not normally affect ping packet loss
or round-trip times ;)


The fact that running "dd if=/dev/vda of=/dev/null" cures the problem also
rules out the nameserver idea.
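Johannes's theory can also be checked directly on the guest. When "UseDNS" is enabled in sshd_config, sshd looks up the connecting client's address, so a slow or unreachable resolver delays every login. A minimal sketch, assuming coreutils "timeout"; "finishes_within" is a hypothetical helper name and 192.0.2.10 is a placeholder for the address you connect from:

```shell
# Succeed only if a command completes successfully within LIMIT seconds.
# Usage: finishes_within LIMIT command [args...]   (relies on coreutils "timeout")
finishes_within() {
    local limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
}

# On the guest: does the reverse lookup sshd would perform finish quickly?
# 192.0.2.10 is a hypothetical placeholder for the client's (host's) address.
# finishes_within 2 getent hosts 192.0.2.10 && echo "lookup fast" || echo "slow, failed, or no reverse entry"
```

A lookup that finishes well inside the limit would back up Tomasz's point; one that hits the timeout would point at DNS, in which case "UseDNS no" in the guest's sshd_config should stop sshd from waiting on it. A non-zero result can also just mean the address has no reverse entry, and that answer usually comes back quickly.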


--
Tomasz Chmielewski
http://wpkg.org
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: strange guest slowness after some time

2009-03-07 Thread Johannes Baumann
Are your nameservers OK?
ssh does a reverse lookup of your IP; if your nameserver is not
available, login may take some time.

johannes



strange guest slowness after some time

2009-03-07 Thread Tomasz Chmielewski
Some of my guests become strangely slow after they have been running for
a while. The "slowness" can start a few hours after the guest is started,
or a couple of days later.


What do I mean by "slowness"?

This is how long it takes to log in via SSH to an unaffected guest - 
below a second:


$ time ssh backupu...@normal_guest exit
0.02user 0.01system 0:00.67elapsed 4%CPU (0avgtext+0avgdata 0maxresident)

Now, let's try to log in to the affected guest running on the same host.
It takes more than 12 seconds:


$ time ssh backupu...@slow_guest exit
0.02user 0.01system 0:12.56elapsed 0%CPU (0avgtext+0avgdata 0maxresident)

Once I am logged in via SSH to the affected guest, every key press lags
by a second or two.
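Since the slowdown can begin hours or days after the guest starts, logging the SSH login time at regular intervals may help pin down when it begins. A minimal sketch, assuming GNU date (for "%N") and key-based SSH logins; "check_latency", the 2000 ms threshold, the user name and the log path are all hypothetical:

```shell
# Run a command once and print a timestamped OK/SLOW line with its duration.
# Usage: check_latency THRESHOLD_MS command [args...]
# Returns 1 if the command took longer than THRESHOLD_MS (assumes GNU date).
check_latency() {
    local threshold_ms=$1
    shift
    local start end ms
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    ms=$(( (end - start) / 1000000 ))
    if [ "$ms" -gt "$threshold_ms" ]; then
        echo "$(date '+%F %T') SLOW ${ms}ms"
        return 1
    fi
    echo "$(date '+%F %T') OK ${ms}ms"
}

# Example loop, run from the host (user name and log path are hypothetical):
# while true; do
#     check_latency 2000 ssh -o BatchMode=yes user@slow_guest exit >> /tmp/ssh-latency.log
#     sleep 300
# done
```

"BatchMode=yes" makes ssh fail instead of prompting for a password, so the loop cannot hang on a prompt.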



This is the weird part: if I run something I/O-intensive on the guest,
the login is much faster (running CPU-intensive tasks makes no difference):


guest# dd if=/dev/vda of=/dev/null

$ time ssh backupu...@slow_guest exit
0.02user 0.00system 0:00.70elapsed 2%CPU (0avgtext+0avgdata 0maxresident)

Running "ping -f " also helps a lot, and SSH logins become fast.


Look at the difference in total time, 7470ms vs 139183ms (and the packet loss):

# ping -f -c 10000 normal_guest

10000 packets transmitted, 10000 received, 0% packet loss, time 7470ms
rtt min/avg/max/mdev = 0.443/0.709/6.487/0.112 ms, ipg/ewma 0.747/0.716 ms

# ping -f -c 10000 slow_guest

10000 packets transmitted, 9934 received, 0% packet loss, time 139183ms
rtt min/avg/max/mdev = 0.470/14.337/50.455/5.409 ms, pipe 4, ipg/ewma 13.919/14.788 ms
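The ipg value that ping prints in flood mode is the average gap between sent packets, so total time divided by ipg estimates how many packets each run sent: 7470 / 0.747 = 10000 and 139183 / 13.919 is about 10000. With 10000 sent, 9934 replies means 66 packets (0.66%) were lost, presumably shown as 0% because this ping version rounds the loss down to a whole percent. A small awk sketch that recomputes the exact figure from a summary line; "exact_loss" is a hypothetical helper, not part of ping:

```shell
# Read ping output on stdin and print the exact loss from the summary line
# ("N packets transmitted, M received, ..."), with two decimal places.
exact_loss() {
    awk '/packets transmitted/ && $1 > 0 {
        tx = $1; rx = $4
        printf "%d sent, %d received, %.2f%% lost\n", tx, rx, (tx - rx) * 100 / tx
    }'
}

# Usage (target host is whatever you are testing):
# ping -f -c 10000 slow_guest | exact_loss
```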



CPU-intensive tasks are as fast as on unaffected guests.
Reading from /dev/vda is as fast as on unaffected guests.

So the only thing broken seems to be the network.


Rebooting the guest does not help - it is still slow.
The only thing that helps is stopping the guest and starting it again
(i.e., stopping the kvm process and starting a new one).



Is there an explanation for this phenomenon? It looks like a problem with
the virtio drivers somewhere, doesn't it?




The host is running kvm-83.
Affected guests are running 2.6.27.14 kernels and use virtio drivers.
The problem happens only _sometimes_. Out of 9 guests I have running on 
this host, I saw this problem only on 3 guests. I never saw this 
happening on more than one guest at a time.

All three have 512 MB memory assigned, other guests have less memory.


--
Tomasz Chmielewski
http://wpkg.org


Re: Houston, we have May 15, 1953 (says guest when host uses cpufreq, and dies)

2009-03-07 Thread Tomasz Chmielewski

Marcelo Tosatti wrote:


flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 
3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips: 3993.20
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc


kvm-84 as mentioned. Sorry. 



It is not stable for me (although I went through exactly the same routine with kvm-83): I got an oops.
I'm not sure whether the problem is in kvm or in cpufreq.


This is what I did:
- stopped all kvm-83 guests, removed all kvm-83 modules
- inserted kvm-84 modules, started 9 guests with the kvm-84 binary
- inserted cpufreq modules and set the governor to ondemand
- cat /proc/cpuinfo showed that the CPUs were still running at full speed, so I ran:

cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor

but it never returned, and cat was stuck in the "D" state.

- I stopped all guests, removed kvm-84 modules, and got this oops:


Unable to handle kernel paging request at 88429721 RIP:
[]
PGD 203067 PUD 207063 PMD 1184dc067 PTE 0
Oops: 0010 [1] PREEMPT SMP
CPU: 0
Modules linked in: loop cpufreq_ondemand powernow_k8 freq_table crc32c 
libcrc32c vzethdev vznetdev simfs vzrst vzcpt tun vzdquota vzmon ipv6 vzdev 
xt_tcpudp xt_length ipt_ttl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter 
xt_multiport xt_limit ipt_tos ipt_REJECT ip_tables x_tables ib_iser rdma_cm 
ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi 
scsi_transport_iscsi bridge 8021q bonding dm_snapshot dm_mirror dm_multipath 
dm_mod joydev usbhid hid sata_svw pata_serverworks psmouse ehci_hcd ohci_hcd 
evdev thermal button ata_generic pata_acpi serio_raw i2c_piix4 tg3 container 
pcspkr libata usbcore processor i2c_core ssb shpchp pci_hotplug k8temp romfs 
isofs sd_mod sg mptsas mptscsih mptbase scsi_transport_sas scsi_mod raid1 md_mod
Pid: 20389, comm: kondemand/0 Not tainted 2.6.24-2-pve #1 ovz005
RIP: 0010:[]  []
RSP: 0018:810039d09d20  EFLAGS: 00010202
RAX: 0001 RBX: ef41d1f5 RCX: 806cf690
RDX:  RSI: 810080959000 RDI: 88450e80
RBP:  R08: 001e8480 R09: 
R10: 810001027ee0 R11: 0001 R12: 88450e90
R13: 810039d09df0 R14:  R15: 
FS:  7f93a7d736d0() GS:8060b000() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 88429721 CR3: 00201000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process kondemand/0 (pid: 20389, veid=0, threadinfo 810039d08000, task 
810039d06000)
Stack:  ef41d1f5  88451950 810039d09df0
0001 804aa671 8077b1f0 810039d09df0
  8077b1c0 8025ca0a
Call Trace:
[] notifier_call_chain+0x51/0x70
[] __srcu_notifier_call_chain+0x5a/0x90
[] cpufreq_notify_transition+0x7d/0xb0
[] :powernow_k8:powernowk8_target+0x2bf/0x690
[] :cpufreq_ondemand:do_dbs_timer+0x26a/0x300
[] :cpufreq_ondemand:do_dbs_timer+0x0/0x300
[] run_workqueue+0x88/0x120
[] worker_thread+0x0/0x130
[] worker_thread+0xc5/0x130
[] autoremove_wake_function+0x0/0x30
[] worker_thread+0x0/0x130
[] worker_thread+0x0/0x130
[] kthread+0x4b/0x80
[] child_rip+0xa/0x12
[] kthread+0x0/0x80
[] child_rip+0x0/0x12


Code:  Bad RIP value.
RIP  []
RSP 
CR2: 88429721
---[ end trace e6b0e16fe814aeb1 ]---
note: kondemand/0[20389] exited with preempt_count 1
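A process stuck in the "D" state (uninterruptible sleep), like the hung cat above, can be listed together with the kernel function it is sleeping in via the WCHAN column of ps, which may hint at whether it is blocked in cpufreq code. A minimal sketch; "filter_dstate" is a hypothetical helper, and the live command assumes procps ps:

```shell
# Keep only processes whose state starts with "D" (uninterruptible sleep).
# Expects ps-style input with one header line: PID STAT WCHAN COMMAND
filter_dstate() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# Live usage on the host (wchan:32 widens the WCHAN column to 32 characters):
# ps -eo pid,stat,wchan:32,comm | filter_dstate
```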



--
Tomasz Chmielewski
http://wpkg.org 