Re: [GIT PULL] vfio fix for 3.9-rc7

2013-04-09 Thread richard -rw- weinberger
On Tue, Apr 9, 2013 at 9:05 PM, Alex Williamson
alex.william...@redhat.com wrote:
 Hi Linus,

 Here's one small fix for vfio that I'd like to get in for 3.9;
 tightening the range checking around a vfio ioctl.  Thanks!

 Alex

 The following changes since commit 25e9789ddd9d14a8971f4a421d04f282719ab733:

   vfio: include linux/slab.h for kmalloc (2013-03-15 12:58:20 -0600)

 are available in the git repository at:

   git://github.com/awilliam/linux-vfio.git tags/vfio-v3.9-rc7

 for you to fetch changes up to 904c680c7bf016a8619a045850937427f8d7368c:

   vfio-pci: Fix possible integer overflow (2013-03-26 11:33:16 -0600)

Is this commit queued for -stable?
I don't see a stable tag in the commit message.
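(For reference, a quick local check, assuming a clone of the vfio tree above;
this is just a sketch of the kind of command one would run, not output I have:

    git show -s 904c680c7bf016a8619a045850937427f8d7368c | grep -i 'stable@'

No output would mean no stable tag in the commit message.)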

Thanks,
//richard


Re: [RFC/GIT PULL] Linux KVM tool for v3.2

2011-11-08 Thread richard -rw- weinberger
Pekka,

On Fri, Nov 4, 2011 at 9:38 AM, Pekka Enberg penb...@cs.helsinki.fi wrote:
    ./kvm run

 We also support booting both raw images and QCOW2 images in read-only
 mode:

    ./kvm run -d debian_squeeze_amd64_standard.qcow2 -p root=/dev/vda1


I'm trying to use the kvm tool, but the virtio_blk userland seems to freeze.
Any idea what happens here?

./kvm run -d /scratch/rw/fc14_64_image.qcow2 -p root=/dev/vda1
nolapic init=/bin/sh notsc hpet=disable debug
  Warning: Forcing read-only support for QCOW
  # kvm run -k ../../arch/x86/boot/bzImage -m 320 -c 2 --name guest-9815
eeaarrllyy  ccoonnssoollee  iinn  sseettuupp  ccooddee

early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[0.00] Linux version 3.1.0-00905-ge0cab7f (rw@inhelltoy) (gcc
version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #3 Tue Nov 8 14:14:24
CET 2011
[0.00] Command line: notsc noapic noacpi pci=conf1 reboot=k
panic=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 console=ttyS0
earlyprintk=serial i8042.noaux=1 root=/dev/vda1 nolapic init=/bin/sh
notsc hpet=disable debug
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000f - 000f (reserved)
[0.00]  BIOS-e820: 0010 - 1400 (usable)
[0.00] bootconsole [earlyser0] enabled
[0.00] NX (Execute Disable) protection: active
[0.00] DMI not present or invalid.
[0.00] e820 update range:  - 0001
(usable) == (reserved)
[0.00] e820 remove range: 000a - 0010 (usable)
[0.00] No AGP bridge found
[0.00] last_pfn = 0x14000 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges disabled:
[0.00]   0-F uncachable
[0.00] MTRR variable ranges disabled:
[0.00]   0 disabled
[0.00]   1 disabled
[0.00]   2 disabled
[0.00]   3 disabled
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
[0.00] CPU MTRRs all blank - virtualized system.
[0.00] found SMP MP-table at [880f0370] f0370
[0.00] initial memory mapped : 0 - 2000
[0.00] Base memory trampoline at [8809a000] 9a000 size 20480
[0.00] init_memory_mapping: -1400
[0.00]  00 - 001400 page 2M
[0.00] kernel direct mapping tables up to 1400 @ 13ffe000-1400
[0.00] ACPI Error: A valid RSDP was not found (20110623/tbxfroot-219)
[0.00]  [ea00-ea5f] PMD -
[88001300-8800135f] on node 0
[0.00] Zone PFN ranges:
[0.00]   DMA  0x0010 - 0x1000
[0.00]   DMA320x1000 - 0x0010
[0.00]   Normal   empty
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[2] active PFN ranges
[0.00] 0: 0x0010 - 0x009f
[0.00] 0: 0x0100 - 0x00014000
[0.00] On node 0 totalpages: 81807
[0.00]   DMA zone: 64 pages used for memmap
[0.00]   DMA zone: 5 pages reserved
[0.00]   DMA zone: 3914 pages, LIFO batch:0
[0.00]   DMA32 zone: 1216 pages used for memmap
[0.00]   DMA32 zone: 76608 pages, LIFO batch:15
[0.00] Intel MultiProcessor Specification v1.4
[0.00] MPTABLE: OEM ID: KVMCPU00
[0.00] MPTABLE: Product ID: 0.1
[0.00] MPTABLE: APIC at: 0xFEE0
[0.00] Processor #0 (Bootup-CPU)
[0.00] Processor #1
[0.00] ACPI: NR_CPUS/possible_cpus limit of 1 reached.
Processor 1/0x1 ignored.
[0.00] IOAPIC[0]: apic_id 3, version 17, address 0xfec0, GSI 0-23
[0.00] Processors: 1
[0.00] nr_irqs_gsi: 40
[0.00] Allocating PCI resources starting at 1400 (gap:
1400:ec00)
[0.00] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[0.00] pcpu-alloc: [0] 0
[0.00] Built 1 zonelists in Zone order, mobility grouping on.
Total pages: 80522
[0.00] Kernel command line: notsc noapic noacpi pci=conf1
reboot=k panic=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1
console=ttyS0 earlyprintk=serial i8042.noaux=1 root=/dev/vda1 nolapic
init=/bin/sh notsc hpet=disable debug
[0.00] notsc: Kernel compiled with CONFIG_X86_TSC, cannot
disable TSC completely.
[0.00] notsc: Kernel compiled with CONFIG_X86_TSC, cannot
disable TSC completely.
[0.00] PID hash table entries: 2048 (order: 2, 16384 bytes)
[0.00] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[0.00] Inode-cache hash table entries: 32768 (order: 6

Re: [PATCH 8/8] arch/x86: use printk_ratelimited instead of printk_ratelimit

2011-06-06 Thread richard -rw- weinberger
On Mon, Jun 6, 2011 at 8:35 AM, Rusty Russell ru...@rustcorp.com.au wrote:
 On Sat, 4 Jun 2011 17:37:04 +0200, Christian Dietrich 
 christian.dietr...@informatik.uni-erlangen.de wrote:
 Since printk_ratelimit() shouldn't be used anymore (see comment in
 include/linux/printk.h), replace it with printk_ratelimited.

 Acked-by: Rusty Russell ru...@rustcorp.com.au (lguest part)

 diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
 index db832fd..23a6eff 100644
 --- a/arch/x86/lguest/boot.c
 +++ b/arch/x86/lguest/boot.c
 @@ -56,6 +56,7 @@
  #include <linux/lguest_launcher.h>
  #include <linux/virtio_console.h>
  #include <linux/pm.h>
 +#include <linux/ratelimit.h>
  #include <asm/apic.h>
  #include <asm/lguest.h>
  #include <asm/paravirt.h>

 Is this new include really needed?  The printk_ratelimited() definition
 is in printk.h...

Yes.
printk_ratelimited() needs DEFINE_RATELIMIT_STATE() which is defined
in ratelimit.h.
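A quick way to see the dependency in a kernel tree (a sketch; the paths assume
the usual include/linux layout):

    # printk_ratelimited() lives in printk.h but expands to a static
    # DEFINE_RATELIMIT_STATE(), whose definition is in ratelimit.h:
    grep -nE 'printk_ratelimited|DEFINE_RATELIMIT_STATE' \
        include/linux/printk.h include/linux/ratelimit.h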

-- 
Thanks,
//richard


Re: I/O Performance Tips

2010-12-09 Thread RW
We don't use Ubuntu (we use Gentoo/KVM 0.12.5), but we've had a
similar problem with kernels < 2.6.32-r11 (this is Gentoo specific and
means update -r11, not release candidate 11) and with the first three
releases of 2.6.34, especially when NFS was involved. We're currently
running 2.6.32-r11 and 2.6.32-r20 kernels without these problems. And
we're using the deadline scheduler in host and guests. The VM images are
stored on an LVM volume. Maybe a kernel update will help (if available of
course...). We're also running all VMs with cache=none.
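For what it's worth, a minimal sketch of that setup (device names, memory
size and the LVM volume path are placeholders, not our actual configuration):

    # host and guests: switch the relevant block devices to deadline
    echo deadline > /sys/block/sda/queue/scheduler
    cat /sys/block/sda/queue/scheduler    # prints e.g. "noop [deadline] cfq"

    # start the guest with host page caching disabled on the virtio disk
    qemu-kvm -m 2048 -smp 2 \
        -drive file=/dev/vg0/guest1,if=virtio,cache=none,boot=on \
        -net nic,model=virtio -net tap,ifname=tap0,script=no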


You should also send the qemu-kvm options you're using and version to 
the list. Maybe someone else could help further.


- Robert


On 12/09/10 09:10, Sebastian Nickel - Hetzner Online AG wrote:

Hello,
we have got some issues with I/O in our kvm environment. We are using
kernel version 2.6.32 (Ubuntu 10.04 LTS) to virtualise our hosts and we
are using ksm, too. Recently we noticed that sometimes the guest systems
(mainly OpenSuse guest systems) suddenly have a read only filesystem.
After some inspection we found out that the guest system generates some
ata errors due to timeouts (mostly in flush cache situations). On the
physical host there are always the same kernel messages when this
happens:


[1508127.195469] INFO: task kjournald:497 blocked for more than 120
seconds.
[1508127.212828] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
disables this message.
[1508127.246841] kjournald D  0   497  2
0x
[1508127.246848]  88062128dba0 0046 00015bc0
00015bc0
[1508127.246855]  880621089ab0 88062128dfd8 00015bc0
8806210896f0
[1508127.246862]  00015bc0 88062128dfd8 00015bc0
880621089ab0
[1508127.246868] Call Trace:
[1508127.246880]  [8116e500] ? sync_buffer+0x0/0x50
[1508127.246889]  [81557d87] io_schedule+0x47/0x70
[1508127.246893]  [8116e545] sync_buffer+0x45/0x50
[1508127.246897]  [8155825a] __wait_on_bit_lock+0x5a/0xc0
[1508127.246901]  [8116e500] ? sync_buffer+0x0/0x50
[1508127.246905]  [81558338] out_of_line_wait_on_bit_lock
+0x78/0x90
[1508127.246911]  [810850d0] ? wake_bit_function+0x0/0x40
[1508127.246915]  [8116e6c6] __lock_buffer+0x36/0x40
[1508127.246920]  [81213d11] journal_submit_data_buffers
+0x311/0x320
[1508127.246924]  [81213ff2] journal_commit_transaction
+0x2d2/0xe40
[1508127.246931]  [810397a9] ? default_spin_lock_flags
+0x9/0x10
[1508127.246935]  [81076c7c] ? lock_timer_base+0x3c/0x70
[1508127.246939]  [81077719] ? try_to_del_timer_sync+0x79/0xd0
[1508127.246943]  [81217f0d] kjournald+0xed/0x250
[1508127.246947]  [81085090] ? autoremove_wake_function
+0x0/0x40
[1508127.246951]  [81217e20] ? kjournald+0x0/0x250
[1508127.246954]  [81084d16] kthread+0x96/0xa0
[1508127.246959]  [810141ea] child_rip+0xa/0x20
[1508127.246962]  [81084c80] ? kthread+0x0/0xa0
[1508127.246966]  [810141e0] ? child_rip+0x0/0x20
[1508127.246969] INFO: task flush-251:0:505 blocked for more than 120
seconds.
[1508127.264076] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
disables this message.
[1508127.298343] flush-251:0   D 810f4370 0   505  2
0x
[1508127.298349]  880621fdba30 0046 00015bc0
00015bc0
[1508127.298354]  88062108b1a0 880621fdbfd8 00015bc0
88062108ade0
[1508127.298358]  00015bc0 880621fdbfd8 00015bc0
88062108b1a0
[1508127.298362] Call Trace:
[1508127.298370]  [810f4370] ? sync_page+0x0/0x50
[1508127.298375]  [81557d87] io_schedule+0x47/0x70
[1508127.298379]  [810f43ad] sync_page+0x3d/0x50
[1508127.298383]  [8155825a] __wait_on_bit_lock+0x5a/0xc0
[1508127.298391]  [810f4347] __lock_page+0x67/0x70
[1508127.298395]  [810850d0] ? wake_bit_function+0x0/0x40
[1508127.298402]  [810f4487] ? unlock_page+0x27/0x30
[1508127.298410]  [810fd9dd] write_cache_pages+0x3bd/0x4d0
[1508127.298417]  [810fc670] ? __writepage+0x0/0x40
[1508127.298425]  [810fdb14] generic_writepages+0x24/0x30
[1508127.298432]  [810fdb55] do_writepages+0x35/0x40
[1508127.298439]  [811668a6] writeback_single_inode+0xf6/0x3d0
[1508127.298449]  [812b81d6] ? rb_erase+0xd6/0x160
[1508127.298455]  [8116750e] writeback_inodes_wb+0x40e/0x5e0
[1508127.298462]  [811677ea] wb_writeback+0x10a/0x1d0
[1508127.298469]  [81077719] ? try_to_del_timer_sync+0x79/0xd0
[1508127.298477]  [8155803d] ? schedule_timeout+0x19d/0x300
[1508127.298485]  [81167b1c] wb_do_writeback+0x18c/0x1a0
[1508127.298493]  [81167b83] bdi_writeback_task+0x53/0xe0
[1508127.298503]  [8110f546] bdi_start_fn+0x86/0x100
[1508127.298510]  [8110f4c0] ? bdi_start_fn+0x0/0x100
[1508127.298518]  [81084d16] 

Re: Loss of network connectivity with high load

2010-06-22 Thread RW
I've had the same issues too. I'm using Gentoo.
Kernel packages < 2.6.32-r7 (release 7, not RC 7)
have had this problem. Haven't tested -r8 to -r10.
Kernel 2.6.32-r11 is currently working fine. 2.6.34
was working too but it crashed after two days with a
clock syncing issue. I'm hoping that 2.6.32-r11 doesn't
have this issue. It's been running for 24 hours
now. KVM is 0.12.4.

- Robert


On 06/22/10 14:32, Thomas Besser wrote:
 Thomas Besser wrote:
 On Wednesday, 16 June 2010 17:14:54 +0300,
 Avi Kivity wrote:
 2.6.26 is really old.  Can you try a new kernel in the guest?  Recent
 kernels had many virtio fixes.
 I think we have the same problem here: after about 3 GB of transferred data
 with rsync, the network (virtio) in the guest stops working (a restart of
 the network fixes it).

 Both host and guest Debian Lenny with 2.6.32. qemu-kvm is version 0.12.4.
 Will try 2.6.34 in guest and report, if the problem still exists.
 
 With a 2.6.34 guest kernel the problem seems to be gone (after about 6 GB 
 transferred data).
 
 Regards
 Thomas
 


Re: Corrupt qcow2 image, recovery?

2010-03-17 Thread RW
Liang Guo gave me this advice some weeks ago:

 you may use kvm-nbd or qemu-nbd to present the kvm image
 as an NBD device, so
 that you can use nbd-client to access it. E.g.:

 kvm-nbd /vm/sid1.img
 modprobe nbd
 nbd-client localhost 1024 /dev/nbd0
 fdisk -l /dev/nbd0

Didn't work for me because I always got a segfault,
but maybe it works for you.
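If you want to try it anyway, the variant I would attempt today (a sketch;
the image name, /dev/nbd0 and the mount point are placeholders, and max_part
is needed so the partition devices show up):

    modprobe nbd max_part=8
    qemu-nbd --connect=/dev/nbd0 guest.qcow2
    fdisk -l /dev/nbd0
    mount -o ro /dev/nbd0p1 /mnt/rescue
    # ... copy the data out ...
    umount /mnt/rescue
    qemu-nbd --disconnect /dev/nbd0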

- Robert




On 03/16/10 19:21, Christian Nilsson wrote:
 Hi!
 
 I'm running kvm / qemu-kvm on a couple of production servers; everything (or
 at least most things) works as it should.
 However, today someone thought it was a good idea to restart one of the
 servers and after that the Windows 2k3 guest on that server doesn't boot
 anymore.
 
 kvm on this server is a bit outdated: QEMU PC emulator version 0.9.1 
 (kvm-83)
 (I guess this is one of the qcow2 corruption bugs, and I can only blame
 myself for not upgrading kvm sooner.)
 The guest.qcow2 is a 21GiB file for a 60GiB disk.
 
 I have tried a couple of things. kvm-img convert -f qcow2 -O raw guest.qcow2
 guest.raw
 stops and does nothing after creating a guest.raw that is 60GiB but only
 uses 60MiB.
 
 So I mounted the fs from another server running: QEMU PC emulator version
 0.12.1 (qemu-kvm-0.12.1.2)
 
 and ran qemu-img with the same options as above; after a few secs I got
 qemu-img: error while reading
 and the same 60MiB used by guest.raw.
 
 I also tried booting qemu-kvm with a Linux guest and this qcow2 image but
 only get I/O errors (and no partitions found).
 
 # qemu-img check guest.qcow2
 ERROR: invalid cluster offset=0x10a000  
 ERROR OFLAG_COPIED: l2_offset=ee73 refcount=1   
 ERROR l2_offset=ee73: Table is not cluster aligned; L1 entry corrupted
 ERROR: invalid cluster offset=0x11d44100080   
 ERROR: invalid cluster offset=0x11d61600080   
 ERROR: invalid cluster offset=0x11d68600080   
 ERROR: invalid cluster offset=0x11d95300080
 (and a lot more in this style; the full log can be provided if
 it would be of help to anybody)
 
 
 
 Is there any possibility to repair this file, or convert it to a raw file
 (even with the parts that cannot be safely read from the qcow2 image padded), or, as a
 last resort, are there any debug tools for qcow2 images that might be of use?
 
 I have read up on the qcow file format but right now I'm a bit short of time;
 I need the data in this guest's disk image (or at least the MS SQL data files
 that are on this disk). I have also checked the qcow2 file and it does contain a
 NTLDR string and a lot of other recognizable NTFS strings, so I know that not all
 data is gone. The question is: how can I access it as a filesystem again?
 
 
 Any help would be appreciated!
 
 Regards
 Christian Nilsson


Re: Network shutdown under load

2010-02-09 Thread RW
Thanks for the patch! It seems to solve the problem that
under load (> 50 MBit/s) the network goes down. I've applied
the patch to KVM 0.12.2 running on Gentoo. Host and guest are
currently running kernel 2.6.32 (kernel 2.6.30 in the guest and
2.6.32 in the host also works for us).

Another host doing the same jobs with the same amount of traffic
and configuration but with KVM 0.11.1 was shutting down
the network interface every 5-10 minutes today while the
patched 0.12.2 was running fine. During the time the 0.11.1 KVMs
were down the patched one delivered 200 MBit/s without
problems. Now both hosts running with the patched
version. We're expecting much more traffic tomorrow so if the
network is still up on thursday I would say the bug is fixed.

Thanks for that patch! It really was a lifesaver today :-)

- Robert


On 02/08/2010 05:10 PM, Tom Lendacky wrote:
 Fix a race condition where qemu finds that there are not enough virtio
 ring buffers available and the guest makes more buffers available before
 qemu can enable notifications.

 Signed-off-by: Tom Lendacky t...@us.ibm.com
 Signed-off-by: Anthony Liguori aligu...@us.ibm.com

  hw/virtio-net.c |   10 +-
  1 files changed, 9 insertions(+), 1 deletions(-)

 diff --git a/hw/virtio-net.c b/hw/virtio-net.c
 index 6e48997..5c0093e 100644
 --- a/hw/virtio-net.c
 +++ b/hw/virtio-net.c
 @@ -379,7 +379,15 @@ static int virtio_net_has_buffers(VirtIONet *n, int bufsize)
          (n->mergeable_rx_bufs &&
           !virtqueue_avail_bytes(n->rx_vq, bufsize, 0))) {
          virtio_queue_set_notification(n->rx_vq, 1);
 -        return 0;
 +
 +        /* To avoid a race condition where the guest has made some buffers
 +         * available after the above check but before notification was
 +         * enabled, check for available buffers again.
 +         */
 +        if (virtio_queue_empty(n->rx_vq) ||
 +            (n->mergeable_rx_bufs &&
 +             !virtqueue_avail_bytes(n->rx_vq, bufsize, 0)))
 +            return 0;
      }
  
      virtio_queue_set_notification(n->rx_vq, 0);

 On Friday 29 January 2010 02:06:41 pm Tom Lendacky wrote:
   
 There's been some discussion of this already in the kvm list, but I want to
 summarize what I've found and also include the qemu-devel list in an effort
  to find a solution to this problem.

 Running a netperf test between two kvm guests results in the guest's
  network interface shutting down. I originally found this using kvm guests
  on two different machines that were connected via a 10GbE link.  However,
  I found this problem can be easily reproduced using two guests on the same
  machine.

 I am running the 2.6.32 level of the kvm.git tree and the 0.12.1.2 level of
 the qemu-kvm.git tree.

 The setup includes two bridges, br0 and br1.

 The commands used to start the guests are as follows:
 usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 -drive
 file=/autobench/var/tmp/cape-vm001-
 raw.img,if=virtio,index=0,media=disk,boot=on -net
 nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51,netdev=cape-vm001-eth0 -
 netdev tap,id=cape-vm001-eth0,script=/autobench/var/tmp/ifup-kvm-
 br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 -net
 nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1,netdev=cape-vm001-eth1 -
 netdev tap,id=cape-vm001-eth1,script=/autobench/var/tmp/ifup-kvm-
 br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 -vnc :1 -monitor
 telnet::5701,server,nowait -snapshot -daemonize

 usr/local/bin/qemu-system-x86_64 -name cape-vm002 -m 1024 -drive
 file=/autobench/var/tmp/cape-vm002-
 raw.img,if=virtio,index=0,media=disk,boot=on -net
 nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:61,netdev=cape-vm002-eth0 -
 netdev tap,id=cape-vm002-eth0,script=/autobench/var/tmp/ifup-kvm-
 br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 -net
 nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:E1,netdev=cape-vm002-eth1 -
 netdev tap,id=cape-vm002-eth1,script=/autobench/var/tmp/ifup-kvm-
 br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 -vnc :2 -monitor
 telnet::5702,server,nowait -snapshot -daemonize

 The ifup-kvm-br0 script takes the (first) qemu created tap device and
  brings it up and adds it to bridge br0.  The ifup-kvm-br1 script takes the
  (second) qemu created tap device and brings it up and adds it to bridge
  br1.

 Each ethernet device within a guest is on its own subnet.  For example:
   guest 1 eth0 has addr 192.168.100.32 and eth1 has addr 192.168.101.32
   guest 2 eth0 has addr 192.168.100.64 and eth1 has addr 192.168.101.64

 On one of the guests run netserver:
   netserver -L 192.168.101.32 -p 12000

 On the other guest run netperf:
   netperf -L 192.168.101.64 -H 192.168.101.32 -p 12000 -t TCP_STREAM -l 60
  -c -C -- -m 16K -M 16K

 It may take more than one netperf run (I find that my second run almost
  always causes the shutdown) but the network on the eth1 links will stop
  working.

 I did some debugging and found that in qemu on the guest running netserver:
  - the receive_disabled variable is set and never gets 

Re: Network shutdown under load

2010-02-04 Thread RW
Hi,

thanks for that! I'm running a lot of hosts still on
kernel 2.6.30 and KVM 88 without problems. It seems that
all qemu-kvm versions >= 0.11.0 have this problem, incl.
the latest 0.12.2. So if one of the enterprise distributions
chooses one of these versions for inclusion in their
enterprise products, customers will definitely get
problems. This is definitely a showstopper if you can't
do more than 30-50 MBit/s over some period of time.
I think kernel 2.6.32 will be chosen by a lot of distributions
but the problem still exists there as far as I've read the
mailings here.

Regards,
Robert


Cedric Peltier wrote:
 Hi,

 We encountered a similar problem yesterday after upgrading a development
 server from Ubuntu 9.04 (kernel 2.6.28) to Ubuntu 9.10 (kernel 2.6.31).
 Going back to kernel 2.6.28 has been the solution for us until now.

 Regards,


 Cédric PELTIER, société Indigo





Re: Problem with shared L2 cache

2010-01-22 Thread RW
You can add the option -cpu host to KVM. Then you get
the same cache size information as your host has.
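A sketch of what that looks like (the rest of the command line is just an
example based on the invocation quoted below, not a recommendation):

    /usr/libexec/qemu-kvm -cpu host -M pc -m 4096 -smp 4 \
        -drive file=/kvm/centos.img,if=virtio,index=0,boot=on,cache=none \
        -net nic,model=virtio -net tap,ifname=vnet0,script=no
    # then inside the guest:
    grep 'cache size' /proc/cpuinfo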

Robert


On 01/22/10 15:31, Hristo Tomov wrote:
 Hello!
 
 I have the following problem with KVM. I have a quad-core Xeon X7350
 processor (8 MB L2 cache). The processor shares 4 MB of L2 cache between two
 cores. Under Xen emulation the guest system has 4 MB of cache at its disposal.
 
 xen guest: cat /proc/cpuinfo
 
 processor: 3
 vendor_id: GenuineIntel
 cpu family: 6
 model: 15
 model name: Intel(R) Xeon(R) CPU   X7350  @ 2.93GHz
 stepping: 11
 cpu MHz: 2932.026
 cache size: 4096 KB
 fpu: yes
 fpu_exception: yes
 cpuid level: 10
 wp: yes
 flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
 mca cmov pat clflush dts mmx fxsr sse sse2 ss syscall nx lm
 constant_tsc pni ds_cpl cx16 xtpr lahf_lm
 bogomips: 5865.67
 clflush size: 64
 cache_alignment: 64
 address sizes: 40 bits physical, 48 bits virtual
 power management:
 
 Under KVM the guest system has 2 MB of cache at its disposal.
 
 kvm guest: cat /proc/cpuinfo
 
 processor   : 3
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 6
 model name  : QEMU Virtual CPU version 0.9.1
 stepping: 3
 cpu MHz : 2932.026
 cache size  : 2048 KB
 fpu : yes
 fpu_exception   : yes
 cpuid level : 2
 wp  : yes
 flags   : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca
 cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm pni
 bogomips: 5861.69
 clflush size: 64
 cache_alignment : 64
 address sizes   : 40 bits physical, 48 bits virtual
 power management:
 
 On specific loads with the same number of processors, this difference
 in cache size leads to negative results. The KVM virtual machine performs
 2-5 times worse in comparison with the Xen virtual machine.
 
 
 Host:
 
 cpu: Intel(R) Xeon(R) CPU   X7350  @ 2.93GHz
 os:   Centos 5.4 x86_64
 kvm:kvm-83-105.el5_4.13
 kernel:kernel-2.6.18-164.10.1.el5
 kernel arch:x86_64
 
 Guest:
 
 os:  Centos 5.4 x86_64
 qemu command: /usr/libexec/qemu-kvm -S -M pc -m 4096 -smp 4 -name
 centos -uuid 17136bf5-a7a3-0323-7e9b-90ec5aa06cdb
 -no-kvm-pit-reinjection -monitor pty -pidfile
 /var/run/libvirt/qemu//centos.pid -boot c -drive
 file=/kvm/centos.img,if=virtio,index=0,boot=on,cache=none -net
 nic,macaddr=52:54:00:43:81:4f,vlan=0,model=virtio -net
 tap,fd=14,script=,vlan=0,ifname=vnet0 -serial pty -parallel none -usb
 -vnc 127.0.0.1:0 -k en-us
 
 Host CPU:
 processor: 0
 vendor_id: GenuineIntel
 cpu family: 6
 model: 15
 model name: Intel(R) Xeon(R) CPU X7350  @ 2.93GHz
 stepping: 11
 cpu MHz: 2931.962
 cache size: 4096 KB
 physical id: 3
 siblings: 4
 core id: 0
 cpu cores: 4
 apicid: 12
 fpu: yes
 fpu_exception: yes
 cpuid level: 10
 wp: yes
 flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
 nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
 bogomips: 5863.92
 clflush size: 64
 cache_alignment: 64
 address sizes: 40 bits physical, 48 bits virtual
 power management:
 
 processor: 1
 vendor_id: GenuineIntel
 cpu family: 6
 model: 15
 model name: Intel(R) Xeon(R) CPU   X7350  @ 2.93GHz
 stepping: 11
 cpu MHz: 2931.962
 cache size: 4096 KB
 physical id: 8
 siblings: 4
 core id: 0
 cpu cores: 4
 apicid: 32
 fpu: yes
 fpu_exception: yes
 cpuid level: 10
 wp: yes
 flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
 nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
 bogomips: 5863.11
 clflush size: 64
 cache_alignment: 64
 address sizes: 40 bits physical, 48 bits virtual
 power management:
 
 processor   : 2
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 15
 model name  : Intel(R) Xeon(R) CPU   X7350  @ 2.93GHz
 stepping: 11
 cpu MHz : 2931.962
 cache size  : 4096 KB
 physical id : 4
 siblings: 4
 core id : 0
 cpu cores   : 4
 apicid  : 16
 fpu : yes
 fpu_exception   : yes
 cpuid level : 10
 wp  : yes
 flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall
 nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
 bogomips: 5863.10
 clflush size: 64
 cache_alignment : 64
 address sizes   : 40 bits physical, 48 

Re: Segfault

2010-01-22 Thread RW
Thanks for the hint but unfortunately every qemu command quits with
a segfault with this image:

scotty images # qemu-nbd nweb.img
Segmentation fault

[192749.840652] qemu-nbd[16052]: segfault at 7f88b1382000 ip
00419492 sp 7fff1e94f710 error 7 in qemu-nbd[40+31000]

It's sad that a segfault of Qemu/KVM can corrupt a whole image
while the host itself was running without any problems when the
segfault happened. It seems the data is lost. But thanks again
anyway!

Robert

On 01/22/10 15:35, Liang Guo wrote:
 
 2010/1/20 RW k...@tauceti.net
 
 
  Is there any chance to get (some) data in the image back? Backup
 is a few weeks old :-(
 
 
 
 you may use kvm-nbd or qemu-nbd to present the kvm image as an NBD device, so
 that you can use nbd-client to access it. E.g.:
 
 kvm-nbd /vm/sid1.img
 modprobe nbd
 nbd-client localhost 1024 /dev/nbd0
 fdisk -l /dev/nbd0 
 
 -- 
 Liang Guo
 http://bluestone.cublog.cn


Re: repeatable hang with loop mount and heavy IO in guest

2010-01-21 Thread RW
Some months ago I also thought elevator=noop would be a good idea.
But it isn't. It works well as long as you only do short IO requests.
Try using deadline in host and guest.
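A minimal sketch of switching to deadline (vda is a placeholder for the
guest's virtio disk; on the host use the real disk devices instead):

    # at runtime, per block device
    echo deadline > /sys/block/vda/queue/scheduler
    # or for all disks from boot, add to the kernel command line:
    #   elevator=deadline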

Robert


On 01/21/10 18:26, Antoine Martin wrote:
 I've tried various guests, including most recent Fedora12 kernels,
 custom 2.6.32.x
 All of them hang around the same point (~1GB written) when I do heavy IO
 write inside the guest.
 I have waited 30 minutes to see if the guest would recover, but it just
 sits there, not writing back any data, not doing anything - but
 certainly not allowing any new IO writes. The host has some load on it,
  but nothing heavy enough to completely hang a guest for that long.
 
 mount -o loop some_image.fs ./somewhere bs=512
 dd if=/dev/zero of=/somewhere/zero
 then after ~1GB: sync
 
 Host is running: 2.6.31.4
 QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88)
 
 Guests are booted with elevator=noop as the filesystems are stored as
 files, accessed as virtio disks.
 
 
 The hung backtraces always look similar to these:
 [  361.460136] INFO: task loop0:2097 blocked for more than 120 seconds.
  [  361.460139] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
 disables this message.
 [  361.460142] loop0 D 88000b92c848 0  2097  2
 0x0080
 [  361.460148]  88000b92c5d0 0046 880008c1f810
 880009829fd8
 [  361.460153]  880009829fd8 880009829fd8 88000a21ee80
 88000b92c5d0
 [  361.460157]  880009829610 8181b768 880001af33b0
 0002
 [  361.460161] Call Trace:
 [  361.460216]  [8105bf12] ? sync_page+0x0/0x43
 [  361.460253]  [8151383e] ? io_schedule+0x2c/0x43
 [  361.460257]  [8105bf50] ? sync_page+0x3e/0x43
 [  361.460261]  [81513a2a] ? __wait_on_bit+0x41/0x71
 [  361.460264]  [8105c092] ? wait_on_page_bit+0x6a/0x70
 [  361.460283]  [810385a7] ? wake_bit_function+0x0/0x23
 [  361.460287]  [81064975] ? shrink_page_list+0x3e5/0x61e
 [  361.460291]  [81513992] ? schedule_timeout+0xa3/0xbe
 [  361.460305]  [81038579] ? autoremove_wake_function+0x0/0x2e
 [  361.460308]  [8106538f] ? shrink_zone+0x7e1/0xaf6
 [  361.460310]  [81061725] ? determine_dirtyable_memory+0xd/0x17
 [  361.460314]  [810637da] ? isolate_pages_global+0xa3/0x216
 [  361.460316]  [81062712] ? mark_page_accessed+0x2a/0x39
 [  361.460335]  [810a61db] ? __find_get_block+0x13b/0x15c
 [  361.460337]  [81065ed4] ? try_to_free_pages+0x1ab/0x2c9
 [  361.460340]  [81063737] ? isolate_pages_global+0x0/0x216
 [  361.460343]  [81060baf] ? __alloc_pages_nodemask+0x394/0x564
 [  361.460350]  [8108250c] ? __slab_alloc+0x137/0x44f
 [  361.460371]  [812cc4c1] ? radix_tree_preload+0x1f/0x6a
 [  361.460374]  [81082a08] ? kmem_cache_alloc+0x5d/0x88
 [  361.460376]  [812cc4c1] ? radix_tree_preload+0x1f/0x6a
 [  361.460379]  [8105c0b5] ? add_to_page_cache_locked+0x1d/0xf1
 [  361.460381]  [8105c1b0] ? add_to_page_cache_lru+0x27/0x57
 [  361.460384]  [8105c25a] ?
 grab_cache_page_write_begin+0x7a/0xa0
 [  361.460399]  [81104620] ? ext3_write_begin+0x7e/0x201
 [  361.460417]  [8134648f] ? do_lo_send_aops+0xa1/0x174
 [  361.460420]  [81081948] ? virt_to_head_page+0x9/0x2a
 [  361.460422]  [8134686b] ? loop_thread+0x309/0x48a
 [  361.460425]  [813463ee] ? do_lo_send_aops+0x0/0x174
 [  361.460427]  [81038579] ? autoremove_wake_function+0x0/0x2e
 [  361.460430]  [81346562] ? loop_thread+0x0/0x48a
 [  361.460432]  [8103819b] ? kthread+0x78/0x80
 [  361.460441]  [810238df] ? finish_task_switch+0x2b/0x78
 [  361.460454]  [81002f6a] ? child_rip+0xa/0x20
 [  361.460460]  [81012ac3] ? native_pax_close_kernel+0x0/0x32
 [  361.460463]  [81038123] ? kthread+0x0/0x80
 [  361.460469]  [81002f60] ? child_rip+0x0/0x20
 [  361.460471] INFO: task kjournald:2098 blocked for more than 120 seconds.
  [  361.460473] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
 disables this message.
 [  361.460474] kjournald D 88000b92e558 0  2098  2
 0x0080
 [  361.460477]  88000b92e2e0 0046 88000aad9840
 88000983ffd8
 [  361.460480]  88000983ffd8 88000983ffd8 81808e00
 88000b92e2e0
 [  361.460483]  88000983fcf0 8181b768 880001af3c40
 0002
 [  361.460486] Call Trace:
 [  361.460488]  [810a6b16] ? sync_buffer+0x0/0x3c
 [  361.460491]  [8151383e] ? io_schedule+0x2c/0x43
 [  361.460494]  [810a6b4e] ? sync_buffer+0x38/0x3c
 [  361.460496]  [81513a2a] ? __wait_on_bit+0x41/0x71
 [  361.460499]  [810a6b16] ? sync_buffer+0x0/0x3c
 [  361.460501]  [81513ac4] ? out_of_line_wait_on_bit+0x6a/0x76
 [  361.460504]  [810385a7] ? wake_bit_function+0x0/0x23
 [  361.460514]  [8113edad] ?
 

Re: repeatable hang with loop mount and heavy IO in guest

2010-01-21 Thread RW
No sorry, I don't have any performance data with noop. I haven't
even had a crash. BUT I've experienced severe I/O degradation
with noop. Once I'd written a big chunk of data (e.g. a simple
rsync -av /usr /opt) with noop, it worked for a while and
after a few seconds I saw heavy writes which made the
VM virtually unusable. As far as I remember it was kjournald
which caused the writes.

I wrote a mail to the list some months ago with some benchmarks:
http://article.gmane.org/gmane.comp.emulators.kvm.devel/41112/match=benchmark
There are some I/O benchmarks in there. You can't get the graphs
currently since tauceti.net is offline until Monday. I haven't
tested noop in those benchmarks because of the problems
mentioned above, but they compare deadline and cfq a little bit
on an HP DL 380 G6 server.

Robert

On 01/21/10 22:08, Thomas Beinicke wrote:
 On Thursday 21 January 2010 21:08:38 RW wrote:
 Some months ago I also thought elevator=noop would be a good idea.
 But it isn't. It works well as long as you only do short IO requests.
 Try using deadline in host and guest.

 Robert
 
 @Robert: I've been using noop on all of my KVMs and didn't have any problems 
 so far, never had any crash either.
 Do you have any performance data or comparisons between noop and deadline io 
 schedulers?
 
 Cheers,
 
 Thomas


Re: Segfault

2010-01-20 Thread RW
When I start qemu-img with strace I get the output below. Does
this help to identify the problem?

scotty images # strace qemu-img convert -f qcow2 nweb.img -O raw test.img
execve(/usr/bin/qemu-img, [qemu-img, convert, -f, qcow2,
nweb.img, -O, raw, test.img], [/* 44 vars */]) = 0
brk(0)  = 0x635000

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7fc6ec61a000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7fc6ec619000
access(/etc/ld.so.preload, R_OK)  = -1 ENOENT (No such file or
directory)
open(/usr/lib64/tls/x86_64/librt.so.1, O_RDONLY) = -1 ENOENT (No such
file or directory)
stat(/usr/lib64/tls/x86_64, 0x7f9a4cd0) = -1 ENOENT (No such file
or directory)
open(/usr/lib64/tls/librt.so.1, O_RDONLY) = -1 ENOENT (No such file or
directory)
stat(/usr/lib64/tls, {st_mode=S_IFDIR|0755, st_size=6, ...}) = 0

open(/usr/lib64/x86_64/librt.so.1, O_RDONLY) = -1 ENOENT (No such file
or directory)
stat(/usr/lib64/x86_64, 0x7f9a4cd0) = -1 ENOENT (No such file or
directory)
open(/usr/lib64/librt.so.1, O_RDONLY) = -1 ENOENT (No such file or
directory)
stat(/usr/lib64, {st_mode=S_IFDIR|0755, st_size=147456, ...}) = 0

open(/etc/ld.so.cache, O_RDONLY)  = 3

fstat(3, {st_mode=S_IFREG|0644, st_size=207105, ...}) = 0

mmap(NULL, 207105, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fc6ec5e6000

close(3)= 0

open(/lib/librt.so.1, O_RDONLY)   = 3

read(3,
\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\340\\0\0\0\0\0\0...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=35656, ...}) = 0

mmap(NULL, 2132976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7fc6ec1f6000
mprotect(0x7fc6ec1fe000, 2093056, PROT_NONE) = 0

mmap(0x7fc6ec3fd000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0x7fc6ec3fd000
close(3)= 0

open(/usr/lib64/tls/libpthread.so.0, O_RDONLY) = -1 ENOENT (No such
file or directory)
open(/usr/lib64/libpthread.so.0, O_RDONLY) = -1 ENOENT (No such file
or directory)
open(/lib/libpthread.so.0, O_RDONLY)  = 3

read(3,
\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\340W\0\0\0\0\0\0...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=132454, ...}) = 0

mmap(NULL, 2204528, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7fc6ebfdb000
mprotect(0x7fc6ebff1000, 2093056, PROT_NONE) = 0

mmap(0x7fc6ec1f, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7fc6ec1f
mmap(0x7fc6ec1f2000, 13168, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fc6ec1f2000
close(3)= 0

open(/usr/lib64/tls/libaio.so.1, O_RDONLY) = -1 ENOENT (No such file
or directory)
open(/usr/lib64/libaio.so.1, O_RDONLY) = 3

read(3,
\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0`\6\0\0\0\0\0\0..., 832)
= 832
fstat(3, {st_mode=S_IFREG|0755, st_size=5280, ...}) = 0

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7fc6ec5e5000
mmap(NULL, 2101296, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7fc6ebdd9000
mprotect(0x7fc6ebdda000, 2093056, PROT_NONE) = 0

mmap(0x7fc6ebfd9000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x7fc6ebfd9000
close(3)= 0

open(/usr/lib64/tls/libz.so.1, O_RDONLY) = -1 ENOENT (No such file or
directory)
open(/usr/lib64/libz.so.1, O_RDONLY)  = -1 ENOENT (No such file or
directory)
open(/lib/libz.so.1, O_RDONLY)= 3

read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\0
\0\0\0\0\0\0..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=84272, ...}) = 0

mmap(NULL, 2179568, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7fc6ebbc4000
mprotect(0x7fc6ebbd8000, 2093056, PROT_NONE) = 0

mmap(0x7fc6ebdd7000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13000) = 0x7fc6ebdd7000
close(3)= 0

open(/usr/lib64/tls/libcurl.so.4, O_RDONLY) = -1 ENOENT (No such file
or directory)
open(/usr/lib64/libcurl.so.4, O_RDONLY) = 3

read(3,
\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\0\310\0\0\0\0\0\0...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=279672, ...}) = 0

mmap(NULL, 2375624, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7fc6eb98
mprotect(0x7fc6eb9c2000, 2097152, PROT_NONE) = 0

mmap(0x7fc6ebbc2000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x42000) = 0x7fc6ebbc2000
close(3)= 0

open(/usr/lib64/tls/libgssapi_krb5.so.2, O_RDONLY) = -1 ENOENT (No
such file or directory)
open(/usr/lib64/libgssapi_krb5.so.2, O_RDONLY) = 3

read(3,
\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\0j\0\0\0\0\0\0..., 832)
= 832
fstat(3, {st_mode=S_IFREG|0755, st_size=173272, ...}) = 0

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7fc6ec5e4000
mmap(NULL, 2268688, PROT_READ|PROT_EXEC, 

Some KVM benchmarks/ IO question

2009-10-05 Thread RW
I currently have an HP DL 380 G6 server which I can use for some benchmarks
I want to share with you. Maybe someone finds them interesting or useful.
Both host and guest are running Gentoo with kernel 2.6.31.1 (which is stable
2.6.31 update 1). The host has the following components:

2 x Intel Xeon CPU L5520 - 2 Quad-Processors (static performance, VT-d,
Hyperthreading enabled in BIOS)
8 x 300 GB SAS 10k hard drives (RAID 10)
24 GB RAM

(Some) Host (kernel) settings:
I/O scheduler: deadline
Filesystem: xfs
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_VIRTIO_BLK=m
CONFIG_VIRTIO_NET=m
CONFIG_VIRTIO_CONSOLE=m
CONFIG_HW_RANDOM_VIRTIO=m
CONFIG_VIRTIO=m
CONFIG_VIRTIO_RING=m
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_BALLOON=m

(Some) Guest (kernel) settings:
I/O scheduler: deadline/cfq (see below)
Filesystem: ext3 (datamode=ordered/writeback [see below))
VIRTIO network and block driver used
Guest is a qcow2 image (which was expanded with dd to be big
enough for the IO tests so that growing the image doesn't happen
during the IO test).
CONFIG_KVM_CLOCK=y
CONFIG_KVM_GUEST=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_VIRTIO_BLK=y
CONFIG_VIRTIO_NET=m
CONFIG_VIRTIO_CONSOLE=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_VIRTIO=y
CONFIG_VIRTIO_RING=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BALLOON=y
CONFIG_PARAVIRT_SPINLOCKS=y

KVM startup options (the important ones):
-m variable-see_below
-smp variable-see_below
-cpu host
-daemonize
-drive file=/data/kvm/kvmimages/gfs1.qcow2,if=virtio,boot=on
-net nic,vlan=104,model=virtio,macaddr=00:ff:48:23:45:4b
-net tap,vlan=104,ifname=tap.b.gfs1,script=no
-net nic,vlan=96,model=virtio,macaddr=00:ff:48:23:45:4d
-net tap,vlan=96,ifname=tap.f.gfs1,script=no

Since we use KVM mostly for webserver stuff I've run tests with
Apachebench, Graphics Magick and IOzone to simulate our workload as well
as we can.
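The kinds of invocations behind those tests look roughly like this (a sketch;
the URL, sizes and counts here are illustrative, not the exact parameters used):

    ab -n 100000 -c 50 http://guest/index.html           # Apachebench
    gm convert -resize 800x600 in.jpg out.jpg            # GraphicsMagick resize
    iozone -a -s 4g -r 64k -i 0 -i 1 -f /data/iozone.tmp # IOzone write/read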

Apachebench with 4 GB RAM, CFQ scheduler, 1/2/4/8 vProcs compared with
the host (the hp380g6 graph):
http://www.tauceti.net/kvm-benchmarks/merge-3428/
Apachebench with 2 GB RAM, CFQ scheduler, 1/2/4/8 vProcs compared with
the host (the hp380g6 graph):
http://www.tauceti.net/kvm-benchmarks/merge-6053/
I was interested in whether more RAM would give more throughput and whether
more vProcs would scale. As you can see, 8 vProcs (compared to 2x QuadCore+HT)
give almost the same requests per second in KVM as the host alone
can deliver. RAM doesn't matter in this case.

Graphics Magick resize with 4 GB RAM, CFQ scheduler, 1/2/4/8 vProcs
compared with the host (the hp380g6 graph):
http://www.tauceti.net/kvm-benchmarks/merge-5214/
Graphics Magick resize with 2 GB RAM, CFQ scheduler, 1/2/4/8 vProcs
compared with the host (the hp380g6 graph):
http://www.tauceti.net/kvm-benchmarks/merge-7186/
With 8 vProcs the KVM guest runs about 10% slower. RAM doesn't seem to matter
here either.

The following IO tests run with the KVM option cache=none:
IOzone write test with 2 GB RAM, CFQ scheduler, datamode=ordered,
1/2/4/8 vProcs compared with the host (the hp380g6 graph):
http://www.tauceti.net/kvm-benchmarks/merge-3564/
IOzone write test with 2 GB RAM, deadline scheduler, datamode=ordered,
1/2/4/8 vProcs compared with the host (the hp380g6 graph):
http://www.tauceti.net/kvm-benchmarks/merge-4533/
It's interesting to see that the deadline scheduler gives you about
10-15 MByte/s more throughput. Please note that I haven't checked the
CPU usage during the IO tests.

The following IO tests run with no cache option set so this defaults
to writethrough:
IOzone write test with 2 GB RAM, deadline scheduler, datamode ordered
and writeback (see test page), 8 vProcs compared with the host (the
hp380g6 graph):
http://www.tauceti.net/kvm-benchmarks/merge-7526/

All in all, for these tests it seems that a KVM guest can perform almost as
well as the host. I haven't tested the network throughput, but it works quite
well when I copy big files with ssh and/or rsync; I have no numbers yet.

I've been using KVM for 2 years now, 1 year in production, and I'm satisfied
with it. Thanks to all developers for their work! But what is still
confusing to me is the IO thing. There were two occasions where a new release
destroyed some data due to a bug, but it was not obvious to me that it
was a bug, or whether it was in KVM (with a new kernel release) or in Qemu. So
I've decided to always start KVMs with cache=none and use ext3 as the
filesystem with datamode=ordered. Now the benchmark above shows quite
impressive results with the KVM default option cache=writethrough, but
due to the things that happened to me in the past I am really afraid to
switch from cache=none to cache=writethrough because of data integrity.
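For reference, the two configurations being weighed against each other differ
only in the cache flag on the drive option (same syntax as the startup options
listed above):

    -drive file=/data/kvm/kvmimages/gfs1.qcow2,if=virtio,boot=on,cache=none
    -drive file=/data/kvm/kvmimages/gfs1.qcow2,if=virtio,boot=on,cache=writethrough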

I know that there was some discussion some time ago, but can some of the
developers confirm that the following combination preserves the data
integrity of the guest filesystems and the integrity of the qcow2 image
as a whole in case of a crash, e.g.:

Host: Filessystem xfs for KVM qcow2 images
AND Guest: cache=writethrough, virtio block driver, Filesystems ext3
with 

Re: sync guest calls made async on host - SQLite performance

2009-09-25 Thread RW
I've read the article a few days ago and it was interesting.
As I upgraded vom 2.6.29 to 2.6.30 (Gentoo) I also saw a dramatic
increase disk and filesystem performance. But then I realized
that the default mode for ext3 changed to data=writeback.
So I changed that back to data=ordered and performance was
as it was with 2.6.29.
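A sketch of pinning the journalling mode back to ordered (the device and
mount point are placeholders; the root filesystem needs the option on the
kernel command line since the data mode can't be changed on remount):

    # /etc/fstab entry for a data filesystem
    /dev/vda2   /data   ext3   defaults,data=ordered   0 2
    # root filesystem: append to the kernel command line in the bootloader
    #   rootflags=data=ordered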

I think ext3 with data=writeback in a KVM guest, with KVM started
with if=virtio,cache=none, is a little bit crazy. I don't know
if this is the case with the current Ubuntu Alpha but it looks
like it.

Regards,
Robert

 I would like to call attention to the SQLite performance under KVM in
 the current Ubuntu Alpha.

 http://www.phoronix.com/scan.php?page=article&item=linux_2631_kvm&num=3

 SQLite's benchmark as part of the Phoronix Test Suite is typically IO
 limited and is affected by both disk and filesystem performance.

 When comparing SQLite under the host against the guest OS, there is an
 order of magnitude _IMPROVEMENT_ in the measured performance of the guest.

 I am expecting that the host is doing synchronous IO operations but
 somewhere in the stack the calls are ultimately being made asynchronous
 or at the very least batched for writing.

 On the surface, this represents a data integrity issue and I am
 interested in the KVM community's thoughts on this behaviour.  Is it
 expected? Is it acceptable?  Is it safe?
