Re: kexec/kdump of a kvm guest?
On Jul 24, 2008, at 2:13 AM, Mike Snitzer wrote:
On Sat, Jul 5, 2008 at 7:20 AM, Avi Kivity [EMAIL PROTECTED] wrote:
Mike Snitzer wrote:

My host is x86_64 RHEL5U1 running 2.6.25.4 with kvm-70 (kvm-intel). When I configure kdump in the guest (running 2.6.22.19) and force a crash (with 'echo c > /proc/sysrq-trigger'), kexec boots the kdump kernel but then the kernel hangs (before it gets to /sbin/init et al). On the host, the associated qemu is consuming 100% cpu. I really need to be able to collect vmcores from my kvm guests. So far I can't (on raw hardware all works fine).

I've tested this a while ago and it worked (though I tested regular kexecs, not crashes); this may be a regression. Please run kvm_stat to see what's happening at the time of the crash.

OK, I can look into kvm_stat, but I just discovered that just having kvm-intel and kvm loaded into my 2.6.22.19 kernel actually prevents

Is 2.6.22.19 your host or your guest kernel? It's very unlikely that you loaded kvm modules in the guest.

the host from being able to kexec/kdump too!? I didn't have any guests running (only the kvm modules were loaded). As soon as I unloaded the kvm modules, kdump worked as expected. Something about kvm is completely breaking kexec/kdump on both the host and guest kernels.

I guess the kexec people would be pretty interested in this as well, so I'll just CC them for now. As you're stating that the host kernel breaks with kvm modules loaded, maybe someone there could give a hint.

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: scsi broken >4GB RAM
Hi,

I tried Windows Server 2008 (64-bit) on Proxmox VE 0.9beta2 (KVM 71, see http://pve.proxmox.com). Some details:

--memory 6144 --cdrom en_windows_server_2008_datacenter_enterprise_standard_x64_dvd_X14-26714.iso --name win2008-6gb-scsi --smp 1 --bootdisk scsi0 --scsi0 80

The installer shows an 80 GB harddisk but freezes for a minute after clicking next, then: "Windows could not create a partition on disk 0. The error occurred while preparing the computer's system volume. Error code: 0x8004245F."

I also got installer problems if I just use scsi as the boot disk (no high memory) on several Windows versions, including win2003 and xp. So I decided to use IDE, which works without any issue on Windows. But: I reduced the memory to 2048 and the installer continues to work!

Best Regards,

Martin Maurer [EMAIL PROTECTED] http://www.proxmox.com
Proxmox Server Solutions GmbH, Kohlgasse 51/10, 1050 Vienna, Austria
Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22
Commercial register no.: FN 258879 f Registration office: Handelsgericht Wien

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Henrik Holst
Sent: Mittwoch, 23. Juli 2008 23:09
To: kvm@vger.kernel.org
Subject: scsi broken >4GB RAM

I do not know if this is a bug in qemu or the Linux kernel sym53c8xx module (I haven't had the opportunity to test with anything other than Linux at the moment), but if one starts a qemu instance with -m 4096 or larger, the emulated SCSI disk fails in the Linux guest. If booting any install cd, /dev/sda is seen as only 512B in size; and if booting an ubuntu 8.04-amd64 with the secondary drive as scsi, it is seen with the correct size but one cannot read nor write the partition table. Is there anyone out there that could test, say, a Windows image on scsi with 4GB or more of RAM and see if it works or not? If so, it could be the Linux driver that is faulty.
/Henrik Holst
RE: scsi broken >4GB RAM
Sorry, just returned to the installer - it also stopped with the same error code, using just 2 GB RAM.

Best Regards,

Martin Maurer [EMAIL PROTECTED] http://www.proxmox.com
Proxmox Server Solutions GmbH, Kohlgasse 51/10, 1050 Vienna, Austria
Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22
Commercial register no.: FN 258879 f Registration office: Handelsgericht Wien

[quoting the two earlier messages in this thread, reproduced in full above]
Re: [PATCH 4/8] KVM: PCIPT: fix interrupt handling
On Wed, 2008-07-23 at 19:07 +0530, Amit Shah wrote:
* On Wednesday 16 Jul 2008 18:47:01 Ben-Ami Yassour wrote:

This patch fixes a few problems with the interrupt handling for passthrough devices.
1. Pass the interrupt handler the pointer to the device, so we do not need to take the pcipt lock in the interrupt handler.
2. Remove the pt_irq_handled bitmap - it is no longer needed.
3. Split kvm_pci_pt_work_fn into two functions, one for interrupt injection and another for the ack - the code is much simpler this way.
4. Change the passthrough initialization order - add the device structure to the list before registering the interrupt handler.
5. On the passthrough destruction path, free the interrupt handler before cleaning queued work.

Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
---

 	if (irqchip_in_kernel(kvm)) {
+		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
+		match->pt_dev.host.irq = dev->irq;
+		if (kvm->arch.vioapic)
+			kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq;
+		if (kvm->arch.vpic)
+			kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq;
+

We shouldn't register this notifier unless we get the irq below, to avoid unneeded function calls and checks. Note: this code was changed in the last version of the code, but the comment is still relevant.

Do you mean that we need to postpone registering the notification? In the case of an assigned device this means that we postpone it for a few seconds, and implementing it as above is cleaner. So I don't see the real value in postponing it.

Thanks,
Ben
[PATCH 9/9] kvm: qemu: Eliminate extra virtio_net copy
This is Anthony's net-tap-zero-copy.patch, which eliminates a copy on the host->guest data path with virtio_net.
---
 qemu/hw/virtio-net.c |   76 ++++++++++++++++++++++++++++++++++++-----------
 qemu/net.h           |    3 ++
 qemu/vl.c            |   50 ++++++++++++++++++++++++++++++++
 3 files changed, 109 insertions(+), 20 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index a681a7e..5e71afe 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -70,6 +70,8 @@ typedef struct VirtIONet
     VLANClientState *vc;
     QEMUTimer *tx_timer;
     int tx_timer_active;
+    int last_elem_valid;
+    VirtQueueElement last_elem;
 } VirtIONet;
 
 /* TODO
@@ -153,47 +155,80 @@ static int virtio_net_can_receive(void *opaque)
     return 1;
 }
 
-static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
+static void virtio_net_receive_zc(void *opaque, IOZeroCopyHandler *zc, void *data)
 {
     VirtIONet *n = opaque;
-    VirtQueueElement elem;
+    VirtQueueElement *elem = &n->last_elem;
     struct virtio_net_hdr *hdr;
-    int offset, i;
-    int total;
+    ssize_t err;
+    int idx;
 
-    if (virtqueue_pop(n->rx_vq, &elem) == 0)
+    if (!n->last_elem_valid && virtqueue_pop(n->rx_vq, elem) == 0)
         return;
 
-    if (elem.in_num < 1 || elem.in_sg[0].iov_len != sizeof(*hdr)) {
+    if (elem->in_num < 1 || elem->in_sg[0].iov_len != sizeof(*hdr)) {
         fprintf(stderr, "virtio-net header not in first element\n");
         exit(1);
     }
 
-    hdr = (void *)elem.in_sg[0].iov_base;
+    n->last_elem_valid = 1;
+
+    hdr = (void *)elem->in_sg[0].iov_base;
     hdr->flags = 0;
     hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
 
-    offset = 0;
-    total = sizeof(*hdr);
+    idx = tap_has_offload(n->vc->vlan->first_client) ? 0 : 1;
+
+    do {
+        err = zc(data, &elem->in_sg[idx], elem->in_num - idx);
+    } while (err == -1 && errno == EINTR);
+
+    if (err == -1 && errno == EAGAIN)
+        return;
 
-    if (tap_has_offload(n->vc->vlan->first_client)) {
-        memcpy(hdr, buf, sizeof(*hdr));
-        offset += total;
+    if (err < 0) {
+        fprintf(stderr, "virtio_net: error during IO\n");
+        return;
     }
 
+    /* signal other side */
+    n->last_elem_valid = 0;
+    virtqueue_push(n->rx_vq, elem, sizeof(*hdr) + err);
+    virtio_notify(&n->vdev, n->rx_vq);
+}
+
+struct compat_data
+{
+    const uint8_t *buf;
+    int size;
+};
+
+static ssize_t compat_copy(void *opaque, struct iovec *iov, int iovcnt)
+{
+    struct compat_data *compat = opaque;
+    int offset, i;
+
     /* copy in packet.  ugh */
-    i = 1;
-    while (offset < size && i < elem.in_num) {
-        int len = MIN(elem.in_sg[i].iov_len, size - offset);
-        memcpy(elem.in_sg[i].iov_base, buf + offset, len);
+    offset = 0;
+    i = 0;
+    while (offset < compat->size && i < iovcnt) {
+        int len = MIN(iov[i].iov_len, compat->size - offset);
+        memcpy(iov[i].iov_base, compat->buf + offset, len);
         offset += len;
-        total += len;
         i++;
     }
 
-    /* signal other side */
-    virtqueue_push(n->rx_vq, &elem, total);
-    virtio_notify(&n->vdev, n->rx_vq);
+    return offset;
+}
+
+static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
+{
+    struct compat_data compat;
+
+    compat.buf = buf;
+    compat.size = size;
+
+    virtio_net_receive_zc(opaque, compat_copy, &compat);
 }
 
 /* TX */
@@ -310,6 +345,7 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
     memcpy(n->mac, nd->macaddr, 6);
     n->vc = qemu_new_vlan_client(nd->vlan, virtio_net_receive,
                                  virtio_net_can_receive, n);
+    n->vc->fd_read_zc = virtio_net_receive_zc;
 
     n->tx_timer = qemu_new_timer(vm_clock, virtio_net_tx_timer, n);
     n->tx_timer_active = 0;
diff --git a/qemu/net.h b/qemu/net.h
index 6cfd8ce..aca50e9 100644
--- a/qemu/net.h
+++ b/qemu/net.h
@@ -6,6 +6,8 @@
 /* VLANs support */
 
 typedef ssize_t (IOReadvHandler)(void *, const struct iovec *, int);
+typedef ssize_t (IOZeroCopyHandler)(void *, struct iovec *, int);
+typedef void (IOReadZCHandler)(void *, IOZeroCopyHandler *, void *);
 
 typedef struct VLANClientState VLANClientState;
 
@@ -14,6 +16,7 @@ typedef void (SetOffload)(VLANClientState *, int, int, int, int);
 struct VLANClientState {
     IOReadHandler *fd_read;
     IOReadvHandler *fd_readv;
+    IOReadZCHandler *fd_read_zc;
     /* Packets may still be sent if this returns zero.  It's used to
        rate-limit the slirp code.  */
     IOCanRWHandler *fd_can_read;
diff --git a/qemu/vl.c b/qemu/vl.c
index de92848..bc5b151 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4204,6 +4204,7 @@ typedef struct TAPState {
     char buf[TAP_BUFSIZE];
     int size;
    int offload;
+    int received_eagain;
 } TAPState;
 
 static void tap_receive(void *opaque, const uint8_t *buf, int size)
@@ -4232,6 +4233,48 @@ static ssize_t tap_readv(void *opaque, const struct iovec *iov,
     return len;
 }
 
+static VLANClientState *tap_can_zero_copy(TAPState
[PATCH 0/9][RFC] KVM virtio_net performance
Hey,

Here's a bunch of patches attempting to improve the performance of virtio_net. This is more an RFC than a patch submission since, as can be seen below, not all patches actually improve the performance measurably.

I've tried hard to test each of these patches with as stable and informative a benchmark as I could find. The first benchmark is a netperf[1] based throughput benchmark and the second uses a flood ping[2] to measure latency differences. Each set of figures is min/average/max/standard deviation. The first set is Gb/s and the second is milliseconds.

The network configuration used was very simple - the guest with a virtio_net interface and the host with a tap interface and static IP addresses assigned to both - i.e. there was no bridge in the host involved, and iptables was disabled in both the host and guest.

I used:

1) kvm-71-26-g6152996 with the patches that follow

2) Linus's v2.6.26-5752-g93ded9b with Rusty's virtio patches from 219:bbd2611289c5 applied; these are the patches that have just been submitted to Linus

The conclusions I draw are:

1) The length of the tx mitigation timer makes quite a difference to the throughput achieved; we probably need a good heuristic for adjusting this on the fly.

2) Using the recently merged GSO support in the tun/tap driver gives a huge boost, but much more so on the host->guest side.

3) Adjusting the virtio_net ring sizes makes a small difference, but not as much as one might expect.

4) Dropping the global mutex while reading GSO packets from the tap interface gives a nice speedup. This highlights the global mutex as a general performance issue.

5) Eliminating an extra copy on the host->guest path only makes a barely measurable difference.
Anyway, the figures:

netperf, 10x20s runs (Gb/s):

                             | guest->host                | host->guest
-----------------------------+----------------------------+---------------------------
baseline                     | 1.520/ 1.573/ 1.610/ 0.034 | 1.160/ 1.357/ 1.630/ 0.165
50us tx timer + rearm        | 1.050/ 1.086/ 1.110/ 0.017 | 1.710/ 1.832/ 1.960/ 0.092
250us tx timer + rearm       | 1.700/ 1.764/ 1.880/ 0.064 | 0.900/ 1.203/ 1.580/ 0.205
150us tx timer + rearm       | 1.520/ 1.602/ 1.690/ 0.044 | 1.670/ 1.928/ 2.150/ 0.141
no ring-full heuristic       | 1.480/ 1.569/ 1.710/ 0.066 | 1.610/ 1.857/ 2.140/ 0.153
VIRTIO_F_NOTIFY_ON_EMPTY     | 1.470/ 1.554/ 1.650/ 0.054 | 1.770/ 1.960/ 2.170/ 0.119
recv NO_NOTIFY               | 1.530/ 1.604/ 1.680/ 0.047 | 1.780/ 1.944/ 2.190/ 0.129
GSO                          | 4.120/ 4.323/ 4.420/ 0.099 | 6.540/ 7.033/ 7.340/ 0.244
ring size == 256             | 4.050/ 4.406/ 4.560/ 0.143 | 6.280/ 7.236/ 8.280/ 0.613
ring size == 512             | 4.420/ 4.600/ 4.960/ 0.140 | 6.470/ 7.205/ 7.510/ 0.314
drop mutex during tapfd read | 4.320/ 4.578/ 4.790/ 0.161 | 8.370/ 8.589/ 8.730/ 0.120
aliguori zero-copy           | 4.510/ 4.694/ 4.960/ 0.148 | 8.430/ 8.614/ 8.840/ 0.142

ping -f -c 10 (ms):

                             | guest->host                | host->guest
-----------------------------+----------------------------+---------------------------
baseline                     | 0.060/ 0.459/ 7.602/ 0.846 | 0.067/ 0.331/ 2.517/ 0.057
50us tx timer + rearm        | 0.081/ 0.143/ 7.436/ 0.374 | 0.093/ 0.133/ 1.883/ 0.026
250us tx timer + rearm       | 0.302/ 0.463/ 7.580/ 0.849 | 0.297/ 0.344/ 2.128/ 0.028
150us tx timer + rearm       | 0.197/ 0.323/ 7.671/ 0.740 | 0.199/ 0.245/ 7.836/ 0.037
no ring-full heuristic       | 0.182/ 0.324/ 7.688/ 0.753 | 0.199/ 0.243/ 2.197/ 0.030
VIRTIO_F_NOTIFY_ON_EMPTY     | 0.197/ 0.321/ 7.447/ 0.730 | 0.196/ 0.242/ 2.218/ 0.032
recv NO_NOTIFY               | 0.186/ 0.321/ 7.520/ 0.732 | 0.200/ 0.233/ 2.216/ 0.028
GSO                          | 0.178/ 0.324/ 7.667/ 0.736 | 0.147/ 0.246/ 1.361/ 0.024
ring size == 256             | 0.184/ 0.323/ 7.674/ 0.728 | 0.199/ 0.243/ 2.181/ 0.028
ring size == 512             | (not measured)             | (not measured)
drop mutex during tapfd read | 0.183/ 0.323/ 7.820/ 0.733 | 0.202/ 0.242/ 2.219/ 0.027
aliguori zero-copy           | 0.185/ 0.325/ 7.863/ 0.736 | 0.202/ 0.245/ 7.844/ 0.036

Cheers,
Mark.
[1] - I used netperf trunk from: http://www.netperf.org/svn/netperf2/trunk and simply ran:

$ i=0; while [ $i -lt 10 ]; do ./netperf -H host -f g -l 20 -P 0 | netperf-collect.py; i=$((i+1)); done

where netperf-collect.py is just a script to calculate the average across the runs: http://markmc.fedorapeople.org/netperf-collect.py

[2] - ping -c 10 -f host
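The collection script itself isn't reproduced in the archive. A minimal sketch of the min/average/max/standard-deviation summary used in the tables above might look like this (the real netperf-collect.py may well differ in input parsing and detail):

```python
import math

def summarize(samples):
    """Return (min, mean, max, stddev) for a list of benchmark samples,
    matching the min/average/max/sd format used in the result tables."""
    n = len(samples)
    mean = sum(samples) / n
    # population standard deviation over the n runs
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    return min(samples), mean, max(samples), sd

# Example: summarizing ten netperf throughput figures (Gb/s, made up here)
runs = [1.52, 1.57, 1.61, 1.55, 1.58, 1.60, 1.54, 1.56, 1.59, 1.61]
lo, avg, hi, sd = summarize(runs)
```

Whether the original script used population or sample standard deviation is not stated in the mail; the tables only say "standard deviation".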
[PATCH 6/9] kvm: qemu: Add support for partial csums and GSO
Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |   86 ++++++++++++++++++++++++++++++++++++++++++-----
 qemu/net.h           |    5 +++
 qemu/vl.c            |   73 +++++++++++++++++++++++++++++++---------
 3 files changed, 144 insertions(+), 20 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 419a2d7..81282c4 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -22,9 +22,18 @@
 #define VIRTIO_ID_NET 1
 
 /* The feature bitmap for virtio net */
-#define VIRTIO_NET_F_NO_CSUM 0
-#define VIRTIO_NET_F_MAC 5
-#define VIRTIO_NET_F_GS0 6
+#define VIRTIO_NET_F_CSUM       0  /* Host handles pkts w/ partial csum */
+#define VIRTIO_NET_F_GUEST_CSUM 1  /* Guest handles pkts w/ partial csum */
+#define VIRTIO_NET_F_MAC        5  /* Host has given MAC address. */
+#define VIRTIO_NET_F_GSO        6  /* Host handles pkts w/ any GSO type */
+#define VIRTIO_NET_F_GUEST_TSO4 7  /* Guest can handle TSOv4 in. */
+#define VIRTIO_NET_F_GUEST_TSO6 8  /* Guest can handle TSOv6 in. */
+#define VIRTIO_NET_F_GUEST_ECN  9  /* Guest can handle TSO[6] w/ ECN in. */
+#define VIRTIO_NET_F_GUEST_UFO  10 /* Guest can handle UFO in. */
+#define VIRTIO_NET_F_HOST_TSO4  11 /* Host can handle TSOv4 in. */
+#define VIRTIO_NET_F_HOST_TSO6  12 /* Host can handle TSOv6 in. */
+#define VIRTIO_NET_F_HOST_ECN   13 /* Host can handle TSO[6] w/ ECN in. */
+#define VIRTIO_NET_F_HOST_UFO   14 /* Host can handle UFO in. */
 
 #define TX_TIMER_INTERVAL (150000) /* 150 us */
 
@@ -42,8 +51,6 @@ struct virtio_net_hdr
     uint8_t flags;
 #define VIRTIO_NET_HDR_GSO_NONE      0    // Not a GSO frame
 #define VIRTIO_NET_HDR_GSO_TCPV4     1    // GSO frame, IPv4 TCP (TSO)
-/* FIXME: Do we need this?  If they said they can handle ECN, do they care? */
-#define VIRTIO_NET_HDR_GSO_TCPV4_ECN 2    // GSO frame, IPv4 TCP w/ ECN
 #define VIRTIO_NET_HDR_GSO_UDP       3    // GSO frame, IPv4 UDP (UFO)
 #define VIRTIO_NET_HDR_GSO_TCPV6     4    // GSO frame, IPv6 TCP
 #define VIRTIO_NET_HDR_GSO_ECN       0x80 // TCP has ECN set
@@ -85,7 +92,38 @@ static void virtio_net_update_config(VirtIODevice *vdev, uint8_t *config)
 
 static uint32_t virtio_net_get_features(VirtIODevice *vdev)
 {
-    return (1 << VIRTIO_NET_F_MAC);
+    VirtIONet *n = to_virtio_net(vdev);
+    VLANClientState *host = n->vc->vlan->first_client;
+    uint32_t features = (1 << VIRTIO_NET_F_MAC);
+
+    if (tap_has_offload(host)) {
+        features |= (1 << VIRTIO_NET_F_CSUM);
+        features |= (1 << VIRTIO_NET_F_GUEST_CSUM);
+        features |= (1 << VIRTIO_NET_F_GUEST_TSO4);
+        features |= (1 << VIRTIO_NET_F_GUEST_TSO6);
+        features |= (1 << VIRTIO_NET_F_GUEST_ECN);
+        features |= (1 << VIRTIO_NET_F_HOST_TSO4);
+        features |= (1 << VIRTIO_NET_F_HOST_TSO6);
+        features |= (1 << VIRTIO_NET_F_HOST_ECN);
+        /* Kernel can't actually handle UFO in software currently. */
+    }
+
+    return features;
+}
+
+static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
+{
+    VirtIONet *n = to_virtio_net(vdev);
+    VLANClientState *host = n->vc->vlan->first_client;
+
+    if (!tap_has_offload(host) || !host->set_offload)
+        return;
+
+    host->set_offload(host,
+                      (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
+                      (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
+                      (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
+                      (features >> VIRTIO_NET_F_GUEST_ECN) & 1);
 }
 
 /* RX */
@@ -121,6 +159,7 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
     VirtQueueElement elem;
     struct virtio_net_hdr *hdr;
     int offset, i;
+    int total;
 
     if (virtqueue_pop(n->rx_vq, &elem) == 0)
         return;
@@ -134,18 +173,26 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
     hdr->flags = 0;
     hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
 
-    /* copy in packet.  ugh */
     offset = 0;
+    total = sizeof(*hdr);
+
+    if (tap_has_offload(n->vc->vlan->first_client)) {
+        memcpy(hdr, buf, sizeof(*hdr));
+        offset += total;
+    }
+
+    /* copy in packet.  ugh */
     i = 1;
     while (offset < size && i < elem.in_num) {
         int len = MIN(elem.in_sg[i].iov_len, size - offset);
         memcpy(elem.in_sg[i].iov_base, buf + offset, len);
         offset += len;
+        total += len;
         i++;
     }
 
     /* signal other side */
-    virtqueue_push(n->rx_vq, &elem, sizeof(*hdr) + offset);
+    virtqueue_push(n->rx_vq, &elem, total);
     virtio_notify(&n->vdev, n->rx_vq);
 }
 
@@ -153,23 +200,31 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
 static void virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
 {
     VirtQueueElement elem;
+    int has_offload = tap_has_offload(n->vc->vlan->first_client);
 
     if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
[PATCH 8/9] kvm: qemu: Drop the mutex while reading from tapfd
The idea here is that with GSO, packets are much larger and we can allow the vcpu threads to e.g. process irq acks during the window where we're reading these packets from the tapfd.

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/vl.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/qemu/vl.c b/qemu/vl.c
index efdaafd..de92848 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4281,7 +4281,9 @@ static void tap_send(void *opaque)
     sbuf.buf = s->buf;
     s->size = getmsg(s->fd, NULL, &sbuf, &f) >= 0 ? sbuf.len : -1;
 #else
+    kvm_sleep_begin();
     s->size = read(s->fd, s->buf, sizeof(s->buf));
+    kvm_sleep_end();
 #endif
 
     if (s->size == -1 && errno == EINTR)
-- 
1.5.4.1
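The pattern in this patch - release the global mutex around a blocking read so other threads (e.g. vcpus acking irqs) can run, then reacquire it before touching shared state - can be modelled outside qemu like this. The names `blocking_read` and `tap_send` are illustrative stand-ins, not the qemu API; `kvm_sleep_begin`/`kvm_sleep_end` in the real patch are qemu-kvm's wrappers around exactly this release/reacquire:

```python
import threading

global_mutex = threading.Lock()  # stand-in for qemu's global mutex

def blocking_read(source):
    """Stand-in for read(tapfd, ...), which may block for a long time."""
    return source.pop(0) if source else None

def tap_send(source, log):
    # Equivalent of kvm_sleep_begin(): drop the global mutex so vcpu
    # threads can make progress while we block in the read.
    global_mutex.release()
    try:
        pkt = blocking_read(source)
    finally:
        # Equivalent of kvm_sleep_end(): reacquire before touching
        # shared device state again.
        global_mutex.acquire()
    log.append(pkt)  # shared state, touched only under the mutex
    return pkt

global_mutex.acquire()
log = []
pkt = tap_send(["gso-packet"], log)
global_mutex.release()
```

The try/finally guarantees the mutex is reacquired even if the read raises, mirroring the requirement that qemu state is only manipulated with the mutex held.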
Re: kexec/kdump of a kvm guest?
On Thu, Jul 24, 2008 at 4:39 AM, Alexander Graf [EMAIL PROTECTED] wrote:
On Jul 24, 2008, at 2:13 AM, Mike Snitzer wrote:
On Sat, Jul 5, 2008 at 7:20 AM, Avi Kivity [EMAIL PROTECTED] wrote:
Mike Snitzer wrote:

My host is x86_64 RHEL5U1 running 2.6.25.4 with kvm-70 (kvm-intel). When I configure kdump in the guest (running 2.6.22.19) and force a crash (with 'echo c > /proc/sysrq-trigger'), kexec boots the kdump kernel but then the kernel hangs (before it gets to /sbin/init et al). On the host, the associated qemu is consuming 100% cpu. I really need to be able to collect vmcores from my kvm guests. So far I can't (on raw hardware all works fine).

I've tested this a while ago and it worked (though I tested regular kexecs, not crashes); this may be a regression. Please run kvm_stat to see what's happening at the time of the crash.

OK, I can look into kvm_stat, but I just discovered that just having kvm-intel and kvm loaded into my 2.6.22.19 kernel actually prevents

Is 2.6.22.19 your host or your guest kernel? It's very unlikely that you loaded kvm modules in the guest.

Correct, 2.6.22.19 is my host kernel.

the host from being able to kexec/kdump too!? I didn't have any guests running (only the kvm modules were loaded). As soon as I unloaded the kvm modules, kdump worked as expected. Something about kvm is completely breaking kexec/kdump on both the host and guest kernels.

I guess the kexec people would be pretty interested in this as well, so I'll just CC them for now. As you're stating that the host kernel breaks with kvm modules loaded, maybe someone there could give a hint.

OK, I can try using a newer kernel on the host too (e.g. 2.6.25.x) to see how kexec/kdump of the host fares when kvm modules are loaded. On the guest side of things, as I mentioned in my original post, kexec/kdump wouldn't work within a 2.6.22.19 guest with the host running 2.6.25.4 (with kvm-70).
Mike
[PATCH 7/9] kvm: qemu: Increase size of virtio_net rings
Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 81282c4..a681a7e 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -305,8 +305,8 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
     n->vdev.update_config = virtio_net_update_config;
     n->vdev.get_features = virtio_net_get_features;
     n->vdev.set_features = virtio_net_set_features;
-    n->rx_vq = virtio_add_queue(&n->vdev, 128, virtio_net_handle_rx);
-    n->tx_vq = virtio_add_queue(&n->vdev, 128, virtio_net_handle_tx);
+    n->rx_vq = virtio_add_queue(&n->vdev, 512, virtio_net_handle_rx);
+    n->tx_vq = virtio_add_queue(&n->vdev, 512, virtio_net_handle_tx);
     memcpy(n->mac, nd->macaddr, 6);
     n->vc = qemu_new_vlan_client(nd->vlan, virtio_net_receive,
                                  virtio_net_can_receive, n);
-- 
1.5.4.1
[PATCH 5/9] kvm: qemu: Disable recv notifications until avail buffers exhausted
Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 4adfa42..419a2d7 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -106,9 +106,12 @@ static int virtio_net_can_receive(void *opaque)
         !(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
         return 0;
 
-    if (n->rx_vq->vring.avail->idx == n->rx_vq->last_avail_idx)
+    if (n->rx_vq->vring.avail->idx == n->rx_vq->last_avail_idx) {
+        n->rx_vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
         return 0;
+    }
 
+    n->rx_vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
     return 1;
 }
-- 
1.5.4.1
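The logic of this patch - suppress guest "buffers available" notifications while receive buffers remain, and re-enable them only once the ring runs dry - can be sketched as a small model. The class and field names here are illustrative, simplified from the qemu structures:

```python
VRING_USED_F_NO_NOTIFY = 1

class RxRing:
    """Toy model of the rx virtqueue's notification suppression."""

    def __init__(self, avail_idx=0, last_avail_idx=0):
        self.avail_idx = avail_idx            # buffers the guest has posted
        self.last_avail_idx = last_avail_idx  # buffers we have consumed
        self.used_flags = 0

    def can_receive(self):
        if self.avail_idx == self.last_avail_idx:
            # Ring empty: ask the guest to notify us when it adds buffers.
            self.used_flags &= ~VRING_USED_F_NO_NOTIFY
            return False
        # Buffers still available: suppress further notifications,
        # since we will keep consuming until the ring is exhausted.
        self.used_flags |= VRING_USED_F_NO_NOTIFY
        return True

ring = RxRing(avail_idx=4, last_avail_idx=4)
empty = ring.can_receive()   # ring drained: notifications re-enabled
ring.avail_idx += 1          # guest posts a buffer
ready = ring.can_receive()   # buffer available: notifications suppressed
```

The point of the optimisation is to avoid a guest exit per posted buffer while the host is still working through the ring.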
[PATCH 2/9] kvm: qemu: Fix virtio_net tx timer
The current virtio_net tx timer is 2ns, which doesn't make any sense. Set it to a more reasonable 150us instead.

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 2e57e5a..31867f1 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -26,7 +26,7 @@
 #define VIRTIO_NET_F_MAC 5
 #define VIRTIO_NET_F_GS0 6
 
-#define TX_TIMER_INTERVAL (1000 / 500)
+#define TX_TIMER_INTERVAL (150000) /* 150 us */
 
 /* The config defining mac address (6 bytes) */
 struct virtio_net_config
-- 
1.5.4.1
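The arithmetic behind the commit message, assuming qemu's vm_clock ticks in nanoseconds: the old expression evaluates to 2, i.e. a 2ns mitigation window, while the new constant is 150000ns = 150us:

```python
NS_PER_US = 1000

old_interval = 1000 // 500  # the old TX_TIMER_INTERVAL expression
new_interval = 150000       # the new value, in vm_clock nanoseconds

assert old_interval == 2                 # 2ns: effectively no tx mitigation
assert new_interval // NS_PER_US == 150  # 150us window
```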
[PATCH 1/9] kvm: qemu: Set MIN_TIMER_REARM_US to 150us
Equivalent to ~300 syscalls on my machine.

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/vl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu/vl.c b/qemu/vl.c
index 5d285cc..b7d3397 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -891,7 +891,7 @@ static void qemu_rearm_alarm_timer(struct qemu_alarm_timer *t)
 }
 
 /* TODO: MIN_TIMER_REARM_US should be optimized */
-#define MIN_TIMER_REARM_US 250
+#define MIN_TIMER_REARM_US 150
 
 static struct qemu_alarm_timer *alarm_timer;
-- 
1.5.4.1
[PATCH 3/9] kvm: qemu: Remove virtio_net tx ring-full heuristic
virtio_net tries to guess, when it receives a tx notification from the guest, whether it indicates that the guest has no more room in the tx ring and it should immediately flush the queued buffers.

The heuristic is based on the fact that there are 128 buffer entries in the ring and each packet uses 2 buffers (i.e. the virtio_net_hdr and the packet's linear data).

Using GSO or increasing the size of the rings will break that heuristic, so let's remove it and assume that any notification from the guest after we've disabled notifications indicates that we should flush our buffers.

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 31867f1..4adfa42 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -175,8 +175,7 @@ static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
 
-    if (n->tx_timer_active &&
-        (vq->vring.avail->idx - vq->last_avail_idx) == 64) {
+    if (n->tx_timer_active) {
         vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
         qemu_del_timer(n->tx_timer);
         n->tx_timer_active = 0;
-- 
1.5.4.1
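The removed heuristic can be made concrete: with a 128-entry ring and 2 descriptors per packet, "ring full" is exactly 64 outstanding packets - the magic number hard-coded in the old condition. Either of the changes in this series invalidates that constant, which is why the heuristic has to go. A quick model (illustrative numbers only):

```python
def ring_full_threshold(ring_entries, descs_per_packet):
    """Outstanding-packet count at which the tx ring is exhausted."""
    return ring_entries // descs_per_packet

# The assumption baked into the old heuristic: 128 entries, 2 descs/pkt.
assert ring_full_threshold(128, 2) == 64

# Both changes in this series invalidate the hard-coded 64:
assert ring_full_threshold(512, 2) != 64  # bigger rings (patch 7/9)
assert ring_full_threshold(128, 4) != 64  # GSO: more descriptors per packet
```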
[PATCH 4/9] kvm: qemu: Add VIRTIO_F_NOTIFY_ON_EMPTY
Set the VIRTIO_F_NOTIFY_ON_EMPTY feature bit so the guest can rely on us notifying them when the queue is empty.

Also, only notify when the available queue is empty *and* when we've finished with all the buffers we had detached. Right now, when the queue is empty, we notify the guest for every used buffer.

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio.c |    6 +++++-
 qemu/hw/virtio.h |    5 +++++
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
index 3429ac8..e035e4e 100644
--- a/qemu/hw/virtio.c
+++ b/qemu/hw/virtio.c
@@ -138,6 +138,7 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
     /* Make sure buffer is written before we update index. */
     wmb();
     vq->vring.used->idx++;
+    vq->inuse--;
 }
 
 int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
@@ -187,6 +188,8 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
 
     elem->index = head;
 
+    vq->inuse++;
+
     return elem->in_num + elem->out_num;
 }
 
@@ -275,6 +278,7 @@ static uint32_t virtio_ioport_read(void *opaque, uint32_t addr)
     switch (addr) {
     case VIRTIO_PCI_HOST_FEATURES:
         ret = vdev->get_features(vdev);
+        ret |= (1 << VIRTIO_F_NOTIFY_ON_EMPTY);
         break;
     case VIRTIO_PCI_GUEST_FEATURES:
         ret = vdev->features;
@@ -431,7 +435,7 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     /* Always notify when queue is empty */
-    if (vq->vring.avail->idx != vq->last_avail_idx &&
+    if ((vq->inuse || vq->vring.avail->idx != vq->last_avail_idx) &&
         (vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
         return;
 
diff --git a/qemu/hw/virtio.h b/qemu/hw/virtio.h
index 61f5038..1adaed3 100644
--- a/qemu/hw/virtio.h
+++ b/qemu/hw/virtio.h
@@ -30,6 +30,10 @@
 /* We've given up on this device. */
 #define VIRTIO_CONFIG_S_FAILED 0x80
 
+/* We notify when the ring is completely used, even if the guest is
+ * suppressing callbacks */
+#define VIRTIO_F_NOTIFY_ON_EMPTY 24
+
 /* from Linux's linux/virtio_ring.h */
 
 /* This marks a buffer as continuing via the next field. */
@@ -86,6 +90,7 @@ struct VirtQueue
     VRing vring;
     uint32_t pfn;
     uint16_t last_avail_idx;
+    int inuse;
     void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
 };
-- 
1.5.4.1
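A small model of the notification rule after this patch: the guest interrupt is suppressed only if the guest asked for suppression *and* there is still work outstanding - either unconsumed avail entries or popped-but-not-yet-pushed ("inuse") buffers. Names mirror the patch, but the function is a simplification for illustration:

```python
VRING_AVAIL_F_NO_INTERRUPT = 1

def should_notify(avail_idx, last_avail_idx, inuse, avail_flags):
    """Mirror of the patched virtio_notify() condition."""
    work_outstanding = inuse or avail_idx != last_avail_idx
    if work_outstanding and (avail_flags & VRING_AVAIL_F_NO_INTERRUPT):
        return False  # guest suppressed callbacks and ring isn't drained
    return True       # always notify once the ring is completely used

# Guest suppressed interrupts, but everything is drained: notify anyway.
assert should_notify(5, 5, 0, VRING_AVAIL_F_NO_INTERRUPT) is True

# Buffers still in flight (inuse > 0): respect the suppression.
assert should_notify(5, 5, 2, VRING_AVAIL_F_NO_INTERRUPT) is False

# Suppression not requested: always notify.
assert should_notify(5, 3, 0, 0) is True
```

Without the `inuse` term, the drained-ring case would fire an interrupt for every used buffer once `avail_idx == last_avail_idx`, which is exactly the behaviour the patch description calls out.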
RE: scsi broken >4GB RAM
Using an IDE boot disk, no problem: Win2008 (64-bit) works without any problems with 6 GB RAM in the guest. After successfully booting from IDE, I added a second disk using SCSI: Windows sees the disk but cannot initialize it. So SCSI looks quite unusable if you run a Windows guest (win2003 sp2 also stops during install) - or should we load a SCSI driver during setup? Win2008 uses the "LSI Logic 8953U PCI SCSI Adapter, 53C895A Device" (LSI Logic driver 4.16.6.0, signed).

Any other experiences running SCSI on Windows?

Best Regards,

Martin

[quoting the two earlier messages in this thread, reproduced in full above]
Best Regards, Martin Maurer [EMAIL PROTECTED] http://www.proxmox.com Proxmox Server Solutions GmbH Kohlgasse 51/10, 1050 Vienna, Austria Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22 Commercial register no.: FN 258879 f Registration office: Handelsgericht Wien -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Henrik Holst Sent: Mittwoch, 23. Juli 2008 23:09 To: kvm@vger.kernel.org Subject: scsi broken >4GB RAM I do not know if this is a bug in qemu or the linux kernel sym53c8xx module (I haven't had the opportunity to test with anything other than Linux at the moment), but if one starts a qemu instance with -m 4096 or larger, the scsi emulated disk fails in the Linux guest. If booting any install cd, /dev/sda is seen as only 512B in size, and if booting an ubuntu 8.04-amd64 with the secondary drive as scsi it is seen with the correct size but one cannot read nor write the partition table. Is there anyone out there who could test, say, a Windows image on scsi with 4GB or more of RAM and see if it works or not? If so it could be the linux driver that is faulty. /Henrik Holst -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/8] KVM: PCIPT: fix interrupt handling
* On Thursday 24 Jul 2008 16:58:57 Ben-Ami Yassour wrote: On Wed, 2008-07-23 at 19:07 +0530, Amit Shah wrote: * On Wednesday 16 Jul 2008 18:47:01 Ben-Ami Yassour wrote: This patch fixes a few problems with the interrupt handling for passthrough devices. 1. Pass the interrupt handler the pointer to the device, so we do not need to take the pcipt lock in the interrupt handler. 2. Remove the pt_irq_handled bitmap - it is no longer needed. 3. Split kvm_pci_pt_work_fn into two functions, one for interrupt injection and another for the ack - the code is much simpler this way. 4. Change the passthrough initialization order - add the device structure to the list before registering the interrupt handler. 5. On the passthrough destruction path, free the interrupt handler before cleaning queued work. Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED] --- if (irqchip_in_kernel(kvm)) { + match->pt_dev.guest.irq = pci_pt_dev->guest.irq; + match->pt_dev.host.irq = dev->irq; + if (kvm->arch.vioapic) + kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq; + if (kvm->arch.vpic) + kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq; + We shouldn't register this notifier unless we get the irq below, to avoid unneeded function calls and checks. Note: This code was changed in the last version of the code but the comment is still relevant. Do you mean that we need to postpone registering the notification? I mean we can register these function pointers after request_irq succeeds. In the case of an assigned device this means that we postpone it by a few seconds, and implementing it like above is cleaner. So I don't see the real value in postponing it. Sorry, don't get what you mean here.
[ kvm-Bugs-2019608 ] Ubuntu 8.04.1 (IA32 & x86_64) - cannot install bootloader
Bugs item #2019608, was opened at 2008-07-16 15:03 Message generated for change (Comment added) made by awwy You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019608&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: intel Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Johannes Truschnigg (c0l0) Assigned to: Nobody/Anonymous (nobody) Summary: Ubuntu 8.04.1 (IA32 & x86_64) - cannot install bootloader Initial Comment: CPU: Intel Core 2 Quad Q6600 (4 cores) Distro, kernel: Gentoo GNU/Linux ~amd64, Kernel 2.6.25-r6 Bitness, compiler: x86_64, GCC 4.3.1 KVM versions: kvm-70, kvm-71 When trying to install Ubuntu (either 32bit or 64bit) in a KVM guest, grub-install croaks. The guest kernel debug ringbuffer shows the following messages: (Please see http://pasted.at/9d7e95f873.html or the attached file!) Windows XP also hangs at installing, actually before anything substantial other than copying installation files gets done. The first phase of the install completes, however - the graphical installer that's started after the first reboot hangs indefinitely. Worked fine with version <= kvm-69 with the very same settings. I'm happy to provide additional information upon request. -- Comment By: Alexander Graf (awwy) Date: 2008-07-24 13:36 Message: Logged In: YES user_id=376328 Originator: NO I bisected it down to commit cc91437d10770328d0b32f200399569a0ad22792, which lies between kvm-60 and kvm-61. I can't really make out any obvious problem that patch may raise though. Nevertheless it seems to be userspace at fault here. -- Comment By: Alexander Graf (awwy) Date: 2008-07-24 05:56 Message: Logged In: YES user_id=376328 Originator: NO I am getting exactly the same error on SLES10 SP2. 
Running a 32-bit binary in an x86_64 SLES10SP2 guest generates a #DF at a RIP that looks like a 32-bit mangled kernel space address (80228ca0 vs. 80228ca0). Apparently something truncates it - I'll try to bisect. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019608&group_id=180599
Re: Simple way of putting a VM on a LAN
On Wed, Jul 23, 2008 at 11:15 PM, Bill Davidsen [EMAIL PROTECTED] wrote: Your easy way seems to mean using Debian, other distributions don't have some of the scripts, or they are in different places or do different things. Other thoughts below. yep, on Gentoo and SuSE i didn't find the included scripts flexible enough, so i did the same 'by hand'. that was a few years ago, it might be better now; but it's not hard to do anyway. Not being a trusting person I find that a bridge is an ineffective firewall, a bridge isn't a firewall. it's the software equivalent of plugging both your host and guest into an ethernet switch. in most ways, your host 'steps out of the way'. but with a bit of trickery that could live on the VM, to the extent it's needed. Now the 'sets up its own IP' is a mystery, since there's no place I have told it what the IP of the machine it replaces might be. I did take the as said before, it's as if your VM is directly plugged into the LAN. you just configure its network 'from inside'. the host doesn't care what IP numbers it uses. in fact, it could be using totally different protocols, just as long as they go over ethernet. hand does result in a working configuration, however, so other than the lack of control from using iptables to forward packets, it works well. you can use iptables. maybe you have to set up ebtables, but in the end, just put rules in the FORWARD chains. google for 'transparent firewall', or 'bridge iptables'. as for manual setup, it's faster than setting up iptables, and acceptably secure as long as the kvm host is at least as secure as the original. just do with your VM as you do with a 'real' box. after that, you can use the fact that every packet to the VM has to pass through your eth0 device; even if they don't appear on your INPUT chains. -- Javier
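To make the bridge-plus-FORWARD-chain setup Javier describes concrete, here is a minimal sketch. All interface names (br0, eth0, tap0) and the sample rules are illustrative assumptions, not taken from this thread; the exact commands depend on your distribution:

```shell
# Sketch: plug host eth0 and the guest's tap0 into a software switch.
# Interface names below are examples.
brctl addbr br0
brctl addif br0 eth0
brctl addif br0 tap0
ifconfig br0 up

# With bridge-netfilter enabled, bridged frames traverse the iptables
# FORWARD chain, so the guest can be filtered even though the host
# "steps out of the way" - e.g. allow only ssh to the VM:
iptables -A FORWARD -m physdev --physdev-out tap0 -p tcp --dport 22 -j ACCEPT
iptables -A FORWARD -m physdev --physdev-out tap0 -j DROP
```

The `physdev` match is what lets iptables distinguish which bridge port a frame is headed for; ebtables is only needed for filtering at the ethernet (non-IP) level.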
Re: [PATCH 4/8] KVM: PCIPT: fix interrupt handling
On Thu, 2008-07-24 at 19:01 +0530, Amit Shah wrote: * On Thursday 24 Jul 2008 16:58:57 Ben-Ami Yassour wrote: On Wed, 2008-07-23 at 19:07 +0530, Amit Shah wrote: * On Wednesday 16 Jul 2008 18:47:01 Ben-Ami Yassour wrote: if (irqchip_in_kernel(kvm)) { + match->pt_dev.guest.irq = pci_pt_dev->guest.irq; + match->pt_dev.host.irq = dev->irq; + if (kvm->arch.vioapic) + kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq; + if (kvm->arch.vpic) + kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq; + We shouldn't register this notifier unless we get the irq below, to avoid unneeded function calls and checks. Note: This code was changed in the last version of the code but the comment is still relevant. Do you mean that we need to postpone registering the notification? I mean we can register these function pointers after request_irq succeeds. request_irq should be the last initialization operation, since everything should be ready in case an interrupt is received. In the case of an assigned device this means that we postpone it by a few seconds, and implementing it like above is cleaner. So I don't see the real value in postponing it. Sorry, don't get what you mean here. never mind... the answer is above.
Release of OpenNebula 1.0 for Data Center Virtualization Cloud Solutions
The dsa-research group (http://dsa-research.org) is pleased to announce that a stable release (v1.0) of the OpenNebula (ONE) Virtual Infrastructure Engine (http://www.OpenNebula.org) is available for download under the terms of the Apache License, Version 2.0. ONE enables the dynamic allocation of virtual machines on a pool of physical resources, extending the benefits of existing virtualization platforms from a single physical resource to a pool of resources and decoupling the server not only from the physical infrastructure but also from the physical location. MAIN FEATURES The OpenNebula Virtual Infrastructure Engine differentiates itself from existing VM managers with its highly modular and open architecture, designed to meet the requirements of cluster administrators. The latest version supports the Xen and KVM virtualization platforms to provide the following features and capabilities: - Centralized management: a single access point to manage a pool of VMs and physical resources. - Efficient resource management, including support to build any capacity provision policy and for advance reservation of capacity through the Haizea lease manager - Powerful API and CLI interfaces for monitoring and controlling VMs and physical resources - Easy 3rd party software integration to provide a complete solution for the deployment of flexible and efficient virtual infrastructures - Fault tolerant design: state is kept in a SQLite database. - Open and flexible architecture to add new infrastructure metrics and parameters or even to support new hypervisors. - Support to access Amazon EC2 resources to supplement local resources with cloud resources to satisfy peak or fluctuating demands. 
- Ease of installation and administration on UNIX clusters - Open source software released under Apache license v2.0 - As an engine for the dynamic management of VMs, OpenNebula is being enhanced in the context of the RESERVOIR project (EU grant agreement 215605) to address the requirements of several business use cases. More details at http://www.opennebula.org/doku.php?id=documentation:rn-rel1.0 RELEVANT LINKS - Benefits and Features: http://www.opennebula.org/doku.php?id=about - Documentation: http://www.opennebula.org/doku.php?id=documentation - Release Notes: http://www.opennebula.org/doku.php?id=documentation:rn-rel1.0 - Download: http://www.opennebula.org/doku.php?id=software - Ecosystem and Related Tools: http://www.opennebula.org/doku.php?id=ecosystem --8<-- Constantino Vázquez, Grid Virtualization Technology Engineer/ Researcher: http://www.dsa-research.org/doku.php?id=people:tinova DSA Research Group: http://dsa-research.org Globus GridWay Metascheduler: http://www.GridWay.org OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
[ kvm-Bugs-2026870 ] xorg-cirrus 1.2.1 fails in x86_64 kvm guests.
Bugs item #2026870, was opened at 2008-07-24 16:56 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2026870&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Søren Hansen (shawarma) Assigned to: Nobody/Anonymous (nobody) Summary: xorg-cirrus 1.2.1 fails in x86_64 kvm guests. Initial Comment: When trying to boot an Ubuntu Intrepid amd64 (x86_64) live CD, the guest hangs when starting X. I've narrowed it down to the new version of X. It works when booting with -no-kvm. I'm afraid that's all the info I have right now. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2026870&group_id=180599
[ kvm-Bugs-2026870 ] xorg-cirrus 1.2.1 fails in x86_64 kvm guests.
Bugs item #2026870, was opened at 2008-07-24 16:56 Message generated for change (Comment added) made by shawarma You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2026870&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Søren Hansen (shawarma) Assigned to: Nobody/Anonymous (nobody) Summary: xorg-cirrus 1.2.1 fails in x86_64 kvm guests. Initial Comment: When trying to boot an Ubuntu Intrepid amd64 (x86_64) live CD, the guest hangs when starting X. I've narrowed it down to the new version of X. It works when booting with -no-kvm. I'm afraid that's all the info I have right now. -- Comment By: Søren Hansen (shawarma) Date: 2008-07-24 17:26 Message: Logged In: YES user_id=567099 Originator: YES I tried starting X from ssh, so I got this output: This is a pre-release version of the X server from The X.Org Foundation. It is not supported in any way. Bugs may be filed in the bugzilla at http://bugs.freedesktop.org/. Select the xorg product for bugs you find in this release. Before reporting bugs in pre-release versions please check the latest version in the X.Org Foundation git repository. See http://wiki.x.org/wiki/GitPage for git access instructions. X.Org X Server 1.4.99.905 (1.5.0 RC 5) Release Date: 5 September 2007 X Protocol Version 11, Revision 0 Build Operating System: Linux Ubuntu (xorg-server 2:1.4.99.905-0ubuntu4) Current Operating System: Linux ibsen 2.6.26-4-server #1 SMP Mon Jul 14 19:19:23 UTC 2008 x86_64 Build Date: 16 July 2008 03:40:43PM Before reporting problems, check http://wiki.x.org to make sure that you have the latest version. Module Loader present Markers: (--) probed, (**) from config file, (==) default setting, (++) from command line, (!!) 
notice, (II) informational, (WW) warning, (EE) error, (NI) not implemented, (??) unknown. (==) Log file: /var/log/Xorg.0.log, Time: Thu Jul 24 15:18:18 2008 (==) Using config file: /etc/X11/xorg.conf (EE) Failed to load module dri2 (module does not exist, 0) error setting MTRR (base = 0xf000, size = 0x0010, type = 1) Function not implemented (38) -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2026870&group_id=180599
[ kvm-Bugs-2019053 ] tbench fails on guest when AMD NPT enabled
Bugs item #2019053, was opened at 2008-07-15 18:10 Message generated for change (Comment added) made by alex_williamson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019053&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: amd Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alex Williamson (alex_williamson) Assigned to: Nobody/Anonymous (nobody) Summary: tbench fails on guest when AMD NPT enabled Initial Comment: Running on a dual-socket system with AMD 2356 quad-core processors (8 total cores), 32GB RAM, Ubuntu Hardy 2.6.24-19-generic (64bit) with kvm-71 userspace and kernel modules. With no module options, dmesg confirms: kvm: Nested Paging enabled Start guest with: /usr/local/kvm/bin/qemu-system-x86_64 -hda /dev/VM/Ubuntu64 -m 1024 -net nic,model=e1000,mac=de:ad:be:ef:00:01 -net tap,script=/root/bin/br0-ifup -smp 8 -vnc :0 Guest VM is also Ubuntu Hardy 64bit. On the guest run 'tbench 16 <tbench server>'. The system running tbench_srv is a different system in my case. The tbench client will fail randomly, often quietly with "Child failed with status 1", but sometimes more harshly with a glibc double free error. If I unload the modules and reload w/o npt: modprobe -r kvm-amd modprobe -r kvm modprobe kvm-amd npt=0 dmesg confirms: kvm: Nested Paging disabled The tbench test now runs over and over successfully. The test also runs fine on an Intel E5450 (no EPT). -- Comment By: Alex Williamson (alex_williamson) Date: 2008-07-24 11:10 Message: Logged In: YES user_id=333914 Originator: YES I tried the Ubuntu Gutsy 2.6.22-15-generic kernel on the host, but I still see the issue. I'll install openSUSE 10.3 and see what happens. 
-- Comment By: Alexander Graf (awwy) Date: 2008-07-23 23:45 Message: Logged In: YES user_id=376328 Originator: NO I'm seeing random segfaults when using NPT on a host kernel >= 2.6.23. So far I have not been able to reproduce my test case breakages with an openSUSE 10.3 kernel though, so could you please test that and verify if tbench works for you on openSUSE 10.3? It does break with 11.0. I have the feeling that we're seeing the same problem here. -- Comment By: Alex Williamson (alex_williamson) Date: 2008-07-16 09:18 Message: Logged In: YES user_id=333914 Originator: YES No, I added mlockall(MCL_CURRENT | MCL_FUTURE) to qemu/vl.c:main() and it makes no difference. I'm only starting a 1G guest on an otherwise idle 32G host, so host memory pressure is pretty light. -- Comment By: Avi Kivity (avik) Date: 2008-07-16 08:19 Message: Logged In: YES user_id=539971 Originator: NO Strange. If you add an mlockall() to qemu startup, does the test pass? -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019053&group_id=180599
Re: e1000 and PXE issues
Greg Kurtzer wrote: Hello, I noticed some problems with the e1000 implementation in kvm >= 70. At first glance it seemed like a PXE problem, as it would not acknowledge the DHCP offer from the server. I tried several different Etherboot ROM images and version 5.2.6 seemed to work. That version isn't PXE compliant, so I built an ELF image to boot, and it downloaded it very, very, very, very slowly (as in about 10 minutes), but it did end up working. This all worked perfectly with version 69 and previous. Please let me know if there is a need for any further information, or anything additional to test. Please note that I am not subscribed to this email list, so please CC me directly with any responses. I think the e1000 driver in gPXE might have bitrotted. -hpa
Re: [PATCH 0/9][RFC] KVM virtio_net performance
Mark McLoughlin wrote: Hey, One all-important thing I forgot to include was a comparison with lguest :-) Hey Mark, This patch set is really great! I guess the hard part now is deciding what all we want to apply. Do you have a suggestion of which patches you think are worth applying? BTW, do you have native and guest loopback numbers to compare where we stand with native?

netperf, 10x20s runs (Gb/s)
       | guest->host                 | host->guest
-------+-----------------------------+----------------------------
KVM    | 4.230/ 4.619/ 4.780/ 0.155  | 8.140/ 8.578/ 8.770/ 0.162
lguest | 5.700/ 5.926/ 6.150/ 0.132  | 8.680/ 9.073/ 9.320/ 0.205

ping -f -c 10 (ms)
       | guest->host                 | host->guest
-------+-----------------------------+----------------------------
KVM    | 0.199/ 0.326/ 7.698/ 0.744  | 0.199/ 0.245/ 0.402/ 0.022
lguest | 0.022/ 0.055/ 0.467/ 0.019  | 0.019/ 0.046/89.249/ 0.448

So, puppies gets you an extra 1.3Gb/s guest->host, .5Gb/s host->guest and much better latency. I'm surprised lguest gets an extra 1.3Gb guest->host. Any idea of where we're losing it? Actually, I guess the main reason for the latency difference is that when lguest gets notified on the tx ring, it immediately sends whatever is available and then starts a timer. KVM doesn't send anything until its tx timer fires or the ring is full. Yes, we should definitely do that. It will make ping appear to be a lot faster than it really is :-) Regards, Anthony Liguori Cheers, Mark.
Live Migration, DRBD
I am very happy to discover that KVM does live migration. Now I am figuring out whether it will work for me. What I have in mind is to use DRBD for the file system image. The problem is that during the migration I want to shift the file system access at the moment when the VM has quit running on the host it is leaving but before it starts running on the host where it is arriving. Is there a hook to let me do stuff at this point? This is what I want to do: On the departing machine... - VM has stopped here - umount the volume with the VM file system image - mark volume in DRBD as secondary On the arriving machine... - mark volume in DRBD as primary - mount the volume with the VM file system image - VM can now start here Is there a way? Thanks, -kb
Re: kexec/kdump of a kvm guest?
Mike Snitzer wrote: On Thu, Jul 24, 2008 at 9:15 AM, Vivek Goyal [EMAIL PROTECTED] wrote: On Thu, Jul 24, 2008 at 07:49:59AM -0400, Mike Snitzer wrote: On Thu, Jul 24, 2008 at 4:39 AM, Alexander Graf [EMAIL PROTECTED] wrote: I can do further research but welcome others' insight: do others have advice on how best to collect a crashed kvm guest's core? I don't know what you do in libvirt, but you can start a gdbstub in QEMU, connect with gdb, and then have gdb dump out a core. Regards, Anthony Liguori It will be interesting to look at your results with 2.6.25.x kernels with the kvm module inserted. Currently I can't think what can possibly be wrong. If the host's 2.6.25.4 kernel has both the kvm and kvm-intel modules loaded, kexec/kdump does _not_ work (it simply hangs the system). If I only have the kvm module loaded, kexec/kdump works as expected (likewise if no kvm modules are loaded at all). So it would appear that kvm-intel and kexec are definitely mutually exclusive at the moment (at least on both 2.6.22.x and 2.6.25.x). Mike
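Until the kvm-intel interaction is understood, Mike's observation suggests a scriptable workaround: unload kvm-intel before arming the crash kernel. A sketch — the kernel/initrd paths and append string below are placeholders, not from the thread:

```shell
# Unload kvm-intel first (all guests must be shut down); per the
# report above, kexec/kdump then behaves as expected.
modprobe -r kvm-intel

# Re-arm the crash (panic) kernel; paths and cmdline are examples only.
kexec -p /boot/vmlinuz-kdump --initrd=/boot/initrd-kdump.img \
      --append="root=/dev/sda1 irqpoll maxcpus=1"
```

The plain kvm module can stay loaded; only kvm-intel appears to conflict.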
Re: [PATCH 0/9][RFC] KVM virtio_net performance
Hi Mark, Mark McLoughlin wrote: Hey, Here's a bunch of patches attempting to improve the performance of virtio_net. This is more an RFC rather than a patch submission since, as can be seen below, not all patches actually improve the performance measurably. I'm still seeing the same problem I saw with my patch series. Namely, dhclient fails to get a DHCP address. Rusty noticed that RX has a lot more packets received than it should, so we're suspicious that we're getting packet corruption. Configuring the tap device with a static address, here's what I get with iperf: w/o patches: guest->host: 625 Mbits/sec host->guest: 825 Mbits/sec w/ patches: guest->host: 2.02 Gbits/sec host->guest: 1.89 Gbits/sec guest lo: 4.35 Gbits/sec host lo: 4.36 Gbits/sec This is with KVM GUEST configured FWIW. Regards, Anthony Liguori I've tried hard to test each of these patches with as stable and informative a benchmark as I could find. The first benchmark is a netperf[1] based throughput benchmark and the second uses a flood ping[2] to measure latency differences. Each set of figures is min/average/max/standard deviation. The first set is Gb/s and the second is milliseconds. The network configuration used was very simple - the guest with a virtio_net interface and the host with a tap interface and static IP addresses assigned to both - e.g. there was no bridge in the host involved and iptables was disabled in both the host and guest. I used: 1) kvm-71-26-g6152996 with the patches that follow 2) Linus's v2.6.26-5752-g93ded9b with Rusty's virtio patches from 219:bbd2611289c5 applied; these are the patches that have just been submitted to Linus The conclusions I draw are: 1) The length of the tx mitigation timer makes quite a difference to the throughput achieved; we probably need a good heuristic for adjusting this on the fly. 2) Using the recently merged GSO support in the tun/tap driver gives a huge boost, but much more so on the host->guest side. 
3) Adjusting the virtio_net ring sizes makes a small difference, but not as much as one might expect 4) Dropping the global mutex while reading GSO packets from the tap interface gives a nice speedup. This highlights the global mutex as a general performance issue. 5) Eliminating an extra copy on the host->guest path only makes a barely measurable difference. Anyway, the figures:

netperf, 10x20s runs (Gb/s)
                             | guest->host                 | host->guest
-----------------------------+-----------------------------+----------------------------
baseline                     | 1.520/ 1.573/ 1.610/ 0.034  | 1.160/ 1.357/ 1.630/ 0.165
50us tx timer + rearm        | 1.050/ 1.086/ 1.110/ 0.017  | 1.710/ 1.832/ 1.960/ 0.092
250us tx timer + rearm       | 1.700/ 1.764/ 1.880/ 0.064  | 0.900/ 1.203/ 1.580/ 0.205
150us tx timer + rearm       | 1.520/ 1.602/ 1.690/ 0.044  | 1.670/ 1.928/ 2.150/ 0.141
no ring-full heuristic       | 1.480/ 1.569/ 1.710/ 0.066  | 1.610/ 1.857/ 2.140/ 0.153
VIRTIO_F_NOTIFY_ON_EMPTY     | 1.470/ 1.554/ 1.650/ 0.054  | 1.770/ 1.960/ 2.170/ 0.119
recv NO_NOTIFY               | 1.530/ 1.604/ 1.680/ 0.047  | 1.780/ 1.944/ 2.190/ 0.129
GSO                          | 4.120/ 4.323/ 4.420/ 0.099  | 6.540/ 7.033/ 7.340/ 0.244
ring size == 256             | 4.050/ 4.406/ 4.560/ 0.143  | 6.280/ 7.236/ 8.280/ 0.613
ring size == 512             | 4.420/ 4.600/ 4.960/ 0.140  | 6.470/ 7.205/ 7.510/ 0.314
drop mutex during tapfd read | 4.320/ 4.578/ 4.790/ 0.161  | 8.370/ 8.589/ 8.730/ 0.120
aligouri zero-copy           | 4.510/ 4.694/ 4.960/ 0.148  | 8.430/ 8.614/ 8.840/ 0.142

ping -f -c 10 (ms)
                             | guest->host                 | host->guest
-----------------------------+-----------------------------+----------------------------
baseline                     | 0.060/ 0.459/ 7.602/ 0.846  | 0.067/ 0.331/ 2.517/ 0.057
50us tx timer + rearm        | 0.081/ 0.143/ 7.436/ 0.374  | 0.093/ 0.133/ 1.883/ 0.026
250us tx timer + rearm       | 0.302/ 0.463/ 7.580/ 0.849  | 0.297/ 0.344/ 2.128/ 0.028
150us tx timer + rearm       | 0.197/ 0.323/ 7.671/ 0.740  | 0.199/ 0.245/ 7.836/ 0.037
no ring-full heuristic       | 0.182/ 0.324/ 7.688/ 0.753  | 0.199/ 0.243/ 2.197/ 0.030
VIRTIO_F_NOTIFY_ON_EMPTY     | 0.197/ 0.321/ 7.447/ 0.730  | 0.196/ 0.242/ 2.218/ 0.032
recv NO_NOTIFY               | 0.186/ 0.321/ 7.520/ 0.732  | 0.200/ 0.233/ 2.216/ 0.028
GSO                          | 0.178/ 0.324/ 7.667/ 0.736  | 0.147/ 0.246/ 1.361/ 0.024
ring size == 256             | 0.184/ 0.323/ 7.674/ 0.728  | 0.199/ 0.243/ 2.181/ 0.028
ring size == 512             | (not measured)              | (not measured)
drop mutex during tapfd read | 0.183/ 0.323/ 7.820/ 0.733  | 0.202/ 0.242/ 2.219/ 0.027
aligouri zero-copy           | 0.185/ 0.325/ 7.863/ 0.736  | 0.202/ 0.245/ 7.844/ 0.036

Cheers, Mark. [1] - I used
Re: [PATCH 2/2] Remove -tdf
Anthony Liguori wrote: Gleb Natapov wrote: On Tue, Jul 22, 2008 at 08:20:41PM -0500, Anthony Liguori wrote: Currently both the in-kernel PIT and even the in-kernel irqchips are not 100% bullet proof. Of course this code is a hack; Gleb Natapov has sent a better fix for the PIT/RTC to the qemu list. Can you look into them: http://www.mail-archive.com/kvm@vger.kernel.org/msg01181.html Paul Brook's initial feedback is still valid. It causes quite a lot of churn and may not jive well with a virtual time base. An advantage to the current -tdf patch is that it's more contained. I don't think either approach is going to get past Paul in its current form. Yes, my patch causes a lot of churn because it changes a widely used API. Indeed. But the time drift fix itself is contained to the PIT/RTC code only. The last patch series I've sent disables the time drift fix if the virtual time base is enabled, as Paul requested. There was no further feedback from him. I think there's a healthy amount of scepticism about whether tdf really is worth it. This is why I suggested that we need to better quantify exactly how much this patch set helps things. For instance, a time drift test for kvm-autotest would be perfect. tdf is ugly and deviates from how hardware works. A compelling case is needed to justify it. We'll add time drift tests to autotest the minute it starts to run enough interesting tests/loads. In our private test platform we use a simple scenario to test it: 1. Use a windows guest and play a movie (changes the rtc on acpi win / pit on -no-acpi win freq to 1000hz). 2. Pin the guest to a physical cpu + load the same cpu. 3. Measure a minute in real life vs in the guest. Actually the movie seems to be smoother without the time drift fix. When fixing irqs, sometimes the player needs to cope with too rapid changes. Anyway the main focus is time accuracy and not smoother movies. 
The in-kernel pit does a relatively good job for Windows guests; the problem is it's not yet 100% stable, we can also do it in userspace, and the rtc needs a solution too. As Jan Kiszka wrote in one of his mails, maybe Paul's virtual time base can be adapted to work with KVM too. BTW how does the virtual time base handle SMP guests? I really don't know. I haven't looked too deeply at the virtual time base. Keep in mind though, that QEMU SMP is not true SMP. All VCPUs run in lock-step. Regards, Anthony Liguori Also, it's important that this is reproducible in upstream QEMU and not just in KVM. If we can make a compelling case for the importance of this, we can possibly work out a compromise. I developed and tested my patch with upstream QEMU. -- Gleb.
Re: Live Migration, DRBD
Kent Borg wrote: I am very happy to discover that KVM does live migration. Now I am figuring out whether it will work for me. What I have in mind is to use DRBD for the file system image. The problem is that during the migration I want to shift the file system access at the moment when the VM has quit running on the host it is leaving but before it starts running on the host where it is arriving. Is there a hook to let me do stuff at this point? This is what I want to do: On the departing machine... - VM has stopped here - umount the volume with the VM file system image - mark volume in DRBD as secondary On the arriving machine... - mark volume in DRBD as primary - mount the volume with the VM file system image - VM can now start here Is there a way? No, but one can add such a hook pretty easily. The whole migration code is in one file, qemu/migration.c. You can add a parameter to the qemu migration command to specify a script that should be called on the migration end event (similar to the tap script). Thanks, -kb
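To make the suggested tap-style hook concrete: qemu would invoke a user-supplied script with a phase argument at the handoff point. Everything below — the script interface, the DRBD resource name `vm0`, and the mount point — is a hypothetical sketch of what such a hook could look like, not existing qemu behavior:

```shell
#!/bin/sh
# Hypothetical migration hook; a patched qemu would run it as
#   migration-hook.sh stopped    (on the source, after the VM halts)
#   migration-hook.sh starting   (on the destination, before the VM resumes)
case "$1" in
stopped)                 # departing host: release the DRBD volume
    umount /srv/vm0
    drbdadm secondary vm0
    ;;
starting)                # arriving host: take over the volume
    drbdadm primary vm0
    mount /dev/drbd0 /srv/vm0
    ;;
*)
    echo "usage: $0 stopped|starting" >&2
    exit 1
    ;;
esac
```

The ordering matters: DRBD allows only one primary at a time (without dual-primary mode), so the source must demote before the destination promotes — exactly the window between "VM stopped here" and "VM can now start here" described above.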
Re: scsi broken >4GB RAM
Martin Maurer wrote: Using an IDE boot disk, no problem. Win2008 (64-bit) works without any problems - 6 GB RAM in the guest. After successfully booting with IDE, I added a second disk using SCSI: Windows sees the disk but cannot initialize it. So SCSI looks quite unusable if you run a Windows guest (win2003 sp2 also stops during install), or should we load a SCSI driver during setup? Win2008 uses the LSI Logic 8953U PCI SCSI Adapter, 53C895A Device (LSI Logic driver 4.16.6.0, signed). Any other experiences running SCSI on Windows? You're right, it's broken right now :( At least IDE is stable. Best Regards, Martin -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Maurer Sent: Donnerstag, 24. Juli 2008 11:46 To: kvm@vger.kernel.org Subject: RE: scsi broken >4GB RAM Sorry, just returned to the installer - it also stopped with the same error code, using just 2 GB RAM. Best Regards, Martin Maurer [EMAIL PROTECTED] http://www.proxmox.com Proxmox Server Solutions GmbH Kohlgasse 51/10, 1050 Vienna, Austria Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22 Commercial register no.: FN 258879 f Registration office: Handelsgericht Wien -Original Message- From: Martin Maurer Sent: Donnerstag, 24. Juli 2008 11:44 To: kvm@vger.kernel.org Subject: RE: scsi broken >4GB RAM Hi, I tried Windows Server 2008 (64-bit) on Proxmox VE 0.9beta2 (KVM 71, see http://pve.proxmox.com). Some details: --memory 6144 --cdrom en_windows_server_2008_datacenter_enterprise_standard_x64_dvd_X14-26714.iso --name win2008-6gb-scsi --smp 1 --bootdisk scsi0 --scsi0 80 The installer shows an 80 GB harddisk but freezes for a minute after clicking next, then: Windows could not create a partition on disk 0. The error occurred while preparing the computer's system volume. Error code: 0x8004245F. I also got installer problems if I just used scsi as the boot disk (no high memory) on several Windows versions, including win2003 and xp. So I decided to use IDE, which works without any issue on Windows.
But: I reduced the memory to 2048 and the installer continues to work! Best Regards, Martin Maurer [EMAIL PROTECTED] http://www.proxmox.com Proxmox Server Solutions GmbH Kohlgasse 51/10, 1050 Vienna, Austria Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22 Commercial register no.: FN 258879 f Registration office: Handelsgericht Wien -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Henrik Holst Sent: Mittwoch, 23. Juli 2008 23:09 To: kvm@vger.kernel.org Subject: scsi broken >4GB RAM I do not know if this is a bug in qemu or the linux kernel sym53c8xx module (I haven't had the opportunity to test with anything other than Linux at the moment), but if one starts a qemu instance with -m 4096 or larger, the scsi emulated disk fails in the Linux guest. If booting any install cd, /dev/sda is seen as only 512B in size, and if booting an ubuntu 8.04-amd64 with the secondary drive as scsi, it is seen with the correct size but one can neither read nor write the partition table. Is there anyone out there who could test, say, a Windows image on scsi with 4GB or more of RAM and see if it works or not? If so, it could be the linux driver that is faulty. /Henrik Holst
Re: [PATCH 8/9] kvm: qemu: Drop the mutex while reading from tapfd
Mark McLoughlin wrote: The idea here is that with GSO, packets are much larger and we can allow the vcpu threads to e.g. process irq acks during the window where we're reading these packets from the tapfd. Signed-off-by: Mark McLoughlin [EMAIL PROTECTED] --- qemu/vl.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/qemu/vl.c b/qemu/vl.c index efdaafd..de92848 100644 --- a/qemu/vl.c +++ b/qemu/vl.c @@ -4281,7 +4281,9 @@ static void tap_send(void *opaque) sbuf.buf = s->buf; s->size = getmsg(s->fd, NULL, &sbuf, &f) >= 0 ? sbuf.len : -1; #else Maybe do it only when GSO is actually used by the guest/tap. Otherwise it can cause some ctx thrashing, right? + kvm_sleep_begin(); s->size = read(s->fd, s->buf, sizeof(s->buf)); + kvm_sleep_end(); #endif if (s->size == -1 && errno == EINTR)
Re: [PATCH 3/9] kvm: qemu: Remove virtio_net tx ring-full heuristic
Mark McLoughlin wrote: virtio_net tries to guess, when it has received a tx notification from the guest, whether it indicates that the guest has no more room in the tx ring and it should immediately flush the queued buffers. The heuristic is based on the fact that there are 128 buffer entries in the ring and each packet uses 2 buffers (i.e. the virtio_net_hdr and the packet's linear data). Using GSO or increasing the size of the rings will break that heuristic, so let's remove it and assume that any notification from the guest after we've disabled notifications indicates that we should flush our buffers. Signed-off-by: Mark McLoughlin [EMAIL PROTECTED] --- qemu/hw/virtio-net.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c index 31867f1..4adfa42 100644 --- a/qemu/hw/virtio-net.c +++ b/qemu/hw/virtio-net.c @@ -175,8 +175,7 @@ static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq) { VirtIONet *n = to_virtio_net(vdev); -if (n->tx_timer_active && - (vq->vring.avail->idx - vq->last_avail_idx) == 64) { +if (n->tx_timer_active) { vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY; qemu_del_timer(n->tx_timer); n->tx_timer_active = 0; Actually we can improve latency a bit more by using this timer only for the high-throughput scenario. For example, if during the previous timer period no/few packets were accumulated, we can set the flag off and not issue a new timer. This way we'll get notified immediately, without timer latency. When lots of packets are transmitted, we'll go back to this batch mode again. Cheers, Dor
Re: [PATCH 3/9] kvm: qemu: Remove virtio_net tx ring-full heuristic
On Friday 25 July 2008 09:22:53 Dor Laor wrote: Mark McLoughlin wrote: vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY; qemu_del_timer(n->tx_timer); n->tx_timer_active = 0; As stated by newer messages, we should handle the first tx notification if the timer wasn't active, to shorten latency. Cheers, Dor Here's what lguest does at the moment. Basically, we cut the timeout a tiny bit each time, until we get *fewer* packets than last time. Then we bump it up again. Rough, but seems to work (it should be a per-device var of course, not a static). @@ -921,6 +922,7 @@ static void handle_net_output(int fd, st unsigned int head, out, in, num = 0; int len; struct iovec iov[vq->vring.num]; + static int last_timeout_num; if (!timeout) net_xmit_notify++; @@ -941,6 +943,14 @@ static void handle_net_output(int fd, st /* Block further kicks and set up a timer if we saw anything. */ if (!timeout && num) block_vq(vq); + + if (timeout) { + if (num < last_timeout_num) + timeout_usec += 10; + else if (timeout_usec > 1) + timeout_usec--; + last_timeout_num = num; + } }
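Rusty's feedback rule restated as a small shell function, so the arithmetic is easy to follow outside the patch context. The starting value of 500 usec is an arbitrary assumption for illustration, not from the patch: trim the timeout by 1 each batch while batches hold steady or grow, and back off by 10 as soon as a batch comes in smaller than the last.

```shell
# Adaptive tx timeout, per the lguest patch above: cut timeout_usec a
# little each batch until a batch is *smaller* than the previous one,
# then bump it back up. Globals stand in for the patch's statics.
timeout_usec=500    # arbitrary starting value, for illustration only
last_num=0

adjust_timeout() {  # $1 = packets seen in this batch
    num="$1"
    if [ "$num" -lt "$last_num" ]; then
        timeout_usec=$((timeout_usec + 10))   # got fewer: back off
    elif [ "$timeout_usec" -gt 1 ]; then
        timeout_usec=$((timeout_usec - 1))    # still growing: trim
    fi
    last_num="$num"
}

adjust_timeout 20   # first batch: 20 is not < 0, so trim to 499
adjust_timeout 8    # smaller batch: back off to 509
echo "$timeout_usec"
```

As in the patch, the floor of 1 usec keeps the trim branch from driving the timeout to zero.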
Re: kexec/kdump of a kvm guest?
On Thu, Jul 24, 2008 at 03:03:33PM -0400, Mike Snitzer wrote: On Thu, Jul 24, 2008 at 9:15 AM, Vivek Goyal [EMAIL PROTECTED] wrote: On Thu, Jul 24, 2008 at 07:49:59AM -0400, Mike Snitzer wrote: On Thu, Jul 24, 2008 at 4:39 AM, Alexander Graf [EMAIL PROTECTED] wrote: As you're stating that the host kernel breaks with kvm modules loaded, maybe someone there could give a hint. OK, I can try using a newer kernel on the host too (e.g. 2.6.25.x) to see how kexec/kdump of the host fares when kvm modules are loaded. On the guest side of things, as I mentioned in my original post, kexec/kdump wouldn't work within a 2.6.22.19 guest with the host running 2.6.25.4 (with kvm-70). Hi Mike, I have never tried kexec/kdump inside a kvm guest, so I don't know if historically they have been working or not. Avi indicated he seems to remember that at least kexec worked the last time he tried (he didn't say when or what he tried, though). Having said that, why do we need kdump to work inside the guest? In this case qemu should know about the guest kernel's memory and should be able to capture a kernel crash dump. I am not sure if qemu already does that; if not, then probably we should think about it. To me, kdump is a good solution for bare metal but not for a virtualized environment, where we already have another piece of software running which can do the job for us. We will end up wasting memory in every guest instance (memory reserved for the kdump kernel in every guest). I haven't looked into what mechanics qemu provides for collecting the entire guest memory image; I'll dig deeper at some point.
It seems the libvirt mid-layer (virsh dump - dump the core of a domain to a file for analysis) doesn't support saving a kvm guest core: # virsh dump guest10 guest10.dump libvir: error : this function is not supported by the hypervisor: virDomainCoreDump error: Failed to core dump domain guest10 to guest10.dump Seems that libvirt functionality isn't available yet with kvm (I'm using libvirt 0.4.2; I'll give libvirt 0.4.4 a try). cc'ing the libvirt-list to get their insight. That aside, having the crash dump collection be multi-phased really isn't workable (that is, if it requires a crashed guest to be manually saved after the fact). The host system _could_ be rebooted, thereby losing the guest's core image. So automating qemu and/or libvirtd to trigger a dump would seem worthwhile (maybe it's already done?). That's a good point. Ideally, one would like the dump to be captured automatically if the kernel crashes, and then to reboot back to the production kernel. I am not sure what we can do to let qemu know after a crash so that it can automatically save a dump. What happens in the case of xen guests? Is the dump automatically captured, or does one have to force the dump capture externally? So while I agree with you that it's ideal not to have to waste memory in each guest for the purposes of kdump, if users want to model a guest image as closely as possible to what will be deployed on bare metal, it really would be ideal to support a 1:1 functional equivalent with kvm. Agreed. Making kdump work inside a kvm guest does no harm. I work with people who refuse to use kvm because of the lack of kexec/kdump support. Interesting. I can do further research but welcome others' insight: do others have advice on how best to collect a crashed kvm guest's core? It will be interesting to look at your results with 2.6.25.x kernels with the kvm module inserted. Currently I can't think what can possibly be wrong.
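As a sketch of the automation being discussed: assuming a libvirt in which virDomainCoreDump works for KVM domains and in which virsh domstate can report a "crashed" state (neither held for the libvirt 0.4.2 above; both are assumptions here), a small watcher function could trigger the dump automatically:

```shell
# Hypothetical watcher: dump a guest's core when libvirt reports it as
# crashed. Assumes 'virsh dump' works for KVM domains and that 'virsh
# domstate' reports "crashed" -- assumptions, not verified behavior.
dump_if_crashed() {
    domain="$1"
    destdir="$2"
    state="$(virsh domstate "$domain")"
    if [ "$state" = "crashed" ]; then
        # Timestamped core file so repeated crashes don't overwrite it.
        virsh dump "$domain" "$destdir/$domain-$(date +%s).core"
        echo "dumped $domain"
    else
        echo "$domain is $state; nothing to do"
    fi
}
```

Run periodically (cron, or a loop with sleep) this gives the "captured automatically" behavior Vivek describes, without reserving kdump memory inside the guest.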
If the host's 2.6.25.4 kernel has both the kvm and kvm-intel modules loaded, kexec/kdump does _not_ work (it simply hangs the system). If I only have the kvm module loaded, kexec/kdump works as expected (likewise if no kvm modules are loaded at all). So it would appear that kvm-intel and kexec are definitely mutually exclusive at the moment (at least on both 2.6.22.x and 2.6.25.x). Ok. So the first task is to fix host kexec/kdump with the kvm-intel module inserted. Can you do a little debugging to find out where the system hangs? I generally try a few things for debugging kexec-related issues. 1. Specify the earlyprintk= parameter for the second kernel and see if control is reaching the second kernel. 2. Otherwise specify the --console-serial parameter on the kexec -l command line and it should display a message, I'm in purgatory, on the serial console. This will just mean that control has reached at least till purgatory. 3. If that also does not work, then most likely the first kernel itself got stuck somewhere, and we need to put some printks in the first kernel to find out what's wrong. Thanks Vivek
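Vivek's first two steps map to concrete commands roughly as follows. The kernel/initrd paths and serial-port settings (ttyS0, 115200) are placeholders for a typical setup, not from the thread, and the commands are printed with echo rather than executed, so the sketch is safe to run; drop the echos (and run as root) to use them for real.

```shell
#!/bin/sh
# Debugging a hung kexec, per the steps above. The echos make this a
# dry run: each line prints the command that would be issued.
KERNEL=/boot/vmlinuz-kdump        # placeholder path
INITRD=/boot/initrd-kdump.img     # placeholder path

# Step 1: earlyprintk= in the second kernel's command line shows on the
# serial console whether control reaches the second kernel at all.
echo kexec -l "$KERNEL" --initrd="$INITRD" \
     --append="earlyprintk=serial,ttyS0,115200 console=ttyS0,115200"

# Step 2: --console-serial makes purgatory announce itself on the
# serial console before the second kernel starts.
echo kexec -l "$KERNEL" --initrd="$INITRD" --console-serial \
     --append="console=ttyS0,115200"
```

If step 1 prints nothing on the serial console but step 2 does, control reached purgatory but not the second kernel; if neither prints, the hang is in the first kernel, which is Vivek's step 3.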
Re: Live Migration, DRBD
Kent Borg kentborg at borg.org writes: I am very happy to discover that KVM does live migration. Now I am figuring out whether it will work for me. What I have in mind is to use DRBD for the file system image. The problem is that during the migration I want to shift the file system access at the moment when the VM has quit running on the host it is leaving but before it starts running on the host where it is arriving. Is there a hook to let me do stuff at this point? This is what I want to do: On the departing machine... - VM has stopped here - umount the volume with the VM file system image - mark volume in DRBD as secondary On the arriving machine... - mark volume in DRBD as primary - mount the volume with the VM file system image - VM can now start here Yes, there is a way, but first, your setup is a little strange. Why do you take a device (the DRBD) and then put a file system on it which just contains a file with the system image? Why not use the DRBD device directly as your system disk? e.g. qemu-system-x86_64 -hda /dev/drbdX This way you do not get an extra layer of filesystem slowing things down and taking up space, and the whole of the DRBD device is directly accessible to the guest. Most importantly, it saves the mount/umount steps in your above procedures. When using DRBD devices directly, live migration simply requires that the device is accessible on both nodes at the same time. In other words, live migration assumes a shared device, which you have. The only problem is that it needs to be opened read/write on both nodes at the same time, which means you need to go Primary/Primary. Recent DRBD versions support Primary/Primary; you just need to add net { allow-two-primaries; } to the resource section in drbd.conf. With that done, you can go to the target node, make the device primary there too, start up qemu to accept the incoming migration, and migrate from the source node. Afterwards it is advisable to set the source node to secondary.
This procedure is safe, as apparently qemu won't start accessing the target device until the source has finished with it and flushed. I have tested the procedure and it worked very well. Hope that helps, Jim P.S. I'm not subscribed to this list so please email me directly if you need to.
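For reference, the allow-two-primaries setting Jim describes sits in the net section of the resource in drbd.conf; a minimal fragment might look like the following, where the resource name, node names, and device are placeholders for whatever the real configuration uses:

```
resource vm0 {                    # placeholder resource name
    net {
        allow-two-primaries;      # both nodes Primary during migration
    }
    on nodeA {
        device /dev/drbd0;        # passed to qemu as -hda /dev/drbd0
        # disk/address lines as in the existing configuration
    }
    on nodeB {
        device /dev/drbd0;
        # disk/address lines as in the existing configuration
    }
}
```

With this in place, `drbdadm primary vm0` on the target node before starting the incoming qemu, and `drbdadm secondary vm0` on the source afterwards, matches the procedure above.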