Re: kexec/kdump of a kvm guest?

2008-07-24 Thread Alexander Graf


On Jul 24, 2008, at 2:13 AM, Mike Snitzer wrote:


On Sat, Jul 5, 2008 at 7:20 AM, Avi Kivity [EMAIL PROTECTED] wrote:

Mike Snitzer wrote:


My host is x86_64 RHEL5U1 running 2.6.25.4 with kvm-70 (kvm-intel).

When I configure kdump in the guest (running 2.6.22.19) and force a
crash (with 'echo c > /proc/sysrq-trigger') kexec boots the kdump
kernel but then the kernel hangs (before it gets to /sbin/init et al).

On the host, the associated qemu is consuming 100% cpu.

I really need to be able to collect vmcores from my kvm guests.  So
far I can't (on raw hardware all works fine).




I've tested this a while ago and it worked (though I tested regular
kexecs, not crashes); this may be a regression.

Please run kvm_stat to see what's happening at the time of the crash.


OK, I can look into kvm_stat but I just discovered that just having
kvm-intel and kvm loaded into my 2.6.22.19 kernel actually prevents


Is 2.6.22.19 your host or your guest kernel? It's very unlikely that  
you loaded kvm modules in the guest.



the host from being able to kexec/kdump too!?  I didn't have any
guests running (only the kvm modules were loaded).  As soon as I
unloaded the kvm modules kdump worked as expected.

Something about kvm is completely breaking kexec/kdump on both the
host and guest kernels.


I guess the kexec people would be pretty interested in this as well,  
so I'll just CC them for now.
As you're stating that the host kernel breaks with kvm modules loaded,  
maybe someone there could give a hint.


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: scsi broken > 4GB RAM

2008-07-24 Thread Martin Maurer
Hi,

I tried windows server 2008 (64 bit) on Proxmox VE 0.9beta2 (KVM 71, see
http://pve.proxmox.com):

Some details:
--memory 6144 --cdrom en_windows_server_2008_datacenter_enterprise_standard_x64_dvd_X14-26714.iso --name win2008-6gb-scsi --smp 1 --bootdisk scsi0 --scsi0 80

The installer shows an 80 GB harddisk but freezes for a minute after clicking
next, then:

Windows could not create a partition on disk 0. The error occurred while
preparing the computer's system volume. Error code: 0x8004245F.

I also got installer problems if I just used scsi as the boot disk (no high
memory) on several windows versions, including win2003 and xp. So I decided to
use IDE, which works without any issue on windows.

But: I reduced the memory to 2048 and the installer continues to work!

Best Regards,

Martin Maurer

[EMAIL PROTECTED]
http://www.proxmox.com


Proxmox Server Solutions GmbH
Kohlgasse 51/10, 1050 Vienna, Austria
Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22
Commercial register no.: FN 258879 f
Registration office: Handelsgericht Wien


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
 Behalf Of Henrik Holst
 Sent: Mittwoch, 23. Juli 2008 23:09
 To: kvm@vger.kernel.org
 Subject: scsi broken > 4GB RAM
 
 I do not know if this is a bug in qemu or the linux kernel sym53c8xx
 module (I haven't had the opportunity to test with anything other than
 Linux at the moment) but if one starts a qemu instance with -m 4096
 and larger the scsi emulated disk fails in the Linux guest.
 
 If booting any install cd the /dev/sda is seen as only 512B in size
 and if booting an ubuntu 8.04-amd64 with the secondary drive as scsi
 it is seen with the correct size but one cannot read nor write the
 partition table.
 
 Is there anyone out there that could test say a Windows image on scsi
 with 4GB or more of RAM and see if it works or not? If so it could be
 the linux driver that is faulty.
 
 /Henrik Holst




RE: scsi broken > 4GB RAM

2008-07-24 Thread Martin Maurer
Sorry, just returned to the installer - it also stopped with the same error
code, using just 2 GB RAM.

Best Regards,

Martin Maurer

[EMAIL PROTECTED]
http://www.proxmox.com


Proxmox Server Solutions GmbH
Kohlgasse 51/10, 1050 Vienna, Austria
Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22
Commercial register no.: FN 258879 f
Registration office: Handelsgericht Wien


 -Original Message-
 From: Martin Maurer
 Sent: Donnerstag, 24. Juli 2008 11:44
 To: kvm@vger.kernel.org
 Subject: RE: scsi broken > 4GB RAM
 
 Hi,
 
 I tried windows server 2008 (64 bit) on Proxmox VE 0.9beta2 (KVM 71),
 see http://pve.proxmox.com):
 
 Some details:
 --memory 6144 --cdrom
 en_windows_server_2008_datacenter_enterprise_standard_x64_dvd_X14-
 26714.iso --name win2008-6gb-scsi --smp 1 --bootdisk scsi0 --scsi0 80
 
 The installer shows 80 GB harddisk but freezes after clicking next for
 a minute then:
 
 Windows could not create a partition on disk 0. The error occurred
 while preparing the computer's system volume. Error code: 0x8004245F.
 
 I also got installer problems if I just use scsi as boot disk (no high
 memory) on several windows versions, including win2003 and xp. So I
 decided to use IDE, works without any issue on windows.
 
 But: I reduced the memory to 2048 and the installer continues to work!
 
 Best Regards,
 
 Martin Maurer
 
 [EMAIL PROTECTED]
 http://www.proxmox.com
 
 
 Proxmox Server Solutions GmbH
 Kohlgasse 51/10, 1050 Vienna, Austria
 Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22
 Commercial register no.: FN 258879 f
 Registration office: Handelsgericht Wien
 
 
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
  Behalf Of Henrik Holst
  Sent: Mittwoch, 23. Juli 2008 23:09
  To: kvm@vger.kernel.org
  Subject: scsi broken > 4GB RAM
 
  I do not know if this is a bug in qemu or the linux kernel sym53c8xx
  module (I haven't had the opportunity to test with anything other
 than
  Linux at the moment) but if one starts a qemu instance with -m 4096
  and larger the scsi emulated disk fails in the Linux guest.
 
  If booting any install cd the /dev/sda is seen as only 512B in size
  and if booting an ubuntu 8.04-amd64 with the secondary drive as scsi
  it is seen with the correct size but one cannot read nor write the
  partition table.
 
  Is there anyone out there that could test say a Windows image on scsi
  with 4GB or more of RAM and see if it works or not? If so it could be
  the linux driver that is faulty.
 
  /Henrik Holst




Re: [PATCH 4/8] KVM: PCIPT: fix interrupt handling

2008-07-24 Thread Ben-Ami Yassour
On Wed, 2008-07-23 at 19:07 +0530, Amit Shah wrote:
 * On Wednesday 16 Jul 2008 18:47:01 Ben-Ami Yassour wrote:
  This patch fixes a few problems with the interrupt handling for
  passthrough devices.
 
  1. Pass the interrupt handler the pointer to the device, so we do not
  need to lock the pcipt lock in the interrupt handler.
 
  2. Remove the pt_irq_handled bitmap - it is no longer needed.
 
  3. Split kvm_pci_pt_work_fn into two functions, one for interrupt
  injection and another for the ack - the code is much simpler this way.
 
  4. Change the passthrough initialization order - add the device
  structure to the list, before registering the interrupt handler.
 
  5. On passthrough destruction path, free the interrupt handler before
  cleaning queued work.
 
  Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
  ---
 
   if (irqchip_in_kernel(kvm)) {
   +   match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
   +   match->pt_dev.host.irq = dev->irq;
   +   if (kvm->arch.vioapic)
   +   kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq;
   +   if (kvm->arch.vpic)
   +   kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq;
   +
 
 We shouldn't register this notifier unless we get the irq below to avoid 
 unneeded function calls and checks.

Note: This code was changed in the last version of the code but the
comment is still relevant.

Do you mean that we need to postpone registering the notification?
In the case of an assigned device this means that we postpone it for a
few seconds, and implementing it like above is cleaner. So I don't see
the real value in postponing it.

Thanks,
Ben




[PATCH 9/9] kvm: qemu: Eliminate extra virtio_net copy

2008-07-24 Thread Mark McLoughlin
This is Anthony's net-tap-zero-copy.patch which eliminates
a copy on the host->guest data path with virtio_net.
---
 qemu/hw/virtio-net.c |   76 -
 qemu/net.h   |3 ++
 qemu/vl.c|   50 +
 3 files changed, 109 insertions(+), 20 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index a681a7e..5e71afe 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -70,6 +70,8 @@ typedef struct VirtIONet
 VLANClientState *vc;
 QEMUTimer *tx_timer;
 int tx_timer_active;
+int last_elem_valid;
+VirtQueueElement last_elem;
 } VirtIONet;
 
 /* TODO
@@ -153,47 +155,80 @@ static int virtio_net_can_receive(void *opaque)
 return 1;
 }
 
-static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
+static void virtio_net_receive_zc(void *opaque, IOZeroCopyHandler *zc, void *data)
 {
     VirtIONet *n = opaque;
-    VirtQueueElement elem;
+    VirtQueueElement *elem = &n->last_elem;
     struct virtio_net_hdr *hdr;
-    int offset, i;
-    int total;
+    ssize_t err;
+    int idx;
 
-    if (virtqueue_pop(n->rx_vq, &elem) == 0)
+    if (!n->last_elem_valid && virtqueue_pop(n->rx_vq, elem) == 0)
 	return;
 
-    if (elem.in_num < 1 || elem.in_sg[0].iov_len != sizeof(*hdr)) {
+    if (elem->in_num < 1 || elem->in_sg[0].iov_len != sizeof(*hdr)) {
 	fprintf(stderr, "virtio-net header not in first element\n");
 	exit(1);
     }
 
-    hdr = (void *)elem.in_sg[0].iov_base;
+    n->last_elem_valid = 1;
+
+    hdr = (void *)elem->in_sg[0].iov_base;
     hdr->flags = 0;
     hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
 
-    offset = 0;
-    total = sizeof(*hdr);
+    idx = tap_has_offload(n->vc->vlan->first_client) ? 0 : 1;
+
+    do {
+        err = zc(data, &elem->in_sg[idx], elem->in_num - idx);
+    } while (err == -1 && errno == EINTR);
+
+    if (err == -1 && errno == EAGAIN)
+        return;
 
-    if (tap_has_offload(n->vc->vlan->first_client)) {
-	memcpy(hdr, buf, sizeof(*hdr));
-	offset += total;
+    if (err < 0) {
+        fprintf(stderr, "virtio_net: error during IO\n");
+        return;
     }
 
+    /* signal other side */
+    n->last_elem_valid = 0;
+    virtqueue_push(n->rx_vq, elem, sizeof(*hdr) + err);
+    virtio_notify(&n->vdev, n->rx_vq);
+}
+
+struct compat_data
+{
+    const uint8_t *buf;
+    int size;
+};
+
+static ssize_t compat_copy(void *opaque, struct iovec *iov, int iovcnt)
+{
+    struct compat_data *compat = opaque;
+    int offset, i;
+
     /* copy in packet.  ugh */
-    i = 1;
-    while (offset < size && i < elem.in_num) {
-	int len = MIN(elem.in_sg[i].iov_len, size - offset);
-	memcpy(elem.in_sg[i].iov_base, buf + offset, len);
+    offset = 0;
+    i = 0;
+    while (offset < compat->size && i < iovcnt) {
+	int len = MIN(iov[i].iov_len, compat->size - offset);
+	memcpy(iov[i].iov_base, compat->buf + offset, len);
 	offset += len;
-	total += len;
 	i++;
     }
 
-    /* signal other side */
-    virtqueue_push(n->rx_vq, &elem, total);
-    virtio_notify(&n->vdev, n->rx_vq);
+    return offset;
+}
+
+static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
+{
+    struct compat_data compat;
+
+    compat.buf = buf;
+    compat.size = size;
+
+    virtio_net_receive_zc(opaque, compat_copy, &compat);
 }
 
 /* TX */
@@ -310,6 +345,7 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
     memcpy(n->mac, nd->macaddr, 6);
     n->vc = qemu_new_vlan_client(nd->vlan, virtio_net_receive,
                                  virtio_net_can_receive, n);
+    n->vc->fd_read_zc = virtio_net_receive_zc;
 
     n->tx_timer = qemu_new_timer(vm_clock, virtio_net_tx_timer, n);
     n->tx_timer_active = 0;
diff --git a/qemu/net.h b/qemu/net.h
index 6cfd8ce..aca50e9 100644
--- a/qemu/net.h
+++ b/qemu/net.h
@@ -6,6 +6,8 @@
 /* VLANs support */
 
 typedef ssize_t (IOReadvHandler)(void *, const struct iovec *, int);
+typedef ssize_t (IOZeroCopyHandler)(void *, struct iovec *, int);
+typedef void (IOReadZCHandler)(void *, IOZeroCopyHandler *, void *);
 
 typedef struct VLANClientState VLANClientState;
 
@@ -14,6 +16,7 @@ typedef void (SetOffload)(VLANClientState *, int, int, int, 
int);
 struct VLANClientState {
 IOReadHandler *fd_read;
 IOReadvHandler *fd_readv;
+IOReadZCHandler *fd_read_zc;
 /* Packets may still be sent if this returns zero.  It's used to
rate-limit the slirp code.  */
 IOCanRWHandler *fd_can_read;
diff --git a/qemu/vl.c b/qemu/vl.c
index de92848..bc5b151 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4204,6 +4204,7 @@ typedef struct TAPState {
 char buf[TAP_BUFSIZE];
 int size;
 int offload;
+int received_eagain;
 } TAPState;
 
 static void tap_receive(void *opaque, const uint8_t *buf, int size)
@@ -4232,6 +4233,48 @@ static ssize_t tap_readv(void *opaque, const struct 
iovec *iov,
 return len;
 }
 
+static VLANClientState *tap_can_zero_copy(TAPState 

[PATCH 0/9][RFC] KVM virtio_net performance

2008-07-24 Thread Mark McLoughlin

Hey,
  Here's a bunch of patches attempting to improve the performance
of virtio_net. This is more an RFC than a patch submission
since, as can be seen below, not all patches actually improve the
performance measurably.

  I've tried hard to test each of these patches with as stable and
informative a benchmark as I could find. The first benchmark is a
netperf[1] based throughput benchmark and the second uses a flood
ping[2] to measure latency differences.

  Each set of figures is min/average/max/standard deviation. The
first set is Gb/s and the second is milliseconds.

  The network configuration used was very simple - the guest with
a virtio_net interface and the host with a tap interface and static
IP addresses assigned to both - e.g. there was no bridge in the host
involved and iptables was disabled in both the host and guest.

  I used:

  1) kvm-71-26-g6152996 with the patches that follow

  2) Linus's v2.6.26-5752-g93ded9b with Rusty's virtio patches from
 219:bbd2611289c5 applied; these are the patches that have just been
 submitted to Linus

  The conclusions I draw are:

  1) The length of the tx mitigation timer makes quite a difference to
 throughput achieved; we probably need a good heuristic for
 adjusting this on the fly.

  2) Using the recently merged GSO support in the tun/tap driver gives
 a huge boost, but much more so on the host->guest side.

  3) Adjusting the virtio_net ring sizes makes a small difference, but
 not as much as one might expect.

  4) Dropping the global mutex while reading GSO packets from the tap
 interface gives a nice speedup. This highlights the global mutex
 as a general performance issue.

  5) Eliminating an extra copy on the host->guest path only makes a
 barely measurable difference.

Anyway, the figures:

  netperf, 10x20s runs (Gb/s)  |        guest->host         |        host->guest
  -----------------------------+----------------------------+---------------------------
  baseline                     | 1.520/ 1.573/ 1.610/ 0.034 | 1.160/ 1.357/ 1.630/ 0.165
  50us tx timer + rearm        | 1.050/ 1.086/ 1.110/ 0.017 | 1.710/ 1.832/ 1.960/ 0.092
  250us tx timer + rearm       | 1.700/ 1.764/ 1.880/ 0.064 | 0.900/ 1.203/ 1.580/ 0.205
  150us tx timer + rearm       | 1.520/ 1.602/ 1.690/ 0.044 | 1.670/ 1.928/ 2.150/ 0.141
  no ring-full heuristic       | 1.480/ 1.569/ 1.710/ 0.066 | 1.610/ 1.857/ 2.140/ 0.153
  VIRTIO_F_NOTIFY_ON_EMPTY     | 1.470/ 1.554/ 1.650/ 0.054 | 1.770/ 1.960/ 2.170/ 0.119
  recv NO_NOTIFY               | 1.530/ 1.604/ 1.680/ 0.047 | 1.780/ 1.944/ 2.190/ 0.129
  GSO                          | 4.120/ 4.323/ 4.420/ 0.099 | 6.540/ 7.033/ 7.340/ 0.244
  ring size == 256             | 4.050/ 4.406/ 4.560/ 0.143 | 6.280/ 7.236/ 8.280/ 0.613
  ring size == 512             | 4.420/ 4.600/ 4.960/ 0.140 | 6.470/ 7.205/ 7.510/ 0.314
  drop mutex during tapfd read | 4.320/ 4.578/ 4.790/ 0.161 | 8.370/ 8.589/ 8.730/ 0.120
  aligouri zero-copy           | 4.510/ 4.694/ 4.960/ 0.148 | 8.430/ 8.614/ 8.840/ 0.142

  ping -f -c 10 (ms)           |        guest->host         |        host->guest
  -----------------------------+----------------------------+---------------------------
  baseline                     | 0.060/ 0.459/ 7.602/ 0.846 | 0.067/ 0.331/ 2.517/ 0.057
  50us tx timer + rearm        | 0.081/ 0.143/ 7.436/ 0.374 | 0.093/ 0.133/ 1.883/ 0.026
  250us tx timer + rearm       | 0.302/ 0.463/ 7.580/ 0.849 | 0.297/ 0.344/ 2.128/ 0.028
  150us tx timer + rearm       | 0.197/ 0.323/ 7.671/ 0.740 | 0.199/ 0.245/ 7.836/ 0.037
  no ring-full heuristic       | 0.182/ 0.324/ 7.688/ 0.753 | 0.199/ 0.243/ 2.197/ 0.030
  VIRTIO_F_NOTIFY_ON_EMPTY     | 0.197/ 0.321/ 7.447/ 0.730 | 0.196/ 0.242/ 2.218/ 0.032
  recv NO_NOTIFY               | 0.186/ 0.321/ 7.520/ 0.732 | 0.200/ 0.233/ 2.216/ 0.028
  GSO                          | 0.178/ 0.324/ 7.667/ 0.736 | 0.147/ 0.246/ 1.361/ 0.024
  ring size == 256             | 0.184/ 0.323/ 7.674/ 0.728 | 0.199/ 0.243/ 2.181/ 0.028
  ring size == 512             | (not measured)             | (not measured)
  drop mutex during tapfd read | 0.183/ 0.323/ 7.820/ 0.733 | 0.202/ 0.242/ 2.219/ 0.027
  aligouri zero-copy           | 0.185/ 0.325/ 7.863/ 0.736 | 0.202/ 0.245/ 7.844/ 0.036

Cheers,
Mark.

[1] - I used netperf trunk from:

  http://www.netperf.org/svn/netperf2/trunk

  and simply ran:

  $ i=0; while [ $i -lt 10 ]; do ./netperf -H host -f g -l 20 -P 0 | netperf-collect.py; i=$((i+1)); done

  where netperf-collect.py is just a script to calculate the
  average across the runs:

  http://markmc.fedorapeople.org/netperf-collect.py

[2] - ping -c 10 -f host
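
The contents of netperf-collect.py aren't reproduced in this mail; a minimal
stand-in that computes the min/average/max/standard-deviation figures used in
the tables above could look like this (hypothetical reimplementation, not the
actual script):

```python
import math

def summarize(samples):
    """Return (min, mean, max, stddev) for a list of throughput samples."""
    n = len(samples)
    mean = sum(samples) / n
    # Population standard deviation; the real script's choice is unknown.
    var = sum((x - mean) ** 2 for x in samples) / n
    return min(samples), mean, max(samples), math.sqrt(var)

# Print in the "min/ avg/ max/ sd" layout used by the tables above.
lo, avg, hi, sd = summarize([1.1, 1.2, 1.4])
print("%.3f/ %.3f/ %.3f/ %.3f" % (lo, avg, hi, sd))
```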


[PATCH 6/9] kvm: qemu: Add support for partial csums and GSO

2008-07-24 Thread Mark McLoughlin
Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |   86 +-
 qemu/net.h   |5 +++
 qemu/vl.c|   73 +++---
 3 files changed, 144 insertions(+), 20 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 419a2d7..81282c4 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -22,9 +22,18 @@
 #define VIRTIO_ID_NET  1
 
 /* The feature bitmap for virtio net */
-#define VIRTIO_NET_F_NO_CSUM   0
-#define VIRTIO_NET_F_MAC   5
-#define VIRTIO_NET_F_GS0   6
+#define VIRTIO_NET_F_CSUM  0   /* Host handles pkts w/ partial csum */
+#define VIRTIO_NET_F_GUEST_CSUM 1  /* Guest handles pkts w/ partial csum */
+#define VIRTIO_NET_F_MAC   5   /* Host has given MAC address. */
+#define VIRTIO_NET_F_GSO   6   /* Host handles pkts w/ any GSO type */
+#define VIRTIO_NET_F_GUEST_TSO47   /* Guest can handle TSOv4 in. */
+#define VIRTIO_NET_F_GUEST_TSO68   /* Guest can handle TSOv6 in. */
+#define VIRTIO_NET_F_GUEST_ECN 9   /* Guest can handle TSO[6] w/ ECN in. */
+#define VIRTIO_NET_F_GUEST_UFO 10  /* Guest can handle UFO in. */
+#define VIRTIO_NET_F_HOST_TSO4 11  /* Host can handle TSOv4 in. */
+#define VIRTIO_NET_F_HOST_TSO6 12  /* Host can handle TSOv6 in. */
+#define VIRTIO_NET_F_HOST_ECN  13  /* Host can handle TSO[6] w/ ECN in. */
+#define VIRTIO_NET_F_HOST_UFO  14  /* Host can handle UFO in. */
 
 #define TX_TIMER_INTERVAL (150000) /* 150 us */
 
@@ -42,8 +51,6 @@ struct virtio_net_hdr
 uint8_t flags;
 #define VIRTIO_NET_HDR_GSO_NONE0   // Not a GSO frame
 #define VIRTIO_NET_HDR_GSO_TCPV4   1   // GSO frame, IPv4 TCP (TSO)
-/* FIXME: Do we need this?  If they said they can handle ECN, do they care? */
-#define VIRTIO_NET_HDR_GSO_TCPV4_ECN   2   // GSO frame, IPv4 TCP w/ ECN
 #define VIRTIO_NET_HDR_GSO_UDP 3   // GSO frame, IPv4 UDP (UFO)
 #define VIRTIO_NET_HDR_GSO_TCPV6   4   // GSO frame, IPv6 TCP
 #define VIRTIO_NET_HDR_GSO_ECN 0x80// TCP has ECN set
@@ -85,7 +92,38 @@ static void virtio_net_update_config(VirtIODevice *vdev, uint8_t *config)
 
 static uint32_t virtio_net_get_features(VirtIODevice *vdev)
 {
-    return (1 << VIRTIO_NET_F_MAC);
+    VirtIONet *n = to_virtio_net(vdev);
+    VLANClientState *host = n->vc->vlan->first_client;
+    uint32_t features = (1 << VIRTIO_NET_F_MAC);
+
+    if (tap_has_offload(host)) {
+	features |= (1 << VIRTIO_NET_F_CSUM);
+	features |= (1 << VIRTIO_NET_F_GUEST_CSUM);
+	features |= (1 << VIRTIO_NET_F_GUEST_TSO4);
+	features |= (1 << VIRTIO_NET_F_GUEST_TSO6);
+	features |= (1 << VIRTIO_NET_F_GUEST_ECN);
+	features |= (1 << VIRTIO_NET_F_HOST_TSO4);
+	features |= (1 << VIRTIO_NET_F_HOST_TSO6);
+	features |= (1 << VIRTIO_NET_F_HOST_ECN);
+	/* Kernel can't actually handle UFO in software currently. */
+    }
+
+    return features;
+}
+
+static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
+{
+    VirtIONet *n = to_virtio_net(vdev);
+    VLANClientState *host = n->vc->vlan->first_client;
+
+    if (!tap_has_offload(host) || !host->set_offload)
+	return;
+
+    host->set_offload(host,
+		      (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
+		      (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
+		      (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
+		      (features >> VIRTIO_NET_F_GUEST_ECN)  & 1);
 }
 
 /* RX */
@@ -121,6 +159,7 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
     VirtQueueElement elem;
     struct virtio_net_hdr *hdr;
     int offset, i;
+    int total;
 
     if (virtqueue_pop(n->rx_vq, &elem) == 0)
 	return;
@@ -134,18 +173,26 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
     hdr->flags = 0;
     hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
 
-    /* copy in packet.  ugh */
     offset = 0;
+    total = sizeof(*hdr);
+
+    if (tap_has_offload(n->vc->vlan->first_client)) {
+	memcpy(hdr, buf, sizeof(*hdr));
+	offset += total;
+    }
+
+    /* copy in packet.  ugh */
     i = 1;
     while (offset < size && i < elem.in_num) {
 	int len = MIN(elem.in_sg[i].iov_len, size - offset);
 	memcpy(elem.in_sg[i].iov_base, buf + offset, len);
 	offset += len;
+	total += len;
 	i++;
     }
 
     /* signal other side */
-    virtqueue_push(n->rx_vq, &elem, sizeof(*hdr) + offset);
+    virtqueue_push(n->rx_vq, &elem, total);
     virtio_notify(&n->vdev, n->rx_vq);
 }
 
@@ -153,23 +200,31 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
 static void virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
 {
     VirtQueueElement elem;
+    int has_offload = tap_has_offload(n->vc->vlan->first_client);
 
     if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
 

[PATCH 8/9] kvm: qemu: Drop the mutex while reading from tapfd

2008-07-24 Thread Mark McLoughlin
The idea here is that with GSO, packets are much larger
and we can allow the vcpu threads to e.g. process irq
acks during the window where we're reading these
packets from the tapfd.
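
The kvm_sleep_begin()/kvm_sleep_end() calls in the patch below bracket the
blocking read so the global mutex is not held across it. A rough model of that
idiom in Python (the names and the threading.Lock stand-in are illustrative
only, not qemu's API):

```python
import threading

qemu_global_mutex = threading.Lock()  # stand-in for qemu's global mutex

def read_with_mutex_dropped(read_fn):
    """Drop the global lock around a potentially blocking read so other
    threads (e.g. vcpu threads acking irqs) can make progress meanwhile."""
    qemu_global_mutex.release()      # kvm_sleep_begin()
    try:
        return read_fn()
    finally:
        qemu_global_mutex.acquire()  # kvm_sleep_end()

# Caller holds the lock, as qemu's main loop does around tap_send().
qemu_global_mutex.acquire()
pkt = read_with_mutex_dropped(lambda: b"a-gso-sized-frame")
qemu_global_mutex.release()
```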

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/vl.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/qemu/vl.c b/qemu/vl.c
index efdaafd..de92848 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4281,7 +4281,9 @@ static void tap_send(void *opaque)
 	sbuf.buf = s->buf;
 	s->size = getmsg(s->fd, NULL, &sbuf, &f) >= 0 ? sbuf.len : -1;
 #else
+	kvm_sleep_begin();
 	s->size = read(s->fd, s->buf, sizeof(s->buf));
+	kvm_sleep_end();
 #endif
 
 	if (s->size == -1 && errno == EINTR)
-- 
1.5.4.1



Re: kexec/kdump of a kvm guest?

2008-07-24 Thread Mike Snitzer
On Thu, Jul 24, 2008 at 4:39 AM, Alexander Graf [EMAIL PROTECTED] wrote:

 On Jul 24, 2008, at 2:13 AM, Mike Snitzer wrote:

 On Sat, Jul 5, 2008 at 7:20 AM, Avi Kivity [EMAIL PROTECTED] wrote:

 Mike Snitzer wrote:

 My host is x86_64 RHEL5U1 running 2.6.25.4 with kvm-70 (kvm-intel).

 When I configure kdump in the guest (running 2.6.22.19) and force a
 crash (with 'echo c > /proc/sysrq-trigger') kexec boots the kdump
 kernel but then the kernel hangs (before it gets to /sbin/init et al).
 On the host, the associated qemu is consuming 100% cpu.

 I really need to be able to collect vmcores from my kvm guests.  So
 far I can't (on raw hardware all works fine).



 I've tested this a while ago and it worked (though I tested regular
 kexecs,
 not crashes); this may be a regression.

 Please run kvm_stat to see what's happening at the time of the crash.

 OK, I can look into kvm_stat but I just discovered that just having
 kvm-intel and kvm loaded into my 2.6.22.19 kernel actually prevents

 Is 2.6.22.19 your host or your guest kernel? It's very unlikely that you
 loaded kvm modules in the guest.

Correct, 2.6.22.19 is my host kernel.

 the host from being able to kexec/kdump too!?  I didn't have any
 guests running (only the kvm modules were loaded).  As soon as I
 unloaded the kvm modules kdump worked as expected.

 Something about kvm is completely breaking kexec/kdump on both the
 host and guest kernels.

 I guess the kexec people would be pretty interested in this as well, so I'll
 just CC them for now.
 As you're stating that the host kernel breaks with kvm modules loaded, maybe
 someone there could give a hint.

OK, I can try using a newer kernel on the host too (e.g. 2.6.25.x) to
see how kexec/kdump of the host fares when kvm modules are loaded.

On the guest side of things, as I mentioned in my original post,
kexec/kdump wouldn't work within a 2.6.22.19 guest with the host
running 2.6.25.4 (with kvm-70).

Mike


[PATCH 7/9] kvm: qemu: Increase size of virtio_net rings

2008-07-24 Thread Mark McLoughlin
Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 81282c4..a681a7e 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -305,8 +305,8 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
     n->vdev.update_config = virtio_net_update_config;
     n->vdev.get_features = virtio_net_get_features;
     n->vdev.set_features = virtio_net_set_features;
-    n->rx_vq = virtio_add_queue(&n->vdev, 128, virtio_net_handle_rx);
-    n->tx_vq = virtio_add_queue(&n->vdev, 128, virtio_net_handle_tx);
+    n->rx_vq = virtio_add_queue(&n->vdev, 512, virtio_net_handle_rx);
+    n->tx_vq = virtio_add_queue(&n->vdev, 512, virtio_net_handle_tx);
     memcpy(n->mac, nd->macaddr, 6);
     n->vc = qemu_new_vlan_client(nd->vlan, virtio_net_receive,
                                  virtio_net_can_receive, n);
-- 
1.5.4.1



[PATCH 5/9] kvm: qemu: Disable recv notifications until avail buffers exhausted

2008-07-24 Thread Mark McLoughlin
Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 4adfa42..419a2d7 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -106,9 +106,12 @@ static int virtio_net_can_receive(void *opaque)
 	!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
 	return 0;
 
-    if (n->rx_vq->vring.avail->idx == n->rx_vq->last_avail_idx)
+    if (n->rx_vq->vring.avail->idx == n->rx_vq->last_avail_idx) {
+	n->rx_vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
 	return 0;
+    }
 
+    n->rx_vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
     return 1;
 }
 
-- 
1.5.4.1



[PATCH 2/9] kvm: qemu: Fix virtio_net tx timer

2008-07-24 Thread Mark McLoughlin
The current virtio_net tx timer is 2ns, which doesn't
make any sense. Set it to a more reasonable 150us
instead.
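
Assuming vm_clock counts nanoseconds (which the "150 us" comment in the patch
implies), the before/after constants work out as:

```python
# Old constant: integer division gives a 2 ns timer - far too short
# to batch any tx work at all.
OLD_TX_TIMER_INTERVAL = 1000 // 500
assert OLD_TX_TIMER_INTERVAL == 2

# New constant: 150 us expressed in nanoseconds.
NEW_TX_TIMER_INTERVAL = 150 * 1000
assert NEW_TX_TIMER_INTERVAL == 150000
```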

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 2e57e5a..31867f1 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -26,7 +26,7 @@
 #define VIRTIO_NET_F_MAC   5
 #define VIRTIO_NET_F_GS0   6
 
-#define TX_TIMER_INTERVAL (1000 / 500)
+#define TX_TIMER_INTERVAL (150000) /* 150 us */
 
 /* The config defining mac address (6 bytes) */
 struct virtio_net_config
-- 
1.5.4.1



[PATCH 1/9] kvm: qemu: Set MIN_TIMER_REARM_US to 150us

2008-07-24 Thread Mark McLoughlin
Equivalent to ~300 syscalls on my machine

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/vl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu/vl.c b/qemu/vl.c
index 5d285cc..b7d3397 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -891,7 +891,7 @@ static void qemu_rearm_alarm_timer(struct qemu_alarm_timer 
*t)
 }
 
 /* TODO: MIN_TIMER_REARM_US should be optimized */
-#define MIN_TIMER_REARM_US 250
+#define MIN_TIMER_REARM_US 150
 
 static struct qemu_alarm_timer *alarm_timer;
 
-- 
1.5.4.1



[PATCH 3/9] kvm: qemu: Remove virtio_net tx ring-full heuristic

2008-07-24 Thread Mark McLoughlin
virtio_net tries to guess when it has received a tx
notification from the guest whether it indicates that the
guest has no more room in the tx ring and it should
immediately flush the queued buffers.

The heuristic is based on the fact that there are 128
buffer entries in the ring and each packet uses 2 buffers
(i.e. the virtio_net_hdr and the packet's linear data).

Using GSO or increasing the size of the rings will break
that heuristic, so let's remove it and assume that any
notification from the guest after we've disabled
notifications indicates that we should flush our buffers.
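
Numerically, the heuristic being removed only holds for the default geometry:
128 ring entries at 2 buffers per packet gives 64 in-flight packets. A small
sketch (illustrative, not qemu code) of why other ring sizes break it:

```python
BUFS_PER_PACKET = 2  # virtio_net_hdr descriptor + linear-data descriptor

def ring_full_heuristic(avail_idx, last_avail_idx):
    # The removed qemu check, hard-coded for a 128-entry ring:
    # "full" exactly when 64 packets are queued.
    return (avail_idx - last_avail_idx) == 128 // BUFS_PER_PACKET

# Holds for the default 128-entry ring...
assert ring_full_heuristic(64, 0)
# ...but a full 512-entry ring (patch 7/9) queues 256 packets,
# so the check never fires and the early flush never happens.
assert not ring_full_heuristic(256, 0)
```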

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 31867f1..4adfa42 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -175,8 +175,7 @@ static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
 
-    if (n->tx_timer_active &&
-	(vq->vring.avail->idx - vq->last_avail_idx) == 64) {
+    if (n->tx_timer_active) {
 	vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
 	qemu_del_timer(n->tx_timer);
 	n->tx_timer_active = 0;
-- 
1.5.4.1



[PATCH 4/9] kvm: qemu: Add VIRTIO_F_NOTIFY_ON_EMPTY

2008-07-24 Thread Mark McLoughlin
Set the VIRTIO_F_NOTIFY_ON_EMPTY feature bit so the
guest can rely on us notifying them when the queue
is empty.

Also, only notify when the available queue is empty
*and* when we've finished with all the buffers we
had detached. Right now, when the queue is empty,
we notify the guest for every used buffer.
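
Put another way, the patched virtio_notify() suppresses the interrupt only
when the guest asked for suppression and the queue is not completely idle.
A Python rendering of the new condition (illustrative only):

```python
def should_notify(inuse, avail_idx, last_avail_idx, no_interrupt):
    """Mirror of the patched virtio_notify() test: always notify once every
    available buffer has been consumed and none are still in flight."""
    queue_busy = inuse > 0 or avail_idx != last_avail_idx
    return not (queue_busy and no_interrupt)

# Queue fully drained: notify even though the guest set NO_INTERRUPT.
assert should_notify(inuse=0, avail_idx=5, last_avail_idx=5, no_interrupt=True)
# Buffers still in flight while the guest suppresses: stay quiet.
assert not should_notify(inuse=2, avail_idx=5, last_avail_idx=5, no_interrupt=True)
```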

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio.c |6 +-
 qemu/hw/virtio.h |5 +
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
index 3429ac8..e035e4e 100644
--- a/qemu/hw/virtio.c
+++ b/qemu/hw/virtio.c
@@ -138,6 +138,7 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
     /* Make sure buffer is written before we update index. */
     wmb();
     vq->vring.used->idx++;
+    vq->inuse--;
 }
 
 int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
@@ -187,6 +188,8 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
 
     elem->index = head;
 
+    vq->inuse++;
+
     return elem->in_num + elem->out_num;
 }
 
@@ -275,6 +278,7 @@ static uint32_t virtio_ioport_read(void *opaque, uint32_t addr)
     switch (addr) {
     case VIRTIO_PCI_HOST_FEATURES:
 	ret = vdev->get_features(vdev);
+	ret |= (1 << VIRTIO_F_NOTIFY_ON_EMPTY);
 	break;
     case VIRTIO_PCI_GUEST_FEATURES:
 	ret = vdev->features;
@@ -431,7 +435,7 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     /* Always notify when queue is empty */
-    if (vq->vring.avail->idx != vq->last_avail_idx &&
+    if ((vq->inuse || vq->vring.avail->idx != vq->last_avail_idx) &&
         (vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
         return;
 
diff --git a/qemu/hw/virtio.h b/qemu/hw/virtio.h
index 61f5038..1adaed3 100644
--- a/qemu/hw/virtio.h
+++ b/qemu/hw/virtio.h
@@ -30,6 +30,10 @@
 /* We've given up on this device. */
 #define VIRTIO_CONFIG_S_FAILED 0x80
 
+/* We notify when the ring is completely used, even if the guest is suppressing
+ * callbacks */
+#define VIRTIO_F_NOTIFY_ON_EMPTY 24
+
 /* from Linux's linux/virtio_ring.h */
 
 /* This marks a buffer as continuing via the next field. */
@@ -86,6 +90,7 @@ struct VirtQueue
     VRing vring;
     uint32_t pfn;
     uint16_t last_avail_idx;
+    int inuse;
     void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
 };
 
-- 
1.5.4.1
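The effect of the patch above can be restated as a standalone check. The sketch below uses simplified stand-in types, not QEMU's actual VirtQueue layout: notify unconditionally once the ring is drained (nothing pending and nothing in flight), otherwise honour the guest's interrupt-suppression hint.

```c
#include <stddef.h>

/* Simplified stand-in for QEMU's VirtQueue state; field names are
 * illustrative, not the real layout. */
struct vq_state {
    unsigned short avail_idx;      /* guest-side producer index */
    unsigned short last_avail_idx; /* host-side consumer index */
    unsigned short avail_flags;    /* guest's suppression flags */
    int inuse;                     /* buffers popped but not yet pushed back */
};

#define VRING_AVAIL_F_NO_INTERRUPT 1

/* Mirrors the patched condition in virtio_notify(): always notify once
 * the ring is fully drained, otherwise respect NO_INTERRUPT. */
static int should_notify(const struct vq_state *vq)
{
    if (!vq->inuse && vq->avail_idx == vq->last_avail_idx)
        return 1;
    return !(vq->avail_flags & VRING_AVAIL_F_NO_INTERRUPT);
}
```

With VIRTIO_F_NOTIFY_ON_EMPTY negotiated, the guest can suppress callbacks for the common case yet still trust that a drained ring produces an interrupt.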



RE: scsi broken > 4GB RAM

2008-07-24 Thread Martin Maurer
Using IDE boot disk, no problem. Win2008 (64bit) works without any problems - 6 
gb ram in the guest.

After successfully booting from IDE, I added a second disk using SCSI: Windows sees 
the disk but cannot initialize it.

So SCSI looks quite unusable if you run a Windows guest (win2003 sp2 also stops 
during install), or should we load a SCSI driver during setup? Win2008 uses the 
LSI Logic 8953U PCI SCSI Adapter, 53C895A Device (LSI Logic Driver 4.16.6.0, 
signed).

Any other experiences running SCSI on Windows?

Best Regards,

Martin

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
 Behalf Of Martin Maurer
 Sent: Donnerstag, 24. Juli 2008 11:46
 To: kvm@vger.kernel.org
 Subject: RE: scsi broken > 4GB RAM
 
 Sorry, just returned to the installer - also stopped with the same
 error code, using just 2 gb ram.
 
 Best Regards,
 
 Martin Maurer
 
 [EMAIL PROTECTED]
 http://www.proxmox.com
 
 
 Proxmox Server Solutions GmbH
 Kohlgasse 51/10, 1050 Vienna, Austria
 Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22
 Commercial register no.: FN 258879 f
 Registration office: Handelsgericht Wien
 
 
  -Original Message-
  From: Martin Maurer
  Sent: Donnerstag, 24. Juli 2008 11:44
  To: kvm@vger.kernel.org
  Subject: RE: scsi broken > 4GB RAM
 
  Hi,
 
  I tried windows server 2008 (64 bit) on Proxmox VE 0.9beta2 (KVM 71),
  see http://pve.proxmox.com):
 
  Some details:
  --memory 6144 --cdrom
  en_windows_server_2008_datacenter_enterprise_standard_x64_dvd_X14-
  26714.iso --name win2008-6gb-scsi --smp 1 --bootdisk scsi0 --scsi0 80
 
  The installer shows 80 GB harddisk but freezes after clicking next
 for
  a minute then:
 
  Windows could not create a partition on disk 0. The error occurred
  while preparing the computer's system volume. Error code:
 0x8004245F.
 
  I also got installer problems if I just use scsi as boot disk (no
 high
  memory) on several windows versions, including win2003 and xp. So I
  decided to use IDE, works without any issue on windows.
 
  But: I reduced the memory to 2048 and the installer continues to
 work!
 
  Best Regards,
 
  Martin Maurer
 
  [EMAIL PROTECTED]
  http://www.proxmox.com
 
  
  Proxmox Server Solutions GmbH
  Kohlgasse 51/10, 1050 Vienna, Austria
  Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22
  Commercial register no.: FN 258879 f
  Registration office: Handelsgericht Wien
 
 
   -Original Message-
   From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 On
   Behalf Of Henrik Holst
   Sent: Mittwoch, 23. Juli 2008 23:09
   To: kvm@vger.kernel.org
   Subject: scsi broken > 4GB RAM
  
   I do not know if this is a bug in qemu or the linux kernel
 sym53c8xx
   module (I haven't had the opportunity to test with anything other
  than
   Linux at the moment) but if one starts a qemu instance with -m
 4096
   and larger the scsi emulated disk fails in the Linux guest.
  
   If booting any install cd the /dev/sda is seen as only 512B in size
   and if booting an ubuntu 8.04-amd64 with the secondary drive as
 scsi
   it is seen with the correct size but one cannot read nor write the
   partition table.
  
   Is there anyone out there that could test say a Windows image on
 scsi
   with 4GB or more of RAM and see if it works or not? If so it could
 be
   the linux driver that is faulty.
  
   /Henrik Holst
   --
   To unsubscribe from this list: send the line unsubscribe kvm in
   the body of a message to [EMAIL PROTECTED]
   More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: [PATCH 4/8] KVM: PCIPT: fix interrupt handling

2008-07-24 Thread Amit Shah
* On Thursday 24 Jul 2008 16:58:57 Ben-Ami Yassour wrote:
 On Wed, 2008-07-23 at 19:07 +0530, Amit Shah wrote:
  * On Wednesday 16 Jul 2008 18:47:01 Ben-Ami Yassour wrote:
   This patch fixes a few problems with the interrupt handling for
   passthrough devices.
  
   1. Pass the interrupt handler the pointer to the device, so we do not
   need to lock the pcipt lock in the interrupt handler.
  
   2. Remove the pt_irq_handled bitmap - it is no longer needed.
  
   3. Split kvm_pci_pt_work_fn into two functions, one for interrupt
   injection and another for the ack - is much simpler code this way.
  
   4. Change the passthrough initialization order - add the device
   structure to the list, before registering the interrupt handler.
  
   5. On passthrough destruction path, free the interrupt handler before
   cleaning queued work.
  
   Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
   ---
  
 if (irqchip_in_kernel(kvm)) {
  + match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
  + match->pt_dev.host.irq = dev->irq;
  + if (kvm->arch.vioapic)
  +     kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq;
  + if (kvm->arch.vpic)
  +     kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq;
  +
 
  We shouldn't register this notifier unless we get the irq below to avoid
  unneeded function calls and checks.

 Note: This code was changed in the last version of the code but the
 comment is still relevant.

 Do you mean that we need to postpone registering the notification?

I mean we can register these function pointers after the request_irq succeeds.

 In the case of an assigned device this is means the we postpone it for a
 few seconds, and implementing it like above it cleaner. So I don't see
 the real value in postponing it.

Sorry, don't get what you mean here.


[ kvm-Bugs-2019608 ] Ubuntu 8.04.1 (IA32 & x86_64) - cannot install bootloader

2008-07-24 Thread SourceForge.net
Bugs item #2019608, was opened at 2008-07-16 15:03
Message generated for change (Comment added) made by awwy
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019608&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Johannes Truschnigg (c0l0)
Assigned to: Nobody/Anonymous (nobody)
Summary: Ubuntu 8.04.1 (IA32 & x86_64) - cannot install bootloader

Initial Comment:
CPU: Intel Core 2 Quad Q6600 (4 cores)
Distro, kernel: Gentoo GNU/Linux ~amd64, Kernel 2.6.25-r6
Bitness, compiler: x86_64, GCC 4.3.1
KVM versions: kvm-70, kvm-71

When trying to install Ubuntu (either 32bit or 64bit) in a KVM guest, 
grub-install croaks. The guest kernel debug ringbuffer shows the following 
messages:

(Please see http://pasted.at/9d7e95f873.html or the attached file!)

Windows XP also hangs during install, actually before anything substantial other 
than copying installation files gets done. The first phase of the install 
completes, however - the graphical installer that's started after the first 
reboot hangs indefinitely.

Worked fine with version <= kvm-69 with the very same settings.

I'm happy to provide additional information upon request.

--

Comment By: Alexander Graf (awwy)
Date: 2008-07-24 13:36

Message:
Logged In: YES 
user_id=376328
Originator: NO

I bisected it down to commit cc91437d10770328d0b32f200399569a0ad22792,
which lies between kvm-60 and kvm-61. I can't really make out any obvious
problem that patch may raise though. Nevertheless it seems to be userspace
at fault here.


--

Comment By: Alexander Graf (awwy)
Date: 2008-07-24 05:56

Message:
Logged In: YES 
user_id=376328
Originator: NO

I am getting exactly the same error on SLES10 SP2. Running a 32-bit binary
in an x86_64 SLES10SP2 guest generates a #DF on a RIP that looks like a
32-bit mangled kernel space address (80228ca0 vs.
80228ca0). Apparently something truncates it - I'll try to bisect.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019608&group_id=180599


Re: Simple way of putting a VM on a LAN

2008-07-24 Thread Javier Guerra
On Wed, Jul 23, 2008 at 11:15 PM, Bill Davidsen [EMAIL PROTECTED] wrote:
 Your easy way seems to mean using Debian, other distributions don't have
 some of the scripts, or they are in different places or do different things.
 Other thoughts below.

yep, on Gentoo and SuSE i didn't find the included scripts flexible
enough, so i did the same 'by hand'.  that was a few years ago, it
might be better now; but it's not hard to do anyway.


 Not being a trusting person I find that a bridge is an ineffective firewall,

a bridge isn't a firewall.  it's the software equivalent of plugging
both your host and guest to an ethernet switch.  in most ways, your
host 'steps out of the way'.

 but with a bit of trickery that could live on the VM, to the extent it's
 needed. Now the "sets up its own IP" is a mystery, since there's no place I
 have told it what the IP of the machine it replaces might be. I did take the

as said before, it's as if your VM is directly plugged to the LAN.
you just configure its network 'from inside'.  the host doesn't care
what IP numbers it uses.  in fact, it could be using totally different
protocols, just as long as they go over ethernet.

 hand does result in a working configuration, however, so other than the lack
 of control from using iptables to forward packets, it works well.

you can use iptables.  maybe you have to setup ebtables, but in the
end, just put rules in the FORWARD chains.  google for 'transparent
firewall', or 'bridge iptables'

 of manual setup, it's faster than setting up iptables, and acceptably secure
 as long as the kvm host is at least as secure as the original.

just do with your VM as you do with a 'real' box.  after that, you can
use the fact that every packet to the VM has to pass through your eth0
device; even if they don't appear on your INPUT chains.
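To make the "bridge iptables" suggestion concrete, here is a minimal sketch of FORWARD-chain rules for a bridged guest. The interface name (tap0) and allowed port are assumptions for illustration; bridged frames only traverse iptables when the kernel's bridge-netfilter support is enabled. The wrapper prints the commands by default (RUN=echo); dropping the echo wrapper would apply them.

```shell
# Hypothetical transparent-firewall rules for a bridged guest; the tap
# interface name and allowed service are invented for the example.
RUN="${RUN:-echo}"

bridge_firewall_rules() {
    # Let established flows back through to the guest
    $RUN iptables -A FORWARD -m physdev --physdev-out tap0 \
        -m state --state ESTABLISHED,RELATED -j ACCEPT
    # Allow new ssh connections to the guest, drop everything else
    $RUN iptables -A FORWARD -m physdev --physdev-out tap0 \
        -p tcp --dport 22 -j ACCEPT
    $RUN iptables -A FORWARD -m physdev --physdev-out tap0 -j DROP
}
```

The physdev match is what lets ordinary FORWARD rules key on the bridge port a frame will leave through, so the host stays a plain switch except where you explicitly filter.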

-- 
Javier


Re: [PATCH 4/8] KVM: PCIPT: fix interrupt handling

2008-07-24 Thread Ben-Ami Yassour
On Thu, 2008-07-24 at 19:01 +0530, Amit Shah wrote:
 * On Thursday 24 Jul 2008 16:58:57 Ben-Ami Yassour wrote:
  On Wed, 2008-07-23 at 19:07 +0530, Amit Shah wrote:
   * On Wednesday 16 Jul 2008 18:47:01 Ben-Ami Yassour wrote:
   
 if (irqchip_in_kernel(kvm)) {
 +   match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
 +   match->pt_dev.host.irq = dev->irq;
 +   if (kvm->arch.vioapic)
 +      kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq;
 +   if (kvm->arch.vpic)
 +      kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq;
 +
  
   We shouldn't register this notifier unless we get the irq below to avoid
   unneeded function calls and checks.
 
  Note: This code was changed in the last version of the code but the
  comment is still relevant.
 
  Do you mean that we need to postpone registering the notification?
 
 I mean we can register these function pointers after the request_irq succeeds.

request_irq should be the last initialization operation, since everything
should be ready in case an interrupt is received.
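The ordering constraint being argued here can be shown with a toy model; every name below is an illustrative stand-in, not the real KVM interface. Each step records itself so the invariant "request_irq comes last" can be checked.

```c
#include <string.h>

/* Toy model of the device-assignment path: each init step records itself
 * so the ordering can be inspected. All names are invented stand-ins. */
static const char *steps[8];
static int nsteps;

static void record(const char *step)    { steps[nsteps++] = step; }

static void add_to_device_list(void)    { record("list_add"); }
static void register_ack_notifier(void) { record("ack_notifier"); }
static void request_irq_stub(void)      { record("request_irq"); }

/* Everything the interrupt handler and ack path touch must already be in
 * place before the handler can fire, so request_irq() goes last. */
static void assign_device(void)
{
    add_to_device_list();
    register_ack_notifier();
    request_irq_stub();
}
```

The point of the sketch is only the ordering: the device-list entry and ack notifier exist before the first interrupt can possibly arrive.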

 
  In the case of an assigned device this is means the we postpone it for a
  few seconds, and implementing it like above it cleaner. So I don't see
  the real value in postponing it.
 
 Sorry, don't get what you mean here.
Never mind... see the answer above.




Release of OpenNebula 1.0 for Data Center Virtualization Cloud Solutions

2008-07-24 Thread Tino Vázquez
The dsa-research group (http://dsa-research.org) is pleased to  
announce that a stable release (v1.0) of the OpenNebula (ONE) Virtual  
Infrastructure Engine (http://www.OpenNebula.org) is available for  
download under the terms of the Apache License, Version 2.0. ONE  
enables the dynamic allocation of virtual machines on a pool of  
physical resources, so extending the benefits of existing  
virtualization platforms from a single physical resource to a pool of  
resources, decoupling the server not only from the physical  
infrastructure but also from the physical location.


MAIN FEATURES

The OpenNebula Virtual Infrastructure Engine differentiates itself from  
existing VM managers through its highly modular and open architecture  
designed to meet the requirements of cluster administrators. The last  
version supports Xen and KVM virtualization platforms to provide the  
following features and capabilities:


- Centralized management, a single access point to manage a pool of  
VMs and physical resources.
- Efficient resource management, including support to build any  
capacity provision policy and for advance reservation of capacity  
through the Haizea lease manager
- Powerful API and CLI interfaces for monitoring and controlling VMs  
and physical resources
- Easy 3rd party software integration to provide a complete solution  
for the deployment of flexible and efficient virtual infrastructures

- Fault tolerant design, state is kept in a SQLite database.
- Open and flexible architecture to add new infrastructure metrics and  
parameters or even to support new Hypervisors.
- Support to access Amazon EC2 resources to supplement local resources  
with cloud resources to satisfy peak or fluctuating demands.

- Ease of installation and administration on UNIX clusters
- Open source software released under Apache license v2.0
- As an engine for the dynamic management of VMs, OpenNebula is being  
enhanced in the context of the RESERVOIR project (EU grant agreement  
215605) to address the requirements of several business use cases.


More details at http://www.opennebula.org/doku.php?id=documentation:rn-rel1.0

RELEVANT LINKS

- Benefits and Features: http://www.opennebula.org/doku.php?id=about
- Documentation: http://www.opennebula.org/doku.php?id=documentation
- Release Notes: http://www.opennebula.org/doku.php?id=documentation:rn-rel1.0
- Download: http://www.opennebula.org/doku.php?id=software
- Ecosystem and Related Tools: http://www.opennebula.org/doku.php?id=ecosystem

--8--
Constantino Vázquez, Grid & Virtualization Technology Engineer/ 
Researcher: http://www.dsa-research.org/doku.php?id=people:tinova

DSA Research Group: http://dsa-research.org
Globus GridWay Metascheduler: http://www.GridWay.org
OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org








[ kvm-Bugs-2026870 ] xorg-cirrus 1.2.1 fails in x86_64 kvm guests.

2008-07-24 Thread SourceForge.net
Bugs item #2026870, was opened at 2008-07-24 16:56
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2026870&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Søren Hansen (shawarma)
Assigned to: Nobody/Anonymous (nobody)
Summary: xorg-cirrus 1.2.1 fails in x86_64 kvm guests.

Initial Comment:
When trying to boot an Ubuntu Intrepid amd64 (x86_64) live CD, the guest hangs 
when starting X. I've narrowed it down to the new version of X. It works when 
booting with -no-kvm.

I'm afraid that's all the info I have right now.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2026870&group_id=180599


[ kvm-Bugs-2026870 ] xorg-cirrus 1.2.1 fails in x86_64 kvm guests.

2008-07-24 Thread SourceForge.net
Bugs item #2026870, was opened at 2008-07-24 16:56
Message generated for change (Comment added) made by shawarma
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2026870&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Søren Hansen (shawarma)
Assigned to: Nobody/Anonymous (nobody)
Summary: xorg-cirrus 1.2.1 fails in x86_64 kvm guests.

Initial Comment:
When trying to boot an Ubuntu Intrepid amd64 (x86_64) live CD, the guest hangs 
when starting X. I've narrowed it down to the new version of X. It works when 
booting with -no-kvm.

I'm afraid that's all the info I have right now.

--

Comment By: Søren Hansen (shawarma)
Date: 2008-07-24 17:26

Message:
Logged In: YES 
user_id=567099
Originator: YES

I tried starting X from ssh, so I got this output:
This is a pre-release version of the X server from The X.Org Foundation.
It is not supported in any way.
Bugs may be filed in the bugzilla at http://bugs.freedesktop.org/.
Select the xorg product for bugs you find in this release.
Before reporting bugs in pre-release versions please check the
latest version in the X.Org Foundation git repository.
See http://wiki.x.org/wiki/GitPage for git access instructions.

X.Org X Server 1.4.99.905 (1.5.0 RC 5)
Release Date: 5 September 2007
X Protocol Version 11, Revision 0
Build Operating System: Linux Ubuntu (xorg-server 2:1.4.99.905-0ubuntu4)
Current Operating System: Linux ibsen 2.6.26-4-server #1 SMP Mon Jul 14
19:19:23 UTC 2008 x86_64
Build Date: 16 July 2008  03:40:43PM
 
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: /var/log/Xorg.0.log, Time: Thu Jul 24 15:18:18 2008
(==) Using config file: /etc/X11/xorg.conf
(EE) Failed to load module dri2 (module does not exist, 0)
error setting MTRR (base = 0xf000, size = 0x0010, type = 1)
Function not implemented (38)


--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2026870&group_id=180599


[ kvm-Bugs-2019053 ] tbench fails on guest when AMD NPT enabled

2008-07-24 Thread SourceForge.net
Bugs item #2019053, was opened at 2008-07-15 18:10
Message generated for change (Comment added) made by alex_williamson
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019053&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: amd
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Alex Williamson (alex_williamson)
Assigned to: Nobody/Anonymous (nobody)
Summary: tbench fails on guest when AMD NPT enabled

Initial Comment:
Running on a dual-socket system with AMD 2356 quad-core processors (8 total 
cores), 32GB RAM, Ubuntu Hardy 2.6.24-19-generic (64bit) with kvm-71 userspace 
and kernel modules.  With no module options, dmesg confirms: kvm: Nested Paging 
enabled

Start guest with:

/usr/local/kvm/bin/qemu-system-x86_64 -hda /dev/VM/Ubuntu64 -m 1024 -net 
nic,model=e1000,mac=de:ad:be:ef:00:01 -net tap,script=/root/bin/br0-ifup -smp 8 
-vnc :0

Guest VM is also Ubuntu Hardy 64bit.  On the guest run 'tbench 16 <tbench 
server>'.  System running tbench_srv is a different system in my case.

The tbench client will fail randomly, often quietly with "Child failed with 
status 1", but sometimes more harshly with a glibc double free error.

If I unload the modules and reload w/o npt:

modprobe -r kvm-amd
modprobe -r kvm
modprobe kvm-amd npt=0

dmesg confirms: kvm: Nested Paging disabled

The tbench test now runs over and over successfully.  The test also runs fine 
on an Intel E5450 (no EPT).

--

Comment By: Alex Williamson (alex_williamson)
Date: 2008-07-24 11:10

Message:
Logged In: YES 
user_id=333914
Originator: YES

I tried the Ubuntu Gutsy 2.6.22-15-generic kernel on the host, but I still
see the issue.  I'll install openSUSE 10.3 and see what happens.

--

Comment By: Alexander Graf (awwy)
Date: 2008-07-23 23:45

Message:
Logged In: YES 
user_id=376328
Originator: NO

I'm seeing random segfaults when using NPT on a host kernel >= 2.6.23. So
far I have not been able to reproduce my test case breakages with an
openSUSE 10.3 kernel though, so could you please test that and verify if
tbench works for you on openSUSE 10.3? It does break with 11.0.

I have the feeling that we're seeing the same problem here.

--

Comment By: Alex Williamson (alex_williamson)
Date: 2008-07-16 09:18

Message:
Logged In: YES 
user_id=333914
Originator: YES

No, I added mlockall(MCL_CURRENT | MCL_FUTURE) to qemu/vl.c:main() and it
makes no difference.  I'm only starting a 1G guest on an otherwise idle 32G
host, so host memory pressure is pretty light.

--

Comment By: Avi Kivity (avik)
Date: 2008-07-16 08:19

Message:
Logged In: YES 
user_id=539971
Originator: NO

Strange.  If you add an mlockall() to qemu startup, does the test pass?

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019053&group_id=180599


Re: e1000 and PXE issues

2008-07-24 Thread H. Peter Anvin

Greg Kurtzer wrote:

Hello,

I noticed some problems with the e1000 implementation in kvm >= 70. At
first glance it seemed like a PXE problem as it would not acknowledge
the DHCP offer from the server. I tried several different Etherboot
ROM images and version 5.2.6 seemed to work. That version isn't PXE
compliant so I built an ELF image to boot, and it downloaded it very,
very, very, very slowly (as in about 10 minutes) but it did end up
working.

This all worked perfectly with version 69 and previous.

Please let me know if there is a need for any further information, or
anything additional to test.

Please note that I am not subscribed to this email list so please CC
me directly with any responses.



I think the e1000 driver in gPXE might have bitrotted.

-hpa



Re: [PATCH 0/9][RFC] KVM virtio_net performance

2008-07-24 Thread Anthony Liguori

Mark McLoughlin wrote:

Hey,
One all-important thing I forgot to include was a comparison with
lguest :-)
  


Hey Mark,

This patch set is really great!  I guess the hard part now is deciding 
what all we want to apply.  Do you have a suggestion of which patches 
you think are worth applying?


BTW, do you have native and guest loopback numbers to compare where we 
stand with native?



  netperf, 10x20s runs (Gb/s)  |        guest->host         |        host->guest
  -----------------------------+----------------------------+----------------------------
  KVM                          | 4.230/ 4.619/ 4.780/ 0.155 | 8.140/ 8.578/ 8.770/ 0.162
  lguest                       | 5.700/ 5.926/ 6.150/ 0.132 | 8.680/ 9.073/ 9.320/ 0.205

  ping -f -c 10 (ms)           |        guest->host         |        host->guest
  -----------------------------+----------------------------+----------------------------
  KVM                          | 0.199/ 0.326/ 7.698/ 0.744 | 0.199/ 0.245/ 0.402/ 0.022
  lguest                       | 0.022/ 0.055/ 0.467/ 0.019 | 0.019/ 0.046/89.249/ 0.448


So, puppies gets you an extra 1.3Gb/s guest->host, .5Gb/s host->guest
and much better latency.
  


I'm surprised lguest gets an extra 1.3Gb/s guest->host.  Any idea of where 
we're losing it?



Actually, I guess the main reason for the latency difference is that
when lguest gets notified on the tx ring, it immediately sends whatever
is available and then starts a timer. KVM doesn't send anything until
its tx timer fires or the ring is full.
  


Yes, we should definitely do that.  It will make ping appear to be a lot 
faster than it really is :-)


Regards,

Anthony Liguori


Cheers,
Mark.

  




Live Migration, DRBD

2008-07-24 Thread Kent Borg
I am very happy to discover that KVM does live migration.  Now I am
figuring out whether it will work for me. 

What I have in mind is to use DRBD for the file system image.  The
problem is that during the migration I want to shift the file system
access at the moment when the VM has quit running on the host it is
leaving but before it starts running on the host where it is arriving. 
Is there a hook to let me do stuff at this point?

This is what I want to do:

On the departing machine...

  - VM has stopped here
  - umount the volume with the VM file system image
  - mark volume in DRBD as secondary


On the arriving machine...

  - mark volume in DRBD as primary
  - mount the volume with the VM file system image
  - VM can now start here


Is there a way?
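The hand-over steps above map naturally onto a pair of hook scripts. A minimal sketch, assuming a DRBD resource named vmdata on /dev/drbd0 mounted at /srv/vm (all three names hypothetical); DRY_RUN=1 prints the commands instead of executing them:

```shell
# Hypothetical migration hooks: demote storage on the departing host once
# the VM has stopped, promote it on the arriving host before the VM starts.
# Resource name, device and mount point are assumptions for illustration.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi
}

# Departing host, after the guest has stopped:
release_storage() {
    run umount /srv/vm
    run drbdadm secondary vmdata
}

# Arriving host, before the guest starts:
acquire_storage() {
    run drbdadm primary vmdata
    run mount /dev/drbd0 /srv/vm
}
```

The open question in the email still stands: the migration machinery would need a hook to invoke release_storage after the source VM pauses and acquire_storage before the destination resumes.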


Thanks,

-kb


Re: kexec/kdump of a kvm guest?

2008-07-24 Thread Anthony Liguori

Mike Snitzer wrote:

On Thu, Jul 24, 2008 at 9:15 AM, Vivek Goyal [EMAIL PROTECTED] wrote:
  

On Thu, Jul 24, 2008 at 07:49:59AM -0400, Mike Snitzer wrote:


On Thu, Jul 24, 2008 at 4:39 AM, Alexander Graf [EMAIL PROTECTED] wrote:
  

I can do further research but welcome others' insight: do others have
advice on how best to collect a crashed kvm guest's core?
  


I don't know what you do in libvirt, but you can start a gdbstub in 
QEMU, connect with gdb, and then have gdb dump out a core.
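A rough outline of that workflow (untested here; -s is QEMU's shorthand for enabling the gdbstub on its default port 1234, generate-core-file is a standard gdb command, and vmlinux stands for the guest kernel image with symbols):

```
# Start the guest with QEMU's gdbstub listening (defaults to tcp port 1234):
qemu-system-x86_64 -s [usual guest options]

# Then, from the host, attach and write out a core of the guest:
gdb vmlinux
(gdb) target remote localhost:1234
(gdb) generate-core-file guest.core
(gdb) detach
```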


Regards,

Anthony Liguori


It will be interesting to look at your results with 2.6.25.x kernels with
kvm module inserted. Currently I can't think what can possibly be wrong.



If the host's 2.6.25.4 kernel has both the kvm and kvm-intel modules
loaded kexec/kdump does _not_ work (simply hangs the system).  If I
only have the kvm module loaded kexec/kdump works as expected
(likewise if no kvm modules are loaded at all).  So it would appear
that kvm-intel and kexec are definitely mutually exclusive at the
moment (at least on both 2.6.22.x and 2.6.25.x).

Mike
  




Re: [PATCH 0/9][RFC] KVM virtio_net performance

2008-07-24 Thread Anthony Liguori

Hi Mark,

Mark McLoughlin wrote:

Hey,
  Here's a bunch of patches attempting to improve the performance
of virtio_net. This is more an RFC rather than a patch submission
since, as can be seen below, not all patches actually improve the
performance measurably.
  


I'm still seeing the same problem I saw with my patch series.  Namely, 
dhclient fails to get a DHCP address.  Rusty noticed that RX has a lot 
more packets received than it should, so we're suspicious that we're 
getting packet corruption.


Configuring the tap device with a static address, here's what I get with 
iperf:


w/o patches:

guest->host: 625 Mbits/sec
host->guest: 825 Mbits/sec

w/patches

guest->host: 2.02 Gbits/sec
host->guest: 1.89 Gbits/sec

guest lo: 4.35 Gbits/sec
host lo: 4.36 Gbits/sec

This is with KVM GUEST configured FWIW.

Regards,

Anthony Liguori


  I've tried hard to test each of these patches with as stable and
informative a benchmark as I could find. The first benchmark is a
netperf[1] based throughput benchmark and the second uses a flood
ping[2] to measure latency differences.

  Each set of figures is min/average/max/standard deviation. The
first set is Gb/s and the second is milliseconds.

  The network configuration used was very simple - the guest with
a virtio_net interface and the host with a tap interface and static
IP addresses assigned to both - e.g. there was no bridge in the host
involved and iptables was disabled in both the host and guest.

  I used:

  1) kvm-71-26-g6152996 with the patches that follow

  2) Linus's v2.6.26-5752-g93ded9b with Rusty's virtio patches from
 219:bbd2611289c5 applied; these are the patches have just been
 submitted to Linus

  The conclusions I draw are:

  1) The length of the tx mitigation timer makes quite a difference to
 throughput achieved; we probably need a good heuristic for
 adjusting this on the fly.

  2) Using the recently merged GSO support in the tun/tap driver gives
 a huge boost, but much more so on the host->guest side.

  3) Adjusting the virtio_net ring sizes makes a small difference, but
 not as much as one might expect

  4) Dropping the global mutex while reading GSO packets from the tap
 interface gives a nice speedup. This highlights the global mutex
 as a general performance issue.

  5) Eliminating an extra copy on the host->guest path only makes a
 barely measurable difference.

Anyway, the figures:

  netperf, 10x20s runs (Gb/s)  |        guest->host         |        host->guest
  -----------------------------+----------------------------+----------------------------
  baseline                     | 1.520/ 1.573/ 1.610/ 0.034 | 1.160/ 1.357/ 1.630/ 0.165
  50us tx timer + rearm        | 1.050/ 1.086/ 1.110/ 0.017 | 1.710/ 1.832/ 1.960/ 0.092
  250us tx timer + rearm       | 1.700/ 1.764/ 1.880/ 0.064 | 0.900/ 1.203/ 1.580/ 0.205
  150us tx timer + rearm       | 1.520/ 1.602/ 1.690/ 0.044 | 1.670/ 1.928/ 2.150/ 0.141
  no ring-full heuristic       | 1.480/ 1.569/ 1.710/ 0.066 | 1.610/ 1.857/ 2.140/ 0.153
  VIRTIO_F_NOTIFY_ON_EMPTY     | 1.470/ 1.554/ 1.650/ 0.054 | 1.770/ 1.960/ 2.170/ 0.119
  recv NO_NOTIFY               | 1.530/ 1.604/ 1.680/ 0.047 | 1.780/ 1.944/ 2.190/ 0.129
  GSO                          | 4.120/ 4.323/ 4.420/ 0.099 | 6.540/ 7.033/ 7.340/ 0.244
  ring size == 256             | 4.050/ 4.406/ 4.560/ 0.143 | 6.280/ 7.236/ 8.280/ 0.613
  ring size == 512             | 4.420/ 4.600/ 4.960/ 0.140 | 6.470/ 7.205/ 7.510/ 0.314
  drop mutex during tapfd read | 4.320/ 4.578/ 4.790/ 0.161 | 8.370/ 8.589/ 8.730/ 0.120
  aligouri zero-copy           | 4.510/ 4.694/ 4.960/ 0.148 | 8.430/ 8.614/ 8.840/ 0.142

  ping -f -c 10 (ms)           |        guest->host         |        host->guest
  -----------------------------+----------------------------+----------------------------
  baseline                     | 0.060/ 0.459/ 7.602/ 0.846 | 0.067/ 0.331/ 2.517/ 0.057
  50us tx timer + rearm        | 0.081/ 0.143/ 7.436/ 0.374 | 0.093/ 0.133/ 1.883/ 0.026
  250us tx timer + rearm       | 0.302/ 0.463/ 7.580/ 0.849 | 0.297/ 0.344/ 2.128/ 0.028
  150us tx timer + rearm       | 0.197/ 0.323/ 7.671/ 0.740 | 0.199/ 0.245/ 7.836/ 0.037
  no ring-full heuristic       | 0.182/ 0.324/ 7.688/ 0.753 | 0.199/ 0.243/ 2.197/ 0.030
  VIRTIO_F_NOTIFY_ON_EMPTY     | 0.197/ 0.321/ 7.447/ 0.730 | 0.196/ 0.242/ 2.218/ 0.032
  recv NO_NOTIFY               | 0.186/ 0.321/ 7.520/ 0.732 | 0.200/ 0.233/ 2.216/ 0.028
  GSO                          | 0.178/ 0.324/ 7.667/ 0.736 | 0.147/ 0.246/ 1.361/ 0.024
  ring size == 256             | 0.184/ 0.323/ 7.674/ 0.728 | 0.199/ 0.243/ 2.181/ 0.028
  ring size == 512             | (not measured)             | (not measured)
  drop mutex during tapfd read | 0.183/ 0.323/ 7.820/ 0.733 | 0.202/ 0.242/ 2.219/ 0.027
  aligouri zero-copy           | 0.185/ 0.325/ 7.863/ 0.736 | 0.202/ 0.245/ 7.844/ 0.036

Cheers,
Mark.

[1] - I used 

Re: [PATCH 2/2] Remove -tdf

2008-07-24 Thread Dor Laor

Anthony Liguori wrote:

Gleb Natapov wrote:

On Tue, Jul 22, 2008 at 08:20:41PM -0500, Anthony Liguori wrote:
 
Currently both the in-kernel PIT and even the in-kernel irqchips are
not 100% bulletproof.
Of course this code is a hack; Gleb Natapov has sent a better fix
for PIT/RTC to the qemu list.

Can you look into them:
http://www.mail-archive.com/kvm@vger.kernel.org/msg01181.html
  
Paul Brook's initial feedback is still valid.  It causes quite a lot
of churn and may not jive well with a virtual time base.  An
advantage to the current -tdf patch is that it's more contained.  I
don't think either approach is going to get past Paul in its
current form.


Yes, my patch causes a lot of churn because it changes a widely used API.
  


Indeed.


But the time drift fix itself is contained to PIT/RTC code only. The
last patch series I've sent disables time drift fix if virtual time base
is enabled as Paul requested. There was no further feedback from him.
  


I think there's a healthy amount of scepticism about whether tdf
really is worth it.  This is why I suggested that we need to better
quantify exactly how much this patch set helps things.  For instance,
a time drift test for kvm-autotest would be perfect.


tdf is ugly and deviates from how hardware works.  A compelling case 
is needed to justify it.


We'll add time drift tests to autotest the minute it starts to run 
enough interesting tests/loads.

In our private test platform we use a simple scenario to test it:
1. Use a Windows guest and play a movie (changes the RTC on ACPI Windows /
   the PIT on -no-acpi Windows to a 1000 Hz frequency).
2. Pin the guest to a physical cpu + load the same cpu.
3. Measure a minute in real life vs in the guest.
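
The measurement in step 3 reduces to comparing elapsed host time against elapsed guest time over the same interval; a trivial helper (hypothetical, for illustration only) makes the drift explicit:

```python
def relative_drift(host_elapsed_s, guest_elapsed_s):
    """Fraction of time the guest clock lost (positive) or gained
    (negative) relative to the host over the same interval."""
    return (host_elapsed_s - guest_elapsed_s) / host_elapsed_s

# a guest that sees only 57s while the host measures a full minute
print(relative_drift(60.0, 57.0))  # 5% slow
```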

Actually the movie seems to be smoother without the time drift fix. When
the irqs are fixed up, sometimes the player needs to cope with too-rapid
changes. Anyway, the main focus is time accuracy, not smoother movies.


The in-kernel PIT does a relatively good job for Windows guests; the
problem is it's not yet 100% stable, we can also do it in userspace, and
the RTC needs a solution too.

As Jan Kiszka wrote in one of his mails, maybe Paul's virtual time base
can be adapted to work with KVM too. BTW, how does the virtual time base
handle SMP guests?
  


I really don't know.  I haven't looked too deeply at the virtual time
base.  Keep in mind, though, that QEMU SMP is not true SMP.  All VCPUs
run in lock-step.


Regards,

Anthony Liguori

Also, it's important that this is reproducible in upstream QEMU and
not just in KVM.  If we can make a compelling case for the
importance of this, we can possibly work out a compromise.




I developed and tested my patch with upstream QEMU.

--
Gleb.
  




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Live Migration, DRBD

2008-07-24 Thread Dor Laor

Kent Borg wrote:

I am very happy to discover that KVM does live migration.  Now I am
figuring out whether it will work for me. 


What I have in mind is to use DRBD for the file system image.  The
problem is that during the migration I want to shift the file system
access at the moment when the VM has quit running on the host it is
leaving but before it starts running on the host where it is arriving. 
Is there a hook to let me do stuff at this point?


This is what I want to do:

On the departing machine...

  - VM has stopped here
  - umount the volume with the VM file system image
  - mark volume in DRBD as secondary


On the arriving machine...

  - mark volume in DRBD as primary
  - mount the volume with the VM file system image
  - VM can now start here


Is there a way?

  
No, but one can add that pretty easily. The whole migration code is in one
file, qemu/migration.c.
You can add a parameter to the qemu migration command to specify a script
that should be called on the migration-end event (similar to the tap script).
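
As a sketch of what such a hook could look like for the DRBD case: the hook name, arguments, and calling convention below are all assumptions (qemu has no such interface today), and the function only prints the drbdadm role changes it would run, as a dry run:

```shell
# migration_hook ROLE RESOURCE - dry-run sketch of a script a patched
# qemu/migration.c could invoke on the migration-end event.
migration_hook() {
    case "$1" in
        outgoing) echo "drbdadm secondary $2" ;;  # VM has stopped here
        incoming) echo "drbdadm primary $2" ;;    # VM about to start here
        *) echo "usage: migration_hook outgoing|incoming RESOURCE" >&2
           return 1 ;;
    esac
}

migration_hook outgoing r0
```

Dropping the `echo`s would make it actually flip the DRBD roles, as the poster wants.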

Thanks,

-kb


Re: scsi broken > 4GB RAM

2008-07-24 Thread Dor Laor

Martin Maurer wrote:

Using an IDE boot disk, no problem. Win2008 (64-bit) works without any problems - 6 GB RAM in the guest.

After successfully booting from IDE, I added a second disk using SCSI: Windows sees the disk but cannot initialize it.

So SCSI looks quite unusable if you run a Windows guest (win2003 SP2 also stops during install), or should we load a SCSI driver during setup? Win2008 uses the LSI Logic 8953U PCI SCSI Adapter, 53C895A Device (LSI Logic driver 4.16.6.0, signed).

Any other experiences running SCSI on Windows?

  

You're right, it's broken right now :(
At least IDE is stable.

Best Regards,

Martin

  

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Maurer
Sent: Donnerstag, 24. Juli 2008 11:46
To: kvm@vger.kernel.org
Subject: RE: scsi broken > 4GB RAM

Sorry, just returned to the installer - it also stopped with the same
error code, using just 2 GB RAM.

Best Regards,

Martin Maurer

[EMAIL PROTECTED]
http://www.proxmox.com


Proxmox Server Solutions GmbH
Kohlgasse 51/10, 1050 Vienna, Austria
Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22
Commercial register no.: FN 258879 f
Registration office: Handelsgericht Wien




-Original Message-
From: Martin Maurer
Sent: Donnerstag, 24. Juli 2008 11:44
To: kvm@vger.kernel.org
Subject: RE: scsi broken > 4GB RAM

Hi,

I tried windows server 2008 (64 bit) on Proxmox VE 0.9beta2 (KVM 71),
see http://pve.proxmox.com):

Some details:
--memory 6144 --cdrom en_windows_server_2008_datacenter_enterprise_standard_x64_dvd_X14-26714.iso --name win2008-6gb-scsi --smp 1 --bootdisk scsi0 --scsi0 80

The installer shows the 80 GB harddisk but freezes after clicking next for
a minute, then:

Windows could not create a partition on disk 0. The error occurred
while preparing the computer's system volume. Error code: 0x8004245F.


I also got installer problems if I just use scsi as the boot disk (no high
memory) on several Windows versions, including win2003 and xp. So I
decided to use IDE, which works without any issue on Windows.

But: I reduced the memory to 2048 and the installer continues to work!


Best Regards,

Martin Maurer

[EMAIL PROTECTED]
http://www.proxmox.com


Proxmox Server Solutions GmbH
Kohlgasse 51/10, 1050 Vienna, Austria
Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22
Commercial register no.: FN 258879 f
Registration office: Handelsgericht Wien


  

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Henrik Holst
Sent: Mittwoch, 23. Juli 2008 23:09
To: kvm@vger.kernel.org
Subject: scsi broken > 4GB RAM

I do not know if this is a bug in qemu or the Linux kernel sym53c8xx
module (I haven't had the opportunity to test with anything other than
Linux at the moment), but if one starts a qemu instance with -m 4096
or larger, the emulated scsi disk fails in the Linux guest.

If booting any install cd, /dev/sda is seen as only 512B in size,
and if booting an ubuntu 8.04-amd64 with the secondary drive as scsi,
it is seen with the correct size but one cannot read nor write the
partition table.

Is there anyone out there that could test, say, a Windows image on scsi
with 4GB or more of RAM and see if it works or not? If so, it could be
the Linux driver that is faulty.

/Henrik Holst


Re: [PATCH 8/9] kvm: qemu: Drop the mutex while reading from tapfd

2008-07-24 Thread Dor Laor

Mark McLoughlin wrote:

The idea here is that with GSO, packets are much larger
and we can allow the vcpu threads to e.g. process irq
acks during the window where we're reading these
packets from the tapfd.

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/vl.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/qemu/vl.c b/qemu/vl.c
index efdaafd..de92848 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4281,7 +4281,9 @@ static void tap_send(void *opaque)
        sbuf.buf = s->buf;
        s->size = getmsg(s->fd, NULL, &sbuf, &f) >= 0 ? sbuf.len : -1;
 #else
  

Maybe do it only when GSO is actually used by the guest/tap.
Otherwise it can cause some context-switch thrashing, right?

+       kvm_sleep_begin();
        s->size = read(s->fd, s->buf, sizeof(s->buf));
+       kvm_sleep_end();
 #endif

        if (s->size == -1 && errno == EINTR)
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/9] kvm: qemu: Remove virtio_net tx ring-full heuristic

2008-07-24 Thread Dor Laor

Mark McLoughlin wrote:

virtio_net tries to guess when it has received a tx
notification from the guest whether it indicates that the
guest has no more room in the tx ring and it should
immediately flush the queued buffers.

The heuristic is based on the fact that there are 128
buffer entries in the ring and each packet uses 2 buffers
(i.e. the virtio_net_hdr and the packet's linear data).

Using GSO or increasing the size of the rings will break
that heuristic, so let's remove it and assume that any
notification from the guest after we've disabled
notifications indicates that we should flush our buffers.

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 31867f1..4adfa42 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -175,8 +175,7 @@ static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
 
-    if (n->tx_timer_active &&
-        (vq->vring.avail->idx - vq->last_avail_idx) == 64) {
+    if (n->tx_timer_active) {
         vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
         qemu_del_timer(n->tx_timer);
         n->tx_timer_active = 0;
  
Actually we can improve latency a bit more by using this timer only for
high-throughput scenarios. For example, if during the previous timer period
no/few packets were accumulated, we can set the flag off and not issue a
new timer. This way we'll get notified immediately, without the timer
latency. When lots of packets are being transmitted, we'll go back to this
batch mode again.

Cheers, Dor


Re: [PATCH 3/9] kvm: qemu: Remove virtio_net tx ring-full heuristic

2008-07-24 Thread Rusty Russell
On Friday 25 July 2008 09:22:53 Dor Laor wrote:
 Mark McLoughlin wrote:
  vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
  qemu_del_timer(n->tx_timer);
  n->tx_timer_active = 0;

 As stated by newer messages, we should handle the first tx notification
 if the timer wasn't active to shorten latency.
 Cheers, Dor

Here's what lguest does at the moment.  Basically, we cut the timeout a tiny 
bit each time, until we get *fewer* packets than last time.  Then we bump it 
up again.

Rough, but seems to work (it should be a per-device var of course, not a 
static).

@@ -921,6 +922,7 @@ static void handle_net_output(int fd, st
        unsigned int head, out, in, num = 0;
        int len;
        struct iovec iov[vq->vring.num];
+       static int last_timeout_num;
 
        if (!timeout)
                net_xmit_notify++;
@@ -941,6 +943,14 @@ static void handle_net_output(int fd, st
        /* Block further kicks and set up a timer if we saw anything. */
        if (!timeout && num)
                block_vq(vq);
+
+       if (timeout) {
+               if (num < last_timeout_num)
+                       timeout_usec += 10;
+               else if (timeout_usec > 1)
+                       timeout_usec--;
+               last_timeout_num = num;
+       }
 }
 


Re: kexec/kdump of a kvm guest?

2008-07-24 Thread Vivek Goyal
On Thu, Jul 24, 2008 at 03:03:33PM -0400, Mike Snitzer wrote:
 On Thu, Jul 24, 2008 at 9:15 AM, Vivek Goyal [EMAIL PROTECTED] wrote:
  On Thu, Jul 24, 2008 at 07:49:59AM -0400, Mike Snitzer wrote:
  On Thu, Jul 24, 2008 at 4:39 AM, Alexander Graf [EMAIL PROTECTED] wrote:
 
   As you're stating that the host kernel breaks with kvm modules loaded, 
   maybe
   someone there could give a hint.
 
  OK, I can try using a newer kernel on the host too (e.g. 2.6.25.x) to
 see how kexec/kdump of the host fares when kvm modules are loaded.
 
  On the guest side of things, as I mentioned in my original post,
  kexec/kdump wouldn't work within a 2.6.22.19 guest with the host
  running 2.6.25.4 (with kvm-70).
 
 
  Hi Mike,
 
  I have never tried kexec/kdump inside a kvm guest. So I don't know if
  historically they have been working or not.
 
 Avi indicated he seems to remember that at least kexec worked last he
 tried (didn't provide when/what he tried though).
 
  Having said that, Why do we need kdump to work inside the guest? In this
  case qemu should be knowing about the memory of guest kernel and should
  be able to capture a kernel crash dump? I am not sure if qemu already does
  that. If not, then probably we should think about it?
 
  To me, kdump is a good solution for baremetal but not for virtualized
  environment where we already have another piece of software running which
  can do the job for us. We will end up wasting memory in every instance
  of guest (memory reserved for kdump kernel in every guest).
 
 I haven't looked into what mechanics qemu provides for collecting the
 entire guest memory image; I'll dig deeper at some point.  It seems
 the libvirt mid-layer (virsh dump - dump the core of a domain to a
 file for analysis) doesn't support saving a kvm guest core:
 # virsh dump guest10 guest10.dump
 libvir: error : this function is not supported by the hypervisor:
 virDomainCoreDump
 error: Failed to core dump domain guest10 to guest10.dump
 
 Seems that libvirt functionality isn't available yet with kvm (I'm
 using libvirt 0.4.2, I'll give libvirt 0.4.4 a try).  cc'ing the
 libvirt-list to get their insight.
 
 That aside, having the crash dump collection be multi-phased really
 isn't workable (that is if it requires a crashed guest to be manually
 saved after the fact).  The host system _could_ be rebooted; whereby
 losing the guest's core image.  So automating qemu and/or libvirtd to
 trigger a dump would seem worthwhile (maybe its already done?).
 

That's a good point. Ideally, one would like the dump to be captured
automatically if the kernel crashes, and then to reboot back to the
production kernel. I am not sure what we can do to let qemu know about
the crash so that it can automatically save a dump.

What happens in the case of Xen guests? Is the dump automatically captured,
or does one have to force the dump capture externally?

 So while I agree with you its ideal to not have to waste memory in
 each guest for the purposes of kdump; if users want to model a guest
 image as closely as possible to what will be deployed on bare metal it
 really would be ideal to support a 1:1 functional equivalent with kvm.

Agreed. Making kdump work inside a kvm guest does no harm.

  I work with people who refuse to use kvm because of the lack of
 kexec/kdump support.
 

Interesting.

 I can do further research but welcome others' insight: do others have
 advice on how best to collect a crashed kvm guest's core?
 
  It will be interesting to look at your results with 2.6.25.x kernels with
  kvm module inserted. Currently I can't think what can possibly be wrong.
 
 If the host's 2.6.25.4 kernel has both the kvm and kvm-intel modules
 loaded kexec/kdump does _not_ work (simply hangs the system).  If I
 only have the kvm module loaded kexec/kdump works as expected
 (likewise if no kvm modules are loaded at all).  So it would appear
 that kvm-intel and kexec are definitely mutually exclusive at the
 moment (at least on both 2.6.22.x and 2.6.25.x).

Ok. So first task is to fix host kexec/kdump with kvm-intel module
inserted.

Can you do a little debugging to find out where the system hangs? I generally
try a few things for kexec-related issue debugging.

1. Specify the earlyprintk= parameter for the second kernel and see if control
   is reaching the second kernel.

2. Otherwise specify the --console-serial parameter on the kexec -l command
   line and it should display a message I am in purgatory on the serial
   console. This will just mean that control has reached at least as far as
   purgatory.

3. If that also does not work, then most likely the first kernel itself got
   stuck somewhere and we need to put some printks in the first kernel to find
   out what's wrong.
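
Concretely, steps 1 and 2 might look like the following. The kernel/initrd paths and the append string are examples only; adjust them to the actual kdump setup:

```shell
# 1) early console output from the second (capture) kernel
kexec -p /boot/vmlinuz-kdump --initrd=/boot/initrd-kdump.img \
      --append="root=/dev/sda1 irqpoll maxcpus=1 earlyprintk=serial,ttyS0,115200"

# 2) have purgatory itself report on the serial console
kexec -l /boot/vmlinuz-kdump --initrd=/boot/initrd-kdump.img \
      --console-serial --append="root=/dev/sda1"
```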


Thanks
Vivek


Re: Live Migration, DRBD

2008-07-24 Thread Jim
Kent Borg kentborg at borg.org writes:

 
 I am very happy to discover that KVM does live migration.  Now I am
 figuring out whether it will work for me. 
 
 What I have in mind is to use DRBD for the file system image.  The
 problem is that during the migration I want to shift the file system
 access at the moment when the VM has quit running on the host it is
 leaving but before it starts running on the host where it is arriving. 
 Is there a hook to let me do stuff at this point?
 
 This is what I want to do:
 
 On the departing machine...
 
   - VM has stopped here
   - umount the volume with the VM file system image
   - mark volume in DRBD as secondary
 
 On the arriving machine...
 
   - mark volume in DRBD as primary
   - mount the volume with the VM file system image
   - VM can now start here
 

Yes, there is a way, but first: your setup is a little strange. Why do you
take a device (the DRBD) and then put a file system on it which just contains
a file with the system image? Why not use the DRBD device directly as your
system disk?

e.g. qemu-system-x86_64 -hda /dev/drbdX

This way you do not get an extra layer of filesystem slowing things down
and taking up space, the whole of the DRBD device is directly accessible to
the guest.

Most importantly it saves the mount/umount steps in your above procedures.

When using DRBD devices directly, live migration simply requires that the
device be accessible on both nodes at the same time. In other words, live
migration assumes a shared device, which you have. The only problem is that
it needs to be opened read/write on both nodes at the same time, which means
you need to go Primary/Primary.

The recent DRBD versions support Primary/Primary, you just need to add
net { allow-two-primaries; }
to the resource section in drbd.conf

With that done you can go to the target node, make the device primary there
too, start up qemu to accept the incoming migration and migrate from the
source node.

Afterwards it is advisable to set the source node to secondary.

This procedure is safe, as apparently qemu won't start accessing the target
device until the source has been finished with and flushed. I have tested
the procedure and it worked very well.
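
Put together, the procedure might look like the following. The resource name, device, memory size, and port are placeholders, and the -incoming/migrate syntax is from the kvm userspace of this era and may differ in other versions:

```shell
# drbd.conf on both nodes, in the resource section:
#   net { allow-two-primaries; }

# On the target node: promote and wait for the incoming migration
drbdadm primary r0
qemu-system-x86_64 -hda /dev/drbd0 -m 512 -incoming tcp:0:4444

# On the source node, in the qemu monitor of the running guest:
#   (qemu) migrate tcp:target-host:4444

# After the migration completes: demote the now-idle source copy
drbdadm secondary r0
```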

Hope that helps,

Jim

P.S. I'm not subscribed to this list so please email me directly if you
need to.

