Re: Network shutdown under load

2010-02-10 Thread Anthony Liguori

On 02/08/2010 10:10 AM, Tom Lendacky wrote:

Fix a race condition where qemu finds that there are not enough virtio
ring buffers available and the guest makes more buffers available before
qemu can enable notifications.

Signed-off-by: Tom Lendacky t...@us.ibm.com
Signed-off-by: Anthony Liguori aligu...@us.ibm.com
   


Applied.  Thanks.  We should audit the code for proper barrier support.
Right now, I think there are a lot of places where we're missing them.
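
For anyone following along, here is a minimal, self-contained C11 sketch of
the check / enable-notification / re-check pattern the patch relies on, with
an explicit fence standing in for whatever barrier the real code needs.  The
names (ring_empty(), notify_enabled, avail_idx/used_idx) are illustrative
stand-ins, not QEMU's actual API:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy vring: the "guest" publishes buffers by bumping avail_idx, the
 * "host" consumes them via used_idx, and notify_enabled stands in for
 * the flag the guest checks before kicking the host. */
static atomic_uint avail_idx;
static atomic_uint used_idx;
static atomic_bool notify_enabled;

static bool ring_empty(void)
{
    return atomic_load(&avail_idx) == atomic_load(&used_idx);
}

/* Host-side check, mirroring the shape of the patched
 * virtio_net_has_buffers(): returns false only when it is safe to
 * sleep and wait for a guest notification. */
static bool has_buffers(void)
{
    if (ring_empty()) {
        atomic_store(&notify_enabled, true);    /* ask the guest to kick us */

        /* Barrier: the flag must be visible to the guest before we
         * re-read the ring; otherwise a buffer published in the window
         * between the first check and the flag update can be missed by
         * both sides (lost wakeup). */
        atomic_thread_fence(memory_order_seq_cst);

        if (ring_empty())
            return false;                       /* genuinely empty */
        /* The guest raced with us: buffers appeared, so fall through. */
    }
    atomic_store(&notify_enabled, false);       /* we'll poll from here on */
    return true;
}

int main(void)
{
    printf("empty ring: has_buffers() = %d\n", has_buffers());
    atomic_fetch_add(&avail_idx, 1);            /* guest publishes a buffer */
    printf("one buffer: has_buffers() = %d\n", has_buffers());
    return 0;
}

Without the fence (or an equivalent ordering guarantee on both sides), the
re-check can be satisfied from a stale view of the ring and the original
lost-wakeup window comes back.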


Regards,

Anthony Liguori


  hw/virtio-net.c |   10 +-
  1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 6e48997..5c0093e 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -379,7 +379,15 @@ static int virtio_net_has_buffers(VirtIONet *n, int bufsize)
         (n->mergeable_rx_bufs &&
          !virtqueue_avail_bytes(n->rx_vq, bufsize, 0))) {
         virtio_queue_set_notification(n->rx_vq, 1);
-        return 0;
+
+        /* To avoid a race condition where the guest has made some buffers
+         * available after the above check but before notification was
+         * enabled, check for available buffers again.
+         */
+        if (virtio_queue_empty(n->rx_vq) ||
+            (n->mergeable_rx_bufs &&
+             !virtqueue_avail_bytes(n->rx_vq, bufsize, 0)))
+            return 0;
     }
 
     virtio_queue_set_notification(n->rx_vq, 0);

On Friday 29 January 2010 02:06:41 pm Tom Lendacky wrote:
   

There's been some discussion of this already in the kvm list, but I want to
summarize what I've found and also include the qemu-devel list in an effort
  to find a solution to this problem.

Running a netperf test between two kvm guests results in the guest's
  network interface shutting down. I originally found this using kvm guests
  on two different machines that were connected via a 10GbE link.  However,
  I found this problem can be easily reproduced using two guests on the same
  machine.

I am running the 2.6.32 level of the kvm.git tree and the 0.12.1.2 level of
the qemu-kvm.git tree.

The setup includes two bridges, br0 and br1.

The commands used to start the guests are as follows:
usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 -drive
file=/autobench/var/tmp/cape-vm001-
raw.img,if=virtio,index=0,media=disk,boot=on -net
nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51,netdev=cape-vm001-eth0 -
netdev tap,id=cape-vm001-eth0,script=/autobench/var/tmp/ifup-kvm-
br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 -net
nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1,netdev=cape-vm001-eth1 -
netdev tap,id=cape-vm001-eth1,script=/autobench/var/tmp/ifup-kvm-
br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 -vnc :1 -monitor
telnet::5701,server,nowait -snapshot -daemonize

usr/local/bin/qemu-system-x86_64 -name cape-vm002 -m 1024 -drive
file=/autobench/var/tmp/cape-vm002-
raw.img,if=virtio,index=0,media=disk,boot=on -net
nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:61,netdev=cape-vm002-eth0 -
netdev tap,id=cape-vm002-eth0,script=/autobench/var/tmp/ifup-kvm-
br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 -net
nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:E1,netdev=cape-vm002-eth1 -
netdev tap,id=cape-vm002-eth1,script=/autobench/var/tmp/ifup-kvm-
br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 -vnc :2 -monitor
telnet::5702,server,nowait -snapshot -daemonize

The ifup-kvm-br0 script takes the (first) qemu-created tap device, brings
  it up and adds it to bridge br0.  The ifup-kvm-br1 script takes the
  (second) qemu-created tap device, brings it up and adds it to bridge
  br1.

Each ethernet device within a guest is on its own subnet.  For example:
   guest 1 eth0 has addr 192.168.100.32 and eth1 has addr 192.168.101.32
   guest 2 eth0 has addr 192.168.100.64 and eth1 has addr 192.168.101.64

On one of the guests run netserver:
   netserver -L 192.168.101.32 -p 12000

On the other guest run netperf:
   netperf -L 192.168.101.64 -H 192.168.101.32 -p 12000 -t TCP_STREAM -l 60
  -c -C -- -m 16K -M 16K

It may take more than one netperf run (I find that my second run almost
  always causes the shutdown) but the network on the eth1 links will stop
  working.

I did some debugging and found that in qemu on the guest running netserver:
  - the receive_disabled variable is set and never gets reset
  - the read_poll event handler for the eth1 tap device is disabled and
  never re-enabled
These conditions result in no packets being read from the tap device and
  sent to the guest - effectively shutting down the network.  Network
  connectivity can be restored by shutting down the guest interfaces,
  unloading the virtio_net module, re-loading the virtio_net module and
  re-starting the guest interfaces.
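
To make the failure mode concrete, here is a tiny, self-contained model
(plain C, not actual QEMU code) of the stuck state described above:
receive_disabled and the tap read_poll handler are only ever restored by a
guest notification, so if that notification is lost, the receive path stays
dead until the virtio_net module is reloaded.  The flag names mirror the
symptoms above; the helper functions are hypothetical:

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the host-side state observed above. */
static bool receive_disabled;     /* set when the guest had no rx buffers */
static bool tap_read_poll = true; /* whether we still read from the tap fd */
static bool guest_kick_arrived;   /* would be the virtqueue notification */

/* Host tried to hand a packet to the guest but the rx ring was full. */
static void rx_ring_full(void)
{
    receive_disabled = true;      /* stop delivering packets ... */
    tap_read_poll = false;        /* ... and stop polling the tap device */
}

/* Only a guest notification ("kick") restores the receive path. */
static void on_guest_kick(void)
{
    receive_disabled = false;
    tap_read_poll = true;
}

int main(void)
{
    rx_ring_full();

    /* If the race left guest-to-host notifications disabled, the kick
     * never arrives and both flags stay stuck forever: */
    if (guest_kick_arrived)
        on_guest_kick();

    printf("receive_disabled=%d tap_read_poll=%d\n",
           receive_disabled, tap_read_poll);
    return 0;
}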

I'm continuing to work on debugging this, but would appreciate it if some
  folks with more qemu network experience could try to recreate and debug
  this.

If my kernel config matters, I can provide that.

Thanks,
Tom

Re: Network shutdown under load

2010-02-09 Thread RW
Thanks for the patch! It seems to solve the problem where, under
load (> 50 MBit/s), the network goes down. I've applied the patch
to KVM 0.12.2 running on Gentoo. Host and guest are currently running
kernel 2.6.32 (kernel 2.6.30 in the guest and 2.6.32 in the host
also works for us).

Another host doing the same jobs with the same amount of traffic
and configuration, but with KVM 0.11.1, was shutting down the
network interface every 5-10 minutes today, while the patched
0.12.2 was running fine. During the time the 0.11.1 KVMs were
down, the patched one delivered 200 MBit/s without problems. Now
both hosts are running with the patched version. We're expecting
much more traffic tomorrow, so if the network is still up on
Thursday I would say the bug is fixed.

Thanks for that patch! It really was a lifesaver today :-)

- Robert


On 02/08/2010 05:10 PM, Tom Lendacky wrote:
 Fix a race condition where qemu finds that there are not enough virtio
 ring buffers available and the guest makes more buffers available before
 qemu can enable notifications.

 Signed-off-by: Tom Lendacky t...@us.ibm.com
 Signed-off-by: Anthony Liguori aligu...@us.ibm.com

  hw/virtio-net.c |   10 +-
  1 files changed, 9 insertions(+), 1 deletions(-)

 diff --git a/hw/virtio-net.c b/hw/virtio-net.c
 index 6e48997..5c0093e 100644
 --- a/hw/virtio-net.c
 +++ b/hw/virtio-net.c
 @@ -379,7 +379,15 @@ static int virtio_net_has_buffers(VirtIONet *n, int bufsize)
          (n->mergeable_rx_bufs &&
           !virtqueue_avail_bytes(n->rx_vq, bufsize, 0))) {
          virtio_queue_set_notification(n->rx_vq, 1);
 -        return 0;
 +
 +        /* To avoid a race condition where the guest has made some buffers
 +         * available after the above check but before notification was
 +         * enabled, check for available buffers again.
 +         */
 +        if (virtio_queue_empty(n->rx_vq) ||
 +            (n->mergeable_rx_bufs &&
 +             !virtqueue_avail_bytes(n->rx_vq, bufsize, 0)))
 +            return 0;
      }
 
      virtio_queue_set_notification(n->rx_vq, 0);

 On Friday 29 January 2010 02:06:41 pm Tom Lendacky wrote:
   
 There's been some discussion of this already in the kvm list, but I want to
 summarize what I've found and also include the qemu-devel list in an effort
  to find a solution to this problem.

 Running a netperf test between two kvm guests results in the guest's
  network interface shutting down. I originally found this using kvm guests
  on two different machines that were connected via a 10GbE link.  However,
  I found this problem can be easily reproduced using two guests on the same
  machine.

 I am running the 2.6.32 level of the kvm.git tree and the 0.12.1.2 level of
 the qemu-kvm.git tree.

 The setup includes two bridges, br0 and br1.

 The commands used to start the guests are as follows:
 usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 -drive
 file=/autobench/var/tmp/cape-vm001-
 raw.img,if=virtio,index=0,media=disk,boot=on -net
 nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51,netdev=cape-vm001-eth0 -
 netdev tap,id=cape-vm001-eth0,script=/autobench/var/tmp/ifup-kvm-
 br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 -net
 nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1,netdev=cape-vm001-eth1 -
 netdev tap,id=cape-vm001-eth1,script=/autobench/var/tmp/ifup-kvm-
 br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 -vnc :1 -monitor
 telnet::5701,server,nowait -snapshot -daemonize

 usr/local/bin/qemu-system-x86_64 -name cape-vm002 -m 1024 -drive
 file=/autobench/var/tmp/cape-vm002-
 raw.img,if=virtio,index=0,media=disk,boot=on -net
 nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:61,netdev=cape-vm002-eth0 -
 netdev tap,id=cape-vm002-eth0,script=/autobench/var/tmp/ifup-kvm-
 br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 -net
 nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:E1,netdev=cape-vm002-eth1 -
 netdev tap,id=cape-vm002-eth1,script=/autobench/var/tmp/ifup-kvm-
 br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 -vnc :2 -monitor
 telnet::5702,server,nowait -snapshot -daemonize

 The ifup-kvm-br0 script takes the (first) qemu-created tap device, brings
  it up and adds it to bridge br0.  The ifup-kvm-br1 script takes the
  (second) qemu-created tap device, brings it up and adds it to bridge
  br1.

 Each ethernet device within a guest is on its own subnet.  For example:
   guest 1 eth0 has addr 192.168.100.32 and eth1 has addr 192.168.101.32
   guest 2 eth0 has addr 192.168.100.64 and eth1 has addr 192.168.101.64

 On one of the guests run netserver:
   netserver -L 192.168.101.32 -p 12000

 On the other guest run netperf:
   netperf -L 192.168.101.64 -H 192.168.101.32 -p 12000 -t TCP_STREAM -l 60
  -c -C -- -m 16K -M 16K

 It may take more than one netperf run (I find that my second run almost
  always causes the shutdown) but the network on the eth1 links will stop
  working.

 I did some debugging and found that in qemu on the guest running netserver:
  - the receive_disabled variable is set and never gets reset

Re: Network shutdown under load

2010-02-08 Thread Tom Lendacky

Fix a race condition where qemu finds that there are not enough virtio
ring buffers available and the guest makes more buffers available before
qemu can enable notifications.

Signed-off-by: Tom Lendacky t...@us.ibm.com
Signed-off-by: Anthony Liguori aligu...@us.ibm.com

 hw/virtio-net.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 6e48997..5c0093e 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -379,7 +379,15 @@ static int virtio_net_has_buffers(VirtIONet *n, int bufsize)
         (n->mergeable_rx_bufs &&
          !virtqueue_avail_bytes(n->rx_vq, bufsize, 0))) {
         virtio_queue_set_notification(n->rx_vq, 1);
-        return 0;
+
+        /* To avoid a race condition where the guest has made some buffers
+         * available after the above check but before notification was
+         * enabled, check for available buffers again.
+         */
+        if (virtio_queue_empty(n->rx_vq) ||
+            (n->mergeable_rx_bufs &&
+             !virtqueue_avail_bytes(n->rx_vq, bufsize, 0)))
+            return 0;
     }
 
     virtio_queue_set_notification(n->rx_vq, 0);
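
For completeness, a hedged sketch of the guest-side half of the race
(illustrative C, not the actual Linux virtio_net driver): the guest publishes
a buffer and then kicks the host only if the host has asked for
notifications, so a buffer published in the window between the host's
emptiness check and its call to virtio_queue_set_notification() generates no
kick.  The re-check added above closes exactly that window.  All names below
are assumptions made for illustration:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint avail_idx;          /* guest -> host: buffers published */
static atomic_bool host_wants_notify;  /* host -> guest: "please kick me"  */
static atomic_uint kick_count;         /* stand-in for the notify doorbell */

static void guest_add_rx_buffer(void)
{
    /* 1. Publish the buffer first. */
    atomic_fetch_add(&avail_idx, 1);

    /* 2. Make the publish visible before reading the host's flag. */
    atomic_thread_fence(memory_order_seq_cst);

    /* 3. Kick only if the host asked for it.  If this runs after the
     *    host found the ring empty but before the host set the flag,
     *    no kick is sent; without the host's re-check of the ring
     *    after enabling notifications, both sides would then wait on
     *    each other indefinitely. */
    if (atomic_load(&host_wants_notify))
        atomic_fetch_add(&kick_count, 1);
}

int main(void)
{
    guest_add_rx_buffer();   /* host flag off: no kick is generated */
    return (int)atomic_load(&kick_count);
}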

On Friday 29 January 2010 02:06:41 pm Tom Lendacky wrote:
 There's been some discussion of this already in the kvm list, but I want to
 summarize what I've found and also include the qemu-devel list in an effort
  to find a solution to this problem.
 
 Running a netperf test between two kvm guests results in the guest's
  network interface shutting down. I originally found this using kvm guests
  on two different machines that were connected via a 10GbE link.  However,
  I found this problem can be easily reproduced using two guests on the same
  machine.
 
 I am running the 2.6.32 level of the kvm.git tree and the 0.12.1.2 level of
 the qemu-kvm.git tree.
 
 The setup includes two bridges, br0 and br1.
 
 The commands used to start the guests are as follows:
 usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 -drive
 file=/autobench/var/tmp/cape-vm001-
 raw.img,if=virtio,index=0,media=disk,boot=on -net
 nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51,netdev=cape-vm001-eth0 -
 netdev tap,id=cape-vm001-eth0,script=/autobench/var/tmp/ifup-kvm-
 br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 -net
 nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1,netdev=cape-vm001-eth1 -
 netdev tap,id=cape-vm001-eth1,script=/autobench/var/tmp/ifup-kvm-
 br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 -vnc :1 -monitor
 telnet::5701,server,nowait -snapshot -daemonize
 
 usr/local/bin/qemu-system-x86_64 -name cape-vm002 -m 1024 -drive
 file=/autobench/var/tmp/cape-vm002-
 raw.img,if=virtio,index=0,media=disk,boot=on -net
 nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:61,netdev=cape-vm002-eth0 -
 netdev tap,id=cape-vm002-eth0,script=/autobench/var/tmp/ifup-kvm-
 br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 -net
 nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:E1,netdev=cape-vm002-eth1 -
 netdev tap,id=cape-vm002-eth1,script=/autobench/var/tmp/ifup-kvm-
 br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 -vnc :2 -monitor
 telnet::5702,server,nowait -snapshot -daemonize
 
 The ifup-kvm-br0 script takes the (first) qemu-created tap device, brings
  it up and adds it to bridge br0.  The ifup-kvm-br1 script takes the
  (second) qemu-created tap device, brings it up and adds it to bridge
  br1.
 
 Each ethernet device within a guest is on its own subnet.  For example:
   guest 1 eth0 has addr 192.168.100.32 and eth1 has addr 192.168.101.32
   guest 2 eth0 has addr 192.168.100.64 and eth1 has addr 192.168.101.64
 
 On one of the guests run netserver:
   netserver -L 192.168.101.32 -p 12000
 
 On the other guest run netperf:
   netperf -L 192.168.101.64 -H 192.168.101.32 -p 12000 -t TCP_STREAM -l 60
  -c -C -- -m 16K -M 16K
 
 It may take more than one netperf run (I find that my second run almost
  always causes the shutdown) but the network on the eth1 links will stop
  working.
 
 I did some debugging and found that in qemu on the guest running netserver:
  - the receive_disabled variable is set and never gets reset
  - the read_poll event handler for the eth1 tap device is disabled and
  never re-enabled
 These conditions result in no packets being read from the tap device and
  sent to the guest - effectively shutting down the network.  Network
  connectivity can be restored by shutting down the guest interfaces,
  unloading the virtio_net module, re-loading the virtio_net module and
  re-starting the guest interfaces.
 
 I'm continuing to work on debugging this, but would appreciate it if some
  folks with more qemu network experience could try to recreate and debug
  this.
 
 If my kernel config matters, I can provide that.
 
 Thanks,
 Tom

Re: Network shutdown under load

2010-02-04 Thread RW
Hi,

thanks for that! I have a lot of hosts still running kernel 2.6.30
and KVM-88 without problems. It seems that all qemu-kvm versions
>= 0.11.0 have this problem, including the latest 0.12.2. So if one
of the enterprise distributions chooses one of these versions for
inclusion in their enterprise products, customers will definitely
get problems. This is definitely a showstopper if you can't do more
than 30-50 MBit/s over some period of time. I think kernel 2.6.32
will be chosen by a lot of distributions, but the problem still
exists there as far as I've read the mailings here.

Regards,
Robert


Cedric Peltier wrote:
 Hi,

 We encountered a similar problem yesterday after upgrading a development
 server from Ubuntu 9.04 (kernel 2.6.28) to Ubuntu 9.10 (kernel 2.6.31).
 Going back to kernel 2.6.28 has been the solution for us until now.

 Regards,


 Cédric PELTIER, société Indigo


