On Tue, Apr 01, 2014 at 09:43:42AM -0600, David Ahern wrote:
> On 4/1/14, 9:09 AM, Stefan Hajnoczi wrote:
> >On Thu, Mar 27, 2014 at 04:13:15PM -0600, David Ahern wrote:
> >>
> >>We are hitting a networking problem and hoping someone has an idea
> >>-- perhaps a known bug.
> >>
> >>After a couple of hours of runtime with low level traffic (e.g., 1
> >>sec pings) the VM stops receiving packets. In the host running tc on
> >>the tap device shows a full backlog and packets getting dropped:
> >>
> >>tc -s qdisc show dev vnet0
> >>qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1
> >>1 1 1 1 1 1 1
> >>Sent 5806496634 bytes 4163358 pkt (dropped 116079, overlimits 0 requeues 4)
> >>backlog 33834b 500p requeues 4
> >>
> >>
> >>The tap device is passed to qemu as fd=24. Running strace on the IO
> >>thread does not show the fd in the list passed to select. e.g.,
> >>
> >>select(55, [7 8 11 18 52 53 54], [], [], {1, 0}) = 1 (in [8], left
> >>{0, 872402})
> >>
> >>That would explain why the packets are not pulled from the tap
> >>device into the VM. When networking is functioning properly, you do
> >>see fd=24 in the list followed by read(24, ...).
> >>
> >>Why would qemu stop adding the fd to the list passed to select?
> >>
> >>
> >>This is qemu-kvm-1.0 (upgrading is not an option), started by
> >>libvirt (libvirt 1.0.2). The command line is rather long. Snippets:
> >>
> >>/usr/bin/kvm -M pc-1.0 -cpu host -enable-kvm -m 4096 -smp 4
> >>...
> >>-netdev tap,fd=24,id=hostnet0 -device
> >>virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:ba:55:60,bus=pci.0,addr=0xa
> >>...
> >>
> >>Host kernel: 3.2.0-60-generic
> >>Guest kernel: 3.8.0-29-generic
> >
> >Are you using vhost_net or the userspace virtio-net emulation?
>
> virtio-net.
There have been bugs in the past, like the recent:
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=68e5ec64009812dbaa03ed9cfded9344986f5304

So it's not totally surprising.

I'm afraid you either need to bisect against a newer, working QEMU
version or debug your old QEMU from first principles (using gdb it
should be possible to figure out more about why the tap file descriptor
is not in the select(2) fdset).

Stefan
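
Before reaching for gdb, it may help to pin down when the tap fd drops
out of the select(2) read set. The sketch below is only an illustration,
not part of QEMU or strace: it scans strace output for select() lines
and counts how often a given fd appears in the read set. The helper
names (read_fds, count_polls) are made up for this example, and the
regular expression assumes the strace output format shown in the quoted
report.

```python
import re

# Matches the read set of a select(2) call as printed by strace, e.g.
#   select(55, [7 8 11 18 52 53 54], [], [], {1, 0}) = 1 (in [8], ...)
# Assumption: the read set is printed as a bracketed, space-separated list.
SELECT_RE = re.compile(r"select\(\d+,\s*\[([\d\s]*)\]")


def read_fds(line):
    """Return the set of fds in the read set of one select() line, or None."""
    m = SELECT_RE.search(line)
    if m is None:
        return None  # not a select() line in the expected format
    return {int(tok) for tok in m.group(1).split()}


def count_polls(lines, fd):
    """Count select() calls whose read set includes / omits `fd`."""
    present = absent = 0
    for line in lines:
        fds = read_fds(line)
        if fds is None:
            continue  # skip read(), write(), and other syscalls
        if fd in fds:
            present += 1
        else:
            absent += 1
    return present, absent


# The select() line from the report, with fd 24 (the tap device) missing.
sample = [
    "select(55, [7 8 11 18 52 53 54], [], [], {1, 0}) = 1 (in [8], left {0, 872402})",
]
present, absent = count_polls(sample, 24)
print(f"fd 24: in read set {present} times, missing {absent} times")
```

Feeding it a trace that spans the moment networking stops (for example
one captured with strace's -tt timestamps) would show the point at which
fd 24 stops being polled, which narrows down where to set breakpoints.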