On Thu, Nov 13, 2008 at 04:24:52PM +0100, Emmanuel Lacour wrote:
> On Thu, Nov 13, 2008 at 03:12:33PM +0000, Mark McLoughlin wrote:
> > The fact that re-loading the virtio_net driver fixes things up makes me
> > suspect you've found a bug in the virtio_net driver, rather than e.g. a
> > bug in the kvm-userspace side.
> > To try and narrow down what's happening, when the interface has hung,
> > try:
> >   - tcpdump on both eth0 in the guest and the tap device on the host 
> >     (tap5 in your example)
On eth0 I see echo requests, but _no_ echo replies
On tap5 I see echo requests _and_ echo replies

> >   - look for anything unusual in the stats for both those interfaces,
> >      e.g. /proc/net/dev, netstat -s etc.
Compared with the other guests, which have no problems, the only difference is
that this tap (and only this one) reports "overruns":

tap5      Link encap:Ethernet  HWaddr 00:FF:AD:53:76:25  
          inet6 addr: fe80::2ff:adff:fe53:7625/64 Scope:Link
          RX packets:717737621 errors:0 dropped:0 overruns:0 frame:0
          TX packets:636626720 errors:0 dropped:0 overruns:317 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:368973099756 (343.6 GiB)  TX bytes:217917073227 (202.9 GiB)

The overruns seem to occur only when there is a "hang"; the counter doesn't
seem to increase while the network is working properly.
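
A minimal sketch for watching that counter over time, so a hang can be
correlated with an increase. It assumes that the "overruns" value printed by
ifconfig corresponds to the "fifo" column of /proc/net/dev; the function names
are made up for illustration.

```python
def parse_net_dev(text):
    """Parse the content of /proc/net/dev into {iface: counters}.

    Each interface line has 8 receive fields followed by 8 transmit fields:
    bytes packets errs drop fifo frame compressed multicast |
    bytes packets errs drop fifo colls carrier compressed
    """
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a counter line in the expected layout
        values = [int(f) for f in fields[:16]]
        stats[name.strip()] = {
            "rx_packets": values[1],
            "rx_fifo": values[4],    # assumed to be ifconfig's RX "overruns"
            "tx_packets": values[9],
            "tx_fifo": values[12],   # assumed to be ifconfig's TX "overruns"
        }
    return stats


def tx_overrun_delta(before, after, iface):
    """Return how much the TX fifo counter of iface grew between two samples."""
    return after[iface]["tx_fifo"] - before[iface]["tx_fifo"]


def read_net_dev(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_net_dev(f.read())
```

Calling read_net_dev() on the host before and after a hang and passing both
results to tx_overrun_delta(before, after, "tap5") would show whether the
counter moved during that interval.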

> >   - strace the /usr/bin/kvm process
Unfortunately I was unable to do this: I can't reproduce the problem on a test
VM, and since this VM is in production I can't leave it with a non-working
network for analysis. For now I have a script which pings and restarts the
module/interface when needed.
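
A rough sketch of what such a watchdog could look like (this is not the actual
script). The ping target, the interface name and the use of ifdown/ifup are
assumptions; the command runner is injectable so the logic can be tried without
touching the network.

```python
import subprocess


def ping_ok(host, run=subprocess.call):
    """Return True if a single ping to host succeeds (exit status 0)."""
    # -c 1: send one echo request, -W 2: wait at most 2 seconds for the reply
    return run(["ping", "-c", "1", "-W", "2", host]) == 0


def reload_virtio_net(iface, run=subprocess.call):
    """Take the interface down, reload virtio_net, bring the interface up."""
    commands = [
        ["ifdown", iface],                # assumes a Debian-style ifupdown setup
        ["modprobe", "-r", "virtio_net"],
        ["modprobe", "virtio_net"],
        ["ifup", iface],
    ]
    return [run(cmd) for cmd in commands]


def watchdog_step(host, iface, run=subprocess.call):
    """Run one check; reload the driver if the ping fails.

    Returns True if a reload was triggered, False otherwise.
    """
    if ping_ok(host, run):
        return False
    reload_virtio_net(iface, run)
    return True
```

Inside the guest this would be called periodically, e.g. from cron or a loop
with a sleep, as watchdog_step("192.0.2.1", "eth0"), where 192.0.2.1 stands in
for the real gateway address.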

