I know this issue has been discussed on this list before, but I am still experiencing network freezes in a guest that require a restart to clear. When the network freezes, the network interrupt counter in the guest stops incrementing (i.e., the eth0 line in /proc/interrupts in the guest). Using the crash utility, I verified that the interrupt is still enabled on the guest side and that no interrupts are pending, which suggests that the interrupts are not being delivered to the VM.
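(In case it helps anyone trying to reproduce this, the guest-side check boils down to watching that counter. A throwaway sketch of automating it, run inside the guest, is below; the interface name and the /proc/interrupts parsing are assumptions on my part, and simply eyeballing /proc/interrupts by hand works just as well.)

    /*
     * Throwaway sketch: report whether the eth0 interrupt count in the
     * guest's /proc/interrupts is still advancing.  The interface name
     * ("eth0") and the column layout are assumptions; adjust as needed.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static unsigned long eth0_irq_count(void)
    {
        FILE *f = fopen("/proc/interrupts", "r");
        char line[512];
        unsigned long total = 0;

        if (!f)
            return 0;
        while (fgets(line, sizeof(line), f)) {
            char *p;
            if (!strstr(line, "eth0"))
                continue;
            /* sum the per-CPU counters that follow the "NN:" prefix */
            for (p = strchr(line, ':'); p; ) {
                char *end;
                unsigned long v = strtoul(p + 1, &end, 10);
                if (end == p + 1)
                    break;          /* hit the controller/driver name */
                total += v;
                p = end;
            }
            break;
        }
        fclose(f);
        return total;
    }

    int main(void)
    {
        unsigned long before, after;

        before = eth0_irq_count();
        sleep(5);
        after = eth0_irq_count();
        printf("eth0 interrupts: %lu -> %lu (%s)\n",
               before, after, after > before ? "advancing" : "stalled");
        return 0;
    }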
On the host side, I attached gdb to the qemu process. When the networking freezes in the guest, the irq_state for the rtl8139 device is:

    irq_state = {1, 0, 0, 0}

My current guess is that a race condition is being tripped: either the change in irq state is not getting pushed to the VM, or the race is in clearing the interrupt. Once that transition is missed, nothing resets irq_state to 0, so no more of these interrupts are delivered to the VM. From pci_set_irq() in qemu/hw/pci.c:

    change = level - pci_dev->irq_state[irq_num];
    if (!change)
        return;

I take this to mean that as long as irq_state is 1, any further requests to set it to 1 are ignored, so the interrupt is not pushed to the VM. Also, I have only seen the network interrupt get into this state; the timer and ide interrupts appear to keep working even after the network interrupts stop.

While debugging the problem I noticed that it is easy for the guest to be overwhelmed with packets, so that the ring buffer in qemu's rtl8139 device fills up. You can see this in the fifo counter in the output of 'ethtool -S eth0' on the guest side, and I confirmed it by modifying the qemu source for the NIC to print when packets are dropped because there is no space in the ring buffer. I am seeing as many as 100+ packets dropped per second. (The code for the existing message "RTL8139: C+ Rx mode : descriptor XX is owned by host" was modified to print a summary of the number of times the message was hit per second; without that rate limiting the volume of messages was unwieldy.)

If I recompile the guest-side 8139cp driver to use a higher NAPI "weight" (e.g., 48 or 64), the number of packets dropped due to ring buffer overflow goes down dramatically (I have also run cases with the size of the ring buffer increased). A higher weight allows more packets to be pulled from the device each time cp_rx_poll() is called in the guest kernel. With the increased weight I still see network freezes, but I can usually run longer before one occurs: with the increase I can run an hour or more under network load, whereas without it I can trigger the freeze fairly quickly.

Host
----
Model:   PowerEdge 2950
CPUs:    two dual-core Xeon(R) CPU 5140 @ 2.33GHz
OS:      RHEL 5.1, x86_64
kernel:  2.6.24.2, x86_64
Network: attached to a gigabit network
kvm:     kvm-61

Guest
-----
VM:      2.5 GB RAM, 2 VCPUs, rtl8139 NIC
OS:      RHEL4 U4, 32-bit
kernel:  U6 kernel recompiled to run at 250 HZ rather than 1000 HZ
NIC:     8139cp driver

Command line:

    /usr/local/bin/qemu-system-x86_64 -localtime -m 2560 -smp 2 \
        -hda rootfs.img -hdb trace.img \
        -net nic,macaddr=00:16:3e:30:f0:32,model=rtl8139 -net tap \
        -monitor stdio -vnc :2

I am continuing to look into the irq processing on the kvm/qemu side, and I would like to know if anyone has suggestions on what to look at. This is my first foray into the kvm and qemu code, and it's a lot to take in all at once.

thanks,
david
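P.S. In case anyone wants to reproduce the drop counting: what I added to the rtl8139 model amounts to the sketch below. It is paraphrased from my local hack rather than copied, so the function name is mine and the exact hook point may differ; it gets called from the place in qemu/hw/rtl8139.c that prints the "descriptor XX is owned by host" message.

    /* Paraphrased sketch of my local instrumentation (not the exact code):
     * count C+ Rx drops and emit at most one summary line per second
     * instead of one message per dropped packet. */
    #include <stdio.h>
    #include <time.h>

    static void rtl8139_note_rx_drop(int descriptor)
    {
        static time_t last_report;
        static unsigned long dropped;
        time_t now = time(NULL);

        ++dropped;
        if (now != last_report) {
            fprintf(stderr,
                    "RTL8139: C+ Rx mode : %lu packet(s) dropped in the last "
                    "second (descriptor %d owned by host)\n",
                    dropped, descriptor);
            dropped = 0;
            last_report = now;
        }
    }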