On Fri, Aug 02, 2013 at 09:41:24PM +0200, Jan Kiszka wrote:
> On 2013-08-02 14:45, Jan Kiszka wrote:
> > On 2013-08-02 13:46, Stefan Hajnoczi wrote:
> >> On Thu, Aug 01, 2013 at 07:15:54PM +0200, Jan Kiszka wrote:
> >>> I was digging into the involved code and found something fishy:
> >>>
> >>> net/tap.c:
> >>> static void tap_send(void *opaque)
> >>> {
> >>>     ...
> >>>         size = qemu_send_packet_async(&s->nc, buf, size,
> >>>                                       tap_send_completed);
> >>>         if (size == 0) {
> >>>             tap_read_poll(s, false);
> >>>         }
> >>>
> >>> So, if tap_send is registered for the mainloop polling (i.e. can_receive
> >>> returned true before starting to poll) but qemu_send_packet_async
> >>> returns 0 now as qemu_can_send_packet/can_receive happens to report
> >>> false in the meantime, we will disable read polling. If also write
> >>> polling is off, the fd will be completely removed from the iohandler
> >>> list. But even if write polling remains on, I wonder what should bring
> >>> read polling back?
> >>
> >> This behavior seems fine to me.  Once the peer (pcnet) is able to
> >> receive again it must flush the queue, this will re-enable
> >> tap_read_poll().
> >>
> >> Can you explain a bit more why this would be a problem?
> >
> > The problem is that I don't see at all what will call tap_read_poll(s,
> > 1), neither in theory nor in reality.
> >
> > As long as the real test case is out of reach, I tried to emulate the
> > faulty behaviour by letting tap_can_send always return 1. Result:
> > reception stalls during boot as even qemu_flush_queued_packets cannot
> > get it running again once tap_read_poll(s, 0) was called.
>
> OK, false alarm. The issue was most likely fixed by commit 199ee608
> (net: fix qemu_flush_queued_packets() in presence of a hub) which is
> present in 1.5.x but not 1.3.x. We initially tried to test on 1.5 but
> had to roll back to 1.3 due to other issues - and missed this fix.
>
> My understanding of the networking maze was confused by the unfortunate
> naming of the incoming net client queues ("send_queue") - will propose a
> renaming.
>
> This still requires a confirmation on the target, but I'm quite
> optimistic now.

Okay, good to hear.  It makes more sense now and I agree that
"send_queue" is not a great name.

Stefan
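For anyone following the thread later, the mechanism discussed above can
be sketched as a small toy model. This is only an illustration under
stated assumptions: every toy_* name below is hypothetical and none of it
is the real QEMU net layer. It assumes that the completion callback
passed to the send function is what switches read polling back on, and
that the peer flushes its incoming queue once it can receive again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all toy_* types and functions are hypothetical
 * stand-ins for illustration, not the real QEMU net layer. */

struct toy_tap;
typedef void (*toy_sent_cb)(struct toy_tap *tap);

struct toy_peer {
    bool can_receive;     /* e.g. the NIC currently has free RX buffers */
    size_t received;      /* packets delivered to the peer */
    size_t queued;        /* packets parked in the peer's incoming queue */
    toy_sent_cb on_sent;  /* completion callback for the parked packets */
};

struct toy_tap {
    bool read_poll;       /* is the tap fd polled for reading? */
    struct toy_peer *peer;
};

/* Completion callback: a parked packet was delivered, resume polling. */
void toy_tap_send_completed(struct toy_tap *tap)
{
    tap->read_poll = true;
}

/* Returns 1 if delivered at once, 0 if the packet had to be queued. */
int toy_send_packet_async(struct toy_tap *tap, toy_sent_cb cb)
{
    struct toy_peer *peer = tap->peer;

    if (!peer->can_receive) {
        peer->queued++;
        peer->on_sent = cb;
        return 0;
    }
    peer->received++;
    return 1;
}

/* Mirrors the quoted tap_send() fragment: stop polling when the send
 * function reports that the packet was queued. */
void toy_tap_send(struct toy_tap *tap)
{
    if (toy_send_packet_async(tap, toy_tap_send_completed) == 0) {
        tap->read_poll = false;
    }
}

/* Called on behalf of the peer once it is able to receive again. */
void toy_flush_queued_packets(struct toy_tap *tap)
{
    struct toy_peer *peer = tap->peer;

    assert(peer->can_receive);
    while (peer->queued > 0) {
        peer->queued--;
        peer->received++;
        if (peer->on_sent) {
            peer->on_sent(tap);  /* this is what re-enables read polling */
        }
    }
}
```

In this model, calling toy_tap_send() while the peer cannot receive
leaves read_poll false, and only toy_flush_queued_packets() turns it back
on. If the flush never reaches the queue holding the packet, which is
what the title of commit 199ee608 suggests could happen with a hub in
between, polling stays off and reception stalls.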