On Wed, Jun 27, 2012 at 08:41:49AM +0100, Stefan Hajnoczi wrote:
> On Wed, Jun 27, 2012 at 8:39 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> > On Tue, Jun 26, 2012 at 8:34 PM, Marcelo Tosatti <mtosa...@redhat.com>
> > wrote:
> >> On Sat, Jun 23, 2012 at 12:55:49AM +0200, Jan Kiszka wrote:
> >> net.txt
> >> --------
> >>
> >> iothread flow
> >> =============
> >>
> >> 1) Skip-work-if-device-locked
> >>
> >> select(tap fd ready)
> >> tap_send
> >>     if (trylock(TAPState->NetClientState->dev))
> >>         proceed;
> >>     else
> >>         continue; (data remains in queue)
> >> tap_read_packet
> >> qemu_send_packet_async
> >>
> >> DRAWBACK: lock intensive.
> >> DRAWBACK: VCPU threads can starve IOTHREAD (can be fixed with
> >> a scheme such as trylock() marks a flag saying "iothread wants lock").
> >
> > One alternative is to say the lock cannot be held across system calls
> > or other long-running operations (similar to interrupt handler
> > spinlocks in the kernel). But QEMU threads are still at the mercy of
> > the scheduler or page faults, so perhaps this is not a good idea
> > because waiting on the lock could tie up the iothread.
> >
> >> 2) Queue-work-to-vcpu-context
> >>
> >> select(tap fd ready)
> >> tap_send
> >>     if (trylock(TAPState->NetClientState->dev)) {
> >>         proceed;
> >>     } else {
> >>         AddToDeviceWorkQueue(work);
> >>     }
> >> tap_read_packet
> >> qemu_send_packet_async
> >>
> >> DRAWBACK: lock intensive
> >> DRAWBACK: cache effects
> >
> > This almost suggests we should invert packet reception for NICs. tap
> > fd ready should add a work item and the guest device calls into
> > net/tap.c to pull out any packets from the work function:
> >
> > tap_send
> >     dev_add_work(work);
> >
> > virtio_net_rx_work_fn
> >     while ((req = virtqueue_pop(rx_queue))) {
> >         if (!net_receive_packet(netclient, req->buf)) {
> >             break; /* no more packets available */
> >         }
> >         /* ...mark req complete and push onto virtqueue... */
> >     }
> >     virtqueue_notify(rx_queue);
> >
> > The idea is to avoid copies on the rx path and make the locking simple
> > by always deferring to a work function (which can be invoked
> > immediately if the device isn't locked).
>
> I just realized this approach bypasses net/queue.c. I think it works
> really well for the peer model (no "VLANs"). Basically flow control
> is dictated by the peer (receiver) and we don't need to use NetQueue
> anymore. Whether this works elegantly for slirp and other backends,
> I'm not sure yet.
Right. The advantage is that only backends where performance matters need to be converted (and, on the device side, only the NIC models that matter performance-wise).