On (Thu) 01 Nov 2012 [14:04:11], Michael S. Tsirkin wrote: > On Thu, Nov 01, 2012 at 02:49:18PM +0530, Amit Shah wrote: > > On (Tue) 23 Oct 2012 [14:55:03], Stefan Hajnoczi wrote: > > > On Mon, Oct 22, 2012 at 06:50:00AM -0700, Edivaldo de Araujo Pereira > > > wrote: > > > > I didn't take enough time to uderstand the code, so unfortunately I > > > > fear there is not much I could do to solve the problem, apart from > > > > trying your suggestions. But I'll try to spend a little more time on > > > > it, until we find a solution. > > > > > > I've thought a little about how to approach this. Amit, here's a brain > > > dump: > > > > > > The simplest solution is to make virtqueue_avail_bytes() use the old > > > behavior of stopping early. > > > > > > However, I wonder if we can actually *improve* performance of existing > > > code by changing virtio-net.c:virtio_net_receive(). The intuition is > > > that calling virtio_net_has_buffers() (internally calls > > > virtqueue_avail_bytes()) followed by virtqueue_pop() is suboptimal > > > because we're repeatedly traversing the descriptor chain. > > > > > > We can get rid of this repetition. A side-effect of this is that we no > > > longer need to call virtqueue_avail_bytes() from virtio-net.c. Here's > > > how: > > > > > > The common case in virtio_net_receive() is that we have buffers and they > > > are large enough for the received packet. So to optimize for this case: > > > > > > 1. Take the VirtQueueElement off the vring but don't increment > > > last_avail_idx yet. (This is essentially a "peek" operation.) > > > > > > 2. If there is an error or we drop the packet because the > > > VirtQueueElement is too small, just bail out and we'll grab the same > > > VirtQueueElement again next time. > > > > > > 3. When we've committed filling in this VirtQueueElement, increment > > > last_avail_idx. This is the point of no return. > > > > > > Essentially we're splitting pop() into peek() and consume(). Peek() > > > grabs the VirtQueueElement but does not increment last_avail_idx. > > > Consume() simply increments last_avail_idx and maybe the EVENT_IDX > > > optimization stuff. > > > > > > Whether this will improve performance, I'm not sure. Perhaps > > > virtio_net_has_buffers() pulls most descriptors into the CPU's cache and > > > following up with virtqueue_pop() is very cheap already. But the idea > > > here is to avoid the virtio_net_has_buffers() because we'll find out > > > soon enough when we try to pop :). > > > > This sounds doable -- adding mst for comments. > > > > > Another approach would be to drop virtio_net_has_buffers() but continue > > > to use virtqueue_pop(). We'd keep the same VirtQueueElem stashed in > > > VirtIONet across virtio_net_receive() calls in the case where we drop > > > the packet. I don't like this approach very much though because it gets > > > tricky when the guest modifies the vring memory, resets the virtio > > > device, etc across calls. > > > > Right. > > > > Also, save/load will become slightly complicated in both these > > cases, but it might be worth it. > > > > Michael, can you comment pls? > > > > It will also complicate switching to/from vhost-net. > > If this patch helps serial but degrades speed for -net I'm inclined > to simply make serial and net user different codepaths.
There's an opportunity for optimisation, let's not discount it so quickly. The reporter also pointed out there was ~ +20% difference with tun/tap, so eliminating the call to virtqueue_avail_bytes() can help overall. Amit