[kvm-devel] [Virtio-for-kvm] [PATCH 7/7] userspace virtio
From f244bcad756c4f761627557bb7f315b1d8f22fb2 Mon Sep 17 00:00:00 2001
From: Dor Laor <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 13:26:30 +0200
Subject: [PATCH] [VIRTIO-NET] Rx performance improvement

The current performance is not good enough; the problem lies in the
qemu tap handling code, which passes packets one at a time and also
copies them to a temporary buffer.

This patch prevents qemu handlers from reading the tap and instead
selects the tap descriptors for virtio devices.
This eliminates copies and also batches guest notifications (interrupts).

Using this patch the rx performance reaches 800Mbps.
The patch does not follow qemu's api since the intention is first to have
better io in kvm and then to polish it correctly.

Signed-off-by: Dor Laor <[EMAIL PROTECTED]>
---
 qemu/hw/pc.h         |    2 +-
 qemu/hw/virtio-net.c |  114 +++--
 qemu/sysemu.h        |    3 +
 qemu/vl.c            |   23 ++-
 4 files changed, 107 insertions(+), 35 deletions(-)

diff --git a/qemu/hw/pc.h b/qemu/hw/pc.h
index 95471f3..5d4c747 100644
--- a/qemu/hw/pc.h
+++ b/qemu/hw/pc.h
@@ -145,7 +145,7 @@ void isa_ne2000_init(int base, qemu_irq irq, NICInfo *nd);
 /* virtio-net.c */

 void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn);
-
+void virtio_net_poll(void);

 /* virtio-blk.h */
 void *virtio_blk_init(PCIBus *bus, uint16_t vendor, uint16_t device,
diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index f6f1f28..b955a5e 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -60,8 +60,13 @@ typedef struct VirtIONet
     VirtQueue *tx_vq;
     VLANClientState *vc;
     int can_receive;
+    int tap_fd;
+    struct VirtIONet *next;
+    int do_notify;
 } VirtIONet;

+static VirtIONet *VirtIONetHead = NULL;
+
 static VirtIONet *to_virtio_net(VirtIODevice *vdev)
 {
     return (VirtIONet *)vdev;
@@ -96,42 +101,81 @@ static int virtio_net_can_receive(void *opaque)
     return (n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK) && n->can_receive;
 }

-static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
+void virtio_net_poll(void)
 {
-    VirtIONet *n = opaque;
+    VirtIONet *vnet;
+    int len;
+    fd_set rfds;
+    struct timeval tv;
+    int max_fd = -1;
     VirtQueueElement elem;
     struct virtio_net_hdr *hdr;
-    int offset, i;
-
-    /* FIXME: the drivers really need to set their status better */
-    if (n->rx_vq->vring.avail == NULL) {
-        n->can_receive = 0;
-        return;
-    }
-
-    if (virtqueue_pop(n->rx_vq, &elem) == 0) {
-        /* wait until the guest adds some rx bufs */
-        n->can_receive = 0;
-        return;
-    }
-
-    hdr = (void *)elem.in_sg[0].iov_base;
-    hdr->flags = 0;
-    hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
-
-    /* copy in packet.  ugh */
-    offset = 0;
-    i = 1;
-    while (offset < size && i < elem.in_num) {
-        int len = MIN(elem.in_sg[i].iov_len, size - offset);
-        memcpy(elem.in_sg[i].iov_base, buf + offset, len);
-        offset += len;
-        i++;
-    }
+    int did_notify;
+
+    FD_ZERO(&rfds);
+    tv.tv_sec = 0;
+    tv.tv_usec = 0;
+
+    while (1) {
+
+        /* Prepare the set of devices to select from */
+        for (vnet = VirtIONetHead; vnet; vnet = vnet->next) {
+
+            vnet->do_notify = 0;
+            /* first check if the driver is ok */
+            if (!virtio_net_can_receive(vnet))
+                continue;
+
+            /* FIXME: the drivers really need to set their status better */
+            if (vnet->rx_vq->vring.avail == NULL) {
+                vnet->can_receive = 0;
+                continue;
+            }
+
+            FD_SET(vnet->tap_fd, &rfds);
+            if (max_fd < vnet->tap_fd)
+                max_fd = vnet->tap_fd;
+        }
+
+        if (select(max_fd + 1, &rfds, NULL, NULL, &tv) <= 0)
+            break;
+
+        /* Now check who has data pending in the tap */
+        for (vnet = VirtIONetHead; vnet; vnet = vnet->next) {
+
+            if (!FD_ISSET(vnet->tap_fd, &rfds))
+                continue;
+
+            if (virtqueue_pop(vnet->rx_vq, &elem) == 0) {
+                vnet->can_receive = 0;
+                continue;
+            }
+
+            hdr = (void *)elem.in_sg[0].iov_base;
+            hdr->flags = 0;
+            hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
+again:
+            len = readv(vnet->tap_fd, &elem.in_sg[1], elem.in_num - 1);
+            if (len == -1) {
+                if (errno == EINTR || errno == EAGAIN)
+                    goto again;
+                else
+                    fprintf(stderr, "reading network error %d", len);
+            }
+
+            virtqueue_push(vnet->rx_vq, &elem, sizeof(*hdr) + len);
+            vnet->do_notify = 1;
+        }
+
+        /* signal other side */
+        did_notify = 0;
+        for (vnet = VirtIONetHead; vnet; vnet = vnet->next)
+            if (vnet->do_notify) {
+                virtio_notify(&vnet->vdev, vnet->rx_vq);
+                did_notify++;
Re: [kvm-devel] [Virtio-for-kvm] [PATCH 7/7] userspace virtio
Dor Laor wrote:
> From f244bcad756c4f761627557bb7f315b1d8f22fb2 Mon Sep 17 00:00:00 2001
> From: Dor Laor <[EMAIL PROTECTED]>
> Date: Thu, 20 Dec 2007 13:26:30 +0200
> Subject: [PATCH] [VIRTIO-NET] Rx performance improvement
>
> The current performance is not good enough; the problem lies in the
> qemu tap handling code, which passes packets one at a time and also
> copies them to a temporary buffer.
>
> This patch prevents qemu handlers from reading the tap and instead
> selects the tap descriptors for virtio devices.
> This eliminates copies and also batches guest notifications (interrupts).
>
> Using this patch the rx performance reaches 800Mbps.
> The patch does not follow qemu's api since the intention is first to have
> better io in kvm and then to polish it correctly.
>

This breaks -net user, which is one of the motivations for having a qemu net device. We need to maintain the old path for that, and only use the new fast path if using -net tap.

--
error compiling committee.c: too many arguments to function

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [Virtio-for-kvm] [PATCH 7/7] userspace virtio
I think we should hold off on this sort of patch at first. I know it improves performance, but it's very hack-ish. I have a similar patch[1] that improves performance more but is even more hack-ish.

I think we have to approach this not by special-casing virtio-net to know about the tap fd, but by figuring out the interface that virtio-net would need to be efficient, and then refactoring the net interface to look like that. Then we can still support user, pcap, and the other network transports.

[1] http://hg.codemonkey.ws/qemu-virtio/file/75cefe566cea/aio-net.diff

Regards,

Anthony Liguori

Dor Laor wrote:
> From f244bcad756c4f761627557bb7f315b1d8f22fb2 Mon Sep 17 00:00:00 2001
> From: Dor Laor <[EMAIL PROTECTED]>
> Date: Thu, 20 Dec 2007 13:26:30 +0200
> Subject: [PATCH] [VIRTIO-NET] Rx performance improvement
>
> The current performance is not good enough; the problem lies in the
> qemu tap handling code, which passes packets one at a time and also
> copies them to a temporary buffer.
>
> This patch prevents qemu handlers from reading the tap and instead
> selects the tap descriptors for virtio devices.
> This eliminates copies and also batches guest notifications (interrupts).
>
> Using this patch the rx performance reaches 800Mbps.
> The patch does not follow qemu's api since the intention is first to have
> better io in kvm and then to polish it correctly.
Re: [kvm-devel] [Virtio-for-kvm] [PATCH 7/7] userspace virtio
Anthony Liguori wrote:
> I think we should hold off on this sort of patch at first. I know it
> improves performance, but it's very hack-ish. I have a similar patch[1]
> that improves performance more but is even more hack-ish.
>
> I think we have to approach this not by special-casing virtio-net to
> know about the tap fd, but by figuring out the interface that
> virtio-net would need to be efficient, and then refactoring the net
> interface to look like that. Then we can still support user, pcap,
> and the other network transports.
>
> [1] http://hg.codemonkey.ws/qemu-virtio/file/75cefe566cea/aio-net.diff
>

While you are right in principle, high performance networking is long overdue in kvm, so I applied the patch. Once a mega async dma framework is added to qemu, we'll revert this patch before adding the glue to said framework.

--
error compiling committee.c: too many arguments to function
Re: [kvm-devel] [Virtio-for-kvm] [PATCH 7/7] userspace virtio
Avi Kivity wrote:
> Anthony Liguori wrote:
>> I think we should hold off on this sort of patch at first. I know it
>> improves performance, but it's very hack-ish. I have a similar
>> patch[1] that improves performance more but is even more hack-ish.
>>
>> I think we have to approach this not by special-casing virtio-net to
>> know about the tap fd, but by figuring out the interface that
>> virtio-net would need to be efficient, and then refactoring the net
>> interface to look like that. Then we can still support user, pcap,
>> and the other network transports.
>>
>> [1] http://hg.codemonkey.ws/qemu-virtio/file/75cefe566cea/aio-net.diff
>>
>
> While you are right in principle, high performance networking is long
> overdue in kvm, so I applied the patch. Once a mega async dma
> framework is added to qemu, we'll revert this patch before adding
> the glue to said framework.
>

I second Avi: the reason we wanted the quick (and ugly) performance patch merged is that people are trying kvm and sometimes get disappointed by its IO performance. Since we have had unmerged pv net for quite a while, we're pushing to get it merged. We can't wait for the qemu guys to merge it, and we also plan to develop the host kernel pv network side.

btw: My hack keeps -net user supported, but with the previous performance. I'll add the tx improvement too. Thanks for the backward compat patch :)

Dor