Re: [Qemu-devel] RFC: handling backend too fast in virtio-net
On Fri, Feb 15, 2013 at 11:24:29AM +0100, Stefan Hajnoczi wrote:
> On Thu, Feb 14, 2013 at 07:21:57PM +0100, Luigi Rizzo wrote:
> > CCed Michael Tsirkin
> >
> > virtio-style network devices (where the producer and consumer chase
> > each other through a shared memory block) can enter a bad operating
> > regime when the consumer is too fast. I am hitting this case right
> > now when virtio is used on top of the netmap/VALE backend that I
> > posted a few weeks ago: the backend is so fast that the io thread
> > keeps re-enabling notifications every few packets. This results, on
> > my test machine, in a throughput of 250-300 Kpps (and extremely
> > unstable, oscillating between 200 and 600 Kpps).
> >
> > I'd like to get some feedback on the following trivial patch, which
> > has the thread keep spinning for a bounded amount of time when the
> > producer is slower than the consumer. This gives a relatively stable
> > throughput between 700 and 800 Kpps (we have something similar in our
> > paravirtualized e1000 driver, which is slightly faster at
> > 900-1100 Kpps).
>
> Did you experiment with a tx timer instead of the bh? It seems that
> hw/virtio-net.c has two tx mitigation strategies - the bh approach
> that you've tweaked, and a true timer.
>
> It seems you don't really want tx batching, but you do want to avoid
> guest-host notifications?

One more thing I forgot: virtio-net does not use ioeventfd by default.
ioeventfd changes the cost of guest-host notifications: the notification
becomes an eventfd signal inside the kernel, and kvm.ko then re-enters
the guest. This means a guest-host notification becomes a light-weight
exit, and we don't return from ioctl(KVM_RUN).

Perhaps -device virtio-net-pci,ioeventfd=on will give similar results to
your patch?

Stefan
Re: [Qemu-devel] RFC: handling backend too fast in virtio-net
On Mon, Feb 18, 2013 at 1:50 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
> [...]
>
> One more thing I forgot: virtio-net does not use ioeventfd by default.
> ioeventfd changes the cost of guest-host notifications: the
> notification becomes an eventfd signal inside the kernel, and kvm.ko
> then re-enters the guest. This means a guest-host notification becomes
> a light-weight exit, and we don't return from ioctl(KVM_RUN).
>
> Perhaps -device virtio-net-pci,ioeventfd=on will give similar results
> to your patch?

Is ioeventfd the mechanism used by vhost-net?

If so, Giuseppe Lettieri (in Cc) has tried that with a modified netmap
backend and experienced the same problem -- making the io thread (user
or kernel) spin a bit more greatly benefits throughput.

cheers
luigi
Re: [Qemu-devel] RFC: handling backend too fast in virtio-net
On Mon, Feb 18, 2013 at 02:12:13AM -0800, Luigi Rizzo wrote:
> [...]
>
> Is ioeventfd the mechanism used by vhost-net?
>
> If so, Giuseppe Lettieri (in Cc) has tried that with a modified netmap
> backend and experienced the same problem -- making the io thread (user
> or kernel) spin a bit more greatly benefits throughput.

Yes, ioeventfd is how the vhost_net kernel module is notified by the
guest.

It's easy to test, just use -device virtio-net-pci,ioeventfd=on,...

Stefan
Re: [Qemu-devel] RFC: handling backend too fast in virtio-net
On Thu, Feb 14, 2013 at 07:21:57PM +0100, Luigi Rizzo wrote:
> CCed Michael Tsirkin
>
> virtio-style network devices (where the producer and consumer chase
> each other through a shared memory block) can enter a bad operating
> regime when the consumer is too fast. I am hitting this case right now
> when virtio is used on top of the netmap/VALE backend that I posted a
> few weeks ago: the backend is so fast that the io thread keeps
> re-enabling notifications every few packets. This results, on my test
> machine, in a throughput of 250-300 Kpps (and extremely unstable,
> oscillating between 200 and 600 Kpps).
>
> I'd like to get some feedback on the following trivial patch, which
> has the thread keep spinning for a bounded amount of time when the
> producer is slower than the consumer. This gives a relatively stable
> throughput between 700 and 800 Kpps (we have something similar in our
> paravirtualized e1000 driver, which is slightly faster at
> 900-1100 Kpps).

Did you experiment with a tx timer instead of the bh? It seems that
hw/virtio-net.c has two tx mitigation strategies - the bh approach that
you've tweaked, and a true timer.

It seems you don't really want tx batching, but you do want to avoid
guest-host notifications?

> EXISTING LOGIC: reschedule the bh when at least a burst of packets has
> been received. Otherwise enable notifications and do another
> race-prevention check.
>
> NEW LOGIC: when the bh is scheduled the first time, establish a budget
> (currently 20) of reschedule attempts. Every time
> virtio_net_flush_tx() returns 0 packets [this could be changed to
> 'less than a full burst'] the counter is increased. The bh is
> rescheduled until the counter reaches the budget, at which point we
> re-enable notifications as above.
>
> I am not 100% happy with this patch because there is a magic constant
> (the maximum number of retries) which should really be adaptive, or
> perhaps we should even bound the amount of time the bh runs rather
> than the number of retries. In practice, having the thread spin (or
> sleep) for 10-20us even without new packets is probably preferable to
> re-enabling notifications and requesting a kick.
>
> opinions ?
>
> cheers
> luigi
>
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index 573c669..5389088 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -49,6 +51,7 @@ typedef struct VirtIONet
>      NICState *nic;
>      uint32_t tx_timeout;
>      int32_t tx_burst;
> +    int32_t tx_retries; /* keep spinning a bit on tx */
>      uint32_t has_vnet_hdr;
>      size_t host_hdr_len;
>      size_t guest_hdr_len;
> @@ -1062,7 +1065,9 @@ static void virtio_net_tx_bh(void *opaque)
>      /* If we flush a full burst of packets, assume there are
>       * more coming and immediately reschedule */
> -    if (ret >= n->tx_burst) {
> +    if (ret == 0)
> +        n->tx_retries++;
> +    if (n->tx_retries < 20) {
>          qemu_bh_schedule(q->tx_bh);
>          q->tx_waiting = 1;
>          return;
> @@ -1076,6 +1082,8 @@ static void virtio_net_tx_bh(void *opaque)
>          virtio_queue_set_notification(q->tx_vq, 0);
>          qemu_bh_schedule(q->tx_bh);
>          q->tx_waiting = 1;
> +    } else {
> +        n->tx_retries = 0;
>      }
>  }
Re: [Qemu-devel] RFC: handling backend too fast in virtio-net
On Fri, Feb 15, 2013 at 11:24:29AM +0100, Stefan Hajnoczi wrote:
> [...]
>
> Did you experiment with a tx timer instead of the bh? It seems that
> hw/virtio-net.c has two tx mitigation strategies - the bh approach
> that you've tweaked, and a true timer.
>
> It seems you don't really want tx batching, but you do want to avoid
> guest-host notifications?

I have just tried:

  bh (with my tweaks):       700-800 Kpps (large variance)
  timer (150us, 256 slots):  ~490 Kpps (very stable)

As expected, the throughput in the timer version seems proportional to
ring_size/timeout, e.g. 1 ms and 256 slots give approx 210 Kpps, 1 ms
and 128 slots give 108 Kpps. But it saturates around 490 Kpps.

cheers
luigi