Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Sat, 28 May 2011 23:02:04 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> On Thu, May 26, 2011 at 12:58:23PM +0930, Rusty Russell wrote:
> > ie. free two packets for every one we're about to add.  For steady state that would work really well.
>
> Sure, with indirect buffers, but if we don't use indirect (and we discussed switching indirect off dynamically in the past) this becomes harder to be sure about.  I think I understand why, but doesn't a simple capacity check make it more obvious?
>
> > ... Then we hit the case where the ring seems full after we do the add: at that point, screw latency, and just try to free all the buffers we can.
>
> I see.  But the code currently does this:
>
> 	for (..)
> 		get_buf
> 	add_buf
> 	if (capacity < max_sk_frags + 2) {
> 		if (!enable_cb)
> 			for (..)
> 				get_buf
> 	}
>
> In other words the second get_buf is only called in the unlikely case of a race condition.  So we'll need to add *another* call to get_buf.  Is it just me, or is this becoming messy?

Yes, good point.

I really wonder if anyone would be able to measure the difference between simply freeing 2 every time (with possible extra stalls for strange cases) and the more complete version.

But it runs against my grain to implement heuristics when one more call would make it provably reliable.

Please find a way to make that for loop less ugly though!

Thanks,
Rusty.
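For readers following the thread, here is a minimal sketch of the scheme being debated above: free two completed buffers per packet queued, and only if the ring still looks full after the add, forget latency and drain everything available.  The helpers get_buf(), add_buf() and ring_capacity() are placeholders for the real virtqueue calls, not actual API.

	/* Sketch only: steady-state reclaim of two completed packets per xmit,
	 * with a drain-everything fallback for the rare "ring looks full" case. */
	static void xmit_one_sketch(struct sk_buff *skb)
	{
		struct sk_buff *done;
		int i;

		/* Normal path: free up to two completed packets. */
		for (i = 0; i < 2; i++) {
			done = get_buf();	/* placeholder for virtqueue_get_buf() */
			if (!done)
				break;
			kfree_skb(done);
		}

		add_buf(skb);			/* placeholder for virtqueue_add_buf() */

		/* Rare path: the ring still looks full after the add, so reclaim
		 * everything we can before deciding to stop the queue. */
		if (ring_capacity() < 2 + MAX_SKB_FRAGS)
			while ((done = get_buf()) != NULL)
				kfree_skb(done);
	}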
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Wed, 25 May 2011 09:07:59 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> On Wed, May 25, 2011 at 11:05:04AM +0930, Rusty Russell wrote:
>
> Hmm I'm not sure I got it, need to think about this.
> I'd like to go back and document how my design was supposed to work.
> This really should have been in the commit log or even a comment.
> I thought we need a min, not a max.
>
> We start with this:
>
> 	while ((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS)) &&
> 	       (skb = get_buf))
> 		kfree_skb(skb);
> 	return !c;
>
> This is clean and simple, right?  And it's exactly asking for what we need.

No, I started from the other direction:

	for (i = 0; i < 2; i++) {
		skb = get_buf();
		if (!skb)
			break;
		kfree_skb(skb);
	}

ie. free two packets for every one we're about to add.  For steady state that would work really well.

Then we hit the case where the ring seems full after we do the add: at that point, screw latency, and just try to free all the buffers we can.

> on the normal path min == 2 so we're low latency but we keep ahead on average.
> min == 0 for the "we're out of capacity, we may have to stop the queue" case.
>
> Does the above make sense at all?

It makes sense, but I think it's a classic case where incremental improvements aren't as good as starting from scratch.

Cheers,
Rusty.
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Wed, May 25, 2011 at 11:05:04AM +0930, Rusty Russell wrote:
> On Mon, 23 May 2011 14:19:00 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> > I do understand how it seems a waste to leave direct space in the ring while we might in practice have space due to indirect.  Didn't come up with a nice way to solve this yet - but 'no worse than now :)'
>
> Let's just make it bool free_old_xmit_skbs(unsigned int max).
>
> max == 2 for the normal xmit path, so we're low latency but we keep ahead on average.
> max == -1 for the "we're out of capacity, we may have to stop the queue" case.
>
> That keeps it simple and probably the right thing...
>
> Thanks,
> Rusty.

Hmm I'm not sure I got it, need to think about this.
I'd like to go back and document how my design was supposed to work.
This really should have been in the commit log or even a comment.
I thought we need a min, not a max.

We start with this:

	while ((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS)) &&
	       (skb = get_buf))
		kfree_skb(skb);
	return !c;

This is clean and simple, right?  And it's exactly asking for what we need.
But this way we always keep a lot of memory in skbs even when the rate of communication is low.  So we add the min parameter:

	int n = 0;
	while (((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS)) ||
	        n++ < min) &&
	       (skb = get_buf))
		kfree_skb(skb);
	return !c;

on the normal path min == 2 so we're low latency but we keep ahead on average.
min == 0 for the "we're out of capacity, we may have to stop the queue" case.

Does the above make sense at all?

-- 
MST
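A compilable rendering of the loop Michael sketches above, as a sketch only: virtqueue_get_capacity() is the helper this patch series adds (not mainline API), and the function name and the omission of the driver's tx statistics updates are assumptions made for brevity.

	#include <linux/skbuff.h>
	#include <linux/netdevice.h>
	#include <linux/virtio.h>

	/* Free completed tx skbs: keep reclaiming while worst-case capacity is
	 * still short of what one packet needs, and free at least 'min' anyway.
	 * Returns true when enough capacity is available afterwards. */
	static bool free_old_xmit_skbs_min(struct virtqueue *vq, unsigned int min)
	{
		struct sk_buff *skb;
		unsigned int len, n = 0;
		bool short_of_space;

		while (((short_of_space =
			 (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS)) ||
			n++ < min) &&
		       (skb = virtqueue_get_buf(vq, &len)))
			dev_kfree_skb_any(skb);

		return !short_of_space;
	}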
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
Michael S. Tsirkin m...@redhat.com wrote on 05/23/2011 04:49:00 PM:

> > To do this properly, we should really be using the actual number of sg elements needed, but we'd have to do most of xmit_skb beforehand so we know how many.
> >
> > Cheers,
> > Rusty.
>
> Maybe I'm confused here.  The problem isn't the failing add_buf for the given skb IIUC.  What we are trying to do here is stop the queue *before xmit_skb fails*.  We can't look at the number of fragments in the current skb - the next one can be much larger.  That's why we check capacity after xmit_skb, not before it, right?

Maybe Rusty means it is a simpler model to free the amount of space that this xmit needs.  We will still fail anyway at some time, but it is unlikely, since an earlier iteration freed up at least the space that it was going to use.  The code could become much simpler:

start_xmit()
{
	num_sgs = get num_sgs for this skb;

	/* Free enough pending old buffers to enable queueing this one */
	free_old_xmit_skbs(vi, num_sgs * 2);	/* ?? */

	if (virtqueue_get_capacity() < num_sgs) {
		netif_stop_queue(dev);
		if (virtqueue_enable_cb_delayed(vi->svq) ||
		    free_old_xmit_skbs(vi, num_sgs)) {
			/* Nothing freed up, or not enough freed up */
			kfree_skb(skb);
			return NETDEV_TX_OK;
		}
		netif_start_queue(dev);
		virtqueue_disable_cb(vi->svq);
	}

	/* xmit_skb cannot fail now, also pass 'num_sgs' */
	xmit_skb(vi, skb, num_sgs);
	virtqueue_kick(vi->svq);

	skb_orphan(skb);
	nf_reset(skb);

	return NETDEV_TX_OK;
}

We could even return TX_BUSY since that makes the dequeue code more efficient.  See dev_dequeue_skb() - you can skip a lot of code (and avoid taking locks) to check if the queue is already stopped, but that code runs only if you return TX_BUSY in the earlier iteration.

BTW, shouldn't the check in start_xmit be:

	if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
		...
	}

Thanks,

- KK
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Tue, May 24, 2011 at 01:24:15PM +0530, Krishna Kumar2 wrote:
> Michael S. Tsirkin m...@redhat.com wrote on 05/23/2011 04:49:00 PM:
>
> > > To do this properly, we should really be using the actual number of sg elements needed, but we'd have to do most of xmit_skb beforehand so we know how many.
> > >
> > > Cheers,
> > > Rusty.
> >
> > Maybe I'm confused here.  The problem isn't the failing add_buf for the given skb IIUC.  What we are trying to do here is stop the queue *before xmit_skb fails*.  We can't look at the number of fragments in the current skb - the next one can be much larger.  That's why we check capacity after xmit_skb, not before it, right?
>
> Maybe Rusty means it is a simpler model to free the amount of space that this xmit needs.  We will still fail anyway at some time, but it is unlikely, since an earlier iteration freed up at least the space that it was going to use.

Not sure I understand.  We can't know space is freed in the previous iteration as buffers might not have been used by then.

> The code could become much simpler:
>
> start_xmit()
> {
> 	num_sgs = get num_sgs for this skb;
>
> 	/* Free enough pending old buffers to enable queueing this one */
> 	free_old_xmit_skbs(vi, num_sgs * 2);	/* ?? */
>
> 	if (virtqueue_get_capacity() < num_sgs) {
> 		netif_stop_queue(dev);
> 		if (virtqueue_enable_cb_delayed(vi->svq) ||
> 		    free_old_xmit_skbs(vi, num_sgs)) {
> 			/* Nothing freed up, or not enough freed up */
> 			kfree_skb(skb);
> 			return NETDEV_TX_OK;

This packet drop is what we wanted to avoid.

> 		}
> 		netif_start_queue(dev);
> 		virtqueue_disable_cb(vi->svq);
> 	}
>
> 	/* xmit_skb cannot fail now, also pass 'num_sgs' */
> 	xmit_skb(vi, skb, num_sgs);
> 	virtqueue_kick(vi->svq);
>
> 	skb_orphan(skb);
> 	nf_reset(skb);
>
> 	return NETDEV_TX_OK;
> }
>
> We could even return TX_BUSY since that makes the dequeue code more efficient.  See dev_dequeue_skb() - you can skip a lot of code (and avoid taking locks) to check if the queue is already stopped, but that code runs only if you return TX_BUSY in the earlier iteration.
>
> BTW, shouldn't the check in start_xmit be:
> 	if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> 		...
> 	}
>
> Thanks,
>
> - KK

I thought we used to do basically this, but other devices moved to a model where they stop *before* queueing fails, so we did too.

-- 
MST
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
Michael S. Tsirkin m...@redhat.com wrote on 05/24/2011 02:42:55 PM:

> > > > To do this properly, we should really be using the actual number of sg elements needed, but we'd have to do most of xmit_skb beforehand so we know how many.
> > > >
> > > > Cheers,
> > > > Rusty.
> > >
> > > Maybe I'm confused here.  The problem isn't the failing add_buf for the given skb IIUC.  What we are trying to do here is stop the queue *before xmit_skb fails*.  We can't look at the number of fragments in the current skb - the next one can be much larger.  That's why we check capacity after xmit_skb, not before it, right?
> >
> > Maybe Rusty means it is a simpler model to free the amount of space that this xmit needs.  We will still fail anyway at some time, but it is unlikely, since an earlier iteration freed up at least the space that it was going to use.
>
> Not sure I understand.  We can't know space is freed in the previous iteration as buffers might not have been used by then.

Yes, the first few iterations may not have freed up space, but later ones should.  The amount of free space should increase from then on, especially since we try to free double of what we consume.

> > The code could become much simpler:
> >
> > start_xmit()
> > {
> > 	num_sgs = get num_sgs for this skb;
> >
> > 	/* Free enough pending old buffers to enable queueing this one */
> > 	free_old_xmit_skbs(vi, num_sgs * 2);	/* ?? */
> >
> > 	if (virtqueue_get_capacity() < num_sgs) {
> > 		netif_stop_queue(dev);
> > 		if (virtqueue_enable_cb_delayed(vi->svq) ||
> > 		    free_old_xmit_skbs(vi, num_sgs)) {
> > 			/* Nothing freed up, or not enough freed up */
> > 			kfree_skb(skb);
> > 			return NETDEV_TX_OK;
>
> This packet drop is what we wanted to avoid.

Please see below on returning NETDEV_TX_BUSY.

> > 		}
> > 		netif_start_queue(dev);
> > 		virtqueue_disable_cb(vi->svq);
> > 	}
> >
> > 	/* xmit_skb cannot fail now, also pass 'num_sgs' */
> > 	xmit_skb(vi, skb, num_sgs);
> > 	virtqueue_kick(vi->svq);
> >
> > 	skb_orphan(skb);
> > 	nf_reset(skb);
> >
> > 	return NETDEV_TX_OK;
> > }
> >
> > We could even return TX_BUSY since that makes the dequeue code more efficient.  See dev_dequeue_skb() - you can skip a lot of code (and avoid taking locks) to check if the queue is already stopped, but that code runs only if you return TX_BUSY in the earlier iteration.
> >
> > BTW, shouldn't the check in start_xmit be:
> > 	if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> > 		...
> > 	}
> >
> > Thanks,
> >
> > - KK
>
> I thought we used to do basically this, but other devices moved to a model where they stop *before* queueing fails, so we did too.

I am not sure of why it was changed, since returning TX_BUSY seems more efficient IMHO.  qdisc_restart() handles requeue'd packets much better than a stopped queue, as a significant part of this code is skipped if gso_skb is present (qdisc will eventually start dropping packets when tx_queue_len is exceeded anyway).

Thanks,

- KK
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Tue, May 24, 2011 at 02:57:43PM +0530, Krishna Kumar2 wrote:
> Michael S. Tsirkin m...@redhat.com wrote on 05/24/2011 02:42:55 PM:
>
> > > > > To do this properly, we should really be using the actual number of sg elements needed, but we'd have to do most of xmit_skb beforehand so we know how many.
> > > > >
> > > > > Cheers,
> > > > > Rusty.
> > > >
> > > > Maybe I'm confused here.  The problem isn't the failing add_buf for the given skb IIUC.  What we are trying to do here is stop the queue *before xmit_skb fails*.  We can't look at the number of fragments in the current skb - the next one can be much larger.  That's why we check capacity after xmit_skb, not before it, right?
> > >
> > > Maybe Rusty means it is a simpler model to free the amount of space that this xmit needs.  We will still fail anyway at some time, but it is unlikely, since an earlier iteration freed up at least the space that it was going to use.
> >
> > Not sure I understand.  We can't know space is freed in the previous iteration as buffers might not have been used by then.
>
> Yes, the first few iterations may not have freed up space, but later ones should.  The amount of free space should increase from then on, especially since we try to free double of what we consume.

Hmm.  This is only an upper limit on the # of entries in the queue.  Assume that vq size is 4 and we transmit 4 entries without getting anything in the used ring.  The next transmit will fail.  So I don't really see why it's unlikely that we reach the packet drop code with your patch.

> > > The code could become much simpler:
> > >
> > > start_xmit()
> > > {
> > > 	num_sgs = get num_sgs for this skb;
> > >
> > > 	/* Free enough pending old buffers to enable queueing this one */
> > > 	free_old_xmit_skbs(vi, num_sgs * 2);	/* ?? */
> > >
> > > 	if (virtqueue_get_capacity() < num_sgs) {
> > > 		netif_stop_queue(dev);
> > > 		if (virtqueue_enable_cb_delayed(vi->svq) ||
> > > 		    free_old_xmit_skbs(vi, num_sgs)) {
> > > 			/* Nothing freed up, or not enough freed up */
> > > 			kfree_skb(skb);
> > > 			return NETDEV_TX_OK;
> >
> > This packet drop is what we wanted to avoid.
>
> Please see below on returning NETDEV_TX_BUSY.
>
> > > 		}
> > > 		netif_start_queue(dev);
> > > 		virtqueue_disable_cb(vi->svq);
> > > 	}
> > >
> > > 	/* xmit_skb cannot fail now, also pass 'num_sgs' */
> > > 	xmit_skb(vi, skb, num_sgs);
> > > 	virtqueue_kick(vi->svq);
> > >
> > > 	skb_orphan(skb);
> > > 	nf_reset(skb);
> > >
> > > 	return NETDEV_TX_OK;
> > > }
> > >
> > > We could even return TX_BUSY since that makes the dequeue code more efficient.  See dev_dequeue_skb() - you can skip a lot of code (and avoid taking locks) to check if the queue is already stopped, but that code runs only if you return TX_BUSY in the earlier iteration.
> > >
> > > BTW, shouldn't the check in start_xmit be:
> > > 	if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> > > 		...
> > > 	}
> > >
> > > Thanks,
> > >
> > > - KK
> >
> > I thought we used to do basically this, but other devices moved to a model where they stop *before* queueing fails, so we did too.
>
> I am not sure of why it was changed, since returning TX_BUSY seems more efficient IMHO.  qdisc_restart() handles requeue'd packets much better than a stopped queue, as a significant part of this code is skipped if gso_skb is present

I think this is the argument:
http://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg06364.html

> (qdisc will eventually start dropping packets when tx_queue_len is exceeded anyway).
>
> Thanks,
>
> - KK

tx_queue_len is a pretty large buffer so maybe no.

I think the packet drops from the scheduler queue can also be done intelligently (e.g. with CHOKe) which should work better than dropping a random packet?

-- 
MST
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
Michael S. Tsirkin m...@redhat.com wrote on 05/24/2011 04:59:39 PM:

> > > > Maybe Rusty means it is a simpler model to free the amount of space that this xmit needs.  We will still fail anyway at some time, but it is unlikely, since an earlier iteration freed up at least the space that it was going to use.
> > >
> > > Not sure I understand.  We can't know space is freed in the previous iteration as buffers might not have been used by then.
> >
> > Yes, the first few iterations may not have freed up space, but later ones should.  The amount of free space should increase from then on, especially since we try to free double of what we consume.
>
> Hmm.  This is only an upper limit on the # of entries in the queue.  Assume that vq size is 4 and we transmit 4 entries without getting anything in the used ring.  The next transmit will fail.  So I don't really see why it's unlikely that we reach the packet drop code with your patch.

I was assuming 256 entries :)  I will try to get some numbers to see how often it is true tomorrow.

> > I am not sure of why it was changed, since returning TX_BUSY seems more efficient IMHO.  qdisc_restart() handles requeue'd packets much better than a stopped queue, as a significant part of this code is skipped if gso_skb is present
>
> I think this is the argument:
> http://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg06364.html

Thanks for digging up that thread!  Yes, that one skb would get sent first ahead of possibly higher priority skbs.  However, from a performance point, TX_BUSY code skips a lot of checks and code for all subsequent packets till the device is restarted.  I can test performance with both cases and report what I find (the requeue code has become very simple and clean from horribly complex, thanks to Herbert and Dave).

> > (qdisc will eventually start dropping packets when tx_queue_len is exceeded anyway).
>
> tx_queue_len is a pretty large buffer so maybe no.

I remember seeing tons of drops (pfifo_fast_enqueue) when xmit returns TX_BUSY.

> I think the packet drops from the scheduler queue can also be done intelligently (e.g. with CHOKe) which should work better than dropping a random packet?

I am not sure of that - choke_enqueue checks against a random skb to drop the current skb, and also during congestion.  But for my sample driver xmit, returning TX_BUSY could still allow it to be used with CHOKe.

thanks,

- KK
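To make the TX_BUSY alternative discussed above concrete, here is a sketch of a start_xmit that hands the packet back to the qdisc instead of dropping it.  It reuses the helpers from the driver and patch under discussion (free_old_xmit_skbs, xmit_skb, virtqueue_get_capacity, vi->svq); the exact structure is an illustration, not code anyone in the thread posted.

	/* Sketch: stop the queue and return NETDEV_TX_BUSY when the ring is
	 * full; the skb is not freed, the qdisc requeues it and retries once
	 * the queue is woken. */
	static netdev_tx_t busy_style_start_xmit(struct sk_buff *skb,
						 struct net_device *dev)
	{
		struct virtnet_info *vi = netdev_priv(dev);

		/* Reclaim completed buffers first. */
		free_old_xmit_skbs(vi, 2 + MAX_SKB_FRAGS);

		if (virtqueue_get_capacity(vi->svq) < 2 + MAX_SKB_FRAGS) {
			/* No room for a worst-case packet: push back instead of dropping. */
			netif_stop_queue(dev);
			return NETDEV_TX_BUSY;
		}

		xmit_skb(vi, skb);
		virtqueue_kick(vi->svq);
		return NETDEV_TX_OK;
	}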
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Tue, May 24, 2011 at 06:20:35PM +0530, Krishna Kumar2 wrote:
> Michael S. Tsirkin m...@redhat.com wrote on 05/24/2011 04:59:39 PM:
>
> > > > > Maybe Rusty means it is a simpler model to free the amount of space that this xmit needs.  We will still fail anyway at some time, but it is unlikely, since an earlier iteration freed up at least the space that it was going to use.
> > > >
> > > > Not sure I understand.  We can't know space is freed in the previous iteration as buffers might not have been used by then.
> > >
> > > Yes, the first few iterations may not have freed up space, but later ones should.  The amount of free space should increase from then on, especially since we try to free double of what we consume.
> >
> > Hmm.  This is only an upper limit on the # of entries in the queue.  Assume that vq size is 4 and we transmit 4 entries without getting anything in the used ring.  The next transmit will fail.  So I don't really see why it's unlikely that we reach the packet drop code with your patch.
>
> I was assuming 256 entries :)  I will try to get some numbers to see how often it is true tomorrow.

That would depend on how fast the hypervisor is.  Try doing something to make the hypervisor slower than the guest.  I don't think we need measurements to realize that with the host being slower than the guest that would happen a lot, though.

> > > I am not sure of why it was changed, since returning TX_BUSY seems more efficient IMHO.  qdisc_restart() handles requeue'd packets much better than a stopped queue, as a significant part of this code is skipped if gso_skb is present
> >
> > I think this is the argument:
> > http://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg06364.html
>
> Thanks for digging up that thread!  Yes, that one skb would get sent first ahead of possibly higher priority skbs.  However, from a performance point, TX_BUSY code skips a lot of checks and code for all subsequent packets till the device is restarted.  I can test performance with both cases and report what I find (the requeue code has become very simple and clean from horribly complex, thanks to Herbert and Dave).

Cc Herbert, and try to convince him :)

> > > (qdisc will eventually start dropping packets when tx_queue_len is exceeded anyway).
> >
> > tx_queue_len is a pretty large buffer so maybe no.
>
> I remember seeing tons of drops (pfifo_fast_enqueue) when xmit returns TX_BUSY.
>
> > I think the packet drops from the scheduler queue can also be done intelligently (e.g. with CHOKe) which should work better than dropping a random packet?
>
> I am not sure of that - choke_enqueue checks against a random skb to drop the current skb, and also during congestion.  But for my sample driver xmit, returning TX_BUSY could still allow it to be used with CHOKe.
>
> thanks,
>
> - KK
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Mon, 23 May 2011 14:19:00 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> > > > Can we hit problems with OOM?
> > >
> > > Sure, but no worse than now...
> >
> > The problem is that this virtqueue_get_capacity() returns the worst case, not the normal case.  So using it is deceptive.
>
> Maybe just document this?

Yes, but also by renaming virtqueue_get_capacity().

Takes it from a 3 to a 6 on the API hard-to-misuse scale.  How about, virtqueue_min_capacity()?  Makes the reader realize something weird is going on.

> I still believe capacity really needs to be decided at the virtqueue level, not in the driver.  E.g. with indirect each skb uses a single entry: freeing 1 small skb is always enough to have space for a large one.
>
> I do understand how it seems a waste to leave direct space in the ring while we might in practice have space due to indirect.  Didn't come up with a nice way to solve this yet - but 'no worse than now :)'

Agreed.

> > > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make sure we have enough space in the buffer.  Another way to do that is with a define :).
> >
> > To do this properly, we should really be using the actual number of sg elements needed, but we'd have to do most of xmit_skb beforehand so we know how many.
> >
> > Cheers,
> > Rusty.
>
> Maybe I'm confused here.  The problem isn't the failing add_buf for the given skb IIUC.  What we are trying to do here is stop the queue *before xmit_skb fails*.  We can't look at the number of fragments in the current skb - the next one can be much larger.  That's why we check capacity after xmit_skb, not before it, right?

No, I was confused...  More coffee!

Thanks,
Rusty.
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Mon, 23 May 2011 14:19:00 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> I do understand how it seems a waste to leave direct space in the ring while we might in practice have space due to indirect.  Didn't come up with a nice way to solve this yet - but 'no worse than now :)'

Let's just make it bool free_old_xmit_skbs(unsigned int max).

max == 2 for the normal xmit path, so we're low latency but we keep ahead on average.
max == -1 for the "we're out of capacity, we may have to stop the queue" case.

That keeps it simple and probably the right thing...

Thanks,
Rusty.
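One possible reading of that interface, as a sketch against the driver in this series.  The return-value convention (true once a worst-case packet fits) and the use of UINT_MAX as the "free as many as possible" value are assumptions, not something spelled out above.

	/* Sketch: free at most 'max' completed skbs (UINT_MAX == as many as
	 * possible) and report whether a worst-case packet would now fit. */
	static bool free_old_xmit_skbs(struct virtnet_info *vi, unsigned int max)
	{
		struct sk_buff *skb;
		unsigned int len, n;

		for (n = 0; n < max && (skb = virtqueue_get_buf(vi->svq, &len)); n++) {
			vi->dev->stats.tx_bytes += skb->len;
			vi->dev->stats.tx_packets++;
			dev_kfree_skb_any(skb);
		}
		return virtqueue_get_capacity(vi->svq) >= 2 + MAX_SKB_FRAGS;
	}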
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Wed, May 25, 2011 at 10:58:26AM +0930, Rusty Russell wrote:
> On Mon, 23 May 2011 14:19:00 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> > On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> > > > > Can we hit problems with OOM?
> > > >
> > > > Sure, but no worse than now...
> > >
> > > The problem is that this virtqueue_get_capacity() returns the worst case, not the normal case.  So using it is deceptive.
> >
> > Maybe just document this?
>
> Yes, but also by renaming virtqueue_get_capacity().
>
> Takes it from a 3 to a 6 on the API hard-to-misuse scale.  How about, virtqueue_min_capacity()?  Makes the reader realize something weird is going on.

Absolutely.  Great idea.

> > I still believe capacity really needs to be decided at the virtqueue level, not in the driver.  E.g. with indirect each skb uses a single entry: freeing 1 small skb is always enough to have space for a large one.
> >
> > I do understand how it seems a waste to leave direct space in the ring while we might in practice have space due to indirect.  Didn't come up with a nice way to solve this yet - but 'no worse than now :)'
>
> Agreed.
>
> > > > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make sure we have enough space in the buffer.  Another way to do that is with a define :).
> > >
> > > To do this properly, we should really be using the actual number of sg elements needed, but we'd have to do most of xmit_skb beforehand so we know how many.
> > >
> > > Cheers,
> > > Rusty.
> >
> > Maybe I'm confused here.  The problem isn't the failing add_buf for the given skb IIUC.  What we are trying to do here is stop the queue *before xmit_skb fails*.  We can't look at the number of fragments in the current skb - the next one can be much larger.  That's why we check capacity after xmit_skb, not before it, right?
>
> No, I was confused...  More coffee!
>
> Thanks,
> Rusty.
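For concreteness, the rename being agreed on above might look like this.  This is a sketch only: whether it is a wrapper or a straight rename, and where it lives, are assumptions.

	/* Sketch: same value the series' virtqueue_get_capacity() computes,
	 * renamed so callers see it is a worst-case lower bound (indirect
	 * descriptors may allow more buffers in practice). */
	static inline unsigned int virtqueue_min_capacity(struct virtqueue *vq)
	{
		return virtqueue_get_capacity(vq);
	}

	/* A caller in the tx path would then read naturally: */
	static bool tx_ring_has_room(struct virtnet_info *vi)
	{
		return virtqueue_min_capacity(vi->svq) >= 2 + MAX_SKB_FRAGS;
	}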
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> On Sun, 22 May 2011 15:10:08 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> > On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote:
> > > On Fri, 20 May 2011 02:11:56 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> > > > Current code might introduce a lot of latency variation if there are many pending bufs at the time we attempt to transmit a new one.  This is bad for real-time applications and can't be good for TCP either.
> > >
> > > Do we have more than speculation to back that up, BTW?
> >
> > Need to dig this up: I thought we saw some reports of this on the list?
>
> I think so too, but a reference needs to be here too.  It helps to have exact benchmarks on what's being tested, otherwise we risk unexpected interaction with the other optimization patches.
>
> > > > 	struct sk_buff *skb;
> > > > 	unsigned int len;
> > > > -
> > > > -	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > > > +	bool c;
> > > > +	int n;
> > > > +
> > > > +	/* We try to free up at least 2 skbs per one sent, so that we'll get
> > > > +	 * all of the memory back if they are used fast enough. */
> > > > +	for (n = 0;
> > > > +	     ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
> > > > +	     ((skb = virtqueue_get_buf(vi->svq, &len)));
> > > > +	     ++n) {
> > > >  		pr_debug("Sent skb %p\n", skb);
> > > >  		vi->dev->stats.tx_bytes += skb->len;
> > > >  		vi->dev->stats.tx_packets++;
> > > >  		dev_kfree_skb_any(skb);
> > > >  	}
> > > > +	return !c;
> > >
> > > This is for() abuse :)  Why is the capacity check in there at all?  Surely it's simpler to try to free 2 skbs each time around?
> >
> > This is in case we can't use indirect: we want to free up enough buffers for the following add_buf to succeed.
>
> Sure, or we could just count the frags of the skb we're taking out, which would be accurate for both cases and far more intuitive.  ie. always try to free up twice as much as we're about to put in.
>
> > > Can we hit problems with OOM?
> >
> > Sure, but no worse than now...
>
> The problem is that this virtqueue_get_capacity() returns the worst case, not the normal case.  So using it is deceptive.

Maybe just document this?

I still believe capacity really needs to be decided at the virtqueue level, not in the driver.  E.g. with indirect each skb uses a single entry: freeing 1 small skb is always enough to have space for a large one.

I do understand how it seems a waste to leave direct space in the ring while we might in practice have space due to indirect.  Didn't come up with a nice way to solve this yet - but 'no worse than now :)'

> > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make sure we have enough space in the buffer.  Another way to do that is with a define :).
>
> To do this properly, we should really be using the actual number of sg elements needed, but we'd have to do most of xmit_skb beforehand so we know how many.
>
> Cheers,
> Rusty.

Maybe I'm confused here.  The problem isn't the failing add_buf for the given skb IIUC.  What we are trying to do here is stop the queue *before xmit_skb fails*.  We can't look at the number of fragments in the current skb - the next one can be much larger.  That's why we check capacity after xmit_skb, not before it, right?

-- 
MST
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote:
> On Fri, 20 May 2011 02:11:56 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> > Current code might introduce a lot of latency variation if there are many pending bufs at the time we attempt to transmit a new one.  This is bad for real-time applications and can't be good for TCP either.
>
> Do we have more than speculation to back that up, BTW?

Need to dig this up: I thought we saw some reports of this on the list?

> This patch is pretty sloppy; the previous ones were better polished.
>
> > -static void free_old_xmit_skbs(struct virtnet_info *vi)
> > +static bool free_old_xmit_skbs(struct virtnet_info *vi, int capacity)
> >  {
>
> A comment here indicating it returns true if it frees something?

Agree.

> >  	struct sk_buff *skb;
> >  	unsigned int len;
> > -
> > -	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > +	bool c;
> > +	int n;
> > +
> > +	/* We try to free up at least 2 skbs per one sent, so that we'll get
> > +	 * all of the memory back if they are used fast enough. */
> > +	for (n = 0;
> > +	     ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
> > +	     ((skb = virtqueue_get_buf(vi->svq, &len)));
> > +	     ++n) {
> >  		pr_debug("Sent skb %p\n", skb);
> >  		vi->dev->stats.tx_bytes += skb->len;
> >  		vi->dev->stats.tx_packets++;
> >  		dev_kfree_skb_any(skb);
> >  	}
> > +	return !c;
>
> This is for() abuse :)  Why is the capacity check in there at all?  Surely it's simpler to try to free 2 skbs each time around?

This is in case we can't use indirect: we want to free up enough buffers for the following add_buf to succeed.

> 	for (n = 0; n < 2; n++) {
> 		skb = virtqueue_get_buf(vi->svq, &len);
> 		if (!skb)
> 			break;
> 		pr_debug("Sent skb %p\n", skb);
> 		vi->dev->stats.tx_bytes += skb->len;
> 		vi->dev->stats.tx_packets++;
> 		dev_kfree_skb_any(skb);
> 	}
>
> >  static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
> > @@ -574,8 +582,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	struct virtnet_info *vi = netdev_priv(dev);
> >  	int capacity;
> >
> > -	/* Free up any pending old buffers before queueing new ones. */
> > -	free_old_xmit_skbs(vi);
> > +	/* Free enough pending old buffers to enable queueing new ones. */
> > +	free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS);
> >
> >  	/* Try to transmit */
> >  	capacity = xmit_skb(vi, skb);
> > @@ -609,9 +617,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  		netif_stop_queue(dev);
> >  		if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> >  			/* More just got used, free them then recheck. */
> > -			free_old_xmit_skbs(vi);
> > -			capacity = virtqueue_get_capacity(vi->svq);
> > -			if (capacity >= 2+MAX_SKB_FRAGS) {
> > +			if (!likely(free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
>
> This extra argument to free_old_xmit_skbs seems odd, unless you have future plans?
>
> Thanks,
> Rusty.

I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make sure we have enough space in the buffer.  Another way to do that is with a define :).

-- 
MST
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Sun, 22 May 2011 15:10:08 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote:
> > On Fri, 20 May 2011 02:11:56 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> > > Current code might introduce a lot of latency variation if there are many pending bufs at the time we attempt to transmit a new one.  This is bad for real-time applications and can't be good for TCP either.
> >
> > Do we have more than speculation to back that up, BTW?
>
> Need to dig this up: I thought we saw some reports of this on the list?

I think so too, but a reference needs to be here too.  It helps to have exact benchmarks on what's being tested, otherwise we risk unexpected interaction with the other optimization patches.

> > > 	struct sk_buff *skb;
> > > 	unsigned int len;
> > > -
> > > -	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > > +	bool c;
> > > +	int n;
> > > +
> > > +	/* We try to free up at least 2 skbs per one sent, so that we'll get
> > > +	 * all of the memory back if they are used fast enough. */
> > > +	for (n = 0;
> > > +	     ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
> > > +	     ((skb = virtqueue_get_buf(vi->svq, &len)));
> > > +	     ++n) {
> > >  		pr_debug("Sent skb %p\n", skb);
> > >  		vi->dev->stats.tx_bytes += skb->len;
> > >  		vi->dev->stats.tx_packets++;
> > >  		dev_kfree_skb_any(skb);
> > >  	}
> > > +	return !c;
> >
> > This is for() abuse :)  Why is the capacity check in there at all?  Surely it's simpler to try to free 2 skbs each time around?
>
> This is in case we can't use indirect: we want to free up enough buffers for the following add_buf to succeed.

Sure, or we could just count the frags of the skb we're taking out, which would be accurate for both cases and far more intuitive.  ie. always try to free up twice as much as we're about to put in.

> > Can we hit problems with OOM?
>
> Sure, but no worse than now...

The problem is that this virtqueue_get_capacity() returns the worst case, not the normal case.  So using it is deceptive.

> I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make sure we have enough space in the buffer.  Another way to do that is with a define :).

To do this properly, we should really be using the actual number of sg elements needed, but we'd have to do most of xmit_skb beforehand so we know how many.

Cheers,
Rusty.
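A sketch of the "count what we actually free" idea Rusty floats above, for illustration only: the function name, and the assumption that a non-indirect packet costs one descriptor for the virtio header plus one for the linear data plus one per fragment, are mine, not taken from the thread.

	/* Sketch: instead of checking ring capacity, credit ourselves with the
	 * descriptors each reclaimed skb actually used and stop once we have
	 * freed roughly twice what the packet being queued will consume. */
	static void free_twice_what_we_add(struct virtnet_info *vi, unsigned int need)
	{
		struct sk_buff *skb;
		unsigned int len, freed = 0;

		while (freed < 2 * need &&
		       (skb = virtqueue_get_buf(vi->svq, &len))) {
			/* header + linear data + one entry per fragment,
			 * assuming the non-indirect layout discussed above */
			freed += skb_shinfo(skb)->nr_frags + 2;
			dev_kfree_skb_any(skb);
		}
	}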
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Fri, 20 May 2011 02:11:56 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> Current code might introduce a lot of latency variation if there are many pending bufs at the time we attempt to transmit a new one.  This is bad for real-time applications and can't be good for TCP either.

Do we have more than speculation to back that up, BTW?

This patch is pretty sloppy; the previous ones were better polished.

> -static void free_old_xmit_skbs(struct virtnet_info *vi)
> +static bool free_old_xmit_skbs(struct virtnet_info *vi, int capacity)
>  {

A comment here indicating it returns true if it frees something?

>  	struct sk_buff *skb;
>  	unsigned int len;
> -
> -	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> +	bool c;
> +	int n;
> +
> +	/* We try to free up at least 2 skbs per one sent, so that we'll get
> +	 * all of the memory back if they are used fast enough. */
> +	for (n = 0;
> +	     ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
> +	     ((skb = virtqueue_get_buf(vi->svq, &len)));
> +	     ++n) {
>  		pr_debug("Sent skb %p\n", skb);
>  		vi->dev->stats.tx_bytes += skb->len;
>  		vi->dev->stats.tx_packets++;
>  		dev_kfree_skb_any(skb);
>  	}
> +	return !c;

This is for() abuse :)  Why is the capacity check in there at all?  Surely it's simpler to try to free 2 skbs each time around?

	for (n = 0; n < 2; n++) {
		skb = virtqueue_get_buf(vi->svq, &len);
		if (!skb)
			break;
		pr_debug("Sent skb %p\n", skb);
		vi->dev->stats.tx_bytes += skb->len;
		vi->dev->stats.tx_packets++;
		dev_kfree_skb_any(skb);
	}

>  static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
> @@ -574,8 +582,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	int capacity;
>
> -	/* Free up any pending old buffers before queueing new ones. */
> -	free_old_xmit_skbs(vi);
> +	/* Free enough pending old buffers to enable queueing new ones. */
> +	free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS);
>
>  	/* Try to transmit */
>  	capacity = xmit_skb(vi, skb);
> @@ -609,9 +617,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		netif_stop_queue(dev);
>  		if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
>  			/* More just got used, free them then recheck. */
> -			free_old_xmit_skbs(vi);
> -			capacity = virtqueue_get_capacity(vi->svq);
> -			if (capacity >= 2+MAX_SKB_FRAGS) {
> +			if (!likely(free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {

This extra argument to free_old_xmit_skbs seems odd, unless you have future plans?

Thanks,
Rusty.
[PATCHv2 10/14] virtio_net: limit xmit polling
Current code might introduce a lot of latency variation if there are many pending bufs at the time we attempt to transmit a new one.  This is bad for real-time applications and can't be good for TCP either.

Free up just enough to both clean up all buffers eventually and to be able to xmit the next packet.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/virtio_net.c |   22 ++++++++++++++--------
 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f33c92b..42935cb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -509,17 +509,25 @@ again:
 	return received;
 }
 
-static void free_old_xmit_skbs(struct virtnet_info *vi)
+static bool free_old_xmit_skbs(struct virtnet_info *vi, int capacity)
 {
 	struct sk_buff *skb;
 	unsigned int len;
-
-	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+	bool c;
+	int n;
+
+	/* We try to free up at least 2 skbs per one sent, so that we'll get
+	 * all of the memory back if they are used fast enough. */
+	for (n = 0;
+	     ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
+	     ((skb = virtqueue_get_buf(vi->svq, &len)));
+	     ++n) {
 		pr_debug("Sent skb %p\n", skb);
 		vi->dev->stats.tx_bytes += skb->len;
 		vi->dev->stats.tx_packets++;
 		dev_kfree_skb_any(skb);
 	}
+	return !c;
 }
 
 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -574,8 +582,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct virtnet_info *vi = netdev_priv(dev);
 	int capacity;
 
-	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(vi);
+	/* Free enough pending old buffers to enable queueing new ones. */
+	free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS);
 
 	/* Try to transmit */
 	capacity = xmit_skb(vi, skb);
@@ -609,9 +617,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		netif_stop_queue(dev);
 		if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(vi);
-			capacity = virtqueue_get_capacity(vi->svq);
-			if (capacity >= 2+MAX_SKB_FRAGS) {
+			if (!likely(free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
 				netif_start_queue(dev);
 				virtqueue_disable_cb(vi->svq);
 			}
-- 
1.7.5.53.gc233e