Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Sat, 28 May 2011 23:02:04 +0300, "Michael S. Tsirkin" wrote: > On Thu, May 26, 2011 at 12:58:23PM +0930, Rusty Russell wrote: > > ie. free two packets for every one we're about to add. For steady state > > that would work really well. > > Sure, with indirect buffers, but if we > don't use indirect (and we discussed switching indirect off > dynamically in the past) this becomes harder to > be sure about. I think I understand why but > does not a simple capacity check make it more obvious? ... > > Then we hit the case where the ring > > seems full after we do the add: at that point, screw latency, and just > > try to free all the buffers we can. > > I see. But the code currently does this: > > for(..) > get_buf > add_buf > if (capacity < max_sk_frags+2) { > if (!enable_cb) > for(..) > get_buf > } > > > In other words the second get_buf is only called > in the unlikely case of race condition. > > So we'll need to add *another* call to get_buf. > Is it just me or is this becoming messy? Yes, good point. I really wonder if anyone would be able to measure the difference between simply freeing 2 every time (with possible extra stalls for strange cases) and the more complete version. But it runs against my grain to implement heuristics when one more call would make it provably reliable. Please find a way to make that for loop less ugly though! Thanks, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Thu, May 26, 2011 at 12:58:23PM +0930, Rusty Russell wrote: > On Wed, 25 May 2011 09:07:59 +0300, "Michael S. Tsirkin" > wrote: > > On Wed, May 25, 2011 at 11:05:04AM +0930, Rusty Russell wrote: > > Hmm I'm not sure I got it, need to think about this. > > I'd like to go back and document how my design was supposed to work. > > This really should have been in commit log or even a comment. > > I thought we need a min, not a max. > > We start with this: > > > > while ((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS) && > > (skb = get_buf))) > > kfree_skb(skb); > > return !c; > > > > This is clean and simple, right? And it's exactly asking for what we need. > > No, I started from the other direction: > > for (i = 0; i < 2; i++) { > skb = get_buf(); > if (!skb) > break; > kfree_skb(skb); > } > > ie. free two packets for every one we're about to add. For steady state > that would work really well. Sure, with indirect buffers, but if we don't use indirect (and we discussed switching indirect off dynamically in the past) this becomes harder to be sure about. I think I understand why but does not a simple capacity check make it more obvious? > Then we hit the case where the ring > seems full after we do the add: at that point, screw latency, and just > try to free all the buffers we can. I see. But the code currently does this: for(..) get_buf add_buf if (capacity < max_sk_frags+2) { if (!enable_cb) for(..) get_buf } In other words the second get_buf is only called in the unlikely case of race condition. So we'll need to add *another* call to get_buf. Is it just me or is this becoming messy? I was also be worried that we are adding more "modes" to the code: high and low latency depending on different speeds between host and guest, which would be hard to trigger and test. That's why I tried hard to make the code behave the same all the time and free up just a bit more than the minimum necessary. > > on the normal path min == 2 so we're low latency but we keep ahead on > > average. min == 0 for the "we're out of capacity, we may have to stop > > the queue". > > > > Does the above make sense at all? > > It makes sense, but I think it's a classic case where incremental > improvements aren't as good as starting from scratch. > > Cheers, > Rusty. The only difference on good path seems an extra capacity check, so I don't expect the difference will be testable, do you? -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Wed, 25 May 2011 09:07:59 +0300, "Michael S. Tsirkin" wrote: > On Wed, May 25, 2011 at 11:05:04AM +0930, Rusty Russell wrote: > Hmm I'm not sure I got it, need to think about this. > I'd like to go back and document how my design was supposed to work. > This really should have been in commit log or even a comment. > I thought we need a min, not a max. > We start with this: > > while ((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS) && > (skb = get_buf))) > kfree_skb(skb); > return !c; > > This is clean and simple, right? And it's exactly asking for what we need. No, I started from the other direction: for (i = 0; i < 2; i++) { skb = get_buf(); if (!skb) break; kfree_skb(skb); } ie. free two packets for every one we're about to add. For steady state that would work really well. Then we hit the case where the ring seems full after we do the add: at that point, screw latency, and just try to free all the buffers we can. > on the normal path min == 2 so we're low latency but we keep ahead on > average. min == 0 for the "we're out of capacity, we may have to stop > the queue". > > Does the above make sense at all? It makes sense, but I think it's a classic case where incremental improvements aren't as good as starting from scratch. Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Wed, May 25, 2011 at 11:05:04AM +0930, Rusty Russell wrote: > On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin" > wrote: > > I do understand how it seems a waste to leave direct space > > in the ring while we might in practice have space > > due to indirect. Didn't come up with a nice way to > > solve this yet - but 'no worse than now :)' > > Let's just make it "bool free_old_xmit_skbs(unsigned int max)". max == > 2 for the normal xmit path, so we're low latency but we keep ahead on > average. max == -1 for the "we're out of capacity, we may have to stop > the queue". > > That keeps it simple and probably the right thing... > > Thanks, > Rusty. Hmm I'm not sure I got it, need to think about this. I'd like to go back and document how my design was supposed to work. This really should have been in commit log or even a comment. I thought we need a min, not a max. We start with this: while ((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS) && (skb = get_buf))) kfree_skb(skb); return !c; This is clean and simple, right? And it's exactly asking for what we need. But this way we always keep a lot of memory in skbs even when rate of communication is low. So we add the min parameter: int n = 0; while c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS)) || n++ < min) && (skb = get_buf))) kfree_skb(skb); return !c; on the normal path min == 2 so we're low latency but we keep ahead on average. min == 0 for the "we're out of capacity, we may have to stop the queue". Does the above make sense at all? -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Wed, May 25, 2011 at 10:58:26AM +0930, Rusty Russell wrote: > On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin" > wrote: > > On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote: > > > Can we hit problems with OOM? Sure, but no worse than now... > > > The problem is that this "virtqueue_get_capacity()" returns the worst > > > case, not the normal case. So using it is deceptive. > > > > > > > Maybe just document this? > > Yes, but also by renaming virtqueue_get_capacity(). Takes it from a 3 > to a 6 on the API hard-to-misuse scale. > > How about, virtqueue_min_capacity()? Makes the reader realize something > weird is going on. Absolutely. Great idea. > > I still believe capacity really needs to be decided > > at the virtqueue level, not in the driver. > > E.g. with indirect each skb uses a single entry: freeing > > 1 small skb is always enough to have space for a large one. > > > > I do understand how it seems a waste to leave direct space > > in the ring while we might in practice have space > > due to indirect. Didn't come up with a nice way to > > solve this yet - but 'no worse than now :)' > > Agreed. > > > > > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make > > > > sure we have enough space in the buffer. Another way to do > > > > that is with a define :). > > > > > > To do this properly, we should really be using the actual number of sg > > > elements needed, but we'd have to do most of xmit_skb beforehand so we > > > know how many. > > > > > > Cheers, > > > Rusty. > > > > Maybe I'm confused here. The problem isn't the failing > > add_buf for the given skb IIUC. What we are trying to do here is stop > > the queue *before xmit_skb fails*. We can't look at the > > number of fragments in the current skb - the next one can be > > much larger. That's why we check capacity after xmit_skb, > > not before it, right? > > No, I was confused... More coffee! > > Thanks, > Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin" wrote: > I do understand how it seems a waste to leave direct space > in the ring while we might in practice have space > due to indirect. Didn't come up with a nice way to > solve this yet - but 'no worse than now :)' Let's just make it "bool free_old_xmit_skbs(unsigned int max)". max == 2 for the normal xmit path, so we're low latency but we keep ahead on average. max == -1 for the "we're out of capacity, we may have to stop the queue". That keeps it simple and probably the right thing... Thanks, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin" wrote: > On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote: > > Can we hit problems with OOM? Sure, but no worse than now... > > The problem is that this "virtqueue_get_capacity()" returns the worst > > case, not the normal case. So using it is deceptive. > > > > Maybe just document this? Yes, but also by renaming virtqueue_get_capacity(). Takes it from a 3 to a 6 on the API hard-to-misuse scale. How about, virtqueue_min_capacity()? Makes the reader realize something weird is going on. > I still believe capacity really needs to be decided > at the virtqueue level, not in the driver. > E.g. with indirect each skb uses a single entry: freeing > 1 small skb is always enough to have space for a large one. > > I do understand how it seems a waste to leave direct space > in the ring while we might in practice have space > due to indirect. Didn't come up with a nice way to > solve this yet - but 'no worse than now :)' Agreed. > > > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make > > > sure we have enough space in the buffer. Another way to do > > > that is with a define :). > > > > To do this properly, we should really be using the actual number of sg > > elements needed, but we'd have to do most of xmit_skb beforehand so we > > know how many. > > > > Cheers, > > Rusty. > > Maybe I'm confused here. The problem isn't the failing > add_buf for the given skb IIUC. What we are trying to do here is stop > the queue *before xmit_skb fails*. We can't look at the > number of fragments in the current skb - the next one can be > much larger. That's why we check capacity after xmit_skb, > not before it, right? No, I was confused... More coffee! Thanks, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Tue, May 24, 2011 at 06:20:35PM +0530, Krishna Kumar2 wrote: > "Michael S. Tsirkin" wrote on 05/24/2011 04:59:39 PM: > > > > > > Maybe Rusty means it is a simpler model to free the amount > > > > > of space that this xmit needs. We will still fail anyway > > > > > at some time but it is unlikely, since earlier iteration > > > > > freed up atleast the space that it was going to use. > > > > > > > > Not sure I nderstand. We can't know space is freed in the previous > > > > iteration as buffers might not have been used by then. > > > > > > Yes, the first few iterations may not have freed up space, but > > > later ones should. The amount of free space should increase > > > from then on, especially since we try to free double of what > > > we consume. > > > > Hmm. This is only an upper limit on the # of entries in the queue. > > Assume that vq size is 4 and we transmit 4 enties without > > getting anything in the used ring. The next transmit will fail. > > > > So I don't really see why it's unlikely that we reach the packet > > drop code with your patch. > > I was assuming 256 entries :) I will try to get some > numbers to see how often it is true tomorrow. That would depend on how fast the hypervisor is. Try doing something to make hypervisor slower than the guest. I don't think we need measurements to realize that with the host being slower than guest that would happen a lot, though. > > > I am not sure of why it was changed, since returning TX_BUSY > > > seems more efficient IMHO. > > > qdisc_restart() handles requeue'd > > > packets much better than a stopped queue, as a significant > > > part of this code is skipped if gso_skb is present > > > > I think this is the argument: > > http://www.mail-archive.com/virtualization@lists.linux- > > foundation.org/msg06364.html > > Thanks for digging up that thread! Yes, that one skb would get > sent first ahead of possibly higher priority skbs. However, > from a performance point, TX_BUSY code skips a lot of checks > and code for all subsequent packets till the device is > restarted. I can test performance with both cases and report > what I find (the requeue code has become very simple and clean > from "horribly complex", thanks to Herbert and Dave). Cc Herbert, and try to convince him :) > > > (qdisc > > > will eventually start dropping packets when tx_queue_len is > > > > tx_queue_len is a pretty large buffer so maybe no. > > I remember seeing tons of drops (pfifo_fast_enqueue) when > xmit returns TX_BUSY. > > > I think the packet drops from the scheduler queue can also be > > done intelligently (e.g. with CHOKe) which should > > work better than dropping a random packet? > > I am not sure of that - choke_enqueue checks against a random > skb to drop current skb, and also during congestion. But for > my "sample driver xmit", returning TX_BUSY could still allow > to be used with CHOKe. > > thanks, > > - KK -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
"Michael S. Tsirkin" wrote on 05/24/2011 04:59:39 PM: > > > > Maybe Rusty means it is a simpler model to free the amount > > > > of space that this xmit needs. We will still fail anyway > > > > at some time but it is unlikely, since earlier iteration > > > > freed up atleast the space that it was going to use. > > > > > > Not sure I nderstand. We can't know space is freed in the previous > > > iteration as buffers might not have been used by then. > > > > Yes, the first few iterations may not have freed up space, but > > later ones should. The amount of free space should increase > > from then on, especially since we try to free double of what > > we consume. > > Hmm. This is only an upper limit on the # of entries in the queue. > Assume that vq size is 4 and we transmit 4 enties without > getting anything in the used ring. The next transmit will fail. > > So I don't really see why it's unlikely that we reach the packet > drop code with your patch. I was assuming 256 entries :) I will try to get some numbers to see how often it is true tomorrow. > > I am not sure of why it was changed, since returning TX_BUSY > > seems more efficient IMHO. > > qdisc_restart() handles requeue'd > > packets much better than a stopped queue, as a significant > > part of this code is skipped if gso_skb is present > > I think this is the argument: > http://www.mail-archive.com/virtualization@lists.linux- > foundation.org/msg06364.html Thanks for digging up that thread! Yes, that one skb would get sent first ahead of possibly higher priority skbs. However, from a performance point, TX_BUSY code skips a lot of checks and code for all subsequent packets till the device is restarted. I can test performance with both cases and report what I find (the requeue code has become very simple and clean from "horribly complex", thanks to Herbert and Dave). > > (qdisc > > will eventually start dropping packets when tx_queue_len is > > tx_queue_len is a pretty large buffer so maybe no. I remember seeing tons of drops (pfifo_fast_enqueue) when xmit returns TX_BUSY. > I think the packet drops from the scheduler queue can also be > done intelligently (e.g. with CHOKe) which should > work better than dropping a random packet? I am not sure of that - choke_enqueue checks against a random skb to drop current skb, and also during congestion. But for my "sample driver xmit", returning TX_BUSY could still allow to be used with CHOKe. thanks, - KK -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Tue, May 24, 2011 at 02:57:43PM +0530, Krishna Kumar2 wrote: > "Michael S. Tsirkin" wrote on 05/24/2011 02:42:55 PM: > > > > > > To do this properly, we should really be using the actual number of > sg > > > > > elements needed, but we'd have to do most of xmit_skb beforehand so > we > > > > > know how many. > > > > > > > > > > Cheers, > > > > > Rusty. > > > > > > > > Maybe I'm confused here. The problem isn't the failing > > > > add_buf for the given skb IIUC. What we are trying to do here is > stop > > > > the queue *before xmit_skb fails*. We can't look at the > > > > number of fragments in the current skb - the next one can be > > > > much larger. That's why we check capacity after xmit_skb, > > > > not before it, right? > > > > > > Maybe Rusty means it is a simpler model to free the amount > > > of space that this xmit needs. We will still fail anyway > > > at some time but it is unlikely, since earlier iteration > > > freed up atleast the space that it was going to use. > > > > Not sure I nderstand. We can't know space is freed in the previous > > iteration as buffers might not have been used by then. > > Yes, the first few iterations may not have freed up space, but > later ones should. The amount of free space should increase > from then on, especially since we try to free double of what > we consume. Hmm. This is only an upper limit on the # of entries in the queue. Assume that vq size is 4 and we transmit 4 enties without getting anything in the used ring. The next transmit will fail. So I don't really see why it's unlikely that we reach the packet drop code with your patch. > > > The > > > code could become much simpler: > > > > > > start_xmit() > > > { > > > { > > > num_sgs = get num_sgs for this skb; > > > > > > /* Free enough pending old buffers to enable queueing this one > */ > > > free_old_xmit_skbs(vi, num_sgs * 2); /* ?? */ > > > > > > if (virtqueue_get_capacity() < num_sgs) { > > > netif_stop_queue(dev); > > > if (virtqueue_enable_cb_delayed(vi->svq) || > > > free_old_xmit_skbs(vi, num_sgs)) { > > > /* Nothing freed up, or not enough freed up */ > > > kfree_skb(skb); > > > return NETDEV_TX_OK; > > > > This packet drop is what we wanted to avoid. > > Please see below on returning NETDEV_TX_BUSY. > > > > > > } > > > netif_start_queue(dev); > > > virtqueue_disable_cb(vi->svq); > > > } > > > > > > /* xmit_skb cannot fail now, also pass 'num_sgs' */ > > > xmit_skb(vi, skb, num_sgs); > > > virtqueue_kick(vi->svq); > > > > > > skb_orphan(skb); > > > nf_reset(skb); > > > > > > return NETDEV_TX_OK; > > > } > > > > > > We could even return TX_BUSY since that makes the dequeue > > > code more efficient. See dev_dequeue_skb() - you can skip a > > > lot of code (and avoid taking locks) to check if the queue > > > is already stopped but that code runs only if you return > > > TX_BUSY in the earlier iteration. > > > > > > BTW, shouldn't the check in start_xmit be: > > >if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) { > > > ... > > >} > > > > > > Thanks, > > > > > > - KK > > > > I thought we used to do basically this but other devices moved to a > > model where they stop *before* queueing fails, so we did too. > > I am not sure of why it was changed, since returning TX_BUSY > seems more efficient IMHO. > qdisc_restart() handles requeue'd > packets much better than a stopped queue, as a significant > part of this code is skipped if gso_skb is present I think this is the argument: http://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg06364.html > (qdisc > will eventually start dropping packets when tx_queue_len is > exceeded anyway). > > Thanks, > > - KK tx_queue_len is a pretty large buffer so maybe no. I think the packet drops from the scheduler queue can also be done intelligently (e.g. with CHOKe) which should work better than dropping a random packet? -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
"Michael S. Tsirkin" wrote on 05/24/2011 02:42:55 PM: > > > > To do this properly, we should really be using the actual number of sg > > > > elements needed, but we'd have to do most of xmit_skb beforehand so we > > > > know how many. > > > > > > > > Cheers, > > > > Rusty. > > > > > > Maybe I'm confused here. The problem isn't the failing > > > add_buf for the given skb IIUC. What we are trying to do here is stop > > > the queue *before xmit_skb fails*. We can't look at the > > > number of fragments in the current skb - the next one can be > > > much larger. That's why we check capacity after xmit_skb, > > > not before it, right? > > > > Maybe Rusty means it is a simpler model to free the amount > > of space that this xmit needs. We will still fail anyway > > at some time but it is unlikely, since earlier iteration > > freed up atleast the space that it was going to use. > > Not sure I nderstand. We can't know space is freed in the previous > iteration as buffers might not have been used by then. Yes, the first few iterations may not have freed up space, but later ones should. The amount of free space should increase from then on, especially since we try to free double of what we consume. > > The > > code could become much simpler: > > > > start_xmit() > > { > > { > > num_sgs = get num_sgs for this skb; > > > > /* Free enough pending old buffers to enable queueing this one */ > > free_old_xmit_skbs(vi, num_sgs * 2); /* ?? */ > > > > if (virtqueue_get_capacity() < num_sgs) { > > netif_stop_queue(dev); > > if (virtqueue_enable_cb_delayed(vi->svq) || > > free_old_xmit_skbs(vi, num_sgs)) { > > /* Nothing freed up, or not enough freed up */ > > kfree_skb(skb); > > return NETDEV_TX_OK; > > This packet drop is what we wanted to avoid. Please see below on returning NETDEV_TX_BUSY. > > > } > > netif_start_queue(dev); > > virtqueue_disable_cb(vi->svq); > > } > > > > /* xmit_skb cannot fail now, also pass 'num_sgs' */ > > xmit_skb(vi, skb, num_sgs); > > virtqueue_kick(vi->svq); > > > > skb_orphan(skb); > > nf_reset(skb); > > > > return NETDEV_TX_OK; > > } > > > > We could even return TX_BUSY since that makes the dequeue > > code more efficient. See dev_dequeue_skb() - you can skip a > > lot of code (and avoid taking locks) to check if the queue > > is already stopped but that code runs only if you return > > TX_BUSY in the earlier iteration. > > > > BTW, shouldn't the check in start_xmit be: > >if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) { > > ... > >} > > > > Thanks, > > > > - KK > > I thought we used to do basically this but other devices moved to a > model where they stop *before* queueing fails, so we did too. I am not sure of why it was changed, since returning TX_BUSY seems more efficient IMHO. qdisc_restart() handles requeue'd packets much better than a stopped queue, as a significant part of this code is skipped if gso_skb is present (qdisc will eventually start dropping packets when tx_queue_len is exceeded anyway). Thanks, - KK -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Tue, May 24, 2011 at 01:24:15PM +0530, Krishna Kumar2 wrote: > "Michael S. Tsirkin" wrote on 05/23/2011 04:49:00 PM: > > > > To do this properly, we should really be using the actual number of sg > > > elements needed, but we'd have to do most of xmit_skb beforehand so we > > > know how many. > > > > > > Cheers, > > > Rusty. > > > > Maybe I'm confused here. The problem isn't the failing > > add_buf for the given skb IIUC. What we are trying to do here is stop > > the queue *before xmit_skb fails*. We can't look at the > > number of fragments in the current skb - the next one can be > > much larger. That's why we check capacity after xmit_skb, > > not before it, right? > > Maybe Rusty means it is a simpler model to free the amount > of space that this xmit needs. We will still fail anyway > at some time but it is unlikely, since earlier iteration > freed up atleast the space that it was going to use. Not sure I nderstand. We can't know space is freed in the previous iteration as buffers might not have been used by then. > The > code could become much simpler: > > start_xmit() > { > { > num_sgs = get num_sgs for this skb; > > /* Free enough pending old buffers to enable queueing this one */ > free_old_xmit_skbs(vi, num_sgs * 2); /* ?? */ > > if (virtqueue_get_capacity() < num_sgs) { > netif_stop_queue(dev); > if (virtqueue_enable_cb_delayed(vi->svq) || > free_old_xmit_skbs(vi, num_sgs)) { > /* Nothing freed up, or not enough freed up */ > kfree_skb(skb); > return NETDEV_TX_OK; This packet drop is what we wanted to avoid. > } > netif_start_queue(dev); > virtqueue_disable_cb(vi->svq); > } > > /* xmit_skb cannot fail now, also pass 'num_sgs' */ > xmit_skb(vi, skb, num_sgs); > virtqueue_kick(vi->svq); > > skb_orphan(skb); > nf_reset(skb); > > return NETDEV_TX_OK; > } > > We could even return TX_BUSY since that makes the dequeue > code more efficient. See dev_dequeue_skb() - you can skip a > lot of code (and avoid taking locks) to check if the queue > is already stopped but that code runs only if you return > TX_BUSY in the earlier iteration. > > BTW, shouldn't the check in start_xmit be: > if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) { > ... > } > > Thanks, > > - KK I thought we used to do basically this but other devices moved to a model where they stop *before* queueing fails, so we did too. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
"Michael S. Tsirkin" wrote on 05/23/2011 04:49:00 PM: > > To do this properly, we should really be using the actual number of sg > > elements needed, but we'd have to do most of xmit_skb beforehand so we > > know how many. > > > > Cheers, > > Rusty. > > Maybe I'm confused here. The problem isn't the failing > add_buf for the given skb IIUC. What we are trying to do here is stop > the queue *before xmit_skb fails*. We can't look at the > number of fragments in the current skb - the next one can be > much larger. That's why we check capacity after xmit_skb, > not before it, right? Maybe Rusty means it is a simpler model to free the amount of space that this xmit needs. We will still fail anyway at some time but it is unlikely, since earlier iteration freed up atleast the space that it was going to use. The code could become much simpler: start_xmit() { { num_sgs = get num_sgs for this skb; /* Free enough pending old buffers to enable queueing this one */ free_old_xmit_skbs(vi, num_sgs * 2); /* ?? */ if (virtqueue_get_capacity() < num_sgs) { netif_stop_queue(dev); if (virtqueue_enable_cb_delayed(vi->svq) || free_old_xmit_skbs(vi, num_sgs)) { /* Nothing freed up, or not enough freed up */ kfree_skb(skb); return NETDEV_TX_OK; } netif_start_queue(dev); virtqueue_disable_cb(vi->svq); } /* xmit_skb cannot fail now, also pass 'num_sgs' */ xmit_skb(vi, skb, num_sgs); virtqueue_kick(vi->svq); skb_orphan(skb); nf_reset(skb); return NETDEV_TX_OK; } We could even return TX_BUSY since that makes the dequeue code more efficient. See dev_dequeue_skb() - you can skip a lot of code (and avoid taking locks) to check if the queue is already stopped but that code runs only if you return TX_BUSY in the earlier iteration. BTW, shouldn't the check in start_xmit be: if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) { ... } Thanks, - KK -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote: > On Sun, 22 May 2011 15:10:08 +0300, "Michael S. Tsirkin" > wrote: > > On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote: > > > On Fri, 20 May 2011 02:11:56 +0300, "Michael S. Tsirkin" > > > wrote: > > > > Current code might introduce a lot of latency variation > > > > if there are many pending bufs at the time we > > > > attempt to transmit a new one. This is bad for > > > > real-time applications and can't be good for TCP either. > > > > > > Do we have more than speculation to back that up, BTW? > > > > Need to dig this up: I thought we saw some reports of this on the list? > > I think so too, but a reference needs to be here too. > > It helps to have exact benchmarks on what's being tested, otherwise we > risk unexpected interaction with the other optimization patches. > > > > > struct sk_buff *skb; > > > > unsigned int len; > > > > - > > > > - while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) { > > > > + bool c; > > > > + int n; > > > > + > > > > + /* We try to free up at least 2 skbs per one sent, so that > > > > we'll get > > > > +* all of the memory back if they are used fast enough. */ > > > > + for (n = 0; > > > > +((c = virtqueue_get_capacity(vi->svq) < capacity) || n < > > > > 2) && > > > > +((skb = virtqueue_get_buf(vi->svq, &len))); > > > > +++n) { > > > > pr_debug("Sent skb %p\n", skb); > > > > vi->dev->stats.tx_bytes += skb->len; > > > > vi->dev->stats.tx_packets++; > > > > dev_kfree_skb_any(skb); > > > > } > > > > + return !c; > > > > > > This is for() abuse :) > > > > > > Why is the capacity check in there at all? Surely it's simpler to try > > > to free 2 skbs each time around? > > > > This is in case we can't use indirect: we want to free up > > enough buffers for the following add_buf to succeed. > > Sure, or we could just count the frags of the skb we're taking out, > which would be accurate for both cases and far more intuitive. > > ie. always try to free up twice as much as we're about to put in. > > Can we hit problems with OOM? Sure, but no worse than now... > The problem is that this "virtqueue_get_capacity()" returns the worst > case, not the normal case. So using it is deceptive. > Maybe just document this? I still believe capacity really needs to be decided at the virtqueue level, not in the driver. E.g. with indirect each skb uses a single entry: freeing 1 small skb is always enough to have space for a large one. I do understand how it seems a waste to leave direct space in the ring while we might in practice have space due to indirect. Didn't come up with a nice way to solve this yet - but 'no worse than now :)' > > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make > > sure we have enough space in the buffer. Another way to do > > that is with a define :). > > To do this properly, we should really be using the actual number of sg > elements needed, but we'd have to do most of xmit_skb beforehand so we > know how many. > > Cheers, > Rusty. Maybe I'm confused here. The problem isn't the failing add_buf for the given skb IIUC. What we are trying to do here is stop the queue *before xmit_skb fails*. We can't look at the number of fragments in the current skb - the next one can be much larger. That's why we check capacity after xmit_skb, not before it, right? -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Sun, 22 May 2011 15:10:08 +0300, "Michael S. Tsirkin" wrote: > On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote: > > On Fri, 20 May 2011 02:11:56 +0300, "Michael S. Tsirkin" > > wrote: > > > Current code might introduce a lot of latency variation > > > if there are many pending bufs at the time we > > > attempt to transmit a new one. This is bad for > > > real-time applications and can't be good for TCP either. > > > > Do we have more than speculation to back that up, BTW? > > Need to dig this up: I thought we saw some reports of this on the list? I think so too, but a reference needs to be here too. It helps to have exact benchmarks on what's being tested, otherwise we risk unexpected interaction with the other optimization patches. > > > struct sk_buff *skb; > > > unsigned int len; > > > - > > > - while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) { > > > + bool c; > > > + int n; > > > + > > > + /* We try to free up at least 2 skbs per one sent, so that we'll get > > > + * all of the memory back if they are used fast enough. */ > > > + for (n = 0; > > > + ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) && > > > + ((skb = virtqueue_get_buf(vi->svq, &len))); > > > + ++n) { > > > pr_debug("Sent skb %p\n", skb); > > > vi->dev->stats.tx_bytes += skb->len; > > > vi->dev->stats.tx_packets++; > > > dev_kfree_skb_any(skb); > > > } > > > + return !c; > > > > This is for() abuse :) > > > > Why is the capacity check in there at all? Surely it's simpler to try > > to free 2 skbs each time around? > > This is in case we can't use indirect: we want to free up > enough buffers for the following add_buf to succeed. Sure, or we could just count the frags of the skb we're taking out, which would be accurate for both cases and far more intuitive. ie. always try to free up twice as much as we're about to put in. Can we hit problems with OOM? Sure, but no worse than now... The problem is that this "virtqueue_get_capacity()" returns the worst case, not the normal case. So using it is deceptive. > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make > sure we have enough space in the buffer. Another way to do > that is with a define :). To do this properly, we should really be using the actual number of sg elements needed, but we'd have to do most of xmit_skb beforehand so we know how many. Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote: > On Fri, 20 May 2011 02:11:56 +0300, "Michael S. Tsirkin" > wrote: > > Current code might introduce a lot of latency variation > > if there are many pending bufs at the time we > > attempt to transmit a new one. This is bad for > > real-time applications and can't be good for TCP either. > > Do we have more than speculation to back that up, BTW? Need to dig this up: I thought we saw some reports of this on the list? > This patch is pretty sloppy; the previous ones were better polished. > > > -static void free_old_xmit_skbs(struct virtnet_info *vi) > > +static bool free_old_xmit_skbs(struct virtnet_info *vi, int capacity) > > { > > A comment here indicating it returns true if it frees something? Agree. > > struct sk_buff *skb; > > unsigned int len; > > - > > - while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) { > > + bool c; > > + int n; > > + > > + /* We try to free up at least 2 skbs per one sent, so that we'll get > > +* all of the memory back if they are used fast enough. */ > > + for (n = 0; > > +((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) && > > +((skb = virtqueue_get_buf(vi->svq, &len))); > > +++n) { > > pr_debug("Sent skb %p\n", skb); > > vi->dev->stats.tx_bytes += skb->len; > > vi->dev->stats.tx_packets++; > > dev_kfree_skb_any(skb); > > } > > + return !c; > > This is for() abuse :) > > Why is the capacity check in there at all? Surely it's simpler to try > to free 2 skbs each time around? This is in case we can't use indirect: we want to free up enough buffers for the following add_buf to succeed. >for (n = 0; n < 2; n++) { > skb = virtqueue_get_buf(vi->svq, &len); > if (!skb) > break; > pr_debug("Sent skb %p\n", skb); > vi->dev->stats.tx_bytes += skb->len; > vi->dev->stats.tx_packets++; > dev_kfree_skb_any(skb); >} > > > static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb) > > @@ -574,8 +582,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, > > struct net_device *dev) > > struct virtnet_info *vi = netdev_priv(dev); > > int capacity; > > > > - /* Free up any pending old buffers before queueing new ones. */ > > - free_old_xmit_skbs(vi); > > + /* Free enough pending old buffers to enable queueing new ones. */ > > + free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS); > > > > /* Try to transmit */ > > capacity = xmit_skb(vi, skb); > > @@ -609,9 +617,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, > > struct net_device *dev) > > netif_stop_queue(dev); > > if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) { > > /* More just got used, free them then recheck. */ > > - free_old_xmit_skbs(vi); > > - capacity = virtqueue_get_capacity(vi->svq); > > - if (capacity >= 2+MAX_SKB_FRAGS) { > > + if (!likely(free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) { > > This extra argument to free_old_xmit_skbs seems odd, unless you have > future plans? > > Thanks, > Rusty. I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make sure we have enough space in the buffer. Another way to do that is with a define :). -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Fri, 20 May 2011 02:11:56 +0300, "Michael S. Tsirkin" wrote: > Current code might introduce a lot of latency variation > if there are many pending bufs at the time we > attempt to transmit a new one. This is bad for > real-time applications and can't be good for TCP either. Do we have more than speculation to back that up, BTW? This patch is pretty sloppy; the previous ones were better polished. > -static void free_old_xmit_skbs(struct virtnet_info *vi) > +static bool free_old_xmit_skbs(struct virtnet_info *vi, int capacity) > { A comment here indicating it returns true if it frees something? > struct sk_buff *skb; > unsigned int len; > - > - while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) { > + bool c; > + int n; > + > + /* We try to free up at least 2 skbs per one sent, so that we'll get > + * all of the memory back if they are used fast enough. */ > + for (n = 0; > + ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) && > + ((skb = virtqueue_get_buf(vi->svq, &len))); > + ++n) { > pr_debug("Sent skb %p\n", skb); > vi->dev->stats.tx_bytes += skb->len; > vi->dev->stats.tx_packets++; > dev_kfree_skb_any(skb); > } > + return !c; This is for() abuse :) Why is the capacity check in there at all? Surely it's simpler to try to free 2 skbs each time around? for (n = 0; n < 2; n++) { skb = virtqueue_get_buf(vi->svq, &len); if (!skb) break; pr_debug("Sent skb %p\n", skb); vi->dev->stats.tx_bytes += skb->len; vi->dev->stats.tx_packets++; dev_kfree_skb_any(skb); } > static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb) > @@ -574,8 +582,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct > net_device *dev) > struct virtnet_info *vi = netdev_priv(dev); > int capacity; > > - /* Free up any pending old buffers before queueing new ones. */ > - free_old_xmit_skbs(vi); > + /* Free enough pending old buffers to enable queueing new ones. */ > + free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS); > > /* Try to transmit */ > capacity = xmit_skb(vi, skb); > @@ -609,9 +617,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct > net_device *dev) > netif_stop_queue(dev); > if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) { > /* More just got used, free them then recheck. */ > - free_old_xmit_skbs(vi); > - capacity = virtqueue_get_capacity(vi->svq); > - if (capacity >= 2+MAX_SKB_FRAGS) { > + if (!likely(free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) { This extra argument to free_old_xmit_skbs seems odd, unless you have future plans? Thanks, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html