Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-29 Thread Rusty Russell
On Sat, 28 May 2011 23:02:04 +0300, "Michael S. Tsirkin"  
wrote:
> On Thu, May 26, 2011 at 12:58:23PM +0930, Rusty Russell wrote:
> > ie. free two packets for every one we're about to add.  For steady state
> > that would work really well.
> 
> Sure, with indirect buffers, but if we
> don't use indirect (and we discussed switching indirect off
> dynamically in the past) this becomes harder to
> be sure about. I think I understand why but
> does not a simple capacity check make it more obvious?

...

> >  Then we hit the case where the ring
> > seems full after we do the add: at that point, screw latency, and just
> > try to free all the buffers we can.
> 
> I see. But the code currently does this:
> 
>   for(..)
>   get_buf
>   add_buf
>   if (capacity < max_sk_frags+2) {
>   if (!enable_cb)
>   for(..)
>   get_buf
>   }
> 
> 
> In other words the second get_buf is only called
> in the unlikely case of race condition.
> 
> So we'll need to add *another* call to get_buf.
> Is it just me or is this becoming messy?

Yes, good point.  I really wonder if anyone would be able to measure the
difference between simply freeing 2 every time (with possible extra
stalls for strange cases) and the more complete version.

But it runs against my grain to implement heuristics when one more call
would make it provably reliable.

Please find a way to make that for loop less ugly though!

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-28 Thread Michael S. Tsirkin
On Thu, May 26, 2011 at 12:58:23PM +0930, Rusty Russell wrote:
> On Wed, 25 May 2011 09:07:59 +0300, "Michael S. Tsirkin"  
> wrote:
> > On Wed, May 25, 2011 at 11:05:04AM +0930, Rusty Russell wrote:
> > Hmm I'm not sure I got it, need to think about this.
> > I'd like to go back and document how my design was supposed to work.
> > This really should have been in commit log or even a comment.
> > I thought we need a min, not a max.
> > We start with this:
> > 
> > while ((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS) &&
> > (skb = get_buf)))
> > kfree_skb(skb);
> > return !c;
> > 
> > This is clean and simple, right? And it's exactly asking for what we need.
> 
> No, I started from the other direction:
> 
> for (i = 0; i < 2; i++) {
> skb = get_buf();
> if (!skb)
> break;
> kfree_skb(skb);
> }
> 
> ie. free two packets for every one we're about to add.  For steady state
> that would work really well.

Sure, with indirect buffers, but if we
don't use indirect (and we discussed switching indirect off
dynamically in the past) this becomes harder to
be sure about. I think I understand why but
does not a simple capacity check make it more obvious?

>  Then we hit the case where the ring
> seems full after we do the add: at that point, screw latency, and just
> try to free all the buffers we can.

I see. But the code currently does this:

for(..)
get_buf
add_buf
if (capacity < max_sk_frags+2) {
if (!enable_cb)
for(..)
get_buf
}


In other words the second get_buf is only called
in the unlikely case of race condition.

So we'll need to add *another* call to get_buf.
Is it just me or is this becoming messy?

I was also be worried that we are adding more
"modes" to the code: high and low latency
depending on different speeds between host and guest,
which would be hard to trigger and test.
That's why I tried hard to make the code behave the
same all the time and free up just a bit more than
the minimum necessary.

> > on the normal path min == 2 so we're low latency but we keep ahead on
> > average. min == 0 for the "we're out of capacity, we may have to stop
> > the queue".
> > 
> > Does the above make sense at all?
> 
> It makes sense, but I think it's a classic case where incremental
> improvements aren't as good as starting from scratch.
> 
> Cheers,
> Rusty.

The only difference on good path seems an extra capacity check,
so I don't expect the difference will be testable, do you?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-27 Thread Rusty Russell
On Wed, 25 May 2011 09:07:59 +0300, "Michael S. Tsirkin"  
wrote:
> On Wed, May 25, 2011 at 11:05:04AM +0930, Rusty Russell wrote:
> Hmm I'm not sure I got it, need to think about this.
> I'd like to go back and document how my design was supposed to work.
> This really should have been in commit log or even a comment.
> I thought we need a min, not a max.
> We start with this:
> 
>   while ((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS) &&
>   (skb = get_buf)))
>   kfree_skb(skb);
>   return !c;
> 
> This is clean and simple, right? And it's exactly asking for what we need.

No, I started from the other direction:

for (i = 0; i < 2; i++) {
skb = get_buf();
if (!skb)
break;
kfree_skb(skb);
}

ie. free two packets for every one we're about to add.  For steady state
that would work really well.  Then we hit the case where the ring seems
full after we do the add: at that point, screw latency, and just try to
free all the buffers we can.

> on the normal path min == 2 so we're low latency but we keep ahead on
> average. min == 0 for the "we're out of capacity, we may have to stop
> the queue".
> 
> Does the above make sense at all?

It makes sense, but I think it's a classic case where incremental
improvements aren't as good as starting from scratch.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-24 Thread Michael S. Tsirkin
On Wed, May 25, 2011 at 11:05:04AM +0930, Rusty Russell wrote:
> On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin"  
> wrote:
> > I do understand how it seems a waste to leave direct space
> > in the ring while we might in practice have space
> > due to indirect. Didn't come up with a nice way to
> > solve this yet - but 'no worse than now :)'
> 
> Let's just make it "bool free_old_xmit_skbs(unsigned int max)".  max ==
> 2 for the normal xmit path, so we're low latency but we keep ahead on
> average.  max == -1 for the "we're out of capacity, we may have to stop
> the queue".
> 
> That keeps it simple and probably the right thing...
> 
> Thanks,
> Rusty.

Hmm I'm not sure I got it, need to think about this.
I'd like to go back and document how my design was supposed to work.
This really should have been in commit log or even a comment.
I thought we need a min, not a max.
We start with this:

while ((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS) &&
(skb = get_buf)))
kfree_skb(skb);
return !c;

This is clean and simple, right? And it's exactly asking for what we need.

But this way we always keep a lot of memory in skbs even when rate of
communication is low.

So we add the min parameter:

int n = 0;

while c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS)) ||
 n++ < min) && (skb = get_buf)))
kfree_skb(skb);
return !c;


on the normal path min == 2 so we're low latency but we keep ahead on
average. min == 0 for the "we're out of capacity, we may have to stop
the queue".

Does the above make sense at all?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-24 Thread Michael S. Tsirkin
On Wed, May 25, 2011 at 10:58:26AM +0930, Rusty Russell wrote:
> On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin"  
> wrote:
> > On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> > > Can we hit problems with OOM?  Sure, but no worse than now...
> > > The problem is that this "virtqueue_get_capacity()" returns the worst
> > > case, not the normal case.  So using it is deceptive.
> > > 
> > 
> > Maybe just document this?
> 
> Yes, but also by renaming virtqueue_get_capacity().  Takes it from a 3
> to a 6 on the API hard-to-misuse scale.
> 
> How about, virtqueue_min_capacity()?  Makes the reader realize something
> weird is going on.

Absolutely. Great idea.

> > I still believe capacity really needs to be decided
> > at the virtqueue level, not in the driver.
> > E.g. with indirect each skb uses a single entry: freeing
> > 1 small skb is always enough to have space for a large one.
> > 
> > I do understand how it seems a waste to leave direct space
> > in the ring while we might in practice have space
> > due to indirect. Didn't come up with a nice way to
> > solve this yet - but 'no worse than now :)'
> 
> Agreed.
> 
> > > > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
> > > > sure we have enough space in the buffer. Another way to do
> > > > that is with a define :).
> > > 
> > > To do this properly, we should really be using the actual number of sg
> > > elements needed, but we'd have to do most of xmit_skb beforehand so we
> > > know how many.
> > > 
> > > Cheers,
> > > Rusty.
> > 
> > Maybe I'm confused here.  The problem isn't the failing
> > add_buf for the given skb IIUC.  What we are trying to do here is stop
> > the queue *before xmit_skb fails*. We can't look at the
> > number of fragments in the current skb - the next one can be
> > much larger.  That's why we check capacity after xmit_skb,
> > not before it, right?
> 
> No, I was confused...  More coffee!
> 
> Thanks,
> Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-24 Thread Rusty Russell
On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin"  
wrote:
> I do understand how it seems a waste to leave direct space
> in the ring while we might in practice have space
> due to indirect. Didn't come up with a nice way to
> solve this yet - but 'no worse than now :)'

Let's just make it "bool free_old_xmit_skbs(unsigned int max)".  max ==
2 for the normal xmit path, so we're low latency but we keep ahead on
average.  max == -1 for the "we're out of capacity, we may have to stop
the queue".

That keeps it simple and probably the right thing...

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-24 Thread Rusty Russell
On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin"  
wrote:
> On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> > Can we hit problems with OOM?  Sure, but no worse than now...
> > The problem is that this "virtqueue_get_capacity()" returns the worst
> > case, not the normal case.  So using it is deceptive.
> > 
> 
> Maybe just document this?

Yes, but also by renaming virtqueue_get_capacity().  Takes it from a 3
to a 6 on the API hard-to-misuse scale.

How about, virtqueue_min_capacity()?  Makes the reader realize something
weird is going on.

> I still believe capacity really needs to be decided
> at the virtqueue level, not in the driver.
> E.g. with indirect each skb uses a single entry: freeing
> 1 small skb is always enough to have space for a large one.
> 
> I do understand how it seems a waste to leave direct space
> in the ring while we might in practice have space
> due to indirect. Didn't come up with a nice way to
> solve this yet - but 'no worse than now :)'

Agreed.

> > > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
> > > sure we have enough space in the buffer. Another way to do
> > > that is with a define :).
> > 
> > To do this properly, we should really be using the actual number of sg
> > elements needed, but we'd have to do most of xmit_skb beforehand so we
> > know how many.
> > 
> > Cheers,
> > Rusty.
> 
> Maybe I'm confused here.  The problem isn't the failing
> add_buf for the given skb IIUC.  What we are trying to do here is stop
> the queue *before xmit_skb fails*. We can't look at the
> number of fragments in the current skb - the next one can be
> much larger.  That's why we check capacity after xmit_skb,
> not before it, right?

No, I was confused...  More coffee!

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-24 Thread Michael S. Tsirkin
On Tue, May 24, 2011 at 06:20:35PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin"  wrote on 05/24/2011 04:59:39 PM:
> 
> > > > > Maybe Rusty means it is a simpler model to free the amount
> > > > > of space that this xmit needs. We will still fail anyway
> > > > > at some time but it is unlikely, since earlier iteration
> > > > > freed up atleast the space that it was going to use.
> > > >
> > > > Not sure I nderstand.  We can't know space is freed in the previous
> > > > iteration as buffers might not have been used by then.
> > >
> > > Yes, the first few iterations may not have freed up space, but
> > > later ones should. The amount of free space should increase
> > > from then on, especially since we try to free double of what
> > > we consume.
> >
> > Hmm. This is only an upper limit on the # of entries in the queue.
> > Assume that vq size is 4 and we transmit 4 enties without
> > getting anything in the used ring. The next transmit will fail.
> >
> > So I don't really see why it's unlikely that we reach the packet
> > drop code with your patch.
> 
> I was assuming 256 entries :) I will try to get some
> numbers to see how often it is true tomorrow.

That would depend on how fast the hypervisor is.
Try doing something to make hypervisor slower than the guest.  I don't
think we need measurements to realize that with the host being slower
than guest that would happen a lot, though.

> > > I am not sure of why it was changed, since returning TX_BUSY
> > > seems more efficient IMHO.
> > > qdisc_restart() handles requeue'd
> > > packets much better than a stopped queue, as a significant
> > > part of this code is skipped if gso_skb is present
> >
> > I think this is the argument:
> > http://www.mail-archive.com/virtualization@lists.linux-
> > foundation.org/msg06364.html
> 
> Thanks for digging up that thread! Yes, that one skb would get
> sent first ahead of possibly higher priority skbs. However,
> from a performance point, TX_BUSY code skips a lot of checks
> and code for all subsequent packets till the device is
> restarted. I can test performance with both cases and report
> what I find (the requeue code has become very simple and clean
> from "horribly complex", thanks to Herbert and Dave).

Cc Herbert, and try to convince him :)

> > > (qdisc
> > > will eventually start dropping packets when tx_queue_len is
> >
> > tx_queue_len is a pretty large buffer so maybe no.
> 
> I remember seeing tons of drops (pfifo_fast_enqueue) when
> xmit returns TX_BUSY.
> 
> > I think the packet drops from the scheduler queue can also be
> > done intelligently (e.g. with CHOKe) which should
> > work better than dropping a random packet?
> 
> I am not sure of that - choke_enqueue checks against a random
> skb to drop current skb, and also during congestion. But for
> my "sample driver xmit", returning TX_BUSY could still allow
> to be used with CHOKe.
> 
> thanks,
> 
> - KK
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-24 Thread Krishna Kumar2
"Michael S. Tsirkin"  wrote on 05/24/2011 04:59:39 PM:

> > > > Maybe Rusty means it is a simpler model to free the amount
> > > > of space that this xmit needs. We will still fail anyway
> > > > at some time but it is unlikely, since earlier iteration
> > > > freed up atleast the space that it was going to use.
> > >
> > > Not sure I nderstand.  We can't know space is freed in the previous
> > > iteration as buffers might not have been used by then.
> >
> > Yes, the first few iterations may not have freed up space, but
> > later ones should. The amount of free space should increase
> > from then on, especially since we try to free double of what
> > we consume.
>
> Hmm. This is only an upper limit on the # of entries in the queue.
> Assume that vq size is 4 and we transmit 4 enties without
> getting anything in the used ring. The next transmit will fail.
>
> So I don't really see why it's unlikely that we reach the packet
> drop code with your patch.

I was assuming 256 entries :) I will try to get some
numbers to see how often it is true tomorrow.

> > I am not sure of why it was changed, since returning TX_BUSY
> > seems more efficient IMHO.
> > qdisc_restart() handles requeue'd
> > packets much better than a stopped queue, as a significant
> > part of this code is skipped if gso_skb is present
>
> I think this is the argument:
> http://www.mail-archive.com/virtualization@lists.linux-
> foundation.org/msg06364.html

Thanks for digging up that thread! Yes, that one skb would get
sent first ahead of possibly higher priority skbs. However,
from a performance point, TX_BUSY code skips a lot of checks
and code for all subsequent packets till the device is
restarted. I can test performance with both cases and report
what I find (the requeue code has become very simple and clean
from "horribly complex", thanks to Herbert and Dave).

> > (qdisc
> > will eventually start dropping packets when tx_queue_len is
>
> tx_queue_len is a pretty large buffer so maybe no.

I remember seeing tons of drops (pfifo_fast_enqueue) when
xmit returns TX_BUSY.

> I think the packet drops from the scheduler queue can also be
> done intelligently (e.g. with CHOKe) which should
> work better than dropping a random packet?

I am not sure of that - choke_enqueue checks against a random
skb to drop current skb, and also during congestion. But for
my "sample driver xmit", returning TX_BUSY could still allow
to be used with CHOKe.

thanks,

- KK

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-24 Thread Michael S. Tsirkin
On Tue, May 24, 2011 at 02:57:43PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin"  wrote on 05/24/2011 02:42:55 PM:
> 
> > > > > To do this properly, we should really be using the actual number of
> sg
> > > > > elements needed, but we'd have to do most of xmit_skb beforehand so
> we
> > > > > know how many.
> > > > >
> > > > > Cheers,
> > > > > Rusty.
> > > >
> > > > Maybe I'm confused here.  The problem isn't the failing
> > > > add_buf for the given skb IIUC.  What we are trying to do here is
> stop
> > > > the queue *before xmit_skb fails*. We can't look at the
> > > > number of fragments in the current skb - the next one can be
> > > > much larger.  That's why we check capacity after xmit_skb,
> > > > not before it, right?
> > >
> > > Maybe Rusty means it is a simpler model to free the amount
> > > of space that this xmit needs. We will still fail anyway
> > > at some time but it is unlikely, since earlier iteration
> > > freed up atleast the space that it was going to use.
> >
> > Not sure I nderstand.  We can't know space is freed in the previous
> > iteration as buffers might not have been used by then.
> 
> Yes, the first few iterations may not have freed up space, but
> later ones should. The amount of free space should increase
> from then on, especially since we try to free double of what
> we consume.

Hmm. This is only an upper limit on the # of entries in the queue.
Assume that vq size is 4 and we transmit 4 enties without
getting anything in the used ring. The next transmit will fail.

So I don't really see why it's unlikely that we reach the packet
drop code with your patch.

> > > The
> > > code could become much simpler:
> > >
> > > start_xmit()
> > > {
> > > {
> > > num_sgs = get num_sgs for this skb;
> > >
> > > /* Free enough pending old buffers to enable queueing this one
> */
> > > free_old_xmit_skbs(vi, num_sgs * 2); /* ?? */
> > >
> > > if (virtqueue_get_capacity() < num_sgs) {
> > > netif_stop_queue(dev);
> > > if (virtqueue_enable_cb_delayed(vi->svq) ||
> > > free_old_xmit_skbs(vi, num_sgs)) {
> > > /* Nothing freed up, or not enough freed up */
> > > kfree_skb(skb);
> > > return NETDEV_TX_OK;
> >
> > This packet drop is what we wanted to avoid.
> 
> Please see below on returning NETDEV_TX_BUSY.
> 
> >
> > > }
> > > netif_start_queue(dev);
> > > virtqueue_disable_cb(vi->svq);
> > > }
> > >
> > > /* xmit_skb cannot fail now, also pass 'num_sgs' */
> > > xmit_skb(vi, skb, num_sgs);
> > > virtqueue_kick(vi->svq);
> > >
> > > skb_orphan(skb);
> > > nf_reset(skb);
> > >
> > > return NETDEV_TX_OK;
> > > }
> > >
> > > We could even return TX_BUSY since that makes the dequeue
> > > code more efficient. See dev_dequeue_skb() - you can skip a
> > > lot of code (and avoid taking locks) to check if the queue
> > > is already stopped but that code runs only if you return
> > > TX_BUSY in the earlier iteration.
> > >
> > > BTW, shouldn't the check in start_xmit be:
> > >if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> > >   ...
> > >}
> > >
> > > Thanks,
> > >
> > > - KK
> >
> > I thought we used to do basically this but other devices moved to a
> > model where they stop *before* queueing fails, so we did too.
> 
> I am not sure of why it was changed, since returning TX_BUSY
> seems more efficient IMHO.
> qdisc_restart() handles requeue'd
> packets much better than a stopped queue, as a significant
> part of this code is skipped if gso_skb is present

I think this is the argument:
http://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg06364.html


> (qdisc
> will eventually start dropping packets when tx_queue_len is
> exceeded anyway).
> 
> Thanks,
> 
> - KK

tx_queue_len is a pretty large buffer so maybe no.
I think the packet drops from the scheduler queue can also be
done intelligently (e.g. with CHOKe) which should
work better than dropping a random packet?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-24 Thread Krishna Kumar2
"Michael S. Tsirkin"  wrote on 05/24/2011 02:42:55 PM:

> > > > To do this properly, we should really be using the actual number of
sg
> > > > elements needed, but we'd have to do most of xmit_skb beforehand so
we
> > > > know how many.
> > > >
> > > > Cheers,
> > > > Rusty.
> > >
> > > Maybe I'm confused here.  The problem isn't the failing
> > > add_buf for the given skb IIUC.  What we are trying to do here is
stop
> > > the queue *before xmit_skb fails*. We can't look at the
> > > number of fragments in the current skb - the next one can be
> > > much larger.  That's why we check capacity after xmit_skb,
> > > not before it, right?
> >
> > Maybe Rusty means it is a simpler model to free the amount
> > of space that this xmit needs. We will still fail anyway
> > at some time but it is unlikely, since earlier iteration
> > freed up atleast the space that it was going to use.
>
> Not sure I nderstand.  We can't know space is freed in the previous
> iteration as buffers might not have been used by then.

Yes, the first few iterations may not have freed up space, but
later ones should. The amount of free space should increase
from then on, especially since we try to free double of what
we consume.

> > The
> > code could become much simpler:
> >
> > start_xmit()
> > {
> > {
> > num_sgs = get num_sgs for this skb;
> >
> > /* Free enough pending old buffers to enable queueing this one
*/
> > free_old_xmit_skbs(vi, num_sgs * 2); /* ?? */
> >
> > if (virtqueue_get_capacity() < num_sgs) {
> > netif_stop_queue(dev);
> > if (virtqueue_enable_cb_delayed(vi->svq) ||
> > free_old_xmit_skbs(vi, num_sgs)) {
> > /* Nothing freed up, or not enough freed up */
> > kfree_skb(skb);
> > return NETDEV_TX_OK;
>
> This packet drop is what we wanted to avoid.

Please see below on returning NETDEV_TX_BUSY.

>
> > }
> > netif_start_queue(dev);
> > virtqueue_disable_cb(vi->svq);
> > }
> >
> > /* xmit_skb cannot fail now, also pass 'num_sgs' */
> > xmit_skb(vi, skb, num_sgs);
> > virtqueue_kick(vi->svq);
> >
> > skb_orphan(skb);
> > nf_reset(skb);
> >
> > return NETDEV_TX_OK;
> > }
> >
> > We could even return TX_BUSY since that makes the dequeue
> > code more efficient. See dev_dequeue_skb() - you can skip a
> > lot of code (and avoid taking locks) to check if the queue
> > is already stopped but that code runs only if you return
> > TX_BUSY in the earlier iteration.
> >
> > BTW, shouldn't the check in start_xmit be:
> >if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> >   ...
> >}
> >
> > Thanks,
> >
> > - KK
>
> I thought we used to do basically this but other devices moved to a
> model where they stop *before* queueing fails, so we did too.

I am not sure of why it was changed, since returning TX_BUSY
seems more efficient IMHO. qdisc_restart() handles requeue'd
packets much better than a stopped queue, as a significant
part of this code is skipped if gso_skb is present (qdisc
will eventually start dropping packets when tx_queue_len is
exceeded anyway).

Thanks,

- KK

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-24 Thread Michael S. Tsirkin
On Tue, May 24, 2011 at 01:24:15PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin"  wrote on 05/23/2011 04:49:00 PM:
> 
> > > To do this properly, we should really be using the actual number of sg
> > > elements needed, but we'd have to do most of xmit_skb beforehand so we
> > > know how many.
> > >
> > > Cheers,
> > > Rusty.
> >
> > Maybe I'm confused here.  The problem isn't the failing
> > add_buf for the given skb IIUC.  What we are trying to do here is stop
> > the queue *before xmit_skb fails*. We can't look at the
> > number of fragments in the current skb - the next one can be
> > much larger.  That's why we check capacity after xmit_skb,
> > not before it, right?
> 
> Maybe Rusty means it is a simpler model to free the amount
> of space that this xmit needs. We will still fail anyway
> at some time but it is unlikely, since earlier iteration
> freed up atleast the space that it was going to use.

Not sure I nderstand.  We can't know space is freed in the previous
iteration as buffers might not have been used by then.

> The
> code could become much simpler:
> 
> start_xmit()
> {
> {
> num_sgs = get num_sgs for this skb;
> 
> /* Free enough pending old buffers to enable queueing this one */
> free_old_xmit_skbs(vi, num_sgs * 2); /* ?? */
> 
> if (virtqueue_get_capacity() < num_sgs) {
> netif_stop_queue(dev);
> if (virtqueue_enable_cb_delayed(vi->svq) ||
> free_old_xmit_skbs(vi, num_sgs)) {
> /* Nothing freed up, or not enough freed up */
> kfree_skb(skb);
> return NETDEV_TX_OK;

This packet drop is what we wanted to avoid.


> }
> netif_start_queue(dev);
> virtqueue_disable_cb(vi->svq);
> }
> 
> /* xmit_skb cannot fail now, also pass 'num_sgs' */
> xmit_skb(vi, skb, num_sgs);
> virtqueue_kick(vi->svq);
> 
> skb_orphan(skb);
> nf_reset(skb);
> 
> return NETDEV_TX_OK;
> }
> 
> We could even return TX_BUSY since that makes the dequeue
> code more efficient. See dev_dequeue_skb() - you can skip a
> lot of code (and avoid taking locks) to check if the queue
> is already stopped but that code runs only if you return
> TX_BUSY in the earlier iteration.
> 
> BTW, shouldn't the check in start_xmit be:
>   if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
>   ...
>   }
> 
> Thanks,
> 
> - KK

I thought we used to do basically this but other devices moved to a
model where they stop *before* queueing fails, so we did too.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-24 Thread Krishna Kumar2
"Michael S. Tsirkin"  wrote on 05/23/2011 04:49:00 PM:

> > To do this properly, we should really be using the actual number of sg
> > elements needed, but we'd have to do most of xmit_skb beforehand so we
> > know how many.
> >
> > Cheers,
> > Rusty.
>
> Maybe I'm confused here.  The problem isn't the failing
> add_buf for the given skb IIUC.  What we are trying to do here is stop
> the queue *before xmit_skb fails*. We can't look at the
> number of fragments in the current skb - the next one can be
> much larger.  That's why we check capacity after xmit_skb,
> not before it, right?

Maybe Rusty means it is a simpler model to free the amount
of space that this xmit needs. We will still fail anyway
at some time but it is unlikely, since earlier iteration
freed up atleast the space that it was going to use. The
code could become much simpler:

start_xmit()
{
{
num_sgs = get num_sgs for this skb;

/* Free enough pending old buffers to enable queueing this one */
free_old_xmit_skbs(vi, num_sgs * 2); /* ?? */

if (virtqueue_get_capacity() < num_sgs) {
netif_stop_queue(dev);
if (virtqueue_enable_cb_delayed(vi->svq) ||
free_old_xmit_skbs(vi, num_sgs)) {
/* Nothing freed up, or not enough freed up */
kfree_skb(skb);
return NETDEV_TX_OK;
}
netif_start_queue(dev);
virtqueue_disable_cb(vi->svq);
}

/* xmit_skb cannot fail now, also pass 'num_sgs' */
xmit_skb(vi, skb, num_sgs);
virtqueue_kick(vi->svq);

skb_orphan(skb);
nf_reset(skb);

return NETDEV_TX_OK;
}

We could even return TX_BUSY since that makes the dequeue
code more efficient. See dev_dequeue_skb() - you can skip a
lot of code (and avoid taking locks) to check if the queue
is already stopped but that code runs only if you return
TX_BUSY in the earlier iteration.

BTW, shouldn't the check in start_xmit be:
if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
...
}

Thanks,

- KK

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-23 Thread Michael S. Tsirkin
On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> On Sun, 22 May 2011 15:10:08 +0300, "Michael S. Tsirkin"  
> wrote:
> > On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote:
> > > On Fri, 20 May 2011 02:11:56 +0300, "Michael S. Tsirkin" 
> > >  wrote:
> > > > Current code might introduce a lot of latency variation
> > > > if there are many pending bufs at the time we
> > > > attempt to transmit a new one. This is bad for
> > > > real-time applications and can't be good for TCP either.
> > > 
> > > Do we have more than speculation to back that up, BTW?
> > 
> > Need to dig this up: I thought we saw some reports of this on the list?
> 
> I think so too, but a reference needs to be here too.
> 
> It helps to have exact benchmarks on what's being tested, otherwise we
> risk unexpected interaction with the other optimization patches.
> 
> > > > struct sk_buff *skb;
> > > > unsigned int len;
> > > > -
> > > > -   while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > > > +   bool c;
> > > > +   int n;
> > > > +
> > > > +   /* We try to free up at least 2 skbs per one sent, so that 
> > > > we'll get
> > > > +* all of the memory back if they are used fast enough. */
> > > > +   for (n = 0;
> > > > +((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 
> > > > 2) &&
> > > > +((skb = virtqueue_get_buf(vi->svq, &len)));
> > > > +++n) {
> > > > pr_debug("Sent skb %p\n", skb);
> > > > vi->dev->stats.tx_bytes += skb->len;
> > > > vi->dev->stats.tx_packets++;
> > > > dev_kfree_skb_any(skb);
> > > > }
> > > > +   return !c;
> > > 
> > > This is for() abuse :)
> > > 
> > > Why is the capacity check in there at all?  Surely it's simpler to try
> > > to free 2 skbs each time around?
> > 
> > This is in case we can't use indirect: we want to free up
> > enough buffers for the following add_buf to succeed.
> 
> Sure, or we could just count the frags of the skb we're taking out,
> which would be accurate for both cases and far more intuitive.
> 
> ie. always try to free up twice as much as we're about to put in.
> 
> Can we hit problems with OOM?  Sure, but no worse than now...
> The problem is that this "virtqueue_get_capacity()" returns the worst
> case, not the normal case.  So using it is deceptive.
> 

Maybe just document this?

I still believe capacity really needs to be decided
at the virtqueue level, not in the driver.
E.g. with indirect each skb uses a single entry: freeing
1 small skb is always enough to have space for a large one.

I do understand how it seems a waste to leave direct space
in the ring while we might in practice have space
due to indirect. Didn't come up with a nice way to
solve this yet - but 'no worse than now :)'

> > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
> > sure we have enough space in the buffer. Another way to do
> > that is with a define :).
> 
> To do this properly, we should really be using the actual number of sg
> elements needed, but we'd have to do most of xmit_skb beforehand so we
> know how many.
> 
> Cheers,
> Rusty.

Maybe I'm confused here.  The problem isn't the failing
add_buf for the given skb IIUC.  What we are trying to do here is stop
the queue *before xmit_skb fails*. We can't look at the
number of fragments in the current skb - the next one can be
much larger.  That's why we check capacity after xmit_skb,
not before it, right?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-22 Thread Rusty Russell
On Sun, 22 May 2011 15:10:08 +0300, "Michael S. Tsirkin"  
wrote:
> On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote:
> > On Fri, 20 May 2011 02:11:56 +0300, "Michael S. Tsirkin"  
> > wrote:
> > > Current code might introduce a lot of latency variation
> > > if there are many pending bufs at the time we
> > > attempt to transmit a new one. This is bad for
> > > real-time applications and can't be good for TCP either.
> > 
> > Do we have more than speculation to back that up, BTW?
> 
> Need to dig this up: I thought we saw some reports of this on the list?

I think so too, but a reference needs to be here too.

It helps to have exact benchmarks on what's being tested, otherwise we
risk unexpected interaction with the other optimization patches.

> > >   struct sk_buff *skb;
> > >   unsigned int len;
> > > -
> > > - while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > > + bool c;
> > > + int n;
> > > +
> > > + /* We try to free up at least 2 skbs per one sent, so that we'll get
> > > +  * all of the memory back if they are used fast enough. */
> > > + for (n = 0;
> > > +  ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
> > > +  ((skb = virtqueue_get_buf(vi->svq, &len)));
> > > +  ++n) {
> > >   pr_debug("Sent skb %p\n", skb);
> > >   vi->dev->stats.tx_bytes += skb->len;
> > >   vi->dev->stats.tx_packets++;
> > >   dev_kfree_skb_any(skb);
> > >   }
> > > + return !c;
> > 
> > This is for() abuse :)
> > 
> > Why is the capacity check in there at all?  Surely it's simpler to try
> > to free 2 skbs each time around?
> 
> This is in case we can't use indirect: we want to free up
> enough buffers for the following add_buf to succeed.

Sure, or we could just count the frags of the skb we're taking out,
which would be accurate for both cases and far more intuitive.

ie. always try to free up twice as much as we're about to put in.

Can we hit problems with OOM?  Sure, but no worse than now...

The problem is that this "virtqueue_get_capacity()" returns the worst
case, not the normal case.  So using it is deceptive.

> I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
> sure we have enough space in the buffer. Another way to do
> that is with a define :).

To do this properly, we should really be using the actual number of sg
elements needed, but we'd have to do most of xmit_skb beforehand so we
know how many.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-22 Thread Michael S. Tsirkin
On Sat, May 21, 2011 at 11:49:59AM +0930, Rusty Russell wrote:
> On Fri, 20 May 2011 02:11:56 +0300, "Michael S. Tsirkin"  
> wrote:
> > Current code might introduce a lot of latency variation
> > if there are many pending bufs at the time we
> > attempt to transmit a new one. This is bad for
> > real-time applications and can't be good for TCP either.
> 
> Do we have more than speculation to back that up, BTW?

Need to dig this up: I thought we saw some reports of this on the list?

> This patch is pretty sloppy; the previous ones were better polished.
> 
> > -static void free_old_xmit_skbs(struct virtnet_info *vi)
> > +static bool free_old_xmit_skbs(struct virtnet_info *vi, int capacity)
> >  {
> 
> A comment here indicating it returns true if it frees something?

Agree.

> > struct sk_buff *skb;
> > unsigned int len;
> > -
> > -   while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > +   bool c;
> > +   int n;
> > +
> > +   /* We try to free up at least 2 skbs per one sent, so that we'll get
> > +* all of the memory back if they are used fast enough. */
> > +   for (n = 0;
> > +((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
> > +((skb = virtqueue_get_buf(vi->svq, &len)));
> > +++n) {
> > pr_debug("Sent skb %p\n", skb);
> > vi->dev->stats.tx_bytes += skb->len;
> > vi->dev->stats.tx_packets++;
> > dev_kfree_skb_any(skb);
> > }
> > +   return !c;
> 
> This is for() abuse :)
> 
> Why is the capacity check in there at all?  Surely it's simpler to try
> to free 2 skbs each time around?

This is in case we can't use indirect: we want to free up
enough buffers for the following add_buf to succeed.


>for (n = 0; n < 2; n++) {
> skb = virtqueue_get_buf(vi->svq, &len);
> if (!skb)
> break;
>   pr_debug("Sent skb %p\n", skb);
>   vi->dev->stats.tx_bytes += skb->len;
>   vi->dev->stats.tx_packets++;
>   dev_kfree_skb_any(skb);
>}
> 
> >  static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
> > @@ -574,8 +582,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, 
> > struct net_device *dev)
> > struct virtnet_info *vi = netdev_priv(dev);
> > int capacity;
> >  
> > -   /* Free up any pending old buffers before queueing new ones. */
> > -   free_old_xmit_skbs(vi);
> > +   /* Free enough pending old buffers to enable queueing new ones. */
> > +   free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS);
> >  
> > /* Try to transmit */
> > capacity = xmit_skb(vi, skb);
> > @@ -609,9 +617,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, 
> > struct net_device *dev)
> > netif_stop_queue(dev);
> > if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> > /* More just got used, free them then recheck. */
> > -   free_old_xmit_skbs(vi);
> > -   capacity = virtqueue_get_capacity(vi->svq);
> > -   if (capacity >= 2+MAX_SKB_FRAGS) {
> > +   if (!likely(free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> 
> This extra argument to free_old_xmit_skbs seems odd, unless you have
> future plans?
> 
> Thanks,
> Rusty.

I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
sure we have enough space in the buffer. Another way to do
that is with a define :).

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 10/14] virtio_net: limit xmit polling

2011-05-20 Thread Rusty Russell
On Fri, 20 May 2011 02:11:56 +0300, "Michael S. Tsirkin"  
wrote:
> Current code might introduce a lot of latency variation
> if there are many pending bufs at the time we
> attempt to transmit a new one. This is bad for
> real-time applications and can't be good for TCP either.

Do we have more than speculation to back that up, BTW?

This patch is pretty sloppy; the previous ones were better polished.

> -static void free_old_xmit_skbs(struct virtnet_info *vi)
> +static bool free_old_xmit_skbs(struct virtnet_info *vi, int capacity)
>  {

A comment here indicating it returns true if it frees something?

>   struct sk_buff *skb;
>   unsigned int len;
> -
> - while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> + bool c;
> + int n;
> +
> + /* We try to free up at least 2 skbs per one sent, so that we'll get
> +  * all of the memory back if they are used fast enough. */
> + for (n = 0;
> +  ((c = virtqueue_get_capacity(vi->svq) < capacity) || n < 2) &&
> +  ((skb = virtqueue_get_buf(vi->svq, &len)));
> +  ++n) {
>   pr_debug("Sent skb %p\n", skb);
>   vi->dev->stats.tx_bytes += skb->len;
>   vi->dev->stats.tx_packets++;
>   dev_kfree_skb_any(skb);
>   }
> + return !c;

This is for() abuse :)

Why is the capacity check in there at all?  Surely it's simpler to try
to free 2 skbs each time around?

   for (n = 0; n < 2; n++) {
skb = virtqueue_get_buf(vi->svq, &len);
if (!skb)
break;
pr_debug("Sent skb %p\n", skb);
vi->dev->stats.tx_bytes += skb->len;
vi->dev->stats.tx_packets++;
dev_kfree_skb_any(skb);
   }

>  static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
> @@ -574,8 +582,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
> net_device *dev)
>   struct virtnet_info *vi = netdev_priv(dev);
>   int capacity;
>  
> - /* Free up any pending old buffers before queueing new ones. */
> - free_old_xmit_skbs(vi);
> + /* Free enough pending old buffers to enable queueing new ones. */
> + free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS);
>  
>   /* Try to transmit */
>   capacity = xmit_skb(vi, skb);
> @@ -609,9 +617,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
> net_device *dev)
>   netif_stop_queue(dev);
>   if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
>   /* More just got used, free them then recheck. */
> - free_old_xmit_skbs(vi);
> - capacity = virtqueue_get_capacity(vi->svq);
> - if (capacity >= 2+MAX_SKB_FRAGS) {
> + if (!likely(free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {

This extra argument to free_old_xmit_skbs seems odd, unless you have
future plans?

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html