Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Fri, 21 Dec 2007, Ilpo Järvinen wrote:

> On Fri, 21 Dec 2007, Bill Fink wrote:
> 
> > On Fri, 21 Dec 2007, Bill Fink wrote:
> > 
> > > Or perhaps even:
> > > 
> > >   /* Ok, it looks like it is advisable to defer.  */
> > >   tp->tso_deferred = jiffies;
> > > 
> > >   /* need to return a non-zero value to defer, which means won't
> > >* defer if jiffies == 0 but it's only a 1 in 4 billion event
> > >* (and avoids a compare/branch by not checking jiffies)
> > >/
> > > return jiffies;
> > 
> > Ack.  I introduced my own 64-bit to 32-bit issue (too late at night).
> > How about:
> > 
> > /* Ok, it looks like it is advisable to defer.  */
> > tp->tso_deferred = jiffies;
> > 
> > /* this won't defer if jiffies == 0 but it's only a 1 in
> >  * 4 billion event (and avoids a branch)
> >  */
> > return (jiffies != 0);
> 
> I'm not sure how the jiffies work but is this racy as well?
> 
> Simple return tp->tso_deferred; should work, shouldn't it? :-)

As long as tp->tso_deferred remains u32, pending the other issue.

-Bill
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Bill Fink wrote:

> On Fri, 21 Dec 2007, Bill Fink wrote:
> 
> > Or perhaps even:
> > 
> > /* Ok, it looks like it is advisable to defer.  */
> > tp->tso_deferred = jiffies;
> > 
> > /* need to return a non-zero value to defer, which means won't
> >  * defer if jiffies == 0 but it's only a 1 in 4 billion event
> >  * (and avoids a compare/branch by not checking jiffies)
> >  /
> > return jiffies;
> 
> Ack.  I introduced my own 64-bit to 32-bit issue (too late at night).
> How about:
> 
>   /* Ok, it looks like it is advisable to defer.  */
>   tp->tso_deferred = jiffies;
> 
>   /* this won't defer if jiffies == 0 but it's only a 1 in
>* 4 billion event (and avoids a branch)
>*/
>   return (jiffies != 0);

I'm not sure how the jiffies work but is this racy as well?

Simple return tp->tso_deferred; should work, shouldn't it? :-)


-- 
 i.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Fri, 21 Dec 2007, Bill Fink wrote:

> Or perhaps even:
> 
>   /* Ok, it looks like it is advisable to defer.  */
>   tp->tso_deferred = jiffies;
> 
>   /* need to return a non-zero value to defer, which means won't
>* defer if jiffies == 0 but it's only a 1 in 4 billion event
>* (and avoids a compare/branch by not checking jiffies)
>/
> return jiffies;

Ack.  I introduced my own 64-bit to 32-bit issue (too late at night).
How about:

/* Ok, it looks like it is advisable to defer.  */
tp->tso_deferred = jiffies;

/* this won't defer if jiffies == 0 but it's only a 1 in
 * 4 billion event (and avoids a branch)
 */
return (jiffies != 0);

-Bill
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Fri, 21 Dec 2007, David Miller wrote:

> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Fri, 21 Dec 2007 17:29:27 +0800
> 
> > On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote:
> > >
> > > It's two shifts, and this gets scheduled along with the other
> > > instructions on many cpus so it's effectively free.
> > > 
> > > I don't see why this is even worth mentioning and discussing.
> > 
> > I totally agree.  Two shifts are way better than a branch.
> 
> We take probably a thousand+ 100+ cycle cache misses in the TCP stack
> on big window openning ACKs.
> 
> Instead of discussing ways to solve that huge performance killer we're
> wanking about two friggin' integer shifts.
> 
> It's hilarious isn't it? :-)

I don't think obfuscated code is hilarious.  Instead of the convoluted
and dense code:

/* Defer for less than two clock ticks. */
if (tp->tso_deferred &&
((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1)

You can have the much simpler and more easily understandable:

/* Defer for less than two clock ticks. */
if (tp->tso_deferred && (jiffies - tp->tso_deferred) > 1)

And instead of:

/* Ok, it looks like it is advisable to defer.  */
tp->tso_deferred = 1 | (jiffies<<1);

return 1;

You could do as Ilpo suggested:

/* Ok, it looks like it is advisable to defer.  */
tp->tso_deferred = max_t(u32, jiffies, 1);

return 1;

Or perhaps more efficiently:

/* Ok, it looks like it is advisable to defer.  */
tp->tso_deferred = jiffies;
if (unlikely(jiffies == 0))
tp->tso_deferred = 1;

return 1;

Or perhaps even:

/* Ok, it looks like it is advisable to defer.  */
tp->tso_deferred = jiffies;

/* need to return a non-zero value to defer, which means won't
 * defer if jiffies == 0 but it's only a 1 in 4 billion event
 * (and avoids a compare/branch by not checking jiffies)
 /
return jiffies;

Since it really only needs a non-zero return value to defer.

See, no branches needed and much clearer code.  That seems worthwhile
to me from a code maintenance standpoint, even if it isn't any speed
improvement.

And what about the 64-bit jiffies versus 32-bit tp->tso_deferred issue?
Should tso_deferred be made unsigned long to match jiffies?

-Bill
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Fri, 21 Dec 2007 17:29:27 +0800

> On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote:
> >
> > It's two shifts, and this gets scheduled along with the other
> > instructions on many cpus so it's effectively free.
> > 
> > I don't see why this is even worth mentioning and discussing.
> 
> I totally agree.  Two shifts are way better than a branch.

We take probably a thousand+ 100+ cycle cache misses in the TCP stack
on big window openning ACKs.

Instead of discussing ways to solve that huge performance killer we're
wanking about two friggin' integer shifts.

It's hilarious isn't it? :-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Herbert Xu
On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote:
>
> It's two shifts, and this gets scheduled along with the other
> instructions on many cpus so it's effectively free.
> 
> I don't see why this is even worth mentioning and discussing.

I totally agree.  Two shifts are way better than a branch.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Ilpo Järvinen wrote:

> On Fri, 21 Dec 2007, Bill Fink wrote:
> 
> > If so it seems like a lot of unnecessary
> > work just to avoid a 1 in 4 billion event, since it's my understanding
> > that the whole tcp_tso_should_defer function is just an optimization
> > and not a criticality to the proper functioning of TCP, especially
> > considering it hasn't even been executing at all up to now.
> 
> It would still be good to not return 1 in that case we didn't flag the 
> deferral, how about adding one additional tick (+comment) in the zero 
> case making tso_deferred == 0 again unambiguously defined, e.g.:
> 
>   tp->tso_deferred = min_t(u32, jiffies, 1);

Blah, max_t of course :-).


-- 
 i.

Re: TSO trimming question

2007-12-21 Thread David Miller
From: Bill Fink <[EMAIL PROTECTED]>
Date: Fri, 21 Dec 2007 03:06:48 -0500

> What's with all the shifting back and forth?  Here with:
> 
>   ((jiffies<<1)>>1) - (tp->tso_deferred>>1)
> 
> and later with:
> 
>   /* Ok, it looks like it is advisable to defer.  */
>   tp->tso_deferred = 1 | (jiffies<<1);
> 
> Is this just done to try and avoid the special case of jiffies==0 
> when the jiffies wrap?  If so it seems like a lot of unnecessary
> work just to avoid a 1 in 4 billion event, since it's my understanding
> that the whole tcp_tso_should_defer function is just an optimization
> and not a criticality to the proper functioning of TCP, especially
> considering it hasn't even been executing at all up to now.

How else would you avoid the incorrect result when jiffies is
indeed zero?

It's two shifts, and this gets scheduled along with the other
instructions on many cpus so it's effectively free.

I don't see why this is even worth mentioning and discussing.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Bill Fink wrote:

> I meant to ask about this a while back but then got distracted by
> other things.  But now since the subject has come up, I had a couple
> of more questions about this code.
> 
> What's with all the shifting back and forth?  Here with:
> 
>   ((jiffies<<1)>>1) - (tp->tso_deferred>>1)
> 
> and later with:
> 
>   /* Ok, it looks like it is advisable to defer.  */
>   tp->tso_deferred = 1 | (jiffies<<1);
> 
> Is this just done to try and avoid the special case of jiffies==0 
> when the jiffies wrap? 

I thought that it must be the reason. I couldn't figure out any other 
explination while thinking the same thing but since I saw no problem in 
that rather weird approach, I didn't want to touch that in a patch which 
had net-2.6 (or stable) potential.

> If so it seems like a lot of unnecessary
> work just to avoid a 1 in 4 billion event, since it's my understanding
> that the whole tcp_tso_should_defer function is just an optimization
> and not a criticality to the proper functioning of TCP, especially
> considering it hasn't even been executing at all up to now.

It would still be good to not return 1 in that case we didn't flag the 
deferral, how about adding one additional tick (+comment) in the zero 
case making tso_deferred == 0 again unambiguously defined, e.g.:

tp->tso_deferred = min_t(u32, jiffies, 1);

...I'm relatively sure that nobody would ever notice that tick :-) and we 
kept return value consistent with tso_deferred state invariant.

> My second question is more basic and if I'm not mistaken actually
> relates to a remaining bug in the (corrected) test:
> 
>   /* Defer for less than two clock ticks. */
>   if (tp->tso_deferred &&
>   ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1)
> 
> Since jiffies is an unsigned long, which is 64-bits on a 64-bit system,
> whereas tp->tso_deferred is a u32, once jiffies exceeds 31-bits, which
> will happen in about 25 days if HZ=1000, won't the second part of the
> test always be true after that?  Or am I missing something obvious?

I didn't notice that earlier but I think you're right though my knowledge 
about jiffies and such is quite limited.

...Feel free to submit a patch to correct these.


-- 
 i.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Thu, 20 Dec 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)
> 
> > [PATCH] [TCP]: Fix TSO deferring
> > 
> > I'd say that most of what tcp_tso_should_defer had in between
> > there was dead code because of this.
> > 
> > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
> 
> Yikes!
> 
> John, we've been living a lie for more than a year. :-/
> 
> On the bright side this explains a lot of small TSO frames I've been
> seeing in traces over the past year but never got a chance to
> investigate.
> 
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 8dafda9..693b9f6 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, 
> > struct sk_buff *skb)
> > goto send_now;
> >  
> > /* Defer for less than two clock ticks. */
> > -   if (!tp->tso_deferred && ((jiffies<<1)>>1) - (tp->tso_deferred>>1) > 1)
> > +   if (tp->tso_deferred &&
> > +   ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1)
> > goto send_now;
> >  
> > in_flight = tcp_packets_in_flight(tp);

I meant to ask about this a while back but then got distracted by
other things.  But now since the subject has come up, I had a couple
of more questions about this code.

What's with all the shifting back and forth?  Here with:

((jiffies<<1)>>1) - (tp->tso_deferred>>1)

and later with:

/* Ok, it looks like it is advisable to defer.  */
tp->tso_deferred = 1 | (jiffies<<1);

Is this just done to try and avoid the special case of jiffies==0 
when the jiffies wrap?  If so it seems like a lot of unnecessary
work just to avoid a 1 in 4 billion event, since it's my understanding
that the whole tcp_tso_should_defer function is just an optimization
and not a criticality to the proper functioning of TCP, especially
considering it hasn't even been executing at all up to now.

My second question is more basic and if I'm not mistaken actually
relates to a remaining bug in the (corrected) test:

/* Defer for less than two clock ticks. */
if (tp->tso_deferred &&
((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1)

Since jiffies is an unsigned long, which is 64-bits on a 64-bit system,
whereas tp->tso_deferred is a u32, once jiffies exceeds 31-bits, which
will happen in about 25 days if HZ=1000, won't the second part of the
test always be true after that?  Or am I missing something obvious?

-Bill
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread David Miller
From: John Heffner <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 11:02:21 -0500

> David Miller wrote:
> > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> > Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)
> > 
> >> [PATCH] [TCP]: Fix TSO deferring
> >>
> >> I'd say that most of what tcp_tso_should_defer had in between
> >> there was dead code because of this.
> >>
> >> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
> > 
> > Yikes!
> > 
> > John, we've been living a lie for more than a year. :-/
> > 
> > On the bright side this explains a lot of small TSO frames I've been
> > seeing in traces over the past year but never got a chance to
> > investigate.
> 
> Ouch.  This fix may improve some benchmarks.
> 
> Re-checking this function was on my list of things to do because I had 
> also noticed some TSO frames that seemed a bit small.  This clearly 
> explains it.

What I'll do for now is I'll put this into net-2.6.25 to let it
"cook" for a while.  It's an obvious fix but it's enabling code
that's effectively been disabled for more than a year so something
might turn up that we need to fix.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 22:00:12 +0800

> On Thu, Dec 20, 2007 at 04:00:37AM -0800, David Miller wrote:
> >
> > In the most ideal sense, tcp_window_allows() should probably
> > be changed to only return MSS multiples.
> > 
> > Unfortunately this would add an expensive modulo operation,
> > however I think it would elimiate this problem case.
> 
> Well you only have to divide in the unlikely case of us being
> limited by the receiver window.  In that case speed is probably
> not of the essence anyway.

Agreed, to some extent.

I say "to some extent" because it might be realistic, with
lots (millions) of sockets to hit this case a lot.

There are so many things that are a "don't care" performance
wise until you have a lot of stinky connections over crappy
links.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread John Heffner

David Miller wrote:

From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)


[PATCH] [TCP]: Fix TSO deferring

I'd say that most of what tcp_tso_should_defer had in between
there was dead code because of this.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>


Yikes!

John, we've been living a lie for more than a year. :-/

On the bright side this explains a lot of small TSO frames I've been
seeing in traces over the past year but never got a chance to
investigate.


Ouch.  This fix may improve some benchmarks.

Re-checking this function was on my list of things to do because I had 
also noticed some TSO frames that seemed a bit small.  This clearly 
explains it.


  -John
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread Herbert Xu
On Thu, Dec 20, 2007 at 04:00:37AM -0800, David Miller wrote:
>
> In the most ideal sense, tcp_window_allows() should probably
> be changed to only return MSS multiples.
> 
> Unfortunately this would add an expensive modulo operation,
> however I think it would elimiate this problem case.

Well you only have to divide in the unlikely case of us being
limited by the receiver window.  In that case speed is probably
not of the essence anyway.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread Ilpo Järvinen
On Thu, 20 Dec 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)
> 
> > That's not the only case, IMHO if there's odd boundary due to 
> > snd_una+snd_wnd - skb->seq limit (done in tcp_window_allows()), we don't 
> > consider it as odd but break the skb at arbitary point resulting
> > two small segments to the network, and what's worse, when the later skb 
> > resulting from the first split is matching skb->len < limit check as well 
> > causing an unnecessary small skb to be created for nagle purpose too, 
> > solving it fully requires some thought in case the mss_now != mss_cache 
> > even if non-odd boundaries are honored in the middle of skb.
> 
> In the most ideal sense, tcp_window_allows() should probably
> be changed to only return MSS multiples.

That's what Herbert suggested already, I'll send a patch later
on... :-)

> Unfortunately this would add an expensive modulo operation,
> however I think it would elimiate this problem case.

Yes. Should we still call tcp_minshall_update() if split in the middle of 
wq results in smaller than MSS tail (occurs only if mss_now != mss_cache)?

-- 
 i.

Re: TSO trimming question

2007-12-20 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)

> That's not the only case, IMHO if there's odd boundary due to 
> snd_una+snd_wnd - skb->seq limit (done in tcp_window_allows()), we don't 
> consider it as odd but break the skb at arbitary point resulting
> two small segments to the network, and what's worse, when the later skb 
> resulting from the first split is matching skb->len < limit check as well 
> causing an unnecessary small skb to be created for nagle purpose too, 
> solving it fully requires some thought in case the mss_now != mss_cache 
> even if non-odd boundaries are honored in the middle of skb.

In the most ideal sense, tcp_window_allows() should probably
be changed to only return MSS multiples.

Unfortunately this would add an expensive modulo operation,
however I think it would elimiate this problem case.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)

> [PATCH] [TCP]: Fix TSO deferring
> 
> I'd say that most of what tcp_tso_should_defer had in between
> there was dead code because of this.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Yikes!

John, we've been living a lie for more than a year. :-/

On the bright side this explains a lot of small TSO frames I've been
seeing in traces over the past year but never got a chance to
investigate.

> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 8dafda9..693b9f6 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, struct 
> sk_buff *skb)
>   goto send_now;
>  
>   /* Defer for less than two clock ticks. */
> - if (!tp->tso_deferred && ((jiffies<<1)>>1) - (tp->tso_deferred>>1) > 1)
> + if (tp->tso_deferred &&
> + ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1)
>   goto send_now;
>  
>   in_flight = tcp_packets_in_flight(tp);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread Ilpo Järvinen
On Wed, 19 Dec 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> Date: Wed, 19 Dec 2007 23:46:33 +0200 (EET)
> 
> > I'm not fully sure what's purpose of this code in tcp_write_xmit:
> > 
> >if (skb->len < limit) {
> >unsigned int trim = skb->len % mss_now;
> > 
> >if (trim)
> >limit = skb->len - trim;
> > }
> > 
> > Is it used to make sure we send only multiples of mss_now here and leave 
> > the left-over into another skb?

Yeah, I now understand that this part is correct. I somehow got such 
impression while trying to figure this out that it ends up being dead code 
but that wasn't correct thought from my side. However, it caught my 
attention and after some thinking I'd say there's more to handle here 
(covered by the second question).

Also note that patch I sent earlier is not right either but needs some 
refining to do the right thing.

> > Or does it try to make sure that
> > tso_fragment result honors multiple of mss_now boundaries when snd_wnd
> > is the limitting factor? For latter IMHO this would be necessary:
> > 
> > if (skb->len > limit)
> > limit -= limit % mss_now;
> 
> The purpose of the test is to make sure we process tail sub-mss chunks
> correctly wrt. Nagle, which most closely matches the first purpose
> you've listed.
> 
> So I think the calculation really does belong where it is.
>
> Because of the way that the sendmsg() super-skb formation logic
> works, we always will tack on more data and grow the tail
> SKB before creating a new one.  So any sub-mss chunk at the
> end of a TSO frame really is at the end of the write queue
> and really should get nagle processing.

Yes, I now agree this is fully correct for this task.

> Actually, there is an exception, which is when we run out of
> skb_frag_list slots.  In that case we'll potentially have breaks at
> odd boundaries in the middle of the queue.  But this can only happen
> in exceptional cases (user does tons of 1-byte sendfile()'s over
> random non-consequetive locations of a file) or outright bugs
> (MAX_SKB_FRAGS is defined incorrectly, for example) and thus this
> situation is not worth coding for.

That's not the only case, IMHO if there's odd boundary due to 
snd_una+snd_wnd - skb->seq limit (done in tcp_window_allows()), we don't 
consider it as odd but break the skb at arbitary point resulting
two small segments to the network, and what's worse, when the later skb 
resulting from the first split is matching skb->len < limit check as well 
causing an unnecessary small skb to be created for nagle purpose too, 
solving it fully requires some thought in case the mss_now != mss_cache 
even if non-odd boundaries are honored in the middle of skb.

Though whether we get there is depending on what tcp_tso_should_defer() 
decided. Hmm, there seems to be an unrelated bug in it as well :-/. A 
patch below. Please consider the fact that enabling TSO deferring may 
have some unpleasant effect to TCP dynamics, considering that I don't find 
stable mandatory for this to avoid breaking, besides things have been 
working quite well without it too... Only compile tested.

-- 
 i.


--
[PATCH] [TCP]: Fix TSO deferring

I'd say that most of what tcp_tso_should_defer had in between
there was dead code because of this.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 net/ipv4/tcp_output.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8dafda9..693b9f6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, struct 
sk_buff *skb)
goto send_now;
 
/* Defer for less than two clock ticks. */
-   if (!tp->tso_deferred && ((jiffies<<1)>>1) - (tp->tso_deferred>>1) > 1)
+   if (tp->tso_deferred &&
+   ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1)
goto send_now;
 
in_flight = tcp_packets_in_flight(tp);
-- 
1.5.0.6


Re: TSO trimming question

2007-12-19 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 11:26:18 +0800

> Ilpo J??rvinen <[EMAIL PROTECTED]> wrote:
> >
> >> is the limitting factor? For latter IMHO this would be necessary:
> >> 
> >>   if (skb->len > limit)
> >>   limit -= limit % mss_now;
> 
> Good catch! But how about putting this logic into tcp_window_allows
> since then you can avoid the divide for the case when we're not
> bound by the receiver window.

Because the purpose of this calculation is different, as I
described in my other reply, I don't think this change is
correct, regardless of where it is placed :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-19 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Wed, 19 Dec 2007 23:46:33 +0200 (EET)

> I'm not fully sure what's purpose of this code in tcp_write_xmit:
> 
>if (skb->len < limit) {
>unsigned int trim = skb->len % mss_now;
> 
>if (trim)
>limit = skb->len - trim;
>   }
> 
> Is it used to make sure we send only multiples of mss_now here and leave 
> the left-over into another skb? Or does it try to make sure that
> tso_fragment result honors multiple of mss_now boundaries when snd_wnd
> is the limitting factor? For latter IMHO this would be necessary:
> 
>   if (skb->len > limit)
>   limit -= limit % mss_now;

The purpose of the test is to make sure we process tail sub-mss chunks
correctly wrt. Nagle, which most closely matches the first purpose
you've listed.

So I think the calculation really does belong where it is.

Because of the way that the sendmsg() super-skb formation logic
works, we always will tack on more data and grow the tail
SKB before creating a new one.  So any sub-mss chunk at the
end of a TSO frame really is at the end of the write queue
and really should get nagle processing.

Actually, there is an exception, which is when we run out of
skb_frag_list slots.  In that case we'll potentially have breaks at
odd boundaries in the middle of the queue.  But this can only happen
in exceptional cases (user does tons of 1-byte sendfile()'s over
random non-consequetive locations of a file) or outright bugs
(MAX_SKB_FRAGS is defined incorrectly, for example) and thus this
situation is not worth coding for.

sendmsg()'s goal, when TSO is enabled, is to append to the write
queue tail SKB, and it does so until the MAX_SKB_FRAGS limit
is reached.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-19 Thread Herbert Xu
Ilpo J??rvinen <[EMAIL PROTECTED]> wrote:
>
>> is the limitting factor? For latter IMHO this would be necessary:
>> 
>>   if (skb->len > limit)
>>   limit -= limit % mss_now;

Good catch! But how about putting this logic into tcp_window_allows
since then you can avoid the divide for the case when we're not
bound by the receiver window.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-19 Thread Ilpo Järvinen
On Wed, 19 Dec 2007, Ilpo Järvinen wrote:

> I'm not fully sure what's purpose of this code in tcp_write_xmit:
> 
>if (skb->len < limit) {
>unsigned int trim = skb->len % mss_now;
> 
>if (trim)
>limit = skb->len - trim;
>   }
> 
> Is it used to make sure we send only multiples of mss_now here and leave 
> the left-over into another skb? Or does it try to make sure that
> tso_fragment result honors multiple of mss_now boundaries when snd_wnd
> is the limitting factor? For latter IMHO this would be necessary:
> 
>   if (skb->len > limit)
>   limit -= limit % mss_now;

...Anyway, here's an untested patch to just do it :-):

-- 
 i.


[PATCH] [TCP]: Force tso skb split to mss_now boundary (if snd_wnd limits)

If snd_wnd was limitting factor, the tso_fragment might not
create full-sized skbs but would include the window left-overs
as well.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 net/ipv4/tcp_output.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1c7ef17..8dafda9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1453,6 +1453,8 @@ static int tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle)
 
if (trim)
limit = skb->len - trim;
+   } else if (skb->len > limit) {
+   limit -= limit % mss_now;
}
}
 
@@ -1525,6 +1527,8 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now)
 
if (trim)
limit = skb->len - trim;
+   } else if (skb->len > limit) {
+   limit -= limit % mss_now;
}
}
 
-- 
1.5.0.6