Re: TSO trimming question
On Fri, 21 Dec 2007, Ilpo Järvinen wrote: > On Fri, 21 Dec 2007, Bill Fink wrote: > > > On Fri, 21 Dec 2007, Bill Fink wrote: > > > > > Or perhaps even: > > > > > > /* Ok, it looks like it is advisable to defer. */ > > > tp->tso_deferred = jiffies; > > > > > > /* need to return a non-zero value to defer, which means won't > > >* defer if jiffies == 0 but it's only a 1 in 4 billion event > > >* (and avoids a compare/branch by not checking jiffies) > > >/ > > > return jiffies; > > > > Ack. I introduced my own 64-bit to 32-bit issue (too late at night). > > How about: > > > > /* Ok, it looks like it is advisable to defer. */ > > tp->tso_deferred = jiffies; > > > > /* this won't defer if jiffies == 0 but it's only a 1 in > > * 4 billion event (and avoids a branch) > > */ > > return (jiffies != 0); > > I'm not sure how the jiffies work but is this racy as well? > > Simple return tp->tso_deferred; should work, shouldn't it? :-) As long as tp->tso_deferred remains u32, pending the other issue. -Bill -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Fri, 21 Dec 2007, Bill Fink wrote: > On Fri, 21 Dec 2007, Bill Fink wrote: > > > Or perhaps even: > > > > /* Ok, it looks like it is advisable to defer. */ > > tp->tso_deferred = jiffies; > > > > /* need to return a non-zero value to defer, which means won't > > * defer if jiffies == 0 but it's only a 1 in 4 billion event > > * (and avoids a compare/branch by not checking jiffies) > > / > > return jiffies; > > Ack. I introduced my own 64-bit to 32-bit issue (too late at night). > How about: > > /* Ok, it looks like it is advisable to defer. */ > tp->tso_deferred = jiffies; > > /* this won't defer if jiffies == 0 but it's only a 1 in >* 4 billion event (and avoids a branch) >*/ > return (jiffies != 0); I'm not sure how the jiffies work but is this racy as well? Simple return tp->tso_deferred; should work, shouldn't it? :-) -- i. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Fri, 21 Dec 2007, Bill Fink wrote: > Or perhaps even: > > /* Ok, it looks like it is advisable to defer. */ > tp->tso_deferred = jiffies; > > /* need to return a non-zero value to defer, which means won't >* defer if jiffies == 0 but it's only a 1 in 4 billion event >* (and avoids a compare/branch by not checking jiffies) >/ > return jiffies; Ack. I introduced my own 64-bit to 32-bit issue (too late at night). How about: /* Ok, it looks like it is advisable to defer. */ tp->tso_deferred = jiffies; /* this won't defer if jiffies == 0 but it's only a 1 in * 4 billion event (and avoids a branch) */ return (jiffies != 0); -Bill -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Fri, 21 Dec 2007, David Miller wrote: > From: Herbert Xu <[EMAIL PROTECTED]> > Date: Fri, 21 Dec 2007 17:29:27 +0800 > > > On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote: > > > > > > It's two shifts, and this gets scheduled along with the other > > > instructions on many cpus so it's effectively free. > > > > > > I don't see why this is even worth mentioning and discussing. > > > > I totally agree. Two shifts are way better than a branch. > > We take probably a thousand+ 100+ cycle cache misses in the TCP stack > on big window openning ACKs. > > Instead of discussing ways to solve that huge performance killer we're > wanking about two friggin' integer shifts. > > It's hilarious isn't it? :-) I don't think obfuscated code is hilarious. Instead of the convoluted and dense code: /* Defer for less than two clock ticks. */ if (tp->tso_deferred && ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1) You can have the much simpler and more easily understandable: /* Defer for less than two clock ticks. */ if (tp->tso_deferred && (jiffies - tp->tso_deferred) > 1) And instead of: /* Ok, it looks like it is advisable to defer. */ tp->tso_deferred = 1 | (jiffies<<1); return 1; You could do as Ilpo suggested: /* Ok, it looks like it is advisable to defer. */ tp->tso_deferred = max_t(u32, jiffies, 1); return 1; Or perhaps more efficiently: /* Ok, it looks like it is advisable to defer. */ tp->tso_deferred = jiffies; if (unlikely(jiffies == 0)) tp->tso_deferred = 1; return 1; Or perhaps even: /* Ok, it looks like it is advisable to defer. */ tp->tso_deferred = jiffies; /* need to return a non-zero value to defer, which means won't * defer if jiffies == 0 but it's only a 1 in 4 billion event * (and avoids a compare/branch by not checking jiffies) / return jiffies; Since it really only needs a non-zero return value to defer. See, no branches needed and much clearer code. That seems worthwhile to me from a code maintenance standpoint, even if it isn't any speed improvement. And what about the 64-bit jiffies versus 32-bit tp->tso_deferred issue? Should tso_deferred be made unsigned long to match jiffies? -Bill -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
From: Herbert Xu <[EMAIL PROTECTED]> Date: Fri, 21 Dec 2007 17:29:27 +0800 > On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote: > > > > It's two shifts, and this gets scheduled along with the other > > instructions on many cpus so it's effectively free. > > > > I don't see why this is even worth mentioning and discussing. > > I totally agree. Two shifts are way better than a branch. We take probably a thousand+ 100+ cycle cache misses in the TCP stack on big window openning ACKs. Instead of discussing ways to solve that huge performance killer we're wanking about two friggin' integer shifts. It's hilarious isn't it? :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote: > > It's two shifts, and this gets scheduled along with the other > instructions on many cpus so it's effectively free. > > I don't see why this is even worth mentioning and discussing. I totally agree. Two shifts are way better than a branch. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Fri, 21 Dec 2007, Ilpo Järvinen wrote: > On Fri, 21 Dec 2007, Bill Fink wrote: > > > If so it seems like a lot of unnecessary > > work just to avoid a 1 in 4 billion event, since it's my understanding > > that the whole tcp_tso_should_defer function is just an optimization > > and not a criticality to the proper functioning of TCP, especially > > considering it hasn't even been executing at all up to now. > > It would still be good to not return 1 in that case we didn't flag the > deferral, how about adding one additional tick (+comment) in the zero > case making tso_deferred == 0 again unambiguously defined, e.g.: > > tp->tso_deferred = min_t(u32, jiffies, 1); Blah, max_t of course :-). -- i.
Re: TSO trimming question
From: Bill Fink <[EMAIL PROTECTED]> Date: Fri, 21 Dec 2007 03:06:48 -0500 > What's with all the shifting back and forth? Here with: > > ((jiffies<<1)>>1) - (tp->tso_deferred>>1) > > and later with: > > /* Ok, it looks like it is advisable to defer. */ > tp->tso_deferred = 1 | (jiffies<<1); > > Is this just done to try and avoid the special case of jiffies==0 > when the jiffies wrap? If so it seems like a lot of unnecessary > work just to avoid a 1 in 4 billion event, since it's my understanding > that the whole tcp_tso_should_defer function is just an optimization > and not a criticality to the proper functioning of TCP, especially > considering it hasn't even been executing at all up to now. How else would you avoid the incorrect result when jiffies is indeed zero? It's two shifts, and this gets scheduled along with the other instructions on many cpus so it's effectively free. I don't see why this is even worth mentioning and discussing. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Fri, 21 Dec 2007, Bill Fink wrote: > I meant to ask about this a while back but then got distracted by > other things. But now since the subject has come up, I had a couple > of more questions about this code. > > What's with all the shifting back and forth? Here with: > > ((jiffies<<1)>>1) - (tp->tso_deferred>>1) > > and later with: > > /* Ok, it looks like it is advisable to defer. */ > tp->tso_deferred = 1 | (jiffies<<1); > > Is this just done to try and avoid the special case of jiffies==0 > when the jiffies wrap? I thought that it must be the reason. I couldn't figure out any other explination while thinking the same thing but since I saw no problem in that rather weird approach, I didn't want to touch that in a patch which had net-2.6 (or stable) potential. > If so it seems like a lot of unnecessary > work just to avoid a 1 in 4 billion event, since it's my understanding > that the whole tcp_tso_should_defer function is just an optimization > and not a criticality to the proper functioning of TCP, especially > considering it hasn't even been executing at all up to now. It would still be good to not return 1 in that case we didn't flag the deferral, how about adding one additional tick (+comment) in the zero case making tso_deferred == 0 again unambiguously defined, e.g.: tp->tso_deferred = min_t(u32, jiffies, 1); ...I'm relatively sure that nobody would ever notice that tick :-) and we kept return value consistent with tso_deferred state invariant. > My second question is more basic and if I'm not mistaken actually > relates to a remaining bug in the (corrected) test: > > /* Defer for less than two clock ticks. */ > if (tp->tso_deferred && > ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1) > > Since jiffies is an unsigned long, which is 64-bits on a 64-bit system, > whereas tp->tso_deferred is a u32, once jiffies exceeds 31-bits, which > will happen in about 25 days if HZ=1000, won't the second part of the > test always be true after that? Or am I missing something obvious? I didn't notice that earlier but I think you're right though my knowledge about jiffies and such is quite limited. ...Feel free to submit a patch to correct these. -- i. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Thu, 20 Dec 2007, David Miller wrote: > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET) > > > [PATCH] [TCP]: Fix TSO deferring > > > > I'd say that most of what tcp_tso_should_defer had in between > > there was dead code because of this. > > > > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> > > Yikes! > > John, we've been living a lie for more than a year. :-/ > > On the bright side this explains a lot of small TSO frames I've been > seeing in traces over the past year but never got a chance to > investigate. > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > > index 8dafda9..693b9f6 100644 > > --- a/net/ipv4/tcp_output.c > > +++ b/net/ipv4/tcp_output.c > > @@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, > > struct sk_buff *skb) > > goto send_now; > > > > /* Defer for less than two clock ticks. */ > > - if (!tp->tso_deferred && ((jiffies<<1)>>1) - (tp->tso_deferred>>1) > 1) > > + if (tp->tso_deferred && > > + ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1) > > goto send_now; > > > > in_flight = tcp_packets_in_flight(tp); I meant to ask about this a while back but then got distracted by other things. But now since the subject has come up, I had a couple of more questions about this code. What's with all the shifting back and forth? Here with: ((jiffies<<1)>>1) - (tp->tso_deferred>>1) and later with: /* Ok, it looks like it is advisable to defer. */ tp->tso_deferred = 1 | (jiffies<<1); Is this just done to try and avoid the special case of jiffies==0 when the jiffies wrap? If so it seems like a lot of unnecessary work just to avoid a 1 in 4 billion event, since it's my understanding that the whole tcp_tso_should_defer function is just an optimization and not a criticality to the proper functioning of TCP, especially considering it hasn't even been executing at all up to now. My second question is more basic and if I'm not mistaken actually relates to a remaining bug in the (corrected) test: /* Defer for less than two clock ticks. */ if (tp->tso_deferred && ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1) Since jiffies is an unsigned long, which is 64-bits on a 64-bit system, whereas tp->tso_deferred is a u32, once jiffies exceeds 31-bits, which will happen in about 25 days if HZ=1000, won't the second part of the test always be true after that? Or am I missing something obvious? -Bill -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
From: John Heffner <[EMAIL PROTECTED]> Date: Thu, 20 Dec 2007 11:02:21 -0500 > David Miller wrote: > > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > > Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET) > > > >> [PATCH] [TCP]: Fix TSO deferring > >> > >> I'd say that most of what tcp_tso_should_defer had in between > >> there was dead code because of this. > >> > >> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> > > > > Yikes! > > > > John, we've been living a lie for more than a year. :-/ > > > > On the bright side this explains a lot of small TSO frames I've been > > seeing in traces over the past year but never got a chance to > > investigate. > > Ouch. This fix may improve some benchmarks. > > Re-checking this function was on my list of things to do because I had > also noticed some TSO frames that seemed a bit small. This clearly > explains it. What I'll do for now is I'll put this into net-2.6.25 to let it "cook" for a while. It's an obvious fix but it's enabling code that's effectively been disabled for more than a year so something might turn up that we need to fix. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
From: Herbert Xu <[EMAIL PROTECTED]> Date: Thu, 20 Dec 2007 22:00:12 +0800 > On Thu, Dec 20, 2007 at 04:00:37AM -0800, David Miller wrote: > > > > In the most ideal sense, tcp_window_allows() should probably > > be changed to only return MSS multiples. > > > > Unfortunately this would add an expensive modulo operation, > > however I think it would elimiate this problem case. > > Well you only have to divide in the unlikely case of us being > limited by the receiver window. In that case speed is probably > not of the essence anyway. Agreed, to some extent. I say "to some extent" because it might be realistic, with lots (millions) of sockets to hit this case a lot. There are so many things that are a "don't care" performance wise until you have a lot of stinky connections over crappy links. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
David Miller wrote: From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET) [PATCH] [TCP]: Fix TSO deferring I'd say that most of what tcp_tso_should_defer had in between there was dead code because of this. Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> Yikes! John, we've been living a lie for more than a year. :-/ On the bright side this explains a lot of small TSO frames I've been seeing in traces over the past year but never got a chance to investigate. Ouch. This fix may improve some benchmarks. Re-checking this function was on my list of things to do because I had also noticed some TSO frames that seemed a bit small. This clearly explains it. -John -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Thu, Dec 20, 2007 at 04:00:37AM -0800, David Miller wrote: > > In the most ideal sense, tcp_window_allows() should probably > be changed to only return MSS multiples. > > Unfortunately this would add an expensive modulo operation, > however I think it would elimiate this problem case. Well you only have to divide in the unlikely case of us being limited by the receiver window. In that case speed is probably not of the essence anyway. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Thu, 20 Dec 2007, David Miller wrote: > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET) > > > That's not the only case, IMHO if there's odd boundary due to > > snd_una+snd_wnd - skb->seq limit (done in tcp_window_allows()), we don't > > consider it as odd but break the skb at arbitary point resulting > > two small segments to the network, and what's worse, when the later skb > > resulting from the first split is matching skb->len < limit check as well > > causing an unnecessary small skb to be created for nagle purpose too, > > solving it fully requires some thought in case the mss_now != mss_cache > > even if non-odd boundaries are honored in the middle of skb. > > In the most ideal sense, tcp_window_allows() should probably > be changed to only return MSS multiples. That's what Herbert suggested already, I'll send a patch later on... :-) > Unfortunately this would add an expensive modulo operation, > however I think it would elimiate this problem case. Yes. Should we still call tcp_minshall_update() if split in the middle of wq results in smaller than MSS tail (occurs only if mss_now != mss_cache)? -- i.
Re: TSO trimming question
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET) > That's not the only case, IMHO if there's odd boundary due to > snd_una+snd_wnd - skb->seq limit (done in tcp_window_allows()), we don't > consider it as odd but break the skb at arbitary point resulting > two small segments to the network, and what's worse, when the later skb > resulting from the first split is matching skb->len < limit check as well > causing an unnecessary small skb to be created for nagle purpose too, > solving it fully requires some thought in case the mss_now != mss_cache > even if non-odd boundaries are honored in the middle of skb. In the most ideal sense, tcp_window_allows() should probably be changed to only return MSS multiples. Unfortunately this would add an expensive modulo operation, however I think it would elimiate this problem case. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET) > [PATCH] [TCP]: Fix TSO deferring > > I'd say that most of what tcp_tso_should_defer had in between > there was dead code because of this. > > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> Yikes! John, we've been living a lie for more than a year. :-/ On the bright side this explains a lot of small TSO frames I've been seeing in traces over the past year but never got a chance to investigate. > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index 8dafda9..693b9f6 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, struct > sk_buff *skb) > goto send_now; > > /* Defer for less than two clock ticks. */ > - if (!tp->tso_deferred && ((jiffies<<1)>>1) - (tp->tso_deferred>>1) > 1) > + if (tp->tso_deferred && > + ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1) > goto send_now; > > in_flight = tcp_packets_in_flight(tp); -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Wed, 19 Dec 2007, David Miller wrote: > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > Date: Wed, 19 Dec 2007 23:46:33 +0200 (EET) > > > I'm not fully sure what's purpose of this code in tcp_write_xmit: > > > >if (skb->len < limit) { > >unsigned int trim = skb->len % mss_now; > > > >if (trim) > >limit = skb->len - trim; > > } > > > > Is it used to make sure we send only multiples of mss_now here and leave > > the left-over into another skb? Yeah, I now understand that this part is correct. I somehow got such impression while trying to figure this out that it ends up being dead code but that wasn't correct thought from my side. However, it caught my attention and after some thinking I'd say there's more to handle here (covered by the second question). Also note that patch I sent earlier is not right either but needs some refining to do the right thing. > > Or does it try to make sure that > > tso_fragment result honors multiple of mss_now boundaries when snd_wnd > > is the limitting factor? For latter IMHO this would be necessary: > > > > if (skb->len > limit) > > limit -= limit % mss_now; > > The purpose of the test is to make sure we process tail sub-mss chunks > correctly wrt. Nagle, which most closely matches the first purpose > you've listed. > > So I think the calculation really does belong where it is. > > Because of the way that the sendmsg() super-skb formation logic > works, we always will tack on more data and grow the tail > SKB before creating a new one. So any sub-mss chunk at the > end of a TSO frame really is at the end of the write queue > and really should get nagle processing. Yes, I now agree this is fully correct for this task. > Actually, there is an exception, which is when we run out of > skb_frag_list slots. In that case we'll potentially have breaks at > odd boundaries in the middle of the queue. But this can only happen > in exceptional cases (user does tons of 1-byte sendfile()'s over > random non-consequetive locations of a file) or outright bugs > (MAX_SKB_FRAGS is defined incorrectly, for example) and thus this > situation is not worth coding for. That's not the only case, IMHO if there's odd boundary due to snd_una+snd_wnd - skb->seq limit (done in tcp_window_allows()), we don't consider it as odd but break the skb at arbitary point resulting two small segments to the network, and what's worse, when the later skb resulting from the first split is matching skb->len < limit check as well causing an unnecessary small skb to be created for nagle purpose too, solving it fully requires some thought in case the mss_now != mss_cache even if non-odd boundaries are honored in the middle of skb. Though whether we get there is depending on what tcp_tso_should_defer() decided. Hmm, there seems to be an unrelated bug in it as well :-/. A patch below. Please consider the fact that enabling TSO deferring may have some unpleasant effect to TCP dynamics, considering that I don't find stable mandatory for this to avoid breaking, besides things have been working quite well without it too... Only compile tested. -- i. -- [PATCH] [TCP]: Fix TSO deferring I'd say that most of what tcp_tso_should_defer had in between there was dead code because of this. Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- net/ipv4/tcp_output.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 8dafda9..693b9f6 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb) goto send_now; /* Defer for less than two clock ticks. */ - if (!tp->tso_deferred && ((jiffies<<1)>>1) - (tp->tso_deferred>>1) > 1) + if (tp->tso_deferred && + ((jiffies << 1) >> 1) - (tp->tso_deferred >> 1) > 1) goto send_now; in_flight = tcp_packets_in_flight(tp); -- 1.5.0.6
Re: TSO trimming question
From: Herbert Xu <[EMAIL PROTECTED]> Date: Thu, 20 Dec 2007 11:26:18 +0800 > Ilpo J??rvinen <[EMAIL PROTECTED]> wrote: > > > >> is the limitting factor? For latter IMHO this would be necessary: > >> > >> if (skb->len > limit) > >> limit -= limit % mss_now; > > Good catch! But how about putting this logic into tcp_window_allows > since then you can avoid the divide for the case when we're not > bound by the receiver window. Because the purpose of this calculation is different, as I described in my other reply, I don't think this change is correct, regardless of where it is placed :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> Date: Wed, 19 Dec 2007 23:46:33 +0200 (EET) > I'm not fully sure what's purpose of this code in tcp_write_xmit: > >if (skb->len < limit) { >unsigned int trim = skb->len % mss_now; > >if (trim) >limit = skb->len - trim; > } > > Is it used to make sure we send only multiples of mss_now here and leave > the left-over into another skb? Or does it try to make sure that > tso_fragment result honors multiple of mss_now boundaries when snd_wnd > is the limitting factor? For latter IMHO this would be necessary: > > if (skb->len > limit) > limit -= limit % mss_now; The purpose of the test is to make sure we process tail sub-mss chunks correctly wrt. Nagle, which most closely matches the first purpose you've listed. So I think the calculation really does belong where it is. Because of the way that the sendmsg() super-skb formation logic works, we always will tack on more data and grow the tail SKB before creating a new one. So any sub-mss chunk at the end of a TSO frame really is at the end of the write queue and really should get nagle processing. Actually, there is an exception, which is when we run out of skb_frag_list slots. In that case we'll potentially have breaks at odd boundaries in the middle of the queue. But this can only happen in exceptional cases (user does tons of 1-byte sendfile()'s over random non-consequetive locations of a file) or outright bugs (MAX_SKB_FRAGS is defined incorrectly, for example) and thus this situation is not worth coding for. sendmsg()'s goal, when TSO is enabled, is to append to the write queue tail SKB, and it does so until the MAX_SKB_FRAGS limit is reached. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
Ilpo J??rvinen <[EMAIL PROTECTED]> wrote: > >> is the limitting factor? For latter IMHO this would be necessary: >> >> if (skb->len > limit) >> limit -= limit % mss_now; Good catch! But how about putting this logic into tcp_window_allows since then you can avoid the divide for the case when we're not bound by the receiver window. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Wed, 19 Dec 2007, Ilpo Järvinen wrote: > I'm not fully sure what's purpose of this code in tcp_write_xmit: > >if (skb->len < limit) { >unsigned int trim = skb->len % mss_now; > >if (trim) >limit = skb->len - trim; > } > > Is it used to make sure we send only multiples of mss_now here and leave > the left-over into another skb? Or does it try to make sure that > tso_fragment result honors multiple of mss_now boundaries when snd_wnd > is the limitting factor? For latter IMHO this would be necessary: > > if (skb->len > limit) > limit -= limit % mss_now; ...Anyway, here's an untested patch to just do it :-): -- i. [PATCH] [TCP]: Force tso skb split to mss_now boundary (if snd_wnd limits) If snd_wnd was limitting factor, the tso_fragment might not create full-sized skbs but would include the window left-overs as well. Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- net/ipv4/tcp_output.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 1c7ef17..8dafda9 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1453,6 +1453,8 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle) if (trim) limit = skb->len - trim; + } else if (skb->len > limit) { + limit -= limit % mss_now; } } @@ -1525,6 +1527,8 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now) if (trim) limit = skb->len - trim; + } else if (skb->len > limit) { + limit -= limit % mss_now; } } -- 1.5.0.6