Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows
On Tue, 31 Jul 2007, Stephen Hemminger wrote: > I noticed no difference in the two flow tests. That is not a bad thing, just > that this test doesn't hit that code. ...I'm not too sure about your test setup but the bugs I fixed only cover cases that occur if flow is bidirectional (and obviously active in both directions at the same time), they won't occur in a case of unidirectional transfer or in request-reply style connections (well, in the latter case if there's some overlap, it can have effect but that's usually not significant)... In case of bidirectional transfers, you *should* see some difference as previously the fast recovery was _very_ broken. Of course there could be other issue with large cwnd TCP that hides it by going to RTO still, but at least over 384k/200ms link (DBP sized buffers, IIRC), these change behavior very dramatically, mainly in the initial slow-start overshoot recovery because in there losses per RTT is so high number compared to what is experienced later on. One or a few losses are usually recovered without RTO when congestion happens later on. > The anomaly is that first flow does slow start then gets loss and ends up > reducing it's window size all the way to the bottom, finally it recovers. > This happens with Cubic, H-TCP and others as well; if the queue in the > network is large enough, they don't handle the initial loss well. ...TCP related stuff that changed in /proc/net/netstat might shed some light to this if none of the given explinations please you... :-) > See the graph. What exactly do you mean by "RENO" in the title, I mean what's tcp_sack set to? There is occassionally a bit confusion in that respect in the terminology @ netdev, I've used to reno refering to non-SACK stuff elsewhere but in here that's not always the case... Usually it's possible to derive the correct interpretation from the context, but in this case I'm not too sure... :-) What I often have often seen with non-SACK TCP is that initial slow-start exhausts even very large advertised window on high DBP link and then due to draining of ACK feedback, gets RTOed... That usually shows up as long lasting recovery where one segment per RTT is recovered and new data is being sent as duplicate ACKs arrive with nearly constant rate until the window limit is hit (but I cannot see such periond in the graph you posted, so I guess it's not the explanation in this case). And if your "RENO" refers to something with SACK, that's not going to explain it anyway. ...Another nasty one I know is RED+ECN, though I'd say it's a bit far fetched one, as ECN cannot be used nicely in retransmission, a retransmission gets dropped instead of marking if RED wanted to mark. I guess that doesn't occur in your test case either? -- i. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows
I noticed no difference in the two flow tests. That is not a bad thing, just that this test doesn't hit that code. The anomaly is that first flow does slow start then gets loss and ends up reducing it's window size all the way to the bottom, finally it recovers. This happens with Cubic, H-TCP and others as well; if the queue in the network is large enough, they don't handle the initial loss well. See the graph. <>
Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Tue, 31 Jul 2007 10:51:13 +0100 > On Mon, 30 Jul 2007 22:21:12 -0700 (PDT) > David Miller <[EMAIL PROTECTED]> wrote: > > > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > > Date: Tue, 31 Jul 2007 07:59:10 +0300 (EEST) > > > > > I think it's probably good to add tp->snd_una != prior_snd_una > > > check there too... It's not going to make a large difference, > > > mostly just to be conservative when skb collapse stuff got done > > > (and maybe to annoy cheaters too though I couldn't at this point > > > figure out how they could abuse it)... > > > > > > ...I think I can come up with that on Wednesday, so please hold > > > stable push until that. > > > > It'll definitely need to wait at least a day as I cannot even make TCP > > or UDP connections out from my machine with the current net-2.6 tree, > > and this is what I'm debugging at the moment. > > > > I'm hoping that it's just something stupid like the make not > > rebuilding everything necessary after I changed the __u16's into > > __u32's in skb_frag_t. > > I running tests over emulator this morning. > Hopefully, this will fix the window collapse that occurs on startup > of large queue sizes. Great. For the record the bug I was chasing was an IPSEC issue which Herbert fixed a few moments ago. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows
On Mon, 30 Jul 2007 22:21:12 -0700 (PDT) David Miller <[EMAIL PROTECTED]> wrote: > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > Date: Tue, 31 Jul 2007 07:59:10 +0300 (EEST) > > > I think it's probably good to add tp->snd_una != prior_snd_una > > check there too... It's not going to make a large difference, > > mostly just to be conservative when skb collapse stuff got done > > (and maybe to annoy cheaters too though I couldn't at this point > > figure out how they could abuse it)... > > > > ...I think I can come up with that on Wednesday, so please hold > > stable push until that. > > It'll definitely need to wait at least a day as I cannot even make TCP > or UDP connections out from my machine with the current net-2.6 tree, > and this is what I'm debugging at the moment. > > I'm hoping that it's just something stupid like the make not > rebuilding everything necessary after I changed the __u16's into > __u32's in skb_frag_t. I running tests over emulator this morning. Hopefully, this will fix the window collapse that occurs on startup of large queue sizes. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> Date: Tue, 31 Jul 2007 07:59:10 +0300 (EEST) > I think it's probably good to add tp->snd_una != prior_snd_una > check there too... It's not going to make a large difference, > mostly just to be conservative when skb collapse stuff got done > (and maybe to annoy cheaters too though I couldn't at this point > figure out how they could abuse it)... > > ...I think I can come up with that on Wednesday, so please hold > stable push until that. It'll definitely need to wait at least a day as I cannot even make TCP or UDP connections out from my machine with the current net-2.6 tree, and this is what I'm debugging at the moment. I'm hoping that it's just something stupid like the make not rebuilding everything necessary after I changed the __u16's into __u32's in skb_frag_t. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows
On Mon, 30 Jul 2007, David Miller wrote: > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > Date: Mon, 30 Jul 2007 20:18:45 +0300 (EEST) > > > Actually, the ratehalving seems to work too well, as cwnd is > > reduced on every second ACK even though the packets in flight > > remains unchanged. Recoveries in a bidirectional flows suffer > > quite badly because of this, both NewReno and SACK are affected. > > > > After this patch, rate halving is performed for ACK only if > > packets in flight was supposedly changed too. > > > > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> > > This fix looks great, and I see no potential problems with > Reno after doing a bit of auditing. > > Applied and I'll push to -stable too. I think it's probably good to add tp->snd_una != prior_snd_una check there too... It's not going to make a large difference, mostly just to be conservative when skb collapse stuff got done (and maybe to annoy cheaters too though I couldn't at this point figure out how they could abuse it)... ...I think I can come up with that on Wednesday, so please hold stable push until that. -- i.
Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> Date: Mon, 30 Jul 2007 20:18:45 +0300 (EEST) > Actually, the ratehalving seems to work too well, as cwnd is > reduced on every second ACK even though the packets in flight > remains unchanged. Recoveries in a bidirectional flows suffer > quite badly because of this, both NewReno and SACK are affected. > > After this patch, rate halving is performed for ACK only if > packets in flight was supposedly changed too. > > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> This fix looks great, and I see no potential problems with Reno after doing a bit of auditing. Applied and I'll push to -stable too. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows
Actually, the ratehalving seems to work too well, as cwnd is reduced on every second ACK even though the packets in flight remains unchanged. Recoveries in a bidirectional flows suffer quite badly because of this, both NewReno and SACK are affected. After this patch, rate halving is performed for ACK only if packets in flight was supposedly changed too. Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- net/ipv4/tcp_input.c | 23 +-- 1 files changed, 13 insertions(+), 10 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 0163051..767f92c 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1848,19 +1848,22 @@ static inline u32 tcp_cwnd_min(const struct sock *sk) } /* Decrease cwnd each second ack. */ -static void tcp_cwnd_down(struct sock *sk) +static void tcp_cwnd_down(struct sock *sk, int flag) { struct tcp_sock *tp = tcp_sk(sk); int decr = tp->snd_cwnd_cnt + 1; + + if ((flag&FLAG_FORWARD_PROGRESS) || + (IsReno(tp) && !(flag&FLAG_NOT_DUP))) { + tp->snd_cwnd_cnt = decr&1; + decr >>= 1; - tp->snd_cwnd_cnt = decr&1; - decr >>= 1; + if (decr && tp->snd_cwnd > tcp_cwnd_min(sk)) + tp->snd_cwnd -= decr; - if (decr && tp->snd_cwnd > tcp_cwnd_min(sk)) - tp->snd_cwnd -= decr; - - tp->snd_cwnd = min(tp->snd_cwnd, tcp_packets_in_flight(tp)+1); - tp->snd_cwnd_stamp = tcp_time_stamp; + tp->snd_cwnd = min(tp->snd_cwnd, tcp_packets_in_flight(tp)+1); + tp->snd_cwnd_stamp = tcp_time_stamp; + } } /* Nothing was retransmitted or returned timestamp is less @@ -2057,7 +2060,7 @@ static void tcp_try_to_open(struct sock *sk, int flag) } tcp_moderate_cwnd(tp); } else { - tcp_cwnd_down(sk); + tcp_cwnd_down(sk, flag); } } @@ -2257,7 +2260,7 @@ tcp_fastretrans_alert(struct sock *sk, u32 prior_snd_una, if (is_dupack || tcp_head_timedout(sk)) tcp_update_scoreboard(sk); - tcp_cwnd_down(sk); + tcp_cwnd_down(sk, flag); tcp_xmit_retransmit_queue(sk); } -- 1.5.0.6