Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows

2007-07-31 Thread Ilpo Järvinen
On Tue, 31 Jul 2007, Stephen Hemminger wrote:

> I noticed no difference in the two flow tests.  That is not a bad thing, just
> that this test doesn't hit that code.

...I'm not too sure about your test setup but the bugs I fixed only cover 
cases that occur if flow is bidirectional (and obviously active in both 
directions at the same time), they won't occur in a case of unidirectional 
transfer or in request-reply style connections (well, in the latter 
case if there's some overlap, it can have effect but that's usually 
not significant)...

In case of bidirectional transfers, you *should* see some difference as 
previously the fast recovery was _very_ broken. Of course there could be 
other issue with large cwnd TCP that hides it by going to RTO still, but 
at least over 384k/200ms link (DBP sized buffers, IIRC), these change
behavior very dramatically, mainly in the initial slow-start overshoot 
recovery because in there losses per RTT is so high number compared to 
what is experienced later on. One or a few losses are usually recovered 
without RTO when congestion happens later on.

> The anomaly is that first flow does slow start then gets loss and ends up
> reducing it's window size all the way to the bottom, finally it recovers.
> This happens with Cubic, H-TCP and others as well; if the queue in the
> network is large enough, they don't handle the initial loss well.

...TCP related stuff that changed in /proc/net/netstat might shed 
some light to this if none of the given explinations please you... :-)

> See the graph.

What exactly do you mean by "RENO" in the title, I mean what's tcp_sack 
set to? There is occassionally a bit confusion in that respect in the 
terminology @ netdev, I've used to reno refering to non-SACK stuff 
elsewhere but in here that's not always the case... Usually it's possible 
to derive the correct interpretation from the context, but in this case 
I'm not too sure... :-)

What I often have often seen with non-SACK TCP is that initial slow-start 
exhausts even very large advertised window on high DBP link and then due 
to draining of ACK feedback, gets RTOed... That usually shows up as long 
lasting recovery where one segment per RTT is recovered and new data is 
being sent as duplicate ACKs arrive with nearly constant rate until the 
window limit is hit (but I cannot see such periond in the graph you 
posted, so I guess it's not the explanation in this case). And if your 
"RENO" refers to something with SACK, that's not going to explain it 
anyway.

...Another nasty one I know is RED+ECN, though I'd say it's a bit far 
fetched one, as ECN cannot be used nicely in retransmission, 
a retransmission gets dropped instead of marking if RED wanted to mark.
I guess that doesn't occur in your test case either?


-- 
 i.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows

2007-07-31 Thread Stephen Hemminger
I noticed no difference in the two flow tests.  That is not a bad thing, just
that this test doesn't hit that code.

The anomaly is that first flow does slow start then gets loss and ends up
reducing it's window size all the way to the bottom, finally it recovers.
This happens with Cubic, H-TCP and others as well; if the queue in the
network is large enough, they don't handle the initial loss well.

See the graph.

<>

Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows

2007-07-31 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 31 Jul 2007 10:51:13 +0100

> On Mon, 30 Jul 2007 22:21:12 -0700 (PDT)
> David Miller <[EMAIL PROTECTED]> wrote:
> 
> > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> > Date: Tue, 31 Jul 2007 07:59:10 +0300 (EEST)
> > 
> > > I think it's probably good to add tp->snd_una != prior_snd_una
> > > check there too... It's not going to make a large difference,
> > > mostly just to be conservative when skb collapse stuff got done
> > > (and maybe to annoy cheaters too though I couldn't at this point
> > > figure out how they could abuse it)...
> > > 
> > > ...I think I can come up with that on Wednesday, so please hold
> > > stable push until that.
> > 
> > It'll definitely need to wait at least a day as I cannot even make TCP
> > or UDP connections out from my machine with the current net-2.6 tree,
> > and this is what I'm debugging at the moment.
> > 
> > I'm hoping that it's just something stupid like the make not
> > rebuilding everything necessary after I changed the __u16's into
> > __u32's in skb_frag_t.
> 
> I running tests over emulator this morning.
> Hopefully, this will fix the window collapse that occurs on startup
> of large queue sizes.

Great.

For the record the bug I was chasing was an IPSEC issue which
Herbert fixed a few moments ago.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows

2007-07-31 Thread Stephen Hemminger
On Mon, 30 Jul 2007 22:21:12 -0700 (PDT)
David Miller <[EMAIL PROTECTED]> wrote:

> From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> Date: Tue, 31 Jul 2007 07:59:10 +0300 (EEST)
> 
> > I think it's probably good to add tp->snd_una != prior_snd_una
> > check there too... It's not going to make a large difference,
> > mostly just to be conservative when skb collapse stuff got done
> > (and maybe to annoy cheaters too though I couldn't at this point
> > figure out how they could abuse it)...
> > 
> > ...I think I can come up with that on Wednesday, so please hold
> > stable push until that.
> 
> It'll definitely need to wait at least a day as I cannot even make TCP
> or UDP connections out from my machine with the current net-2.6 tree,
> and this is what I'm debugging at the moment.
> 
> I'm hoping that it's just something stupid like the make not
> rebuilding everything necessary after I changed the __u16's into
> __u32's in skb_frag_t.

I running tests over emulator this morning.
Hopefully, this will fix the window collapse that occurs on startup
of large queue sizes.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows

2007-07-30 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Tue, 31 Jul 2007 07:59:10 +0300 (EEST)

> I think it's probably good to add tp->snd_una != prior_snd_una
> check there too... It's not going to make a large difference,
> mostly just to be conservative when skb collapse stuff got done
> (and maybe to annoy cheaters too though I couldn't at this point
> figure out how they could abuse it)...
> 
> ...I think I can come up with that on Wednesday, so please hold
> stable push until that.

It'll definitely need to wait at least a day as I cannot even make TCP
or UDP connections out from my machine with the current net-2.6 tree,
and this is what I'm debugging at the moment.

I'm hoping that it's just something stupid like the make not
rebuilding everything necessary after I changed the __u16's into
__u32's in skb_frag_t.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows

2007-07-30 Thread Ilpo Järvinen
On Mon, 30 Jul 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> Date: Mon, 30 Jul 2007 20:18:45 +0300 (EEST)
> 
> > Actually, the ratehalving seems to work too well, as cwnd is
> > reduced on every second ACK even though the packets in flight
> > remains unchanged. Recoveries in a bidirectional flows suffer
> > quite badly because of this, both NewReno and SACK are affected.
> > 
> > After this patch, rate halving is performed for ACK only if
> > packets in flight was supposedly changed too.
> > 
> > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
> 
> This fix looks great, and I see no potential problems with
> Reno after doing a bit of auditing.
> 
> Applied and I'll push to -stable too.

I think it's probably good to add tp->snd_una != prior_snd_una
check there too... It's not going to make a large difference,
mostly just to be conservative when skb collapse stuff got done
(and maybe to annoy cheaters too though I couldn't at this point
figure out how they could abuse it)...

...I think I can come up with that on Wednesday, so please hold
stable push until that.

-- 
 i.

Re: [PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows

2007-07-30 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Mon, 30 Jul 2007 20:18:45 +0300 (EEST)

> Actually, the ratehalving seems to work too well, as cwnd is
> reduced on every second ACK even though the packets in flight
> remains unchanged. Recoveries in a bidirectional flows suffer
> quite badly because of this, both NewReno and SACK are affected.
> 
> After this patch, rate halving is performed for ACK only if
> packets in flight was supposedly changed too.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

This fix looks great, and I see no potential problems with
Reno after doing a bit of auditing.

Applied and I'll push to -stable too.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows

2007-07-30 Thread Ilpo Järvinen

Actually, the ratehalving seems to work too well, as cwnd is
reduced on every second ACK even though the packets in flight
remains unchanged. Recoveries in a bidirectional flows suffer
quite badly because of this, both NewReno and SACK are affected.

After this patch, rate halving is performed for ACK only if
packets in flight was supposedly changed too.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 net/ipv4/tcp_input.c |   23 +--
 1 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0163051..767f92c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1848,19 +1848,22 @@ static inline u32 tcp_cwnd_min(const struct sock *sk)
 }
 
 /* Decrease cwnd each second ack. */
-static void tcp_cwnd_down(struct sock *sk)
+static void tcp_cwnd_down(struct sock *sk, int flag)
 {
struct tcp_sock *tp = tcp_sk(sk);
int decr = tp->snd_cwnd_cnt + 1;
+   
+   if ((flag&FLAG_FORWARD_PROGRESS) ||
+   (IsReno(tp) && !(flag&FLAG_NOT_DUP))) {
+   tp->snd_cwnd_cnt = decr&1;
+   decr >>= 1;
 
-   tp->snd_cwnd_cnt = decr&1;
-   decr >>= 1;
+   if (decr && tp->snd_cwnd > tcp_cwnd_min(sk))
+   tp->snd_cwnd -= decr;
 
-   if (decr && tp->snd_cwnd > tcp_cwnd_min(sk))
-   tp->snd_cwnd -= decr;
-
-   tp->snd_cwnd = min(tp->snd_cwnd, tcp_packets_in_flight(tp)+1);
-   tp->snd_cwnd_stamp = tcp_time_stamp;
+   tp->snd_cwnd = min(tp->snd_cwnd, tcp_packets_in_flight(tp)+1);
+   tp->snd_cwnd_stamp = tcp_time_stamp;
+   }
 }
 
 /* Nothing was retransmitted or returned timestamp is less
@@ -2057,7 +2060,7 @@ static void tcp_try_to_open(struct sock *sk, int flag)
}
tcp_moderate_cwnd(tp);
} else {
-   tcp_cwnd_down(sk);
+   tcp_cwnd_down(sk, flag);
}
 }
 
@@ -2257,7 +2260,7 @@ tcp_fastretrans_alert(struct sock *sk, u32 prior_snd_una,
 
if (is_dupack || tcp_head_timedout(sk))
tcp_update_scoreboard(sk);
-   tcp_cwnd_down(sk);
+   tcp_cwnd_down(sk, flag);
tcp_xmit_retransmit_queue(sk);
 }
 
-- 
1.5.0.6