Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-03-03 Thread Baruch Even
* David Miller [EMAIL PROTECTED] [070303 08:22]:
 BTW, I think I figured out a way to get rid of
 lost_{skb,cnt}_hint.  The fact of the matter in this case is that
 the setting of the tag bits always propagates from front of the queue
 onward.  We don't get holes mid-way.
 
 So what we can do is search the RB-tree for high_seq and walk
 backwards.  Once we hit something with TCPCB_TAGBITS set, we
 stop processing as there are no earlier SKBs which we'd need
 to do anything with.
 
 Do you see any problems with that idea?

I think this will be a fairly long walk initially.

You can try an augmented walk. If you are on a node which is tagged,
anything on the right side will be tagged as well since it is smaller,
so you need to go left. This way you can find the first non-tagged item
in O(log n).

A bug in this logic is that sequence numbers can and do wrap around.
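
To make the descent concrete, here is a minimal sketch (an illustration
only, not code from any of the patches in this thread). It assumes the
tree is keyed by TCP_SKB_CB(skb)->seq with smaller sequence numbers in
the left subtree, that tagged skbs form a contiguous prefix of the queue
as described in the quoted text, and uses hypothetical names for the
root (tp->write_queue_rb) and the rb_node member in sk_buff (rbnode):

static struct sk_buff *tcp_first_untagged_skb(struct tcp_sock *tp)
{
	struct rb_node *node = tp->write_queue_rb.rb_node;
	struct sk_buff *found = NULL;

	while (node) {
		struct sk_buff *skb = rb_entry(node, struct sk_buff, rbnode);

		if (TCP_SKB_CB(skb)->sacked & TCPCB_TAGBITS) {
			/* This skb and everything before it are tagged;
			 * the boundary lies at a higher sequence number. */
			node = node->rb_right;
		} else {
			/* Untagged: remember it and look for an earlier one. */
			found = skb;
			node = node->rb_left;
		}
	}
	return found;
}

Note that the descent only tests the tag bits; how the tree itself stays
ordered across sequence wrap-around is a separate issue, as noted above.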

If you are willing to change the logic of the tree, you can remove any
sacked element from it. Many of the operations are really only
interested in the non-sacked skbs. This would be similar to my patches
with the non-sacked list, but I still needed the hints there, since the
number of lost packets could still be large and some operations
(retransmit, for example) need to get to the end of the list.
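
As a rough sketch of that direction (illustration only; the rb_root of
not-yet-SACKed skbs, tp->unsacked_queue, and the skb->rbnode member are
hypothetical names, not from the patches in this thread):

static void tcp_skb_set_sacked(struct tcp_sock *tp, struct sk_buff *skb)
{
	if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
		return;

	TCP_SKB_CB(skb)->sacked |= TCPCB_SACKED_ACKED;
	tp->sacked_out += tcp_skb_pcount(skb);

	/* Drop the skb from the tree of un-SACKed skbs, so walks that
	 * only care about non-sacked skbs never touch it again. */
	rb_erase(&skb->rbnode, &tp->unsacked_queue);
}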


 scoreboard_skb_hint is a little bit trickier, but it is a similar
 case to the tcp_lost_skb_hint case.  Except here the termination
 condition is a relative timeout instead of a sequence number and
 packet count test.
 
 Perhaps for that we can remember some state from the
 tcp_mark_head_lost() we do first.  In fact, we can start
 the queue walk from the latest packet which tcp_mark_head_lost()
 marked with a tag bit.
 
 Basically these two algorithms are saying:
 
 1) Mark up to smallest of 'lost' or tp->high_seq.
 2) Mark packets after those processed in #1 which have
timed out.
 
 Right?

Yes. This makes sense; the two algorithms start from the same place. I'd
even consider merging them into a single walk, unless we know that
usually one happens without the other.
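
As an illustration of such a merged walk (a sketch only, not the actual
kernel code; it assumes the tcp_for_write_queue() iterator and the
tcp_skb_timedout() helper, and 'lost' is the packet count that
tcp_mark_head_lost() would have been given):

static void tcp_mark_lost_merged(struct sock *sk, int lost)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct sk_buff *skb;
	int cnt = 0;

	tcp_for_write_queue(skb, sk) {
		if (skb == tcp_send_head(sk))
			break;

		cnt += tcp_skb_pcount(skb);

		/* Step 1: up to the smaller of 'lost' and tp->high_seq. */
		if (cnt <= lost &&
		    !after(TCP_SKB_CB(skb)->end_seq, tp->high_seq)) {
			if (!(TCP_SKB_CB(skb)->sacked & TCPCB_TAGBITS)) {
				TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
				tp->lost_out += tcp_skb_pcount(skb);
			}
			continue;
		}

		/* Step 2: beyond that point, mark only skbs that have
		 * timed out, and stop at the first one that has not. */
		if (!tcp_skb_timedout(sk, skb))
			break;

		if (!(TCP_SKB_CB(skb)->sacked & TCPCB_TAGBITS)) {
			TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
			tp->lost_out += tcp_skb_pcount(skb);
		}
	}
}

The bookkeeping the real code does afterwards (left_out syncing and so
on) is left out of the sketch.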

There is another case like that in tcp_xmit_retrans, where the forward
transmission should only start at the position where the retransmit
walk finished. I had that in my old patches and it improved performance
at the time.

Baruch


Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-03-03 Thread Ilpo Järvinen
On Fri, 2 Mar 2007, David Miller wrote:

 From: Baruch Even [EMAIL PROTECTED]
 Date: Thu, 1 Mar 2007 20:13:40 +0200

  One drawback for this approach is that you now walk the entire sack
  block when you advance one packet. If you consider a 10,000 packet queue
  which had several losses at the beginning and a large sack block that
  advances from the middle to the end you'll walk a lot of packets for
  that one last stretch of a sack block.

Maybe this information in the last ACK could even be used to advantage to
speed up the start-point lookup, since the last ACK very likely contains
the highest SACK block (which might not hold under attack, though), no
matter how big it is, and it's never necessary to process it if we know
for sure that it was the global highest rather than a local one. Using
that, it's trivial to find the starting point below the SACK block, and
it could be passed to the head_lost marking on the fly on the stack
rather than in the structs. Sort of a fastpath too :-)... Maybe even some
auto-tuning thingie which conditionally enables skipping of large SACK
blocks to reuse all of this last-ACK information in mark_head, but that
would get rather complicated I think.

  One way to handle that is to use the still existing sack fast path to
  detect this case and calculate what is the sequence number to search
  for. Since you know what was the end_seq that was handled last, you can
  search for it as the start_seq and go on from there. Does it make sense?
 
 BTW, I think I figured out a way to get rid of
 lost_{skb,cnt}_hint.  The fact of the matter in this case is that
 the setting of the tag bits always propagates from front of the queue
 onward.  We don't get holes mid-way.
 
 So what we can do is search the RB-tree for high_seq and walk
 backwards.  Once we hit something with TCPCB_TAGBITS set, we
 stop processing as there are no earlier SKBs which we'd need
 to do anything with.

If TCP knows the highest SACK block skb (or its seq) and high_seq, finding
the starting point is really trivial, as you can always take the min of
them without any walking. Then walk backwards, first skipping the
reordered skbs, until the first LOST one is encountered. ...The only
overhead then comes from the skipped skbs, which depends on the current
reordering (of course it was not overhead if they had timed out). This
would not even require knowledge of a per-skb fack_count, because the
skipping serves its purpose and can be done on the fly.
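
The starting-point selection alone could look roughly like this (a sketch
only; tp->highest_sack stands in for whatever state would track the
highest SACKed sequence, and tcp_write_queue_find() is the RB-tree lookup
from the patch):

static struct sk_buff *tcp_scoreboard_walk_start(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	u32 seq = tp->high_seq;

	/* Take the smaller of high_seq and the highest SACKed seq,
	 * using before() so sequence wrap-around is handled. */
	if (before(tp->highest_sack, seq))
		seq = tp->highest_sack;

	return tcp_write_queue_find(sk, seq);
}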

 scoreboard_skb_hint is a little bit trickier, but it is a similar
 case to the tcp_lost_skb_hint case.  Except here the termination
 condition is a relative timeout instead of a sequence number and
 packet count test.
 
 Perhaps for that we can remember some state from the
 tcp_mark_head_lost() we do first.  In fact, we can start
 the queue walk from the latest packet which tcp_mark_head_lost()
 marked with a tag bit.

Yes, the problem with this case compared to head_lost seems to be that
we don't know whether the first skb (in a backwards walk) must be marked
until we have walked past it (actually to the point where the first
SACKED_RETRANS is encountered; timestamps should be in order except for
the discontinuity that occurs at the SACKED_RETRANS edge). So it seems to
me that any backwards walk could be a bit problematic in this case exactly
because of this discontinuity? Armed with this knowledge, however, a
backwards walk could start checking for timedout marking right at the
first SACKED_RETRANS skb, and continue later from that point forward if
the skb at the edge was also marked due to timeout. ...Actually, I think
another discontinuity can exist at the EVER_RETRANS edge, but that suffers
from the same problem as the non-RETRANS skbs before the SACKED_RETRANS
edge when first encountered, and is therefore probably pretty useless.

 Basically these two algorithms are saying:
 
 1) Mark up to smallest of 'lost' or tp->high_seq.
 2) Mark packets after those processed in #1 which have
timed out.
 
 Right?

So I would take another angle on this problem (basically combine the lost
calculation from tcp_update_scoreboard with the mark_head_lost stuff and
ignore this 'lost' count altogether). Basically what I propose is
something like this (hopefully I don't stomp on your feet by coding it
this much, as a pseudo-ish approach seemed to get almost as complex :-)):

void tcp_update_scoreboard_fack(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct sk_buff *skb;
	int reord_count = 0;
	int had_retrans = 0;
	struct sk_buff *timedout_reentry_skb = NULL;

	/* How to get this highest_sack? */
	skb = get_min_skb_wrapsafe(tp->high_seq, highest_sack);

	walk_backwards_from(sk, skb) {
		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
			break;

		if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS) {
			had_retrans = 1;
			if (tcp_skb_timedout(skb))

Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-03-03 Thread Ilpo Järvinen
On Sat, 3 Mar 2007, Ilpo Järvinen wrote:
 On Fri, 2 Mar 2007, David Miller wrote:
  From: Baruch Even [EMAIL PROTECTED]
  Date: Thu, 1 Mar 2007 20:13:40 +0200
 
   One drawback for this approach is that you now walk the entire sack
   block when you advance one packet. If you consider a 10,000 packet queue
   which had several losses at the beginning and a large sack block that
   advances from the middle to the end you'll walk a lot of packets for
   that one last stretch of a sack block.
 
 Maybe this information in the last ACK could even be used to advantage to 
 speed up the start-point lookup, since the last ACK very likely contains 
 the highest SACK block (which might not hold under attack, though), no 
 matter how big it is, and it's never necessary to process it if we know 
 for sure that it was the global highest rather than a local one. Using 
 that, it's trivial to find the starting point below the SACK block, and 
 it could be passed to the head_lost marking on the fly on the stack 
 rather than in the structs. Sort of a fastpath too :-)... Maybe even some 
 auto-tuning thingie which conditionally enables skipping of large SACK 
 blocks to reuse all of this last-ACK information in mark_head, but that 
 would get rather complicated I think.

Even better: adding more to the globally highest SACKed block range, when
it is already larger than tp->reordering, does nothing in mark_head_lost
unless ca_state or high_seq also change (just a quick thought; it might be
possible to exclude them too with careful analysis of the state), isn't
that so? But taking advantage of this might require inter-ACK state and
would then be a less useful optimization. So the only thing that would
remain is the check for timedout skbs above the highest SACKed block...
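
A sketch of the skip test being speculated about here (illustration only;
'prev' is hypothetical state cached from the previous ACK, the packet
count uses the per-skb fack_count and tcp_write_queue_find() from the
patch, and whether the condition is really sufficient is exactly the open
question above):

static int tcp_sack_mark_head_skippable(struct sock *sk,
					const struct tcp_sack_block *prev,
					u32 start_seq, u32 end_seq)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct sk_buff *first, *last;
	u32 pkts;

	/* Only when the new block purely extends the old highest one. */
	if (before(start_seq, prev->start_seq) ||
	    before(end_seq, prev->end_seq))
		return 0;

	first = tcp_write_queue_find(sk, prev->start_seq);
	last = tcp_write_queue_find(sk, prev->end_seq - 1);
	if (!first || !last)
		return 0;

	/* The old block already covered more than tp->reordering packets
	 * (ca_state and high_seq are assumed unchanged by the caller). */
	pkts = TCP_SKB_CB(last)->fack_count - TCP_SKB_CB(first)->fack_count + 1;
	return pkts > tp->reordering;
}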

-- 
 i.

Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-03-02 Thread David Miller
From: Baruch Even [EMAIL PROTECTED]
Date: Thu, 1 Mar 2007 20:13:40 +0200

 If you take this approach it makes sense to also remove the sorting of
 SACKs; the traversal of the SACK blocks will not start from the
 beginning anyway, which was the reason for this sorting in the first
 place.
 
 One drawback for this approach is that you now walk the entire sack
 block when you advance one packet. If you consider a 10,000 packet queue
 which had several losses at the beginning and a large sack block that
 advances from the middle to the end you'll walk a lot of packets for
 that one last stretch of a sack block.
 
 One way to handle that is to use the still existing sack fast path to
 detect this case and calculate what is the sequence number to search
 for. Since you know what was the end_seq that was handled last, you can
 search for it as the start_seq and go on from there. Does it make sense?

Thanks for the feedback and these great ideas.

BTW, I think I figured out a way to get rid of
lost_{skb,cnt}_hint.  The fact of the matter in this case is that
the setting of the tag bits always propagates from front of the queue
onward.  We don't get holes mid-way.

So what we can do is search the RB-tree for high_seq and walk
backwards.  Once we hit something with TCPCB_TAGBITS set, we
stop processing as there are no earlier SKBs which we'd need
to do anything with.

Do you see any problems with that idea?
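
The stop condition of that backwards walk might look roughly like this
(a sketch only; tcp_write_queue_find() is from this patch series, while
tcp_write_queue_prev() is a hypothetical predecessor helper, and the
per-skb marking work itself is left out):

static struct sk_buff *tcp_backwalk_stop(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct sk_buff *skb = tcp_write_queue_find(sk, tp->high_seq);

	while (skb != NULL) {
		/* First tagged skb: nothing earlier needs any work,
		 * so the caller only processes the skbs after it. */
		if (TCP_SKB_CB(skb)->sacked & TCPCB_TAGBITS)
			return skb;
		skb = tcp_write_queue_prev(sk, skb);	/* hypothetical */
	}
	return NULL;	/* walked all the way back to the head */
}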

scoreboard_skb_hint is a little bit trickier, but it is a similar
case to the tcp_lost_skb_hint case.  Except here the termination
condition is a relative timeout instead of a sequence number and
packet count test.

Perhaps for that we can remember some state from the
tcp_mark_head_lost() we do first.  In fact, we can start
the queue walk from the latest packet which tcp_mark_head_lost()
marked with a tag bit.

Basically these two algorithms are saying:

1) Mark up to smallest of 'lost' or tp->high_seq.
2) Mark packets after those processed in #1 which have
   timed out.

Right?


Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-03-01 Thread Baruch Even
* David Miller [EMAIL PROTECTED] [070228 21:49]:
 
 commit 71b270d966cd42e29eabcd39434c4ad4d33aa2be
 Author: David S. Miller [EMAIL PROTECTED]
 Date:   Tue Feb 27 19:28:07 2007 -0800
 
 [TCP]: Kill fastpath_{skb,cnt}_hint.
 
 Now that we have per-skb fack_counts and an interval
 search mechanism for the retransmit queue, we don't
 need these things any more.
 
 Instead, as we traverse the SACK blocks to tag the
 queue, we use the RB tree to lookup the first SKB
 covering the SACK block by sequence number.
 
 Signed-off-by: David S. Miller [EMAIL PROTECTED]

If you take this approach it makes sense to also remove the sorting of
SACKs; the traversal of the SACK blocks will not start from the
beginning anyway, which was the reason for this sorting in the first
place.

One drawback for this approach is that you now walk the entire sack
block when you advance one packet. If you consider a 10,000 packet queue
which had several losses at the beginning and a large sack block that
advances from the middle to the end you'll walk a lot of packets for
that one last stretch of a sack block.

One way to handle that is to use the still existing sack fast path to
detect this case and calculate what is the sequence number to search
for. Since you know what was the end_seq that was handled last, you can
search for it as the start_seq and go on from there. Does it make sense?
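
Something along these lines, perhaps (illustration only; 'prev' stands
for whatever the existing fast path caches, e.g. an entry of
tp->recv_sack_cache, and tcp_write_queue_find() is the RB-tree lookup
from the patch):

static struct sk_buff *tcp_sack_block_start(struct sock *sk,
					    u32 start_seq, u32 end_seq,
					    const struct tcp_sack_block *prev)
{
	u32 lookup_seq = start_seq;

	/* The new block overlaps the one handled last time and extends
	 * past its end: everything up to prev->end_seq was already
	 * tagged on an earlier ACK, so start the search there. */
	if (!before(start_seq, prev->start_seq) &&
	    !after(start_seq, prev->end_seq) &&
	    after(end_seq, prev->end_seq))
		lookup_seq = prev->end_seq;

	return tcp_write_queue_find(sk, lookup_seq);
}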

Baruch


[PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-02-28 Thread David Miller

commit 71b270d966cd42e29eabcd39434c4ad4d33aa2be
Author: David S. Miller [EMAIL PROTECTED]
Date:   Tue Feb 27 19:28:07 2007 -0800

[TCP]: Kill fastpath_{skb,cnt}_hint.

Now that we have per-skb fack_counts and an interval
search mechanism for the retransmit queue, we don't
need these things any more.

Instead, as we traverse the SACK blocks to tag the
queue, we use the RB tree to lookup the first SKB
covering the SACK block by sequence number.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index b73687a..c3f08a5 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -326,9 +326,7 @@ struct tcp_sock {
struct sk_buff *scoreboard_skb_hint;
struct sk_buff *retransmit_skb_hint;
struct sk_buff *forward_skb_hint;
-   struct sk_buff *fastpath_skb_hint;
 
-   int fastpath_cnt_hint;
int lost_cnt_hint;
int retransmit_cnt_hint;
int forward_cnt_hint;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 80a572b..408f210 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1047,12 +1047,12 @@ static inline void tcp_mib_init(void)
 }
 
 /*from STCP */
-static inline void clear_all_retrans_hints(struct tcp_sock *tp){
+static inline void clear_all_retrans_hints(struct tcp_sock *tp)
+{
tp->lost_skb_hint = NULL;
tp->scoreboard_skb_hint = NULL;
tp->retransmit_skb_hint = NULL;
tp->forward_skb_hint = NULL;
-   tp->fastpath_skb_hint = NULL;
 }
 
 /* MD5 Signature */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b919cd7..df69726 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -942,16 +942,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
struct tcp_sock *tp = tcp_sk(sk);
unsigned char *ptr = ack_skb->h.raw + TCP_SKB_CB(ack_skb)->sacked;
struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
-   struct sk_buff *cached_skb;
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
int reord = tp->packets_out;
int prior_fackets;
u32 lost_retrans = 0;
int flag = 0;
int dup_sack = 0;
-   int cached_fack_count;
+   int fack_count_base;
int i;
-   int first_sack_index;
 
if (!tp->sacked_out)
tp->fackets_out = 0;
@@ -1010,12 +1008,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
tp->recv_sack_cache[i].end_seq = 0;
}
 
-   first_sack_index = 0;
if (flag)
num_sacks = 1;
else {
int j;
-   tp->fastpath_skb_hint = NULL;

/* order SACK blocks to allow in order walk of the retrans queue */
for (i = num_sacks-1; i > 0; i--) {
@@ -1027,10 +1023,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
tmp = sp[j];
sp[j] = sp[j+1];
sp[j+1] = tmp;
-
-   /* Track where the first SACK block goes to */
-   if (j == first_sack_index)
-   first_sack_index = j+1;
}
 
}
@@ -1040,22 +1032,17 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
/* clear flag as used for different purpose in following code */
flag = 0;
 
-   /* Use SACK fastpath hint if valid */
-   cached_skb = tp->fastpath_skb_hint;
-   cached_fack_count = tp->fastpath_cnt_hint;
-   if (!cached_skb) {
-   cached_skb = tcp_write_queue_head(sk);
-   cached_fack_count = 0;
-   }
-
+   fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;
for (i=0; i<num_sacks; i++, sp++) {
struct sk_buff *skb;
__u32 start_seq = ntohl(sp->start_seq);
__u32 end_seq = ntohl(sp->end_seq);
int fack_count;
 
-   skb = cached_skb;
-   fack_count = cached_fack_count;
+   skb = tcp_write_queue_find(sk, start_seq);
+   if (!skb)
+   continue;
+   fack_count = TCP_SKB_CB(skb)->fack_count - fack_count_base;
 
/* Event B in the comment above. */
if (after(end_seq, tp->high_seq))
@@ -1068,13 +1055,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
if (skb == tcp_send_head(sk))
break;
 
-   cached_skb = skb;
-   cached_fack_count = fack_count;
-   if (i == first_sack_index) {
-   

Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-02-28 Thread Stephen Hemminger
On Wed, 28 Feb 2007 11:49:49 -0800 (PST)
David Miller [EMAIL PROTECTED] wrote:

 
 commit 71b270d966cd42e29eabcd39434c4ad4d33aa2be
 Author: David S. Miller [EMAIL PROTECTED]
 Date:   Tue Feb 27 19:28:07 2007 -0800
 
 [TCP]: Kill fastpath_{skb,cnt}_hint.
 
 Now that we have per-skb fack_counts and an interval
 search mechanism for the retransmit queue, we don't
 need these things any more.
 
 Instead, as we traverse the SACK blocks to tag the
 queue, we use the RB tree to lookup the first SKB
 covering the SACK block by sequence number.
 
 Signed-off-by: David S. Miller [EMAIL PROTECTED]

I would be happy to see this go.  Have you tried this code with
a SACK DoS stream?  I.e., before, you could consume a huge amount
of CPU time by sending a certain bad sequence of SACKs.  This
code should have a better worst-case run time.


Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-02-28 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 28 Feb 2007 13:08:08 -0800

 I would be happy to see this go.  Have you tried this code with
 a SACK DoS stream?  I.e., before, you could consume a huge amount
 of CPU time by sending a certain bad sequence of SACKs.  This
 code should have a better worst-case run time.

No I didn't do that.

In fact this new code wedged my workstation overnight somehow
and I need to debug that at some point as well. :-)
