Re: [PATCH]NET: Add ECN support for TSO

2006-07-26 Thread Michael Chan
On Fri, 2006-07-14 at 12:12 -0400, Dan Reader wrote:

 If we implement the approach you suggest and all data sent uses TSO, the
 receiver can easily determine the expected codepoint of almost any
 isolated CE packet.  It simply has to look at the segment before and the
 segment after (which it can wait for, thanks to delayed ACKs).  If they
 have the same ECT value, use that one.  If they're different, then it
 just has to perform a check for non-MSS (i.e. the boundary between TSOs
 if send ops do not yield a multiple of MSS) and it might know.

I don't think it's that easy to guess.  The TSO sizes are almost always
multiples of MSS because they get trimmed or they are limited by the
congestion window.  The receiver will eventually make a mistake using
your scheme above that the sender can reliably detect.

I agree that replicating the ECT bits makes it less random, but it seems
to be the simplest and best approach.  In real life, there will be some
TSO and some non-TSO packets making it even more difficult to guess.
Randomizing the ECT bits in hardware makes verification of the nonce-sum
very complicated and unreliable.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH]NET: Add ECN support for TSO

2006-07-14 Thread Dan Reader


 My take is that the ECT code point should also be replicated on
 all TSO segments.  When RFC3540 is implemented, the stack will
 randomly use ECT(0) or ECT(1) on the TSO super segments.  On
 each of these super segments, the hardware will replicate the ECT
 code point.  This will keep everything stateless.  The stack
 only needs to know the TSO factor (number of divided segements)
 for each TSO super segement to keep track of the NS.

My concern with this approach is that it goes even farther from the
spirit of RFC3540 than our other proposals.  The intent of the RFC, as I
understand it, is to protect against misbehaving receivers by using a
parity code that the sender knows and the receiver can't reliably guess.
If a receiver gets a Congestion Experienced codepoint, it shouldn't be
able to determine what the original codepoint was and thereby ignore its
ECN requirements.

If we implement the approach you suggest and all data sent uses TSO, the
receiver can easily determine the expected codepoint of almost any
isolated CE packet.  It simply has to look at the segment before and the
segment after (which it can wait for, thanks to delayed ACKs).  If they
have the same ECT value, use that one.  If they're different, then it
just has to perform a check for non-MSS (i.e. the boundary between TSOs
if send ops do not yield a multiple of MSS) and it might know.

From my point of view, the one thing we have to do for TSO is make sure
that at least one packet per send op have a random codepoint that has no
greater chance of being guessed by the receiver than packets sent
without TSO.

To critique the other options, then, I see the following problems: Using
ECT(0) for all but one packet will mean that only one packet per TSO
can't be reliably guessed by the receiver.  Assuming all packets sent in
a TSO have an equal probability of being flagged with CE when it occurs,
a misbehaving receiver will have an increased ability to reliably ignore
and override CE that is in proportion to the number of packets generated
by a single TSO.  Using an adapter randomized value is better because
we're then generating a parity sum across all segments of the TSO.  It's
potential for compromise lies in the ability of the receiver to never
ACK the last received segment of a TSO.  If it ACKs other packets, the
sender wouldn't be able to check the sum...  I still prefer that
approach, though, because it yields more robustness if security
enhancements are built into the sender (e.g. occasionally send non-TSO
packets of MSS size or use TSO ops that yield an integral multiple of
MSS segments; in either case, the sender obfuscates the points where the
sum is being checked).

Yet another option would be to add a field to the TSO descriptor that
indicates a random segment on which to oppose the rule or achieve the
desired NS (e.g. when replicating, if all other composite segments are
using ECT(0) or ECT(1), set the opposite on the indicated segment).
Then, the sender will get a point per TSO where it can check the ACKed
NS, that can't be reliably guessed by the receiver...

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH]NET: Add ECN support for TSO

2006-07-13 Thread Ravinandan Arakali
Dave/Michael,
Replicating NS bit(from super segment) across all segments looks fine.

But one of the issues is the random/pseudo-random generation of
ECT code points on each of these segments. The hardware will need to
be capable of generating this, and I guess should be able to verify this
against the NS bit received as part of ACK for that packet.

Following are couple of schemes proposed by our team. Please comment.

Option A)

If we were to permit ourselves to somewhat break the spirit of RFC3540
without breaking the letter, we could come up a fairly easy enhancement
to TSO... I think it would be acceptable to set ECT(0) on all packets
except one (I would suggest the last, but an argument could be made for
the first).  That one would have either ECT(0) or ECT(1) set as per a
field in the TxD (for example).

That would give us a method that works with ECN nonces (ECT(0) doesn't
increment the sum).  Unfortunately, it would give us a relative increase
in the number of packets being sent with ECT(0) (the random generation
should see a 50-50 distribution between ECT (0) and (1); we would be
skewing it toward (0) by whatever the proportion of packets to TSO
operations is).  So, a connection using ECN nonces and TSO would be less
robust than one not using TSO.

But it wouldn't be broken...



Option B)

The hardware could randomly generate either ECN codepoint on all packets
of a TSO operation except the last.  It would keep a local NS value for
the operation and, in the last packet, set either ECT(0) or ECT(1) as
necessary to generate a NS value equal to that specified in the
descriptor.

That way we would keep a much more equal distribution.  It comes at the
cost of a random value generator in the hardware but we could get by
with something extremely basic (e.g. lsb of the internal clock at the
point the packet is generated) if perfect randomness is not required.

The limitation with this scheme is that the sender can't verify the NS
from any returned ACK that falls inside a TSO operation (it can only be
checked at TSO endpoints and on non-TSO transmissions).

Ravi


-Original Message-
From: David Miller [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 08, 2006 1:32 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
netdev@vger.kernel.org
Subject: Re: [PATCH]NET: Add ECN support for TSO


From: Michael Chan [EMAIL PROTECTED]
Date: Fri, 7 Jul 2006 18:01:34 -0700

 However, Large Receive Offload will be a different story.  If
 packets are accumulated in the hardware and presented to the stack
 as one large packet, the stack will not be able to calculate the
 cumulative NS correctly.  Unless the hardware calculates the partial
 NS over the LRO packet and puts it in the SKB when handing over the
 packet.

This is correct, LRO hardware would need to do something to make sure
the nonce parity works out.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-07-13 Thread Michael Chan

Ravinandan Arakali wrote:
 But one of the issues is the random/pseudo-random generation of
 ECT code points on each of these segments.

My take is that the ECT code point should also be replicated on
all TSO segments.  When RFC3540 is implemented, the stack will
randomly use ECT(0) or ECT(1) on the TSO super segments.  On
each of these super segments, the hardware will replicate the ECT
code point.  This will keep everything stateless.  The stack
only needs to know the TSO factor (number of divided segements)
for each TSO super segement to keep track of the NS.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-07-13 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Thu, 13 Jul 2006 12:35:37 -0700

 When RFC3540 is implemented, the stack will randomly use ECT(0) or
 ECT(1) on the TSO super segments.  On each of these super segments,
 the hardware will replicate the ECT code point.  This will keep
 everything stateless.  The stack only needs to know the TSO factor
 (number of divided segements) for each TSO super segement to keep
 track of the NS.

Correct, and this is all transparent to the device as long
as it replicates the ECT bit in every sub-frame.

The only real issues are on receive for LRO where the parity
of NS needs to be maintained.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH]NET: Add ECN support for TSO

2006-07-12 Thread Ravinandan Arakali
Thanks.. I will get rid of the per-session check for ECN.

Ravi

-Original Message-
From: David Miller [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 11, 2006 11:12 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
netdev@vger.kernel.org
Subject: Re: [PATCH]NET: Add ECN support for TSO


From: Michael Chan [EMAIL PROTECTED]
Date: Tue, 11 Jul 2006 21:53:42 -0700

 There is no reason to find out if ECN is enabled or not for any
 TCP connections.  Hw just needs to watch the above bits in the
 incoming packets.

Correct, it can be done in a completely stateless manner.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH]NET: Add ECN support for TSO

2006-07-11 Thread Ravinandan Arakali
Michael/David,
Thanks for the comments on LRO. The current LRO code in S2io driver is not
aware of ECN. While I was trying to fix this, the first thing I encountered
was to check, in the driver, if ECN is enabled for current session. To do
this, I try to get hold of the socket by doing something like:

tk = tcp_sk(skb-sk);
if (tk-ecn_flags  TCP_ECN_OK)
   /* Check CE, ECE, CWR etc */

I find that skb-sk is NULL. Is this the correct way to check the
per-session
ECN capability ? Why is skb-sk NULL ?

Thanks,
Ravi

-Original Message-
From: David Miller [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 08, 2006 1:32 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
netdev@vger.kernel.org
Subject: Re: [PATCH]NET: Add ECN support for TSO


From: Michael Chan [EMAIL PROTECTED]
Date: Fri, 7 Jul 2006 18:01:34 -0700

 However, Large Receive Offload will be a different story.  If
 packets are accumulated in the hardware and presented to the stack
 as one large packet, the stack will not be able to calculate the
 cumulative NS correctly.  Unless the hardware calculates the partial
 NS over the LRO packet and puts it in the SKB when handing over the
 packet.

This is correct, LRO hardware would need to do something to make sure
the nonce parity works out.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-07-11 Thread David Miller
From: Ravinandan Arakali [EMAIL PROTECTED]
Date: Tue, 11 Jul 2006 18:45:48 -0700

 tk = tcp_sk(skb-sk);
 if (tk-ecn_flags  TCP_ECN_OK)
/* Check CE, ECE, CWR etc */
 
 I find that skb-sk is NULL. Is this the correct way to check the
 per-session
 ECN capability ? Why is skb-sk NULL ?

On receive?  There is no reason for skb-sk to be anything other than
NULL on receive, the networking stack hasn't even seen the packet yet.
Only the driver has seen the skb.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-07-11 Thread Michael Chan

David Miller wrote:

 On receive?  There is no reason for skb-sk to be anything other than
 NULL on receive, the networking stack hasn't even seen the packet yet.
 Only the driver has seen the skb.
 
 

Yeah, here's roughly how LRO should work for ECN:

If CE, ECE, or CWR is set on an incoming packet, hw should stop LRO
on that TCP stream and indicate the packet(s) to the driver right
away.

In the future if RFC 3540 is implemented, hw needs to do the above
and keep track of ECT(0) and ECT(1) on incoming packets.  It needs
to do a partial sum or parity on the ECT codes and give it to the
driver when an LRO packet is ready.

There is no reason to find out if ECN is enabled or not for any
TCP connections.  Hw just needs to watch the above bits in the
incoming packets.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-07-08 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Fri, 7 Jul 2006 18:01:34 -0700

 However, Large Receive Offload will be a different story.  If
 packets are accumulated in the hardware and presented to the stack
 as one large packet, the stack will not be able to calculate the
 cumulative NS correctly.  Unless the hardware calculates the partial
 NS over the LRO packet and puts it in the SKB when handing over the
 packet.

This is correct, LRO hardware would need to do something to make sure
the nonce parity works out.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH]NET: Add ECN support for TSO

2006-07-07 Thread Ravinandan Arakali
Michael,
Are network cards expected to be aware-of and implement RFC3540(ECN with
nonces) ?

Thanks,
Ravi

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Michael Chan
Sent: Tuesday, June 27, 2006 8:07 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Subject: [PATCH]NET: Add ECN support for TSO


In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection.  The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet.  Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using
GSO if necessary to handle TSO packets with CWR set.  Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev-
features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set.  If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.

It is further assumed that all hardware will handle ECE properly by
replicating the ECE flag in all segments.  If that is not the case, a
simple extension of the logic will be required.


Signed-off-by: Michael Chan [EMAIL PROTECTED]



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-07-07 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Fri, 7 Jul 2006 13:57:11 -0700

 RFC3540 is still experimental I believe and it is not implemented
 in the Linux stack.  Even if it is implemented and the NS bit is set,
 I think the TSO logic can simply replicate the NS bit as each TSO
 divided segment contains the same acknowledgement information from
 the original super segment.
 
 We should hear Herbert, David, or others' opinions on this.

I'm studying the RFC right now just to make sure your assesment is
accurate.  It probably is :)

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-07-07 Thread David Miller
From: David Miller [EMAIL PROTECTED]
Date: Fri, 07 Jul 2006 14:59:32 -0700 (PDT)

 From: Michael Chan [EMAIL PROTECTED]
 Date: Fri, 7 Jul 2006 13:57:11 -0700
 
  RFC3540 is still experimental I believe and it is not implemented
  in the Linux stack.  Even if it is implemented and the NS bit is set,
  I think the TSO logic can simply replicate the NS bit as each TSO
  divided segment contains the same acknowledgement information from
  the original super segment.
  
  We should hear Herbert, David, or others' opinions on this.
 
 I'm studying the RFC right now just to make sure your assesment is
 accurate.  It probably is :)

Ok, this is correct.  TSO should simply replicate the NS bit
because the cumulative ACK is going to be the same in each
and every packet emitted.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-06-29 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Tue, 27 Jun 2006 21:37:01 -0700

 [NET]: Add ECN support for TSO

Applied, thanks a lot.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]NET: Add ECN support for TSO

2006-06-27 Thread Michael Chan
In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection.  The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet.  Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using 
GSO if necessary to handle TSO packets with CWR set.  Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev-
features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set.  If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.

It is further assumed that all hardware will handle ECE properly by
replicating the ECE flag in all segments.  If that is not the case, a
simple extension of the logic will be required.


Signed-off-by: Michael Chan [EMAIL PROTECTED]


diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index efd1e2a..f393de2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -316,6 +316,7 @@ struct net_device
 #define NETIF_F_TSO(SKB_GSO_TCPV4  NETIF_F_GSO_SHIFT)
 #define NETIF_F_UFO(SKB_GSO_UDPV4  NETIF_F_GSO_SHIFT)
 #define NETIF_F_GSO_ROBUST (SKB_GSO_DODGY  NETIF_F_GSO_SHIFT)
+#define NETIF_F_TSO_ECN(SKB_GSO_TCPV4_ECN  NETIF_F_GSO_SHIFT)
 
 #define NETIF_F_GEN_CSUM   (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
 #define NETIF_F_ALL_CSUM   (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
@@ -1002,6 +1003,11 @@ static inline int netif_needs_gso(struct
return !skb_gso_ok(skb, dev-features);
 }
 
+static inline int tso_ecn_capable(unsigned long features)
+{
+   return ((features  NETIF_F_GSO) || (features  NETIF_F_TSO_ECN));
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_DEV_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5fb72da..e74c294 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -175,6 +175,9 @@ enum {
 
/* This indicates the skb is from an untrusted source. */
SKB_GSO_DODGY = 1  2,
+
+   /* This indicates the tcp segment has CWR set. */
+   SKB_GSO_TCPV4_ECN = 1  3,
 };
 
 /** 
diff --git a/include/net/sock.h b/include/net/sock.h
index 2d8d6ad..2c75172 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1033,7 +1033,8 @@ static inline void sk_setup_caps(struct 
if (sk-sk_route_caps  NETIF_F_GSO)
sk-sk_route_caps |= NETIF_F_TSO;
if (sk-sk_route_caps  NETIF_F_TSO) {
-   if (sock_flag(sk, SOCK_NO_LARGESEND) || dst-header_len)
+   if ((sock_flag(sk, SOCK_NO_LARGESEND) 
+   !tso_ecn_capable(sk-sk_route_caps)) || dst-header_len)
sk-sk_route_caps = ~NETIF_F_TSO;
else 
sk-sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index c6b8439..871dca2 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -31,7 +31,8 @@ static inline void TCP_ECN_send_syn(stru
struct sk_buff *skb)
 {
tp-ecn_flags = 0;
-   if (sysctl_tcp_ecn  !(sk-sk_route_caps  NETIF_F_TSO)) {
+   if (sysctl_tcp_ecn  (!(sk-sk_route_caps  NETIF_F_TSO) ||
+  tso_ecn_capable(sk-sk_route_caps))) {
TCP_SKB_CB(skb)-flags |= TCPCB_FLAG_ECE|TCPCB_FLAG_CWR;
tp-ecn_flags = TCP_ECN_OK;
sock_set_flag(sk, SOCK_NO_LARGESEND);
@@ -56,6 +57,9 @@ static inline void TCP_ECN_send(struct s
if (tp-ecn_flagsTCP_ECN_QUEUE_CWR) {
tp-ecn_flags = ~TCP_ECN_QUEUE_CWR;
skb-h.th-cwr = 1;
+   if (skb_shinfo(skb)-gso_type  SKB_GSO_TCPV4)
+   skb_shinfo(skb)-gso_type |=
+   SKB_GSO_TCPV4_ECN;
}
} else {
/* ACK or retransmitted segment: clear ECT|CE */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bdd71db..c4a4dba 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2044,7 +2044,8 @@ struct sk_buff * tcp_make_synack(struct 
memset(th, 0, sizeof(struct tcphdr));
th-syn = 1;
th-ack = 1;
-   if (dst-dev-featuresNETIF_F_TSO)
+   if ((dst-dev-features  NETIF_F_TSO) 
+   !tso_ecn_capable(dst-dev-features))
ireq-ecn_ok = 0;
TCP_ECN_make_synack(req, th);
th-source = inet_sk(sk)-sport;


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-06-27 Thread Herbert Xu
On Tue, Jun 27, 2006 at 08:06:47PM -0700, Michael Chan wrote:

 diff --git a/include/net/sock.h b/include/net/sock.h
 index 2d8d6ad..2c75172 100644
 --- a/include/net/sock.h
 +++ b/include/net/sock.h
 @@ -1033,7 +1033,8 @@ static inline void sk_setup_caps(struct 
   if (sk-sk_route_caps  NETIF_F_GSO)
   sk-sk_route_caps |= NETIF_F_TSO;
   if (sk-sk_route_caps  NETIF_F_TSO) {
 - if (sock_flag(sk, SOCK_NO_LARGESEND) || dst-header_len)
 + if ((sock_flag(sk, SOCK_NO_LARGESEND) 
 + !tso_ecn_capable(sk-sk_route_caps)) || dst-header_len)
   sk-sk_route_caps = ~NETIF_F_TSO;

Why turn it off? With GSO in place the stack will handle it just fine
(even your description says so :)  We should instead remove all code
that turns off TSO/ECN when the other is present.

Otherwise the patch looks good.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-06-27 Thread Herbert Xu
On Tue, Jun 27, 2006 at 08:40:34PM -0700, Michael Chan wrote:

 We need to turn off NETIF_F_TSO for a connection that has negotiated to
 turn on ECN if the output device cannot handle TSO and ECN.  In other
 words, if the output device does not have either GSO or TSO_ECN feature
 set.

I think you're mixing up GSO the mechanism with GSO the flag.  The GSO
flag simply tells the TCP stack whether TSO should be used or not, even
if the hardware does not support TSO at all.  The GSO mechanism on the
other hand is ALWAYS present.  So regardless of the presence of the GSO
flag, you can always rely on the GSO mechanism to pick up the pieces (or
rather generate the pieces as the case may be :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-06-27 Thread Michael Chan
On Wed, 2006-06-28 at 13:48 +1000, Herbert Xu wrote:

 I think you're mixing up GSO the mechanism with GSO the flag.  The GSO
 flag simply tells the TCP stack whether TSO should be used or not, even
 if the hardware does not support TSO at all.  The GSO mechanism on the
 other hand is ALWAYS present.  So regardless of the presence of the GSO
 flag, you can always rely on the GSO mechanism to pick up the pieces (or
 rather generate the pieces as the case may be :)
 
Thanks, that was my confusion.  Here's the revised patch:

[NET]: Add ECN support for TSO

In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection.  The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet.  Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using 
GSO if necessary to handle TSO packets with CWR set.  Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev-
features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set.  If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.

With help from Herbert Xu [EMAIL PROTECTED].

Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock
flag is completely removed.

Signed-off-by: Michael Chan [EMAIL PROTECTED]


diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 84b0f0d..a42a9f4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -316,6 +316,7 @@ struct net_device
 #define NETIF_F_TSO(SKB_GSO_TCPV4  NETIF_F_GSO_SHIFT)
 #define NETIF_F_UFO(SKB_GSO_UDPV4  NETIF_F_GSO_SHIFT)
 #define NETIF_F_GSO_ROBUST (SKB_GSO_DODGY  NETIF_F_GSO_SHIFT)
+#define NETIF_F_TSO_ECN(SKB_GSO_TCPV4_ECN  NETIF_F_GSO_SHIFT)
 
 #define NETIF_F_GEN_CSUM   (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
 #define NETIF_F_ALL_CSUM   (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5fb72da..e74c294 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -175,6 +175,9 @@ enum {
 
/* This indicates the skb is from an untrusted source. */
SKB_GSO_DODGY = 1  2,
+
+   /* This indicates the tcp segment has CWR set. */
+   SKB_GSO_TCPV4_ECN = 1  3,
 };
 
 /** 
diff --git a/include/net/sock.h b/include/net/sock.h
index 2d8d6ad..7136bae 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -383,7 +383,6 @@ enum sock_flags {
SOCK_USE_WRITE_QUEUE, /* whether to call sk-sk_write_space in 
sock_wfree */
SOCK_DBG, /* %SO_DEBUG setting */
SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
-   SOCK_NO_LARGESEND, /* whether to sent large segments or not */
SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
 };
@@ -1033,7 +1032,7 @@ static inline void sk_setup_caps(struct 
if (sk-sk_route_caps  NETIF_F_GSO)
sk-sk_route_caps |= NETIF_F_TSO;
if (sk-sk_route_caps  NETIF_F_TSO) {
-   if (sock_flag(sk, SOCK_NO_LARGESEND) || dst-header_len)
+   if (dst-header_len)
sk-sk_route_caps = ~NETIF_F_TSO;
else 
sk-sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index c6b8439..7bb366f 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -31,10 +31,9 @@ static inline void TCP_ECN_send_syn(stru
struct sk_buff *skb)
 {
tp-ecn_flags = 0;
-   if (sysctl_tcp_ecn  !(sk-sk_route_caps  NETIF_F_TSO)) {
+   if (sysctl_tcp_ecn) {
TCP_SKB_CB(skb)-flags |= TCPCB_FLAG_ECE|TCPCB_FLAG_CWR;
tp-ecn_flags = TCP_ECN_OK;
-   sock_set_flag(sk, SOCK_NO_LARGESEND);
}
 }
 
@@ -56,6 +55,9 @@ static inline void TCP_ECN_send(struct s
if (tp-ecn_flagsTCP_ECN_QUEUE_CWR) {
tp-ecn_flags = ~TCP_ECN_QUEUE_CWR;
skb-h.th-cwr = 1;
+   if (skb_shinfo(skb)-gso_type  SKB_GSO_TCPV4)
+   skb_shinfo(skb)-gso_type |=
+   SKB_GSO_TCPV4_ECN;
}
} else {
/* ACK or retransmitted segment: clear ECT|CE */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 94fe5b1..7fa0b4a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4178,8 +4178,6 @@ static int tcp_rcv_synsent_state_process
 */
 
TCP_ECN_rcv_synack(tp, th);
-   if (tp-ecn_flagsTCP_ECN_OK

Re: [PATCH]NET: Add ECN support for TSO

2006-06-27 Thread Herbert Xu
On Tue, Jun 27, 2006 at 09:37:01PM -0700, Michael Chan wrote:
 
 Signed-off-by: Michael Chan [EMAIL PROTECTED]

Looks good to me too!

 @@ -56,6 +55,9 @@ static inline void TCP_ECN_send(struct s
   if (tp-ecn_flagsTCP_ECN_QUEUE_CWR) {
   tp-ecn_flags = ~TCP_ECN_QUEUE_CWR;
   skb-h.th-cwr = 1;
 + if (skb_shinfo(skb)-gso_type  SKB_GSO_TCPV4)
 + skb_shinfo(skb)-gso_type |=
 + SKB_GSO_TCPV4_ECN;

As a byte-pincher I must suggest that you turn this check into something
like 

if (skb_shinfo(skb)-gso_type)

or even

if (skb_shinfo(skb)-gso_size)

:)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-06-27 Thread Michael Chan
On Wed, 2006-06-28 at 14:42 +1000, Herbert Xu wrote:
 On Tue, Jun 27, 2006 at 09:37:01PM -0700, Michael Chan wrote:

  @@ -56,6 +55,9 @@ static inline void TCP_ECN_send(struct s
  if (tp-ecn_flagsTCP_ECN_QUEUE_CWR) {
  tp-ecn_flags = ~TCP_ECN_QUEUE_CWR;
  skb-h.th-cwr = 1;
  +   if (skb_shinfo(skb)-gso_type  SKB_GSO_TCPV4)
  +   skb_shinfo(skb)-gso_type |=
  +   SKB_GSO_TCPV4_ECN;
 
 As a byte-pincher I must suggest that you turn this check into something
 like 
 
   if (skb_shinfo(skb)-gso_type)
 
 or even
 
   if (skb_shinfo(skb)-gso_size)
 
Assuming that we'll later have GSO_TCPV6, isn't it better to check for
TCPV4 explicitly now?  Or just change it later when necessary.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]NET: Add ECN support for TSO

2006-06-27 Thread Herbert Xu
On Tue, Jun 27, 2006 at 09:54:39PM -0700, Michael Chan wrote:

 Assuming that we'll later have GSO_TCPV6, isn't it better to check for
 TCPV4 explicitly now?  Or just change it later when necessary.

Good point, I suppose you never know whether a V6 TSO-capable card is going
to handle ECN correctly in both cases.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html