Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-21 Thread Wael Noureddine

Christoph Lameter wrote:

On Sat, 20 Aug 2005, David S. Miller wrote:

But by and large, if a stateless alternative ever exists to
get the same performance benefit as TOE, it will undoubtedly
be preferred by the Linux networking maintainers.
So you TOE guys are fighting more than an uphill battle.


It does not exist today AFAIK. The hope of such a solution will prevent 
the inclusion of TOE technology that exists today?



TOE has a solid track record of being a point-in-time solution with 
decreased features and increased maintenance headaches.


This implies that there have been many TOEs in circulation in the past. Can
you give a list of the ones you've had experience with and explain how
maintaining them has been a headache? We could definitely learn from past
mistakes.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-21 Thread David S. Miller
From: Wael Noureddine [EMAIL PROTECTED]
Date: Sun, 21 Aug 2005 00:17:17 -0700

 How do you intend on avoiding huge stretch ACKs?

The implication is that stretch ACKs are bad, which is wrong.
Oh yes, that's right, you're the same person who earlier in this
thread tried to teach us that bursty TCPs are non-standard :-)

Stretch ACKs are actually a positive thing on a healthy connection and
do indeed help the sender.  And when loss events occur, LRO stops
immediately and delivers the packets as-is so that loss information
via ACKs with SACK blocks can immediately make their way to the
sender.

Linux does actually currently generate stretch ACKs, when beneficial.

What happens today, due mostly to interrupt mitigation, is that the
stack processes many consecutive packets, and spits out a ton of ACKs
one after another.  That's actually bad.  LRO will cause us to instead
do the right thing, which for a healthy connection is to scale the ACK
response rate to match the interarrival rate of data.

Making the sender process a lot of ACKs is bad, because those are
cycles that could be used to do the context switch that gets the
sender back onto the cpu to fill the send buffer.  What happens right
now is that the first ACK wakes up the task, then as we try to
schedule the process we're inundated with processing the subsequent
data packets and spitting the ACKs out, to the point where it takes
longer than necessary to get the process onto the cpu to fill the
send buffer.

I traced this extensively while working on our TSO implementation.

 This is again assuming TOE cannot implement these features. Users should 
 decide which features they care about, and if TOE doesn't have them then it 
 won't be a viable alternative for them. 

You're not going to replicate the entire Linux TCP stack onto your
card.  TOE is not a viable alternative for anyone who wants to do
anything out of the limited scope of features you'll have in your TOE
stack.

This means if we find an incredible new congestion control algorithm,
the TOE setups won't get it.  This means if the only way to work
around a security hole is to enable some netfilter rule until a better
fix exists, users either stay vulnerable or lose TOE.

It's all a lose-lose situation for the user, and we're not going
to fall down that slippery slope if there is anything I can do
about it.

But we don't need the limitation _at all_ with stateless offload
solutions.  That's our _whole point_.  Why do X with limitations
if we don't need to?

But I'm personally sick of talking about why TOE is so bad, and I'm
going to ignore those parts of this thread in the future, and instead
I'm going to participate in the discussions that will actually go
somewhere.  Those are the ones about getting a good LRO implementation
going and into the Linux networking stack.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-21 Thread David S. Miller
From: Wael Noureddine [EMAIL PROTECTED]
Date: Sun, 21 Aug 2005 00:54:51 -0700

  You could also tweak the LRO timeout in a similar fashion based upon
  traffic patterns as well.  In fact, extremely sophisticated things can
  be done here to deal with the LRO timing as seen on WAN vs. LAN
  streams.
 
 The accurate statement is extremely complicated things need to be done here 
 to deal with the LRO timing as seen on WAN vs. LAN streams. Not to mention 
 dealing with retransmissions and the dynamics of congestion control.

LRO will just stop accumulating when out-of-sequence data arrives.
Nothing complicated at all.

And that's _EXACTLY_ what we want to happen.  We want Linux's TCP loss
response algorithms to take care of things, which have been
extensively tuned over many many years and get several orders of
magnitude more testing and exposure than any customized stack you guys
put onto a network card.

The LRO timing is not complicated, the packet limit is simply a
linearly increasing value that just makes sure that it's always
less than or equal to whatever the congestion window happens to
be at that moment.  It cares not about the exact value.

LRO will work, and it's the negative attitude of the TOE folks that
inspires me to want to help out the LRO folks and ignore the TOE mania
altogether.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-21 Thread Wael Noureddine

LRO will just stop accumulating when out-of-sequence data arrives.
Nothing complicated at all.


Unless the NIC keeps state, it is not always able to know if data is out of 
sequence.



The LRO timing is not complicated, the packet limit is simply a
linearly increasing value that just makes sure that it's always
less than or equal to whatever the congestion window happens to
be at that moment.  It cares not about the exact value.


Yes that is true. However, the congestion window is not known on the 
receiving end.



LRO will work, and it's the negative attitude of the TOE folks that
inspires me to want to help out the LRO folks and ignore the TOE mania
altogether.


Yes, LRO will need serious help to be more than a benchmarking tool. What
constitutes a negative attitude? The point here is that we need an objective
approach to the two options. There is no reason that they both can't go in,
especially since LSO/LRO are not as simple and non-intrusive as they may
appear from their basic description, and so far cannot match TOE
performance. Let the users decide.




Re: Hardware assisted SACK processing (was: [PATCH] TCP Offload (TOE) - Chelsio)

2005-08-21 Thread Baruch Even
David S. Miller wrote:
 From: Wael Noureddine [EMAIL PROTECTED]
 Date: Sun, 21 Aug 2005 00:54:51 -0700
 
 
You could also tweak the LRO timeout in a similar fashion based upon
traffic patterns as well.  In fact, extremely sophisticated things can
be done here to deal with the LRO timing as seen on WAN vs. LAN
streams.

The accurate statement is extremely complicated things need to be done here 
to deal with the LRO timing as seen on WAN vs. LAN streams. Not to mention 
dealing with retransmissions and the dynamics of congestion control.
 
 
 LRO will just stop accumulating when out-of-sequence data arrives.
 Nothing complicated at all.
 
 And that's _EXACTLY_ what we want to happen.  We want Linux's TCP loss
 response algorithms to take care of things, which have been
 extensively tuned over many many years and get several orders of
 magnitude more testing and exposure than any customized stack you guys
 put onto a network card.

Actually, at high speeds SACK processing becomes a huge bottleneck by
itself. If we could have some help from the hardware with pruning some
of the trivial cases it would help, I guess.

One thing I can think of, and which I implemented in software, is a SACK
cache feature that kicks in at high speeds and starts processing SACKs
every 16 packets (exact parameters to be researched). This was shown to
increase performance and eliminate stalls that happen otherwise.

In this niche, a NIC that understands SACKs and batches them for
processing when they are just the common case of x additional packets
added to the latest SACK block would help, reducing quite a bit of the
work the CPU otherwise needs to do.

We still have the full Linux stack processing and reacting to the losses
and doing the retransmits; we just get the hardware to batch some of the
work for us. This needs to be host assisted since we don't want this at
the early stages of the connection, or for slow connections.

Baruch


RE: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-21 Thread Leonid Grossman
 

 -----Original Message-----
 From: Christoph Lameter [mailto:[EMAIL PROTECTED] 
 Sent: Sunday, August 21, 2005 9:28 AM
 To: David S. Miller
 Cc: [EMAIL PROTECTED]; Leonid Grossman; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: [PATCH] TCP Offload (TOE) - Chelsio
 

 And -by the way- it seems that LRO is patented. Hope you made 
 arrangements for Linux to use that technology? Are other
 vendors allowed to implement LRO in their hardware?
 

Ahh, I was curious to see if someone will bring this argument up - in
fact, LRO legal issues do not exist, while TOE legal issues are quite
big at the moment. I guess this is one of the reasons why OpenRDMA and
other mainstream industry efforts don't have any provisions for TOE
support.

As I mentioned in Ottawa, there is indeed a patent application filed
about a year ago for the basic Neterion LRO implementation.

Linux doesn't need any arrangements to support basic LRO - it will work
in Linux today without any OS changes.
All it needs from the stack is the ability to accept a chained skb that is
bigger than the advertised MTU, and this works in the Linux stack already.

Potential TCP loss response algorithm and other changes that David is
talking about will be beneficial, and these are obviously not covered by
the Neterion application.

Anyways, since the application is not granted yet it's probably too
early to discuss its future - but if any vendor wants to have peace of
mind, we can talk and get this out of the way; we are obviously
motivated to make LRO a de-facto NIC feature (much like TSO and other
stateless offloads have become).


Unlike LRO, TOE is covered by a number of existing patents and faces
fundamental legal challenges as we speak, for both OS vendors and IHVs -
as the recent Alacritech/Microsoft/Broadcom lawsuit and settlement just
clearly demonstrated :-)


Re: Stretch ACKs (was: [PATCH] TCP Offload (TOE) - Chelsio)

2005-08-21 Thread Baruch Even
David S. Miller wrote:
 From: Wael Noureddine [EMAIL PROTECTED]
 Date: Sun, 21 Aug 2005 00:17:17 -0700
 
 
How do you intend on avoiding huge stretch ACKs?
 
 
 The implication is that stretch ACKs are bad, which is wrong.
 Oh yes, that's right, you're the same person who earlier in this
 thread tried to teach us that bursty TCPs are non-standard :-)
 
 Stretch ACKs are actually a positive thing on a healthy connection and
 do indeed help the sender.  And when loss events occur, LRO stops
 immediately and delivers the packets as-is so that loss information
 via ACKs with SACK blocks can immediately make their way to the
 sender.
 
 Linux does actually currently generate stretch ACKs, when beneficial.

I do notice that in my own tests, I'm seeing stretch acks of 7 and 8
packets quite often. Is there any intention to add ABC (Appropriate Byte
Counting, RFC 3465) to Linux to offset the effects this has on the cwnd
growth?

I haven't seen anything critical happening because of this, but it
definitely changes the way TCP behaves.
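
The effect of stretch ACKs on cwnd growth during slow start can be sketched
with a small model. This is an illustration in Python, not Linux code: the
1448-byte MSS, the starting window and the function names are assumptions,
and the byte-counting rule follows RFC 3465 (grow by the bytes acknowledged,
capped at L * MSS per ACK, here with L = 2).

```python
MSS = 1448  # assumed segment payload size in bytes


def grow_per_ack(cwnd, bytes_acked):
    """Packet counting: one MSS per ACK, however much data the ACK covers."""
    return cwnd + MSS


def grow_abc(cwnd, bytes_acked, limit=2):
    """Byte counting in the style of RFC 3465: grow by the bytes acked,
    capped at limit * MSS per ACK."""
    return cwnd + min(bytes_acked, limit * MSS)


def slow_start(grow, acks, cwnd=2 * MSS):
    """Apply a growth rule to a sequence of ACKs (bytes covered by each)."""
    for bytes_acked in acks:
        cwnd = grow(cwnd, bytes_acked)
    return cwnd


# 16 segments acknowledged either one ACK per segment, or as two stretch
# ACKs covering 8 segments each.
per_segment_acks = [MSS] * 16
stretch_acks = [8 * MSS] * 2

for name, grow in (("per-ACK", grow_per_ack), ("ABC L=2", grow_abc)):
    normal = slow_start(grow, per_segment_acks) // MSS
    stretched = slow_start(grow, stretch_acks) // MSS
    print(f"{name}: cwnd {normal} MSS with normal ACKs, "
          f"{stretched} MSS with stretch ACKs")
```

In this model both rules grow more slowly under stretch ACKs than under
per-segment ACKs; the capped byte count only recovers part of the difference.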

Baruch


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-21 Thread Christoph Lameter
On Sun, 21 Aug 2005, David S. Miller wrote:

 LRO will work, and it's the negative attitude of the TOE folks that
 inspires me to want to help out the LRO folks and ignore the TOE mania
 altogether.

Dave, you criticized the black and white attitude before. It seems that you
are the only one in this discussion that has this problem. We just
discussed the negative aspects of LSO/LRO and then get accused of being
maniacs. The main problem is the categorical NO to TOE that we keep
hearing. We can do LRO no problem.

Why is it so difficult to allow the TCP layer to support the offload 
capabilities?

And -by the way- it seems that LRO is patented. Hope you made arrangements 
for Linux to use that technology? Are other vendors allowed to implement
LRO in their hardware?



Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-21 Thread Wael Noureddine

All other things being equal, it is better not to put packets into the
network faster than it can drain them out.  Large bursts increase delay
variation, and increase the probability that two or more packets in a
connection will be dropped within an RTT (not every box is implementing
AQM yet).  New 10Gig-switch-on-a-chip devices like Fujitsu's MB87Q3140
(http://www.fujitsu.com/us/services/edevices/microelectronics/networkingassps/mb87q3140/) 
have only limited on-chip buffer memory.  So transmit packet pacing is 
preferred; see 
http://yuba.stanford.edu/~yganjali/research/publications/Very-Small-Buffers-CCR.pdf

for further arguments.


True. This is the first thing we learnt when dealing with 10Gbps.

Chelsio's TOE is capable of micro-second granularity pacing and that has 
been crucial for high performance. Even then, it's very hard to fully avoid 
packet loss. Hardware TCP and retransmission straight from the NIC turn out 
to be very helpful in quickly recovering from loss. In fact, even interrupt
moderation delays show up in degraded performance. 
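
To see why pacing at 10Gbps calls for microsecond granularity, the spacing
between paced frames can be worked out directly. The sketch below is
illustrative Python; the function name is made up here, and it ignores
Ethernet framing overhead (preamble, inter-frame gap), so the figures are
approximate.

```python
def pacing_gap_us(frame_bytes, rate_bps):
    """Time between the starts of consecutive frames, in microseconds,
    for a sender paced to rate_bps."""
    return frame_bytes * 8 / rate_bps * 1e6


LINK_RATE = 10e9  # 10 Gb/s

for size in (1500, 9000, 65536):
    gap = pacing_gap_us(size, LINK_RATE)
    print(f"{size:6d} bytes at 10 Gb/s: {gap:.2f} us between frames")
```

A 1500-byte frame occupies the wire for about 1.2 us at this rate, so a
pacer that can only schedule at coarser intervals necessarily sends bursts
of several frames back to back.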




Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-21 Thread Guy Thornley
On Fri, Aug 19, 2005 at 07:05:06PM +0200, Andi Kleen wrote:
  Right. The other issue with jumbo frames (9000 MTU) is that
  the allocation needed is just over 2 pages for 4K page size
  machines (common case). 3 page contig allocations tend to fail
  once a server is heavily loaded and memory gets fragmented.
 
 That's just a driver bug. The driver should be splitting up the
 buffers into page sized chunks. TX does that already, but
 for RX the driver needs to do it.

The problem with the e1000 driver in this regard is the following:

1. It internally rounds the Rx buffer up to the next power of 2
   So your 9k MTU request just turned into a 16k (16384) allocation

2. alloc_skb() allocates size+sizeof(struct skb_shared_info) bytes
   using kmalloc(), which bumps the size over the power-of-2 boundary

3. kmalloc() rounds this up to the next power of 2, and you end up with
   a 32k (yes, a 32768 byte) GFP_ATOMIC allocation request.
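
The three rounding steps above can be replayed as plain arithmetic. This is
a Python sketch of the behaviour as described in this message, not of the
driver or allocator code; the value used for sizeof(struct skb_shared_info)
is a placeholder, since the real size depends on kernel version and
architecture.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    return 1 << (n - 1).bit_length()


SHARED_INFO_SIZE = 160  # hypothetical sizeof(struct skb_shared_info)


def rx_allocation(mtu):
    """Bytes actually requested from the allocator for one Rx buffer."""
    driver_buffer = next_power_of_two(mtu)        # 1. driver rounds up
    requested = driver_buffer + SHARED_INFO_SIZE  # 2. alloc_skb() adds shared info
    return next_power_of_two(requested)           # 3. kmalloc() rounds up again


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {rx_allocation(mtu)} byte allocation")
```

Any non-zero shared-info size pushes the 16384-byte buffer over the boundary,
so the exact placeholder value does not change the 32768-byte result for a
9000-byte MTU.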

I tested the version in 2.4.26 (version 5.2.30.1-k1) and the one from
Intel's webpage (6.1.16) on 2.4.26.

Are there any plans to fix this?

Does the 2.4 kernel support chained sk_buff's along the receive path? (This
is news to me!) Is there any driver that does this, that can be read as an
instructive example?

Thanks,
Guy Thornley


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-21 Thread David S. Miller
From: Guy Thornley [EMAIL PROTECTED]
Date: Mon, 22 Aug 2005 11:06:13 +1200

 Does the 2.4 kernel support chained sk_buff's along the receive path? (This
 is news to me!) Is there any driver that does this, that can be read as an
 instructive example?

It does, just that no driver takes advantage of this yet.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread David S. Miller
From: Christoph Lameter [EMAIL PROTECTED]
Date: Sat, 20 Aug 2005 10:57:51 -0700 (PDT)

 We are discussing something that is not useful for today's network load
 and not standardized. TOE is the only answer to offloading transfers of 
 data encountered in contemporary networks.

It is talk like this that makes me want to not participate in such
threads.  "TOE is the only..."  Please, spare me the unary view
of the world, ok?

Here is one idea.  Do a reverse LSO, have a dynamic cache on the
network card watching saddr/daddr/sport/dport flows, and accumulate
as many in-order TCP packets as possible into one large R-LSO frame.
This accumulation is timed out by a length and time parameter
programmable in the chip, just like HW interrupt mitigation is.

Then the stack receives these (up to 64K) frames.
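
A rough model of the accumulation described above, written in Python purely
for illustration. Nothing here is NIC or driver code: the class and method
names, the default limits, and the choice to pass an out-of-sequence segment
through unmerged are all assumptions, and sequence-number wraparound is
ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    next_seq: int       # sequence number expected next
    start_time: float   # arrival time of the first held segment
    payload: bytearray = field(default_factory=bytearray)


class ReceiveAggregator:
    """Toy per-flow receive aggregation with length and time limits."""

    def __init__(self, max_bytes=65536, timeout=0.0001):
        self.max_bytes = max_bytes  # length limit (assumed programmable)
        self.timeout = timeout      # time limit in seconds (assumed programmable)
        self.flows = {}             # (saddr, daddr, sport, dport) -> Flow
        self.delivered = []         # (key, payload) frames handed to the stack

    def _flush(self, key):
        flow = self.flows.pop(key, None)
        if flow is not None and flow.payload:
            self.delivered.append((key, bytes(flow.payload)))

    def receive(self, key, seq, data, now):
        flow = self.flows.get(key)
        if flow is not None and seq != flow.next_seq:
            # Out-of-sequence: flush what is held and pass this segment
            # through unmerged so the stack sees the gap right away.
            self._flush(key)
            self.delivered.append((key, bytes(data)))
            return
        if flow is not None and len(flow.payload) + len(data) > self.max_bytes:
            self._flush(key)        # length limit reached
            flow = None
        if flow is None:
            flow = self.flows[key] = Flow(next_seq=seq, start_time=now)
        flow.payload += data
        flow.next_seq = seq + len(data)

    def expire(self, now):
        """Flush every flow that has been held longer than the time limit."""
        for key in [k for k, f in self.flows.items()
                    if now - f.start_time >= self.timeout]:
            self._flush(key)


agg = ReceiveAggregator(max_bytes=4000, timeout=0.001)
flow_key = ("10.0.0.1", "10.0.0.2", 1234, 80)
for i in range(3):
    agg.receive(flow_key, i * 1000, b"x" * 1000, now=i * 0.0001)
agg.expire(now=0.002)
print([len(p) for _, p in agg.delivered])
```

The stack itself is untouched in this scheme: it only ever sees ordinary,
if larger, in-order frames, which is the sense in which the approach is
stateless from the host's point of view.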

This is the kind of discussion of alternative ideas I am _NOT_
seeing.  Which shows how blinded people are to alternatives to
TOE.

Christoph, you're a really bright guy, perhaps you can sit and come up
with some other ideas which would act as stateless alternatives to
TOE?  I bet you can do it, if you would simply try...


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread John Heffner

On Aug 20, 2005, at 1:57 PM, Christoph Lameter wrote:


On Fri, 19 Aug 2005, Andi Kleen wrote:

Hmm - but is a 9k or 16k packet on the wire not equivalent to a micro
burst? (actually it is not that micro compared to 1.5k packets). At least
against burstiness they don't help and make things even worse because the
bursts cannot be split up anymore.


9k or 16k is just a very puny little size these days. That will not
give anyone much benefit. We need to be able to transfer large amounts
of data. This is going to be measured in megabytes or gigabytes of data
and not in kilobytes.


I think you are really missing something fundamental here.  The 
processing costs for TCP can be split into per-byte costs and
per-packet costs.  (This is a slight oversimplification, but good 
enough for this discussion.)  The per-packet costs include things like 
memory allocation, protocol processing, and device interrupt handling.  
The per-byte costs are bus transfer times, calculating checksums, etc.  
Using larger packets helps reduce the overall per-packet costs.  There 
are diminishing returns here in using larger packets, and the marginal 
benefit of using packets much larger than 16k isn't all that great.  (As
an aside, there are other benefits to using larger packets than just 
processing speed.  For example, it makes congestion control easier.  
But that's another discussion.)
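
The diminishing returns can be made concrete with a toy cost model in which
each packet carries a fixed cost plus a cost per byte. The sketch is Python
and the two constants are invented for illustration; they are not
measurements of any real stack.

```python
PER_PACKET_COST_NS = 3000.0  # hypothetical fixed cost per packet
PER_BYTE_COST_NS = 1.0       # hypothetical cost per payload byte


def cost_per_byte(packet_size):
    """Average cost per byte: the fixed cost amortised over the packet,
    plus the per-byte cost that no packet size can remove."""
    return PER_PACKET_COST_NS / packet_size + PER_BYTE_COST_NS


for size in (1500, 9000, 16384, 65536):
    print(f"{size:6d}-byte packets: {cost_per_byte(size):.2f} ns/byte")
```

With these numbers, going from 1500 to 16384 bytes removes most of the
amortised per-packet cost, while going from 16384 to 65536 bytes changes the
total only slightly, and the per-byte floor remains in every case.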


TSO and TOE both help significantly with the per-packet costs.  They 
are effectively equivalent here to using larger packets.  Doing 
zero-copy and checksum offloading helps with the per-byte costs, and is 
possible today with stock Linux, and I believe most TOE implementations 
do.  But TOE and TSO in and of themselves *do not* help with the 
per-byte costs.  TOE currently has an advantage over TSO because it 
reduces the receive path costs in both ack and data processing.




Moreover, jumbo packets are essentially not standardized and most network
devices switch jumbo packets off or do not support it because of that
fact. Devices that send jumbo frames may cause other devices on
the network to malfunction.


This is certainly a concern.  Fixing these issues IMHO is globally more 
important (and architecturally more desirable) than TOEs.  Some may 
disagree. :-)




We are discussing something that is not useful for today's network load
and not standardized. TOE is the only answer to offloading transfers of
data encountered in contemporary networks.


Most people who are involved in this area see multiple solutions with 
different trade-offs.  TOE is just one option.


  -John



RE: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread Leonid Grossman

 Here is one idea.  Do a reverse LSO, have a dynamic cache on 
 the network card watching saddr/daddr/sport/dport flows, and 
 accumulate as many in-order TCP packets as possible into one 
 large R-LSO frame.
 This accumulation is timed out by a length and time parameter 
 programmable in the chip, just like HW interrupt mitigation is.
 
 Then the stack receives these (up to 64K) frames.
 
 This is the kind of discussion of alternative ideas I am 
 _NOT_ seeing.  Which shows how blinded people are to 
 alternatives to TOE.

A number of R-LSO (we call it LRO) hw assists are actually shipping today
in our 10GbE ASIC.
We will submit an LRO driver patch at some point - although MSI-X and
Receive Traffic Hashing driver patches will take precedence.
BTW any comments on the LRO algorithm in my OLS slides are most welcome;
we are looking to extend the implementation in the next ASIC.

 
 Christoph, you're a really bright guy, perhaps you can sit 
 and come up with some other ideas which would act as 
 stateless alternatives to TOE?  I bet you can do it, if you 
 would simply try...


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread Christoph Lameter
On Sat, 20 Aug 2005, David S. Miller wrote:

 Christoph, you're a really bright guy, perhaps you can sit and come up
 with some other ideas which would act as stateless alternatives to
 TOE?  I bet you can do it, if you would simply try...

I worked through the alternatives last year including some of the large 
packet tricks that are not really standard conformant. None of these
was really satisfactory and I ended up wasting a lot of my consulting 
time for a vendor. They finally gave up on jumbo packets that I used to 
favor.

The basic issue is the fundamental design of the TCP layer. If we could
redesign that layer then we may be able to come up with a stateless
protocol but we are condemned to follow the TCP standard. Offload technology
is inevitable as far as I can see and Chelsio has the most innovative
design in harmony with Linux design principles that I know of. They
have developed an API that will also allow other vendors to hook into our
tcp layer. Let's at least give this a try. They are willing to commit
resources to get this going.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread Wael Noureddine
 TSO and TOE both help significantly with the per-packet costs.  They are
 effectively equivalent here to using larger packets.  Doing zero-copy and
 checksum offloading helps with the per-byte costs, and is possible today
 with stock Linux, and I believe most TOE implementations do.  But TOE and
 TSO in and of themselves *do not* help with the per-byte costs.  TOE
 currently has an advantage over TSO because it reduces the receive path
 costs in both ack and data processing.

All good points. However, unlike LRO, TOE actually can also reduce
per-byte costs on receive by allowing zero copy with DDP.

 This is certainly a concern.  Fixing these issues IMHO is globally more
 important (and architecturally more desirable) than TOEs.  Some may
 disagree. :-)

If you talk to the IEEE802.3 folks, they give no hope of the current state
of affairs changing. Plus, jumbo frame benefits really don't apply to all
applications, only to large transfers, and as you say above, per-byte costs
are still there.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread David S. Miller
From: Wael Noureddine [EMAIL PROTECTED]
Date: Sat, 20 Aug 2005 13:46:33 -0700

 That's a good point. Why not offer both alternatives and let the
 customers decide what they want? Why would there be a veto against TOE
 if it can be supported non-intrusively and with virtually no changes to
 the software stack?

If a stateless solution exists, it is preferred purely on
technical merits due to maintainability, invasiveness,
and network stack feature preservation (netfilter, packet
classification and scheduling, etc.)

Once a feature goes in, it typically is impossible to take
it out.  So I'd rather this issue figure itself out before
either solution gets integrated.

But by and large, if a stateless alternative ever exists to
get the same performance benefit as TOE, it will undoubtedly
be preferred by the Linux networking maintainers.
So you TOE guys are fighting more than an uphill battle.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread David S. Miller
From: Christoph Lameter [EMAIL PROTECTED]
Date: Sat, 20 Aug 2005 15:02:00 -0700 (PDT)

 I worked through the alternatives last year including some of the large 
 packet tricks that are not really standard conformant. None of these
 was really satisfactory and I ended up wasting a lot of my consulting 
 time for a vendor. They finally gave up on jumbo packets that I used to 
 favor.

Please elaborate on why R-LSO would not work?  It requires no
link level changes, no need for jumbo frame support, and no
need for invasive hooks like TOE does.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread David S. Miller
From: Wael Noureddine [EMAIL PROTECTED]
Date: Sat, 20 Aug 2005 15:43:06 -0700

 All good points. However, unlike LRO, TOE actually can also reduce
 per-byte costs on receive by allowing zero copy with DDP.

Combined with Intel's I/O AT stuff, LRO can potentially make
the per-byte costs transparent too.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread David S. Miller
From: Leonid Grossman [EMAIL PROTECTED]
Date: Sat, 20 Aug 2005 17:43:11 -0400

 BTW any comments on the LRO algorithm in my OLS slides are most welcome;
 we are looking to extend the implementation in the next ASIC.

Pointer?


RE: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread Wael Noureddine
 But by and large, if a stateless alternative ever exists to
 get the same performance benefit as TOE, it will undoubtedly
 be preferred by the Linux networking maintainers.
 So you TOE guys are fighting more than an uphill battle.

Nevertheless, this constitutes a reasonable starting ground for an
objective discussion, with the goal of reaching a resolution in finite
time. It is naturally expected that new features be subjected to
scrutiny and a complexity vs. benefits analysis. In this regard, we are
committed to addressing your concerns regarding perceived invasiveness,
feature preservation and maintenance, as well as other
concerns/suggestions which may come up.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread David S. Miller
From: Leonid Grossman [EMAIL PROTECTED]
Date: Sat, 20 Aug 2005 21:17:19 -0400

 Which reminds me - some people noted in Ottawa that USO is arguably a
 misleading name for UDP TSO, so better suggestions are welcome.

UDP Fragmentation Offload, aka. UFO ? :-)


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread Christoph Lameter
On Sat, 20 Aug 2005, David S. Miller wrote:

 But by and large, if a stateless alternative ever exists to
 get the same performance benefit as TOE, it will undoubtedly
 be preferred by the Linux networking maintainers.
 So you TOE guys are fighting more than an uphill battle.

It does not exist today AFAIK. The hope of such a solution will prevent 
the inclusion of TOE technology that exists today?



Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread Jeff Garzik

Christoph Lameter wrote:

On Sat, 20 Aug 2005, David S. Miller wrote:

But by and large, if a stateless alternative ever exists to
get the same performance benefit as TOE, it will undoubtedly
be preferred by the Linux networking maintainers.
So you TOE guys are fighting more than an uphill battle.


It does not exist today AFAIK. The hope of such a solution will prevent 
the inclusion of TOE technology that exists today?



TOE has a solid track record of being a point-in-time solution with 
decreased features and increased maintenance headaches.


That's what prevents its inclusion.

Jeff




Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread David S. Miller
From: Leonid Grossman [EMAIL PROTECTED]
Date: Sat, 20 Aug 2005 21:17:19 -0400

 OLS.pdf at ftp ns1.s2io.com user: linuxdocs password: HALdocs

Looks good, the LRO bits.

It seems important that the OS can specify the LRO sizing limits.
Even better, it would help also to have a flow cache of some sort that
can remember some kind of state.  Here's why.

Early in the connection you can't use a large limit because the
congestion window is still growing.  So if it's beyond a few packets,
you'll hit the LRO timeout for the first couple of round trips.

If you have a flow cache, keyed on saddr/daddr/sport/dport then you
can keep a growing LRO limit.  For example, when a flow cache entry is
created, use a LRO limit of 2 frames.  Each time the LRO limit is
reached, increase the LRO limit by one (until you hit the largest
LRO supported, which for ipv4 would be 64K minus header space).

You could also tweak the LRO timeout in a similar fashion based upon
traffic patterns as well.  In fact, extremely sophisticated things can
be done here to deal with the LRO timing as seen on WAN vs. LAN
streams.
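
A minimal Python sketch of the growing-limit scheme described above, for
illustration only. The class and method names, the 1460-byte MSS and the
40-byte header allowance are assumptions made here; nothing below is taken
from the XFrame hardware or from the Linux stack.

```python
from dataclasses import dataclass

MAX_LRO_BYTES = 64 * 1024   # "64K" ceiling for IPv4 mentioned in the message
HEADER_BYTES = 40           # assumed: 20-byte IP + 20-byte TCP header, no options
MSS = 1460                  # assumed: 1500-byte Ethernet MTU minus those headers
INITIAL_LIMIT = 2           # starting LRO limit, in frames, from the message
MAX_LIMIT = (MAX_LRO_BYTES - HEADER_BYTES) // MSS  # largest aggregate, in frames


@dataclass
class FlowEntry:
    """Per-flow state: the current LRO limit in frames."""
    lro_limit: int = INITIAL_LIMIT


class LroFlowCache:
    """Hypothetical flow cache keyed on saddr/daddr/sport/dport."""

    def __init__(self):
        self._flows = {}

    def _entry(self, saddr, daddr, sport, dport):
        # A new flow starts at the small initial limit.
        return self._flows.setdefault((saddr, daddr, sport, dport), FlowEntry())

    def limit_for(self, saddr, daddr, sport, dport):
        """Return how many frames may currently be merged for this flow."""
        return self._entry(saddr, daddr, sport, dport).lro_limit

    def limit_reached(self, saddr, daddr, sport, dport):
        """Called when an aggregate filled up: grow the limit by one frame."""
        entry = self._entry(saddr, daddr, sport, dport)
        entry.lro_limit = min(entry.lro_limit + 1, MAX_LIMIT)
        return entry.lro_limit
```

The limit only grows when an aggregate actually fills, so a young connection
with a small congestion window stays at a small limit instead of waiting on
the LRO timeout. Timeout tuning and cache eviction are left out of this sketch.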


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread Christoph Lameter
On Sat, 20 Aug 2005, David S. Miller wrote:

 From: Christoph Lameter [EMAIL PROTECTED]
 Date: Sat, 20 Aug 2005 21:16:16 -0700 (PDT)
  It does not exist today AFAIK. The hope of such a solution will prevent 
  the inclusion of TOE technology that exists today?
 What you say isn't exactly true, as Leonid Grossman and others are
 working on LRO schemes for the XFrame-II chips.  Read his posts
 in this thread.

Yes I have used LSO before (it's been a while, I hope I get this right) and
while it mostly works it's not strictly following TCP since the TCP flow
control does not occur between packets. Similar issues may plague LRO
(reading the paper from the OLS). I'd rather have some stateful logic
between packets as expected by the TCP protocol rather than sending
a sequence of messages in brute force out to the net. But if the
network card cannot do any better then let's do at least LSO/TSO. Chelsio
NICs support LSO and I have no doubt that there would not be an issue with
implementing LRO if need be.

LSO and LRO are like TOE in that they take elements of TCP away from the 
nature. LSO/LRO limit that to TCP flow control playing a bit with the 
logic of TCP to give the illusion of being stateless.

 So it isn't hope.  People are working on this and it's very real.

It's a halfway thing and likely a bigger problem than getting a few TOE
hooks into the tcp stack and maintaining them. I bet the tricks that we 
hack into the TCP/IP stack for LSO and for LRO will turn out to 
be more difficult to maintain than the proposed TOE hooks.

 At least the XFrame folks, such as Leonid, should be largely commended
 for actually pursuing alternate schemes instead of sticking their
 heads in the sand and just accepting TOE as the only solution.

Are you sure that these ideas have broad support? Isn't this simply a
rationalization to justify that their network technology has not been 
able to keep up with emerging offload technologies?

As far as I can tell the mainstream of the industry seems to be moving to 
TOE, seeing LSO as an intermediate implementation on the way to full TCP 
offload.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread David S. Miller
From: Christoph Lameter [EMAIL PROTECTED]
Date: Sat, 20 Aug 2005 21:55:22 -0700 (PDT)

 I bet the tricks that we hack into the TCP/IP stack for LSO and for
 LRO will turn out to be more difficult to maintain than the proposed
 TOE hooks.

LRO is going to be mostly transparent.

 As far as I can tell the mainstream of the industry seems to be
 moving to TOE, seeing LSO as an intermediate implementation on the
 way to full TCP offload.

We've been hearing this for years, I'm sick of it already.  TOE turns
critical features off, stateless offloads allow them to stay enabled
and get the performance boost.


RE: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-20 Thread Leonid Grossman
 

 -Original Message-
 From: David S. Miller [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, August 20, 2005 9:40 PM

 
 If you have a flow cache, keyed on saddr/daddr/sport/dport 
 then you can keep a growing LRO limit.  For example, when a 
 flow cache entry is created, use a LRO limit of 2 frames.  
 Each time the LRO limit is reached, increase the LRO limit by 
 one (until you hit the largest LRO supported, which for ipv4 
 would be 64K minus header space).

This is a good idea. The saddr/daddr/sport/dport table is already there
for receive traffic steering; I just did not realize it could be used
for managing the LRO limit as well.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-19 Thread Andi Kleen
 Now I can take you even less seriously.  In RFC2581, they are talking
 about unloading a burst of data into a connection where there has been
 significant idle time since the most recent data send.

To be fair Linux would be using TSO in this case too and therefore
cause bursts. But it would also do so without TSO, I think.

-Andi


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-19 Thread John Heffner
On Friday 19 August 2005 12:37 am, Wael Noureddine wrote:
  There is no RFC violated by being bursty.  Show me the RFC where TCP
  burstiness is standardized.  This is yet another strawman.

 You surely know this is a recurring theme in all congestion control RFCs
 (RFC2581 in particular),
 as well as in the Known TCP Implementation Problems RFC2525.

TSO increases micro-burstiness.  Clearly macro-burstiness has known harmful
effects, but I know of nothing in the literature showing harmful effects of 
small bursts.  I'm genuinely curious to see any papers on this subject if you 
have pointers to them.  Admittedly the distinction here between micro and 
macro is fuzzy, but I'd define micro as a small fraction of the cwnd.  The 
Linux TSO implementation doesn't necessarily do the *best* thing, but some 
work has gone in to this recently, and I think it does reasonably well at 
this point under most conditions.  Addressing macro-burstiness issues is 
entirely separate from TSO and is a topic of ongoing research.

One case that does concern me with TSO is switches with short queues.  Imagine 
a GigE switch with a 64k buffer.  If you have two or three machines doing TSO 
toward the same switch port, they're going to start trampling all over each 
other.  However, I'd say this is too short a queue anyway for GigE -- short 
enough that you're screwed with normal TCP.  At 10-Gig, a 64k buffer would be 
even more ridiculous (0.05 ms).  Not all switch manufacturers may agree...
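
The 0.05 ms figure above can be checked with a short calculation. This is a
sketch only: the function name is made up here, and "64k" is assumed to mean
64 KiB (65536 bytes).

```python
def drain_time_ms(buffer_bytes, link_bits_per_second):
    """Time, in milliseconds, for a full buffer to drain at line rate."""
    return buffer_bytes * 8 / link_bits_per_second * 1000


BUFFER_BYTES = 64 * 1024  # assumed meaning of "64k"

gige_ms = drain_time_ms(BUFFER_BYTES, 1e9)      # about 0.52 ms at 1 Gbit/s
ten_gig_ms = drain_time_ms(BUFFER_BYTES, 10e9)  # about 0.05 ms at 10 Gbit/s
```

Since a single TSO send can be close to 64 KB, one burst from one sender is
roughly the size of the whole buffer, which is why two or three senders aimed
at the same port would overflow it.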

I'm personally not a big fan of TSO or TOE.  They both add a lot of complexity 
to the network stack, and have other downsides.  The *best* way to solve 
these problems is to engineer technologies to use larger packet sizes.  Even
at 9k (or better yet 16k) the advantages of these offload schemes are
vanishingly small.  (Though if a TOE can do zero-copy receive, this is a win
over what currently exists, but I think there are other ways to do that as 
well.)  The Linux kernel may not be able to do too much to encourage 
deployment of larger MTUs, but NIC vendors probably can.

  -John


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-19 Thread Andi Kleen
 I'm personally not a big fan of TSO or TOE.  They both add a lot of
 complexity to the network stack, and have other downsides.  The *best*
 way to solve these problems is to engineer technologies to use larger
 packet sizes.  Even at 9k (or better yet 16k) the advantages of these
 offload schemes are vanishingly small.  (Though if a TOE can do
 zero-copy receive, this is a win over what currently exists, but I
 think there are other ways to do that as well.)  The Linux kernel may
 not be able to do too much to encourage deployment of larger MTUs, but
 NIC vendors probably can.

Hmm - but is a 9k or 16k packet on the wire not equivalent to a micro burst? 
(actually it is not that micro compared to 1.5k packets). At least against
burstiness they don't help and make things even worse because the bursts
cannot be split up anymore.

Actually I think there is still much potential to lower the CPU overhead
of individual packets (e.g. by optimizing the cache latencies of fetching
headers and writing TX rings and using per-CPU MSIs aggressively for TX
completion interrupts). So it might be possible to do much better even
with small packets, even for TX. For RX there is even more relatively
low-hanging fruit given some NIC support (however, it will need some
limited amount of state in the NIC).

-Andi


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-19 Thread Nivedita Singhvi

Andi Kleen wrote:

  I'm personally not a big fan of TSO or TOE.  They both add a lot of
  complexity to the network stack, and have other downsides.  The *best*
  way to solve these problems is to engineer technologies to use larger
  packet sizes.  Even at 9k (or better yet 16k) the advantages of these
  offload schemes are vanishingly small.  (Though if a TOE can do
  zero-copy receive, this is a win over what currently exists, but I
  think there are other ways to do that as well.)  The Linux kernel may
  not be able to do too much to encourage deployment of larger MTUs, but
  NIC vendors probably can.

 Hmm - but is a 9k or 16k packet on the wire not equivalent to a micro burst?
 (actually it is not that micro compared to 1.5k packets). At least against
 burstiness they don't help and make things even worse because the bursts
 cannot be split up anymore.


Right. The other issue with jumbo frames (9000 MTU) is that
the allocation needed is just over 2 pages for 4K page size
machines (common case). 3 page contig allocations tend to fail
once a server is heavily loaded and memory gets fragmented.
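
A rough sketch of the page arithmetic behind this point. The assumptions are
mine, not from the message: the buffer is taken to be the MTU plus a 14-byte
Ethernet header, per-buffer bookkeeping overhead is ignored, and the
power-of-two rounding models how the Linux page allocator hands out
contiguous pages.

```python
PAGE_SIZE = 4096  # 4K pages, the common case named in the message


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Smallest number of whole pages that can hold nbytes."""
    return -(-nbytes // page_size)  # ceiling division


def allocation_order(nbytes, page_size=PAGE_SIZE):
    """Power-of-two order of the contiguous block needed for nbytes."""
    return (pages_needed(nbytes, page_size) - 1).bit_length()


jumbo = 9000 + 14          # assumed buffer size for a 9000-byte MTU
jumbo_pages = pages_needed(jumbo)
jumbo_order = allocation_order(jumbo)   # block of 2**jumbo_order pages
eight_k_order = allocation_order(8192)  # an 8k buffer fits exactly 2 pages
```

Under these assumptions a 9000-byte MTU needs 3 pages and, with power-of-two
rounding, a 4-page contiguous block, while an 8k buffer stays at 2 pages.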

thanks,
Nivedita



RE: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-19 Thread Leonid Grossman
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Andi Kleen
 Sent: Friday, August 19, 2005 9:33 AM
 To: John Heffner
 Cc: Wael Noureddine; David S. Miller; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; netdev@vger.kernel.org; [EMAIL PROTECTED]
 Subject: Re: [PATCH] TCP Offload (TOE) - Chelsio
 
  I'm personally not a big fan of TSO or TOE.  They both add a lot of
  complexity to the network stack, and have other downsides.  The *best*
  way to solve these problems is to engineer technologies to use larger
  packet sizes.  Even at 9k (or better yet 16k) the advantages of these
  offload schemes are vanishingly small.  (Though if a TOE can do
  zero-copy receive, this is a win over what currently exists, but I
  think there are other ways to do that as well.)  The Linux kernel may
  not be able to do too much to encourage deployment of larger MTUs, but
  NIC vendors probably can.

This is already done, both on the hardware and on the OS side.

All 10GbE and the vast majority of GbE NICs and switches/routers support
9k Jumbo frames in a fully interoperable fashion in LAN and WAN
environments.
16k MTU is more controversial due to crc32 and other issues, but you are
correct that a 9k MTU (or even 8k, if one wants to stay with a 2-page
allocation) captures the sweet spot.

All Operating systems (except one, and hopefully not for long) support
Jumbo frames in the box.

So, the hardware capability is there; it is just that in some rare cases
users can't or are unwilling to configure Jumbo frames for the entire path,
and this is the case that stateless and state-aware NICs (as well as TOE
engines) are trying to address.


 
 Hmm - but is a 9k or 16k packet on the wire not equivalent to 
 a micro burst? 
 (actually it is not that micro compared to 1.5k packets). At 
 least against burstiness they don't help and make things even 
 worse because the bursts cannot be split up anymore.
 
 Actually I think there is still much potential to lower the 
 CPU overhead of individual packets (e.g. by optimizing the 
 cache latencies of fetching headers and writing TX rings and 
 using per CPU MSIs aggressively for TX completion 
 interrupts). So it might be possible to do much better even 
 with small packets. 
 Even for TX. For RX there is even more relatively low hanging 
 fruit given some NIC support (however it will need some 
 limited amount of state in the NIC) 
 
 -Andi


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-19 Thread Andi Kleen
 Right. The other issue with jumbo frames (9000 MTU) is that
 the allocation needed is just over 2 pages for 4K page size
 machines (common case). 3 page contig allocations tend to fail
 once a server is heavily loaded and memory gets fragmented.

That's just a driver bug. The driver should be splitting up the
buffers into page sized chunks. TX does that already, but
for RX the driver needs to do it.
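
As an illustration of what splitting into page-sized chunks means, here is a
small Python sketch. It is not driver code; the function name and the
(offset, length) representation are invented for this example.

```python
PAGE_SIZE = 4096  # assumed 4K pages


def split_into_page_chunks(frame_len, page_size=PAGE_SIZE):
    """Return (offset, length) pairs covering frame_len, one per page."""
    chunks = []
    offset = 0
    while offset < frame_len:
        length = min(page_size, frame_len - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


# A 9014-byte frame lands in three independently allocated pages,
# so no multi-page contiguous allocation is required.
jumbo_chunks = split_into_page_chunks(9014)
```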

-Andi


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-19 Thread John Heffner
On Friday 19 August 2005 01:00 pm, Leonid Grossman wrote:
  -Original Message-
   deployment of larger MTUs, but NIC vendors probably can.

 This is already done, both on the hardware and on the OS side.

(Sorry if this is getting a bit offtopic for netdev.)
I know of a number of sites who have not deployed 9k since it is not a 
standard.  I'm just hoping vendors who deal with the IEEE may have some 
ability to change this situation.

  -John


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-19 Thread Andi Kleen
 On the spec website, the current results have it off.

That was because the old implementation violated the congestion
window. With David's new superTSO the next generation of benchmarks
will likely have it on again.

-Andi


RE: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-19 Thread patrick mcmanus
I find the "no TOE, no way" attitude strange.

I've seen a number of server applications that:
a] move a lot of data over TCP.. let's say around 1 Gbps over a hundred
concurrent flows.
b] spend a significant amount of cycles in the kernel stack doing this.
c] spend the rest of their cycles doing userspace crunching
d] have latent unsatisfied demand.

Clearly this box needs more cycles.. If it can add a TOE and move some
of b-c that is a pretty cheap and easy way of getting ahead and
satisfying at least some of d. The issue is not that the box can't do
1Gbps when doing nothing else.. the issue is that it takes significant
cycles to do 1Gbps.

If I have to upgrade the general purpose processors
1] I may lose my existing capital investment
2] If I'm at a boundary I might have to add processors (turn a UP into
an SMP or a 2-way into a 4-way) each of which adds significant
complication plus extra heat when compared to the TOEs..

not all scenarios are like this, and I agree TOEs are over-pitched.. but
I think they certainly play a role in whole system design decisions that
are bigger than just the kernel.

While the scale of a,b,c, and d are going to change over time I don't
really see the balance shifting any.

-Patrick





Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread Timur Tabi

Jeff Garzik wrote:

 1) RFC compliance differs based on whether you use a TOE NIC, or Linux
 software stack.  What Linux am I talking to, today?


I think a more accurate question would be, what TCP/IP stack am I talking to, today? 
You're making it sound as if TOE fundamentally changes the entire Linux kernel, when it 
only affects networking.


The concept is very similar to after-market auto parts.  If I replace the intercooler in 
my Audi S4, would you expect me to care if Audi said, "But your car's cooling system won't 
work like it used to!"?  Of course not.  Similarly, if I purchased a $500 TOE-capable 
network adapter and compiled my kernel with TOE support, I'm not going to expect the 
kernel developers to address any problems.


The whole point behind TOE is that you use a different TCP/IP stack.  The only meaningful 
alternative would be to copy the kernel code into the adapter, and have the adapter's 
processor run that code.  It would be a sort of 1.5-way SMP system.  Maybe in the future 
there will be SMP systems that have CPUs dedicated to different I/O devices, but until 
then we need something like TOE to handle 10Gb Ethernet.


I don't think TOE is unreasonable.  If the user enables a TOE device on his system, he 
should be aware that he's now using a different TCP/IP stack.  You can add a network 
stack taint flag if TOE is ever enabled.


 About the only TOE situation I could imagine which -would- would be
 where the TOE firmware source code is included in the Linux kernel
 source code, but even then, all the hooks would be nasty.


What if that source code can't be compiled by gcc?  What if it uses a proprietary 
compiler, perhaps one that doesn't even run on Linux?


--
Timur Tabi
Staff Software Engineer
[EMAIL PROTECTED]

One thing a Southern boy will never say is,
I don't think duct tape will fix it.
 -- Ed Smylie, NASA engineer for Apollo 13


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread David S. Miller
From: Timur Tabi [EMAIL PROTECTED]
Date: Thu, 18 Aug 2005 17:45:13 -0500

 I think a more accurate question would be, what TCP/IP stack am I
 talking to, today?  You're making it sound as if TOE fundamentally
 changes the entire Linux kernel, when it only affects networking.

Networking is arguably about half of the kernel, and Linux is
pretty useless for most folks without networking.

The point remains that TOE creates an ENORMOUS support burden
upon us, and makes bugs harder to field even if we add the
TOE Taint thing.

You say what users will expect, and that they will understand, but
history in other areas shows that they simply don't.  Even after
clicking the license agreement et al. on the NVIDIA web site when
downloading their binary-only graphics drivers for Linux, people STILL
REPORT crashes to linux-kernel and various distribution vendors with
that driver loaded.

Think people won't report bugs caused by TOE here?  Think again...
It's a huge problem, and many man hours are wasted on this.

The next issue is when customers ask "Well I paid $500 for this TOE
card, how come I can't do netfilter or traffic classification?".  And
they will ask distribution vendors and places like the linux-kernel
and netdev mailing lists these questions, creating a further burden
upon us.

Finally, even ignoring all of that, the argument for stack
maintainability is still there.  TOE puts its hooks deep into
the networking stack, and that in and of itself is a long-term
maintenance problem.

I am still very much against TOE going into the Linux networking
stack.  There are ways to obtain TOE's performance without
necessitating stateful support in the cards, everything that's
worthwhile can be done with stateless offloads.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread Christoph Lameter
On Thu, 18 Aug 2005, David S. Miller wrote:

 The point remains that TOE creates an ENORMOUS support burden
 upon us, and makes bugs harder to field even if we add the
 TOE Taint thing.

Simply switch off the TOE to see if it's the TOE or the OS stack.
TCP is fairly standard though this is a pretty rare case.

 You say what users will expect, and that they will understand, but
 history in other areas shows that they simply don't.  Even after
 clicking the license agreement et al. on the NVIDIA web site when
 downloading their binary-only graphics drivers for Linux, people STILL
 REPORT crashes to linux-kernel and various distribution vendors with
 that driver loaded.

Crashes in the opensource TOE layer or the opensource TOE drivers will 
certainly need to be reported to linux-kernel.

 Think people won't report bugs caused by TOE here?  Think again...
 It's a huge problem, and many man hours are wasted on this.

The developer community will also increase since the vendors
typically have folks on the mailing list to help with these issues.

 The next issue is when customers ask "Well I paid $500 for this TOE
 card, how come I can't do netfilter or traffic classification?".  And
 they will ask distribution vendors and places like the linux-kernel
 and netdev mailing lists these questions, creating a further burden
 upon us.

If it's money-related then they usually talk to those to whom they gave
the money.

 Finally, even ignoring all of that, the argument for stack
 maintainability is still there.  TOE puts its hooks deep into
 the networking stack, and that in and of itself is a long-term
 maintenance problem.

There are only a few hooks that really do not cost much in terms of 
maintenance.

 I am still very much against TOE going into the Linux networking
 stack.  There are ways to obtain TOE's performance without
 necessitating stateful support in the cards, everything that's
 worthwhile can be done with stateless offloads.

Can we match the performance of the TOE? I doubt that general purpose 
processors have the capabilities to get there.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread Christoph Lameter
On Thu, 18 Aug 2005, David S. Miller wrote:

 This is what has always happened in the past, people were preaching
 for TOE back when 100Mbit ethernet was new and fast.  But you
 certainly don't see anyone trying to justify TOE for those link
 speeds today.  The same will happen for 1Gbit and 10Gbit links
 a year or so from now, the cpu, memory, and PCI bus will be fast
 enough.

In that time frame people will have TOEs for even higher speeds.
 
 TOE is therefore by definition a technology which we know will
 be deprecated for current link technologies over time.  It is a
 specialized hack, and once it's in we can never take it out of
 the kernel.  Why put in a specialized hack when the fully functional,
 fully featureful, general purpose net stack is good enough?

All technology will be deprecated over time. If we follow your line 
of thought then Linux network performance will be condemned to be 
only good enough, beating the prior generation of network 
performance trendsetters and never be the top contender for network 
performance.



Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread David S. Miller
From: Christoph Lameter [EMAIL PROTECTED]
Date: Thu, 18 Aug 2005 20:39:41 -0700 (PDT)

 In that time frame people will have TOEs for even higher speeds.

And once again it will be niche, and very far from commodity.
A specialized optimization for a very small and specialized audience,
ie. not appropriate for Linux upstream.


RE: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread Wael Noureddine
 With stateless offloading schemes?  Absolutely it is possible.
 
 Even without stateless offloading, if it can't be done today, then
 they will soon.
 
 This is what has always happened in the past, people were preaching
 for TOE back when 100Mbit ethernet was new and fast.  But you
 certainly don't see anyone trying to justify TOE for those link
 speeds today.  The same will happen for 1Gbit and 10Gbit links
 a year or so from now, the cpu, memory, and PCI bus will be fast
 enough.

Sure, today's technology has no issue with handling 1992 network speeds.

 TOE is therefore by definition a technology which we know will
 be deprecated for current link technologies over time.  It is a
 specialized hack, and once it's in we can never take it out of
 the kernel.  Why put in a specialized hack when the fully functional,
 fully featureful, general purpose net stack is good enough?

Can you explain why TOE is a hack while stateless offload is not?

It is actually surprising that few seem to be concerned with what LSO
and LRO do to TCP. Don't they both change the dynamics of TCP in non-
standard ways? Doesn't this go against Linux's tradition of being the
most RFC compliant of all stacks? LSO, for one, breaks TCP's clock,
increases the sender's burstiness, disrupts congestion control, and
only works in a lossless environment. Has anyone studied the impact
of LSO on network congestion? Who has sanctioned its widespread use?

A TOE must provide a fully standards compliant stack, which does not
break TCP or change its behavior on the wire like stateless offload
does.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread Christoph Lameter
On Thu, 18 Aug 2005, David S. Miller wrote:

 And once again it will be niche, and very far from commodity.
 A specialized optimization for a very small and specialized audience,
 ie. not appropriate for Linux upstream.

The TOE method will gradually become standard simply because it allows 
performance that cannot be obtained now with existing hardware. And we 
may be at some speed boundary for the hardware given the limitations on 
clock frequency.




Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread David S. Miller
From: Digital Aryan [EMAIL PROTECTED]
Date: Fri, 19 Aug 2005 09:18:45 +0530

 And seeing what has happened during 100Mbit, 1Gbit and 10Gbit it seems
 requirements for networking are always one step ahead and the cpu, memory,
 bus-bandwidth will take time to match the requirements. Had this evolution
 been fast enough I wonder if we would have included TSO/LSO support or
 for that matter even the checksum offload in the stack. The full featured,
 generalised network stack would have done it all.

The important point is that all of the offloading is completely
stateless.  And I continually, and vehemently, contend that stateless
offloading is enough, and is highly desirable because it requires
none of the internals exposure nonsense that TOE requires.

And once Microsoft defines an interface for a stateless offload in
their NDIS, every network card vendor tends to implement it in their
hardware.

Wouldn't you rather have a commoditized $40.00USD gigabit network card
that got TOE level performance?  I guess that question's answer depends
upon whether you have some financial stake in a company doing TOE :-)


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread David S. Miller
From: Christoph Lameter [EMAIL PROTECTED]
Date: Thu, 18 Aug 2005 20:50:18 -0700 (PDT)

 The TOE method will gradually become standard simply because it allows 
 performance that cannot be obtained now with existing hardware. And we 
 may be at some speed boundary for the hardware given the limitations on 
 clock frequency.

The same performance can be obtained with stateless offloads.
You continually ignore this possibility, as if TOE is the only
way.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread Jeff Garzik

Christoph Lameter wrote:

 On Thu, 18 Aug 2005, David S. Miller wrote:

  And once again it will be niche, and very far from commodity.
  A specialized optimization for a very small and specialized audience,
  ie. not appropriate for Linux upstream.

 The TOE method will gradually become standard simply because it allows
 performance that cannot be obtained now with existing hardware. And we
 may be at some speed boundary for the hardware given the limitations on
 clock frequency.


False.

Each TOE implementation is locked in time by the speed of the NIC. 
Given time, the network stack will -exceed- the speed of today's TOE NICs.


You can see this with 100mbps TOE NICs, which are slower than today's 
software net stack, with today's software net stack being more 
featureful at the same time.


Jeff





Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread Christoph Lameter
On Thu, 18 Aug 2005, David S. Miller wrote:

 Wouldn't you rather have a commoditized $40.00USD gigabit network card
 that got TOE level performance?  I guess that question's answer depends
 upon whether you have some financial stake in a company doing TOE :-)

We may have TOE in $40 network cards. In fact given the way things shape 
up there is the possibility that it may become difficult to get NICs 
without TOE next year.




Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread Christoph Lameter
On Thu, 18 Aug 2005, David S. Miller wrote:

 The same performance can be obtained with stateless offloads.
 You continually ignore this possibility, as if TOE is the only
 way.

TCP is a stateful protocol and what can be done with stateless 
offloads is very limited.



Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread Jeff Garzik

Christoph Lameter wrote:
We may have TOE in $40 network cards. In fact given the way things shape 
up there is the possibility that it may become difficult to get NICs 
without TOE next year.



People have been saying this every year.

Every year, we go through this argument.

Every year, people fail to realize that the bottlenecks are not the 
software net stack, but RAM and PCI bus bandwidth.


Every year, people forget that during the previous year, Intel, AMD, and 
other chipset/CPU makers increased the bandwidth at which the network 
bottlenecks.
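
The bus-bandwidth point can be checked with simple arithmetic. The sketch below is an illustration only: the helper names are made up, and the bus parameters in the usage note (32-bit/33 MHz conventional PCI, 64-bit/133 MHz PCI-X) are commonly quoted theoretical peaks, not figures taken from this thread. Real sustained throughput is lower than these peaks.

```c
#include <assert.h>

/* Theoretical peak of a parallel bus in MB/s: (width in bytes) x (clock in
 * MHz), assuming one transfer per clock and no protocol overhead. */
static double bus_peak_mbytes(unsigned width_bits, double clock_mhz)
{
    assert(width_bits % 8 == 0);
    return (width_bits / 8) * clock_mhz;
}

/* Line rate of a network link in MB/s, one direction only. */
static double link_mbytes(double gbits_per_sec)
{
    return gbits_per_sec * 1000.0 / 8.0;
}

/* Returns 1 when even the bus's theoretical peak is below the link's line
 * rate, i.e. the bus limits throughput no matter who runs the TCP stack. */
static int bus_is_bottleneck(unsigned width_bits, double clock_mhz,
                             double link_gbps)
{
    return bus_peak_mbytes(width_bits, clock_mhz) < link_mbytes(link_gbps);
}
```

For example, bus_is_bottleneck(64, 133.0, 10.0) returns 1: a 64-bit/133 MHz bus peaks at 1064 MB/s while a 10 Gbit/s link needs 1250 MB/s, so moving TCP processing onto the NIC does not remove that limit.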


And yet, somehow, life goes on...

Jeff




Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread David S. Miller
From: Wael Noureddine [EMAIL PROTECTED]
Date: Thu, 18 Aug 2005 20:50:17 -0700

 Can you explain why TOE is a hack while stateless offload is not?

No knowledge of TCP internals necessary, that defines a clean
and maintainable barrier between the device and the network
stack. It also allows all of the network stack features to
be enabled, even when offloading is being performed.

 It is actually surprising that few seem to be concerned with what LSO
 and LRO do to TCP. Don't they both change the dynamics of TCP in non-
 standard ways? Doesn't this go against Linux's tradition of being the
 most RFC compliant of all stacks? LSO, for one, breaks TCP's clock,
 increases the sender's burstiness, disrupts congestion control, and
 only works in a lossless environment. Has anyone studied the impact
 of LSO on network congestion? Who has sanctioned its widespread use?

The loss issue is a bug in our implementation, not a limitation of LSO
in any way, shape, or form.  It will be fixed.  Thanks, but no thanks,
for the strawman.

LSO is fully RFC compliant, and the necessity of that is why I fixed
our implementation to correctly follow the congestion window rules.
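
A rough sketch of what "following the congestion window rules" means for segmentation offload: the size of the burst handed to the NIC is capped by the room left in the congestion window. The function name and parameters below are hypothetical and do not correspond to the actual Linux TSO code.

```c
#include <assert.h>

/* Hypothetical helper: how many MSS-sized segments may be handed to a
 * segmentation-offload NIC as one large send without exceeding cwnd.
 * All quantities are counted in segments. */
static unsigned tso_burst_segs(unsigned cwnd_segs, unsigned in_flight_segs,
                               unsigned queued_segs, unsigned max_tso_segs)
{
    unsigned room, burst;

    assert(max_tso_segs > 0);
    if (in_flight_segs >= cwnd_segs)
        return 0;                       /* window is full, send nothing */
    room = cwnd_segs - in_flight_segs;  /* segments cwnd still allows */
    burst = queued_segs < room ? queued_segs : room;
    return burst < max_tso_segs ? burst : max_tso_segs;
}
```

With cwnd at 10 segments and 4 already in flight, at most 6 segments go out in the burst, however much data is queued.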

There is no RFC violated by being bursty.  Show me the RFC where TCP
burstiness is standardized.  This is yet another strawman.

All of the TOE folks have a big bee in their bonnets because none of
the networking stack and driver subsystem maintainers see it as a
wise thing to put in.  If it's come to the point where we're
discussing things like LSO standards conformance and other such
strawmen as a justification for TOE, then that's really pathetic.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread David S. Miller
From: Christoph Lameter [EMAIL PROTECTED]
Date: Thu, 18 Aug 2005 20:58:39 -0700 (PDT)

 On Thu, 18 Aug 2005, David S. Miller wrote:
 
  The same performance can be obtained with stateless offloads.
  You continually ignore this possibility, as if TOE is the only
  way.
 
 TCP is a stateful protocol and what can be done with stateless 
 offloads is very limited.

Then why are we able to fill the pipe on the send side without
any problem using stateless offloading alone?

There's nothing doing it on receive right now simply because nobody
has tried hard enough.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread Wael Noureddine

There is no RFC violated by being bursty.  Show me the RFC where TCP
burstiness is standardized.  This is yet another strawman.


You surely know this is a recurring theme in all congestion control RFCs 
(RFC2581 in particular),
as well as in the Known TCP Implementation Problems RFC2525. 




Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-18 Thread David S. Miller
From: Wael Noureddine [EMAIL PROTECTED]
Subject: Re: [PATCH] TCP Offload (TOE) - Chelsio
Date: Thu, 18 Aug 2005 21:37:07 -0700

  There is no RFC violated by being bursty.  Show me the RFC where TCP
  burstiness is standardized.  This is yet another strawman.
 
 You surely know this is a recurring theme in all congestion control RFCs 
 (RFC2581 in particular),

Now I can take you even less seriously.  In RFC2581, they are talking
about unloading a burst of data into a connection where there has been
significant idle time since the most recent data send.
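
The idle-restart rule referred to here (RFC 2581, section 4.1) can be sketched as follows. The function name and the millisecond/byte units are assumptions made for illustration. The rule itself says that a sender that has been idle for longer than the retransmission timeout should reduce cwnd to no more than the restart window before sending again.

```c
#include <assert.h>

/* Sketch of the RFC 2581 section 4.1 idle-restart rule.
 * cwnd and restart_window are in bytes; times are in milliseconds. */
static unsigned cwnd_after_idle(unsigned cwnd, unsigned idle_ms,
                                unsigned rto_ms, unsigned restart_window)
{
    assert(restart_window > 0);
    if (idle_ms > rto_ms && cwnd > restart_window)
        return restart_window;  /* idle too long: restart from RW */
    return cwnd;                /* active connection: cwnd is untouched */
}
```

The second branch is the case under discussion: on an active, healthy transfer the rule does not apply, so it places no limit on a burst that fits within cwnd.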

 as well as in the Known TCP Implementation Problems RFC2525. 

In this RFC bursts are only mentioned in:

2.1: this is talking about lack of any slow start at all

2.3: this is talking about an uninitialized congestion window
 at connection startup

2.8: failure of window deflation after loss recovery

2.13: stretch ACK violation, which is discussing receiver behavior

None of any of these RFCs discussing bursting are talking about a
properly inflated congestion window, during an active and healthy
transfer.  LSO violates no RFC standard whatsoever.

In short, you've brought several strawmen in an attempt to discredit
stateless offloading as not being standards compliant.  If you truly
believe what you say, then please go ask SPEC to invalidate most of
the current SpecWEB benchmark results because the vast majority of
them are using LSO.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-15 Thread Dimitris Michailidis
On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:
 From: Dimitris Michailidis [EMAIL PROTECTED]
 Date: Fri, 12 Aug 2005 10:00:12 -0700
 
  On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:
   This would mean that every time we wish to change the data structures
   and interfaces for TCP socket lookup, your drivers would need to
   change.
 
  I think using TCP's own functions was done exactly to avoid this
  problem.
 
 That doesn't achieve the desired result.
 
 I do plan to merge in IBM's move of the TCP hash tables over
 to RCU style locking, and that will require knowledge of the
 locking at the call sites to the functions you have exported
 to the TOE drivers.  The TOE drivers would break as a result.

TOE uses the same locking strategies the host TCP uses (lock_sock and
the rest) so it should at least be familiar.  It doesn't use
ehash_lock or head->lock other than indirectly through functions such
as the above, and does its normal lookups in its own lockless table
that is based on flow ids rather than 4-tuples.  I haven't seen the
patches you mention recently, I recall seeing some RCU ehash
discussion several months ago and that didn't seem it would have much
of an impact.  If you have something more recent I can take a look and
tell you if it would affect anything.
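
To illustrate the difference between the two lookup styles mentioned above, here is a minimal userspace sketch. Every name in it is hypothetical, the hash function is arbitrary, and no locking or concurrency is modeled. It only contrasts a hashed 4-tuple lookup, which walks a chain and compares addresses and ports, with a flow-id lookup, which is a direct table index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct conn_sketch {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    struct conn_sketch *next;   /* chain within one 4-tuple hash bucket */
};

#define HASH_SIZE 256
#define MAX_FLOWS 256

static struct conn_sketch *tuple_hash[HASH_SIZE];
static struct conn_sketch *flow_table[MAX_FLOWS];

static unsigned tuple_hashfn(uint32_t saddr, uint32_t daddr,
                             uint16_t sport, uint16_t dport)
{
    uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);

    h ^= h >> 16;
    h ^= h >> 8;
    return h & (HASH_SIZE - 1);
}

/* Register a connection in both tables. */
static void conn_insert(struct conn_sketch *c, unsigned flow_id)
{
    unsigned b = tuple_hashfn(c->saddr, c->daddr, c->sport, c->dport);

    assert(flow_id < MAX_FLOWS);
    c->next = tuple_hash[b];
    tuple_hash[b] = c;
    flow_table[flow_id] = c;
}

/* Host-style lookup: hash the 4-tuple, then compare along the chain. */
static struct conn_sketch *lookup_by_tuple(uint32_t saddr, uint32_t daddr,
                                           uint16_t sport, uint16_t dport)
{
    struct conn_sketch *c = tuple_hash[tuple_hashfn(saddr, daddr, sport, dport)];

    for (; c; c = c->next)
        if (c->saddr == saddr && c->daddr == daddr &&
            c->sport == sport && c->dport == dport)
            return c;
    return NULL;
}

/* Flow-id lookup: the device supplies an index, so no hashing or compare. */
static struct conn_sketch *lookup_by_flow_id(unsigned flow_id)
{
    return flow_id < MAX_FLOWS ? flow_table[flow_id] : NULL;
}
```

In this sketch the flow-id path never touches the hash buckets, which shows in outline how a driver could keep its fast-path lookups separate from the host's hash-table locks.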

 
 You are creating a maintenance headache for us as well.  Once this
 stuff gets exported to drivers, it becomes nearly impossible to
 change.  And I absolutely reserve the right to create restrictions of
 use that increase the flexibility we have to change interfaces, data
 structures, and locking strategies in the future.
 
I think you have a fine attitude here.  There are and there will be a
lot more users of the SW TCP than of TOEs and I think you should feel
free to improve the former however you can.  The TOE code still works
with kernels going back to 2.4.22, tracking changes in mainline TCP
hasn't been an issue so far.  If you can give maintainers a heads up
before changes you think may be disruptive I think that would be
plenty on your part.


[PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread Scott Bardone

OPEN TOE submission from Chelsio Communications.

The following items have been addressed:
- cleaned up indentation.
- cleaned up comments.
- cleaned up c-styles.
- using EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL
- removed 2.4 compatibility.
- created TCP_OFFLOAD config option.
- moved #defines to appropriate files.
- removed obfuscating macros.
- included necessary definitions instead of struct.
- made IS_OFFLOADED an inline function instead of macro.

The following items are currently being worked on:
- use sysfs instead of procfs.
- addressing the use of semaphores in 'register_tom'.
- use RCU, need to look at this.
- use inline function instead of TOEDEV macro, requires some work.

Comments:
- static was removed from functions '__tcp_inherit_port' & '__tcp_v4_hash' 
because these are called outside of tcp_ipv4.c from the TOM driver.


Signed-off-by: Scott Bardone [EMAIL PROTECTED]

diff -Naur linux-2.6.13-rc6-git3/include/linux/netdevice.h 
linux-2.6.13-rc6-git3.patched/include/linux/netdevice.h
--- linux-2.6.13-rc6-git3/include/linux/netdevice.h 2005-08-07 
11:18:56.0 -0700
+++ linux-2.6.13-rc6-git3.patched/include/linux/netdevice.h 2005-08-11 
21:28:36.0 -0700
@@ -408,6 +408,9 @@
 #define NETIF_F_VLAN_CHALLENGED1024/* Device cannot handle VLAN 
packets */
 #define NETIF_F_TSO2048/* Can offload TCP/IP segmentation */
 #define NETIF_F_LLTX   4096/* LockLess TX */
+#ifdef CONFIG_TCP_OFFLOAD
+#define NETIF_F_TCPIP_OFFLOAD  65536   /* Can offload TCP/IP */
+#endif
 
/* Called after device is detached from network. */
void(*uninit)(struct net_device *dev);
diff -Naur linux-2.6.13-rc6-git3/include/linux/tcp_diag.h 
linux-2.6.13-rc6-git3.patched/include/linux/tcp_diag.h
--- linux-2.6.13-rc6-git3/include/linux/tcp_diag.h  2005-08-07 
11:18:56.0 -0700
+++ linux-2.6.13-rc6-git3.patched/include/linux/tcp_diag.h  2005-08-11 
21:28:36.0 -0700
@@ -4,6 +4,11 @@
 /* Just some random number */
 #define TCPDIAG_GETSOCK 18
 
+/* TOE API */
+#ifdef CONFIG_TCP_OFFLOAD
+#define TCPDIAG_OFFLOAD 5
+#endif
+
 /* Socket identity */
 struct tcpdiag_sockid
 {
diff -Naur linux-2.6.13-rc6-git3/include/linux/tcp.h 
linux-2.6.13-rc6-git3.patched/include/linux/tcp.h
--- linux-2.6.13-rc6-git3/include/linux/tcp.h   2005-08-07 11:18:56.0 
-0700
+++ linux-2.6.13-rc6-git3.patched/include/linux/tcp.h   2005-08-11 
21:28:36.0 -0700
@@ -235,6 +235,10 @@
return (struct tcp_request_sock *)req;
 }
 
+#ifdef CONFIG_TCP_OFFLOAD
+struct toe_funcs;
+#endif
+
 struct tcp_sock {
/* inet_sock has to be the first member of tcp_sock */
struct inet_sockinet;
@@ -342,6 +346,10 @@
 
struct tcp_func *af_specific;   /* Operations which are 
AF_INET{4,6} specific   */
 
+#ifdef CONFIG_TCP_OFFLOAD
+   struct toe_funcs*toe_specific; /* Operations overriden by TOEs 
*/
+#endif
+
__u32   rcv_wnd;/* Current receiver window  */
__u32   rcv_wup;/* rcv_nxt on last window update sent   */
__u32   write_seq;  /* Tail(+1) of data held in tcp send buffer */
diff -Naur linux-2.6.13-rc6-git3/include/linux/toedev.h 
linux-2.6.13-rc6-git3.patched/include/linux/toedev.h
--- linux-2.6.13-rc6-git3/include/linux/toedev.h1969-12-31 
16:00:00.0 -0800
+++ linux-2.6.13-rc6-git3.patched/include/linux/toedev.h2005-08-11 
22:37:03.94780 -0700
@@ -0,0 +1,126 @@
+/*
+ *   *
+ * File: *
+ *  toedev.h *
+ *   *
+ * Description:  *
+ *  TOE device definitions.  *
+ *   *
+ * This program is free software; you can redistribute it and/or modify  *
+ * it under the terms of the GNU General Public License, version 2, as   *
+ * published by the Free Software Foundation.*
+ *   *
+ * You should have received a copy of the GNU General Public License along   *
+ * with this program; if not, write to the Free Software Foundation, Inc.,   *
+ * 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA. *
+ *   *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED*
+ * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF  *
+ * MERCHANTABILITY AND FITNESS 

Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread David S. Miller
From: Scott Bardone [EMAIL PROTECTED]
Date: Thu, 11 Aug 2005 23:16:14 -0700

 - static was removed from functions '__tcp_inherit_port' & '__tcp_v4_hash' 
 because these are called outside of tcp_ipv4.c from the TOM driver.

There is no way you're going to be allowed to call such deep TCP
internals from your driver.

This would mean that every time we wish to change the data structures
and interfaces for TCP socket lookup, your drivers would need to
change.

This is all looking exactly like the deep dark dungeon I feared TOE
support would be.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread Mitchell Blank Jr
The networking gurus can comment on the internals of your patch better than
I can.  Just a few style notes though:

 +#ifdef CONFIG_TCP_OFFLOAD
 +#define NETIF_F_TCPIP_OFFLOAD65536   /* Can offload TCP/IP */
 +#endif

No need to protect this inside CONFIG_* option

 +/* TOE API */
 +#ifdef CONFIG_TCP_OFFLOAD
 +#define TCPDIAG_OFFLOAD 5
 +#endif

Ditto

 +#ifdef CONFIG_TCP_OFFLOAD
 +struct toe_funcs;
 +#endif

Ditto

 +#ifdef CONFIG_TCP_OFFLOAD
 +#include <linux/toedev.h>
 +#endif

Include <linux/toedev.h> unconditionally.  Have it handle the !CONFIG_TCP_OFFLOAD
case itself by declaring noop macros for things like toe_neigh_update().
This way you can remove a lot of the #ifdef's you've sprinkled all over the
.c files

 +#define boot_phase 0

Some explanation here?  It looks like something left over from development.

 +#ifndef __raise_softirq_irqoff
 +#define __raise_softirq_irqoff(nr) __cpu_raise_softirq(smp_processor_id(), nr)
 +#endif

What is this needed for?

 +static int toedev_init(void);

This forward declaration seems to be only needed for the boot_phase thing
above, so if that goes this can go as well.

 +/*
 + * Allocate a unique index for a TOE device.  We keep the index within 30 bits

Maybe look at lib/idr.c to handle this?

 + struct toedev *dev = kmalloc(sizeof(struct toedev), GFP_KERNEL);
 +
 + if (dev) {
 + memset(dev, 0, sizeof(struct toedev));

Minor nitpick (that some might disagree with)... I usually prefer:

struct toedev *dev = kmalloc(sizeof(*dev), GFP_KERNEL);

 +int toe_receive_skb(struct toedev *dev, struct sk_buff **skb, int n)
 +{
 + int i;

n and i should probably be unsigned int

 +#ifdef CONFIG_TCP_OFFLOAD
 + tcp_listen_offload(sk);
 +#endif

Another example of something that could be an empty macro in a .h file for
the !CONFIG_TCP_OFFLOAD case.

 +#ifndef CONFIG_TCP_OFFLOAD
 +static
 +#endif

Don't do this... just make it non-static unconditionally.  It's not worth
the ugliness.  Same applies to other places.

 +#ifndef CONFIG_TCP_OFFLOAD
 +static
 +#endif
 +__inline__ void __tcp_inherit_port(struct sock *sk, struct sock *child)
  {
   struct tcp_bind_hashbucket *head =
   &tcp_bhash[tcp_bhashfn(inet_sk(child)->num)];
 @@ -351,7 +357,10 @@
   }
  }

Things that are inline and are now going to be shared really need to just
remain static inline and move to a header file probably

 +#ifdef CONFIG_TCP_OFFLOAD
 + if (tcp_connect_offload(sk))
 + return 0;
 +#endif

Just another example of the kind of #ifdef that doesn't belong in the .c
files.  If the !CONFIG_TCP_OFFLOAD case just had

#define tcp_connect_offload(sk) (0)

then you can skip the #ifdef
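
A small self-contained sketch of the pattern being suggested, written as ordinary userspace C. The struct, the enum, and connect_sketch() are hypothetical stand-ins. Only the CONFIG_TCP_OFFLOAD and tcp_connect_offload names come from the patch, and the enabled-case body is invented purely so the example compiles either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock. */
struct sock_sketch {
    int wants_offload;
};

enum connect_path { PATH_OFFLOAD, PATH_SOFTWARE };

/* ---- what the shared header would provide ---- */
#ifdef CONFIG_TCP_OFFLOAD
/* Placeholder body; the real function would live in the TOE core. */
static inline int tcp_connect_offload(struct sock_sketch *sk)
{
    return sk->wants_offload;
}
#else
/* Offload compiled out: the call folds to constant 0. */
#define tcp_connect_offload(sk) (0)
#endif

/* ---- caller in a .c file: no #ifdef needed here ---- */
static enum connect_path connect_sketch(struct sock_sketch *sk)
{
    assert(sk != NULL);
    if (tcp_connect_offload(sk))
        return PATH_OFFLOAD;
    return PATH_SOFTWARE;   /* normal software TCP connect continues */
}
```

When CONFIG_TCP_OFFLOAD is not defined, the condition is a constant zero and the compiler can drop the branch, so the caller reads the same in both configurations.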

 +#ifndef CONFIG_TCP_OFFLOAD
   LIMIT_NETDEBUG(printk(KERN_DEBUG "TCP: drop open "
                         "request from %u.%u."
                         "%u.%u/%u\n",
                         NIPQUAD(saddr),
                         ntohs(skb->h.th->source)));
 +#else
 + NETDEBUG(if (net_ratelimit()) \
 +          printk(KERN_DEBUG "TCP: drop open "
 +                 "request from %u.%u."
 +                 "%u.%u/%u\n", \
 +                 NIPQUAD(saddr),
 +                 ntohs(skb->h.th->source)));
 +#endif

Huh?  What about TOE requires changes to printk ratelimiting?

-Mitch


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread Jeff Garzik

David S. Miller wrote:

From: Scott Bardone [EMAIL PROTECTED]
Date: Thu, 11 Aug 2005 23:16:14 -0700


- static was removed from functions '__tcp_inherit_port' & '__tcp_v4_hash' 
because these are called outside of tcp_ipv4.c from the TOM driver.



There is no way you're going to be allowed to call such deep TCP
internals from your driver.

This would mean that every time we wish to change the data structures
and interfaces for TCP socket lookup, your drivers would need to
change.

This is all looking exactly like the deep dark dungeon I feared TOE
support would be.


Although I keep an open mind, I really don't see how any TOE solution 
will ever overcome my own conceptual merge objections:



1) RFC compliance differs based on whether you use a TOE NIC, or Linux 
software stack.  What Linux am I talking to, today?


Linux is consistently the most RFC-compliant net stack in existence, 
AFAIK.  TOE suddenly leaves all that open to question.



2) Security updates.  We can deploy a net stack security fix very 
rapidly, and know that we have solved the issue(s).  With TOE, security 
fixes no longer cover all users.  One has to either wait on multiple TOE 
vendors to deploy firmware fixes, or deploy the software fix and leave 
TOE users exposed.  Once again...  What Linux am I talking to, today?



3) Netfilter.  Either a TOE NIC (a) doesn't support netfilter, (b) needs 
far-reaching packet mangling hooks, or (c) includes its own custom 
netfilter [clone], with attendant bugs and maintenance issues.



4) Configuration.  Either a TOE NIC needs deep net stack hooks, or needs 
its own netlink/ifconfig configuration interfaces.



5) As we see in this thread -- upper layer (TCP, IP) changes in the net 
stack require touching a bunch of low-level drivers.  Brand new 
maintenance issue, which slows down upper layer development.



So far, I haven't seen a TOE NIC that satisfies even half of these 
objections.


About the only TOE situation I could imagine which -would- would be 
where the TOE firmware source code is included in the Linux kernel 
source code, but even then, all the hooks would be nasty.


Jeff




Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread Mitchell Blank Jr
I'm fairly pessimistic about full TOE also, I just want to see the patch
cleaned up a bit so we can see the exact impact it would have.  The RX
optimization work presented in the Neterion and Intel papers at OLS sounds a
lot more interesting to me though.

However, I do want to comment on one statement of yours:

Jeff Garzik wrote:
 3) Netfilter.  Either a TOE NIC (a) doesn't support netfilter, (b) needs
 far-reaching packet mangling hooks, or (c) includes its own custom
 netfilter [clone], with attendant bugs and maintenance issues.

I don't think netfilter is a big deal.  The kernel could still check the
TCP handshake packets (or, if needed, faked-up versions with the same data)
at accept()/connect() time.  If those pass muster it's a pretty good bet
that the other 100,000 packets making up that TCP connection would also.
Of course this limitation would need to be documented but I doubt most
netfilter users would mind too much.  There are obviously edge cases where
you can lose: for example, if you update the netfilter rules, you ideally
want to revalidate all the currently open connections.
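
One way to picture the check-once-then-revalidate idea is a rule generation counter. This is a toy model with made-up names and a single "blocked port" standing in for a rule set. It does not use or reflect any real netfilter interface.

```c
#include <assert.h>

/* Toy rule set: one blocked destination port plus a generation number
 * that is bumped whenever the rules change. */
struct ruleset_sketch {
    unsigned generation;
    unsigned short blocked_port;
};

/* Toy offloaded connection, remembering which rules it was checked against. */
struct offl_conn_sketch {
    unsigned short dport;
    unsigned checked_gen;
    int allowed;
};

/* Run once at connect()/accept() time, on the handshake information. */
static int conn_admit(struct offl_conn_sketch *c,
                      const struct ruleset_sketch *r)
{
    assert(c && r);
    c->allowed = (c->dport != r->blocked_port);
    c->checked_gen = r->generation;
    return c->allowed;
}

/* Run over open connections after a rule update: recheck only if stale. */
static int conn_revalidate(struct offl_conn_sketch *c,
                           const struct ruleset_sketch *r)
{
    assert(c && r);
    if (c->checked_gen != r->generation)
        return conn_admit(c, r);
    return c->allowed;
}
```

Until conn_revalidate() runs, a connection admitted under the old rules keeps flowing, which is exactly the edge case described above.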

Since TOE hardware is designed to help the TCP end point you probably
don't have to worry about NAT or other fancy mangling on these interfaces.

-Mitch


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread David S. Miller
From: Dimitris Michailidis [EMAIL PROTECTED]
Date: Fri, 12 Aug 2005 10:22:47 -0700

 This is true.  There is nothing fundamentally preventing both passive
 and active opens to check netfilter before OKing a connection.  Once a
 connection is established, it's rather impractical to run each of its
 packets through netfilter, this is 10G after all.  You'd probably not
 lose much functionality that you could have otherwise used at these
 speeds.

People don't use netfilter just for state tracking and filtering,
they also use it to some extent for rate limiting, packet logging, and
similar things.  And as busses and cpus get faster, your "this is
10G after all" argument becomes null and void.

Note that this TOE mess also makes the packet scheduler, queueing
disciplines, and packet classifiers totally unusable as well.

Essentially, half of the Linux networking stack's features are turned
uncontrollably _OFF_ in the presence of TOE.

It is this, along with many other reasons, why the Linux networking
community, in general, are so against TOE.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread David S. Miller
From: Dimitris Michailidis [EMAIL PROTECTED]
Date: Fri, 12 Aug 2005 10:00:12 -0700

 On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:
  This would mean that every time we wish to change the data structures
  and interfaces for TCP socket lookup, your drivers would need to
  change.
 
 I think using TCP's own functions was done exactly to avoid this
 problem.

That doesn't achieve the desired result.

I do plan to merge in IBM's move of the TCP hash tables over
to RCU style locking, and that will require knowledge of the
locking at the call sites to the functions you have exported
to the TOE drivers.  The TOE drivers would break as a result.

You are creating a maintenance headache for us as well.  Once this
stuff gets exported to drivers, it becomes nearly impossible to
change.  And I absolutely reserve the right to create restrictions of
use that increase the flexibility we have to change interfaces, data
structures, and locking strategies in the future.