Re: [PATCH] TCP Offload (TOE) - Chelsio
Christoph Lameter wrote: On Sat, 20 Aug 2005, David S. Miller wrote: But by and large, if a stateless alternative ever exists to get the same performance benefit as TOE, it will undoubtedly be preferred by the Linux networking maintainers. So you TOE guys are fighting more than an uphill battle. It does not exist today AFAIK. The hope of such a solution will prevent the inclusion of TOE technology that exists today? TOE has a solid track record of being a point-in-time solution with decreased features and increased maintenance headaches. This implies that there have been many TOEs in circulation in the past; can you give a list of the ones you've had experience with and explain how maintaining them has been a headache? We could definitely learn from past mistakes. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Wael Noureddine [EMAIL PROTECTED] Date: Sun, 21 Aug 2005 00:17:17 -0700 How do you intend on avoiding huge stretch ACKs? The implication is that stretch ACKs are bad, which is wrong. Oh yes, that's right, you're the same person who earlier in this thread tried to teach us that bursty TCPs are non-standard :-) Stretch ACKs are actually a positive thing on a healthy connection and do indeed help the sender. And when loss events occur, LRO stops immediately and delivers the packets as-is so that loss information via ACKs with SACK blocks can immediately make its way to the sender. Linux does actually currently generate stretch ACKs, when beneficial. What happens today, due mostly to interrupt mitigation, is that the stack processes many consecutive packets, and spits out a ton of ACKs one after another. That's actually bad. LRO will cause us to instead do the right thing, which for a healthy connection is to scale the ACK response rate to match the interarrival rate of data. Making the sender process a lot of ACKs is bad, because those are cycles that could be used to do the context switch that gets the sender back onto the cpu to fill the send buffer. What happens right now is that the first ACK wakes up the task, then as we try to schedule the process we're inundated with processing the subsequent data packets and spitting the ACKs out, to the point where it takes longer than necessary to get the process onto the cpu to fill the send buffer. I traced this extensively while working on our TSO implementation. This is again assuming TOE cannot implement these features. Users should decide which features they care about, and if TOE doesn't have them then it won't be a viable alternative for them. You're not going to replicate the entire Linux TCP stack onto your card. TOE is not a viable alternative for anyone who wants to do anything out of the limited scope of features you'll have in your TOE stack.
This means if we find an incredible new congestion control algorithm, the TOE setups won't get it. This means if the only way to work around a security hole is to enable some netfilter rule until a better fix exists, users either stay vulnerable or lose TOE. It's all a lose-lose situation for the user, and we're not going to fall down that slippery slope if there is anything I can do about it. But we don't need the limitation _at all_ with stateless offload solutions. That's our _whole point_. Why do X with limitations if we don't need to? But I'm personally sick of talking about why TOE is so bad, and I'm going to ignore those parts of this thread in the future, and instead I'm going to participate in the discussions that will actually go somewhere. Those are the ones about getting a good LRO implementation going and into the Linux networking stack.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Wael Noureddine [EMAIL PROTECTED] Date: Sun, 21 Aug 2005 00:54:51 -0700 You could also tweak the LRO timeout in a similar fashion based upon traffic patterns as well. In fact, extremely sophisticated things can be done here to deal with the LRO timing as seen on WAN vs. LAN streams. The accurate statement is extremely complicated things need to be done here to deal with the LRO timing as seen on WAN vs. LAN streams. Not to mention dealing with retransmissions and the dynamics of congestion control. LRO will just stop accumulating when out-of-sequence data arrives. Nothing complicated at all. And that's _EXACTLY_ what we want to happen. We want Linux's TCP loss response algorithms to take care of things, which have been extensively tuned over many many years and get several orders of magnitude more testing and exposure than any customized stack you guys put onto a network card. The LRO timing is not complicated, the packet limit is simply a linearly increasing value that just makes sure that it's always less than or equal to whatever the congestion window happens to be at that moment. It cares not about the exact value. LRO will work, and it's the negative attitude of the TOE folks that inspires me to want to help out the LRO folks and ignore the TOE mania altogether.
Re: [PATCH] TCP Offload (TOE) - Chelsio
LRO will just stop accumulating when out-of-sequence data arrives. Nothing complicated at all. Unless the NIC keeps state, it is not always able to know if data is out of sequence. The LRO timing is not complicated, the packet limit is simply a linearly increasing value that just makes sure that it's always less than or equal to whatever the congestion window happens to be at that moment. It cares not about the exact value. Yes that is true. However, the congestion window is not known on the receiving end. LRO will work, and it's the negative attitude of the TOE folks that inspires me to want to help out the LRO folks and ignore the TOE mania altogether. Yes, LRO will need serious help to be more than a benchmarking tool. What constitutes negative attitude? The point here is that we need an objective approach to the two options. There is no reason that they both can't go in, especially since LSO/LRO are not as simple and non-intrusive as they may appear from their basic description, and so far cannot match the TOE performance. Let the users decide.
Re: Hardware assisted SACK processing (was: [PATCH] TCP Offload (TOE) - Chelsio)
David S. Miller wrote: From: Wael Noureddine [EMAIL PROTECTED] Date: Sun, 21 Aug 2005 00:54:51 -0700 You could also tweak the LRO timeout in a similar fashion based upon traffic patterns as well. In fact, extremely sophisticated things can be done here to deal with the LRO timing as seen on WAN vs. LAN streams. The accurate statement is extremely complicated things need to be done here to deal with the LRO timing as seen on WAN vs. LAN streams. Not to mention dealing with retransmissions and the dynamics of congestion control. LRO will just stop accumulating when out-of-sequence data arrives. Nothing complicated at all. And that's _EXACTLY_ what we want to happen. We want Linux's TCP loss response algorithms to take care of things, which have been extensively tuned over many many years and get several orders of magnitude more testing and exposure than any customized stack you guys put onto a network card. Actually, at high speeds SACK processing becomes a huge bottleneck by itself. If we could have some help from the hardware with pruning some of the trivial cases it would help, I guess. One thing I can think of, and which I implemented in software, is a SACK cache feature that kicks in at high speeds and starts processing SACKs every 16 packets (exact parameters to be researched). This was shown to increase performance and eliminate stalls that happen otherwise. In this niche, a NIC that understands the SACKs and batches them for processing when they are just the common case of an additional x packets added to the latest SACK block would help, saving quite a bit of work the CPU would otherwise need to do. We still have the full Linux stack processing and reacting to the losses and doing the retransmits; we just get the hardware to batch some of the work for us. This needs to be host assisted since we don't want this at the early stages of the connection, or for slow connections.
Baruch
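The batching idea in the message above can be illustrated with a minimal sketch. It assumes a SACK update is "trivial" when it keeps the left edge of the latest block and only moves the right edge forward. The `SackBatcher` class, its fields, and the `delivered` list (standing in for hand-off to the full stack) are hypothetical names for illustration, not kernel or NIC interfaces, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # (left_edge, right_edge) in sequence space


@dataclass
class SackBatcher:
    """Hypothetical batcher: coalesces SACKs that only extend the latest block."""
    batch_size: int = 16            # "every 16 packets" in the post; untuned
    pending: Optional[Block] = None
    count: int = 0
    delivered: List[Block] = field(default_factory=list)  # what the stack sees

    def flush(self) -> None:
        # Hand the coalesced block to the full SACK processing path.
        if self.pending is not None:
            self.delivered.append(self.pending)
        self.pending = None
        self.count = 0

    def on_sack(self, block: Block) -> None:
        extends = (
            self.pending is not None
            and block[0] == self.pending[0]
            and block[1] >= self.pending[1]
        )
        if extends:
            # Common case: same block, right edge moved forward.
            self.pending = block
            self.count += 1
        else:
            # Anything else goes to the stack before we start a new batch.
            self.flush()
            self.pending = block
            self.count = 1
        if self.count >= self.batch_size:
            self.flush()
```

Setting `batch_size` to 1 makes every SACK pass straight through, which is one way the host could keep batching off early in a connection or on slow connections.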
RE: [PATCH] TCP Offload (TOE) - Chelsio
-----Original Message----- From: Christoph Lameter [mailto:[EMAIL PROTECTED] Sent: Sunday, August 21, 2005 9:28 AM To: David S. Miller Cc: [EMAIL PROTECTED]; Leonid Grossman; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [PATCH] TCP Offload (TOE) - Chelsio And -by the way- it seems that LRO is patented. Hope you made arrangements for Linux to use that technology? Are other vendors allowed to implement LRO in their hardware? Ahh, I was curious to see if someone would bring this argument up - in fact, LRO legal issues do not exist, while TOE legal issues are quite big at the moment. I guess this is one of the reasons why OpenRDMA and other mainstream industry efforts don't have any provisions for TOE support. As I mentioned in Ottawa, there is indeed a patent application filed about a year ago for Neterion's basic LRO implementation. Linux doesn't need any arrangements to support basic LRO - it will work in Linux today without any OS changes. All it needs from the stack is the ability to accept a chained skb that is bigger than the advertised MTU, and this works in the Linux stack already. Potential TCP loss response algorithm and other changes that David is talking about will be beneficial, and these are obviously not covered by the Neterion application. Anyway, since the application is not granted yet it's probably too early to discuss its future - but if any vendor wants to have peace of mind, we can talk and get this out of the way; we are obviously motivated to make LRO a de-facto NIC feature (much like TSO and other stateless offloads have become).
Unlike LRO, TOE is covered by a number of existing patents and faces fundamental legal challenges as we speak, for both OS vendors and IHVs - as the recent Alacritech/Microsoft/Broadcom lawsuit and settlement just clearly demonstrated :-)
Re: Stretch ACKs (was: [PATCH] TCP Offload (TOE) - Chelsio)
David S. Miller wrote: From: Wael Noureddine [EMAIL PROTECTED] Date: Sun, 21 Aug 2005 00:17:17 -0700 How do you intend on avoiding huge stretch ACKs? The implication is that stretch ACKs are bad, which is wrong. Oh yes, that's right, you're the same person who earlier in this thread tried to teach us that bursty TCPs are non-standard :-) Stretch ACKs are actually a positive thing on a healthy connection and do indeed help the sender. And when loss events occur, LRO stops immediately and delivers the packets as-is so that loss information via ACKs with SACK blocks can immediately make its way to the sender. Linux does actually currently generate stretch ACKs, when beneficial. I do notice that in my own tests, I'm seeing stretch ACKs of 7 and 8 packets quite often. Is there any intention to add ABC (Appropriate Byte Counting) to Linux to offset the effects this has on the cwnd growth? I haven't seen anything critical happening because of this, but it definitely changes the way TCP behaves. Baruch
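To make the cwnd concern concrete, here is a rough sketch comparing slow-start growth under per-ACK counting with byte counting capped at a multiple of the MSS, in the spirit of RFC 3465. The function name, the 1460-byte MSS, and the cap value used in the usage note are illustrative assumptions; this is not the Linux implementation.

```python
MSS = 1460  # assumed segment size, for illustration only


def slow_start_growth(acked_bytes_per_ack, mss=MSS, abc_limit=None):
    """Total cwnd growth in bytes for a sequence of ACKs during slow start.

    abc_limit=None models classic per-ACK counting (one MSS per ACK).
    abc_limit=L models byte counting capped at L * MSS per ACK.
    """
    growth = 0
    for acked in acked_bytes_per_ack:
        if abc_limit is None:
            growth += mss                           # one MSS per ACK, however much it covers
        else:
            growth += min(acked, abc_limit * mss)   # credit the bytes, up to the cap
    return growth
```

With 64 segments acknowledged by 8 stretch ACKs of 8 segments each, `slow_start_growth([8 * MSS] * 8)` gives 8 MSS of growth, while the same data acknowledged segment by segment gives 64 MSS. Byte counting with `abc_limit=2` recovers part of the difference (16 MSS) without allowing a single ACK to open the window by the full amount it covers.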
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Sun, 21 Aug 2005, David S. Miller wrote: LRO will work, and it's the negative attitude of the TOE folks that inspires me to want to help out the LRO folks and ignore the TOE mania altogether. Dave, you criticized the black and white attitude before. It seems that you are the only one in this discussion that has this problem. We just discussed the negative aspects of LSO/LRO and then get accused of being maniacs. The main problem is the categorical NO to TOE that we keep hearing. We can do LRO no problem. Why is it so difficult to allow the TCP layer to support the offload capabilities? And -by the way- it seems that LRO is patented. Hope you made arrangements for Linux to use that technology? Are other vendors allowed to implement LRO in their hardware?
Re: [PATCH] TCP Offload (TOE) - Chelsio
All other things being equal, it is better not to put packets into the network faster than it can drain them out. Large bursts increase delay variation, and increase the probability that two or more packets in a connection will be dropped within an RTT (not every box is implementing AQM yet). New 10Gig-switch-on-a-chip devices like Fujitsu's MB87Q3140 (http://www.fujitsu.com/us/services/edevices/microelectronics/networkingassps/mb87q3140/) have only limited on-chip buffer memory. So transmit packet pacing is preferred; see http://yuba.stanford.edu/~yganjali/research/publications/Very-Small-Buffers-CCR.pdf for further arguments. True. This is the first thing we learnt when dealing with 10Gbps. Chelsio's TOE is capable of microsecond granularity pacing and that has been crucial for high performance. Even then, it's very hard to fully avoid packet loss. Hardware TCP and retransmission straight from the NIC turn out to be very helpful in quickly recovering from loss. In fact, even interrupt moderation delays show up in degraded performance.
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Fri, Aug 19, 2005 at 07:05:06PM +0200, Andi Kleen wrote: Right. The other issue with jumbo frames (9000 MTU) is that the allocation needed is just over 2 pages for 4K page size machines (common case). 3 page contig allocations tend to fail once a server is heavily loaded and memory gets fragmented. That's just a driver bug. The driver should be splitting up the buffers into page sized chunks. TX does that already, but for RX the driver needs to do it. The problem with the e1000 driver in this regard is the following:
1. It internally rounds the Rx buffer up to the next power of 2, so your 9k MTU request just turned into a 16k (16384) allocation.
2. alloc_skb() allocates size+sizeof(struct skb_shared_info) bytes using kmalloc(), which bumps the size over the power-of-2 boundary.
3. kmalloc() rounds this up to the next power of 2, and you end up with a 32k (yes, a 32768 byte) GFP_ATOMIC allocation request.
I tested the version in 2.4.26 (version 5.2.30.1-k1) and the one from Intel's webpage (6.1.16) on 2.4.26. Are there any plans to fix this? Does the 2.4 kernel support chained sk_buff's along the receive path? (This is news to me!) Is there any driver that does this, that can be read as an instructive example? Thanks, Guy Thornley
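The three rounding steps described above can be checked with a little arithmetic. This is a sketch of the sizes only, not driver code: `next_pow2` and `rx_alloc_size` are hypothetical helpers, and the 256-byte figure used for `sizeof(struct skb_shared_info)` in the usage note is a placeholder assumption, since the real size depends on kernel version and architecture.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def rx_alloc_size(mtu, shared_info_size):
    """Size of the final allocation request after the three steps in the post."""
    driver_buf = next_pow2(mtu)                   # step 1: driver rounds the Rx buffer
    skb_request = driver_buf + shared_info_size   # step 2: alloc_skb() adds shared info
    return next_pow2(skb_request)                 # step 3: kmalloc() rounds up again
```

For a 9000-byte MTU and the assumed 256 bytes of shared info, `rx_alloc_size(9000, 256)` comes out at 32768 bytes, which is 8 contiguous 4K pages for a frame that needs just over 2. Any non-zero shared-info size gives the same result, because step 1 already lands exactly on a power-of-2 boundary.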
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Guy Thornley [EMAIL PROTECTED] Date: Mon, 22 Aug 2005 11:06:13 +1200 Does the 2.4 kernel support chained sk_buff's along the receive path? (This is news to me!) Is there any driver that does this, that can be read as an instructive example? It does, just that no driver takes advantage of this yet.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Christoph Lameter [EMAIL PROTECTED] Date: Sat, 20 Aug 2005 10:57:51 -0700 (PDT) We are discussing something that is not useful for today's network load and not standardized. TOE is the only answer to offloading transfers of data encountered in contemporary networks. It is talk like this that makes me want to not participate in such threads. "TOE is the only..." please, spare me the unary view of the world, ok? Here is one idea. Do a reverse LSO, have a dynamic cache on the network card watching saddr/daddr/sport/dport flows, and accumulate as many in-order TCP packets as possible into one large R-LSO frame. This accumulation is timed out by a length and time parameter programmable in the chip, just like HW interrupt mitigation is. Then the stack receives these (up to 64K) frames. This is the kind of discussion of alternative ideas I am _NOT_ seeing. Which shows how blinded people are to alternatives to TOE. Christoph, you're a really bright guy, perhaps you can sit and come up with some other ideas which would act as stateless alternatives to TOE? I bet you can do it, if you would simply try...
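The R-LSO idea above can be sketched as a small per-flow accumulator. This is only an illustration of the described behaviour (a 4-tuple flow cache, in-order accumulation, and flushing on a length or time limit), not any vendor's hardware algorithm. The `LroAggregator` and `Pending` names, the 100-microsecond default timeout, and the `delivered` list standing in for frames handed to the stack are all assumptions, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FlowKey = Tuple[str, str, int, int]  # saddr, daddr, sport, dport


@dataclass
class Pending:
    start_seq: int   # sequence number of the first accumulated byte
    next_seq: int    # sequence number expected next
    first_us: int    # arrival time of the first segment, in microseconds


class LroAggregator:
    """Hypothetical model of a NIC-side accumulator for in-order TCP segments."""

    def __init__(self, max_bytes=65535, timeout_us=100):
        self.max_bytes = max_bytes      # length limit (programmable in the chip)
        self.timeout_us = timeout_us    # time limit (programmable in the chip)
        self.flows: Dict[FlowKey, Pending] = {}
        self.delivered: List[Tuple[FlowKey, int, int]] = []  # (flow, seq, length)

    def _flush(self, key):
        p = self.flows.pop(key, None)
        if p is not None:
            self.delivered.append((key, p.start_seq, p.next_seq - p.start_seq))

    def receive(self, key, seq, length, now_us):
        p = self.flows.get(key)
        if p is not None:
            in_order = seq == p.next_seq
            fits = (p.next_seq - p.start_seq) + length <= self.max_bytes
            fresh = now_us - p.first_us < self.timeout_us
            if in_order and fits and fresh:
                p.next_seq += length          # keep accumulating
                return
            self._flush(key)                  # hand over what we have
            if not in_order:
                # Out-of-sequence data is passed up as-is so the stack's
                # loss response sees it immediately.
                self.delivered.append((key, seq, length))
                return
        self.flows[key] = Pending(seq, seq + length, now_us)

    def expire(self, now_us):
        # Timer path: flush flows whose time limit has passed.
        stale = [k for k, p in self.flows.items()
                 if now_us - p.first_us >= self.timeout_us]
        for key in stale:
            self._flush(key)
```

Note that the accumulator only knows a segment is out of order relative to one it is already holding; with no entry for the flow it simply starts accumulating from whatever arrives.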
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Aug 20, 2005, at 1:57 PM, Christoph Lameter wrote: On Fri, 19 Aug 2005, Andi Kleen wrote: Hmm - but is a 9k or 16k packet on the wire not equivalent to a micro burst? (actually it is not that micro compared to 1.5k packets). At least against burstiness they don't help and make things even worse because the bursts cannot be split up anymore. 9k or 16k is just a very puny little size these days. That will not give anyone much benefit. We need to be able to transfer large amounts of data. This is going to be measured in megabytes or gigabytes of data and not in kilobytes. I think you are really missing something fundamental here. The processing costs for TCP can be split into per-byte costs and per-packet costs. (This is a slight oversimplification, but good enough for this discussion.) The per-packet costs include things like memory allocation, protocol processing, and device interrupt handling. The per-byte costs are bus transfer times, calculating checksums, etc. Using larger packets helps reduce the overall per-packet costs. There are diminishing returns here in using larger packets, and the marginal benefit of using packets much larger than 16k isn't all that great. (As an aside, there are other benefits to using larger packets than just processing speed. For example, it makes congestion control easier. But that's another discussion.) TSO and TOE both help significantly with the per-packet costs. They are effectively equivalent here to using larger packets. Doing zero-copy and checksum offloading helps with the per-byte costs, and is possible today with stock Linux, and I believe most TOE implementations do so. But TOE and TSO in and of themselves *do not* help with the per-byte costs. TOE currently has an advantage over TSO because it reduces the receive path costs in both ack and data processing. Moreover jumbo packets are essentially not standardized and most network devices switch jumbo packets off or do not support them because of that fact.
Devices that send jumbo frames may cause other devices on the network to malfunction. This is certainly a concern. Fixing these issues IMHO is globally more important (and architecturally more desirable) than TOEs. Some may disagree. :-) We are discussing something that is not useful for today's network load and not standardized. TOE is the only answer to offloading transfers of data encountered in contemporary networks. Most people who are involved in this area see multiple solutions with different trade-offs. TOE is just one option. -John
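The per-packet versus per-byte split in the message above lends itself to a toy cost model. The sketch below assumes a simple linear model, and the cost figures in the usage note (5000 units per packet, 1 unit per byte) are arbitrary illustrative numbers rather than measurements; `transfer_cost` is a hypothetical helper.

```python
def transfer_cost(total_bytes, packet_size, per_packet_cost, per_byte_cost):
    """Processing cost of moving total_bytes in packets of packet_size bytes."""
    packets = -(-total_bytes // packet_size)   # ceiling division
    return packets * per_packet_cost + total_bytes * per_byte_cost
```

For a 1 MiB transfer with the assumed costs, going from 1500-byte to 9000-byte packets drops the total from 4,548,576 to 1,633,576 units, while going from 16384 to 65536 bytes only drops it from 1,368,576 to 1,128,576. The per-byte term (1,048,576 units here) is a floor that no packet size removes, which is the diminishing-returns point the message makes.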
RE: [PATCH] TCP Offload (TOE) - Chelsio
Here is one idea. Do a reverse LSO, have a dynamic cache on the network card watching saddr/daddr/sport/dport flows, and accumulate as many in-order TCP packets as possible into one large R-LSO frame. This accumulation is timed out by a length and time parameter programmable in the chip, just like HW interrupt mitigation is. Then the stack receives these (up to 64K) frames. This is the kind of discussion of alternative ideas I am _NOT_ seeing. Which shows how blinded people are to alternatives to TOE. A number of R-LSO (we call it LRO) hw assists are actually shipping today in our 10GbE ASIC. We will submit an LRO driver patch at some point - although MSI-X and Receive Traffic Hashing driver patches will take precedence. BTW any comments on the LRO algorithm in my OLS slides are most welcome; we are looking to extend the implementation in the next ASIC. Christoph, you're a really bright guy, perhaps you can sit and come up with some other ideas which would act as stateless alternatives to TOE? I bet you can do it, if you would simply try...
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Sat, 20 Aug 2005, David S. Miller wrote: Christoph, you're a really bright guy, perhaps you can sit and come up with some other ideas which would act as stateless alternatives to TOE? I bet you can do it, if you would simply try... I worked through the alternatives last year including some of the large packet tricks that are not really standards-conformant. None of these was really satisfactory and I ended up wasting a lot of my consulting time for a vendor. They finally gave up on jumbo packets that I used to favor. The basic issue is the fundamental design of the TCP layer. If we could redesign that layer then we may be able to come up with a stateless protocol, but we are condemned to follow the TCP standard. Offload technology is inevitable as far as I can see and Chelsio has the most innovative design in harmony with Linux design principles that I know of. They have developed an API that will also allow other vendors to hook into our tcp layer. Let's at least give this a try. They are willing to commit resources to get this going.
Re: [PATCH] TCP Offload (TOE) - Chelsio
TSO and TOE both help significantly with the per-packet costs. They are effectively equivalent here to using larger packets. Doing zero-copy and checksum offloading helps with the per-byte costs, and is possible today with stock Linux, and I believe most TOE implementations do so. But TOE and TSO in and of themselves *do not* help with the per-byte costs. TOE currently has an advantage over TSO because it reduces the receive path costs in both ack and data processing. All good points. However, unlike LRO, TOE actually can also reduce per-byte costs on receive by allowing zero copy with DDP. This is certainly a concern. Fixing these issues IMHO is globally more important (and architecturally more desirable) than TOEs. Some may disagree. :-) If you talk to the IEEE802.3 folks, they give no hope of the current state of affairs changing. Plus, jumbo frame benefits really don't apply to all applications, only to large transfers, and as you say above, per-byte costs are still there.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Wael Noureddine [EMAIL PROTECTED] Date: Sat, 20 Aug 2005 13:46:33 -0700 That's a good point. Why not offer both alternatives and let the customers decide what they want? Why would there be a veto against TOE if it can be supported non-intrusively and with virtually no changes to the software stack? If a stateless solution exists, it is preferred purely on technical merits due to maintainability, invasiveness, and network stack feature preservation (netfilter, packet classification and scheduling, etc.) Once a feature goes in, it typically is impossible to take it out. So I'd rather this issue figure itself out before either solution gets integrated. But by and large, if a stateless alternative ever exists to get the same performance benefit as TOE, it will undoubtedly be preferred by the Linux networking maintainers. So you TOE guys are fighting more than an uphill battle.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Christoph Lameter [EMAIL PROTECTED] Date: Sat, 20 Aug 2005 15:02:00 -0700 (PDT) I worked through the alternatives last year including some of the large packet tricks that are not really standards-conformant. None of these was really satisfactory and I ended up wasting a lot of my consulting time for a vendor. They finally gave up on jumbo packets that I used to favor. Please elaborate on why R-LSO would not work? It requires no link level changes, no need for jumbo frame support, and no need for invasive hooks like TOE does.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Wael Noureddine [EMAIL PROTECTED] Date: Sat, 20 Aug 2005 15:43:06 -0700 All good points. However, unlike LRO, TOE actually can also reduce per-byte costs on receive by allowing zero copy with DDP. Combined with Intel's I/O AT stuff, LRO can potentially make the per-byte costs transparent too.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Leonid Grossman [EMAIL PROTECTED] Date: Sat, 20 Aug 2005 17:43:11 -0400 BTW any comments on the LRO algorithm in my OLS slides are most welcome; we are looking to extend the implementation in the next ASIC. Pointer?
RE: [PATCH] TCP Offload (TOE) - Chelsio
But by and large, if a stateless alternative ever exists to get the same performance benefit as TOE, it will undoubtedly be preferred by the Linux networking maintainers. So you TOE guys are fighting more than an uphill battle. Nevertheless, this constitutes a reasonable starting ground for an objective discussion, with the goal of reaching a resolution in finite time. It is naturally expected that new features be subjected to scrutiny and a complexity vs. benefits analysis. In this regard, we are committed to addressing your concerns regarding perceived invasiveness, feature preservation and maintenance, as well as other concerns/suggestions which may come up.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Leonid Grossman [EMAIL PROTECTED] Date: Sat, 20 Aug 2005 21:17:19 -0400 Which reminds me - some people noted in Ottawa that USO is arguably a misleading name for UDP TSO, so better suggestions are welcome. UDP Fragmentation Offload, aka. UFO ? :-)
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Sat, 20 Aug 2005, David S. Miller wrote: But by and large, if a stateless alternative ever exists to get the same performance benefit as TOE, it will undoubtedly be preferred by the Linux networking maintainers. So you TOE guys are fighting more than an uphill battle. It does not exist today AFAIK. The hope of such a solution will prevent the inclusion of TOE technology that exists today?
Re: [PATCH] TCP Offload (TOE) - Chelsio
Christoph Lameter wrote: On Sat, 20 Aug 2005, David S. Miller wrote: But by and large, if a stateless alternative ever exists to get the same performance benefit as TOE, it will undoubtedly be preferred by the Linux networking maintainers. So you TOE guys are fighting more than an uphill battle. It does not exist today AFAIK. The hope of such a solution will prevent the inclusion of TOE technology that exists today? TOE has a solid track record of being a point-in-time solution with decreased features and increased maintenance headaches. That's what prevents its inclusion. Jeff
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Leonid Grossman [EMAIL PROTECTED] Date: Sat, 20 Aug 2005 21:17:19 -0400 OLS.pdf at ftp ns1.s2io.com user: linuxdocs password: HALdocs Looks good, the LRO bits. It seems important that the OS can specify the LRO sizing limits. Even better, it would help also to have a flow cache of some sort that can remember some kind of state. Here's why. Early in the connection you can't use a large limit because the congestion window is still growing. So if it's beyond a few packets, you'll hit the LRO timeout for the first couple of round trips. If you have a flow cache, keyed on saddr/daddr/sport/dport, then you can keep a growing LRO limit. For example, when a flow cache entry is created, use an LRO limit of 2 frames. Each time the LRO limit is reached, increase the LRO limit by one (until you hit the largest LRO supported, which for ipv4 would be 64K minus header space). You could also tweak the LRO timeout in a similar fashion based upon traffic patterns as well. In fact, extremely sophisticated things can be done here to deal with the LRO timing as seen on WAN vs. LAN streams.
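The growing-limit scheme described above is simple enough to sketch directly. The `LroLimitCache` class and its method names are hypothetical, and the 1460-byte MSS and 40 bytes of header space (IPv4 plus TCP without options) are assumptions used only to turn "64K minus header space" into a frame count.

```python
class LroLimitCache:
    """Hypothetical per-flow LRO frame limit that grows by one each time it is hit."""

    def __init__(self, mss=1460, header_bytes=40, initial_frames=2):
        # Largest aggregate for IPv4: 64K minus header space, expressed in frames.
        self.max_frames = (65535 - header_bytes) // mss
        self.initial_frames = initial_frames
        self.limits = {}   # keyed on (saddr, daddr, sport, dport)

    def limit_for(self, key):
        # A new flow cache entry starts at the initial limit (2 frames in the post).
        return self.limits.setdefault(key, self.initial_frames)

    def on_limit_reached(self, key):
        # The flow filled its current limit, so allow one more frame next time.
        new_limit = min(self.limit_for(key) + 1, self.max_frames)
        self.limits[key] = new_limit
        return new_limit
```

With the assumed sizes the limit climbs from 2 frames to a ceiling of 44 frames. Because it only grows when the previous limit was actually filled, it tracks the sender's window from below without the receiver needing to know the congestion window's exact value.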
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Sat, 20 Aug 2005, David S. Miller wrote: From: Christoph Lameter [EMAIL PROTECTED] Date: Sat, 20 Aug 2005 21:16:16 -0700 (PDT) It does not exist today AFAIK. The hope of such a solution will prevent the inclusion of TOE technology that exists today? What you say isn't exactly true, as Leonid Grossman and others are working on LRO schemes for the XFrame-II chips. Read his posts in this thread. Yes, I have used LSO before (it's been a while, I hope I get this right), and while it mostly works, it's not strictly following TCP since the TCP flow control does not occur between packets. Similar issues may plague LRO (reading the paper from the OLS). I'd rather have some stateful logic between packets as expected by the TCP protocol rather than sending a sequence of messages in brute force out to the net. But if the network card cannot do any better then let's do at least LSO/TSO. Chelsio NICs support LSO and I have no doubt that there would not be an issue with implementing LRO if need be. LSO and LRO are like TOE in that they take elements of TCP away from the nature. LSO/LRO limit that to TCP flow control, playing a bit with the logic of TCP to give the illusion of being stateless. So it isn't hope. People are working on this and it's very real. It's a halfway thing and likely a bigger problem than getting a few TOE hooks into the tcp stack and maintaining them. I bet the tricks that we hack into the TCP/IP stack for LSO and for LRO will turn out to be more difficult to maintain than the proposed TOE hooks. At least the XFrame folks, such as Leonid, should be largely commended for actually pursuing alternate schemes instead of sticking their heads in the sand and just accepting TOE as the only solution. Are you sure that these ideas have broad support? Isn't this simply a rationalization to justify that their network technology has not been able to keep up with emerging offload technologies?
As far as I can tell the mainstream of the industry seems to be moving to TOE, seeing LSO as an intermediate implementation on the way to full TCP offload.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Christoph Lameter [EMAIL PROTECTED] Date: Sat, 20 Aug 2005 21:55:22 -0700 (PDT) I bet the tricks that we hack into the TCP/IP stack for LSO and for LRO will turn out to be more difficult to maintain than the proposed TOE hooks. LRO is going to be mostly transparent. As far as I can tell the mainstream of the industry seems to be moving to TOE, seeing LSO as an intermediate implementation on the way to full TCP offload. We've been hearing this for years, I'm sick of it already. TOE turns critical features off, stateless offloads allow them to stay enabled and get the performance boost.
RE: [PATCH] TCP Offload (TOE) - Chelsio
-Original Message- From: David S. Miller [mailto:[EMAIL PROTECTED] Sent: Saturday, August 20, 2005 9:40 PM If you have a flow cache, keyed on saddr/daddr/sport/dport, then you can keep a growing LRO limit. For example, when a flow cache entry is created, use an LRO limit of 2 frames. Each time the LRO limit is reached, increase the LRO limit by one (until you hit the largest LRO supported, which for IPv4 would be 64K minus header space). This is a good idea. The saddr/daddr/sport/dport table is already there for receive traffic steering, I just did not realize it could be used for managing the LRO limit as well.
Re: [PATCH] TCP Offload (TOE) - Chelsio
Now I can take you even less seriously. In RFC2581, they are talking about unloading a burst of data into a connection where there has been significant idle time since the most recent data send. To be fair, Linux would be using TSO in this case too and would therefore cause bursts. But I think it would do so even without TSO. -Andi
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Friday 19 August 2005 12:37 am, Wael Noureddine wrote: There is no RFC violated by being bursty. Show me the RFC where TCP burstiness is standardized. This is yet another strawman. You surely know this is a recurring theme in all congestion control RFCs (RFC2581 in particular), as well as in the Known TCP Implementation Problems RFC2525. TSO increases micro-burstiness. Clearly macro-burstiness has known harmful effects, but I know of nothing in the literature showing harmful effects of small bursts. I'm genuinely curious to see any papers on this subject if you have pointers to them. Admittedly the distinction here between micro and macro is fuzzy, but I'd define micro as a small fraction of the cwnd. The Linux TSO implementation doesn't necessarily do the *best* thing, but some work has gone into this recently, and I think it does reasonably well at this point under most conditions. Addressing macro-burstiness issues is entirely separate from TSO and is a topic of ongoing research. One case that does concern me with TSO is switches with short queues. Imagine a GigE switch with a 64k buffer. If you have two or three machines doing TSO toward the same switch port, they're going to start trampling all over each other. However, I'd say this is too short a queue anyway for GigE -- short enough that you're screwed with normal TCP. At 10-Gig, a 64k buffer would be even more ridiculous (0.05 ms). Not all switch manufacturers may agree... I'm personally not a big fan of TSO or TOE. They both add a lot of complexity to the network stack, and have other downsides. The *best* way to solve these problems is to engineer technologies to use larger packet sizes. Even at 9k (or better yet 16k) the advantages of these offload schemes are vanishingly small. (Though if a TOE can do zero-copy receive, this is a win over what currently exists, but I think there are other ways to do that as well.) 
The Linux kernel may not be able to do too much to encourage deployment of larger MTUs, but NIC vendors probably can. -John
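The 0.05 ms figure quoted for a 64k switch buffer at 10-Gig can be checked with a short calculation. The sketch below assumes "64k" means 64 KiB (65536 bytes) and that the link drains the buffer at full line rate; the function name is made up for the example.

```python
def drain_time_ms(buffer_bytes, link_bits_per_second):
    """Time to empty a full buffer at line rate, in milliseconds."""
    return buffer_bytes * 8 / link_bits_per_second * 1000


BUFFER_BYTES = 64 * 1024   # assumption: "64k" read as 64 KiB

gige_ms = drain_time_ms(BUFFER_BYTES, 1e9)      # roughly half a millisecond
ten_gig_ms = drain_time_ms(BUFFER_BYTES, 10e9)  # roughly 0.05 ms
```

Since a single maximum-size TSO send is itself close to 64 KB, one such burst is about enough to fill a buffer of this size, which fits the concern about two or three senders sharing a port.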
Re: [PATCH] TCP Offload (TOE) - Chelsio
I'm personally not a big fan of TSO or TOE. They both add a lot of complexity to the network stack, and have other downsides. The *best* way to solve these problems is to engineer technologies to use larger packet sizes. Even at 9k (or better yet 16k) the advantages of these offload schemes are vanishingly small. (Though if a TOE can do zero-copy receive, this is a win over what currently exists, but I think there are other ways to do that as well.) The Linux kernel may not be able to do too much to encourage deployment of larger MTUs, but NIC vendors probably can. Hmm - but is a 9k or 16k packet on the wire not equivalent to a micro burst? (actually it is not that micro compared to 1.5k packets). At least against burstiness they don't help and make things even worse because the bursts cannot be split up anymore. Actually I think there is still much potential to lower the CPU overhead of individual packets (e.g. by optimizing the cache latencies of fetching headers and writing TX rings and using per CPU MSIs aggressively for TX completion interrupts). So it might be possible to do much better even with small packets. Even for TX. For RX there is even more relatively low hanging fruit given some NIC support (however it will need some limited amount of state in the NIC) -Andi
Re: [PATCH] TCP Offload (TOE) - Chelsio
Andi Kleen wrote: I'm personally not a big fan of TSO or TOE. They both add a lot of complexity to the network stack, and have other downsides. The *best* way to solve these problems is to engineer technologies to use larger packet sizes. Even at 9k (or better yet 16k) the advantages of these offload schemes are vanishingly small. (Though if a TOE can do zero-copy receive, this is a win over what currently exists, but I think there are other ways to do that as well.) The Linux kernel may not be able to do too much to encourage deployment of larger MTUs, but NIC vendors probably can. Hmm - but is a 9k or 16k packet on the wire not equivalent to a micro burst? (actually it is not that micro compared to 1.5k packets). At least against burstiness they don't help and make things even worse because the bursts cannot be split up anymore. Right. The other issue with jumbo frames (9000 MTU) is that the allocation needed is just over 2 pages for 4K page size machines (common case). 3-page contiguous allocations tend to fail once a server is heavily loaded and memory gets fragmented. thanks, Nivedita
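The page arithmetic behind this point is easy to check. The sketch below counts only the frame payload and ignores any per-buffer metadata, which is an assumption; the helper names are invented. It also shows the power-of-two rounding used by a buddy-style allocator, under which a 3-page request is served from a 4-page block.

```python
PAGE_SIZE = 4096  # the common case named in the message


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Number of whole pages required to hold nbytes (ceiling division)."""
    return -(-nbytes // page_size)


def buddy_order(npages):
    """Smallest order such that 2**order pages cover npages."""
    return (npages - 1).bit_length()


jumbo_pages = pages_needed(9000)        # just over 2 pages, so 3
jumbo_order = buddy_order(jumbo_pages)  # order 2, i.e. a 4-page contiguous block
eight_k_pages = pages_needed(8192)      # an 8k MTU payload fits in exactly 2 pages
```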
RE: [PATCH] TCP Offload (TOE) - Chelsio
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andi Kleen Sent: Friday, August 19, 2005 9:33 AM To: John Heffner Cc: Wael Noureddine; David S. Miller; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; netdev@vger.kernel.org; [EMAIL PROTECTED] Subject: Re: [PATCH] TCP Offload (TOE) - Chelsio I'm personally not a big fan of TSO or TOE. They both add a lot of complexity to the network stack, and have other downsides. The *best* way to solve these problems is to engineer technologies to use larger packet sizes. Even at 9k (or better yet 16k) the advantages of these offload schemes is vanishingly small. (Though if a TOE can do zero-copy receive, this is a win over what currently exists, but I think there are other ways to do that as well.) The Linux kernel may not be able to do too much to encourage deployment of larger MTUs, but NIC vendors probably can. This is already done, both on the hardware and on the OS side. All 10GbE and vast majority GbE NICs and switches/routers support 9k Jumbo frames in a fully interoperable fashion in LAN and WAN environments. 16k MTU is more controversial due to crc32 and other issues, but you are correct 9k mtu (or even 8k, if one wants to stay with 2 page allocation) captures the sweet spot. All Operating systems (except one, and hopefully not for long) support Jumbo frames in the box. So, the hardware capability is there, it is just in some rare cases users can't or unwilling to configure Jumbo frames for the entire path - and this is the case that stateless and state aware NICs (as well as TOE engines) are trying to address. Hmm - but is a 9k or 16k packet on the wire not equivalent to a micro burst? (actually it is not that micro compared to 1.5k packets). At least against burstiness they don't help and make things even worse because the bursts cannot be split up anymore. Actually I think there is still much potential to lower the CPU overhead of individual packets (e.g. 
by optimizing the cache latencies of fetching headers and writing TX rings and using per CPU MSIs aggressively for TX completion interrupts). So it might be possible to do much better even with small packets. Even for TX. For RX there is even more relatively low hanging fruit given some NIC support (however it will need some limited amount of state in the NIC) -Andi
Re: [PATCH] TCP Offload (TOE) - Chelsio
Right. The other issue with jumbo frames (9000 MTU) is that the allocation needed is just over 2 pages for 4K page size machines (common case). 3-page contiguous allocations tend to fail once a server is heavily loaded and memory gets fragmented. That's just a driver bug. The driver should be splitting up the buffers into page sized chunks. TX does that already, but for RX the driver needs to do it. -Andi
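As a rough illustration of the splitting suggested here, the sketch below divides a received frame into (offset, length) pieces that each fit in one page, so no multi-page contiguous allocation is needed. It models only the bookkeeping; the function name is hypothetical and nothing here reflects how any particular driver lays out its receive ring.

```python
def split_into_page_chunks(frame_len, page_size=4096):
    """Return (offset, length) pairs covering frame_len, one page per chunk."""
    chunks = []
    offset = 0
    while offset < frame_len:
        length = min(page_size, frame_len - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


# A 9000-byte frame becomes two full pages plus a short tail.
jumbo_chunks = split_into_page_chunks(9000)
```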
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Friday 19 August 2005 01:00 pm, Leonid Grossman wrote: -Original Message- deployment of larger MTUs, but NIC vendors probably can. This is already done, both on the hardware and on the OS side. (Sorry if this is getting a bit offtopic for netdev.) I know of a number of sites who have not deployed 9k since it is not a standard. I'm just hoping vendors who deal with the IEEE may have some ability to change this situation. -John
Re: [PATCH] TCP Offload (TOE) - Chelsio
On the spec website, the current results have it off. That was because the old implementation violated the congestion window. With David's new superTSO the next generation of benchmarks will likely have it on again. -Andi
RE: [PATCH] TCP Offload (TOE) - Chelsio
I find the no toe, no way attitude strange. I've seen a number of server applications that: a] move a lot of data over TCP.. let's say around 1 Gbps over a hundred concurrent flows. b] spend a significant amount of cycles in the kernel stack doing this. c] spend the rest of their cycles doing userspace crunching d] have latent unsatisfied demand. Clearly this box needs more cycles.. If it can add a TOE and move some of b-c that is a pretty cheap and easy way of getting ahead and satisfying at least some of d. The issue is not that the box can't do 1Gbps when doing nothing else.. the issue is that it takes significant cycles to do 1Gbps. If I have to upgrade the general purpose processors 1] I may lose my existing capital investment 2] If I'm at a boundary I might have to add processors (turn a UP into an SMP or a 2-way into a 4-way) each of which add significant complication plus extra heat when compared to the TOEs.. not all scenarios are like this, and I agree TOEs are over-pitched.. but I think they certainly play a role in whole system design decisions that are bigger than just the kernel. While the scale of a,b,c, and d are going to change over time I don't really see the balance shifting any. -Patrick - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] TCP Offload (TOE) - Chelsio
Jeff Garzik wrote: 1) RFC compliance differs based on whether you use a TOE NIC, or Linux software stack. What Linux am I talking to, today? I think a more accurate question would be, what TCP/IP stack am I talking to, today? You're making it sound as if TOE fundamentally changes the entire Linux kernel, when it only affects networking. The concept is very similar to after-market auto parts. If I replace the intercooler in my Audi S4, would you expect me to care if Audi said, But your car's cooling system won't work like it used to! Of course not. Similarly, if I purchased a $500 TOE-capable network adapter and compiled my kernel with TOE support, I'm not going to expect the kernel developers to address any problems. The whole point behind TOE is that you use a different TCP/IP stack. The only meaningful alternative would be to copy the kernel code into the adapter, and have the adapter's processor run that code. It would be a sort of 1.5-way SMP system. Maybe in the future there will be SMP systems that have CPUs dedicated to different I/O devices, but until then we need something like TOE to handle 10Gb Ethernet. I don't think TOE is unreasonable. If the user enables a TOE device on his system, he should be aware that he's now using a different TCP/IP stack. You can add a network stack taint flag if TOE is ever enabled. About the only TOE situation I could imagine which -would- would be where the TOE firmware source code is included in the Linux kernel source code, but even then, all the hooks would be nasty. What if that source code can't be compiled by gcc? What if it uses a proprietary compiler, perhaps one that doesn't even run on Linux? -- Timur Tabi Staff Software Engineer [EMAIL PROTECTED] One thing a Southern boy will never say is, I don't think duct tape will fix it. 
-- Ed Smylie, NASA engineer for Apollo 13
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Timur Tabi [EMAIL PROTECTED] Date: Thu, 18 Aug 2005 17:45:13 -0500 I think a more accurate question would be, what TCP/IP stack am I talking to, today? You're making it sound as if TOE fundamentally changes the entire Linux kernel, when it only affects networking. Networking is arguably about half of the kernel, and Linux is pretty useless for most folks without networking. The point remains that TOE creates an ENORMOUS support burden upon us, and makes bugs harder to field even if we add the TOE Taint thing. You say what users will expect, and that they will understand, but history in other areas shows that they simply don't. Even after clicking the license agreement et al. on the NVIDIA web site when downloading their binary-only graphics drivers for Linux, people STILL REPORT crashes to linux-kernel and various distribution vendors with that driver loaded. Think people won't report bugs caused by TOE here? Think again... It's a huge problem, and many man hours are wasted on this. The next issue is when customers ask Well I paid $500 for this TOE card, how come I can't do netfilter or traffic classification?. And they will ask distribution vendors and places like the linux-kernel and netdev mailing lists these questions, creating a further burden upon us. Finally, even ignoring all of that, the argument for stack maintainability is still there. TOE puts its hooks deep into the networking stack, and that in and of itself is a long-term maintenance problem. I am still very much against TOE going into the Linux networking stack. There are ways to obtain TOE's performance without necessitating stateful support in the cards, everything that's worthwhile can be done with stateless offloads.
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Thu, 18 Aug 2005, David S. Miller wrote: The point remains that TOE creates an ENORMOUS support burden upon us, and makes bugs harder to field even if we add the TOE Taint thing. Simply switch off the TOE to see if it's the TOE or the OS stack. TCP is fairly standard though, so this is a pretty rare case. You say what users will expect, and that they will understand, but history in other areas shows that they simply don't. Even after clicking the license agreement et al. on the NVIDIA web site when downloading their binary-only graphics drivers for Linux, people STILL REPORT crashes to linux-kernel and various distribution vendors with that driver loaded. Crashes in the opensource TOE layer or the opensource TOE drivers will certainly need to be reported to linux-kernel. Think people won't report bugs caused by TOE here? Think again... It's a huge problem, and many man hours are wasted on this. The developer community will also increase since the vendors typically have folks on the mailing list to help with these issues. The next issue is when customers ask Well I paid $500 for this TOE card, how come I can't do netfilter or traffic classification?. And they will ask distribution vendors and places like the linux-kernel and netdev mailing lists these questions, creating a further burden upon us. If it's money related then they usually talk to those to whom they gave the money. Finally, even ignoring all of that, the argument for stack maintainability is still there. TOE puts its hooks deep into the networking stack, and that in and of itself is a long-term maintenance problem. There are only a few hooks that really do not cost much in terms of maintenance. I am still very much against TOE going into the Linux networking stack. There are ways to obtain TOE's performance without necessitating stateful support in the cards, everything that's worthwhile can be done with stateless offloads. Can we match the performance of the TOE? 
I doubt that general purpose processors have the capabilities to get there.
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Thu, 18 Aug 2005, David S. Miller wrote: This is what has always happened in the past, people were preaching for TOE back when 100Mbit ethernet was new and fast. But you certainly don't see anyone trying to justify TOE for those link speeds today. The same will happen for 1Gbit and 10Gbit links a year or so from now, the cpu, memory, and PCI bus will be fast enough. In that time frame people will have TOEs for even higher speeds. TOE is therefore by definition a technology which we know will be deprecated for current link technologies over time. It is a specialized hack, and once it's in we can never take it out of the kernel. Why put in a specialized hack when the fully functional, fully featureful, general purpose net stack is good enough? All technology will be deprecated over time. If we follow your line of thought then Linux network performance will be condemned to be only good enough, beating the prior generation of network performance trendsetters and never be the top contender for network performance.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 18 Aug 2005 20:39:41 -0700 (PDT) In that time frame people will have TOEs for even higher speeds. And once again it will be niche, and very far from commodity. A specialized optimization for a very small and specialized audience, ie. not appropriate for Linux upstream.
RE: [PATCH] TCP Offload (TOE) - Chelsio
With stateless offloading schemes? Absolutely it is possible. Even without stateless offloading, if it can't be done today, then it will be soon. This is what has always happened in the past, people were preaching for TOE back when 100Mbit ethernet was new and fast. But you certainly don't see anyone trying to justify TOE for those link speeds today. The same will happen for 1Gbit and 10Gbit links a year or so from now, the cpu, memory, and PCI bus will be fast enough. Sure, today's technology has no issue with handling 1992 network speeds. TOE is therefore by definition a technology which we know will be deprecated for current link technologies over time. It is a specialized hack, and once it's in we can never take it out of the kernel. Why put in a specialized hack when the fully functional, fully featureful, general purpose net stack is good enough? Can you explain why TOE is a hack while stateless offload is not? It is actually surprising that few seem to be concerned with what LSO and LRO do to TCP. Don't they both change the dynamics of TCP in non-standard ways? Doesn't this go against Linux's tradition of being the most RFC compliant of all stacks? LSO, for one, breaks TCP's clock, increases the sender's burstiness, disrupts congestion control, and only works in a lossless environment. Has anyone studied the impact of LSO on network congestion? Who has sanctioned its widespread use? A TOE must provide a fully standards compliant stack, which does not break TCP or change its behavior on the wire like stateless offload does.
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Thu, 18 Aug 2005, David S. Miller wrote: And once again it will be niche, and very far from commodity. A specialized optimization for a very small and specialized audience, ie. not appropriate for Linux upstream. The TOE method will gradually become standard simply because it allows performance that cannot be obtained now with existing hardware. And we may be at some speed boundary for the hardware given the limitations on clock frequency.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Digital Aryan [EMAIL PROTECTED] Date: Fri, 19 Aug 2005 09:18:45 +0530 And seeing what has happened during 100Mbit, 1Gbit and 10Gbit it seems requirements for networking are always one step ahead and the cpu, memory, bus-bandwidth will take time to match the requirements. Had this evolution been fast enough I wonder if we would have included TSO/LSO support or for that matter even the checksum offload in the stack. The full featured, generalised network stack would have done it all. The important point is that all of the offloading is completely stateless. And I continually, and vehemently, contend that stateless offloading is enough, and is highly desirable because it requires none of the internals exposure nonsense that TOE requires. And once Microsoft defines an interface for a stateless offload in their NDIS, every network card vendor tends to implement it in their hardware. Wouldn't you rather have a commoditized $40.00USD gigabit network card that got TOE level performance? I guess that question's answer depends upon whether you have some financial stake in a company doing TOE :-)
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 18 Aug 2005 20:50:18 -0700 (PDT) The TOE method will gradually become standard simply because it allows performance that cannot be obtained now with existing hardware. And we may be at some speed boundary for the hardware given the limitations on clock frequency. The same performance can be obtained with stateless offloads. You continually ignore this possibility, as if TOE is the only way.
Re: [PATCH] TCP Offload (TOE) - Chelsio
Christoph Lameter wrote: On Thu, 18 Aug 2005, David S. Miller wrote: And once again it will be niche, and very far from commodity. A specialized optimization for a very small and specialized audience, ie. not appropriate for Linux upstream. The TOE method will gradually become standard simply because it allows performance that cannot be obtained now with existing hardware. And we may be at some speed boundary for the hardware given the limitations on clock frequency. False. Each TOE implementation is locked in time by the speed of the NIC. Given time, the network stack will -exceed- the speed of today's TOE NICs. You can see this with 100Mbps TOE NICs, which are slower than today's software net stack, with today's software net stack being more featureful at the same time. Jeff
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Thu, 18 Aug 2005, David S. Miller wrote: Wouldn't you rather have a commoditized $40.00USD gigabit network card that got TOE level performance? I guess that question's answer depends upon whether you have some financial stake in a company doing TOE :-) We may have TOE in $40 network cards. In fact given the way things shape up there is the possibility that it may become difficult to get NICs without TOE next year.
Re: [PATCH] TCP Offload (TOE) - Chelsio
On Thu, 18 Aug 2005, David S. Miller wrote: The same performance can be obtained with stateless offloads. You continually ignore this possibility, as if TOE is the only way. TCP is a stateful protocol and what can be done with stateless offloads is very limited.
Re: [PATCH] TCP Offload (TOE) - Chelsio
Christoph Lameter wrote: We may have TOE in $40 network cards. In fact given the way things shape up there is the possibility that it may become difficult to get NICs without TOE next year. People have been saying this every year. Every year, we go through this argument. Every year, people fail to realize that the bottlenecks are not the software net stack, but RAM and PCI bus bandwidth. Every year, people forget that during the previous year, Intel, AMD, and other chipset/CPU makers increased the bandwidth at which the network bottlenecks. And yet, somehow, life goes on... Jeff
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Wael Noureddine [EMAIL PROTECTED] Date: Thu, 18 Aug 2005 20:50:17 -0700 Can you explain why TOE is a hack while stateless offload is not? No knowledge of TCP internals necessary, that defines a clean and maintainable barrier between the device and the network stack. It also allows all of the network stack features to be enabled, even when offloading is being performed. It is actually surprising that few seem to be concerned with what LSO and LRO do to TCP. Don't they both change the dynamics of TCP in non-standard ways? Doesn't this go against Linux's tradition of being the most RFC compliant of all stacks? LSO, for one, breaks TCP's clock, increases the sender's burstiness, disrupts congestion control, and only works in a lossless environment. Has anyone studied the impact of LSO on network congestion? Who has sanctioned its widespread use? The loss issue is a bug in our implementation, not a limitation of LSO in any way, shape, or form. It will be fixed. Thanks, but no thanks, for the strawman. LSO is fully RFC compliant, and the necessity of that is why I fixed our implementation to correctly follow the congestion window rules. There is no RFC violated by being bursty. Show me the RFC where TCP burstiness is standardized. This is yet another strawman. All of the TOE folks have a big bee in their bonnets because none of the networking stack and driver subsystem maintainers see it as a wise thing to put in. If it's come to the point where we're discussing things like LSO standards conformance and other such strawmen as a justification for TOE, then that's really pathetic.
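To make "follow the congestion window rules" concrete, here is a minimal sketch of how a sender might size one segmentation-offload burst so that it never exceeds the room left in the congestion window. It is not the Linux implementation: the function name, parameters, and the 65535-byte hardware limit are all assumptions for illustration, with cwnd and in_flight counted in segments.

```python
def tso_segments_allowed(queued_bytes, mss, cwnd, in_flight, max_tso_bytes=65535):
    """Segments that one offloaded send may carry without exceeding cwnd.

    cwnd and in_flight are counted in segments; all names are hypothetical.
    """
    wanted = -(-queued_bytes // mss)        # segments needed for the queued data
    window_room = max(cwnd - in_flight, 0)  # what congestion control still permits
    hw_limit = max_tso_bytes // mss         # assumed cap on one offloaded send
    return min(wanted, window_room, hw_limit)


# Early in a connection the window, not the hardware, is the binding limit.
early = tso_segments_allowed(100_000, 1448, cwnd=10, in_flight=4)
# With a large open window the assumed hardware cap takes over.
steady = tso_segments_allowed(100_000, 1448, cwnd=100, in_flight=0)
```

The point of the sketch is only that the burst size is bounded by the same window a non-offloaded sender would obey; it says nothing about pacing within that window.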
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Christoph Lameter [EMAIL PROTECTED] Date: Thu, 18 Aug 2005 20:58:39 -0700 (PDT) On Thu, 18 Aug 2005, David S. Miller wrote: The same performance can be obtained with stateless offloads. You continually ignore this possibility, as if TOE is the only way. TCP is a stateful protocol and what can be done with stateless offloads is very limited. Then why are we able to fill the pipe on the send side without any problem using stateless offloading alone? There's nothing doing it on receive right now simply because nobody has tried hard enough.
Re: [PATCH] TCP Offload (TOE) - Chelsio
There is no RFC violated by being bursty. Show me the RFC where TCP burstiness is standardized. This is yet another strawman. You surely know this is a recurring theme in all congestion control RFCs (RFC2581 in particular), as well as in the Known TCP Implementation Problems RFC2525.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Wael Noureddine [EMAIL PROTECTED] Subject: Re: [PATCH] TCP Offload (TOE) - Chelsio Date: Thu, 18 Aug 2005 21:37:07 -0700

There is no RFC violated by being bursty. Show me the RFC where TCP burstiness is standardized. This is yet another strawman.

You surely know this is a recurring theme in all congestion control RFCs (RFC2581 in particular),

Now I can take you even less seriously. In RFC2581, they are talking about unloading a burst of data into a connection where there has been significant idle time since the most recent data send.

as well as in the Known TCP Implementation Problems RFC2525.

In this RFC bursts are only mentioned in:

2.1: this is talking about lack of any slow start at all
2.3: this is talking about an uninitialized congestion window at connection startup
2.8: failure of window deflation after loss recovery
2.13: stretch ACK violation, which is discussing receiver behavior

None of the passages in these RFCs that discuss bursting are talking about a properly inflated congestion window during an active and healthy transfer. LSO violates no RFC standard whatsoever.

In short, you've brought several strawmen in an attempt to discredit stateless offloading as not being standards compliant. If you truly believe what you say, then please go ask SPEC to invalidate most of the current SpecWEB benchmark results because the vast majority of them are using LSO.
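For readers following the RFC2581 point, the idle-restart rule being referred to can be sketched roughly as below. This is a minimal illustration of the rule in RFC 2581 section 4.1 (after an idle period longer than the retransmission timeout, cwnd should be no more than RW = min(IW, cwnd)), not code from any real stack. The function name, the millisecond units, and the choice of IW = 2*SMSS (the RFC's upper bound) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two unsigned 32-bit values. */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical helper: congestion window (bytes) to use when sending
 * resumes. If the connection has been idle for longer than the RTO,
 * the window is cut back to the restart window RW = min(IW, cwnd).
 * Otherwise the properly inflated window is left alone.
 */
static uint32_t cwnd_after_idle(uint32_t cwnd, uint32_t smss,
                                uint32_t idle_ms, uint32_t rto_ms)
{
    assert(smss > 0);

    uint32_t iw = 2 * smss;    /* assumed: RFC 2581 upper bound on IW */

    if (idle_ms > rto_ms)
        return min_u32(iw, cwnd);
    return cwnd;
}
```

With an SMSS of 1460 bytes, a 14600-byte window is reduced to 2920 bytes after a long idle period but is untouched during an active transfer, which is the distinction being drawn in the message above.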
Re: [PATCH] TCP Offload (TOE) - Chelsio
On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:

From: Dimitris Michailidis [EMAIL PROTECTED] Date: Fri, 12 Aug 2005 10:00:12 -0700

On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:

This would mean that every time we wish to change the data structures and interfaces for TCP socket lookup, your drivers would need to change.

I think using TCP's own functions was done exactly to avoid this problem.

That doesn't achieve the desired result. I do plan to merge in IBM's move of the TCP hash tables over to RCU style locking, and that will require knowledge of the locking at the call sites to the functions you have exported to the TOE drivers. The TOE drivers would break as a result.

TOE uses the same locking strategies the host TCP uses (lock_sock and the rest) so it should at least be familiar. It doesn't use ehash_lock or head->lock other than indirectly through functions such as the above, and does its normal lookups in its own lockless table that is based on flow ids rather than 4-tuples. I haven't seen the patches you mention recently. I recall seeing some RCU ehash discussion several months ago, and that didn't seem like it would have much of an impact. If you have something more recent I can take a look and tell you if it would affect anything.

You are creating a maintenance headache for us as well. Once this stuff gets exported to drivers, it becomes nearly impossible to change. And I absolutely reserve the right to create restrictions of use that increase the flexibility we have to change interfaces, data structures, and locking strategies in the future.

I think you have a fine attitude here. There are and there will be a lot more users of the SW TCP than of TOEs and I think you should feel free to improve the former however you can. The TOE code still works with kernels going back to 2.4.22; tracking changes in mainline TCP hasn't been an issue so far. If you can give maintainers a heads up before changes you think may be disruptive I think that would be plenty on your part.
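To make the "flow ids rather than 4-tuples" remark concrete, here is a toy sketch of what a lookup keyed by a small integer flow id might look like. It assumes the device tags each packet with an id it assigned at connection setup. The table size, struct layout, and function names are all hypothetical and are not taken from the Chelsio patch, and the sketch says nothing about how the real table achieves lockless access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FLOWS 1024u    /* hypothetical table size */

/* Hypothetical per-connection state; the 4-tuple is kept for reference only. */
struct conn {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Direct-indexed table: slot N holds the connection with flow id N. */
static struct conn *flow_table[MAX_FLOWS];

/* Bind a connection to a flow id; fails if the id is out of range or taken. */
static int flow_insert(uint32_t flow_id, struct conn *c)
{
    assert(c != NULL);

    if (flow_id >= MAX_FLOWS || flow_table[flow_id] != NULL)
        return -1;
    flow_table[flow_id] = c;
    return 0;
}

/* Lookup is a bounds check plus an array index: no hashing, no tuple compare. */
static struct conn *flow_lookup(uint32_t flow_id)
{
    return flow_id < MAX_FLOWS ? flow_table[flow_id] : NULL;
}
```

The point of the sketch is only that such a table never has to walk the host's ehash chains, which is why the message argues that changes to the ehash locking would touch the TOE code only through the exported helper functions.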
[PATCH] TCP Offload (TOE) - Chelsio
OPEN TOE submission from Chelsio Communications.

The following items have been addressed:
- cleaned up indentation.
- cleaned up comments.
- cleaned up c-styles.
- using EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL
- removed 2.4 compatibility.
- created TCP_OFFLOAD config option.
- moved #defines to appropriate files.
- removed obfuscating macros.
- included necessary definitions instead of struct.
- made IS_OFFLOADED an inline function instead of macro.

The following items are currently being worked on:
- use sysfs instead of procfs.
- addressing the use of semaphores in 'register_tom'.
- use RCU, need to look at this.
- use inline function instead of TOEDEV macro, requires some work.

Comments:
- static was removed from functions '__tcp_inherit_port' '__tcp_v4_hash' because these are called outside of tcp_ipv4.c from the TOM driver.

Signed-off-by: Scott Bardone [EMAIL PROTECTED]

diff -Naur linux-2.6.13-rc6-git3/include/linux/netdevice.h linux-2.6.13-rc6-git3.patched/include/linux/netdevice.h
--- linux-2.6.13-rc6-git3/include/linux/netdevice.h 2005-08-07 11:18:56.0 -0700
+++ linux-2.6.13-rc6-git3.patched/include/linux/netdevice.h 2005-08-11 21:28:36.0 -0700
@@ -408,6 +408,9 @@
 #define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */
 #define NETIF_F_TSO 2048 /* Can offload TCP/IP segmentation */
 #define NETIF_F_LLTX 4096 /* LockLess TX */
+#ifdef CONFIG_TCP_OFFLOAD
+#define NETIF_F_TCPIP_OFFLOAD 65536 /* Can offload TCP/IP */
+#endif
 
 /* Called after device is detached from network. */
 void (*uninit)(struct net_device *dev);
diff -Naur linux-2.6.13-rc6-git3/include/linux/tcp_diag.h linux-2.6.13-rc6-git3.patched/include/linux/tcp_diag.h
--- linux-2.6.13-rc6-git3/include/linux/tcp_diag.h 2005-08-07 11:18:56.0 -0700
+++ linux-2.6.13-rc6-git3.patched/include/linux/tcp_diag.h 2005-08-11 21:28:36.0 -0700
@@ -4,6 +4,11 @@
 /* Just some random number */
 #define TCPDIAG_GETSOCK 18
 
+/* TOE API */
+#ifdef CONFIG_TCP_OFFLOAD
+#define TCPDIAG_OFFLOAD 5
+#endif
+
 /* Socket identity */
 struct tcpdiag_sockid
 {
diff -Naur linux-2.6.13-rc6-git3/include/linux/tcp.h linux-2.6.13-rc6-git3.patched/include/linux/tcp.h
--- linux-2.6.13-rc6-git3/include/linux/tcp.h 2005-08-07 11:18:56.0 -0700
+++ linux-2.6.13-rc6-git3.patched/include/linux/tcp.h 2005-08-11 21:28:36.0 -0700
@@ -235,6 +235,10 @@
 return (struct tcp_request_sock *)req;
 }
 
+#ifdef CONFIG_TCP_OFFLOAD
+struct toe_funcs;
+#endif
+
 struct tcp_sock {
 /* inet_sock has to be the first member of tcp_sock */
 struct inet_sock inet;
@@ -342,6 +346,10 @@
 
 struct tcp_func *af_specific; /* Operations which are AF_INET{4,6} specific */
 
+#ifdef CONFIG_TCP_OFFLOAD
+ struct toe_funcs *toe_specific; /* Operations overriden by TOEs */
+#endif
+
 __u32 rcv_wnd; /* Current receiver window */
 __u32 rcv_wup; /* rcv_nxt on last window update sent */
 __u32 write_seq; /* Tail(+1) of data held in tcp send buffer */
diff -Naur linux-2.6.13-rc6-git3/include/linux/toedev.h linux-2.6.13-rc6-git3.patched/include/linux/toedev.h
--- linux-2.6.13-rc6-git3/include/linux/toedev.h 1969-12-31 16:00:00.0 -0800
+++ linux-2.6.13-rc6-git3.patched/include/linux/toedev.h 2005-08-11 22:37:03.94780 -0700
@@ -0,0 +1,126 @@
+/*
+ *                                                                         *
+ * File:                                                                   *
+ * toedev.h                                                                *
+ *                                                                         *
+ * Description:                                                            *
+ * TOE device definitions.                                                 *
+ *                                                                         *
+ * This program is free software; you can redistribute it and/or modify    *
+ * it under the terms of the GNU General Public License, version 2, as     *
+ * published by the Free Software Foundation.                              *
+ *                                                                         *
+ * You should have received a copy of the GNU General Public License along *
+ * with this program; if not, write to the Free Software Foundation, Inc., *
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.                *
+ *                                                                         *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED  *
+ * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF    *
+ * MERCHANTABILITY AND FITNESS
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Scott Bardone [EMAIL PROTECTED] Date: Thu, 11 Aug 2005 23:16:14 -0700

- static was removed from functions '__tcp_inherit_port' '__tcp_v4_hash' because these are called outside of tcp_ipv4.c from the TOM driver.

There is no way you're going to be allowed to call such deep TCP internals from your driver. This would mean that every time we wish to change the data structures and interfaces for TCP socket lookup, your drivers would need to change.

This is all looking exactly like the deep dark dungeon I feared TOE support would be.
Re: [PATCH] TCP Offload (TOE) - Chelsio
The networking gurus can comment on the internals of your patch better than I can. Just a few style notes though:

+#ifdef CONFIG_TCP_OFFLOAD
+#define NETIF_F_TCPIP_OFFLOAD 65536 /* Can offload TCP/IP */
+#endif

No need to protect this inside a CONFIG_* option.

+/* TOE API */
+#ifdef CONFIG_TCP_OFFLOAD
+#define TCPDIAG_OFFLOAD 5
+#endif

Ditto

+#ifdef CONFIG_TCP_OFFLOAD
+struct toe_funcs;
+#endif

Ditto

+#ifdef CONFIG_TCP_OFFLOAD
+#include <linux/toedev.h>
+#endif

Include linux/toedev.h unconditionally. Have it handle the !CONFIG_TCP_OFFLOAD case itself by declaring noop macros for things like toe_neigh_update(). This way you can remove a lot of the #ifdef's you've sprinkled all over the .c files.

+#define boot_phase 0

Some explanation here? It looks like something left over from development.

+#ifndef __raise_softirq_irqoff
+#define __raise_softirq_irqoff(nr) __cpu_raise_softirq(smp_processor_id(), nr)
+#endif

What is this needed for?

+static int toedev_init(void);

This forward declaration seems to be only needed for the boot_phase thing above, so if that goes this can go as well.

+/*
+ * Allocate a unique index for a TOE device. We keep the index within 30 bits

Maybe look at lib/idr.c to handle this?

+ struct toedev *dev = kmalloc(sizeof(struct toedev), GFP_KERNEL);
+
+ if (dev) {
+ memset(dev, 0, sizeof(struct toedev));

Minor nitpick (that some might disagree with)... I usually prefer:

struct toedev *dev = kmalloc(sizeof(*dev), GFP_KERNEL);

+int toe_receive_skb(struct toedev *dev, struct sk_buff **skb, int n)
+{
+ int i;

n and i should probably be unsigned int.

+#ifdef CONFIG_TCP_OFFLOAD
+ tcp_listen_offload(sk);
+#endif

Another example of something that could be an empty macro in a .h file for the !CONFIG_TCP_OFFLOAD case.

+#ifndef CONFIG_TCP_OFFLOAD
+static
+#endif

Don't do this... just make it non-static unconditionally. It's not worth the ugliness. Same applies to other places.

+#ifndef CONFIG_TCP_OFFLOAD
+static
+#endif
+__inline__ void __tcp_inherit_port(struct sock *sk, struct sock *child)
 {
 struct tcp_bind_hashbucket *head = tcp_bhash[tcp_bhashfn(inet_sk(child)->num)];
@@ -351,7 +357,10 @@
 }
 }

Things that are inline and are now going to be shared should probably just remain static inline and move to a header file.

+#ifdef CONFIG_TCP_OFFLOAD
+ if (tcp_connect_offload(sk))
+ return 0;
+#endif

Just another example of the kind of #ifdef that doesn't belong in the .c files. If the !CONFIG_TCP_OFFLOAD case just had

#define tcp_connect_offload(sk) (0)

then you can skip the #ifdef.

+#ifndef CONFIG_TCP_OFFLOAD
 LIMIT_NETDEBUG(printk(KERN_DEBUG "TCP: drop open request from %u.%u.%u.%u/%u\n",
 NIPQUAD(saddr), ntohs(skb->h.th->source)));
+#else
+ NETDEBUG(if (net_ratelimit()) \
+ printk(KERN_DEBUG "TCP: drop open request from %u.%u.%u.%u/%u\n", \
+ NIPQUAD(saddr), ntohs(skb->h.th->source)));
+#endif

Huh? What about TOE requires changes to printk ratelimiting?

-Mitch
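The recurring suggestion in this review (let the header supply a no-op for the !CONFIG_TCP_OFFLOAD case so the .c files need no #ifdef) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the patch's code: struct sock here is a stand-in with a made-up field, connect_sketch() is a hypothetical caller, and a static inline stub is used in place of the `#define tcp_connect_offload(sk) (0)` macro from the review so that the argument is still type-checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct sock. */
struct sock {
    bool offload_capable;
};

/* Uncomment to emulate a kernel built with CONFIG_TCP_OFFLOAD=y. */
/* #define CONFIG_TCP_OFFLOAD 1 */

/* ---- what would live in a header such as linux/toedev.h ---- */
#ifdef CONFIG_TCP_OFFLOAD
static inline int tcp_connect_offload(struct sock *sk)
{
    /* Assumed behavior: non-zero when the device takes the connection. */
    return sk->offload_capable ? 1 : 0;
}
#else
static inline int tcp_connect_offload(struct sock *sk)
{
    (void)sk;    /* no-op stub: offload is compiled out */
    return 0;
}
#endif

/* ---- what the .c file looks like: no #ifdef at the call site ---- */
static int connect_sketch(struct sock *sk)
{
    assert(sk != NULL);

    if (tcp_connect_offload(sk))
        return 0;    /* the offload device handled the connection */
    return 1;        /* continue on the software TCP path */
}
```

With the option left undefined, the stub always returns 0 and the compiler can discard the branch, so the call site costs nothing in the non-offload build.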
Re: [PATCH] TCP Offload (TOE) - Chelsio
David S. Miller wrote:

From: Scott Bardone [EMAIL PROTECTED] Date: Thu, 11 Aug 2005 23:16:14 -0700

- static was removed from functions '__tcp_inherit_port' '__tcp_v4_hash' because these are called outside of tcp_ipv4.c from the TOM driver.

There is no way you're going to be allowed to call such deep TCP internals from your driver. This would mean that every time we wish to change the data structures and interfaces for TCP socket lookup, your drivers would need to change.

This is all looking exactly like the deep dark dungeon I feared TOE support would be.

Although I keep an open mind, I really don't see how any TOE solution will ever overcome my own conceptual merge objections:

1) RFC compliance differs based on whether you use a TOE NIC, or Linux software stack. What Linux am I talking to, today? Linux is consistently the most RFC-compliant net stack in existence, AFAIK. TOE suddenly leaves all that open to question.

2) Security updates. We can deploy a net stack security fix very rapidly, and know that we have solved the issue(s). With TOE, security fixes no longer cover all users. One has to either wait on multiple TOE vendors to deploy firmware fixes, or deploy the software fix and leave TOE users exposed. Once again... What Linux am I talking to, today?

3) Netfilter. Either a TOE NIC (a) doesn't support netfilter, (b) needs far-reaching packet mangling hooks, or (c) includes its own custom netfilter [clone], with attendant bugs and maintenance issues.

4) Configuration. Either a TOE NIC needs deep net stack hooks, or needs its own netlink/ifconfig configuration interfaces.

5) As we see in this thread -- upper layer (TCP, IP) changes in the net stack require touching a bunch of low-level drivers. Brand new maintenance issue, which slows down upper layer development.

So far, I haven't seen a TOE NIC that satisfies even half of these objections. About the only TOE situation I could imagine which -would- would be where the TOE firmware source code is included in the Linux kernel source code, but even then, all the hooks would be nasty.

Jeff
Re: [PATCH] TCP Offload (TOE) - Chelsio
I'm fairly pessimistic about full TOE also, I just want to see the patch cleaned up a bit so we can see the exact impact it would have. The RX optimization work presented in the Neterion and Intel papers at OLS sounds a lot more interesting to me though.

However, I do want to comment on one statement of yours:

Jeff Garzik wrote:

3) Netfilter. Either a TOE NIC (a) doesn't support netfilter, (b) needs far-reaching packet mangling hooks, or (c) includes its own custom netfilter [clone], with attendant bugs and maintenance issues.

I don't think netfilter is a big deal. The kernel could still check the TCP handshake packets (or, if needed, faked-up versions with the same data) at accept()/connect() time. If those pass muster it's a pretty good bet that the other 100,000 packets making up that TCP connection would also. Of course this limitation would need to be documented but I doubt most netfilter users would mind too much.

There are obviously edge cases where you can lose. For example, if you update the netfilter rules you ideally want to revalidate all the currently open connections. Since TOE hardware is designed to help the TCP end point you probably don't have to worry about NAT or other fancy mangling on these interfaces.

-Mitch
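The check-once-at-setup idea above, plus the revalidation edge case, can be sketched as a toy model. Nothing here is netfilter's real API: the rule format (match on destination port only, default accept), the struct names, and both functions are hypothetical, chosen only to show the control flow of validating a connection's tuple at accept()/connect() time and re-running the check for open connections when the rules change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection identity, as seen in the handshake packets. */
struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Hypothetical rule: matches on destination port only. */
struct rule {
    uint16_t dport;
    bool accept;
};

struct conn {
    struct flow flow;
    bool established;
};

/* First matching rule wins; assumed default policy is accept. */
static bool ruleset_allows(const struct rule *rules, size_t n,
                           const struct flow *f)
{
    assert(rules != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (rules[i].dport == f->dport)
            return rules[i].accept;
    return true;
}

/* Run the filter once, at connection setup, on the handshake tuple. */
static bool admit_at_setup(struct conn *c, const struct rule *rules, size_t n)
{
    c->established = ruleset_allows(rules, n, &c->flow);
    return c->established;
}

/* After a rule update, re-check open connections and drop the ones that fail. */
static size_t revalidate(struct conn *conns, size_t nconn,
                         const struct rule *rules, size_t n)
{
    size_t dropped = 0;

    for (size_t i = 0; i < nconn; i++) {
        if (conns[i].established && !ruleset_allows(rules, n, &conns[i].flow)) {
            conns[i].established = false;
            dropped++;
        }
    }
    return dropped;
}
```

The sketch covers only accept/drop decisions on the connection tuple. Per-packet uses of netfilter are outside what a setup-time check can express.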
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Dimitris Michailidis [EMAIL PROTECTED] Date: Fri, 12 Aug 2005 10:22:47 -0700

This is true. There is nothing fundamentally preventing both passive and active opens from checking netfilter before OKing a connection. Once a connection is established, it's rather impractical to run each of its packets through netfilter; this is 10G after all. You'd probably not lose much functionality that you could have otherwise used at these speeds.

People don't use netfilter just for state tracking and filtering, they also use it to some extent for rate limiting, packet logging, and similar things. And as busses and cpus get faster, your "this is 10G after all" argument becomes null and void.

Note that this TOE mess also makes the packet scheduler, queueing disciplines, and packet classifiers totally unusable as well.

Essentially, half of the Linux networking stack's features are turned uncontrollably _OFF_ in the presence of TOE. It is for this, along with many other reasons, that the Linux networking community, in general, is so against TOE.
Re: [PATCH] TCP Offload (TOE) - Chelsio
From: Dimitris Michailidis [EMAIL PROTECTED] Date: Fri, 12 Aug 2005 10:00:12 -0700

On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:

This would mean that every time we wish to change the data structures and interfaces for TCP socket lookup, your drivers would need to change.

I think using TCP's own functions was done exactly to avoid this problem.

That doesn't achieve the desired result. I do plan to merge in IBM's move of the TCP hash tables over to RCU style locking, and that will require knowledge of the locking at the call sites to the functions you have exported to the TOE drivers. The TOE drivers would break as a result.

You are creating a maintenance headache for us as well. Once this stuff gets exported to drivers, it becomes nearly impossible to change. And I absolutely reserve the right to create restrictions of use that increase the flexibility we have to change interfaces, data structures, and locking strategies in the future.