Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Sean Hefty <[EMAIL PROTECTED]>
Date: Wed, 10 Oct 2007 14:01:07 -0700

> > The hack to use a socket and bind it to claim the port was just for demonstrating the idea. The correct solution, IMO, is to enhance the core low level 4-tuple allocation services to be more generic (eg: not be tied to a struct sock). Then the host tcp stack and the host rdma stack can allocate TCP/iWARP ports/4tuples from this common exported service and share the port space. This allocation service could also be used by other deep adapters like iscsi adapters if needed.
>
> Since iWarp runs on top of TCP, the port space is really the same.
> FWIW, I agree that this proposal is the correct solution to support iWarp.

But you can be sure it's not going to happen, sorry.

It would mean that we'd need to export the entire TCP socket table so then when iWARP connections are created you can search to make sure there is not an existing full 4-tuple that is the same.

It is not just about local TCP ports.

iWARP needs to live in its separate little container and not contaminate the rest of the networking; this is the deal. Any suggested such change which breaks that deal will be NACK'd by all of the core networking developers.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
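Dave's point that sharing is "not just about local TCP ports" can be illustrated with a toy model. This is a hypothetical sketch: `FourTuple` and `TupleTable` are invented names, not kernel structures. Two connections may reuse the same local port as long as the full (local addr, local port, remote addr, remote port) tuple differs, so deciding whether an iWARP connection collides with a host TCP connection means looking up whole 4-tuples in one table that both stacks consult.

```python
from typing import NamedTuple, Set


class FourTuple(NamedTuple):
    laddr: str
    lport: int
    raddr: str
    rport: int


class TupleTable:
    """Hypothetical table of in-use 4-tuples shared by TCP and iWARP."""

    def __init__(self) -> None:
        self._tuples: Set[FourTuple] = set()

    def local_port_in_use(self, laddr: str, lport: int) -> bool:
        # Weaker question: does anything use this local address/port?
        return any(t.laddr == laddr and t.lport == lport for t in self._tuples)

    def try_connect(self, t: FourTuple) -> bool:
        # The question that decides a collision: is this exact 4-tuple taken?
        if t in self._tuples:
            return False
        self._tuples.add(t)
        return True


table = TupleTable()
tcp_conn = FourTuple("10.0.0.1", 5000, "10.0.0.2", 80)
iwarp_conn = FourTuple("10.0.0.1", 5000, "10.0.0.3", 80)

print(table.try_connect(tcp_conn))                # True
print(table.local_port_in_use("10.0.0.1", 5000))  # True
print(table.try_connect(iwarp_conn))              # True: remote end differs
print(table.try_connect(tcp_conn))                # False: same 4-tuple
```

If each stack kept its own table instead, neither could detect the duplicate in the last line, which is why a shared service would need visibility into the full connection table.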
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
> The hack to use a socket and bind it to claim the port was just for demonstrating the idea. The correct solution, IMO, is to enhance the core low level 4-tuple allocation services to be more generic (eg: not be tied to a struct sock). Then the host tcp stack and the host rdma stack can allocate TCP/iWARP ports/4tuples from this common exported service and share the port space. This allocation service could also be used by other deep adapters like iscsi adapters if needed.

Since iWarp runs on top of TCP, the port space is really the same. FWIW, I agree that this proposal is the correct solution to support iWarp.

- Sean
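The "hack" of claiming a port by binding a socket has a simple userspace analogue. This is a minimal sketch, assuming a Linux-like stack and ordinary Python sockets; the actual RDMA CM patch did this in the kernel via kernel_bind() on a struct socket, which is not reproduced here. Binding without ever calling listen() or connect() is enough to make the address/port unavailable to other binders.

```python
import errno
import socket


def reserve_tcp_port(addr: str = "127.0.0.1") -> socket.socket:
    """Claim a TCP port by binding (but never listening on) a socket.

    The caller must keep the returned socket open; closing it frees the port.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((addr, 0))  # port 0: let the stack choose a free ephemeral port
    return s


holder = reserve_tcp_port()
port = holder.getsockname()[1]
print("reserved port", port)

# A second TCP socket binding the same address/port is expected to fail
# (EADDRINUSE on Linux, since neither socket sets SO_REUSEADDR).
other = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    other.bind(("127.0.0.1", port))
    print("unexpected: bind succeeded")
except OSError as e:
    print("bind failed:", errno.errorcode.get(e.errno, e.errno))
finally:
    other.close()

holder.close()  # releases the reservation
```

The sketch also makes the cost visible: every reserved port pins a full socket object, which is the "wastes memory" con listed for this approach later in the thread.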
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
On Mon, 8 Oct 2007, Steve Wise wrote:
> The correct solution, IMO, is to enhance the core low level 4-tuple allocation services to be more generic (eg: not be tied to a struct sock). Then the host tcp stack and the host rdma stack can allocate TCP/iWARP ports/4tuples from this common exported service and share the port space. This allocation service could also be used by other deep adapters like iscsi adapters if needed.

As a developer of an RDMA ULP, NFS-RDMA, I like this approach because it will simplify the configuration of an RDMA device and the services that use it.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
David Miller wrote:
> From: Sean Hefty <[EMAIL PROTECTED]>
> Date: Thu, 09 Aug 2007 14:40:16 -0700
>
> > Steve Wise wrote:
> > > Any more comments?
> >
> > Does anyone have ideas on how to reserve the port space without using a struct socket?
>
> How about we just remove the RDMA stack altogether? I am not at all kidding. If you guys can't stay in your sand box and need to cause problems for the normal network stack, it's unacceptable. We were told all along that if RDMA went into the tree none of this kind of stuff would be an issue.
>
> These are exactly the kinds of problems that people like myself were dreading. These subsystems have no business using the TCP port space of the Linux software stack, absolutely none.
>
> After TCP port reservation, what's next? It seems an at least bi-monthly event that the RDMA folks need to put their fingers into something else in the normal networking stack. No more.
>
> I will NACK any patch that opens up sockets to eat up ports or anything stupid like that.

Hey Dave,

The hack to use a socket and bind it to claim the port was just for demonstrating the idea. The correct solution, IMO, is to enhance the core low level 4-tuple allocation services to be more generic (eg: not be tied to a struct sock). Then the host tcp stack and the host rdma stack can allocate TCP/iWARP ports/4tuples from this common exported service and share the port space. This allocation service could also be used by other deep adapters like iscsi adapters if needed.

Will you NAK such a solution if I go implement it and submit for review? The dual ip subnet solution really sux, and I'm trying one more time to see if you will entertain the common port space solution, if done correctly.

Thanks,

Steve.
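Steve's proposal, an allocation service "not tied to a struct sock", can be sketched as a plain port table that any subsystem calls into. Everything here is hypothetical: `PortSpace`, `reserve()` and `release()` are invented for illustration and are not kernel APIs, and the sketch tracks only local ports, whereas Dave's objection above is that a real shared service would also need full 4-tuple awareness.

```python
from typing import Dict, Optional


class PortSpace:
    """Hypothetical port allocator that needs no socket object.

    Both the host TCP stack and an RDMA connection manager would call
    reserve()/release() on the same instance to share one port space.
    """

    def __init__(self, lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi        # ephemeral range (inclusive)
        self._owner: Dict[int, str] = {}  # port -> owning subsystem
        self._next = lo                   # rotor for ephemeral allocation

    def reserve(self, owner: str, port: Optional[int] = None) -> Optional[int]:
        if port is not None:
            # Explicit bind (may be outside the ephemeral range): succeed
            # only if no other subsystem holds the port.
            if port in self._owner:
                return None
            self._owner[port] = owner
            return port
        # Ephemeral allocation: scan the range once, starting at the rotor.
        span = self.hi - self.lo + 1
        for i in range(span):
            cand = self.lo + (self._next - self.lo + i) % span
            if cand not in self._owner:
                self._owner[cand] = owner
                self._next = self.lo + (cand - self.lo + 1) % span
                return cand
        return None  # range exhausted

    def release(self, port: int) -> None:
        self._owner.pop(port, None)


ports = PortSpace(40000, 40002)
print(ports.reserve("tcp", 40000))   # 40000
print(ports.reserve("rdma", 40000))  # None: already held by tcp
print(ports.reserve("rdma"))         # 40001
```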
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Tue, 28 Aug 2007 12:38:07 -0700

> It seems that the NIC would also have to look into a TCP stream (and handle out of order segments etc) to find message boundaries for this to be equivalent to what an RDMA NIC does.

It would work for data that accumulates in-order, give or take a small window, just like LRO does.
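The in-order accumulation that LRO relies on can be shown with a minimal sketch. It assumes a single flow and ignores TCP flags, ACK handling, timers and the "small window" of reordering tolerance a real implementation might add: any gap or reordering simply flushes the current aggregate, which is the situation Roland raises.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    seq: int
    payload: bytes


def lro_coalesce(segments: List[Segment], max_bytes: int = 65536) -> List[Segment]:
    """Merge consecutive in-order segments of ONE flow into larger ones.

    A segment is appended only if it starts exactly where the current
    aggregate ends and the result stays within max_bytes; a gap,
    reordering, or the size limit flushes the aggregate instead.
    """
    out: List[Segment] = []
    cur: Optional[Segment] = None
    for seg in segments:
        if (cur is not None
                and seg.seq == cur.seq + len(cur.payload)
                and len(cur.payload) + len(seg.payload) <= max_bytes):
            cur = Segment(cur.seq, cur.payload + seg.payload)
        else:
            if cur is not None:
                out.append(cur)
            cur = seg
    if cur is not None:
        out.append(cur)
    return out


wire = [Segment(0, b"aaaa"), Segment(4, b"bbbb"),
        Segment(12, b"dddd"), Segment(8, b"cccc")]  # last two arrive swapped
for s in lro_coalesce(wire):
    print(s.seq, s.payload)
# 0 b'aaaabbbb'
# 12 b'dddd'
# 8 b'cccc'
```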
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Sorry for the long latency, I was at the beach all last week.

> > And direct data placement really does give you a factor of two at least, because otherwise you're stuck receiving the data in one buffer, looking at some of the data at least, and then figuring out where to copy it. And memory bandwidth is if anything becoming more valuable; maybe LRO + header splitting + page remapping tricks can get you somewhere but as NCPUS grows then it seems the TLB shootdown cost of page flipping is only going to get worse.
>
> As Herbert has said already, people can code for this just like they have to code for RDMA.

No argument, you need to change the interface to take advantage of RDMA.

> There is no fundamental difference from converting an application to sendfile or similar.

Yes, on the transmit side, there's not much difference from sendfile or splice, although RDMA may give a slightly nicer interface that also gives basically the equivalent of AIO.

> The only thing this needs is a "recvmsg_I_dont_care_where_the_data_is()" call. There are no alignment issues unless you are trying to push this data directly into the page cache.

I don't understand how this gives you the same thing as direct data placement (DDP). There are many situations where the sender knows where the data has to go and if there's some way to pass that to the receiver, so that info can be used in the receive path to put the data in the right place, the receiver can save a copy. This is fundamentally the same "offload" that an FC HBA does -- the SCSI midlayer queues up commands like "read block A and put the data at address X" and "read block B and put the data at address Y" and the HBA matches tags on incoming data to put the blocks at the right addresses, even if block B is received before block A.

RFC 4297 has some discussion of the various approaches, and while you might not agree with their conclusions, it is interesting reading.
> Couple this with a card that makes sure that on a per-page basis, only data for a particular flow (or group of flows) will accumulate.

It seems that the NIC would also have to look into a TCP stream (and handle out of order segments etc) to find message boundaries for this to be equivalent to what an RDMA NIC does.

- R.
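Roland's FC HBA analogy (post "put block A at address X", then match incoming data by tag) can be made concrete with a toy model. This is hypothetical: `TaggedPlacement` and the string tags are invented, and real iWARP DDP carries a steering tag and offset in each segment's header, protocol details that are not modeled here. What it shows is that placement depends only on the tag, so arrival order does not matter and no intermediate receive buffer or copy is needed.

```python
from typing import Dict, Tuple


class TaggedPlacement:
    """Toy model of tag-based direct data placement."""

    def __init__(self) -> None:
        # tag -> (destination buffer, offset), posted before data arrives
        self._posted: Dict[str, Tuple[bytearray, int]] = {}

    def post(self, tag: str, buf: bytearray, offset: int) -> None:
        self._posted[tag] = (buf, offset)

    def deliver(self, tag: str, data: bytes) -> None:
        if tag not in self._posted:
            raise KeyError(f"no buffer posted for tag {tag!r}")
        buf, offset = self._posted[tag]
        if offset + len(data) > len(buf):
            raise ValueError("data would overrun the posted buffer")
        del self._posted[tag]  # each posted buffer is used once
        buf[offset:offset + len(data)] = data  # written in place, no copy-out


memory = bytearray(8)
ddp = TaggedPlacement()
ddp.post("A", memory, 0)   # "read block A and put the data at address 0"
ddp.post("B", memory, 4)   # "read block B and put the data at address 4"
ddp.deliver("B", b"BBBB")  # block B arrives first...
ddp.deliver("A", b"AAAA")  # ...then block A
print(memory)              # bytearray(b'AAAABBBB')
```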
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Mon, 20 Aug 2007 18:16:54 -0700

> And direct data placement really does give you a factor of two at least, because otherwise you're stuck receiving the data in one buffer, looking at some of the data at least, and then figuring out where to copy it. And memory bandwidth is if anything becoming more valuable; maybe LRO + header splitting + page remapping tricks can get you somewhere but as NCPUS grows then it seems the TLB shootdown cost of page flipping is only going to get worse.

As Herbert has said already, people can code for this just like they have to code for RDMA.

There is no fundamental difference from converting an application to sendfile or similar.

The only thing this needs is a "recvmsg_I_dont_care_where_the_data_is()" call. There are no alignment issues unless you are trying to push this data directly into the page cache.

Couple this with a card that makes sure that on a per-page basis, only data for a particular flow (or group of flows) will accumulate.

People already make cards that can do stuff like this; it can be done statelessly with an on-chip dynamically maintained flow table.

And best yet it doesn't turn off every feature in the networking nor bypass it for the actual protocol processing.
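Dave's alternative, a card whose flow table ensures each page only accumulates data for one flow, might look roughly like this in software. This is a hypothetical model (`FlowSteeringNic` and the tiny page size are invented for illustration). It is "stateless" only in the sense of keeping no TCP protocol state, just a flow-to-page mapping, and it assumes payload already arrives in order per flow.

```python
from typing import Dict, List, Tuple

Flow = Tuple[str, int, str, int]  # (saddr, sport, daddr, dport)


class FlowSteeringNic:
    """Hypothetical NIC model: a flow table maps each flow to its currently
    open page, so every page holds payload from exactly one flow."""

    def __init__(self, page_size: int = 4096) -> None:
        self.page_size = page_size
        self.flow_table: Dict[Flow, int] = {}  # flow -> index of open page
        self.pages: List[bytearray] = []
        self.page_owner: List[Flow] = []       # flow that owns each page

    def _open_page(self, flow: Flow) -> int:
        self.pages.append(bytearray())
        self.page_owner.append(flow)
        self.flow_table[flow] = len(self.pages) - 1
        return self.flow_table[flow]

    def receive(self, flow: Flow, payload: bytes) -> None:
        # Assumes in-order payload per flow; no TCP processing is modeled.
        data = memoryview(payload)
        while data:
            idx = self.flow_table.get(flow)
            if idx is None or len(self.pages[idx]) == self.page_size:
                idx = self._open_page(flow)
            room = self.page_size - len(self.pages[idx])
            self.pages[idx] += data[:room]
            data = data[room:]


nic = FlowSteeringNic(page_size=8)
a = ("10.0.0.2", 1234, "10.0.0.1", 80)
b = ("10.0.0.3", 999, "10.0.0.1", 80)
nic.receive(a, b"aaaaa")
nic.receive(b, b"bb")
nic.receive(a, b"aaaaa")
print([bytes(p) for p in nic.pages])  # [b'aaaaaaaa', b'bb', b'aa']
```

A receive call of the "I don't care where the data is" kind could then hand whole pages to the application that owns the flow, instead of copying into a caller-supplied buffer.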
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
[TSO / LRO discussion snipped -- it's not the main point so no sense spending energy arguing about it]

> Just be realistic and accept that RDMA is a point in time solution, and like any other such technology takes flexibility away from users.
>
> Horizontal scaling of cpus up to huge arity cores, network devices using large numbers of transmit and receive queues and classification based queue selection, are all going to work to make things like RDMA even more irrelevant than they already are.

To me there is a real fundamental difference between RDMA and traditional SOCK_STREAM / SOCK_DGRAM networking, namely that messages can carry the address where they're supposed to be delivered (what the IETF calls "direct data placement"). And on top of that you can build one-sided operations aka put/get aka RDMA.

And direct data placement really does give you a factor of two at least, because otherwise you're stuck receiving the data in one buffer, looking at some of the data at least, and then figuring out where to copy it. And memory bandwidth is if anything becoming more valuable; maybe LRO + header splitting + page remapping tricks can get you somewhere but as NCPUS grows then it seems the TLB shootdown cost of page flipping is only going to get worse.

Don't get too hung up on the fact that current iWARP (RDMA over IP) implementations are using TCP offload -- to me that is just a side effect of doing enough processing on the NIC side of the PCI bus to be able to do direct data placement. InfiniBand with completely different transport, link and physical layers is one way to implement RDMA without TCP offload and I'm sure there will be others -- eg Intel's IOAT stuff could probably evolve to the point where you could implement iWARP with software TCP and the data placement offloaded to some DMA engine.

- R.
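The "one-sided operations aka put/get" built on direct data placement can be illustrated with an in-process toy. This is hypothetical: `RegisteredMemory`, `rdma_write()` and `rdma_read()` are invented names, and none of the real RDMA verbs APIs (memory registration with the adapter, keys exchanged over the wire) are used. The point shown is that the initiator names the destination (key plus offset), so data lands in place without a matching receive call or copy on the target side.

```python
import secrets
from typing import Dict


class RegisteredMemory:
    """Toy model of one-sided put/get on registered buffers."""

    def __init__(self) -> None:
        self._regions: Dict[int, bytearray] = {}

    def register(self, buf: bytearray) -> int:
        # Hand out an unguessable key the peer must present on every access.
        key = secrets.randbits(32)
        while key in self._regions:
            key = secrets.randbits(32)
        self._regions[key] = buf
        return key

    def _region(self, key: int, offset: int, length: int) -> bytearray:
        buf = self._regions.get(key)
        if buf is None:
            raise PermissionError("unknown key")
        if offset < 0 or length < 0 or offset + length > len(buf):
            raise IndexError("access outside the registered region")
        return buf

    def rdma_write(self, key: int, offset: int, data: bytes) -> None:
        # "Put": the initiator chooses where the bytes land.
        buf = self._region(key, offset, len(data))
        buf[offset:offset + len(data)] = data

    def rdma_read(self, key: int, offset: int, length: int) -> bytes:
        # "Get": the initiator pulls bytes without a receive on the target.
        buf = self._region(key, offset, length)
        return bytes(buf[offset:offset + length])


target_buf = bytearray(16)
target = RegisteredMemory()
key = target.register(target_buf)
target.rdma_write(key, 4, b"put!")
print(target.rdma_read(key, 4, 4))  # b'put!'
```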
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 22:23:01 -0700

> Also, looking at the complexity and bug-fixing effort that go into making TSO work vs the really pretty small gain it gives also makes part of me wonder whether the noble proclamations about maintainability are always taken to heart.

The cpu and bus utilization improvements of TSO on the sender side are more than significant. Ask anyone who looks closely at this.

For example, as part of his batching work Krishna Kumar has been posting lots of numbers lately on the netdev list; I'm sure he can post more specific numbers comparing the current stack in the case of TSO disabled vs. TSO enabled if that is what you need to see how beneficial TSO in fact is.

If TSO is such a lose why does pretty much every ethernet chip vendor implement it in hardware? If you say it's just because Microsoft defines TSO in their NDIS, that's a total cop-out. It really does help performance a lot. Why did the Xen folks bother making generic software TSO infrastructure for the kernel for the benefit of their virtualization network device? Why would someone as bright as Herbert Xu even bother to implement that stuff if TSO gives a "pretty small gain"?

Similarly for LRO, and this isn't defined in NDIS at all. Vendors are going so far as to put full flow tables in their chips in order to do LRO better.

Using the bugs and issues we've run into while implementing TSO as evidence there is something wrong with it is a total straw man. Look how many times the filesystem page cache has been rewritten over the years.

Use the TSO problems as more of an example of how shitty a programmer I must be. :)

Just be realistic and accept that RDMA is a point in time solution, and like any other such technology takes flexibility away from users.
Horizontal scaling of cpus up to huge arity cores, network devices using large numbers of transmit and receive queues and classification based queue selection, are all going to work to make things like RDMA even more irrelevant than they already are.

If you can't see that this is the future, you have my condolences. Because frankly, the signs are all around that this is where things are going.

The work doesn't belong in these special purpose devices; it belongs in the far-end-node compute resources, and our computers are getting more and more of these general purpose compute engines every day. We will be constantly moving away from specialized solutions and towards those which solve large classes of problems for large groups of people.
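"Classification based queue selection" across many receive queues can be sketched as: try explicit classification rules first, otherwise hash the flow's 4-tuple onto a queue. This is a hypothetical sketch; the rule format is invented, and CRC32 stands in for whatever hash a real NIC uses (RSS hardware commonly uses a Toeplitz hash).

```python
import zlib
from typing import Callable, List, Tuple

Flow = Tuple[str, int, str, int]  # (saddr, sport, daddr, dport)
Rule = Tuple[Callable[[Flow], bool], int]  # (match predicate, queue)


def select_rx_queue(flow: Flow, nqueues: int, rules: List[Rule]) -> int:
    """Pick a receive queue: first matching rule wins, otherwise hash.

    Hashing the full 4-tuple keeps every packet of a flow on the same
    queue (and hence the same CPU) while spreading flows across queues.
    """
    for match, queue in rules:
        if match(flow):
            return queue
    key = "%s:%d-%s:%d" % flow
    return zlib.crc32(key.encode()) % nqueues


rules: List[Rule] = [(lambda f: f[3] == 22, 0)]  # e.g. pin ssh to queue 0
ssh = ("10.0.0.2", 40000, "10.0.0.1", 22)
web = ("10.0.0.2", 40001, "10.0.0.1", 80)
print(select_rx_queue(ssh, 8, rules))  # 0
print(select_rx_queue(web, 8, rules))  # some queue in 0..7, stable per flow
```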
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
> This is also a series of falsehoods. All packet filtering, queue management, and packet scheduling facilities work perfectly fine and as designed with both LRO and TSO.

I'm not sure I follow. Perhaps "broken" was too strong a word to use, but if you pass a huge segment to a NIC with TSO, then you've given the NIC control of scheduling the packets that end up getting put on the wire. If your software packet scheduling is operating at a bigger scale, then things work fine, but I don't see how you can say that TSO doesn't lead to head-of-line blocking etc at short time scales. And yes of course I agree you can make sure things work by using short segments or not using TSO at all.

Similarly with LRO the packets that get passed to the stack are not the packets that were actually on the wire. Sure, most filtering will work fine but eg are you sure your RTT estimates aren't going to get screwed up and cause some subtle bug? And I could trot out all the same bugaboos that are brought up about RDMA and warn darkly about security problems with bugs in NIC hardware that after all has to parse and rewrite TCP and IP packets.

Also, looking at the complexity and bug-fixing effort that go into making TSO work vs the really pretty small gain it gives also makes part of me wonder whether the noble proclamations about maintainability are always taken to heart.

Of course I know everything I just wrote is wrong because I forgot to refer to the crucial axiom that stateless == good && RDMA == bad. And sometimes it's unfortunate that in Linux when there's disagreement about something, the default action is *not* to do something.

Sorry for prolonging this argument. Dave, I should say that I appreciate all the work you've done in helping build the most kick-ass networking stack in history. And as I said before, I have plenty of interesting work to do however this turns out, so I'll try to leave any further arguing to people who actually have a dog in this fight.

- R.
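The granularity question being argued here follows from how TSO splits a send. A minimal sketch, assuming a fixed MSS and ignoring the header replication, checksum and flag handling that real TSO hardware performs; the 64000-byte send and 1448-byte MSS are illustrative numbers, not measurements.

```python
from typing import List, Tuple


def tso_segment(start_seq: int, payload: bytes, mss: int) -> List[Tuple[int, bytes]]:
    """Split one large 'super-packet' into MSS-sized segments, as a TSO NIC
    does, returning (sequence number, payload) for each wire packet."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [(start_seq + off, payload[off:off + mss])
            for off in range(0, len(payload), mss)]


segs = tso_segment(1000, b"x" * 64000, 1448)
print(len(segs))                                  # 45
print(segs[0][0], segs[-1][0], len(segs[-1][1]))  # 1000 64712 288
```

To the software scheduler this is one 64000-byte packet; on the wire it is 45 back-to-back frames whose spacing the NIC decides. Byte-based accounting sees the same total either way, while the burst structure differs, which is the distinction the two sides are drawing.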
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 16:31:07 -0700

> > > > When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years.
> > >
> > > Same thing with TSO and LRO and who knows what else.
> >
> > Not true at all. Full classification and filtering still is usable with TSO and LRO.
>
> Well, obviously with TSO and LRO the packets that the stack sends or receives are not the same as what's on the wire. Whether that breaks your wonderful networking facilities or not depends on the specifics of the particular facility I guess -- for example shaping is clearly broken by TSO. (And people can wonder what the packet trains TSO creates do to congestion control on the internet, but the netdev crowd has already decided that TSO is "good" and RDMA is "bad")

This is also a series of falsehoods. All packet filtering, queue management, and packet scheduling facilities work perfectly fine and as designed with both LRO and TSO.

When problems come up, they are bugs, and we fix them.

Please stop spreading this FUD about TSO and LRO.

The fact is that RDMA bypasses the whole stack so that supporting these facilities is not even _POSSIBLE_. With stateless offloads it is possible to support all of these facilities, and we do.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
> > > When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years.
> >
> > Same thing with TSO and LRO and who knows what else.
>
> Not true at all. Full classification and filtering still is usable with TSO and LRO.

Well, obviously with TSO and LRO the packets that the stack sends or receives are not the same as what's on the wire. Whether that breaks your wonderful networking facilities or not depends on the specifics of the particular facility I guess -- for example shaping is clearly broken by TSO. (And people can wonder what the packet trains TSO creates do to congestion control on the internet, but the netdev crowd has already decided that TSO is "good" and RDMA is "bad")

- R.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 12:52:39 -0700

> > When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years.
>
> Same thing with TSO and LRO and who knows what else.

Not true at all. Full classification and filtering still is usable with TSO and LRO.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
> > Isn't RDMA _part_ of the "software net stack" within Linux?
>
> It very much is not so.

This is just nit-picking. You can draw the boundary of the "software net stack" wherever you want, but I think Sean's point was just that RDMA drivers already are part of Linux, and we all want them to get better.

> When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years.

Same thing with TSO and LRO and who knows what else. I know you're going to make a distinction between "stateless" and "stateful" offloads, but really it's just an arbitrary distinction between things you like and things you don't.

> Imagine if you didn't know any of this, you purchase and begin to deploy a huge piece of RDMA infrastructure, you then get the mandate from IT that you need to add firewalling on the RDMA connections at the host level, and "oh shit" you can't?

It's ironic that you bring up firewalling. I've had vendors of iWARP hardware tell me they would *love* to work with the community to make firewalling work better for RDMA connections. But instead we get the catch-22 of your changing arguments -- first, you won't even consider changes that might help RDMA work better in the name of maintainability; then you have to protect poor, ignorant users from accidentally using RDMA because of some problem or another; and then when someone tries to fix some of the problems you mention, it's back to step one.

Obviously some decisions have been prejudged here, so I guess this moves to the realm of politics. I have plenty of interesting technical stuff, so I'll leave it to the people with a horse in the race to find ways to twist your arm.

- R.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Tom Tucker <[EMAIL PROTECTED]>
Date: Thu, 16 Aug 2007 08:43:11 -0500

> Isn't RDMA _part_ of the "software net stack" within Linux?

It very much is not so.

When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years.

I'm glad this is a surprise to you, because it illustrates the point some of us keep trying to make about technologies like this.

Imagine if you didn't know any of this, you purchase and begin to deploy a huge piece of RDMA infrastructure, you then get the mandate from IT that you need to add firewalling on the RDMA connections at the host level, and "oh shit" you can't?

This is why none of us core networking developers like RDMA at all. It's totally not integrated with the rest of the Linux stack and on top of that it even gets in the way. It's an aberration, an eyesore, and a constant source of consternation.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote:
[...snip...]
> > I think removing the RDMA stack is the wrong thing to do, and you shouldn't just threaten to yank entire subsystems because you don't like the technology. Let's keep this constructive, can we? RDMA should get the respect of any other technology in Linux. Maybe it's a niche in your opinion, but come on, there are more RDMA users than, say, the sparc64 port. Eh?
>
> It's not about being a niche. It's about creating a maintainable software net stack that has predictable behavior.

Isn't RDMA _part_ of the "software net stack" within Linux? Why isn't making RDMA stable, supportable and maintainable equally as important as any other subsystem?

> Needing to reach out of the RDMA sandbox and reserve net stack resources away from itself travels a path we've consistently avoided.
>
> > > I will NACK any patch that opens up sockets to eat up ports or anything stupid like that.
> >
> > Got it.
>
> Ditto for me as well.
>
> Jeff
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
> Needing to reach out of the RDMA sandbox and reserve net stack resources away from itself travels a path we've consistently avoided.

Where did the idea of an "RDMA sandbox" come from? Obviously no one disagrees with keeping things clean and maintainable, but the idea that RDMA is a second-class citizen that doesn't get any input into the evolution of the networking code seems kind of offensive to me.

- R.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Steve Wise wrote:
> David Miller wrote:
> > From: Sean Hefty <[EMAIL PROTECTED]>
> > Date: Thu, 09 Aug 2007 14:40:16 -0700
> >
> > > Steve Wise wrote:
> > > > Any more comments?
> > >
> > > Does anyone have ideas on how to reserve the port space without using a struct socket?
> >
> > How about we just remove the RDMA stack altogether? I am not at all kidding. If you guys can't stay in your sand box and need to cause problems for the normal network stack, it's unacceptable. We were told all along that if RDMA went into the tree none of this kind of stuff would be an issue.
>
> I think removing the RDMA stack is the wrong thing to do, and you shouldn't just threaten to yank entire subsystems because you don't like the technology. Let's keep this constructive, can we? RDMA should get the respect of any other technology in Linux. Maybe it's a niche in your opinion, but come on, there are more RDMA users than, say, the sparc64 port. Eh?

It's not about being a niche. It's about creating a maintainable software net stack that has predictable behavior.

Needing to reach out of the RDMA sandbox and reserve net stack resources away from itself travels a path we've consistently avoided.

> > I will NACK any patch that opens up sockets to eat up ports or anything stupid like that.
>
> Got it.

Ditto for me as well.

Jeff
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
David Miller wrote:
> From: Sean Hefty <[EMAIL PROTECTED]>
> Date: Thu, 09 Aug 2007 14:40:16 -0700
>
> > Steve Wise wrote:
> > > Any more comments?
> >
> > Does anyone have ideas on how to reserve the port space without using a struct socket?
>
> How about we just remove the RDMA stack altogether? I am not at all kidding. If you guys can't stay in your sand box and need to cause problems for the normal network stack, it's unacceptable. We were told all along that if RDMA went into the tree none of this kind of stuff would be an issue.

I think removing the RDMA stack is the wrong thing to do, and you shouldn't just threaten to yank entire subsystems because you don't like the technology. Let's keep this constructive, can we? RDMA should get the respect of any other technology in Linux. Maybe it's a niche in your opinion, but come on, there are more RDMA users than, say, the sparc64 port. Eh?

> These are exactly the kinds of problems that people like myself were dreading. These subsystems have no business using the TCP port space of the Linux software stack, absolutely none.

Ok, although IMO it's the correct solution. But I'll propose other solutions below. I ask for your feedback (and everyone's!) on these alternate solutions.

> After TCP port reservation, what's next? It seems an at least bi-monthly event that the RDMA folks need to put their fingers into something else in the normal networking stack. No more.

The only other change requested and committed, if I recall correctly, was for netevents, and that enabled both Infiniband and iWARP to integrate with the neighbour subsystem. I think that was a useful and needed change. Prior to that, these subsystems were snooping ARP replies to trigger events. That was back in 2.6.18 or 2.6.19 I think...

> I will NACK any patch that opens up sockets to eat up ports or anything stupid like that.

Got it.
Here are alternate solutions that avoid the need to share the port space:

Solution 1)

1) admins must set up an alias interface on the iwarp device for use with rdma. This interface will have to be on a separate subnet from the "TCP used" interface, with a canonical name that indicates it's "for rdma only", like eth2:iw or eth2:rdma. There can be many of these per device.

2) admins make sure their sockets/tcp services don't use the interface configured in #1, and their rdma services do use said interface.

3) iwarp providers must translate binds to ipaddr 0.0.0.0 to the associated "for rdma only" ip addresses. They can do this by searching for all aliases of the canonical name that are aliases of the TCP interface for their nic device. Or: somehow not handle incoming connections to any address but the "for rdma use" addresses and instead pass them up and not offload them.

This will avoid the collisions as long as the above steps are followed.

Solution 2)

Another possibility would be for the driver to create two net devices (and hence two interface names) like "eth2" and "iw2", and artificially separate the RDMA stuff that way.

These two solutions are similar in that they create an "rdma only" interface.

Pros:
- is not intrusive into the core networking code
- very minimal changes needed, in the iwarp providers' code; they are the ones with this problem
- makes it clear which subnets are RDMA only

Cons:
- relies on system admin to set it up correctly.
- native stack can still "use" this rdma-only interface and the same port space issue will exist.

For the record, here are possible port-sharing solutions Dave sez he'll NAK:

Solution NAK-1) The rdma-cma just allocates a socket and binds it to reserve TCP ports.
Pros:
- minimal changes needed to implement (always a plus in my mind :)
- simple, clean, and it works (KISS)
- if no RDMA is in use, there is no impact on the native stack
- no need for a separate RDMA interface

Cons:
- wastes memory
- puts a TCP socket in the "CLOSED" state in the pcb tables
- Dave will NAK it :)

Solution NAK-2) Create a low-level, sockets-agnostic port allocation service that is shared by both TCP and RDMA. This way, the rdma-cm can reserve ports in an efficient manner instead of doing it via kernel_bind() using a sock struct.

Pros:
- probably the correct solution (my opinion :) if we went down the path of sharing port space
- if no RDMA is in use, there is no impact on the native stack
- no need for a separate RDMA interface

Cons:
- very intrusive change, because the port allocation stuff is tightly bound to the host stack and sock struct, etc.
- Dave will NAK it :)

Steve.
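Solution NAK-2 can be sketched in user-space C as a single port table that both stacks consult, so a port held by the host TCP stack is simply unavailable to the rdma-cm and vice versa. Everything here (`struct port_space`, `port_reserve()`, `port_alloc_ephemeral()`, `port_release()` and the owner tags) is hypothetical and does not correspond to any kernel API; a real service would also need locking and would have to reason about full 4-tuples rather than bare local ports, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_PORTS 65536

/* Hypothetical owner tags for illustration; not kernel identifiers. */
enum port_owner { OWNER_NONE = 0, OWNER_HOST_TCP, OWNER_RDMA };

/* One table shared by both stacks: a port claimed by either is unavailable to the other. */
struct port_space {
    unsigned char owner[NUM_PORTS];
};

static void port_space_init(struct port_space *ps)
{
    memset(ps->owner, OWNER_NONE, sizeof(ps->owner));
}

/* Claim a specific local port; fails for port 0 or if anyone already holds it. */
static bool port_reserve(struct port_space *ps, unsigned short port, enum port_owner who)
{
    if (port == 0 || ps->owner[port] != OWNER_NONE)
        return false;
    ps->owner[port] = (unsigned char)who;
    return true;
}

/* Claim the first free port in [lo, hi]; returns 0 if the range is exhausted. */
static unsigned short port_alloc_ephemeral(struct port_space *ps, unsigned short lo,
                                           unsigned short hi, enum port_owner who)
{
    /* unsigned int counter so that hi == 65535 does not wrap around */
    for (unsigned int p = lo; p <= hi; p++) {
        if (port_reserve(ps, (unsigned short)p, who))
            return (unsigned short)p;
    }
    return 0;
}

/* Give a port back; releasing a port you do not own is a caller bug. */
static void port_release(struct port_space *ps, unsigned short port, enum port_owner who)
{
    assert(ps->owner[port] == (unsigned char)who);
    ps->owner[port] = OWNER_NONE;
}
```

For example, once the host stack has reserved port 5000, `port_reserve(&ps, 5000, OWNER_RDMA)` returns false and `port_alloc_ephemeral(&ps, 5000, 5010, OWNER_RDMA)` hands out 5001 instead. Neither stack owns the table; both are just clients of it, which is exactly why retrofitting this onto the existing, sock-bound TCP allocation code would be intrusive.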
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
> How about we just remove the RDMA stack altogether? I am not at all kidding. If you guys can't stay in your sandbox and need to cause problems for the normal network stack, it's unacceptable. We were told all along that if RDMA went into the tree none of this kind of stuff would be an issue.

There are currently two RDMA solutions available. Each solution has different requirements and uses the normal network stack differently. Infiniband uses its own transport. iWarp runs over TCP. We have tried to leverage the existing infrastructure where it makes sense.

> After TCP port reservation, what's next? It seems an at least bi-monthly event that the RDMA folks need to put their fingers into something else in the normal networking stack. No more.

Currently, the RDMA stack uses its own port space. This causes a problem for iWarp, and it is the problem Steve is trying to solve. I'm not an iWarp guru, so I don't know what options exist. Can iWarp use its own address family? Identify specific IP addresses for iWarp use? Restrict iWarp to specific port numbers? Let the app control the correct operation? I don't know. Steve merely defined a problem and suggested a possible solution. He's looking for constructive help trying to solve the problem.

- Sean
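The port-space problem can be illustrated with a deliberately simplified user-space model: two independent port tables, one standing in for the host TCP stack and one for the RDMA stack's own port space, each allocating from the same range without consulting the other. The names (`struct port_table`, `table_alloc()`, `ports_collide()`) are hypothetical, and the model only looks at local ports on a single IP address, whereas the real conflict is over full 4-tuples.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS 65536

/* Hypothetical model: each stack keeps its own private table of in-use local ports. */
struct port_table {
    bool in_use[NUM_PORTS];
};

/* Claim the first port in [lo, hi] that is free in THIS table only; 0 if exhausted. */
static unsigned short table_alloc(struct port_table *t, unsigned short lo, unsigned short hi)
{
    assert(lo > 0 && lo <= hi); /* port 0 is reserved as the "exhausted" result */
    for (unsigned int p = lo; p <= hi; p++) {
        if (!t->in_use[p]) {
            t->in_use[p] = true;
            return (unsigned short)p;
        }
    }
    return 0;
}

/* Both stacks believe they own the same local port on the same address, so an
 * incoming segment for that port cannot be unambiguously attributed to either. */
static bool ports_collide(const struct port_table *host_tcp,
                          const struct port_table *rdma, unsigned short port)
{
    return host_tcp->in_use[port] && rdma->in_use[port];
}
```

Each table is internally consistent, yet both hand out port 5000 from the range 5000-5010. Sharing one table, or partitioning the range so iWarp only uses specific port numbers, are two ways of removing that overlap.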
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Sean Hefty <[EMAIL PROTECTED]>
Date: Thu, 09 Aug 2007 14:40:16 -0700

> Steve Wise wrote:
> > Any more comments?
>
> Does anyone have ideas on how to reserve the port space without using a struct socket?

How about we just remove the RDMA stack altogether? I am not at all kidding. If you guys can't stay in your sandbox and need to cause problems for the normal network stack, it's unacceptable. We were told all along that if RDMA went into the tree none of this kind of stuff would be an issue.

These are exactly the kinds of problems people like myself were dreading. These subsystems have no business using the TCP port space of the Linux software stack, absolutely none.

After TCP port reservation, what's next? It seems an at least bi-monthly event that the RDMA folks need to put their fingers into something else in the normal networking stack. No more.

I will NACK any patch that opens up sockets to eat up ports or anything stupid like that.
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Steve Wise wrote:
> Any more comments?

Does anyone have ideas on how to reserve the port space without using a struct socket?

- Sean