On Tue, Jan 29, 2008 at 07:37:42PM -0500, George Bosilca wrote:
> The previous code was correct. Each IP address correspond to a
> specific endpoint, and therefore to a specific BTL. This enable us to
> have multiple TCP BTL at the same time, and allow the OB1 PML to
> stripe the data over all of them.
>
> Unfortunately, your commit disable the multi-rail over TCP. Please
> undo it.
That's exactly what I had in mind when I said "this might break
functionality".

So we need as many endpoints as IP addresses? Then simply connecting
them all leads to oversubscription: two parallel connections on the
same medium. That's where the kernel index enters the scene: we'll
have to make sure not to open two parallel connections to the same
remote kernel index.

I'll revert the patch and come up with another solution, but for the
moment, let me point out that the assumption "one interface, one
address" isn't true. So the previous code was also wrong.

I hope not to run into model limitations: avoiding oversubscription
means keeping the number of endpoints per peer no higher than the
number of its interfaces, but accepting incoming connections from this
peer means having all of its addresses (probably more than
#remote_NICs) available in order to accept them.

As mentioned earlier: it's very common to have multiple addresses per
interface, and it's the kernel that assigns the source address, so
there's nothing one can say about the source of an incoming
connection, only that it could be any of the exported addresses. Any.

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de
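
A minimal sketch of the kernel-index idea discussed above, not the actual
Open MPI code. It assumes each address a peer exports is stored together
with the kernel index of the remote interface it belongs to. The struct
`peer_addr` and the functions `pick_one_per_interface` and
`is_known_peer_addr` are hypothetical names introduced only for
illustration. Outgoing connections use at most one address per remote
kernel index, while incoming connections are matched against every
exported address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one address exported by a peer. */
struct peer_addr {
    const char *ip;    /* textual address, e.g. "10.0.0.1" */
    int kernel_index;  /* index of the remote interface holding it */
};

/*
 * Select at most one address per remote kernel index, so that no two
 * outgoing connections share the same remote interface.
 * `chosen` must have room for n entries; it receives indices into
 * `addrs`. Returns the number of addresses selected.
 */
size_t pick_one_per_interface(const struct peer_addr *addrs, size_t n,
                              size_t *chosen)
{
    size_t count = 0;

    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        /* Skip this address if its interface is already covered. */
        for (size_t j = 0; j < count; j++) {
            if (addrs[chosen[j]].kernel_index == addrs[i].kernel_index) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            chosen[count++] = i;
        }
    }
    return count;
}

/*
 * Incoming side: the remote kernel picks the source address, so a
 * connection has to be accepted if it comes from any exported address.
 * Returns 1 if src_ip is one of the peer's addresses, 0 otherwise.
 */
int is_known_peer_addr(const struct peer_addr *addrs, size_t n,
                       const char *src_ip)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(addrs[i].ip, src_ip) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With this split, the number of outgoing connections is bounded by the
number of remote interfaces, yet the full address list stays available
for matching incoming connections.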