Jason Gunthorpe wrote:
I was saying that point in the rdmacm where the rdma_cm_id is bound to a local
RDMA device should have only been rdma_resolve_addr and rdma_accept.
Overloading rdma_bind_addr to both bind to an IP and bind to an RDMA device was
a bad API choice.
As you wrote, in most cases binding comes into play only for users
calling rdma_resolve_addr or rdma_accept. For users that need explicit
binding, the rdma-cm provides rdma_bind_addr, and it binds to both the IP
and the device. If you can do better, send a patch. Binding can't be
removed from the API since it has users, and it makes sense for users to
require it.
Sean is right, there may be special cases that require an early binding, but a
separate API - like IP's SO_BINDTODEVICE - would have been better, and users
would be forewarned that calling it restricts the environments their app will
support
it's just naming. Will $ sed s/rdma_bind/rdma_set_opt(RDMA_BINDTODEVICE)/g
make you happier? Why?
As it stands we have several impossible situations. Sean, Dave, and I were
discussing the trade-offs of what this means relative to IP route resolution
Don't tell me that Dave's patches are blocked because you discovered the
rdma_bind design and now you don't like it. As I wrote you, Dave sent a
patch to fix the IPv6 support; during the discussion on his patches you
come and bring up more and more issues that you consider problems (but I
don't) and block the patch set. I don't think this is appropriate. Let
the patches go in and send your own patches to fix the problems you see. Why
does anyone touching some piece of code have to fix problems you see in that piece?!
- but it affects bonding too. If you rdma_bind_addr to the IP of a bonding
device, the stack must pick one of the local RDMA ports immediately. If you
then call rdma_listen there is a problem: incoming connections may target
either RDMA device, but you are only bound to one of them. An app cannot say 'I
want to listen on this IP, any RDMA device' with the current API, as you can in
IP, and that is a shame
An app can say: I want to listen on that IP and the RDMA device which is
associated with this IP now. When bonding does a fail-over it generates a
netevent; the rdma-cm catches this event and generates an address-change
event, and apps can redo their bind/listen at this point. For the time
being, we have never had a user report a problem; people are probably
listening on all IPs, which works perfectly with bonding. Currently the HA
mode of bonding will respond to ARP on only one of the devices, and as
such connection requests will not target just any RDMA device but rather only
the active one. If this is such a shame, send a fix; slinging mud at the
maintainer and/or someone sending another patch is a shame, isn't it?
Traditionally with ethernet, L2 bonding is really only used for link
aggregation, L1 failure, and a simple multi-switch HA scheme. It is not
deployed if you have multiple ethernet domains. Some people prefer to have
dual, independent ethernet fabrics, and in that case you rely on routing
features, rather than bonding, to get the multipath and HA.
okay, thanks for the crash course.
Go back through the list and look up the posts from Leo, who first discovered
this; what he was trying to do is kind of the L3 bonding approach.
if Leo has a problem and you want to help him, bring it up on the list,
debate, send patches; jumping into someone else's patch isn't the
constructive way to go.
David has been doing a good job and I am glad he is working on the IPv6
support. My comments are only intended to clarify how this is all supposed to
work and why the IP flow is actually still relevant to RDMA connections.
As I see it, your comments block the patches sent by Dave, Sean?
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html