On Thu, Sep 16, 2010 at 06:29:03PM -0500, J. Ryan Earl wrote:
> Hello,
>
> I've recently setup an InfiniBand 40Gbit interconnect between two nodes
> to run DRBD on top of some pretty fast storage. I am able to get DRBD to
> work over Ethernet and IPoIB, however, when I try to enable SDP for the
> lower latency, lower overhead communication I'm getting connection errors:
>
> block drbd0: conn( Unconnected -> WFConnection )
> block drbd0: connect failed, err = -22
> block drbd0: connect failed, err = -22
> block drbd0: connect failed, err = -22
-EINVAL, IIRC. It is a bug inside the in-kernel SDP connect() peer lookup,
which returns EINVAL if the target address is not given as AF_INET (!),
even if the socket itself is AF_INET_SDP. Or the other way around.
(There is a small userspace test program at the end of this mail that
checks exactly that.)

If you do "drbdadm -d connect $resource", you get the drbdsetup command
that would have been issued. Replace the second (remote) sdp with ipv4,
and issue the commands manually, on both nodes. If that does not work,
replace only the first (local) sdp with ipv4, but keep the second
(remote) sdp. If that gets you connected, then it's that bug.
(Rough example commands are at the end of this mail as well.)

I think I even patched it in the kernel once, but I can't find that right
now, and don't remember the SDP version either. I think it was
drivers/infiniband/ulp/sdp/sdp_main.c:addr_resolve_remote() missing an
(... || ... == AF_INET_SDP).

> I have the MLNX_OFED installed on CentOS5.5 with SDP active:
>
> # rpm -qa|grep sdp
> libsdp-devel-1.1.100-0.1.g920ea31
> sdpnetstat-1.60-0.2.g8844f04
> libsdp-1.1.100-0.1.g920ea31
> libsdp-1.1.100-0.1.g920ea31
> libsdp-devel-1.1.100-0.1.g920ea31
> libsdp-debuginfo-1.1.100-0.1.g920ea31

That's all userland, and does not affect DRBD, as DRBD does all its
networking from within the kernel.

> [r...@node02 log]# netperf -f g -H 192.168.20.1 -c -C
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.20.1 (192.168.20.1) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^9bits/s  % S      % S      us/KB   us/KB
>
>  87380  65536  65536    10.00      16.15     1.74     4.61     0.211   0.562
>
> [r...@node02 log]# LD_PRELOAD="libsdp.so" netperf -f g -H 192.168.20.1 -c -C
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.20.1 (192.168.20.1) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^9bits/s  % S      % S      us/KB   us/KB
>
>  87380  65536  65536    10.01      24.67     3.18     3.28     0.253   0.262
>
> There is a significant (50-100%) increase in bandwidth and decrease in
> latency using SDP instead of IPoIB, so even though IPoIB works I'd like
> to use the SDP method.

Share your findings on DRBD performance with IPoIB vs. SDP, once you get
the thing to work on your platform.

HTH,

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

__
please don't Cc me, but send to list -- I'm subscribed
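P.S.: a rough sketch of the manual test described above. The resource
name, device, addresses, port and protocol here are made up; whatever
"drbdadm -d" prints on your nodes is authoritative, and the exact
drbdsetup arguments vary a bit between DRBD versions and configs.

  # dry run: only prints the drbdsetup command, does not execute it
  drbdadm -d connect r0
  #   drbdsetup /dev/drbd0 net sdp:192.168.20.2:7789 sdp:192.168.20.1:7789 C ...

  # first test, on both nodes: keep the local (first) sdp, but give the
  # remote (second) address as ipv4
  # (addresses shown from node02's point of view; swap them on the peer)
  drbdsetup /dev/drbd0 net sdp:192.168.20.2:7789 ipv4:192.168.20.1:7789 C

  # if that still fails with err = -22: local ipv4, remote sdp
  drbdsetup /dev/drbd0 net ipv4:192.168.20.2:7789 sdp:192.168.20.1:7789 C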
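And the userspace check mentioned at the top, a minimal sketch (not from
the DRBD or OFED tree): it opens an AF_INET_SDP socket and calls connect()
twice, once with the destination sockaddr family set to AF_INET_SDP
(presumably what happens when both addresses are "sdp") and once with plain
AF_INET. EINVAL on the first attempt but not on the second points at the
peer-lookup bug; anything else (ECONNREFUSED, ETIMEDOUT, ...) means the
family was at least accepted. The AF_INET_SDP value of 27 and the peer
address/port are assumptions -- check sdp_inet.h on your OFED install and
point it at something reachable over IB.

/* sdp_connect_test.c -- sketch only, adjust address, port, AF_INET_SDP */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27          /* value used by the OFED SDP stack, afaik */
#endif

static void try_connect(int dst_family, const char *label)
{
        struct sockaddr_in dst;
        int fd = socket(AF_INET_SDP, SOCK_STREAM, 0);

        if (fd < 0) {
                perror("socket(AF_INET_SDP)");  /* ib_sdp module not loaded? */
                return;
        }

        memset(&dst, 0, sizeof(dst));
        dst.sin_family = dst_family;                         /* the interesting part */
        dst.sin_port   = htons(7789);                        /* example port */
        inet_pton(AF_INET, "192.168.20.1", &dst.sin_addr);   /* example peer */

        if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
                printf("%s: connect failed: %s (errno %d)\n",
                       label, strerror(errno), errno);
        else
                printf("%s: connected\n", label);
        close(fd);
}

int main(void)
{
        try_connect(AF_INET_SDP, "dst family AF_INET_SDP");
        try_connect(AF_INET,     "dst family AF_INET");
        return 0;
}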