On Thu, Sep 16, 2010 at 06:29:03PM -0500, J. Ryan Earl wrote:
> Hello,
> 
> I've recently setup an InfiniBand 40Gbit interconnect between two nodes to
> run DRBD on top of some pretty fast storage.  I am able to get DRBD to work
> over Ethernet and IPoIB, however, when I try to enable SDP for the lower
> latency, lower overhead communication I'm getting connection errors:
> 
> block drbd0: conn( Unconnected -> WFConnection )
> block drbd0: connect failed, err = -22
> block drbd0: connect failed, err = -22
> block drbd0: connect failed, err = -22

-EINVAL

IIRC, it is a bug inside the in-kernel SDP connect() peer lookup,
which returns -EINVAL if the target address is not given as AF_INET (!),
even if the socket itself is AF_INET_SDP.
Or the other way around.

If you do "drbdadm -d connect $resource", you get the drbdsetup
command that would have been issued.
replace the second (remmote) sdp with ipv4,
and do them manually, on both nodes.
If that does not work, replace only the first (local) sdp with ipv4,
but keep the second (remote) sdp.
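
For example, something like this (resource name, port, and the local
address are placeholders, and the exact drbdsetup argument order may
differ between DRBD versions; "-d" only prints, it does not execute):

  # drbdadm -d connect r0
  drbdsetup /dev/drbd0 net sdp:192.168.20.2:7789 sdp:192.168.20.1:7789 C ...

  # first try: keep local sdp, but give the peer address as ipv4
  drbdsetup /dev/drbd0 net sdp:192.168.20.2:7789 ipv4:192.168.20.1:7789 C ...

  # if that fails: local ipv4, remote sdp
  drbdsetup /dev/drbd0 net ipv4:192.168.20.2:7789 sdp:192.168.20.1:7789 C ...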

If that gets you connected, then it's that bug.
I think I even patched it in the kernel once,
but I can't find that patch right now,
and don't remember the SDP version either.
I think it was
drivers/infiniband/ulp/sdp/sdp_main.c:addr_resolve_remote()
missing an (... || ... == AF_INET_SDP)
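
From memory, the shape of it is roughly this (a sketch only, not the
actual SDP source; function and variable names are approximate):

  /* peer address lookup: only plain AF_INET is handled ... */
  if (addr->sa_family == AF_INET
      /* ... the missing part: || addr->sa_family == AF_INET_SDP */) {
          /* resolve the peer address */
  } else {
          return -EINVAL;  /* shows up as "connect failed, err = -22" */
  }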

> I have the MLNX_OFED installed on CentOS5.5 with SDP active:
> 
> # rpm -qa|grep sdp
> libsdp-devel-1.1.100-0.1.g920ea31
> sdpnetstat-1.60-0.2.g8844f04
> libsdp-1.1.100-0.1.g920ea31
> libsdp-1.1.100-0.1.g920ea31
> libsdp-devel-1.1.100-0.1.g920ea31
> libsdp-debuginfo-1.1.100-0.1.g920ea31

That's all userland, and does not affect DRBD, as DRBD does all
networking from within the kernel.
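
DRBD creates its sockets in kernel space, roughly like this
(simplified sketch, not the literal DRBD code; the address family
comes from the "sdp:" vs. "ipv4:" prefix in your config):

  struct socket *sock;
  int err;

  /* in-kernel socket creation, bypassing glibc and any LD_PRELOAD */
  err = sock_create_kern(AF_INET_SDP, SOCK_STREAM, IPPROTO_TCP, &sock);

So preloading libsdp.so only redirects userspace programs like netperf;
it never touches the DRBD kernel threads.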

> [r...@node02 log]# netperf -f g -H 192.168.20.1 -c -C
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.20.1
> (192.168.20.1) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service
> Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^9bits/s  % S      % S      us/KB   us/KB
>  87380  65536  65536    10.00        16.15   1.74     4.61     0.211   0.562
> 
> [r...@node02 log]# LD_PRELOAD="libsdp.so" netperf -f g -H 192.168.20.1 -c -C
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.20.1
> (192.168.20.1) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service
> Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^9bits/s  % S      % S      us/KB   us/KB
>  87380  65536  65536    10.01        24.67   3.18     3.28     0.253   0.262
> 
> There is a significant (50-100%) increase in bandwidth and decrease in
> latency using SDP instead of IPoIB, so even though IPoIB works I'd like to
> use the SDP method.

Share your findings on DRBD performance IPoIB vs. SDP,
once you get the thing to work on your platform.

HTH,


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
