ClusterTools 7.x does not support multiple IB HCAs with uDAPL. ClusterTools 8.0 will; it is currently in early access (EA): http://www.sun.com/software/products/clustertools/early_access.xml
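
For anyone hitting the same problem, the data requested in this thread (the "ifconfig -a" output, the "datadm -v" output, and the /dev/ibd? links) could be gathered in one pass with something like the sketch below. The function names run_or_note and collect_ib_diag are made up for illustration; only the three commands themselves come from the thread. It assumes a POSIX shell and simply notes any tool that is not installed rather than failing.

```shell
#!/bin/sh
# Sketch: collect the diagnostics asked for in this thread into one report.
# run_or_note and collect_ib_diag are illustrative names, not existing tools.

run_or_note() {
    # Print a header, then the command's output (stdout and stderr),
    # or a short note if the command is not installed on this host.
    echo "=== $* ==="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "($1 not found on this host)"
    fi
    echo
}

collect_ib_diag() {
    run_or_note ifconfig -a
    run_or_note datadm -v
    # The glob must be expanded by the shell, so this one is handled inline.
    echo "=== ls -l /dev/ibd? ==="
    ls -l /dev/ibd? 2>/dev/null || echo "(no /dev/ibd? device links found)"
}

collect_ib_diag
```

Saved under a name of your choosing (say collect_ib_diag.sh), it can be run on each node with "sh collect_ib_diag.sh > ib_diag.txt" and the resulting file attached to a reply. The dapltest version and where it came from still have to be stated by hand.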

Providing the additional data Bill suggested would be helpful. You say you saw this with dapltest as well; it would be good to know which version you used and where you got it from, since I have seen different versions around.

-DON

Bill Taylor wrote:
> The information provided is not enough for an investigation
> to be useful. Additional data of interest would minimally be:
>
>     output of "ifconfig -a"
>     output of "datadm -v" (multiple entries may be a problem)
>
> If the user was running s10u4, I'd recommend trying IB
> Update 1.0. There may be a bug fix in it that is not in
> Open Solaris. The one I remember is in datadm, so I think
> it should not cause this problem.
>
> I run with 2 ports connected all the time with only 1 IPoIB
> device configured, and do not have the problem described by
> this user. I am running s10u4 plus the download (SDLC) of
> IB Updates 1.0. The config on both my systems looks like:
>
> $ ls -l /dev/ibd?
> lrwxrwxrwx 1 root root 71 Mar 31 14:54 /dev/ibd1 ->
> ../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci15b3,[EMAIL
> PROTECTED]/[EMAIL PROTECTED],8001,ipib:ibd1
> lrwxrwxrwx 1 root root 71 Mar 31 14:54 /dev/ibd3 ->
> ../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci15b3,[EMAIL
> PROTECTED]/[EMAIL PROTECTED],8001,ipib:ibd3
> $ ifconfig -a
> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232
> index 1
>         inet 127.0.0.1 netmask ff000000
> bge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500
> index 2
>         inet 10.1.49.193 netmask ffffff00 broadcast 10.1.49.255
> ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
>         inet 18.0.0.193 netmask ff000000 broadcast 18.255.255.255
> $ datadm -v
> ibd1 u1.2 nonthreadsafe default udapl_tavor.so.1 SUNW.1.0 " "
> "driver_name=tavor"
> $ mpirun --mca pls_rsh_agent rsh --host 18.0.0.193,18.0.0.194 \
>     --mca mpi_preconnect_all 1 \
>     --mca btl self,sm,udapl --mca mpi_leave_pinned 1 osu_latency
> # OSU MPI Latency Test (Version 2.2)
> # Size          Latency (us)
> 0                  4.32
> 1                  4.57
> 2                  4.57
> 4                  4.55
> 8                  4.61
> 16                 4.63
> 32                 4.72
> 64                 4.86
> 128                5.06
> 256                5.99
> 512                6.66
> 1024               8.04
> 2048              10.73
> 4096              13.63
> 8192              31.09
> 16384             39.89
> 32768             57.02
> 65536             91.47
> 131072           160.03
> 262144           298.00
> 524288           572.74
> 1048576         1122.57
> 2097152         2221.51
> 4194304         5962.94
> $
>
> - BT
>
>> ------------------------------------------------------------------------
>>
>> Subject:
>> [networking-discuss] Solaris/OpenSolaris uDAPL doesn't work when both IB
>> HCA ports are connected
>> From:
>> Denis Golubev <[EMAIL PROTECTED]>
>> Date:
>> Thu, 03 Apr 2008 02:23:12 -0700 (PDT)
>> To:
>> [email protected]
>>
>> Hello.
>>
>> I discovered that Solaris/OpenSolaris uDAPL works only if one HCA port is
>> connected to the IB fabric. When I connect two HCA ports to the IB fabric,
>> or two ports from different HCAs, uDAPL doesn't work.
>>
>> I discovered this problem with HPC ClusterTools 7.x and verified it with
>> dapltest. When two or more IB ports from a single host are connected to the
>> IB fabric, the 'dat_ep_connect' routine always returns DAT_INTERNAL_ERROR.
>>
>> The uDAPL configuration via datadm doesn't matter: more than one connection
>> to the IB fabric breaks uDAPL regardless of how many IPoIB interfaces are
>> configured for DAT usage.
>>
>> Please advise whether it is possible to avoid this error and use more than
>> one IB connection per host for uDAPL. Thanks in advance.
>>
>> Regards,
>>
>> Denis
>>
>>
>> This message posted from opensolaris.org
>> _______________________________________________
>> networking-discuss mailing list
>> [email protected]
