The information provided is not enough for a useful
investigation. At a minimum, the following would also be needed:
output of "ifconfig -a"
output of "datadm -v" (multiple entries may be a problem)
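To check the "multiple entries" point quickly, something along these
lines could be used. It is only a sketch: the function name
count_dat_entries is made up for illustration, and it assumes each
entry in the "datadm -v" output starts with the interface name, as in
the sample output further down. Lines that begin with a double quote
are treated as wrapped continuations of the previous entry.

```shell
# Count the entries in "datadm -v" output read from stdin.
# Assumption: one entry per line, first field is the interface name.
# Blank lines and lines starting with a double quote (wrapped
# continuation of the previous entry) are not counted.
count_dat_entries() {
    awk 'NF > 0 && $1 !~ /^"/ { n++ } END { print n + 0 }'
}

# Intended usage (hypothetical helper, real command):
#   datadm -v | count_dat_entries
```

With the single-entry output shown further down this prints 1; a
larger number means more than one interface is registered for DAT.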
If the user is running s10u4, I'd recommend trying IB
Update 1.0. It may contain a bug fix that is not in
OpenSolaris. The one I remember is in datadm, though, so I
don't think it is related to this problem.
I run with 2 ports connected all the time with only 1 IPoIB
device configured, and do not have the problem described by
this user. I am running s10u4 plus the download (SDLC) of
IB Updates 1.0. The config on both my systems looks like:
$ ls -l /dev/ibd?
lrwxrwxrwx 1 root root 71 Mar 31 14:54 /dev/ibd1 -> ../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci15b3,[EMAIL PROTECTED]/[EMAIL PROTECTED],8001,ipib:ibd1
lrwxrwxrwx 1 root root 71 Mar 31 14:54 /dev/ibd3 -> ../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci15b3,[EMAIL PROTECTED]/[EMAIL PROTECTED],8001,ipib:ibd3
$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
        inet 10.1.49.193 netmask ffffff00 broadcast 10.1.49.255
ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
        inet 18.0.0.193 netmask ff000000 broadcast 18.255.255.255
$ datadm -v
ibd1 u1.2 nonthreadsafe default udapl_tavor.so.1 SUNW.1.0 " " "driver_name=tavor"
$ mpirun --mca pls_rsh_agent rsh --host 18.0.0.193,18.0.0.194 \
--mca mpi_preconnect_all 1 \
--mca btl self,sm,udapl --mca mpi_leave_pinned 1 osu_latency
# OSU MPI Latency Test (Version 2.2)
# Size Latency (us)
0 4.32
1 4.57
2 4.57
4 4.55
8 4.61
16 4.63
32 4.72
64 4.86
128 5.06
256 5.99
512 6.66
1024 8.04
2048 10.73
4096 13.63
8192 31.09
16384 39.89
32768 57.02
65536 91.47
131072 160.03
262144 298.00
524288 572.74
1048576 1122.57
2097152 2221.51
4194304 5962.94
$
- BT
> ------------------------------------------------------------------------
>
> Subject: [networking-discuss] Solaris/OpenSolaris uDAPL doesn't work when
>          both IB HCA ports are connected
> From: Denis Golubev <[EMAIL PROTECTED]>
> Date: Thu, 03 Apr 2008 02:23:12 -0700 (PDT)
> To: [email protected]
>
>
> Hello.
>
> I discovered that Solaris/OpenSolaris uDAPL works only when a single HCA
> port is connected to the IB fabric. When I connect two ports of one HCA, or
> two ports from different HCAs, to the IB fabric, uDAPL doesn't work.
>
> I discovered this problem with HPC ClusterTools 7.x and verified it with
> dapltest. When two or more IB ports from a single host are connected to the
> IB fabric, the 'dat_ep_connect' routine always returns DAT_INTERNAL_ERROR.
>
> The uDAPL configuration via datadm doesn't matter: more than one connection
> to the IB fabric breaks uDAPL regardless of how many IPoIB interfaces are
> configured for DAT usage.
>
> Please advise whether it is possible to avoid this error and use more than
> one IB connection per host for uDAPL. Thanks in advance.
>
> Regards,
>
> Denis
>
>
> This message posted from opensolaris.org
> _______________________________________________
> networking-discuss mailing list
> [email protected]