The information provided is not enough for a useful investigation.
At a minimum, we would also need:

        output of "ifconfig -a"
        output of "datadm -v"   (multiple entries may be a problem)

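Whoever collects this data could use a rough Python sketch like the following to flag the two things I'd look for first. It checks for more than one "datadm -v" entry, and for a datadm entry whose interface is not UP in "ifconfig -a". It assumes that "datadm -v" prints one entry per line with the interface name as the first field (as in my output below), and that "ifconfig -a" interface lines look like "ibd1: flags=...<UP,...>". The function names are made up for illustration; they are not part of any Solaris tool.

```python
import re

# Header line of one interface in "ifconfig -a" output, e.g.
#   ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
# (assumed format; address lines such as "inet ..." do not match)
IFACE_RE = re.compile(r"^(\S+): flags=[0-9a-fA-F]+<([^>]*)>")


def up_interfaces(ifconfig_text):
    """Names of interfaces whose flag list contains UP."""
    up = set()
    for line in ifconfig_text.splitlines():
        m = IFACE_RE.match(line.strip())
        if m and "UP" in m.group(2).split(","):
            up.add(m.group(1))
    return up


def datadm_interfaces(datadm_text):
    """First field of each non-blank "datadm -v" line (assumed: one entry per line)."""
    return [line.split()[0] for line in datadm_text.splitlines() if line.strip()]


def check_dat_config(ifconfig_text, datadm_text):
    """Return human-readable warnings; an empty list means nothing looked odd."""
    warnings = []
    names = datadm_interfaces(datadm_text)
    if len(names) > 1:
        warnings.append("multiple datadm entries: " + ", ".join(names))
    up = up_interfaces(ifconfig_text)
    for name in names:
        if name not in up:
            warnings.append(name + " is in datadm but not UP in ifconfig -a")
    return warnings
```

Save the two command outputs to files and pass their contents to check_dat_config(). It is only a first-pass sanity check and says nothing about the fabric itself.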
If the user is running s10u4, I'd recommend trying IB
Update 1.0, which may contain a bug fix that is not in
OpenSolaris.  The fix I remember is in datadm, though, so I
don't think that bug is causing this problem.

I run with both ports connected all the time, with only one
IPoIB device configured, and do not see the problem this user
describes.  I am running s10u4 plus the SDLC download of
IB Updates 1.0.  The configuration on both of my systems looks like this:

   $ ls -l /dev/ibd?
   lrwxrwxrwx   1 root     root          71 Mar 31 14:54 /dev/ibd1 ->
     ../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci15b3,[EMAIL PROTECTED]/[EMAIL PROTECTED],8001,ipib:ibd1
   lrwxrwxrwx   1 root     root          71 Mar 31 14:54 /dev/ibd3 ->
     ../devices/[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci15b3,[EMAIL PROTECTED]/[EMAIL PROTECTED],8001,ipib:ibd3
   $ ifconfig -a
   lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
   bge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
        inet 10.1.49.193 netmask ffffff00 broadcast 10.1.49.255
   ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
        inet 18.0.0.193 netmask ff000000 broadcast 18.255.255.255
   $ datadm -v
   ibd1  u1.2  nonthreadsafe  default  udapl_tavor.so.1  SUNW.1.0  " "  "driver_name=tavor"
   $ mpirun --mca pls_rsh_agent rsh --host 18.0.0.193,18.0.0.194 \
       --mca mpi_preconnect_all 1 \
       --mca btl self,sm,udapl --mca mpi_leave_pinned 1 osu_latency
   # OSU MPI Latency Test (Version 2.2)
   # Size              Latency (us)
   0                   4.32
   1                   4.57
   2                   4.57
   4                   4.55
   8                   4.61
   16                  4.63
   32                  4.72
   64                  4.86
   128                 5.06
   256                 5.99
   512                 6.66
   1024                8.04
   2048               10.73
   4096               13.63
   8192               31.09
   16384              39.89
   32768              57.02
   65536              91.47
   131072            160.03
   262144            298.00
   524288            572.74
   1048576          1122.57
   2097152          2221.51
   4194304          5962.94
   $
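To compare a latency run on a two-port setup against the numbers above, a small Python sketch could parse the osu_latency table and list the message sizes that got noticeably slower. It assumes that each data row is just "size latency" and that lines starting with "#" are headers. The 1.5x threshold and the function names are arbitrary choices for illustration, not anything osu_latency itself provides.

```python
def parse_osu_latency(text):
    """Parse osu_latency output into {message size in bytes: latency in us}."""
    results = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip "#" header lines, shell prompts, and anything not "size latency".
        if len(fields) != 2 or line.lstrip().startswith("#"):
            continue
        try:
            results[int(fields[0])] = float(fields[1])
        except ValueError:
            continue
    return results


def slower_sizes(reference, run, factor=1.5):
    """Sizes whose latency in `run` exceeds `factor` times the `reference` latency."""
    return sorted(size for size, lat in run.items()
                  if size in reference and lat > factor * reference[size])
```

Sizes that appear in only one of the two runs are ignored, so partial runs can still be compared.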

- BT


> ------------------------------------------------------------------------
> 
> Subject:
> [networking-discuss] Solaris/OpenSolaris uDAPL doesn't work when both IB
> HCA ports are connected
> From:
> Denis Golubev <[EMAIL PROTECTED]>
> Date:
> Thu, 03 Apr 2008 02:23:12 -0700 (PDT)
> To:
> [email protected]
> 
> 
> Hello.
> 
> I discovered that Solaris/OpenSolaris uDAPL works if only one HCA port is
> connected to the IB fabric. When I connect two HCA ports, or two ports from
> different HCAs, to the IB fabric, uDAPL doesn't work.
> 
> I discovered this problem with HPC ClusterTools 7.x and verified it with
> dapltest. When two or more IB ports from a single host are connected to the IB
> fabric, the 'dat_ep_connect' routine always returns DAT_INTERNAL_ERROR.
> 
> The uDAPL configuration made via datadm makes no difference: more than one
> connection to the IB fabric breaks uDAPL regardless of how many IPoIB
> interfaces are configured for DAT usage.
> 
> Please advise: is it possible to avoid this error and use more than one IB
> connection per host for uDAPL? Thanks in advance.
> 
> Regards,
> 
> Denis
>  
>  
> This message posted from opensolaris.org
> _______________________________________________
> networking-discuss mailing list
> [email protected]