Hello Arlin,

I am Or's colleague, and I am assisting him with this matter.

OK, we got Intel MPI to run. To test the pkey usage, we configured it to run 
over a pkey that is not configured on the node. In this case MPI should have 
failed, but it didn't.
The DAPL debug output reports the given pkey (0x8001 = 32769).
How can that be?
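
As a side note on the value itself, here is a minimal Python sketch (not part 
of DAPL or Intel MPI; decode_pkey is a hypothetical helper) that splits a 
16-bit pkey into its membership bit and 15-bit base. It assumes the usual 
InfiniBand convention that the top bit marks full membership:

```python
def decode_pkey(pkey: int) -> dict:
    """Split a 16-bit InfiniBand pkey into its parts (illustrative helper)."""
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError("pkey must fit in 16 bits")
    return {
        "decimal": pkey,                        # value as printed by the DAPL debug log
        "hex": f"0x{pkey:04x}",                 # value as passed on the command line
        "full_member": bool(pkey & 0x8000),     # top bit: full vs. limited membership
        "base": pkey & 0x7FFF,                  # low 15 bits: the key itself
    }


if __name__ == "__main__":
    print(decode_pkey(0x8001))
```

For 0x8001 this gives decimal 32769, full membership, and base key 1, which 
matches the value shown in the debug output.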

See the attached output from the different MPI runs. I believe the devices are 
the correct ones (ofa-v2*). 

Itay
   

-----Original Message-----
From: Davis, Arlin R [mailto:arlin.r.da...@intel.com] 
Sent: Tuesday, July 13, 2010 19:19
To: Or Gerlitz
Cc: Itay Berman; linux-rdma
Subject: RE: some dapl assistance

Sorry, Intel MPI requires the development packages, which include libdat.so and 
libdat2.so.

Please see the install instructions on 
http://www.openfabrics.org/downloads/dapl/

---

For 1.2 and 2.0 support on the same system, including development, install the 
RPM packages as follows: 

dapl-2.0.29-1 
dapl-utils-2.0.29-1 
dapl-devel-2.0.29-1      <<<<
dapl-debuginfo-2.0.29-1 
compat-dapl-1.2.18-1 
compat-dapl-devel-1.2.18-1  <<<<

---
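
A rough way to check for this situation is sketched below in Python. It only 
inspects text in the format of the `ldconfig -p | grep libdat` output quoted 
further down in this thread; the helper names are hypothetical, and it assumes 
the ldconfig cache reflects what the loader will actually find:

```python
# Sample taken from the ldconfig output quoted in this thread.
LDCONFIG_OUTPUT = """\
        libdat2.so.2 (libc6,x86-64) => /usr/lib64/libdat2.so.2
        libdat.so.1 (libc6,x86-64) => /usr/lib64/libdat.so.1
"""


def cached_names(text: str) -> set:
    """Return the library names listed in `ldconfig -p` style output."""
    return {line.split()[0] for line in text.splitlines() if "=>" in line}


def missing_dev_links(text: str, wanted=("libdat.so", "libdat2.so")) -> list:
    """List the unversioned names (normally shipped by -devel RPMs) not in the cache."""
    names = cached_names(text)
    return [name for name in wanted if name not in names]


if __name__ == "__main__":
    print(missing_dev_links(LDCONFIG_OUTPUT))
```

With only the versioned libdat.so.1 and libdat2.so.2 present, both unversioned 
names are reported as missing, which matches the "cannot open dynamic library" 
messages from MPI startup.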

Thanks for the heads up on the dat.conf manpage. I will fix the conflict in the 
next release.

-arlin

>-----Original Message-----
>From: Or Gerlitz [mailto:ogerl...@voltaire.com] 
>Sent: Tuesday, July 13, 2010 4:41 AM
>To: Davis, Arlin R
>Cc: Itay Berman; linux-rdma
>Subject: Re: some dapl assistance
>
>Davis, Arlin R wrote:
>> There is limited debug in the non-debug builds. If you want full debugging capabilities
>> you can install the source RPM and configure and make as follows [..] (OFED target example):
>
>okay, got that, once I built the sources by hand as you suggested I could see
>debug prints, but things didn't really work, so I stepped back and installed the
>latest rpms - dapl-2.0.29-1 and compat-dapl-1.2.18-1, now I couldn't get
>intel-mpi to run:
>
>> [r...@dodly0 ~]# rpm -qav | grep dapl
>> dapl-utils-2.0.29-1
>> dapl-2.0.29-1
>> compat-dapl-1.2.18-1
>
>> [r...@dodly0 ~]# ldconfig -p | grep libdat
>>         libdat2.so.2 (libc6,x86-64) => /usr/lib64/libdat2.so.2
>>         libdat.so.1 (libc6,x86-64) => /usr/lib64/libdat.so.1
>
>> [r...@dodly0 ~]# rpm -qf /usr/lib64/libdat.so.1
>> compat-dapl-1.2.18-1
>> [r...@dodly0 ~]# rpm -qf /usr/lib64/libdat2.so.2
>> dapl-2.0.29-1
>
>> [r...@dodly0 ~]# /opt/intel/impi/4.0.0.027/intel64/bin/mpiexec -ppn 1 -n 2 -env DAPL_IB_PKEY 0x8002 -env DAPL_DBG_TYPE 0xff -env DAPL_DBG_DEST 0x3 -env I_MPI_DEBUG 3 -env I_MPI_CHECK_DAPL_PROVIDER_MISMATCH none -env I_MPI_FABRICS dapl:dapl /tmp/osu
>> [0] MPI startup(): cannot open dynamic library libdat.so
>> [1] MPI startup(): cannot open dynamic library libdat.so
>> [0] MPI startup(): cannot open dynamic library libdat2.so
>> [0] dapl fabric is not available and fallback fabric is not enabled
>> [1] MPI startup(): cannot open dynamic library libdat2.so
>> [1] dapl fabric is not available and fallback fabric is not enabled
>> rank 1 in job 5  dodly0_54941   caused collective abort of all ranks
>>   exit status of rank 1: return code 254
>> rank 0 in job 5  dodly0_54941   caused collective abort of all ranks
>>   exit status of rank 0: return code 254
>
>Any idea what we're doing wrong?
>
>BTW - before things stopped working, exporting LD_DEBUG=libs to the MPI rank,
>I noticed that it used the compat-1.2 rpm ...
>
>Now, I can run dapltest fine,
>> [r...@dodly0 ~]# dapltest -T S -D ofa-v2-mthca0-1
>> Dapltest: Service Point Ready - ofa-v2-mthca0-1
>> Dapltest: Service Point Ready - ofa-v2-mthca0-1
>> Server: Transaction Test Finished for this client
>
>> [r...@dodly4 ~]# dapltest -T T -D ofa-v2-mlx4_0-1 -s dodly0 -i 1000 server SR 65536 4 client SR 65536 4
>> Server Name: dodly0
>> Server Net Address: 172.30.3.230
>> DT_cs_Client: Starting Test ...
>> ----- Stats ---- : 1 threads, 1 EPs
>> Total WQE        :    2919.70 WQE/Sec
>> Total Time       :       0.68 sec
>> Total Send       :     262.14 MB -     382.69 MB/Sec
>> Total Recv       :     262.14 MB -     382.69 MB/Sec
>> Total RDMA Read  :       0.00 MB -       0.00 MB/Sec
>> Total RDMA Write :       0.00 MB -       0.00 MB/Sec
>> DT_cs_Client: ========== End of Work -- Client Exiting
>
>I also noted that dapl-utils and compat-dapl-utils are mutually exclusive, as
>both attempt to install the same man page for dat.conf
>> # rpm -Uvh /usr/src/redhat/RPMS/x86_64/compat-dapl-utils-1.2.18-1.x86_64.rpm
>> Preparing...                ########################################### [100%]
>>         file /usr/share/man/man5/dat.conf.5.gz from install of compat-dapl-utils-1.2.18-1.x86_64 conflicts with file from package dapl-utils-2.0.29-1.x86_64
>
>Or.
>
[r...@dodly0 compat-dapl-1.2.18]# mpiexec -ppn 1 -n 2 -env I_MPI_FABRICS dapl:dapl -env I_MPI_DEBUG 2 -env I_MPI_CHECK_DAPL_PROVIDER_MISMATCH none -env DAPL_DBG_TYPE 0xffff /tmp/osu
dodly0:47ba: dapl_init: dbg_type=0xffff,dbg_dest=0x1
dodly4:e32: dapl_init: dbg_type=0xffff,dbg_dest=0x1
dodly0:47ba:  open_hca: device mlx4_0 not found
dodly0:47ba:  open_hca: device mlx4_0 not found
dodly0:47ba:  query_hca: port.link_layer = 0x1
dodly0:47ba:  query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly0:47ba:  query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:47ba:  query_hca: port.link_layer = 0x1
dodly0:47ba:  query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly0:47ba:  query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:47ba:  query_hca: port.link_layer = 0x1
dodly0:47ba:  query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly0:47ba:  query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:47ba:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=14 ret=0, evnts=0x0
[0] MPI startup(): DAPL provider ofa-v2-mthca0-1
[0] MPI startup(): dapl data transfer mode
dodly4:e32:  query_hca: port.link_layer = 0x1
dodly4:e32:  query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly4:e32:  query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
dodly4:e32:  query_hca: port.link_layer = 0x1
dodly4:e32:  query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly4:e32:  query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
dodly4:e32:  query_hca: port.link_layer = 0x1
dodly4:e32:  query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly4:e32:  query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
dodly4:e32:  dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:e32:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32:  dapl_poll: fd=13 ret=0, evnts=0x0
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[1] MPI startup(): dapl data transfer mode
[0] MPI startup(): static connections storm algo
dodly0:47ba:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=19 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=19 ret=1, evnts=0x4
dodly0:47ba:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=19 ret=0, evnts=0x0
dodly4:e32:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32:  dapl_poll: fd=13 ret=1, evnts=0x1
dodly4:e32:  dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:e32:  dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:e32:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32:  dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:e32:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=19 ret=1, evnts=0x1
dodly4:e32:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32:  dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:e32:  dapl_poll: fd=17 ret=1, evnts=0x1
# OSU MPI Bandwidth Test v3.1.1
# Size        Bandwidth (MB/s)
1                         0.42
2                         0.85
4                         1.70
8                         3.38
16                        6.75
32                       13.45
64                       26.66
128                      52.43
256                     102.41
512                     196.05
1024                    350.80
2048                    559.92
4096                    682.33
8192                    748.72
16384                   786.83
32768                   674.08
65536                   795.84
131072                  878.78
262144                  927.75
524288                  949.61
1048576                 965.51
2097152                 974.14
4194304                 978.64
dodly0:47ba:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:47ba:  CM FREE: 0x13c939c0 ep=0x13c80b60 st=CM_FREE sck=19 refs=4
dodly4:e32:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32:  dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:e32:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba: dapl_ep_free: Free CM: EP=0x13c80b60 CM=0x13c939c0
dodly4:e32:  dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:e32:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32:  dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:e32:  CM FREE: 0x2f08f70 ep=0x2f09370 st=CM_FREE sck=17 refs=4
dodly0:47ba:  cm_free: cm 0x13c939c0 CM_FREE ep 0x13c80b60 refs=1
dodly4:e32: dapl_ep_free: Free CM: EP=0x2f09370 CM=0x2f08f70
dodly4:e32:  cm_free: cm 0x2f08f70 CM_FREE ep 0x2f09370 refs=1
dodly4:e32:  dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:e32:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32:  CM FREE: 0x2f19ce0 ep=(nil) st=CM_FREE sck=13 refs=3
dodly4:e32:  dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:e32:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly0:47ba:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba:  CM FREE: 0x13c7f940 ep=(nil) st=CM_FREE sck=14 refs=3



[r...@dodly0 compat-dapl-1.2.18]# mpiexec -ppn 1 -n 2 -env I_MPI_FABRICS dapl:dapl -env I_MPI_DEBUG 2 -env I_MPI_CHECK_DAPL_PROVIDER_MISMATCH none -env DAPL_DBG_TYPE 0xffff /tmp/osu
dodly0:3b37: dapl_init: dbg_type=0xffff,dbg_dest=0x1
dodly4:237: dapl_init: dbg_type=0xffff,dbg_dest=0x1
dodly0:3b37:  open_hca: device mlx4_0 not found
dodly0:3b37:  open_hca: device mlx4_0 not found
dodly0:3b37:  Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly0:3b37:  query_hca: port.link_layer = 0x1
dodly0:3b37:  query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly0:3b37:  query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:3b37:  Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly0:3b37:  query_hca: port.link_layer = 0x1
dodly0:3b37:  query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly0:3b37:  query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:3b37:  Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly0:3b37:  query_hca: port.link_layer = 0x1
dodly0:3b37:  query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly0:3b37:  query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:3b37:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37:  dapl_poll: fd=14 ret=0, evnts=0x0
dodly4:237:  Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly4:237:  query_hca: port.link_layer = 0x1
dodly4:237:  query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly4:237:  query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
[0] MPI startup(): DAPL provider ofa-v2-mthca0-1
[0] MPI startup(): dapl data transfer mode
dodly4:237:  Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly4:237:  query_hca: port.link_layer = 0x1
dodly4:237:  query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly4:237:  query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
dodly4:237:  Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly4:237:  query_hca: port.link_layer = 0x1
dodly4:237:  query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly4:237:  query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
dodly4:237:  dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237:  dapl_poll: fd=13 ret=0, evnts=0x0
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[1] MPI startup(): dapl data transfer mode
[0] MPI startup(): static connections storm algo
dodly0:3b37:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37:  dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:3b37:  dapl_poll: fd=19 ret=0, evnts=0x0
dodly0:3b37:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37:  dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:3b37:  dapl_poll: fd=19 ret=1, evnts=0x4
dodly0:3b37:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37:  dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:3b37:  dapl_poll: fd=19 ret=0, evnts=0x0
dodly4:237:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237:  dapl_poll: fd=13 ret=1, evnts=0x1
dodly4:237:  dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:237:  dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237:  dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:237:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37:  dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:3b37:  dapl_poll: fd=19 ret=1, evnts=0x1
dodly4:237:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237:  dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:237:  dapl_poll: fd=17 ret=1, evnts=0x1
# OSU MPI Bandwidth Test v3.1.1
# Size        Bandwidth (MB/s)
1                         0.42
2                         0.85
4                         1.69
8                         3.37
16                        6.74
32                       13.45
64                       26.70
128                      52.45
256                     102.12
512                     195.68
1024                    349.75
2048                    555.98
4096                    681.94
8192                    747.29
16384                   785.72
32768                   675.27
65536                   797.38
131072                  879.17
262144                  928.16
524288                  949.20
1048576                 965.38
2097152                 974.11
4194304                 978.56
dodly0:3b37:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37:  dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:3b37:  CM FREE: 0x1f2d4b40 ep=0x1f2c1b90 st=CM_FREE sck=19 refs=4
dodly4:237:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237:  dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:237:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly4:237:  dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237:  dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:237:  CM FREE: 0x170f3f70 ep=0x170f4370 st=CM_FREE sck=17 refs=4
dodly0:3b37: dapl_ep_free: Free CM: EP=0x1f2c1b90 CM=0x1f2d4b40
dodly0:3b37:  cm_free: cm 0x1f2d4b40 CM_FREE ep 0x1f2c1b90 refs=1
dodly4:237: dapl_ep_free: Free CM: EP=0x170f4370 CM=0x170f3f70
dodly4:237:  cm_free: cm 0x170f3f70 CM_FREE ep 0x170f4370 refs=1
dodly0:3b37:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly4:237:  dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237:  dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237:  dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237:  dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237:  CM FREE: 0x17104d90 ep=(nil) st=CM_FREE sck=13 refs=3
dodly0:3b37:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37:  dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37:  dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37:  CM FREE: 0x1f2c09f0 ep=(nil) st=CM_FREE sck=14 refs=3


[r...@dodly0 compat-dapl-1.2.18]# smpquery PKeyTable 4
   0: 0xffff 0x8002 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
   8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  16: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  24: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  32: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  40: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  48: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  56: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
64 pkeys capacity for this port
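
To cross-check a requested pkey against a table like the one above, a small 
Python sketch along these lines could be used. The function names are 
hypothetical, only the first two rows of the table are reproduced as sample 
input, and the exact-match comparison is an assumption rather than a statement 
of what DAPL does internally:

```python
from typing import List, Optional

# First rows of the smpquery PKeyTable output from this thread.
SMPQUERY_OUTPUT = """\
   0: 0xffff 0x8002 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
   8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
64 pkeys capacity for this port
"""


def parse_pkey_table(text: str) -> List[int]:
    """Parse 'index: 0x.... 0x....' rows into a flat list of pkey values."""
    table: List[int] = []
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        # Only rows that start with a numeric index carry table entries.
        if sep and head.strip().isdigit():
            table.extend(int(tok, 16) for tok in rest.split())
    return table


def find_pkey_index(table: List[int], pkey: int) -> Optional[int]:
    """Return the index of an exact pkey match, or None (assumed matching rule)."""
    for index, entry in enumerate(table):
        if entry != 0 and entry == pkey:
            return index
    return None


if __name__ == "__main__":
    table = parse_pkey_table(SMPQUERY_OUTPUT)
    for wanted in (0xFFFF, 0x8002, 0x8001):
        print(f"0x{wanted:04x} -> {find_pkey_index(table, wanted)}")
```

With this table, 0xffff is at index 0, 0x8002 is at index 1, and 0x8001 is not 
present, which is consistent with the "key !found, using defaults" warning and 
the p_idx 0 value in the debug output above.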

