Hello Arlin, I am Or's colleague, whom he is assisting with this matter.
OK, we got Intel MPI to run. To test the pkey usage, we configured it to run over a pkey that is not configured on the node. In this case MPI should have failed, but it didn't. The dapl debug output reports the given pkey (0x8001 = 32769). How can that be?

See the attached output from the different MPI runs. I believe the devices are the correct ones (ofa-v2*).

Itay

-----Original Message-----
From: Davis, Arlin R [mailto:arlin.r.da...@intel.com]
Sent: Tuesday, July 13, 2010 19:19
To: Or Gerlitz
Cc: Itay Berman; linux-rdma
Subject: RE: some dapl assistance

Sorry, Intel MPI requires development packages which include libdat.so and libdat2.so

Please see the install instructions on http://www.openfabrics.org/downloads/dapl/

---
For 1.2 and 2.0 support on same system, including development, install RPM packages as follow:

dapl-2.0.29-1
dapl-utils-2.0.29-1
dapl-devel-2.0.29-1 <<<<
dapl-debuginfo-2.0.29-1
compat-dapl-1.2.18-1
compat-dapl-devel-1.2.18-1 <<<<
---

Thanks for the heads up on dat.conf manpage. I will fix the conflict in next release.

-arlin

>-----Original Message-----
>From: Or Gerlitz [mailto:ogerl...@voltaire.com]
>Sent: Tuesday, July 13, 2010 4:41 AM
>To: Davis, Arlin R
>Cc: Itay Berman; linux-rdma
>Subject: Re: some dapl assistance
>
>Davis, Arlin R wrote:
>> There is limited debug in the non-debug builds. If you want full debugging capabilities
>> you can install the source RPM and configure and make as follows [..] (OFED target example):
>
>okay, got that, once I built the sources by hand as you suggested I could see debug prints
>but things didn't really work, so I stepped back and installed the latest rpms - dapl-2.0.29-1
>and compat-dapl-1.2.18-1, now I couldn't get intel-mpi to run:
>
>> [r...@dodly0 ~]# rpm -qav | grep dapl
>> dapl-utils-2.0.29-1
>> dapl-2.0.29-1
>> compat-dapl-1.2.18-1
>
>> [r...@dodly0 ~]# ldconfig -p | grep libdat
>> libdat2.so.2 (libc6,x86-64) => /usr/lib64/libdat2.so.2
>> libdat.so.1 (libc6,x86-64) => /usr/lib64/libdat.so.1
>
>> [r...@dodly0 ~]# rpm -qf /usr/lib64/libdat.so.1
>> compat-dapl-1.2.18-1
>> [r...@dodly0 ~]# rpm -qf /usr/lib64/libdat2.so.2
>> dapl-2.0.29-1
>
>> [r...@dodly0 ~]# /opt/intel/impi/4.0.0.027/intel64/bin/mpiexec -ppn 1 -n 2 -env DAPL_IB_PKEY 0x8002 -env DAPL_DBG_TYPE 0xff -env DAPL_DBG_DEST 0x3 -env I_MPI_DEBUG 3 -env I_MPI_CHECK_DAPL_PROVIDER_MISMATCH none -env I_MPI_FABRICS dapl:dapl /tmp/osu
>> [0] MPI startup(): cannot open dynamic library libdat.so
>> [1] MPI startup(): cannot open dynamic library libdat.so
>> [0] MPI startup(): cannot open dynamic library libdat2.so
>> [0] dapl fabric is not available and fallback fabric is not enabled
>> [1] MPI startup(): cannot open dynamic library libdat2.so
>> [1] dapl fabric is not available and fallback fabric is not enabled
>> rank 1 in job 5 dodly0_54941 caused collective abort of all ranks
>> exit status of rank 1: return code 254
>> rank 0 in job 5 dodly0_54941 caused collective abort of all ranks
>> exit status of rank 0: return code 254
>
>Any idea what we're doing wrong?
>
>BTW - before things stopped to work, exporting LD_DEBUG=libs to the MPI rank,
>I noticed that it used the compat-1.2 rpm ...
>
>Now, I can run dapltest fine,
>> [r...@dodly0 ~]# dapltest -T S -D ofa-v2-mthca0-1
>> Dapltest: Service Point Ready - ofa-v2-mthca0-1
>> Dapltest: Service Point Ready - ofa-v2-mthca0-1
>> Server: Transaction Test Finished for this client
>
>> [r...@dodly4 ~]# dapltest -T T -D ofa-v2-mlx4_0-1 -s dodly0 -i 1000 server SR 65536 4 client SR 65536 4
>> Server Name: dodly0
>> Server Net Address: 172.30.3.230
>> DT_cs_Client: Starting Test ...
>> ----- Stats ---- : 1 threads, 1 EPs
>> Total WQE : 2919.70 WQE/Sec
>> Total Time : 0.68 sec
>> Total Send : 262.14 MB - 382.69 MB/Sec
>> Total Recv : 262.14 MB - 382.69 MB/Sec
>> Total RDMA Read : 0.00 MB - 0.00 MB/Sec
>> Total RDMA Write : 0.00 MB - 0.00 MB/Sec
>> DT_cs_Client: ========== End of Work -- Client Exiting
>
>I also noted that the dapl-utils and the compat-dapl-utils are mutual exclusive as both
>attempt to install the same man page for dat.conf
>> # rpm -Uvh /usr/src/redhat/RPMS/x86_64/compat-dapl-utils-1.2.18-1.x86_64.rpm
>> Preparing... ########################################### [100%]
>> file /usr/share/man/man5/dat.conf.5.gz from install of compat-dapl-utils-1.2.18-1.x86_64 conflicts with file from package dapl-utils-2.0.29-1.x86_64
>
>Or.

[r...@dodly0 compat-dapl-1.2.18]# mpiexec -ppn 1 -n 2 -env I_MPI_FABRICS dapl:dapl -env I_MPI_DEBUG 2 -env I_MPI_CHECK_DAPL_PROVIDER_MISMATCH none -env DAPL_DBG_TYPE 0xffff /tmp/osu
dodly0:47ba: dapl_init: dbg_type=0xffff,dbg_dest=0x1
dodly4:e32: dapl_init: dbg_type=0xffff,dbg_dest=0x1
dodly0:47ba: open_hca: device mlx4_0 not found
dodly0:47ba: open_hca: device mlx4_0 not found
dodly0:47ba: query_hca: port.link_layer = 0x1
dodly0:47ba: query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly0:47ba: query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:47ba: query_hca: port.link_layer = 0x1
dodly0:47ba: query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly0:47ba: query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:47ba: query_hca: port.link_layer = 0x1
dodly0:47ba: query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly0:47ba: query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:47ba: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba: dapl_poll: fd=14 ret=0, evnts=0x0
[0] MPI startup(): DAPL provider ofa-v2-mthca0-1
[0] MPI startup(): dapl data transfer mode
dodly4:e32: query_hca: port.link_layer = 0x1
dodly4:e32: query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly4:e32: query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
dodly4:e32: query_hca: port.link_layer = 0x1
dodly4:e32: query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly4:e32: query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
dodly4:e32: query_hca: port.link_layer = 0x1
dodly4:e32: query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 0 p_idx 0 sl 0
dodly4:e32: query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
dodly4:e32: dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:e32: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32: dapl_poll: fd=13 ret=0, evnts=0x0
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[1] MPI startup(): dapl data transfer mode
[0] MPI startup(): static connections storm algo
dodly0:47ba: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba: dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:47ba: dapl_poll: fd=19 ret=0, evnts=0x0
dodly0:47ba: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba: dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:47ba: dapl_poll: fd=19 ret=1, evnts=0x4
dodly0:47ba: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba: dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:47ba: dapl_poll: fd=19 ret=0, evnts=0x0
dodly4:e32: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32: dapl_poll: fd=13 ret=1, evnts=0x1
dodly4:e32: dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:e32: dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:e32: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32: dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:e32: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba: dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:47ba: dapl_poll: fd=19 ret=1, evnts=0x1
dodly4:e32: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32: dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:e32: dapl_poll: fd=17 ret=1, evnts=0x1
# OSU MPI Bandwidth Test v3.1.1
# Size        Bandwidth (MB/s)
1             0.42
2             0.85
4             1.70
8             3.38
16            6.75
32            13.45
64            26.66
128           52.43
256           102.41
512           196.05
1024          350.80
2048          559.92
4096          682.33
8192          748.72
16384         786.83
32768         674.08
65536         795.84
131072        878.78
262144        927.75
524288        949.61
1048576       965.51
2097152       974.14
4194304       978.64
dodly0:47ba: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba: dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:47ba: CM FREE: 0x13c939c0 ep=0x13c80b60 st=CM_FREE sck=19 refs=4
dodly4:e32: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32: dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:e32: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba: dapl_ep_free: Free CM: EP=0x13c80b60 CM=0x13c939c0
dodly4:e32: dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:e32: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32: dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:e32: CM FREE: 0x2f08f70 ep=0x2f09370 st=CM_FREE sck=17 refs=4
dodly0:47ba: cm_free: cm 0x13c939c0 CM_FREE ep 0x13c80b60 refs=1
dodly4:e32: dapl_ep_free: Free CM: EP=0x2f09370 CM=0x2f08f70
dodly4:e32: cm_free: cm 0x2f08f70 CM_FREE ep 0x2f09370 refs=1
dodly4:e32: dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:e32: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:e32: CM FREE: 0x2f19ce0 ep=(nil) st=CM_FREE sck=13 refs=3
dodly0:47ba: dodly4:e32: dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:e32: dapl_poll: fd=15 ret=0, evnts=0x0
dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:47ba: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:47ba: CM FREE: 0x13c7f940 ep=(nil) st=CM_FREE sck=14 refs=3

[r...@dodly0 compat-dapl-1.2.18]# mpiexec -ppn 1 -n 2 -env I_MPI_FABRICS dapl:dapl -env I_MPI_DEBUG 2 -env I_MPI_CHECK_DAPL_PROVIDER_MISMATCH none -env DAPL_DBG_TYPE 0xffff /tmp/osu
dodly0:3b37: dapl_init: dbg_type=0xffff,dbg_dest=0x1
dodly4:237: dapl_init: dbg_type=0xffff,dbg_dest=0x1
dodly0:3b37: open_hca: device mlx4_0 not found
dodly0:3b37: open_hca: device mlx4_0 not found
dodly0:3b37: Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly0:3b37: query_hca: port.link_layer = 0x1
dodly0:3b37: query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly0:3b37: query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:3b37: Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly0:3b37: query_hca: port.link_layer = 0x1
dodly0:3b37: query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly0:3b37: query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:3b37: Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly0:3b37: query_hca: port.link_layer = 0x1
dodly0:3b37: query_hca: (a0.0) eps 64512, sz 16384 evds 65408, sz 131071 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly0:3b37: query_hca: msg 2147483648 rdma 2147483648 iov 27 lmr 131056 rmr 0 ack_time 16 mr 4294967295
dodly0:3b37: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37: dapl_poll: fd=14 ret=0, evnts=0x0
dodly4:237: Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly4:237: query_hca: port.link_layer = 0x1
dodly4:237: query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly4:237: query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
[0] MPI startup(): DAPL provider ofa-v2-mthca0-1
[0] MPI startup(): dapl data transfer mode
dodly4:237: Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly4:237: query_hca: port.link_layer = 0x1
dodly4:237: query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly4:237: query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
dodly4:237: Warning: new pkey(32769), query (Success) err or key !found, using defaults
dodly4:237: query_hca: port.link_layer = 0x1
dodly4:237: query_hca: (a0.0) eps 262076, sz 16351 evds 65408, sz 4194303 mtu 2048 - pkey 32769 p_idx 0 sl 0
dodly4:237: query_hca: msg 1073741824 rdma 1073741824 iov 32 lmr 524272 rmr 0 ack_time 16 mr 4294967295
dodly4:237: dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237: dapl_poll: fd=13 ret=0, evnts=0x0
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[1] MPI startup(): dapl data transfer mode
[0] MPI startup(): static connections storm algo
dodly0:3b37: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37: dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:3b37: dapl_poll: fd=19 ret=0, evnts=0x0
dodly0:3b37: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37: dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:3b37: dapl_poll: fd=19 ret=1, evnts=0x4
dodly0:3b37: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37: dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:3b37: dapl_poll: fd=19 ret=0, evnts=0x0
dodly4:237: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237: dapl_poll: fd=13 ret=1, evnts=0x1
dodly4:237: dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:237: dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237: dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:237: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37: dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:3b37: dapl_poll: fd=19 ret=1, evnts=0x1
dodly4:237: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237: dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:237: dapl_poll: fd=17 ret=1, evnts=0x1
# OSU MPI Bandwidth Test v3.1.1
# Size        Bandwidth (MB/s)
1             0.42
2             0.85
4             1.69
8             3.37
16            6.74
32            13.45
64            26.70
128           52.45
256           102.12
512           195.68
1024          349.75
2048          555.98
4096          681.94
8192          747.29
16384         785.72
32768         675.27
65536         797.38
131072        879.17
262144        928.16
524288        949.20
1048576       965.38
2097152       974.11
4194304       978.56
dodly0:3b37: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37: dapl_poll: fd=14 ret=0, evnts=0x0
dodly0:3b37: CM FREE: 0x1f2d4b40 ep=0x1f2c1b90 st=CM_FREE sck=19 refs=4
dodly4:237: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237: dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:237: dapl_poll: fd=17 ret=1, evnts=0x1
dodly4:237: dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237: dapl_poll: fd=13 ret=0, evnts=0x0
dodly4:237: CM FREE: 0x170f3f70 ep=0x170f4370 st=CM_FREE sck=17 refs=4
dodly0:3b37: dapl_ep_free: Free CM: EP=0x1f2c1b90 CM=0x1f2d4b40
dodly0:3b37: cm_free: cm 0x1f2d4b40 CM_FREE ep 0x1f2c1b90 refs=1
dodly4:237: dapl_ep_free: Free CM: EP=0x170f4370 CM=0x170f3f70
dodly4:237: cm_free: cm 0x170f3f70 CM_FREE ep 0x170f4370 refs=1
dodly0:3b37: dapl_poll: fd=17 ret=1, evnts=0x1
dodly4:237: dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237: dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237: dapl_poll: fd=15 ret=1, evnts=0x1
dodly4:237: dapl_poll: fd=15 ret=0, evnts=0x0
dodly4:237: CM FREE: 0x17104d90 ep=(nil) st=CM_FREE sck=13 refs=3
dodly0:3b37: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37: dapl_poll: fd=17 ret=1, evnts=0x1
dodly0:3b37: dapl_poll: fd=17 ret=0, evnts=0x0
dodly0:3b37: CM FREE: 0x1f2c09f0 ep=(nil) st=CM_FREE sck=14 refs=3

[r...@dodly0 compat-dapl-1.2.18]# smpquery PKeyTable 4
   0: 0xffff 0x8002 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
   8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  16: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  24: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  32: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  40: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  48: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
  56: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
64 pkeys capacity for this port
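
For reference, the check we did by eye against the table above could be scripted roughly as follows. This is only a sketch: it assumes the PKeyTable rows have the layout "index: eight hex values", and the helper names parse_pkey_table and pkey_configured are made up for illustration (they are not part of dapl or infiniband-diags). It treats a pkey as configured only if the full 16-bit value, including the membership bit, appears in the table.

```python
import re

def parse_pkey_table(text):
    """Return the non-zero pkeys found in `smpquery PKeyTable` style output."""
    pkeys = []
    for line in text.splitlines():
        # Table rows look like "   0: 0xffff 0x8002 0x0000 ..."; other lines are skipped.
        m = re.match(r"\s*\d+:\s+(.*)", line)
        if not m:
            continue
        for tok in m.group(1).split():
            val = int(tok, 16)
            if val != 0:  # 0x0000 marks an empty table slot
                pkeys.append(val)
    return pkeys

def pkey_configured(pkey, table):
    """Exact match on the 16-bit value (membership bit included)."""
    return pkey in table

# Two rows copied from the smpquery output, plus the trailing summary line.
sample = """\
   0: 0xffff 0x8002 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
   8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
64 pkeys capacity for this port
"""

table = parse_pkey_table(sample)
for pkey in (0x8001, 0x8002):
    state = "configured" if pkey_configured(pkey, table) else "NOT configured"
    print(f"pkey 0x{pkey:04x} ({pkey}): {state}")
```

With the table from this node, 0x8001 (32769) is reported as not configured while 0x8002 is, which matches the "key !found, using defaults" warning in the second run.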