Re: [OMPI users] error configuring OpenMPI 1.6.3 with gcc 4.7.2
Hi All.

I was away all of last week, so I am late here. A note that I was able to compile OpenMPI 1.6.3 with gcc 4.7.2 successfully. I built gcc 4.7.2 by hand with binutils-2.23.1, mpc-1.0.1, mpfr-3.1.1 and gmp-5.0.5.

Joseph

On 12/6/2012 2:37 PM, Paul Hatton wrote:
Thanks. This is obviously (now) a problem with my gcc build, which isn't appropriate for this list. I'll revisit this and post a solution once I've (hopefully) got this working. I don't have any shared libraries (*.so.*) in my gcc tree, so something went badly wrong ... Thanks for your help.
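Paul's symptom, a gcc install tree with no shared libraries (*.so.*) at all, can be checked before pointing Open MPI's configure at a hand-built compiler. A minimal bash sketch; the function name has_shared_libs and the /opt/gcc-4.7.2 prefix in the usage line are hypothetical, and it relies on GNU find's -quit option.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if an install tree contains at least one
# versioned shared library (a file or symlink named *.so.*), fail otherwise.
has_shared_libs() {
  local prefix=$1
  # -print -quit stops at the first match (GNU find); grep -q . turns
  # "some output" into exit status 0 and "no output" into status 1.
  find "$prefix" -name '*.so.*' -print -quit 2>/dev/null | grep -q .
}
```

For example, `has_shared_libs /opt/gcc-4.7.2 || echo "no *.so.* files found"`. It only confirms that some versioned shared library exists; it does not check which ones (for instance libgcc_s or libstdc++) were built.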
Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3
Hi YK.

An update: with your latest Mellanox drivers, I was able to compile OpenMPI 1.6.3 successfully. So yes, the issue was with the mxm drivers.

Thank you,
Joseph

On 12/06/2012 01:41 AM, Yevgeny Kliteynik wrote:
Joseph,

Indeed, there was a problem in the MXM rpm. The fixed MXM has been published at the same location:
http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar

-- YK

On 12/4/2012 9:20 AM, Joseph Farran wrote:
Hi Mike.

Removed the old mxm, downloaded and installed:
/tmp/mxm/v1.1/per-ofed/1.5.4.1/mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm

I am using OFED 1.5.4.1 and it still fails at the same spot:

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_cancel.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_probe.lo
  CC mtl_mxm_recv.lo
  CC mtl_mxm_send.lo
  CCLD mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la: No such file or directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la: No such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la' is not a valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1

On 12/2/2012 10:18 PM, Mike Dubman wrote:
ohh.. you have MOFED 1.5.4.1, thought it was 1.5.3-3.1.0
will provide you a link to mxm package compiled with this MOFED version (thanks to no ABI in OFED).

On Sun, Dec 2, 2012 at 10:04 PM, Joseph Farran <jfar...@uci.edu> wrote:
1.5.4.1
Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3
Hi Mike.

Removed the old mxm, downloaded and installed:
/tmp/mxm/v1.1/per-ofed/1.5.4.1/mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm

I am using OFED 1.5.4.1 and it still fails at the same spot:

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_cancel.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_probe.lo
  CC mtl_mxm_recv.lo
  CC mtl_mxm_send.lo
  CCLD mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la: No such file or directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la: No such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la' is not a valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1

On 12/2/2012 10:18 PM, Mike Dubman wrote:
ohh.. you have MOFED 1.5.4.1, thought it was 1.5.3-3.1.0
will provide you a link to mxm package compiled with this MOFED version (thanks to no ABI in OFED).

On Sun, Dec 2, 2012 at 10:04 PM, Joseph Farran <jfar...@uci.edu> wrote:
1.5.4.1
Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3
No cigar with MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64.

Here are my steps:
- Removed OFED-1.5.4.1 & rebooted
- Installed MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64
- rpm -e mxm
- rpm -i mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm   <--- The new mxm you built.
- Rebooted

Check:
# rpm -qa | egrep "1.5.3|mxm|fca"
kernel-ib-devel-1.5.3-2.6.32_279.14.1.el6.x86_64.x86_64
ofed-scripts-1.5.3-OFED.1.5.3.3.1.0.x86_64
mxm-1.1.3a5e745-1.x86_64
tree-1.5.3-2.el6.x86_64
kernel-ib-1.5.3-2.6.32_279.14.1.el6.x86_64.x86_64
fca-2.1.12028-1.x86_64
mlnxofed-docs-1.5.3-3.1.0.noarch

Tried compiling OpenMPI 1.6.3 and got the same results:

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_cancel.lo
  CC mtl_mxm_recv.lo t.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_probe.lo
  CC mtl_mxm_send.lo
  CCLD mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No such file or directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la' is not a valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1

# ls /usr/local/mofed-inst
ls: cannot access /usr/local/mofed-inst: No such file or directory
#
# ls /usr/local
bin etc include lib lib64 libexec sbin share src
#
# find /usr -name "*mofed*" -print
#

On 12/2/2012 1:05 PM, Joseph Farran wrote:
Next I will try MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64, with the mxm and try again.

Joseph
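The log shows libtool trying to read /usr/local/mofed-inst/<version>/lib/librdmacm.la, a directory that the ls and find output above confirm does not exist on this machine. Libtool archives (.la files) record a library's link dependencies in a dependency_libs='...' line, and those entries can be absolute paths to other .la files on the machine where the library was built; libtool reads each of them when linking. Assuming (the thread only confirms, in YK's later reply, that the MXM rpm was faulty) that an .la file shipped in the mxm package carried such a stale path, the minimal bash sketch below lists every absolute .la dependency that is missing locally. The function name check_la_deps is hypothetical; its default directory is the --with-mxm-libdir value used in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for every libtool archive (*.la) in DIR, report the
# absolute *.la paths listed in its dependency_libs='...' line that do not
# exist on this machine. Returns 1 if any are missing, 0 otherwise.
check_la_deps() {
  local dir=${1:-/opt/mellanox/mxm/lib}
  local la deps dep status=0
  local -a entries
  for la in "$dir"/*.la; do
    [ -e "$la" ] || continue   # no *.la files: the glob stayed literal
    # Extract the quoted value of the dependency_libs line.
    deps=$(sed -n "s/^dependency_libs='\(.*\)'\$/\1/p" "$la")
    # Split on whitespace without pathname expansion.
    read -r -a entries <<< "$deps"
    for dep in "${entries[@]}"; do
      case $dep in
        /*.la)
          if [ ! -e "$dep" ]; then
            printf '%s: missing %s\n' "$la" "$dep"
            status=1
          fi
          ;;
      esac
    done
  done
  return "$status"
}
```

Running `check_la_deps /opt/mellanox/mxm/lib` prints one `<archive>: missing <path>` line per dangling reference and nothing when every referenced archive exists.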
Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3
Next I will try MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64, with the mxm and try again.

Joseph

On 12/2/2012 12:04 PM, Joseph Farran wrote:
Hi again. Had to get some sleep :-)

Same thing. Let me outline the steps I took in case I missed something.

I have a stock CentOS 6.3 with kernel 2.6.32-279.14.1.el6.x86_64.

Install OFED-1.5.4.1 as follows:

cd OFED-1.5.4.1
./install.pl --all --print-available
grep -v debuginfo ofed-all.conf > ofed.conf
./install.pl -c ofed.conf

After a while, OFED 1.5.4.1 says it installed successfully. I reboot and commands like ibhost, etc. work.

I now install mxm and fca as follows (using your new mxm):

# rpm -e mxm    <--- To make sure.
# cd /tmp
# rpm -i /tmp/mxm/v1.1/per-ofed/1.5.3-3.1.0/mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm
# rpm -qa | grep mxm
mxm-1.1.3a5e745-1.x86_64

Now I try compiling OpenMPI 1.6.3 with the config:

CFLAGS="" FCFLAGS="" ./configure \
  --with-sge \
  --with-openib=/usr \
  --enable-openib-connectx-xrc \
  --enable-mpi-thread-multiple \
  --with-threads \
  --with-hwloc \
  --enable-heterogeneous \
  --with-fca=/opt/mellanox/fca \
  --with-mxm-libdir=/opt/mellanox/mxm/lib \
  --with-mxm=/opt/mellanox/mxm \
  --prefix=/data/openmpi-1-6.3

And it again fails with the new 1.5.3-3.1.0:

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_recv.lo
  CC mtl_mxm_send.lo
  CCLD mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No such file or directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la' is not a valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1

Note: I don't see a /usr/local/mofed-inst

# ls /usr/local
bin etc include lib lib64 libexec sbin share src

Question. When I built the OFED 1.5.4.1 above, I skipped the debug packages (grep -v debuginfo ofed-all.conf > ofed.conf). I don't think I need them?

Any other suggestions?

On 12/2/2012 2:56 AM, Mike Dubman wrote:
please redownload from http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar
it contains binaries compiled with mofed 1.5.3-3.1.0
M
Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3
Hi again. Had to get some sleep :-)

Same thing. Let me outline the steps I took in case I missed something.

I have a stock CentOS 6.3 with kernel 2.6.32-279.14.1.el6.x86_64.

Install OFED-1.5.4.1 as follows:

cd OFED-1.5.4.1
./install.pl --all --print-available
grep -v debuginfo ofed-all.conf > ofed.conf
./install.pl -c ofed.conf

After a while, OFED 1.5.4.1 says it installed successfully. I reboot and commands like ibhost, etc. work.

I now install mxm and fca as follows (using your new mxm):

# rpm -e mxm    <--- To make sure.
# cd /tmp
# rpm -i /tmp/mxm/v1.1/per-ofed/1.5.3-3.1.0/mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm
# rpm -qa | grep mxm
mxm-1.1.3a5e745-1.x86_64

Now I try compiling OpenMPI 1.6.3 with the config:

CFLAGS="" FCFLAGS="" ./configure \
  --with-sge \
  --with-openib=/usr \
  --enable-openib-connectx-xrc \
  --enable-mpi-thread-multiple \
  --with-threads \
  --with-hwloc \
  --enable-heterogeneous \
  --with-fca=/opt/mellanox/fca \
  --with-mxm-libdir=/opt/mellanox/mxm/lib \
  --with-mxm=/opt/mellanox/mxm \
  --prefix=/data/openmpi-1-6.3

And it again fails with the new 1.5.3-3.1.0:

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_recv.lo
  CC mtl_mxm_send.lo
  CCLD mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No such file or directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la' is not a valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1

Note: I don't see a /usr/local/mofed-inst

# ls /usr/local
bin etc include lib lib64 libexec sbin share src

Question. When I built the OFED 1.5.4.1 above, I skipped the debug packages (grep -v debuginfo ofed-all.conf > ofed.conf). I don't think I need them?

Any other suggestions?

On 12/2/2012 2:56 AM, Mike Dubman wrote:
please redownload from http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar
it contains binaries compiled with mofed 1.5.3-3.1.0
M
Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3
Same thing. My new config:

CFLAGS="" FCFLAGS="" ./configure \
  --with-sge \
  --with-openib=/usr \
  --enable-openib-connectx-xrc \
  --enable-mpi-thread-multiple \
  --with-threads \
  --with-hwloc \
  --enable-heterogeneous \
  --with-fca=/opt/mellanox/fca \
  --with-mxm-libdir=/opt/mellanox/mxm/lib \
  --with-mxm=/opt/mellanox/mxm \

Fails at the same spot:

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_cancel.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_probe.lo
  CC mtl_mxm_recv.lo
  CCLD mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la: No such file or directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la: No such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la' is not a valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1

On 12/2/2012 1:37 AM, Mike Dubman wrote:
please change "--with-openib" to "--with-openib=/usr" and retry configure/make stage.
10x

On Sun, Dec 2, 2012 at 10:36 AM, Joseph Farran <jfar...@uci.edu> wrote:
Hi Mike. Thanks for the help!

I am installing OFED on an NFS share partition so that all compute nodes will have access.

For the "--with-openib" option, I don't specify one. My config file looks like this:

CFLAGS="" FCFLAGS="" ./configure \
  --with-sge \
  --with-openib \
  --enable-openib-connectx-xrc \
  --enable-mpi-thread-multiple \
  --with-threads \
  --with-hwloc \
  --enable-heterogeneous \
  --with-fca=/opt/mellanox/fca \
  --with-mxm-libdir=/opt/mellanox/mxm/lib \
  --with-mxm=/opt/mellanox/mxm \
  --prefix=/data/openmpi-1-6.3

Please advise,
Joseph

On 12/1/2012 11:39 PM, Mike Dubman wrote:
Hi Joseph,
I guess you install MOFED under /usr, is that right?
Could you please specify "--with-openib=/usr" parameter during ompi "configure" stage?
10x
M

On Fri, Nov 30, 2012 at 1:11 AM, Joseph Farran <jfar...@uci.edu> wrote:
Hi YK:

Yes, I have those installed but they are newer versions:

# rpm -qa | grep rdma
librdmacm-1.0.15-1.x86_64
librdmacm-utils-1.0.15-1.x86_64
librdmacm-devel-1.0.15-1.x86_64
# locate librdmacm.la
#

Here are the RPMs the Mellanox build created for kernel: 2.6.32-279.14.1.el6.x86_64

# ls *rdma*
librdmacm-1.0.15-1.i686.rpm
librdmacm-devel-1.0.15-1.i686.rpm
librdmacm-utils-1.0.15-1.i686.rpm
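Mike's suggestion is to pass the directory where MOFED is actually installed, /usr here, as --with-openib=/usr. Before re-running configure, a quick look at that prefix can confirm it holds the verbs header and library. A minimal bash sketch; it assumes the standard libibverbs layout (PREFIX/include/infiniband/verbs.h, plus a libibverbs.so* file under PREFIX/lib64 or PREFIX/lib), the function name check_openib_prefix is hypothetical, and passing the check does not guarantee configure will succeed.

```bash
#!/usr/bin/env bash
# Hypothetical pre-configure check for --with-openib=PREFIX.
# Assumes the usual libibverbs layout: PREFIX/include/infiniband/verbs.h
# and a libibverbs.so* file in PREFIX/lib64 or PREFIX/lib.
check_openib_prefix() {
  local prefix=${1:-/usr}
  local status=0 lib found=""
  if [ ! -f "$prefix/include/infiniband/verbs.h" ]; then
    echo "missing $prefix/include/infiniband/verbs.h"
    status=1
  fi
  for lib in "$prefix"/lib64/libibverbs.so* "$prefix"/lib/libibverbs.so*; do
    # An unmatched glob stays literal, so test that the file really exists.
    if [ -e "$lib" ]; then
      found=$lib
      break
    fi
  done
  if [ -z "$found" ]; then
    echo "no libibverbs.so* in $prefix/lib64 or $prefix/lib"
    status=1
  fi
  return "$status"
}
```

Running `check_openib_prefix /usr` prints nothing and returns 0 when both pieces are present, and prints one line per missing piece otherwise.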
Re: [OMPI users] OpenMPI-1.6.3 & MXM
Hi again.

I believe I have the latest mxm:

# rpm -qa | fgrep mxm
mxm-1.1.3a5e745-1.x86_64

Let me know if I have the config part correct from previous email.

Best,
Joseph

On 12/1/2012 11:44 PM, Mike Dubman wrote:
Hi,
The mxm which is part of MOFED 1.5.3 supports OMPI 1.6.0.
The mxm upgrade is needed to work with OMPI 1.6.3+

Please remove mxm from your cluster nodes (rpm -e mxm)
Install latest from http://mellanox/com/products/mxm/
Compile ompi 1.6.3, add following to its configure line:

./configure --with-openib=/usr --with-mxm=/opt/mellanox/mxm <...>)

Regards
M

On Sat, Dec 1, 2012 at 2:23 AM, Joseph Farran <jfar...@uci.edu> wrote:
Konz,

For whatever it is worth, I am in the same boat. I have CentOS 6.3, trying to compile OpenMPI 1.6.3 with the mxm from Mellanox and it fails. Also, the Mellanox OFED (MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64) does not work either.

Mellanox really needs to step in here and help out. I have a cluster full of Mellanox products and I hate to think we chose the wrong Infiniband vendor.

Joseph

On 11/30/2012 12:33 PM, Konz, Jeffrey (SSA Solution Centers) wrote:
I tried building the latest OpenMPI-1.6.3 with MXM support and got this error:

make[2]: Entering directory `Src/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_cancel.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_probe.lo
  CC mtl_mxm_recv.lo
  CC mtl_mxm_send.lo
mtl_mxm_send.c: In function 'ompi_mtl_mxm_send':
mtl_mxm_send.c:96: error: 'mxm_wait_t' undeclared (first use in this function)
mtl_mxm_send.c:96: error: (Each undeclared identifier is reported only once
mtl_mxm_send.c:96: error: for each function it appears in.)
mtl_mxm_send.c:96: error: expected ';' before 'wait'
mtl_mxm_send.c:104: error: 'MXM_REQ_FLAG_BLOCKING' undeclared (first use in this function)
mtl_mxm_send.c:118: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use in this function)
mtl_mxm_send.c:134: error: 'wait' undeclared (first use in this function)
mtl_mxm_send.c: In function 'ompi_mtl_mxm_isend':
mtl_mxm_send.c:183: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use in this function)
make[2]: *** [mtl_mxm_send.lo] Error 1

Our OFED is 1.5.3 and our MXM version is 1.0.601.

Thanks,
-Jeff

/**/
/* Jeff Konz jeffrey.k...@hp.com */
/* Solutions Architect HPC Benchmarking */
/* Americas Shared Solutions Architecture (SSA) */
/* Hewlett-Packard Company */
/* Office: 248-491-7480 Mobile: 248-345-6857 */
/**/
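Mike's diagnosis is that the mxm shipped with MOFED 1.5.3 (Jeff reports version 1.0.601) is too old for Open MPI 1.6.3, which is why mtl_mxm_send.c cannot find mxm_wait_t, MXM_REQ_FLAG_BLOCKING and MXM_REQ_FLAG_SEND_SYNC. One way to tell, before a long build, whether the headers under the --with-mxm prefix are new enough is to search them for those identifiers. A minimal bash sketch; it assumes the headers live somewhere under PREFIX/include, and check_mxm_headers is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-build check: search the MXM headers under PREFIX/include
# for the identifiers that mtl_mxm_send.c failed on in the errors above.
check_mxm_headers() {
  local prefix=${1:-/opt/mellanox/mxm}
  local sym status=0
  for sym in mxm_wait_t MXM_REQ_FLAG_BLOCKING MXM_REQ_FLAG_SEND_SYNC; do
    # -r: recurse, -q: quiet, -w: whole identifiers only (letters, digits, _)
    if ! grep -rqw -- "$sym" "$prefix/include" 2>/dev/null; then
      echo "not found: $sym"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_mxm_headers /opt/mellanox/mxm` prints a `not found:` line for each missing identifier; any such line suggests the older mxm is still the one configure will pick up.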
Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3
Hi Mike. Thanks for the help!

I am installing OFED on an NFS share partition so that all compute nodes will have access.

For the "--with-openib" option, I don't specify one. My config file looks like this:

CFLAGS="" FCFLAGS="" ./configure \
  --with-sge \
  --with-openib \
  --enable-openib-connectx-xrc \
  --enable-mpi-thread-multiple \
  --with-threads \
  --with-hwloc \
  --enable-heterogeneous \
  --with-fca=/opt/mellanox/fca \
  --with-mxm-libdir=/opt/mellanox/mxm/lib \
  --with-mxm=/opt/mellanox/mxm \
  --prefix=/data/openmpi-1-6.3

Please advise,
Joseph

On 12/1/2012 11:39 PM, Mike Dubman wrote:
Hi Joseph,
I guess you install MOFED under /usr, is that right?
Could you please specify "--with-openib=/usr" parameter during ompi "configure" stage?
10x
M

On Fri, Nov 30, 2012 at 1:11 AM, Joseph Farran <jfar...@uci.edu> wrote:
Hi YK:

Yes, I have those installed but they are newer versions:

# rpm -qa | grep rdma
librdmacm-1.0.15-1.x86_64
librdmacm-utils-1.0.15-1.x86_64
librdmacm-devel-1.0.15-1.x86_64
# locate librdmacm.la
#

Here are the RPMs the Mellanox build created for kernel: 2.6.32-279.14.1.el6.x86_64

# ls *rdma*
librdmacm-1.0.15-1.i686.rpm
librdmacm-devel-1.0.15-1.i686.rpm
librdmacm-utils-1.0.15-1.i686.rpm
librdmacm-1.0.15-1.x86_64.rpm
librdmacm-devel-1.0.15-1.x86_64.rpm
librdmacm-utils-1.0.15-1.x86_64.rpm

On 11/29/2012 02:59 PM, Yevgeny Kliteynik wrote:
Joseph,

You're supposed to have librdmacm installed as part of MLNX_OFED installation.
What does "rpm -qa | grep rdma" tell?

$ rpm -qa | grep rdma
librdmacm-devel-1.0.14.1-1.x86_64
librdmacm-utils-1.0.14.1-1.x86_64
librdmacm-1.0.14.1-1.x86_64
$ locate librdmacm.la
/usr/local/mofed/1.5.3-4.0.9/lib/librdmacm.la

-- YK
Re: [OMPI users] OpenMPI-1.6.3 & MXM
Konz,

For whatever it is worth, I am in the same boat. I have CentOS 6.3, trying to compile OpenMPI 1.6.3 with the mxm from Mellanox and it fails. Also, the Mellanox OFED (MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64) does not work either.

Mellanox really needs to step in here and help out. I have a cluster full of Mellanox products and I hate to think we chose the wrong Infiniband vendor.

Joseph

On 11/30/2012 12:33 PM, Konz, Jeffrey (SSA Solution Centers) wrote:
I tried building the latest OpenMPI-1.6.3 with MXM support and got this error:

make[2]: Entering directory `Src/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_cancel.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_probe.lo
  CC mtl_mxm_recv.lo
  CC mtl_mxm_send.lo
mtl_mxm_send.c: In function 'ompi_mtl_mxm_send':
mtl_mxm_send.c:96: error: 'mxm_wait_t' undeclared (first use in this function)
mtl_mxm_send.c:96: error: (Each undeclared identifier is reported only once
mtl_mxm_send.c:96: error: for each function it appears in.)
mtl_mxm_send.c:96: error: expected ';' before 'wait'
mtl_mxm_send.c:104: error: 'MXM_REQ_FLAG_BLOCKING' undeclared (first use in this function)
mtl_mxm_send.c:118: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use in this function)
mtl_mxm_send.c:134: error: 'wait' undeclared (first use in this function)
mtl_mxm_send.c: In function 'ompi_mtl_mxm_isend':
mtl_mxm_send.c:183: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use in this function)
make[2]: *** [mtl_mxm_send.lo] Error 1

Our OFED is 1.5.3 and our MXM version is 1.0.601.

Thanks,
-Jeff

/**/
/* Jeff Konz jeffrey.k...@hp.com */
/* Solutions Architect HPC Benchmarking */
/* Americas Shared Solutions Architecture (SSA) */
/* Hewlett-Packard Company */
/* Office: 248-491-7480 Mobile: 248-345-6857 */
/**/
Re: [OMPI users] OpenMPI 1.6.3 and Memory Issues
Hi again.

I am using /etc/modprobe.d/mofed.conf, otherwise I get:

WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/

But I am still getting the memory errors after making the changes and rebooting:

$ cat /etc/modprobe.d/mofed.conf
options mlx4_core log_num_mtt=24
options mlx4_core log_mtts_per_seg=1

$ mpirun hello
--
WARNING: It appears that your OpenFabrics subsystem is configured to only allow registering part of your physical memory. This can cause MPI jobs to run with erratic performance, hang, and/or crash.

On 11/29/2012 04:39 PM, Yevgeny Kliteynik wrote:
You can also set these parameters in /etc/modprobe.conf:

options mlx4_core log_num_mtt=24 log_mtts_per_seg=1

-- YK
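After editing /etc/modprobe.d/mofed.conf and rebooting, it is worth checking which values the loaded mlx4_core driver is actually using, since module options only take effect when the module is loaded. Loaded modules usually expose their parameters under /sys/module/<name>/parameters/; whether this mlx4_core build exposes log_num_mtt and log_mtts_per_seg there is an assumption. A minimal bash sketch with a hypothetical function name; the directory argument exists only so it can be overridden for testing.

```bash
#!/usr/bin/env bash
# Print the MTT-related parameters of the loaded mlx4_core module.
# Assumption: the driver exposes them as files under
# /sys/module/mlx4_core/parameters; the directory can be overridden.
show_mlx4_mtt_params() {
  local dir=${1:-/sys/module/mlx4_core/parameters}
  local p
  for p in log_num_mtt log_mtts_per_seg; do
    if [ -r "$dir/$p" ]; then
      printf '%s=%s\n' "$p" "$(cat "$dir/$p")"
    else
      printf '%s=unavailable\n' "$p"
    fi
  done
}
```

If the printed values differ from the ones in mofed.conf, the options were not applied when the driver was loaded.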
Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3
Hi YK:

Yes, I have those installed but they are newer versions:

# rpm -qa | grep rdma
librdmacm-1.0.15-1.x86_64
librdmacm-utils-1.0.15-1.x86_64
librdmacm-devel-1.0.15-1.x86_64
# locate librdmacm.la
#

Here are the RPMs the Mellanox build created for kernel: 2.6.32-279.14.1.el6.x86_64

# ls *rdma*
librdmacm-1.0.15-1.i686.rpm
librdmacm-devel-1.0.15-1.i686.rpm
librdmacm-utils-1.0.15-1.i686.rpm
librdmacm-1.0.15-1.x86_64.rpm
librdmacm-devel-1.0.15-1.x86_64.rpm
librdmacm-utils-1.0.15-1.x86_64.rpm

On 11/29/2012 02:59 PM, Yevgeny Kliteynik wrote:
Joseph,

You're supposed to have librdmacm installed as part of MLNX_OFED installation.
What does "rpm -qa | grep rdma" tell?

$ rpm -qa | grep rdma
librdmacm-devel-1.0.14.1-1.x86_64
librdmacm-utils-1.0.14.1-1.x86_64
librdmacm-1.0.14.1-1.x86_64
$ locate librdmacm.la
/usr/local/mofed/1.5.3-4.0.9/lib/librdmacm.la

-- YK
[OMPI users] OpenMPI 1.6.3 and Memory Issues
Hi All.

After compiling a simple Hello World program with OpenMPI 1.6.3 and running it with mpirun, I am getting:

$ ulimit -l
unlimited
$ mpirun -np 2 hello
--
WARNING: It appears that your OpenFabrics subsystem is configured to only allow registering part of your physical memory. This can cause MPI jobs to run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of physical memory that can be registered. You should investigate the relevant Linux kernel module parameters that control how much physical memory can be registered, and increase them to allow registering all physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module parameters:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host: hpc
  Registerable memory: 4096 MiB
  Total memory: 258470 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--
Hello World. I am the Master Node (hpc) with Rank 0.
Hello World. I am compute Node (hpc) with Rank 1
[hpc:08261] 1 more process has sent help message help-mpi-btl-openib.txt / reg mem limit low
[hpc:08261] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I have my limits set up with:

cat /etc/security/limits.conf
* soft memlock unlimited
* hard memlock unlimited

What am I missing? OS is CentOS 6.3.

Joseph
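The Open MPI OpenFabrics FAQ gives the amount of memory that mlx4 hardware can register as 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE, where the two exponents are mlx4_core module parameters. The bash sketch below evaluates that formula in MiB and finds the smallest log_num_mtt that reaches a target amount; it assumes 4096-byte pages unless a page size is passed, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Registerable memory in MiB for mlx4 hardware, using the FAQ formula
#   2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE
# PAGE_SIZE defaults to 4096 bytes (assumption: x86_64 with 4 KiB pages).
reg_mem_mib() {
  local log_num_mtt=$1 log_mtts_per_seg=$2 page_size=${3:-4096}
  echo $(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size / 1048576 ))
}

# Smallest log_num_mtt whose registerable memory reaches TARGET_MIB
# for a given log_mtts_per_seg (and optional page size).
min_log_num_mtt() {
  local target_mib=$1 log_mtts_per_seg=$2 page_size=${3:-4096} n=0
  while [ "$(reg_mem_mib "$n" "$log_mtts_per_seg" "$page_size")" -lt "$target_mib" ]; do
    n=$((n + 1))
  done
  echo "$n"
}
```

With the values tried later in this thread (log_num_mtt=24, log_mtts_per_seg=1), `reg_mem_mib 24 1` prints 131072, i.e. 128 GiB, which is still below the 258470 MiB of total memory reported in the warning; `min_log_num_mtt 258470 1` prints 25. How much headroom above physical memory to allow is discussed in the FAQ.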
Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3
On 11/28/2012 10:53 AM, Mike Dubman wrote:
You need mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm

On Wed, Nov 28, 2012 at 7:44 PM, Joseph Farran <jfar...@uci.edu> wrote:
mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm

After installing MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64, removing the old mxm and installing the correct mxm, I still cannot get OpenMPI to compile.

rpm -qa | fgrep mxm
mxm-1.1.3a5e745-1.x86_64

My compile of openmpi 1.6.3 with the following configs fails:

  --with-sge \
  --with-openib \
  --enable-openib-connectx-xrc \
  --enable-mpi-thread-multiple \
  --with-threads \
  --with-hwloc \
  --enable-heterogeneous \
  --with-fca=/opt/mellanox/fca \
  --with-mxm=/opt/mellanox/mxm \
  --with-mxm-libdir=/opt/mellanox/mxm/lib \

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_cancel.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_probe.lo
  CC mtl_mxm_recv.lo
  CC mtl_mxm_send.lo
  CCLD mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la: No such file or directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la: No such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la' is not a valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1

Nowhere on my system can I find "librdmacm.la". What am I missing?

Joseph
Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3
Question on the version to use. The tar file contains the following RPMs:

# ls | grep -v debug
mxm-1.1.3a5e745-1.x86_64-centos5u6.rpm
mxm-1.1.3a5e745-1.x86_64-centos5u7.rpm
mxm-1.1.3a5e745-1.x86_64-centos6u0.rpm
mxm-1.1.3a5e745-1.x86_64-rhel5u5.rpm
mxm-1.1.3a5e745-1.x86_64-rhel6u1.rpm
mxm-1.1.3a5e745-1.x86_64-rhel6u2.rpm
mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm
mxm-1.1.3a5e745-1.x86_64-sles10sp4.rpm
mxm-1.1.3a5e745-1.x86_64-sles11sp1.rpm
mxm-1.1.3a5e745-1.x86_64-sles11sp2.rpm

For CentOS 6.3, which one do I use? Will mxm-1.1.3a5e745-1.x86_64-centos6u0.rpm work, or do I need update 3 (mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm)?

Joseph

On 11/28/2012 06:29 AM, Yevgeny Kliteynik wrote:
On 11/28/2012 10:52 AM, Pavel Mezentsev wrote:
You can try downloading and installing a fresher version of MXM from mellanox web site. There was a thread on the list with the same problem, you can search for it.

Indeed, that OFED version comes with older version of MXM.
You can get the newer version here:
http://www.mellanox.com/products/mxm

-- YK

2012/11/28 Joseph Farran <jfar...@uci.edu>
Howdy.

I have a stock CentOS 6.3 OS and a Mellanox MT26428 card. I installed the Mellanox OFED MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64, which installed just fine. Rebooted the system and when I try building OpenMPI 1.6.3, it aborts with:

mtl_mxm_send.c: In function 'ompi_mtl_mxm_isend':
mtl_mxm_send.c:183: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use in this function)
make[2]: *** [mtl_mxm_send.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1

My configure options are:

  --with-sge \
  --with-threads \
  --with-hwloc \
  --with-openib \
  --enable-mpi-thread-multiple \
  --with-mxm=/opt/mellanox/mxm \
  --with-mxm-libdir=/opt/mellanox/mxm/lib \
  --with-fca=/opt/mellanox/fca \
  --enable-heterogeneous \
  --enable-openib-connectx-xrc \

Has anyone been able to compile OpenMPI 1.6.3 with the Mellanox OFED on CentOS 6.3?

Joseph
Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3
Perfect and Thanks!

I had searched the Mellanox web site for the mxm package to no avail. I will try recompiling later today.

Best,
Joseph

On 11/28/2012 06:29 AM, Yevgeny Kliteynik wrote:
On 11/28/2012 10:52 AM, Pavel Mezentsev wrote:
You can try downloading and installing a fresher version of MXM from mellanox web site. There was a thread on the list with the same problem, you can search for it.

Indeed, that OFED version comes with older version of MXM.
You can get the newer version here:
http://www.mellanox.com/products/mxm

-- YK

2012/11/28 Joseph Farran <jfar...@uci.edu>
Howdy.

I have a stock CentOS 6.3 OS and a Mellanox MT26428 card. I installed the Mellanox OFED MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64, which installed just fine. Rebooted the system and when I try building OpenMPI 1.6.3, it aborts with:

mtl_mxm_send.c: In function 'ompi_mtl_mxm_isend':
mtl_mxm_send.c:183: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use in this function)
make[2]: *** [mtl_mxm_send.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1

My configure options are:

  --with-sge \
  --with-threads \
  --with-hwloc \
  --with-openib \
  --enable-mpi-thread-multiple \
  --with-mxm=/opt/mellanox/mxm \
  --with-mxm-libdir=/opt/mellanox/mxm/lib \
  --with-fca=/opt/mellanox/fca \
  --enable-heterogeneous \
  --enable-openib-connectx-xrc \

Has anyone been able to compile OpenMPI 1.6.3 with the Mellanox OFED on CentOS 6.3?

Joseph
[OMPI users] CentOS 6.3 & OpenMPI 1.6.3
Howdy.

I have a stock CentOS 6.3 OS and a Mellanox MT26428 card. I installed the Mellanox OFED MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64, which installed just fine. Rebooted the system and when I try building OpenMPI 1.6.3, it aborts with:

mtl_mxm_send.c: In function 'ompi_mtl_mxm_isend':
mtl_mxm_send.c:183: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use in this function)
make[2]: *** [mtl_mxm_send.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1

My configure options are:

  --with-sge \
  --with-threads \
  --with-hwloc \
  --with-openib \
  --enable-mpi-thread-multiple \
  --with-mxm=/opt/mellanox/mxm \
  --with-mxm-libdir=/opt/mellanox/mxm/lib \
  --with-fca=/opt/mellanox/fca \
  --enable-heterogeneous \
  --enable-openib-connectx-xrc \

Has anyone been able to compile OpenMPI 1.6.3 with the Mellanox OFED on CentOS 6.3?

Joseph