Re: [OMPI users] error configuring OpenMPI 1.6.3 with gcc 4.7.2

2012-12-11 Thread Joseph Farran

Hi All.

I was away all of last week so I am late here. For the record, I was able to 
compile OpenMPI 1.6.3 with gcc 4.7.2 successfully.

I built gcc 4.7.2 by hand with binutils-2.23.1, mpc-1.0.1, mpfr-3.1.1 and 
gmp-5.0.5.

Joseph

On 12/6/2012 2:37 PM, Paul Hatton wrote:

Thanks. This is obviously (now) a problem with my gcc build which isn't 
appropriate for this list. I'll re-visit this and post a solution once I've 
(hopefully) got this working. I don't have any shared libraries (*.so.*) in my 
gcc tree so something went badly wrong ...

Thanks for your help.
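
For anyone checking their own GCC tree for the symptom Paul describes (no
*.so.* files), here is a minimal sketch. It assumes a default Linux build,
which normally installs shared runtime libraries such as libgcc_s and
libstdc++. count_shared_libs is a hypothetical helper name, and the prefix
you pass is your own install directory, not a path from this thread.

```shell
# Count shared objects (files or symlinks) under a GCC install prefix.
# count_shared_libs is a hypothetical helper name used only for illustration.
count_shared_libs() {
    prefix=$1
    # Matches names such as libgcc_s.so.1, libstdc++.so.6, libgomp.so.1.0.0.
    find "$prefix" \( -type f -o -type l \) -name '*.so*' 2>/dev/null \
        | wc -l | tr -d ' '
}
```

A result of 0 for the GCC prefix would match the situation described above
and suggests the build itself needs revisiting before Open MPI is configured
against it.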





Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-10 Thread Joseph Farran

Hi YK.

An update: with your latest Mellanox drivers, I was able to compile OpenMPI 
1.6.3 successfully.

So yes, the issue was with the mxm drivers.

Thank you,
Joseph


On 12/06/2012 01:41 AM, Yevgeny Kliteynik wrote:

Joseph,

Indeed, there was a problem in the MXM rpm.
The fixed MXM has been published at the same location:
   http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar

-- YK

On 12/4/2012 9:20 AM, Joseph Farran wrote:

Hi Mike.

Removed the old mxm, downloaded and installed:

/tmp/mxm/v1.1/per-ofed/1.5.4.1/mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm

I am using OFED 1.5.4.1 and it still fails at the same spot:

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
CC mtl_mxm.lo
CC mtl_mxm_cancel.lo
CC mtl_mxm_component.lo
CC mtl_mxm_endpoint.lo
CC mtl_mxm_probe.lo
CC mtl_mxm_recv.lo
CC mtl_mxm_send.lo
CCLD mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la: No such file or 
directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la: No such 
file or directory
libtool: link: `/usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la' is not a valid 
libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1


On 12/2/2012 10:18 PM, Mike Dubman wrote:

ohh.. you have MOFED 1.5.4.1, thought it was 1.5.3-3.1.0
will provide you a link to mxm package compiled with this MOFED version (thanks 
to no ABI in OFED).

On Sun, Dec 2, 2012 at 10:04 PM, Joseph 
Farran<jfar...@uci.edu<mailto:jfar...@uci.edu>>  wrote:

1.5.4.1








Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-04 Thread Joseph Farran

Hi Mike.

Removed the old mxm, downloaded and installed:

/tmp/mxm/v1.1/per-ofed/1.5.4.1/mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm

I am using OFED 1.5.4.1 and it still fails at the same spot:

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_cancel.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_probe.lo
  CC mtl_mxm_recv.lo
  CC mtl_mxm_send.lo
  CCLD   mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la: No such file or 
directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la: No such 
file or directory
libtool: link: `/usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la' is not a valid 
libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1


On 12/2/2012 10:18 PM, Mike Dubman wrote:

ohh.. you have MOFED 1.5.4.1, thought it was 1.5.3-3.1.0
will provide you a link to mxm package compiled with this MOFED version (thanks 
to no ABI in OFED).

On Sun, Dec 2, 2012 at 10:04 PM, Joseph Farran <jfar...@uci.edu 
<mailto:jfar...@uci.edu>> wrote:

1.5.4.1






Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Joseph Farran

No cigar with MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64

Here are my steps:

- Removed OFED-1.5.4.1 & rebooted
- Installed MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64
- rpm -e mxm
- rpm -i mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm   <--- The new mxm you built.
- Rebooted

Check:
# rpm -qa | egrep "1.5.3|mxm|fca"
kernel-ib-devel-1.5.3-2.6.32_279.14.1.el6.x86_64.x86_64
ofed-scripts-1.5.3-OFED.1.5.3.3.1.0.x86_64
mxm-1.1.3a5e745-1.x86_64
tree-1.5.3-2.el6.x86_64
kernel-ib-1.5.3-2.6.32_279.14.1.el6.x86_64.x86_64
fca-2.1.12028-1.x86_64
mlnxofed-docs-1.5.3-3.1.0.noarch

Try compiling OpenMPI 1.6.3 and get the same results:

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_cancel.lo
  CC mtl_mxm_recv.lo
t.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_probe.lo
  CC mtl_mxm_send.lo
  CCLD   mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No such file or 
directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No 
such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la' is not a 
valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1


# ls /usr/local/mofed-inst
ls: cannot access /usr/local/mofed-inst: No such file or directory
#
# ls /usr/local
bin  etc  include  lib  lib64  libexec  sbin  share  src
#
# find  /usr -name "*mofed*" -print
#
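
One possible way to see where the /usr/local/mofed-inst path comes from:
libtool reads the dependency_libs line of every .la file it links against,
so a .la file generated on a different build machine can carry paths that do
not exist locally. The sketch below lists such entries. That the MXM RPM
ships a libmxm.la containing this path is an assumption (the thread only
confirms later that the RPM was at fault), and missing_la_deps is a
hypothetical helper name.

```shell
# List .la files named in a libtool archive's dependency_libs that do not
# exist on this machine. missing_la_deps is a hypothetical helper name.
missing_la_deps() {
    la_file=$1
    # In a .la file, dependency_libs is a single-quoted, space-separated list.
    deps=$(sed -n "s/^dependency_libs='\(.*\)'\$/\1/p" "$la_file")
    for dep in $deps; do
        case $dep in
            # Only entries that are themselves .la files are checked;
            # -L and -l flags are skipped.
            *.la) [ -e "$dep" ] || printf '%s\n' "$dep" ;;
        esac
    done
}
```

Assuming the file exists, a call such as
missing_la_deps /opt/mellanox/mxm/lib/libmxm.la would be expected to print
the librdmacm.la path from the error above if that is indeed where it is
recorded.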




On 12/2/2012 1:05 PM, Joseph Farran wrote:

Next I will try MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64, with the mxm and 
try again.

Joseph




Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Joseph Farran

Next I will try MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64, with the mxm and 
try again.

Joseph

On 12/2/2012 12:04 PM, Joseph Farran wrote:

Hi again.

Had to get some sleep :-)

Same thing. Let me outline the steps I took in case I missed something.

I have a stock CentOS 6.3 with kernel 2.6.32-279.14.1.el6.x86_64

Install OFED-1.5.4.1 as follows:
cd OFED-1.5.4.1
./install.pl --all --print-available
grep -v debuginfo ofed-all.conf  > ofed.conf
./install.pl -c ofed.conf

After a while, OFED 1.5.4.1 says it installed successfully.   I reboot and 
commands like ibhost, etc work.

I now install mxm and fca as follows ( using your new mxm ):

# rpm -e mxm <--- To make sure.
# cd /tmp
# rpm -i /tmp/mxm/v1.1/per-ofed/1.5.3-3.1.0/mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm
# rpm -qa | grep mxm
mxm-1.1.3a5e745-1.x86_64

Now I try compiling OpenMPI 1.6.3 with the config:

CFLAGS="" FCFLAGS="" ./configure \
--with-sge \
--with-openib=/usr \
--enable-openib-connectx-xrc \
--enable-mpi-thread-multiple \
--with-threads \
--with-hwloc \
--enable-heterogeneous \
--with-fca=/opt/mellanox/fca \
--with-mxm-libdir=/opt/mellanox/mxm/lib \
--with-mxm=/opt/mellanox/mxm \
--prefix=/data/openmpi-1-6.3


And it again fails with the new 1.5.3-3.1.0

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_recv.lo

  CC mtl_mxm_send.lo
  CCLD   mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No such file or 
directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No 
such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la' is not a 
valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1


Note:  I don't see a /usr/local/mofed-inst

# ls /usr/local
bin  etc  include  lib  lib64  libexec  sbin  share  src

Question.   When I built the OFED 1.5.4.1 above, I skipped the debug packages ( 
grep -v debuginfo ofed-all.conf  > ofed.conf ).   I don't think I need them?

Any other suggestions?


On 12/2/2012 2:56 AM, Mike Dubman wrote:

please redownload from http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar
it contains binaries compiled with mofed 1.5.3-3.1.0
M


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Joseph Farran

Hi again.

Had to get some sleep :-)

Same thing. Let me outline the steps I took in case I missed something.

I have a stock CentOS 6.3 with kernel 2.6.32-279.14.1.el6.x86_64

Install OFED-1.5.4.1 as follows:
cd OFED-1.5.4.1
./install.pl --all --print-available
grep -v debuginfo ofed-all.conf  > ofed.conf
./install.pl -c ofed.conf

After a while, OFED 1.5.4.1 says it installed successfully.   I reboot and 
commands like ibhost, etc work.

I now install mxm and fca as follows ( using your new mxm ):

# rpm -e mxm <--- To make sure.
# cd /tmp
# rpm -i /tmp/mxm/v1.1/per-ofed/1.5.3-3.1.0/mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm
# rpm -qa | grep mxm
mxm-1.1.3a5e745-1.x86_64

Now I try compiling OpenMPI 1.6.3 with the config:

CFLAGS="" FCFLAGS="" ./configure \
--with-sge \
--with-openib=/usr \
--enable-openib-connectx-xrc \
--enable-mpi-thread-multiple \
--with-threads \
--with-hwloc \
--enable-heterogeneous \
--with-fca=/opt/mellanox/fca \
--with-mxm-libdir=/opt/mellanox/mxm/lib \
--with-mxm=/opt/mellanox/mxm \
--prefix=/data/openmpi-1-6.3


And it again fails with the new 1.5.3-3.1.0

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_recv.lo

  CC mtl_mxm_send.lo
  CCLD   mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No such file or 
directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la: No 
such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.3-3.1.0/lib/librdmacm.la' is not a 
valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1


Note:  I don't see a /usr/local/mofed-inst

# ls /usr/local
bin  etc  include  lib  lib64  libexec  sbin  share  src

Question.   When I built the OFED 1.5.4.1 above, I skipped the debug packages ( 
grep -v debuginfo ofed-all.conf  > ofed.conf ).   I don't think I need them?

Any other suggestions?
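
A small sanity check that can be run before configure, sketched under the
assumption that every --with-NAME=/path argument in the configure line above
should point at an existing directory. check_with_dirs is a hypothetical
helper name. It does not diagnose the libtool error itself; it only rules
out a mistyped or missing install directory.

```shell
# Print each --with-NAME=/absolute/path argument whose directory is missing.
# Arguments without a path (for example --with-sge) and --prefix are ignored,
# since the prefix may legitimately not exist yet.
check_with_dirs() {
    for arg in "$@"; do
        case $arg in
            --with-*=/*)
                dir=${arg#*=}
                [ -d "$dir" ] || printf 'missing: %s\n' "$arg"
                ;;
        esac
    done
}
```

Passing the same arguments that go to ./configure, for example
check_with_dirs --with-openib=/usr --with-mxm=/opt/mellanox/mxm, prints
nothing when all the directories are present.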


On 12/2/2012 2:56 AM, Mike Dubman wrote:

please redownload from http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar
it contains binaries compiled with mofed 1.5.3-3.1.0
M




Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Joseph Farran

  
  
Same thing.
  
  My new config:
  
      CFLAGS="" FCFLAGS="" ./configure    \
      --with-sge  \
      --with-openib=/usr  \
      --enable-openib-connectx-xrc    \
      --enable-mpi-thread-multiple    \
      --with-threads  \
      --with-hwloc    \
      --enable-heterogeneous  \
      --with-fca=/opt/mellanox/fca    \
      --with-mxm-libdir=/opt/mellanox/mxm/lib \
      --with-mxm=/opt/mellanox/mxm    \
  
  Fails at the same spot:
  
  make[2]: Entering directory
  `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
    CC mtl_mxm.lo
    CC mtl_mxm_cancel.lo
    CC mtl_mxm_endpoint.lo
    CC mtl_mxm_probe.lo
    CC mtl_mxm_recv.lo
    CCLD   mca_mtl_mxm.la
  /bin/grep: /usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la: No
  such file or directory
  /bin/sed: can't read
  /usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la: No such file
  or directory
  libtool: link:
  `/usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la' is not a
  valid libtool archive
  make[2]: *** [mca_mtl_mxm.la] Error 1
  make[2]: Leaving directory
  `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  make[1]: *** [all-recursive] Error 1
  make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
  make: *** [all-recursive] Error 1
  
  
  
  On 12/2/2012 1:37 AM, Mike Dubman wrote:


  
please change "--with-openib" to "--with-openib=/usr"  and
  retry configure/make stage.
10x
  

    On Sun, Dec 2, 2012 at 10:36 AM, Joseph
  Farran <jfar...@uci.edu>
  wrote:
  

  Hi Mike.

Thanks for the help!

I am installing OFED on an NFS share partition so that
all compute nodes will have access.

For the "--with-openib" option, I don't specify one.  
My config file looks like this:

    CFLAGS="" FCFLAGS="" ./configure    \

      --with-sge  \
      --with-openib   \
      --enable-openib-connectx-xrc    \
      --enable-mpi-thread-multiple    \
      --with-threads  \
      --with-hwloc    \
      --enable-heterogeneous  \
      --with-fca=/opt/mellanox/fca    \

    --with-mxm-libdir=/opt/mellanox/mxm/lib \
    --with-mxm=/opt/mellanox/mxm    \
    --prefix=/data/openmpi-1-6.3

Please advise,
Joseph

On 12/1/2012 11:39 PM, Mike Dubman wrote:

  Hi Joseph,
  I guess you install MOFED under /usr, is that
right?
  Could you please specify "--with-openib=/usr"
parameter during ompi "configure" stage?
  10x
  M

          
  On Fri, Nov 30, 2012 at
1:11 AM, Joseph Farran <jfar...@uci.edu>
wrote:
 Hi YK:
  
  Yes, I have those installed but they are newer
  versions:
  
  # rpm -qa | grep rdma
  librdmacm-1.0.15-1.x86_64
  librdmacm-utils-1.0.15-1.x86_64
  librdmacm-devel-1.0.15-1.x86_64
  # locate librdmacm.la
  #
  
  Here are the RPMs the Mellanox build created
  for kernel: 2.6.32-279.14.1.el6.x86_64
  
  # ls *rdma*
  librdmacm-1.0.15-1.i686.rpm  
   librdmacm-devel-1.0.15-1.i686.rpm  
   librdmacm-utils-1.0.15-1.i686.rpm

Re: [OMPI users] OpenMPI-1.6.3 & MXM

2012-12-02 Thread Joseph Farran

  
  
Hi again.
  
  I believe I have the latest mxm:
  
  # rpm -qa| fgrep mxm
  mxm-1.1.3a5e745-1.x86_64
  
  Let me know if I have the config part correct from previous email.
  
  Best,
  Joseph
  
  
  On 12/1/2012 11:44 PM, Mike Dubman wrote:


  
Hi,
 
The mxm which is part of MOFED 1.5.3 supports OMPI 1.6.0.
 
The mxm upgrade is needed to work with OMPI 1.6.3+
 
Please remove mxm from your cluster nodes (rpm -e mxm)
Install latest from  http://mellanox.com/products/mxm/
Compile ompi 1.6.3, add following to its configure line:
  ./configure --with-openib=/usr --with-mxm=/opt/mellanox/mxm <...>
 
Regards
M
  

On Sat, Dec 1, 2012 at 2:23 AM, Joseph
  Farran <jfar...@uci.edu>
  wrote:
  
 Konz,
  
  For whatever it is worth, I am in the same boat.
  
  I have CentOS 6.3, trying to compile OpenMPI 1.6.3 with
  the mxm from Mellanox and it fails.
  
  Also, the Mellanox OFED (
  MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64 ) does not work
  either.
  
  Mellanox really needs to step in here and help out.    
  
  I have a cluster full of Mellanox products and I hate to
  think we chose the wrong Infiniband vendor.
  
  Joseph

  On 11/30/2012 12:33 PM, Konz, Jeffrey (SSA Solution Centers) wrote:

  I tried building the latest OpenMPI-1.6.3 with
MXM support and got this error:
   
  make[2]: Entering directory
  `Src/openmpi-1.6.3/ompi/mca/mtl/mxm'
    CC mtl_mxm.lo
    CC mtl_mxm_cancel.lo
    CC mtl_mxm_component.lo
    CC mtl_mxm_endpoint.lo
    CC mtl_mxm_probe.lo
    CC mtl_mxm_recv.lo
    CC mtl_mxm_send.lo
  mtl_mxm_send.c: In function
  'ompi_mtl_mxm_send':
  mtl_mxm_send.c:96: error: 'mxm_wait_t'
  undeclared (first use in this function)
  mtl_mxm_send.c:96: error: (Each
  undeclared identifier is reported only once
  mtl_mxm_send.c:96: error: for each
  function it appears in.)
  mtl_mxm_send.c:96: error: expected ';'
  before 'wait'
  mtl_mxm_send.c:104: error:
  'MXM_REQ_FLAG_BLOCKING' undeclared (first use
  in this function)
  mtl_mxm_send.c:118: error:
  'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use
  in this function)
  mtl_mxm_send.c:134: error: 'wait'
  undeclared (first use in this function)
  mtl_mxm_send.c: In function
  'ompi_mtl_mxm_isend':
  mtl_mxm_send.c:183: error:
  'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use
  in this function)
  make[2]: *** [mtl_mxm_send.lo] Error 1
   
   
  Our OFED is 1.5.3 and our MXM version is
1.0.601. 
   
  Thanks,
   
  -Jeff
   
  /**/
  /* Jeff Konz  jeffrey.k...@hp.com */
  /* Solutions Architect   HPC
Benchmarking */
  /* Americas Shared Solutions Architecture
(SSA)   */
  /* Hewlett-Packard
Company    */
  /* Office: 248-491-7480 

Mobile: 248-345-6857 */ 
  /**/




  


Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Joseph Farran

  
  
Hi Mike.
  
  Thanks for the help!
  
  I am installing OFED on an NFS share partition so that all compute
  nodes will have access.
  
  For the "--with-openib" option, I don't specify one.   My config
  file looks like this:
  
      CFLAGS="" FCFLAGS="" ./configure    \
      --with-sge  \
      --with-openib   \
      --enable-openib-connectx-xrc    \
      --enable-mpi-thread-multiple    \
      --with-threads  \
      --with-hwloc    \
      --enable-heterogeneous  \
      --with-fca=/opt/mellanox/fca    \
      --with-mxm-libdir=/opt/mellanox/mxm/lib \
      --with-mxm=/opt/mellanox/mxm    \
      --prefix=/data/openmpi-1-6.3
  
  Please advise,
  Joseph

  On 12/1/2012 11:39 PM, Mike Dubman wrote:

Hi Joseph,
I guess you install MOFED under /usr, is that right?
Could you please specify "--with-openib=/usr" parameter
  during ompi "configure" stage?
10x
M
  
    
On Fri, Nov 30, 2012 at 1:11 AM, Joseph
  Farran <jfar...@uci.edu>
  wrote:
  
Hi YK:

Yes, I have those installed but they are newer versions:

# rpm -qa | grep rdma
librdmacm-1.0.15-1.x86_64
librdmacm-utils-1.0.15-1.x86_64
librdmacm-devel-1.0.15-1.x86_64
# locate librdmacm.la
#

Here are the RPMs the Mellanox build created for kernel:
2.6.32-279.14.1.el6.x86_64

# ls *rdma*
librdmacm-1.0.15-1.i686.rpm    librdmacm-devel-1.0.15-1.i686.rpm    librdmacm-utils-1.0.15-1.i686.rpm
librdmacm-1.0.15-1.x86_64.rpm  librdmacm-devel-1.0.15-1.x86_64.rpm  librdmacm-utils-1.0.15-1.x86_64.rpm


On 11/29/2012 02:59 PM, Yevgeny Kliteynik wrote:

  Joseph,
  

You're supposed to have librdmacm installed as part of
MLNX_OFED installation.
What does "rpm -qa | grep rdma" tell?

   $ rpm -qa | grep rdma
   librdmacm-devel-1.0.14.1-1.x86_64
   librdmacm-utils-1.0.14.1-1.x86_64
   librdmacm-1.0.14.1-1.x86_64

   $ locate librdmacm.la
   /usr/local/mofed/1.5.3-4.0.9/lib/librdmacm.la

-- YK




Re: [OMPI users] OpenMPI-1.6.3 & MXM

2012-11-30 Thread Joseph Farran

Konz,

For whatever it is worth, I am in the same boat.

I have CentOS 6.3, trying to compile OpenMPI 1.6.3 with the mxm from Mellanox 
and it fails.

Also, the Mellanox OFED ( MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64 ) does not 
work either.

Mellanox really needs to step in here and help out.

I have a cluster full of Mellanox products and I hate to think we chose the 
wrong Infiniband vendor.

Joseph


On 11/30/2012 12:33 PM, Konz, Jeffrey (SSA Solution Centers) wrote:


I tried building the latest OpenMPI-1.6.3 with MXM support and got this error:

make[2]: Entering directory `Src/openmpi-1.6.3/ompi/mca/mtl/mxm'

  CC mtl_mxm.lo

  CC mtl_mxm_cancel.lo

  CC mtl_mxm_component.lo

  CC mtl_mxm_endpoint.lo

  CC mtl_mxm_probe.lo

  CC mtl_mxm_recv.lo

  CC mtl_mxm_send.lo

mtl_mxm_send.c: In function 'ompi_mtl_mxm_send':

mtl_mxm_send.c:96: error: 'mxm_wait_t' undeclared (first use in this function)

mtl_mxm_send.c:96: error: (Each undeclared identifier is reported only once

mtl_mxm_send.c:96: error: for each function it appears in.)

mtl_mxm_send.c:96: error: expected ';' before 'wait'

mtl_mxm_send.c:104: error: 'MXM_REQ_FLAG_BLOCKING' undeclared (first use in 
this function)

mtl_mxm_send.c:118: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use in 
this function)

mtl_mxm_send.c:134: error: 'wait' undeclared (first use in this function)

mtl_mxm_send.c: In function 'ompi_mtl_mxm_isend':

mtl_mxm_send.c:183: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use in 
this function)

make[2]: *** [mtl_mxm_send.lo] Error 1

Our OFED is 1.5.3 and our MXM version is 1.0.601.

Thanks,

-Jeff

/**/

/* Jeff Konz  jeffrey.k...@hp.com */

/* Solutions Architect   HPC Benchmarking */

/* Americas Shared Solutions Architecture (SSA)   */

/* Hewlett-Packard Company*/

/* Office: 248-491-7480  Mobile: 248-345-6857 */

/**/







Re: [OMPI users] OpenMPI 1.6.3 and Memory Issues

2012-11-29 Thread Joseph Farran

Hi again.

I am using /etc/modprobe.d/mofed.conf, otherwise I get:

WARNING: Deprecated config file /etc/modprobe.conf, all config files belong 
into /etc/modprobe.d/

But I am still getting the memory errors after making the changes and rebooting:

$ cat /etc/modprobe.d/mofed.conf
options mlx4_core log_num_mtt=24
options mlx4_core log_mtts_per_seg=1

$ mpirun hello
--
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.



On 11/29/2012 04:39 PM, Yevgeny Kliteynik wrote:

You can also set these parameters in /etc/modprobe.conf:

   options mlx4_core log_num_mtt=24 log_mtts_per_seg=1

-- YK
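
To see what the two module options above would allow, here is a minimal
sketch of the formula given in the Open MPI FAQ entry linked from the
warning: max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * page_size. The
4 KiB page size is an assumption, and reg_mem_mib is a hypothetical helper
name. The pair (20, 0) is just one combination that reproduces the 4096 MiB
figure reported in the original post; whether those were the values actually
in effect is not known from the thread.

```shell
# Registerable memory in MiB for given mlx4_core parameters.
#   max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * page_size
reg_mem_mib() {
    log_num_mtt=$1
    log_mtts_per_seg=$2
    page_size=${3:-4096}   # assumed 4 KiB pages unless given
    echo $(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size / 1048576 ))
}

reg_mem_mib 20 0   # prints 4096
reg_mem_mib 24 1   # prints 131072
```

If the formula applies as written, log_num_mtt=24 with log_mtts_per_seg=1
gives 131072 MiB, which is still below the 258470 MiB of total memory
reported on this host, so the warning could persist even once the options
take effect.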





Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-11-29 Thread Joseph Farran

Hi YK:

Yes, I have those installed but they are newer versions:

# rpm -qa | grep rdma
librdmacm-1.0.15-1.x86_64
librdmacm-utils-1.0.15-1.x86_64
librdmacm-devel-1.0.15-1.x86_64
# locate librdmacm.la
#

Here are the RPMs the Mellanox build created for kernel: 
2.6.32-279.14.1.el6.x86_64

# ls *rdma*
librdmacm-1.0.15-1.i686.rpm    librdmacm-devel-1.0.15-1.i686.rpm    librdmacm-utils-1.0.15-1.i686.rpm
librdmacm-1.0.15-1.x86_64.rpm  librdmacm-devel-1.0.15-1.x86_64.rpm  librdmacm-utils-1.0.15-1.x86_64.rpm


On 11/29/2012 02:59 PM, Yevgeny Kliteynik wrote:

Joseph,

You're supposed to have librdmacm installed as part of MLNX_OFED installation.
What does "rpm -qa | grep rdma" tell?

   $ rpm -qa | grep rdma
   librdmacm-devel-1.0.14.1-1.x86_64
   librdmacm-utils-1.0.14.1-1.x86_64
   librdmacm-1.0.14.1-1.x86_64

   $ locate librdmacm.la
   /usr/local/mofed/1.5.3-4.0.9/lib/librdmacm.la

-- YK





[OMPI users] OpenMPI 1.6.3 and Memory Issues

2012-11-29 Thread Joseph Farran

Hi All.

After compiling a simple Hello World program with OpenMPI 1.6.3 and running 
it with mpirun, I get:

$ ulimit -l unlimited
$ mpirun -np 2 hello
--
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered.  You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:          hpc
  Registerable memory: 4096 MiB
  Total memory:        258470 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--
Hello World.   I am the Master Node (hpc) with Rank 0.
Hello World.   I am compute Node (hpc) with Rank 1
[hpc:08261] 1 more process has sent help message help-mpi-btl-openib.txt / reg 
mem limit low
[hpc:08261] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / 
error messages


I have my limits set up with:
cat /etc/security/limits.conf
* soft memlock unlimited
* hard memlock unlimited

What am I missing?

OS is CentOS 6.3.

Joseph
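
The FAQ entry linked in the warning relates registerable memory to two
mlx4_core module parameters, log_num_mtt and log_mtts_per_seg, via
max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * page_size. The sketch
below inverts that formula to find the smallest log_num_mtt for a given
amount of RAM. It assumes the FAQ's guidance that registerable memory should
be about twice physical memory, 4 KiB pages, and log_mtts_per_seg=1 by
default; min_log_num_mtt is a hypothetical helper name.

```shell
# Smallest log_num_mtt such that
#   2^log_num_mtt * 2^log_mtts_per_seg * page_size >= 2 * total memory
min_log_num_mtt() {
    total_mib=$1
    log_mtts_per_seg=${2:-1}   # assumed default for this sketch
    page_size=${3:-4096}       # assumed 4 KiB pages
    target=$(( 2 * total_mib * 1048576 ))   # twice physical memory, in bytes
    n=0
    while [ $(( (1 << n) * (1 << log_mtts_per_seg) * page_size )) -lt "$target" ]; do
        n=$(( n + 1 ))
    done
    echo "$n"
}

min_log_num_mtt 258470 1   # prints 26
```

For the 258470 MiB host above this gives 26, under the stated assumptions.
The ulimit and limits.conf settings shown are a separate requirement and do
not change this kernel-module limit.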


Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-11-29 Thread Joseph Farran

On 11/28/2012 10:53 AM, Mike Dubman wrote:

You need mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm


On Wed, Nov 28, 2012 at 7:44 PM, Joseph Farran <jfar...@uci.edu 
<mailto:jfar...@uci.edu>> wrote:

mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm




After installing MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64, removing the old 
mxm and installing the correct mxm, I still cannot get OpenMPI to compile.

rpm -qa|fgrep mxm
mxm-1.1.3a5e745-1.x86_64

My compile of openmpi 1.6.3 with the following configs fails:

--with-sge \
--with-openib \
--enable-openib-connectx-xrc \
--enable-mpi-thread-multiple \
--with-threads \
--with-hwloc \
--enable-heterogeneous \
--with-fca=/opt/mellanox/fca \
--with-mxm=/opt/mellanox/mxm \
--with-mxm-libdir=/opt/mellanox/mxm/lib \


make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_cancel.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_probe.lo
  CC mtl_mxm_recv.lo
  CC mtl_mxm_send.lo
  CCLD   mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la: No such file or 
directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la: No 
such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.3-3.0.0/lib/librdmacm.la' is not a 
valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1


Nowhere on my system can I find "librdmacm.la". What am I missing?

Joseph




Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-11-28 Thread Joseph Farran

Question on the version to use.  The tar file contains the following RPMs:

# ls | grep -v debug
mxm-1.1.3a5e745-1.x86_64-centos5u6.rpm
mxm-1.1.3a5e745-1.x86_64-centos5u7.rpm
mxm-1.1.3a5e745-1.x86_64-centos6u0.rpm
mxm-1.1.3a5e745-1.x86_64-rhel5u5.rpm
mxm-1.1.3a5e745-1.x86_64-rhel6u1.rpm
mxm-1.1.3a5e745-1.x86_64-rhel6u2.rpm
mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm
mxm-1.1.3a5e745-1.x86_64-sles10sp4.rpm
mxm-1.1.3a5e745-1.x86_64-sles11sp1.rpm
mxm-1.1.3a5e745-1.x86_64-sles11sp2.rpm

For CentOS 6.3, which one do I use?

Will mxm-1.1.3a5e745-1.x86_64-centos6u0.rpm work, or do I need update 3 ( 
mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm )?

Joseph
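
A later reply in this thread (2012-11-28, Mike Dubman) names the rhel6u3
package for this CentOS 6.3 system. Assuming that CentOS x.y can generally
be treated as RHEL x.y for picking the MXM package, which the thread confirms
only for 6.3, the suffix can be derived from the contents of
/etc/redhat-release. mxm_rpm_suffix is a hypothetical helper name.

```shell
# Map a redhat-release string to the "rhelXuY" suffix used in the RPM names.
# Assumption: CentOS X.Y uses the RHEL X.Y package (confirmed here only for 6.3).
mxm_rpm_suffix() {
    release=$1
    printf '%s\n' "$release" \
        | sed -n 's/.*release \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/rhel\1u\2/p'
}
```

For example, mxm_rpm_suffix "$(cat /etc/redhat-release)" on a host whose
release file reads "CentOS release 6.3 (Final)" yields rhel6u3, matching
mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm in the listing above.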


On 11/28/2012 06:29 AM, Yevgeny Kliteynik wrote:

On 11/28/2012 10:52 AM, Pavel Mezentsev wrote:

You can try downloading and installing a fresher version of MXM from mellanox 
web site. There was a thread on the list with the same problem, you can search 
for it.

Indeed, that OFED version comes with older version of MXM.
You can get the newer version here: http://www.mellanox.com/products/mxm

-- YK



2012/11/28 Joseph Farran<jfar...@uci.edu<mailto:jfar...@uci.edu>>

 Howdy.

 I have a stock CentOS 6.3 OS and a Mellanox MT26428 card.

 I installed the Mellanox OFED MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64 
which installed just fine.   Rebooted the system and when I try building 
OpenMPI 1.6.3, it aborts with:

 mtl_mxm_send.c: In function 'ompi_mtl_mxm_isend':
 mtl_mxm_send.c:183: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use 
in this function)
 make[2]: *** [mtl_mxm_send.lo] Error 1
 make[2]: *** Waiting for unfinished jobs
 make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
 make[1]: *** [all-recursive] Error 1
 make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
 make: *** [all-recursive] Error 1


 My configure options are:

  --with-sge \
  --with-threads \
  --with-hwloc \
  --with-openib \
  --enable-mpi-thread-multiple \
  --with-mxm=/opt/mellanox/mxm \
  --with-mxm-libdir=/opt/mellanox/mxm/lib \
  --with-fca=/opt/mellanox/fca \
  --enable-heterogeneous \
  --enable-openib-connectx-xrc \


 Has anyone been able to compile OpenMPI 1.6.3 with the Mellanox OFED on 
CentOS 6.3?

 Joseph




Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-11-28 Thread Joseph Farran

Perfect and Thanks!

I had searched the Mellanox web site for the mxm package to no avail.

I will try recompiling later today.

Best,
Joseph


On 11/28/2012 06:29 AM, Yevgeny Kliteynik wrote:

On 11/28/2012 10:52 AM, Pavel Mezentsev wrote:

You can try downloading and installing a fresher version of MXM from mellanox 
web site. There was a thread on the list with the same problem, you can search 
for it.

Indeed, that OFED version comes with older version of MXM.
You can get the newer version here: http://www.mellanox.com/products/mxm

-- YK



2012/11/28 Joseph Farran<jfar...@uci.edu<mailto:jfar...@uci.edu>>

 Howdy.

 I have a stock CentOS 6.3 OS and a Mellanox MT26428 card.

 I installed the Mellanox OFED MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64 
which installed just fine.   Rebooted the system and when I try building 
OpenMPI 1.6.3, it aborts with:

 mtl_mxm_send.c: In function 'ompi_mtl_mxm_isend':
 mtl_mxm_send.c:183: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use 
in this function)
 make[2]: *** [mtl_mxm_send.lo] Error 1
 make[2]: *** Waiting for unfinished jobs
 make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
 make[1]: *** [all-recursive] Error 1
 make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
 make: *** [all-recursive] Error 1


 My configure options are:

  --with-sge \
  --with-threads \
  --with-hwloc \
  --with-openib \
  --enable-mpi-thread-multiple \
  --with-mxm=/opt/mellanox/mxm \
  --with-mxm-libdir=/opt/mellanox/mxm/lib \
  --with-fca=/opt/mellanox/fca \
  --enable-heterogeneous \
  --enable-openib-connectx-xrc \


 Has anyone been able to compile OpenMPI 1.6.3 with the Mellanox OFED on 
CentOS 6.3?

 Joseph




[OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-11-27 Thread Joseph Farran

Howdy.

I have a stock CentOS 6.3 OS and a Mellanox MT26428 card.

I installed the Mellanox OFED MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64 which 
installed just fine.   Rebooted the system and when I try building OpenMPI 
1.6.3, it aborts with:

mtl_mxm_send.c: In function 'ompi_mtl_mxm_isend':
mtl_mxm_send.c:183: error: 'MXM_REQ_FLAG_SEND_SYNC' undeclared (first use in 
this function)
make[2]: *** [mtl_mxm_send.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1


My configure options are:

--with-sge \
--with-threads \
--with-hwloc \
--with-openib \
--enable-mpi-thread-multiple \
--with-mxm=/opt/mellanox/mxm \
--with-mxm-libdir=/opt/mellanox/mxm/lib \
--with-fca=/opt/mellanox/fca \
--enable-heterogeneous \
--enable-openib-connectx-xrc \


Has anyone been able to compile OpenMPI 1.6.3 with the Mellanox OFED on CentOS 
6.3?

Joseph
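
The "undeclared" error above suggests the installed MXM headers are older
than what Open MPI 1.6.3 expects, which later replies in this archive
confirm for the MXM bundled with this OFED release. A quick pre-build check,
sketched here, is to search the MXM include directory for the symbols named
in the compile errors quoted in this archive. The include path in the usage
note is an assumption based on the --with-mxm prefix, and
missing_mxm_symbols is a hypothetical helper name.

```shell
# Print the symbols from the compile errors that are absent from the headers
# under the given include directory.
missing_mxm_symbols() {
    incdir=$1
    for sym in mxm_wait_t MXM_REQ_FLAG_BLOCKING MXM_REQ_FLAG_SEND_SYNC; do
        # -r: search recursively, -q: quiet, -w: match whole words only
        grep -rqw "$sym" "$incdir" 2>/dev/null || printf '%s\n' "$sym"
    done
}
```

Running missing_mxm_symbols /opt/mellanox/mxm/include and getting any output
would indicate an MXM too old for this Open MPI release, in which case
upgrading MXM is the route the replies recommend.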