Re: [OMPI devel] 1.8.2rc1 available for test

2014-07-11 Thread Ralph Castain
Sorry - gave the link to my test page :-)

http://www.open-mpi.org/software/ompi/v1.8/

On Jul 11, 2014, at 11:46 AM, Ralph Castain  wrote:

> Hi folks
> 
> I've posted the first release candidate for 1.8.2 - please test!!
> 
> http://localhost/~rhc/ompi-www/software/ompi/v1.8/
> 
> Ralph
> 



Re: [OMPI devel] hwloc and pmi

2014-07-11 Thread Ralph Castain
It's probably being picked up from the PMI check and being added to the 
cppflags for components that call that .m4 (e.g., common/pmi). You might print 
out the cppflags being created in that script and see if that's the case.

The slurm check shouldn't be throwing anything into the global cppflags, and I
don't think common/pmi calls the slurm .m4 check - at least, it probably
shouldn't.

On Jul 11, 2014, at 11:57 AM, Mike Dubman  wrote:

> I think the problem is related to the new version of SLURM, which was upgraded
> on our machines.
> We had 2.6.6; now it is 14.03.4-2.
> 
> $make V=1
> /bin/sh ../../../../libtool  --tag=CC   --mode=link gcc -std=gnu99  -DNDEBUG 
> -O3 -g -finline-functions -fno-strict-aliasing -pthread  -version-info 2:2:1 
> -export-dynamic   -o libmca_common_pmi.la -rpath 
> /hpc/scrap/mtt/scratch/shmem/20140711_210002_6937_8974_sputnik7.vbench.com/installs/vyng/install/lib
>  common_pmi.lo -lpmi2 -lpmi  -Wl,-rpath= -lrt -lnsl  -lutil -lm
> libtool: link: gcc -std=gnu99 -shared  -fPIC -DPIC  .libs/common_pmi.o   
> /usr/lib64/libpmi2.so /usr/lib64/libpmi.so -L/usr/lib64 
> /usr/lib64/libslurm.so -ldl -lhwloc -lpthread -lrt -lnsl -lutil -lm  -O3 
> -pthread -Wl,-rpath=   -pthread -Wl,-soname -Wl,libmca_common_pmi.so.1 -o 
> .libs/libmca_common_pmi.so.1.1.2
> /usr/bin/ld: cannot find -lhwloc
> collect2: ld returned 1 exit status
> make: *** [libmca_common_pmi.la] Error 1
> 
> The Makefile in opal/mca/common/pmi/Makefile has no references to "-lhwloc",
> so it comes in as a dependency from outside.
> Does that make sense?
> 
> 
> This is the configure line used:
>   $ ./configure --with-platform=contrib/platform/mellanox/optimized 
> --with-fca=/opt/mellanox/fca 
> --with-mxm=/hpc/local/benchmarks/hpc-stack-gcc/install/mxm --enable-oshmem 
> --with-slurm --with-pmi --with-oshmem-param-check 
> --with-knem=/opt/knem-1.1.1.90mlnx 
> --prefix=/hpc/scrap/mtt/scratch/shmem/20140711_210002_6937_8974_sputnik7.vbench.com/installs/vyng/install
> 
> 
> $ldd /usr/lib64/libpmi.so
> linux-vdso.so.1 =>  (0x77ffe000)
> libslurm.so.27 => /usr/lib64/libslurm.so.27 (0x77ac6000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x77897000)
> libc.so.6 => /lib64/libc.so.6 (0x77504000)
> libdl.so.2 => /lib64/libdl.so.2 (0x7730)
> libhwloc.so.5 => /usr/lib64/libhwloc.so.5 (0x770d7000)
> /lib64/ld-linux-x86-64.so.2 (0x003d9de0)
> libm.so.6 => /lib64/libm.so.6 (0x76e53000)
> libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x76c4a000)
> libpci.so.3 => /lib64/libpci.so.3 (0x76a3d000)
> libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x766eb000)
> libresolv.so.2 => /lib64/libresolv.so.2 (0x764d1000)
> libz.so.1 => /lib64/libz.so.1 (0x762ba000)
> mtt@sputnik7 
> /hpc/scrap/mtt/scratch/shmem/20140711_210002_6937_8974_sputnik7.vbench.com/mpi-install/NoDd/src/ompi-vendor.git
> $ldd /usr/lib64/libslurm.so
> linux-vdso.so.1 =>  (0x77ffe000)
> libdl.so.2 => /lib64/libdl.so.2 (0x77ab6000)
> libhwloc.so.5 => /usr/lib64/libhwloc.so.5 (0x7788d000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7767)
> libc.so.6 => /lib64/libc.so.6 (0x772dd000)
> /lib64/ld-linux-x86-64.so.2 (0x003d9de0)
> libm.so.6 => /lib64/libm.so.6 (0x77058000)
> libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x76e4f000)
> libpci.so.3 => /lib64/libpci.so.3 (0x76c43000)
> libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x768f)
> libresolv.so.2 => /lib64/libresolv.so.2 (0x766d6000)
> libz.so.1 => /lib64/libz.so.1 (0x764c)
> 
> 
> 
> 
> 
> 
> On Thu, Jul 10, 2014 at 6:53 PM, Nathan Hjelm  wrote:
> Nope, just added a missing file to the tarball.
> 
> -Nathan
> 
> On Thu, Jul 10, 2014 at 06:54:19AM -0700, Ralph Castain wrote:
> >IIRC, I thought I saw a change to that makefile.am flow thru yesterday?
> >Could be there was an error in it
> >On Jul 10, 2014, at 5:26 AM, Jeff Squyres (jsquyres) 
> >wrote:
> >
> >  Shouldn't be - PMI should be linking against the internal hwloc.
> >  I'm AFK and can't look at this. Have a look at other components that
> >  use hwloc and copy their header file setup and Makefile.am setup.
> >
> >  Sent from my phone. No type good.
> >  On Jul 10, 2014, at 8:22 AM, "Mike Dubman" 
> >  wrote:
> >
> >Hi guys,
> >jenkins node failing on this.
> >is hwloc-devel now required to be available as part of distro?
> >Thanks
> >M
> >
> >  15:14:11 make[3]: Leaving directory 
> > `/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/opal'
> >  15:14:11 make[2]: Leaving directory 
> > `/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/opal'
> >  15:14:11 Making install in 

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-11 Thread Pritchard, Howard P
Hi Folks,

No work is planned for the uGNI BTL at this time either.

Howard


-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Thursday, July 10, 2014 5:04 PM
To: Open MPI Developers List
Subject: Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure 
in OPAL

FWIW: I can't speak for other BTL maintainers, but I'm out of the office for 
the next week, and the usnic BTL will be standing still during that time.  Once 
I return, I will be making additional changes in the usnic BTL (new features, 
updates, ...etc.).

So if you have the cycles, doing it in the next week or so would be good 
because at least there will be no conflicts with usnic BTL concurrent 
development.  :-)




On Jul 10, 2014, at 2:56 PM, Ralph Castain  wrote:

> George: any update on when this will happen?
> 
> 
> On Jun 4, 2014, at 9:14 PM, George Bosilca  wrote:
> 
>> WHAT: Open our low-level communication infrastructure by moving all
>> necessary components (btl/rcache/allocator/mpool) down into OPAL
>> 
>> WHY: All the components required for inter-process communications are 
>> currently deeply integrated in the OMPI
>> layer. Several groups/institutions have expressed interest 
>> in having a more generic communication
>> infrastructure, without all the OMPI layer dependencies.
>> This communication layer should be made
>> available at a different software level, available to all 
>> layers in the Open MPI software stack. As an
>> example, our ORTE layer could replace the current OOB and 
>> instead use the BTL directly, gaining
>> access to more reactive network interfaces than TCP.
>> Similarly, external software libraries could take
>> advantage of our highly optimized AM (active message) 
>> communication layer for their own purpose.
>> 
>> UTK, with support from Sandia, developed a version of 
>> Open MPI where the entire communication
>> infrastructure has been moved down to OPAL 
>> (btl/rcache/allocator/mpool). Most of the moved
>> components have been updated to match the new schema, 
>> with few exceptions (mainly BTLs
>> where I have no way of compiling/testing them). Thus, the 
>> completion of this RFC is tied to
>> being able to complete this move for all BTLs. For this 
>> we need help from the rest of the Open MPI
>> community, especially those supporting some of the BTLs.
>> A non-exhaustive list of BTLs that
>> qualify here is: mx, portals4, scif, udapl, ugni, usnic.
>> 
>> WHERE:  bitbucket.org/bosilca/ompi-btl (updated today with respect to 
>> trunk r31952)
>> 
>> TIMEOUT: After all the BTLs have been amended to match the new 
>> location and usage. We will discuss
>> the last bits regarding this RFC at the Open MPI 
>> developers meeting in Chicago, June 24-26. The
>> RFC will become final only after the meeting.
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/06/14974.php
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15100.php


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/07/15104.php


Re: [OMPI devel] hwloc and pmi

2014-07-11 Thread Mike Dubman
I think the problem is related to the new version of SLURM, which was
upgraded on our machines.
We had 2.6.6; now it is 14.03.4-2.

$make V=1
/bin/sh ../../../../libtool  --tag=CC   --mode=link gcc -std=gnu99
 -DNDEBUG -O3 -g -finline-functions -fno-strict-aliasing -pthread
 -version-info 2:2:1 -export-dynamic   -o libmca_common_pmi.la -rpath
/hpc/scrap/mtt/scratch/shmem/
20140711_210002_6937_8974_sputnik7.vbench.com/installs/vyng/install/lib
common_pmi.lo -lpmi2 -lpmi  -Wl,-rpath= -lrt -lnsl  -lutil -lm
libtool: link: gcc -std=gnu99 -shared  -fPIC -DPIC  .libs/common_pmi.o
/usr/lib64/libpmi2.so /usr/lib64/libpmi.so -L/usr/lib64
/usr/lib64/libslurm.so -ldl -lhwloc -lpthread -lrt -lnsl -lutil -lm  -O3
-pthread -Wl,-rpath=   -pthread -Wl,-soname -Wl,libmca_common_pmi.so.1 -o
.libs/libmca_common_pmi.so.1.1.2
/usr/bin/ld: cannot find -lhwloc
collect2: ld returned 1 exit status
make: *** [libmca_common_pmi.la] Error 1

The Makefile in opal/mca/common/pmi/Makefile has no references to
"-lhwloc", so it comes in as a dependency from outside.
Does that make sense?


This is the configure line used:
  $ ./configure --with-platform=contrib/platform/mellanox/optimized
--with-fca=/opt/mellanox/fca
--with-mxm=/hpc/local/benchmarks/hpc-stack-gcc/install/mxm --enable-oshmem
--with-slurm --with-pmi --with-oshmem-param-check
--with-knem=/opt/knem-1.1.1.90mlnx --prefix=/hpc/scrap/mtt/scratch/shmem/
20140711_210002_6937_8974_sputnik7.vbench.com/installs/vyng/install


$ldd /usr/lib64/libpmi.so
linux-vdso.so.1 =>  (0x77ffe000)
libslurm.so.27 => /usr/lib64/libslurm.so.27 (0x77ac6000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x77897000)
libc.so.6 => /lib64/libc.so.6 (0x77504000)
libdl.so.2 => /lib64/libdl.so.2 (0x7730)
libhwloc.so.5 => /usr/lib64/libhwloc.so.5 (0x770d7000)
/lib64/ld-linux-x86-64.so.2 (0x003d9de0)
libm.so.6 => /lib64/libm.so.6 (0x76e53000)
libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x76c4a000)
libpci.so.3 => /lib64/libpci.so.3 (0x76a3d000)
libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x766eb000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x764d1000)
libz.so.1 => /lib64/libz.so.1 (0x762ba000)
mtt@sputnik7 /hpc/scrap/mtt/scratch/shmem/
20140711_210002_6937_8974_sputnik7.vbench.com/mpi-install/NoDd/src/ompi-vendor.git
$ldd /usr/lib64/libslurm.so
linux-vdso.so.1 =>  (0x77ffe000)
libdl.so.2 => /lib64/libdl.so.2 (0x77ab6000)
libhwloc.so.5 => /usr/lib64/libhwloc.so.5 (0x7788d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7767)
libc.so.6 => /lib64/libc.so.6 (0x772dd000)
/lib64/ld-linux-x86-64.so.2 (0x003d9de0)
libm.so.6 => /lib64/libm.so.6 (0x77058000)
libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x76e4f000)
libpci.so.3 => /lib64/libpci.so.3 (0x76c43000)
libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x768f)
libresolv.so.2 => /lib64/libresolv.so.2 (0x766d6000)
libz.so.1 => /lib64/libz.so.1 (0x764c)






On Thu, Jul 10, 2014 at 6:53 PM, Nathan Hjelm  wrote:

> Nope, just added a missing file to the tarball.
>
> -Nathan
>
> On Thu, Jul 10, 2014 at 06:54:19AM -0700, Ralph Castain wrote:
> >IIRC, I thought I saw a change to that makefile.am flow thru
> yesterday?
> >Could be there was an error in it
> >On Jul 10, 2014, at 5:26 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com>
> >wrote:
> >
> >  Shouldn't be - PMI should be linking against the internal hwloc.
> >  I'm AFK and can't look at this. Have a look at other components that
> >  use hwloc and copy their header file setup and Makefile.am setup.
> >
> >  Sent from my phone. No type good.
> >  On Jul 10, 2014, at 8:22 AM, "Mike Dubman" <
> mi...@dev.mellanox.co.il>
> >  wrote:
> >
> >Hi guys,
> >jenkins node failing on this.
> >is hwloc-devel now required to be available as part of distro?
> >Thanks
> >M
> >
> >  15:14:11 make[3]: Leaving directory
> `/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/opal'
> >  15:14:11 make[2]: Leaving directory
> `/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/opal'
> >  15:14:11 Making install in mca/common/pmi
> >  15:14:11 make[2]: Entering directory
> `/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/opal/mca/common/pmi'
> >  15:14:11   CC   common_pmi.lo
> >  15:14:11   CCLD libmca_common_pmi.la
> >  15:14:11 /usr/bin/ld: cannot find -lhwloc
> >  15:14:11 collect2: ld returned 1 exit status
> >  15:14:11 make[2]: *** [libmca_common_pmi.la] Error 1
> >  15:14:11 make[2]: Leaving directory
> `/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/opal/mca/common/pmi'
> >
> >

[OMPI devel] 1.8.2rc1 available for test

2014-07-11 Thread Ralph Castain
Hi folks

I've posted the first release candidate for 1.8.2 - please test!!

http://localhost/~rhc/ompi-www/software/ompi/v1.8/

Ralph



Re: [OMPI devel] trunk and fortran errors

2014-07-11 Thread Ralph Castain
I confirm it also works on CentOS 6 under the Intel 14.0.3 and gcc 4.4.7 
compilers, and on Mac under the gcc 4.7.3 compilers.


On Jul 11, 2014, at 10:25 AM, Jeff Squyres (jsquyres)  
wrote:

> Thanks!
> 
> Sent from my phone. No type good. 
> 
>> On Jul 11, 2014, at 1:16 AM, "Gilles Gouaillardet" 
>>  wrote:
>> 
>> Thanks Jeff,
>> 
>> I confirm the problem is fixed on CentOS 5.
>> 
>> I committed r32215 because some files were missing from the
>> tarball/nightly snapshot/make dist.
>> 
>> Cheers,
>> 
>> Gilles
>> 
>>> On 2014/07/11 4:21, Jeff Squyres (jsquyres) wrote:
>>> As of r32204, this should be fixed.  Please let me know if it now works for 
>>> you.
>> 
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15105.php
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15107.php



Re: [OMPI devel] trunk and fortran errors

2014-07-11 Thread Jeff Squyres (jsquyres)
Thanks!

Sent from my phone. No type good. 

> On Jul 11, 2014, at 1:16 AM, "Gilles Gouaillardet" 
>  wrote:
> 
> Thanks Jeff,
> 
> I confirm the problem is fixed on CentOS 5.
> 
> I committed r32215 because some files were missing from the
> tarball/nightly snapshot/make dist.
> 
> Cheers,
> 
> Gilles
> 
>> On 2014/07/11 4:21, Jeff Squyres (jsquyres) wrote:
>> As of r32204, this should be fixed.  Please let me know if it now works for 
>> you.
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15105.php


[OMPI devel] btl_openib_receive_queues mca param not always taken into account

2014-07-11 Thread Nadia Derbey

Hi,

I noticed that specifying the receive_queues through an mca param (-mca 
btl_openib_receive_queues ) doesn't always override the 
mca-btl-openib-device-params.ini setting.


If for whatever reason we want to bypass the 
mca-btl-openib-device-params.ini file setting for the receive_queues, we 
should be able to specify a value through an mca param.
But if the string provided in the mca param is the same as the default 
one (default_qps in btl_openib_register_mca_params()), this does not 
work: we still get the receive_queues from the .ini file.


This is due to the way 
mca_btl_openib_component.receive_queues_source (which records where the 
receive_queues value came from) is computed:


1) in btl_openib_register_mca_params() we register
   btl_openib_receive_queues, providing default_qps as a default value.
2) mca_btl_openib_component.receive_queues_source is set to
   BTL_OPENIB_RQ_SOURCE_MCA only if the registered string is different
   from default_qps (if both strings are equal, the source is set to
   BTL_OPENIB_RQ_SOURCE_DEFAULT).
3) then, in init_one_device(),
   mca_btl_openib_component.receive_queues_source is checked:
    . if its value is BTL_OPENIB_RQ_SOURCE_MCA, we bypass any other
      setting (this is the behaviour I expected)
    . otherwise, we go on, getting the .ini file settings (this is the
      behaviour I got)


I wanted to know whether this behaviour is intentional and, if so, the reason for it.
If it is not, the attached trivial patch fixes it.

Regards,

--
Nadia Derbey

# HG changeset patch
# Parent 4cb09323aca44faec7d027586ffa94e7d9681989
btl/openib: when specifying the receive_queues as an mca param to bypass the XRC settings, the XRC settings in the .ini file are taken into account nevertheless if we use the default QPs value

diff -r 4cb09323aca4 ompi/mca/btl/openib/btl_openib_component.c
--- a/ompi/mca/btl/openib/btl_openib_component.c	Fri Jul 11 05:05:19 2014 +
+++ b/ompi/mca/btl/openib/btl_openib_component.c	Fri Jul 11 11:46:56 2014 +0200
@@ -268,6 +268,17 @@ static int btl_openib_component_close(vo
     ompi_btl_openib_fd_finalize();
     ompi_btl_openib_ini_finalize();

+    if (NULL != mca_btl_openib_component.receive_queues
+        && BTL_OPENIB_RQ_SOURCE_DEFAULT ==
+            mca_btl_openib_component.receive_queues_source) {
+        /*
+         * In that case, the string has not been duplicated during variable
+         * registration. So it won't be freed by the mca_base_var system.
+         * Free it here.
+         */
+        free(mca_btl_openib_component.receive_queues);
+    }
+
     if (NULL != mca_btl_openib_component.default_recv_qps) {
         free(mca_btl_openib_component.default_recv_qps);
     }
diff -r 4cb09323aca4 ompi/mca/btl/openib/btl_openib_mca.c
--- a/ompi/mca/btl/openib/btl_openib_mca.c	Fri Jul 11 05:05:19 2014 +
+++ b/ompi/mca/btl/openib/btl_openib_mca.c	Fri Jul 11 11:46:56 2014 +0200
@@ -661,12 +661,14 @@ int btl_openib_register_mca_params(void)
     mca_btl_openib_component.default_recv_qps = default_qps;
     CHECK(reg_string("receive_queues", NULL,
                      "Colon-delimited, comma-delimited list of receive queues: P,4096,8,6,4:P,32768,8,6,4",
-                     default_qps, &mca_btl_openib_component.receive_queues,
+                     NULL, &mca_btl_openib_component.receive_queues,
                      0));
-    mca_btl_openib_component.receive_queues_source =
-        (0 == strcmp(default_qps,
-                     mca_btl_openib_component.receive_queues)) ?
-        BTL_OPENIB_RQ_SOURCE_DEFAULT : BTL_OPENIB_RQ_SOURCE_MCA;
+    if (NULL == mca_btl_openib_component.receive_queues) {
+        mca_btl_openib_component.receive_queues = strdup(default_qps);
+        mca_btl_openib_component.receive_queues_source = BTL_OPENIB_RQ_SOURCE_DEFAULT;
+    } else {
+        mca_btl_openib_component.receive_queues_source = BTL_OPENIB_RQ_SOURCE_MCA;
+    }

     CHECK(reg_string("if_include", NULL,
                      "Comma-delimited list of devices/ports to be used (e.g. \"mthca0,mthca1:2\"; empty value means to use all ports found).  Mutually exclusive with btl_openib_if_exclude.",


Re: [OMPI devel] trunk and fortran errors

2014-07-11 Thread Gilles Gouaillardet
Thanks Jeff,

I confirm the problem is fixed on CentOS 5.

I committed r32215 because some files were missing from the
tarball/nightly snapshot/make dist.

Cheers,

Gilles

On 2014/07/11 4:21, Jeff Squyres (jsquyres) wrote:
> As of r32204, this should be fixed.  Please let me know if it now works for 
> you.