Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

2013-09-06 Thread Christopher Samuel

On 06/09/13 00:23, Hjelm, Nathan T wrote:

> I assume that process binding is enabled for both mpirun and srun?
> If not that could account for a difference between the runtimes.

You raise an interesting point; we have been doing that with:

[samuel@barcoo ~]$ module show openmpi 2>&1 | grep binding
setenv   OMPI_MCA_orte_process_binding core

However, modifying the test program confirms that the variable is
getting propagated as expected with both mpirun and srun for 1.6.5
and the 1.7 snapshot. :-(
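
The check added to the test program amounts to a getenv() per rank,
roughly along these lines (a sketch, not the exact code used):

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    /* Illustrative sketch: print the binding MCA variable each rank
     * sees, to confirm the launcher propagates the environment. */
    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        const char *bind = getenv("OMPI_MCA_orte_process_binding");
        printf("rank %d: OMPI_MCA_orte_process_binding=%s\n",
               rank, bind ? bind : "NULL");
        MPI_Finalize();
        return 0;
    }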

cheers,
Chris
-- 
 Christopher Samuel           Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au     Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/         http://twitter.com/vlsci



Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

2013-09-06 Thread Christopher Samuel

On 06/09/13 14:14, Christopher Samuel wrote:

> However, modifying the test program confirms that the variable is
> getting propagated as expected with both mpirun and srun for 1.6.5
> and the 1.7 snapshot. :-(

Investigating further by setting:

export OMPI_MCA_orte_report_bindings=1
export SLURM_CPU_BIND=core
export SLURM_CPU_BIND_VERBOSE=verbose

reveals that only OMPI 1.6.5 with mpirun reports bindings being set
(see below). We cannot understand why Slurm does not appear to be
setting bindings, as we are using the correct settings according to
the documentation.

Whilst this may explain the difference between 1.6.5 mpirun and srun,
it doesn't explain why the 1.7 snapshot fares so much better, as you'd
expect both to be hurt in the same way.


==OPENMPI 1.6.5==
==mpirun==
[barcoo003:03633] System has detected external process binding to cores 0001
[barcoo003:03633] MCW rank 0 bound to socket 0[core 0]: [B]
[barcoo004:04504] MCW rank 1 bound to socket 0[core 0]: [B]
Hello, World, I am 0 of 2 on host barcoo003 from app number 0 universe size 2 universe envar 2
Hello, World, I am 1 of 2 on host barcoo004 from app number 0 universe size 2 universe envar 2
==srun==
Hello, World, I am 0 of 2 on host barcoo003 from app number 1 universe size 2 universe envar NULL
Hello, World, I am 1 of 2 on host barcoo004 from app number 1 universe size 2 universe envar NULL
=
==OPENMPI 1.7.3==
DANGER: YOU ARE LOADING A TEST VERSION OF OPENMPI. THIS MAY BE BAD.
==mpirun==
Hello, World, I am 0 of 2 on host barcoo003 from app number 0 universe size 2 universe envar 2
Hello, World, I am 1 of 2 on host barcoo004 from app number 0 universe size 2 universe envar 2
==srun==
Hello, World, I am 0 of 2 on host barcoo003 from app number 0 universe size 2 universe envar NULL
Hello, World, I am 1 of 2 on host barcoo004 from app number 0 universe size 2 universe envar NULL
=
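
One can also ask the kernel directly what a rank is bound to,
independent of what either launcher reports. A Linux-only sketch
(not part of the original test program) using sched_getaffinity():

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Print the kernel-level CPU affinity mask, which reflects the
     * actual binding regardless of what mpirun/srun report. */
    int main(void)
    {
        cpu_set_t mask;
        if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_getaffinity");
            return 1;
        }
        printf("pid %d bound to cpus:", (int)getpid());
        for (int i = 0; i < CPU_SETSIZE; i++)
            if (CPU_ISSET(i, &mask))
                printf(" %d", i);
        printf("\n");
        return 0;
    }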



-- 
 Christopher Samuel           Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au     Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/         http://twitter.com/vlsci



[OMPI devel] (no subject)

2013-09-06 Thread Alex Margolin
Hi,

I'm building ompi r29104 with the following command:

make distclean && ./autogen.sh && ./configure
--prefix=/cs/mosna/alexam02/ompi CFLAGS=-m64 CXXFLAGS=-m64 --without-hwloc
--disable-mpi-threads --disable-progress-threads
--enable-mca-no-build=maffinity,paffinity
--enable-contrib-no-build=libnbc,vt && make && make install

When I build and run any MPI app, I'm getting the following error (and the
app fails):

mpirun: Symbol `orte_process_info' has different size in shared object, consider re-linking
mpirun: Symbol `orte_plm' has different size in shared object, consider re-linking
mpirun: symbol lookup error: mpirun: undefined symbol: orte_trigger_event_t_class

Has anybody ever stumbled on this or something similar in the past?

Thanks,
Alex


Re: [OMPI devel] (no subject)

2013-09-06 Thread Alex Margolin
Sorry for the title and the html... the send button got pressed too early.

Anyway, I tried to build OMPI without threads at all with the following
command:

./configure --prefix=/cs/mosna/alexam02/ompi CFLAGS=-m64 CXXFLAGS=-m64
--without-threads --without-hwloc --enable-mca-no-build=maffinity,paffinity
--enable-contrib-no-build=libnbc,vt

Sadly, the build failed very early:

  CC runtime/opal_info_support.lo
runtime/opal_info_support.c: In function 'opal_info_do_params':
runtime/opal_info_support.c:444:9: error: 'errno' undeclared (first use in this function)
runtime/opal_info_support.c:444:9: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [runtime/opal_info_support.lo] Error 1
make[2]: Leaving directory `/a/store-04/h/lab/mosix/alexam02/ompi-jeff/opal'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/a/store-04/h/lab/mosix/alexam02/ompi-jeff/opal'
make: *** [all-recursive] Error 1

Should this be a trac ticket?

Alex


Re: [OMPI devel] (no subject)

2013-09-06 Thread Jeff Squyres (jsquyres)
On Sep 6, 2013, at 8:06 AM, Alex Margolin  wrote:

> Sorry for the title and the html... the send button got pressed too early.
> 
> Anyway, I tried to build OMPI without threads at all with the following 
> command:
> 
> ./configure --prefix=/cs/mosna/alexam02/ompi CFLAGS=-m64 CXXFLAGS=-m64 
> --without-threads --without-hwloc --enable-mca-no-build=maffinity,paffinity 
> --enable-contrib-no-build=libnbc,vt

FWIW, there's no maffinity/paffinity any more.  And you can just --disable-vt.

> Sadly, the build failed very early:
> 
>   CC runtime/opal_info_support.lo
> runtime/opal_info_support.c: In function 'opal_info_do_params':
> runtime/opal_info_support.c:444:9: error: 'errno' undeclared (first use in this function)
> runtime/opal_info_support.c:444:9: note: each undeclared identifier is reported only once for each function it appears in
> make[2]: *** [runtime/opal_info_support.lo] Error 1
> make[2]: Leaving directory `/a/store-04/h/lab/mosix/alexam02/ompi-jeff/opal'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/a/store-04/h/lab/mosix/alexam02/ompi-jeff/opal'
> make: *** [all-recursive] Error 1
> 
> Should this be a trac ticket?

Seems like it should be an easy fix (e.g., a missing header file?) -- can you 
submit a patch?
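
FWIW, this failure mode usually just means the file uses errno without
including <errno.h>. A minimal standalone sketch of the error and the
presumed one-line fix (the actual patch to opal_info_support.c may
differ):

    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>  /* presumed fix: without this include, gcc
                         * reports "'errno' undeclared" at first use */

    int main(void)
    {
        errno = 0;
        long v = strtol("99999999999999999999", NULL, 10);
        if (errno != 0)          /* ERANGE for the out-of-range input */
            perror("strtol");
        printf("parsed: %ld\n", v);
        return 0;
    }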

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/