Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-30 Thread Jeff Squyres (jsquyres)
On Nov 24, 2015, at 9:31 AM, Dave Love  wrote:
> 
>> btw, we already use the force, thanks to the ob1 pml and the yoda spml
> 
> I think that's assuming familiarity with something which leaves out some
> people...

FWIW, I agree: we use unhelpful names for components in Open MPI.  What Gilles 
is specifically referring to here is that there are several Star Wars-based 
names of plugins in Open MPI.  They mean something to us developers (they 
started off as a funny joke), but they mean little/nothing to end users.

I actually specifically called out this issue in the SC'15 Open MPI BOF:

http://image.slidesharecdn.com/ompi-bof-2015-for-web-151130155610-lva1-app6891/95/open-mpi-sc15-state-of-the-union-bof-28-638.jpg?cb=1448898995

This is definitely an issue that is on the agenda for the face-to-face Open MPI 
developer's meeting in February 
(https://github.com/open-mpi/ompi/wiki/Meeting-2016-02).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-24 Thread Dave Love
Gilles Gouaillardet  writes:

> Currently, ompi create a file in the temporary directory and then mmap it.
> an obvious requirement is the temporary directory must have enough free
> space for that file.
> (this might be an issue on some disk less nodes)
>
>
> a simple alternative could be to try /tmp, and if there is not enough
> space, try /dev/shm
> (unless the tmpdir has been set explicitly)
>
> any thought ?

/tmp is already the default if TMPDIR et al aren't defined, isn't it?

While you may have no choice but to use /dev/shm on a diskless node, it
doesn't seem a good thing to do by default for large maps.  It wasn't
here.

[I've never been sure of the semantics of mmap over tmpfs.]

I think the important thing is clear explanation of any error, and
suggestions for workarounds.  Presumably anyone operating diskless nodes
has made arrangements for this sort of thing.
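
[For what it's worth, the usual arrangement amounts to pointing the session
directory - and hence the mmap backing file - at somewhere sensible for the
node.  A rough sketch, assuming the orte_tmpdir_base MCA parameter is
available in the version at hand (./my_app is just a stand-in):

  # put the backing file on a tmpfs rather than a shared or absent /tmp
  export TMPDIR=/dev/shm
  # or, more explicitly, via the MCA parameter
  mpirun --mca orte_tmpdir_base /dev/shm -n 2 ./my_app
]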

> Gilles
>
> btw, we already use the force, thanks to the ob1 pml and the yoda spml

I think that's assuming familiarity with something which leaves out some
people...


Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-20 Thread Gilles Gouaillardet
Currently, ompi creates a file in the temporary directory and then mmaps it.
An obvious requirement is that the temporary directory must have enough free
space for that file.
(this might be an issue on some diskless nodes)


A simple alternative could be to try /tmp, and if there is not enough
space, try /dev/shm
(unless the tmpdir has been set explicitly)

Any thoughts?

Gilles

btw, we already use the force, thanks to the ob1 pml and the yoda spml

On Friday, November 20, 2015, Dave Love  wrote:

> Jeff Hammond writes:
>
> >> Doesn't mpich have the option to use sysv memory?  You may want to try
> >> that
> >>
> >>
> > MPICH?  Look, I may have earned my way onto Santa's naughty list more
> > than a few times, but at least I have the decency not to post MPICH
> > questions to the Open-MPI list ;-)
> >
> > If there is a way to tell Open-MPI to use shm_open without filesystem
> > backing (if that is even possible) at configure time, I'd love to do
> > that.
>
> I'm not sure I understand what's required, but is this what you're after?
>
>   $ ompi_info --param shmem all -l 9|grep priority
>      MCA shmem: parameter "shmem_mmap_priority" (current value: "50", data source: default, level: 3 user/all, type: int)
>      MCA shmem: parameter "shmem_posix_priority" (current value: "40", data source: default, level: 3 user/all, type: int)
>      MCA shmem: parameter "shmem_sysv_priority" (current value: "30", data source: default, level: 3 user/all, type: int)
>
> >> In the spirit OMPI - may the force be with you.
> >>
> >>
> > All I will say here is that Open-MPI has a Vader BTL :-)
>
> Whatever that might mean.


Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-20 Thread Dave Love
Jeff Hammond  writes:

>> Doesn't mpich have the option to use sysv memory?  You may want to try that
>>
>>
> MPICH?  Look, I may have earned my way onto Santa's naughty list more than
> a few times, but at least I have the decency not to post MPICH questions to
> the Open-MPI list ;-)
>
> If there is a way to tell Open-MPI to use shm_open without filesystem
> backing (if that is even possible) at configure time, I'd love to do that.

I'm not sure I understand what's required, but is this what you're after?

  $ ompi_info --param shmem all -l 9|grep priority
     MCA shmem: parameter "shmem_mmap_priority" (current value: "50", data source: default, level: 3 user/all, type: int)
     MCA shmem: parameter "shmem_posix_priority" (current value: "40", data source: default, level: 3 user/all, type: int)
     MCA shmem: parameter "shmem_sysv_priority" (current value: "30", data source: default, level: 3 user/all, type: int)

>> In the spirit OMPI - may the force be with you.
>>
>>
> All I will say here is that Open-MPI has a Vader BTL :-)

Whatever that might mean.


Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-20 Thread Dave Love
[There must be someone better to answer this, but since I've seen it:]

Jeff Hammond  writes:

> I have no idea what this is trying to tell me.  Help?
>
> jhammond@nid00024:~/MPI/qoit/collectives> mpirun -n 2 ./driver.x 64
> [nid00024:00482] [[46168,0],0] ORTE_ERROR_LOG: Not found in file
> ../../../../../orte/mca/plm/alps/plm_alps_module.c at line 418

That must be a system error message, presumably indicating why the
process couldn't be launched; it's not in the OMPI source.

> I can run the same job with srun without incident:
>
> jhammond@nid00024:~/MPI/qoit/collectives> srun -n 2 ./driver.x 64
> MPI was initialized.
>
> This is on the NERSC Cori Cray XC40 system.  I build Open-MPI git head from
> source for OFI libfabric.
>
> I have many other issues, which I will report later.  As a spoiler, if I
> cannot use your mpirun, I cannot set any of the MCA options there.  Is
> there a method to set MCA options with environment variables?  I could not
> find this documented anywhere.

mpirun(1) documents the mechanisms under "Setting MCA Parameters",
unless it's changed since 1.8.  [I have wondered why a file in cwd isn't
a possibility, only in $HOME.]
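
[For reference, the mechanisms that section describes come down to roughly
the following; the parameter and value here are just illustrative:

  # 1. on the mpirun command line
  mpirun --mca mtl_ofi_provider_include sockets -n 2 ./my_app
  # 2. as an environment variable
  export OMPI_MCA_mtl_ofi_provider_include=sockets
  # 3. in a per-user file read at startup
  echo "mtl_ofi_provider_include = sockets" >> $HOME/.openmpi/mca-params.conf
]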


Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-19 Thread Jeff Hammond
On Thu, Nov 19, 2015 at 4:11 PM, Howard Pritchard wrote:

> Hi Jeff H.
>
> Why don't you just try configuring with
>
> ./configure --prefix=my_favorite_install_dir
> --with-libfabric=install_dir_for_libfabric
> make -j 8 install
>
> and see what happens?
>
>
That was the first thing I tried.  However, it seemed to give me a
Verbs-oriented build, and Verbs is the Sith lord to us JedOFIs :-)

From the aforementioned Wiki:

../configure \
 --with-libfabric=$HOME/OFI/install-ofi-gcc-gni-cori \
 --disable-shared \
 --prefix=$HOME/MPI/install-ompi-ofi-gcc-gni-cori

Unfortunately, this (above) leads to an mpicc that indicates support for IB
Verbs, not OFI.
I will try again though just in case.


> Make sure before you configure that you have PrgEnv-gnu or PrgEnv-intel
> module loaded.
>
>
Yeah, I know better than to use the Cray compilers for such things (e.g.
https://github.com/jeffhammond/OpenPA/commit/965ca014ea3148ee5349e16d2cec1024271a7415
)


> Those were the configure/compiler options I used to do testing of ofi mtl
> on cori.
>
> Jeff S. - this thread has gotten intermingled with mpich setup as well,
> hence
> the suggestion for the mpich shm mechanism.
>
>
The first OSS implementation of MPI that I can use on Cray XC using OFI
gets a prize at the December MPI Forum.

Best,

Jeff



> Howard
>
>
>
> 2015-11-19 16:59 GMT-07:00 Jeff Hammond :
>
>>
>>> How did you configure for Cori?  You need to be using the slurm plm
>>> component for that system.  I know this sounds like gibberish.
>>>
>>>
>> ../configure --with-libfabric=$HOME/OFI/install-ofi-gcc-gni-cori \
>>  --enable-mca-static=mtl-ofi \
>>  --enable-mca-no-build=btl-openib,btl-vader,btl-ugni,btl-tcp \
>>  --enable-static --disable-shared --disable-dlopen \
>>  --prefix=$HOME/MPI/install-ompi-ofi-gcc-gni-xpmem-cori \
>>  --with-cray-pmi --with-alps --with-cray-xpmem --with-slurm \
>>  --without-verbs --without-fca --without-mxm --without-ucx \
>>  --without-portals4 --without-psm --without-psm2 \
>>  --without-udreg --without-ugni --without-munge \
>>  --without-sge --without-loadleveler --without-tm --without-lsf \
>>  --without-pvfs2 --without-plfs \
>>  --without-cuda --disable-oshmem \
>>  --disable-mpi-fortran --disable-oshmem-fortran \
>>  LDFLAGS="-L/opt/cray/ugni/default/lib64 -lugni \
>>           -L/opt/cray/alps/default/lib64 -lalps -lalpslli -lalpsutil \
>>           -ldl -lrt"
>>
>>
>> This is copied from
>> https://github.com/jeffhammond/HPCInfo/blob/master/ofi/README.md#open-mpi,
>> which I note in case you want to see what changes I've made at any point in
>> the future.
>>
>>
>>> There should be a with-slurm configure option to pick up this component.
>>>
>>> Indeed there is.
>>
>>
>>> Doesn't mpich have the option to use sysv memory?  You may want to try
>>> that
>>>
>>>
>> MPICH?  Look, I may have earned my way onto Santa's naughty list more
>> than a few times, but at least I have the decency not to post MPICH
>> questions to the Open-MPI list ;-)
>>
>> If there is a way to tell Open-MPI to use shm_open without filesystem
>> backing (if that is even possible) at configure time, I'd love to do that.
>>
>>
>>> Oh for tuning params you can use env variables.  For example lets say
>>> rather than using the gni provider in ofi mtl you want to try sockets. Then
>>> do
>>>
>>> Export OMPI_MCA_mtl_ofi_provider_include=sockets
>>>
>>>
>> Thanks.  I'm glad that there is an option to set them this way.
>>
>>
>>> In the spirit OMPI - may the force be with you.
>>>
>>>
>> All I will say here is that Open-MPI has a Vader BTL :-)
>>
>>>
>>> > On Thu 19.11.2015 09:44:20 Jeff Hammond wrote:
>>> > > I have no idea what this is trying to tell me. Help?
>>> > >
>>> > > jhammond@nid00024:~/MPI/qoit/collectives> mpirun -n 2 ./driver.x 64
>>> > > [nid00024:00482] [[46168,0],0] ORTE_ERROR_LOG: Not found in file
>>> > > ../../../../../orte/mca/plm/alps/plm_alps_module.c at line 418
>>> > >
>>> > > I can run the same job with srun without incident:
>>> > >
>>> > > jhammond@nid00024:~/MPI/qoit/collectives> srun -n 2 ./driver.x 64
>>> > > MPI was initialized.
>>> > >
>>> > > This is on the NERSC Cori Cray XC40 system. I build Open-MPI git
>>> head from
>>> > > source for OFI libfabric.
>>> > >
>>> > > I have many other issues, which I will report later. As a spoiler,
>>> if I
>>> > > cannot use your mpirun, I cannot set any of the MCA options there. Is
>>> > > there a method to set MCA options with environment variables? I
>>> could not
>>> > > find this documented anywhere.
>>> > >
>>> > > In particular, is there a way to cause shm to not use the global
>>> > > filesystem? I see this issue comes up a lot and I read the list
>>> archives,
>>> > > but the warning message (
>>> > >
>>> 

Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-19 Thread Howard Pritchard
Hi Jeff,

I finally got an allocation on cori - it's one busy machine.

Anyway, using the ompi I'd built on edison with the above recommended
configure options, I was able to run using either srun or mpirun on cori,
provided that in the latter case I used

mpirun -np X -N Y --mca plm slurm ./my_favorite_app

I will make an adjustment to the alps plm launcher to disqualify itself if
the wlm_detect
facility on the cray reports that srun is the launcher.  That's a minor fix
and should make
it into v2.x in a week or so.  It will be a runtime selection so you only
have to build ompi
once for use either on edison or cori.
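
To double-check which launcher a given build will use, something along these
lines should work (the exact ompi_info output format varies between versions):

  # list the plm (process launch) components available in this build
  ompi_info | grep " plm:"
  # or force the slurm launcher via the environment instead of --mca
  export OMPI_MCA_plm=slurm
  mpirun -np X -N Y ./my_favorite_app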

Howard


2015-11-19 17:11 GMT-07:00 Howard Pritchard :

> Hi Jeff H.
>
> Why don't you just try configuring with
>
> ./configure --prefix=my_favorite_install_dir
> --with-libfabric=install_dir_for_libfabric
> make -j 8 install
>
> and see what happens?
>
> Make sure before you configure that you have PrgEnv-gnu or PrgEnv-intel
> module loaded.
>
> Those were the configure/compiler options I used to do testing of ofi mtl
> on cori.
>
> Jeff S. - this thread has gotten intermingled with mpich setup as well,
> hence
> the suggestion for the mpich shm mechanism.
>
>
> Howard
>
>
>
> 2015-11-19 16:59 GMT-07:00 Jeff Hammond :
>
>>
>>> How did you configure for Cori?  You need to be using the slurm plm
>>> component for that system.  I know this sounds like gibberish.
>>>
>>>
>> ../configure --with-libfabric=$HOME/OFI/install-ofi-gcc-gni-cori \
>>  --enable-mca-static=mtl-ofi \
>>  --enable-mca-no-build=btl-openib,btl-vader,btl-ugni,btl-tcp \
>>  --enable-static --disable-shared --disable-dlopen \
>>  --prefix=$HOME/MPI/install-ompi-ofi-gcc-gni-xpmem-cori \
>>  --with-cray-pmi --with-alps --with-cray-xpmem --with-slurm \
>>  --without-verbs --without-fca --without-mxm --without-ucx \
>>  --without-portals4 --without-psm --without-psm2 \
>>  --without-udreg --without-ugni --without-munge \
>>  --without-sge --without-loadleveler --without-tm --without-lsf \
>>  --without-pvfs2 --without-plfs \
>>  --without-cuda --disable-oshmem \
>>  --disable-mpi-fortran --disable-oshmem-fortran \
>>  LDFLAGS="-L/opt/cray/ugni/default/lib64 -lugni \
>>           -L/opt/cray/alps/default/lib64 -lalps -lalpslli -lalpsutil \
>>           -ldl -lrt"
>>
>>
>> This is copied from
>> https://github.com/jeffhammond/HPCInfo/blob/master/ofi/README.md#open-mpi,
>> which I note in case you want to see what changes I've made at any point in
>> the future.
>>
>>
>>> There should be a with-slurm configure option to pick up this component.
>>>
>>> Indeed there is.
>>
>>
>>> Doesn't mpich have the option to use sysv memory?  You may want to try
>>> that
>>>
>>>
>> MPICH?  Look, I may have earned my way onto Santa's naughty list more
>> than a few times, but at least I have the decency not to post MPICH
>> questions to the Open-MPI list ;-)
>>
>> If there is a way to tell Open-MPI to use shm_open without filesystem
>> backing (if that is even possible) at configure time, I'd love to do that.
>>
>>
>>> Oh for tuning params you can use env variables.  For example lets say
>>> rather than using the gni provider in ofi mtl you want to try sockets. Then
>>> do
>>>
>>> Export OMPI_MCA_mtl_ofi_provider_include=sockets
>>>
>>>
>> Thanks.  I'm glad that there is an option to set them this way.
>>
>>
>>> In the spirit OMPI - may the force be with you.
>>>
>>>
>> All I will say here is that Open-MPI has a Vader BTL :-)
>>
>>>
>>> > On Thu 19.11.2015 09:44:20 Jeff Hammond wrote:
>>> > > I have no idea what this is trying to tell me. Help?
>>> > >
>>> > > jhammond@nid00024:~/MPI/qoit/collectives> mpirun -n 2 ./driver.x 64
>>> > > [nid00024:00482] [[46168,0],0] ORTE_ERROR_LOG: Not found in file
>>> > > ../../../../../orte/mca/plm/alps/plm_alps_module.c at line 418
>>> > >
>>> > > I can run the same job with srun without incident:
>>> > >
>>> > > jhammond@nid00024:~/MPI/qoit/collectives> srun -n 2 ./driver.x 64
>>> > > MPI was initialized.
>>> > >
>>> > > This is on the NERSC Cori Cray XC40 system. I build Open-MPI git
>>> head from
>>> > > source for OFI libfabric.
>>> > >
>>> > > I have many other issues, which I will report later. As a spoiler,
>>> if I
>>> > > cannot use your mpirun, I cannot set any of the MCA options there. Is
>>> > > there a method to set MCA options with environment variables? I
>>> could not
>>> > > find this documented anywhere.
>>> > >
>>> > > In particular, is there a way to cause shm to not use the global
>>> > > filesystem? I see this issue comes up a lot and I read the list
>>> archives,
>>> > > but the warning message (
>>> > >
>>> https://github.com/hpc/cce-mpi-openmpi-1.6.4/blob/master/ompi/mca/common/sm/
>>> > > help-mpi-common-sm.txt) suggested that I could override it by
>>> setting TMP,
>>> 

Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-19 Thread Jeff Hammond
>
>
> How did you configure for Cori?  You need to be using the slurm plm
> component for that system.  I know this sounds like gibberish.
>
>
../configure --with-libfabric=$HOME/OFI/install-ofi-gcc-gni-cori \
 --enable-mca-static=mtl-ofi \
 --enable-mca-no-build=btl-openib,btl-vader,btl-ugni,btl-tcp \
 --enable-static --disable-shared --disable-dlopen \
 --prefix=$HOME/MPI/install-ompi-ofi-gcc-gni-xpmem-cori \
 --with-cray-pmi --with-alps --with-cray-xpmem --with-slurm \
 --without-verbs --without-fca --without-mxm --without-ucx \
 --without-portals4 --without-psm --without-psm2 \
 --without-udreg --without-ugni --without-munge \
 --without-sge --without-loadleveler --without-tm --without-lsf \
 --without-pvfs2 --without-plfs \
 --without-cuda --disable-oshmem \
 --disable-mpi-fortran --disable-oshmem-fortran \
 LDFLAGS="-L/opt/cray/ugni/default/lib64 -lugni \
-L/opt/cray/alps/default/lib64 -lalps -lalpslli -lalpsutil
\  -ldl -lrt"


This is copied from
https://github.com/jeffhammond/HPCInfo/blob/master/ofi/README.md#open-mpi,
which I note in case you want to see what changes I've made at any point in
the future.
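
As a quick sanity check on whether the build picked up the OFI mtl rather
than falling back to verbs, something like this probably suffices (the
component names here are the usual ones, not verified on Cori):

  ompi_info | grep " mtl:"       # should list the ofi mtl
  ompi_info | grep -i openib     # should come back empty given the flags above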


> There should be a with-slurm configure option to pick up this component.
>
Indeed there is.


> Doesn't mpich have the option to use sysv memory?  You may want to try that
>
>
MPICH?  Look, I may have earned my way onto Santa's naughty list more than
a few times, but at least I have the decency not to post MPICH questions to
the Open-MPI list ;-)

If there is a way to tell Open-MPI to use shm_open without filesystem
backing (if that is even possible) at configure time, I'd love to do that.


> Oh for tuning params you can use env variables.  For example lets say
> rather than using the gni provider in ofi mtl you want to try sockets. Then
> do
>
> Export OMPI_MCA_mtl_ofi_provider_include=sockets
>
>
Thanks.  I'm glad that there is an option to set them this way.


> In the spirit OMPI - may the force be with you.
>
>
All I will say here is that Open-MPI has a Vader BTL :-)

>
> > On Thu 19.11.2015 09:44:20 Jeff Hammond wrote:
> > > I have no idea what this is trying to tell me. Help?
> > >
> > > jhammond@nid00024:~/MPI/qoit/collectives> mpirun -n 2 ./driver.x 64
> > > [nid00024:00482] [[46168,0],0] ORTE_ERROR_LOG: Not found in file
> > > ../../../../../orte/mca/plm/alps/plm_alps_module.c at line 418
> > >
> > > I can run the same job with srun without incident:
> > >
> > > jhammond@nid00024:~/MPI/qoit/collectives> srun -n 2 ./driver.x 64
> > > MPI was initialized.
> > >
> > > This is on the NERSC Cori Cray XC40 system. I build Open-MPI git head
> from
> > > source for OFI libfabric.
> > >
> > > I have many other issues, which I will report later. As a spoiler, if I
> > > cannot use your mpirun, I cannot set any of the MCA options there. Is
> > > there a method to set MCA options with environment variables? I could
> not
> > > find this documented anywhere.
> > >
> > > In particular, is there a way to cause shm to not use the global
> > > filesystem? I see this issue comes up a lot and I read the list
> archives,
> > > but the warning message (
> > >
> https://github.com/hpc/cce-mpi-openmpi-1.6.4/blob/master/ompi/mca/common/sm/
> > > help-mpi-common-sm.txt) suggested that I could override it by setting
> TMP,
> > > TEMP or TEMPDIR, which I did to no avail.
> >
> > From my experience on edison: the one environment variable that does
> works is TMPDIR - the one that is not listed in the error message :-)
>

That's great.  I will try that now.  Is there a Github issue open already
to fix that documentation?  If not...


> > Can't help you with your mpirun problem though ...
>
No worries.  I appreciate all the help I can get.

Thanks,

Jeff

-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/


Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-19 Thread Howard
Hi Jeff

How did you configure for Cori?  You need to be using the slurm plm component 
for that system.  I know this sounds like gibberish.  

There should be a with-slurm configure option to pick up this component. 

Doesn't mpich have the option to use sysv memory?  You may want to try that.

Oh, for tuning params you can use env variables.  For example, let's say that
rather than using the gni provider in the ofi mtl you want to try sockets.  Then do

export OMPI_MCA_mtl_ofi_provider_include=sockets
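
The same pattern applies to any MCA parameter - export OMPI_MCA_<name>=<value>.
A couple of purely illustrative examples:

  export OMPI_MCA_plm=slurm            # pick the slurm launcher
  export OMPI_MCA_btl=self,vader       # restrict which BTLs are used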

In the spirit of OMPI - may the force be with you.

Howard 

Sent from my iPhone

> On 19.11.2015 at 11:51, Martin Siegert wrote:
> 
> Hi Jeff,
>  
> On Thu 19.11.2015 09:44:20 Jeff Hammond wrote:
> > I have no idea what this is trying to tell me. Help?
> >
> > jhammond@nid00024:~/MPI/qoit/collectives> mpirun -n 2 ./driver.x 64
> > [nid00024:00482] [[46168,0],0] ORTE_ERROR_LOG: Not found in file
> > ../../../../../orte/mca/plm/alps/plm_alps_module.c at line 418
> >
> > I can run the same job with srun without incident:
> >
> > jhammond@nid00024:~/MPI/qoit/collectives> srun -n 2 ./driver.x 64
> > MPI was initialized.
> >
> > This is on the NERSC Cori Cray XC40 system. I build Open-MPI git head from
> > source for OFI libfabric.
> >
> > I have many other issues, which I will report later. As a spoiler, if I
> > cannot use your mpirun, I cannot set any of the MCA options there. Is
> > there a method to set MCA options with environment variables? I could not
> > find this documented anywhere.
> >
> > In particular, is there a way to cause shm to not use the global
> > filesystem? I see this issue comes up a lot and I read the list archives,
> > but the warning message (
> > https://github.com/hpc/cce-mpi-openmpi-1.6.4/blob/master/ompi/mca/common/sm/
> > help-mpi-common-sm.txt) suggested that I could override it by setting TMP,
> > TEMP or TEMPDIR, which I did to no avail.
>  
> From my experience on edison: the one environment variable that does works is 
> TMPDIR - the one that is not listed in the error message :-)
>  
> Can't help you with your mpirun problem though ...
>  
> Cheers,
> Martin
>  
> --
> Martin Siegert
> Head, Research Computing
> WestGrid/ComputeCanada Site Lead
> Simon Fraser University
> Burnaby, British Columbia


Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-19 Thread Martin Siegert
Hi Jeff,

On Thu 19.11.2015 09:44:20 Jeff Hammond wrote:
> I have no idea what this is trying to tell me.  Help?
> 
> jhammond@nid00024:~/MPI/qoit/collectives> mpirun -n 2 ./driver.x 64
> [nid00024:00482] [[46168,0],0] ORTE_ERROR_LOG: Not found in file
> ../../../../../orte/mca/plm/alps/plm_alps_module.c at line 418
> 
> I can run the same job with srun without incident:
> 
> jhammond@nid00024:~/MPI/qoit/collectives> srun -n 2 ./driver.x 64
> MPI was initialized.
> 
> This is on the NERSC Cori Cray XC40 system.  I build Open-MPI git head from
> source for OFI libfabric.
> 
> I have many other issues, which I will report later.  As a spoiler, if I
> cannot use your mpirun, I cannot set any of the MCA options there.  Is
> there a method to set MCA options with environment variables?  I could not
> find this documented anywhere.
> 
> In particular, is there a way to cause shm to not use the global
> filesystem?  I see this issue comes up a lot and I read the list archives,
> but the warning message (
> https://github.com/hpc/cce-mpi-openmpi-1.6.4/blob/master/ompi/mca/common/sm/
> help-mpi-common-sm.txt) suggested that I could override it by setting TMP,
> TEMP or TEMPDIR, which I did to no avail.

From my experience on edison: the one environment variable that does work is
TMPDIR - the one that is not listed in the error message :-)

Can't help you with your mpirun problem though ...

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid/ComputeCanada Site Lead
Simon Fraser University
Burnaby, British Columbia


[OMPI users] help understand unhelpful ORTE error message

2015-11-19 Thread Jeff Hammond
I have no idea what this is trying to tell me.  Help?

jhammond@nid00024:~/MPI/qoit/collectives> mpirun -n 2 ./driver.x 64
[nid00024:00482] [[46168,0],0] ORTE_ERROR_LOG: Not found in file
../../../../../orte/mca/plm/alps/plm_alps_module.c at line 418

I can run the same job with srun without incident:

jhammond@nid00024:~/MPI/qoit/collectives> srun -n 2 ./driver.x 64
MPI was initialized.

This is on the NERSC Cori Cray XC40 system.  I build Open-MPI git head from
source for OFI libfabric.

I have many other issues, which I will report later.  As a spoiler, if I
cannot use your mpirun, I cannot set any of the MCA options there.  Is
there a method to set MCA options with environment variables?  I could not
find this documented anywhere.

In particular, is there a way to cause shm to not use the global
filesystem?  I see this issue comes up a lot and I read the list archives,
but the warning message (
https://github.com/hpc/cce-mpi-openmpi-1.6.4/blob/master/ompi/mca/common/sm/help-mpi-common-sm.txt)
suggested that I could override it by setting TMP, TEMP or TEMPDIR, which I
did to no avail.

Thanks,

Jeff

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/