Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)

2009-11-17 Thread Ralph Castain
Sorry I didn't answer more completely before - a tad tied up today with network 
problems :-/

Actually, both you and Michael pointed out the "flaw" in your own reasoning, 
and hit the reason why we -don't- forward environment. It is obvious, for 
example, that you don't want to forward HOSTNAME and DISPLAY. But how is OMPI 
supposed to know precisely what -is- and -isn't- safe to forward?

Do we grab a sample based on what we developers think? What happens when the 
sys admin of a cluster sets up the environment with site-specific variables on 
the head node that must not be forwarded to the backend compute nodes? We had 
plenty of those at my former employer, and it isn't an uncommon situation. So 
how does OMPI identify and avoid those?

This is why we don't forward anything we are not specifically told to forward. 
The code that crept into the Torque launcher is incorrect and can cause a lot 
of problems. For one thing, it makes the remote processes think they are 
running under the wrong node name!

Unfortunately, that code was copy/pasted from a different launcher that 
fork/exec's a local cmd to launch the daemons. In that scenario, passing a copy 
of mpirun's environment to execve is fine - the environment is not passed to 
anything remote. tm_spawn is a different story.

I will consult with other developers, but I do believe the right answer is to 
not forward the entire environment for the previously identified reasons, and 
to implement the "forward all except these" as an alternative to the current 
"-x" option. I'll pass along our decision about what to do with the current 
code.

HTH
Ralph

On Nov 17, 2009, at 1:49 PM, David Singleton wrote:

> 
> Hi Ralph,
> 
> Now I'm in a quandary - if I show you that it's actually Open MPI that is
> propagating the environment then you are likely to "fix it" and then tm
> users will lose a nice feature.  :-)
> 
> Can I suggest that "least surprise" would require that MPI tasks get
> exactly the same environment/limits/... as mpirun so that "mpirun a.out"
> behaves just like "a.out".  [Following this principle we modified
> tm_spawn to propagate the caller's rlimits to the spawned tasks.]
> A comment in orterun.c (see below) suggests that Open MPI is trying
> to distinguish between "local" and "remote" processes.  I would have
> thought that distinction should be invisible to users as much as possible
> - a user asking for 4 cpus would like to see the same behaviour if all
> 4 are local or "2 local, 2 remote".
> 
> As to why tm does "The Right Thing": in the case of rsh/ssh the full
> mpirun environment is given to the rsh/ssh process locally while in the tm
> case it is an argument to tm_spawn and so gets given to the process (in
> this case orted) being launched remotely. Relevant lines from 1.3.3 below.
> PBS just passes along the environment it is told to.  We don't use Torque
> but as of 2.3.3, it was still the same as OpenPBS in this respect.
> 
> Michael just pointed out the slight flaw.  The environment should be
> somewhat selectively propagated (exclude HOSTNAME etc).  I guess if you
> were to "fix" plm_tm_module I would put the propagation behaviour in
> tm_spawn and try to handle these exceptional cases.
> 
> Cheers,
> David
> 
> 
> orterun.c:
> 
>510 /* save the environment for launch purposes. This MUST be
>511  * done so that we can pass it to any local procs we
>512  * spawn - otherwise, those local procs won't see any
>513  * non-MCA envars were set in the enviro prior to calling
>514  * orterun
>515  */
>516 orte_launch_environ = opal_argv_copy(environ);
> 
> 
> plm_rsh_module.c:
> 
>681 /* actually ssh the child */
>682 static void ssh_child(int argc, char **argv,
>683   orte_vpid_t vpid, int proc_vpid_index)
>684 {
> 
>694 /* setup environment */
>695 env = opal_argv_copy(orte_launch_environ);
> 
>766 execve(exec_path, exec_argv, env);
> 
> 
> plm_tm_module.c:
> 
>128 static int plm_tm_launch_job(orte_job_t *jdata)
>129 {
> 
>228 /* setup environment */
>229 env = opal_argv_copy(orte_launch_environ);
> 
>311 rc = tm_spawn(argc, argv, env, node->launch_id, tm_task_ids + 
> launched, tm_events + launched);
> 
> 
> 
> Ralph Castain wrote:
>> Not exactly. It completely depends on how Torque was set up - OMPI isn't 
>> forwarding the environment. Torque is.
>> We made a design decision at the very beginning of the OMPI project not to 
>> forward non-OMPI envars unless directed to do so by the user. I'm afraid I 
>> disagree with Michael's claim that other MPIs do forward them - yes, MPICH 
>> does, but not all others do.
>> The world is bigger than MPICH and OMPI :-)
>> Since there is inconsistency in this regard between MPIs, we chose not to 
>> forward. Reason was simple: there is no way to know what is safe to forward 
>> vs what is not (e.g., what to do with DISPLAY), nor what the underlying 
>> environment is trying to forward vs what it isn't.

Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)

2009-11-17 Thread Ralph Castain
Ah - not good. It is clearly a programming error. I'll have to review the other 
launchers and consult the others in the project to decide on the proper course 
of action.

Thanks

On Nov 17, 2009, at 1:49 PM, David Singleton wrote:

> 
> Hi Ralph,
> 
> Now I'm in a quandary - if I show you that it's actually Open MPI that is
> propagating the environment then you are likely to "fix it" and then tm
> users will lose a nice feature.  :-)
> 
> Can I suggest that "least surprise" would require that MPI tasks get
> exactly the same environment/limits/... as mpirun so that "mpirun a.out"
> behaves just like "a.out".  [Following this principle we modified
> tm_spawn to propagate the caller's rlimits to the spawned tasks.]
> A comment in orterun.c (see below) suggests that Open MPI is trying
> to distinguish between "local" and "remote" processes.  I would have
> thought that distinction should be invisible to users as much as possible
> - a user asking for 4 cpus would like to see the same behaviour if all
> 4 are local or "2 local, 2 remote".
> 
> As to why tm does "The Right Thing": in the case of rsh/ssh the full
> mpirun environment is given to the rsh/ssh process locally while in the tm
> case it is an argument to tm_spawn and so gets given to the process (in
> this case orted) being launched remotely. Relevant lines from 1.3.3 below.
> PBS just passes along the environment it is told to.  We don't use Torque
> but as of 2.3.3, it was still the same as OpenPBS in this respect.
> 
> Michael just pointed out the slight flaw.  The environment should be
> somewhat selectively propagated (exclude HOSTNAME etc).  I guess if you
> were to "fix" plm_tm_module I would put the propagation behaviour in
> tm_spawn and try to handle these exceptional cases.
> 
> Cheers,
> David
> 
> 
> orterun.c:
> 
>510 /* save the environment for launch purposes. This MUST be
>511  * done so that we can pass it to any local procs we
>512  * spawn - otherwise, those local procs won't see any
>513  * non-MCA envars were set in the enviro prior to calling
>514  * orterun
>515  */
>516 orte_launch_environ = opal_argv_copy(environ);
> 
> 
> plm_rsh_module.c:
> 
>681 /* actually ssh the child */
>682 static void ssh_child(int argc, char **argv,
>683   orte_vpid_t vpid, int proc_vpid_index)
>684 {
> 
>694 /* setup environment */
>695 env = opal_argv_copy(orte_launch_environ);
> 
>766 execve(exec_path, exec_argv, env);
> 
> 
> plm_tm_module.c:
> 
>128 static int plm_tm_launch_job(orte_job_t *jdata)
>129 {
> 
>228 /* setup environment */
>229 env = opal_argv_copy(orte_launch_environ);
> 
>311 rc = tm_spawn(argc, argv, env, node->launch_id, tm_task_ids + 
> launched, tm_events + launched);
> 
> 
> 
> Ralph Castain wrote:
>> Not exactly. It completely depends on how Torque was set up - OMPI isn't 
>> forwarding the environment. Torque is.
>> We made a design decision at the very beginning of the OMPI project not to 
>> forward non-OMPI envars unless directed to do so by the user. I'm afraid I 
>> disagree with Michael's claim that other MPIs do forward them - yes, MPICH 
>> does, but not all others do.
>> The world is bigger than MPICH and OMPI :-)
>> Since there is inconsistency in this regard between MPIs, we chose not to 
>> forward. Reason was simple: there is no way to know what is safe to forward 
>> vs what is not (e.g., what to do with DISPLAY), nor what the underlying 
>> environment is trying to forward vs what it isn't. It is very easy to get 
>> cross-wise and cause totally unexpected behavior, as users have complained 
>> about for years.
>> First, if you are using a managed environment like Torque, we recommend that 
>> you work with your sys admin to decide how to configure it. This is the best 
>> way to resolve a problem.
>> Second, if you are not using a managed environment and/or decide not to have 
>> that environment do the forwarding, you can tell OMPI to forward the envars 
>> you need by specifying them via the -x cmd line option. We already have a 
>> request to expand this capability, and I will be doing so as time permits. 
>> One option I'll be adding is the reverse of -x - i.e., "forward all envars 
>> -except- the specified one(s)".
>> HTH
>> ralph




Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)

2009-11-17 Thread David Singleton


Hi Ralph,

Now I'm in a quandary - if I show you that it's actually Open MPI that is
propagating the environment then you are likely to "fix it" and then tm
users will lose a nice feature.  :-)

Can I suggest that "least surprise" would require that MPI tasks get
exactly the same environment/limits/... as mpirun so that "mpirun a.out"
behaves just like "a.out".  [Following this principle we modified
tm_spawn to propagate the caller's rlimits to the spawned tasks.]
A comment in orterun.c (see below) suggests that Open MPI is trying
to distinguish between "local" and "remote" processes.  I would have
thought that distinction should be invisible to users as much as possible
- a user asking for 4 cpus would like to see the same behaviour if all
4 are local or "2 local, 2 remote".

As to why tm does "The Right Thing": in the case of rsh/ssh the full
mpirun environment is given to the rsh/ssh process locally while in the tm
case it is an argument to tm_spawn and so gets given to the process (in
this case orted) being launched remotely. Relevant lines from 1.3.3 below.
PBS just passes along the environment it is told to.  We don't use Torque
but as of 2.3.3, it was still the same as OpenPBS in this respect.

Michael just pointed out the slight flaw.  The environment should be
somewhat selectively propagated (exclude HOSTNAME etc).  I guess if you
were to "fix" plm_tm_module I would put the propagation behaviour in
tm_spawn and try to handle these exceptional cases.

Cheers,
David


orterun.c:

510 /* save the environment for launch purposes. This MUST be
511  * done so that we can pass it to any local procs we
512  * spawn - otherwise, those local procs won't see any
513  * non-MCA envars were set in the enviro prior to calling
514  * orterun
515  */
516 orte_launch_environ = opal_argv_copy(environ);


plm_rsh_module.c:

681 /* actually ssh the child */
682 static void ssh_child(int argc, char **argv,
683   orte_vpid_t vpid, int proc_vpid_index)
684 {

694 /* setup environment */
695 env = opal_argv_copy(orte_launch_environ);

766 execve(exec_path, exec_argv, env);


plm_tm_module.c:

128 static int plm_tm_launch_job(orte_job_t *jdata)
129 {

228 /* setup environment */
229 env = opal_argv_copy(orte_launch_environ);

311 rc = tm_spawn(argc, argv, env, node->launch_id, tm_task_ids + 
launched, tm_events + launched);
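
If the propagation behaviour were moved into tm_spawn as suggested above, the 
exception handling could be as small as an in-place unset run over the env 
argument before the spawn; strip_env below is a hypothetical helper sketch, 
not part of the TM API:

```c
/* Hypothetical helper: remove every "NAME=..." entry for the given
 * variable from a NULL-terminated environment array, in place.
 * A modified tm_spawn() could apply this for host-specific names
 * (HOSTNAME, DISPLAY, ...) before launching the remote orted. */
#include <string.h>

void strip_env(char **env, const char *name)
{
    size_t n = strlen(name);
    char **src = env, **dst = env;

    for (; *src != NULL; ++src) {
        /* keep everything that is not "name=..." */
        if (!(strncmp(*src, name, n) == 0 && (*src)[n] == '=')) {
            *dst++ = *src;
        }
    }
    *dst = NULL;
}
```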



Ralph Castain wrote:

Not exactly. It completely depends on how Torque was set up - OMPI isn't 
forwarding the environment. Torque is.

We made a design decision at the very beginning of the OMPI project not to 
forward non-OMPI envars unless directed to do so by the user. I'm afraid I 
disagree with Michael's claim that other MPIs do forward them - yes, MPICH 
does, but not all others do.

The world is bigger than MPICH and OMPI :-)

Since there is inconsistency in this regard between MPIs, we chose not to 
forward. Reason was simple: there is no way to know what is safe to forward vs 
what is not (e.g., what to do with DISPLAY), nor what the underlying 
environment is trying to forward vs what it isn't. It is very easy to get 
cross-wise and cause totally unexpected behavior, as users have complained 
about for years.

First, if you are using a managed environment like Torque, we recommend that 
you work with your sys admin to decide how to configure it. This is the best 
way to resolve a problem.

Second, if you are not using a managed environment and/or decide not to have that 
environment do the forwarding, you can tell OMPI to forward the envars you need by 
specifying them via the -x cmd line option. We already have a request to expand this 
capability, and I will be doing so as time permits. One option I'll be adding is the 
reverse of -x - i.e., "forward all envars -except- the specified one(s)".

HTH
ralph



Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)

2009-11-17 Thread Michael Sternberg

On Nov 17, 2009, at 10:17 , Michael Sternberg wrote:

On Nov 17, 2009, at 9:10 , Ralph Castain wrote:
Not exactly. It completely depends on how Torque was set up - OMPI  
isn't forwarding the environment. Torque is.


I actually tried compiling OMPI with the tm interface a couple of  
versions back for both packages but ran into memory trouble, which  
is why I didn't pursue this.  With torque-2.4.x out and OpenMPI  
getting close to 1.3.4 I'll try again.


Follow-up:  I recompiled OpenMPI-1.3.2 "--with-tm" (from torque-2.3.6)  
and, lo and behold, environment variables and modules are now passed  
across nodes, which thus includes custom modules loaded in the job  
file.  Darn, that was an old hang-up!


The variables passed do include (unsurprisingly) $HOSTNAME, but I can  
live with that:


login4 $ qsub -l nodes=2:ppn=1 -I
qsub: waiting for job 34717.mds01 to start
qsub: job 34717.mds01 ready

n102 $ mpirun hostname
n102
n091
n102 $ mpirun env | grep HOSTNAME
HOSTNAME=n102
HOSTNAME=n102

Ralph, David - thank you for the pointers!


Michael




Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)

2009-11-17 Thread Michael Sternberg
Hi,

On Nov 17, 2009, at 9:10 , Ralph Castain wrote:
> Not exactly. It completely depends on how Torque was set up - OMPI isn't 
> forwarding the environment. Torque is.

I actually tried compiling OMPI with the tm interface a couple of versions back 
for both packages but ran into memory trouble, which is why I didn't pursue 
this.  With torque-2.4.x out and OpenMPI getting close to 1.3.4 I'll try again.


> We made a design decision at the very beginning of the OMPI project not to 
> forward non-OMPI envars unless directed to do so by the user. I'm afraid I 
> disagree with Michael's claim that other MPIs do forward them - yes, MPICH 
> does, but not all others do.
> 
> The world is bigger than MPICH and OMPI :-)

Yup, I saw your message from just last month 
http://www.open-mpi.org/community/lists/users/2009/10/10994.php ; I didn't mean 
to make a global claim :-)  I'm aware that exporting environment variables 
(including $PWD) under MPI is implementation dependent.  I just happened to 
have MPICH, Intel MPI (same roots), and OpenMPI on my cluster.

> First, if you are using a managed environment like Torque, we recommend that 
> you work with your sys admin to decide how to configure it. This is the best 
> way to resolve a problem.

Yeah, I wish that guy would know better and not have to ask around mailing 
lists :-)


> Second, if you are not using a managed environment and/or decide not to have 
> that environment do the forwarding, you can tell OMPI to forward the envars 
> you need by specifying them via the -x cmd line option. We already have a 
> request to expand this capability, and I will be doing so as time permits. 
> One option I'll be adding is the reverse of -x - i.e., "forward all envars 
> -except- the specified one(s)".

The issue with -x is that modules may set any random variable.  The reverse 
option to -x would be great of course.  MPICH2 and Intel MPI pass all but a few 
(known to be host-specific) variables by default, and counter that with "none" 
and "all" options.


Thanks!

Michael



> HTH
> ralph
> 
> On Nov 17, 2009, at 5:55 AM, David Singleton wrote:
> 
>> 
>> I can see the difference - we built Open MPI with tm support.  For some
>> reason, I thought mpirun fed its environment to orted (after orted is
>> launched) so orted can pass it on to MPI tasks.  That should be portable
>> between different launch mechanisms.  But it looks like tm launches
>> orted with the full mpirun environment (at the request of mpirun).
>> 
>> Cheers,
>> David
>> 
>> 
>> Michael Sternberg wrote:
>>> Hi David,
>>> Hmm, your demo is well-chosen and crystal-clear, yet the output is 
>>> unexpected.  I do not see environment vars passed by default here:
>>> login3$ qsub -l nodes=2:ppn=1 -I
>>> qsub: waiting for job 34683.mds01 to start
>>> qsub: job 34683.mds01 ready
>>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE hostname
>>> n102
>>> n085
>>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
>>> n102$ export FOO=BAR
>>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
>>> FOO=BAR
>>> n102$ type mpirun
>>> mpirun is hashed (/opt/soft/openmpi-1.3.2-intel10-1/bin/mpirun)
>>> Curious, what do you get upon:
>>> where mpirun
>>> I built OpenMPI-1.3.2 here from source with:
>>>   CC=icc  CXX=icpc  FC=ifort  F77=ifort \
>>>   LDFLAGS='-Wl,-z,noexecstack' \
>>>   CFLAGS='-O2 -g -fPIC' \
>>>   CXXFLAGS='-O2 -g -fPIC' \
>>>   FFLAGS='-O2 -g -fPIC' \
>>>   ./configure --prefix=$prefix \
>>>   --with-libnuma=/usr \
>>>   --with-openib=/usr \
>>>   --with-udapl \
>>>   --enable-mpirun-prefix-by-default \
>>>   --without-tm
>>> I didn't find the behavior I saw strange, given that orterun(1) talks only 
>>> about $OMPI_* and inheritance from the remote shell.  It also mentions a 
>>> "boot MCA module", about which I couldn't find much on open-mpi.org - hmm.
>>> In the meantime, I did find a possible solution, namely, to tell ssh to 
>>> pass a variable using SendEnv/AcceptEnv.  That variable is then seen by and 
>>> can be interpreted (cautiously) in /etc/profile.d/ scripts.  A user could 
>>> set it in the job file (or even qalter it post submission):
>>> #PBS -v VARNAME=foo:bar:baz
>>> For VARNAME, I think simply "MODULES" or "EXTRAMODULES" could do.
>>> With best regards,
>>> Michael
>>> On Nov 17, 2009, at 4:29 , David Singleton wrote:
I'm not sure why you don't see Open MPI behaving like other MPIs w.r.t.
 modules/environment on remote MPI tasks - we do.
 
 xe:~ > qsub -q express -lnodes=2:ppn=8,walltime=10:00,vmem=2gb -I
 qsub: waiting for job 376366.xepbs to start
 qsub: job 376366.xepbs ready
 
 [dbs900@x27 ~]$ module load openmpi
 [dbs900@x27 ~]$ mpirun -n 2 --bynode hostname
 x27
 x28
 [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO
 [dbs900@x27 ~]$ setenv FOO BAR
 [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO
 FOO=BAR
 FOO=BAR
[dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber

Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)

2009-11-17 Thread Ralph Castain
Not exactly. It completely depends on how Torque was set up - OMPI isn't 
forwarding the environment. Torque is.

We made a design decision at the very beginning of the OMPI project not to 
forward non-OMPI envars unless directed to do so by the user. I'm afraid I 
disagree with Michael's claim that other MPIs do forward them - yes, MPICH 
does, but not all others do.

The world is bigger than MPICH and OMPI :-)

Since there is inconsistency in this regard between MPIs, we chose not to 
forward. Reason was simple: there is no way to know what is safe to forward vs 
what is not (e.g., what to do with DISPLAY), nor what the underlying 
environment is trying to forward vs what it isn't. It is very easy to get 
cross-wise and cause totally unexpected behavior, as users have complained 
about for years.

First, if you are using a managed environment like Torque, we recommend that 
you work with your sys admin to decide how to configure it. This is the best 
way to resolve a problem.

Second, if you are not using a managed environment and/or decide not to have 
that environment do the forwarding, you can tell OMPI to forward the envars you 
need by specifying them via the -x cmd line option. We already have a request 
to expand this capability, and I will be doing so as time permits. One option 
I'll be adding is the reverse of -x - i.e., "forward all envars -except- the 
specified one(s)".

HTH
ralph

On Nov 17, 2009, at 5:55 AM, David Singleton wrote:

> 
> I can see the difference - we built Open MPI with tm support.  For some
> reason, I thought mpirun fed its environment to orted (after orted is
> launched) so orted can pass it on to MPI tasks.  That should be portable
> between different launch mechanisms.  But it looks like tm launches
> orted with the full mpirun environment (at the request of mpirun).
> 
> Cheers,
> David
> 
> 
> Michael Sternberg wrote:
>> Hi David,
>> Hmm, your demo is well-chosen and crystal-clear, yet the output is 
>> unexpected.  I do not see environment vars passed by default here:
>> login3$ qsub -l nodes=2:ppn=1 -I
>> qsub: waiting for job 34683.mds01 to start
>> qsub: job 34683.mds01 ready
>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE hostname
>> n102
>> n085
>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
>> n102$ export FOO=BAR
>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
>> FOO=BAR
>> n102$ type mpirun
>> mpirun is hashed (/opt/soft/openmpi-1.3.2-intel10-1/bin/mpirun)
>> Curious, what do you get upon:
>>  where mpirun
>> I built OpenMPI-1.3.2 here from source with:
>>CC=icc  CXX=icpc  FC=ifort  F77=ifort \
>>LDFLAGS='-Wl,-z,noexecstack' \
>>CFLAGS='-O2 -g -fPIC' \
>>CXXFLAGS='-O2 -g -fPIC' \
>>FFLAGS='-O2 -g -fPIC' \
>>./configure --prefix=$prefix \
>>--with-libnuma=/usr \
>>--with-openib=/usr \
>>--with-udapl \
>>--enable-mpirun-prefix-by-default \
>>--without-tm
>> I didn't find the behavior I saw strange, given that orterun(1) talks only 
>> about $OMPI_* and inheritance from the remote shell.  It also mentions a 
>> "boot MCA module", about which I couldn't find much on open-mpi.org - hmm.
>> In the meantime, I did find a possible solution, namely, to tell ssh to pass 
>> a variable using SendEnv/AcceptEnv.  That variable is then seen by and can 
>> be interpreted (cautiously) in /etc/profile.d/ scripts.  A user could set it 
>> in the job file (or even qalter it post submission):
>>  #PBS -v VARNAME=foo:bar:baz
>> For VARNAME, I think simply "MODULES" or "EXTRAMODULES" could do.
>> With best regards,
>> Michael
>> On Nov 17, 2009, at 4:29 , David Singleton wrote:
>>> I'm not sure why you don't see Open MPI behaving like other MPIs w.r.t.
>>> modules/environment on remote MPI tasks - we do.
>>> 
>>> xe:~ > qsub -q express -lnodes=2:ppn=8,walltime=10:00,vmem=2gb -I
>>> qsub: waiting for job 376366.xepbs to start
>>> qsub: job 376366.xepbs ready
>>> 
>>> [dbs900@x27 ~]$ module load openmpi
>>> [dbs900@x27 ~]$ mpirun -n 2 --bynode hostname
>>> x27
>>> x28
>>> [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO
>>> [dbs900@x27 ~]$ setenv FOO BAR
>>> [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO
>>> FOO=BAR
>>> FOO=BAR
>>> [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber
>>> [dbs900@x27 ~]$ module load amber
>>> [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber
>>> LOADEDMODULES=openmpi/1.3.3:amber/9
>>> PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin::/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe
>>> _LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9
>>> AMBERHOME=/apps/amber/9
>>> LOADEDMODULES=openmpi/1.3.3:amber/9
>>> PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin:/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe
>>> _LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9
>>> AMBERHOME=/apps/amber/9
>>> 
>>> David
>

Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)

2009-11-17 Thread David Singleton


I can see the difference - we built Open MPI with tm support.  For some
reason, I thought mpirun fed its environment to orted (after orted is
launched) so orted can pass it on to MPI tasks.  That should be portable
between different launch mechanisms.  But it looks like tm launches
orted with the full mpirun environment (at the request of mpirun).

Cheers,
David


Michael Sternberg wrote:

Hi David,

Hmm, your demo is well-chosen and crystal-clear, yet the output is unexpected.  
I do not see environment vars passed by default here:


login3$ qsub -l nodes=2:ppn=1 -I
qsub: waiting for job 34683.mds01 to start
qsub: job 34683.mds01 ready

n102$ mpirun -n 2 -machinefile $PBS_NODEFILE hostname
n102
n085
n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
n102$ export FOO=BAR
n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
FOO=BAR
n102$ type mpirun
mpirun is hashed (/opt/soft/openmpi-1.3.2-intel10-1/bin/mpirun)


Curious, what do you get upon:

where mpirun


I built OpenMPI-1.3.2 here from source with:

CC=icc  CXX=icpc  FC=ifort  F77=ifort \
LDFLAGS='-Wl,-z,noexecstack' \
CFLAGS='-O2 -g -fPIC' \
CXXFLAGS='-O2 -g -fPIC' \
FFLAGS='-O2 -g -fPIC' \
./configure --prefix=$prefix \
--with-libnuma=/usr \
--with-openib=/usr \
--with-udapl \
--enable-mpirun-prefix-by-default \
--without-tm


I didn't find the behavior I saw strange, given that orterun(1) talks only about $OMPI_* 
and inheritance from the remote shell.  It also mentions a "boot MCA module", 
about which I couldn't find much on open-mpi.org - hmm.


In the meantime, I did find a possible solution, namely, to tell ssh to pass a 
variable using SendEnv/AcceptEnv.  That variable is then seen by and can be 
interpreted (cautiously) in /etc/profile.d/ scripts.  A user could set it in 
the job file (or even qalter it post submission):

#PBS -v VARNAME=foo:bar:baz

For VARNAME, I think simply "MODULES" or "EXTRAMODULES" could do.


With best regards,
Michael



On Nov 17, 2009, at 4:29 , David Singleton wrote:

I'm not sure why you don't see Open MPI behaving like other MPIs w.r.t.
modules/environment on remote MPI tasks - we do.

xe:~ > qsub -q express -lnodes=2:ppn=8,walltime=10:00,vmem=2gb -I
qsub: waiting for job 376366.xepbs to start
qsub: job 376366.xepbs ready

[dbs900@x27 ~]$ module load openmpi
[dbs900@x27 ~]$ mpirun -n 2 --bynode hostname
x27
x28
[dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO
[dbs900@x27 ~]$ setenv FOO BAR
[dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO
FOO=BAR
FOO=BAR
[dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber
[dbs900@x27 ~]$ module load amber
[dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber
LOADEDMODULES=openmpi/1.3.3:amber/9
PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin::/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe
_LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9
AMBERHOME=/apps/amber/9
LOADEDMODULES=openmpi/1.3.3:amber/9
PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin:/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe
_LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9
AMBERHOME=/apps/amber/9

David


Michael Sternberg wrote:

Dear readers,
With OpenMPI, how would one go about requesting to load environment modules (of 
the http://modules.sourceforge.net/ kind) on remote nodes, augmenting those  
normally loaded there by shell dotfiles?
Background:
I run a RHEL-5/CentOS-5 cluster.  I load a bunch of default modules through 
/etc/profile.d/ and recommend to users to customize modules in ~/.bashrc.  A 
problem arises for PBS jobs which might need job-specific modules, e.g., to 
pick a specific flavor of an application.  With other MPI implementations 
(ahem) which export all (or judiciously nearly all) environment variables by 
default, you can say:
#PBS ...
module load foo # not for OpenMPI
mpirun -np 42 ... \
bar-app
Not so with OpenMPI - any such customization is only effective for processes on 
the master (=local) node of the job, and any variables changed by a given 
module would have to be specifically passed via mpirun -x VARNAME.   On the 
remote nodes, those variables are not available in the dotfiles because they 
are passed only once orted is live (after dotfile processing by the shell), 
which then immediately spawns the application binaries (right?)
I thought along the following lines:
(1) I happen to run Lustre, which would allow writing a file coherently across 
nodes prior to mpirun, and thus hook into the shell dotfile processing, but 
that seems rather crude.
(2) "mpirun -x PATH -x LD_LIBRARY_PATH …" would take care of a lot, but is not 
really general.
Is there a recommended way?
regards,
Michael

___
users mailing list
us...@open-mpi.org
http://www.open-

Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)

2009-11-17 Thread Michael Sternberg
Hi David,

Hmm, your demo is well-chosen and crystal-clear, yet the output is unexpected.  
I do not see environment vars passed by default here:


login3$ qsub -l nodes=2:ppn=1 -I
qsub: waiting for job 34683.mds01 to start
qsub: job 34683.mds01 ready

n102$ mpirun -n 2 -machinefile $PBS_NODEFILE hostname
n102
n085
n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
n102$ export FOO=BAR
n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
FOO=BAR
n102$ type mpirun
mpirun is hashed (/opt/soft/openmpi-1.3.2-intel10-1/bin/mpirun)


Curious, what do you get upon:

where mpirun


I built OpenMPI-1.3.2 here from source with:

CC=icc  CXX=icpc  FC=ifort  F77=ifort \
LDFLAGS='-Wl,-z,noexecstack' \
CFLAGS='-O2 -g -fPIC' \
CXXFLAGS='-O2 -g -fPIC' \
FFLAGS='-O2 -g -fPIC' \
./configure --prefix=$prefix \
--with-libnuma=/usr \
--with-openib=/usr \
--with-udapl \
--enable-mpirun-prefix-by-default \
--without-tm


I didn't find the behavior I saw strange, given that orterun(1) talks only 
about $OMPI_* and inheritance from the remote shell.  It also mentions a "boot MCA 
module", about which I couldn't find much on open-mpi.org - hmm.


In the meantime, I did find a possible solution, namely, to tell ssh to pass a 
variable using SendEnv/AcceptEnv.  That variable is then seen by and can be 
interpreted (cautiously) in /etc/profile.d/ scripts.  A user could set it in 
the job file (or even qalter it post submission):

#PBS -v VARNAME=foo:bar:baz

For VARNAME, I think simply "MODULES" or "EXTRAMODULES" could do.


With best regards,
Michael



On Nov 17, 2009, at 4:29 , David Singleton wrote:
> 
> 
> I'm not sure why you don't see Open MPI behaving like other MPIs w.r.t.
> modules/environment on remote MPI tasks - we do.
> 
> xe:~ > qsub -q express -lnodes=2:ppn=8,walltime=10:00,vmem=2gb -I
> qsub: waiting for job 376366.xepbs to start
> qsub: job 376366.xepbs ready
> 
> [dbs900@x27 ~]$ module load openmpi
> [dbs900@x27 ~]$ mpirun -n 2 --bynode hostname
> x27
> x28
> [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO
> [dbs900@x27 ~]$ setenv FOO BAR
> [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO
> FOO=BAR
> FOO=BAR
> [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber
> [dbs900@x27 ~]$ module load amber
> [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber
> LOADEDMODULES=openmpi/1.3.3:amber/9
> PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin::/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe
> _LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9
> AMBERHOME=/apps/amber/9
> LOADEDMODULES=openmpi/1.3.3:amber/9
> PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin:/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe
> _LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9
> AMBERHOME=/apps/amber/9
> 
> David
> 
> 
> Michael Sternberg wrote:
>> Dear readers,
>> With OpenMPI, how would one go about requesting to load environment modules 
>> (of the http://modules.sourceforge.net/ kind) on remote nodes, augmenting 
>> those normally loaded there by shell dotfiles?
>> Background:
>> I run a RHEL-5/CentOS-5 cluster.  I load a bunch of default modules through 
>> /etc/profile.d/ and recommend that users customize modules in ~/.bashrc.  A 
>> problem arises for PBS jobs which might need job-specific modules, e.g., to 
>> pick a specific flavor of an application.  With other MPI implementations 
>> (ahem) which export all (or judiciously nearly all) environment variables by 
>> default, you can say:
>>  #PBS ...
>>  module load foo # not for OpenMPI
>>  mpirun -np 42 ... \
>>  bar-app
>> Not so with OpenMPI - any such customization is effective only for processes 
>> on the master (=local) node of the job, and any variables changed by a given 
>> module would have to be passed explicitly via mpirun -x VARNAME.  On the 
>> remote nodes, those variables are not available to the dotfiles because they 
>> are passed only once orted is live (i.e., after the shell has already 
>> processed the dotfiles), and orted then immediately spawns the application 
>> binaries (right?)
>> I thought along the following lines:
>> (1) I happen to run Lustre, which would allow writing a file coherently 
>> across nodes prior to mpirun, and thus hook into the shell dotfile 
>> processing, but that seems rather crude.
>> (2) "mpirun -x PATH -x LD_LIBRARY_PATH …" would take care of a lot, but is 
>> not really general.
>> Is there a recommended way?
>> regards,
>> Michael
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)

2009-11-17 Thread David Singleton


Hi Michael,

I'm not sure why you don't see Open MPI behaving like other MPIs w.r.t.
modules/environment on remote MPI tasks - we do.

xe:~ > qsub -q express -lnodes=2:ppn=8,walltime=10:00,vmem=2gb -I
qsub: waiting for job 376366.xepbs to start
qsub: job 376366.xepbs ready

[dbs900@x27 ~]$ module load openmpi
[dbs900@x27 ~]$ mpirun -n 2 --bynode hostname
x27
x28
[dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO
[dbs900@x27 ~]$ setenv FOO BAR
[dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO
FOO=BAR
FOO=BAR
[dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber
[dbs900@x27 ~]$ module load amber
[dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber
LOADEDMODULES=openmpi/1.3.3:amber/9
PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin::/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe
_LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9
AMBERHOME=/apps/amber/9
LOADEDMODULES=openmpi/1.3.3:amber/9
PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin:/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe
_LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9
AMBERHOME=/apps/amber/9

David


Michael Sternberg wrote:

Dear readers,

With OpenMPI, how would one go about requesting to load environment modules (of 
the http://modules.sourceforge.net/ kind) on remote nodes, augmenting those 
normally loaded there by shell dotfiles?


Background:

I run a RHEL-5/CentOS-5 cluster.  I load a bunch of default modules through 
/etc/profile.d/ and recommend that users customize modules in ~/.bashrc.  A 
problem arises for PBS jobs which might need job-specific modules, e.g., to 
pick a specific flavor of an application.  With other MPI implementations 
(ahem) which export all (or judiciously nearly all) environment variables by 
default, you can say:

#PBS ...

module load foo # not for OpenMPI

mpirun -np 42 ... \
bar-app

Not so with OpenMPI - any such customization is effective only for processes on 
the master (=local) node of the job, and any variables changed by a given 
module would have to be passed explicitly via mpirun -x VARNAME.  On the 
remote nodes, those variables are not available to the dotfiles because they 
are passed only once orted is live (i.e., after the shell has already processed 
the dotfiles), and orted then immediately spawns the application binaries 
(right?)

I thought along the following lines:

(1) I happen to run Lustre, which would allow writing a file coherently across 
nodes prior to mpirun, and thus hook into the shell dotfile processing, but 
that seems rather crude.

(2) "mpirun -x PATH -x LD_LIBRARY_PATH …" would take care of a lot, but is not 
really general.

Is there a recommended way?


regards,
Michael



[OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)

2009-11-17 Thread Michael Sternberg
Dear readers,

With OpenMPI, how would one go about requesting to load environment modules (of 
the http://modules.sourceforge.net/ kind) on remote nodes, augmenting those 
normally loaded there by shell dotfiles?


Background:

I run a RHEL-5/CentOS-5 cluster.  I load a bunch of default modules through 
/etc/profile.d/ and recommend that users customize modules in ~/.bashrc.  A 
problem arises for PBS jobs which might need job-specific modules, e.g., to 
pick a specific flavor of an application.  With other MPI implementations 
(ahem) which export all (or judiciously nearly all) environment variables by 
default, you can say:

#PBS ...

module load foo # not for OpenMPI

mpirun -np 42 ... \
bar-app

Not so with OpenMPI - any such customization is effective only for processes on 
the master (=local) node of the job, and any variables changed by a given 
module would have to be passed explicitly via mpirun -x VARNAME.  On the 
remote nodes, those variables are not available to the dotfiles because they 
are passed only once orted is live (i.e., after the shell has already processed 
the dotfiles), and orted then immediately spawns the application binaries 
(right?)

I thought along the following lines:

(1) I happen to run Lustre, which would allow writing a file coherently across 
nodes prior to mpirun, and thus hook into the shell dotfile processing, but 
that seems rather crude.

(2) "mpirun -x PATH -x LD_LIBRARY_PATH …" would take care of a lot, but is not 
really general.
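One way to make (2) less ad hoc - a sketch only, and the helper name 
diff_to_x_flags is my own invention, not an Open MPI feature - is to snapshot 
the environment before and after "module load" and turn every variable that 
was added or changed into a -x flag:

```shell
# Hypothetical helper: compare two *sorted* "env" snapshots and print an
# "-x VAR" flag for every variable that is new or changed in the second.
diff_to_x_flags() {
    # comm -13 keeps only the lines unique to the second file
    comm -13 "$1" "$2" | cut -d= -f1 | sed 's/^/-x /' | tr '\n' ' '
}

# Intended use in a PBS job script (untested sketch):
#   env | sort > "$TMPDIR/env.before"
#   module load foo
#   env | sort > "$TMPDIR/env.after"
#   mpirun $(diff_to_x_flags "$TMPDIR/env.before" "$TMPDIR/env.after") \
#       -np 42 bar-app
```

Caveat: this naive line-based diff breaks on environment values that contain 
newlines, so it is only a starting point.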

Is there a recommended way?


regards,
Michael