On Fri, 2010-01-22 at 08:22 -0700, Ralph Castain wrote:
> A quick and easy way to answer my question of slurm vs ompi:
> 
> Just do "srun script-that-echos-hostname-and-gethostname". If you get the 
> right hostnames, then OMPI is to blame, not slurm.
> 

No, I'm not...
Will check the configuration.

Thanks a lot,
Nadia

> On Jan 22, 2010, at 8:07 AM, Ralph Castain wrote:
> 
> > Hi Nadia
> > 
> > That sounds like a bug in your SLURM config file - SLURM certainly doesn't 
> > propagate "hostname" by default as that would definitely mess things up for 
> > more than OMPI.
> > 
> > Are you sure that SLURM is propagating the environment (something I have 
> > never seen before)? Or is OMPI mistakenly picking it up and propagating it?
> > 
> > On Jan 22, 2010, at 7:25 AM, Nadia Derbey wrote:
> > 
> >> Hi,
> >> 
> >> I'm wondering whether the HOSTNAME environment variable shouldn't be
> >> handled as a "special case" when the orted daemons launch the remote
> >> jobs. This particularly applies to batch schedulers where the caller's
> >> environment is copied to the remote job: we are inheriting a $HOSTNAME
> >> which is the name of the host mpirun was called from.
> >> 
> >> I tried to run the following small test (see getenv.c in attachment - it
> >> essentially gets the hostname once through $HOSTNAME, and once through
> >> gethostname(2)):
> >> 
> >> ------------
> >> [derbeyn@pichu0 ~]$ hostname
> >> pichu0
> >> [derbeyn@pichu0 ~]$ salloc -N 2 -p pichu mpirun ./getenv
> >> salloc: Granted job allocation 358789
> >> Processor 0 of 2 on $HOSTNAME pichu0: Hello World
> >> Processor 0 of 2 on host pichu93: Hello World
> >> Processor 1 of 2 on $HOSTNAME pichu0: Hello World
> >> Processor 1 of 2 on host pichu94: Hello World
> >> salloc: Relinquishing job allocation 358789
> >> ------------
> >> 
> >> Shouldn't we be getting the same value when using getenv("HOSTNAME") and 
> >> gethostname()?
> >> Applying the following small patch, we actually do.
> >> 
> >> Regards,
> >> Nadia
> >> 
> >> --------------
> >> 
> >> Do not propagate the HOSTNAME environment variable on remote hosts
> >> 
> >> diff -r 4ab256be2a17 orte/orted/orted_main.c
> >> --- a/orte/orted/orted_main.c   Wed Jan 20 16:45:07 2010 +0100
> >> +++ b/orte/orted/orted_main.c   Fri Jan 22 14:54:02 2010 +0100
> >> @@ -299,12 +299,17 @@ int orte_daemon(int argc, char *argv[])
> >>     */
> >>    orte_launch_environ = opal_argv_copy(environ);
> >> 
> >> +    /*
> >> +     * Set HOSTNAME to the actual hostname in order to avoid propagating
> >> +     * the caller's HOSTNAME.
> >> +     */
> >> +    gethostname(hostname, 100);
> >> +    opal_setenv("HOSTNAME", hostname, true, &orte_launch_environ);
> >> 
> >>    /* if orte_daemon_debug is set, let someone know we are alive right
> >>     * away just in case we have a problem along the way
> >>     */
> >>    if (orted_globals.debug) {
> >> -        gethostname(hostname, 100);
> >>        fprintf(stderr, "Daemon was launched on %s - beginning to initialize\n", hostname);
> >>    }
> >> 
> >> <getenv.c>
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > 
> 
-- 
Nadia Derbey <nadia.der...@bull.net>
