On Fri, 2010-01-22 at 08:22 -0700, Ralph Castain wrote:
> A quick and easy way to answer my question of slurm vs ompi:
>
> Just do "srun script-that-echos-hostname-and-gethostname". If you get the
> right hostnames, then OMPI is to blame, not slurm.
>

No, I'm not... Will check the configuration.

Thanks a lot,
Nadia

> On Jan 22, 2010, at 8:07 AM, Ralph Castain wrote:
>
> > Hi Nadia
> >
> > That sounds like a bug in your SLURM config file - SLURM certainly doesn't
> > propagate "hostname" by default as that would definitely mess things up for
> > more than OMPI.
> >
> > Are you sure that SLURM is propagating the environment (something I have
> > never seen before)? Or is OMPI mistakenly picking it up and propagating it?
> >
> > On Jan 22, 2010, at 7:25 AM, Nadia Derbey wrote:
> >
> >> Hi,
> >>
> >> I'm wondering whether the HOSTNAME environment variable shouldn't be
> >> handled as a "special case" when the orted daemons launch the remote
> >> jobs. This particularly applies to batch schedulers where the caller's
> >> environment is copied to the remote job: we are inheriting a $HOSTNAME
> >> which is the name of the host mpirun was called from.
> >>
> >> I tried to run the following small test (see getenv.c in attachment - it
> >> essentially gets the hostname once through $HOSTNAME, and once through
> >> gethostname(2)):
> >>
> >> ------------
> >> [derbeyn@pichu0 ~]$ hostname
> >> pichu0
> >> [derbeyn@pichu0 ~]$ salloc -N 2 -p pichu mpirun ./getenv
> >> salloc: Granted job allocation 358789
> >> Processor 0 of 2 on $HOSTNAME pichu0: Hello World
> >> Processor 0 of 2 on host pichu93: Hello World
> >> Processor 1 of 2 on $HOSTNAME pichu0: Hello World
> >> Processor 1 of 2 on host pichu94: Hello World
> >> salloc: Relinquishing job allocation 358789
> >> ------------
> >>
> >> Shouldn't we be getting the same value when using getenv("HOSTNAME") and
> >> gethostname()?
> >> Applying the following small patch, we actually do.
> >>
> >> Regards,
> >> Nadia
> >>
> >> --------------
> >>
> >> Do not propagate the HOSTNAME environment variable on remote hosts
> >>
> >> diff -r 4ab256be2a17 orte/orted/orted_main.c
> >> --- a/orte/orted/orted_main.c Wed Jan 20 16:45:07 2010 +0100
> >> +++ b/orte/orted/orted_main.c Fri Jan 22 14:54:02 2010 +0100
> >> @@ -299,12 +299,17 @@ int orte_daemon(int argc, char *argv[])
> >>       */
> >>      orte_launch_environ = opal_argv_copy(environ);
> >>
> >> +    /*
> >> +     * Set HOSTNAME to the actual hostname in order to avoid propagating
> >> +     * the caller's HOSTNAME.
> >> +     */
> >> +    gethostname(hostname, 100);
> >> +    opal_setenv("HOSTNAME", hostname, true, &orte_launch_environ);
> >>
> >>      /* if orte_daemon_debug is set, let someone know we are alive right
> >>       * away just in case we have a problem along the way
> >>       */
> >>      if (orted_globals.debug) {
> >> -        gethostname(hostname, 100);
> >>          fprintf(stderr, "Daemon was launched on %s - beginning to initialize\n", hostname);
> >>      }
> >>
> >> <getenv.c>_______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
--
Nadia Derbey <nadia.der...@bull.net>
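
For readers without the getenv.c attachment, the check discussed in this thread can be approximated with the sketch below. It is not the attached program: it leaves out MPI entirely, and the function names (actual_hostname, hostname_env_matches, fix_hostname_env) and the 256-byte buffer are assumptions made for illustration. The last function mimics the idea of the patch using plain POSIX setenv() on the calling process's own environment, whereas the patch itself calls opal_setenv() on the copied orte_launch_environ.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed buffer size for this sketch (the patch in the thread uses 100). */
#define HOSTNAME_BUF_LEN 256

/* Store the name reported by gethostname(2) in buf.
 * Returns 0 on success, -1 on failure. */
int actual_hostname(char *buf, size_t len)
{
    if (len == 0 || gethostname(buf, len) != 0) {
        return -1;
    }
    /* gethostname() is not guaranteed to NUL-terminate a truncated name. */
    buf[len - 1] = '\0';
    return 0;
}

/* Compare $HOSTNAME with gethostname(2).
 * Returns 1 if they match, 0 if they differ or $HOSTNAME is unset,
 * and -1 if gethostname(2) fails. */
int hostname_env_matches(void)
{
    char actual[HOSTNAME_BUF_LEN];
    const char *from_env = getenv("HOSTNAME");

    if (actual_hostname(actual, sizeof(actual)) != 0) {
        return -1;
    }
    if (from_env == NULL) {
        return 0;
    }
    return strcmp(from_env, actual) == 0 ? 1 : 0;
}

/* Same idea as the patch, applied to this process's own environment:
 * overwrite HOSTNAME with the value from gethostname(2).
 * Returns 0 on success, -1 on failure. */
int fix_hostname_env(void)
{
    char actual[HOSTNAME_BUF_LEN];

    if (actual_hostname(actual, sizeof(actual)) != 0) {
        return -1;
    }
    return setenv("HOSTNAME", actual, 1);
}
```

Calling hostname_env_matches() from a small main() launched under the scheduler should report a mismatch in the situation described above, where $HOSTNAME is inherited from the node mpirun was started on, and a match once fix_hostname_env() has run.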