A quick and easy way to answer my question of SLURM vs. OMPI: just run "srun script-that-echoes-HOSTNAME-and-gethostname". If $HOSTNAME matches gethostname() on every node, then OMPI is to blame, not SLURM.
On Jan 22, 2010, at 8:07 AM, Ralph Castain wrote:

> Hi Nadia
>
> That sounds like a bug in your SLURM config file - SLURM certainly doesn't
> propagate "hostname" by default as that would definitely mess things up for
> more than OMPI.
>
> Are you sure that SLURM is propagating the environment (something I have
> never seen before)? Or is OMPI mistakenly picking it up and propagating it?
>
> On Jan 22, 2010, at 7:25 AM, Nadia Derbey wrote:
>
>> Hi,
>>
>> I'm wondering whether the HOSTNAME environment variable shouldn't be
>> handled as a "special case" when the orted daemons launch the remote
>> jobs. This particularly applies to batch schedulers where the caller's
>> environment is copied to the remote job: we are inheriting a $HOSTNAME
>> which is the name of the host mpirun was called from.
>>
>> I tried to run the following small test (see getenv.c in attachment - it
>> basically gets the hostname once through $HOSTNAME, and once through
>> gethostname(2)):
>>
>> ------------
>> [derbeyn@pichu0 ~]$ hostname
>> pichu0
>> [derbeyn@pichu0 ~]$ salloc -N 2 -p pichu mpirun ./getenv
>> salloc: Granted job allocation 358789
>> Processor 0 of 2 on $HOSTNAME pichu0: Hello World
>> Processor 0 of 2 on host pichu93: Hello World
>> Processor 1 of 2 on $HOSTNAME pichu0: Hello World
>> Processor 1 of 2 on host pichu94: Hello World
>> salloc: Relinquishing job allocation 358789
>> ------------
>>
>> Shouldn't we be getting the same value when using getenv("HOSTNAME") and
>> gethostname()?
>> Applying the following small patch, we actually do.
>>
>> Regards,
>> Nadia
>>
>> --------------
>>
>> Do not propagate the HOSTNAME environment variable on remote hosts
>>
>> diff -r 4ab256be2a17 orte/orted/orted_main.c
>> --- a/orte/orted/orted_main.c Wed Jan 20 16:45:07 2010 +0100
>> +++ b/orte/orted/orted_main.c Fri Jan 22 14:54:02 2010 +0100
>> @@ -299,12 +299,17 @@ int orte_daemon(int argc, char *argv[])
>>       */
>>      orte_launch_environ = opal_argv_copy(environ);
>>
>> +    /*
>> +     * Set HOSTNAME to the actual hostname in order to avoid propagating
>> +     * the caller's HOSTNAME.
>> +     */
>> +    gethostname(hostname, 100);
>> +    opal_setenv("HOSTNAME", hostname, true, &orte_launch_environ);
>>
>>      /* if orte_daemon_debug is set, let someone know we are alive right
>>       * away just in case we have a problem along the way
>>       */
>>      if (orted_globals.debug) {
>> -        gethostname(hostname, 100);
>>          fprintf(stderr, "Daemon was launched on %s - beginning to initialize\n", hostname);
>>      }
>>
>> <getenv.c>