A quick and easy way to answer my question of SLURM vs. OMPI:

Just do "srun script-that-echoes-HOSTNAME-and-gethostname". If $HOSTNAME matches 
the real hostname on every node, then OMPI is to blame, not SLURM.

On Jan 22, 2010, at 8:07 AM, Ralph Castain wrote:

> Hi Nadia
> 
> That sounds like a bug in your SLURM config file - SLURM certainly doesn't 
> propagate "hostname" by default as that would definitely mess things up for 
> more than OMPI.
> 
> Are you sure that SLURM is propagating the environment (something I have 
> never seen before)? Or is OMPI mistakenly picking it up and propagating it?
> 
> On Jan 22, 2010, at 7:25 AM, Nadia Derbey wrote:
> 
>> Hi,
>> 
>> I'm wondering whether the HOSTNAME environment variable shouldn't be
>> handled as a "special case" when the orted daemons launch the remote
>> jobs. This particularly applies to batch schedulers where the caller's
>> environment is copied to the remote job: we are inheriting a $HOSTNAME
>> which is the name of the host mpirun was called from:
>> 
>> I tried to run the following small test (see getenv.c in attachment - it
>> basically gets the hostname twice, once through $HOSTNAME and once through
>> gethostname(2)):
>> 
>> ------------
>> [derbeyn@pichu0 ~]$ hostname
>> pichu0
>> [derbeyn@pichu0 ~]$ salloc -N 2 -p pichu mpirun ./getenv
>> salloc: Granted job allocation 358789
>> Processor 0 of 2 on $HOSTNAME pichu0: Hello World
>> Processor 0 of 2 on host pichu93: Hello World
>> Processor 1 of 2 on $HOSTNAME pichu0: Hello World
>> Processor 1 of 2 on host pichu94: Hello World
>> salloc: Relinquishing job allocation 358789
>> ------------
>> 
>> Shouldn't we be getting the same value when using getenv("HOSTNAME") and 
>> gethostname()?
>> Applying the following small patch, we actually do.
>> 
>> Regards,
>> Nadia
>> 
>> --------------
>> 
>> Do not propagate the HOSTNAME environment variable on remote hosts
>> 
>> diff -r 4ab256be2a17 orte/orted/orted_main.c
>> --- a/orte/orted/orted_main.c   Wed Jan 20 16:45:07 2010 +0100
>> +++ b/orte/orted/orted_main.c   Fri Jan 22 14:54:02 2010 +0100
>> @@ -299,12 +299,17 @@ int orte_daemon(int argc, char *argv[])
>>     */
>>    orte_launch_environ = opal_argv_copy(environ);
>> 
>> +    /*
>> +     * Set HOSTNAME to the actual hostname in order to avoid propagating
>> +     * the caller's HOSTNAME.
>> +     */
>> +    gethostname(hostname, 100);
>> +    opal_setenv("HOSTNAME", hostname, true, &orte_launch_environ);
>> 
>>    /* if orte_daemon_debug is set, let someone know we are alive right
>>     * away just in case we have a problem along the way
>>     */
>>    if (orted_globals.debug) {
>> -        gethostname(hostname, 100);
>>        fprintf(stderr, "Daemon was launched on %s - beginning to initialize\n", hostname);
>>    }
>> 
>> <getenv.c>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 

