Re: [OMPI users] Honor host_aliases file for tight SGE integration

2017-09-15 Thread r...@open-mpi.org
Hi Reuti

As far as I am concerned, you SGE users “own” the SGE support - so feel free to 
submit a patch!

Ralph


[OMPI users] Honor host_aliases file for tight SGE integration

2017-09-13 Thread Reuti
Hi,

I wonder whether it has ever come up in discussion that SGE can behave similarly to Torque/PBS regarding the mangling of hostnames. It's similar to https://github.com/open-mpi/ompi/issues/2328 in that a node can have multiple network interfaces, each with a unique name. SGE's operation can be routed to a specific network interface by means of the file:

$SGE_ROOT/$SGE_CELL/common/host_aliases

which has the format:

  <unique_name_used_by_SGE> <alias_1> [<alias_2> ...]

Hence the generated $PE_HOSTFILE lists the name known to SGE, although the `hostname` command returns the real name. In this case Open MPI would start a `qrsh -inherit …` call instead of forking, as it thinks these are different machines (assuming an allocation_rule of $PE_SLOTS, so that `mpiexec` is supposed to run on the same machine as the started tasks).
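
For illustration, with hypothetical names (borrowing from the PS below: "master" as the SGE-internal name, "baz" as the real one), the pieces would look something like:

  # host_aliases entry:
  master  baz

  # generated $PE_HOSTFILE (format: host slots queue processor-range):
  master 4 all.q@master UNDEFINED

  # on the node itself, however:
  $ hostname
  baz

Open MPI compares "master" from the hostfile with "baz" from gethostname(), concludes the entry refers to a remote node, and therefore goes through `qrsh -inherit` instead of forking.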

I tried to go the "old" way and provide a start_proc_args to the PE that creates a symbolic link to `hostname` in $TMPDIR, so that an adjusted `hostname` call is available inside the job script. But obviously Open MPI calls gethostname() directly rather than invoking an external binary.
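
A minimal sketch of such a start_proc_args script (the wrapper path is hypothetical, and it relies on $TMPDIR coming early in the job's PATH so that the link shadows /bin/hostname):

  #!/bin/sh
  # PE start-up script (runs once per job at PE start): plant a
  # `hostname` replacement that prints the SGE-internal name.
  ln -s /usr/local/sge/bin/hostname_wrapper "$TMPDIR/hostname"

As said, this only helps external callers of `hostname`; Open MPI's own gethostname() call is unaffected.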

So, in the job script, I mangled the hostnames in the generated machinefile and fed this "adjusted" $PE_HOSTFILE to Open MPI, and then it works as intended: Open MPI forks.
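
A minimal sketch of that rewrite, assuming the local node is known to SGE as "master" (the location of the adjusted copy and the application name are mine):

  # in the job script, before starting Open MPI:
  sed "s/^master /$(hostname) /" "$PE_HOSTFILE" > "$TMPDIR/pe_hostfile"
  export PE_HOSTFILE="$TMPDIR/pe_hostfile"
  mpiexec ./my_app

Open MPI's SGE support picks up the hostfile via the $PE_HOSTFILE environment variable, so pointing it at the adjusted copy is enough.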

Does anyone else need such a patch in Open MPI, and would it be suitable for inclusion?

-- Reuti

PS: In our case only the head nodes have more than one network interface, hence this didn't come to my attention until now, when there was a need to also use some cores on the head nodes. They are known internally to SGE as "login" and "master", but the external names, which gethostname() returns, may be "foo" and "baz".