Sorry for the delay - I was on vacation and am now trying to work my
way back to the surface.
I'm not sure I can fix this one for two reasons:
1. In general, OMPI doesn't really care what name is used for the
node. However, the problem is that it needs to be consistent. In this
case, ORTE has already used the name returned by gethostname to create
its session directory structure long before mpirun reads a hostfile.
This is why we retain the value from gethostname instead of allowing
it to be overwritten by the name in whatever allocation we are given.
Using the name in hostfile would require that I either find some way
to remember any prior name, or that I tear down and rebuild the
session directory tree - neither seems attractive nor simple (e.g.,
what happens when the user provides multiple entries in the hostfile
for the node, each with a different IP address based on another
interface in that node? Sounds crazy, but we have already seen it done
- which one do I use?).
2. We don't actually store the hostfile info anywhere - we just use it
and forget it. For us to add an XML attribute containing any hostfile-
related info would therefore require us to re-read the hostfile. I
could have it do that -only- in the case of "XML output required", but
it seems rather ugly.
An alternative might be for you to simply do a "gethostbyname" lookup
of the IP address or hostname to see if it matches instead of just
doing a strcmp. This is what we have to do internally as we frequently
have problems with FQDN vs. non-FQDN vs. IP addresses etc. If the
local OS hasn't cached the IP address for the node in question, the
DNS lookup can take a little time, but otherwise it works fine.
I can point you to the code in OPAL that we use - I would think
something similar would be easy to implement in your code and would
readily solve the problem.
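For illustration, here is a minimal sketch of that kind of comparison,
written in Python. It is not the OPAL code: the names resolve_addresses
and same_host and the resolver parameter are made up for this example,
and it uses socket.getaddrinfo rather than gethostbyname. The idea is
to resolve both strings to their address sets and treat them as the
same host if the sets overlap, with a plain string comparison as a
cheap first check.

```python
import socket


def resolve_addresses(name, resolver=socket.getaddrinfo):
    """Return the set of IP address strings that `name` resolves to.

    `name` may be an IP address, a short hostname, or an FQDN.
    An empty set is returned if the lookup fails.
    """
    try:
        infos = resolver(name, None)
    except (socket.gaierror, UnicodeError):
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def same_host(a, b, resolver=socket.getaddrinfo):
    """Return True if `a` and `b` appear to name the same host."""
    # Cheap check first: identical strings need no lookup at all.
    if a.strip().lower() == b.strip().lower():
        return True
    # Otherwise compare what the two names resolve to.
    addrs_a = resolve_addresses(a, resolver)
    addrs_b = resolve_addresses(b, resolver)
    return bool(addrs_a & addrs_b)
```

A caller would then use something like same_host(hostfile_entry,
map_entry) in place of the strcmp. The resolver argument is only there
so the lookup can be swapped out, e.g. for testing without DNS. Note
that the result depends on the resolver of the machine doing the
lookup, so it may differ from what the head node sees.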
Ralph
On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:
Ralph,
The problem we're seeing is just with the head node. If I specify a
particular IP address for the head node in the hostfile, it gets
changed to the FQDN when displayed in the map. This is a problem for
us as we need to be able to match the two, and since we're not
necessarily running on the head node, we can't always do the same
resolution you're doing.
Would it be possible to use the same address that is specified in
the hostfile, or alternatively provide an XML attribute that
contains this information?
Thanks,
Greg
On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:
Not in that regard, depending upon what you mean by "recently". The
only changes I am aware of wrt nodes consisted of some changes to
the order in which we use the nodes when specified by hostfile or
-host, and a little #if protectionism needed by Brian for the Cray
port.
Are you seeing this for every node? Reason I ask: I can't offhand
think of anything in the code base that would replace a host name
with the FQDN because we don't get that info for remote nodes. The
only exception is the head node (where mpirun sits) - in that lone
case, we default to the name returned to us by gethostname(). We do
that because the head node is frequently accessible on a more
global basis than the compute nodes - thus, the FQDN is required to
ensure that there is no address confusion on the network.
If the user refers to compute nodes in a hostfile or -host (or in
an allocation from a resource manager) by non-FQDN, we just assume
they know what they are doing and the name will correctly resolve
to a unique address.
On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:
Hi,
Has the behavior of the -display-map option changed recently in the
1.3 branch? We're now seeing the host name as a fully qualified
domain name rather than the entry that was specified in the
hostfile. Is there any particular reason for
this? If so, would it be possible to add the hostfile entry to the
output since we need to be able to match the two?
Thanks,
Greg
_______________________________________________
devel mailing list
[email protected]
http://www.open-mpi.org/mailman/listinfo.cgi/devel