Sorry for the delay; I had to ponder this one for a while.
Jeff and I agree that adding something to ompi_info would not be a
good idea. ompi_info has no knowledge or understanding of hostfiles,
and adding that capability to it would be a major distortion of its
intended use.
However, we think we can offer an alternative that might better solve
the problem. Remember, we now treat hostfiles in a very different
manner than before - see the wiki page for a complete description, or
"man orte_hosts".
So the problem is that, to provide you with what you want, we need to
"dump" the information from whatever default-hostfile was provided,
and, if no default-hostfile was provided, then the information from
each hostfile that was provided with an app_context.
The best way we could think of to do this is to add another mpirun
command-line option, --dump-hostfiles, that would output each name
from the hostfile, line by line, along with the name we resolved it
to. Of course, --xml would cause it to be output in XML format.
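A rough sketch of the kind of output the proposed --dump-hostfiles option could produce. The option does not exist yet, so everything here is an assumption: a hostfile line is taken to be a host name optionally followed by key=value attributes and a # comment, the "resolved" name comes from getaddrinfo() with AI_CANONNAME (which may not match the rules mpirun actually applies), and the function names and the <host hostfile=... resolved=.../> XML element are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Copy the host name from one hostfile line into out. Assumes a
   "name [key=value ...] [# comment]" layout (hypothetical parser).
   Returns 0 for blank or comment-only lines, 1 otherwise. */
int hostfile_entry_name(const char *line, char *out, size_t outlen)
{
    size_t n = 0;

    assert(line != NULL && out != NULL && outlen > 0);
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '#')
        return 0;
    while (line[n] != '\0' && !isspace((unsigned char)line[n]) && line[n] != '#')
        n++;
    if (n >= outlen)
        n = outlen - 1;                 /* truncate over-long names */
    memcpy(out, line, n);
    out[n] = '\0';
    return 1;
}

/* Ask the local resolver for the canonical name; fall back to the entry. */
void resolve_canonical(const char *name, char *out, size_t outlen)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_flags = AI_CANONNAME;
    if (getaddrinfo(name, NULL, &hints, &res) == 0 && res->ai_canonname != NULL)
        snprintf(out, outlen, "%s", res->ai_canonname);
    else
        snprintf(out, outlen, "%s", name);
    if (res != NULL)
        freeaddrinfo(res);
}

/* Format one "entry resolved" record, as plain text or as a hypothetical
   XML element. No XML escaping: host names are assumed to be plain. */
int format_entry(char *buf, size_t len, const char *entry,
                 const char *resolved, int xml)
{
    if (xml)
        return snprintf(buf, len, "<host hostfile=\"%s\" resolved=\"%s\"/>",
                        entry, resolved);
    return snprintf(buf, len, "%s %s", entry, resolved);
}

/* Read a hostfile stream and print one record per host entry. */
void dump_hostfile(FILE *in, FILE *out, int xml)
{
    char line[1024], entry[256], resolved[256], rec[600];

    while (fgets(line, sizeof line, in) != NULL) {
        if (!hostfile_entry_name(line, entry, sizeof entry))
            continue;
        resolve_canonical(entry, resolved, sizeof resolved);
        format_entry(rec, sizeof rec, entry, resolved, xml);
        fprintf(out, "%s\n", rec);
    }
}
```

Calling dump_hostfile() on an open hostfile stream with xml set to 1 would print one element per entry, pairing what the user wrote with what the resolver returned; a real implementation would reuse mpirun's own resolution logic rather than a fresh getaddrinfo() call.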
Would that meet your needs?
Ralph
On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:
Hi Ralph,
We've been discussing this back and forth a bit internally and don't
really see an easy solution. Our problem is that Eclipse is not
running on the head node, so gethostbyname will not necessarily
resolve to the same address. For example, the hostfile might refer
to the head node by an internal network address that is not visible
to the outside world. Since gethostbyname also looks in /etc/hosts, it
may resolve locally but not on a remote system. The only thing I can
think of would be, rather than us reading the hostfile directly as
we do now, to provide an option to ompi_info that would dump the
hostfile using the same rules that you apply when you're using the
hostfile. Would that be feasible?
Greg
On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:
Sorry for the delay; I was on vacation and am now trying to work my way
back to the surface.
I'm not sure I can fix this one for two reasons:
1. In general, OMPI doesn't really care what name is used for the
node. However, the problem is that it needs to be consistent. In
this case, ORTE has already used the name returned by gethostname
to create its session directory structure long before mpirun reads
a hostfile. This is why we retain the value from gethostname
instead of allowing it to be overwritten by the name in whatever
allocation we are given. Using the name in hostfile would require
that I either find some way to remember any prior name, or that I
tear down and rebuild the session directory tree. Neither option is
attractive or simple (e.g., what happens when the user provides
multiple entries in the hostfile for the node, each with a
different IP address based on another interface in that node?
Sounds crazy, but we have already seen it done - which one do I
use?).
2. We don't actually store the hostfile info anywhere - we just use
it and forget it. For us to add an XML attribute containing any
hostfile-related info would therefore require us to re-read the
hostfile. I could have it do that -only- in the case of "XML output
required", but it seems rather ugly.
An alternative might be for you to simply do a "gethostbyname"
lookup of the IP address or hostname to see if it matches instead
of just doing a strcmp. This is what we have to do internally as we
frequently have problems with FQDN vs. non-FQDN vs. IP addresses
etc. If the local OS hasn't cached the IP address for the node in
question, it can take a little time to resolve it via DNS, but it
otherwise works fine.
I can point you to the code in OPAL that we use - I would think
something similar would be easy to implement in your code and would
readily solve the problem.
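A minimal sketch of the resolve-and-compare idea, used in place of a plain strcmp. It is not the OPAL code mentioned above; it uses getaddrinfo(), the POSIX successor to gethostbyname(), and treats two names as the same node if they share at least one resolved address. The function name hosts_match() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>

/* Return 1 if two socket addresses carry the same IP address (port ignored). */
static int same_ip(const struct sockaddr *a, const struct sockaddr *b)
{
    if (a->sa_family != b->sa_family)
        return 0;
    if (a->sa_family == AF_INET) {
        const struct sockaddr_in *x = (const struct sockaddr_in *)a;
        const struct sockaddr_in *y = (const struct sockaddr_in *)b;
        return x->sin_addr.s_addr == y->sin_addr.s_addr;
    }
    if (a->sa_family == AF_INET6) {
        const struct sockaddr_in6 *x = (const struct sockaddr_in6 *)a;
        const struct sockaddr_in6 *y = (const struct sockaddr_in6 *)b;
        return memcmp(&x->sin6_addr, &y->sin6_addr, sizeof x->sin6_addr) == 0;
    }
    return 0;
}

/* Return 1 if the names are textually equal (ignoring case) or resolve to
   at least one common address; 0 otherwise, including on lookup failure. */
int hosts_match(const char *name_a, const char *name_b)
{
    struct addrinfo hints, *list_a = NULL, *list_b = NULL;
    int match = 0;

    assert(name_a != NULL && name_b != NULL);
    if (strcasecmp(name_a, name_b) == 0)    /* cheap check, no lookup */
        return 1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;        /* one entry per address */

    if (getaddrinfo(name_a, NULL, &hints, &list_a) != 0)
        return 0;
    if (getaddrinfo(name_b, NULL, &hints, &list_b) != 0) {
        freeaddrinfo(list_a);
        return 0;
    }
    for (struct addrinfo *p = list_a; p != NULL && !match; p = p->ai_next)
        for (struct addrinfo *q = list_b; q != NULL && !match; q = q->ai_next)
            match = same_ip(p->ai_addr, q->ai_addr);

    freeaddrinfo(list_a);
    freeaddrinfo(list_b);
    return match;
}
```

The answer depends on the resolver of the machine running the check: hosts_match("10.0.0.1", "head.example.org") returns 1 only if that name resolves to 10.0.0.1 locally, so the result can differ when the check does not run on the head node.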
Ralph
On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:
Ralph,
The problem we're seeing is just with the head node. If I specify
a particular IP address for the head node in the hostfile, it gets
changed to the FQDN when displayed in the map. This is a problem
for us as we need to be able to match the two, and since we're not
necessarily running on the head node, we can't always do the same
resolution you're doing.
Would it be possible to use the same address that is specified in
the hostfile, or alternatively provide an XML attribute that
contains this information?
Thanks,
Greg
On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:
Not in that regard, depending upon what you mean by "recently".
The only changes I am aware of wrt nodes consisted of some
changes to the order in which we use the nodes when specified by
hostfile or -host, and a little #if protectionism needed by Brian
for the Cray port.
Are you seeing this for every node? Reason I ask: I can't offhand
think of anything in the code base that would replace a host name
with the FQDN because we don't get that info for remote nodes.
The only exception is the head node (where mpirun sits) - in that
lone case, we default to the name returned to us by
gethostname(). We do that because the head node is frequently
accessible on a more global basis than the compute nodes - thus,
the FQDN is required to ensure that there is no address confusion
on the network.
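To see what name the head node will report, one can ask the OS directly. This sketch calls gethostname(), the call described above for the head node, and applies a simple heuristic (the name contains a dot and is not a literal IP address) to guess whether the result is already fully qualified. Both helper names are hypothetical, and the heuristic is only an approximation.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Store the local host name, as reported by gethostname(), in out.
   Returns 0 on success, -1 on failure. */
int local_node_name(char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    if (gethostname(out, outlen) != 0)
        return -1;
    /* POSIX leaves termination unspecified if the name was truncated. */
    out[outlen - 1] = '\0';
    return 0;
}

/* Heuristic: a dotted name that is not a literal IPv4/IPv6 address is
   treated as fully qualified. */
int looks_fully_qualified(const char *name)
{
    struct in_addr v4;
    struct in6_addr v6;

    if (inet_pton(AF_INET, name, &v4) == 1 || inet_pton(AF_INET6, name, &v6) == 1)
        return 0;                       /* literal address, not a domain name */
    return strchr(name, '.') != NULL;
}
```

Per the explanation above, if gethostname() on the head node returns a dotted name, that is the name the map will show for the head node, even when the hostfile lists it by IP address.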
If the user refers to compute nodes in a hostfile or -host (or in
an allocation from a resource manager) by non-FQDN, we just
assume they know what they are doing and the name will correctly
resolve to a unique address.
On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:
Hi,
Has the behavior of the -display-map option changed recently in
the 1.3 branch? We're now seeing the host name as a fully
qualified domain name rather than the entry that
was specified in the hostfile. Is there any particular reason
for this? If so, would it be possible to add the hostfile entry
to the output since we need to be able to match the two?
Thanks,
Greg
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel