I'm embarrassed to admit that I never actually implemented the XML
option for --tag-output... this has been rectified in r20302.
Let me know if that works for you, and sorry for the confusion.
Ralph
On Jan 20, 2009, at 8:08 AM, Greg Watson wrote:
Ralph,
The encapsulation is not quite right yet. I'm seeing this:
[1,0]<stdout>n = 0
[1,1]<stdout>n = 0
but it should be:
<stdout rank="0">n = 0</stdout>
<stdout rank="1">n = 0</stdout>
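Until the XML form is in place, the tagged lines shown above can still be split mechanically. A minimal Python sketch, assuming the observed `[1,0]<stdout>` prefix means `[job,rank]<stream>` (the field meanings are an assumption, not documented behavior); `parse_tagged` is a hypothetical helper:

```python
import re

# Matches the prefix seen in the trunk output, e.g. "[1,0]<stdout>n = 0".
# Assumption: the two numbers are the job id and the rank.
TAGGED = re.compile(r"^\[(\d+),(\d+)\]<(stdout|stderr)>(.*)$")


def parse_tagged(line):
    """Split a tagged output line into (job, rank, stream, text); None if untagged."""
    m = TAGGED.match(line)
    if m is None:
        return None
    job, rank, stream, text = m.groups()
    return int(job), int(rank), stream, text
```

Lines without the prefix return `None`, so untagged output can be passed through unchanged.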
Thanks,
Greg
On Jan 20, 2009, at 9:20 AM, Ralph Castain wrote:
You need to add --tag-output; this is a separate option as it
applies to both XML and non-XML situations.
If you like, I can force tag-output "on" by default whenever -xml
is specified.
Ralph
On Jan 16, 2009, at 12:52 PM, Greg Watson wrote:
Ralph,
Is there something I need to do to enable stdout/err encapsulation
(apart from -xml)? Here's what I see:
$ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 5 /Users/greg/Documents/workspace1/testMPI/Debug/testMPI
<map>
<host name="Jarrah.local" slots="8" max_slots="0">
<noderesolve resolved="node0"/>
<noderesolve resolved="node1"/>
<noderesolve resolved="node2"/>
<noderesolve resolved="node3"/>
<noderesolve resolved="node4"/>
<noderesolve resolved="node5"/>
<noderesolve resolved="node6"/>
<noderesolve resolved="node7"/>
<process rank="0"/>
<process rank="1"/>
<process rank="2"/>
<process rank="3"/>
<process rank="4"/>
</host>
</map>
n = 0
n = 0
n = 0
n = 0
n = 0
On Jan 15, 2009, at 1:13 PM, Ralph Castain wrote:
Okay, it is in the trunk as of r20284 - I'll file the request to
have it moved to 1.3.1.
Let me know if you get a chance to test the stdout/err stuff in
the trunk; we should try to iterate on it so any changes can make
1.3.1 as well.
Thanks!
Ralph
On Jan 15, 2009, at 11:03 AM, Greg Watson wrote:
Ralph,
I think the second form would be ideal and would simplify things
greatly.
Greg
On Jan 15, 2009, at 10:53 AM, Ralph Castain wrote:
Here is what I was able to do - note that the resolve messages
are associated with the specific hostname, not the overall map:
<map>
<host name="graywolf54.lanl.gov" slots="1" max_slots="0">
<noderesolve name="graywolf54.lanl.gov" resolved="localhost"/>
<process rank="0"/>
<process rank="1"/>
<process rank="2"/>
</host>
</map>
Will that work for you? If you like, I can remove the name=
field from the noderesolve element since the info is specific
to the host element that contains it. In other words, I can
make it look like this:
<map>
<host name="graywolf54.lanl.gov" slots="1" max_slots="0">
<noderesolve resolved="localhost"/>
<process rank="0"/>
<process rank="1"/>
<process rank="2"/>
</host>
</map>
if that would help.
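With the resolved names nested inside their host element, a consumer can read the map with an ordinary XML parser instead of matching names across separate elements. A minimal Python sketch using the standard library's ElementTree, assuming the element and attribute names of the second form above; `parse_map` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def parse_map(xml_text):
    """Return {host name: {"resolved": [...], "ranks": [...]}} from a <map> document."""
    root = ET.fromstring(xml_text)
    hosts = {}
    for host in root.iter("host"):
        hosts[host.get("name")] = {
            # Every <noderesolve/> directly under this host, in document order.
            "resolved": [n.get("resolved") for n in host.findall("noderesolve")],
            # Ranks placed on this host, converted to integers.
            "ranks": [int(p.get("rank")) for p in host.findall("process")],
        }
    return hosts
```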
Ralph
On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:
We -may- be able to do a more formal XML output at some point.
The problem will be the natural interleaving of stdout/err
from the various procs due to the async behavior of MPI.
Mpirun receives fragmented output in the forwarding system,
limited by the buffer sizes and the amount of data we can read
at any one "bite" from the pipes connecting us to the procs.
So even though the user -thinks- they output a single large
line of stuff, it may show up at mpirun as a series of
fragments. Hence, it gets tricky to know how to put
appropriate XML brackets around it.
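One way around the fragmentation problem is to buffer each rank's output separately and emit an element only once a complete line has arrived, escaping XML special characters in the user's text. A minimal Python sketch of that idea; `LineTagger` is hypothetical and is not how ORTE's I/O forwarding is implemented:

```python
from collections import defaultdict
from xml.sax.saxutils import escape


class LineTagger:
    """Reassemble per-rank output fragments into whole lines wrapped in XML."""

    def __init__(self, stream="stdout"):
        self.stream = stream
        self.pending = defaultdict(str)  # rank -> partial line not yet terminated

    def feed(self, rank, fragment):
        """Accept one fragment from `rank`; return the tagged complete lines."""
        data = self.pending[rank] + fragment
        # Everything before the last newline is complete; the tail stays buffered.
        *lines, self.pending[rank] = data.split("\n")
        return [self._tag(rank, line) for line in lines]

    def flush(self, rank):
        """Emit whatever is left for `rank` (e.g. when its pipe closes)."""
        rest = self.pending.pop(rank, "")
        return [self._tag(rank, rest)] if rest else []

    def _tag(self, rank, text):
        return f'<{self.stream} rank="{rank}">{escape(text)}</{self.stream}>'
```

The trade-off is latency: a line is held until its newline (or `flush`) arrives, but fragments from different ranks can never be spliced into one element.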
Given this input about when you actually want resolved name
info, I can at least do something about that area. Won't be in
1.3.0, but should make 1.3.1.
As for XML-tagged stdout/err: the OMPI community asked me not
to turn that feature "on" for 1.3.0 as they felt it hasn't
been adequately tested yet. The code is present, but cannot be
activated in 1.3.0. However, I believe it is activated on the
trunk when you do --xml --tag-output, so perhaps some
testing will help us debug and validate it adequately for 1.3.1?
Thanks
Ralph
On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:
Ralph,
The only time we use the resolved names is when we get a map,
so we consider them part of the map output.
If quasi-XML is all that will ever be possible with 1.3, then
you may as well leave as-is and we will attempt to clean it
up in Eclipse. It would be nice if a future version of ompi
could output correct XML (including stdout) as this would
vastly simplify the parsing we need to do.
Regards,
Greg
On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:
Hmmm...well, I can't do either for 1.3.0 as it is departing
this afternoon.
The first option would be very hard to do. I would have to
expose the display-map option across the code base and check
it prior to printing anything about resolving node names. I
guess I should ask: do you only want noderesolve statements
when we are displaying the map? Right now, I will output
them regardless.
The second option could be done. I could check if any
"display" option has been specified, and output the <ompi>
root at that time (likewise for the end). Anything we output
in-between would be encapsulated between the two, but that
would include any user output to stdout and/or stderr -
which for 1.3.0 is not in xml.
Any thoughts?
Ralph
PS. Guess I should clarify that I was not striving for true
XML interaction here, but rather a quasi-XML format that
would help you to filter the output. I have no problem
trying to get to something more formally correct, but it
could be tricky in some places to achieve it due to the
inherent async nature of the beast.
On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:
Ralph,
The XML is looking better now, but there is still one
problem. To be well-formed, the output needs exactly one root
element, but currently there is no single root (there are
several top-level elements). So rather than:
<noderesolve name="node0" resolved="Jarrah.local"/>
<noderesolve name="node1" resolved="Jarrah.local"/>
<map>
<host name="Jarrah.local" slots="8" max_slots="0">
<process rank="0"/>
<process rank="1"/>
<process rank="2"/>
<process rank="3"/>
<process rank="4"/>
</host>
</map>
the XML should be:
<map>
<noderesolve name="node0" resolved="Jarrah.local"/>
<noderesolve name="node1" resolved="Jarrah.local"/>
<host name="Jarrah.local" slots="8" max_slots="0">
<process rank="0"/>
<process rank="1"/>
<process rank="2"/>
<process rank="3"/>
<process rank="4"/>
</host>
</map>
or:
<ompi>
<noderesolve name="node0" resolved="Jarrah.local"/>
<noderesolve name="node1" resolved="Jarrah.local"/>
<map>
<host name="Jarrah.local" slots="8" max_slots="0">
<process rank="0"/>
<process rank="1"/>
<process rank="2"/>
<process rank="3"/>
<process rank="4"/>
</host>
</map>
</ompi>
Would either of these be possible?
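The single-root requirement is easy to check mechanically. A minimal Python sketch using the standard library's ElementTree parser, which rejects input with more than one top-level element; `is_well_formed` and `wrap_in_root` are hypothetical helpers, the latter mirroring the `<ompi>` form above:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if text parses as an XML document with exactly one root element."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def wrap_in_root(body, root="ompi"):
    """Enclose already-generated XML fragments in a single root element."""
    return f"<{root}>\n{body}\n</{root}>"
```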
Thanks,
Greg
On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:
Ok thanks. I'll test from trunk in future.
Greg
On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:
Working its way around the CMR process now.
Might be easier in the future if we could test/debug this
in the trunk, though. Otherwise, the CMR procedure will
fall behind and a fix might miss a release window.
Anyway, hopefully this one will make the 1.3.0 release
cutoff.
Thanks
Ralph
On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:
Hi Ralph,
This is now in 1.3rc2, thanks. However there are a
couple of problems. Here is what I see:
[Jarrah.watson.ibm.com:58957] <noderesolve name="node0" resolved="Jarrah.watson.ibm.com">
For some reason each line is prefixed with "[...]"; any
idea why? Also, the end tag should be "/>", not ">".
Thanks,
Greg
On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:
Great, thanks. I'll take a look once it comes over to
1.3.
Cheers,
Greg
On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:
Yo Greg
This is in the trunk as of r20032. I'll bring it over
to 1.3 in a few days.
I implemented it as another MCA param
"orte_show_resolved_nodenames" so you can actually get
the info as you execute the job, if you want. The xml
tag is "noderesolve" - let me know if you need any
changes.
Ralph
On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:
Ralph,
I guess the issue for us is that we will have to run
two commands to get the information we need. One to
get the configuration information, such as version
and MCA parameters, and one to get the host
information, whereas it would seem more logical that
this should all be available via some kind of
"configuration discovery" command. I understand the
issue with supplying the hostfile though, so maybe
this just points at the need for us to separate
configuration information from the host information.
In any case, we'll work with what you think is best.
Greg
On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:
Hmmm...just to be sure we are all clear on this. The
reason we proposed to use mpirun is that "hostfile"
has no meaning outside of mpirun. That's why
ompi_info can't do anything in this regard.
We have no idea what hostfile the user may specify
until we actually get the mpirun cmd line. They may
have specified a default-hostfile, but they could
also specify hostfiles for the individual
app_contexts. These may or may not include the node
upon which mpirun is executing.
So the only way to provide you with a separate
command to get a hostfile<->nodename mapping would
require you to provide us with the default-hostfile
and/or hostfile cmd line options just as if you were
issuing the mpirun cmd. We just wouldn't launch -
but it would be the exact equivalent of doing
"mpirun --do-not-launch".
Am I missing something? If so, please do correct me
- I would be happy to provide a tool if that would
make it easier. Just not sure what that tool would do.
Thanks
Ralph
On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:
Ralph,
It seems a little strange to be using mpirun for
this, but barring providing a separate command, or
using ompi_info, I think this would solve our
problem.
Thanks,
Greg
On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:
Sorry for the delay; had to ponder this one for a while.
Jeff and I agree that adding something to
ompi_info would not be a good idea. Ompi_info has
no knowledge or understanding of hostfiles, and
adding that capability to it would be a major
distortion of its intended use.
However, we think we can offer an alternative that
might better solve the problem. Remember, we now
treat hostfiles in a very different manner than
before - see the wiki page for a complete
description, or "man orte_hosts".
So the problem is that, to provide you with what
you want, we need to "dump" the information from
whatever default-hostfile was provided, and, if no
default-hostfile was provided, then the
information from each hostfile that was provided
with an app_context.
The best way we could think of to do this is to
add another mpirun cmd line option, --dump-hostfiles,
that would output the line-by-line name
from the hostfile plus the name we resolved it to.
Of course, --xml would cause it to be in xml format.
Would that meet your needs?
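The proposed option might behave roughly like the following Python sketch. It assumes the usual hostfile layout (a host name followed by optional keywords such as `slots=4`, with `#` starting a comment) and uses `socket.getfqdn` only as a stand-in for whatever resolution mpirun actually performs; `dump_hostfile` and its output format are hypothetical, borrowed from the `noderesolve` element discussed elsewhere in this thread.

```python
import socket
from xml.sax.saxutils import quoteattr


def hostfile_entries(lines):
    """Yield the host name from each non-blank, non-comment hostfile line."""
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            yield line.split()[0]  # first token is the host; the rest are keywords


def dump_hostfile(lines, resolve=socket.getfqdn):
    """Return one <noderesolve/> element per hostfile entry (hypothetical format)."""
    return [
        f"<noderesolve name={quoteattr(name)} resolved={quoteattr(resolve(name))}/>"
        for name in hostfile_entries(lines)
    ]
```

Passing `resolve` in as a parameter keeps the formatting testable without DNS; for example, `dump_hostfile(["node0 slots=4"], resolve={"node0": "Jarrah.local"}.get)` yields `['<noderesolve name="node0" resolved="Jarrah.local"/>']`.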
Ralph
On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:
Hi Ralph,
We've been discussing this back and forth a bit
internally and don't really see an easy solution.
Our problem is that Eclipse is not running on the
head node, so gethostbyname will not necessarily
resolve to the same address. For example, the
hostfile might refer to the head node by an
internal network address that is not visible to
the outside world. Since gethostbyname also looks
in /etc/hosts, a name may resolve locally but not on
a remote system. The only thing I can think of
would be, rather than us reading the hostfile
directly as we do now, to provide an option to
ompi_info that would dump the hostfile using the
same rules that you apply when you're using the
hostfile. Would that be feasible?
Greg
On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:
Sorry for delay - was on vacation and am now
trying to work my way back to the surface.
I'm not sure I can fix this one for two reasons:
1. In general, OMPI doesn't really care what
name is used for the node. However, the problem
is that it needs to be consistent. In this case,
ORTE has already used the name returned by
gethostname to create its session directory
structure long before mpirun reads a hostfile.
This is why we retain the value from gethostname
instead of allowing it to be overwritten by the
name in whatever allocation we are given. Using
the name in hostfile would require that I either
find some way to remember any prior name, or
that I tear down and rebuild the session
directory tree - neither seems attractive nor
simple (e.g., what happens when the user
provides multiple entries in the hostfile for
the node, each with a different IP address based
on another interface in that node? Sounds crazy,
but we have already seen it done - which one do
I use?).
2. We don't actually store the hostfile info
anywhere - we just use it and forget it. For us
to add an XML attribute containing any hostfile-related
info would therefore require us to re-read the
hostfile. I could have it do that -only- in the
case of "XML output required", but it seems
rather ugly.
An alternative might be for you to simply do a
"gethostbyname" lookup of the IP address or
hostname to see if it matches instead of just
doing a strcmp. This is what we have to do
internally as we frequently have problems with
FQDN vs. non-FQDN vs. IP addresses etc. If the
local OS hasn't cached the IP address for the
node in question it can take a little time to
DNS resolve it, but otherwise works fine.
I can point you to the code in OPAL that we use
- I would think something similar would be easy
to implement in your code and would readily
solve the problem.
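A minimal Python sketch of that suggestion, comparing two names by the addresses they resolve to rather than by string equality. It uses the standard `socket.getaddrinfo` call and is not the OPAL code mentioned above; `addresses` and `same_host` are hypothetical helpers. Note the caveat raised elsewhere in this thread: resolution happens on the machine running the check, so names visible only inside the cluster may not resolve there.

```python
import socket


def addresses(host):
    """Return the set of IP addresses `host` resolves to (empty if unresolvable)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return {info[4][0] for info in infos}


def same_host(a, b):
    """Treat two names as one node if they match ignoring case,
    or if they resolve to at least one common address."""
    if a.lower() == b.lower():
        return True
    return bool(addresses(a) & addresses(b))
```

Numeric addresses are handled without a DNS query, so `same_host("127.0.0.1", "127.0.0.1")` is `True` while `same_host("127.0.0.1", "10.1.2.3")` is `False`.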
Ralph
On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:
Ralph,
The problem we're seeing is just with the head
node. If I specify a particular IP address for
the head node in the hostfile, it gets changed
to the FQDN when displayed in the map. This is
a problem for us as we need to be able to match
the two, and since we're not necessarily
running on the head node, we can't always do
the same resolution you're doing.
Would it be possible to use the same address
that is specified in the hostfile, or
alternatively provide an XML attribute that
contains this information?
Thanks,
Greg
On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:
Not in that regard, depending upon what you
mean by "recently". The only changes I am
aware of wrt nodes consisted of some changes
to the order in which we use the nodes when
specified by hostfile or -host, and a little
#if protectionism needed by Brian for the Cray
port.
Are you seeing this for every node? Reason I
ask: I can't offhand think of anything in the
code base that would replace a host name with
the FQDN because we don't get that info for
remote nodes. The only exception is the head
node (where mpirun sits) - in that lone case,
we default to the name returned to us by
gethostname(). We do that because the head
node is frequently accessible on a more global
basis than the compute nodes - thus, the FQDN
is required to ensure that there is no address
confusion on the network.
If the user refers to compute nodes in a
hostfile or -host (or in an allocation from a
resource manager) by non-FQDN, we just assume
they know what they are doing and the name
will correctly resolve to a unique address.
On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:
Hi,
Has the behavior of the -display-map option
changed recently in the 1.3 branch? We're now
seeing the host name as a fully qualified
domain name rather than the entry that was
specified in the hostfile. Is there any
particular reason for this? If so, would it be
possible to add the hostfile entry to the
output since we need to be able to match the
two?
Thanks,
Greg
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel