Greg and I chatted on the phone about this. I now understand much better what he is trying to do (short version: Eclipse is running on one machine, it is opening an ssh session to a remote machine and launching mpirun on that remote machine).

Results of the phone conversation (for the web archives):

- In the short term, there are a few remaining issues to be figured out. Ralph (who is now full-time at Cisco) may or may not have time to fix these in the near term. We (Open MPI) would happily review patches from others in this area if a solution is required before Ralph can get to it.

- In the long term, we came up with a "thinking outside the box" solution that seems to be *much* better (think 1.5 and beyond). I'll describe the scheme, but at the same time, I'll indicate that Cisco likely does not have time in the foreseeable future to implement it. Again, we would be happy to provide guidance to anyone who would want to implement it (e.g., IBM) and/or review patches.

-----

1. Currently, the Eclipse plugin is effectively executing "ssh <otherhost> mpirun ...". This has several advantages:
   - Use whatever the native OMPI is on <otherhost>
   - No need for binary compatibility (i.e., version match of Eclipse plugin and remote OMPI installation)

2. The proposal is to change this to "ssh <otherhost> mpirun-proxy ..." where mpirun-proxy is a new executable that does the following:
   - fork/exec the real mpirun, making pipes to mpirun's stdin/stdout/stderr
   - tell mpirun to not display any IOF output from MPI processes
   - tell mpirun to not display any show_help messages
   - register to receive ORTE "events" (more below) via the ORTE comm library
   - register to receive IOF from all the MPI processes via the ORTE comm library
   - register to receive show_help messages from MPI processes via the ORTE comm library
   - upon receipt of specific events (e.g., determination of host/node/process maps), output this data encased in a specific XML schema (e.g., a specific set of XML tags to encase each data item in the nodemap) to ssh's stdout
   - read output from mpirun's stdout/stderr, output it on ssh's stdout, encased in <stdout> / <stderr> (etc.)
   - read IOF from MPI processes and output them to ssh's stdout, encased in appropriate XML tagging
   - read show_help messages from MPI processes and output them to ssh's stdout, encased in appropriate XML tagging

--> Note that some of the above functionality already exists; it would just need to be marshaled together and used in some new logic. Other parts of the functionality do not exist and would need to be written (e.g., redirecting show_help messages to something other than the HNP).

3. Once #2 is done, remove all the XML processing from mpirun, libopen-rte, libmpi, and all OMPI plugins (since it's now all in mpirun-proxy).
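To make the stdout/stderr part of #2 concrete, here is a minimal sketch in Python of just the "fork/exec, pipe, and wrap in XML" piece. It is an illustration only, not a proposed implementation: a real mpirun-proxy would presumably be written in C inside the Open MPI tree and would also register with the ORTE comm library for events, IOF, and show_help messages, none of which is shown here. The names wrap, pump, and run_proxy are hypothetical, and stdin forwarding is omitted for brevity.

```python
import subprocess
import sys
import threading
from xml.sax.saxutils import escape


def wrap(tag, text):
    """Return one line of text encased in <tag>...</tag>, XML-escaped."""
    return "<{0}>{1}</{0}>".format(tag, escape(text))


def pump(stream, tag, out, lock):
    """Copy lines from one child pipe to `out`, wrapping each line in XML."""
    for line in stream:
        # The lock keeps stdout and stderr lines from interleaving mid-line.
        with lock:
            out.write(wrap(tag, line.rstrip("\n")) + "\n")
            out.flush()
    stream.close()


def run_proxy(argv, out=sys.stdout):
    """Launch argv with piped stdout/stderr and relay both as XML lines.

    Returns the child's exit status. Assumption: stdin is not forwarded.
    """
    child = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    lock = threading.Lock()
    threads = [
        threading.Thread(target=pump, args=(child.stdout, "stdout", out, lock)),
        threading.Thread(target=pump, args=(child.stderr, "stderr", out, lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return child.wait()
```

Calling run_proxy(["mpirun", "-np", "4", "./a.out"]) would then emit every line that mpirun writes as either a <stdout> or a <stderr> element on the proxy's own stdout, which is what ssh carries back to the plugin.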

-----

This functionality would accomplish the following:

- The code is distributed in Open MPI -- not Eclipse or an Eclipse plugin -- so there's no additional compilation or linking step for the Eclipse plugin to talk to OMPI.

- The Eclipse plugin, which already checks the output from ompi_info, can know when to use this new functionality (ssh mpirun-proxy instead of mpirun).

- All the OMPI XML parsing can be centralized to the mpirun-proxy executable. This is a *huge* improvement over having XML sprinkled all over the OMPI code base, as it is now. Additionally, with this method, *all* OMPI output will be encased in XML before it is sent to the Eclipse plugin (via ssh's stdout). Today, we have "XML-lite" functionality in that "most" of OMPI's output is XML-ified, but there's oodles and oodles of corner cases where output is *not* XML-ified. The above proposal seems to be the best idea so far on how to address this issue in a holistic way (rather than adding a bunch more band-aids every time we find another output that isn't XML-ified).
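For illustration, here is roughly what the difference looks like from the consumer's side. This is a hedged Python sketch (the real Eclipse plugin is Java, and the exact XML schema has not been defined); the function name classify is hypothetical, and it assumes one XML element per line, tagged <stdout> / <stderr> as described above. With "XML-lite", a reader has to cope with stray unwrapped lines; with the proxy, the stray list should always be empty.

```python
import xml.etree.ElementTree as ET


def classify(lines):
    """Group XML-wrapped lines by tag; collect unparsable lines separately."""
    by_tag = {}
    stray = []
    for line in lines:
        try:
            elem = ET.fromstring(line)
        except ET.ParseError:
            # Output that was never XML-ified ends up here.
            stray.append(line)
            continue
        by_tag.setdefault(elem.tag, []).append(elem.text or "")
    return by_tag, stray
```

A plugin-side reader could feed each line arriving over ssh through something like this and route the results to the right console view, treating any stray line as a bug to report.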





On Sep 10, 2009, at 9:23 AM, Greg Watson wrote:

The most appealing thing about the XML option is that it just works
"out of the box." Using a library API invariably requires compiling an
agent or distributing pre-compiled binaries with all the associated
complications. We tried that in the dim past and it was pretty
unworkable. The other problem was that the API headers were not
installed by default, so users were forced to install local copies of
OMPI with development headers enabled. It was not a great end-user
experience.

Greg

On Sep 10, 2009, at 8:45 AM, Jeff Squyres wrote:

> Thinking about this a little more ...
>
> This all seems like Open MPI-specific functionality for Eclipse.  If
> that's the case, don't we have an ORTE tools communication library
> that could be used?  IIRC, it pretty much does exactly what you want
> and would be far less clumsy than trying to jury-rig sending XML
> down files/fd's/whatever.  I have dim recollections of the ORTE
> tools communication library API returning the data that you have
> asked for in data structures -- no parsing of XML at all (and, more
> importantly to us, no need to add all kinds of special code paths
> for wrapping our output in XML).
>
> If I'm right (and that's a big "if"!), is there a reason that this
> library is not attractive to you?
>
>
>
>
> On Sep 10, 2009, at 8:04 AM, Jeff Squyres wrote:
>
>> On Sep 9, 2009, at 12:17 PM, Ralph Castain wrote:
>>
>>> Hmmm....I never considered the possibility of output-filename being
>>> used that way. Interesting idea!
>>>
>>
>> That feels way weird to me -- for example, how do you know that
>> you're actually outputting to a tty?
>>
>> FWIW: +1 on the idea of writing to numbered fd's passed on the
>> command line.  It just "feels" like a more POSIX-ish way of doing
>> things...?  I guess I'm surprised that that would be difficult to
>> do from Java.
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>>
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




--
Jeff Squyres
jsquy...@cisco.com
