I don't have a problem using a different interface, assuming it's adequately supported and provides the functionality we need. I presume the recursive behavior you're referring to is calling OMPI interfaces from the callback functions. Any event-based system has this issue, and it is usually solved by clearly specifying the allowable interfaces that can be called (possibly none). Since PTP doesn't call OMPI functions from callbacks, it's not a problem for us if no interfaces can be called.
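
To make the "allowable interfaces (possibly none)" rule concrete, here is a minimal sketch in Python of an event source that refuses any call made from inside a callback. The `Notifier` class, its methods, and the event names are hypothetical illustrations, not OMPI or ORTE interfaces.

```python
class ReentrancyError(RuntimeError):
    """Raised when a callback calls an interface it is not allowed to use."""


class Notifier:
    """Hypothetical event source (an assumption, not part of OMPI/ORTE)."""

    def __init__(self):
        self._callbacks = []
        self._in_callback = False

    def subscribe(self, callback):
        self._guard("subscribe")
        self._callbacks.append(callback)

    def publish(self, event):
        self._guard("publish")
        self._in_callback = True
        try:
            # Iterate over a copy so the list cannot change under us.
            for cb in list(self._callbacks):
                cb(event)
        finally:
            self._in_callback = False

    def _guard(self, name):
        # The contract here is "no interface may be called from a callback".
        if self._in_callback:
            raise ReentrancyError(f"{name}() may not be called from a callback")


def demo():
    notifier = Notifier()
    seen = []
    notifier.subscribe(seen.append)      # well-behaved: only records the event
    notifier.publish("job-started")

    def bad(event):
        notifier.publish("nested")       # violates the contract

    notifier.subscribe(bad)
    try:
        notifier.publish("job-finished")
        rejected = False
    except ReentrancyError:
        rejected = True
    return seen, rejected
```

A tool that only records state in its callbacks, as PTP is described as doing, never trips the guard; only a callback that calls back into the event source is rejected.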

The major missing features appear to be:

- Ability to request a process allocation without launching the job
- I/O forwarding callbacks

Without these, PTP support will be so limited that I'd be reluctant to say we support OMPI.

Greg

On Mar 4, 2008, at 4:50 PM, Ralph H Castain wrote:

It is buried deep-down in the thread, but I'll just reiterate it here. I have "restored" the ability to "subscribe" to changes in job, proc, and node state via OMPI's tool interface library. I have -not- checked this into the trunk yet, though, until the community has a chance to consider whether or not it wants it.

Restoring the ability to have such changes "callback" to user functions raises the concern again about recursive behavior. We worked hard to remove recursion from the code base, and it would be a concern to see it potentially re-enter.

I realize there is some difference between ORTE calling back into itself vs calling back into a user-specified function. However, unless that user truly understands ORTE/OMPI and takes considerable precautions, it is very easy to recreate the recursive behavior without intending to do so.
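
As a generic illustration of the concern (not how ORTE handles it), the sketch below shows a callback that publishes a new event from inside its own handler. A naive dispatcher would recurse once per event; this hypothetical `DeferredNotifier` queues nested events and drains them in a loop, so callback nesting never exceeds one level. All names are assumptions made for the example.

```python
from collections import deque


class DeferredNotifier:
    """Hypothetical dispatcher (an assumption, not ORTE's implementation)."""

    def __init__(self):
        self._callbacks = []
        self._pending = deque()
        self._draining = False
        self._depth = 0
        self.max_depth = 0   # deepest callback nesting observed

    def subscribe(self, callback):
        self._callbacks.append(callback)

    def publish(self, event):
        self._pending.append(event)
        if self._draining:
            return           # called from a callback: queue it and return
        self._draining = True
        try:
            # Drain iteratively instead of recursing into nested publishes.
            while self._pending:
                current = self._pending.popleft()
                for cb in list(self._callbacks):
                    self._depth += 1
                    self.max_depth = max(self.max_depth, self._depth)
                    try:
                        cb(current)
                    finally:
                        self._depth -= 1
        finally:
            self._draining = False


def demo():
    notifier = DeferredNotifier()
    order = []

    def on_event(n):
        order.append(n)
        if n < 3:
            notifier.publish(n + 1)   # would recurse in a naive dispatcher

    notifier.subscribe(on_event)
    notifier.publish(0)
    return order, notifier.max_depth
```

The trade-off is that a nested event is delivered only after the current callback returns, so a callback cannot rely on its own `publish` call having taken effect before it finishes.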

The tool interface library was built to accomplish two things:

1. help reduce the impact on external tools of changes to ORTE/OMPI interfaces, and

2. provide a degree of separation to prevent the tool from inadvertently causing OMPI to "behave badly"

I think we accomplished that - I would encourage you to at least consider using the library. If there is something missing, we can always add it.

Ralph



On 3/4/08 2:37 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

Greg --

I admit to being a bit puzzled here. Ralph sent around RFCs about these changes many months ago. Everyone said they didn't want this functionality -- it was seen as excess functionality that Open MPI didn't want or need -- so it was all removed.

As such, I have to agree with Ralph that it is an "enhancement" to re-add the functionality. That being said, patches are always welcome! IBM has signed the OMPI 3rd party contribution agreement, so it could be contributed directly.

Sidenote: I was also under the impression that PTP was being re-geared towards STCI and moving away from ORTE anyway. Is this incorrect?



On Mar 4, 2008, at 3:24 PM, Greg Watson wrote:

Hi all,

Ralph informs me that significant functionality has been removed from ORTE in 1.3. Unfortunately this functionality was being used by PTP to provide support for OMPI, and without it, it seems unlikely that PTP will be able to work with 1.3. Apparently restoring this lost functionality is an "enhancement" of 1.3, and so is something that will not necessarily be done. Having worked with OMPI from a very early stage to ensure that we were able to provide robust support, I must say it is a bit disappointing that this approach is being taken. I hope that the community will view this "enhancement" as worthwhile.

Regards,

Greg

Begin forwarded message:


On 2/29/08 7:13 AM, "Gregory R Watson" <g...@us.ibm.com> wrote:



Ralph Castain <r...@lanl.gov> wrote on 02/29/2008 12:18:39 AM:

From: Ralph Castain <r...@lanl.gov>
Date: 02/29/08 12:18 AM
To: Gregory R Watson/Watson/IBM@IBMUS
Subject: Re: OpenMPI changes

Hi Greg

All of the prior options (and some new ones) for spawning a job are fully supported in the new interface. Instead of setting them with "attributes", you create an orte_job_t object and just fill them in. This is precisely how mpirun does it - you can look at that code if you want an example, though it is somewhat complex. Alternatively, you can look at the way it is done for comm_spawn, which may be more analogous to your situation - that code is in ompi/mca/dpm/orte.

All the tools library does is communicate the job object to the target persistent daemon so it can do the work. This way, you don't have to open all the frameworks, deal directly with the plm interface, etc.

Alternatively, you are welcome to do a full orte_init and use the frameworks yourself - there is no requirement to use the library. I only offer it as an alternative.

As far as I can tell, neither API provides the same functionality as that available in 1.2. While this might be beneficial for OMPI-specific activities, the changes appear to severely limit the interaction of tools with the runtime. At this point, I can't see either interface supporting PTP.

I went ahead and added a notification capability to the system - took about 30 minutes. I can provide notice of job and process state changes since I see those. Node state changes, however, are different - I can notify on them, but we have no way of seeing them. None of the environments we support tell us when a node fails.



I know that the tool library works because it uses the identical APIs as comm_spawn and mpirun. I have also tested them by building my own tools.

There's a big difference between being on a code path that *must* work because it is used by core components and being on one that is provided as an add-on for external tools. I may be worrying needlessly if this new interface becomes an "officially supported" API. Is that planned? At a minimum, it seems like it's going to complicate your testing process, since you're going to need to provide a separate set of tests that exercise this interface independent of the rest of OMPI.

It is an officially supported API. Testing is not as big a problem as you might expect since the library exercises the same code paths as mpirun and comm_spawn. Like I said, I have written my own tools that exercise the library - no problem using them as tests.



We do not launch an orted for any tool-library query. All we do is communicate the query to the target persistent daemon or mpirun. Those entities have recv's posted to catch any incoming messages and execute the request.

You are correct that we no longer have event driven notification in the system. I repeatedly asked the community (on both devel and core lists) for input on that question, and received no indications that anyone wanted it supported. It can be added back into the system, but would require the approval of the OMPI community. I don't know how problematic that would be - there is a lot of concern over the amount of memory, overhead, and potential reliability issues that surround event notification. If you want that capability, I suggest we discuss it, come up with a plan that deals with those issues, and then take a proposal to the devel list for discussion.

As for reliability, the objectives of last year's effort were precisely scalability and reliability. We did a lot of work to eliminate recursive deadlocks and improve the reliability of the code. Our current testing indicates we had considerable success in that regard, particularly with the recursion elimination commit earlier today.

I would be happy to work with you to meet PTP's needs - we'll just need to work with the OMPI community to ensure everyone buys into the plan. If it would help, I could come and review the new arch with the team (I already gave a presentation on it to IBM Rochester MN) and discuss required enhancements.

PTP's needs have not changed since 1.0. From our perspective, the 1.3 branch simply removes functionality that is required for PTP to support OMPI. It seems strange that we need "approval of the OMPI community" to continue to use functionality that has been available since 1.0. In any case, there are unfortunately no resources to work on the kind of re-engineering that appears to be required to support 1.3, even if it did provide the functionality we need.

Afraid I have to be driven by the OMPI community's requirements since they pay my salary :-) What they need is a "lean, mean, OMPI machine" as they say, and (for some reason) they view the debugger community as consisting of folks like totalview, vampirtrace, etc. - all of whom get involved (either directly or via one of the OMPI members) in the requirements discussions.

Can't argue with business decisions, though. I gather there was some mention of PTP at the recent LANL/IBM RR meeting, so I'll let people know that PTP won't be an option on RR.

And I'll see if there is any interest here in adding 1.3 support to PTP ourselves - from looking at your code, I think it would take about a day, assuming someone more familiar with PTP will work with me.

Take care
Ralph


Greg





_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


