A physical meeting about this in the near future is unlikely; I think
we're all facing travel restrictions and constraints with the holidays
coming up.
How about a teleconf to discuss the following about the notifier:
- what exactly is there today
- why what is there today is the way it is
- proposals for different ways to do it
More specifically, I think we all agree that the idea of an MPI
application notifying a higher-level entity (e.g., on the host, or in
the network, or ...) when it detects errors is a good one. I think it
is worth discussing over a higher-bandwidth medium so that we can avoid
email hell (I agree with Ralph; this could devolve pretty easily).
I propose any of the following times to discuss (I'll set up a phone
bridge):
- Mon, Dec 8, 2pm, 3pm, or 4pm Eastern
- Tue, Dec 9, 10am, noon, 1pm, 2pm, 3pm, or 4pm Eastern
- Wed, Dec 10, any time
- Thu, Dec 11, 11am, 1pm, 2pm, 3pm, or 4pm Eastern
- Fri, Dec 12, 9am, 10am, 11am, 2pm, 3pm, or 4pm Eastern
On Dec 4, 2008, at 3:16 PM, Ralph Castain wrote:
I'm beginning to believe that we need a design meeting specifically
about this question. Too many unknowns exist, with significant
potential problems lurking behind them. Frankly, this issue could
have a major impact on how we operate, on performance, and on a
variety of other factors going forward, many of which may be
difficult to predict.
I suspect there may not be "optimal" solutions to many of these
questions, but there certainly will be strong opinions in multiple
directions.
As part of that discussion, I propose that we consider alternative
methods for meeting the same overall objective, namely reuse of the
BTLs by another software project. For example, a simple
copy-and-branch is the dominant method today, with patches used by
both parties to cherry-pick the changes they want from each other's
code. Multiple tools have been developed to support this mode of
operation, yet we haven't discussed any of them in this context. The
proposed approach carries a number of impacts that an alternative
approach might avoid.
Without such a meeting, I fear we are going to rapidly dissolve into
email hell again.
Ralph
On Dec 4, 2008, at 1:07 PM, Eugene Loh wrote:
Richard Graham wrote:
I expect this will involve some sort of well-defined interface
between the BTLs and ORTE, and I don't know whether this will also
require something similar between the BTLs and the PML; I think
that interface is rigidly enforced, but am not sure.
I'm probably missing the scope of what you're saying here, but it
raises another question in my mind. Is there today a well-defined
interface between the BTLs and... anything else? PML or whatever?
Maybe this comes back to a documentation question: do we (or will
we) have anything written down that says what a BTL must do, what
it may rely on, etc.?
_______________________________________________
devel mailing list
[email protected]
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Jeff Squyres
Cisco Systems