(Updated RFC, per online/offline discussions)
======================================================================
[RFC 1/2] ORTE WDC and improvements to the "notifier" framework
======================================================================
WHAT: Merge improvements to the "notifier" framework from the OPAL SOS
and the ORTE WDC mercurial branches into the SVN trunk.
WHY: Some improvements and interface changes were put into the ORTE
notifier framework during the development of the OPAL SOS[1] and
ORTE WDC[2] branches.
WHERE: Mostly restricted to ORTE notifier files and files using the
notifier interface in OMPI.
TIMEOUT: May 17, Monday, COB.
REFERENCE MERCURIAL REPOS:
* SOS development: http://bitbucket.org/jsquyres/opal-sos-fixed/
* WDC development: http://bitbucket.org/derbeyn/orte-wdc-fixed/
======================================================================
BACKGROUND:
The notifier interface and its components underwent a host of
improvements and changes during the development of the SOS[1] and the
WDC[2] branches. The ORTE WDC (Warning Data Capture) branch enables
accounting of events through the use of the notifier interface, whereas
OPAL SOS uses the notifier interface by setting up callbacks to relay
out logged events.
Some of the improvements include:
- added more severity levels.
- "ftb" notifier improvements.
- "command" notifier improvements.
- added "file" notifier component
- changes in the notifier modules selection
- activate only a subset of the callbacks
(i.e. any combination of log, help, log_peer)
- define different output media for any given callback (e.g. log_peer
can be redirected to the syslog and smtp, while the show_help can be
sent to the hnp).
- ORTE_NOTIFIER_LOG_EVENT() (that accounts and warns about unusual
events)
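As a rough illustration of the "activate only a subset of the callbacks"
item above, the sketch below models the three callbacks as bit flags and
parses a comma-separated selection string. The names CB_LOG, CB_HELP,
CB_LOG_PEER and parse_callbacks() are hypothetical and are not taken from
either branch; this only shows one way such a selection could be
represented.

```c
#include <string.h>

/* Hypothetical flags, one per notifier callback named in the list above. */
enum {
    CB_LOG      = 1 << 0,
    CB_HELP     = 1 << 1,
    CB_LOG_PEER = 1 << 2
};

/* Parse a comma-separated list such as "log,log_peer" into a bitmask.
 * Returns -1 if an unknown (or empty) name is found in the list. */
static int parse_callbacks(const char *spec)
{
    int mask = 0;
    const char *p = spec;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current token */

        if (len == 3 && strncmp(p, "log", 3) == 0) {
            mask |= CB_LOG;
        } else if (len == 4 && strncmp(p, "help", 4) == 0) {
            mask |= CB_HELP;
        } else if (len == 8 && strncmp(p, "log_peer", 8) == 0) {
            mask |= CB_LOG_PEER;
        } else {
            return -1;
        }
        p += len;
        if (*p == ',') {
            p++;                        /* skip the separator */
        }
    }
    return mask;
}
```

With this representation, a component would be consulted for a given
callback only if the matching bit is set in its mask.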
Much more information is available on these two wiki pages:
[1] http://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
[2] http://svn.open-mpi.org/trac/ompi/wiki/ORTEWDC
NOTE: This is the first of a two-part RFC to bring the SOS and WDC branches
to the trunk. This only brings in the "notifier" changes from the SOS
branch, while the rest of the branch will be brought over after the
timeout of the second RFC.
======================================================================
On Mar 30, 2010, at 9:50 AM, Abhishek Kulkarni wrote:
On Mar 29, 2010, at 9:16 PM, Ralph Castain wrote:
On Mar 29, 2010, at 5:53 PM, Abhishek Kulkarni wrote:
On Mon, 29 Mar 2010, Sylvain Jeaugey wrote:
Hi Ralph,
For now, I think that yes, this is a unique identifier. However,
in my opinion, this could be improved in the future by replacing it
with a unique string.
Something like:
#define ORTE_NOTIFIER_DEFINE_EVENT(eventstr, associated_text) {       \
    static int event = -1;                                             \
    if (OPAL_UNLIKELY(event == -1)) {                                  \
        event = opal_sos_create_new_event(eventstr, associated_text);  \
    }                                                                  \
    ...<increase event counter>...                                     \
}
This would move the event numbering to the OPAL layer, making it
transparent to the developer.
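To make the idea concrete, here is a minimal, self-contained sketch of
lazy event registration keyed by a string. It is not the SOS
implementation: event_table, create_new_event() and LOG_EVENT() are
hypothetical stand-ins, the table is a fixed-size array with a linear
lookup rather than a hash table, and nothing here is thread-safe.

```c
#include <string.h>

#define MAX_EVENTS 64

/* Hypothetical stand-in for an event table kept in the lower layer. */
struct event_entry {
    const char *name;   /* unique event string */
    const char *text;   /* associated help text */
    unsigned    count;  /* number of times the event was logged */
};

static struct event_entry event_table[MAX_EVENTS];
static int num_events = 0;

/* Return the id registered for name, creating it on first use.
 * Returns -1 if the table is full. */
static int create_new_event(const char *name, const char *text)
{
    for (int i = 0; i < num_events; i++) {
        if (strcmp(event_table[i].name, name) == 0) {
            return i;
        }
    }
    if (num_events == MAX_EVENTS) {
        return -1;
    }
    event_table[num_events].name  = name;
    event_table[num_events].text  = text;
    event_table[num_events].count = 0;
    return num_events++;
}

/* The lookup happens once per call site; later calls only bump a counter. */
#define LOG_EVENT(name, text)                             \
    do {                                                  \
        static int event_id = -1;                         \
        if (event_id == -1) {                             \
            event_id = create_new_event((name), (text));  \
        }                                                 \
        if (event_id >= 0) {                              \
            event_table[event_id].count++;                \
        }                                                 \
    } while (0)

/* Example call site on an error path. */
static void report_link_error(void)
{
    LOG_EVENT("link.error", "a link error was detected");
}
```

Because the id is cached in a static variable at each call site, the
string lookup cost is paid only the first time that call site fires.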
This is a good suggestion, but then I think we end up relying on
run-time generation of the event numbers and have to pay the extra
cost of looking up the event in a list/array/hash each time we log
the event.
Since it is -solely- intended to be in an error path, I fail to see
the concern here.
My bad. Clearly I misunderstood here -- mostly because I vaguely
remember (from [1]) that the original motivation was to put
conditional #ifdef'd hooks in the "fast path" as well. But if they
ought to be on the "slow path", I think it would be fair enough to
consider Sylvain's suggestion of pushing the event numbering to SOS.
In that, the SOS hashtable could map the notifier events to their
unique identifier and the threshold counter itself could be encoded
inside the identifier returned by SOS.
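One way to picture "encoded inside the identifier" is a packed integer
that carries both the event index and its threshold. The layout below
(20 bits of index, 12 bits of threshold) and all of the function names
are assumptions made purely for illustration; they do not describe the
actual SOS encoding.

```c
#include <stdint.h>

/* Hypothetical layout: low 20 bits = event index, high 12 bits = threshold. */
#define EVENT_INDEX_BITS      20u
#define EVENT_INDEX_MASK      ((1u << EVENT_INDEX_BITS) - 1u)
#define EVENT_THRESHOLD_MASK  0xFFFu

/* Pack an event index and a warning threshold into one identifier.
 * Values that do not fit in their field are truncated by the masks. */
static uint32_t event_id_pack(uint32_t index, uint32_t threshold)
{
    return ((threshold & EVENT_THRESHOLD_MASK) << EVENT_INDEX_BITS)
           | (index & EVENT_INDEX_MASK);
}

static uint32_t event_id_index(uint32_t id)
{
    return id & EVENT_INDEX_MASK;
}

static uint32_t event_id_threshold(uint32_t id)
{
    return id >> EVENT_INDEX_BITS;
}

/* Nonzero once the observed count has reached the encoded threshold. */
static int event_should_warn(uint32_t id, uint32_t count)
{
    return count >= event_id_threshold(id);
}
```

The caller then needs only the identifier and its own counter to decide
whether to raise a warning, without a second table lookup.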
[1] http://www.open-mpi.org/community/lists/devel/2009/05/6132.php
From what I understand, and from the discussions that took place
when this proposal was first put up on the devel list, since the
event tracing hooks could lie in the critical path, we want the
overhead to be as low as possible. By manually defining the unique
identifiers, we can generate the event tracing macro at compile time
and have a minimal tracing impact.
Surely you jest - yes?? The event tracing hooks should -never- be
in the critical path. The notifier is intended -solely- to be
called when an error (or some other critical event) has already
been detected. The idea was that we detect an error, and then (if
selected) notify someone about it.
The last thing we want to do, IMHO, is put the notifier in a
critical path. If we do, I personally will regret having created
it :-)
My 2¢, of course.
Thanks
Abhishek
Just my 2 cents ...
Sylvain
On Mon, 29 Mar 2010, Ralph Castain wrote:
Hi Abhishek
I'm confused by the WDC wiki page, specifically the part about the
new ORTE_NOTIFIER_DEFINE_EVENT macro. Are you saying
that I (as the developer) have to provide this macro with a unique
notifier id? So that would mean that ORTE/OMPI would
have to maintain a global notifier id counter to ensure it is
unique?
If so, that seems really cumbersome. Could you please clarify?
Thanks
Ralph
On Mar 29, 2010, at 8:57 AM, Abhishek Kulkarni wrote:
======================================================================
[RFC 1/2]
======================================================================
WHAT: Merge improvements to the "notifier" framework from the OPAL SOS
and the ORTE WDC mercurial branches into the SVN trunk.
WHY: Some improvements and interface changes were put into the ORTE
notifier framework during the development of the OPAL SOS[1] and
ORTE WDC[2] branches.
WHERE: Mostly restricted to ORTE notifier files and files using the
notifier interface in OMPI.
TIMEOUT: The weekend of April 2-3.
REFERENCE MERCURIAL REPOS:
* SOS development: http://bitbucket.org/jsquyres/opal-sos-fixed/
* WDC development: http://bitbucket.org/derbeyn/orte-wdc-fixed/
======================================================================
BACKGROUND:
The notifier interface and its components underwent a host of
improvements and changes during the development of the SOS[1] and the
WDC[2] branches. The ORTE WDC (Warning Data Capture) branch enables
accounting of events through the use of the notifier interface, whereas
OPAL SOS uses the notifier interface by setting up callbacks to relay
out logged events.
Some of the improvements include:
- added more severity levels.
- "ftb" notifier improvements.
- "command" notifier improvements.
- added "file" notifier component
- changes in the notifier modules selection
- activate only a subset of the callbacks
(i.e. any combination of log, help, log_peer)
- define different output media for any given callback (e.g. log_peer
can be redirected to the syslog and smtp, while the show_help can be
sent to the hnp).
- ORTE_NOTIFIER_LOG_EVENT() (that accounts and warns about unusual
events)
Much more information is available on these two wiki pages:
[1] http://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
[2] http://svn.open-mpi.org/trac/ompi/wiki/ORTEWDC
NOTE: This is the first of a two-part RFC to bring the SOS and WDC branches
to the trunk. This only brings in the "notifier" changes from the SOS
branch, while the rest of the branch will be brought over after the
timeout of the second RFC.
======================================================================
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel