Re: [OMPI devel] problem in the ORTE notifier framework

Jeff Squyres Tue, 26 May 2009 19:47:30 -0400

Sure, I can set up a webex (with international dial-ins) if it would be useful.


On May 26, 2009, at 7:24 PM, Ralph Castain wrote:

First, to answer Nadia's question: you will find that the init function for the module is already called when it is selected - see the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the trunk).
It would be a good idea to tie into the sos work to avoid conflicts when it all gets merged back together, assuming that isn't a big problem for you.
As for Jeff's suggestion: dealing with the performance hit problem is why I suggested ORTE_NOTIFIER_VERBOSE, modeled after the OPAL_OUTPUT_VERBOSE model. The idea was to compile it in -only- when the system is built for it - maybe using a --with-notifier-verbose configuration option. Frankly, some organizations would happily pay a small performance penalty for the benefits.
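As an illustration of the compile-it-in-only-when-configured idea, here is a minimal sketch. Only the macro name ORTE_NOTIFIER_VERBOSE and the --with-notifier-verbose option come from this thread; the NOTIFIER_VERBOSE_ENABLED symbol, notifier_log(), and example_hot_path() are hypothetical stand-ins, and the double-parenthesis calling convention is an assumption borrowed from how debug-only output macros are commonly written.

```c
#include <stdio.h>

/* Hypothetical: configure would define this to 1 only when the build
 * was given --with-notifier-verbose. It defaults to 1 here so the
 * sketch does something when compiled on its own. */
#ifndef NOTIFIER_VERBOSE_ENABLED
#define NOTIFIER_VERBOSE_ENABLED 1
#endif

/* Counts calls so the effect of the macro can be observed. */
static int notifier_calls = 0;

/* Hypothetical stand-in for whatever the notifier framework exposes. */
static void notifier_log(int severity, const char *msg)
{
    ++notifier_calls;
    fprintf(stderr, "[notifier sev=%d] %s\n", severity, msg);
}

#if NOTIFIER_VERBOSE_ENABLED
/* Enabled build: the parenthesized argument list becomes a real call. */
#define ORTE_NOTIFIER_VERBOSE(args) notifier_log args
#else
/* Disabled build: the call site compiles to nothing at all. */
#define ORTE_NOTIFIER_VERBOSE(args) do { } while (0)
#endif

/* A call site in a performance-sensitive path. */
static void example_hot_path(void)
{
    ORTE_NOTIFIER_VERBOSE((2, "link retry"));
}
```

When NOTIFIER_VERBOSE_ENABLED is 0 the arguments are never evaluated, so a default build pays nothing; only builds that opt in carry the extra call.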
I would personally recommend that the notifier framework keep the stats so things can be compact and self-contained. We still get atomicity by allowing each framework/component/whatever to specify the threshold. Creating yet another system to do nothing more than track error/warning frequencies to decide whether or not to notify seems wasteful.
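A rough sketch of what "the notifier keeps the stats, each caller specifies the threshold" could look like. All names here (notifier_counter_t, notifier_counter_hit) are hypothetical and not taken from the ORTE code, and the counter is deliberately simple: it is not thread-safe.

```c
#include <stdbool.h>

/* Hypothetical per-event counter owned by the notifier framework.
 * Each framework/component registers one with its own threshold. */
typedef struct {
    const char *name;       /* label for the tracked event */
    unsigned    threshold;  /* notify every 'threshold' occurrences; 0 = never */
    unsigned    count;      /* occurrences since the last notification */
} notifier_counter_t;

/* Record one occurrence. Returns true when the threshold is reached,
 * meaning the caller should emit a notification; the count then resets. */
static bool notifier_counter_hit(notifier_counter_t *c)
{
    if (c->threshold == 0) {
        return false;               /* tracking disabled for this event */
    }
    if (++c->count < c->threshold) {
        return false;               /* below threshold, stay quiet */
    }
    c->count = 0;                   /* start a new window */
    return true;
}
```

With a threshold of 3, the first two occurrences return false, the third returns true, and counting starts over.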
Perhaps worth a phone call to decide path forward?
On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
Nadia --

Sorry I didn't get to jump in on the other thread earlier.
We have made considerable changes to the notifier framework in a branch to better support "SOS" functionality:
   https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos
Cisco and Indiana U. have been working on this branch for a while. A description of the SOS stuff is here:
   https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
As for setting up an external web server with hg, don't bother -- just get an account at bitbucket.org. They're free and allow you to host hg repositories there. I've used bitbucket to collaborate on code before it hits OMPI's SVN trunk with both internal and external OMPI developers.
We can certainly move the opal-sos repo to bitbucket (or branch again off opal-sos to bitbucket -- whatever makes more sense) to facilitate collaborating with you.
Back on topic...
I'd actually suggest a combination of what has been discussed in the other thread. The notifier can be the mechanism that actually sends the output message, but it doesn't have to be the mechanism that tracks the stats and decides when to output a message. That can be separate logic, and therefore be more fine-grained (and potentially even specific to the MPI layer).
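One way to picture that split, as a sketch only: a layer-specific tracker owns the counting and the decision, and hands the message to a sink that stands in for the notifier. Every name below (notify_sink_fn, mpi_layer_tracker_t, tracker_record_error, counting_sink) is hypothetical and not part of the OMPI code base.

```c
#include <stddef.h>

/* Hypothetical sink signature: stands in for the notifier, whose only
 * job in this design is to send the message. */
typedef void (*notify_sink_fn)(int severity, const char *msg);

/* Hypothetical tracker living in the MPI layer: it keeps the stats and
 * decides when a message is worth sending. */
typedef struct {
    unsigned       errors_seen;   /* running count of recorded errors */
    unsigned       report_every;  /* send one message per N errors; 0 = never */
    notify_sink_fn sink;          /* NULL means monitoring is switched off */
} mpi_layer_tracker_t;

static void tracker_record_error(mpi_layer_tracker_t *t, const char *msg)
{
    if (t->sink == NULL || t->report_every == 0) {
        return;                   /* monitoring off: nothing is counted or sent */
    }
    if (++t->errors_seen % t->report_every == 0) {
        t->sink(1, msg);          /* delegate the actual send */
    }
}

/* Example sink that only counts deliveries, for demonstration. */
static unsigned sent_count = 0;

static void counting_sink(int severity, const char *msg)
{
    (void)severity;
    (void)msg;
    ++sent_count;
}
```

Note that even with the sink set to NULL the call site still executes a test and branch, which is exactly the residual cost raised in the next paragraph.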
The Big Question will be how to do this with zero performance impact when it is not being used. This has always been the difficult issue when trying to implement any kind of monitoring inside the core OMPI performance-sensitive paths. Even adding individual branches has met with resistance (in performance-critical code paths)...
On May 26, 2009, at 10:59 AM, Nadia Derbey wrote:

Hi,
While having a look at the notifier framework under orte, I noticed that
the way it is written, the init routine for the selected module cannot
be called.

Attached is a small patch that fixes this issue.

Regards,
Nadia

<orte_notifier_fix_select.patch><ATT14046023.txt>


--
Jeff Squyres
Cisco Systems

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




--
Jeff Squyres
Cisco Systems
