On Tue, 2009-05-26 at 17:24 -0600, Ralph Castain wrote:
> First, to answer Nadia's question: you will find that the init
> function for the module is already called when it is selected - see
> the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the
> trunk).

Strange? Our repository is a clone of the trunk?
It's true that if I "hg update" to v1.3 I see that the fix is there.

Regards,
Nadia

> It would be a good idea to tie into the sos work to avoid conflicts
> when it all gets merged back together, assuming that isn't a big
> problem for you.
>
> As for Jeff's suggestion: dealing with the performance hit problem is
> why I suggested ORTE_NOTIFIER_VERBOSE, modeled after the
> OPAL_OUTPUT_VERBOSE model. The idea was to compile it in -only- when
> the system is built for it - maybe using a --with-notifier-verbose
> configuration option. Frankly, some organizations would happily pay a
> small performance penalty for the benefits.
>
> I would personally recommend that the notifier framework keep the
> stats so things can be compact and self-contained. We still get
> atomicity by allowing each framework/component/whatever specify the
> threshold. Creating yet another system to do nothing more than track
> error/warning frequencies to decide whether or not to notify seems
> wasteful.
>
> Perhaps worth a phone call to decide path forward?
>
>
> On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres <jsquy...@cisco.com>
> wrote:
>         Nadia --
>
>         Sorry I didn't get to jump in on the other thread earlier.
>
>         We have made considerable changes to the notifier framework in
>         a branch to better support "SOS" functionality:
>
>         https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos
>
>         Cisco and Indiana U. have been working on this branch for a
>         while. A description of the SOS stuff is here:
>
>         https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
>
>         As for setting up an external web server with hg, don't bother
>         -- just get an account at bitbucket.org. They're free and
>         allow you to host hg repositories there. I've used bitbucket
>         to collaborate on code before it hits OMPI's SVN trunk with
>         both internal and external OMPI developers.
>
>         We can certainly move the opal-sos repo to bitbucket (or
>         branch again off opal-sos to bitbucket -- whatever makes more
>         sense) to facilitate collaborating with you.
>
>         Back on topic...
>
>         I'd actually suggest a combination of what has been discussed
>         in the other thread. The notifier can be the mechanism that
>         actually sends the output message, but it doesn't have to be
>         the mechanism that tracks the stats and decides when to output
>         a message. That can be separate logic, and therefore be more
>         fine-grained (and potentially even specific to the MPI layer).
>
>         The Big Question will be how to do this with zero performance
>         impact when it is not being used. This has always been the
>         difficult issue when trying to implement any kind of
>         monitoring inside the core OMPI performance-sensitive paths.
>         Even adding individual branches has met with resistance (in
>         performance-critical code paths)...
>
>
>         On May 26, 2009, at 10:59 AM, Nadia Derbey wrote:
>
>                 Hi,
>
>                 While having a look at the notifier framework under
>                 orte, I noticed that the way it is written, the init
>                 routine for the selected module cannot be called.
>
>                 Attached is a small patch that fixes this issue.
>
>                 Regards,
>                 Nadia
>
>                 <orte_notifier_fix_select.patch><ATT14046023.txt>
>
>         --
>         Jeff Squyres
>         Cisco Systems
>
>         _______________________________________________
>         devel mailing list
>         de...@open-mpi.org
>         http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Nadia Derbey <nadia.der...@bull.net>
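
For illustration only, here is a minimal sketch of the idea Ralph describes in the quoted text: a notifier macro that is compiled in only when the build asks for it, with each call site keeping its own counter and threshold. Nothing here is taken from the Open MPI tree. The names DEMO_NOTIFIER_VERBOSE_ENABLED, DEMO_NOTIFIER_VERBOSE, demo_event_t and demo_notifier_record are hypothetical, and the build switch stands in for what a configure option such as the proposed --with-notifier-verbose might define.

```c
#include <stdio.h>

/* Hypothetical build switch (assumption): a configure option such as the
 * proposed --with-notifier-verbose would define this to 1 or 0. */
#ifndef DEMO_NOTIFIER_VERBOSE_ENABLED
#define DEMO_NOTIFIER_VERBOSE_ENABLED 1
#endif

/* Hypothetical per-call-site statistics: each framework or component
 * owns one of these and chooses its own threshold. */
typedef struct {
    const char *name;      /* label used in the notification        */
    unsigned threshold;    /* occurrences needed before notifying   */
    unsigned count;        /* occurrences since the last notice     */
    unsigned notified;     /* number of notifications sent so far   */
} demo_event_t;

/* Count one occurrence; emit a message and reset the counter once the
 * threshold is reached. Returns 1 if a notification was sent, else 0. */
static int demo_notifier_record(demo_event_t *ev)
{
    ev->count++;
    if (ev->count < ev->threshold) {
        return 0;
    }
    fprintf(stderr, "notifier: %s occurred %u times\n", ev->name, ev->count);
    ev->count = 0;
    ev->notified++;
    return 1;
}

#if DEMO_NOTIFIER_VERBOSE_ENABLED
/* Enabled build: the macro forwards to the counting function. */
#define DEMO_NOTIFIER_VERBOSE(ev) ((void) demo_notifier_record(ev))
#else
/* Disabled build: the macro expands to nothing, so the call site adds
 * no branch and no function call. */
#define DEMO_NOTIFIER_VERBOSE(ev) ((void) 0)
#endif
```

With the switch set to 0 the macro disappears at preprocessing time, which is one way to address Jeff's concern about adding branches to performance-critical paths. With it set to 1, each call costs an increment and a compare, which is the small penalty Ralph mentions.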