Re: [OMPI devel] problem in the ORTE notifier framework

Ralph Castain Tue, 26 May 2009 19:24:12 -0400

First, to answer Nadia's question: you will find that the init function for
the module is already called when it is selected - see the code in
orte/mca/base/notifier_base_select.c, lines 72-76 (in the trunk).


It would be a good idea to tie into the sos work to avoid conflicts when it
all gets merged back together, assuming that isn't a big problem for you.

As for Jeff's suggestion: the performance hit is exactly why I suggested
ORTE_NOTIFIER_VERBOSE, modeled after OPAL_OUTPUT_VERBOSE. The idea was to
compile it in -only- when the system is built for it, maybe using a
--with-notifier-verbose configuration option. Frankly, some
organizations would happily pay a small performance penalty for the
benefits.
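
To make the compile-time idea concrete, here is a minimal sketch, not actual ORTE code. WANT_NOTIFIER_VERBOSE (the symbol a --with-notifier-verbose build might define) and notifier_log() are hypothetical names used only for illustration; only ORTE_NOTIFIER_VERBOSE itself comes from the proposal above.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical symbol: assumed to be set to 1 by configure when the
 * build is given --with-notifier-verbose. Defaults to "compiled out". */
#ifndef WANT_NOTIFIER_VERBOSE
#define WANT_NOTIFIER_VERBOSE 0
#endif

/* Counts how many times the stand-in notifier was actually invoked. */
static int notifier_calls = 0;

/* Hypothetical stand-in for the real notifier entry point. */
static void notifier_log(int severity, const char *fmt, ...)
{
    va_list ap;

    notifier_calls++;
    fprintf(stderr, "[notifier sev=%d] ", severity);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
}

#if WANT_NOTIFIER_VERBOSE
/* The caller wraps the argument list in its own parentheses, so a
 * one-argument macro can forward a variadic call. */
#define ORTE_NOTIFIER_VERBOSE(args) notifier_log args
#else
/* Expands to nothing: no call and no branch in the fast path. */
#define ORTE_NOTIFIER_VERBOSE(args)
#endif
```

A call site would look like ORTE_NOTIFIER_VERBOSE((1, "retry count %d", n)); note the double parentheses. In a build without the option, the statement reduces to a lone semicolon, so the cost in performance-critical paths is zero.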

I would personally recommend that the notifier framework keep the stats so
things can be compact and self-contained. We still get atomicity by allowing
each framework/component/whatever to specify the threshold. Creating yet
another system to do nothing more than track error/warning frequencies to
decide whether or not to notify seems wasteful.
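
As a rough illustration of what "the notifier keeps the stats, the caller picks the threshold" could look like, here is a minimal sketch. The notifier_stat_t type and both functions are hypothetical and do not exist in the notifier framework today; resetting the counter after each notification is also just one possible policy.

```c
#include <stdbool.h>

/* Hypothetical per-caller counter owned by the notifier framework. */
typedef struct {
    const char *name;      /* framework/component that registered it */
    unsigned    threshold; /* notify once this many events were seen */
    unsigned    count;     /* events recorded since the last notify */
} notifier_stat_t;

/* Each framework/component supplies its own threshold at setup time.
 * A threshold of 0 is treated as 1 (notify on every event). */
static void notifier_stat_init(notifier_stat_t *s, const char *name,
                               unsigned threshold)
{
    s->name = name;
    s->threshold = threshold > 0 ? threshold : 1;
    s->count = 0;
}

/* Record one error/warning. Returns true when the threshold has been
 * reached, and resets the counter so the next window starts fresh. */
static bool notifier_stat_record(notifier_stat_t *s)
{
    if (++s->count < s->threshold) {
        return false;
    }
    s->count = 0;
    return true;
}
```

The caller only registers a counter and reports events; the decision to notify stays inside the notifier, so no separate tracking system is needed.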

Perhaps worth a phone call to decide path forward?


On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres <[email protected]> wrote:

> Nadia --
>
> Sorry I didn't get to jump in on the other thread earlier.
>
> We have made considerable changes to the notifier framework in a branch to
> better support "SOS" functionality:
>
>    https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos
>
> Cisco and Indiana U. have been working on this branch for a while.  A
> description of the SOS stuff is here:
>
>    https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
>
> As for setting up an external web server with hg, don't bother -- just get
> an account at bitbucket.org.  They're free and allow you to host hg
> repositories there.  I've used bitbucket to collaborate on code before it
> hits OMPI's SVN trunk with both internal and external OMPI developers.
>
> We can certainly move the opal-sos repo to bitbucket (or branch again off
> opal-sos to bitbucket -- whatever makes more sense) to facilitate
> collaborating with you.
>
> Back on topic...
>
> I'd actually suggest a combination of what has been discussed in the other
> thread.  The notifier can be the mechanism that actually sends the output
> message, but it doesn't have to be the mechanism that tracks the stats and
> decides when to output a message.  That can be separate logic, and therefore
> be more fine-grained (and potentially even specific to the MPI layer).
>
> The Big Question will be how to do this with zero performance impact when it
> is not being used. This has always been the difficult issue when trying to
> implement any kind of monitoring inside the core OMPI performance-sensitive
> paths.  Even adding individual branches has met with resistance (in
> performance-critical code paths)...
>
>
>
>
> On May 26, 2009, at 10:59 AM, Nadia Derbey wrote:
>
>> Hi,
>>
>> While having a look at the notifier framework under orte, I noticed that
>> the way it is written, the init routine for the selected module cannot
>> be called.
>>
>> Attached is a small patch that fixes this issue.
>>
>> Regards,
>> Nadia
>>
>> <orte_notifier_fix_select.patch><ATT14046023.txt>
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
