Re: [OMPI devel] problem in the ORTE notifier framework

Nadia Derbey Thu, 28 May 2009 03:40:22 -0400

On Tue, 2009-05-26 at 17:24 -0600, Ralph Castain wrote:
> First, to answer Nadia's question: you will find that the init
> function for the module is already called when it is selected; see
> the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the
> trunk).


Strange: our repository is a clone of the trunk.

It's true, though, that if I "hg update" to v1.3, I see that the fix is there.

Regards,
Nadia

> It would be a good idea to tie into the SOS work to avoid conflicts
> when it all gets merged back together, assuming that isn't a big
> problem for you.
> 
> As for Jeff's suggestion: dealing with the performance-hit problem is
> why I suggested ORTE_NOTIFIER_VERBOSE, modeled after
> OPAL_OUTPUT_VERBOSE. The idea was to compile it in -only- when the
> system is built for it, perhaps via a --with-notifier-verbose
> configure option. Frankly, some organizations would happily pay a
> small performance penalty for the benefits.
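A minimal C sketch of the compile-time guard described above, assuming a hypothetical ORTE_WANT_NOTIFIER_VERBOSE preprocessor symbol that a --with-notifier-verbose configure option would define. The macro and function names are illustrative only, not existing ORTE symbols. When the symbol is 0, the macro expands to an empty statement, so the performance-critical path carries no extra notifier call or branch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical build switch: assume --with-notifier-verbose would define
 * ORTE_WANT_NOTIFIER_VERBOSE=1. Off by default, so nothing is compiled in. */
#ifndef ORTE_WANT_NOTIFIER_VERBOSE
#define ORTE_WANT_NOTIFIER_VERBOSE 0
#endif

/* Number of notifications actually emitted (lets a test observe behavior). */
static int notifier_emitted = 0;

/* Hypothetical stand-in for the selected notifier module's output routine. */
void notifier_backend_log(int level, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    fprintf(stderr, "[notifier:%d] ", level);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    notifier_emitted++;
}

#if ORTE_WANT_NOTIFIER_VERBOSE
/* Enabled build: the double parentheses carry the whole argument list. */
#define NOTIFIER_VERBOSE(args) notifier_backend_log args
#else
/* Disabled build: an empty statement, so the notifier adds no call or branch. */
#define NOTIFIER_VERBOSE(args) do { } while (0)
#endif

/* Stand-in for a performance-sensitive code path. */
static int hot_path(int x)
{
    if (x < 0) {
        NOTIFIER_VERBOSE((1, "negative input %d", x));
        return 0;
    }
    return x * 2;
}
```

Compiling with -DORTE_WANT_NOTIFIER_VERBOSE=1 turns the call on. The double-parentheses form is the same trick OPAL_OUTPUT_VERBOSE uses to pass a variable argument list through a single macro parameter.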
> 
> I would personally recommend that the notifier framework keep the
> stats so things can be compact and self-contained. We still get
> atomicity by allowing each framework/component/whatever to specify
> the threshold. Creating yet another system to do nothing more than
> track error/warning frequencies to decide whether or not to notify
> seems wasteful.
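A minimal sketch of keeping the statistics inside the notifier, assuming each framework or component registers its own threshold and the notifier emits one message every `threshold` occurrences. The type and function names are hypothetical, and a real implementation would also need locking if events can arrive from several threads.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-source record kept by the notifier itself; each
 * framework/component supplies its own threshold. */
typedef struct {
    const char *source;     /* reporting framework/component (illustrative) */
    unsigned threshold;     /* notify once every `threshold` events (>= 1) */
    unsigned long count;    /* events seen so far */
    unsigned long notified; /* notifications actually sent */
} notifier_stat_t;

static void notifier_stat_init(notifier_stat_t *s, const char *source,
                               unsigned threshold)
{
    s->source = source;
    s->threshold = threshold > 0 ? threshold : 1;  /* avoid modulo by zero */
    s->count = 0;
    s->notified = 0;
}

/* Record one error/warning. Returns 1 if this event reached the threshold
 * and a notification was sent, 0 if it was only counted. */
static int notifier_stat_event(notifier_stat_t *s, const char *msg)
{
    s->count++;
    if (s->count % s->threshold != 0) {
        return 0;  /* below threshold: just count it */
    }
    s->notified++;
    fprintf(stderr, "[notifier] %s: %s (%lu occurrences)\n",
            s->source, msg, s->count);
    return 1;
}
```

Because the counter and threshold live in one record owned by the notifier, no second subsystem is needed just to track frequencies, which is the trade-off argued for above.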
> 
> Perhaps it's worth a phone call to decide the path forward?
> 
> 
> On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres <jsquy...@cisco.com>
> wrote:
>         Nadia --
>         
>         Sorry I didn't get to jump in on the other thread earlier.
>         
>         We have made considerable changes to the notifier framework in
>         a branch to better support "SOS" functionality:
>         
>         
>          https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos
>         
>         Cisco and Indiana U. have been working on this branch for a
>         while.  A description of the SOS stuff is here:
>         
>            https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
>         
>         As for setting up an external web server with hg, don't bother;
>         just get an account at bitbucket.org.  Accounts are free and
>         let you host hg repositories there.  I've used Bitbucket to
>         collaborate with both internal and external OMPI developers on
>         code before it hits OMPI's SVN trunk.
>         
>         We can certainly move the opal-sos repo to bitbucket (or
>         branch again off opal-sos to bitbucket -- whatever makes more
>         sense) to facilitate collaborating with you.
>         
>         Back on topic...
>         
>         I'd actually suggest a combination of what has been discussed
>         in the other thread.  The notifier can be the mechanism that
>         actually sends the output message, but it doesn't have to be
>         the mechanism that tracks the stats and decides when to output
>         a message.  That can be separate logic, and therefore be more
>         fine-grained (and potentially even specific to the MPI layer).
>         
>         The Big Question will be how to do this with zero performance
>         impact when it is not being used. This has always been the
>         difficult issue when trying to implement any kind of
>         monitoring inside the core OMPI performance-sensitive paths.
>         Even adding individual branches has met with resistance (in
>         performance-critical code paths)...
>         
>         On May 26, 2009, at 10:59 AM, Nadia Derbey wrote:
>         
>                 Hi,
>                 
>                 While looking at the notifier framework under
>                 ORTE, I noticed that, as currently written, the
>                 init routine for the selected module is never
>                 called.
>                 
>                 Attached is a small patch that fixes this issue.
>                 
>                 Regards,
>                 Nadia
>                 
>                 
>                 <orte_notifier_fix_select.patch>
>         
>         
>         -- 
>         Jeff Squyres
>         Cisco Systems
>         
>         _______________________________________________
>         devel mailing list
>         de...@open-mpi.org
>         http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
-- 
Nadia Derbey <nadia.der...@bull.net>
