Re: [OMPI devel] problem in the ORTE notifier framework

Ralph Castain Thu, 28 May 2009 07:53:18 -0400

The code in 1.3 is definitely different from the trunk as it lags quite a
bit behind. However, the trunk definitely does include the code I
referenced.


Not sure why the hg mirror wouldn't have it. I would have to defer to Jeff
on that question - could be a bug in the update macro that maintains the
mirror?

I haven't checked the opal_sos branch to see if it has the code in it, but I
would have thought those guys were tracking the trunk that closely - that
code was committed in r19209.

Ralph


On Thu, May 28, 2009 at 1:45 AM, Sylvain Jeaugey
<sylvain.jeau...@bull.net> wrote:

> To be more complete, we pull Hg from
> http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/ ; are we mistaken?
>
> If not, the code in v1.3 seems to be different from the code in the trunk
> ...
>
> Sylvain
>
>
> On Thu, 28 May 2009, Nadia Derbey wrote:
>
>> On Tue, 2009-05-26 at 17:24 -0600, Ralph Castain wrote:
>>
>>> First, to answer Nadia's question: you will find that the init
>>> function for the module is already called when it is selected - see
>>> the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the
>>> trunk).
>>>
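The pattern described above, where a module's init function runs as part of selection, can be sketched as follows. This is only an illustration under assumptions: demo_module_t, demo_select and the two example modules are hypothetical names and do not reproduce the trunk's notifier_base_select.c.

```c
#include <stddef.h>

/* Hypothetical module descriptor; NOT the real ORTE notifier module type. */
typedef struct {
    const char *name;
    int priority;
    int (*init)(void);   /* may be NULL when the module needs no setup */
} demo_module_t;

/* Counter so the example can show that init() really ran. */
static int demo_init_calls = 0;

static int demo_syslog_init(void)
{
    ++demo_init_calls;
    return 0;            /* 0 means success in this sketch */
}

/* Two made-up modules; the one with the higher priority wins. */
static const demo_module_t demo_modules[] = {
    { "file",   10, NULL },
    { "syslog", 20, demo_syslog_init },
};

/* Pick the highest-priority module and run its init() right away,
 * so the caller never holds a selected but uninitialized module.
 * Returns NULL if there are no modules or if init() fails. */
static const demo_module_t *demo_select(const demo_module_t *mods, size_t n)
{
    const demo_module_t *best = NULL;

    for (size_t i = 0; i < n; ++i) {
        if (best == NULL || mods[i].priority > best->priority) {
            best = &mods[i];
        }
    }
    if (best != NULL && best->init != NULL && best->init() != 0) {
        return NULL;
    }
    return best;
}
```

Calling demo_select(demo_modules, 2) returns the "syslog" entry and increments demo_init_calls once, which is the behavior Nadia's patch was aiming for.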
>>
>> Strange? Our repository is a clone of the trunk?
>>
>>
>> It's true that if I "hg update" to v1.3 I see that the fix is there.
>>
>> Regards,
>> Nadia
>>
>>> It would be a good idea to tie into the sos work to avoid conflicts
>>> when it all gets merged back together, assuming that isn't a big
>>> problem for you.
>>>
>>> As for Jeff's suggestion: dealing with the performance hit problem is
>>> why I suggested ORTE_NOTIFIER_VERBOSE, modeled after the
>>> OPAL_OUTPUT_VERBOSE model. The idea was to compile it in -only- when
>>> the system is built for it - maybe using a --with-notifier-verbose
>>> configuration option. Frankly, some organizations would happily pay a
>>> small performance penalty for the benefits.
>>>
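A compile-time switch of the kind proposed above might look like the sketch below. Everything here is an assumption made for illustration: DEMO_WANT_NOTIFIER_VERBOSE stands in for whatever a --with-notifier-verbose configure option would define, and DEMO_NOTIFIER_VERBOSE, demo_notify and demo_send are hypothetical names, not Open MPI symbols.

```c
#include <stdio.h>

/* Assumed to be set by the build system (e.g. -DDEMO_WANT_NOTIFIER_VERBOSE=1).
 * Defaults to off so that a normal build pays nothing for the call. */
#ifndef DEMO_WANT_NOTIFIER_VERBOSE
#define DEMO_WANT_NOTIFIER_VERBOSE 0
#endif

static int demo_notify_count = 0;

/* Stand-in for the real notification path. */
static void demo_notify(int severity, const char *msg)
{
    ++demo_notify_count;
    fprintf(stderr, "[notifier sev=%d] %s\n", severity, msg);
}

/* The argument list is passed in double parentheses so the whole call,
 * including argument evaluation, disappears when the feature is off. */
#if DEMO_WANT_NOTIFIER_VERBOSE
#define DEMO_NOTIFIER_VERBOSE(args) demo_notify args
#else
#define DEMO_NOTIFIER_VERBOSE(args) do { } while (0)
#endif

/* Example call site on an error path. */
static int demo_send(int nbytes)
{
    if (nbytes < 0) {
        DEMO_NOTIFIER_VERBOSE((1, "negative length"));
        return -1;
    }
    return nbytes;
}
```

In a default build the macro expands to an empty statement, so demo_notify is never called; the error-path branch itself still exists, only the notification call is removed.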
>>> I would personally recommend that the notifier framework keep the
>>> stats so things can be compact and self-contained. We still get
>>> atomicity by allowing each framework/component/whatever to specify the
>>> threshold. Creating yet another system to do nothing more than track
>>> error/warning frequencies to decide whether or not to notify seems
>>> wasteful.
>>>
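One way to picture the notifier keeping the statistics while each caller supplies its own threshold is the sketch below. The type and function names (demo_event_stat_t, demo_stat_record) are hypothetical, and the reset-on-notify policy is an assumption; the thread does not specify how counts would be handled.

```c
#include <stdbool.h>

/* Per-event bookkeeping that a notifier framework could own.
 * The threshold is chosen by the component that registers the event. */
typedef struct {
    const char *label;
    unsigned threshold;  /* notify once per `threshold` occurrences; 0 = never */
    unsigned count;      /* occurrences since the last notification */
} demo_event_stat_t;

/* Record one occurrence. Returns true when the caller should emit a
 * notification; the counter is then reset for the next window. */
static bool demo_stat_record(demo_event_stat_t *st)
{
    if (st->threshold == 0) {
        return false;
    }
    if (++st->count < st->threshold) {
        return false;
    }
    st->count = 0;
    return true;
}
```

With a threshold of 3, the first two calls return false and the third returns true, after which counting starts over.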
>>> Perhaps worth a phone call to decide path forward?
>>>
>>>
>>> On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres <jsquy...@cisco.com>
>>> wrote:
>>>        Nadia --
>>>
>>>        Sorry I didn't get to jump in on the other thread earlier.
>>>
>>>        We have made considerable changes to the notifier framework in
>>>        a branch to better support "SOS" functionality:
>>>
>>>
>>>         https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos
>>>
>>>        Cisco and Indiana U. have been working on this branch for a
>>>        while.  A description of the SOS stuff is here:
>>>
>>>           https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
>>>
>>>        As for setting up an external web server with hg, don't bother
>>>        -- just get an account at bitbucket.org.  They're free and
>>>        allow you to host hg repositories there.  I've used bitbucket
>>>        to collaborate on code before it hits OMPI's SVN trunk with
>>>        both internal and external OMPI developers.
>>>
>>>        We can certainly move the opal-sos repo to bitbucket (or
>>>        branch again off opal-sos to bitbucket -- whatever makes more
>>>        sense) to facilitate collaborating with you.
>>>
>>>        Back on topic...
>>>
>>>        I'd actually suggest a combination of what has been discussed
>>>        in the other thread.  The notifier can be the mechanism that
>>>        actually sends the output message, but it doesn't have to be
>>>        the mechanism that tracks the stats and decides when to output
>>>        a message.  That can be separate logic, and therefore be more
>>>        fine-grained (and potentially even specific to the MPI layer).
>>>
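The split suggested above, with the decision logic living outside the notifier and the notifier acting only as the transport, could be sketched like this. All names (demo_send_fn, demo_mpi_policy_t, demo_policy_report, demo_count_sink) are hypothetical and the counting rule is an assumption made for illustration.

```c
#include <stddef.h>

/* Transport interface: the only thing the policy knows about the notifier. */
typedef void (*demo_send_fn)(const char *msg, void *ctx);

/* Policy state kept by the layer that detects the events (e.g. MPI-level
 * code), separate from whatever actually delivers the message. */
typedef struct {
    unsigned seen;       /* events seen since the last message was sent */
    unsigned limit;      /* send once `seen` reaches this value */
    demo_send_fn send;   /* transport callback; NULL disables output */
    void *send_ctx;      /* opaque pointer handed back to the transport */
} demo_mpi_policy_t;

/* Count an event and hand the message to the transport when the limit
 * is reached. A limit of 0 or 1 forwards every event. */
static void demo_policy_report(demo_mpi_policy_t *p, const char *msg)
{
    if (++p->seen >= p->limit && p->send != NULL) {
        p->send(msg, p->send_ctx);
        p->seen = 0;
    }
}

/* Stand-in transport that only counts how many messages it was given. */
static void demo_count_sink(const char *msg, void *ctx)
{
    (void)msg;
    ++*(unsigned *)ctx;
}
```

Because the policy only sees a function pointer, the same counting logic could sit in front of any notifier component without that component knowing how the decision was made.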
>>>        The Big Question will be how to do this with zero performance
>>>        impact when it is not being used. This has always been the
>>>        difficult issue when trying to implement any kind of
>>>        monitoring inside the core OMPI performance-sensitive paths.
>>>         Even adding individual branches has met with resistance (in
>>>        performance-critical code paths)...
>>>
>>>
>>>
>>>
>>>
>>>        On May 26, 2009, at 10:59 AM, Nadia Derbey wrote:
>>>
>>>
>>>
>>>                Hi,
>>>
>>>                While having a look at the notifier framework under
>>>                orte, I noticed that
>>>                the way it is written, the init routine for the
>>>                selected module cannot
>>>                be called.
>>>
>>>                Attached is a small patch that fixes this issue.
>>>
>>>                Regards,
>>>                Nadia
>>>
>>>
>>>                <orte_notifier_fix_select.patch><ATT14046023.txt>
>>>
>>>
>>>        --
>>>        Jeff Squyres
>>>        Cisco Systems
>>>
>>>        _______________________________________________
>>>        devel mailing list
>>>        de...@open-mpi.org
>>>        http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>>
>> --
>> Nadia Derbey <nadia.der...@bull.net>
>>
