Re: [OMPI devel] problem in the ORTE notifier framework

Sylvain Jeaugey Wed, 27 May 2009 03:25:25 -0400

About performance, I may be missing something, but our first goal was to track paths that are already slow.

We imagined that it could be possible to add at the beginning (or end) of this "bad path" just one line that would basically do an atomic inc. So, in terms of CPU cycles, something like 1 for the inc and maybe 1 jump before. Are a couple of cycles really an issue in slow paths (which take at least hundreds of cycles), or do you fear out-of-cache memory accesses, or something else?
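To make the "one atomic inc" idea concrete, here is a minimal sketch using C11 atomics. The names `slow_path_hits`, `slow_path_record` and `slow_path_count` are hypothetical and are not Open MPI symbols; the real code base would presumably use its own atomic wrappers instead of `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter for one slow path; not an Open MPI symbol.
 * Static storage duration means it starts at zero. */
static atomic_uint_fast64_t slow_path_hits;

/* Hypothetical helper: call once at the beginning (or end) of the slow path. */
static inline void slow_path_record(void)
{
    /* Relaxed ordering: only the count matters, not its ordering
     * relative to other memory operations. */
    atomic_fetch_add_explicit(&slow_path_hits, 1, memory_order_relaxed);
}

/* Read the current count, e.g. when deciding whether to emit a warning. */
static inline uint64_t slow_path_count(void)
{
    return (uint64_t)atomic_load_explicit(&slow_path_hits, memory_order_relaxed);
}
```

The increment itself is a single read-modify-write instruction on common platforms; the cost that remains open in the question above is whether the counter's cache line is already local to the core doing the increment.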

As for outputs, they are indeed slow (and can considerably slow down an application if not synchronized), but aggregation on the head node should solve our problems. And if not, we can also disable outputs at runtime.

So, in my opinion, no application should notice a difference (unless you tune the framework to output every warning).


Sylvain

On Tue, 26 May 2009, Jeff Squyres wrote:

Nadia --

Sorry I didn't get to jump in on the other thread earlier.
We have made considerable changes to the notifier framework in a branch to better support "SOS" functionality:
  https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos
Cisco and Indiana U. have been working on this branch for a while. A description of the SOS stuff is here:
  https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
As for setting up an external web server with hg, don't bother -- just get an account at bitbucket.org. They're free and allow you to host hg repositories there. I've used bitbucket to collaborate on code before it hits OMPI's SVN trunk with both internal and external OMPI developers.
We can certainly move the opal-sos repo to bitbucket (or branch again off opal-sos to bitbucket -- whatever makes more sense) to facilitate collaborating with you.
Back on topic...
I'd actually suggest a combination of what has been discussed in the other thread. The notifier can be the mechanism that actually sends the output message, but it doesn't have to be the mechanism that tracks the stats and decides when to output a message. That can be separate logic, and therefore be more fine-grained (and potentially even specific to the MPI layer).
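A rough sketch of that split, under assumed names: a small tracker owns the count and the decision, and only calls out to a "send" hook once a threshold is reached. `event_tracker`, `notify_fn` and `event_tracker_hit` are hypothetical illustrations, not part of the ORTE notifier API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook standing in for whatever actually sends the message
 * (in this discussion, the notifier). */
typedef void (*notify_fn)(const char *event, uint64_t count);

/* Hypothetical tracker: owns the statistics and the "when to output" policy. */
typedef struct {
    const char *name;            /* label for the tracked event */
    uint64_t threshold;          /* report when the count reaches this value */
    atomic_uint_fast64_t count;  /* occurrences seen so far */
    notify_fn notify;            /* may be NULL: tracking without output */
} event_tracker;

/* Record one occurrence. Returns true exactly when the threshold is reached,
 * and in that case invokes the send hook if one is set. */
static inline bool event_tracker_hit(event_tracker *t)
{
    uint64_t n = atomic_fetch_add_explicit(&t->count, 1, memory_order_relaxed) + 1;
    if (n != t->threshold) {
        return false;
    }
    if (t->notify != NULL) {
        t->notify(t->name, n);
    }
    return true;
}
```

Keeping the policy in the tracker means the threshold (or a rate, or a per-layer rule) can change without touching the code that delivers the message.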
The Big Question will be how to do this with zero performance impact when it is not being used. This has always been the difficult issue when trying to implement any kind of monitoring inside the core OMPI performance-sensitive paths. Even adding individual branches has met with resistance (in performance-critical code paths)...
On May 26, 2009, at 10:59 AM, Nadia Derbey wrote:
Hi,

While having a look at the notifier framework under orte, I noticed that
the way it is written, the init routine for the selected module cannot
be called.

Attached is a small patch that fixes this issue.

Regards,
Nadia

<orte_notifier_fix_select.patch><ATT14046023.txt>
--
Jeff Squyres
Cisco Systems

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
