Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Nadia Derbey
On Wed, 2009-05-27 at 14:25 -0400, Jeff Squyres wrote:
> Excellent points; Ralph and I chatted about this on the phone today --  
> we concur with George.
> 
> Bull -- would peruse work for you?  I think you mentioned before that  
> it didn't seem attractive to you.

Well, it didn't, because from what I understood the MPI program needs to
be changed (register a callback routine for the event, activate the
event, etc.), and this is something we wanted to avoid.

Now, if we are allowed to
1. define new "internal" PERUSE events,
2. internally set the associated callback routines,
why not use PERUSE? Combined with the ORTE notifier framework, this
could do the job, I think.

Regards,
Nadia

>   I think George's point is that we  
> already have lots of hooks in place in the PML -- and they're called  
> peruse.  So if we could use those hooks, then a) they're run-time  
> selectable already, and b) there's no additional cost in performance  
> critical/not-critical code paths (for the case where these stats are  
> not being collected) because PERUSE has been in the code base for a  
> long time.
> 
> I think the idea is that your callbacks could be invoked by the peruse  
> hooks and then they can do whatever they want -- increment counters,  
> conditionally invoke the ORTE notifier system, etc.
> 
> 
> 
> On May 27, 2009, at 11:34 AM, George Bosilca wrote:
> 
> > What is a generic threshold? And what is a counter? We have a policy
> > against such coding standards, and to be honest I would like to stick
> > to it. The reason is that the PML is a very complex piece of code, and
> > I would like to keep it as easy to understand as possible. If people
> > start adding #if/#endif all over the code, we are diverging from this
> > goal.
> >
> > The only way to make this work is to call the notifier or some other
> > framework in this "slow path" and let this other framework do its own
> > logic to determine what and when to print. Of course the cost of this
> > is a function call plus an atomic operation (which is already not
> > cheap). It's starting to get expensive, even for a "slow path", which
> > in this particular context is just one insertion in an atomic FIFO.
> >
> > If, instead of counting the number of times we try to send the fragment,
> > we switch to a time-based approach, this can be solved with the PERUSE
> > calls. There is a callback when the request is created, and another
> > callback when the first fragment is pushed successfully into the
> > network. Computing the time between these two allows a tool to figure
> > out how much time the request was waiting in some internal queues, and
> > therefore how much delay this added to the execution time.
> >
> >george.
> >
> > On May 27, 2009, at 06:59 , Ralph Castain wrote:
> >
> > > ORTE_NOTIFIER_VERBOSE(api, counter, threshold,...)
> > >
> > > #if WANT_NOTIFIER_VERBOSE
> > > opal_atomic_increment(counter);
> > > if (counter > threshold) {
> > > orte_notifier.api(.)
> > > }
> > > #endif
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> 
> 
-- 
Nadia Derbey 
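
For readers following the thread, here is a minimal sketch of the counter-plus-threshold idea behind the ORTE_NOTIFIER_VERBOSE pseudocode quoted above. It is only an illustration under stated assumptions: it uses C11 <stdatomic.h> rather than the OPAL atomics, and NOTIFIER_VERBOSE, demo_notify, and simulate_retries are hypothetical names, not part of ORTE. Unlike the pseudocode, it tests the value returned by the atomic increment, so the comparison does not re-read a counter that another thread may have changed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical build-time switch, standing in for a configure option
 * such as the --with-notifier-verbose idea mentioned in the thread. */
#ifndef WANT_NOTIFIER_VERBOSE
#define WANT_NOTIFIER_VERBOSE 1
#endif

/* Counts how many notifications were actually emitted (illustration only). */
static atomic_int notifications_sent;

/* Stand-in for a notifier module entry point; NOT the real ORTE API. */
static void demo_notify(const char *msg)
{
    atomic_fetch_add(&notifications_sent, 1);
    fprintf(stderr, "notifier: %s\n", msg);
}

#if WANT_NOTIFIER_VERBOSE
/* Atomically bump the counter; notify once the new value exceeds threshold. */
#define NOTIFIER_VERBOSE(counter, threshold, msg)                  \
    do {                                                           \
        assert((threshold) >= 0);                                  \
        if (atomic_fetch_add(&(counter), 1) + 1 > (threshold)) {   \
            demo_notify(msg);                                      \
        }                                                          \
    } while (0)
#else
/* Expands to nothing when the feature is not built in. */
#define NOTIFIER_VERBOSE(counter, threshold, msg) do { } while (0)
#endif

/* Simulate `attempts` retries on a slow path; return notifications emitted. */
static int simulate_retries(int attempts, int threshold)
{
    atomic_int retries = 0;
    atomic_store(&notifications_sent, 0);
    for (int i = 0; i < attempts; i++) {
        NOTIFIER_VERBOSE(retries, threshold, "send retried");
    }
    return atomic_load(&notifications_sent);
}
```

With this sketch, simulate_retries(5, 3) emits two notifications (on the fourth and fifth attempts). Building with -DWANT_NOTIFIER_VERBOSE=0 removes the increment and the branch altogether, which is the compile-time trade-off debated in the thread.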



Re: [OMPI devel] Remove IMB 2.3 from ompi-tests?

2009-05-28 Thread Holger Mickler


Jeff Squyres wrote:
> On May 27, 2009, at 6:49 AM, Holger Mickler wrote:
> 
>> would you mind sharing this patch? We'd like to test our current VT
>> version with
>> some MPI RMA code :)
>>
> 
> No problem-o.  I've submitted this patch upstream to Intel as well. 
> Note that the patch slightly changed between 3.1 and 3.2; this is the
> 3.2 patch:
> 
> --- imb/src/IMB_window.c  2008-10-21 01:17:31.0 -0700
> +++ IMB_3.2/src/IMB_window.c  2009-05-26 05:29:15.0 -0700
> @@ -140,6 +140,9 @@
>   c_info->rank, 0, 1, c_info->r_data_type,
> c_info->WIN);
>MPI_ERRHAND(ierr);
>}
> +  /* JMS Added a call to MPI_WIN_FENCE, per MPI-2.1 11.2.1 */
> +  ierr = MPI_Win_fence(0, c_info->WIN);
> +  MPI_ERRHAND(ierr);
>ierr = MPI_Win_free(&c_info->WIN);
>MPI_ERRHAND(ierr);
>  }
> 

Great, works fine!

>> Does anyone know of some (small) code/benchmark that uses all
>> available MPI RMA
>> functionality? As far as I see, IMB only uses fence and
>> put/get/accumulate. No
>> locks or post/wait/start/complete...
>>
> 
> We have a few one-sided tests in the ompi-test repository (which I think
> Dresden has access to?), but I'm not 100% sure that they're correct...
> 

Yes, we do have access. We'll try the tests and see how far we can get :)
Thanks a lot!

Holger


Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Nadia Derbey
On Tue, 2009-05-26 at 17:24 -0600, Ralph Castain wrote:
> First, to answer Nadia's question: you will find that the init
> function for the module is already called when it is selected - see
> the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the
> trunk).

Strange? Our repository is a clone of the trunk?

It's true that if I "hg update" to v1.3 I see that the fix is there.

Regards,
Nadia

> It would be a good idea to tie into the sos work to avoid conflicts
> when it all gets merged back together, assuming that isn't a big
> problem for you.
> 
> As for Jeff's suggestion: dealing with the performance hit problem is
> why I suggested ORTE_NOTIFIER_VERBOSE, modeled after the
> OPAL_OUTPUT_VERBOSE model. The idea was to compile it in -only- when
> the system is built for it - maybe using a --with-notifier-verbose
> configuration option. Frankly, some organizations would happily pay a
> small performance penalty for the benefits.
> 
> I would personally recommend that the notifier framework keep the
> stats so things can be compact and self-contained. We still get
> atomicity by allowing each framework/component/whatever specify the
> threshold. Creating yet another system to do nothing more than track
> error/warning frequencies to decide whether or not to notify seems
> wasteful.
> 
> Perhaps worth a phone call to decide path forward?
> 
> 
> On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres 
> wrote:
> Nadia --
> 
> Sorry I didn't get to jump in on the other thread earlier.
> 
> We have made considerable changes to the notifier framework in
> a branch to better support "SOS" functionality:
> 
> 
>  https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos
> 
> Cisco and Indiana U. have been working on this branch for a
> while.  A description of the SOS stuff is here:
> 
>https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
> 
> As for setting up an external web server with hg, don't bother
> -- just get an account at bitbucket.org.  They're free and
> allow you to host hg repositories there.  I've used bitbucket
> to collaborate on code before it hits OMPI's SVN trunk with
> both internal and external OMPI developers.
> 
> We can certainly move the opal-sos repo to bitbucket (or
> branch again off opal-sos to bitbucket -- whatever makes more
> sense) to facilitate collaborating with you.
> 
> Back on topic...
> 
> I'd actually suggest a combination of what has been discussed
> in the other thread.  The notifier can be the mechanism that
> actually sends the output message, but it doesn't have to be
> the mechanism that tracks the stats and decides when to output
> a message.  That can be separate logic, and therefore be more
> fine-grained (and potentially even specific to the MPI layer).
> 
> The Big Question will be how to do this with zero performance
> impact when it is not being used. This has always been the
> difficult issue when trying to implement any kind of
> monitoring inside the core OMPI performance-sensitive paths.
>  Even adding individual branches has met with resistance (in
> performance-critical code paths)...
> 
> 
> 
> 
> 
> On May 26, 2009, at 10:59 AM, Nadia Derbey wrote:
> 
> 
> 
> Hi,
> 
> While having a look at the notifier framework under
> orte, I noticed that
> the way it is written, the init routine for the
> selected module cannot
> be called.
> 
> Attached is a small patch that fixes this issue.
> 
> Regards,
> Nadia
> 
> 
> 
> 
> 
> -- 
> Jeff Squyres
> Cisco Systems
> 
-- 
Nadia Derbey 



Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Sylvain Jeaugey
To be more complete, we pull Hg from
http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/ ; are we mistaken?


If not, the code in v1.3 seems to be different from the code in the trunk 
...


Sylvain





Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Terry Dontje

Nadia Derbey wrote:
> On Wed, 2009-05-27 at 14:25 -0400, Jeff Squyres wrote:
>> Excellent points; Ralph and I chatted about this on the phone today --
>> we concur with George.
>>
>> Bull -- would peruse work for you?  I think you mentioned before that
>> it didn't seem attractive to you.
>
> Well, it didn't because from what I understood, the MPI program need to
> be changed (register a callback routine for the event, activate the
> event, etc), and this is something we wanted to avoid.
>
> Now, if we are allowed to
> 1. define new "internal" PERUSE events,
> 2. internally set the associated callback routines
> why not using peruse? This combined with the orte notifier framework,
> could do the job I think.

  
FWIW, I did a prototype of some dtrace probes piggybacking on the PERUSE
macros and letting those changes be enabled/disabled at configure
time.  One word of caution: if you start adding if statements to all
the PERUSE macros, you will more than likely end up significantly slowing
down performance.  So be careful and keep an eye on the overhead as
you add stuff to the macros.


--td






Re: [OMPI devel] Remove IMB 2.3 from ompi-tests?

2009-05-28 Thread Jeff Squyres

On May 28, 2009, at 3:10 AM, Holger Mickler wrote:

>> We have a few one-sided tests in the ompi-test repository (which I think
>> Dresden has access to?), but I'm not 100% sure that they're correct...
>
> Yes, we do have access. We'll try the tests and see how far we can get :)

Look in ompi-tests/trunk/onesided.  Like I said, I won't vouch for the
correctness of those tests.  :-)


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Ralph Castain
The code in 1.3 is definitely different from the trunk as it lags quite a
bit behind. However, the trunk definitely does include the code I
referenced.

Not sure why the hg mirror wouldn't have it. I would have to defer to Jeff
on that question - could be a bug in the update macro that maintains the
mirror?

I haven't checked the opal_sos branch to see if it has the code in it, but I
would have thought those guys were tracking the trunk that closely - that
code was committed in r19209.

Ralph


On Thu, May 28, 2009 at 1:45 AM, Sylvain Jeaugey
wrote:

> To be more complete, we pull Hg from
> http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/ ; are we mistaken
> ?
>
> If not, the code in v1.3 seems to be different from the code in the trunk
> ...
>
> Sylvain
>
>
> On Thu, 28 May 2009, Nadia Derbey wrote:
>
>  On Tue, 2009-05-26 at 17:24 -0600, Ralph Castain wrote:
>>
>>> First, to answer Nadia's question: you will find that the init
>>> function for the module is already called when it is selected - see
>>> the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the
>>> trunk.
>>>
>>
>> Strange? Our repository is a clone of the trunk?
>>
>>>
>>>  It's true that if I "hg update" to v1.3 I see that the fix is there.
>>
>> Regards,
>> Nadia
>>
>>  It would be a good idea to tie into the sos work to avoid conflicts
>>> when it all gets merged back together, assuming that isn't a big
>>> problem for you.
>>>
>>> As for Jeff's suggestion: dealing with the performance hit problem is
>>> why I suggested ORTE_NOTIFIER_VERBOSE, modeled after the
>>> OPAL_OUTPUT_VERBOSE model. The idea was to compile it in -only- when
>>> the system is built for it - maybe using a --with-notifier-verbose
>>> configuration option. Frankly, some organizations would happily pay a
>>> small performance penalty for the benefits.
>>>
>>> I would personally recommend that the notifier framework keep the
>>> stats so things can be compact and self-contained. We still get
>>> atomicity by allowing each framework/component/whatever specify the
>>> threshold. Creating yet another system to do nothing more than track
>>> error/warning frequencies to decide whether or not to notify seems
>>> wasteful.
>>>
>>> Perhaps worth a phone call to decide path forward?
>>>
>>>
>>> On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres 
>>> wrote:
>>>Nadia --
>>>
>>>Sorry I didn't get to jump in on the other thread earlier.
>>>
>>>We have made considerable changes to the notifier framework in
>>>a branch to better support "SOS" functionality:
>>>
>>>
>>> https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos
>>>
>>>Cisco and Indiana U. have been working on this branch for a
>>>while.  A description of the SOS stuff is here:
>>>
>>>   https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
>>>
>>>As for setting up an external web server with hg, don't bother
>>>-- just get an account at bitbucket.org.  They're free and
>>>allow you to host hg repositories there.  I've used bitbucket
>>>to collaborate on code before it hits OMPI's SVN trunk with
>>>both internal and external OMPI developers.
>>>
>>>We can certainly move the opal-sos repo to bitbucket (or
>>>branch again off opal-sos to bitbucket -- whatever makes more
>>>sense) to facilitate collaborating with you.
>>>
>>>Back on topic...
>>>
>>>I'd actually suggest a combination of what has been discussed
>>>in the other thread.  The notifier can be the mechanism that
>>>actually sends the output message, but it doesn't have to be
>>>the mechanism that tracks the stats and decides when to output
>>>a message.  That can be separate logic, and therefore be more
>>>fine-grained (and potentially even specific to the MPI layer).
>>>
>>>The Big Question will how to do this with zero performance
>>>impact when it is not being used. This has always been the
>>>difficult issue when trying to implement any kind of
>>>monitoring inside the core OMPI performance-sensitive paths.
>>> Even adding individual branches has met with resistance (in
>>>performance-critical code paths)...
>>>
>>>
>>>
>>>
>>>
>>>On May 26, 2009, at 10:59 AM, Nadia Derbey wrote:
>>>
>>>
>>>
>>>Hi,
>>>
>>>While having a look at the notifier framework under
>>>orte, I noticed that
>>>the way it is written, the init routine for the
>>>selected module cannot
>>>be called.
>>>
>>>Attached is a small patch that fixes this issue.
>>>
>>>Regards,
>>>Nadia
>>>
>>>
>>>
>>>
>>>
>>>--
>>>Jeff Squyres
>>>Cisco Systems
>>>

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Ralph Castain
I agree with Terry here about being careful in pursuing this path. What I
wouldn't want to have happen is to force anyone wanting to be notified of
error events to have to also turn on peruse, which impacts the non-error
code path.

Again, I'm not entirely sure what you are trying to do here. As I understood
the original RFC, it sounded like you wanted to track errors but only report
them when they occurred a controlled number of times (as opposed to every
time). I think this would better be done outside of peruse.

If you are trying to track normal performance (e.g., trying to alert sys
admins when networks aren't running as fast as they should), then that
probably should be done inside of peruse. However, that definitely will
impact the critical code path, so Terry's caution is definitely a concern.


On Thu, May 28, 2009 at 12:55 AM, Nadia Derbey wrote:

> On Wed, 2009-05-27 at 14:25 -0400, Jeff Squyres wrote:
> > Excellent points; Ralph and I chatted about this on the phone today --
> > we concur with George.
> >
> > Bull -- would peruse work for you?  I think you mentioned before that
> > it didn't seem attractive to you.
>
> Well, it didn't because from what I understood, the MPI program need to
> be changed (register a callback routine for the event, activate the
> event, etc), and this is something we wanted to avoid.
>
> Now, if we are allowed to
> 1. define new "internal" PERUSE events,
> 2. internally set the associated callback routines
> why not using peruse? This combined with the orte notifier framework,
> could do the job I think.
>
> Regards,
> Nadia
>
> >   I think George's point is that we
> > already have lots of hooks in place in the PML -- and they're called
> > peruse.  So if we could use those hooks, then a) they're run-time
> > selectable already, and b) there's no additional cost in performance
> > critical/not-critical code paths (for the case where these stats are
> > not being collected) because PERUSE has been in the code base for a
> > long time.
> >
> > I think the idea is that your callbacks could be invoked by the peruse
> > hooks and then they can do whatever they want -- increment counters,
> > conditionally invoke the ORTE notifier system, etc.
> >
> >
> >
> > On May 27, 2009, at 11:34 AM, George Bosilca wrote:
> >
> > > What is a generic threshold? And what is a counter? We have a policy
> > > against such coding standards, and to be honest I would like to stick
> > > to it. The reason is that the PML is a very complex piece of code, and
> > > I would like to keep it as easy to understand as possible. If people
> > > start adding #if/#endif all over the code, we diverging from this
> > > goal.
> > >
> > > The only way to make this work is to call the notifier or some other
> > > framework in this "slow path" and let this other framework do it's own
> > > logic to determine what and when to print. Of course the cost of this
> > > is a function call plus an atomic operation (which is already not
> > > cheap). It's starting to get expensive, even for a "slow path", which
> > > in this particular context is just one insertion in an atomic FIFO.
> > >
> > > If instead of counting in number of times we try to send the fragment,
> > > and switch to a time base approach, this can be solved with the PERUSE
> > > calls. There is a callback when the request is created, and another
> > > callback when the first fragment is pushed successfully into the
> > > network. Computing the time between these two, allow a tool to figure
> > > out how much time the request was waiting in some internal queues, and
> > > therefore how much delay this added to the execution time.
> > >
> > >george.
> > >
> > > On May 27, 2009, at 06:59 , Ralph Castain wrote:
> > >
> > > > ORTE_NOTIFIER_VERBOSE(api, counter, threshold,...)
> > > >
> > > > #if WANT_NOTIFIER_VERBOSE
> > > > opal_atomic_increment(counter);
> > > > if (counter > threshold) {
> > > > orte_notifier.api(.)
> > > > }
> > > > #endif
> > >
> > >
> >
> >
> --
> Nadia Derbey 
>
>


Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Nadia Derbey
On Thu, 2009-05-28 at 05:57 -0600, Ralph Castain wrote:
> I agree with Terry here about being careful in pursuing this path.
> What I wouldn't want to have happen is to force anyone wanting to be
> notified of error events to have to also turn on peruse, which impacts
> the non-error code path.

Agreed, I missed that part!

Regards,
Nadia
> 
> Again, I'm not entirely sure what you are trying to do here. As I
> understood the original RFC, it sounded like you wanted to track
> errors but only report them when they occurred a controlled number of
> times (as opposed to every time). I think this would better be done
> outside of peruse.
> 
> If you are trying to track normal performance (e.g., trying to alert
> sys admins when networks aren't running as fast as they should), then
> that probably should be done inside of peruse. However, that
> definitely will impact the critical code path, so Terry's caution is
> definitely a concern.
> 
> 
> On Thu, May 28, 2009 at 12:55 AM, Nadia Derbey 
> wrote:
> On Wed, 2009-05-27 at 14:25 -0400, Jeff Squyres wrote:
> > Excellent points; Ralph and I chatted about this on the
> phone today --
> > we concur with George.
> >
> > Bull -- would peruse work for you?  I think you mentioned
> before that
> > it didn't seem attractive to you.
> 
> 
> Well, it didn't because from what I understood, the MPI
> program need to
> be changed (register a callback routine for the event,
> activate the
> event, etc), and this is something we wanted to avoid.
> 
> Now, if we are allowed to
> 1. define new "internal" PERUSE events,
> 2. internally set the associated callback routines
> why not using peruse? This combined with the orte notifier
> framework,
> could do the job I think.
> 
> Regards,
> Nadia
> 
> 
> >   I think George's point is that we
> > already have lots of hooks in place in the PML -- and
> they're called
> > peruse.  So if we could use those hooks, then a) they're
> run-time
> > selectable already, and b) there's no additional cost in
> performance
> > critical/not-critical code paths (for the case where these
> stats are
> > not being collected) because PERUSE has been in the code
> base for a
> > long time.
> >
> > I think the idea is that your callbacks could be invoked by
> the peruse
> > hooks and then they can do whatever they want -- increment
> counters,
> > conditionally invoke the ORTE notifier system, etc.
> >
> >
> >
> > On May 27, 2009, at 11:34 AM, George Bosilca wrote:
> >
> > > What is a generic threshold? And what is a counter? We
> have a policy
> > > against such coding standards, and to be honest I would
> like to stick
> > > to it. The reason is that the PML is a very complex piece
> of code, and
> > > I would like to keep it as easy to understand as possible.
> If people
> > > start adding #if/#endif all over the code, we diverging
> from this
> > > goal.
> > >
> > > The only way to make this work is to call the notifier or
> some other
> > > framework in this "slow path" and let this other framework
> do it's own
> > > logic to determine what and when to print. Of course the
> cost of this
> > > is a function call plus an atomic operation (which is
> already not
> > > cheap). It's starting to get expensive, even for a "slow
> path", which
> > > in this particular context is just one insertion in an
> atomic FIFO.
> > >
> > > If instead of counting in number of times we try to send
> the fragment,
> > > and switch to a time base approach, this can be solved
> with the PERUSE
> > > calls. There is a callback when the request is created,
> and another
> > > callback when the first fragment is pushed successfully
> into the
> > > network. Computing the time between these two, allow a
> tool to figure
> > > out how much time the request was waiting in some internal
> queues, and
> > > therefore how much delay this added to the execution time.
> > >
> > >george.
> > >
> > > On May 27, 2009, at 06:59 , Ralph Castain wrote:
> > >
> > > > ORTE_NOTIFIER_VERBOSE(api, counter, threshold,...)
> > > >
> > > > #if WANT_NOTIFIER_VERBOSE
> > > > opal_atomic_increment(counter);
> > > > if (counter > threshold) {
> > > > orte_notifier.api(.)
> > > > }
> > > > #en

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Jeff Squyres

On May 28, 2009, at 7:53 AM, Ralph Castain wrote:

> The code in 1.3 is definitely different from the trunk as it lags
> quite a bit behind. However, the trunk definitely does include the
> code I referenced.
>
> Not sure why the hg mirror wouldn't have it. I would have to defer
> to Jeff on that question - could be a bug in the update macro that
> maintains the mirror?

FWIW: I see the code right here:

http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/file/tip/orte/mca/notifier/base/notifier_base_select.c#l72

> I haven't checked the opal_sos branch to see if it has the code in
> it, but I would have thought those guys were tracking the trunk that
> closely - that code was committed in r19209.


Yes, the opal-sos branch has a variant of this as well.  Note that we
changed the notifier framework in the opal-sos branch to be many-of-many,
not one-of-many.  Specifically: the trunk will select the *one*
available notifier with the highest priority.  The opal-sos branch
will select *all* available notifiers and then subsequently invoke
them in priority order.
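
The difference between the two selection policies described above can be sketched roughly as follows. This is a sketch under assumptions, not the code in either branch: demo_module_t, select_one, and select_all are hypothetical names, and the real notifier framework structures differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical module descriptor; the real ORTE notifier structs differ. */
typedef struct {
    const char *name;
    int priority;   /* higher value = preferred */
    int available;  /* nonzero if the module can run here */
} demo_module_t;

/* One-of-many: return the available module with the highest priority, or NULL. */
static const demo_module_t *select_one(const demo_module_t *mods, size_t n)
{
    const demo_module_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (mods[i].available &&
            (best == NULL || mods[i].priority > best->priority)) {
            best = &mods[i];
        }
    }
    return best;
}

/* qsort comparator over an array of module pointers: descending priority. */
static int by_priority_desc(const void *a, const void *b)
{
    const demo_module_t *ma = *(const demo_module_t *const *)a;
    const demo_module_t *mb = *(const demo_module_t *const *)b;
    return (mb->priority > ma->priority) - (mb->priority < ma->priority);
}

/* Many-of-many: fill `out` with every available module, sorted by
 * descending priority, and return how many were selected. */
static size_t select_all(const demo_module_t *mods, size_t n,
                         const demo_module_t **out)
{
    size_t count = 0;
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (mods[i].available) {
            out[count++] = &mods[i];
        }
    }
    qsort(out, count, sizeof out[0], by_priority_desc);
    return count;
}
```

A caller following the many-of-many policy would then loop over the returned array and invoke each module in turn, whereas the one-of-many policy keeps only the single pointer returned by select_one.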


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Jeff Squyres

On May 28, 2009, at 2:55 AM, Nadia Derbey wrote:

> Well, it didn't because from what I understood, the MPI program need to
> be changed (register a callback routine for the event, activate the
> event, etc), and this is something we wanted to avoid.

Combined with what Terry and Ralph already said, I just wanted to make
sure this point is crystal clear: what we're proposing is that you use
peruse *internally* -- there's no need to change MPI applications.

> Now, if we are allowed to
> 1. define new "internal" PERUSE events,
> 2. internally set the associated callback routines




Peruse was designed to be extensible, I believe.  So adding new events  
into its infrastructure may not be too terrible (I didn't work on the  
peruse stuff; George/Rainer would have to comment on that).  The  
bigger issue is adding hooks to call those peruse events in the main  
progression engines.  Adding them to error paths (or already-slow  
paths) might not be too bad.  But I'm sure that many of us would  
scrutinize such changes closely -- as previously stated, we don't want  
to negatively impact performance for those who will not be using this  
functionality.
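
To make the "internal callbacks" idea concrete, here is a rough sketch of a library-side event table combined with the time-based measurement George described (time between request creation and the first fragment reaching the network). Everything in it is hypothetical: the event names, demo_register, demo_fire, and demo_stats_t are invented for illustration and are not the PERUSE API. Timestamps are passed in explicitly so the logic can be checked without a clock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical internal events; these are NOT the real PERUSE event names. */
typedef enum { EV_REQ_CREATED, EV_FIRST_FRAG_SENT, EV_COUNT } demo_event_t;

typedef void (*demo_callback_t)(demo_event_t ev, int req_id, double now,
                                void *ctx);

static demo_callback_t callbacks[EV_COUNT];
static void *callback_ctx[EV_COUNT];

/* Register a callback from inside the library; the MPI program is unchanged. */
static void demo_register(demo_event_t ev, demo_callback_t cb, void *ctx)
{
    assert(ev < EV_COUNT);
    callbacks[ev] = cb;
    callback_ctx[ev] = ctx;
}

/* Hook placed at the event site: one pointer test when nothing is registered. */
static void demo_fire(demo_event_t ev, int req_id, double now)
{
    if (callbacks[ev] != NULL) {
        callbacks[ev](ev, req_id, now, callback_ctx[ev]);
    }
}

#define MAX_REQS 16

typedef struct {
    double created_at[MAX_REQS]; /* creation timestamp per request id */
    double limit;                /* queue time above which we would notify */
    int slow_requests;           /* requests that waited longer than limit */
} demo_stats_t;

/* Record creation time, then compare it against the first-fragment time. */
static void on_event(demo_event_t ev, int req_id, double now, void *ctx)
{
    demo_stats_t *s = ctx;
    if (req_id < 0 || req_id >= MAX_REQS) {
        return;
    }
    if (ev == EV_REQ_CREATED) {
        s->created_at[req_id] = now;
    } else if (ev == EV_FIRST_FRAG_SENT &&
               now - s->created_at[req_id] > s->limit) {
        s->slow_requests++; /* a real version might call the notifier here */
    }
}
```

Even in this sketch, demo_fire adds a load and a branch at every hook site, which is exactly the kind of cost in performance-critical paths that Terry and Ralph cautioned about.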


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Jeff Squyres

On May 28, 2009, at 8:48 AM, Jeff Squyres (jsquyres) wrote:


> Yes, the opal-sos branch has a variant of this as well.




One thing I didn't mention: the opal-sos hg tree is unfortunately
unrelated to the main ompi-svn-mirror, so you can't just push/pull
between them.  :-(


Most of us OMPI developers make specific SVN+Mercurial branches (see
https://svn.open-mpi.org/trac/ompi/wiki/UsingMercurial) rather than
branching from the ompi-svn-mirror, so that we can eventually commit
back to the SVN trunk more easily.


Eventually, Open MPI will be moving to 100% Mercurial and this issue  
will go away (current status is that Indiana U. is working on  
revamping our hosting infrastructure to support Mercurial).


--
Jeff Squyres
Cisco Systems



[OMPI devel] [Fwd: LAM: undefined reference to `mpi_bcast__']

2009-05-28 Thread Eugene Loh




I guess a bunch of you already saw this on the lam mail alias.  The
part that caught my eye was a user choosing LAM over OMPI due to lack
of "clear documentation" for OMPI.

 Original Message 
Subject:    LAM: undefined reference to `mpi_bcast__'
Date:       Thu, 28 May 2009 08:32:46 -0700 (PDT)
From:       Silviu Groza
Reply-To:   General LAM/MPI mailing list
To:         l...@lam-mpi.org


Hello,

I am trying to install a quantum chemistry program (Dalton) with LAM-MPI under PelicanHPC. PelicanHPC has both LAM-MPI as well as OpenMPI. I have chosen LAM-MPI due to lack of clear documentation of OpenMPI, and because LAM-MPI environment is the default on PelicanHPC.

So, I have the following outputs:

user@pelican:~$ mpif77 -c foo.c
user@pelican:~$ mpif77 -show
gfortran -I/usr/lib/lam/include -pthread -L/usr/lib/lam/lib -llammpio
 -llamf77mpi -lmpi -llam -lutil -ldl
user@pelican:~$ mpicc -show
gcc -I/usr/lib/lam/include -pthread -L/usr/lib/lam/lib -llammpio -llamf77mpi 
-lmpi -llam -lutil -ldl


Therefore, my Makefile.config is:

ARCH= linux 
# 
# 
CPPFLAGS  = -DVAR_G77 -DSYS_LINUX -DVAR_MFDS -D'INSTALL_WRKMEM=1' -D'INSTALL_BASDIR="/mnt/sda8/home/dan/Daltonsubpelican/dalton-2.0/basis/"' -DVAR_MPI -DIMPLICIT_NONE 
F77   = mpif77 
CC= mpicc 
RM= rm -f 
FFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore 
SAFEFFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore 
CFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -std=c99 -DRESTRICT=restrict 
INCLUDES  = -I../include 
LIBS  = -L/usr/lib -llapack -lblas 
INSTALLDIR= /mnt/sda8/home/dalton-2.0/bin 
PDPACK_EXTRAS = linpack.o eispack.o 
GP_EXTRAS = 
AR= ar 
ARFLAGS   = rvs 
# flags for ftnchek on Dalton /hjaaj 
CHEKFLAGS  = -nopure -nopretty -nocommon -nousage -noarray -notruncation -quiet  -noargumants -arguments=number  -usage=var-unitialized 
# -usage=var-unitialized:arg-const-modified:arg-alias 
# -usage=var-unitialized:var-set-unused:arg-unused:arg-const-modified:arg-alias 
# 
default : linuxparallel.x 
# 
# Parallel initialization 
# 
MPI_INCLUDE_DIR = -I/usr/lib/lam/include 
MPI_LIB_PATH= -L/usr/lib/lam/lib 
MPI_LIB = -lmpi 
# 
# 
# Suffix rules 
# hjaaj Oct 04: .g is a "cheat" suffix, for debugging. 
#   'make x.g' will create x.o from x.F or x.c with -g debug flag set. 
# 
.SUFFIXES : .F .o .c .i .g 

.F.o:
	$(F77) $(INCLUDES) $(CPPFLAGS) $(FFLAGS) -c $*.F

.F.g:
	$(F77) $(INCLUDES) $(CPPFLAGS) $(FFLAGS) -g -c $*.F

.c.o:
	$(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -c $*.c

.c.g:
	$(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -g -c $*.c

.F.i:
	$(F77) $(INCLUDES) $(CPPFLAGS) -E $*.F > $*.i




and the errors are: 





---> Linking parallel dalpar.x ... 
mpif77 -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore \ 
-o /mnt/sda8/home/dalton-2.0/bin/dalpar.x abacus/dalton.o cc/crayio.o abacus/linux_mem_allo.o \ 
abacus/herpar.o eri/eri2par.o amfi/amfi.o amfi/symtra.o -Labacus -labacus -Lrsp -lrsp -Lsirius -lsirius -labacus -Leri -leri -Ldensfit -ldensfit -Lcc  -lcc -Ldft -ldft -Lgp -lgp -Lpdpack -lpdpack -L/usr/lib -llapack -lblas  \ 
-L/usr/lib/lam/lib -lmpi 
abacus/dalton.o: In function `getmmbas_': 
dalton.F:(.text+0x379): undefined reference to `mpi_bcast__' 
abacus/dalton.o: In function `MAIN__': 
dalton.F:(.text+0x739): undefined reference to `mpi_bcast__' 
abacus/libabacus.a(dalgnr.o): In function `parion_': 
dalgnr.F:(.text+0x223): undefined reference to `mpi_bcast__' 
dalgnr.F:(.text+0x3ea): undefined reference to `mpi_bcast__' 
dalgnr.F:(.text+0x438): undefined reference to `mpi_bcast__' 
abacus/libabacus.a(dalgnr.o):dalgnr.F:(.text+0x686): more undefined references to `mpi_bcast__' follow 
dft/libdft.a(dft_ksm.o): In function `ksmcollect_': 
dft_ksm.F:(.text+0x8c): undefined reference to `mpi_reduce__' 
dft_ksm.F:(.text+0xd7): undefined reference to `mpi_reduce__' 
dft/libdft.a(dft_ksm.o): In function `ksmsync_': 
dft_ksm.F:(.text+0x12c): undefined reference to `mpi_bcast__' 
dft/libdft.a(dft_ksm.o): In function `kick_ksm_slaves_alive__': 
dft_ksm.F:(.text+0x27d): undefined reference to `mpi_bcast__' 
dft_ksm.F:(.text+0x29f): undefined reference to `mpi_bcast__' 
dft/libdft.a(dft_mag.o): In function `dft_suscep_collect__': 
dft_mag.F:(.text+0x21b0): undefined reference to `mpi_reduce__' 
dft/libdft.a(dft_mag.o): In function `kick_slaves_suscep__': 
dft_mag.F:(.text+0x231d): undefined reference to `mpi_bcast__' 
dft_mag.F:(.text+0x233f): undefined reference to `mpi_bcast__' 
dft/libdft.a(dft_mag.o): In function `dft_brhs_colle

Re: [OMPI devel] [Fwd: LAM: undefined reference to `mpi_bcast__']

2009-05-28 Thread Paul H. Hargrove
I think the user meant that PelicanHPC lacked clear OMPI-specific
documentation.

-Paul

Eugene Loh wrote:
I guess a bunch of you already saw this on the lam mail alias.  The 
part that caught my eye was a user choosing LAM over OMPI due to lack 
of "clear documentation" for OMPI.


 Original Message 
Subject:LAM: undefined reference to `mpi_bcast__'
Date:   Thu, 28 May 2009 08:32:46 -0700 (PDT)
From:   Silviu Groza 
Reply-To:   General LAM/MPI mailing list 
To: l...@lam-mpi.org



Hello,

I am trying to install a quantum chemistry program (Dalton) with LAM-MPI under 
PelicanHPC. PelicanHPC has both LAM-MPI as well as OpenMPI. I have chosen 
LAM-MPI due to lack of clear documentation of OpenMPI, and because LAM-MPI 
environment is the default on PelicanHPC.

So, I have the following outputs:

user@pelican:~$ mpif77 -c foo.c
user@pelican:~$ mpif77 -show
gfortran -I/usr/lib/lam/include -pthread -L/usr/lib/lam/lib -llammpio
 -llamf77mpi -lmpi -llam -lutil -ldl
user@pelican:~$ mpicc -show
gcc -I/usr/lib/lam/include -pthread -L/usr/lib/lam/lib -llammpio -llamf77mpi 
-lmpi -llam -lutil -ldl



Therefore, my Makefile.config is:

ARCH= linux 
# 
# 
CPPFLAGS  = -DVAR_G77 -DSYS_LINUX -DVAR_MFDS -D'INSTALL_WRKMEM=1' -D'INSTALL_BASDIR="/mnt/sda8/home/dan/Daltonsubpelican/dalton-2.0/basis/"' -DVAR_MPI -DIMPLICIT_NONE 
F77   = mpif77 
CC= mpicc 
RM= rm -f 
FFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore 
SAFEFFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore 
CFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -std=c99 -DRESTRICT=restrict 
INCLUDES  = -I../include 
LIBS  = -L/usr/lib -llapack -lblas 
INSTALLDIR= /mnt/sda8/home/dalton-2.0/bin 
PDPACK_EXTRAS = linpack.o eispack.o 
GP_EXTRAS = 
AR= ar 
ARFLAGS   = rvs 
# flags for ftnchek on Dalton /hjaaj 
CHEKFLAGS  = -nopure -nopretty -nocommon -nousage -noarray -notruncation -quiet  -noargumants -arguments=number  -usage=var-unitialized 
# -usage=var-unitialized:arg-const-modified:arg-alias 
# -usage=var-unitialized:var-set-unused:arg-unused:arg-const-modified:arg-alias 
# 
default : linuxparallel.x 
# 
# Parallel initialization 
# 
MPI_INCLUDE_DIR = -I/usr/lib/lam/include 
MPI_LIB_PATH= -L/usr/lib/lam/lib 
MPI_LIB = -lmpi 
# 
# 
# Suffix rules 
# hjaaj Oct 04: .g is a "cheat" suffix, for debugging. 
#   'make x.g' will create x.o from x.F or x.c with -g debug flag set. 
# 
.SUFFIXES : .F .o .c .i .g 

.F.o:
	$(F77) $(INCLUDES) $(CPPFLAGS) $(FFLAGS) -c $*.F

.F.g:
	$(F77) $(INCLUDES) $(CPPFLAGS) $(FFLAGS) -g -c $*.F

.c.o:
	$(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -c $*.c

.c.g:
	$(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -g -c $*.c

.F.i:
	$(F77) $(INCLUDES) $(CPPFLAGS) -E $*.F > $*.i





and the errors are: 






---> Linking parallel dalpar.x ... 
mpif77 -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore \ 
-o /mnt/sda8/home/dalton-2.0/bin/dalpar.x abacus/dalton.o cc/crayio.o abacus/linux_mem_allo.o \ 
abacus/herpar.o eri/eri2par.o amfi/amfi.o amfi/symtra.o -Labacus -labacus -Lrsp -lrsp -Lsirius -lsirius -labacus -Leri -leri -Ldensfit -ldensfit -Lcc  -lcc -Ldft -ldft -Lgp -lgp -Lpdpack -lpdpack -L/usr/lib -llapack -lblas  \ 
-L/usr/lib/lam/lib -lmpi 
abacus/dalton.o: In function `getmmbas_': 
dalton.F:(.text+0x379): undefined reference to `mpi_bcast__' 
abacus/dalton.o: In function `MAIN__': 
dalton.F:(.text+0x739): undefined reference to `mpi_bcast__' 
abacus/libabacus.a(dalgnr.o): In function `parion_': 
dalgnr.F:(.text+0x223): undefined reference to `mpi_bcast__' 
dalgnr.F:(.text+0x3ea): undefined reference to `mpi_bcast__' 
dalgnr.F:(.text+0x438): undefined reference to `mpi_bcast__' 
abacus/libabacus.a(dalgnr.o):dalgnr.F:(.text+0x686): more undefined references to `mpi_bcast__' follow 
dft/libdft.a(dft_ksm.o): In function `ksmcollect_': 
dft_ksm.F:(.text+0x8c): undefined reference to `mpi_reduce__' 
dft_ksm.F:(.text+0xd7): undefined reference to `mpi_reduce__' 
dft/libdft.a(dft_ksm.o): In function `ksmsync_': 
dft_ksm.F:(.text+0x12c): undefined reference to `mpi_bcast__' 
dft/libdft.a(dft_ksm.o): In function `kick_ksm_slaves_alive__': 
dft_ksm.F:(.text+0x27d): undefined reference to `mpi_bcast__' 
dft_ksm.F:(.text+0x29f): undefined reference to `mpi_bcast__' 
dft/libdft.a(dft_mag.o): In function `dft_suscep_collect__': 
dft_mag.F:(.text+0x21b0): undefined reference to `mpi_reduce__' 
dft/libdft.a(dft_mag.o): In function `kick_slaves_suscep__': 
dft_mag.F:(.text+0x231d): undefined reference to `mpi_bcast__' 
dft_mag.F:(.text+0x233f): undefined reference to `mpi_bcast__' 
dft/libdft.a(dft_mag.o): In function `dft