date:20080213

Re: [OMPI devel] btl_openib_rnr_retry MCA param

2008-02-13 Thread Gleb Natapov

On Tue, Feb 12, 2008 at 05:41:13PM -0500, Jeff Squyres wrote:
> I see that in the OOB CPC for the openib BTL, when setting up the send  
> side of the QP, we set the rnr_retry value depending on whether the  
> remote receive queue is a per-peer or SRQ:
> 
> - SRQ: btl_openib_rnr_retry MCA param value
> - PP: 0
> 
> The rationale given in a comment is that setting the RNR to 0 is a  
> good way to find bugs in our flow control.
> 
> Do we really want this in production builds?  Or do we want 0 for  
> developer builds and the same btl_openib_rnr_retry value for PP queues?
> 
The comment is mine, and IMO it should stay that way for production
builds. SW flow control either works or it doesn't, and if it doesn't I
prefer to know about it immediately. Setting PP to some value greater
than 0 just delays the manifestation of the problem, and in the case of
iWarp that possibility doesn't even exist.
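
For illustration, the policy described above can be sketched in C roughly
as follows. This is only a sketch under assumptions: the enum qp_type_t and
the function choose_rnr_retry() are hypothetical names made up for this
example, not the actual openib BTL code.

```c
/* Hypothetical queue types for this sketch; the real openib BTL
 * has its own definitions. */
typedef enum { QP_PER_PEER, QP_SRQ } qp_type_t;

/* Pick the RNR retry count for the send side of a QP.
 * mca_rnr_retry stands in for the btl_openib_rnr_retry MCA param. */
static int choose_rnr_retry(qp_type_t remote_type, int mca_rnr_retry)
{
    if (QP_SRQ == remote_type) {
        /* Shared receive queue: honor the MCA parameter. */
        return mca_rnr_retry;
    }
    /* Per-peer queue: software flow control should make RNR impossible,
     * so use 0 and let any flow control bug show up immediately. */
    return 0;
}
```

With this shape, a per-peer QP always gets 0 no matter what the MCA
parameter is set to, which is the behavior being argued for here.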

--
Gleb.

Re: [OMPI devel] [RFC] Remove explicit call to progress() from ob1.

2008-02-13 Thread Gleb Natapov

On Tue, Feb 12, 2008 at 05:57:22PM -0500, Jeff Squyres wrote:
> Were these supposed to cover the time required for pinning and  
> unpinning?
That is what the comment says, but the CPU executes code, not comments :)
Memory pinning happens inside prepare_dst(); by the time prepare_dst()
returns, the memory is already pinned. If you want to call progress after
each call to prepare_dst(), you can still do so by setting
recv_pipeline_depth to 1. And unpinning happens in an entirely different
place, after RDMA completion is acknowledged.

> 
> Can you explain why you think they're unnecessary?
> 
The much better question is "Why are they necessary?", because if there
is no good answer to this question then they should be removed: they
are harmful, as they cause uncontrolled recursive calls.
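
To make the recursion concern concrete, here is a toy model in C. It is
not Open MPI code: schedule(), progress(), run_model() and the counters
are hypothetical names, and the model only assumes that every completion
handled by progress() triggers more scheduling work.

```c
/* Toy model, not Open MPI code: all names here are hypothetical. */
static int pending = 0;    /* completions waiting to be handled */
static int depth = 0;      /* current nesting of schedule() calls */
static int max_depth = 0;  /* deepest nesting observed */

static void schedule(int call_progress);

/* Stand-in for a progress function: each pending completion
 * triggers more scheduling work. */
static void progress(int call_progress)
{
    if (pending > 0) {
        pending--;
        schedule(call_progress);
    }
}

static void schedule(int call_progress)
{
    depth++;
    if (depth > max_depth) {
        max_depth = depth;
    }
    /* ... prepare and post one fragment here ... */
    if (call_progress) {
        progress(call_progress);  /* re-enters schedule() */
    }
    depth--;
}

/* Run the model with n pending completions and return the deepest
 * nesting of schedule() that was reached. */
static int run_model(int n, int call_progress)
{
    pending = n;
    depth = 0;
    max_depth = 0;
    schedule(call_progress);
    /* Without the nested call, an outer loop drains the completions. */
    while (pending > 0) {
        progress(call_progress);
    }
    return max_depth;
}
```

In this model the nesting depth grows with the number of pending
completions when schedule() calls progress() itself, and stays flat when
progress is only driven from the outer loop.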

> 
> On Feb 12, 2008, at 5:27 AM, Gleb Natapov wrote:
> 
> > Hi,
> >
> > I am planning to commit the following patch. Those two progress() calls
> > are responsible for most of our deep recursion troubles. And I also
> > think they are completely unnecessary.
> >
> > diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > index 5899243..641176e 100644
> > --- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > +++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > @@ -704,9 +704,6 @@ int mca_pml_ob1_recv_request_schedule_once(
> > mca_bml_base_free(bml_btl,dst);
> > continue;
> > }
> > -
> > -/* run progress as the prepare (pinning) can take some time */
> > -mca_bml.bml_progress();
> > }
> >
> > return OMPI_SUCCESS;
> > diff --git a/ompi/mca/pml/ob1/pml_ob1_sendreq.c b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
> > index 0998a05..9d7f3f9 100644
> > --- a/ompi/mca/pml/ob1/pml_ob1_sendreq.c
> > +++ b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
> > @@ -968,7 +968,6 @@ cannot_pack:
> > mca_bml_base_free(bml_btl,des);
> > continue;
> > }
> > -mca_bml.bml_progress();
> > }
> >
> > return OMPI_SUCCESS;
> > --
> > Gleb.
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> Cisco Systems

--
Gleb.

Re: [OMPI devel] merging new PLPA to the trunk

2008-02-13 Thread Lenny Verkhovsky



> -----Original Message-----
> From: Jeff Squyres [mailto:jsquy...@cisco.com]
> Sent: Tuesday, February 12, 2008 10:34 PM
> To: Lenny Verkhovsky
> Cc: PLPA users list; Open MPI Developers; Sharon Melamed; Ralph Castain;
> Pak Lui
> Subject: Re: merging new PLPA to the trunk
> 
> On Feb 12, 2008, at 7:11 AM, Lenny Verkhovsky wrote:
> 
> > While coding a new RMAPS component I found strange behavior in PLPA,
> > the same behavior that was described in
> > http://www.open-mpi.org/community/lists/plpa-users/2007/04/0073.php
> >
> > I believe that it was fixed in the new version of PLPA.
> >
> > This new version needs to be merged into the trunk because of bug fixes
> > and changes in the API.
> >
> > If there is no objection, I volunteer to do it.
> 
> 
> That would be great.  Please use the official SVN "3rd party import"
> guidelines.  There's a /vendor/plpa branch that *may* be in good shape
> for this, but may not (I don't think I fully grokked the SVN 3rd party
> import procedures when I was using that branch before).  :-\  In a
> worst-case scenario, we can "reset the clock" in the /vendor/plpa
> branch and make the new PLPA version be the "first" version in that
> tree (i.e., as if it were the first version we imported).
> 
> What's your timeframe?
> 
> I ask because it would probably be best if I finally get around to
> releasing a stable version of PLPA.  The last version is technically
> still a beta.

We are working on the newest version from the trunk
http://svn.open-mpi.org/svn/plpa/trunk/ right now.
It is newer than the /vendor/plpa branch.


> 
> --
> Jeff Squyres
> Cisco Systems

Re: [OMPI devel] more vt woes

2008-02-13 Thread Matthias Jurenz

Hi George,

I'm not sure whether you are able to see my reply on ticket 1214...

...
To build VT for a cross-compiled platform, it is possible to build the
compiler wrappers (vtcc, vtcxx, vtf77, and vtf90) and the OPARI binary for
the front-end. To do that, the user should set the variable CXX_FOR_BUILD to
the 'native' compiler on the front-end. The compiler wrappers and OPARI will
then be built with CXX_FOR_BUILD instead of the cross-compiler (CXX).
Furthermore, the user can set compiler and linker flags for the front-end
compiler (e.g. CXXFLAGS_FOR_BUILD). The Makefile.am's for the compiler
wrappers (tools/compwrap) and OPARI (tools/opari) overwrite the user
variables (e.g. CXXFLAGS) with the *_FOR_BUILD values. Unfortunately, the
variables AM_CXXFLAGS, AM_CPPFLAGS, and AM_LDFLAGS cannot be used for that,
because they do not overwrite the user variables but are used in addition
to them. This could mean that unsupported compiler flags are passed to the
front-end compiler.

Example: configure CXX_FOR_BUILD=g++ CXXFLAGS_FOR_BUILD=-m64
CC=cross-xlc CXX=cross-xlC CFLAGS=-q64 CXXFLAGS=-q64 ...

In this case the compiler flag -q64 is not supported by g++, so
CXXFLAGS_FOR_BUILD should be used instead of CXXFLAGS.

So, please ignore the warnings from Automake... Currently, I see no
better solution ;-)
...

Regards,
Matthias

On Di, 2008-02-12 at 11:27 -0500, George Bosilca wrote:

> I keep getting some warnings when I compile with gcc-4.2 on MAC OS X.
> 
> tools/compwrap/Makefile.am:38: `CXXFLAGS' is a user variable, you  
> should not override it;
> tools/compwrap/Makefile.am:38: use `AM_CXXFLAGS' instead.
> tools/compwrap/Makefile.am:40: `CPPFLAGS' is a user variable, you  
> should not override it;
> tools/compwrap/Makefile.am:40: use `AM_CPPFLAGS' instead.
> tools/compwrap/Makefile.am:41: `LDFLAGS' is a user variable, you  
> should not override it;
> tools/compwrap/Makefile.am:41: use `AM_LDFLAGS' instead.
> tools/opari/tool/Makefile.am:8: `CXXFLAGS' is a user variable, you  
> should not override it;
> tools/opari/tool/Makefile.am:8: use `AM_CXXFLAGS' instead.
> tools/opari/tool/Makefile.am:10: `CPPFLAGS' is a user variable, you  
> should not override it;
> tools/opari/tool/Makefile.am:10: use `AM_CPPFLAGS' instead.
> tools/opari/tool/Makefile.am:11: `LDFLAGS' is a user variable, you  
> should not override it;
> tools/opari/tool/Makefile.am:11: use `AM_LDFLAGS' instead.
> 
>Thanks,
>  george.

--
Matthias Jurenz,
Center for Information Services and 
High Performance Computing (ZIH), TU Dresden, 
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773


Re: [OMPI devel] Please set svn:ignore properly

2008-02-13 Thread Ralph Castain

Yo Jeff

I sympathize with your request. However, we should note that those of us not
using subversion for our work (e.g., using Hg or GIT) may not see this
problem despite best intentions. Those systems set "ignore" on a global
basis, not on a per directory basis like svn. So (a) we just don't see any
warning about this, and (b) we don't have a way to set those properties in
our repositories.

When we merge the work from our repository over to an svn checkout, we
typically do not build it there. This helps when we are transitioning back
and forth between the official svn repository and our local repository. So
we again won't see an svn:ignore issue.

I know that doesn't help any, but I think it probably explains the majority
of what you are seeing. I'm not sure there is a good answer, unfortunately.

Ralph

On 2/12/08 7:46 AM, "Jeff Squyres"  wrote:

> Developers --
> 
> When you add a new component, framework, or anything that includes one
> or more new directories: please be sure to set the svn:ignore property
> on each new directory properly.  Here's the SVN docs on the svn:ignore
> property:
> 
> http://svnbook.red-bean.com/en/1.4/svn-book.html#svn.advanced.props.special.ignore
> 
> It is proper to ignore all automatically-generated files, such as (but
> not limited to):
> 
> *.la
> *.lo
> .libs
> .deps
> .dirstamp
> Makefile
> Makefile.in
> static-components.h
> ...etc.
> 
> Thanks.

Re: [OMPI devel] more vt woes

2008-02-13 Thread Ralf Wildenhues

Hallo Matthias,

* Matthias Jurenz wrote on Wed, Feb 13, 2008 at 01:49:41PM CET:
> On Di, 2008-02-12 at 11:27 -0500, George Bosilca wrote:
> 
> > I keep getting some warnings when I compile with gcc-4.2 on MAC OS X.
> > 
> > tools/compwrap/Makefile.am:38: `CXXFLAGS' is a user variable, you  
> > should not override it;
[...]
> So, please ignore the warnings from Automake... Currently, I see no
> better solution ;-)

You can put
  AUTOMAKE_OPTIONS = -Wno-gnu

in tools/compwrap/Makefile.am to avoid the warnings from automake.

Cheers,
Ralf

Re: [OMPI devel] more vt woes

2008-02-13 Thread Matthias Jurenz

Thanks for the hint, Ralf ! I will give it a try...


On Mi, 2008-02-13 at 13:58 +0100, Ralf Wildenhues wrote:

> Hallo Matthias,
> 
> * Matthias Jurenz wrote on Wed, Feb 13, 2008 at 01:49:41PM CET:
> > On Di, 2008-02-12 at 11:27 -0500, George Bosilca wrote:
> > 
> > > I keep getting some warnings when I compile with gcc-4.2 on MAC OS X.
> > > 
> > > tools/compwrap/Makefile.am:38: `CXXFLAGS' is a user variable, you  
> > > should not override it;
> [...]
> > So, please ignore the warnings from Automake... Currently, I see no
> > better solution ;-)
> 
> You can put
>   AUTOMAKE_OPTIONS = -Wno-gnu
> 
> in tools/compwrap/Makefile.am to avoid the warnings from automake.
> 
> Cheers,
> Ralf
> 

--
Matthias Jurenz,
Center for Information Services and 
High Performance Computing (ZIH), TU Dresden, 
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773



Re: [OMPI devel] Please set svn:ignore properly

2008-02-13 Thread Jeff Squyres

Understood; I too, have started to use hg internally at Cisco.  But I  
still take care to set svn:ignore properly when I commit back to the  
main repository, for a few reasons:


- SVN is the official SCM for OMPI; it's a choice to *not* use it
- there is still a good chunk of developers using SVN exclusively
- the svn:ignore information can be mined and used in other SCM  
systems, such as hg and git (hg has some internal "ignore" problems,  
but that's a different issue)


So I still think that everyone should be setting svn:ignore properly.

My $0.02...


On Feb 13, 2008, at 7:55 AM, Ralph Castain wrote:

> Yo Jeff
>
> I sympathize with your request. However, we should note that those of us not
> using subversion for our work (e.g., using Hg or GIT) may not see this
> problem despite best intentions. Those systems set "ignore" on a global
> basis, not on a per directory basis like svn. So (a) we just don't see any
> warning about this, and (b) we don't have a way to set those properties in
> our repositories.
>
> When we merge the work from our repository over to an svn checkout, we
> typically do not build it there. This helps when we are transitioning back
> and forth between the official svn repository and our local repository. So
> we again won't see an svn:ignore issue.
>
> I know that doesn't help any, but I think it probably explains the majority
> of what you are seeing. I'm not sure there is a good answer, unfortunately.
>
> Ralph
>
> On 2/12/08 7:46 AM, "Jeff Squyres"  wrote:
>
> > Developers --
> >
> > When you add a new component, framework, or anything that includes one
> > or more new directories: please be sure to set the svn:ignore property
> > on each new directory properly.  Here's the SVN docs on the svn:ignore
> > property:
> >
> > http://svnbook.red-bean.com/en/1.4/svn-book.html#svn.advanced.props.special.ignore
> >
> > It is proper to ignore all automatically-generated files, such as (but
> > not limited to):
> >
> > *.la
> > *.lo
> > .libs
> > .deps
> > .dirstamp
> > Makefile
> > Makefile.in
> > static-components.h
> > ...etc.
> >
> > Thanks.





--
Jeff Squyres
Cisco Systems

[OMPI devel] Newest PLPA

2008-02-13 Thread Jeff Squyres


Hey!

This was *just* discussed on the list yesterday and I said that we  
needed to use the official 3rd party import SVN procedures for PLPA.   
This was *NOT* done here!


I also said that I would do an actual PLPA release before it was  
imported into Open MPI so that we could have an official drop rather  
than someone grabbing an arbitrary PLPA release.


Even worse, the new PLPA was mixed in with other code in a single SVN  
commit.  Bad, bad, bad!


r17443 should be backed out immediately and done properly.





On Feb 13, 2008, at 8:09 AM, shar...@osl.iu.edu wrote:


Author: sharonm
Date: 2008-02-13 08:09:11 EST (Wed, 13 Feb 2008)
New Revision: 17443
URL: https://svn.open-mpi.org/trac/ompi/changeset/17443

Log:
Replaced PLPA to the latest PLPA (plpa-1.1a3r123)


Text files modified:
  trunk/ompi/mca/btl/openib/btl_openib_component.c                     |     4
  trunk/opal/mca/paffinity/base/base.h                                 |     6
  trunk/opal/mca/paffinity/base/paffinity_base_wrappers.c              |    13 +-
  trunk/opal/mca/paffinity/linux/paffinity_linux_module.c              |    24 ++--
  trunk/opal/mca/paffinity/linux/plpa/src/libplpa/plpa_bottom.h        |     9
  trunk/opal/mca/paffinity/linux/plpa/src/libplpa/plpa_map.c           |   218 +++
  trunk/opal/mca/paffinity/linux/plpa/src/plpa-info/plpa-info.c        |    23 ++-
  trunk/opal/mca/paffinity/linux/plpa/src/plpa-taskset/plpa-taskset.c  |    15 +-
  trunk/opal/mca/paffinity/paffinity.h                                 |    12 +-
  trunk/opal/mca/paffinity/solaris/paffinity_solaris_module.c          |    18 +-
  trunk/opal/mca/paffinity/windows/paffinity_windows_module.c          |    18 +-
  11 files changed, 224 insertions(+), 136 deletions(-)

Modified: trunk/ompi/mca/btl/openib/btl_openib_component.c
==============================================================================
--- trunk/ompi/mca/btl/openib/btl_openib_component.c (original)
+++ trunk/ompi/mca/btl/openib/btl_openib_component.c 2008-02-13 08:09:11 EST (Wed, 13 Feb 2008)
@@ -1175,10 +1175,10 @@
 {
     opal_paffinity_base_cpu_set_t cpus;
     opal_carto_base_node_t *hca_node;
-    int min_distance = -1, i, max_proc_id;
+    int min_distance = -1, i, max_proc_id, num_processors;
     const char *hca = ibv_get_device_name(dev);

-    if(opal_paffinity_base_max_processor_id(&max_proc_id) != OMPI_SUCCESS)
+    if(opal_paffinity_base_get_processor_info(&num_processors, &max_proc_id) != OMPI_SUCCESS)
         max_proc_id = 100; /* Choose something big enough */

     hca_node = carto_base_find_node(host_topo, hca);

Modified: trunk/opal/mca/paffinity/base/base.h
==============================================================================
--- trunk/opal/mca/paffinity/base/base.h (original)
+++ trunk/opal/mca/paffinity/base/base.h 2008-02-13 08:09:11 EST (Wed, 13 Feb 2008)
@@ -167,7 +167,7 @@
  * @return int - OPAL_SUCCESS or OPAL_ERR_NOT_SUPPORTED if not
  * supported
  */
-OPAL_DECLSPEC int opal_paffinity_base_max_processor_id(int *max_processor_id);
+OPAL_DECLSPEC int opal_paffinity_base_get_processor_info(int *num_processors, int *max_processor_id);

 /**
  * Return the max socket number
@@ -177,7 +177,7 @@
  * @return int - OPAL_SUCCESS or OPAL_ERR_NOT_SUPPORTED if not
  * supported
  */
-OPAL_DECLSPEC int opal_paffinity_base_max_socket(int *max_socket);
+OPAL_DECLSPEC int opal_paffinity_base_get_socket_info(int *num_sockets, int *max_socket_num);

 /**
  * Return the max core number for a given socket
@@ -188,7 +188,7 @@
  * @return int - OPAL_SUCCESS or OPAL_ERR_NOT_SUPPORTED if not
  * supported
  */
-OPAL_DECLSPEC int opal_paffinity_base_max_core(int socket, int *max_core);
+OPAL_DECLSPEC int opal_paffinity_base_get_core_info(int socket, int *num_cores, int *max_core_num);

 /**
  * Indication of whether a component was successfully selected or

Modified: trunk/opal/mca/paffinity/base/paffinity_base_wrappers.c
==============================================================================
--- trunk/opal/mca/paffinity/base/paffinity_base_wrappers.c (original)
+++ trunk/opal/mca/paffinity/base/paffinity_base_wrappers.c 2008-02-13 08:09:11 EST (Wed, 13 Feb 2008)
@@ -63,27 +63,28 @@
     return opal_paffinity_base_module->paff_map_to_socket_core(processor_id, socket, core);
 }

-int opal_paffinity_base_max_processor_id(int *max_processor_id)
+
+int opal_paffinity_base_get_processor_info(int *num_processors, int *max_processor_id)
 {
     if (!opal_paffinity_base_selected) {
         return OPAL_ERR_NOT_FOUND;
     }
-    return opal_paffinity_base_module->paff_max_processor_id(max_processor_id);
+    return opal_paffinity_base_module->paff_get_processor_info(num_processors, max_processor_id);
 }

-int opal_paffinity_base_max_socke

Re: [OMPI devel] btl_openib_rnr_retry MCA param

2008-02-13 Thread Jeff Squyres

Ok.  I'll clean up the description of that MCA param to state that it  
only applies to SRQs.


Thanks.


On Feb 13, 2008, at 12:59 AM, Gleb Natapov wrote:

> On Tue, Feb 12, 2008 at 05:41:13PM -0500, Jeff Squyres wrote:
> > I see that in the OOB CPC for the openib BTL, when setting up the send
> > side of the QP, we set the rnr_retry value depending on whether the
> > remote receive queue is a per-peer or SRQ:
> >
> > - SRQ: btl_openib_rnr_retry MCA param value
> > - PP: 0
> >
> > The rationale given in a comment is that setting the RNR to 0 is a
> > good way to find bugs in our flow control.
> >
> > Do we really want this in production builds?  Or do we want 0 for
> > developer builds and the same btl_openib_rnr_retry value for PP queues?
> >
> The comment is mine, and IMO it should stay that way for production
> builds. SW flow control either works or it doesn't, and if it doesn't I
> prefer to know about it immediately. Setting PP to some value greater
> than 0 just delays the manifestation of the problem, and in the case of
> iWarp that possibility doesn't even exist.
>
> --
> Gleb.



--
Jeff Squyres
Cisco Systems

Re: [OMPI devel] btl_openib_rnr_retry MCA param

2008-02-13 Thread Jeff Squyres

Actually, we should then also print out a different error message when  
RNR occurs in PP QP's, too.  It should be something along the lines of  
"flow control problem occurred; this shouldn't happen..." (right now  
it says RNR happened, and goes into detail about what that means -- but
that's not the real problem).


I'll do that as well.
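
Roughly what I have in mind, as a sketch only: the enum qp_type_t, the
function rnr_error_text() and both message strings below are placeholders
invented for this example, not the actual openib BTL error handling or
help text.

```c
/* Hypothetical queue types for this sketch. */
typedef enum { QP_PER_PEER, QP_SRQ } qp_type_t;

/* Choose the text to show when an RNR retry error is reported. */
static const char *rnr_error_text(qp_type_t type)
{
    if (QP_PER_PEER == type) {
        /* PP queues use rnr_retry = 0, so an RNR here means the
         * software flow control failed, not that the peer was slow. */
        return "flow control problem occurred; this should not happen";
    }
    /* SRQ: a genuine receiver-not-ready condition. */
    return "receiver not ready (RNR) retry count exceeded";
}
```

The point is only that the PP case reports the real problem (broken flow
control) instead of explaining RNR.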


On Feb 13, 2008, at 12:59 AM, Gleb Natapov wrote:

> On Tue, Feb 12, 2008 at 05:41:13PM -0500, Jeff Squyres wrote:
> > I see that in the OOB CPC for the openib BTL, when setting up the send
> > side of the QP, we set the rnr_retry value depending on whether the
> > remote receive queue is a per-peer or SRQ:
> >
> > - SRQ: btl_openib_rnr_retry MCA param value
> > - PP: 0
> >
> > The rationale given in a comment is that setting the RNR to 0 is a
> > good way to find bugs in our flow control.
> >
> > Do we really want this in production builds?  Or do we want 0 for
> > developer builds and the same btl_openib_rnr_retry value for PP queues?
> >
> The comment is mine, and IMO it should stay that way for production
> builds. SW flow control either works or it doesn't, and if it doesn't I
> prefer to know about it immediately. Setting PP to some value greater
> than 0 just delays the manifestation of the problem, and in the case of
> iWarp that possibility doesn't even exist.
>
> --
> Gleb.



--
Jeff Squyres
Cisco Systems

Re: [OMPI devel] btl_openib_rnr_retry MCA param

2008-02-13 Thread Gleb Natapov

On Wed, Feb 13, 2008 at 09:05:24AM -0500, Jeff Squyres wrote:
> Actually, we should then also print out a different error message when  
> RNR occurs in PP QP's, too.  It should be something along the lines of  
> "flow control problem occurred; this shouldn't happen..." (right now  
> it says RNR happened, and goes into detail about what that means -- but
> that's not the real problem).
> 
Good point.

> I'll do that as well.
Thanks!

> 
> 
> On Feb 13, 2008, at 12:59 AM, Gleb Natapov wrote:
> 
> > On Tue, Feb 12, 2008 at 05:41:13PM -0500, Jeff Squyres wrote:
> >> I see that in the OOB CPC for the openib BTL, when setting up the  
> >> send
> >> side of the QP, we set the rnr_retry value depending on whether the
> >> remote receive queue is a per-peer or SRQ:
> >>
> >> - SRQ: btl_openib_rnr_retry MCA param value
> >> - PP: 0
> >>
> >> The rationale given in a comment is that setting the RNR to 0 is a
> >> good way to find bugs in our flow control.
> >>
> >> Do we really want this in production builds?  Or do we want 0 for
> >> developer builds and the same btl_openib_rnr_retry value for PP  
> >> queues?
> >>
> > The comment is mine, and IMO it should stay that way for production
> > builds. SW flow control either works or it doesn't, and if it doesn't I
> > prefer to know about it immediately. Setting PP to some value greater
> > than 0 just delays the manifestation of the problem, and in the case of
> > iWarp that possibility doesn't even exist.
> >
> > --
> > Gleb.
> 
> 
> -- 
> Jeff Squyres
> Cisco Systems

--
Gleb.

Re: [OMPI devel] [RFC] Remove explicit call to progress() from ob1.

2008-02-13 Thread Jeff Squyres

Good enough for me.  I'd also say that the comments should be  
fixed.  :-)


On Feb 13, 2008, at 3:24 AM, Gleb Natapov wrote:

> On Tue, Feb 12, 2008 at 05:57:22PM -0500, Jeff Squyres wrote:
> > Were these supposed to cover the time required for pinning and
> > unpinning?
> That is what the comment says, but the CPU executes code, not comments :)
> Memory pinning happens inside prepare_dst(); by the time prepare_dst()
> returns, the memory is already pinned. If you want to call progress after
> each call to prepare_dst(), you can still do so by setting
> recv_pipeline_depth to 1. And unpinning happens in an entirely different
> place, after RDMA completion is acknowledged.
>
> > Can you explain why you think they're unnecessary?
> >
> The much better question is "Why are they necessary?", because if there
> is no good answer to this question then they should be removed: they
> are harmful, as they cause uncontrolled recursive calls.
>
> > On Feb 12, 2008, at 5:27 AM, Gleb Natapov wrote:
> >
> > > Hi,
> > >
> > > I am planning to commit the following patch. Those two progress() calls
> > > are responsible for most of our deep recursion troubles. And I also
> > > think they are completely unnecessary.
> > >
> > > diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > > index 5899243..641176e 100644
> > > --- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > > +++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > > @@ -704,9 +704,6 @@ int mca_pml_ob1_recv_request_schedule_once(
> > >    mca_bml_base_free(bml_btl,dst);
> > >    continue;
> > >    }
> > > -
> > > -/* run progress as the prepare (pinning) can take some time */
> > > -mca_bml.bml_progress();
> > >    }
> > >
> > >    return OMPI_SUCCESS;
> > > diff --git a/ompi/mca/pml/ob1/pml_ob1_sendreq.c b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
> > > index 0998a05..9d7f3f9 100644
> > > --- a/ompi/mca/pml/ob1/pml_ob1_sendreq.c
> > > +++ b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
> > > @@ -968,7 +968,6 @@ cannot_pack:
> > >    mca_bml_base_free(bml_btl,des);
> > >    continue;
> > >    }
> > > -mca_bml.bml_progress();
> > >    }
> > >
> > >    return OMPI_SUCCESS;
> > > --
> > > Gleb.
> >
> > --
> > Jeff Squyres
> > Cisco Systems
>
> --
> Gleb.



--
Jeff Squyres
Cisco Systems

Re: [OMPI devel] Scheduled merge of ORTE devel branch to trunk

2008-02-13 Thread Doug Tody

Hi Ralph -

Eliminating the dependence of OMPI on the GPR is in some ways
actually a plus, as it should make it much easier to enhance the GPR
as an optional advanced capability.  In general, it would be great
if OMPI/ORTE could make it easier to support this sort of extension
mechanism, for example by evolving the framework mechanism to a general
plugin mechanism supporting dynamic components as well as statically
compiled in ones.  Probably this is what you meant by dynamic binary
modules below.

> That said, it would be relatively simple to add an extension that provided a
> level of data storage that user-level programs could access. It would not
> provide any subscription or trigger capabilities, however - we need to leave
> those out of the system to avoid reintroducing the event-driven problems
> again. But if you just wanted to store and retrieve data for sharing it
> across processes, that could be provided with minimal effort or impact.

Yes, this is what I had in mind.  I do not understand the problem with
event-driven capabilities however; so long as these are only used in
some applications and not used for OMPI they should not compromise
OMPI.  Even given a storage-only GPR, it should be possible for an
application to use the RML to accomplish much the same thing.  Also,
whether there are problems (such as deadlock) with asynchronous,
event driven interactions is largely an issue of the interaction
patterns employed, and can be managed by careful design of the higher
level applications and their interactions.

> Another alternative: there is a separate "ORTE" project in Europe that is
> building extensions to our ORTE - they are tracking these code changes,

Sounds interesting - how would one find out more about this?

- Doug


On Tue, 12 Feb 2008, Ralph Castain wrote:

> Hi Doug
> 
> The changes are rather far-reaching. We essentially revamped the entire RTE
> to switch from an event-driven architecture to one based on sequential
> logic. This had large benefits, but the GPR was the casualty. Remember, the
> aim for the past year has been to create a dedicated "lean, mean OMPI
> machine"!
> 
> That said, it would be relatively simple to add an extension that provided a
> level of data storage that user-level programs could access. It would not
> provide any subscription or trigger capabilities, however - we need to leave
> those out of the system to avoid reintroducing the event-driven problems
> again. But if you just wanted to store and retrieve data for sharing it
> across processes, that could be provided with minimal effort or impact.
> Probably best done as a compile-time optional module, though, to avoid
> adding to the memory footprint for everyone.
> 
> Another alternative: there is a separate "ORTE" project in Europe that is
> building extensions to our ORTE - they are tracking these code changes, but
> adding "bolt-ons" such as a GPR-like central data store, hooks for workflow
> management and the grid, multi-cluster operations, etc. I'm working with
> them on those efforts - if there is interest in such capabilities, I can
> probably look into architecting things so that some of the "bolt-ons" could
> be dynamically picked up by OMPI as binary modules or something.
> 
> For now, though, there will be no GPR-like storage in the new system.
> Ralph
> 
> 
> 
> On 2/12/08 1:43 PM, "Doug Tody"  wrote:
> 
> > Hi Ralph -
> > 
> > How extensive are the changes involved in removing the GPR?  How hard would
> > it be for someone to maintain an enhanced version of this as an addon or
> > compile-time optional module?  Thanks.
> > 
> > - Doug
> > 
> > 
> > On Mon, 11 Feb 2008, Ralph Castain wrote:
> > 
> >> Hello all
> >> 
> >> Per last week's telecon, we planned the merge of the latest ORTE devel
> >> branch to the OMPI trunk for after Sun had committed its C++ changes. That
> >> happened over the weekend.
> >> 
> >> Therefore, based on the requests at the telecon, I will be merging the
> >> current ORTE devel branch to the trunk on Wed 2/13. I'll make the commit
> >> around 4:30pm Eastern time - will send out warning shortly before the 
> >> commit
> >> to let you know it is coming. I'll advise of any delays.
> >> 
> >> This will be a snapshot of that devel branch - it will include the upgraded
> >> launch system, remove the GPR, add the new tool communication library, 
> >> allow
> >> arbitrary mpiruns to interconnect, supports the revamped hostfile and
> >> dash-host behaviors per the wiki, etc.
> >> 
> >> However, it is incomplete and contains some known flaws. For example,
> >> totalview support has not been enabled yet. Comm_spawn, which is currently
> >> broken on the OMPI trunk, is fixed - but singleton comm_spawn remains
> >> broken. I am in the process of establishing support for direct and
> >> standalone launch capabilities, but those won't be in the merge. I have
> >> updated all of the launchers, but can only certify the SLURM, TM, and RSH
> >> ones to work - the Xgrid

[OMPI devel] --with-visibility

2008-02-13 Thread Jeff Squyres

Just curious -- is there a reason we don't have --with-visibility  
enabled by default on platforms that support it?  It seems like a  
useful mechanism.


Also, I notice that we don't have an output line in configure that  
shows if visibility was enabled or not.  Can it be added?


--
Jeff Squyres
Cisco Systems
