Re: [OMPI devel] LOCK_SHARED?

2009-01-05 Thread Rolf Vandevaart


Hi Jim:
Yes, we ran into this also and your diagnosis is correct. The details
are in this ticket:

https://svn.open-mpi.org/trac/ompi/ticket/1477

We fixed it in the trunk and in the 1.3 series, but we never backported
it to the 1.2 series, as 1.3 was going to be released "really soon".
Here is the ticket for moving the fix into the 1.3 series:
https://svn.open-mpi.org/trac/ompi/ticket/1494

Send me an email offline and we can figure out how to fix this for your
case.


Rolf


Jim Langston wrote:

Hi all,

Quick question: I'm compiling 1.2.9rc1 and get an error during compilation:


//


source='mpicxx.cc' object='mpicxx.lo' libtool=yes \
   DEPDIR=.deps depmode=none /bin/sh ../../../config/depcomp \
   /bin/sh ../../../libtool --tag=CXX   --mode=compile 
/export/home/langston/COMPILER/SUNWspro/bin/CC -DHAVE_CONFIG_H -I. 
-I../../../opal/include -I../../../orte/include 
-I../../../ompi/include  -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 
-DOMPI_SKIP_MPICXX=1 -I../../..-O -DNDEBUG  -mt -c -o mpicxx.lo 
mpicxx.cc
libtool: compile:  /export/home/langston/COMPILER/SUNWspro/bin/CC 
-DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include 
-I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 
-DOMPI_SKIP_MPICXX=1 -I../../.. -O -DNDEBUG -mt -c mpicxx.cc  -KPIC 
-DPIC -o .libs/mpicxx.o
"mpicxx.cc", line 293: Error: A declaration does not specify a tag or an identifier.
"mpicxx.cc", line 293: Error: Use ";" to terminate declarations.
"mpicxx.cc", line 293: Error: A declaration was expected instead of "0x01".

3 Error(s) detected.
gmake: *** [mpicxx.lo] Error 1



I'm working with OpenSolaris 2008.11 and have found the conflict to be
with /usr/include/sys/synch.h, which also defines LOCK_SHARED:


/* Keep the following values in sync with pthread.h */
#define LOCK_NORMAL         0x00    /* same as USYNC_THREAD */
#define LOCK_SHARED         0x01    /* same as USYNC_PROCESS */
#define LOCK_ERRORCHECK     0x02    /* error check lock */
#define LOCK_RECURSIVE      0x04    /* recursive lock */
#define LOCK_PRIO_INHERIT   0x10    /* priority inheritance lock */
#define LOCK_PRIO_PROTECT   0x20    /* priority ceiling lock */
#define LOCK_ROBUST         0x40    /* robust lock */

..

If I comment out the line in either the system include file or
mpicxx.cc, everything finishes compiling.

Has anyone else found this issue and/or a workaround?

Thanks,

Jim
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--

=
rolf.vandeva...@sun.com
781-442-3043
=



[OMPI devel] RFC: Component-izing MPI_Op

2009-01-05 Thread Jeff Squyres
WHAT: Converting the back-end of MPI_Ops to use components instead of
hard-coded C functions.

WHY: To support specialized hardware (such as GPUs).

WHERE: Changes most of the MPI_Op code, adds a new ompi/mca/op
framework.

WHEN: Work has started in an hg branch
(http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/cuda/).


TIMEOUT: Next Tuesday's teleconference, Jan 13 2009.

---

Note: I don't plan to finish the work by Jan 13; I just want to get a
yea/nay from the community on the concept. Final review of the code
before it comes into the trunk can happen later, when I have more work
to show / review.


Background: Today, the back-end MPI_Op functionality for (MPI_Op,
MPI_Datatype) tuples is implemented as function pointers to a series
of hard-coded C functions in the ompi/op/ directory.


  *** NOTE: Since we already implement MPI_Op functionality via
function pointers, this proposed extension is not expected to cause any
performance difference in terms of OMPI's infrastructure.


Proposal: Extend the current implementation by creating a new  
framework ("op") that allows components to provide back-end MPI_Op  
functions instead of/in addition to the hard-coded C functions (we've  
talked about this idea before, but never done it).


The "op" framework will be similar to the MPI coll framework in that  
individual function pointers from multiple different modules can be  
mixed-n-matched.  For example, if you want to write a new coll  
component that implements *only* a new MPI_BCAST algorithm, that coll  
component can be mixed-n-matched with other coll components at run  
time to get a full set of collective implementations on a  
communicator.  A similar concept will be applied to the "op"  
framework.  Case in point: some specialized hardware is only good at  
*some* operations on *some* datatypes; we'll need to fall back to the  
hard-coded C versions for all other tuples.


It is likely that the "op" framework base will hold all the hard-coded
C "basic" MPI_Op functions, which will always be available as a
fallback when no component provides a specialized implementation at
run time. Specifically: the intent is that components will be used only
for specialized implementations.
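The following is a minimal C sketch of the mix-and-match idea described above. It assumes a two-dimensional table of function pointers indexed by (op, datatype): the base fills every entry with a basic C function, and a component overwrites only the entries it accelerates. All names here are hypothetical illustrations, not the actual ompi/mca/op interface. That includes `op_table`, `op_table_overlay`, the `OP_*`/`TYPE_*` enums, and the simplified three-argument signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for a handful of (MPI_Op, MPI_Datatype) pairs. */
enum op_id   { OP_SUM, OP_MAX, OP_COUNT };
enum type_id { TYPE_INT, TYPE_DOUBLE, TYPE_COUNT };

/* Simplified back-end signature: combine `in` into `inout`, element-wise. */
typedef void (*op_fn_t)(const void *in, void *inout, int count);

/* "Basic" hard-coded C implementations, always available as a fallback. */
static void basic_sum_int(const void *in, void *inout, int count)
{
    const int *a = in;
    int *b = inout;
    for (int i = 0; i < count; ++i) { b[i] += a[i]; }
}

static void basic_sum_double(const void *in, void *inout, int count)
{
    const double *a = in;
    double *b = inout;
    for (int i = 0; i < count; ++i) { b[i] += a[i]; }
}

static void basic_max_int(const void *in, void *inout, int count)
{
    const int *a = in;
    int *b = inout;
    for (int i = 0; i < count; ++i) { if (a[i] > b[i]) { b[i] = a[i]; } }
}

static void basic_max_double(const void *in, void *inout, int count)
{
    const double *a = in;
    double *b = inout;
    for (int i = 0; i < count; ++i) { if (a[i] > b[i]) { b[i] = a[i]; } }
}

/* One function pointer per (op, datatype) tuple. */
static op_fn_t op_table[OP_COUNT][TYPE_COUNT];

/* Start with the basic functions for every tuple. */
static void op_table_init_basic(void)
{
    op_table[OP_SUM][TYPE_INT]    = basic_sum_int;
    op_table[OP_SUM][TYPE_DOUBLE] = basic_sum_double;
    op_table[OP_MAX][TYPE_INT]    = basic_max_int;
    op_table[OP_MAX][TYPE_DOUBLE] = basic_max_double;
}

/* Mix in a component: it may fill only some entries. NULL means
   "not provided", so the basic function for that tuple stays in place. */
static void op_table_overlay(op_fn_t component[OP_COUNT][TYPE_COUNT])
{
    for (int op = 0; op < OP_COUNT; ++op)
        for (int t = 0; t < TYPE_COUNT; ++t)
            if (component[op][t] != NULL)
                op_table[op][t] = component[op][t];
}

/* Dispatch through the table; the caller never knows which provider won. */
static void op_apply(enum op_id op, enum type_id t,
                     const void *in, void *inout, int count)
{
    assert((int) op < OP_COUNT && (int) t < TYPE_COUNT);
    assert(op_table[op][t] != NULL);
    op_table[op][t](in, inout, count);
}
```

Consider a component that is only good at, say, MPI_SUM on doubles. It would pass a table with just that one entry set, and every other tuple would keep dispatching to the basic C function.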


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RFC: Component-izing MPI_Op

2009-01-05 Thread Brian W. Barrett
I think this sounds reasonable, if (and only if) MPI_Accumulate is 
properly handled.  The interface for calling the op functions was broken 
in some fairly obvious way for accumulate when I was writing the one-sided 
code.  I think I had to call some supposedly internal bits of the 
interface to make accumulate work.  I can't remember what they are now, 
but I do remember it being a problem.


Of course, unless it makes mpi_allreduce on one double-sized floating 
point number using sum go faster, I'm not entirely sure a change is 
helpful ;).


Brian






Re: [OMPI devel] RFC: Component-izing MPI_Op

2009-01-05 Thread Jeff Squyres

On Jan 5, 2009, at 10:09 AM, Brian W. Barrett wrote:

I think this sounds reasonable, if (and only if) MPI_Accumulate is  
properly handled.  The interface for calling the op functions was  
broken in some fairly obvious way for accumulate when I was writing  
the one-sided code.  I think I had to call some supposedly internal  
bits of the interface to make accumulate work.  I can't remember  
what they are now, but I do remember it being a problem.


Coolio; I'll look into it.

Of course, unless it makes mpi_allreduce on one double-sized  
floating point number using sum go faster, I'm not entirely sure a  
change is helpful ;).


From my (admittedly limited) understanding, since there are memory
registration and/or copy in/out issues with GPUs, the operation has to
be "big enough" and/or already located in GPU memory for the GPU to
outperform the CPU. My assumption is that the component-ized
CUDA/OpenCL/whatever code will need to decide at run time whether it
should perform the operation itself or pass it back to a fallback
[probably CPU-based] implementation, analogous to how "tuned" picks
the right coll algorithm.
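Here is a sketch of that run-time decision, in the same spirit as "tuned" choosing among coll algorithms. The threshold `ACCEL_MIN_COUNT`, the `data_on_device` flag, and the accelerator stub are all hypothetical. A real component would launch a CUDA/OpenCL kernel and pick the cutoff by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff (in elements) below which copying to the accelerator
   is assumed to cost more than it saves. A real component would measure it. */
#define ACCEL_MIN_COUNT ((size_t) 1 << 16)

/* Counts how often the accelerator path is taken (for illustration only). */
static int accel_calls = 0;

/* Fallback: plain CPU loop, always available. */
static void cpu_sum_double(const double *in, double *inout, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        inout[i] += in[i];
}

/* Stand-in for a CUDA/OpenCL kernel launch. It only counts the call and
   reuses the CPU loop, so the sketch runs on any machine. */
static void accel_sum_double(const double *in, double *inout, size_t count)
{
    ++accel_calls;
    cpu_sum_double(in, inout, count);
}

/* Run-time decision: use the accelerator only when the data already lives
   in device memory or the operation is large enough; otherwise fall back. */
static void component_sum_double(const double *in, double *inout,
                                 size_t count, int data_on_device)
{
    assert(in != NULL && inout != NULL);
    if (data_on_device || count >= ACCEL_MIN_COUNT)
        accel_sum_double(in, inout, count);
    else
        cpu_sum_double(in, inout, count);
}
```

Under these assumptions, Brian's case (an allreduce of a single double with MPI_SUM) never leaves the CPU path. It would pay only for the extra branch on top of today's hard-coded function.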


I'm told that there's some researchy middleware working on exactly  
this kind of problem (determining if a given operation is suitable to  
run on the GPU or the main CPU).  So in a best-case scenario, OMPI can  
just link against and use that middleware rather than implementing all  
the logic in the component itself.  We'll see how it plays out.


My goal is to give these guys the infrastructure that they need in
OMPI to play with these kinds of concepts and see what they can
accomplish in terms of real performance. FWIW: a few SC08 attendees
thought that they could avoid writing much CUDA/CL/whatever code if
MPI_REDUCE did the work for them (particularly if paired with the
proposed MPI_REDUCE_LOCAL function,
https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/24). [shrug] We'll
see!


--
Jeff Squyres
Cisco Systems



[OMPI devel] problem compiling r20196

2009-01-05 Thread Thomas Ropars

Hi,

I can't get the code from svn r20196 to compile.

I get the following error:
pstat_linux_module.c:34:73: error: asm/page.h: No such file or directory
make[2]: *** [pstat_linux_module.lo] Error 1

It seems that it is because new Linux kernels no longer install 
asm/page.h (I use a 2.6.27 Linux kernel).


Regards,

Thomas.





Re: [OMPI devel] [OMPI svn] svn:open-mpi r20196

2009-01-05 Thread Aurélien Bouteiller

Tim,

To answer your question in ticket #869: the only known missing
feature in opal_stdint.h is that there is no portable way to printf a
size_t. Its underlying type varies so much depending on the platform
and compiler that it is impossible to be sure that PRI_size_t will not
produce a lot of warnings. Aside from that, it should be pretty solid.
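The warnings come from choosing a length modifier that matches size_t's width but not its exact type. For example, size_t is `unsigned int` on some 32-bit platforms and `unsigned long` on others. Below is a minimal sketch of two common workarounds, assuming a C99-capable compiler and library. Not every platform of the day could assume that, which is the gap described above. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Option 1 (C99): the "z" length modifier matches size_t exactly,
   so no guessing of the underlying type is needed. */
static int format_size_c99(char *buf, size_t len, size_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%zu", value);
}

/* Option 2: cast to a type whose conversion is known. The explicit cast
   makes the argument type match "%llu" exactly, so format checking stays
   quiet. This assumes unsigned long long can hold any size_t, which holds
   on common platforms but is worth a configure-time check. */
static int format_size_cast(char *buf, size_t len, size_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%llu", (unsigned long long) value);
}
```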


Aurelien



On Jan 4, 2009, at 00:09, timat...@osl.iu.edu wrote:


Author: timattox
Date: 2009-01-04 00:09:18 EST (Sun, 04 Jan 2009)
New Revision: 20196
URL: https://svn.open-mpi.org/trac/ompi/changeset/20196

Log:
Refs #868, #869

The fix for #868, r14358, introduced an (unneeded?) inconsistency...
For Mac OS X systems, inttypes.h will always be included with
opal_config.h, and NOT included for non-Mac OS X systems. For
developers using Mac OS X, this masks the need to include inttypes.h
or, more properly, opal_stdint.h.

This changeset corrects one of these oopses. However, the underlying
problem still exists. Moving the equivalent of r14358 into
opal_stdint.h from opal_config_bottom.h might be the "right" solution,
but AFAIK, we would then need to replace each direct inclusion of
inttypes.h with opal_stdint.h to properly address tickets #868 and #869.

Text files modified:
  trunk/opal/dss/dss_print.c | 1 +
  1 files changed, 1 insertions(+), 0 deletions(-)

Modified: trunk/opal/dss/dss_print.c
==============================================================================

--- trunk/opal/dss/dss_print.c	(original)
+++ trunk/opal/dss/dss_print.c	2009-01-04 00:09:18 EST (Sun, 04 Jan 2009)

@@ -18,6 +18,7 @@

#include "opal_config.h"

+#include "opal_stdint.h"
#include 

#include "opal/dss/dss_internal.h"
___
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn





Re: [OMPI devel] [OMPI svn] svn:open-mpi r20196

2009-01-05 Thread Aurélien Bouteiller
Addendum to my previous message in this discussion: I think we should
stick with including opal_stdint.h everywhere instead of inttypes.h
(which does not always exist with ANSI-pedantic compilers).
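The sketch below shows what "include opal_stdint.h everywhere" buys, using a hypothetical single-file wrapper rather than the real opal_stdint.h, whose contents are not shown in this thread. The HAVE_INTTYPES_H / HAVE_STDINT_H guards follow the usual autoconf AC_CHECK_HEADERS naming. The PRIu64 fallback assumes uint64_t is unsigned long long wherever inttypes.h is missing, and that assumption would need checking per platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper logic (in the spirit of opal_stdint.h, not its
   actual contents). A C99 compiler is assumed to provide <inttypes.h>. */
#if defined(HAVE_INTTYPES_H) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
#  include <inttypes.h>   /* also includes <stdint.h> and defines PRI* macros */
#elif defined(HAVE_STDINT_H)
#  include <stdint.h>
#endif

/* Fallback only when <inttypes.h> was not available; assumes uint64_t is
   unsigned long long on such platforms. */
#ifndef PRIu64
#  define PRIu64 "llu"
#endif

/* Code that uses the wrapper instead of <inttypes.h> directly sees the same
   declarations on every platform, so a missing include cannot stay hidden
   on one OS and then break the build on another. */
static int format_u64(char *buf, size_t len, uint64_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRIu64, value);
}
```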


Aurelien







Re: [OMPI devel] problem compiling r20196

2009-01-05 Thread Jeff Squyres

Is there some other file that should be included instead?





--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] problem compiling r20196

2009-01-05 Thread Ralph Castain
The file is present on the 2.6.19 distribution, which is the most  
current I have access to.


However, after looking at the code, I realized that we no longer need  
that include file anyway - so I have removed it. Hopefully, that  
should let you build.
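Jeff asked whether another header should replace asm/page.h. For user-space code that does need the page size, a common portable approach is POSIX `sysconf(_SC_PAGESIZE)`, which asks the C library at run time instead of relying on a kernel header. This is a minimal sketch. The `rss_pages_to_bytes` helper is hypothetical and is not taken from pstat_linux_module.c; it reflects the fact that /proc/<pid>/stat reports resident set size in pages.

```c
#define _POSIX_C_SOURCE 200112L   /* expose sysconf() and _SC_PAGESIZE */
#include <assert.h>
#include <unistd.h>

/* Page size in bytes, queried at run time instead of using the
   compile-time PAGE_SIZE constant from <asm/page.h>. */
static long page_size_bytes(void)
{
    long sz = sysconf(_SC_PAGESIZE);   /* returns -1 on error */
    assert(sz > 0);
    return sz;
}

/* Hypothetical helper: convert a resident-set size given in pages
   (as /proc/<pid>/stat reports it) into bytes. */
static long long rss_pages_to_bytes(long long pages)
{
    return pages * (long long) page_size_bytes();
}
```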


Ralph





Re: [OMPI devel] LOCK_SHARED?

2009-01-05 Thread Jim Langston

Hi Rolf,

Thanks for the pointers; they are very clear and concise. I followed the
general flow of what was done to fix the issue in 1.3 and did something
similar for 1.2.9.


In mpicxx.cc, I did this change:

#include 
#ifdef LOCK_SHARED
  static const int ompi_synch_lock_shared = LOCK_SHARED ;
#undef LOCK_SHARED
#endif
const int LOCK_SHARED = MPI_LOCK_SHARED;

Even though the variable being set is basically dead code and not
necessary, my goal is that anyone looking at the 1.3 notes will see
what I did. This makes Open MPI happy, and the compile chugs along.

If someone thinks I screwed up Open MPI, please let me know.
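Jim's change is C++ in mpicxx.cc, but the same preprocessor mechanics can be shown in plain C. In this sketch, `#define LOCK_SHARED 0x01` stands in for the OpenSolaris header. `MY_MPI_LOCK_SHARED` and `saved_synch_lock_shared` are hypothetical placeholders for the MPI constant and for Jim's `ompi_synch_lock_shared`.

```c
#include <assert.h>

/* Stand-in for what <sys/synch.h> does on OpenSolaris: a bare
   object-like macro named LOCK_SHARED. */
#define LOCK_SHARED 0x01

/* Hypothetical placeholder for MPI_LOCK_SHARED (the real value comes
   from mpi.h). */
#define MY_MPI_LOCK_SHARED 2

/* Without the block below, the declaration at the end would be
   preprocessed into "const int 0x01 = 2;", the same kind of error
   Sun Studio reported at line 293 of mpicxx.cc. */
#ifdef LOCK_SHARED
/* Keep the system value reachable under a non-conflicting name. */
static const int saved_synch_lock_shared = LOCK_SHARED;
#undef LOCK_SHARED
#endif

/* From here on, LOCK_SHARED is an ordinary constant object. */
const int LOCK_SHARED = MY_MPI_LOCK_SHARED;
```

On platforms whose headers do not define LOCK_SHARED, the `#ifdef` guard makes the save-and-undefine step a no-op.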

Thanks,

Jim

--
/

Jim Langston
Sun Microsystems, Inc.

(877) 854-5583 (AccessLine)
(513) 702-4741 (Cell)
AIM: jl9594
jim.langs...@sun.com



Re: [OMPI devel] LOCK_SHARED?

2009-01-05 Thread Terry Dontje

Jim Langston wrote:

Hi Rolf,

Thanks for the pointers, they are very clear and concise. I followed
the general flow of what was done to fix the issue in 1.3 and did
something similar for 1.2.9.


In mpicxx.cc, I did this change:

#include 
#ifdef LOCK_SHARED
  static const int ompi_synch_lock_shared = LOCK_SHARED ;
#undef LOCK_SHARED
#endif
const int LOCK_SHARED = MPI_LOCK_SHARED;

Even though the variable getting set is basically dead code and not
necessary, my goal is that if someone is looking at the 1.3 notes, they
will see what I did. This makes OpenMPI happy and the compile continues
and chugs along.

If someone thinks I screwed up OpenMPI, please let me know.

For a one-off change for use with Solaris and Sun Studio, I think the
above is fine. However, for a general fix that would not break builds
on other platforms, you'd really want to pull over the other handful of
lines. It probably wouldn't be that bad to just CMR the changes to the
1.2 branch. When the original changes went into the trunk and 1.3, I
really didn't think there were going to be any more changes to the 1.2
branch, which is why we opted not to CMR it at the time.


--td

