Re: [OMPI devel] LOCK_SHARED?
Hi Jim:

Yes, we ran into this also and your diagnosis is correct. The details are in this ticket:

https://svn.open-mpi.org/trac/ompi/ticket/1477

We fixed it in the trunk and in the 1.3 series but we never backported it to the 1.2 series as 1.3 was going to be released "really soon". Here is the ticket for moving the fix into the 1.3 series:

https://svn.open-mpi.org/trac/ompi/ticket/1494

Send me an email offline and we can figure out how to fix this for your case.

Rolf

Jim Langston wrote:

> Hi all,
>
> Quick question, I'm compiling 1.2.9rc1 and get an error during compilation:
>
>     source='mpicxx.cc' object='mpicxx.lo' libtool=yes \
>     DEPDIR=.deps depmode=none /bin/sh ../../../config/depcomp \
>     /bin/sh ../../../libtool --tag=CXX --mode=compile /export/home/langston/COMPILER/SUNWspro/bin/CC -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O -DNDEBUG -mt -c -o mpicxx.lo mpicxx.cc
>     libtool: compile: /export/home/langston/COMPILER/SUNWspro/bin/CC -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O -DNDEBUG -mt -c mpicxx.cc -KPIC -DPIC -o .libs/mpicxx.o
>     "mpicxx.cc", line 293: Error: A declaration does not specify a tag or an identifier.
>     "mpicxx.cc", line 293: Error: Use ";" to terminate declarations.
>     "mpicxx.cc", line 293: Error: A declaration was expected instead of "0x01".
>     3 Error(s) detected.
>     gmake: *** [mpicxx.lo] Error 1
>
> I'm working with OpenSolaris 2008.11 and have found the conflict to be with /usr/include/sys/synch.h, which also contains LOCK_SHARED:
>
>     /* Keep the following values in sync with pthread.h */
>     #define LOCK_NORMAL        0x00 /* same as USYNC_THREAD */
>     #define LOCK_SHARED        0x01 /* same as USYNC_PROCESS */
>     #define LOCK_ERRORCHECK    0x02 /* error check lock */
>     #define LOCK_RECURSIVE     0x04 /* recursive lock */
>     #define LOCK_PRIO_INHERIT  0x10 /* priority inheritance lock */
>     #define LOCK_PRIO_PROTECT  0x20 /* priority ceiling lock */
>     #define LOCK_ROBUST        0x40 /* robust lock */
>     ..
>
> If I comment out the line in the system include file, everything will finish compiling, or if I comment out the line in mpicxx.cc, everything will finish compiling.
>
> Has anyone else found this issue and/or a workaround?
>
> Thanks,
> Jim
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
rolf.vandeva...@sun.com
781-442-3043
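The three errors at line 293 are what one would expect when the preprocessor replaces a variable name before the compiler proper sees the declaration. The sketch below reproduces that substitution in isolation. The `#define` value comes from the sys/synch.h excerpt quoted above; `SKETCH_STR`, `SKETCH_EXPAND` and `declaration_after_preprocessing` are illustrative names, and the declaration text is an assumption about the general shape of the failing line, not a copy of mpicxx.cc.

```cpp
#include <cassert>
#include <string>

// Stand-in for the definition that /usr/include/sys/synch.h provides on
// OpenSolaris (value taken from the excerpt quoted in the thread).
#define LOCK_SHARED 0x01

// Illustrative helpers: the two-level form makes the preprocessor expand
// macros in the argument before turning it into a string literal.
#define SKETCH_STR(x) #x
#define SKETCH_EXPAND(x) SKETCH_STR(x)

// Returns the text the compiler proper would receive for a declaration
// that tries to use LOCK_SHARED as a variable name.
std::string declaration_after_preprocessing() {
    return SKETCH_EXPAND(const int LOCK_SHARED = MPI_LOCK_SHARED;);
}
```

In the returned text the variable name has been replaced by the literal 0x01, while MPI_LOCK_SHARED is untouched because this sketch never defines it as a macro. A numeric literal where an identifier is required is consistent with the "A declaration was expected instead of "0x01"" message.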
[OMPI devel] RFC: Component-izing MPI_Op
WHAT: Converting the back-end of MPI_Op's to use components instead of hard-coded C functions.

WHY: To support specialized hardware (such as GPUs).

WHERE: Changes most of the MPI_Op code, adds a new ompi/mca/op framework.

WHEN: Work has started in an hg branch (http://www.open-mpi.org/hg/hgwebdir.cgi/jsquyres/cuda/).

TIMEOUT: Next Tuesday's teleconference, Jan 13 2009.

---

Note: I don't plan to finish the work by Jan 13; I just want to get a yea/nay from the community on the concept. Final review of the code before coming into the trunk can come later when I have more work to show / review.

Background: Today, the back-end MPI_Op functionality of (MPI_Op, MPI_Datatype) tuples is implemented as function pointers to a series of hard-coded C functions in the ompi/op/ directory.

*** NOTE: Since we already implement MPI_Op functionality via function pointer, this proposed extension is not expected to cause any performance difference in terms of OMPI's infrastructure.

Proposal: Extend the current implementation by creating a new framework ("op") that allows components to provide back-end MPI_Op functions instead of/in addition to the hard-coded C functions (we've talked about this idea before, but never done it).

The "op" framework will be similar to the MPI coll framework in that individual function pointers from multiple different modules can be mixed-n-matched. For example, if you want to write a new coll component that implements *only* a new MPI_BCAST algorithm, that coll component can be mixed-n-matched with other coll components at run time to get a full set of collective implementations on a communicator. A similar concept will be applied to the "op" framework. Case in point: some specialized hardware is only good at *some* operations on *some* datatypes; we'll need to fall back to the hard-coded C versions for all other tuples.

It is likely that the "op" framework base will have all the hard-coded C "basic" MPI_Op functions that will always be available for fallback if a component is not used at run-time for a specialized implementation. Specifically: the intent is that components will be for specialized implementations.

--
Jeff Squyres
Cisco Systems
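The mix-n-match idea in the proposal can be pictured as a table of function pointers indexed by (operation, datatype), filled from the base functions and then overlaid with whatever a component chooses to provide. What follows is a minimal sketch under that reading, not the proposed framework's interface: every type and function name (`SketchOpTable`, `select_functions`, `special_sum_double`, and so on) is hypothetical, and the "specialized" function only counts calls and reuses the CPU loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, much-reduced stand-ins for MPI_Op and MPI_Datatype.
enum SketchOp { SK_SUM, SK_MAX, SK_OP_COUNT };
enum SketchType { SK_INT, SK_DOUBLE, SK_TYPE_COUNT };

// Back-end signature: combine `count` elements of `in` into `inout`.
typedef void (*sketch_op_fn)(const void *in, void *inout, std::size_t count);

// One function pointer per (operation, datatype) tuple.
struct SketchOpTable {
    sketch_op_fn fn[SK_OP_COUNT][SK_TYPE_COUNT];
};

// Base implementations, standing in for the hard-coded C functions.
template <typename T>
void base_sum(const void *in, void *inout, std::size_t count) {
    const T *a = static_cast<const T *>(in);
    T *b = static_cast<T *>(inout);
    for (std::size_t i = 0; i < count; ++i) b[i] += a[i];
}

template <typename T>
void base_max(const void *in, void *inout, std::size_t count) {
    const T *a = static_cast<const T *>(in);
    T *b = static_cast<T *>(inout);
    for (std::size_t i = 0; i < count; ++i) {
        if (a[i] > b[i]) b[i] = a[i];
    }
}

// Hypothetical component function; a real one would drive special hardware.
static int special_calls = 0;
void special_sum_double(const void *in, void *inout, std::size_t count) {
    ++special_calls;
    base_sum<double>(in, inout, count);  // placeholder for device code
}

// The base fills every tuple, so a fallback always exists.
SketchOpTable make_base_table() {
    SketchOpTable t;
    t.fn[SK_SUM][SK_INT] = &base_sum<int>;
    t.fn[SK_SUM][SK_DOUBLE] = &base_sum<double>;
    t.fn[SK_MAX][SK_INT] = &base_max<int>;
    t.fn[SK_MAX][SK_DOUBLE] = &base_max<double>;
    return t;
}

// The component fills only the tuples it is good at and leaves the rest null.
SketchOpTable make_component_table() {
    SketchOpTable t = {};
    t.fn[SK_SUM][SK_DOUBLE] = &special_sum_double;
    return t;
}

// Per-tuple selection: take the component's pointer where it offers one,
// otherwise keep the base function.
SketchOpTable select_functions(const SketchOpTable &base,
                               const SketchOpTable &component) {
    SketchOpTable out = base;
    for (int op = 0; op < SK_OP_COUNT; ++op) {
        for (int ty = 0; ty < SK_TYPE_COUNT; ++ty) {
            if (component.fn[op][ty] != nullptr) {
                out.fn[op][ty] = component.fn[op][ty];
            }
        }
    }
    return out;
}
```

Because callers already go through a function pointer, swapping which pointer sits in a slot adds no extra indirection on the call path, which matches the NOTE in the RFC about infrastructure cost.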
Re: [OMPI devel] RFC: Component-izing MPI_Op
I think this sounds reasonable, if (and only if) MPI_Accumulate is properly handled. The interface for calling the op functions was broken in some fairly obvious way for accumulate when I was writing the one-sided code. I think I had to call some supposedly internal bits of the interface to make accumulate work. I can't remember what they are now, but I do remember it being a problem.

Of course, unless it makes mpi_allreduce on one double-sized floating point number using sum go faster, I'm not entirely sure a change is helpful ;).

Brian

On Mon, 5 Jan 2009, Jeff Squyres wrote:

> WHAT: Converting the back-end of MPI_Op's to use components instead of hard-coded C functions.
> [...]
Re: [OMPI devel] RFC: Component-izing MPI_Op
On Jan 5, 2009, at 10:09 AM, Brian W. Barrett wrote:

> I think this sounds reasonable, if (and only if) MPI_Accumulate is properly handled. The interface for calling the op functions was broken in some fairly obvious way for accumulate when I was writing the one-sided code. I think I had to call some supposedly internal bits of the interface to make accumulate work. I can't remember what they are now, but I do remember it being a problem.

Coolio; I'll look into it.

> Of course, unless it makes mpi_allreduce on one double-sized floating point number using sum go faster, I'm not entirely sure a change is helpful ;).

From my (admittedly limited) understanding, since there are memory registration and/or copy in/out issues with GPUs, the operation has to be "big enough" and/or already located in GPU memory for the GPU to outperform the CPU. It is my assumption that the component-ized CUDA/OpenCL/whatever code will need to make a decision whether it should perform the operation at run-time or pass it back to a fallback [probably CPU-based] implementation, analogous to how "tuned" picks the right coll algorithm.

I'm told that there's some researchy middleware working on exactly this kind of problem (determining if a given operation is suitable to run on the GPU or the main CPU). So in a best-case scenario, OMPI can just link against and use that middleware rather than implementing all the logic in the component itself. We'll see how it plays out. My goal is to give these guys the infrastructure that they need in OMPI to play with these kinds of concepts and see what they can accomplish in terms of real performance.

FWIW: a few SC08 attendees thought that they could avoid writing much CUDA/CL/whatever code if MPI_REDUCE did the work for them (particularly if paired with the proposed MPI_REDUCE_LOCAL function, https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/24). [shrug] We'll see!

--
Jeff Squyres
Cisco Systems
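The run-time decision described in this reply (offload only when the operation is "big enough", otherwise hand it back to the CPU path) could look roughly like the sketch below. It is only an illustration: `component_sum`, `offload_sum`, `cpu_sum` and `kOffloadThreshold` are made-up names, the threshold value is arbitrary, and the "offload" path just counts calls and reuses the CPU loop.

```cpp
#include <cassert>
#include <cstddef>

// Fallback path, standing in for the hard-coded C implementation.
void cpu_sum(const double *in, double *inout, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) inout[i] += in[i];
}

// Hypothetical accelerator path. A real one would copy or register memory
// and launch device code; this placeholder only records that it was chosen.
static int offload_calls = 0;
void offload_sum(const double *in, double *inout, std::size_t count) {
    ++offload_calls;
    cpu_sum(in, inout, count);
}

// Assumed tuning knob: element count below which the transfer overhead is
// taken to outweigh any speed-up. The number is arbitrary.
const std::size_t kOffloadThreshold = 4096;

// What a component's entry point might do: decide per call, and pass small
// operations back to the fallback.
void component_sum(const double *in, double *inout, std::size_t count) {
    if (count >= kOffloadThreshold) {
        offload_sum(in, inout, count);
    } else {
        cpu_sum(in, inout, count);
    }
}
```

Under this scheme the single-double allreduce from Brian's remark never reaches the offload path, so it costs one extra comparison compared with calling the CPU function directly.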
[OMPI devel] problem compiling r20196
Hi,

I can't compile the code from svn r20196. I get the following error:

    pstat_linux_module.c:34:73: error: asm/page.h: No such file or directory
    make[2]: *** [pstat_linux_module.lo] Error 1

It seems that it is because new Linux kernels no longer install asm/page.h (I use a 2.6.27 Linux kernel).

Regards,
Thomas.
Re: [OMPI devel] [OMPI svn] svn:open-mpi r20196
Tim,

To answer your question in ticket #869: the only known missing feature in opal_stdint.h is that there is no portable way to printf a size_t. Its type is subject to so many changes depending on the platform and compiler that it is impossible to be sure that PRI_size_t is not gonna dump a lot of warnings. Aside from that, it should be pretty solid.

Aurelien

On 4 Jan 2009, at 00:09, timat...@osl.iu.edu wrote:

> Author: timattox
> Date: 2009-01-04 00:09:18 EST (Sun, 04 Jan 2009)
> New Revision: 20196
> URL: https://svn.open-mpi.org/trac/ompi/changeset/20196
>
> Log:
> Refs #868, #869
> The fix for #868, r14358, introduced an (unneeded?) inconsistency...
> For Mac OS X systems, inttypes.h will always be included with opal_config.h, and NOT included for non-Mac OS X systems. For developers using Mac OS X, this masks the need to include inttypes.h or more properly opal_stdint.h.
>
> This changeset corrects one of these oopses. However, the underlying problem still exists. Moving the equivalent of r14358 into opal_stdint.h from opal_config_bottom.h might be the "right" solution, but AFAIK, we would then need to replace each direct inclusion of inttypes.h with opal_stdint.h to properly address tickets #868 and #869.
>
> Text files modified:
>    trunk/opal/dss/dss_print.c | 1 +
>    1 files changed, 1 insertions(+), 0 deletions(-)
>
> Modified: trunk/opal/dss/dss_print.c
> ==============================================================================
> --- trunk/opal/dss/dss_print.c (original)
> +++ trunk/opal/dss/dss_print.c 2009-01-04 00:09:18 EST (Sun, 04 Jan 2009)
> @@ -18,6 +18,7 @@
>  #include "opal_config.h"
> +#include "opal_stdint.h"
>  #include
>  #include "opal/dss/dss_internal.h"
>
> ___
> svn mailing list
> s...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn
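To make the size_t printing problem concrete, here are two common ways of formatting one, sketched side by side. Neither is taken from opal_stdint.h: `SKETCH_PRI_SIZE_T`, `format_size` and `format_size_c89` are hypothetical names. The first relies on the `z` length modifier from C99 (also available in C++11), which older or strictly pedantic compilers may reject; the second casts to `unsigned long`, which works with pre-C99 printf but can truncate on platforms where size_t is wider than `unsigned long`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical format macro in the style of the <inttypes.h> PRI* macros.
// A configure-time test could set it to something else where "zu" is
// not understood.
#define SKETCH_PRI_SIZE_T "zu"

// Formats a size_t through the project-level format macro.
std::string format_size(std::size_t value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%" SKETCH_PRI_SIZE_T, value);
    return std::string(buf);
}

// Pre-C99 style: cast to unsigned long and print with %lu. Values that do
// not fit in unsigned long would be truncated by the cast.
std::string format_size_c89(std::size_t value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%lu", static_cast<unsigned long>(value));
    return std::string(buf);
}
```

The trade-off mirrors the concern in the message: a macro keeps the exact type but depends on every compiler agreeing on the format, while the cast is quiet everywhere at the cost of a possible narrowing.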
Re: [OMPI devel] [OMPI svn] svn:open-mpi r20196
Addendum to the previous message concerning this discussion: I think we should stick with including opal_stdint.h everywhere instead of inttypes.h (inttypes.h does not always exist with ANSI-pedantic compilers).

Aurelien

On 4 Jan 2009, at 00:09, timat...@osl.iu.edu wrote:

> Author: timattox
> Date: 2009-01-04 00:09:18 EST (Sun, 04 Jan 2009)
> New Revision: 20196
> URL: https://svn.open-mpi.org/trac/ompi/changeset/20196
> [...]
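The suggestion to include one project header everywhere, rather than inttypes.h directly, amounts to a wrapper that hides whether the system header exists. Below is a rough sketch of that pattern, inlined into a single file so it can be compiled as is. It does not show the contents of opal_stdint.h: `SKETCH_HAVE_INTTYPES_H`, `sketch_int64`, `SKETCH_PRId64` and `format_i64` are invented for illustration, and the fallback branch assumes `long long` is 64 bits wide.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// ---- what a hypothetical project-wide wrapper header might contain ----

// Stands in for a configure-time result; defaults to "header is present".
#ifndef SKETCH_HAVE_INTTYPES_H
#define SKETCH_HAVE_INTTYPES_H 1
#endif

#if SKETCH_HAVE_INTTYPES_H
#include <inttypes.h>
typedef int64_t sketch_int64;
#define SKETCH_PRId64 PRId64
#else
// Placeholder fallback for toolchains without <inttypes.h>.
typedef long long sketch_int64;
#define SKETCH_PRId64 "lld"
#endif

// ---- end of wrapper; code below uses only the wrapper's names ----

// Formats a 64-bit integer without mentioning <inttypes.h> at the call site.
std::string format_i64(sketch_int64 value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%" SKETCH_PRId64, value);
    return std::string(buf);
}
```

With this arrangement only the wrapper needs to change when a platform lacks the system header, which is the practical benefit of not including inttypes.h from individual source files.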
Re: [OMPI devel] problem compiling r20196
Is there some other file that should be included instead?

On Jan 5, 2009, at 1:16 PM, Thomas Ropars wrote:

> I get the following error:
>
>     pstat_linux_module.c:34:73: error: asm/page.h: No such file or directory
>     make[2]: *** [pstat_linux_module.lo] Error 1
>
> It seems that it is because new Linux kernels no longer install asm/page.h (I use a 2.6.27 Linux kernel).

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] problem compiling r20196
The file is present on the 2.6.19 distribution, which is the most current I have access to. However, after looking at the code, I realized that we no longer need that include file anyway - so I have removed it. Hopefully, that should let you build.

Ralph

On Jan 5, 2009, at 12:08 PM, Jeff Squyres wrote:

> Is there some other file that should be included instead?
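The thread does not say what pstat_linux_module.c originally wanted from asm/page.h, and the fix here was simply to drop the include. If code in a similar position needs the page size (an assumption, since that is a typical reason for pulling in that header), user-space programs on POSIX systems can ask for it at run time instead of depending on a kernel header. `runtime_page_size` is an illustrative name.

```cpp
#include <cassert>
#include <unistd.h>

// Returns the system page size in bytes as reported at run time,
// or -1 if sysconf cannot provide it.
long runtime_page_size() {
    return sysconf(_SC_PAGESIZE);
}
```

Querying at run time also avoids baking in a value from the build machine's headers, which may differ from the machine the binary eventually runs on.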
Re: [OMPI devel] LOCK_SHARED?
Hi Rolf,

Thanks for the pointers, they are very clear and concise. I followed the general flow of what was done to fix the issue in 1.3 and did something similar for 1.2.9. In mpicxx.cc, I made this change:

    #include
    #ifdef LOCK_SHARED
    static const int ompi_synch_lock_shared = LOCK_SHARED;
    #undef LOCK_SHARED
    #endif

    const int LOCK_SHARED = MPI_LOCK_SHARED;

Even though the variable getting set is basically dead code and not necessary, my goal is that if someone is looking at the 1.3 notes, they will see what I did.

This makes Open MPI happy and the compile continues and chugs along. If someone thinks I screwed up Open MPI, please let me know.

Thanks,
Jim

Rolf Vandevaart wrote:

> Hi Jim: Yes, we ran into this also and your diagnosis is correct. The details are in this ticket.
> https://svn.open-mpi.org/trac/ompi/ticket/1477
> [...]

--
Jim Langston
Sun Microsystems, Inc.
(877) 854-5583 (AccessLine)
(513) 702-4741 (Cell)
AIM: jl9594
jim.langs...@sun.com
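For readers who want to try the save-then-undef pattern from this message outside the Open MPI tree, here is a self-contained sketch. The two `#define` lines at the top are stand-ins so that it builds on any platform: the first imitates sys/synch.h, and the value given to `MPI_LOCK_SHARED` is a placeholder rather than something to rely on. The namespace `mpi_sketch` and the two accessor functions are hypothetical.

```cpp
#include <cassert>

// Stand-in for <sys/synch.h> on OpenSolaris, so the sketch builds anywhere.
#define LOCK_SHARED 0x01

// Placeholder for the constant that mpi.h supplies; the value is assumed.
#define MPI_LOCK_SHARED 2

// Save the system value under another name, then remove the macro so the
// identifier is free again for the rest of this translation unit.
#ifdef LOCK_SHARED
static const int ompi_synch_lock_shared = LOCK_SHARED;
#undef LOCK_SHARED
#endif

// Hypothetical namespace standing in for the C++ bindings.
namespace mpi_sketch {
const int LOCK_SHARED = MPI_LOCK_SHARED;
}

// Both values stay reachable, each under its own name.
int saved_system_value() { return ompi_synch_lock_shared; }
int binding_value() { return mpi_sketch::LOCK_SHARED; }
```

One consequence worth keeping in mind: after the `#undef`, any later code in the same file that expected the system macro sees only the C++ constant or the saved copy, so the pattern is best confined to the file that declares the binding.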
Re: [OMPI devel] LOCK_SHARED?
Jim Langston wrote:

> I followed the general flow of what was done to fix the issue in 1.3 and did something similar for 1.2.9. In mpicxx.cc, I did this change:
>
>     #include
>     #ifdef LOCK_SHARED
>     static const int ompi_synch_lock_shared = LOCK_SHARED;
>     #undef LOCK_SHARED
>     #endif
>
>     const int LOCK_SHARED = MPI_LOCK_SHARED;
>
> [...]
>
> This makes OpenMPI happy and the compile continues and chugs along. If someone thinks I screwed up OpenMPI, please let me know.

For a one-off change for use with Solaris and Sun Studio, I think the above is fine. However, for a general fix that would not break builds for other platforms, you'd really want to pull over the other handful of lines. It probably wouldn't be that bad to just CMR the changes to the 1.2 branch. When the original changes to the trunk and 1.3 happened, I really didn't think there were going to be more changes to the 1.2 branch, which is why we opted not to CMR it at the time.

--td