[OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
Hi folks,

The 'new' Open MPI packages for Debian, which are maintained by a few of us via a group on alioth.debian.org, have now [1] reached the main distribution. This means they are being built on all supported architectures, and the logs accumulate at

  http://buildd.debian.org/build.php?pkg=openmpi

This shows success on 'alpha', 'amd64', 'ia64', 'powerpc' and 'x86' (implied, my build architecture). However, we have failures on 'hppa', 'mips' and 'mipsel'. The remaining ones ('arm', 'm68k', 's390' and 'sparc') are still outstanding.

Looking at the most recent error logs on the failing architectures, we see

i) that hppa (in the way configure sees it) is not supported:

  checking if .size is needed... yes
  checking if .align directive takes logarithmic value... no
  configure: error: No atomic primitives available for hppa-unknown-linux-gnu
  make: *** [config.status] Error 1

Now, configure and aclocal have lots of 'hppa*64' statements. Is it enough to turn these into 'hppa*64*|hppa*linux*' or something similar?

This issue has previously been logged in the Debian Bug Tracking System, see http://bugs.debian.org/431631

ii) that mips croaks at the assembler level:

  ln -s "../../opal/asm/generated/atomic-local.s" atomic-asm.s
  /bin/sh ../../libtool --mode=compile gcc -DNDEBUG -Wall -g -O2 -finline-functions -fno-strict-aliasing -c -o atomic-asm.lo atomic-asm.s
  libtool: compile: gcc -DNDEBUG -Wall -g -O2 -finline-functions -fno-strict-aliasing -c atomic-asm.s -fPIC -DPIC -o .libs/atomic-asm.o
  atomic-asm.s: Assembler messages:

I haven't looked in detail, but is there a non-assembler code branch we could invoke?

iii) that mipsel is also not supported:

  checking if .size is needed... yes
  checking if .align directive takes logarithmic value... yes
  configure: error: No atomic primitives available for mipsel-unknown-linux-gnu
  make: *** [config.status] Error 1

What can we do here? Can the Debian porters help with tests to devise a mips/mipsel configuration?

Looking at the bug archive for 'openmpi' we see more failures:

iv) s390 has an open bug about the same 'atomic primitives' issue, see http://bugs.debian.org/376833

v) m68k has an open bug about the same 'atomic primitives' issue, see http://bugs.debian.org/405929

It is possible to just declare a list of architectures on which to build, but this is rather strongly discouraged.

Please let us (i.e. Debian's openmpi maintainers) know how else we can help. I am ccing the porters lists (for hppa, m68k, mips) too to invite them to help. I hope that doesn't get the spam filters going... I may contact the 'arm' porters once we have a failure; s390 and sparc activity are not as big these days.

Regards, Dirk

[1] New packages go into the NEW queue so that the ftpmaster can inspect the packaging, licenses, ... and reorganised source packages with new or renamed binary packages get the same treatment.

--
Hell, there are no rules here - we're trying to accomplish something.
  -- Thomas A. Edison
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
On Jul 14, 2007, at 8:26 AM, Dirk Eddelbuettel wrote:

> Please let us (ie Debian's openmpi maintainers) know how else we can help. I am ccing the porters lists (for hppa, m68k, mips) too to invite them to help. I hope that doesn't get the spam filters going... I may contact the 'arm' porters once we have a failure; s390 and sparc activity are not as big these days.

Open MPI uses some assembly for things like atomic locks, atomic compare and swap, memory barriers, and the like. We currently have support for:

  * x86 (32 bit)
  * x86_64 / amd64 (32 or 64 bit)
  * UltraSparc (v8plus and v9* targets)
  * IA64
  * PowerPC (32 or 64 bit)

We also have code for:

  * Alpha
  * MIPS (32 bit NEW ABI & 64 bit)

This support hasn't been well tested in a while and it sounds like it doesn't work for MIPS. At one time, we supported the sparc v8 target, but that The other platforms (hppa, mipsel (how is this different than MIPS?), s390, m68k) aren't supported at all by Open MPI. If you can get the real error messages, I can help on the MIPS issue, although it'll have to be a low priority.

We don't currently have support for a non-assembly code path. We originally planned on having one, but the team moved away from that route over time and there's no way to build Open MPI without assembly support right now.

Brian

--
Brian W. Barrett
Networking Team, CCS-1
Los Alamos National Laboratory
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
Hi Carlos,

Thanks for the quick reply.

On 14 July 2007 at 11:03, Carlos O'Donell wrote:
| On 7/14/07, Dirk Eddelbuettel wrote:
| > i) that hppa (in the way configure sees it) is not supported:
| >
| > checking if .size is needed... yes
| > checking if .align directive takes logarithmic value... no
| > configure: error: No atomic primitives available for hppa-unknown-linux-gnu
| > make: *** [config.status] Error 1
| >
| > Now, configure and aclocal have lots of 'hppa*64' statements. Is it
| > enough to turn these into 'hppa*64*|hppa*linux*' or something similar ?
| >
| > This issue has previously been logged in the Debian Bug Tracking System,
| > see http://bugs.debian.org/431631
|
| That bug does not appear to have any relevance to the failed configure check.
| What atomic primitives does Open MPI need? I am confused.

I am not sure I understand your question. Are you aware that configure checks for this? E.g. from my x86 build logs:

  checking for pre-built assembly file... yes (atomic-ia32-linux.s)
  checking for atomic assembly filename... atomic-ia32-linux.s

So atomic-$foo had better be there for a given architecture foo, as there are sources for only some platforms:

  edd@basebud:~/src/debian/SVN/tarballs/openmpi-1.2.3/opal> ls -1 asm/generated/
  atomic-alpha-linux.s
  atomic-amd64-linux-nongas.s
  atomic-amd64-linux.s
  atomic-ia32-cygwin-nongas.s
  atomic-ia32-cygwin.s
  atomic-ia32-linux-nongas.s
  atomic-ia32-linux.s
  atomic-ia32-osx.s
  atomic-ia64-linux-nongas.s
  atomic-ia64-linux.s
  atomic-mips-irix.s
  atomic-powerpc32-64-osx.s
  atomic-powerpc32-aix.s
  atomic-powerpc32-linux-nongas.s
  atomic-powerpc32-linux.s
  atomic-powerpc32-osx.s
  atomic-powerpc64-aix.s
  atomic-powerpc64-linux-nongas.s
  atomic-powerpc64-linux.s
  atomic-powerpc64-osx.s
  atomic-sparc-solaris.s
  atomic-sparcv9-32-solaris.s
  atomic-sparcv9-64-solaris.s

Methinks we need to fill in a few blanks here, or make do with non-asm solutions. I don't know the problem space that well (being a maintainer rather than upstream developer) and am looking for guidance.

For what it's worth, lam (7.1.2, currently) is available on all build architectures for Debian, but it may not push the (hardware) envelope as hard.

Hope this helps, Dirk
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
On Jul 14, 2007, at 10:53 AM, Dirk Eddelbuettel wrote:

> Methinks we need to fill in a few blanks here, or make do with non-asm solutions. I don't know the problem space that well (being a maintainer rather than upstream developer) and am looking for guidance.

Either way is an option. There are really only a couple of functions that have to be implemented:

  * atomic word-size compare and swap
  * memory barrier

We'll emulate atomic adds and spin-locks with compare and swap if they are not directly implemented. The memory barrier functions have to exist, even if they don't do anything. We require compare-and-swap for a couple of pieces of code, which is why we lost our Sparc v8 support a couple of releases ago.

> For what it's worth, lam (7.1.2, currently) is available on all build architectures for Debian, but it may not push the (hardware) envelope as hard.

Correct, LAM only had very limited ASM requirements (basically, memory barrier on platforms that required it -- like PowerPC).

Brian
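To make the two required primitives concrete, here is a minimal sketch of how an atomic add and a spin-lock can be layered on top of a word-size compare-and-swap plus a memory barrier. It assumes a C11 compiler and uses <stdatomic.h> in place of Open MPI's hand-written assembly; every sketch_* name is hypothetical and is not Open MPI's actual API.

```c
#include <stdatomic.h>
#include <stdint.h>

/* All names below are hypothetical illustrations, not Open MPI symbols. */

/* Word-size compare-and-swap: returns 1 if *addr held `expected` and was
 * replaced by `desired`, 0 if *addr held something else (no change made). */
static inline int sketch_cas(_Atomic int32_t *addr, int32_t expected,
                             int32_t desired)
{
    return atomic_compare_exchange_strong(addr, &expected, desired);
}

/* Full memory barrier. On a platform that needs none, this function
 * would still exist but could be empty. */
static inline void sketch_mb(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Atomic add emulated with a CAS retry loop; returns the new value. */
static inline int32_t sketch_add(_Atomic int32_t *addr, int32_t delta)
{
    int32_t old;
    do {
        old = atomic_load(addr);
    } while (!sketch_cas(addr, old, old + delta));
    return old + delta;
}

enum { SKETCH_UNLOCKED = 0, SKETCH_LOCKED = 1 };

/* Spin-lock emulated with CAS: spin until we flip UNLOCKED to LOCKED. */
static inline void sketch_lock(_Atomic int32_t *lock)
{
    while (!sketch_cas(lock, SKETCH_UNLOCKED, SKETCH_LOCKED)) {
        /* busy-wait */
    }
    sketch_mb();
}

/* Release: make prior writes visible, then mark the lock free. */
static inline void sketch_unlock(_Atomic int32_t *lock)
{
    sketch_mb();
    atomic_store(lock, SKETCH_UNLOCKED);
}
```

The point of the sketch is only that a port needs to supply two building blocks; everything else can be derived from them, at some cost in speed compared with a native atomic add.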
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
Hi Brian,

On 14 July 2007 at 10:47, Brian Barrett wrote:
| On Jul 14, 2007, at 8:26 AM, Dirk Eddelbuettel wrote:
|
| > Please let us (ie Debian's openmpi maintainers) know how else we can
| > help. I am ccing the porters lists (for hppa, m68k, mips) too to invite
| > them to help. I hope that doesn't get the spam filters going... I may
| > contact the 'arm' porters once we have a failure; s390 and sparc
| > activity are not as big these days.
|
| Open MPI uses some assembly for things like atomic locks, atomic
| compare and swap, memory barriers, and the like. We currently have
| support for:
|
|   * x86 (32 bit)
|   * x86_64 / amd64 (32 or 64 bit)
|   * UltraSparc (v8plus and v9* targets)
|   * IA64
|   * PowerPC (32 or 64 bit)
|
| We also have code for:
|
|   * Alpha
|   * MIPS (32 bit NEW ABI & 64 bit)
|
| This support isn't well tested in a while and it sounds like it

We'd be glad to help. This has worked well for other projects. I think that Debian is the quasi-official testbed for xfree.org given all our platforms. So we can definitely try to get Alpha, Mips, ... up to speed with suitable regression tests.

| doesn't work for MIPS. At one time, we supported the sparc v8
| target, but that The other platforms (hppa, mipsel (how is this
| different than MIPS?), s390, m68k) aren't at all supported by Open
| MPI. If you can get the real error messages, I can help on the MIPS
| issue, although it'll have to be a low priority.

I think mipsel is the little-endian variant. Something similar now exists for arm, where there's also armel.

Mips support would be nice as there are some HPC platforms based on these chips. Maybe someone from the debian-mips team can speak up and take the lead here to work with you.

| We don't currently have support for a non-assembly code path. We
| originally planned on having one, but the team went away from that
| route over time and there's no way to build Open MPI without assembly
| support right now.

Personally, I think that's a fair call given what Open MPI sets out to do. Debian 'at large' aims for the 'everything ought to build everywhere' model (which has its merits too), so I'll have to see if we get pushback if we restrict the platforms.

So given the list of current failures, hppa and mips/mipsel are the most likely candidates for improvement. Sparc and s390 are fairly dead at Debian, so I'm not sure if anything will change there. m68k is close to officially dead, but a few vocal enthusiasts try to keep it on life-support.

Cheers, Dirk
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
Instead of failing at configure time, we might want to disable the threading features and the shared memory device if we detect that we don't have support for atomics on a specified platform. In a non-threaded build, the shared memory device is the only place where we need support for memory barriers. I'll look in the code to see why we need support for compare-and-swap on a non-threaded build.

Thanks,
george.
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
On Jul 14, 2007, at 11:16 AM, George Bosilca wrote:

> Instead of failing at configure time, we might want to disable the threading features and the shared memory device if we detect that we don't have support for atomics on a specified platform. In a non threaded build, the shared memory device is the only place where we need support for memory barrier. I'll look in the code to see why we need support for compare-and-swap on a non threaded build.

George -

Disabling SM and threads if there's no atomic support would definitely be one option. The compare-and-swap is used by the LIFO used for ompi free lists.

Brian
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
On Sat, Jul 14, 2007 at 01:16:42PM -0400, George Bosilca wrote:
> Instead of failing at configure time, we might want to disable the
> threading features and the shared memory device if we detect that we
> don't have support for atomics on a specified platform. In a non
> threaded build, the shared memory device is the only place where we
> need support for memory barrier. I'll look in the code to see why we
> need support for compare-and-swap on a non threaded build.

Proper memory barrier is also needed for openib BTL eager RDMA support.

--
Gleb.
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
On Jul 14, 2007, at 11:51 AM, Gleb Natapov wrote:

> On Sat, Jul 14, 2007 at 01:16:42PM -0400, George Bosilca wrote:
>> Instead of failing at configure time, we might want to disable the threading features and the shared memory device if we detect that we don't have support for atomics on a specified platform. In a non threaded build, the shared memory device is the only place where we need support for memory barrier. I'll look in the code to see why we need support for compare-and-swap on a non threaded build.
>
> Proper memory barrier is also needed for openib BTL eager RDMA support.

Removed all the platform lists, since they won't care about this part :).

Ah, true. The eager RDMA code should check that the preprocessor symbol OPAL_HAVE_ATOMIC_MEM_BARRIER is 1 and disable itself if that isn't the case. All the "sections" of ASM support (memory barriers, locks, compare-and-swap, and atomic math) have preprocessor symbols indicating whether support exists or not in the current build. These should really be used :).

Brian
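The kind of compile-time guard described here could look roughly like the sketch below. Only the symbol OPAL_HAVE_ATOMIC_MEM_BARRIER comes from the thread; the helper function, its name, and the fallback default of 0 are assumptions made so the fragment compiles on its own, and it is not the openib BTL's real code.

```c
#include <stdbool.h>

/* In a real build this symbol would be provided by the platform's atomic
 * header. The default of 0 is an assumption so the sketch is standalone. */
#ifndef OPAL_HAVE_ATOMIC_MEM_BARRIER
#define OPAL_HAVE_ATOMIC_MEM_BARRIER 0
#endif

/* Hypothetical helper: decide whether a feature that depends on memory
 * barriers (such as eager RDMA) may be enabled in this build. */
static bool sketch_eager_rdma_usable(bool requested_by_user)
{
#if OPAL_HAVE_ATOMIC_MEM_BARRIER
    /* Barriers exist, so honor whatever the user asked for. */
    return requested_by_user;
#else
    /* No barrier support in this build: always disable the feature. */
    (void)requested_by_user;
    return false;
#endif
}
```

With a guard like this the feature degrades to "off" on an unsupported platform instead of producing a build that is silently unsafe.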
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
Brian,

We should be able to use these defines in the configure.m4 files for each component, right? I think the asm section is detected before we go into the component configuration.

So far we know about the following components that have to disable themselves if no atomic or memory barrier is detected:

  - MPOOL: sm
  - BTL: sm, openib (completely or partially?)

Does anybody know about any other components with atomic requirements?

george.
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
If OMPI_HAVE_THREAD_SUPPORT is not set, the LIFO falls back to a default version where atomic operations are not required. We can even remove the dependency on the atomic.h header if thread support is not enabled. Unfortunately, our shared memory device requires the atomic operations plus the memory barriers. Therefore, we cannot do anything more fine-grained (such as having a missing atomic compare-and-swap disable only the threading support, and a missing memory barrier disable only the shared memory support).

george.
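A rough sketch of that fallback pattern follows, assuming C11 atomics for the threaded path. The macro name OMPI_HAVE_THREAD_SUPPORT is taken from the message above; the sketch_* types and functions are hypothetical and much simpler than Open MPI's real free-list LIFO. The threaded pop shown here does not guard against the ABA problem, which a production lock-free stack would have to handle.

```c
#include <stddef.h>

/* Default chosen only so the sketch compiles standalone (an assumption). */
#ifndef OMPI_HAVE_THREAD_SUPPORT
#define OMPI_HAVE_THREAD_SUPPORT 0
#endif

typedef struct sketch_item {
    struct sketch_item *next;
    int value;
} sketch_item_t;

#if OMPI_HAVE_THREAD_SUPPORT
#include <stdatomic.h>

/* Threaded build: the head pointer is updated with compare-and-swap. */
typedef struct { _Atomic(sketch_item_t *) head; } sketch_lifo_t;

static void sketch_lifo_push(sketch_lifo_t *lifo, sketch_item_t *item)
{
    sketch_item_t *old = atomic_load(&lifo->head);
    do {
        item->next = old;   /* `old` is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak(&lifo->head, &old, item));
}

static sketch_item_t *sketch_lifo_pop(sketch_lifo_t *lifo)
{
    sketch_item_t *old = atomic_load(&lifo->head);
    /* NOTE: no ABA protection; illustration only. */
    while (old != NULL &&
           !atomic_compare_exchange_weak(&lifo->head, &old, old->next)) {
        /* retry with the refreshed `old` */
    }
    return old;
}
#else
/* Non-threaded build: plain loads and stores, no atomics required. */
typedef struct { sketch_item_t *head; } sketch_lifo_t;

static void sketch_lifo_push(sketch_lifo_t *lifo, sketch_item_t *item)
{
    item->next = lifo->head;
    lifo->head = item;
}

static sketch_item_t *sketch_lifo_pop(sketch_lifo_t *lifo)
{
    sketch_item_t *item = lifo->head;
    if (item != NULL) {
        lifo->head = item->next;
    }
    return item;
}
#endif
```

Both variants expose the same push/pop interface, so callers do not need to know which one was compiled in; only the shared memory path still needs real atomics and barriers.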
[OMPI devel] lsf support / farm use models
hi everyone,

firstly, i'm new around here, and somewhat clueless when it comes to the details of working with a big autoconfiscated project like open-rte/open-mpi at the svn checkout level ...

i've read some of the archives that turned up in searches for terms like 'LSF', and it would seem there was some discussion about adding some form of LSF support to open-rte, but that the discussion ended a while back. so, after playing around with the 1.2.3 release tarball for a while, and reading various pieces of the code until i had a (vague) idea of the top-level control flow and such, i decided i was ready to try to add ras and pls components to support LSF. once i had the build system up, i tried to create an ras/lsf directory, and slightly to my surprise, it already existed. i was kinda hoping for that, but it appears to be *very* fresh code at the moment. nonetheless, i played around a bit more, and ran into two issues:

1) it appears that you (jeff, i guess ;) are using new LSF 7.0 API features. i'm working to support customers in the EDA space, and it's not clear if/when they will migrate to 7.0 -- not to mention that our company (cadence) doesn't appear to have LSF 7.0 yet. i'm still looking into the details, but it appears that (from the Platform docs) lsb_getalloc is probably just a thin wrapper around the LSB_MCPU_HOSTS (spelling?) environment variable. so that could be worked around fairly easily. i dunno about lsb_launch -- it seems equivalent to a set of ls_rtask() calls (one per process). however, i have heard that there can be significant subtleties with the semantics of these functions, in terms of compatibility across differently configured LSF-controlled farms, specifically with regards to administrators' ability to track and control job execution. personally, i don't see how it's really possible for LSF to prevent 'bad' users from spamming out jobs or short-cutting queues, but perhaps some of the methods they attempt to use can complicate things for a library like open-rte.

2) this brings us to point 2 -- upon talking to the author(s) of cadence's internal open-rte-like library, several key issues were raised. mainly, customers want their applications to be 'farm-friendly' in several key ways. firstly, they do not want any persistent daemons running outside of a given job -- this requirement seems met by the current open-mpi default behavior, at least as far as i can tell. secondly, they prefer (strongly) that applications acquire resources incrementally, and perform work with whatever nodes are currently available, rather than forcing a large up-front node allocation. fault tolerance is nice too, although it's unclear to me if it's really practically needed. in any case, many of our applications can structure their computation to use resources in just such a way, generally by dividing the work into independent, restartable pieces (i.e. they are embarrassingly ||). also, MPI communication + MPI-2 process creation seems to be a reasonable interface for handling communication and dynamic process creation on the application side. however, it's not clear that open-rte supports the needed dynamic resource acquisition model in any of the ras/pls components i looked at. in fact, other than just folding everything into the pls component, it's not clear that the entire flow via the rmgr really supports it very well.

specifically for LSF, the use model is that the initial job either is created with bsub/lsb_submit(), (or automatically submits itself as step zero perhaps) to run initially on N machines. N should be 'small' (1-16) -- perhaps only 1 for simplicity. then, as the application runs, it will continue to consume more resources as limited by the farm status, the user selection, and the max # of processes that the job can usefully support (generally 'large' -- 100-1000 cpus).

so, i figure it's up to me to implement this stuff ;) ... clearly, i want to keep the 'normal' style ras/pls for LSF working, but somehow add the dynamic behavior as an option. my initial thought was to (in the dynamic case) basically ignore/fudge the ras/rmaps(/pls?) stages and simply use bsub/lsb_submit() in pls to launch new daemons as needed/requested. again, though, it's not clear that the current control flow supports this well. given that there may be a large (10sec - 15min) delay between lsb_submit() and job launch, it may be necessary to both acquire minimum size blocks of new daemons at a time, and to have some non-blocking way to perform spawning. for example, in the current code, the MPI-2 spawn is blocking because it needs to return a communicator to the spawned process. however, this is not really necessary for the application to continue -- it can continue with other work until the new worker is up and running. perhaps some form of multi-threading could help with this, but it's not totally clear. i think i would prefer some lower-level open-rte calls that perform daemon pre-allocation (i.e. dynamic ras/daemon
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
The availability of functionality is set by the header files for each platform, not by configure. So we'd have to play some games to get at the information, but it should be possible.

Brian
Re: [OMPI devel] [Pkg-openmpi-maintainers] Bug#433142: openmpi: FTBFS on GNU/kFreeBSD
Petr,

On 14 July 2007 at 22:26, Petr Salinger wrote:
| Package: openmpi
| Severity: important
| Version: 1.2.3-1
| Tags: patch
| User: glibc-bsd-de...@lists.alioth.debian.org
| Usertags: kfreebsd
|
| Hi,
|
| the current version fails to build on GNU/kFreeBSD.
|
| It needs small fixups for munmap hackery and stacktrace.
| It also needs to exclude linux specific build-depends.
| Please find attached patch with that.

Thanks for that patch.

| It would be nice if you can ask upstream
| to include changes to opal/util/stacktrace.c and
| opal/mca/memory/ptmalloc2/opal_ptmalloc2_munmap.c .

Doing so now for their consideration.

Regards, Dirk

|
| Thanks in advance
|
| Petr

| diff -u openmpi-1.2.3/debian/control openmpi-1.2.3/debian/control
| --- openmpi-1.2.3/debian/control
| +++ openmpi-1.2.3/debian/control
| @@ -3,7 +3,7 @@
|  Priority: optional
|  Maintainer: Debian OpenMPI Maintainers
|  Uploaders: Dirk Eddelbuettel
| -Build-Depends: debhelper (>= 5.0.0), dpatch, libibverbs-dev, gfortran, libsysfs-dev, automake, gcc (>= 4:4.1.2)
| +Build-Depends: debhelper (>= 5.0.0), dpatch, libibverbs-dev [!kfreebsd-i386 !kfreebsd-amd64 !hurd-i386], gfortran, libsysfs-dev [!kfreebsd-i386 !kfreebsd-amd64 !hurd-i386], automake, gcc (>= 4:4.1.2)
|  Standards-Version: 3.7.2
|  XS-Vcs-Svn: svn://svn.debian.org/svn/pkg-openmpi/openmpi/trunk/
|  XS-Vcs-Browser: http://svn.debian.org/wsvn/pkg-openmpi/openmpi/trunk/
| only in patch2:
| unchanged:
| --- openmpi-1.2.3.orig/opal/mca/memory/ptmalloc2/opal_ptmalloc2_munmap.c
| +++ openmpi-1.2.3/opal/mca/memory/ptmalloc2/opal_ptmalloc2_munmap.c
| @@ -26,7 +26,8 @@
|  #elif defined(HAVE_SYSCALL)
|  #include
|  #include
| -#elif defined(HAVE_DLSYM)
| +#endif
| +#if defined(HAVE_DLSYM)
|  #ifndef __USE_GNU
|  #define __USE_GNU
|  #endif
| @@ -59,7 +60,7 @@
|  int
|  opal_mem_free_ptmalloc2_munmap(void *start, size_t length, int from_alloc)
|  {
| -#if !defined(HAVE___MUNMAP) && !defined(HAVE_SYSCALL) && defined(HAVE_DLSYM)
| +#if !defined(HAVE___MUNMAP) && !(defined(HAVE_SYSCALL) && defined(__NR_munmap)) && defined(HAVE_DLSYM)
|  static int (*realmunmap)(void*, size_t);
|  #endif
|
| @@ -67,7 +68,7 @@
|
|  #if defined(HAVE___MUNMAP)
|  return __munmap(start, length);
| -#elif defined(HAVE_SYSCALL)
| +#elif defined(HAVE_SYSCALL) && defined(__NR_munmap)
|  return syscall(__NR_munmap, start, length);
|  #elif defined(HAVE_DLSYM)
|  if (NULL == realmunmap) {
| only in patch2:
| unchanged:
| --- openmpi-1.2.3.orig/opal/util/stacktrace.c
| +++ openmpi-1.2.3/opal/util/stacktrace.c
| @@ -145,8 +145,12 @@
|  case FPE_FLTDIV: si_code_str = "Floating point divide-by-zero"; break;
|  case FPE_FLTOVF: si_code_str = "Floating point overflow"; break;
|  case FPE_FLTUND: si_code_str = "Floating point underflow"; break;
| +#ifdef FPE_FLTRES
|  case FPE_FLTRES: si_code_str = "Floating point inexact result"; break;
| +#endif
| +#ifdef FPE_FLTINV
|  case FPE_FLTINV: si_code_str = "Invalid floating point operation"; break;
| +#endif
|  #ifdef FPE_FLTSUB
|  case FPE_FLTSUB: si_code_str = "Subscript out of range"; break;
|  #endif
| ___
| Pkg-openmpi-maintainers mailing list
| pkg-openmpi-maintain...@lists.alioth.debian.org
| http://lists.alioth.debian.org/mailman/listinfo/pkg-openmpi-maintainers
Re: [OMPI devel] lsf support / farm use models
Welcome! Yes, Jeff and I have been working on the LSF support based on 7.0 features in collab with the folks at Platform. Some further comments below... Ralph On 7/14/07 2:02 PM, "Matthew Moskewicz" wrote: > hi everyone, > > firstly, i'm new around here, and somewhat clueless when it comes to the > details of working with an big autoconfiscated project like open-rte/open-mpi > the svn checkout level ... > > i've read some of the archives that turned up in searches for terms like > 'LSF', and it would seem there was some discussion about adding some form of > LSF support to open-rte, but that the discussion ended a while back. so, after > playing around with the 1.2.3 release tarball for a while, and reading > various pieces of the code until i had a (vague) idea of the top-level > control flow and such, i decided i was ready to try to add ras and pls > component to support LSF. once i had the build system up, i tried to create an > ras/lsf directory, and slightly to my surprise, it already existed. i was > kinda hoping for that, but it appears to be *very* fresh code at the moment. > nonetheless, i played around a bit more, and ran into two issues: > > 1) it appears that you (jeff, i guess ;) are using new LSF 7.0 API features. > i'm working to support customers in the EDA space, and it's not clear if/when > they will migrate to 7.0 -- not to mention that our company (cadence) doesn't > appear to have LSF 7.0 yet. i'm still looking in to the deatils, but it > appears that (from the Platform docs) lsb_getalloc is probably just a thin > wrapper around the LSB_MCPU_HOSTS (spelling?) environment variable. so that > could be worked around fairly easily. i dunno about lsb_launch -- it seems > equivalent to a set of ls_rtask() calls (one per process). 
however, i have > heard that there can be significant subtleties with the semantics of these > functions, in terms of compatibility across differently configured > LSF-controlled farms, specifically with regrads to administrators ability to > track and control job execution. personally, i don't see how it's really > possible for LSF to prevent 'bad' users from spamming out jobs or > short-cutting queues, but perhaps some of the methods they attempt to use can > complicate things for a library like open-rte. After lengthy discussions with Platform, it was deemed the best path forward is to use the lsb_getalloc interface. While it currently reads the enviro variable, they indicated a potential change to read a file instead for scalability. Rather than chasing any changes, we all agreed that using lsb_getalloc would remain the "stable" interface - so that is what we used. Similar reasons for using lsb_launch. I would really advise against making any changes away from that support. Instead, we could take a lesson from our bproc support and simply (a) detect if we are on a pre-7.0 release, and then (b) build our own internal wrapper that provides back-support. See the bproc pls component for examples. > > 2) this brings us to point 2 -- upon talking to the author(s) of cadence's > internal open-rte-like library, several key issues were raised. mainly, > customers want their applications to be 'farm-friendly' in several key ways. > firstly, they do not want any persistent daemons running outside of a given > job -- this requirement seems met by the current open-mpi default behavior, at > least as far i can tell. secondly, they prefer (strongly) that applications > acquire resources incrementally, and perform work with whatever nodes are > currently available, rather than forcing a large up-front node allocation. > fault tolerance is nice too, although it's unclear to me if it's really > practically needed. 
> in any case, many of our applications can structure their computation to
> use resources in just such a way, generally by dividing the work into
> independent, restartable pieces (i.e. they are embarrassingly ||). also,
> MPI communication + MPI-2 process creation seems to be a reasonable
> interface for handling communication and dynamic process creation on the
> application side. however, it's not clear that open-rte supports the
> needed dynamic resource acquisition model in any of the ras/pls
> components i looked at. in fact, other than just folding everything into
> the pls component, it's not clear that the entire flow via the rmgr
> really supports it very well. specifically for LSF, the use model is that
> the initial job either is created with bsub/lsb_submit() (or
> automatically submits itself as step zero perhaps) to run initially on N
> machines. N should be 'small' (1-16) -- perhaps only 1 for simplicity.
> then, as the application runs, it will continue to consume more resources
> as limited by the farm status, the user selection, and the max # of
> processes that the job can usefully support (generally 'large' --
> 100-1000 cpus).

OpenRTE will be undergoing some changes shortly, so I would strongly
recommend you avoid making major chang
Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k
Brian Barrett wrote:
> On Jul 14, 2007, at 8:26 AM, Dirk Eddelbuettel wrote:
>
>> Please let us (ie Debian's openmpi maintainers) know how else we can
>> help. I am ccing the porters lists (for hppa, m68k, mips) too to invite
>> them to help. I hope that doesn't get the spam filters going... I may
>> contact the 'arm' porters once we have a failure; s390 and sparc
>> activity are not as big these days.
>
> Open MPI uses some assembly for things like atomic locks, atomic
> compare and swap, memory barriers, and the like. We currently have
> support for:
>
> * x86 (32 bit)
> * x86_64 / amd64 (32 or 64 bit)
> * UltraSparc (v8plus and v9* targets)
> * IA64
> * PowerPC (32 or 64 bit)
>
> We also have code for:
>
> * Alpha
> * MIPS (32 bit NEW ABI & 64 bit)
>
> This support hasn't been well tested in a while and it sounds like it
> doesn't work for MIPS. At one time, we supported the sparc v8
> target, but that The other platforms (hppa, mipsel (how is this
> different than MIPS?), s390, m68k) aren't at all supported by Open
> MPI. If you can get the real error messages, I can help on the MIPS
> issue, although it'll have to be a low priority.
>

As maintainer of the atomics code for two projects unrelated to OpenMPI, I
thought I'd pass on some of my insight. I'll not post any code here to avoid
any accidental license questions.

HPPA lacks an atomic compare-and-swap and is therefore probably a lost
cause. The Linux kernel uses HPPA's only atomic instruction, load-and-clear,
to implement a spinlock, and a hashed table of such spinlocks to implement
atomic operations. This works because the atomic_read and atomic_set macros
honor the spinlocks. This is not the case with ompi's atomics, is it?
OpenMPI appears to contain fragments of such an array-of-spinlocks
implementation for SPARCv8, but Brian's comments suggest to me that this may
no longer work.
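Purely as an illustration of the hashed-spinlock idea described above, and
not taken from the Linux kernel, Open MPI, or any other project, here is a
minimal sketch in portable C11. It assumes atomic_flag as a stand-in for a
lock word driven by load-and-clear; the table size, the hash, and every
function name (hashed_locks_init, lock_for, emulated_cas32, emulated_set32,
emulated_read32) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical table size; a power of two so the hash is a simple mask. */
#define LOCK_TABLE_SIZE 16u
static_assert((LOCK_TABLE_SIZE & (LOCK_TABLE_SIZE - 1u)) == 0,
              "lock table size must be a power of two");

/* atomic_flag stands in for a lock word taken with load-and-clear. */
static atomic_flag lock_table[LOCK_TABLE_SIZE];

/* Put every lock in the released state; call once before any other use. */
static void hashed_locks_init(void)
{
    for (unsigned i = 0; i < LOCK_TABLE_SIZE; i++)
        atomic_flag_clear(&lock_table[i]);
}

/* Hash a variable's address to one lock; distinct variables may share it. */
static atomic_flag *lock_for(const volatile void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    return &lock_table[(a >> 4) & (LOCK_TABLE_SIZE - 1u)];
}

static void lock_acquire(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* spin until the previous holder releases */
}

static void lock_release(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Emulated CAS: returns 1 if *addr equalled oldval and now holds newval. */
static int emulated_cas32(volatile int32_t *addr, int32_t oldval,
                          int32_t newval)
{
    atomic_flag *l = lock_for(addr);
    int swapped = 0;

    lock_acquire(l);
    if (*addr == oldval) {
        *addr = newval;
        swapped = 1;
    }
    lock_release(l);
    return swapped;
}

/* Stores take the same lock, otherwise they could land mid-CAS. */
static void emulated_set32(volatile int32_t *addr, int32_t value)
{
    atomic_flag *l = lock_for(addr);

    lock_acquire(l);
    *addr = value;
    lock_release(l);
}

/* Reads go through the lock too, mirroring the scheme described above. */
static int32_t emulated_read32(volatile int32_t *addr)
{
    atomic_flag *l = lock_for(addr);
    int32_t value;

    lock_acquire(l);
    value = *addr;
    lock_release(l);
    return value;
}
```

The point of the sketch is the last two functions: the emulation only holds
if every access to the variable goes through the same lock, which is the
property in question for ompi's atomics.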

ARM before v6 needs no memory barriers, but lacks atomic instructions other
than unconditional swap (though very few multi-processor systems were built
with earlier chips). However, a post on the libc-ports mailing list
(http://sourceware.org/ml/libc-ports/2005-10/msg00016.html) says of the code
used in glibc:

/* Atomic compare and exchange. These sequences are not actually
   Atomic; there is a race if *MEM != OLDVAL and we are preempted
   between the two swaps. However, they are very close to atomic,
   and are the best that a pre-ARMv6 implementation can do without
   operating system support. LinuxThreads has been using these
   sequences for many years. */

So, ompi might try getting away with the same logic if an ARM port is high
priority for somebody. Alternatively, if one is on a new enough Linux kernel
(>= 2.6.12 IIRC) you get kernel support for CAS by calling a function in a
"highpage" (like the VDSO on x86) that is implemented natively on >=ARMv6
and traps to the kernel otherwise (the kernel disables interrupts and then
uses the not-quite-atomic sequence). For ARMv6 you get a load-exclusive and
store-exclusive pair, and you get real memory barriers as well.

M68K has a CAS instruction and memory barriers are no-ops. This should be an
easy one to implement from the instruction set reference docs.

s390 is one I don't have any first-hand experience with, but I know from
peeking at the Linux kernel source that it has the eieio memory-barrier
instruction of early PPCs and a CAS instruction. Again, this should be easy
from the ISA docs.

MIPS is supposed to work w/ ompi on IRIX, but there is no
atomic-mips-linux.s in OpenMPI 1.2.3. I was going to try to build 1.2.3 on
an O2K (IRIX64 6.5 and gcc 3.3) today, but found that configure dies with

  configure: error: Could not determine global symbol label prefix

So, I'll not be pursuing that.

-Paul

> We don't currently have support for a non-assembly code path.
> We originally planned on having one, but the team went away from that
> route over time and there's no way to build Open MPI without assembly
> support right now.
>
> Brian
>

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900