[OMPI devel] Trunk borked
We seem to have a problem on the trunk this morning. I am building on a platform with the following configuration:

    with_threads=no
    enable_dlopen=no
    enable_pty_support=no
    with_tm=/opt/PBS
    LDFLAGS=-L/opt/PBS/lib64
    with_openib=/opt/ofed
    with_memory_manager=no
    enable_mem_debug=yes
    enable_mem_profile=no
    enable_debug_symbols=yes
    enable_binaries=yes
    with_devel_headers=yes
    enable_heterogeneous=no
    enable_picky=yes

The compile errors out in the OpenIB BTL with the following error:

    btl_openib_proc.c: In function `mca_btl_openib_proc_create':
    btl_openib_proc.c:159: error: `i' undeclared (first use in this function)
    btl_openib_proc.c:159: error: (Each undeclared identifier is reported only once
    btl_openib_proc.c:159: error: for each function it appears in.)
    make[2]: *** [btl_openib_proc.lo] Error 1
    make[1]: *** [all-recursive] Error 1
    make: *** [all-recursive] Error 1

When I look at the code, the problem is the following #if:

    #if !defined(WORDS_BIGENDIAN) && OMPI_ENABLE_HETEROGENEOUS_SUPPORT
    size_t i;
    #endif

Yet the code will ALWAYS use that variable to unpack all the ports.

I removed the #if to clear the problem, but before committing the change, I wanted to ask why someone thought this test needed to be in the code. Should the entire loop unpacking all the ports be similarly protected, or was the protection around the variable declaration simply an error?

Thanks
Ralph
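For illustration, here is a minimal sketch of the pattern described above. It is not the actual btl_openib_proc.c code: the type `example_port_t`, the function `example_unpack_ports`, the macro `EXAMPLE_HETEROGENEOUS`, and the byte swap are all hypothetical stand-ins. The point is only that a variable used in every configuration has to be declared in every configuration, and that a guard belongs around the configuration-specific step rather than around the declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-port record; not the real OpenIB BTL type. */
typedef struct {
    uint16_t lid;
    uint64_t subnet_id;
} example_port_t;

/*
 * Copy `count` port records from `packed` into `out`.
 * Returns the number of ports unpacked.
 */
size_t example_unpack_ports(const example_port_t *packed, size_t count,
                            example_port_t *out)
{
    /* Declared unconditionally: the loop below runs in every build.
     * Wrapping only this declaration in
     *   #if !defined(WORDS_BIGENDIAN) && OMPI_ENABLE_HETEROGENEOUS_SUPPORT
     * leaves `i` undeclared whenever that condition is false. */
    size_t i;

    assert(packed != NULL && out != NULL);

    for (i = 0; i < count; ++i) {
        out[i] = packed[i];
#if !defined(WORDS_BIGENDIAN) && defined(EXAMPLE_HETEROGENEOUS)
        /* Assumed, illustrative use of the guard: swap bytes only on
         * little-endian builds with heterogeneous support enabled. */
        out[i].lid = (uint16_t)((out[i].lid >> 8) | (out[i].lid << 8));
#endif
    }
    return i;
}
```

If a whole loop really were configuration-specific, the same guard would need to cover both the declaration and every use of the variable.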
Re: [OMPI devel] Trunk borked
On Mon, Jan 28, 2008 at 07:26:56AM -0700, Ralph H Castain wrote:

> We seem to have a problem on the trunk this morning. I am building on a

There are more errors:

    /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function `fsetpos':
    /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:850: error: request for member `__pos' in something not a structure or union
    /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function `fsetpos64':
    /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:876: error: request for member `__pos' in something not a structure or union
    gmake[5]: *** [vt_iowrap.o] Error 1
    gmake[5]: Leaving directory `/tmp/ompi/build/SunOS-i86pc/ompi/ompi/contrib/vt/vt/vtlib'
    /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function `fsetpos':
    /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:850: error: request for member `__pos' in something not a structure or union
    /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function `fsetpos64':
    /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:876: error: request for member `__pos' in something not a structure or union
    gmake[5]: *** [vt_iowrap.o] Error 1
    gmake[5]: Leaving directory `/tmp/ompi/build/SunOS-i86pc/ompi/ompi/contrib/vt/vt/vtlib'

Just my $0.02

--
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de
Re: [OMPI devel] Trunk borked
Doh -- sorry about that. r17282 removes the erroneous #if.

On Jan 28, 2008, at 9:26 AM, Ralph H Castain wrote:

> We seem to have a problem on the trunk this morning. [...]

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Trunk borked
Doh - this is Solaris on x86? I think Terry said Solaris/sparc was tested...

VT guys -- can you check out what's going on?

On Jan 28, 2008, at 9:36 AM, Adrian Knoth wrote:

> There are more errors:
>
> /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function `fsetpos':
> /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:850: error: request for member `__pos' in something not a structure or union
> [...]

--
Jeff Squyres
Cisco Systems
[OMPI devel] vt Makefile.in's
I see the following in SVN:

    ompi/contrib/vt/Makefile.in
    ompi/contrib/vt/wrappers/Makefile.in

I don't think that these should be in SVN because there are corresponding Makefile.am's in those dirs. I'll remove them and update svn:ignore in those dirs tonight (because removing them will cause everyone to re-autogen).

--
Jeff Squyres
Cisco Systems
[OMPI devel] VT in trunk + how to disable
Hi everybody,

the VampirTrace integration arrived in the trunk today. There seems to be one issue already, but we'll fix this ASAP.

As a general hint, this is how to completely disable anything we integrated:

    configure --enable-contrib-no-build=vt ...

Then again, we'd like to see all the issues you may encounter and fix them.

Best regards, Andreas

--
Dipl. Math. Andreas Knuepfer, Center for Information Services and High Performance Computing (ZIH), TU Dresden, Willersbau A114, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-38323, fax +49-351-463-37773
Re: [OMPI devel] Trunk borked
Hello,

this problem should be fixed now... It seems that the symbol '__pos' is not available on every platform. This isn't a problem, because it's only used for a debug control message.

Regards, Matthias

On Mo, 2008-01-28 at 09:41 -0500, Jeff Squyres wrote:

> Doh - this is Solaris on x86? I think Terry said Solaris/sparc was
> tested...
>
> VT guys -- can you check out what's going on?
>
> On Jan 28, 2008, at 9:36 AM, Adrian Knoth wrote:
>
> > There are more errors:
> >
> > /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function
> > `fsetpos':
> > /tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:850: error: request
> > for member `__pos' in something not a structure or union
> > [...]

--
Matthias Jurenz, Center for Information Services and High Performance Computing (ZIH), TU Dresden, Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773
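The `__pos` failure discussed in this message comes from treating `fpos_t` as a structure, which the C standard does not promise. A rough sketch of one way to keep such a debug message portable follows. This is not necessarily how the VT fix was done: the macro `EXAMPLE_HAVE_FPOS_T_POS` and the function `example_format_fsetpos_debug` are hypothetical, and a configure-time check would have to define the macro on platforms where the member exists.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical feature macro: a configure check would define it only on
 * platforms where fpos_t is a struct with a __pos member (e.g. glibc). */
/* #define EXAMPLE_HAVE_FPOS_T_POS 1 */

/*
 * Format a debug line describing an fsetpos() call into `buf`.
 * Returns the value of snprintf(): the length the full message needs.
 */
int example_format_fsetpos_debug(char *buf, size_t len, const fpos_t *pos)
{
    assert(buf != NULL && pos != NULL);
#if defined(EXAMPLE_HAVE_FPOS_T_POS)
    /* Platform-specific branch: peek inside fpos_t for the offset. */
    return snprintf(buf, len, "fsetpos: pos=%lld", (long long)pos->__pos);
#else
    /* Portable branch: fpos_t is opaque, so do not print its contents. */
    (void)pos;
    return snprintf(buf, len, "fsetpos: pos=<opaque>");
#endif
}
```

Since the value only feeds a debug message, the portable branch loses nothing functional.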
Re: [OMPI devel] dropping a pls module into an Open MPI build
One thing you might check if you suspect compiler alignment issues is running "ompi_info --all" and see what Apple used to configure/build OMPI. We save the CFLAGS and whatnot; they may be helpful to you...? I see on my MBP/Leopard 10.5.1, for example:

    C compiler absolute: /usr/bin/gcc
    ...
    Build CFLAGS: -O3 -DNDEBUG -arch i386 -finline-functions -fno-strict-aliasing
    Build CXXFLAGS: -O3 -DNDEBUG -arch i386 -finline-functions
    Build FFLAGS:
    Build FCFLAGS:
    Build LDFLAGS: -export-dynamic -Wl,-u,_munmap -Wl,-multiply_defined,suppress
    Build LIBS: -lutil
    Wrapper extra CFLAGS:
    Wrapper extra CXXFLAGS:
    Wrapper extra FFLAGS:
    Wrapper extra FCFLAGS:
    Wrapper extra LDFLAGS: -Wl,-u,_munmap -Wl,-multiply_defined,suppress
    Wrapper extra LIBS: -lutil

I'll *guess* that the -Wl options came from OMPI's normal configure script. But the -arch and -f might have come from Apple...?

That being said, I'm *not* sure how this information relates to the universal binaries... It *may* be that you'll see the different options for the different architectures depending on which machine you run "ompi_info" on...? I don't know enough about how universal binaries are built or run to know.

On Jan 24, 2008, at 1:12 PM, Ralph H Castain wrote:

> Appreciate the clarification. I am unaware of anyone attempting that procedure in the past, but I'm not terribly surprised to hear it would encounter problems and/or fail. Given the myriad of configuration options in the code base, it would seem almost miraculous that you could either (a) hit the same config options used by Apple (whatever they were), or (b) manage to find a combination that matched enough to let you do this without problem. Frankly, I'm surprised even this small a fix would let you work around the problems... ;-)
>
> Unless you have some overriding reason to use the shipped binaries for everything other than this special component, you're probably going to have a lot more success just rebuilding from source.
> But that's just an opinion - either way, good luck with your efforts!
>
> Ralph
>
> On 1/24/08 10:54 AM, "Dean Dauger, Ph. D." wrote:
>
>>> I'm sorry, but now I am totally confused. Are you saying that you are having problems with the default rsh component in the distributed 1.2.3 code??
>>
>> Yes ...
>>
>>> Or are you having a problem with your customized version?
>>
>> and yes. Each exhibited the same problem, a bus error.
>>
>>> What compiler are you using? If it's your customized version, did you make sure to change the names of the data structures and modules as I pointed out?
>>
>> gcc 4.0.1, the default of Leopard. Yes, in the customized version, I did change the names of the data structures, subroutines, support file names, and where it says "rsh" just like you said.
>>
>>> We regularly work on Macs, both PPC and Intel based (I develop and test on both every day), and I have -never- seen this problem in our code base. Hence my confusion.
>>
>> I'm sorry to confuse. I'm starting with the shipping Mac OS X 10.5.1 "Leopard", which contains its own build of Open MPI (v1.2.3 according to "orterun -version"). So I assumed that the v1.2.3 branch from svn.open-mpi.org was the same code Apple used to build the Open MPI that ships in Leopard.
>>
>> My motivation was to build a new pls module based on pls_rsh module's source code, substituting the rsh with my own name like you said, but I encountered a bus error. So to be sure I didn't screw up somewhere in my custom module I rebuilt the unmodified pls_rsh module and discovered the same problem.
>>
>> Then, after downloading the Open MPI from opensource.apple.com (suspecting it was different), I tried recompiling the pls_rsh module from that source code, dropped in just the resulting mca_pls_rsh.la and mca_pls_rsh.so into the existing /usr/lib/openmpi of Leopard, overwriting Leopard's versions, and the bus error happened the same as before. That's where I was with my first post to this list.
>> My last post regards the discovery that rearranging the elements of orte_pls_rsh_component_t, without changing anything else about the pls_rsh code, affects the bus error outcome. Then I padded out orte_pls_rsh_component_t and my "orte_pls_dean_component_t" by hand so that it would be "data alignment agnostic", if you will. Consequently the bus error no longer occurs and both pls modules now run as they should.
>>
>> My hypothesis: Apple's procedure to build Open MPI into Leopard had a side effect requiring shared object code structures to follow a data alignment different than if I simply recompile Open MPI straight from its source. I'm not saying anyone is to blame, but I'm recognizing that those builds have different timelines. I predict that if I overwrite all of Leopard's Open MPI object code, then it would all run too.
>>
>> For my needs, I have a sufficient workaround: realign my data structures to be "agnostic". I'm sharing this littl
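As a rough illustration of the hand-padding workaround described in this message, the sketch below lays out a struct so that its member offsets come out the same whether the compiler aligns 64-bit members to 4 or to 8 bytes. The struct `example_component_t` and its fields are hypothetical; the real orte_pls_rsh_component_t has different members, and whether alignment was the actual cause of the bus error remains the poster's hypothesis.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component struct with explicit padding, so no member
 * depends on compiler-inserted padding to reach its offset. */
typedef struct {
    uint8_t  debug;     /* offset 0, 1 byte */
    uint8_t  pad0[7];   /* explicit padding up to offset 8 */
    uint64_t delay;     /* offset 8 under both 4- and 8-byte alignment */
    uint32_t priority;  /* offset 16 */
    uint8_t  pad1[4];   /* explicit tail padding, total size 24 */
} example_component_t;

/* Returns 1 when the layout matches the hand-computed offsets, 0 otherwise. */
int example_layout_is_expected(void)
{
    return offsetof(example_component_t, delay) == 8
        && offsetof(example_component_t, priority) == 16
        && sizeof(example_component_t) == 24;
}
```

A check like this, run in both builds that are supposed to share the structure, would show quickly whether the two compilers agree on the layout.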
[OMPI devel] Configure error/warning in nightly tarball
I noticed that when running configure on the nightly snapshot tarball the following errors (warnings really, since it didn't stop configure) were produced. These seem to be remnants from the autogen.sh script pointing to files that do not (and should not) exist in the distribution.

    shell$ ./configure --prefix=/foo/bar/
    ...
    grep: ./orte/mca/gpr/proxy/configure.params: No such file or directory
    grep: ./orte/mca/gpr/replica/configure.params: No such file or directory
    grep: ./orte/mca/gpr/null/configure.params: No such file or directory

Any thoughts on how to fix this? I was using the r17175 nightly tarball.

Cheers,
Josh