Re: [OMPI devel] orte_ns_base_select failed: returned value -1 instead of ORTE_SUCCESS
I tried a fresh trunk checkout and the same problem occurred. Here is the complete configure line. I am using libtool 1.5.22 from fink; otherwise everything is standard OS 10.5.

$ ../trunk/configure --prefix=/Users/bouteill/ompi/build --enable-mpirun-prefix-by-default --disable-io-romio --enable-debug --enable-picky --enable-mem-debug --enable-mem-profile --enable-visibility --disable-dlopen --disable-shared --enable-static

The error message generated by abort contains garbage (the line numbers do not match anything in the .c files and, according to gdb, the failure does not occur during ns initialization). This looks like heap corruption or something equally bad.

orterun (argc=4, argv=0xb81c) at ../../../../trunk/orte/tools/orterun/orterun.c:529
529 cb_states = ORTE_PROC_STATE_TERMINATED | ORTE_PROC_STATE_AT_STG1;
(gdb) n
530 rc = orte_rmgr.spawn_job(apps, num_apps, &jobid, 0, NULL, job_state_callback, cb_states, &attributes);
(gdb) n
531 while (NULL != (item = opal_list_remove_first(&attributes))) OBJ_RELEASE(item);
(gdb) n
** Stepping over inlined function code. **
532 OBJ_DESTRUCT(&attributes);
(gdb) n
534 if (orterun_globals.do_not_launch) {
(gdb) n
539 OPAL_THREAD_LOCK(&orterun_globals.lock);
(gdb) n
541 if (ORTE_SUCCESS == rc) {
(gdb) n
542 while (!orterun_globals.exit) {
(gdb) n
543 opal_condition_wait(&orterun_globals.cond,
(gdb) n
[grosse-pomme.local:77335] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/oob/base/oob_base_init.c at line 74

Aurelien

On 30 Jan 08 at 17:18, Ralph Castain wrote:
> Are you running on the trunk, or an earlier release?
>
> If the trunk, then I suspect you have a stale library hanging around. I
> build and run statically on Leopard regularly.
>
> On 1/30/08 2:54 PM, "Aurélien Bouteiller" wrote:
>> I get a runtime error in a static build on Mac OS 10.5 (automake 1.10,
>> autoconf 2.60, gcc-apple-darwin 4.01, libtool 1.5.22).
>>
>> The error does not occur in dso builds, and everything seems to work
>> fine on Linux.
>>
>> Here is the error log.
>>
>> ~/ompi$ mpirun -np 2 NetPIPE_3.6/NPmpi
>> [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/oob/base/oob_base_init.c at line 74
>> [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/ns/proxy/ns_proxy_component.c at line 222
>> [grosse-pomme.local:34247] [NO-NAME] ORTE_ERROR_LOG: Error in file /SourceCache/openmpi/openmpi-5/openmpi/orte/runtime/orte_init_stage1.c at line 230
>> --
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_ns_base_select failed
>> --> Returned value -1 instead of ORTE_SUCCESS
>> --
>> --
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> ompi_mpi_init: orte_init_stage1 failed
>> --> Returned "Error" (-1) instead of "Success" (0)
>> --
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> --
>> Dr. Aurélien Bouteiller
>> Sr. Research Associate - Innovative Computing Laboratory
>> Suite 350, 1122 Volunteer Boulevard
>> Knoxville, TN 37996
>> 865 974 6321

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
[OMPI devel] 32 bit udapl warnings
Hi,

I am seeing some warnings on the trunk when compiling udapl in 32 bit mode with OFED 1.2.5.1:

btl_udapl.c: In function 'udapl_reg_mr':
btl_udapl.c:95: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_alloc':
btl_udapl.c:852: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_prepare_src':
btl_udapl.c:959: warning: cast from pointer to integer of different size
btl_udapl.c:1008: warning: cast from pointer to integer of different size
btl_udapl_component.c: In function 'mca_btl_udapl_component_progress':
btl_udapl_component.c:871: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_write_eager':
btl_udapl_endpoint.c:130: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_finish_max':
btl_udapl_endpoint.c:775: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_post_recv':
btl_udapl_endpoint.c:864: warning: cast from pointer to integer of different size
btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_initialize_control_message':
btl_udapl_endpoint.c:1012: warning: cast from pointer to integer of different size

Thanks,

Tim
Re: [OMPI devel] vt compiler warnings and errors
Hi Matthias,

I just noticed something else that seems odd. On a fresh checkout, I ran autogen and configure. Then I typed 'make clean'. Things seem to progress normally, but once it gets to ompi/contrib/vt/vt/extlib/otf, a new configure script gets run.

Specifically:

[tprins@sif test]$ make clean
Making clean in otf
make[5]: Entering directory `/san/homedirs/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf'
cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run automake-1.10 --gnu
cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run autoconf
/bin/sh ./config.status --recheck
running CONFIG_SHELL=/bin/sh /bin/sh ./configure --with-zlib-lib=-lz --prefix=/usr/local --exec-prefix=/usr/local --bindir=/usr/local/bin --libdir=/usr/local/lib --includedir=/usr/local/include --datarootdir=/usr/local/share/vampirtrace --datadir=${prefix}/share/${PACKAGE_TARNAME} --docdir=${prefix}/share/${PACKAGE_TARNAME}/doc --cache-file=/dev/null --srcdir=. CXXFLAGS=-g -Wall -Wundef -Wno-long-long -finline-functions -pthread LDFLAGS= LIBS=-lnsl -lutil -lm CPPFLAGS= CFLAGS=-g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread FFLAGS= --no-create --no-recursion
checking build system type... x86_64-unknown-linux-gnu

Not sure if this is expected behavior, but it seems wrong to me.

Thanks,

Tim

Matthias Jurenz wrote:
> Hello,
>
> all three VT related errors which MTT reported should be fixed now.
>
> 516:
> The fix from George Bosilca this morning should work on MacOS PPC.
> Thanks!
>
> 517:
> The compile error occurred due to a missing header include.
> Furthermore, the compiler warnings should also be fixed.
>
> 518:
> I have added a check whether MPI I/O is available and added the
> corresponding VT configure option to enable/disable MPI I/O support.
> For that I used the variable "define_mpi_io" from
> 'ompi/mca/io/configure.m4'. Is that o.k. or should I use another
> variable?
>
> Matthias
>
> On Tue, 2008-01-29 at 09:19 -0500, Jeff Squyres wrote:
>> I got a bunch of compiler warnings and errors with VT on the PGI
>> compiler last night -- my mail client won't paste it in nicely. :-(
>>
>> See these MTT reports for details:
>>
>> - On Absoft systems:
>>   http://www.open-mpi.org/mtt/index.php?do_redir=516
>> - On Cisco systems:
>>   With PGI compilers:
>>   http://www.open-mpi.org/mtt/index.php?do_redir=517
>>   With GNU compilers:
>>   http://www.open-mpi.org/mtt/index.php?do_redir=518
>>
>> The output may be a bit hard to read -- for MTT builds, we separate
>> the stdout and stderr into 2 streams. So you kinda have to merge them
>> in your head; sorry...
>
> --
> Matthias Jurenz,
> Center for Information Services and
> High Performance Computing (ZIH), TU Dresden,
> Willersbau A106, Zellescher Weg 12, 01062 Dresden
> phone +49-351-463-31945, fax +49-351-463-37773
Re: [OMPI devel] 32 bit udapl warnings
This was brought to my attention once before, but I don't see this message so I just plain forgot about it. :-(

uDAPL defines its pointers as uint64, "typedef DAT_UINT64 DAT_VADDR", and pval is a "void *", which is why the message comes up. If I remove the cast I believe I get a different warning, and I just haven't stopped to think of a way around this.

Tim Prins wrote:
> Hi,
>
> I am seeing some warnings on the trunk when compiling udapl in 32 bit
> mode with OFED 1.2.5.1:
>
> btl_udapl.c: In function 'udapl_reg_mr':
> btl_udapl.c:95: warning: cast from pointer to integer of different size
> btl_udapl.c: In function 'mca_btl_udapl_alloc':
> btl_udapl.c:852: warning: cast from pointer to integer of different size
> btl_udapl.c: In function 'mca_btl_udapl_prepare_src':
> btl_udapl.c:959: warning: cast from pointer to integer of different size
> btl_udapl.c:1008: warning: cast from pointer to integer of different size
> btl_udapl_component.c: In function 'mca_btl_udapl_component_progress':
> btl_udapl_component.c:871: warning: cast from pointer to integer of different size
> btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_write_eager':
> btl_udapl_endpoint.c:130: warning: cast from pointer to integer of different size
> btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_finish_max':
> btl_udapl_endpoint.c:775: warning: cast from pointer to integer of different size
> btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_post_recv':
> btl_udapl_endpoint.c:864: warning: cast from pointer to integer of different size
> btl_udapl_endpoint.c: In function 'mca_btl_udapl_endpoint_initialize_control_message':
> btl_udapl_endpoint.c:1012: warning: cast from pointer to integer of different size
>
> Thanks,
>
> Tim
Re: [OMPI devel] 32 bit udapl warnings
On Thu, Jan 31, 2008 at 08:45:54AM -0500, Don Kerr wrote:
> This was brought to my attention once before but I don't see this
> message so I just plain forgot about it. :-(
> uDAPL defines its pointers as uint64, "typedef DAT_UINT64 DAT_VADDR",
> and pval is a "void *" which is why the message comes up. If I remove
> the cast I believe I get a different warning and I just haven't stopped
> to think of a way around this.

dat_pointer = (DAT_VADDR)(uintptr_t)void_pointer;

This is not just a warning. This is a real bug. If the MSB of a void pointer is 1, it will be sign-extended.

--
Gleb.
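The double cast suggested above can be illustrated with a small C sketch. It is only an illustration under stated assumptions: `example_vaddr_t` is a hypothetical stand-in for `DAT_VADDR` (quoted in the thread as a typedef of a 64-bit unsigned type) so that the code compiles without the DAT headers, and the two `*_extended_32` helpers are made up here to show, with plain integers, what sign extension does to a 32-bit address whose top bit is set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for uDAPL's DAT_VADDR (a 64-bit unsigned type). */
typedef uint64_t example_vaddr_t;

/* Convert a pointer to a 64-bit address. Going through uintptr_t first
 * makes the widening step an unsigned integer conversion, so the upper
 * bits are zero-filled on a 32-bit build. */
example_vaddr_t ptr_to_vaddr(const void *p)
{
    assert(sizeof(uintptr_t) <= sizeof(example_vaddr_t));
    return (example_vaddr_t)(uintptr_t)p;
}

/* What a sign-extending widening does to a 32-bit address value.
 * (The uint32_t -> int32_t step is implementation-defined before C23;
 * two's-complement wraparound is assumed here.) */
uint64_t sign_extended_32(uint32_t addr)
{
    return (uint64_t)(int64_t)(int32_t)addr;
}

/* What the unsigned widening used by ptr_to_vaddr() does instead. */
uint64_t zero_extended_32(uint32_t addr)
{
    return (uint64_t)addr;
}
```

For an address such as 0x80001000, the sign-extending path yields 0xFFFFFFFF80001000 while the unsigned path yields 0x0000000080001000; for addresses below 0x80000000 the two agree, which may be why a problem like this can go unnoticed for a long time.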
Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307
On Wed, Jan 30, 2008 at 06:48:54PM +0100, Adrian Knoth wrote:
> > What is the real issue behind this whole discussion?
> Hanging connections.
> I'll have a look at it tomorrow.

To everybody who's interested in BTL-TCP, especially George and (to a minor degree) rhc:

I've integrated something I call "magic address selection code". See the comments in r17348.

Can you check

https://svn.open-mpi.org/svn/ompi/tmp-public/btl-tcp

to see if it's working for you? Read: multi-rail TCP, FNN, whatever is important to you?

The code is a proof of concept and could use a little tuning (if it's working at all; over here, it satisfies all tests).

I vaguely remember that at least Ralph doesn't like

int a[perm_size * sizeof(int)];

where perm_size is dynamically evaluated (read: the array size is runtime dependent).

There are also some large arrays; search for MAX_KERNEL_INTERFACE_INDEX. Perhaps it's better to replace them with an appropriate OMPI data structure. I don't know what fits best, you guys know the details...

So please give the code a try, and if it's working, feel free to clean up whatever is necessary to make it match the OMPI style, or give me some pointers on what to change.

I'd like to point to Thomas' diploma thesis. The PDF explains the theory behind the code; it's like a rationale. Unfortunately, the PDF has some typos, but I guess you'll get the idea. It's a graph matching algorithm; Chapter 3 covers everything in detail:

http://cluster.inf-ra.uni-jena.de/~adi/peiselt-thesis.pdf

HTH

--
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de
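One possible way to avoid the runtime-sized array mentioned above is a heap allocation, sketched below in C. This is not the code from the btl-tcp branch: `alloc_identity_perm` and `fill_identity` are hypothetical helper names, and filling the buffer with the identity permutation is only an assumed use for it. Note also that an array declared as `int a[perm_size * sizeof(int)]` holds `perm_size * sizeof(int)` elements, not bytes, so it reserves `sizeof(int)` times more stack than `perm_size` ints would need.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fill perm[0..perm_size-1] with the identity permutation 0, 1, 2, ... */
void fill_identity(int *perm, size_t perm_size)
{
    assert(perm != NULL || perm_size == 0);
    for (size_t i = 0; i < perm_size; ++i) {
        perm[i] = (int)i;
    }
}

/* Heap-allocated replacement for a runtime-sized stack array.
 * Returns NULL for a zero size, on multiplication overflow, or when
 * malloc fails; the caller releases the result with free(). */
int *alloc_identity_perm(size_t perm_size)
{
    if (perm_size == 0 || perm_size > SIZE_MAX / sizeof(int)) {
        return NULL;
    }
    int *perm = malloc(perm_size * sizeof *perm);
    if (NULL == perm) {
        return NULL;
    }
    fill_identity(perm, perm_size);
    return perm;
}
```

A caller would use `int *a = alloc_identity_perm(perm_size);`, check the result for NULL, and call `free(a)` when done. Unlike the stack array, an oversized request then fails with a checkable NULL instead of overflowing the stack.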
[OMPI devel] SnapC
Hi all (and Josh),

Why does ompi-checkpoint have to contact the HNP specifically? If I use another process to start the snapshot coordinator, it appears to work fine, no?

PS: I prefer to send this message to the list, to keep it in the history for further use.

--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
Re: [OMPI devel] SnapC
So the ompi-checkpoint command connects with the Global Coordinator in the SnapC 'full' component. The Global Coordinator lives in the HNP (mpirun/orterun), as determined by the 'full' component. As a result, to start a checkpoint, ompi-checkpoint must connect to the HNP.

From a user standpoint, they are typically running ompi-checkpoint from the same machine where they started mpirun. So it made the most sense to have these two connect to each other, especially if we ask the user to provide the PID of the mpirun process to checkpoint.

That being said, with the proper changes to 'full' (or with a new SnapC component), ompi-checkpoint could issue the checkpoint request to any process in the MPI job [orterun, orted, application processes] and have the correct things happen. I have received one request for this functionality, but have not had the time yet to dig into it.

Does that help?

Cheers,
Josh

On Jan 31, 2008, at 9:51 AM, Leonardo Fialho wrote:
> Hi all (and Josh),
>
> Why does ompi-checkpoint have to contact the HNP specifically? If I
> use another process to start the snapshot coordinator, it appears to
> work fine, no?
>
> PS: I prefer to send this message to the list, to keep it in the
> history for further use.
>
> --
> Leonardo Fialho
> Computer Architecture and Operating Systems Department - CAOS
> Universidad Autonoma de Barcelona - UAB
> ETSE, Edifcio Q, QC/3088
> http://www.caos.uab.es
> Phone: +34-93-581-2888
> Fax: +34-93-581-2478
Re: [OMPI devel] vt compiler warnings and errors
Hi Tim,

that seems wrong to me, too. I could not reproduce this on my computer.

The VT integration comes with its own configure script, which is not created by OMPI's autogen.sh. I don't really have an idea what's going wrong... I suppose the problem is that you are using a different version of the Autotools than the one I used to bootstrap VT?

VT's configure script was created with the following versions of the Autotools: autoconf 2.61, automake 1.10, libtool 1.5.24.

Which version of the Autotools are you using to bootstrap Open MPI?

Matthias

On Thu, 2008-01-31 at 08:09 -0500, Tim Prins wrote:
> Hi Matthias,
>
> I just noticed something else that seems odd. On a fresh checkout, I did
> a autogen and configure. Then I type 'make clean'. Things seem to
> progress normally, but once it gets to ompi/contrib/vt/vt/extlib/otf, a
> new configure script gets run.
>
> Specifically:
> [tprins@sif test]$ make clean
> Making clean in otf
> make[5]: Entering directory `/san/homedirs/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf'
> cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run automake-1.10 --gnu
> cd . && /bin/sh /u/tprins/sif/test/ompi/contrib/vt/vt/extlib/otf/missing --run autoconf
> /bin/sh ./config.status --recheck
> running CONFIG_SHELL=/bin/sh /bin/sh ./configure --with-zlib-lib=-lz --prefix=/usr/local --exec-prefix=/usr/local --bindir=/usr/local/bin --libdir=/usr/local/lib --includedir=/usr/local/include --datarootdir=/usr/local/share/vampirtrace --datadir=${prefix}/share/${PACKAGE_TARNAME} --docdir=${prefix}/share/${PACKAGE_TARNAME}/doc --cache-file=/dev/null --srcdir=. CXXFLAGS=-g -Wall -Wundef -Wno-long-long -finline-functions -pthread LDFLAGS= LIBS=-lnsl -lutil -lm CPPFLAGS= CFLAGS=-g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -pthread FFLAGS= --no-create --no-recursion
> checking build system type... x86_64-unknown-linux-gnu
>
> Not sure if this is expected behavior, but it seems wrong to me.
>
> Thanks,
>
> Tim

--
Matthias Jurenz,
Center for Information Services and
High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773
Re: [OMPI devel] vt compiler warnings and errors
Ah -- I didn't notice this before -- do you have a configure script committed to SVN? If so, this could be the problem.

Whether or not what Tim sees happens will depend on the timestamps that SVN puts on configure and all of the files dependent upon configure (Makefile.in, Makefile, etc.) in the VT tree. If some of them have "bad" timestamps, then the dependencies in the Makefiles can end up re-running VT's configure, re-creating configure, etc.

Is there a way to get OMPI's autogen to also autogen the VT software? This would ensure one consistent set of timestamps (not dependent upon what timestamps SVN wrote to your filesystem for these sensitive files).

On Jan 31, 2008, at 12:36 PM, Matthias Jurenz wrote:
> Hi Tim,
>
> that seems wrong to me, too. I could not reproduce this on my computer.
>
> The VT integration comes with its own configure script, which is not
> created by OMPI's autogen.sh. I don't really have an idea what's going
> wrong... I suppose the problem is that you are using a different
> version of the Autotools than the one I used to bootstrap VT?
>
> VT's configure script was created with the following versions of the
> Autotools: autoconf 2.61, automake 1.10, libtool 1.5.24.
>
> Which version of the Autotools are you using to bootstrap Open MPI?
>
> Matthias
Re: [OMPI devel] orte_ns_base_select failed: returned value -1 instead of ORTE_SUCCESS
Hmmm...well, my bad. There does indeed appear to be something funny going on with Leopard. No idea what - it used to work fine. I haven't tested it in a while though - I've been test building regularly on Leopard, but running on Tiger (I misspoke earlier).

For now, I'm afraid you can't run on Leopard. I'll have to figure it out later when I have more time.

Ralph

-- Forwarded Message
> From: Aurélien Bouteiller
> Reply-To: Open MPI Developers
> Date: Thu, 31 Jan 2008 02:18:27 -0500
> To: Open MPI Developers
> Subject: Re: [OMPI devel] orte_ns_base_select failed: returned value -1 instead of ORTE_SUCCESS
>
> I tried using a fresh trunk, same problem have occured. Here is the
> complete configure line. I am using libtool 1.5.22 from fink.
> Otherwise everything is standard OS 10.5.
>
> $ ../trunk/configure --prefix=/Users/bouteill/ompi/build --enable-mpirun-prefix-by-default --disable-io-romio --enable-debug --enable-picky --enable-mem-debug --enable-mem-profile --enable-visibility --disable-dlopen --disable-shared --enable-static
>
> The error message generated by abort contains garbage (line numbers do
> not match anything in .c files and according to gdb the failure does
> not occur during ns initialization). This looks like a heap corruption
> or something as bad.
>
> orterun (argc=4, argv=0xb81c) at ../../../../trunk/orte/tools/orterun/orterun.c:529
> 529 cb_states = ORTE_PROC_STATE_TERMINATED | ORTE_PROC_STATE_AT_STG1;
> (gdb) n
> 530 rc = orte_rmgr.spawn_job(apps, num_apps, &jobid, 0, NULL, job_state_callback, cb_states, &attributes);
> (gdb) n
> 531 while (NULL != (item = opal_list_remove_first(&attributes))) OBJ_RELEASE(item);
> (gdb) n
> ** Stepping over inlined function code. **
> 532 OBJ_DESTRUCT(&attributes);
> (gdb) n
> 534 if (orterun_globals.do_not_launch) {
> (gdb) n
> 539 OPAL_THREAD_LOCK(&orterun_globals.lock);
> (gdb) n
> 541 if (ORTE_SUCCESS == rc) {
> (gdb) n
> 542 while (!orterun_globals.exit) {
> (gdb) n
> 543 opal_condition_wait(&orterun_globals.cond,
> (gdb) n
> [grosse-pomme.local:77335] [NO-NAME] ORTE_ERROR_LOG: Bad parameter in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/oob/base/oob_base_init.c at line 74
>
> Aurelien