Re: [O-MPI devel] [PATCH] Update Open MPI for new libibverbs API
On Sep 26, 2005, at 4:20 PM, Roland Dreier wrote:

> [It's somewhat annoying to have to subscribe to de...@open-mpi.org just to be able to send patches, but oh well...]

It's even more annoying to be deluged with SPAM ;). We (the LAM developers) used to try to keep our mailing lists as open as possible. In the end, SPAM pushed the signal to noise ratio way too high and something had to be done. Requiring subscriptions to post was the best we could do.

> This patch updates Open MPI for the new ibv_create_cq() API.
>
> Signed-off-by: Roland Dreier

I'll admit my ignorance - is this part of a particular release of OpenIB, or is this something that has happened recently in SVN? I ask because we already have people using OpenIB and Open MPI together, and it would be bad to suddenly break things for them. Testing for number of arguments in a function is horribly unreliable - is there some version number or other key in the Open IB headers we can use to determine which version of the function to use?

Brian

> --- ompi/mca/btl/openib/btl_openib.c (revision 7507)
> +++ ompi/mca/btl/openib/btl_openib.c (working copy)
> @@ -656,7 +656,8 @@ int mca_btl_openib_module_init(mca_btl_o
>      }
>      /* Create the low and high priority queue pairs */
> -    openib_btl->ib_cq_low = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size, NULL);
> +    openib_btl->ib_cq_low = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size,
> +                                          NULL, NULL, 0);
>      if(NULL == openib_btl->ib_cq_low) {
>          BTL_ERROR(("error creating low priority cq for %s errno says %s\n",
> @@ -665,7 +666,8 @@ int mca_btl_openib_module_init(mca_btl_o
>          return OMPI_ERROR;
>      }
> -    openib_btl->ib_cq_high = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size, NULL);
> +    openib_btl->ib_cq_high = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size,
> +                                           NULL, NULL, 0);
>      if(NULL == openib_btl->ib_cq_high) {
>          BTL_ERROR(("error creating high priority cq for %s errno says %s\n",

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux (solved?) (fwd)
On Mon, 2005-09-26 at 20:09 +, Ferris McCormick wrote:
> On Mon, 2005-09-26 at 14:59 +, Ferris McCormick wrote:
> > On Fri, 2005-09-16 at 11:35 -0500, Brian Barrett wrote:
> > > On Sep 16, 2005, at 8:44 AM, Ferris McCormick wrote:
> > >
> > > > ==
> > > > fmccor@polylepis util [235]% ./opal_timer
> > > > --> frequency: 9
> > > > --> cycle count
> > > > Slept approximately 903151189 cycles, or 1003501 us
> > > > --> usecs
> > > > Slept approximately 18446744073289684648 us
> > > > ==
> > >
> > > That last value means that I'm munging the upper 32 bits of the tick
> > > register (it's 64 bits long). So we're not quite there yet, but
> > > getting closer. I should be able to get to that today.
> > >
> > > The other problem is very odd. Since you're compiling in 32bit mode,
> > > I'd expect us to see it on our PowerPC machines, but I haven't run into
> > > that one yet. I'll try to compile without debugging and see what I can
> > > see.
> > >
> > >
> > > Brian

Here's where the SegFault comes from. For whatever reason, when working with the verbose opal_output_stream, eventually opal_paffinity_base_open sets opal_paffinity_base_output=-1 (at paffinity_base_open.c, 62) and calls mca_base_components_open with that value as the output_id.

In turn, if output_id!=0, mca_base_components_open calls:
=
if (output_id != 0) {
    opal_output_set_verbosity(output_id, verbose_level);
}
==

Now, opal_output_set_verbosity (in opal/util/output.c) unconditionally does this:

info[output_id].ldi_verbose_level = level;

(where, for verbose, this is info[-1].ldi_verbose_level=0;)

On my system, this wipes out verbose itself. Elsewhere in output.c, such constructs are bracketed with

if(output_id >= 0) { ... }

(or if(-1 == output_id) {...}), and I suspect that is needed here, too.

Hope this helps,
Ferris
-- 
Ferris McCormick (P44646, MI)
Developer, Gentoo Linux (Sparc, Devrel)
Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux (solved?) (fwd)
Thanks muchly for tracking this down! I'm working on the fixes right now; will commit shortly.

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
Re: [O-MPI devel] [PATCH] Update Open MPI for new libibverbs API
Brian> It's even more annoying to be deluged with SPAM ;). We
Brian> (the LAM developers) used to try to keep our mailing lists
Brian> as open as possible. In the end, SPAM pushed the signal to
Brian> noise ratio way too high and something had to be done.
Brian> Requiring subscriptions to post was the best we could do.

I understand that you have limited resources to administer your mailing list, but certainly lists like openib-general and linux-kernel show that it is possible to run lists with low levels of spam and still allow posting by anyone. In general, if I have to subscribe to a list just to send a bug fix to a project, I'm quite likely to forget about it. So you are definitely missing out on contributions by closing your lists.

Brian> I'll admit my ignorance - is this part of a particular
Brian> release of OpenIB, or is this something that has happened
Brian> recently in SVN? I ask because we already have people
Brian> using OpenIB and Open MPI together, and it would be bad to
Brian> suddenly break things for them. Testing for number of
Brian> arguments in a function is horribly unreliable - is there
Brian> some version number or other key in the Open IB headers we
Brian> can use to determine which version of the function to use?

OpenIB has not done an "official" release of any userspace components, so this falls into the category of prerelease API breakage. New kernels will require a new libibverbs, so the number of obsolete old development versions should decrease fairly quickly.

- R.
[O-MPI devel] Back to 32bit on 64bit machines...
So is this an error or am I configuring wrong?

Here's my configure:

[sparkplug]~/ompi > ./configure CFLAGS=-m32 FFLAGS=-m32 CXXFLAGS=-m32 --without-threads --prefix=/home/ndebard/local/ompi --with-devel-headers --without-gm

I've also tried adding --build=i586-suse-linux, that didn't help either.

Basically the compile eventually ends here:

g++ -DHAVE_CONFIG_H -I. -I. -I../../../include -I../../../include -I../../../include -I../../.. -I../../.. -I../../../include -I../../../opal -I../../../orte -I../../../ompi -m32 -g -Wall -Wundef -Wno-long-long -finline-functions -MT comm.lo -MD -MP -MF .deps/comm.Tpo -c comm.cc -fPIC -DPIC -o .libs/comm.o
/bin/sh ../../../libtool --mode=link g++ -m32 -g -Wall -Wundef -Wno-long-long -finline-functions -export-dynamic -o libmpi_cxx.la -rpath /home/ndebard/local/ompi/lib mpicxx.lo intercepts.lo comm.lo -lm -lutil -lnsl
g++ -shared -nostdlib /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib/crti.o /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32/crtbeginS.o .libs/mpicxx.o .libs/intercepts.o .libs/comm.o -lutil -lnsl -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32 -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib/../lib -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -L/lib/../lib -L/usr/lib/../lib /usr/lib64/libstdc++.so -lm -lc -lgcc_s_32 /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32/crtendS.o /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib/crtn.o -m32 -Wl,-soname -Wl,libmpi_cxx.so.0 -o .libs/libmpi_cxx.so.0.0.0
/usr/lib64/libstdc++.so: could not read symbols: Invalid operation
collect2: ld returned 1 exit status
make[3]: *** [libmpi_cxx.la] Error 1
make[3]: Leaving directory `/home/ndebard/ompi/ompi/mpi/cxx'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/ndebard/ompi/ompi/mpi'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/ndebard/ompi/ompi'
make: *** [all-recursive] Error 1
[sparkplug]~/ompi >

I'm having problems I think might be 64bit related and want to prove it by building in 32bit mode.

Oh, here's some basics if it helps.

[sparkplug]~/ompi > cat /etc/issue

Welcome to SuSE Linux 9.1 (x86-64) - Kernel \r (\l).

[sparkplug]~/ompi > uname -a
Linux sparkplug 2.6.10 #4 SMP Wed Jan 26 11:50:00 MST 2005 x86_64 x86_64 x86_64 GNU/Linux
[sparkplug]~/ompi >

-- 
-- Nathan
Correspondence
-
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
-
Re: [O-MPI devel] Back to 32bit on 64bit machines...
This looks like it *might* be a libtool problem -- it's picking up the /usr/lib64/libstdc++.so when you're compiling in 32 bit mode (and therefore barfing).

Can you send the libtool command that immediately preceded this link line?

As a workaround, you should be able to --disable-cxx to disable the MPI C++ bindings, and therefore skip building in this tree.

Ralf -- any thoughts?

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
Re: [O-MPI devel] Back to 32bit on 64bit machines...
Hi Nathan, Jeff,

* Jeff Squyres wrote on Tue, Sep 27, 2005 at 09:39:59PM CEST:
> This looks like it *might* be a libtool problem -- it's picking up the
> /usr/lib64/libstdc++.so when you're compiling in 32 bit mode (and
> therefore barfing).

Yep, I think it is.

> Can you send the libtool command that immediately preceded this link
> line?
>
> As a workaround, you should be able to --disable-cxx to disable the MPI
> C++ bindings, and therefore skip building in this tree.

Other, better-suited workarounds: either

- remove the 64bit paths from compiler_lib_search_path and sys_lib_search_path_spec in the generated libtool script(s) (note these variables are set both at the very beginning, and at the very end, once for each source file language), or

- link with "LDFLAGS=-L/usr/lib", if /usr/lib is where your 32-bit libstdc++.so is located.

We're not really sure yet how to fix this for all distributions.

Sorry for the inconvenience,
Ralf
[O-MPI devel] bproc question
Hi,

Trying to install ompi on a bproc machine with no network filesystem. I've copied the contents of the ompi lib directory into /tmp/ompi on each node and set my LD_LIBRARY_PATH to /tmp/ompi. However when I run the program, I get the following error. Any suggestions on what else I need to do?

Thanks,

Greg

[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file orte_init_stage1.c at line 191
[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file orte_system_init.c at line 39
[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file orte_init.c at line 47
--
Sorry! You were supposed to get help about:
orted:init-failure
from the file:
help-orted.txt
But I couldn't find any file matching that name. Sorry!
--
--
A daemon (pid 31161) launched by the bproc PLS component on node 0 died unexpectedly so we are aborting.

This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
--
[bluesteel.lanl.gov:31160] [0,0,0] ORTE_ERROR_LOG: Error in file pls_bproc.c at line 870
Re: [O-MPI devel] bproc question
Hi Greg,

* Greg Watson wrote on Tue, Sep 27, 2005 at 10:27:22PM CEST:
>
> Trying to install ompi on a bproc machine with no network filesystem.
> I've copied the contents of the ompi lib directory into /tmp/ompi on
> each node and set my LD_LIBRARY_PATH to /tmp/ompi. However when I run
> the program, I get the following error. Any suggestions on what else
> I need to do?

[ Disclaimer: I don't know much about bproc, so I don't know if this applies here ]

You could try to configure --prefix=/tmp/ompi and then just make install there?

Cheers,
Ralf

> [n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file
> orte_init_stage1.c at line 191
> [n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file
> orte_system_init.c at line 39
> [n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file orte_init.c at
> line 47
Re: [O-MPI devel] bproc question
This exact problem came up in a different context today. This is only a side-effect of us having crummy error messages. :-(

What is happening is that OMPI is not finding its components. Specifically, it's looking for the SDS components in this case, not finding them, and then barfing.

Open MPI, by default, looks in $prefix/lib/openmpi and $HOME/.openmpi/components for its components. This is set with the "mca_component_path" MCA parameter -- you can certainly change it to be whatever you need. For example:

-
[15:26] odin:~/svn/ompi/ompi/runtime % ompi_info --param mca all
[snipped]
MCA mca: parameter "mca_component_path" (current value: "/u/jsquyres/bogus/lib/openmpi:/u/jsquyres/.openmpi/components")
Path where to look for Open MPI and ORTE components
[snipped]
-

So you should be able to:

orterun --mca mca_component_path /path/where/you/have/them ...

Disclaimer: this *used* to work, but I haven't tried it in a long time. There's no reason that it shouldn't work, but we all know how bit rot happens...

However, be aware that the wrapper compilers are still hard-coded to look in $prefix/lib to link the OMPI/ORTE/OPAL libraries. You can override that stuff with environment variables if you need to, but it's not desirable.

Sidenote: in LAM, we had a single, top-level environment variable named LAMHOME that would override all this stuff. However, we found that it *really* confused most users -- there were very, very few instances where there was a genuine need for it. So we didn't add a single, top-level control like that in OMPI.

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
Re: [O-MPI devel] bproc question
Yes!!! It worked:

I have the components installed in /home/gwatson/ompi_install/lib/openmpi on the front end and in /tmp/ompi/openmpi on the nodes. The two things I need to do are:

1. Set my LD_LIBRARY_PATH to /home/gwatson/ompi_install/lib:/tmp/ompi so that it picks up the shared libraries on the front end and on the nodes.

2. Use the following command to run my program 'x':

orterun --mca mca_component_path /home/gwatson/ompi_install/lib/openmpi:/tmp/ompi/openmpi -np 2 ./x

Cheers,

Greg