Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr
(moving to devel so that others are aware)

Crud. Can you send me your config.log? I don't know why it's able to find rdma_get_peer_addr() in configure, but then later not able to find it during the build - I'd like to see what happened during configure.

On May 2, 2008, at 7:09 PM, Pak Lui wrote:

Hi Jeff,

It seems that the cpc3 merge causes my Ranger build to break. I believe it is using OFED 1.2 but I don't know how to check. It passes the ompi_check_openib.m4 that you added in for the rdma_get_peer_addr. Is there a missing #include for openib/ofed related somewhere?

checking rdma/rdma_cma.h usability... yes
checking rdma/rdma_cma.h presence... yes
checking for rdma/rdma_cma.h... yes
checking for rdma_create_id in -lrdmacm... yes
checking for rdma_get_peer_addr... yes

pgCC -DHAVE_CONFIG_H -I. -I../../../../ompi/tools/ompi_info -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../opal/mca/paffinity/linux/plpa/src/libplpa -DOMPI_CONFIGURE_USER="\"paklui\"" -DOMPI_CONFIGURE_HOST="\"login4.ranger.tacc.utexas.edu\"" -DOMPI_CONFIGURE_DATE="\"Fri May 2 17:07:01 CDT 2008\"" -DOMPI_BUILD_USER="\"$USER\"" -DOMPI_BUILD_HOST="\"`hostname`\"" -DOMPI_BUILD_DATE="\"`date`\"" -DOMPI_BUILD_CFLAGS="\"-O -DNDEBUG \"" -DOMPI_BUILD_CPPFLAGS="\"-I../../../.. -I../../.. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -D_REENTRANT\"" -DOMPI_BUILD_CXXFLAGS="\"-O -DNDEBUG \"" -DOMPI_BUILD_CXXCPPFLAGS="\"-I../../../.. -I../../.. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -D_REENTRANT\"" -DOMPI_BUILD_FFLAGS="\"\"" -DOMPI_BUILD_FCFLAGS="\"\"" -DOMPI_BUILD_LDFLAGS="\" \"" -DOMPI_BUILD_LIBS="\"-lnsl -lutil -lpthread\"" -DOMPI_CC_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgcc\"" -DOMPI_CXX_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgCC\"" -DOMPI_F77_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgf77\"" -DOMPI_F90_ABSOLUTE="\"/opt/apps/pgi/7.1/linux86-64/7.1-2/bin/pgf95\"" -DOMPI_F90_BUILD_SIZE="\"small\"" -I../../../.. -I../../.. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -D_REENTRANT -O -DNDEBUG -c -o version.o ../../../../ompi/tools/ompi_info/version.cc

/bin/sh ../../../libtool --tag=CXX --mode=link pgCC -O -DNDEBUG -o ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl -lutil -lpthread

libtool: link: pgCC -O -DNDEBUG -o .libs/ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/.libs/libmpi.so -L/opt/ofed/lib64 -libcm -lrdmacm -libverbs -lrt /share/home/00951/paklui/ompi-trunk5/config-data1/orte/.libs/libopen-rte.so /share/home/00951/paklui/ompi-trunk5/config-data1/opal/.libs/libopen-pal.so -lnuma -ldl -lnsl -lutil -lpthread -Wl,--rpath -Wl,/share/home/00951/paklui/ompi-trunk5/shared-install1/lib

[1] Exit 2 make install >& make.install.log.0

../../../ompi/.libs/libmpi.so: undefined reference to `rdma_get_peer_addr'
../../../ompi/.libs/libmpi.so: undefined reference to `rdma_get_local_addr'
make[2]: *** [ompi_info] Error 2
make[2]: Leaving directory `/share/home/00951/paklui/ompi-trunk5/config-data1/ompi/tools/ompi_info'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/share/home/00951/paklui/ompi-trunk5/config-data1/ompi'
make: *** [install-recursive] Error 1

--
- Pak Lui
pak@sun.com

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] OMPI Mercurial read-only mirror
Roland Dreier wrote:

> Can I make a /tmp branch from the hg read-only branch that is not tied
> to the svn /tmp branches.

Why do you want to do that? Mercurial is a fully distributed system, so you could just start committing to one of your local copies of the repository, and I can't see anything missing that a /tmp branch would give you.

- R.

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

Same reason you do an SVN tmp branch. So others (outside of my employer's WAN) can actually clone the branch and try it out before you push it back to the repository.

--td
Re: [OMPI devel] OMPI Mercurial read-only mirror
Ralph Castain wrote:

Sure: hg clone http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror my-tmp

I want the tmp to reside on www.open-mpi.org, not in my own directory.

--td

On 5/2/08 9:57 AM, "Terry Dontje" wrote:

Jeff Squyres wrote:

On May 2, 2008, at 11:04 AM, Terry Dontje wrote:

Is there a way to make a hg specific /tmp branch?

I'm not sure what you're asking...?

Can I make a /tmp branch from the hg read-only branch that is not tied to the svn /tmp branches.

--td
Re: [OMPI devel] OMPI Mercurial read-only mirror
On May 3, 2008, at 7:23 AM, Terry Dontje wrote:

Same reason you do an SVN tmp branch. So others (outside of my employer's WAN) can actually clone the branch and try it out before you push it back to the repository.

From my original mail :-)

--
"We're still working on a way for OMPI core members to publish their own HG trees. Ralph and a few others can publish their HG tree now because they were willing to be guinea pigs, but it's not setup for the general case yet -- give us a little more time to get that going."

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr
Sure Jeff, see attached.

--
- Pak Lui
pak@sun.com

config.log.bz2
Description: Binary data
Re: [OMPI devel] MCA component open
The problem: The orteds open all plm components before discarding most of them, all this in a context where "--mca plm rsh" was present on the mpirun invocation.

The non-problem: In the context of the mpirun process, only the rsh plm is opened, as mpirun is the only process that gets the "--mca plm rsh" information. As this specific argument is not included in the list of arguments we forward to the orted processes, there is no way that the orted can abide by the imposed restriction. Note that if the restriction is inserted in the config file, then even the orteds respect it.

So far the only problem I can see here is that the orteds are opening a framework that they are not supposed to (at least not in most cases).

When we implemented the MCA filtering stuff, we proposed another optimization. More specifically, a default component for all special frameworks (i.e. used or not based on the type of process) that would be statically linked inside the library (and therefore would not generate any NFS traffic). Its only goal was to execute the selection logic when any of its functions were called; in other words, an on-demand component loading feature. Starting from there, a real component would be selected, and all other calls to this component would be directed to the selected component.

I perfectly remember that Ralph was completely against this feature for two reasons: 1) all components in the ORTE framework had to be loaded and they will do the "if(!hnp) return NULL"; 2) he proposed to implement the null component. I was and I'm still against 1), so I guess that any effort toward implementing a null or none component will have my support.

george.

On May 2, 2008, at 4:40 PM, Josh Hursey wrote:

We could also call it 'null' for the empty set of components? Or maybe OMPI-NULL.

Outside of the naming, do others think this is a useful feature to implement?

-- Josh

On May 2, 2008, at 10:51 AM, Ralph Castain wrote:

I would think that adding a special keyword would be the correct method. I would suggest something with an "ompi" in it, perhaps capitalized so there is no confusion...something like "OMPI-NONE"?

On 5/2/08 8:37 AM, "Josh Hursey" wrote:

I don't believe we have the logic in place to tell mca_component_open 'do not open anything'. (I could be wrong though). Adding such an option might be useful, but we would have to consider how that option should be specified by the user.

Currently if you do not set a value (leave empty space in mca-params.conf) then the MCA system takes this to indicate that all components are eligible for selection. If you specify any options then only those options should be opened. We could add a special keyword (such as 'none') to indicate 'open nothing'.

What do people think about that?

-- Josh

On May 2, 2008, at 10:22 AM, Ralph Castain wrote:

I see what the problem is. In the case of slurm, I don't want -any- components to be opened, even though I am going to call plm open/select. I have to leave that logic in place for those environments that -do- want to specify some backend secondary launcher.

So the question is: how do I tell mca_component_open "do not open anything"? If we don't have a mechanism for doing that, can we create one?

On 5/2/08 8:02 AM, "Ralph Castain" wrote:

Well, I have a current version of the trunk. I add an MCA param to the environment indicating that only rsh is to be used by the orted. Yet I get an output from every orted indicating that slurm (misspelled!) is available for selection. This tells me that the slurm component is being opened, even though the param is set.

I can check again to ensure that the param is set...

On 5/2/08 7:53 AM, "Jeff Squyres" wrote:

(moving to devel list for wider audience)

Hmm. I thought the UTK stuff from a while ago supposedly changed this behavior to only open the components that were specifically requested. This behavior looks like the *original* MCA behavior -- open them all, then discard what we don't want (but doesn't necessarily reclaim the memory because of how dlclose works).

On May 2, 2008, at 9:48 AM, Ralph Castain wrote:

Yo guys

I've noticed something on the trunk that just doesn't strike me as correct. If I specify "-mca plm rsh", it is my expectation that (a) only the rsh component will be opened, and (b) only the rsh module will be selected, unless that component indicates that it cannot run.

What I am seeing, though, is that -all- the plm components are being opened. This is not only unnecessary, but consumes memory and leads to concern over whether or not some other module could become active.

Is this the intended behavior? If so, may I suggest we change it in Josh's branch prior to bringing it over?

Ralph
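
For readers following the 'none' keyword proposal in this thread, here is a minimal sketch in C of the selection rule as Josh describes it: an empty value means every component is eligible, an explicit list means only the listed components are eligible, and a special keyword means nothing is opened. The function name component_is_eligible and the MCA_NONE_KEYWORD constant are hypothetical and chosen only for illustration; this is not the actual mca_component_open logic, and it handles only a plain comma-separated list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword meaning "open no components at all". */
#define MCA_NONE_KEYWORD "none"

/*
 * Decide whether the component called `name` may be opened, given the
 * value of a framework selection parameter (for example the value
 * passed as "--mca plm <value>").
 *
 *   NULL or ""   -> every component is eligible
 *   "none"       -> no component is eligible
 *   "rsh,slurm"  -> only the listed components are eligible
 */
static bool component_is_eligible(const char *param_value, const char *name)
{
    if (param_value == NULL || param_value[0] == '\0') {
        return true;            /* no restriction given */
    }
    if (strcmp(param_value, MCA_NONE_KEYWORD) == 0) {
        return false;           /* explicit "open nothing" */
    }

    /* Walk the comma-separated list and look for an exact name match. */
    size_t name_len = strlen(name);
    const char *p = param_value;
    while (*p != '\0') {
        size_t tok_len = strcspn(p, ",");
        if (tok_len == name_len && strncmp(p, name, name_len) == 0) {
            return true;
        }
        p += tok_len;
        if (*p == ',') {
            p++;                /* skip the separator */
        }
    }
    return false;
}
```

One design point the sketch makes visible: a plain keyword such as "none" would collide with any real component of the same name, which is presumably why a more distinctive spelling like "OMPI-NONE" was suggested above.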
Re: [OMPI devel] Build failure on FreeBSD 7
The small commit that Karol originally suggested was just pushed to ompi-trunk. This simply adds the appropriate header files for FreeBSD (6.2, 6.3 and 7) to be able to compile.

https://svn.open-mpi.org/trac/ompi/changeset/18366

This didn't fix the hanging on the kevent call mentioned in this thread; however, setting the environment variable EVENT_NOKQUEUE did provide a work-around. I'm not sure if that is the solution we want for all FreeBSD platforms in the long term (requiring the user to set particular environment variables for particular platforms), but for now at least I can run the MTT tests that I need to (once it gets in a nightly build).

Feel free to contact me if you would care to work together on another solution. Or is it as simple as returning NULL from kq_init if EVENT_NOKQUEUE is set or #if defined(__FreeBSD__) like the minor patch below?

Thanks,
brad

--- opal/event/kqueue.c (revision 18366)
+++ opal/event/kqueue.c (working copy)
@@ -116,7 +116,11 @@
 struct kqop *kqueueop;

 /* Disable kqueue when this environment variable is set */
-if (getenv("EVENT_NOKQUEUE"))
+if (getenv("EVENT_NOKQUEUE")
+#if defined(__FreeBSD__)
+|| 1
+#endif
+ )
 return (NULL);

 if (!(kqueueop = calloc(1, sizeof(struct kqop

On Thu, May 1, 2008 at 9:02 AM, Brad Penoff wrote:
> I believe Karol's patch in the original mail in this thread adds the
> appropriate headers for openpty to be resolved when --enable-picky is
> supplied. Without --enable-picky, it's able to resolve it too, as the
> code is. However, even if it compiles, the call to kevent (line 177
> of opal/event/kqueue.c) still hangs, so this is more of the mystery...
>
> Would giving you access to a FreeBSD 7 machine be useful? Contact me
> off the list, if so and we'll try to sort something out. Or if you
> have any patches/suggestions you'd like to try to fix this, I could
> run them myself and let you know.
>
> Thanks,
> brad
>
>
>
> On Thu, May 1, 2008 at 5:51 AM, Jeff Squyres wrote:
> > George -- did you get to make this fix?
> >
> > What header file is openpty declared in on FreeBSD 7? It should be
> > easy enough to add the right #include to that file.
> >
> >
> >
> > On Apr 29, 2008, at 7:45 PM, Brad Penoff wrote:
> >
> > > hey all,
> > >
> > > I was just configuring MTT to run some multihost tests on FreeBSD 7
> > > and I came across this same error you guys were, using the
> > > openmpi-1.3a1r18325.tar.gz trunk nightly tarball :
> > >
> > > kqueue.c:165: error: implicit declaration of function 'openpty'
> > >
> > > However, this error seems to only come up if I use --enable-picky to
> > > configure. Getting rid of --enable-picky results in a successful
> > > compilation. Any idea why that is? Should this be fixed in the long
> > > term?
> > >
> > > For now, I'm just adjusting my MTT runs to not have --enable-picky in
> > > the ompi_configure_arguments...
> > >
> > > brad
> > >
> > >
> > > 2008/4/11 George Bosilca :
> > >> That's good that you guys revive this thread, I almost forgot about
> > >> it.
> > >>
> > >> The code you're referring to is not part of the libevent. It was one
> > >> of my "fixes" around for problem on OS X (where kevent is not able to
> > >> work nicely with pty). It works on MAC as the code triggers an error
> > >> so there is no need for the timeout ... I'll make the corrections
> > >> over the weekend.
> > >>
> > >> Thanks,
> > >> george.
> > >>
> > >>
> > >>
> > >> On Apr 11, 2008, at 7:39 PM, Karol Mroz wrote:
> > >>
> > >>> Hi, Jeff...
> > >>>
> > >>> This test was performed locally, yes. I'm short on machines at the
> > >>> moment to perform any proper distributed tests.
> > >>>
> > >>> --
> > >>> Karol
> > >>>
> > >>> -----Original Message-----
> > >>> From: Jeff Squyres
> > >>>
> > >>> Date: Fri, 11 Apr 2008 16:36:33
> > >>> To:Open MPI Developers
> > >>> Subject: Re: [OMPI devel] Build failure on FreeBSD 7
> > >>>
> > >>>
> > >>> This may depend on how you ran the app on FreeBSD -- did you run on
> > >>> the localhost only?
> > >>>
> > >>> We have/had a problem when running locally with regards to kevent --
> > >>> I'm not 100% sure if we've fixed it yet. Let me check...
> > >>>
> > >>>
> > >>> On Apr 5, 2008, at 1:53 AM, Karol Mroz wrote:
> > >>>
> > After digging a little deeper, it turns out that the kevent() call in
> > opal/event/kqueue.c:
> > if (kevent(kq, kqueueop->changes, 1, kqueueop->events, NEVENT, NULL) != 1 ||
> > (int)kqueueop->events[0].ident != master ||
> > kqueueop->events[0].flags != EV_ERROR) {
> >
> > seems to hang in freebsd 7. Changing the NULL parameter to, let's say
> > 1000, causes the function to return and print out th