Re: [OMPI devel] Adding a BTL module implementing poll()
How are you linking in the mos_poll() implementation? Is it up in the BTL? If so, you'll need to move it down to the OPAL libevent section. This is because all OPAL things are built before any OMPI things, including some executables (e.g., the opal/tools/wrappers directory). Those link against libopen-pal (i.e., the OPAL library) and don't "see" anything in the upper layers, such as BTLs. Does that help?

On Oct 31, 2010, at 4:54 AM, Alex Margolin wrote:

> Hi,
>
> I'm developing a new module under the BTL framework to utilize existing distributed computing software in our lab. I decided to write a TCP-like interface (implementing socket(), connect(), accept(), send(), recv(), etc.) and then copy and modify the existing BTL TCP module to create my own. I've also given consideration to using LD_PRELOAD, but haven't gotten to it yet.
>
> Currently, I'm having trouble with the poll() syscall. Since I'm using other, "non-Linux" sockets (no valid FD) with my own poll implementation on them, I tried to replace the poll() in opal/event/poll.c with a call to my poll in ompi/mca/btl/... and failed to build Open MPI. Since my poll needs to use the internal data structures of my module, I did the following steps:
>
> 1. Created a symlink to my .h file in opal/event/.
> 2. In poll.c, included my .h file and changed the poll() syscall to call my implementation (same interface).
> 3. In Makefile.am, added my .h file under EXTRA_DIST, my .lo file under libevent_la_DEPENDENCIES, and my module path under ompidir.
> 4. Tried to compile (x64): ./autogen.sh ; ./configure CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64 --prefix /home/alex/huji/mpi/ ; make ; make install
> 5.
> failed miserably:
>
> Making install in tools/wrappers
> make[2]: Entering directory `/home/alex/huji/openmpi-1.4.1/opal/tools/wrappers'
> /bin/sh ../../../libtool --tag=CC --mode=link gcc -O3 -DNDEBUG -m64 -finline-functions -fno-strict-aliasing -pthread -fvisibility=hidden -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lnsl -lutil -lm
> libtool: link: gcc -O3 -DNDEBUG -m64 -finline-functions -fno-strict-aliasing -pthread -fvisibility=hidden -o .libs/opal_wrapper opal_wrapper.o -Wl,--export-dynamic ../../../opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm -pthread -Wl,-rpath -Wl,/home/alex/huji/mpi/lib
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mos_poll'
> collect2: ld returned 1 exit status
> make[2]: *** [opal_wrapper] Error 1
> make[2]: Leaving directory `/home/alex/huji/openmpi-1.4.1/opal/tools/wrappers'
> make[1]: *** [install-recursive] Error 1
> make[1]: Leaving directory `/home/alex/huji/openmpi-1.4.1/opal'
> make: *** [install-recursive] Error 1
>
> Can you please help me build Open MPI with my module, or suggest a better way to do this?
>
> Thanks,
> Alex
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
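For what it's worth, the dispatch Alex describes can also be kept inside a single poll()-compatible wrapper rather than changing the call in opal/event/poll.c to a symbol that lives above OPAL. A minimal sketch of that shape -- `mixed_poll`, `MOS_FD_BASE`, and `mos_socket_ready` are all hypothetical names standing in for the module's real internals (the actual mos_poll() would consult the module's own data structures):

```c
#include <poll.h>

/* Assumed convention: the module's "sockets" are numbered above this
 * value so they can be told apart from real kernel file descriptors. */
#define MOS_FD_BASE 1000000

/* Hypothetical stand-in for the module's internal readiness check. */
static int mos_socket_ready(int fd) { (void)fd; return 0; }

/* poll()-compatible wrapper: real FDs go to the kernel, module
 * "sockets" go to the module-level readiness check. */
int mixed_poll(struct pollfd *fds, nfds_t nfds, int timeout)
{
    int mos_ready = 0;
    nfds_t i;

    /* Check the module sockets first and hide them from the kernel:
     * POSIX poll() ignores entries whose fd is negative. */
    for (i = 0; i < nfds; i++) {
        if (fds[i].fd >= MOS_FD_BASE) {
            if (mos_socket_ready(fds[i].fd))
                mos_ready++;
            fds[i].fd = -fds[i].fd;
        }
    }

    /* Don't block in the kernel if a module socket is already ready. */
    int rc = poll(fds, nfds, mos_ready > 0 ? 0 : timeout);

    /* Restore the hidden fds and fill in their revents. */
    for (i = 0; i < nfds; i++) {
        if (fds[i].fd <= -MOS_FD_BASE) {
            fds[i].fd = -fds[i].fd;
            if (mos_socket_ready(fds[i].fd))
                fds[i].revents = fds[i].events & (POLLIN | POLLOUT);
        }
    }

    return rc < 0 ? rc : rc + mos_ready;
}
```

One caveat with this shape: if no module socket is ready and the kernel poll() blocks for the full timeout, a module-level event cannot wake it up, so a real implementation would likely need to slice the timeout and re-check periodically.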
Re: [OMPI devel] 1.5.x plans
I think bringing the large changes from the trunk via patches in the CMR style is a non-starter, so I am glad that none of the options include this. So I am for any of the options proposed. I would just like the development branch (whether it be v1.5.x or v1.7) to be released more often. The original intention was that it would happen once every month or two. We missed that mark by quite a lot, which only compounds the problem with this particular decision.

So I vote for any of the three options :)

-- Josh

On Oct 30, 2010, at 3:16 PM, Shamis, Pavel wrote:

> IMHO, "B" will require a lot of attention from all developers/vendors, and it may be quite a time-consuming task (BTW, I think there are a couple of openib BTL changes that aren't on the list). So it would probably be good to ask all BTL (or other module/feature) maintainers directly.
>
> Personally, I prefer option C, then A.
>
> My 0.02c
>
> - Pasha
>
> On Oct 26, 2010, at 5:07 PM, Jeff Squyres wrote:
>
>> On the teleconf today, two important topics were discussed about the 1.5.x series:
>>
>> 1. I outlined my plan for a "small" 1.5.1 release. It is intended to fix a small number of compilation and portability issues. Everyone seemed to think that this was an OK idea. I have done some tomfoolery in Trac to re-target a bunch of tickets -- those listed in 1.5.1 are the only ones that I intend to apply to 1.5.1:
>>
>> https://svn.open-mpi.org/trac/ompi/report/15
>>
>> (There's one critical bug that I don't know how to fix -- I'm waiting for feedback from Red Hat before I can continue.)
>>
>> *** Does anyone have any other tickets/bugs that they want/need in a short-term 1.5.1 release?
>>
>> 2. We discussed what to do for 1.5.2. Because 1.5[.0] took so long to release, there's now a sizable divergence between the trunk and the 1.5 branch.
>> The problem is that there are a number of wide-reaching new features on the trunk, some of which may (will) be difficult to bring to the v1.5 branch in a piecemeal fashion, including (but not limited to):
>>
>> - Paffinity changes (including the new hwloc component)
>> - --with-libltdl changes
>> - Ummunotify support
>> - Solaris sysinfo component
>> - Notifier improvements
>> - OPAL_SOS
>> - Common shared memory improvements
>> - Build system improvements
>> - New libevent
>> - BFO PML
>> - Almost all ORTE changes
>> - Bunches of checkpoint/restart mo'betterness (including MPI extensions)
>>
>> There seem to be 3 obvious options for moving forward (all assume that we do 1.5.1 as described above):
>>
>> A. End the 1.5 line (i.e., work towards transitioning it to 1.6), and then re-branch the trunk to be v1.7.
>> B. Sync the trunk to the 1.5 branch en masse. Stabilize that and call it 1.5.2.
>> C. Do the same thing as A, but wait at least 6 months (i.e., give the 1.5 series time to mature).
>>
>> Most people (including me) favored B. Rich was a little concerned that B spent too much time on maintenance/logistics when we could just be moving forward, and therefore favored either A or C.
>>
>> Any opinions from people who weren't there on the call today?
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
[OMPI devel] Question about barrier()
Hi,

I have the following small program, where the rank-0 process sleeps and then all the processes perform a barrier:

    #include "mpi.h"
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        int rank, nprocs;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            sleep(60);
        MPI_Barrier(MPI_COMM_WORLD);
        printf("Hello, world. I am %d of %d\n", rank, nprocs);
        fflush(stdout);
        MPI_Finalize();
        return 0;
    }

When I run this program on two nodes consuming 16 cores, I see that the non-rank-0 processes, which are waiting for the rank-0 process to reach the barrier, are consuming only user time. I was expecting this behavior and have no questions about it. However, if I initialize MPI threads by replacing MPI_Init() with MPI_Init_thread(), I see quite different behavior from this program: while the rank-0 process is sleeping, all non-rank-0 processes seem to be spending time in kernel mode (thus increasing system time) instead of waiting in user mode.
Following is the sar output on the node where the rank-0 process is running:

    Node1> sar 2 10
    Linux 2.6.18-128.1.10.el5-perfctr (Node1)    10/29/2010

    02:33:51 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    02:33:53 PM  all   6.69   0.00    80.88     0.00    0.00  12.44
    02:33:55 PM  all   6.56   0.00    81.00     0.00    0.00  12.44
    02:33:57 PM  all   6.62   0.00    80.89     0.00    0.00  12.49
    02:33:59 PM  all   6.68   0.00    80.89     0.00    0.00  12.43
    02:34:01 PM  all   6.69   0.00    81.00     0.00    0.00  12.31
    02:34:03 PM  all   6.75   0.00    80.76     0.00    0.00  12.49
    02:34:05 PM  all   6.75   0.00    80.82     0.00    0.00  12.43
    02:34:07 PM  all   6.75   0.00    81.19     0.00    0.00  12.06
    02:34:09 PM  all   6.93   0.00    80.64     0.00    0.00  12.43
    02:34:11 PM  all   6.75   0.00    80.81     0.00    0.00  12.44
    Average:     all   6.72   0.00    80.89     0.00    0.00  12.40

And following is the sar output on the second node:

    Node2> sar 2 10
    Linux 2.6.18-128.1.10.el5-perfctr (Node2)    10/29/2010

    02:33:48 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    02:33:50 PM  all   6.37   0.00    93.63     0.00    0.00   0.00
    02:33:52 PM  all   6.19   0.00    93.81     0.00    0.00   0.00
    02:33:54 PM  all   6.31   0.00    93.69     0.00    0.00   0.00
    02:33:56 PM  all   6.50   0.00    93.50     0.00    0.00   0.00
    02:33:58 PM  all   6.81   0.00    93.19     0.00    0.00   0.00
    02:34:00 PM  all   6.56   0.00    93.44     0.00    0.00   0.00
    02:34:02 PM  all   6.50   0.00    93.50     0.00    0.00   0.00
    02:34:04 PM  all   6.50   0.00    93.50     0.00    0.00   0.00
    02:34:06 PM  all   6.56   0.00    93.44     0.00    0.00   0.00
    02:34:08 PM  all   6.68   0.00    93.32     0.00    0.00   0.00
    Average:     all   6.50   0.00    93.50     0.00    0.00   0.00

Can someone please explain the difference in the behavior of the barrier() call when I use MPI_Init() vs. MPI_Init_thread()?

Thanks,
Ananda

Ananda B Mudar, PMP
Senior Technical Architect
Wipro Technologies
Ph: 972 765 8093
ananda.mu...@wipro.com
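Not an authoritative answer, but one plausible mechanism (an assumption on my part, not a confirmed diagnosis): when thread support is requested, the progress loop may end up repeatedly calling into the kernel (e.g., via sched_yield() or a polling system call) instead of spinning purely in user space, and time spent inside a system call is accounted as system time. The accounting effect itself is easy to reproduce without MPI; this small stand-alone demo (all names here are mine, not Open MPI's) spins on sched_yield() and reports how the CPU time was split:

```c
#include <sched.h>
#include <sys/resource.h>

/* Spin on sched_yield() for `iters` iterations, then report how the
 * consumed CPU time was accounted. sched_yield() is a system call, so
 * a loop like this shows up largely as %system in tools like sar --
 * the same signature as in the output above. */
static void yield_spin(long iters, double *user_s, double *sys_s)
{
    for (long i = 0; i < iters; i++)
        sched_yield();

    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    *user_s = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    *sys_s  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}
```

If this is the cause, comparing sar output for runs with the `mpi_yield_when_idle` MCA parameter forced to 0 and 1 might confirm or rule it out.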
Re: [OMPI devel] === CREATE FAILURE (trunk) ===
Sorry for the delay on this -- the issue was quite subtle, and the holiday weekend got in the way. I have a fix that will be committed a little after 6pm US Eastern. It seems to allow a fresh SVN checkout (with my patch applied) to pass "make distcheck". Hopefully we'll finally get a new trunk tarball tonight.

On Oct 31, 2010, at 9:16 PM, MPI Team wrote:

>
> ERROR: Command returned a non-zero exit status (trunk):
>   make distcheck
>
> Start time: Sun Oct 31 21:00:12 EDT 2010
> End time:   Sun Oct 31 21:16:33 EDT 2010
>
> ===
> [... previous lines snipped ...]
> checking for OPAL CXXFLAGS... -pthread
> checking for OPAL CXXFLAGS_PREFIX...
> checking for OPAL LDFLAGS...
> checking for OPAL LIBS... -ldl -Wl,--export-dynamic -lrt -lnsl -lutil -lm -ldl
> checking for OPAL extra include dirs...
> checking for ORTE CPPFLAGS...
> checking for ORTE CXXFLAGS... -pthread
> checking for ORTE CXXFLAGS_PREFIX...
> checking for ORTE CFLAGS... -pthread
> checking for ORTE CFLAGS_PREFIX...
> checking for ORTE LDFLAGS...
> checking for ORTE LIBS... -ldl -Wl,--export-dynamic -lrt -lnsl -lutil -lm -ldl
> checking for ORTE extra include dirs...
> checking for OMPI CPPFLAGS...
> checking for OMPI CFLAGS... -pthread
> checking for OMPI CFLAGS_PREFIX...
> checking for OMPI CXXFLAGS... -pthread
> checking for OMPI CXXFLAGS_PREFIX...
> checking for OMPI FFLAGS... -pthread
> checking for OMPI FFLAGS_PREFIX...
> checking for OMPI FCFLAGS... -pthread
> checking for OMPI FCFLAGS_PREFIX...
> checking for OMPI LDFLAGS...
> checking for OMPI LIBS... -ldl -Wl,--export-dynamic -lrt -lnsl -lutil -lm -ldl
> checking for OMPI extra include dirs...
>
> *** Final output
> configure: creating ./config.status
> config.status: creating ompi/include/ompi/version.h
> config.status: creating orte/include/orte/version.h
> config.status: creating opal/include/opal/version.h
> config.status: creating opal/mca/backtrace/Makefile
> config.status: creating opal/mca/backtrace/printstack/Makefile
> config.status: creating opal/mca/backtrace/execinfo/Makefile
> config.status: creating opal/mca/backtrace/darwin/Makefile
> config.status: creating opal/mca/backtrace/none/Makefile
> config.status: creating opal/mca/carto/Makefile
> config.status: creating opal/mca/carto/auto_detect/Makefile
> config.status: creating opal/mca/carto/file/Makefile
> config.status: creating opal/mca/compress/Makefile
> config.status: creating opal/mca/compress/gzip/Makefile
> config.status: creating opal/mca/compress/bzip/Makefile
> config.status: creating opal/mca/crs/Makefile
> config.status: creating opal/mca/crs/none/Makefile
> config.status: creating opal/mca/crs/self/Makefile
> config.status: creating opal/mca/crs/blcr/Makefile
> config.status: creating opal/mca/event/Makefile
> config.status: creating opal/mca/event/libevent207/Makefile
> config.status: error: cannot find input file: `opal/mca/event/libevent207/libevent/include/event2/event-config.h.in'
> make: *** [distcheck] Error 1
> ===
>
> Your friendly daemon,
> Cyrador
>
> ___
> testing mailing list
> test...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/testing

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
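For context on what the bot is complaining about: config.status needs `event-config.h.in` as an input template, but that file was not included in the distribution tarball, so `make distcheck` (which builds from the tarball) fails even though a normal in-tree build works. The generic shape of such a fix is to list the template for distribution; a hypothetical Automake fragment (the actual Makefile.am layout inside the libevent207 component may well differ):

```makefile
# Hypothetical Makefile.am fragment: ship the configure template so the
# tarball produced by `make dist` contains the input that config.status
# needs to regenerate event-config.h.
EXTRA_DIST = include/event2/event-config.h.in
```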