Re: [OMPI devel] Multi-rail on openib
On Tue, Jun 09, 2009 at 04:33:51PM +0300, Pavel Shamis (Pasha) wrote:
>> Open MPI currently needs to have connected fabrics, but maybe that's
>> something we would like to change in the future, having two separate
>> rails. (Btw Pasha, will your current work enable this?)
>
> I do not completely understand what you mean here by two separate rails...
> Already today you may connect each port to a different subnet, and ports
> in the same subnet may talk to each other.

Subnet? (subnet vs. fabric) Does this imply TCP/IP? Which IB protocols are involved, and is there any agent that notices the disconnect and will trigger the switch?

--
Tom Mitchell
Found me a new hat, now what?
Re: [OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")
The firewall issue should already be solved. Basically, you have to define a set of ports in your firewall that will let TCP messages pass through, and then tell OMPI to use those ports for both the TCP BTL and the OOB.

"ompi_info --params btl tcp" will tell you the right param to set for the TCP BTL.
"ompi_info --params oob tcp" will do the same for the OOB.

Of course, that -does- leave a hole in your firewall that any TCP message can exploit. :-/ You could look at more secure alternatives.

I'm not sure how to solve the NAT problem, as it boils down to how to specify the names/IP addresses of the nodes behind the NAT. Someone who understands NATs better than I do can help you there - I know there is a way to do it, but I've never played with it.

Ralph

On Jun 12, 2009, at 11:00 AM, Leo P. wrote:

Thank you Ralph and Samuel. Sorry for the complete newbie question. The reason I want to study Open MPI is that I want to make Open MPI support nodes that are behind a NAT or firewall. If you could give me some pointers on how to go about doing this, I would appreciate it a lot. I am considering this for my thesis project.

Sincerely,
LEO

From: Ralph Castain
To: Open MPI Developers
Sent: Friday, 12 June, 2009 9:56:16 PM
Subject: Re: [OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")

If you do a "./configure --help" you will get a complete list of the configure options. You may want to turn on more things than just enable-debug, though that is the critical first step.

On Jun 12, 2009, at 8:31 AM, Samuel K. Gutierrez wrote:

Hi, Let me begin by stating that I'm at most an Open MPI novice - but you may want to try adding the --enable-debug configure option. That is, for example: ./configure --enable-debug; make

Hope this helps.
Samuel K. Gutierrez

On Jun 12, 2009, at 3:27 AM, Leo P. wrote:

Hi everyone, I am trying to understand the Open MPI code, so I was trying to enable debugging and profiling by issuing

  $ make "CFLAGS=-pg -g"

But I am getting this error:

libtool: link: ( cd ".libs" && rm -f "mca_paffinity_linux.la" && ln -s "../mca_paffinity_linux.la" "mca_paffinity_linux.la" )
make[3]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
make[2]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
Making all in tools/wrappers
make[2]: Entering directory `/home/Desktop/openmpi-1.3.2/opal/tools/wrappers'
depbase=`echo opal_wrapper.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
gcc "-DEXEEXT=\"\"" -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../.. -pg -g -MT opal_wrapper.o -MD -MP -MF $depbase.Tpo -c -o opal_wrapper.o opal_wrapper.c &&\
mv -f $depbase.Tpo $depbase.Po
/bin/bash ../../../libtool --tag=CC --mode=link gcc -pg -g -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lnsl -lutil -lm
libtool: link: gcc -pg -g -o .libs/opal_wrapper opal_wrapper.o -Wl,--export-dynamic ../../../opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_key_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_getspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_atfork'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_setspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_join'
collect2: ld returned 1 exit status
make[2]: *** [opal_wrapper] Error 1
make[2]: Leaving directory `/home//Desktop/openmpi-1.3.2/opal/tools/wrappers'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal'
make: *** [all-recursive] Error 1

Is there any other way of enabling debugging and profiling in Open MPI?

Leo

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
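Ralph's port-range recipe above can be sketched as a short shell session. This is a hedged sketch, not a verified recipe: the MCA parameter names (btl_tcp_port_min_v4, btl_tcp_port_range_v4, and their oob_tcp counterparts) are assumptions based on the 1.3-series TCP components, and the port numbers, hostfile, and application name are made up; confirm the exact parameter names with ompi_info on your own build, as Ralph suggests.

```shell
# Discover the exact port-related parameter names your build supports:
ompi_info --params btl tcp
ompi_info --params oob tcp

# Then pin both the TCP BTL and the OOB to fixed ranges, and open the
# same ranges in the firewall. Parameter names below are assumed from the
# 1.3-series components; ports 10400-10599 are arbitrary examples.
mpirun --mca btl_tcp_port_min_v4 10500 \
       --mca btl_tcp_port_range_v4 100 \
       --mca oob_tcp_port_min_v4 10400 \
       --mca oob_tcp_port_range_v4 100 \
       -np 4 --hostfile hosts ./my_mpi_app
```

As the thread notes, this only narrows the firewall hole; it does not authenticate the traffic passing through it.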
Re: [OMPI devel] Hang in collectives involving shared memory
Sylvain Jeaugey wrote:
> Hi Ralph, I managed to get a deadlock after a whole night, but not the
> same one you have: after a quick analysis, process 0 seems to be blocked
> in the very first send through shared memory. Still maybe a bug, but not
> the same as yours, IMO.

Yes, that's the one Terry and I have tried to hunt down. Kind of elusive.

Apparently, there is a race condition in sm start-up. It *appears* as though a process (the lowest rank on a node?) computes offsets into shared memory using bad values and ends up with a FIFO pointer to the wrong spot. Up through 1.3.1, this meant that OMPI would fail in add_procs()... Jeff and Terry have seen a couple of these. With the changes to sm in 1.3.2, the failure expresses itself differently: not until the first send (namely, the first use of a remote FIFO). At least that's my understanding. George added some synchronization to the code to make it bulletproof, but that doesn't seem to have fixed the problem. Sigh.

Anyhow, I think you ran into a different problem, one that is known but not yet understood.
Re: [OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")
Thank you Ralph and Samuel. Sorry for the complete newbie question. The reason I want to study Open MPI is that I want to make Open MPI support nodes that are behind a NAT or firewall. If you could give me some pointers on how to go about doing this, I would appreciate it a lot. I am considering this for my thesis project.

Sincerely,
LEO

From: Ralph Castain
To: Open MPI Developers
Sent: Friday, 12 June, 2009 9:56:16 PM
Subject: Re: [OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")

If you do a "./configure --help" you will get a complete list of the configure options. You may want to turn on more things than just enable-debug, though that is the critical first step.

On Jun 12, 2009, at 8:31 AM, Samuel K. Gutierrez wrote:

Hi, Let me begin by stating that I'm at most an Open MPI novice - but you may want to try adding the --enable-debug configure option. That is, for example: ./configure --enable-debug; make

Hope this helps.
Samuel K. Gutierrez

On Jun 12, 2009, at 3:27 AM, Leo P. wrote:

Hi everyone, I am trying to understand the Open MPI code, so I was trying to enable debugging and profiling by issuing

  $ make "CFLAGS=-pg -g"

But I am getting this error:

libtool: link: ( cd ".libs" && rm -f "mca_paffinity_linux.la" && ln -s "../mca_paffinity_linux.la" "mca_paffinity_linux.la" )
make[3]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
make[2]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
Making all in tools/wrappers
make[2]: Entering directory `/home/Desktop/openmpi-1.3.2/opal/tools/wrappers'
depbase=`echo opal_wrapper.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
gcc "-DEXEEXT=\"\"" -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../.. -pg -g -MT opal_wrapper.o -MD -MP -MF $depbase.Tpo -c -o opal_wrapper.o opal_wrapper.c &&\
mv -f $depbase.Tpo $depbase.Po
/bin/bash ../../../libtool --tag=CC --mode=link gcc -pg -g -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lnsl -lutil -lm
libtool: link: gcc -pg -g -o .libs/opal_wrapper opal_wrapper.o -Wl,--export-dynamic ../../../opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_key_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_getspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_atfork'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_setspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_join'
collect2: ld returned 1 exit status
make[2]: *** [opal_wrapper] Error 1
make[2]: Leaving directory `/home//Desktop/openmpi-1.3.2/opal/tools/wrappers'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal'
make: *** [all-recursive] Error 1

Is there any other way of enabling debugging and profiling in Open MPI?

Leo
Re: [OMPI devel] Hang in collectives involving shared memory
Hi Ralph,

I managed to get a deadlock after a whole night, but not the same one you have: after a quick analysis, process 0 seems to be blocked in the very first send through shared memory. Still maybe a bug, but not the same as yours, IMO.

I also figured out that libnuma support was not in my library, so I rebuilt the lib, and this doesn't seem to change anything: same execution speed, same memory footprint, and of course same the-bug-does-not-appear :-(.

So, no luck so far in reproducing your problem. I guess you're the only one able to make progress on this (since you seem to have a real reproducer).

Sylvain

On Wed, 10 Jun 2009, Sylvain Jeaugey wrote:

Hum, very glad that padb works with Open MPI; I couldn't live without it. In my opinion, it is the best debug tool for parallel applications and, more importantly, the only one that scales.

About the issue, I couldn't reproduce it on my platform (tried 2 nodes with 2 to 8 processes each; nodes are twin 2.93 GHz Nehalems, IB is Mellanox QDR). So my feeling is that it may be very hardware related. Especially if you use the hierarch component, some transactions will be done through RDMA on one side and read directly through shared memory on the other side, which can, depending on the hardware, produce very different timings and bugs. Did you try with a different collective component (i.e. not hierarch)? Or with another interconnect? [Yes, of course, if it is a race condition, we might well avoid the bug because timings will be different, but that's still information.]

Perhaps all I'm saying makes no sense or you already thought about this; anyway, if you want me to try different things, just let me know.

Sylvain

On Wed, 10 Jun 2009, Ralph Castain wrote:

Hi Ashley

Thanks! I would definitely be interested and will look at the tool. Meantime, I have filed a bunch of data on this in ticket #1944, so perhaps you might take a glance at that and offer some thoughts?

https://svn.open-mpi.org/trac/ompi/ticket/1944

Will be back after I look at the tool. Thanks again
Ralph

On Wed, Jun 10, 2009 at 8:51 AM, Ashley Pittman wrote:

Ralph,

If I may say, this is exactly the type of problem the tool I have been working on recently aims to help with, and I'd be happy to help you through it.

Firstly, of the three collectives you mention, MPI_Allgather, MPI_Reduce and MPI_Bcast, one exhibits a many-to-many, one a many-to-one, and the last a one-to-many communication pattern. The scenario of a root process falling behind and getting swamped in comms is a plausible one for MPI_Reduce only, but doesn't hold water for the other two. You also don't mention whether the loop is over a single collective or whether you have a loop calling a number of different collectives each iteration.

padb, the tool I've been working on, has the ability to look at parallel jobs and report on the state of collective comms, and should help you narrow down on erroneous processes versus those simply blocked waiting for comms. I'd recommend using it to look at maybe four or five instances where the application has hung and look for any common features between them.

Let me know if you are willing to try this route and we'll talk. The code is downloadable from http://padb.pittman.org.uk and if you want the full collective functionality you'll need to patch Open MPI with the patch from http://padb.pittman.org.uk/extensions.html

Ashley,

--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
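A session along the lines Ashley describes might look like the sketch below. The option names are assumptions recalled from padb documentation of that era and may differ between padb versions, and the job id is invented; check `padb --help` and the site above for the real interface.

```shell
# List the parallel jobs padb can see on this cluster (option name assumed):
padb --show-jobs

# For a hung job, collect stack traces across all processes, merged into a
# tree so that ranks stuck in a collective stand out from ranks still doing
# work. "1234" is a made-up job id taken from the listing above.
padb --all --stack-trace --tree
padb --stack-trace --tree 1234
```

Comparing several such snapshots of the same hang, as Ashley suggests, separates processes that are genuinely wedged from those merely waiting on a slow peer.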
Re: [OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")
If you do a "./configure --help" you will get a complete list of the configure options. You may want to turn on more things than just enable-debug, though that is the critical first step.

On Jun 12, 2009, at 8:31 AM, Samuel K. Gutierrez wrote:

Hi, Let me begin by stating that I'm at most an Open MPI novice - but you may want to try adding the --enable-debug configure option. That is, for example: ./configure --enable-debug; make

Hope this helps.
Samuel K. Gutierrez

On Jun 12, 2009, at 3:27 AM, Leo P. wrote:

Hi everyone, I am trying to understand the Open MPI code, so I was trying to enable debugging and profiling by issuing

  $ make "CFLAGS=-pg -g"

But I am getting this error:

libtool: link: ( cd ".libs" && rm -f "mca_paffinity_linux.la" && ln -s "../mca_paffinity_linux.la" "mca_paffinity_linux.la" )
make[3]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
make[2]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
Making all in tools/wrappers
make[2]: Entering directory `/home/Desktop/openmpi-1.3.2/opal/tools/wrappers'
depbase=`echo opal_wrapper.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
gcc "-DEXEEXT=\"\"" -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../.. -pg -g -MT opal_wrapper.o -MD -MP -MF $depbase.Tpo -c -o opal_wrapper.o opal_wrapper.c &&\
mv -f $depbase.Tpo $depbase.Po
/bin/bash ../../../libtool --tag=CC --mode=link gcc -pg -g -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lnsl -lutil -lm
libtool: link: gcc -pg -g -o .libs/opal_wrapper opal_wrapper.o -Wl,--export-dynamic ../../../opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_key_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_getspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_atfork'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_setspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_join'
collect2: ld returned 1 exit status
make[2]: *** [opal_wrapper] Error 1
make[2]: Leaving directory `/home//Desktop/openmpi-1.3.2/opal/tools/wrappers'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal'
make: *** [all-recursive] Error 1

Is there any other way of enabling debugging and profiling in Open MPI?

Leo
Re: [OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")
Hi,

Let me begin by stating that I'm at most an Open MPI novice - but you may want to try adding the --enable-debug configure option. That is, for example:

  ./configure --enable-debug; make

Hope this helps.

Samuel K. Gutierrez

On Jun 12, 2009, at 3:27 AM, Leo P. wrote:

Hi everyone, I am trying to understand the Open MPI code, so I was trying to enable debugging and profiling by issuing

  $ make "CFLAGS=-pg -g"

But I am getting this error:

libtool: link: ( cd ".libs" && rm -f "mca_paffinity_linux.la" && ln -s "../mca_paffinity_linux.la" "mca_paffinity_linux.la" )
make[3]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
make[2]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
Making all in tools/wrappers
make[2]: Entering directory `/home/Desktop/openmpi-1.3.2/opal/tools/wrappers'
depbase=`echo opal_wrapper.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
gcc "-DEXEEXT=\"\"" -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../.. -pg -g -MT opal_wrapper.o -MD -MP -MF $depbase.Tpo -c -o opal_wrapper.o opal_wrapper.c &&\
mv -f $depbase.Tpo $depbase.Po
/bin/bash ../../../libtool --tag=CC --mode=link gcc -pg -g -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lnsl -lutil -lm
libtool: link: gcc -pg -g -o .libs/opal_wrapper opal_wrapper.o -Wl,--export-dynamic ../../../opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_key_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_getspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_atfork'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_setspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_join'
collect2: ld returned 1 exit status
make[2]: *** [opal_wrapper] Error 1
make[2]: Leaving directory `/home//Desktop/openmpi-1.3.2/opal/tools/wrappers'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal'
make: *** [all-recursive] Error 1

Is there any other way of enabling debugging and profiling in Open MPI?

Leo
[OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")
Hi everyone,

I am trying to understand the Open MPI code, so I was trying to enable debugging and profiling by issuing

  $ make "CFLAGS=-pg -g"

But I am getting this error:

libtool: link: ( cd ".libs" && rm -f "mca_paffinity_linux.la" && ln -s "../mca_paffinity_linux.la" "mca_paffinity_linux.la" )
make[3]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
make[2]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
Making all in tools/wrappers
make[2]: Entering directory `/home/Desktop/openmpi-1.3.2/opal/tools/wrappers'
depbase=`echo opal_wrapper.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
gcc "-DEXEEXT=\"\"" -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../.. -pg -g -MT opal_wrapper.o -MD -MP -MF $depbase.Tpo -c -o opal_wrapper.o opal_wrapper.c &&\
mv -f $depbase.Tpo $depbase.Po
/bin/bash ../../../libtool --tag=CC --mode=link gcc -pg -g -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lnsl -lutil -lm
libtool: link: gcc -pg -g -o .libs/opal_wrapper opal_wrapper.o -Wl,--export-dynamic ../../../opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_key_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_getspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_atfork'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_setspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_join'
collect2: ld returned 1 exit status
make[2]: *** [opal_wrapper] Error 1
make[2]: Leaving directory `/home//Desktop/openmpi-1.3.2/opal/tools/wrappers'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal'
make: *** [all-recursive] Error 1

Is there any other way of enabling debugging and profiling in Open MPI?

Leo
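For what it's worth, the undefined pthread_* references above are most likely because overriding CFLAGS on the make command line replaces the flags configure computed, including the thread flag (e.g. -pthread) that the 1.3-series build puts there. A sketch of getting -pg and -g into the build without losing those flags is below; the source path and install prefix are just examples.

```shell
# Start from a clean tree so everything is rebuilt with the new flags:
cd ~/Desktop/openmpi-1.3.2
make distclean

# Pass the extra flags at configure time instead of make time; configure
# merges user CFLAGS with the thread flags it works out, instead of having
# them replaced wholesale as "make CFLAGS=..." does. --enable-debug also
# turns on -g plus extra debugging code inside Open MPI itself.
./configure --enable-debug CFLAGS="-pg -g" --prefix="$HOME/ompi-debug"
make all install
```

With this, gprof profiles (-pg) and debugger symbols (-g) end up in both the libraries and the wrapper compilers.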