Sorry for the delay -- I just replied on the users list. I think you need to use MPI_INIT_THREAD with MPI_THREAD_MULTIPLE. See if that helps.

On Oct 26, 2011, at 7:19 AM, Pedro Gonnet wrote:

> Hi all,
>
> I'm forwarding this message from the "users" mailing list as it wasn't
> getting any attention there and I believe this is a bona fide bug.
>
> The issue is that if an MPI node has two threads, one exchanging data
> with other nodes through the non-blocking routines, the other exchanging
> data with MPI_Allreduce, the system hangs.
>
> The attached example program reproduces this bug. It can be compiled and
> run using the following:
>
>     mpicc -g -Wall mpitest.c -pthread
>     mpirun -np 8 xterm -e gdb -ex run ./a.out
>
> Note that you may need to fiddle with the delay in line 146 to reproduce
> the problem.
>
> Many thanks,
> Pedro
>
> -------- Forwarded Message --------
> From: Pedro Gonnet <gon...@maths.ox.ac.uk>
> To: users <us...@open-mpi.org>
> Subject: Re: Troubles using MPI_Isend/MPI_Irecv/MPI_Waitany and MPI_Allreduce
> Date: Sun, 23 Oct 2011 18:11:50 +0100
>
> Hi again,
>
> As promised, I implemented a small program reproducing the error.
>
> The program's main routine spawns a pthread which calls the function
> "exchange". "exchange" uses MPI_Isend/MPI_Irecv/MPI_Waitany to exchange
> a buffer of double-precision numbers with all other nodes.
>
> At the same time, the "main" routine exchanges the sum of all the
> buffers using MPI_Allreduce.
>
> To compile and run the program, do the following:
>
>     mpicc -g -Wall mpitest.c -pthread
>     mpirun -np 8 ./a.out
>
> Timing is, of course, of the essence and you may have to run the program
> a few times or twiddle with the value of "usleep" in line 146 for it to
> hang. To see where things go bad, you can do the following:
>
>     mpirun -np 8 xterm -e gdb -ex run ./a.out
>
> Things go bad when MPI_Allreduce is called while any of the threads are
> in MPI_Waitany. The value of "usleep" in line 146 should be long enough
> for all the nodes to have started exchanging data but small enough so
> that they are not done yet.
>
> Cheers,
> Pedro
>
> On Thu, 2011-10-20 at 11:25 +0100, Pedro Gonnet wrote:
>> Short update:
>>
>> I just installed version 1.4.4 from source (compiled with
>> --enable-mpi-threads), and the problem persists.
>>
>> I should also point out that if, in thread (ii), I wait for the
>> nonblocking communication in thread (i) to finish, nothing bad happens.
>> But this makes the nonblocking communication somewhat pointless.
>>
>> Cheers,
>> Pedro
>>
>> On Thu, 2011-10-20 at 10:42 +0100, Pedro Gonnet wrote:
>>> Hi all,
>>>
>>> I am currently working on a multi-threaded hybrid parallel simulation
>>> which uses both pthreads and OpenMPI. The simulation uses several
>>> pthreads per MPI node.
>>>
>>> My code uses the nonblocking routines MPI_Isend/MPI_Irecv/MPI_Waitany
>>> quite successfully to implement the node-to-node communication. When I
>>> try to interleave other computations during this communication, however,
>>> bad things happen.
>>>
>>> I have two MPI nodes with two threads each: one thread (i) doing the
>>> nonblocking communication and the other (ii) doing other computations.
>>> At some point, the threads (ii) need to exchange data using
>>> MPI_Allreduce, which fails if the first thread (i) has not completed all
>>> the communication, i.e. if thread (i) is still in MPI_Waitany.
>>>
>>> Using the in-place MPI_Allreduce, I get a re-run of this bug:
>>> http://www.open-mpi.org/community/lists/users/2011/09/17432.php. If I
>>> don't use in-place, the call to MPI_Waitany (thread (i)) on one of the
>>> MPI nodes waits forever.
>>>
>>> My guess is that when the thread (ii) calls MPI_Allreduce, it gets
>>> whatever the other node sent with MPI_Isend to thread (i), drops
>>> whatever it should have been getting from the other node's
>>> MPI_Allreduce, and the call to MPI_Waitany hangs.
>>>
>>> Is this a known issue? Is MPI_Allreduce not designed to work alongside
>>> the nonblocking routines? Is there a "safe" variant of MPI_Allreduce I
>>> should be using instead?
>>>
>>> I am using OpenMPI version 1.4.3 (version 1.4.3-1ubuntu3 of the package
>>> openmpi-bin in Ubuntu). Both MPI nodes are run on the same dual-core
>>> computer (Lenovo x201 laptop).
>>>
>>> If you need more information, please do let me know! I'll also try to
>>> cook up a small program reproducing this problem...
>>>
>>> Cheers and kind regards,
>>> Pedro
>
> <mpitest.c>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/