[OMPI users] Bad behavior in Allgatherv when a count is 0
I have found that on rare occasions Allgatherv fails to pass the data to all processes. Given some magical combination of receive counts and displacements, one or more processes are missing some or all of some arrays in their receive buffer. A necessary, but not sufficient, condition seems to be that one of the receive counts is 0. Beyond that I have not figured out any real pattern, but the example program listed below demonstrates the failure. I have tried it on Open MPI versions 1.2.3 and 1.2.4; it fails on both. However, it works fine with version 1.1.2, so the problem must have been introduced since then.

-Ken

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int rank;
  int size;
  int senddata[5], recvdata[100];
  int lengths[3], offsets[3];
  int i, j;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  if (size != 3)
    {
    printf("Need 3 processes.\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
    }

  for (i = 0; i < 100; i++) recvdata[i] = -1;
  for (i = 0; i < 5; i++) senddata[i] = rank*10 + i;

  lengths[0] = 5;  lengths[1] = 0;  lengths[2] = 5;
  offsets[0] = 3;  offsets[1] = 9;  offsets[2] = 10;
  MPI_Allgatherv(senddata, lengths[rank], MPI_INT,
                 recvdata, lengths, offsets, MPI_INT, MPI_COMM_WORLD);

  for (i = 0; i < size; i++)
    {
    for (j = 0; j < lengths[i]; j++)
      {
      if (recvdata[offsets[i]+j] != 10*i + j)
        {
        printf("%d: Got bad data from rank %d, index %d: %d\n",
               rank, i, j, recvdata[offsets[i]+j]);
        break;
        }
      }
    }

  MPI_Finalize();
  return 0;
}

Kenneth Moreland
Sandia National Laboratories
email: kmo...@sandia.gov
phone: (505) 844-8919
fax: (505) 845-0833
Re: [OMPI users] Problems with GATHERV on one process
Excellent. Thanks.

-Ken

> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Jeff Squyres
> Sent: Thursday, December 13, 2007 6:02 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Problems with GATHERV on one process
>
> Correct. Here's the original commit that fixed the problem:
>
> https://svn.open-mpi.org/trac/ompi/changeset/16360
>
> And the commit to the v1.2 branch:
>
> https://svn.open-mpi.org/trac/ompi/changeset/16519
>
>
> On Dec 12, 2007, at 2:43 PM, Moreland, Kenneth wrote:
>
> > Thanks Tim. I've since noticed similar problems with MPI_Allgatherv
> > and MPI_Scatterv. I'm guessing they are all related. Do you happen
> > to know if those are being fixed as well?
> >
> > -Ken
> >
> >> -----Original Message-----
> >> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> >> On Behalf Of Tim Mattox
> >> Sent: Tuesday, December 11, 2007 3:34 PM
> >> To: Open MPI Users
> >> Subject: Re: [OMPI users] Problems with GATHERV on one process
> >>
> >> Hello Ken,
> >> This is a known bug, which is fixed in the upcoming 1.2.5 release.
> >> We expect 1.2.5 to come out very soon. We should have a new release
> >> candidate for 1.2.5 posted by tomorrow.
> >>
> >> See these tickets about the bug if you care to look:
> >> https://svn.open-mpi.org/trac/ompi/ticket/1166
> >> https://svn.open-mpi.org/trac/ompi/ticket/1157
> >>
> >> On Dec 11, 2007 2:48 PM, Moreland, Kenneth <kmo...@sandia.gov> wrote:
> >>> I recently ran into a problem with GATHERV while running some
> >>> randomized tests on my MPI code. The problem seems to occur when
> >>> running MPI_Gatherv with a displacement on a communicator with a
> >>> single process. The code listed below exercises this errant
> >>> behavior. I have tried it on OpenMPI 1.1.2 and 1.2.4.
> >>>
> >>> Granted, this is not a situation that one would normally run into
> >>> in a real application, but I just wanted to check to make sure I
> >>> was not doing anything wrong.
> >>>
> >>> -Ken
> >>>
> >>> #include <mpi.h>
> >>> #include <stdio.h>
> >>> #include <stdlib.h>
> >>>
> >>> int main(int argc, char **argv)
> >>> {
> >>>   int rank;
> >>>   MPI_Comm smallComm;
> >>>   int senddata[4], recvdata[4], length, offset;
> >>>
> >>>   MPI_Init(&argc, &argv);
> >>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>>
> >>>   // Split up into communicators of size 1.
> >>>   MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &smallComm);
> >>>
> >>>   // Now try to do a gatherv.
> >>>   senddata[0] = 5; senddata[1] = 6; senddata[2] = 7; senddata[3] = 8;
> >>>   recvdata[0] = 0; recvdata[1] = 0; recvdata[2] = 0; recvdata[3] = 0;
> >>>   length = 3;
> >>>   offset = 1;
> >>>   MPI_Gatherv(senddata, length, MPI_INT,
> >>>               recvdata, &length, &offset, MPI_INT, 0, smallComm);
> >>>   if (senddata[0] != recvdata[offset])
> >>>     {
> >>>     printf("%d: %d != %d?\n", rank, senddata[0], recvdata[offset]);
> >>>     }
> >>>   else
> >>>     {
> >>>     printf("%d: Everything OK.\n", rank);
> >>>     }
> >>>
> >>>   MPI_Finalize();
> >>>   return 0;
> >>> }
> >>>
> >>> Kenneth Moreland
> >>> Sandia National Laboratories
> >>> email: kmo...@sandia.gov
> >>> phone: (505) 844-8919
> >>> fax: (505) 845-0833
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >> --
> >> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
> >> tmat...@gmail.com || timat...@open-mpi.org
> >> I'm a bright... http://www.the-brights.net/
>
> --
> Jeff Squyres
> Cisco Systems
Re: [OMPI users] Problems with GATHERV on one process
Thanks Tim. I've since noticed similar problems with MPI_Allgatherv and MPI_Scatterv. I'm guessing they are all related. Do you happen to know if those are being fixed as well?

-Ken

> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Tim Mattox
> Sent: Tuesday, December 11, 2007 3:34 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Problems with GATHERV on one process
>
> Hello Ken,
> This is a known bug, which is fixed in the upcoming 1.2.5 release. We
> expect 1.2.5 to come out very soon. We should have a new release
> candidate for 1.2.5 posted by tomorrow.
>
> See these tickets about the bug if you care to look:
> https://svn.open-mpi.org/trac/ompi/ticket/1166
> https://svn.open-mpi.org/trac/ompi/ticket/1157
>
> On Dec 11, 2007 2:48 PM, Moreland, Kenneth <kmo...@sandia.gov> wrote:
> > I recently ran into a problem with GATHERV while running some
> > randomized tests on my MPI code. The problem seems to occur when
> > running MPI_Gatherv with a displacement on a communicator with a
> > single process. The code listed below exercises this errant behavior.
> > I have tried it on OpenMPI 1.1.2 and 1.2.4.
> >
> > Granted, this is not a situation that one would normally run into in
> > a real application, but I just wanted to check to make sure I was not
> > doing anything wrong.
> >
> > -Ken
> >
> > #include <mpi.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> >
> > int main(int argc, char **argv)
> > {
> >   int rank;
> >   MPI_Comm smallComm;
> >   int senddata[4], recvdata[4], length, offset;
> >
> >   MPI_Init(&argc, &argv);
> >   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >
> >   // Split up into communicators of size 1.
> >   MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &smallComm);
> >
> >   // Now try to do a gatherv.
> >   senddata[0] = 5; senddata[1] = 6; senddata[2] = 7; senddata[3] = 8;
> >   recvdata[0] = 0; recvdata[1] = 0; recvdata[2] = 0; recvdata[3] = 0;
> >   length = 3;
> >   offset = 1;
> >   MPI_Gatherv(senddata, length, MPI_INT,
> >               recvdata, &length, &offset, MPI_INT, 0, smallComm);
> >   if (senddata[0] != recvdata[offset])
> >     {
> >     printf("%d: %d != %d?\n", rank, senddata[0], recvdata[offset]);
> >     }
> >   else
> >     {
> >     printf("%d: Everything OK.\n", rank);
> >     }
> >
> >   MPI_Finalize();
> >   return 0;
> > }
> >
> > Kenneth Moreland
> > Sandia National Laboratories
> > email: kmo...@sandia.gov
> > phone: (505) 844-8919
> > fax: (505) 845-0833
>
> --
> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
> tmat...@gmail.com || timat...@open-mpi.org
> I'm a bright... http://www.the-brights.net/
[OMPI users] Problems with GATHERV on one process
I recently ran into a problem with GATHERV while running some randomized tests on my MPI code. The problem seems to occur when running MPI_Gatherv with a displacement on a communicator with a single process. The code listed below exercises this errant behavior. I have tried it on OpenMPI 1.1.2 and 1.2.4.

Granted, this is not a situation that one would normally run into in a real application, but I just wanted to check to make sure I was not doing anything wrong.

-Ken

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int rank;
  MPI_Comm smallComm;
  int senddata[4], recvdata[4], length, offset;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Split up into communicators of size 1.
  MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &smallComm);

  // Now try to do a gatherv.
  senddata[0] = 5; senddata[1] = 6; senddata[2] = 7; senddata[3] = 8;
  recvdata[0] = 0; recvdata[1] = 0; recvdata[2] = 0; recvdata[3] = 0;
  length = 3;
  offset = 1;
  MPI_Gatherv(senddata, length, MPI_INT,
              recvdata, &length, &offset, MPI_INT, 0, smallComm);
  if (senddata[0] != recvdata[offset])
    {
    printf("%d: %d != %d?\n", rank, senddata[0], recvdata[offset]);
    }
  else
    {
    printf("%d: Everything OK.\n", rank);
    }

  MPI_Finalize();
  return 0;
}

Kenneth Moreland
Sandia National Laboratories
email: kmo...@sandia.gov
phone: (505) 844-8919
fax: (505) 845-0833
Re: [OMPI users] MPI_File_set_view rejecting subarray views.
Thanks, Brian. That did the trick.

-Ken

> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Brian Barrett
> Sent: Thursday, July 19, 2007 3:39 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] MPI_File_set_view rejecting subarray views.
>
> On Jul 19, 2007, at 3:24 PM, Moreland, Kenneth wrote:
>
> > I've run into a problem with the file I/O in Open MPI version 1.2.3.
> > It is not possible to call MPI_File_set_view with a datatype created
> > from a subarray. Instead of letting me set a view of this type, it
> > gives an invalid datatype error. I have attached a simple program
> > that demonstrates the problem. In particular, the following sequence
> > of function calls should be supported, but it is not.
> >
> > MPI_Type_create_subarray(3, sizes, subsizes, starts,
> >                          MPI_ORDER_FORTRAN, MPI_BYTE, &view);
> > MPI_File_set_view(fd, 20, MPI_BYTE, view, "native", MPI_INFO_NULL);
> >
> > After poking around in the source code a bit, I discovered that the
> > I/O implementation actually supports the subarray data type, but
> > there is a check that is issuing an error before the underlying I/O
> > layer (ROMIO) has a chance to handle the request.
>
> You need to commit the datatype after calling MPI_Type_create_subarray.
> If you add:
>
>   MPI_Type_commit(&view);
>
> after the Type_create, but before File_set_view, the code will run to
> completion.
>
> Well, the code will then complain about a Barrier after MPI_Finalize
> due to an error in how we shut down when there are files that have
> been opened but not closed (you should also add a call to
> MPI_File_close after the set_view, but I'm assuming it's not there
> because this is a test code). This is something we need to fix, but it
> also signifies a user error.
>
> Brian
>
> --
>   Brian W. Barrett
>   Networking Team, CCS-1
>   Los Alamos National Laboratory