Re: [OMPI users] MPI_Allgather problem
Looking at the change log for 1.5.1 I see:

- Use memmove (instead of memcpy) when necessary (e.g., source and
  destination overlap).

It seems as though this might be a likely candidate for a change that might fix my problems if I am indeed using 1.5.3 following the installation of OpenFOAM?

On Fri, Jan 27, 2012 at 10:02 AM, Brett Tully wrote:
> Interesting. In the same set of updates, I installed OpenFOAM from their
> Ubuntu deb package and it claims to ship with openmpi. I just downloaded
> their Third-party source tar and unzipped it to see what version of openmpi
> they are using, and it is 1.5.3. However, when I do man openmpi, or
> ompi_info, I get the same version as before (1.4.3). How do I determine for
> sure what is being included when I compile something using mpicc?
>
> Thanks,
> Brett.
Re: [OMPI users] MPI_Allgather problem
Interesting. In the same set of updates, I installed OpenFOAM from their Ubuntu deb package and it claims to ship with openmpi. I just downloaded their Third-party source tar and unzipped it to see what version of openmpi they are using, and it is 1.5.3. However, when I do man openmpi, or ompi_info, I get the same version as before (1.4.3). How do I determine for sure what is being included when I compile something using mpicc?

Thanks,
Brett.

On Thu, Jan 26, 2012 at 10:05 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> What version did you upgrade to? (we don't control the Ubuntu packaging)
>
> I see a bullet in the soon-to-be-released 1.4.5 release notes:
>
> - Fix obscure cases where MPI_ALLGATHER could crash. Thanks to Andrew
>   Senin for reporting the problem.
>
> But that would be surprising if this is what fixed your issue, especially
> since it's not released yet. :-)
>
> On Jan 26, 2012, at 5:24 AM, Brett Tully wrote:
>
> > As of two days ago, this problem has disappeared and the tests that I
> > had written and run each night are now passing. Having looked through the
> > update log of my machine (Ubuntu 11.10) it appears as though I got a new
> > version of mpi-default-dev (0.6ubuntu1). I would like to understand this
> > problem in more detail -- is it possible to see what changed in this update?
> > Thanks,
> > Brett.
Re: [OMPI users] MPI_Allgather problem
As of two days ago, this problem has disappeared and the tests that I had written and run each night are now passing. Having looked through the update log of my machine (Ubuntu 11.10) it appears as though I got a new version of mpi-default-dev (0.6ubuntu1). I would like to understand this problem in more detail -- is it possible to see what changed in this update?

Thanks,
Brett.

On Fri, Dec 9, 2011 at 6:43 PM, teng ma <t...@eecs.utk.edu> wrote:
> I guess your output is from different ranks. You can add rank info
> inside the print to tell them apart, as follows:
>
> (void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
>     gathered[i].node);
>
> From my side, I did not see anything wrong with your code in Open MPI
> 1.4.3. After I add the rank, the output is
> rank 5: gathered[0].node = 0
> rank 5: gathered[1].node = 1
> rank 5: gathered[2].node = 2
> rank 5: gathered[3].node = 3
> rank 5: gathered[4].node = 4
> rank 5: gathered[5].node = 5
> rank 3: gathered[0].node = 0
> rank 3: gathered[1].node = 1
> rank 3: gathered[2].node = 2
> rank 3: gathered[3].node = 3
> rank 3: gathered[4].node = 4
> rank 3: gathered[5].node = 5
> rank 1: gathered[0].node = 0
> rank 1: gathered[1].node = 1
> rank 1: gathered[2].node = 2
> rank 1: gathered[3].node = 3
> rank 1: gathered[4].node = 4
> rank 1: gathered[5].node = 5
> rank 0: gathered[0].node = 0
> rank 0: gathered[1].node = 1
> rank 0: gathered[2].node = 2
> rank 0: gathered[3].node = 3
> rank 0: gathered[4].node = 4
> rank 0: gathered[5].node = 5
> rank 4: gathered[0].node = 0
> rank 4: gathered[1].node = 1
> rank 4: gathered[2].node = 2
> rank 4: gathered[3].node = 3
> rank 4: gathered[4].node = 4
> rank 4: gathered[5].node = 5
> rank 2: gathered[0].node = 0
> rank 2: gathered[1].node = 1
> rank 2: gathered[2].node = 2
> rank 2: gathered[3].node = 3
> rank 2: gathered[4].node = 4
> rank 2: gathered[5].node = 5
>
> Is that what you expected?
>
> On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully <brett.tu...@oxyntix.com> wrote:
> > Dear all,
> >
> > I have not used OpenMPI much before, but am maintaining a large legacy
> > application. We noticed a bug to do with a call to MPI_Allgather as
> > summarised in this post to Stackoverflow:
> > http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
> > [...]
>
> --
> | Teng Ma          Univ. of Tennessee |
> | t...@cs.utk.edu  Knoxville, TN      |
> | http://web.eecs.utk.edu/~tma/       |
[OMPI users] MPI_Allgather problem
Dear all,

I have not used OpenMPI much before, but am maintaining a large legacy application. We noticed a bug to do with a call to MPI_Allgather as summarised in this post to Stackoverflow:
http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results

In the process of looking further into the problem, I noticed that the following function results in strange behaviour.

void test_all_gather() {

    struct _TEST_ALL_GATHER {
        int node;
    };

    int ierr, size, rank;
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    struct _TEST_ALL_GATHER local;
    struct _TEST_ALL_GATHER *gathered;

    gathered = (struct _TEST_ALL_GATHER*) malloc(size * sizeof(*gathered));

    local.node = rank;

    MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
        gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE, MPI_COMM_WORLD);

    int i;
    for (i = 0; i < numnodes; ++i) {
        (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
    }

    FREE(gathered);
}

At one point, this function printed the following:
gathered[0].node = 2
gathered[1].node = 3
gathered[2].node = 2
gathered[3].node = 3
gathered[4].node = 4
gathered[5].node = 5

Can anyone suggest a place to start looking into why this might be happening? There is a section of the code that calls MPI_Comm_split, but I am not sure if that is related...

Running on Ubuntu 11.10 and a summary of ompi_info:
Package: Open MPI buildd@allspice Distribution
Open MPI: 1.4.3
Open MPI SVN revision: r23834
Open MPI release date: Oct 05, 2010
Open RTE: 1.4.3
Open RTE SVN revision: r23834
Open RTE release date: Oct 05, 2010
OPAL: 1.4.3
OPAL SVN revision: r23834
OPAL release date: Oct 05, 2010
Ident string: 1.4.3
Prefix: /usr
Configured architecture: x86_64-pc-linux-gnu
Configure host: allspice
Configured by: buildd

Thanks!
Brett