Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread Brett Tully
Looking at the change log for 1.5.1 I see:
- Use memmove (instead of memcpy) when necessary (e.g., source
and destination overlap).

This seems a likely candidate for the change that fixed my problem, if I am
indeed using 1.5.3 following the installation of OpenFOAM?

On Fri, Jan 27, 2012 at 10:02 AM, Brett Tully wrote:

> Interesting. In the same set of updates, I installed OpenFOAM from their
> Ubuntu deb package and it claims to ship with openmpi. I just downloaded
> their Third-party source tar and unzipped it to see what version of openmpi
> they are using, and it is 1.5.3. However, when I do man openmpi, or
> ompi_info, I get the same version as before (1.4.3). How do I determine for
> sure what is being included when I compile something using mpicc?
>
> Thanks,
> Brett.
>
>
>
> On Thu, Jan 26, 2012 at 10:05 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
>
>> What version did you upgrade to?  (we don't control the Ubuntu packaging)
>>
>> I see a bullet in the soon-to-be-released 1.4.5 release notes:
>>
>> - Fix obscure cases where MPI_ALLGATHER could crash.  Thanks to Andrew
>>  Senin for reporting the problem.
>>
>> But that would be surprising if this is what fixed your issue, especially
>> since it's not released yet.  :-)
>>
>>
>>
>> On Jan 26, 2012, at 5:24 AM, Brett Tully wrote:
>>
>> > As of two days ago, this problem has disappeared and the tests that I
>> had written and run each night are now passing. Having looked through the
>> update log of my machine (Ubuntu 11.10) it appears as though I got a new
>> version of mpi-default-dev (0.6ubuntu1). I would like to understand this
>> problem in more detail -- is it possible to see what changed in this update?
>> > Thanks,
>> > Brett.
>> >
>> >
>> >
>> > On Fri, Dec 9, 2011 at 6:43 PM, teng ma <t...@eecs.utk.edu> wrote:
>> > I guess your output is from different ranks. You can add rank info
>> inside the print to tell them apart, as follows:
>> >
>> > (void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
>> gathered[i].node);
>> >
>> > From my side, I did not see anything wrong with your code in Open MPI
>> 1.4.3. After I added the rank, the output is:
>> > rank 5: gathered[0].node = 0
>> > rank 5: gathered[1].node = 1
>> > rank 5: gathered[2].node = 2
>> > rank 5: gathered[3].node = 3
>> > rank 5: gathered[4].node = 4
>> > rank 5: gathered[5].node = 5
>> > rank 3: gathered[0].node = 0
>> > rank 3: gathered[1].node = 1
>> > rank 3: gathered[2].node = 2
>> > rank 3: gathered[3].node = 3
>> > rank 3: gathered[4].node = 4
>> > rank 3: gathered[5].node = 5
>> > rank 1: gathered[0].node = 0
>> > rank 1: gathered[1].node = 1
>> > rank 1: gathered[2].node = 2
>> > rank 1: gathered[3].node = 3
>> > rank 1: gathered[4].node = 4
>> > rank 1: gathered[5].node = 5
>> > rank 0: gathered[0].node = 0
>> > rank 0: gathered[1].node = 1
>> > rank 0: gathered[2].node = 2
>> > rank 0: gathered[3].node = 3
>> > rank 0: gathered[4].node = 4
>> > rank 0: gathered[5].node = 5
>> > rank 4: gathered[0].node = 0
>> > rank 4: gathered[1].node = 1
>> > rank 4: gathered[2].node = 2
>> > rank 4: gathered[3].node = 3
>> > rank 4: gathered[4].node = 4
>> > rank 4: gathered[5].node = 5
>> > rank 2: gathered[0].node = 0
>> > rank 2: gathered[1].node = 1
>> > rank 2: gathered[2].node = 2
>> > rank 2: gathered[3].node = 3
>> > rank 2: gathered[4].node = 4
>> > rank 2: gathered[5].node = 5
>> >
>> > Is that what you expected?
>> >
>> > On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully <brett.tu...@oxyntix.com>
>> wrote:
>> > Dear all,
>> >
>> > I have not used OpenMPI much before, but am maintaining a large legacy
>> application. We noticed a bug to do with a call to MPI_Allgather as
>> summarised in this post to Stackoverflow:
>> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
>> >
>> > In the process of looking further into the problem, I noticed that the
>> following function results in strange behaviour.
>> >
>> > void test_all_gather() {
>> >
>> > struct _TEST_ALL_GATHER {
>> > int node;
>> > };
>> >
>> > int ierr, size, rank;
>> > ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
>> > ierr = MPI_Com

Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread Brett Tully
Interesting. In the same set of updates, I installed OpenFOAM from their
Ubuntu deb package and it claims to ship with openmpi. I just downloaded
their Third-party source tar and unzipped it to see what version of openmpi
they are using, and it is 1.5.3. However, when I do man openmpi, or
ompi_info, I get the same version as before (1.4.3). How do I determine for
sure what is being included when I compile something using mpicc?

Thanks,
Brett.
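
One way to check this, sketched under assumptions: Open MPI's mpicc wrapper accepts --showme, which prints the underlying compiler command line (include and library paths) without compiling anything, and `which mpicc` shows which wrapper is first on the PATH. The small program below additionally prints the version macros from whichever mpi.h the compiler actually picked up. OPEN_MPI, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION and OMPI_RELEASE_VERSION are defined by Open MPI's mpi.h; other MPI implementations will not define them, hence the #ifdef. Note that the macros reflect the headers at compile time, not necessarily the shared library loaded at run time.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int version, subversion;

    MPI_Init(&argc, &argv);

    /* Version of the MPI standard the library reports at run time. */
    MPI_Get_version(&version, &subversion);
    printf("MPI standard: %d.%d\n", version, subversion);

#ifdef OPEN_MPI
    /* Version of the Open MPI headers this file was compiled against. */
    printf("Open MPI headers: %d.%d.%d\n",
           OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);
#endif

    MPI_Finalize();
    return 0;
}
```

Build it with the same mpicc used for the application and run it with the same mpirun. If the printed header version disagrees with what ompi_info reports, two installations are probably being mixed.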


On Thu, Jan 26, 2012 at 10:05 PM, Jeff Squyres <jsquy...@cisco.com> wrote:

> What version did you upgrade to?  (we don't control the Ubuntu packaging)
>
> I see a bullet in the soon-to-be-released 1.4.5 release notes:
>
> - Fix obscure cases where MPI_ALLGATHER could crash.  Thanks to Andrew
>  Senin for reporting the problem.
>
> But that would be surprising if this is what fixed your issue, especially
> since it's not released yet.  :-)
>
>
>
> On Jan 26, 2012, at 5:24 AM, Brett Tully wrote:
>
> > As of two days ago, this problem has disappeared and the tests that I
> had written and run each night are now passing. Having looked through the
> update log of my machine (Ubuntu 11.10) it appears as though I got a new
> version of mpi-default-dev (0.6ubuntu1). I would like to understand this
> problem in more detail -- is it possible to see what changed in this update?
> > Thanks,
> > Brett.
> >
> >
> >
> > On Fri, Dec 9, 2011 at 6:43 PM, teng ma <t...@eecs.utk.edu> wrote:
> > I guess your output is from different ranks. You can add rank info
> inside the print to tell them apart, as follows:
> >
> > (void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
> gathered[i].node);
> >
> > From my side, I did not see anything wrong with your code in Open MPI
> 1.4.3. After I added the rank, the output is:
> > rank 5: gathered[0].node = 0
> > rank 5: gathered[1].node = 1
> > rank 5: gathered[2].node = 2
> > rank 5: gathered[3].node = 3
> > rank 5: gathered[4].node = 4
> > rank 5: gathered[5].node = 5
> > rank 3: gathered[0].node = 0
> > rank 3: gathered[1].node = 1
> > rank 3: gathered[2].node = 2
> > rank 3: gathered[3].node = 3
> > rank 3: gathered[4].node = 4
> > rank 3: gathered[5].node = 5
> > rank 1: gathered[0].node = 0
> > rank 1: gathered[1].node = 1
> > rank 1: gathered[2].node = 2
> > rank 1: gathered[3].node = 3
> > rank 1: gathered[4].node = 4
> > rank 1: gathered[5].node = 5
> > rank 0: gathered[0].node = 0
> > rank 0: gathered[1].node = 1
> > rank 0: gathered[2].node = 2
> > rank 0: gathered[3].node = 3
> > rank 0: gathered[4].node = 4
> > rank 0: gathered[5].node = 5
> > rank 4: gathered[0].node = 0
> > rank 4: gathered[1].node = 1
> > rank 4: gathered[2].node = 2
> > rank 4: gathered[3].node = 3
> > rank 4: gathered[4].node = 4
> > rank 4: gathered[5].node = 5
> > rank 2: gathered[0].node = 0
> > rank 2: gathered[1].node = 1
> > rank 2: gathered[2].node = 2
> > rank 2: gathered[3].node = 3
> > rank 2: gathered[4].node = 4
> > rank 2: gathered[5].node = 5
> >
> > Is that what you expected?
> >
> > On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully <brett.tu...@oxyntix.com>
> wrote:
> > Dear all,
> >
> > I have not used OpenMPI much before, but am maintaining a large legacy
> application. We noticed a bug to do with a call to MPI_Allgather as
> summarised in this post to Stackoverflow:
> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
> >
> > In the process of looking further into the problem, I noticed that the
> following function results in strange behaviour.
> >
> > void test_all_gather() {
> >
> > struct _TEST_ALL_GATHER {
> > int node;
> > };
> >
> > int ierr, size, rank;
> > ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
> > ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >
> > struct _TEST_ALL_GATHER local;
> > struct _TEST_ALL_GATHER *gathered;
> >
> > gathered = (struct _TEST_ALL_GATHER*) malloc(size *
> sizeof(*gathered));
> >
> > local.node = rank;
> >
> > MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
> > gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
> MPI_COMM_WORLD);
> >
> > int i;
> > for (i = 0; i < numnodes; ++i) {
> > (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
> > }
> >
> > FREE(gathered);
> > }
> >
> > At one point, this function printed the following:
> > gathered[0].node 

Re: [OMPI users] MPI_Allgather problem

2012-01-26 Thread Brett Tully
As of two days ago, this problem has disappeared and the tests that I had
written and run each night are now passing. Having looked through the
update log of my machine (Ubuntu 11.10) it appears as though I got a new
version of mpi-default-dev (0.6ubuntu1). I would like to understand this
problem in more detail -- is it possible to see what changed in this update?
Thanks,
Brett.


>
> On Fri, Dec 9, 2011 at 6:43 PM, teng ma <t...@eecs.utk.edu> wrote:
>
>> I guess your output is from different ranks. You can add rank info
>> inside the print to tell them apart, as follows:
>>
>> (void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
>> gathered[i].node);
>>
>> From my side, I did not see anything wrong with your code in Open MPI
>> 1.4.3. After I added the rank, the output is:
>> rank 5: gathered[0].node = 0
>> rank 5: gathered[1].node = 1
>> rank 5: gathered[2].node = 2
>> rank 5: gathered[3].node = 3
>> rank 5: gathered[4].node = 4
>> rank 5: gathered[5].node = 5
>> rank 3: gathered[0].node = 0
>> rank 3: gathered[1].node = 1
>> rank 3: gathered[2].node = 2
>> rank 3: gathered[3].node = 3
>> rank 3: gathered[4].node = 4
>> rank 3: gathered[5].node = 5
>> rank 1: gathered[0].node = 0
>> rank 1: gathered[1].node = 1
>> rank 1: gathered[2].node = 2
>> rank 1: gathered[3].node = 3
>> rank 1: gathered[4].node = 4
>> rank 1: gathered[5].node = 5
>> rank 0: gathered[0].node = 0
>> rank 0: gathered[1].node = 1
>> rank 0: gathered[2].node = 2
>> rank 0: gathered[3].node = 3
>> rank 0: gathered[4].node = 4
>> rank 0: gathered[5].node = 5
>> rank 4: gathered[0].node = 0
>> rank 4: gathered[1].node = 1
>> rank 4: gathered[2].node = 2
>> rank 4: gathered[3].node = 3
>> rank 4: gathered[4].node = 4
>> rank 4: gathered[5].node = 5
>> rank 2: gathered[0].node = 0
>> rank 2: gathered[1].node = 1
>> rank 2: gathered[2].node = 2
>> rank 2: gathered[3].node = 3
>> rank 2: gathered[4].node = 4
>> rank 2: gathered[5].node = 5
>>
>> Is that what you expected?
>>
>> On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully <brett.tu...@oxyntix.com> wrote:
>>
>>> Dear all,
>>>
>>> I have not used OpenMPI much before, but am maintaining a large legacy
>>> application. We noticed a bug to do with a call to MPI_Allgather as
>>> summarised in this post to Stackoverflow:
>>> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
>>>
>>> In the process of looking further into the problem, I noticed that the
>>> following function results in strange behaviour.
>>>
>>> void test_all_gather() {
>>>
>>> struct _TEST_ALL_GATHER {
>>> int node;
>>> };
>>>
>>> int ierr, size, rank;
>>> ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
>>> ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>> struct _TEST_ALL_GATHER local;
>>> struct _TEST_ALL_GATHER *gathered;
>>>
>>> gathered = (struct _TEST_ALL_GATHER*) malloc(size *
>>> sizeof(*gathered));
>>>
>>> local.node = rank;
>>>
>>> MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>>> gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>>> MPI_COMM_WORLD);
>>>
>>> int i;
>>> for (i = 0; i < numnodes; ++i) {
>>> (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
>>> }
>>>
>>> FREE(gathered);
>>> }
>>>
>>> At one point, this function printed the following:
>>> gathered[0].node = 2
>>> gathered[1].node = 3
>>> gathered[2].node = 2
>>> gathered[3].node = 3
>>> gathered[4].node = 4
>>> gathered[5].node = 5
>>>
>>> Can anyone suggest a place to start looking into why this might be
>>> happening? There is a section of the code that calls MPI_Comm_split, but I
>>> am not sure if that is related...
>>>
>>> Running on Ubuntu 11.10 and a summary of ompi_info:
>>> Package: Open MPI buildd@allspice Distribution
>>> Open MPI: 1.4.3
>>> Open MPI SVN revision: r23834
>>> Open MPI release date: Oct 05, 2010
>>> Open RTE: 1.4.3
>>> Open RTE SVN revision: r23834
>>> Open RTE release date: Oct 05, 2010
>>> OPAL: 1.4.3
>>> OPAL SVN revision: r23834
>>> OPAL release date: Oct 05, 2010
>>> Ident string: 1.4.3
>>> Prefix: /usr
>>> Configured architecture: x86_64-pc-linux-gnu
>>> Configure host: allspice
>>> Configured by: buildd
>>>
>>> Thanks!
>>> Brett
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>>
>> --
>> | Teng Ma  Univ. of Tennessee |
>> | t...@cs.utk.edu  Knoxville, TN |
>> | http://web.eecs.utk.edu/~tma/   |
>>
>>
>
>


[OMPI users] MPI_Allgather problem

2011-12-09 Thread Brett Tully
Dear all,

I have not used OpenMPI much before, but am maintaining a large legacy
application. We noticed a bug to do with a call to MPI_Allgather as
summarised in this post to Stackoverflow:
http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results

In the process of looking further into the problem, I noticed that the
following function results in strange behaviour.

void test_all_gather() {

struct _TEST_ALL_GATHER {
int node;
};

int ierr, size, rank;
ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);

struct _TEST_ALL_GATHER local;
struct _TEST_ALL_GATHER *gathered;

gathered = (struct _TEST_ALL_GATHER*) malloc(size * sizeof(*gathered));

local.node = rank;

MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
MPI_COMM_WORLD);

int i;
for (i = 0; i < numnodes; ++i) {
(void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
}

FREE(gathered);
}

At one point, this function printed the following:
gathered[0].node = 2
gathered[1].node = 3
gathered[2].node = 2
gathered[3].node = 3
gathered[4].node = 4
gathered[5].node = 5

Can anyone suggest a place to start looking into why this might be
happening? There is a section of the code that calls MPI_Comm_split, but I
am not sure if that is related...

Running on Ubuntu 11.10 and a summary of ompi_info:
Package: Open MPI buildd@allspice Distribution
Open MPI: 1.4.3
Open MPI SVN revision: r23834
Open MPI release date: Oct 05, 2010
Open RTE: 1.4.3
Open RTE SVN revision: r23834
Open RTE release date: Oct 05, 2010
OPAL: 1.4.3
OPAL SVN revision: r23834
OPAL release date: Oct 05, 2010
Ident string: 1.4.3
Prefix: /usr
Configured architecture: x86_64-pc-linux-gnu
Configure host: allspice
Configured by: buildd

Thanks!
Brett