[OMPI users] Bad behavior in Allgatherv when a count is 0

2007-12-13 Thread Moreland, Kenneth
I have found that on rare occasions Allgatherv fails to pass the data to
all processes.  Given some magical combination of receive counts and
displacements, one or more processes are missing some or all of some
arrays in their receive buffer.  A necessary, but not sufficient,
condition seems to be that one of the receive counts is 0.  Beyond that
I have not figured out any real pattern, but the example program listed
below demonstrates the failure.  I have tried it on OpenMPI versions
1.2.3 and 1.2.4; it fails on both.  However, it works fine with version
1.1.2, so the problem must have been introduced since then.

-Ken

     Kenneth Moreland
     Sandia National Laboratories
     email: kmo...@sandia.gov
     phone: (505) 844-8919
     fax:   (505) 845-0833



#include <mpi.h>

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int rank;
  int size;
  MPI_Comm smallComm;
  int senddata[5], recvdata[100];
  int lengths[3], offsets[3];
  int i, j;

  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  if (size != 3)
    {
    printf("Need 3 processes.");
    MPI_Abort(MPI_COMM_WORLD, 1);
    }

  for (i = 0; i < 100; i++) recvdata[i] = -1;
  for (i = 0; i < 5; i++) senddata[i] = rank*10 + i;
  lengths[0] = 5;  lengths[1] = 0;  lengths[2] = 5;
  offsets[0] = 3;  offsets[1] = 9;  offsets[2] = 10;
  MPI_Allgatherv(senddata, lengths[rank], MPI_INT,
                 recvdata, lengths, offsets, MPI_INT, MPI_COMM_WORLD);

  for (i = 0; i < size; i++)
    {
    for (j = 0; j < lengths[i]; j++)
      {
      if (recvdata[offsets[i]+j] != 10*i+j)
        {
        printf("%d: Got bad data from rank %d, index %d: %d\n", rank, i, j,
               recvdata[offsets[i]+j]);
        break;
        }
      }
    }

  MPI_Finalize();

  return 0;
}
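
A possible interim workaround (a rough, untested sketch, not part of the
original message): emulate the Allgatherv with one MPI_Bcast per rank,
skipping ranks whose count is 0.  The helper name my_allgatherv_int is
made up for illustration and only handles MPI_INT buffers.

#include <mpi.h>
#include <string.h>

/* Hypothetical substitute for MPI_Allgatherv on MPI_INT buffers: each
   rank copies its own contribution into place, then every rank's slot
   is broadcast from its owner.  Ranks with a zero count broadcast
   nothing. */
static int my_allgatherv_int(const int *sendbuf, int sendcount,
                             int *recvbuf, const int *recvcounts,
                             const int *displs, MPI_Comm comm)
{
  int rank, size, r;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  /* Place the local contribution directly. */
  memcpy(recvbuf + displs[rank], sendbuf, sendcount * sizeof(int));

  /* Broadcast each rank's slot from that rank. */
  for (r = 0; r < size; r++)
    {
    if (recvcounts[r] > 0)
      {
      MPI_Bcast(recvbuf + displs[r], recvcounts[r], MPI_INT, r, comm);
      }
    }

  return MPI_SUCCESS;
}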





Re: [OMPI users] Problems with GATHERV on one process

2007-12-13 Thread Moreland, Kenneth
Excellent.  Thanks.

-Ken

> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> On Behalf Of Jeff Squyres
> Sent: Thursday, December 13, 2007 6:02 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Problems with GATHERV on one process
> 
> Correct.  Here's the original commit that fixed the problem:
> 
>  https://svn.open-mpi.org/trac/ompi/changeset/16360
> 
> And the commit to the v1.2 branch:
> 
>  https://svn.open-mpi.org/trac/ompi/changeset/16519
> 
> 
> On Dec 12, 2007, at 2:43 PM, Moreland, Kenneth wrote:
> 
> > Thanks Tim.  I've since noticed similar problems with MPI_Allgatherv
> > and MPI_Scatterv.  I'm guessing they are all related.  Do you happen
> > to know if those are being fixed as well?
> >
> > -Ken
> >
> >> -Original Message-
> >> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> >> On Behalf Of Tim Mattox
> >> Sent: Tuesday, December 11, 2007 3:34 PM
> >> To: Open MPI Users
> >> Subject: Re: [OMPI users] Problems with GATHERV on one process
> >>
> >> Hello Ken,
> >> This is a known bug, which is fixed in the upcoming 1.2.5 release.
> >> We expect 1.2.5 to come out very soon.  We should have a new release
> >> candidate for 1.2.5 posted by tomorrow.
> >>
> >> See these tickets about the bug if you care to look:
> >> https://svn.open-mpi.org/trac/ompi/ticket/1166
> >> https://svn.open-mpi.org/trac/ompi/ticket/1157
> >>
> >> On Dec 11, 2007 2:48 PM, Moreland, Kenneth <kmo...@sandia.gov> wrote:
> >>> I recently ran into a problem with GATHERV while running some
> >>> randomized tests on my MPI code.  The problem seems to occur when
> >>> running MPI_Gatherv with a displacement on a communicator with a
> >>> single process.  The code listed below exercises this errant
> >>> behavior.  I have tried it on OpenMPI 1.1.2 and 1.2.4.
> >>>
> >>> Granted, this is not a situation that one would normally run into
> >>> in a real application, but I just wanted to check to make sure I
> >>> was not doing anything wrong.
> >>>
> >>> -Ken
> >>>
> >>>
> >>>
> >>> #include <mpi.h>
> >>>
> >>> #include <stdio.h>
> >>> #include <stdlib.h>
> >>>
> >>> int main(int argc, char **argv)
> >>> {
> >>>  int rank;
> >>>  MPI_Comm smallComm;
> >>>  int senddata[4], recvdata[4], length, offset;
> >>>
> >>>  MPI_Init(&argc, &argv);
> >>>
> >>>  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>>
> >>>  // Split up into communicators of size 1.
> >>>  MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &smallComm);
> >>>
> >>>  // Now try to do a gatherv.
> >>>  senddata[0] = 5; senddata[1] = 6; senddata[2] = 7; senddata[3] = 8;
> >>>  recvdata[0] = 0; recvdata[1] = 0; recvdata[2] = 0; recvdata[3] = 0;
> >>>  length = 3;
> >>>  offset = 1;
> >>>  MPI_Gatherv(senddata, length, MPI_INT,
> >>>              recvdata, &length, &offset, MPI_INT, 0, smallComm);
> >>>  if (senddata[0] != recvdata[offset])
> >>>    {
> >>>    printf("%d: %d != %d?\n", rank, senddata[0], recvdata[offset]);
> >>>    }
> >>>  else
> >>>    {
> >>>    printf("%d: Everything OK.\n", rank);
> >>>    }
> >>>
> >>>  MPI_Finalize();
> >>>
> >>>  return 0;
> >>> }
> >>>
> >>>     Kenneth Moreland
> >>>     Sandia National Laboratories
> >>>     email: kmo...@sandia.gov
> >>>     phone: (505) 844-8919
> >>>     fax:   (505) 845-0833
> >>>
> >>>
> >>>
> >>>
> >>
> >> --
> >> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
> >> tmat...@gmail.com || timat...@open-mpi.org
> >> I'm a bright... http://www.the-brights.net/
> >
> >
> >
> 
> 
> --
> Jeff Squyres
> Cisco Systems





Re: [OMPI users] Problems with GATHERV on one process

2007-12-12 Thread Moreland, Kenneth
Thanks Tim.  I've since noticed similar problems with MPI_Allgatherv and
MPI_Scatterv.  I'm guessing they are all related.  Do you happen to know
if those are being fixed as well?

-Ken

> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> On Behalf Of Tim Mattox
> Sent: Tuesday, December 11, 2007 3:34 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Problems with GATHERV on one process
> 
> Hello Ken,
> This is a known bug, which is fixed in the upcoming 1.2.5 release.  We
> expect 1.2.5 to come out very soon.  We should have a new release
> candidate for 1.2.5 posted by tomorrow.
> 
> See these tickets about the bug if you care to look:
> https://svn.open-mpi.org/trac/ompi/ticket/1166
> https://svn.open-mpi.org/trac/ompi/ticket/1157
> 
> On Dec 11, 2007 2:48 PM, Moreland, Kenneth <kmo...@sandia.gov> wrote:
> > I recently ran into a problem with GATHERV while running some
> > randomized tests on my MPI code.  The problem seems to occur when
> > running MPI_Gatherv with a displacement on a communicator with a
> > single process.  The code listed below exercises this errant
> > behavior.  I have tried it on OpenMPI 1.1.2 and 1.2.4.
> >
> > Granted, this is not a situation that one would normally run into in
> > a real application, but I just wanted to check to make sure I was not
> > doing anything wrong.
> >
> > -Ken
> >
> >
> >
> > #include <mpi.h>
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> >
> > int main(int argc, char **argv)
> > {
> >   int rank;
> >   MPI_Comm smallComm;
> >   int senddata[4], recvdata[4], length, offset;
> >
> >   MPI_Init(&argc, &argv);
> >
> >   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >
> >   // Split up into communicators of size 1.
> >   MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &smallComm);
> >
> >   // Now try to do a gatherv.
> >   senddata[0] = 5; senddata[1] = 6; senddata[2] = 7; senddata[3] = 8;
> >   recvdata[0] = 0; recvdata[1] = 0; recvdata[2] = 0; recvdata[3] = 0;
> >   length = 3;
> >   offset = 1;
> >   MPI_Gatherv(senddata, length, MPI_INT,
> >               recvdata, &length, &offset, MPI_INT, 0, smallComm);
> >   if (senddata[0] != recvdata[offset])
> >     {
> >     printf("%d: %d != %d?\n", rank, senddata[0], recvdata[offset]);
> >     }
> >   else
> >     {
> >     printf("%d: Everything OK.\n", rank);
> >     }
> >
> >   MPI_Finalize();
> >
> >   return 0;
> > }
> >
> >      Kenneth Moreland
> >      Sandia National Laboratories
> >      email: kmo...@sandia.gov
> >      phone: (505) 844-8919
> >      fax:   (505) 845-0833
> >
> >
> >
> >
> 
> --
> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
>  tmat...@gmail.com || timat...@open-mpi.org
> I'm a bright... http://www.the-brights.net/





[OMPI users] Problems with GATHERV on one process

2007-12-11 Thread Moreland, Kenneth
I recently ran into a problem with GATHERV while running some randomized
tests on my MPI code.  The problem seems to occur when running
MPI_Gatherv with a displacement on a communicator with a single process.
The code listed below exercises this errant behavior.  I have tried it
on OpenMPI 1.1.2 and 1.2.4.

Granted, this is not a situation that one would normally run into in a
real application, but I just wanted to check to make sure I was not
doing anything wrong.

-Ken



#include <mpi.h>

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int rank;
  MPI_Comm smallComm;
  int senddata[4], recvdata[4], length, offset;

  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Split up into communicators of size 1.
  MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &smallComm);

  // Now try to do a gatherv.
  senddata[0] = 5; senddata[1] = 6; senddata[2] = 7; senddata[3] = 8;
  recvdata[0] = 0; recvdata[1] = 0; recvdata[2] = 0; recvdata[3] = 0;
  length = 3;
  offset = 1;
  MPI_Gatherv(senddata, length, MPI_INT,
              recvdata, &length, &offset, MPI_INT, 0, smallComm);
  if (senddata[0] != recvdata[offset])
    {
    printf("%d: %d != %d?\n", rank, senddata[0], recvdata[offset]);
    }
  else
    {
    printf("%d: Everything OK.\n", rank);
    }

  MPI_Finalize();

  return 0;
}

     Kenneth Moreland
     Sandia National Laboratories
     email: kmo...@sandia.gov
     phone: (505) 844-8919
     fax:   (505) 845-0833
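
Until a fixed release is available, one possible workaround (a rough
sketch, not from this thread) is to skip MPI_Gatherv entirely when the
communicator has only one process and do the copy locally, honoring the
displacement.  The wrapper name gatherv_int_workaround is made up for
illustration and only handles MPI_INT buffers.

#include <mpi.h>
#include <string.h>

/* Hypothetical wrapper: with a single-process communicator the caller
   is necessarily the root, so copy the send buffer straight into the
   receive buffer at the requested displacement; otherwise defer to the
   real MPI_Gatherv. */
static int gatherv_int_workaround(int *sendbuf, int sendcount,
                                  int *recvbuf, int *recvcounts,
                                  int *displs, int root, MPI_Comm comm)
{
  int size;
  MPI_Comm_size(comm, &size);
  if (size == 1)
    {
    memcpy(recvbuf + displs[0], sendbuf, sendcount * sizeof(int));
    return MPI_SUCCESS;
    }
  return MPI_Gatherv(sendbuf, sendcount, MPI_INT,
                     recvbuf, recvcounts, displs, MPI_INT, root, comm);
}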





Re: [OMPI users] MPI_File_set_view rejecting subarray views.

2007-07-23 Thread Moreland, Kenneth
Thanks, Brian.  That did the trick.

-Ken

> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> On Behalf Of Brian Barrett
> Sent: Thursday, July 19, 2007 3:39 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] MPI_File_set_view rejecting subarray views.
> 
> On Jul 19, 2007, at 3:24 PM, Moreland, Kenneth wrote:
> 
> > I've run into a problem with the File I/O with openmpi version 1.2.3.
> > It is not possible to call MPI_File_set_view with a datatype created
> > from a subarray.  Instead of letting me set a view of this type, it
> > gives an invalid datatype error.  I have attached a simple program
> > that demonstrates the problem.  In particular, the following sequence
> > of function calls should be supported, but they are not.
> >
> >   MPI_Type_create_subarray(3, sizes, subsizes, starts,
> >                            MPI_ORDER_FORTRAN, MPI_BYTE, &view);
> >   MPI_File_set_view(fd, 20, MPI_BYTE, view, "native", MPI_INFO_NULL);
> >
> > After poking around in the source code a bit, I discovered that the
> > I/O implementation actually supports the subarray data type, but
> > there is a check that is issuing an error before the underlying I/O
> > layer (ROMIO) has a chance to handle the request.
> 
> You need to commit the datatype after calling
> MPI_Type_create_subarray.  If you add:
> 
>    MPI_Type_commit(&view);
> 
> after the Type_create, but before File_set_view, the code will run to
> completion.
> 
> Well, the code will then complain about a Barrier after MPI_Finalize
> due to an error in how we shut down when there are files that have
> been opened but not closed (you should also add a call to
> MPI_File_close after the set_view, but I'm assuming it's not there
> because this is a test code).  This is something we need to fix, but
> also signifies a user error.
> 
> 
> Brian
> 
> --
>    Brian W. Barrett
>    Networking Team, CCS-1
>    Los Alamos National Laboratory
> 
> 
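
Putting Brian's advice together, here is a minimal, self-contained
sketch (not from the thread) of the corrected sequence: commit the
subarray type before setting the view, and close the file before
MPI_Finalize.  The file name and array extents are made up for
illustration; the 20-byte displacement matches the original call.

#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_File fd;
  MPI_Datatype view;
  int sizes[3]    = {4, 4, 4};   /* hypothetical full-array extents */
  int subsizes[3] = {2, 2, 2};   /* hypothetical subarray extents */
  int starts[3]   = {1, 1, 1};   /* hypothetical subarray offsets */

  MPI_Init(&argc, &argv);

  MPI_Type_create_subarray(3, sizes, subsizes, starts,
                           MPI_ORDER_FORTRAN, MPI_BYTE, &view);
  MPI_Type_commit(&view);   /* commit before using the type in a view */

  MPI_File_open(MPI_COMM_WORLD, "testfile",
                MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fd);
  MPI_File_set_view(fd, 20, MPI_BYTE, view, "native", MPI_INFO_NULL);

  /* ... file I/O would go here ... */

  MPI_File_close(&fd);      /* close before MPI_Finalize to avoid the
                               shutdown complaint mentioned above */
  MPI_Type_free(&view);

  MPI_Finalize();
  return 0;
}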