Re: [OMPI users] issue with column type in language C

2012-08-20 Thread Jeff Squyres
It looks like you also posted this to Stackoverflow:

http://stackoverflow.com/questions/12031330/mpi-issue-with-column-type-in-language-c

It looks like it was answered, too.  :-)


On Aug 19, 2012, at 8:38 PM, Christian Perrier wrote:

> Hi,
> 
> Indeed, I am trying to write the equivalent of this Fortran program in C. The
> Fortran version works fine, but I now have problems in C.
> 
> I can't manage to exchange a single column between 2 processes.
> 
> Could you please try to compile and run the following test code, which
> simply sends a column from rank 2 to rank 0 (you need to
> execute it with nproc=4):
> 
> --
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <math.h>
> #include "mpi.h"
> 
> int main(int argc, char *argv[]) 
> {
>   /* size of the discretization */
> 
>   double** x;
>   double** x0;
>   int bonk1, bonk2;  
>   int i,j,k,l;
>   int nproc;
>   int ndims; 
>   int S=0, E=1, N=2, W=3;
>   int NeighBor[4];
>   int xcell, ycell, size_tot_x, size_tot_y;
>   int *xs,*ys,*xe,*ye;
>   int size_x = 4;
>   int size_y = 4;
>   int me;
>   int x_domains=2;
>   int y_domains=2;
>   int flag = 1;  
>   MPI_Comm comm, comm2d;
>   int dims[2];
>   int periods[2];
>   int reorganisation = 0;
>   int row;
>   MPI_Datatype column_type;
>   MPI_Status status;
>   
>   
>   size_tot_x=size_x+2*x_domains+2;
>   size_tot_y=size_y+2*y_domains+2;
>   
>   xcell=(size_x/x_domains);
>   ycell=(size_y/y_domains);
> 
>   MPI_Init(&argc, &argv);
>   comm = MPI_COMM_WORLD;
>   MPI_Comm_size(comm, &nproc);
>   MPI_Comm_rank(comm, &me);
> 
>   x = malloc(size_tot_y*sizeof(double*));
>   x0 = malloc(size_tot_y*sizeof(double*));
> 
> 
>   for(j=0;j<=size_tot_y-1;j++) {
> x[j] = malloc(size_tot_x*sizeof(double));
> x0[j] = malloc(size_tot_x*sizeof(double));
>   }
> 
>   xs = malloc(nproc*sizeof(int));
>   xe = malloc(nproc*sizeof(int));
>   ys = malloc(nproc*sizeof(int));
>   ye = malloc(nproc*sizeof(int));
> 
>   /* Create 2D cartesian grid */
>   periods[0] = 0;
>   periods[1] = 0;
> 
>   ndims = 2;
>   dims[0]=x_domains;
>   dims[1]=y_domains;
> 
>   MPI_Cart_create(comm, ndims, dims, periods, reorganisation, &comm2d);
> 
>   /* Identify neighbors */
>   NeighBor[0] = MPI_PROC_NULL;
>   NeighBor[1] = MPI_PROC_NULL;
>   NeighBor[2] = MPI_PROC_NULL;
>   NeighBor[3] = MPI_PROC_NULL;
> 
>   /* Left/West and right/East neighbors */
>   MPI_Cart_shift(comm2d, 0, 1, &NeighBor[W], &NeighBor[E]);
>   /* Bottom/South and upper/North neighbors */
>   MPI_Cart_shift(comm2d, 1, 1, &NeighBor[S], &NeighBor[N]);
> 
>   /* coordinates of current cell with me rank */
> 
>   xcell=(size_x/x_domains);
>   ycell=(size_y/y_domains);
> 
>   ys[me]=(y_domains-me%(y_domains)-1)*(ycell+2)+2;
>   ye[me]=ys[me]+ycell-1;
> 
>   for(i=0;i<=y_domains-1;i++) 
>   {xs[i]=2;}
>   
>   for(i=0;i<=y_domains-1;i++) 
>   {xe[i]=xs[i]+xcell-1;}
> 
>   for(i=1;i<=(x_domains-1);i++)
>  { for(j=0;j<=(y_domains-1);j++) 
>   {
>xs[i*y_domains+j]=xs[(i-1)*y_domains+j]+xcell+2;
>xe[i*y_domains+j]=xs[i*y_domains+j]+xcell-1;
>   }
>  }
>   
>   for(i=0;i<=size_tot_y-1;i++)
>   { for(j=0;j<=size_tot_x-1;j++)
> { x0[i][j]= i+j;
> }
>   }
>   
>   /*  Create column data type to communicate with South and North 
> neighbors */
> 
> 
> 
>   MPI_Type_vector(ycell, 1, size_tot_x, MPI_DOUBLE, &column_type);
>   MPI_Type_commit(&column_type);
>  
>if(me==2) {
>printf("Before Send - Process 2 subarray\n");
> for(i=ys[me]-1;i<=ye[me]+1;i++)
> { for(j=xs[me]-1;j<=xe[me]+1;j++)
>   { printf("%f ",x0[i][j]);
>   }
>   printf("\n");
> }
> printf("\n");
> 
>
>
>MPI_Send(&(x0[ys[2]][xs[2]]), 1, column_type,  0, flag, comm2d );
>}
> 
>  if(me==0) {
>  
>  MPI_Recv(&(x0[ys[0]][xe[0]]), 1, column_type, 2, flag, comm2d,
> &status);
>  printf("After Receive - Process 0 subarray\n");
> for(i=ys[me]-1;i<=ye[me]+1;i++)
> { for(j=xs[me]-1;j<=xe[me]+1;j++)
>   { printf("%f ",x0[i][j]);
>   }
>   printf("\n");
> }
> printf("\n");
> 
> MPI_Get_count(&status, column_type, &bonk1);
> MPI_Get_elements(&status, MPI_DOUBLE, &bonk2);
> printf("got %d elements of type column_type\n",bonk1);
> printf("which contained %d elements of type MPI_DOUBLE\n",bonk2);
> printf("\n");
>  
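
For reference, the usual pitfall with this pattern is that x0 is allocated row by row (an array of pointers), so the rows are not contiguous in memory and the stride given to MPI_Type_vector does not describe the actual layout. Below is a minimal, self-contained sketch of a column exchange with a contiguously allocated, row-major array; the array size, ranks, and column indices are illustrative assumptions, not the poster's exact setup:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

/* Illustrative sketch: send one column of a contiguously stored
 * NX x NY (row-major) array from rank 1 to rank 0. Run with >= 2 ranks. */
#define NX 4   /* rows    (assumed size) */
#define NY 4   /* columns (assumed size) */

int main(int argc, char *argv[])
{
    int rank, i, j;
    double *a;                 /* one contiguous block, indexed a[i*NY + j] */
    MPI_Datatype column_type;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    a = malloc(NX * NY * sizeof(double));
    for (i = 0; i < NX; i++)
        for (j = 0; j < NY; j++)
            a[i*NY + j] = (rank == 1) ? i + j : -1.0;

    /* NX elements, 1 double each, separated by a full row (NY doubles) */
    MPI_Type_vector(NX, 1, NY, MPI_DOUBLE, &column_type);
    MPI_Type_commit(&column_type);

    if (rank == 1) {
        /* send column 2: starts at a[0*NY + 2] */
        MPI_Send(&a[2], 1, column_type, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {
        /* receive into column 3: starts at a[0*NY + 3] */
        MPI_Recv(&a[3], 1, column_type, 1, 0, MPI_COMM_WORLD, &status);
        for (i = 0; i < NX; i++)
            printf("row %d, col 3: %f\n", i, a[i*NY + 3]);
    }

    MPI_Type_free(&column_type);
    free(a);
    MPI_Finalize();
    return 0;
}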

Re: [OMPI users] MPI/FORTRAN on a cluster system

2012-08-20 Thread Gus Correa

On 08/20/2012 11:39 AM, Noam Bernstein wrote:

On Aug 20, 2012, at 11:12 AM, David Warren wrote:


The biggest issue you may have is that GNU Fortran does not support all the
Fortran constructs that all the others do. Most Fortrans have supported the
standard plus the DEC extensions. GNU Fortran does not quite get all the
standards. Intel Fortran does support them all, and I believe that Portland
Group and Absoft may also.

In my experience, most recent versions of gfortran (at least 4.5, maybe earlier)
support about as large a set of standards as anything else (with the exception
of a few F2003 things, but then again, (almost) no one supports those
comprehensively). Definitely all of F95 + approved extensions. Non-standard
extensions (DEC, Cray pointers pre-F2003) are another matter - I don't know
about those.


Noam



Hi Bill

I think gfortran supports 'Cray pointers'.
From the quite old gfortran 4.1.2 man page:

   -fcray-pointer
       Enables the Cray pointer extension, which provides a C-like pointer.


My recollection is that it also supports some DEC extensions,
particularly those related to 'Cray pointers' [LOC, etc], but I may be 
wrong.


If the code is F77 with some tidbits of C++, you probably don't need to
worry about gfortran having all the F90/95/2003/2008 features.

You could try to simply adjust your Makefile to point to the OpenMPI
compiler wrappers, i.e., F77=mpif77 [or FC, depending on the Makefile]
and CXX=mpicxx [or whatever macro/variable your Makefile uses for the C++
compiler].
Using the compiler wrappers you don't need to specify library or include
directories, and life becomes much easier.

If the Makefile somehow forces you to specify these things,
find out what libraries and includes you really need by looking at the 
output of these commands:

mpif77 --show
mpicxx --show

You could try this just for kicks. It may work out of the box, as Jeff
suggested, if the program is really portable.


You may need to use full paths to the OpenMPI compiler wrappers [or tweak
your PATH so it points to the right ones], in case there are several
different MPI flavors installed on your cluster.
Likewise, when you launch the program with mpiexec, make sure it points
to the OpenMPI flavor you want.

Mixing different MPIs is a common source of frustration.

Make sure your OpenMPI was built with the underlying GNU compilers, and
that the F77 and C++ interfaces were built
[you must have at least the mpif77 and mpicxx wrappers].
Otherwise, it is easy to build OpenMPI from source, with support for
your cluster's bells and whistles
[e.g. InfiniBand/OFED, Torque or SGE resource managers].

I hope this helps,
Gus Correa

On 08/20/2012 10:02 AM, Jeff Squyres wrote:

On Aug 19, 2012, at 12:11 PM, Bill Mulberry wrote:


I have a large program written in FORTRAN 77 with a couple of routines
written in C++.  It has MPI commands built into it to run on large-scale
multiprocessor IBM systems.  I now have the task of transferring this
program over to a cluster system.  Both the multiprocessor and cluster
system have Linux hosted on them.  The cluster system has GNU FORTRAN and GNU
C compilers on it.  I am told the cluster has Open MPI.  I am wondering if
anybody out there has had to do the same task, and if so, what I can expect
from this.  Will I be expected to make some big changes, etc.?  Any advice
will be appreciated.

MPI and Fortran are generally portable, meaning that if you wrote a correct MPI 
Fortran application, it should be immediately portable to a new system.

That being said, many applications are accidentally/inadvertently not correct.  
For example, when you try to compile your application on a Linux cluster with 
Open MPI, you'll find that you accidentally used a Fortran construct that was 
specific to IBM's Fortran compiler and is not portable.  Similarly, when you 
run the application, you may find that inadvertently you used an implicit 
assumption for IBM's MPI implementation that isn't true for Open MPI.

...or you may find that everything just works, and you can raise a toast to the 
portability gods.

I expect that your build / compile / link procedure may change a bit from the old system to the new 
system.  In Open MPI, you should be able to use "mpif77" and/or "mpif90" to 
compile and link everything.  No further MPI-related flags are necessary (no need to -I to specify 
where mpif.h is located, no need to -lmpi, ...etc.).





Re: [OMPI users] MPI/FORTRAN on a cluster system

2012-08-20 Thread Noam Bernstein

On Aug 20, 2012, at 11:12 AM, David Warren wrote:

> The biggest issue you may have is that GNU Fortran does not support all the 
> Fortran constructs that all the others do. Most Fortrans have supported the 
> standard plus the DEC extensions. GNU Fortran does not quite get all the 
> standards. Intel Fortran does support them all, and I believe that Portland 
> Group and Absoft may also.

In my experience, most recent versions of gfortran (at least 4.5, maybe earlier)
support about as large a set of standards as anything else (with the exception
of a few F2003 things, but then again, (almost) no one supports those
comprehensively). Definitely all of F95 + approved extensions. Non-standard
extensions (DEC, Cray pointers pre-F2003) are another matter - I don't know
about those.


Noam




Re: [OMPI users] MPI/FORTRAN on a cluster system

2012-08-20 Thread David Warren
The biggest issue you may have is that GNU Fortran does not support all the
Fortran constructs that all the others do. Most Fortrans have supported the
standard plus the DEC extensions. GNU Fortran does not quite get all the
standards. Intel Fortran does support them all, and I believe that Portland
Group and Absoft may also.

On Sun, Aug 19, 2012 at 9:11 AM, Bill Mulberry wrote:

>
> Hi
>
> I have a large program written in FORTRAN 77 with a couple of routines
> written in C++.  It has MPI commands built into it to run on large-scale
> multiprocessor IBM systems.  I now have the task of transferring this
> program over to a cluster system.  Both the multiprocessor and cluster
> system have Linux hosted on them.  The cluster system has GNU FORTRAN and
> GNU C compilers on it.  I am told the cluster has Open MPI.  I am wondering if
> anybody out there has had to do the same task, and if so, what I can expect
> from this.  Will I be expected to make some big changes, etc.?  Any advice
> will be appreciated.
>
> Thanks.
>
>
>
>



-- 
David Warren
University of Washington
206 543-0954


Re: [OMPI users] MPI_Irecv: Confusion with <<count>> input parameter

2012-08-20 Thread Jeff Squyres
On Aug 20, 2012, at 5:51 AM, devendra rai wrote:

> Is it the number of elements that have been received *thus far* in the buffer?

No.

> Or is it the number of elements that are expected to be received, and hence 
> MPI_Test will tell me that the receive is not complete until "count" elements 
> have been received?

Yes.

> Here's the reason why I have a problem (and I think I may be completely 
> stupid here, I'd appreciate your patience):
[snip]
> Does anyone see what could be going wrong?

Double check that the (sender_rank, tag, communicator) tuple that you issued in 
the MPI_Irecv matches the (rank, tag, communicator) tuple from the sender (tag 
and communicator are arguments on the sending side, and rank is the rank of the 
sender in that communicator).

When receives block without completing like this, it usually means a 
mismatch between the tuples.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
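
For readers of the archive, here is a minimal, self-contained sketch (not the original poster's code) that illustrates the answer above: the count passed to MPI_Irecv is the capacity of the receive buffer in elements, MPI_Test reports completion only once a message with a matching (source, tag, communicator) tuple has fully arrived, and MPI_Get_count then returns how many elements were actually received.

#include <stdio.h>
#include <string.h>
#include "mpi.h"

/* Sketch: rank 0 sends 80 bytes to rank 1 with tag 1000; rank 1 posts a
 * nonblocking receive for up to 80 bytes and polls it with MPI_Test.
 * The (source, tag, communicator) triple must match on both sides. */
int main(int argc, char *argv[])
{
    int rank, flag = 0, nrecv = 0;
    char buf[80];
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        memset(buf, 'x', sizeof(buf));
        MPI_Send(buf, 80, MPI_BYTE, 1, 1000, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* "count" = 80 is the size of the receive buffer, i.e. the most
         * we are prepared to accept, not a running total. */
        MPI_Irecv(buf, 80, MPI_BYTE, 0, 1000, MPI_COMM_WORLD, &req);
        while (!flag)                       /* completes only when a matching */
            MPI_Test(&req, &flag, &status); /* message has fully arrived      */
        MPI_Get_count(&status, MPI_BYTE, &nrecv);
        printf("received %d bytes\n", nrecv);
    }

    MPI_Finalize();
    return 0;
}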




Re: [OMPI users] MPI/FORTRAN on a cluster system

2012-08-20 Thread Jeff Squyres
On Aug 19, 2012, at 12:11 PM, Bill Mulberry wrote:

> I have a large program written in FORTRAN 77 with a couple of routines
> written in C++.  It has MPI commands built into it to run on large-scale
> multiprocessor IBM systems.  I now have the task of transferring this
> program over to a cluster system.  Both the multiprocessor and cluster
> system have Linux hosted on them.  The cluster system has GNU FORTRAN and GNU
> C compilers on it.  I am told the cluster has Open MPI.  I am wondering if
> anybody out there has had to do the same task, and if so, what I can expect
> from this.  Will I be expected to make some big changes, etc.?  Any advice
> will be appreciated.


MPI and Fortran are generally portable, meaning that if you wrote a correct MPI 
Fortran application, it should be immediately portable to a new system.

That being said, many applications are accidentally/inadvertently not correct.  
For example, when you try to compile your application on a Linux cluster with 
Open MPI, you'll find that you accidentally used a Fortran construct that was 
specific to IBM's Fortran compiler and is not portable.  Similarly, when you 
run the application, you may find that inadvertently you used an implicit 
assumption for IBM's MPI implementation that isn't true for Open MPI.

...or you may find that everything just works, and you can raise a toast to the 
portability gods.

I expect that your build / compile / link procedure may change a bit from the 
old system to the new system.  In Open MPI, you should be able to use "mpif77" 
and/or "mpif90" to compile and link everything.  No further MPI-related flags 
are necessary (no need to -I to specify where mpif.h is located, no need to 
-lmpi, ...etc.).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] "Connection to lifeline lost" when developing a new rsh agent

2012-08-20 Thread Ralph Castain
Just to be clear: what you are launching is an orted daemon, not your 
application process. Once the daemons are running, then we use them to launch 
the actual application process. So the issue here is with starting the daemons 
themselves. You might try adding "-mca plm_base_verbose 5 --debug-daemons" to 
watch the debug output from the daemons as they are launched.

The lifeline is a socket connection between the daemons and mpirun. For some 
reason, the socket from your remote daemon back to mpirun is being closed, 
which the remote daemon interprets as "lifeline lost" and terminates itself. 
You could try setting the verbosity on the OOB to get the debug output from it 
(see "ompi_info --param oob tcp" for the settings), though it's likely to just 
tell you that the socket closed.


On Aug 20, 2012, at 5:11 AM, Yann RADENAC  wrote:

> 
> Hi,
> 
> I'm developing MPI support for XtreemOS (www.xtreemos.eu) so that an MPI 
> program is managed as a single XtreemOS job.
> To manage all processes as a single XtreemOS job, I've developed the program 
> xos-createProcess that plays the role of the rsh agent (replacing ssh/rsh) to 
> start a process on a remote machine that is part of the ones reserved for the 
> current job.
> 
> I'm running a simple hello world MPI program where each process sends a 
> string to process 0, which prints them on standard output.
> 
> When using OpenMPI with ssh, this program works perfectly on several machines.
> 
> When using OpenMPI with my launcher xos-createProcess, it works with an MPI 
> program of 2 processes on 2 different machines.
> 
> However, I cannot get past the following error, which happens when running 
> an MPI program of 3 processes on 3 different machines (or any n processes on 
> n different machines with n >= 3).
> 
> A process started by xos-createProcess on a remote machine ends with the 
> following error:
> 
> [paradent-5.rennes.grid5000.fr:08191] [[50627,0],2] routed:binomial: 
> Connection to lifeline [[50627,0],0] lost
> 
> But process 0 is still running! The lifeline should not have been lost!
> Actually, process 0 is still waiting for the remote process to terminate 
> (checked with gdb: the initial process is calling libc's poll()).
> 
> 
> The run command is:
> 
> -bash -c '(mpirun  --mca orte_rsh_agent xos-createProcess 
> --leave-session-attached   -np 2   -host `xreservation -a $XOS_RSVID` 
> mpi/hello_world_MPI  < /dev/null > mpirun.out) >& mpirun.err'
> 
> Same problem with or without option --leave-session-attached.
> 
> 
> 
> So, how is the lifeline implemented? Why does it work with 2 processes but 
> start failing when using 3 or more processes?
> 
> 
> I'm using Open MPI 1.6.
> 
> 
> Thanks for your help.
> 
> -- 
> Yann Radenac
> Research Engineer, INRIA
> Myriads research team, INRIA Rennes - Bretagne Atlantique
> 




[OMPI users] "Connection to lifeline lost" when developing a new rsh agent

2012-08-20 Thread Yann RADENAC


Hi,

I'm developing MPI support for XtreemOS (www.xtreemos.eu) so that an MPI 
program is managed as a single XtreemOS job.
To manage all processes as a single XtreemOS job, I've developed the 
program xos-createProcess that plays the role of the rsh agent 
(replacing ssh/rsh) to start a process on a remote machine that is part 
of the ones reserved for the current job.


I'm running a simple hello world MPI program where each process sends 
a string to process 0, which prints them on standard output.


When using OpenMPI with ssh, this program works perfectly on several 
machines.


When using OpenMPI with my launcher xos-createProcess, it works with an 
MPI program of 2 processes on 2 different machines.


However, I cannot get past the following error, which happens when 
running an MPI program of 3 processes on 3 different machines (or any n 
processes on n different machines with n >= 3).


A process started by xos-createProcess on a remote machine ends with the 
following error:


[paradent-5.rennes.grid5000.fr:08191] [[50627,0],2] routed:binomial: 
Connection to lifeline [[50627,0],0] lost


But process 0 is still running! The lifeline should not have been lost!
Actually, process 0 is still waiting for the remote process to terminate 
(checked with gdb: the initial process is calling libc's poll()).



The run command is:

-bash -c '(mpirun  --mca orte_rsh_agent xos-createProcess 
--leave-session-attached   -np 2   -host `xreservation -a $XOS_RSVID` 
mpi/hello_world_MPI  < /dev/null > mpirun.out) >& mpirun.err'


Same problem with or without option --leave-session-attached.



So, how is the lifeline implemented? Why does it work with 2 processes 
but start failing when using 3 or more processes?



I'm using Open MPI 1.6.


Thanks for your help.

--
Yann Radenac
Research Engineer, INRIA
Myriads research team, INRIA Rennes - Bretagne Atlantique



Re: [OMPI users] hangs of MPI_WIN_LOCK/UNLOCK (gfortran)

2012-08-20 Thread EatDirt

On 16/08/12 20:35, eatdirt wrote:

Hi there,
I have attached a little piece of code which summarizes a "bug?" that
really annoys me. Issuing various calls to MPI_WIN_LOCK/UNLOCK seems
to hang some processes until an MPI_BARRIER is encountered!??



ping?

I am new to this mailing list; could someone please advise me on what I should 
do with this issue? Shall I open a ticket, or post it to the devel list?


Thanks in advance,

Cheers,
Chris.
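
The attachment mentioned above is not reproduced in the archive. For context, the passive-target pattern under discussion (MPI_Win_lock / MPI_Put / MPI_Win_unlock) looks roughly like the sketch below; this is an illustrative example only, not the poster's code, so it does not reproduce the reported hang:

#include <stdio.h>
#include "mpi.h"

/* Illustrative passive-target RMA pattern (MPI_Win_lock/MPI_Win_unlock):
 * every rank exposes one double; rank 0 writes to and reads back from
 * each other rank without the targets making any MPI calls in between. */
int main(int argc, char *argv[])
{
    int rank, nproc, target;
    double *local, value;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    /* MPI-2 allows implementations to require MPI_Alloc_mem memory for
     * lock/unlock synchronization, so allocate the window that way. */
    MPI_Alloc_mem(sizeof(double), MPI_INFO_NULL, &local);
    *local = -1.0;
    MPI_Win_create(local, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 0) {
        for (target = 1; target < nproc; target++) {
            value = 100.0 + target;
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);
            MPI_Put(&value, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
            MPI_Win_unlock(target, win);  /* the Put is complete here */

            MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
            MPI_Get(&value, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
            MPI_Win_unlock(target, win);  /* the Get is complete here */
            printf("rank 0 read %f back from rank %d\n", value, target);
        }
    }

    MPI_Win_free(&win);   /* collective; also synchronizes all ranks */
    MPI_Free_mem(local);
    MPI_Finalize();
    return 0;
}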



[OMPI users] MPI_Irecv: Confusion with <<count>> input parameter

2012-08-20 Thread devendra rai
Hello Community,

I have a problem understanding the API for MPI_Irecv:

int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source,
              int tag, MPI_Comm comm, MPI_Request *request);

Parameters
  buf       [in]  initial address of receive buffer (choice)
  count     [in]  number of elements in receive buffer (integer)
  datatype  [in]  datatype of each receive buffer element (handle)
  source    [in]  rank of source (integer)
  tag       [in]  message tag (integer)
  comm      [in]  communicator (handle)
  request   [out] communication request (handle)

What exactly does "count" mean here? 

Is it the number of elements that have been received *thus far* in the buffer?
Or is it the number of elements that are expected to be received, and hence 
MPI_Test will tell me that the receive is not complete until "count" elements 
have been received?

Here's the reason why I have a problem (and I think I may be completely stupid 
here, I'd appreciate your patience):

I have node 1 transmit data to node 2, in a pack of 80 bytes:

Mon Aug 20 11:09:04 2012[1,1]:    Finished transmitting 80 bytes to 2 
node with Tag 1000

On the receiving end:

MPI_Irecv(
    (void*)this->receivebuffer,   /* the receive buffer */
    this->receive_packetsize,     /* 80 */
    MPI_BYTE,                     /* the data type expected */
    this->transmittingnode,       /* the node from which to receive */
    this->uniquetag,              /* tag */
    MPI_COMM_WORLD,               /* communicator */
    &receive_request              /* request handle */
    );

I see that node 1 tells me that the transmit was successful using the MPI_Test:

MPI_Test(&transmit_request, &flag, &transmit_status);

which returns me "true" on Node 1 (sender).

However, I am never able to receive the payload on Node 2:

Mon Aug 20 11:09:04 2012[1,2]:Attemting to receive payload from node 1 
with tag 1000, receivepacketsize: 80


I am using MPI_Issend to send payload between node 1 and node 2.

Does anyone see what could be going wrong?

Thanks a lot

Devendra