[OMPI users] Flow control in OMPI
Hi there, I am facing a problem in an Open MPI application. Part of the application consists of a sender and a receiver. The problem is that the sender is much faster than the receiver, which causes the receiver's memory to fill up completely, aborting the application. I would like to know whether there is a flow-control scheme implemented in Open MPI, or whether this issue has to be handled at the application layer. If one exists, how does it work and how can I use it in my application? I did some research on this subject but did not find a conclusive explanation. Thanks a lot.
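[Editorial note: a common application-layer remedy for this is a credit (acknowledgement) scheme: the sender pauses for a token from the receiver every N messages, which bounds how far ahead it can run. The sketch below illustrates the idea; the window size, tags, message counts, and program name are illustrative, not any Open MPI feature.]

```fortran
! Sketch of application-level flow control with a credit/ack scheme.
! Rank 0 sends; rank 1 receives and grants a new window of credits
! every WINDOW messages, so unconsumed messages at the receiver are
! bounded by roughly one window.
program throttled_send
  use mpi
  implicit none
  integer, parameter :: WINDOW = 64, NMSG = 1024, COUNT = 1000
  integer :: ierr, rank, i, token
  real :: buf(COUNT)

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, rank, ierr)

  if (rank == 0) then                       ! sender
     do i = 1, NMSG
        call mpi_send(buf, COUNT, MPI_REAL, 1, 0, mpi_comm_world, ierr)
        if (mod(i, WINDOW) == 0) then       ! wait for a credit token
           call mpi_recv(token, 1, MPI_INTEGER, 1, 1, mpi_comm_world, &
                         mpi_status_ignore, ierr)
        end if
     end do
  else if (rank == 1) then                  ! receiver
     do i = 1, NMSG
        call mpi_recv(buf, COUNT, MPI_REAL, 0, 0, mpi_comm_world, &
                      mpi_status_ignore, ierr)
        ! ... process buf here ...
        if (mod(i, WINDOW) == 0) then       ! grant the next window
           token = i
           call mpi_send(token, 1, MPI_INTEGER, 0, 1, mpi_comm_world, ierr)
        end if
     end do
  end if

  call mpi_finalize(ierr)
end program throttled_send
```

A blunter alternative is to use mpi_ssend for every message: the synchronous send completes only when the receiver has started the matching receive, which also prevents unbounded buffering, at the cost of per-message latency.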
[OMPI users] ConnectX with InfiniHost IB HCAs
Hi all, this is more of a hardware or system-configuration question, but I hope people on this list have relevant experience. I have just added a new ConnectX IB card to a cluster with InfiniHost cards, and now no MPI programs work. Even OFED's tests fail: ib_send_* and ib_write_* simply segfault on the host with the ConnectX card and hang on the hosts with InfiniHost cards. The rdma_lat/rdma_bw tests segfault too, with messages on the InfiniHost hosts like this:

server read: No such file or directory
5924:pp_server_exch_dest: 0/45 Couldn't read remote address
pp_read_keys: No such file or directory
Couldn't read remote address

Other diagnostic tools (ibv_device, ibchecknet, ibstat, ibstatus, ...) show no errors and do show the ConnectX card in the system. All modules (mlx4_*, rdma_*) are loaded, IPoIB is configured, and the openibd and opensmd services started without errors. OFED is 1.3.1, CentOS 5.2.

08:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)

ibstat:
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.7.0
        Hardware version: a0
        Node GUID: 0x0002c903000cad14
        System image GUID: 0x0002c903000cad17
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 60
                LMC: 0
                SM lid: 60
                Capability mask: 0x0251086a
                Port GUID: 0x0002c903000cad15

Where is the problem? Thanks in advance, Egor.
Re: [OMPI users] Related to project ideas in OpenMPI
I don't know which SSI project you are referring to... I only know the OpenSSI project, and I was one of the first to subscribe to its mailing list (since 2001). http://openssi.org/cgi-bin/view?page=openssi.html I don't think those OpenSSI clusters are designed for tens of thousands of nodes, and I am not sure they scale well even to a thousand nodes -- so IMO they have limited use for HPC clusters.

Rayson

On Thu, Aug 25, 2011 at 11:45 AM, Durga Choudhury wrote:
> Also, in 2005 there was an attempt to implement SSI (Single System
> Image) functionality to the then-current 2.6.10 kernel. The proposal
> was very detailed and covered most of the bases of task creation, PID
> allocation etc across a loosely tied cluster (without using fancy
> hardware such as RDMA fabric). Anybody know if it was ever
> implemented? Any pointers in this direction?
>
> Thanks and regards
> Durga

--
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Related to project ideas in OpenMPI
Is anything done at the kernel level portable (e.g. to Windows)? It *can* be, in principle at least (by putting appropriate #ifdef's in the code), but I am wondering whether it is in reality.

Also, in 2005 there was an attempt to implement SSI (Single System Image) functionality in the then-current 2.6.10 kernel. The proposal was very detailed and covered most of the bases of task creation, PID allocation, etc. across a loosely tied cluster (without using fancy hardware such as an RDMA fabric). Does anybody know if it was ever implemented? Any pointers in this direction?

Thanks and regards
Durga
Re: [OMPI users] Related to project ideas in OpenMPI
Srinivas,

There's also kernel-level checkpointing vs. user-level checkpointing -- if you can checkpoint an MPI task and restart it on a new node, then that is also "process migration".

Of course, doing a checkpoint & restart can be slower than pure in-kernel process migration, but the advantage is that you don't need any kernel support and can in fact do all of it in user space.

Rayson

--
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
Re: [OMPI users] Related to project ideas in OpenMPI
It also depends on what part of migration interests you - are you wanting to look at the MPI part of the problem (reconnecting MPI transports, ensuring messages are not lost, etc.) or the RTE part of the problem (where to restart processes, detecting failures, etc.)?

On Aug 24, 2011, at 7:04 AM, Jeff Squyres wrote:
> Be aware that process migration is a pretty complex issue.
>
> Josh is probably the best one to answer your question directly, but he's out today.
>
> On Aug 24, 2011, at 5:45 AM, srinivas kundaram wrote:
>> I am a final-year grad student looking for my final-year project in OpenMPI. We are a group of 4 students.
>> I wanted to know about the "process migration" of MPI processes in OpenMPI.
>> Can anyone suggest ideas for a project related to process migration in OpenMPI, or other topics in systems?
>>
>> regards,
>> Srinivas Kundaram
>> srinu1...@gmail.com
>> +91-8149399160
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] problems with parallel IO solved!
Hi Folks,

the problem could be solved by using the same compiler settings for writing out and reading in. Writing out was done with -trace (Intel compiler), and the reading in without any supplemental options.

Best wishes
Alexander
[OMPI users] problems with parallel IO
Hi Folks,

I have problems retrieving data that I have written out with MPI parallel IO. In my tests everything works fine, but within a larger environment the data read in differ from those written out.

Here is the setup of my experiment:

# the writer #

      program parallel_io

      use mpi
      implicit none

      integer, parameter :: nx=1, ny=300, nz=256, nv=12
      integer ierr, i, myrank, comm_size, BUFSIZE, thefile, realsize
      parameter (BUFSIZE=1075200)        ! = (nv+2)*nx*ny*nz

      real, dimension(nv+2,nx,ny,nz) :: v1

      integer (kind=MPI_OFFSET_KIND) disp
      integer ix, iy, iz, nn, counter
      character(6) cname

      call mpi_init(ierr)
      call mpi_comm_size(mpi_comm_world, comm_size, ierr)
      call mpi_comm_rank(mpi_comm_world, myrank, ierr)

      counter = 0
      do ix = 1,nz
         do iy = 1,ny
            do iz = 1,nx
               do nn = 1,nv+2
                  ! index order matches the declared dimensions (nv+2,nx,ny,nz)
                  v1(nn,iz,iy,ix) = counter*(myrank+20)/200.
                  counter = counter + 1
               end do
            end do
         end do
      end do

      call mpi_barrier(mpi_comm_world, ierr)

      call mpi_file_open(mpi_comm_world, 'testfile', MPI_MODE_WRONLY + MPI_MODE_CREATE, mpi_info_null, thefile, ierr)
      call mpi_type_size(MPI_REAL, realsize, ierr)

      disp = myrank * BUFSIZE * realsize

!     call mpi_file_set_view(thefile, disp, MPI_REAL, MPI_REAL, 'native', mpi_info_null, ierr)
      call mpi_file_write_at(thefile, disp, v1(1,1,1,1), BUFSIZE, MPI_REAL, mpi_status_ignore, ierr)

      call mpi_file_close(thefile, ierr)

! print the data written out...

      open (12, file='out000.dat-parallel-write-0')

      if (myrank.eq.0) then
         write (12,'(i4,(e18.8))') myrank, ((((v1(nn,iz,iy,ix), nn=1,nv+2), iz=1,nx), iy=1,ny), ix=1,nz)
      endif

      close (12)

      call mpi_finalize(ierr)

      end program parallel_io

###

and the reader...
### the reader ###

      program parallel_read_io

      use mpi
      implicit none

      integer, parameter :: nx=1, ny=300, nz=256, nv=12
      integer ierr, i, myrank, comm_size, BUFSIZE, thefile, realsize
      parameter (BUFSIZE=1075200)        ! = (nv+2)*nx*ny*nz

      real, dimension(nv+2,nx,ny,nz) :: v1

      integer (kind=MPI_OFFSET_KIND) disp
      integer ix, iy, iz, nn

      call mpi_init(ierr)
      call mpi_comm_size(mpi_comm_world, comm_size, ierr)
      call mpi_comm_rank(mpi_comm_world, myrank, ierr)

      call mpi_file_open(mpi_comm_world, 'testfile', MPI_MODE_RDONLY, mpi_info_null, thefile, ierr)
      call mpi_type_size(MPI_REAL, realsize, ierr)

      disp = myrank * BUFSIZE * realsize
      print *, 'myrank: ', myrank, ' disp: ', disp, ' realsize: ', realsize

!     call mpi_file_set_view(thefile, disp, MPI_REAL, MPI_REAL, 'native', mpi_info_null, ierr)
      call mpi_file_read_at(thefile, disp, v1(1,1,1,1), BUFSIZE, MPI_REAL, mpi_status_ignore, ierr)

      call mpi_file_close(thefile, ierr)

      call mpi_barrier(mpi_comm_world, ierr)

! print the data read in...

      open (12, file='out000.dat-parallel-read-0')

      if (myrank.eq.0) then
         write (12,'(i4,(e18.8))') myrank, ((((v1(nn,iz,iy,ix), nn=1,nv+2), iz=1,nx), iy=1,ny), ix=1,nz)
      endif

      close (12)

      call mpi_finalize(ierr)

      end program parallel_read_io

###

Here everything works fine. However, when integrating this into a larger program, I get totally different data written out and read in. The setup is the same as in the experiment, but I need some more memory...

What might be the reason for such problems, and if I get an MPI error, how can I examine it within a Fortran program? I have only found examples for the handling of MPI errors in C or C++; I would need an example for Fortran.

So, any hints or ideas?

Best wishes
Alexander
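[Editorial note: on the error-handling question, here is a minimal Fortran sketch using only standard MPI calls (mpi_error_string, MPI_SUCCESS). Per the MPI standard, file operations default to the MPI_ERRORS_RETURN error handler, so the ierr argument of the MPI-IO calls is actually meaningful and can be inspected. The filename is illustrative.]

```fortran
! Checking and decoding MPI error codes from Fortran.
program check_mpi_errors
  use mpi
  implicit none
  integer :: ierr, ierr2, thefile, reslen
  character(len=MPI_MAX_ERROR_STRING) :: errmsg

  call mpi_init(ierr)

  ! File operations use MPI_ERRORS_RETURN by default, so a failure
  ! returns an error code in ierr instead of aborting the program.
  call mpi_file_open(mpi_comm_world, 'does-not-exist', MPI_MODE_RDONLY, &
                     mpi_info_null, thefile, ierr)

  if (ierr /= MPI_SUCCESS) then
     ! Translate the numeric code into a human-readable message.
     call mpi_error_string(ierr, errmsg, reslen, ierr2)
     print *, 'MPI_File_open failed: ', errmsg(1:reslen)
  end if

  call mpi_finalize(ierr)
end program check_mpi_errors
```

For communication calls (which default to MPI_ERRORS_ARE_FATAL on communicators), the same pattern works after switching the communicator's error handler with mpi_comm_set_errhandler(mpi_comm_world, MPI_ERRORS_RETURN, ierr).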