Re: [OMPI users] Cast MPI inside another MPI?
Diego,

MPI+MPI is a well-known parallel programming paradigm. Why are you trying to avoid MPI + OpenMP?

Open MPI is a fully 3.1-compliant implementation of the MPI standard, and as such it implements all of the API described in version 3.1 of the MPI standard (http://mpi-forum.org/docs/). Otherwise, the FAQ and this mailing list are the best places to learn about the particular capabilities of the Open MPI software stack.

George.

On Fri, Nov 25, 2016 at 6:30 AM, Diego Avesani wrote:
> Dear all,
>
> I have the following question: is it possible to cast an MPI inside
> another MPI? I would like to have two levels of parallelization, but I
> would like to avoid the MPI+OpenMP paradigm.
>
> Another question: I normally use Open MPI, but I would like to read
> something to understand and learn all of its capabilities. Can anyone
> suggest a book or documentation?
>
> Thanks
>
> Diego
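As a concrete illustration of the MPI+MPI approach George describes (a minimal sketch, not from the original thread): two levels of parallelism within one MPI job are usually obtained by splitting MPI_COMM_WORLD into sub-communicators. The group size of 4 below is an arbitrary assumption.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int wrank, wsize;
        MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
        MPI_Comm_size(MPI_COMM_WORLD, &wsize);

        /* Outer level: group every 4 consecutive world ranks together.
           The group size (4) is arbitrary; pick what fits the problem. */
        int color = wrank / 4;
        MPI_Comm subcomm;
        MPI_Comm_split(MPI_COMM_WORLD, color, wrank, &subcomm);

        int srank, ssize;
        MPI_Comm_rank(subcomm, &srank);
        MPI_Comm_size(subcomm, &ssize);

        /* Inner level: collectives and point-to-point calls on subcomm
           involve only the ranks of this group, while MPI_COMM_WORLD
           remains available for the outer level of the decomposition. */
        printf("world %d/%d -> group %d, local %d/%d\n",
               wrank, wsize, color, srank, ssize);

        MPI_Comm_free(&subcomm);
        MPI_Finalize();
        return 0;
    }

Run with, e.g., mpirun -np 8: ranks 0-3 and 4-7 then form two independent inner groups, with no threading involved.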
Re: [OMPI users] Segmentation fault (invalid mem ref) at MPI_StartAll (second call)
At first glance, I would say you are confusing the two variables that count your requests, reqcount and nrequests.

George.

On Fri, Nov 25, 2016 at 7:11 AM, Paolo Pezzutto wrote:
> Dear all,
>
> I am struggling with an invalid memory reference when calling SUB EXC_MPI
> (MOD01), and precisely at MPI_StartAll (see comment below).
>
> […]
>     integer, save :: requests(4), reqcount
> […]
>     call MPI_StartAll(reqcount,requests,ierr)   !! <-- segfault here
>     call MPI_WaitAll(reqcount,requests,istatus,ierr)
> […]
>       nrequests = 0
> […]
>       call MPI_Send_Init(X(i0, 1, 1), 1, mpitype_sn, PN2-1, ktag, &
>            MPI_COMM_WORLD, requests(reqcount+1), ierrs(5))
> […]
>       call MPI_Recv_Init(X(nend(irank)+1, 1, 1), 1, mpitype_rn, PN2-1, &
>            ktag+1, MPI_COMM_WORLD, requests(nrequests+2), ierrs(6))
>       nrequests = nrequests + 2
> […]
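To make George's point concrete: in the posted code, exc_init zeroes and increments nrequests, but MPI_StartAll is called with reqcount, which is never assigned, so the count passed at start time is undefined whenever the two names diverge. A minimal sketch of the consistent single-counter pattern for persistent requests, in C (the neighbor ranks, buffers, and iteration count are illustrative, not from the thread):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
        int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

        double sendl = rank, sendr = rank, recvl = 0, recvr = 0;

        /* One counter fills requests[]; the same counter starts and
           completes them, so the two can never disagree. */
        MPI_Request requests[4];
        int reqcount = 0;

        MPI_Send_init(&sendl, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD,
                      &requests[reqcount++]);
        MPI_Recv_init(&recvl, 1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD,
                      &requests[reqcount++]);
        MPI_Send_init(&sendr, 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD,
                      &requests[reqcount++]);
        MPI_Recv_init(&recvr, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD,
                      &requests[reqcount++]);

        /* Reuse the persistent requests across iterations, as in the
           original code's repeated calls to exc_mpi. */
        for (int it = 0; it < 10; it++) {
            MPI_Status statuses[4];
            MPI_Startall(reqcount, requests);
            MPI_Waitall(reqcount, requests, statuses);
        }

        for (int i = 0; i < reqcount; i++)
            MPI_Request_free(&requests[i]);

        MPI_Finalize();
        return 0;
    }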
Re: [OMPI users] malloc related crash inside openmpi
> On Nov 24, 2016, at 10:52 AM, r...@open-mpi.org wrote:
>
> Just to be clear: are you saying that mpirun exits with that message? Or is
> your application process exiting with it?
>
> There is no reason for mpirun to be looking for that library.
>
> The library in question is in the /lib/openmpi directory, and is
> named mca_ess_pmi.[la,so]

Looks like this Open MPI 2 crash was a matter of not using the correctly linked executable on all nodes. Now that that's straightened out, I think it's all working, and it apparently even fixed my malloc-related crash, so perhaps the allocator fix in 2.0.1 really does address the problem.

Thank you all for the help.

Noam
[OMPI users] Segmentation fault (invalid mem ref) at MPI_StartAll (second call)
Dear all,

I am struggling with an invalid memory reference when calling SUB EXC_MPI (MOD01), and precisely at MPI_StartAll (see comment) below.

@@
! ** file mod01.f90 !
MODULE MOD01

  implicit none
  include 'mpif.h'
  ! alternatively
  ! use mpi
  ! implicit none
  PRIVATE
  ! ...
  INTERFACE exc_mpi
     MODULE PROCEDURE exc_mpi
  END INTERFACE
  PUBLIC exc_mpi

CONTAINS

  subroutine exc_mpi (X)
    !! send and receive from procs PN0 <-> PN1 and PN0 <-> PN2
    real, dimension (ni:ns, m, l), intent(inout) :: X

    logical, save :: frstime=.true.
    integer, save :: mpitype_sn, mpitype_sp, mpitype_rn, mpitype_rp
    integer, save :: requests(4), reqcount
    integer :: istatus(MPI_STATUS_SIZE,4), ierr

    if (frstime) then
       call exc_init()
       frstime = .false.
    end if
    call MPI_StartAll(reqcount,requests,ierr)   !! <-- segfault here
    call MPI_WaitAll(reqcount,requests,istatus,ierr)
    return

  contains

    subroutine exc_init

      integer :: i0, ierrs(12), ktag

      nrequests = 0
      ierrs = 0
      ktag = 1

      ! find i0

      if ( condition1 ) then
         ! send to PN2
         call MPI_Type_Vector(m*l, messlengthup(PN2), ns-ni+1, MPI_REAL, &
              mpitype_sn, ierrs(1))
         call MPI_Type_Commit(mpitype_sn, ierrs(3))
         call MPI_Send_Init(X(i0, 1, 1), 1, mpitype_sn, PN2-1, ktag, &
              MPI_COMM_WORLD, requests(reqcount+1), ierrs(5))
         ! receive from PN2
         call MPI_Type_Vector(m*l, messlengthdo(PN0), ns-ni+1, MPI_REAL, &
              mpitype_rn, ierrs(2))
         call MPI_Type_Commit(mpitype_rn, ierrs(4))
         call MPI_Recv_Init(X(nend(irank)+1, 1, 1), 1, mpitype_rn, PN2-1, &
              ktag+1, MPI_COMM_WORLD, requests(nrequests+2), ierrs(6))
         nrequests = nrequests + 2
      end if

      if ( condition2 ) then
         ! send and recv PN0 <-> PN1
         nrequests = nrequests + 2
      end if

      return
    end subroutine exc_init

  end subroutine exc_mpi

  ! ...

END MODULE MOD01
@@

The calls are coming from this other module in a separate file:

@@
! ** file mod02.f90 !
MODULE MOD02

  use MOD01, only: exc_mpi

  IMPLICIT NONE
  include 'mpif.h'
  ! alternatively
  ! use mpi
  ! implicit none
  PRIVATE

  ! ...

  INTERFACE MYSUB
     MODULE PROCEDURE MYSUB
  END INTERFACE
  PUBLIC MYSUB

CONTAINS

  SUBROUTINE MYSUB (Y)

    IMPLICIT NONE
    REAL, INTENT(INOUT) :: Y(nl:nr, m, l)   ! ni<=nl, nr>=ns
    REAL, ALLOCATABLE, DIMENSION(:,:,:) :: Y0
    !...
    allocate ( Y0(n-1:ns, 1:m, 1:l) )

    DO i = 1, icount

       Y0(nl:nr,:,:) = F3(:,:,:)
       call exc_mpi ( Y0(ni:ns, :, :) )   ! <-- segfault here
       call mpi_barrier (mpi_comm_world, ierr)
       Y0(ni-1,:,:) = 0.
       CALL SUB01

    END DO
    deallocate (Y0)
    RETURN

  CONTAINS

    SUBROUTINE SUB01
      !...
      FRE: DO iterm = 1, m
         DIR: DO iterl = 1, l
            DO itern = nl, nr
               !Y(itern, iterm, iterl) = some_lin_combination(Y0)
            END DO
         END DO DIR
      END DO FRE

    END SUBROUTINE SUB01

    ! ...
  END SUBROUTINE MYSUB

END MODULE MOD02
@@

Segmentation fault is raised at runtime when MAIN (actually a sub in a module) calls MYSUB (in MOD02) for the second time, i.e. just MPI_StartAll without re-initialisation. The segfault is an invalid memory reference, but I swear the bounds aren't changing.

The error is not systematic, in the sense that the program works when splitting the job up to a certain number of processes, say NPMAX, which depends on the size of the decomposed array (the bigger the size, the higher NPMAX). With more procs than NPMAX, the program segfaults.

The same issue arises with [gfortran+ompi] and [gfortran+mpich], while [ifort+mpich] does not always segfault, but one process might hang indefinitely. So I bet it is not strictly an ompi issue, and I apologize for posting here. Nor is it a single-version issue: the same happens on deb-jessie, ubuntu 14 and a personal 2.0.1 build (I can share config.log if necessary).

The only thing in common is glibc (2.19, distro stable). Actually, the backtrace of ifort+mpich lists libpthread.so. I have not tried with alternative C libraries, nor with the newest glibc.

Intel virtual threading is enabled on all three archs (one mini HPC and two PCs).

This error is not reported on "serious" archs like nec, sun (ifort+ompi) and ibm. I am trying to find a possible MPI workaround for deb-based systems, maintaining efficiency as much as possible.

As can be seen, MOD02 passes to the exchange procedure (MOD01) a sliced array Y0 which is non-contiguous, but I should not worry because MPI_Type_Vector is expected to do the remapping job for me. One way I could almost overcome the fault (NPMAX grows by one order of magnitude) is to exchange the dimensions back and forth, but this causes the […]
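An editorial note on the datatype detail Paolo raises (not from the thread): a derived datatype describes a layout only relative to the base address passed when the communication is posted, and a persistent request captures that address once. In Fortran, passing a non-contiguous section like Y0(ni:ns,:,:) to an explicit-shape dummy argument may cause the compiler to pass a copy-in/copy-out temporary, whose address can differ on every call, which would be consistent with a segfault on the second MPI_StartAll. A minimal sketch of what MPI_Type_vector actually describes, in C with illustrative sizes:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* An NX x NY array stored as a 1D buffer, indexed a[j*NX + i]:
           one "column" (fixed i) is NY elements spaced NX apart. */
        enum { NX = 8, NY = 5 };
        double a[NX * NY], b[NY];
        for (int i = 0; i < NX * NY; i++) a[i] = i;

        /* NY blocks of 1 element, stride NX, like the posted code. */
        MPI_Datatype col;
        MPI_Type_vector(NY, 1, NX, MPI_DOUBLE, &col);
        MPI_Type_commit(&col);

        /* Self-exchange on MPI_COMM_SELF: the datatype gathers the
           strided column i=2 into the contiguous buffer b. The base
           address &a[2] is what the layout is applied to, so that
           address must stay valid while any request built on it lives. */
        MPI_Sendrecv(&a[2], 1, col, 0, 0,
                     b, NY, MPI_DOUBLE, 0, 0,
                     MPI_COMM_SELF, MPI_STATUS_IGNORE);

        for (int j = 0; j < NY; j++)
            printf("b[%d] = %g\n", j, b[j]);   /* 2, 10, 18, 26, 34 */

        MPI_Type_free(&col);
        MPI_Finalize();
        return 0;
    }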
[OMPI users] Cast MPI inside another MPI?
Dear all,

I have the following question: is it possible to cast an MPI inside another MPI? I would like to have two levels of parallelization, but I would like to avoid the MPI+OpenMP paradigm.

Another question: I normally use Open MPI, but I would like to read something to understand and learn all of its capabilities. Can anyone suggest a book or documentation?

Thanks

Diego
Re: [OMPI users] MPI_Sendrecv datatype memory bug ?
Yann,

Please post the test case that evidences the issue. What is the minimal configuration required to reproduce it (e.g. number of nodes and tasks per node)? If more than one node, which interconnect are you using?

Out of curiosity, what happens if you

mpirun --mca mpi_leave_pinned 0 ...

or

mpirun --mca btl tcp,self --mca pml ob1 ...

Cheers,

Gilles

Yann Jobic wrote:
> Hi all,
>
> I'm going crazy about a possible bug in my code. I'm using a derived MPI
> datatype in a sendrecv function. The problem is that the memory footprint
> of my code grows as time increases. The problem does not show with a
> regular datatype such as MPI_DOUBLE. I don't have this problem with
> Open MPI 1.8.4, but it is present in 1.10.1 and 2.0.1.
>
> The key parts of the code are (I'm using a 1D array with a macro in
> order to index it as 3D):
>
> Definition of the datatype:
>
>   MPI_Type_vector( Ny, 1, Nx, MPI_DOUBLE, &mpi.MPI_COL );
>   MPI_Type_commit( &mpi.MPI_COL );
>
> And the sendrecv part:
>
>   MPI_Sendrecv( &(thebigone[_(1,0,k)]),    1, mpi.MPI_COL, mpi.left,  3, \
>                 &(thebigone[_(Nx-1,0,k)]), 1, mpi.MPI_COL, mpi.right, 3, \
>                 mpi.com, &mpi.stat );
>
> Is it coming from my code?
>
> I isolated the communications in a small code (500 lines). I can share it
> in order to reproduce the problem.
>
> Thanks,
>
> Yann
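For context on Gilles's two suggestions (an editorial sketch, not from the thread): disabling mpi_leave_pinned rules out growth in Open MPI's registered-memory cache, and forcing btl tcp,self with pml ob1 rules out RDMA-capable transports; both are common suspects when memory appears to grow only with derived datatypes. A minimal self-contained loop of the kind Yann describes might look like this; the array sizes, iteration count, and IDX macro are arbitrary assumptions:

    #include <mpi.h>
    #include <stdlib.h>

    /* Hypothetical index macro for a 1D array treated as 2D. */
    #define IDX(i, j, nx) ((j) * (nx) + (i))

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        enum { NX = 64, NY = 64, NITER = 10000 };   /* arbitrary sizes */
        double *field = calloc((size_t)NX * NY, sizeof *field);

        /* One column of the NX x NY array, committed once and reused;
           the datatype itself is not a per-iteration allocation. */
        MPI_Datatype col;
        MPI_Type_vector(NY, 1, NX, MPI_DOUBLE, &col);
        MPI_Type_commit(&col);

        int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
        int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

        /* Repeated halo exchange: if the implementation leaks
           per-message datatype resources, the resident footprint
           grows across these iterations. */
        for (int it = 0; it < NITER; it++) {
            MPI_Sendrecv(&field[IDX(1, 0, NX)],      1, col, left,  3,
                         &field[IDX(NX - 1, 0, NX)], 1, col, right, 3,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Type_free(&col);
        free(field);
        MPI_Finalize();
        return 0;
    }

Watching this loop under a memory profiler with each of Gilles's mpirun variants would show whether the growth follows the transport or the datatype path.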