Re: [OMPI users] Cast MPI inside another MPI?

2016-11-25 Thread George Bosilca
Diego,

MPI+MPI is a well-known parallel programming paradigm. Why are you trying to
avoid MPI + OpenMP?

Open MPI is a fully 3.1-compatible implementation of the MPI standard, and
as such it implements all of the APIs described in version 3.1 of the MPI
standard (http://mpi-forum.org/docs/). Beyond that, our FAQ and this mailing
list are the best places to learn about the particular capabilities of the
Open MPI software stack.
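
For the archives: one common way to get two levels of parallelization with
MPI alone ("MPI+MPI") is to split MPI_COMM_WORLD into sub-communicators. Here
is a minimal C sketch; the node-local split via MPI_Comm_split_type is only
one possible grouping, not something taken from the original question.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int wrank, wsize;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    /* Outer level: all processes in MPI_COMM_WORLD.
     * Inner level: one sub-communicator per shared-memory node
     * (any other grouping key works with plain MPI_Comm_split). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, wrank,
                        MPI_INFO_NULL, &node_comm);

    int nrank, nsize;
    MPI_Comm_rank(node_comm, &nrank);
    MPI_Comm_size(node_comm, &nsize);

    printf("world %d/%d -> node-local %d/%d\n", wrank, wsize, nrank, nsize);

    /* Point-to-point and collective calls can now be issued on either
     * communicator, giving two nested levels of parallelization. */
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}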

  George.


On Fri, Nov 25, 2016 at 6:30 AM, Diego Avesani wrote:

> Dear all,
>
> I have the following question. Is it possible to cast an MPI inside
> another MPI?
> I would like to have two levels of parallelization, but I would like to
> avoid the MPI-OpenMP paradigm.
>
> Another question. I normally use Open MPI, but I would like to read
> something to understand it and learn all of its capabilities. Can anyone
> suggest a book or some documentation?
>
> Thanks
>
> Diego
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Segmentation fault (invalid mem ref) at MPI_StartAll (second call)

2016-11-25 Thread George Bosilca
At first glance, I would say you are confusing the two variables that count
your requests, reqcount and nrequests.
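
In other words, the count passed to MPI_Startall/MPI_Waitall must be the same
variable that was incremented when the persistent requests were created. A
minimal C sketch of that pattern (the buffers, peer and tag are placeholders,
not taken from the posted code):

#include <mpi.h>

/* Create persistent requests once, then start and complete them repeatedly,
 * using a single counter (nreq) both at creation time and in Startall/Waitall. */
static MPI_Request reqs[4];
static int nreq = 0;

void exchange_init(double *sendbuf, double *recvbuf, int n, int peer)
{
    MPI_Send_init(sendbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[nreq++]);
    MPI_Recv_init(recvbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[nreq++]);
}

void exchange(void)
{
    MPI_Startall(nreq, reqs);                     /* same counter ...  */
    MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE); /* ... in both calls */
}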

  George.


On Fri, Nov 25, 2016 at 7:11 AM, Paolo Pezzutto  wrote:

> Dear all,
>
> I am struggling with an invalid memory reference when calling SUB EXC_MPI
> (MOD01), more precisely at MPI_StartAll (see the comment below).
>
> @@
> ! ** file mod01.f90  !
> MODULE MOD01
>
> implicit none
> include 'mpif.h'
> ! alternatively
> ! use mpi
> ! implicit none
> PRIVATE
> ! ...
> INTERFACE exc_mpi
>MODULE PROCEDURE exc_mpi
> END INTERFACE
> PUBLIC exc_mpi
>
> CONTAINS
>
> subroutine exc_mpi (X)
> !! send and receive from procs PN0 <-> PN1 and PN0 <-> PN2
> real, dimension (ni:ns, m, l), intent(inout) :: X
>
> logical, save :: frstime=.true.
> integer, save :: mpitype_sn, mpitype_sp, mpitype_rn, mpitype_rp
> integer, save :: requests(4), reqcount
> integer   :: istatus(MPI_STATUS_SIZE,4), ierr
>
> if (frstime) then
>call exc_init()
>frstime = .false.
> end if
> call MPI_StartAll(reqcount, requests, ierr) !!  <-- segfault here
> call MPI_WaitAll(reqcount,requests,istatus,ierr)
> return
>
> contains
>
> subroutine exc_init
>
> integer :: i0, ierrs(12), ktag
>
> nrequests = 0
> ierrs=0
> ktag = 1
>
> ! find i0
>
> if ( condition1 ) then
> ! send to PN2
>    call MPI_Type_Vector(m*l, messlengthup(PN2), ns-ni+1, MPI_REAL, &
>         mpitype_sn, ierrs(1))
>    call MPI_Type_Commit(mpitype_sn, ierrs(3))
>    call MPI_Send_Init(X(i0, 1, 1), 1, mpitype_sn, PN2-1, ktag, &
>         MPI_COMM_WORLD, requests(reqcount+1), ierrs(5))
> ! receive from PN2
>    call MPI_Type_Vector(m*l, messlengthdo(PN0), ns-ni+1, MPI_REAL, &
>         mpitype_rn, ierrs(2))
>    call MPI_Type_Commit(mpitype_rn, ierrs(4))
>    call MPI_Recv_Init(X(nend(irank)+1, 1, 1), 1, mpitype_rn, PN2-1, &
>         ktag+1, MPI_COMM_WORLD, requests(nrequests+2), ierrs(6))
>    nrequests = nrequests + 2
> end if
>
> if ( condition2 ) then
> !   send and rec PN0 <-> PN1
>nrequests = nrequests + 2
> end if
>
> return
> end subroutine exc_init
>
> end subroutine exc_mpi
>
> ! ...
>
> END MODULE MOD01
> @@
>
> The calls are coming from this other module in a separate file:
>
> @@
>
> ! ** file mod02.f90  !
> MODULE MOD02
>
> use MOD01, only: exc_mpi
>
> IMPLICIT NONE
> include 'mpif.h'
> ! alternatively
> ! use mpi
> ! implicit none
> PRIVATE
>
> ! ...
>
> INTERFACE MYSUB
>MODULE PROCEDURE MYSUB
> END INTERFACE
> PUBLIC MYSUB
>
> CONTAINS
>
> SUBROUTINE MYSUB (Y)
>
> IMPLICIT NONE
> REAL,INTENT(INOUT)   :: Y(nl:nr, m, l) ! ni<=nl, nr>=ns
> REAL, ALLOCATABLE, DIMENSION(:,:,:) :: Y0
> !...
> allocate ( Y0(n-1:ns, 1:m, 1:l) )
>
> DO i = 1, icount
>
>Y0(nl:nr,:,:) = F3(:,:,:)
>call exc_mpi ( Y0(ni:ns, :, :) )   !  <-- segfault here
>call mpi_barrier (mpi_comm_world, ierr)
>Y0(ni-1,:,:) = 0.
>CALL SUB01
>
> END DO
> deallocate (Y0)
> RETURN
>
> CONTAINS
>
> SUBROUTINE SUB01
> !...
>    FRE: DO iterm = 1, m
>       DIR: DO iterl = 1, l
>          DO itern = nl, nr
>             !Y(itern, iterm, iterl) = some_lin_combination(Y0)
>          END DO
>       END DO DIR
>    END DO FRE
>
> END SUBROUTINE SUB01
>
> ! ...
> END SUBROUTINE MYSUB
>
> END MODULE MOD02
> @@
>
> The segmentation fault is raised at runtime when MAIN (actually a sub in a
> module) calls MYSUB (in MOD02) for the second time, i.e. when MPI_StartAll
> runs again without re-initialisation. The segfault is an invalid memory
> reference, but I swear that the bounds aren't changing.
>
> The error is not systematic: the program works if the job is split over up
> to a certain number of processes, say NPMAX, which depends on the size of
> the decomposed array (the bigger the size, the higher NPMAX). With more
> procs than NPMAX, the program segfaults.
>
> The same issue arises with [gfortran+ompi] and [gfortran+mpich], while
> [ifort+mpich] does not always segfault, but one process might hang
> indefinitely. So I bet it is not strictly an ompi issue, and I apologize for
> posting here. It is not a single-version issue either: the same happens on
> deb-jessie, ubuntu 14 and a personal 2.0.1 build (I can share config.log if
> necessary).
>
> The only thing they have in common is glibc (2.19, distro stable). Actually,
> the backtrace of ifort+mpich lists libpthread.so. I have not tried
> alternative C libraries, nor the newest glibc.
>
> Intel virtual threading is enabled on all three archs (one mini HPC and two
> PCs).
>
> This error is not reported on "serious" archs like 

Re: [OMPI users] malloc related crash inside openmpi

2016-11-25 Thread Noam Bernstein
> On Nov 24, 2016, at 10:52 AM, r...@open-mpi.org wrote:
> 
> Just to be clear: are you saying that mpirun exits with that message? Or is 
> your application process exiting with it?
> 
> There is no reason for mpirun to be looking for that library.
> 
> The library in question is in the /lib/openmpi directory, and is 
> named mca_ess_pmi.[la,so]
> 

Looks like this Open MPI 2 crash was a matter of not using the correctly
linked executable on all nodes. Now that it's straightened out, I think it's
all working, and it apparently even fixed my malloc-related crash, so perhaps
the allocator fix in 2.0.1 really is addressing the problem.

Thank you all for the help.

Noam
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

[OMPI users] Segmentation fault (invalid mem ref) at MPI_StartAll (second call)

2016-11-25 Thread Paolo Pezzutto
Dear all,

I am struggling with an invalid memory reference when calling SUB EXC_MPI
(MOD01), more precisely at MPI_StartAll (see the comment below).

@@
! ** file mod01.f90  !
MODULE MOD01

implicit none
include 'mpif.h'
! alternatively
! use mpi
! implicit none
PRIVATE
! ...
INTERFACE exc_mpi
   MODULE PROCEDURE exc_mpi
END INTERFACE
PUBLIC exc_mpi

CONTAINS

subroutine exc_mpi (X)
!! send and receive from procs PN0 <-> PN1 and PN0 <-> PN2
real, dimension (ni:ns, m, l), intent(inout) :: X

logical, save :: frstime=.true.
integer, save :: mpitype_sn, mpitype_sp, mpitype_rn, mpitype_rp
integer, save :: requests(4), reqcount
integer   :: istatus(MPI_STATUS_SIZE,4), ierr

if (frstime) then
   call exc_init()
   frstime = .false.
end if
call MPI_StartAll(reqcount,requests,ierr) !!  <-- segfault here
call MPI_WaitAll(reqcount,requests,istatus,ierr)
return

contains

subroutine exc_init

integer :: i0, ierrs(12), ktag

nrequests = 0
ierrs=0
ktag = 1

! find i0

if ( condition1 ) then
! send to PN2
   call MPI_Type_Vector(m*l, messlengthup(PN2), ns-ni+1, MPI_REAL, &
        mpitype_sn, ierrs(1))
   call MPI_Type_Commit(mpitype_sn, ierrs(3))
   call MPI_Send_Init(X(i0, 1, 1), 1, mpitype_sn, PN2-1, ktag, &
        MPI_COMM_WORLD, requests(reqcount+1), ierrs(5))
! receive from PN2
   call MPI_Type_Vector(m*l, messlengthdo(PN0), ns-ni+1, MPI_REAL, &
        mpitype_rn, ierrs(2))
   call MPI_Type_Commit(mpitype_rn, ierrs(4))
   call MPI_Recv_Init(X(nend(irank)+1, 1, 1), 1, mpitype_rn, PN2-1, &
        ktag+1, MPI_COMM_WORLD, requests(nrequests+2), ierrs(6))
   nrequests = nrequests + 2
end if

if ( condition2 ) then
!   send and rec PN0 <-> PN1
   nrequests = nrequests + 2
end if

return
end subroutine exc_init

end subroutine exc_mpi

! ...

END MODULE MOD01
@@

The calls are coming from this other module in a separate file:

@@

! ** file mod02.f90  !
MODULE MOD02

use MOD01, only: exc_mpi

IMPLICIT NONE
include 'mpif.h'
! alternatively
! use mpi
! implicit none
PRIVATE

! ...

INTERFACE MYSUB
   MODULE PROCEDURE MYSUB
END INTERFACE
PUBLIC MYSUB

CONTAINS

SUBROUTINE MYSUB (Y)

IMPLICIT NONE
REAL,INTENT(INOUT)   :: Y(nl:nr, m, l) ! ni<=nl, nr>=ns
REAL, ALLOCATABLE, DIMENSION(:,:,:) :: Y0
!...
allocate ( Y0(n-1:ns, 1:m, 1:l) )

DO i = 1, icount

   Y0(nl:nr,:,:) = F3(:,:,:)
   call exc_mpi ( Y0(ni:ns, :, :) )   !  <-- segfault here
   call mpi_barrier (mpi_comm_world, ierr)
   Y0(ni-1,:,:) = 0.
   CALL SUB01

END DO
deallocate (Y0)
RETURN

CONTAINS

SUBROUTINE SUB01
!...
   FRE: DO iterm = 1, m
      DIR: DO iterl = 1, l
         DO itern = nl, nr
            !Y(itern, iterm, iterl) = some_lin_combination(Y0)
         END DO
      END DO DIR
   END DO FRE

END SUBROUTINE SUB01

! ...
END SUBROUTINE MYSUB

END MODULE MOD02
@@

The segmentation fault is raised at runtime when MAIN (actually a sub in a
module) calls MYSUB (in MOD02) for the second time, i.e. when MPI_StartAll
runs again without re-initialisation. The segfault is an invalid memory
reference, but I swear that the bounds aren't changing.
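
For reference, the intended life cycle of persistent requests is: create them
once with MPI_Send_init/MPI_Recv_init, restart them with MPI_Startall and
complete them with MPI_Waitall on every exchange, and free them only at
shutdown. A minimal C sketch of that cycle (buffer size and neighbour rank
are placeholders, not taken from the posted code):

#include <mpi.h>

#define N 1024

static MPI_Request reqs[2];
static int initialized = 0;
static double halo_send[N], halo_recv[N];

/* The first call sets the requests up; every later call only restarts them.
 * The requests keep referring to the buffers supplied at init time. */
void exchange(int neighbour)
{
    if (!initialized) {
        MPI_Send_init(halo_send, N, MPI_DOUBLE, neighbour, 0,
                      MPI_COMM_WORLD, &reqs[0]);
        MPI_Recv_init(halo_recv, N, MPI_DOUBLE, neighbour, 0,
                      MPI_COMM_WORLD, &reqs[1]);
        initialized = 1;
    }
    MPI_Startall(2, reqs);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}

void exchange_finalize(void)
{
    MPI_Request_free(&reqs[0]);   /* only once the requests are inactive */
    MPI_Request_free(&reqs[1]);
}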

The error is not systematic: the program works if the job is split over up to
a certain number of processes, say NPMAX, which depends on the size of the
decomposed array (the bigger the size, the higher NPMAX). With more procs
than NPMAX, the program segfaults.

The same issue arises with [gfortran+ompi] and [gfortran+mpich], while
[ifort+mpich] does not always segfault, but one process might hang
indefinitely. So I bet it is not strictly an ompi issue, and I apologize for
posting here. It is not a single-version issue either: the same happens on
deb-jessie, ubuntu 14 and a personal 2.0.1 build (I can share config.log if
necessary).

The only thing they have in common is glibc (2.19, distro stable). Actually,
the backtrace of ifort+mpich lists libpthread.so. I have not tried alternative
C libraries, nor the newest glibc.

Intel virtual threading is enabled on all three archs (one mini HPC and two
PCs).

This error is not reported on "serious" archs like NEC, Sun (ifort+ompi)
and IBM.

I am trying to find a possible MPI workaround for deb-based systems while
maintaining as much efficiency as possible.

As can be seen, MOD02 passes a sliced, non-contiguous array Y0 to the
exchange procedure (MOD01). But I should not have to worry, because
MPI_Type_Vector is expected to do the remapping job for me.
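
For reference, a minimal C sketch of the remapping MPI_Type_vector performs:
count blocks of blocklength contiguous elements, each separated by stride
elements, so a strided slice of a contiguous buffer can be communicated
without an explicit pack. The sizes below are placeholders, not taken from
the posted code.

#include <mpi.h>
#include <stdio.h>

/* Fortran-order array X(ni:ns, 1:m, 1:l): element X(i0, j, k) for all j, k
 * is m*l blocks of 1 element separated by a stride of ns-ni+1 elements. */
enum { ni = 1, ns = 8, m = 4, l = 3, i0 = 2 };

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Datatype col;
    MPI_Type_vector(m * l, 1, ns - ni + 1, MPI_FLOAT, &col);
    MPI_Type_commit(&col);

    float X[l][m][ns - ni + 1];   /* contiguous storage for X(ni:ns,1:m,1:l) */
    float packed[m * l];          /* contiguous destination for the slice   */
    float *flat = &X[0][0][0];
    for (int e = 0; e < l * m * (ns - ni + 1); e++)
        flat[e] = (float)e;

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One instance of 'col' starting at X(i0,1,1) is the whole non-contiguous
     * slice X(i0,:,:); MPI gathers it, no manual packing needed. */
    MPI_Sendrecv(&X[0][0][i0 - ni], 1, col, rank, 0,
                 packed, m * l, MPI_FLOAT, rank, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("first two slice elements: %g %g\n", packed[0], packed[1]);

    MPI_Type_free(&col);
    MPI_Finalize();
    return 0;
}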

One way I can almost overcome the fault (NPMAX grows by one order of
magnitude) is to exchange the dimensions back and forth, but this causes the

[OMPI users] Cast MPI inside another MPI?

2016-11-25 Thread Diego Avesani
Dear all,

I have the following question. Is it possible to cast an MPI inside another
MPI?
I would like to have two levels of parallelization, but I would like to avoid
the MPI-OpenMP paradigm.

Another question. I normally use Open MPI, but I would like to read something
to understand it and learn all of its capabilities. Can anyone suggest a book
or some documentation?

Thanks

Diego
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] MPI_Sendrecv datatype memory bug ?

2016-11-25 Thread Gilles Gouaillardet
Yann,

Please post the test case that evidences the issue.

What is the minimal configuration required to reproduce it
(e.g. number of nodes and tasks per node)?

If there is more than one node, which interconnect are you using?
Out of curiosity, what happens if you run
mpirun --mca mpi_leave_pinned 0 ...
or
mpirun --mca btl tcp,self --mca pml ob1 ...


Cheers,

Gilles

Yann Jobic  wrote:
>Hi all,
>
>I'm going crazy about a possible bug in my code. I'm using a derived MPI
>datatype in a sendrecv function.
>The problem is that the memory footprint of my code grows as time
>increases.
>The problem does not show up with a regular datatype such as MPI_DOUBLE.
>I don't have this problem with Open MPI 1.8.4, but it is present with 1.10.1
>and 2.0.1.
>
>The key parts of the code are as follows (I'm using a 1D array with a macro
>in order to index it as 3D):
>
>Definition of the datatype:
>
>   MPI_Type_vector( Ny, 1, Nx, MPI_DOUBLE, &mpi.MPI_COL );
>   MPI_Type_commit( &mpi.MPI_COL ) ;
>
>And the sendrecv part:
>
>   MPI_Sendrecv( &(thebigone[_(1,0,k)]),    1, mpi.MPI_COL, mpi.left,  3, \
>                 &(thebigone[_(Nx-1,0,k)]), 1, mpi.MPI_COL, mpi.right, 3, \
>                 mpi.com, &mpi.stat );
>
>Is it coming from my code?
>
>I isolated the communications in a small code (500 lines). I can provide it
>in order to reproduce the problem.
>
>Thanks,
>
>Yann
>
>
>___
>users mailing list
>users@lists.open-mpi.org
>https://rfd.newmexicoconsortium.org/mailman/listinfo/users
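
For reference, a minimal self-contained C sketch of the exchange pattern
described above (a column MPI_Type_vector traded with MPI_Sendrecv between
left and right neighbours, and freed at the end). Array sizes, the periodic
ring and the step count are placeholders, not Yann's actual 500-line
reproducer.

#include <mpi.h>
#include <stdlib.h>

#define NX 64
#define NY 64
#define IDX(i, j) ((j) * NX + (i))   /* 1D array indexed as 2D, as in the post */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left  = (rank - 1 + size) % size;   /* periodic ring, for illustration */
    int right = (rank + 1) % size;

    double *field = calloc(NX * NY, sizeof *field);

    /* One column: NY blocks of 1 double, separated by the row length NX. */
    MPI_Datatype col;
    MPI_Type_vector(NY, 1, NX, MPI_DOUBLE, &col);
    MPI_Type_commit(&col);

    /* Repeat the exchange many times and watch the resident memory;
     * the reported issue is that it grows with the number of steps. */
    for (int step = 0; step < 1000; step++) {
        MPI_Sendrecv(&field[IDX(1, 0)],      1, col, left,  3,
                     &field[IDX(NX - 1, 0)], 1, col, right, 3,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Type_free(&col);
    free(field);
    MPI_Finalize();
    return 0;
}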
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users