Hi Junchao,
Thanks for the hints below; they will take some time to absorb, as the vectors that are being moved around
are actually partly PETSc vectors and partly local process vectors.

Attached is the modified routine that now works (no longer leaking memory) with OpenMPI.

-sanjay


On 5/30/19 8:41 PM, Zhang, Junchao wrote:

Hi, Sanjay,
  Could you send your modified data exchange code (psetb.F) with MPI_Waitall? See other inlined comments below. Thanks.

On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via petsc-users <petsc-users@mcs.anl.gov> wrote:

    Lawrence,
    Thanks for taking a look!  This is what I had been wondering about --
    my knowledge of MPI is pretty minimal, and the origins of the routine
    were from a programmer we hired a decade+ back from NERSC.  I'll have
    to look into VecScatter.  It will be great to dispense with our
    roll-your-own routines (we even have our own reduceALL scattered
    around the code).

PETSc VecScatter has a very simple interface and you should definitely go with it.  With VecScatter, you can think in terms of familiar vectors and indices instead of the low-level MPI_Send/Recv.  Besides that, PETSc has optimized VecScatter so that communication is efficient.
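
For concreteness, here is a minimal, untested sketch of that pattern in Fortran.  The PETSc calls (VecCreateSeq, ISCreateGeneral, VecScatterCreate/Begin/End) are the real API; the surrounding names (demo_vecscatter, nghost, ghost_eqs) are placeholders for illustration, not names from FEAP:

      subroutine demo_vecscatter(xglobal, nghost, ghost_eqs)

!     Sketch: gather selected global entries of a parallel PETSc Vec
!     into a sequential work Vec using VecScatter

#     include   <petsc/finclude/petscvec.h>
      use        petscvec
      implicit   none

      Vec            xglobal, xlocal
      PetscInt       nghost
      PetscInt       ghost_eqs(nghost)   ! 0-based global indices
      IS             isfrom, isto
      VecScatter     ctx
      PetscInt       i, toidx(nghost)
      PetscErrorCode ierr

!     Destination positions 0..nghost-1 in the sequential vector

      do i = 1, nghost
        toidx(i) = i - 1
      end do ! i

      call VecCreateSeq(PETSC_COMM_SELF, nghost, xlocal, ierr)
      call ISCreateGeneral(PETSC_COMM_SELF, nghost, ghost_eqs,
     &                     PETSC_COPY_VALUES, isfrom, ierr)
      call ISCreateGeneral(PETSC_COMM_SELF, nghost, toidx,
     &                     PETSC_COPY_VALUES, isto, ierr)

      call VecScatterCreate(xglobal, isfrom, xlocal, isto, ctx, ierr)
      call VecScatterBegin(ctx, xglobal, xlocal, INSERT_VALUES,
     &                     SCATTER_FORWARD, ierr)
      call VecScatterEnd(ctx, xglobal, xlocal, INSERT_VALUES,
     &                   SCATTER_FORWARD, ierr)

!     ... copy values from xlocal into the local ghost storage here ...

      call VecScatterDestroy(ctx, ierr)
      call ISDestroy(isfrom, ierr)
      call ISDestroy(isto, ierr)
      call VecDestroy(xlocal, ierr)

      end

Since the scatter context depends only on the index sets, it can be built once during setup and reused (with VecScatterBegin/End) at every solve, with the destroys deferred to cleanup.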


    Interestingly, the MPI_Waitall has solved the problem when using
    OpenMPI, but it still persists with MPICH.  Graphs attached.
    I'm going to run with OpenMPI for now (but I guess I really still
    need to figure out what is wrong with MPICH and Waitall; I'll try
    Barry's suggestion of
    --download-mpich-configure-arguments="--enable-error-messages=all
    --enable-g" later today and report back).

    Regarding MPI_Barrier, it was put in due to a problem where some
    processes were finishing their sends and receives and exiting the
    subroutine before the receiving processes had completed (which
    resulted in data loss, as the buffers are freed after the call to
    the routine).  MPI_Barrier was the solution proposed to us.  I don't
    think I can dispense with it, but will think about it some more.

After MPI_Send(), or after MPI_Isend(..., req) and MPI_Wait(req), you can safely free the send buffer without worrying about whether the receive has completed.  MPI guarantees the receiver will still get the data, for example through internal buffering.
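
A small, untested sketch of that guarantee (the subroutine name, the rank argument 'dest', and the tag 99 are illustrative only, not from psetb.F):

      subroutine demo_isend_wait(dest, n, sbuf)

!     Sketch: non-blocking send followed by a wait; once MPI_Wait
!     returns, the send buffer may be reused or freed even if the
!     matching receive on rank 'dest' has not yet completed

      implicit   none
      include    'mpif.h'

      integer    dest, n, req, ierr
      real*8     sbuf(n)

      call MPI_Isend(sbuf, n, MPI_DOUBLE_PRECISION, dest, 99,
     &               MPI_COMM_WORLD, req, ierr)

!     ... unrelated work may overlap with the send here ...

      call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)

!     sbuf can now be safely modified or deallocated by the caller

      end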


    I'm not so sure about using MPI_Irecv, as it will require a bit of
    rewriting: right now I process the received data sequentially after
    each blocking MPI_Recv -- clearly slower, but easier to code.

    Thanks again for the help.

    -sanjay

    On 5/30/19 4:48 AM, Lawrence Mitchell wrote:
    > Hi Sanjay,
    >
    >> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users
    <petsc-users@mcs.anl.gov> wrote:
    >>
    >> The problem seems to persist but with a different signature. 
    Graphs attached as before.
    >>
    >> Totals with MPICH (NB: single run)
    >>
    >> For the CG/Jacobi          data_exchange_total = 41,385,984;
    kspsolve_total = 38,289,408
    >> For the GMRES/BJACOBI      data_exchange_total = 41,324,544;
    kspsolve_total = 41,324,544
    >>
    >> Just reading the MPI docs, I am wondering if I need some sort of
    MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange
    routine?
    >> I would have thought that with the blocking receives and the
    MPI_Barrier everything would have fully completed and cleaned up
    before all processes exited the routine, but perhaps I am wrong on
    that.
    >
    > Skimming the fortran code you sent you do:
    >
    > for i in ...:
    >     call MPI_Isend(..., req, ierr)
    >
    > for i in ...:
    >     call MPI_Recv(..., ierr)
    >
    > But you never call MPI_Wait on the request you got back from the
    Isend. So the MPI library will never free the data structures it
    created.
    >
    > The usual pattern for these non-blocking communications is to
    allocate an array for the requests of length nsend+nrecv and then do:
    >
    > for i in nsend:
    >     call MPI_Isend(..., req[i], ierr)
    > for j in nrecv:
    >     call MPI_Irecv(..., req[nsend+j], ierr)
    >
    > call MPI_Waitall(req, ..., ierr)
    >
    > I note also there's no need for the Barrier at the end of the
    routine; this kind of communication does neighbourwise
    synchronisation, so there is no need to add (unnecessary) global
    synchronisation too.
    >
    > As an aside, is there a reason you don't use PETSc's VecScatter
    to manage this global to local exchange?
    >
    > Cheers,
    >
    > Lawrence


!$Id:$
      subroutine psetb(b,getp,getv,senp,senv,eq, ndf, rdatabuf,sdatabuf)

!      * * F E A P * * A Finite Element Analysis Program

!....  Copyright (c) 1984-2017: Regents of the University of California
!                               All rights reserved

!-----[--.----+----.----+----.-----------------------------------------]
!     Modification log                                Date (dd/mm/year)
!       Original version                                    01/11/2006
!       1. Revise send/receive data add barrier             24/11/2006
!       2. Correct send/receive and add error messages      16/03/2007
!       3. Change 'include/finclude' to 'finclude'          23/01/2009
!       4. Remove common 'pfeapa' (values in 'setups')      05/02/2009
!       5. finclude -> petsc/finclude                       12/05/2016
!       6. Update for PETSc 3.8.3                           28/02/2018
!       7. Change 'id' to 'eq'                              23/05/2019
!-----[--.----+----.----+----.-----------------------------------------]
!     Purpose:  Transfer PETSC vector to local arrays including ghost
!               node data via MPI messaging

!     Inputs:
!        getp(ntasks)    - Pointer array for getting ghost data
!        getv(sum(getp)) - Local node numbers for getting ghost data
!        senp(ntasks)    - Pointer array for sending ghost data
!        senv(sum(senp)) - Local node numbers to send out as ghost data
!        eq(ndf,numnp)   - Local equation numbers
!        ndf             - dofs per node
!        rdatabuf(*)     - receive communication array
!        sdatabuf(*)     - send    communication array

!     Outputs:

!        b(neq) - Local solution vector
!-----[--.----+----.----+----.-----------------------------------------]
#     include   <petsc/finclude/petscsys.h>
      use                       petscsys
      implicit   none


#     include   "cdata.h"
#     include   "iofile.h"
#     include   "pfeapb.h"
#     include   "setups.h"

      integer    ndf
      integer    i, j, k, lnodei, eqi, soff, rbuf,sbuf,tbuf, idesp
      integer    getp(*),getv(*),senp(*),senv(*)
      integer    eq(ndf,*)
      real*8     b(*), rdatabuf(*), sdatabuf(*)

      integer    usolve_msg, req(ntasks),reqcnt
      parameter (usolve_msg=10)

!     Petsc values

      PetscErrorCode ierr

      integer        msg_stat(MPI_STATUS_SIZE)

!     Sending Data Asynchronously

      soff  = 0
      idesp = 0
      req(:) = 0 
      reqcnt = 0 
      do i = 1, ntasks

        if (senp(i) .gt. 0) then
          sbuf   = soff
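
!         Pack active (eqi > 0) solution values into the send buffer
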
          do j = 1, senp(i)
            lnodei = senv(j + idesp)
            do k = 1, ndf
              eqi = eq(k,lnodei)
              if (eqi .gt. 0) then
                sbuf           = sbuf + 1
                sdatabuf(sbuf) = b(eqi)
              endif
            end do ! k
          end do ! j
          idesp = idesp + senp(i)
          sbuf  = sbuf  - soff
          reqcnt = reqcnt + 1 
          call MPI_Isend( sdatabuf(soff+1), sbuf, MPI_DOUBLE_PRECISION,
     &                    i-1, usolve_msg, MPI_COMM_WORLD, req(reqcnt), 
     &                    ierr) 

!         Report send error

          if(ierr.ne.0) then
            write(iow,*) ' -->> Send Error[',rank+1,'->',i,']',ierr
            write(  *,*) ' -->> Send Error[',rank+1,'->',i,']',ierr
          endif

          soff = soff + sbuf
        endif

      end do ! i

!     Receiving Data in blocking mode

      idesp = 0
      do i = 1, ntasks
        if (getp(i) .gt. 0) then

!         Determine receive length

          tbuf = getp(i)*ndf
          call MPI_Recv( rdatabuf, tbuf, MPI_DOUBLE_PRECISION, i-1,
     &                   usolve_msg, MPI_COMM_WORLD,msg_stat, ierr)
          if(ierr.ne.0) then
             write(iow,*) 'Recv Error[',i,'->',rank+1,']',ierr
             write(  *,*) 'Recv Error[',i,'->',rank+1,']',ierr
          endif
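
!         Unpack received ghost values into the local solution vector
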
          rbuf = 0
          do j = 1, getp(i)
            lnodei = getv(j + idesp)
            do k = 1, ndf
              eqi = eq(k,lnodei)
              if (eqi .gt. 0 ) then
                rbuf     = rbuf + 1
                b( eqi ) = rdatabuf( rbuf )
              endif
            end do ! k
          end do ! j
          idesp = idesp + getp(i)
        endif
      end do ! i
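
!     Wait on all outstanding send requests so the MPI library can
!     release its internal request objects (the source of the memory
!     growth), then synchronize all tasks before buffers are reused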

      call MPI_Waitall(reqcnt, req, MPI_STATUSES_IGNORE, ierr)
      call MPI_Barrier(MPI_COMM_WORLD, ierr)

      end
