Hi Mark,

     Will the parameters in MatMPIAIJSetPreallocation influence the following
part?

      do i=mystart,nelem,nproc
            call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
            ne=size(idx)
            idx=idx-1           ! -1 since PETSc indexing starts from zero
            if (allocated(auxRHSe)) call VecSetValues(bvec,ne,idx,auxRHSe,ADD_VALUES,ierr)
            call MatSetValues(Amat,ne,idx,ne,idx,Ae,ADD_VALUES,ierr)
      end do

I found that this part slows down my assembly. In my case the total number of DOFs
is 20800. I estimated the upper bound on the number of nonzero entries in each row
as 594, so I set f9 to 20206 and f6 to 10103 in the following command:

 call MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,ierr)
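
For reference, a minimal sketch of supplying exact per-row counts instead of a single upper bound; it assumes a MATMPIAIJ matrix, and Nglobal, nlocal, d_nnz, and o_nnz are illustrative names, not from the original code. For the MATMPIBAIJ type used later in the thread, the analogous call is MatMPIBAIJSetPreallocation, which also takes a block size.

      ! Sketch only: exact per-row preallocation for an MPIAIJ matrix.
      PetscInt              :: nlocal, Nglobal
      PetscInt, allocatable :: d_nnz(:), o_nnz(:)

      Nglobal = info(1)                 ! total number of DOFs (20800 here)
      nlocal  = PETSC_DECIDE
      call PetscSplitOwnership(PETSC_COMM_WORLD, nlocal, Nglobal, ierr)   ! same split MatSetSizes will pick

      allocate(d_nnz(nlocal), o_nnz(nlocal))
      d_nnz = 0
      o_nnz = 0
      ! ... loop over the elements touching the locally owned rows and count, per row,
      !     the couplings inside (d_nnz) and outside (o_nnz) the local column block ...

      ! The scalar arguments are ignored when the per-row arrays are supplied.
      call MatMPIAIJSetPreallocation(Amat, PETSC_DEFAULT_INTEGER, d_nnz, PETSC_DEFAULT_INTEGER, o_nnz, ierr)
      deallocate(d_nnz, o_nnz)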


Running the code in sequential mode with -info, I got the information below, and I
can get the result even though the assembly is rather slow (several times slower
than using MUMPS).

[0] MatAssemblyBegin_MPIBAIJ(): Stash has 0 entries, uses 0 mallocs.
[0] MatAssemblyBegin_MPIBAIJ(): Block-Stash has 0 entries, uses 0 mallocs.
[0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 20800 X 20800, block size 1; storage space: 13847 unneeded, 1030038 used
[0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 62659
[0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 70
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 20800) < 0.6. Do not use CompressedRow routines.
[0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 140509341618528
[0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 140509341618528
[0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 140509341618528
[0] VecScatterCreate_Seq(): Special case: sequential copy
[0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 20800 X 0, block size 1; storage space: 104000 unneeded, 0 used
[0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 0
[0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 0
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 20800)/(num_localrows 20800) > 0.6. Use CompressedRow routines.
 aseembel time   282.54871800000001
[0] PetscCommDuplicate(): Using internal PETSc communicator 4462888960 140509341442256
[0] PCSetUp(): Setting up PC for first time
[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged
[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged
[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged
[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged
[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged
[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged
[0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged


However, when I run in parallel mode, I get the following information:

[0] MatAssemblyBegin_MPIBAIJ(): Stash has 707965 entries, uses 6 mallocs.

[0] MatAssemblyBegin_MPIBAIJ(): Block-Stash has 0 entries, uses 0 mallocs.


It seems it never reaches the call to MatAssemblyEnd(Amat,MAT_FINAL_ASSEMBLY,ierr).

Is there anything I am doing wrong?

Thanks


Yaxiong Chen,
Ph.D. Student


School of Mechanical Engineering, 3171

585 Purdue Mall

West Lafayette, IN 47907





________________________________
From: Mark Adams <mfad...@lbl.gov>
Sent: Tuesday, January 29, 2019 10:02 PM
To: Yaxiong Chen; PETSc users list
Subject: Re: [petsc-users] PETSC matrix assembling super slow

Optimized is a configuration flag, not a version.

You need to figure out the number of non-zeros per row of your global matrix,
or a bound on it, and supply that in MatMPIAIJSetPreallocation. Otherwise it
has to allocate and copy memory often.

You could increase your f9 on a serial run and see what runs best, and then
move to parallel with f6 set to about half of f9.
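
A minimal sketch of checking the malloc count programmatically after MatAssemblyEnd instead of grepping the -info output; minfo is an illustrative name, and the MAT_INFO_* indices are the standard PETSc Fortran parameters:

      ! Report how many mallocs happened during MatSetValues on this rank.
      double precision minfo(MAT_INFO_SIZE)

      call MatGetInfo(Amat, MAT_LOCAL, minfo, ierr)
      print *, 'mallocs during assembly:', minfo(MAT_INFO_MALLOCS), &
               'unneeded nonzeros:', minfo(MAT_INFO_NZ_UNNEEDED)
      ! A nonzero malloc count means the preallocation (f9/f6) was too small.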

On Tue, Jan 29, 2019 at 9:13 PM Yaxiong Chen 
<chen2...@purdue.edu<mailto:chen2...@purdue.edu>> wrote:

Thanks Mark,

    I use PETSc 3.9.4; is this the optimized version you mean?

    Actually f9 and f6 are from the PETSc example. I don't know how to set their
values correctly, so I commented them out. The size of my elemental matrix may
vary: for a 2D problem it can be 24x24, 32x32, or some other size, and the
indices are not contiguous. In this case the elemental matrices may interleave
with each other in the global matrix, and I may have thousands of elemental
matrices to assemble. Is preallocation suitable for this?


Yaxiong Chen,
Ph.D. Student


School of Mechanical Engineering, 3171

585 Purdue Mall

West Lafayette, IN 47907





________________________________
From: Mark Adams <mfad...@lbl.gov<mailto:mfad...@lbl.gov>>
Sent: Tuesday, January 29, 2019 8:25 PM
To: Yaxiong Chen
Cc: Song Gao; petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] PETSC matrix assembling super slow

Slow assembly is often caused by not preallocating correctly. I am guessing that
you are using Q1 elements and f9==9, so the preallocation should be OK if this is
a scalar problem on a regular grid, and f6==6 should be OK for the off-processor
allocation, if my assumptions are correct.

You can run with -info, which will tell you how many allocations were done during
assembly. Make sure that number is small (e.g., 0).

I see you use F90 array expressions like 'idx-1'. Compilers can sometimes do crazy
things with seemingly simple code. You could just do the subtraction manually if
you don't find anything else; a sketch follows.
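
A minimal sketch of that suggestion (idx0 is an illustrative name):

      ! Build the 0-based index array explicitly instead of passing the array
      ! expression idx-1, which typically makes the compiler create a temporary
      ! at every call.
      integer, allocatable :: idx0(:)

      call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
      ne = size(idx)
      allocate(idx0(ne))
      do j = 1, ne
         idx0(j) = idx(j) - 1      ! PETSc uses 0-based global indices
      end do
      if (allocated(auxRHSe)) call VecSetValues(bvec, ne, idx0, auxRHSe, ADD_VALUES, ierr)
      call MatSetValues(Amat, ne, idx0, ne, idx0, Ae, ADD_VALUES, ierr)
      deallocate(idx0)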

And I trust you are running an optimized version of PETSc.


On Tue, Jan 29, 2019 at 6:22 PM Yaxiong Chen via petsc-users 
<petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> wrote:

Hi Song,

I don't quite understand how I can use this command. I don't partition the global
matrix; if I add my elemental matrices to the global system it looks like the
picture below. In my parallel part, each core generates elemental matrices in turn,
so I guess each core is assigned space for the global matrix, which is finally
assembled. But according to the manual, it seems each core stores only a part of
the global matrix. Is the local submatrix in

MatMPIAIJSetPreallocation(Mat B, PetscInt d_nz, const PetscInt d_nnz[], PetscInt o_nz, const PetscInt o_nnz[])
(https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html#MatMPIAIJSetPreallocation)

the same as my elemental matrix?

[inline image: sketch of elemental matrices overlapping in the assembled global matrix]
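
As a side note, PETSc distributes an MPI matrix by contiguous blocks of rows, independent of which rank generates which element; values set for rows owned by another rank are stashed and shipped during MatAssemblyBegin/End. A short sketch of inspecting that ownership, using only calls already present in the code below:

      ! Print which contiguous block of global rows this rank owns.
      call MatGetOwnershipRange(Amat, Istart, Iend, ierr)
      print *, 'rank', rank, 'owns global rows', Istart, 'to', Iend-1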

Thanks


Yaxiong Chen,
Ph.D. Student


School of Mechanical Engineering, 3171

585 Purdue Mall

West Lafayette, IN 47907





________________________________
From: Song Gao <song.g...@mail.mcgill.ca<mailto:song.g...@mail.mcgill.ca>>
Sent: Tuesday, January 29, 2019 1:22 PM
To: Yaxiong Chen
Cc: Matthew Knepley; petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] PETSC matrix assembling super slow

I think you would prefer to preallocate the matrix.

Uncomment this line:

! call MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,ierr)



On Tue, Jan 29, 2019 at 12:40 PM, Yaxiong Chen via petsc-users
<petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> wrote:

Hello,


I have a 2D system that is assembled from elemental matrices. Ae is my elemental
matrix, auxRHSe(:) and RHSe(:) are the corresponding right-hand sides, and idx is
the global index array. My code is below; however, the assembly rate is super slow
(the slow part is marked in the code). I am not sure whether the assembly approach
is right, since for each element the idx values are not contiguous. Do you have any
idea what a better way to assemble the matrix would be?


Thanks


block
    PetscErrorCode ierr
     PetscMPIInt rank,nproc, mystart
     PetscInt  nelem
     integer,allocatable  ::info(:)
     real(wp), allocatable :: Ae(:,:), auxRHSe(:),RHSe(:)
     integer, allocatable      :: idx(:)
     PetscScalar, pointer :: xx_v(:)

     PC               prec
     PetscScalar ::   val
     Vec              xvec,bvec,uvec
     Mat              Amat
     KSP              ksp
     PetscViewer viewer
     PetscInt  geq,j,k,ne,M,Istart,Iend
     PetscBool  flg
     KSPConvergedReason reason
     Vec         dummyVec, dummyVecs(1)
     MatNullSpace nullspace

     call PetscInitialize( PETSC_NULL_CHARACTER, ierr )

     if (ierr .ne. 0) then
     print*,'Unable to initialize PETSc'
     stop
     endif
     call MPI_Comm_size(PETSC_COMM_WORLD,nproc,ierr)
     call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
     mystart=rank+1

 !   Parameter set
     info=ptsystem%getInitInfo()
     nelem = info(Info_EleMatNum)+info(Info_FixedDOFNum)+info(Info_NumConstrain)
     print*,'nelem',nelem
 !-------------------------------------
 !  Create Matrix
     call MatCreate(PETSC_COMM_WORLD,Amat,ierr)
     call MatSetSizes( Amat,PETSC_DECIDE, PETSC_DECIDE, info(1), info(1), ierr )
     call MatSetType( Amat, MATMPIBAIJ, ierr )
     ! call MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,ierr)
     call MatSetFromOptions( Amat, ierr )
     call MatSetUp( Amat, ierr )
     call MatGetOwnershipRange( Amat, Istart, Iend, ierr )

      xvec = tVec(0)
      call MatCreateVecs( Amat, PETSC_NULL_VEC, xvec, ierr )
      call VecSetFromOptions( xvec, ierr )
      call VecDuplicate( xvec, bvec, ierr )
      call VecDuplicate( xvec, uvec, ierr )

   t1 = MPI_WTime();

      do i=mystart,nelem,nproc          ! round-robin: rank r assembles elements r+1, r+1+nproc, ...
            call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
            ne=size(idx)
            if (allocated(auxRHSe)) call VecSetValues(bvec,ne,idx-1,auxRHSe,ADD_VALUES,ierr)
            call MatSetValues(Amat,ne,idx-1,ne,idx-1,Ae,ADD_VALUES,ierr)
      end do

      nelem   = info(Info_EleRHSNum)
      mystart = rank+1

      do i = mystart, nelem, nproc
        call ptSystem%getElementalRHS(i, RHSe, idx)
        print*,'idx',idx
        ne=size(idx)
        if (allocated(RHSe)) call VecSetValues(bvec,ne,idx-1,RHSe,ADD_VALUES,ierr)
      end do
      call MatAssemblyBegin(Amat,MAT_FINAL_ASSEMBLY,ierr)
      call MatAssemblyEnd(Amat,MAT_FINAL_ASSEMBLY,ierr)     ! this part is slow: the loop above finishes, but it may get stuck here
      call VecAssemblyBegin(bvec,ierr)                      ! for a 2500 DOF system, assembling takes just over 2 seconds
      call VecAssemblyEnd(bvec,ierr)                        ! but for a 10000 DOF system it gets stuck
      t2 = MPI_WTime();
      print*,'assembling time',t2-t1
      ! Solve
      call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
!  Set operators. Here the matrix that defines the linear system
!  also serves as the preconditioning matrix.
      call KSPSetOperators(ksp,Amat,Amat,ierr)
      call KSPSetFromOptions(ksp,ierr)
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                      Solve the linear system
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    call KSPSetType(ksp,KSPCG,ierr)
    call KSPGetPC(ksp,prec,ierr)
   ! call KSPSetPCSide(ksp,PC_SYMMETRIC,ierr)
    call PCSetType(prec,PCJACOBI,ierr)
     call KSPSolve(ksp,bvec,xvec,ierr)
     call PetscFinalize(ierr)

end block



Yaxiong Chen,
Ph.D. Student


School of Mechanical Engineering, 3171

585 Purdue Mall

West Lafayette, IN 47907


