Optimized is a configuration flag, not a version. You need to figure out the number of nonzeros per row of your global matrix, or a bound on it, and supply that in MatMPIAIJSetPreallocation. Otherwise PETSc has to repeatedly allocate and copy memory as the matrix fills in.
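For example (a minimal sketch only, assuming an MPIAIJ matrix; nnz_bound is a placeholder, not from the thread, that you would compute from your own mesh connectivity, e.g. the largest elemental matrix size times the maximum number of elements sharing a DOF):

   PetscInt :: nnz_bound

   ! Sketch: preallocate with a uniform per-row upper bound, after
   ! MatSetType(Amat, MATMPIAIJ, ...) and before any MatSetValues.
   nnz_bound = 32   ! placeholder; derive a real bound from your connectivity
   call MatMPIAIJSetPreallocation(Amat, nnz_bound, PETSC_NULL_INTEGER, &
                                  nnz_bound, PETSC_NULL_INTEGER, ierr)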
You could increase your f9 on a serial run, see what runs best, and then move to parallel with a value for f6 of about 1/2 of f9.

On Tue, Jan 29, 2019 at 9:13 PM Yaxiong Chen <chen2...@purdue.edu> wrote:

Thanks Mark,

I use PETSc 3.9.4; is this the optimized version you mean?

Actually f9 and f6 are from the PETSc example. I don't know how to set the values correctly, so I commented them out. The size of my elemental matrix may vary: for a 2D problem it can be 24*24, 32*32, or some other size, and the indices are not contiguous. In this case the elemental matrices may interlace with each other in the global matrix, and I may have thousands of elemental matrices to assemble. Is preallocation suitable for this?

Yaxiong Chen, Ph.D. Student
School of Mechanical Engineering, 3171
585 Purdue Mall
West Lafayette, IN 47907

------------------------------
*From:* Mark Adams <mfad...@lbl.gov>
*Sent:* Tuesday, January 29, 2019 8:25 PM
*To:* Yaxiong Chen
*Cc:* Song Gao; petsc-users@mcs.anl.gov
*Subject:* Re: [petsc-users] PETSC matrix assembling super slow

Slow assembly is often from not preallocating correctly. I am guessing that you are using a Q1 element and f9==9, in which case the preallocation should be OK if this is a scalar problem on a regular grid, and f6==6 should be OK for the off-processor allocation, if my assumptions are correct.

You can run with -info, which will tell you how many allocations were done during assembly. Make sure that number is small (e.g., 0).

I see you use F90 array syntax like 'idx-1'. Compilers can sometimes do crazy things with seemingly simple code, so you could do this manually if you don't find anything else.

And I trust you are running an optimized build of PETSc.

On Tue, Jan 29, 2019 at 6:22 PM Yaxiong Chen via petsc-users <petsc-users@mcs.anl.gov> wrote:

Hi Song,

I don't quite understand how I can use this command. I don't partition the global matrix. If I add my elemental matrix to the global system it will be like this. And in my parallel part, I use each core to generate elemental matrices in turn. In this case, I guess each core will be assigned the space for the global matrix and finally be assembled. But according to the manual, it seems each core stores only a part of the global matrix. Is the local submatrix in MatMPIAIJSetPreallocation(Mat B, PetscInt d_nz, const PetscInt d_nnz[], PetscInt o_nz, const PetscInt o_nnz[]) <https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html> the same as my elemental matrix?

Thanks

Yaxiong Chen, Ph.D. Student
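The "local submatrix" in MatMPIAIJSetPreallocation is not an elemental matrix: each rank owns a contiguous block of whole rows, d_nz/d_nnz count nonzeros whose columns also fall in the owned range (the "diagonal" block, for a square matrix with the default layout), o_nz/o_nnz count the remaining columns, and values set into rows owned by another rank are stashed and communicated during MatAssemblyBegin/End. A minimal sketch (not from the thread) to inspect the partition:

   PetscInt :: Istart, Iend

   ! Sketch: print the contiguous range of global rows this rank owns;
   ! d_nnz/o_nnz in MatMPIAIJSetPreallocation describe exactly these rows.
   call MatGetOwnershipRange(Amat, Istart, Iend, ierr)
   print *, 'rank', rank, 'owns global rows', Istart, 'through', Iend - 1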
------------------------------
*From:* Song Gao <song.g...@mail.mcgill.ca>
*Sent:* Tuesday, January 29, 2019 1:22 PM
*To:* Yaxiong Chen
*Cc:* Matthew Knepley; petsc-users@mcs.anl.gov
*Subject:* Re: [petsc-users] PETSC matrix assembling super slow

I think you would prefer to preallocate the matrix. Uncomment this line:

   ! call MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,ierr)

On Tue, Jan 29, 2019 at 12:40 PM, Yaxiong Chen via petsc-users <petsc-users@mcs.anl.gov> wrote:

Hello,

I have a 2D system which is assembled from elemental matrices. Ae is my elemental matrix, auxRHSe(:) and RHSe(:) are the corresponding right-hand sides, and idx is the global index. My code is as follows; however, the assembly is super slow (marked red in the code). I am not sure whether the assembling type is right or not, since for each element the idx values are not contiguous. Do you have any idea of a better way to assemble the matrix?

Thanks

block
   PetscErrorCode ierr
   PetscMPIInt    rank, nproc, mystart
   PetscInt       nelem
   integer,  allocatable :: info(:)
   real(wp), allocatable :: Ae(:,:), auxRHSe(:), RHSe(:)
   integer,  allocatable :: idx(:)
   PetscScalar, pointer  :: xx_v(:)

   PC          prec
   PetscScalar :: val
   Vec         xvec, bvec, uvec
   Mat         Amat
   KSP         ksp
   PetscViewer viewer
   PetscInt    geq, j, k, ne, M, Istart, Iend
   PetscBool   flg
   KSPConvergedReason reason
   Vec         dummyVec, dummyVecs(1)
   MatNullSpace nullspace

   call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
   if (ierr .ne. 0) then
      print*,'Unable to initialize PETSc'
      stop
   endif
   call MPI_Comm_size(PETSC_COMM_WORLD, nproc, ierr)
   call MPI_Comm_rank(PETSC_COMM_WORLD, rank, ierr)
   mystart = rank + 1

   ! Parameter set
   info = ptsystem%getInitInfo()
   nelem = info(Info_EleMatNum) + info(Info_FixedDOFNum) + info(Info_NumConstrain)
   print*,'nelem',nelem

   !-------------------------------------
   ! Create Matrix
   call MatCreate(PETSC_COMM_WORLD, Amat, ierr)
   call MatSetSizes(Amat, PETSC_DECIDE, PETSC_DECIDE, info(1), info(1), ierr)
   call MatSetType(Amat, MATMPIBAIJ, ierr)
   ! call MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,ierr)
   call MatSetFromOptions(Amat, ierr)
   call MatSetUp(Amat, ierr)
   call MatGetOwnershipRange(Amat, Istart, Iend, ierr)

   xvec = tVec(0)
   call MatCreateVecs(Amat, PETSC_NULL_VEC, xvec, ierr)
   call VecSetFromOptions(xvec, ierr)
   call VecDuplicate(xvec, bvec, ierr)
   call VecDuplicate(xvec, uvec, ierr)

   t1 = MPI_Wtime()

   do i = mystart, nelem, nproc
      call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
      ne = size(idx)
      if (allocated(auxRHSe)) call VecSetValues(bvec, ne, idx-1, auxRHSe, ADD_VALUES, ierr)
      call MatSetValues(Amat, ne, idx-1, ne, idx-1, Ae, ADD_VALUES, ierr)
   end do

   nelem   = info(Info_EleRHSNum)
   mystart = rank + 1
   do i = mystart, nelem, nproc
      call ptSystem%getElementalRHS(i, RHSe, idx)
      print*,'idx',idx
      ne = size(idx)
      if (allocated(RHSe)) call VecSetValues(bvec, ne, idx-1, RHSe, ADD_VALUES, ierr)
   end do

   call MatAssemblyBegin(Amat, MAT_FINAL_ASSEMBLY, ierr)
   call MatAssemblyEnd(Amat, MAT_FINAL_ASSEMBLY, ierr)
   ! this part is slow: the loop above finishes, but here it may get stuck
   call VecAssemblyBegin(bvec, ierr)
   ! For a 2500 DOF system, assembling only takes over 2 seconds
   call VecAssemblyEnd(bvec, ierr)
   ! But for a 10000 DOF system, it gets stuck
   t2 = MPI_Wtime()
   print*,'assembling time', t2 - t1

   ! Solve
   call KSPCreate(PETSC_COMM_WORLD, ksp, ierr)
   ! Set operators. Here the matrix that defines the linear system
   ! also serves as the preconditioning matrix.
   call KSPSetOperators(ksp, Amat, Amat, ierr)
   call KSPSetFromOptions(ksp, ierr)
   ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   ! Solve the linear system
   ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   call KSPSetType(ksp, KSPCG, ierr)
   call KSPGetPC(ksp, prec, ierr)
   ! call KSPSetPCSide(ksp, PC_SYMMETRIC, ierr)
   call PCSetType(prec, PCJACOBI, ierr)
   call KSPSolve(ksp, bvec, xvec, ierr)
   call PetscFinalize(ierr)
end block

Yaxiong Chen, Ph.D. Student
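Putting the advice above together, here is a minimal sketch of a preallocated version of the matrix-creation block (a sketch only, not the poster's final code). One detail worth noting: the code sets MATMPIBAIJ, but MatMPIAIJSetPreallocation applies only to AIJ matrices and is ignored for other types, so either switch to MATMPIAIJ as below or use MatMPIBAIJSetPreallocation for BAIJ. f9 and f6 stand for the per-row bounds discussed above:

   ! Sketch: AIJ type matched with the AIJ preallocation routine.
   ! f9 = bound on nonzeros per row in the diagonal block,
   ! f6 = bound for the off-diagonal block (Mark suggests ~1/2 of f9).
   call MatCreate(PETSC_COMM_WORLD, Amat, ierr)
   call MatSetSizes(Amat, PETSC_DECIDE, PETSC_DECIDE, info(1), info(1), ierr)
   call MatSetType(Amat, MATMPIAIJ, ierr)
   call MatMPIAIJSetPreallocation(Amat, f9, PETSC_NULL_INTEGER, &
                                  f6, PETSC_NULL_INTEGER, ierr)
   call MatSetFromOptions(Amat, ierr)
   call MatSetUp(Amat, ierr)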