Re: [petsc-users] PETSC matrix assembling super slow

2019-02-05 Thread Smith, Barry F. via petsc-users

  Send configure.log and make.log to petsc-ma...@mcs.anl.gov

  Barry


> On Feb 5, 2019, at 10:48 PM, Yaxiong Chen  wrote:
> 
> Since mumps and scalapack are already installed on my computer, I only ran 
> ./configure with --download-superlu_dist . 
> After everything is done, I received the error: 
> dyld: lazy symbol binding failed: Symbol not found: 
> _MatSolverTypeRegister_SuperLU_DIST
>   Referenced from: 
> /Users/yaxiong/Downloads/petsc-3.9.4/arch-darwin-c-debug/lib/libpetsc.3.9.dylib
>   Expected in: flat namespace
> 
> dyld: Symbol not found: _MatSolverTypeRegister_SuperLU_DIST
>   Referenced from: 
> /Users/yaxiong/Downloads/petsc-3.9.4/arch-darwin-c-debug/lib/libpetsc.3.9.dylib
>   Expected in: flat namespace
> 
> [0]PETSC ERROR: 
> 
> [0]PETSC ERROR: Caught signal number 5 TRAP
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see 
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to 
> find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: -  Stack Frames 
> 
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR:   INSTEAD the line number of the start of the function
> [0]PETSC ERROR:   is given.
> [0]PETSC ERROR: [0] MatInitializePackage line 150 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/mat/interface/dlregismat.c
> [0]PETSC ERROR: [0] MatCreate line 83 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/mat/utils/gcreate.c
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Signal received
> 
> 
> From: Smith, Barry F. 
> Sent: Tuesday, February 5, 2019 10:58 PM
> To: Yaxiong Chen
> Cc: petsc-users@mcs.anl.gov
> Subject: Re: [petsc-users] PETSC matrix assembling super slow
>  
> 
>   Run ./configure with --download-superlu_dist --download-mumps 
> --download-scalapack and then you can use either -pc_factor_mat_solver_type 
> superlu_dist or -pc_factor_mat_solver_type mumps
> 
>   Good luck
> 
> 
> > On Feb 5, 2019, at 9:29 PM, Yaxiong Chen  wrote:
> > 
> > > Also, I found the solving time is also shorter when I use the direct 
> > > solver(0.432s  vs 4.332 s ). Is this due the small scale of the system? 
> > > When I have a very large (e.g., 10*10 ) system, can I expect 
> > > iterative solver being faster?
> > 
> > <<   It sounds like the default preconditioner is not working well on your 
> > problem. First run with -ksp_monitor -ksp_converged_reason to get an idea 
> > of how quickly the iterative solver is converging. If the number of 
> > iterations is high you are going to need to change the preconditioner to 
> > get one that converges well for your problem. What mathematical models is 
> > your code implementing, this will help in determining what type of 
> > preconditioner to use.
> > 
> >  << Barry
> > 
> > 
> > It seems I can only work with Jacobi pre-conditioner. When I try LU or 
> > Cholesky ,I got the error :
> > [0]PETSC ERROR: See 
> > http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for 
> > possible LU and Cholesky solvers
> > [0]PETSC ERROR: Could not locate a solver package. Perhaps you must 
> > ./configure with --download-
> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
> > trouble shooting.
> > [0]PETSC ERROR: Petsc Release Version 3.9.4, Sep, 11, 2018 
> > [0]PETSC ERROR: ./optimal_mechanical_part.hdc on a arch-darwin-c-debug 
> > named hidac.ecn.purdue.edu by chen2018 Tue Feb  5 22:19:08 2019
> > [0]PETSC ERROR: Configure options 
> > [0]PETSC ERROR: #1 MatGetFactor() line 4328 in 
> > /Users/yaxiong/Downloads/petsc-3.9.4/src/mat/interface/matrix.c
> > [0]PETSC ERROR: #2 PCSetUp_Cholesky() line 86 in 
> > /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/pc/impls/factor/cholesky/cholesky.c
> > [0]PETSC ERROR: #3 PCSetUp() line 923 in 
> > /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/pc/interface/precon.c
> > [0]PETSC ERROR: #4 KSPSetUp() line 381 in 
> > /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/ksp/interface/itfunc.c
> > [0]PETSC ERROR: #5 KSPSolve() line 612 in 
> > /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/ksp/interface/itfunc.c
> >  solve time   3.858999985025E-003
> >  reason   0
> >  It asks me to download the package. Do I want to run the command in the 
> > folder of PETSC? But

Re: [petsc-users] PETSC matrix assembling super slow

2019-02-05 Thread Yaxiong Chen via petsc-users
Since mumps and scalapack are already installed on my computer, I only ran 
./configure with --download-superlu_dist .

After everything is done, I received the error:

dyld: lazy symbol binding failed: Symbol not found: 
_MatSolverTypeRegister_SuperLU_DIST

  Referenced from: 
/Users/yaxiong/Downloads/petsc-3.9.4/arch-darwin-c-debug/lib/libpetsc.3.9.dylib

  Expected in: flat namespace


dyld: Symbol not found: _MatSolverTypeRegister_SuperLU_DIST

  Referenced from: 
/Users/yaxiong/Downloads/petsc-3.9.4/arch-darwin-c-debug/lib/libpetsc.3.9.dylib

  Expected in: flat namespace


[0]PETSC ERROR: 


[0]PETSC ERROR: Caught signal number 5 TRAP

[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger

[0]PETSC ERROR: or see 
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind

[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to 
find memory corruption errors

[0]PETSC ERROR: likely location of problem given in stack below

[0]PETSC ERROR: -  Stack Frames 


[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,

[0]PETSC ERROR:   INSTEAD the line number of the start of the function

[0]PETSC ERROR:   is given.

[0]PETSC ERROR: [0] MatInitializePackage line 150 
/Users/yaxiong/Downloads/petsc-3.9.4/src/mat/interface/dlregismat.c

[0]PETSC ERROR: [0] MatCreate line 83 
/Users/yaxiong/Downloads/petsc-3.9.4/src/mat/utils/gcreate.c

[0]PETSC ERROR: - Error Message 
--

[0]PETSC ERROR: Signal received



From: Smith, Barry F. 
Sent: Tuesday, February 5, 2019 10:58 PM
To: Yaxiong Chen
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] PETSC matrix assembling super slow


  Run ./configure with --download-superlu_dist --download-mumps 
--download-scalapack and then you can use either -pc_factor_mat_solver_type 
superlu_dist or -pc_factor_mat_solver_type mumps

  Good luck


> On Feb 5, 2019, at 9:29 PM, Yaxiong Chen  wrote:
>
> > Also, I found the solving time is also shorter when I use the direct 
> > solver(0.432s  vs 4.332 s ). Is this due the small scale of the system? 
> > When I have a very large (e.g., 10*10 ) system, can I expect 
> > iterative solver being faster?
>
> <<   It sounds like the default preconditioner is not working well on your 
> problem. First run with -ksp_monitor -ksp_converged_reason to get an idea of 
> how quickly the iterative solver is converging. If the number of iterations 
> is high you are going to need to change the preconditioner to get one that 
> converges well for your problem. What mathematical models is your code 
> implementing, this will help in determining what type of preconditioner to 
> use.
>
>  << Barry
>
>
> It seems I can only work with Jacobi pre-conditioner. When I try LU or 
> Cholesky ,I got the error :
> [0]PETSC ERROR: See 
> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for 
> possible LU and Cholesky solvers
> [0]PETSC ERROR: Could not locate a solver package. Perhaps you must 
> ./configure with --download-
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.9.4, Sep, 11, 2018
> [0]PETSC ERROR: ./optimal_mechanical_part.hdc on a arch-darwin-c-debug named 
> hidac.ecn.purdue.edu by chen2018 Tue Feb  5 22:19:08 2019
> [0]PETSC ERROR: Configure options
> [0]PETSC ERROR: #1 MatGetFactor() line 4328 in 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/mat/interface/matrix.c
> [0]PETSC ERROR: #2 PCSetUp_Cholesky() line 86 in 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/pc/impls/factor/cholesky/cholesky.c
> [0]PETSC ERROR: #3 PCSetUp() line 923 in 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/pc/interface/precon.c
> [0]PETSC ERROR: #4 KSPSetUp() line 381 in 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: #5 KSPSolve() line 612 in 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/ksp/interface/itfunc.c
>  solve time   3.85885025E-003
>  reason   0
>  It asks me to download the package. Do I want to run the command in the 
> folder of PETSC? But I am not sure what the name of the package should be . I 
> tried with ./configure --download- But it does not work.
>
> From: Smith, Barry F. 
> Sent: Tuesday, February 5, 2019 1:18 PM
> To: Yaxiong Chen
> Cc: PETSc users list
> Subject: Re: [petsc-users] PETSC matrix assembling super slow
>
>
>
> > On Feb 5, 2019, at 10:32 AM, Yaxiong Chen  wrote:
> >
> > Thanks Barry,
> >  I will explore how to partition for parallel computation later. But 
> > now I still have some confusion on the sequential operation.
> > I c

Re: [petsc-users] PETSC matrix assembling super slow

2019-02-05 Thread Smith, Barry F. via petsc-users

  Run ./configure with --download-superlu_dist --download-mumps 
--download-scalapack and then you can use either -pc_factor_mat_solver_type 
superlu_dist or -pc_factor_mat_solver_type mumps

  Good luck
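
  For reference, a rough sequence (a sketch only, not an exact recipe; the 
executable name below is just the one from this thread, and configure prints 
the exact make command to use):

  ./configure --download-superlu_dist --download-mumps --download-scalapack
  make all
  mpiexec -n 4 ./optimal_mechanical_part.hdc -ksp_type preonly -pc_type lu \
      -pc_factor_mat_solver_type mumps

  For a symmetric positive definite system, -pc_type cholesky together with 
MUMPS should also work; SuperLU_DIST provides LU only.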


> On Feb 5, 2019, at 9:29 PM, Yaxiong Chen  wrote:
> 
> > Also, I found the solving time is also shorter when I use the direct 
> > solver(0.432s  vs 4.332 s ). Is this due the small scale of the system? 
> > When I have a very large (e.g., 10*10 ) system, can I expect 
> > iterative solver being faster?
> 
> <<   It sounds like the default preconditioner is not working well on your 
> problem. First run with -ksp_monitor -ksp_converged_reason to get an idea of 
> how quickly the iterative solver is converging. If the number of iterations 
> is high you are going to need to change the preconditioner to get one that 
> converges well for your problem. What mathematical models is your code 
> implementing, this will help in determining what type of preconditioner to 
> use.
> 
>  << Barry
> 
> 
> It seems I can only work with Jacobi pre-conditioner. When I try LU or 
> Cholesky ,I got the error :
> [0]PETSC ERROR: See 
> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for 
> possible LU and Cholesky solvers
> [0]PETSC ERROR: Could not locate a solver package. Perhaps you must 
> ./configure with --download-
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.9.4, Sep, 11, 2018 
> [0]PETSC ERROR: ./optimal_mechanical_part.hdc on a arch-darwin-c-debug named 
> hidac.ecn.purdue.edu by chen2018 Tue Feb  5 22:19:08 2019
> [0]PETSC ERROR: Configure options 
> [0]PETSC ERROR: #1 MatGetFactor() line 4328 in 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/mat/interface/matrix.c
> [0]PETSC ERROR: #2 PCSetUp_Cholesky() line 86 in 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/pc/impls/factor/cholesky/cholesky.c
> [0]PETSC ERROR: #3 PCSetUp() line 923 in 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/pc/interface/precon.c
> [0]PETSC ERROR: #4 KSPSetUp() line 381 in 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: #5 KSPSolve() line 612 in 
> /Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/ksp/interface/itfunc.c
>  solve time   3.85885025E-003
>  reason   0
>  It asks me to download the package. Do I want to run the command in the 
> folder of PETSC? But I am not sure what the name of the package should be . I 
> tried with ./configure --download- But it does not work.
> 
> From: Smith, Barry F. 
> Sent: Tuesday, February 5, 2019 1:18 PM
> To: Yaxiong Chen
> Cc: PETSc users list
> Subject: Re: [petsc-users] PETSC matrix assembling super slow
>  
> 
> 
> > On Feb 5, 2019, at 10:32 AM, Yaxiong Chen  wrote:
> > 
> > Thanks Barry,
> >  I will explore how to partition for parallel computation later. But 
> > now I still have some confusion on the sequential operation.
> > I compared PETSC and Mumps. In both cases, the subroutine for generating 
> > elemental matrix is very similar. However, the difference is in the 
> > following step:
> >call MatSetValues(Amat,ne,idx,ne,idx,Ae,ADD_VALUES,ierr)
> > In this step ,each element cost about 3~4 e-2 second.
> > 
> > However, when I use mumps, I use the following code: 
> >preA(Aptr+1:Aptr+n*(n+1)/2) = pack(Ae, mask(1:n,1:n))
> >Aptr = Aptr +n*(n+1)/2
> >if (allocated(auxRHSe)) preRHS(idx) = preRHS(idx)+auxRHSe
> >  It just cost 10e-6~10e-5 second. For a 1*1 matrix, the assembling 
> > time for using PETSC is 300s while it cost 60s when using Mumps.
> > For a 1*1 system, is there any way I can make it faster? 
> 
>   As we keep saying the slowness of the matrix assembly is due to incorrect 
> preallocation. Once you have the preallocation correct the speed should 
> increase dramatically.
> 
> > Also, I found the solving time is also shorter when I use the direct 
> > solver(0.432s  vs 4.332 s ). Is this due the small scale of the system? 
> > When I have a very large (e.g., 10*10 ) system, can I expect 
> > iterative solver being faster?
> 
>It sounds like the default preconditioner is not working well on your 
> problem. First run with -ksp_monitor -ksp_converged_reason to get an idea of 
> how quickly the iterative solver is converging. If the number of iterations 
> is high you are going to need to change the preconditioner to get one that 
> converges well for your problem. What mathematical models is your code 
> implementing, this will help in determining what type of preconditioner to 
> use.
> 
>   Barry
> 
> 
> > 
> > Thanks
> > 
> > Yaxiong
> > 
> > 
> > 
&g

Re: [petsc-users] PETSC matrix assembling super slow

2019-02-05 Thread Yaxiong Chen via petsc-users
> Also, I found the solving time is also shorter when I use the direct 
> solver(0.432s  vs 4.332 s ). Is this due the small scale of the system? When 
> I have a very large (e.g., 10*10 ) system, can I expect iterative 
> solver being faster?

<<   It sounds like the default preconditioner is not working well on your 
problem. First run with -ksp_monitor -ksp_converged_reason to get an idea of 
how quickly the iterative solver is converging. If the number of iterations is 
high you are going to need to change the preconditioner to get one that 
converges well for your problem. What mathematical models is your code 
implementing, this will help in determining what type of preconditioner to use.

 << Barry


It seems I can only work with the Jacobi preconditioner. When I try LU or 
Cholesky, I get the error:

[0]PETSC ERROR: See 
http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for possible 
LU and Cholesky solvers

[0]PETSC ERROR: Could not locate a solver package. Perhaps you must ./configure 
with --download-

[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.

[0]PETSC ERROR: Petsc Release Version 3.9.4, Sep, 11, 2018

[0]PETSC ERROR: ./optimal_mechanical_part.hdc on a arch-darwin-c-debug named 
hidac.ecn.purdue.edu by chen2018 Tue Feb  5 22:19:08 2019

[0]PETSC ERROR: Configure options

[0]PETSC ERROR: #1 MatGetFactor() line 4328 in 
/Users/yaxiong/Downloads/petsc-3.9.4/src/mat/interface/matrix.c

[0]PETSC ERROR: #2 PCSetUp_Cholesky() line 86 in 
/Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/pc/impls/factor/cholesky/cholesky.c

[0]PETSC ERROR: #3 PCSetUp() line 923 in 
/Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/pc/interface/precon.c

[0]PETSC ERROR: #4 KSPSetUp() line 381 in 
/Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/ksp/interface/itfunc.c

[0]PETSC ERROR: #5 KSPSolve() line 612 in 
/Users/yaxiong/Downloads/petsc-3.9.4/src/ksp/ksp/interface/itfunc.c

 solve time   3.85885025E-003

 reason   0

 It asks me to download the package. Do I want to run the command in the folder 
of PETSC? But I am not sure what the name of the package should be . I tried 
with ./configure --download- But it does not work.


From: Smith, Barry F. 
Sent: Tuesday, February 5, 2019 1:18 PM
To: Yaxiong Chen
Cc: PETSc users list
Subject: Re: [petsc-users] PETSC matrix assembling super slow



> On Feb 5, 2019, at 10:32 AM, Yaxiong Chen  wrote:
>
> Thanks Barry,
>  I will explore how to partition for parallel computation later. But now 
> I still have some confusion on the sequential operation.
> I compared PETSC and Mumps. In both cases, the subroutine for generating 
> elemental matrix is very similar. However, the difference is in the following 
> step:
>call MatSetValues(Amat,ne,idx,ne,idx,Ae,ADD_VALUES,ierr)
> In this step ,each element cost about 3~4 e-2 second.
>
> However, when I use mumps, I use the following code:
>preA(Aptr+1:Aptr+n*(n+1)/2) = pack(Ae, mask(1:n,1:n))
>Aptr = Aptr +n*(n+1)/2
>if (allocated(auxRHSe)) preRHS(idx) = preRHS(idx)+auxRHSe
>  It just cost 10e-6~10e-5 second. For a 1*1 matrix, the assembling 
> time for using PETSC is 300s while it cost 60s when using Mumps.
> For a 1*1 system, is there any way I can make it faster?

  As we keep saying the slowness of the matrix assembly is due to incorrect 
preallocation. Once you have the preallocation correct the speed should 
increase dramatically.

> Also, I found the solving time is also shorter when I use the direct 
> solver(0.432s  vs 4.332 s ). Is this due the small scale of the system? When 
> I have a very large (e.g., 10*10 ) system, can I expect iterative 
> solver being faster?

   It sounds like the default preconditioner is not working well on your 
problem. First run with -ksp_monitor -ksp_converged_reason to get an idea of 
how quickly the iterative solver is converging. If the number of iterations is 
high you are going to need to change the preconditioner to get one that 
converges well for your problem. What mathematical models is your code 
implementing, this will help in determining what type of preconditioner to use.

  Barry


>
> Thanks
>
> Yaxiong
>
>
>
>
>
> Yaxiong Chen,
> Ph.D. Student
>
> School of Mechanical Engineering, 3171
> 585 Purdue Mall
> West Lafayette, IN 47907
>
>
>
>
>
> From: Smith, Barry F. 
> Sent: Monday, February 4, 2019 10:42 PM
> To: Yaxiong Chen
> Cc: PETSc users list
> Subject: Re: [petsc-users] PETSC matrix assembling super slow
>
>
>
> > On Feb 4, 2019, at 4:41 PM, Yaxiong Chen  wrote:
> >
> >   Hi Barry,
> >
> >  !===
> >   mystart =rank +1! rank starts 
> > from 0
> >   do i=mystart,nelem,nproc! nelem: total number of 
> > elements  ; nproc :number of process
> > call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)! 
> > Generate elemental matrix Ae and corresponding global Idx
> > ne=size(idx)
> > idx=idx-1   !-1 since PETSC ind

Re: [petsc-users] PETSC matrix assembling super slow

2019-02-05 Thread Mark Adams via petsc-users
On Mon, Feb 4, 2019 at 4:17 PM Yaxiong Chen  wrote:

> Hi Mark,
>
>  Will the parameter MatMPIAIJSetPreallocation in influence the
> following part
>   do i=mystart,nelem,nproc
> call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
> ne=size(idx)
> idx=idx-1   !-1 since PETSC index starts from zero
> if (allocated(auxRHSe))  call
> VecSetValues(bvec,ne,idx,auxRHSe,ADD_VALUES,ierr)
> call MatSetValues(Amat,ne,idx,ne,idx,Ae,ADD_VALUES,ierr)
>   end do
>
> I found this part will impede my assembling process. In my case ,the total
> DOF is 20800. I estimated the upper bound  of number of nonzero entries in
> each row as 594.
>

Then you want to set f9 to 594
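
In code that would be roughly the call below (a sketch: using 594 for both the 
diagonal and off-diagonal bounds is deliberately generous; note also that this 
routine applies to a MATMPIAIJ matrix, while the code earlier in the thread 
sets MATMPIBAIJ, for which the analogous routine is 
MatMPIBAIJSetPreallocation):

  ! 594 = estimated upper bound on nonzeros per row, used for both the
  ! diagonal (d_nz) and off-diagonal (o_nz) blocks; the array arguments stay
  ! NULL because only scalar bounds are given here
  call MatMPIAIJSetPreallocation(Amat, 594, PETSC_NULL_INTEGER, &
                                 594, PETSC_NULL_INTEGER, ierr)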


> So I set f9 to be 20206 and f6 to be 10103 in the following command:
>
 call
> MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,
> ierr)
>
>
> Running the code in sequential mode with -info I got the following
> information.And I can get the result even thought the assembling process is
> kind of slow(several times slower than using Mumps).
>
> [0] MatAssemblyBegin_MPIBAIJ(): Stash has 0 entries,uses 0 mallocs.
>
> [0] MatAssemblyBegin_MPIBAIJ(): Block-Stash has 0 entries, uses 0 mallocs.
>
> [0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 20800 X 20800, block size 1;
> storage space: 13847 unneeded, 1030038 used
>
> [0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is
> 62659
>
> [0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 70
>
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 20800) < 0.6. Do not use CompressedRow routines.
>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472
> 140509341618528
>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472
> 140509341618528
>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472
> 140509341618528
>
> [0] VecScatterCreate_Seq(): Special case: sequential copy
>
> [0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 20800 X 0, block size 1;
> storage space: 104000 unneeded, 0 used
>

This looks like your matrix was created with 0 columns.


> [0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 0
>
> [0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 0
>
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 20800)/(num_localrows 20800) > 0.6. Use CompressedRow routines.
>
>  aseembel time   282.5487180001
>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462888960
> 140509341442256
>
> [0] PCSetUp(): Setting up PC for first time
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is
> unchanged
>
> However, when I use the parallel mode,I got the following information:
>
> [0] MatAssemblyBegin_MPIBAIJ(): Stash has 707965 entries,uses 6 mallocs.
>
> [0] MatAssemblyBegin_MPIBAIJ(): Block-Stash has 0 entries, uses 0 mallocs.
>
> it seems it never went to  call
> MatAssemblyEnd(Amat,MAT_FINAL_ASSEMBLY,ierr)
>
> Is there anything I am doing wrong?
>
> Thanks
>
>
> Yaxiong Chen,
> Ph.D. Student
>
> School of Mechanical Engineering, 3171
>
> 585 Purdue Mall
>
> West Lafayette, IN 47907
>
>
>
>
>
> --
> *From:* Mark Adams 
> *Sent:* Tuesday, January 29, 2019 10:02 PM
> *To:* Yaxiong Chen; PETSc users list
> *Subject:* Re: [petsc-users] PETSC matrix assembling super slow
>
> Optimized is a configuration flag not a versions.
>
> You need to figure out your number of non-zeros per row of your global
> matrix, or a bound on it, and supply that in MatMPIAIJSetPreallocation.
> Otherwise it has to allocate and copy memory often.
>
> You could increase your f9 on a serial run and see what runs best and
> then  move to parallel with a value in f6 of about 1/2 of f9.
>
> On Tue, Jan 29, 2019 at 9:13 PM Yaxiong Chen  wrote:
>
> Thanks Mark,
>
> I use PETSC 3.9.4, is this the optimized version you called?
>
> Actually f9 and f6 are from the PETSC examp

Re: [petsc-users] PETSC matrix assembling super slow

2019-02-05 Thread Smith, Barry F. via petsc-users


> On Feb 5, 2019, at 10:32 AM, Yaxiong Chen  wrote:
> 
> Thanks Barry,
>  I will explore how to partition for parallel computation later. But now 
> I still have some confusion on the sequential operation.
> I compared PETSC and Mumps. In both cases, the subroutine for generating 
> elemental matrix is very similar. However, the difference is in the following 
> step:
>call MatSetValues(Amat,ne,idx,ne,idx,Ae,ADD_VALUES,ierr)
> In this step ,each element cost about 3~4 e-2 second.
> 
> However, when I use mumps, I use the following code: 
>preA(Aptr+1:Aptr+n*(n+1)/2) = pack(Ae, mask(1:n,1:n))
>Aptr = Aptr +n*(n+1)/2
>if (allocated(auxRHSe)) preRHS(idx) = preRHS(idx)+auxRHSe
>  It just cost 10e-6~10e-5 second. For a 1*1 matrix, the assembling 
> time for using PETSC is 300s while it cost 60s when using Mumps.
> For a 1*1 system, is there any way I can make it faster? 

  As we keep saying the slowness of the matrix assembly is due to incorrect 
preallocation. Once you have the preallocation correct the speed should 
increase dramatically.
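
  As an illustrative sketch of what correct preallocation could look like for 
the element loop in this thread (this is not code from the thread): do a cheap 
counting pass over the elements before any MatSetValues, so that d_nnz/o_nnz 
hold an upper bound for every locally owned row. It reuses the variables 
already in the code (Amat, nelem, idx, ne, i, j, k, Istart, Iend, ierr); 
getElementalIdx is a hypothetical routine that returns only the global indices 
of element i, without forming Ae (the existing getElementalMAT would also 
work, just more expensively).

  PetscInt, allocatable :: d_nnz(:), o_nnz(:)
  PetscInt :: row, nloc

  ! locally owned row range (0-based, end exclusive)
  call MatGetOwnershipRange(Amat, Istart, Iend, ierr)
  nloc = Iend - Istart
  allocate(d_nnz(nloc), o_nnz(nloc))
  d_nnz = 0; o_nnz = 0

  ! every rank scans *all* elements, but only counts couplings for the rows
  ! it owns; entries shared by neighboring elements are counted more than
  ! once, which only makes the bound larger and is harmless
  do i = 1, nelem
     call ptSystem%getElementalIdx(i, idx)     ! hypothetical: indices only
     ne = size(idx)
     do j = 1, ne
        row = idx(j) - 1                       ! 0-based PETSc row index
        if (row < Istart .or. row >= Iend) cycle
        do k = 1, ne
           if (idx(k)-1 >= Istart .and. idx(k)-1 < Iend) then
              d_nnz(row-Istart+1) = d_nnz(row-Istart+1) + 1
           else
              o_nnz(row-Istart+1) = o_nnz(row-Istart+1) + 1
           end if
        end do
     end do
  end do
  d_nnz = min(d_nnz, nloc)                     ! cannot exceed the local block

  call MatMPIAIJSetPreallocation(Amat, PETSC_DEFAULT_INTEGER, d_nnz, &
                                 PETSC_DEFAULT_INTEGER, o_nnz, ierr)

With bounds like these in place the -info output should report zero mallocs 
during MatSetValues.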

> Also, I found the solving time is also shorter when I use the direct 
> solver(0.432s  vs 4.332 s ). Is this due the small scale of the system? When 
> I have a very large (e.g., 10*10 ) system, can I expect iterative 
> solver being faster?

   It sounds like the default preconditioner is not working well on your 
problem. First run with -ksp_monitor -ksp_converged_reason to get an idea of 
how quickly the iterative solver is converging. If the number of iterations is 
high you are going to need to change the preconditioner to get one that 
converges well for your problem. What mathematical models is your code 
implementing, this will help in determining what type of preconditioner to use.
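
  A concrete run line, for example (a sketch; these are standard PETSc options 
and the executable name is the one from this thread):

  mpiexec -n 4 ./optimal_mechanical_part.hdc -ksp_type cg -pc_type jacobi \
      -ksp_monitor -ksp_converged_reason -log_view

-ksp_monitor prints the residual norm at every iteration, 
-ksp_converged_reason reports why the solve stopped, and -log_view (added here 
as a suggestion) summarizes where the time is actually spent.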

  Barry


> 
> Thanks
> 
> Yaxiong
> 
> 
> 
> 
> 
> Yaxiong Chen, 
> Ph.D. Student 
> 
> School of Mechanical Engineering, 3171 
> 585 Purdue Mall
> West Lafayette, IN 47907
> 
> 
> 
> 
> 
> From: Smith, Barry F. 
> Sent: Monday, February 4, 2019 10:42 PM
> To: Yaxiong Chen
> Cc: PETSc users list
> Subject: Re: [petsc-users] PETSC matrix assembling super slow
>  
> 
> 
> > On Feb 4, 2019, at 4:41 PM, Yaxiong Chen  wrote:
> > 
> >   Hi Barry,
> > 
> >  !===
> >   mystart =rank +1! rank starts 
> > from 0
> >   do i=mystart,nelem,nproc! nelem: total number of 
> > elements  ; nproc :number of process 
> > call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)! 
> > Generate elemental matrix Ae and corresponding global Idx
> > ne=size(idx)
> > idx=idx-1   !-1 since PETSC index starts from zero 
> > if (allocated(auxRHSe))  call 
> > VecSetValues(bvec,ne,idx,auxRHSe,ADD_VALUES,ierr)  
> > call MatSetValues(Amat,ne,idx,ne,idx,Ae,ADD_VALUES,ierr) ! Add 
> > elemental RHS to global RHS
> >   end do
> > !===
> >   Maybe this is where I am wrong. The way I use MPI is to let each core 
> > generate the elemental matrix in turn.
> 
>This is very bad strategy because there is no data locality. 
> 
> > Which means I have one global matrix on each core and finally add them 
> > together. My case is similar to typical finite element method. But the 
> > problem is the Index is not continuous, in this case I don't know how I can 
> > partition the global matrix. Do you have any suggestions or do you have any 
> > template which can show me how finite element method use PETSC?
> 
>   What you need to do is partition the elements across the processors (so 
> that each process has a contiguous subdomain of elements). Then each process 
> computes the element stiffness for "its elements". There really isn't 
> a single PETSc example that manages this all directly for finite elements 
> because it is a rather involved process to do so while getting good 
> performance.
> 
>Depending on how specialized your problem needs to be you might consider 
> using one of the packages libMesh, MOOSE or deal.ii to manage the elements 
> and element computations (they all use PETSc for the algebraic solvers) 
> instead of doing it yourself; it is an involved process to do it all 
> yourself.
> 
>Barry
> 
> > 
> > Thanks
> > 
> > Yaxiong
> > 
> > 
> > From: Smith, Barry F. 
> > Sent: Monday, February 4, 2019 5:21 PM
> > To: Yaxiong Chen
> > Cc: Mark Adams; PETSc users lis

Re: [petsc-users] PETSC matrix assembling super slow

2019-02-04 Thread Smith, Barry F. via petsc-users


> On Feb 4, 2019, at 4:41 PM, Yaxiong Chen  wrote:
> 
>   Hi Barry,
> 
>  !===
>   mystart =rank +1! rank starts from 0
>   do i=mystart,nelem,nproc! nelem: total number of 
> elements  ; nproc :number of process 
> call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)! Generate 
> elemental matrix Ae and corresponding global Idx
> ne=size(idx)
> idx=idx-1   !-1 since PETSC index starts from zero 
> if (allocated(auxRHSe))  call 
> VecSetValues(bvec,ne,idx,auxRHSe,ADD_VALUES,ierr)  
> call MatSetValues(Amat,ne,idx,ne,idx,Ae,ADD_VALUES,ierr) ! Add 
> elemental RHS to global RHS
>   end do
> !===
>   Maybe this is where I am wrong. The way I use MPI is to let each core 
> generate the elemental matrix in turn.

   This is very bad strategy because there is no data locality. 

> Which means I have one global matrix on each core and finally add them 
> together. My case is similar to typical finite element method. But the 
> problem is the Index is not continuous, in this case I don't know how I can 
> partition the global matrix. Do you have any suggestions or do you have any 
> template which can show me how finite element method use PETSC?

  What you need to do is partition the elements across the processors (so that 
each process has a contiguous subdomain of elements). Then each process 
computes the element stiffness for "its elements". There really isn't a single 
PETSc example that manages this all directly for finite elements because it is 
a rather involved process to do so while getting good performance.
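
   For the loop used earlier in this thread, a contiguous split of the element 
range would look something like the sketch below (an illustration only, not 
code from this thread; it only helps if consecutively numbered elements also 
touch nearby global rows, which is the real point about data locality):

  ! contiguous block of elements per rank instead of
  !   do i = mystart, nelem, nproc
  PetscInt :: np, myrank, nelloc, myfirst, mylast
  np = nproc; myrank = rank                ! copy the MPI ints into PetscInt
  nelloc  = nelem/np
  if (myrank < mod(nelem,np)) nelloc = nelloc + 1
  myfirst = myrank*(nelem/np) + min(myrank, mod(nelem,np)) + 1
  mylast  = myfirst + nelloc - 1
  do i = myfirst, mylast
     call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
     ne  = size(idx)
     idx = idx - 1                         ! PETSc indices are 0-based
     call MatSetValues(Amat, ne, idx, ne, idx, Ae, ADD_VALUES, ierr)
     ! (the RHS VecSetValues call is omitted here for brevity)
  end do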

   Depending on how specialized your problem needs to be you might consider 
using one of the packages libMesh, MOOSE or deal.ii to manage the elements and 
element computations (they all use PETSc for the algebraic solvers) instead of 
doing it yourself; it is an involved process to do it all yourself.

   Barry

> 
> Thanks
> 
> Yaxiong
> 
> 
> From: Smith, Barry F. 
> Sent: Monday, February 4, 2019 5:21 PM
> To: Yaxiong Chen
> Cc: Mark Adams; PETSc users list
> Subject: Re: [petsc-users] PETSC matrix assembling super slow
>  
> 
> 
> > On Feb 4, 2019, at 3:17 PM, Yaxiong Chen via petsc-users 
> >  wrote:
> > 
> > Hi Mark,
> >  Will the parameter MatMPIAIJSetPreallocation in influence the 
> > following part
> >   do i=mystart,nelem,nproc
> > call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
> > ne=size(idx)
> > idx=idx-1   !-1 since PETSC index starts from zero 
> > if (allocated(auxRHSe))  call 
> > VecSetValues(bvec,ne,idx,auxRHSe,ADD_VALUES,ierr)
> > call MatSetValues(Amat,ne,idx,ne,idx,Ae,ADD_VALUES,ierr)
> >   end do
> > I found this part will impede my assembling process. In my case ,the total 
> > DOF is 20800. I estimated the upper bound  of number of nonzero entries in 
> > each row as 594. So I set f9 to be 20206 and f6 to be 10103 in the 
> > following command:
> >  call 
> > MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER, 
> > ierr)
> > 
> > Running the code in sequential mode with -info I got the following 
> > information.And I can get the result even thought the assembling process is 
> > kind of slow(several times slower than using Mumps).
> > [0] MatAssemblyBegin_MPIBAIJ(): Stash has 0 entries,uses 0 mallocs.
> > [0] MatAssemblyBegin_MPIBAIJ(): Block-Stash has 0 entries, uses 0 mallocs.
> > [0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 20800 X 20800, block size 1; 
> > storage space: 13847 unneeded, 1030038 used
> > [0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 62659
> 
> 
>Something is very wrong. The number of mallocs should be zero if you have 
> the correct preallocation. Are you calling MatZeroEntries() or some other Mat 
> routine before you call MatSetValues() ?
> 
> 
> > [0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 70
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 
> > 0)/(num_localrows 20800) < 0.6. Do not use CompressedRow routines.
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 
> > 140509341618528
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 
> > 140509341618528
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 
> > 140509341618528
> > [0] VecScatterCreate_Seq(): 

Re: [petsc-users] PETSC matrix assembling super slow

2019-02-04 Thread Smith, Barry F. via petsc-users


> On Feb 4, 2019, at 3:17 PM, Yaxiong Chen via petsc-users 
>  wrote:
> 
> Hi Mark,
>  Will the parameter MatMPIAIJSetPreallocation in influence the following 
> part
>   do i=mystart,nelem,nproc
> call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
> ne=size(idx)
> idx=idx-1   !-1 since PETSC index starts from zero 
> if (allocated(auxRHSe))  call 
> VecSetValues(bvec,ne,idx,auxRHSe,ADD_VALUES,ierr)
> call MatSetValues(Amat,ne,idx,ne,idx,Ae,ADD_VALUES,ierr)
>   end do
> I found this part will impede my assembling process. In my case ,the total 
> DOF is 20800. I estimated the upper bound  of number of nonzero entries in 
> each row as 594. So I set f9 to be 20206 and f6 to be 10103 in the following 
> command:
>  call 
> MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER, 
> ierr)
> 
> Running the code in sequential mode with -info I got the following 
> information.And I can get the result even thought the assembling process is 
> kind of slow(several times slower than using Mumps).
> [0] MatAssemblyBegin_MPIBAIJ(): Stash has 0 entries,uses 0 mallocs.
> [0] MatAssemblyBegin_MPIBAIJ(): Block-Stash has 0 entries, uses 0 mallocs.
> [0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 20800 X 20800, block size 1; 
> storage space: 13847 unneeded, 1030038 used
> [0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 62659


   Something is very wrong. The number of mallocs should be zero if you have 
the correct preallocation. Are you calling MatZeroEntries() or some other Mat 
routine before you call MatSetValues() ?


> [0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 70
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
> 20800) < 0.6. Do not use CompressedRow routines.
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 
> 140509341618528
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 
> 140509341618528
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 
> 140509341618528
> [0] VecScatterCreate_Seq(): Special case: sequential copy
> [0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 20800 X 0, block size 1; storage 
> space: 104000 unneeded, 0 used
> [0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 0
> [0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 0
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 
> 20800)/(num_localrows 20800) > 0.6. Use CompressedRow routines.
>  aseembel time   282.5487180001 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 4462888960 
> 140509341442256
> [0] PCSetUp(): Setting up PC for first time
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
> unchanged
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
> unchanged
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
> unchanged
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
> unchanged
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
> unchanged
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
> unchanged
> [0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
> unchanged
> 
> However, when I use the parallel mode,I got the following information:
> [0] MatAssemblyBegin_MPIBAIJ(): Stash has 707965 entries,uses 6 mallocs.

   You have a lot of off-process MatSetValues(). That is, one process is 
generating a lot of matrix entries that belong on another process. This is not 
desirable. Ideally each process will generate the matrix entries that belong to 
that process so only a few matrix entries need to be transported to another 
process. How are you partitioning the mesh, and how are you deciding which 
process computes which entries in the matrix? All of this may need to be 
revisited.
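
   A tiny diagnostic that could be dropped inside the element loop (an 
addition for illustration only; Istart/Iend come from the MatGetOwnershipRange 
call already in the code, and idx here is already 0-based, i.e. after the 
idx = idx - 1 line):

  ! count how many rows of this element the current rank does not own; if
  ! this is large for most elements, almost every entry goes through the
  ! stash, which is what the "Stash has 707965 entries" message indicates
  PetscInt :: noff
  noff = 0
  do j = 1, ne
     if (idx(j) < Istart .or. idx(j) >= Iend) noff = noff + 1
  end do
  if (noff > 0) print *, 'element', i, ':', noff, 'of', ne, 'rows off-process'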

  Barry



> [0] MatAssemblyBegin_MPIBAIJ(): Block-Stash has 0 entries, uses 0 mallocs.
> 
> it seems it never went to  call MatAssemblyEnd(Amat,MAT_FINAL_ASSEMBLY,ierr)
> Is there anything I am doing wrong?
> Thanks
> 
> Yaxiong Chen, 
> Ph.D. Student 
> 
> School of Mechanical Engineering, 3171 
> 585 Purdue Mall
> West Lafayette, IN 47907
> 
> 
> 
> 
> 
> From: Mark Adams 
> Sent: Tuesday, January 29, 2019 10:02 PM
> To: Yaxiong Chen; PETSc users list
> Subject: Re: [petsc-users] PETSC matrix assembling super slow
>  
> Optimized is a configuration flag not a versions.
> 
> You need to figure out your number of non-zeros per row of your global matrix, 
> or a bound on it, and supply t

Re: [petsc-users] PETSC matrix assembling super slow

2019-02-04 Thread Yaxiong Chen via petsc-users
Hi Mark,

 Will the parameter in MatMPIAIJSetPreallocation influence the following 
part

  do i=mystart,nelem,nproc
call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
ne=size(idx)
idx=idx-1   !-1 since PETSC index starts from zero
if (allocated(auxRHSe))  call 
VecSetValues(bvec,ne,idx,auxRHSe,ADD_VALUES,ierr)
call MatSetValues(Amat,ne,idx,ne,idx,Ae,ADD_VALUES,ierr)
  end do

I found this part will impede my assembling process. In my case ,the total DOF 
is 20800. I estimated the upper bound  of number of nonzero entries in each row 
as 594. So I set f9 to be 20206 and f6 to be 10103 in the following command:

 call 
MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER, 
ierr)


Running the code in sequential mode with -info I got the following 
information. And I can get the result, even though the assembling process is 
kind of slow (several times slower than using Mumps).

[0] MatAssemblyBegin_MPIBAIJ(): Stash has 0 entries,uses 0 mallocs.

[0] MatAssemblyBegin_MPIBAIJ(): Block-Stash has 0 entries, uses 0 mallocs.

[0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 20800 X 20800, block size 1; storage 
space: 13847 unneeded, 1030038 used

[0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 62659

[0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 70

[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
20800) < 0.6. Do not use CompressedRow routines.

[0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 
140509341618528

[0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 
140509341618528

[0] PetscCommDuplicate(): Using internal PETSc communicator 4462889472 
140509341618528

[0] VecScatterCreate_Seq(): Special case: sequential copy

[0] MatAssemblyEnd_SeqBAIJ(): Matrix size: 20800 X 0, block size 1; storage 
space: 104000 unneeded, 0 used

[0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 0

[0] MatAssemblyEnd_SeqBAIJ(): Most nonzeros blocks in any row is 0

[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 
20800)/(num_localrows 20800) > 0.6. Use CompressedRow routines.

 aseembel time   282.5487180001

[0] PetscCommDuplicate(): Using internal PETSc communicator 4462888960 
140509341442256

[0] PCSetUp(): Setting up PC for first time

[0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
unchanged

[0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
unchanged

[0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
unchanged

[0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
unchanged

[0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
unchanged

[0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
unchanged

[0] PCSetUp(): Leaving PC with identical preconditioner since operator is 
unchanged


However, when I use the parallel mode,I got the following information:

[0] MatAssemblyBegin_MPIBAIJ(): Stash has 707965 entries,uses 6 mallocs.

[0] MatAssemblyBegin_MPIBAIJ(): Block-Stash has 0 entries, uses 0 mallocs.


it seems it never went to  call MatAssemblyEnd(Amat,MAT_FINAL_ASSEMBLY,ierr)

Is there anything I am doing wrong?

Thanks


Yaxiong Chen,
Ph.D. Student


School of Mechanical Engineering, 3171

585 Purdue Mall

West Lafayette, IN 47907






From: Mark Adams 
Sent: Tuesday, January 29, 2019 10:02 PM
To: Yaxiong Chen; PETSc users list
Subject: Re: [petsc-users] PETSC matrix assembling super slow

Optimized is a configuration flag not a versions.

You need to figure out your number of non-zeros per row of your global matrix, 
or a bound on it, and supply that in MatMPIAIJSetPreallocation. Otherwise it 
has to allocate and copy memory often.

You could increase your f9 on a serial run and see what runs best and then  
move to parallel with a value in f6 of about 1/2 of f9.

On Tue, Jan 29, 2019 at 9:13 PM Yaxiong Chen 
<chen2...@purdue.edu> wrote:

Thanks Mark,

I use PETSC 3.9.4, is this the optimized version you called?

Actually f9 and f6 are from the PETSC example. I don't know how to set the 
value correctly so I commend them. The size of my elemental matrix may vary. 
For 2D problem, the size of elemental matrix can be 24*24 or 32*32 or some 
other sizes. And the index is not continuous. In this case, the elemental 
matrix may interlace with each other in the global matrix, and I may have 
thousands of elemental matrix to be assembled. Does the preallocating suitable 
for this?


Yaxiong Chen,
Ph.D. Student


School of Mechanical Engineering, 3171

585 Purdue Mall

West Lafayette, IN 47907






From: Mark Adams <mfad...@lbl.gov>
Sent: Tuesday, January 29, 2019 8:25 PM
To: Yaxiong Chen
Cc: Song Gao; petsc-users@mcs.anl.gov

Re: [petsc-users] PETSC matrix assembling super slow

2019-01-29 Thread Mark Adams via petsc-users
Optimized is a configuration flag not a versions.

You need to figure out your number of non-zeros per row of your global
matrix, or a bound on it, and supply that in MatMPIAIJSetPreallocation.
Otherwise it has to allocate and copy memory often.

You could increase your f9 on a serial run and see what runs best and then
move to parallel with a value in f6 of about 1/2 of f9.

On Tue, Jan 29, 2019 at 9:13 PM Yaxiong Chen  wrote:

> Thanks Mark,
>
> I use PETSC 3.9.4, is this the optimized version you called?
>
> Actually f9 and f6 are from the PETSC example. I don't know how to set
> the value correctly so I commend them. The size of my elemental matrix may
> vary. For 2D problem, the size of elemental matrix can be 24*24 or 32*32 or
> some other sizes. And the index is not continuous. In this case, the
> elemental matrix may interlace with each other in the global matrix, and I
> may have thousands of elemental matrix to be assembled. Does the
> preallocating suitable for this?
>
>
> Yaxiong Chen,
> Ph.D. Student
>
> School of Mechanical Engineering, 3171
>
> 585 Purdue Mall
>
> West Lafayette, IN 47907
>
>
>
>
>
> --
> *From:* Mark Adams 
> *Sent:* Tuesday, January 29, 2019 8:25 PM
> *To:* Yaxiong Chen
> *Cc:* Song Gao; petsc-users@mcs.anl.gov
> *Subject:* Re: [petsc-users] PETSC matrix assembling super slow
>
> Slow assembly is often from not preallocating correctly. I am guessing
> that you are using Q1 element and f9==9, and thus the preallocation should
> be OK if this is a scalar problem on a regular grid and f6-==6 should be OK
> for the off processor allocation, if my assumptions are correct.
>
> You can run with -info, which will tell you how many allocation were done
> in assembly. Make sure that it is small (eg, 0).
>
> I see you use f90 array stuff 'idx-1'. Compilers can sometimes do crazy
> things with seemingly simple code. You could just do this manually if you can
> find anything else.
>
> And I trust you are running an optimized version of PETSc.
>
>
> On Tue, Jan 29, 2019 at 6:22 PM Yaxiong Chen via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> Hi Song,
>
> I don't quite understand how I can use this command. I don't partition the
> global matrix. If I add my elemental matrix to the global system it will be
> like this. And in my parallel part, I use each core to generate the
> elemental matrix in turn. In this case, I guess each core will be assigned
> the space for the global matrix and finally be assembled. But according to
> the manual, it seems each core will store a part of the global matrix. Is
> the local submatrix in
> MatMPIAIJSetPreallocation(Mat B, PetscInt d_nz, const PetscInt d_nnz[],
> PetscInt o_nz, const PetscInt o_nnz[])
> <https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html#MatMPIAIJSetPreallocation>
> the same as my elemental matrix?
>
>
>
> Thanks
>
>
> Yaxiong Chen,
> Ph.D. Student
>
> School of Mechanical Engineering, 3171
>
> 585 Purdue Mall
>
> West Lafayette, IN 47907
>
>
>
>
>
> --
> *From:* Song Gao 
> *Sent:* Tuesday, January 29, 2019 1:22 PM
> *To:* Yaxiong Chen
> *Cc:* Matthew Knepley; petsc-users@mcs.anl.gov
> *Subject:* Re: [petsc-users] PETSC matrix assembling super slow
>
> I think you would prefer to preallocate the matrix
>
> uncomment this line
> ! call
> MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,
> ierr)
>
>
>
> On Tue, Jan 29, 2019, 12:40 PM, Yaxiong Chen via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> Hello,
>
>
> I have a 2D system which is assembled by each elemental matrix.  Ae is my
> elemental matrix, auxRHSe(:) and RHSe(:) and corresponding right hand side,
> idx is the global index. My code is as follow, however ,the assembling rate
> is super slow(Marked red in the code). I am not sure whether the assembling
> type is right or not. Since for each element, idx are not continuous
> numbers. Do you have any idea what is the better way to assemble the matrix?
>
>
> Thanks
>
>
> block
> PetscErro

Re: [petsc-users] PETSC matrix assembling super slow

2019-01-29 Thread Mark Adams via petsc-users
Slow assembly is often from not preallocating correctly. I am guessing that
you are using Q1 element and f9==9, and thus the preallocation should be OK
if this is a scalar problem on a regular grid and f6-==6 should be OK for
the off processor allocation, if my assumptions are correct.

You can run with -info, which will tell you how many allocation were done
in assembly. Make sure that it is small (eg, 0).
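
A complementary trick (a suggestion added here, not something from this 
thread) is to make PETSc raise an error as soon as MatSetValues needs a fresh 
allocation, so a missed preallocation cannot stay silent:

  ! error out immediately if an insertion falls outside the preallocated
  ! nonzero pattern instead of silently mallocing
  call MatSetOption(Amat, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE, ierr)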

I see you use f90 array stuff 'idx-1'. Compilers can sometimes do crazy
things with seemingly simple code. You could just do this manually if you can
find anything else.

And I trust you are running an optimized version of PETSc.


On Tue, Jan 29, 2019 at 6:22 PM Yaxiong Chen via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Hi Song,
>
> I don't quite understand how I can use this command. I don't partition the
> global matrix. If I add my elemental matrix to the global system it will be
> like this. And in my parallel part, I use each core to generate the
> elemental matrix in turn. In this case, I guess each core will be assigned
> the space for the global matrix and finally be assembled. But according to
> the manual, it seems each core will store a part of the global matrix. Is
> the local submatrix in
> MatMPIAIJSetPreallocation(Mat B, PetscInt d_nz, const PetscInt d_nnz[],
> PetscInt o_nz, const PetscInt o_nnz[])
> <https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html#MatMPIAIJSetPreallocation>
> the same as my elemental matrix?
>
>
>
> Thanks
>
>
> Yaxiong Chen,
> Ph.D. Student
>
> School of Mechanical Engineering, 3171
>
> 585 Purdue Mall
>
> West Lafayette, IN 47907
>
>
>
>
>
> ----------
> *From:* Song Gao 
> *Sent:* Tuesday, January 29, 2019 1:22 PM
> *To:* Yaxiong Chen
> *Cc:* Matthew Knepley; petsc-users@mcs.anl.gov
> *Subject:* Re: [petsc-users] PETSC matrix assembling super slow
>
> I think you would prefer to preallocate the matrix
>
> uncomment this line
> ! call
> MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,
> ierr)
>
>
>
> On Tue, Jan 29, 2019, 12:40 PM, Yaxiong Chen via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> Hello,
>
>
> I have a 2D system which is assembled by each elemental matrix.  Ae is my
> elemental matrix, auxRHSe(:) and RHSe(:) and corresponding right hand side,
> idx is the global index. My code is as follow, however ,the assembling rate
> is super slow(Marked red in the code). I am not sure whether the assembling
> type is right or not. Since for each element, idx are not continuous
> numbers. Do you have any idea what is the better way to assemble the matrix?
>
>
> Thanks
>
>
> block
> PetscErrorCode ierr
>  PetscMPIInt rank,nproc, mystart
>  PetscInt  nelem
>  integer,allocatable  ::info(:)
>  real(wp), allocatable :: Ae(:,:), auxRHSe(:),RHSe(:)
>  integer, allocatable  :: idx(:)
>  PetscScalar, pointer :: xx_v(:)
>
>  PC   prec
>  PetscScalar ::   val
>  Vec  xvec,bvec,uvec
>  Mat  Amat
>  KSP  ksp
>  PetscViewer viewer
>  PetscInt  geq,j,k,ne,M,Istart,Iend
>  PetscBool  flg
>  KSPConvergedReason reason
>  Vec dummyVec, dummyVecs(1)
>  MatNullSpace nullspace
>
>  call PetscInitialize( PETSC_NULL_CHARACTER, ierr )
>
>  if (ierr .ne. 0) then
>  print*,'Unable to initialize PETSc'
>  stop
>  endif
>  call MPI_Comm_size(PETSC_COMM_WORLD,nproc,ierr)
>  call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
>  mystart=rank+1
>
>  !   Parameter set
>  info=ptsystem%getInitInfo()
>  nelem =
> info(Info_EleMatNum)+info(Info_FixedDOFNum)+info(Info_NumConstrain)
>  print*,'nelem',nelem
>  !-
>  !  Create Matrix
>  call MatCreate(PETSC_COMM_WORLD,Amat,ierr)
>  call MatSetSizes( Amat,PETSC_DECIDE, PETSC_DECIDE, info(1), info(1),
> ierr )
>  call MatSetType( Amat, MATMPIBAIJ, ierr )
>  ! call
> MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL

Re: [petsc-users] PETSC matrix assembling super slow

2019-01-29 Thread Yaxiong Chen via petsc-users
Hi Song,

I don't quite understand how I can use this command. I don't partition the 
global matrix. If I add my elemental matrix to the global system it will be 
like this. And in my parallel part, I use each core to generate the elemental 
matrix in turn. In this case, I guess each core will be assigned the space for 
the global matrix and finally be assembled. But according to the manual, it 
seems each core will store a part of the global matrix. Is the local submatrix 
in MatMPIAIJSetPreallocation(Mat B, PetscInt d_nz, const PetscInt d_nnz[], 
PetscInt o_nz, const PetscInt o_nnz[]) 
<https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html#MatMPIAIJSetPreallocation> 
the same as my elemental matrix?

[inline image omitted]

Thanks


Yaxiong Chen,
Ph.D. Student


School of Mechanical Engineering, 3171

585 Purdue Mall

West Lafayette, IN 47907






From: Song Gao 
Sent: Tuesday, January 29, 2019 1:22 PM
To: Yaxiong Chen
Cc: Matthew Knepley; petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] PETSC matrix assembling super slow

I think you would prefer to preallocate the matrix

uncomment this line
! call 
MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER, 
ierr)



On Tue, Jan 29, 2019, 12:40 PM, Yaxiong Chen via petsc-users 
<petsc-users@mcs.anl.gov> wrote:

Hello,


I have a 2D system which is assembled by each elemental matrix.  Ae is my 
elemental matrix, auxRHSe(:) and RHSe(:) and corresponding right hand side, idx 
is the global index. My code is as follow, however ,the assembling rate is 
super slow(Marked red in the code). I am not sure whether the assembling type 
is right or not. Since for each element, idx are not continuous numbers. Do you 
have any idea what is the better way to assemble the matrix?


Thanks


block
PetscErrorCode ierr
 PetscMPIInt rank,nproc, mystart
 PetscInt  nelem
 integer,allocatable  ::info(:)
 real(wp), allocatable :: Ae(:,:), auxRHSe(:),RHSe(:)
 integer, allocatable  :: idx(:)
 PetscScalar, pointer :: xx_v(:)

 PC   prec
 PetscScalar ::   val
 Vec  xvec,bvec,uvec
 Mat  Amat
 KSP  ksp
 PetscViewer viewer
 PetscInt  geq,j,k,ne,M,Istart,Iend
 PetscBool  flg
 KSPConvergedReason reason
 Vec dummyVec, dummyVecs(1)
 MatNullSpace nullspace

 call PetscInitialize( PETSC_NULL_CHARACTER, ierr )

 if (ierr .ne. 0) then
 print*,'Unable to initialize PETSc'
 stop
 endif
 call MPI_Comm_size(PETSC_COMM_WORLD,nproc,ierr)
 call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
 mystart=rank+1

 !   Parameter set
 info=ptsystem%getInitInfo()
 nelem = info(Info_EleMatNum)+info(Info_FixedDOFNum)+info(Info_NumConstrain)
 print*,'nelem',nelem
 !-
 !  Create Matrix
 call MatCreate(PETSC_COMM_WORLD,Amat,ierr)
 call MatSetSizes( Amat,PETSC_DECIDE, PETSC_DECIDE, info(1), info(1), ierr )
 call MatSetType( Amat, MATMPIBAIJ, ierr )
 ! call 
MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER, 
ierr)
 call MatSetFromOptions( Amat, ierr )
 call MatSetUp( Amat, ierr )
 call MatGetOwnershipRange( Amat, Istart, Iend, ierr )

  xvec = tVec(0)
  call MatCreateVecs( Amat, PETSC_NULL_VEC, xvec, ierr )
  call VecSetFromOptions( xvec, ierr )
  call VecDuplicate( xvec, bvec, ierr )
  call VecDuplicate( xvec, uvec, ierr )

   t1 = MPI_WTime();

  do i=mystart,nelem,nproc
call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
ne=size(idx)
if (allocated(auxRHSe))  call 
VecSetValues(bvec,ne,idx-1,auxRHSe,ADD_VALUES,ierr)
 call MatSetValues(Amat,ne,idx-1,ne,idx-1,Ae,ADD_VALUES,ierr)
  end do

  nelem   = info(Info_EleRHSNum)
  mystart = rank+1

  do i = mystart, nelem, nproc
call ptSystem%getElementalRHS(i, RHSe, idx)
print*,'idx',idx
ne=size(idx)
if (allocated(RHSe))  call 
VecSetValues(bvec,ne,idx-1,RHSe,ADD_VALUES,ierr)
  end do
  call MatAssemblyBegin(Amat,MAT_FINAL_ASSEMBLY,ierr)
  call MatAssemblyEnd(Amat,MAT_FINAL_ASSEMBLY,ierr) 
 !  this part is slow, the for loop above is done but here it may get stuck
  call VecAssemblyBegin(bvec,ierr)   

Re: [petsc-users] PETSC matrix assembling super slow

2019-01-29 Thread Song Gao via petsc-users
I think you would prefer to preallocate the matrix

uncomment this line
! call
MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,
ierr)



On Tue, Jan 29, 2019, 12:40 PM, Yaxiong Chen via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Hello,
>
>
> I have a 2D system which is assembled by each elemental matrix.  Ae is my
> elemental matrix, auxRHSe(:) and RHSe(:) and corresponding right hand side,
> idx is the global index. My code is as follow, however ,the assembling rate
> is super slow(Marked red in the code). I am not sure whether the assembling
> type is right or not. Since for each element, idx are not continuous
> numbers. Do you have any idea what is the better way to assemble the matrix?
>
>
> Thanks
>
>
> block
> PetscErrorCode ierr
>  PetscMPIInt rank,nproc, mystart
>  PetscInt  nelem
>  integer,allocatable  ::info(:)
>  real(wp), allocatable :: Ae(:,:), auxRHSe(:),RHSe(:)
>  integer, allocatable  :: idx(:)
>  PetscScalar, pointer :: xx_v(:)
>
>  PC   prec
>  PetscScalar ::   val
>  Vec  xvec,bvec,uvec
>  Mat  Amat
>  KSP  ksp
>  PetscViewer viewer
>  PetscInt  geq,j,k,ne,M,Istart,Iend
>  PetscBool  flg
>  KSPConvergedReason reason
>  Vec dummyVec, dummyVecs(1)
>  MatNullSpace nullspace
>
>  call PetscInitialize( PETSC_NULL_CHARACTER, ierr )
>
>  if (ierr .ne. 0) then
>  print*,'Unable to initialize PETSc'
>  stop
>  endif
>  call MPI_Comm_size(PETSC_COMM_WORLD,nproc,ierr)
>  call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
>  mystart=rank+1
>
>  !   Parameter set
>  info=ptsystem%getInitInfo()
>  nelem =
> info(Info_EleMatNum)+info(Info_FixedDOFNum)+info(Info_NumConstrain)
>  print*,'nelem',nelem
>  !-
>  !  Create Matrix
>  call MatCreate(PETSC_COMM_WORLD,Amat,ierr)
>  call MatSetSizes( Amat,PETSC_DECIDE, PETSC_DECIDE, info(1), info(1),
> ierr )
>  call MatSetType( Amat, MATMPIBAIJ, ierr )
>  ! call
> MatMPIAIJSetPreallocation(Amat,f9,PETSC_NULL_INTEGER,f6,PETSC_NULL_INTEGER,
> ierr)
>  call MatSetFromOptions( Amat, ierr )
>  call MatSetUp( Amat, ierr )
>  call MatGetOwnershipRange( Amat, Istart, Iend, ierr )
>
>   xvec = tVec(0)
>   call MatCreateVecs( Amat, PETSC_NULL_VEC, xvec, ierr )
>   call VecSetFromOptions( xvec, ierr )
>   call VecDuplicate( xvec, bvec, ierr )
>   call VecDuplicate( xvec, uvec, ierr )
>
>t1 = MPI_WTime();
>
>   do i=mystart,nelem,nproc
> call ptSystem%getElementalMAT(i, Ae, auxRHSe, idx)
> ne=size(idx)
> if (allocated(auxRHSe))  call
> VecSetValues(bvec,ne,idx-1,auxRHSe,ADD_VALUES,ierr)
>  call MatSetValues(Amat,ne,idx-1,ne,idx-1,Ae,ADD_VALUES,ierr)
>   end do
>
>   nelem   = info(Info_EleRHSNum)
>   mystart = rank+1
>
>   do i = mystart, nelem, nproc
> call ptSystem%getElementalRHS(i, RHSe, idx)
> print*,'idx',idx
> ne=size(idx)
> if (allocated(RHSe))  call
> VecSetValues(bvec,ne,idx-1,RHSe,ADD_VALUES,ierr)
>   end do
>   call MatAssemblyBegin(Amat,MAT_FINAL_ASSEMBLY,ierr)
>   call MatAssemblyEnd(Amat,MAT_FINAL_ASSEMBLY,ierr)
>   !  this part is slow, the for loop above is done but here it may get
> stuck
>   call VecAssemblyBegin(bvec,ierr)
> ! For a 2500 DOF system, assembling only
> takes over 2 seconds
>   call VecAssemblyEnd(bvec,ierr)
> ! But for a 1 DOF system , it gets stuck
>  t2 = MPI_WTime();
>   print*,'assembeling time',t2-t1
>   ! Solve
>   call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
> !  Set operators. Here the matrix that defines the linear system
> !  also serves as the preconditioning matrix.
>   call KSPSetOperators(ksp,Amat,Amat,ierr)
>   call KSPSetFromOptions(ksp,ierr)
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> !  Solve the linear system
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> call KSPSetType(ksp,KSPCG,ierr)
> call KSPGetPC(ksp,prec,ierr)
>! call KSPSetPCSide(ksp,PC_SYMMETRIC,ierr)
> call PCSetType(prec,PCJACOBI,ierr)
>  call KSPSolve(ksp,bvec,xvec,ierr)
>  call PetscFinalize(ierr)
>
> end block
>
>
> Yaxiong Chen,
> Ph.D. Student
>
> School of Mechanical Engineering, 3171
>
> 585 Purdue Mall
>
> West Lafayette, IN 47907
>
>
>
>