This log is for 100 time steps, not a single time step.
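(Editor's note: for the per-time-step profiling asked about further down in the thread, one possibility is dumping the accumulated log after each step. This sketch is not from the original thread; the file-name pattern and the `step` variable are hypothetical, and note that PetscLogView prints cumulative counters, so successive files show running totals rather than per-step deltas.)

```c
/* Sketch only: assumes an initialized PETSc environment;
 * error-code checking omitted for brevity. */
char        fname[PETSC_MAX_PATH_LEN];
PetscViewer viewer;

PetscSNPrintf(fname, sizeof(fname), "profile_step_%d.log", (int)step);
PetscViewerASCIIOpen(PETSC_COMM_WORLD, fname, &viewer);
PetscLogView(viewer);            /* counters accumulated up to this step */
PetscViewerDestroy(&viewer);
```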
On Sun, Jul 14, 2019 at 3:01 AM Mark Adams <mfad...@lbl.gov> wrote:

> You call the assembly stuff a lot (200). BuildTwoSidedF is a global thing
> and is taking a lot of time. You should just call these once per time step
> (it looks like you are just doing one time step).
>
> --- Event Stage 1: Matrix Construction
>
> BuildTwoSidedF      400 1.0 6.5222e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2 0  0 0 0   5 0   0   0   0     0
> VecSet                1 1.0 2.8610e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0  0 0 0   0 0   0   0   0     0
> VecAssemblyBegin    200 1.0 6.2633e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2 0  0 0 0   5 0   0   0   0     0
> VecAssemblyEnd      200 1.0 6.7163e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0  0 0 0   0 0   0   0   0     0
> VecScatterBegin     200 1.0 5.9373e-03 2.2 0.00e+00 0.0 3.6e+03 2.1e+03 0.0e+00  0 0 79 2 0   0 0  99 100   0     0
> VecScatterEnd       200 1.0 2.7236e-0223.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0  0 0 0   0 0   0   0   0     0
> MatAssemblyBegin    200 1.0 3.2747e-02 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0  0 0 0   0 0   0   0   0     0
> MatAssemblyEnd      200 1.0 9.0972e-01 1.0 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  4 0  1 0 6   9 0   1   0 100     0
> AssembleMats        200 1.0 1.5568e+00 1.2 0.00e+00 0.0 3.6e+03 2.1e+03 8.0e+00  6 0 79 2 6  14 0 100 100 100     0
> myMatSetValues      200 1.0 2.5367e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11 0  0 0 0  25 0   0   0   0     0
> setNativeMat        100 1.0 2.8223e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0  0 0 0  28 0   0   0   0     0
> setNativeMatII      100 1.0 3.2174e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14 0  0 0 0  31 0   0   0   0     0
> callScheme          100 1.0 2.0700e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1 0  0 0 0   2 0   0   0   0     0
>
> On Fri, Jul 12, 2019 at 11:56 PM Mohammed Mostafa via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
>> Hello Matt,
>> Attached is the dumped entire log output using -log_view and -info.
>>
>> Thanks,
>> Kamra
>>
>> On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knep...@gmail.com> wrote:
>>
>>> On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>>
>>>> Hello all,
>>>> I have a few questions regarding PETSc.
>>>
>>> Please send the entire output of a run with all the logging turned on, using -log_view and -info.
>>>
>>> Thanks,
>>>
>>>    Matt
>>>
>>>> Question 1:
>>>> For the profiling, is it possible to show only the user-defined log events in the breakdown of each stage in -log_view?
>>>> I tried deactivating all the class IDs (MAT, VEC, KSP, PC):
>>>>
>>>>   PetscLogEventExcludeClass(MAT_CLASSID);
>>>>   PetscLogEventExcludeClass(VEC_CLASSID);
>>>>   PetscLogEventExcludeClass(KSP_CLASSID);
>>>>   PetscLogEventExcludeClass(PC_CLASSID);
>>>>
>>>> which should "deactivate event logging for a PETSc object class in every stage" according to the manual. However, I still see them in the stage breakdown:
>>>>
>>>> --- Event Stage 1: Matrix Construction
>>>>
>>>> BuildTwoSidedF      4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0  18 0   0   0   0     0
>>>> VecSet              1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0   0   0   0     0
>>>> VecAssemblyBegin    2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0  18 0   0   0   0     0
>>>> VecAssemblyEnd      2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0   0   0   0     0
>>>> VecScatterBegin     2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01 2.1e+03 0.0e+00  0 0 3 0 0   0 0  50  80   0     0
>>>> VecScatterEnd       2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0   0   0   0     0
>>>> MatAssemblyBegin    2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0   0   0   0     0
>>>> MatAssemblyEnd      2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  0 0 3 0 6  10 0  50  20 100     0
>>>> AssembleMats        2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01 1.3e+03 8.0e+00  0 0 7 0 6  28 0 100 100 100     0  # USER EVENT
>>>> myMatSetValues      2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0  19 0   0   0   0     0  # USER EVENT
>>>> setNativeMat        1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0  24 0   0   0   0     0  # USER EVENT
>>>> setNativeMatII      1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0  28 0   0   0   0     0  # USER EVENT
>>>> callScheme          1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   2 0   0   0   0     0  # USER EVENT
>>>>
>>>> Also, is it possible to clear the logs so that I can write a separate profiling output file for each time step? (I am solving a transient problem and I want to know how the performance changes as time goes by.)
>>>>
>>>> ------------------------------------------------------------------------
>>>> Question 2:
>>>> Regarding MatSetValues:
>>>> Right now I am writing a finite volume code. Due to algorithm requirements, I have to write the matrix into a local native format (array of arrays) and then loop through the rows and use MatSetValues to set the elements in "Mat A":
>>>>
>>>>   MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>>>
>>>> but it is very slow and it is killing my performance, although the matrix was properly created using
>>>>
>>>>   MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>>>>                PETSC_DETERMINE, PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>>>
>>>> with d_nnz and o_nnz properly assigned, so no mallocs occur during MatSetValues, and all inserted values are local, so there are no off-processor values.
>>>> So my question is: is it possible to set multiple rows at once, hopefully all of them, since setting them row by row seems expensive? I checked the manual, but MatSetValues can only set a dense block of values.
>>>> Or perhaps is it possible to copy all rows directly into the underlying matrix data? As I mentioned, all values are local and there are no off-processor values (stash is 0):
>>>>
>>>> [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>>>> [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
>>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage space: 0 unneeded,743028 used
>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742972 used
>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743093 used
>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,743036 used
>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742938 used
>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743049 used
>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space: 0 unneeded,685 used
>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space: 0 unneeded,649 used
>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage space: 0 unneeded,1011 used
>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage space: 0 unneeded,1137 used
>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space: 0 unneeded,658 used
>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space: 0 unneeded,648 used
>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
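(Editor's note: the preallocation the question relies on, d_nnz and o_nnz for MatCreateAIJ, comes down to counting how many columns of each locally owned row fall inside versus outside the local ownership range. A self-contained illustration in plain C, with a hypothetical adjacency layout, not PETSc itself:)

```c
/* Count diagonal (d_nnz) and off-diagonal (o_nnz) nonzeros per local row
 * for MatCreateAIJ-style preallocation. Rows [rstart, rend) are owned by
 * this rank; cols[i] holds the global column indices of local row i.
 * The data layout (ncols/cols arrays) is hypothetical, for illustration. */
void count_nnz(int rstart, int rend,
               const int *ncols,        /* ncols[i]: nonzeros in local row i  */
               const int *const *cols,  /* cols[i]:  global column indices    */
               int *d_nnz, int *o_nnz)  /* outputs, each of length rend-rstart */
{
    for (int i = 0; i < rend - rstart; i++) {
        d_nnz[i] = 0;
        o_nnz[i] = 0;
        for (int k = 0; k < ncols[i]; k++) {
            int c = cols[i][k];
            if (c >= rstart && c < rend)
                d_nnz[i]++;   /* column owned by this rank: diagonal block  */
            else
                o_nnz[i]++;   /* column on another rank: off-diagonal block */
        }
    }
}
```

When these counts are exact, as the -info output above confirms ("Number of mallocs during MatSetValues() is 0"), insertion itself does no allocation, so the remaining cost is the copying and the assembly synchronization.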
>>>>
>>>> ------------------------------------------------------------------------
>>>> Question 3:
>>>> If all matrix and vector inserted data are local, what part of the vec/mat assembly consumes time? MatSetValues and MatAssembly consume more time than the matrix builder, and not just for the first MAT_FINAL_ASSEMBLY.
>>>>
>>>> For context, the matrix in the above is nearly 1M x 1M, partitioned over six processes, and it was NOT built using a DM.
>>>>
>>>> Finally, the configure options are:
>>>>
>>>> Configure options: PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-metis --download-hypre
>>>>
>>>> Sorry for such a long question, and thanks in advance.
>>>> Thanks,
>>>> M. Kamra
>>>
>>> --
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
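(Editor's note: two recurring points in this thread, telling PETSc that every insertion is locally owned, and grouping the user-side assembly work into its own -log_view stage, can be sketched together. This fragment is illustrative, not from the thread; `local_size`, `d_nnz`, and `o_nnz` are the poster's variables, the rest is assumed:)

```c
/* Sketch only: assumes an initialized PETSc/MPI environment;
 * error-code checking (CHKERRQ) omitted for brevity. */
Mat A;
MatCreateAIJ(PETSC_COMM_WORLD, local_size, local_size,
             PETSC_DETERMINE, PETSC_DETERMINE,
             -1, d_nnz, -1, o_nnz, &A);

/* The -info output shows "Stash has 0 entries": every inserted value is
 * locally owned. This option lets MatAssemblyBegin/End skip the
 * off-process communication setup that appears as BuildTwoSidedF. */
MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE);

/* Separate user-side assembly from solver internals in -log_view. */
PetscLogStage stage;
PetscLogStageRegister("Matrix Construction", &stage);
PetscLogStagePush(stage);
/* ... row-by-row MatSetValues() loop, then
   MatAssemblyBegin/MatAssemblyEnd with MAT_FINAL_ASSEMBLY ... */
PetscLogStagePop();
```

MAT_NO_OFF_PROC_ENTRIES is a real MatOption, but whether it eliminates the BuildTwoSidedF time observed in this particular run is an assumption that would need to be checked against a fresh -log_view.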