Hi! I use the same machine, the same nodes, and the same number of processes per node. And I have tested many times, so this does not seem to be an accidental result. But your points do inspire me. I use Global Arrays' communicator when solving matrix A, and just MPI_COMM_WORLD for B. On every node, Global Arrays' communicator dedicates one process to managing communication; maybe this is the reason for the difference in communication speed?
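To check this hypothesis, I plan to time a barrier on each communicator directly, similar to the "Average time for MPI_Barrier()" line that -log_view prints. A minimal sketch (the helper name and the repetition count are mine, just for illustration; the communicator obtained from Global Arrays would be passed in place of MPI_COMM_WORLD):

#include <mpi.h>
#include <stdio.h>

/* Illustrative helper (my own, not from PETSc or GA): average
   MPI_Barrier time over 'reps' repetitions on a given communicator. */
static double avg_barrier_time(MPI_Comm comm, int reps)
{
  MPI_Barrier(comm);               /* warm up / synchronize first */
  double t0 = MPI_Wtime();
  for (int i = 0; i < reps; i++) MPI_Barrier(comm);
  return (MPI_Wtime() - t0) / reps;
}

int main(int argc, char **argv)
{
  int rank;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* Pass the Global Arrays communicator here instead of
     MPI_COMM_WORLD to compare the two directly. */
  double t = avg_barrier_time(MPI_COMM_WORLD, 100);
  if (rank == 0)
    printf("Average time for MPI_Barrier(): %g s\n", t);

  MPI_Finalize();
  return 0;
}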
I will have a try and respond as soon as I get the results!

Runfeng Jin

Jose E. Roman <[email protected]> wrote on Wed, Jun 15, 2022 at 16:09:
> You are comparing two different codes on two different machines? Or is it
> the same machine? With different numbers of processes and different solver
> options...
>
> If it is the same machine, the performance seems very different:
>
> Matrix A:
> Average time for MPI_Barrier(): 1.90986e-05
> Average time for zero size MPI_Send(): 3.44587e-06
>
> Matrix B:
> Average time for MPI_Barrier(): 0.0578456
> Average time for zero size MPI_Send(): 0.00358668
>
> The reductions (VecReduceComm) are taking 2.1629e-01 and 2.4972e+01,
> respectively. That is a two orders of magnitude difference.
>
> Jose
>
>
> > On Jun 15, 2022, at 8:58, Runfeng Jin <[email protected]> wrote:
> >
> > Sorry, I missed the attachment.
> >
> > Runfeng Jin
> >
> > Runfeng Jin <[email protected]> wrote on Wed, Jun 15, 2022 at 14:56:
> > Hi! You are right! I tried a SLEPc/PETSc build without debugging, and
> > matrix B's solve time dropped to 99s. But it is still much higher than
> > matrix A's (8s). As mentioned before, the attachments are the log views
> > of the no-debug version:
> > file 1: log of the matrix A solve. This is a larger matrix
> > (900,000*900,000) but is solved quickly (8s);
> > file 2: log of the matrix B solve. This is a smaller matrix
> > (2,547*2,547) but is solved much more slowly (99s).
> >
> > Comparing these two files, the strange phenomenon still exists:
> > 1) Matrix A has more basis vectors (375) than B (189), but A spent less
> > time in BVCreate (0.6s) than B (32s);
> > 2) Matrix A spent less time in EPSSetUp (0.015s) than B (0.9s);
> > 3) In the debug version, matrix B's storage was distributed much more
> > unevenly among processors (memory max/min 4365) than A's (memory max/min
> > 1.113), though the other metrics look balanced. In the no-debug version
> > there is no memory information in the output.
> >
> > The significant differences I can tell are: 1) B uses preallocation;
> > 2) A's matrix elements are computed on the CPU, while B's are computed
> > on the GPU, transferred to the CPU, and then solved by PETSc on the CPU.
> >
> > Is this a normal result? I mean, can a matrix with fewer nonzero
> > elements and a smaller dimension cost more EPSSolve time? Is this due to
> > the structure of the matrix? If so, are there any ways to increase the
> > solve speed?
> >
> > Or is this weird and should be fixed in some way?
> > Thank you!
> >
> > Runfeng Jin
> >
> >
> > Jose E. Roman <[email protected]> wrote on Sun, Jun 12, 2022 at 16:08:
> > Please always respond to the list.
> >
> > Pay attention to the warnings in the log:
> >
> > ##########################################################
> > #                                                        #
> > #                       WARNING!!!                       #
> > #                                                        #
> > #   This code was compiled with a debugging option.      #
> > #   To get timing results run ./configure                #
> > #   using --with-debugging=no, the performance will      #
> > #   be generally two or three times faster.              #
> > #                                                        #
> > ##########################################################
> >
> > With the debugging option the times are not trustworthy, so I suggest
> > repeating the analysis with an optimized build.
> >
> > Jose
> >
> >
> > > On Jun 12, 2022, at 5:41, Runfeng Jin <[email protected]> wrote:
> > >
> > > Hello!
> > > I compared the log views of these two solves and found some strange
> > > things. The attached files are the log views:
> > > file 1: log of the matrix A solve. This is a larger matrix
> > > (900,000*900,000) but is solved quickly (30s);
> > > file 2: log of the matrix B solve. This is a smaller matrix
> > > (2,547*2,547; a little different from the matrix B mentioned in the
> > > initial email, but it is also solved much more slowly; I use it for a
> > > quicker test), yet it is solved much more slowly (1244s).
> > >
> > > Comparing these two files, I find some things:
> > > 1) Matrix A has more basis vectors (375) than B (189), but A spent
> > > less time in BVCreate (0.349s) than B (296s);
> > > 2) Matrix A spent less time in EPSSetUp (0.031s) than B (10.709s);
> > > 3) Matrix B's storage is distributed much more unevenly among
> > > processors (memory max/min 4365) than A's (memory max/min 1.113),
> > > though the other metrics look balanced.
> > >
> > > I do not do preallocation for A, and it is distributed across
> > > processors by PETSc. For B, when preallocating I use
> > > PetscSplitOwnership to decide which part belongs to the local
> > > processor, and B is also distributed by PETSc when computing the
> > > matrix values.
> > >
> > > - Does this mean that, for matrix B, too many nonzero elements are
> > > stored in a single process, and that this is why it costs so much more
> > > time to solve the matrix and find the eigenvalues? If so, are there
> > > better ways to distribute the matrix among the processors?
> > > - Or are there other reasons for this difference in cost?
> > >
> > > I hope to receive your reply. Thank you!
> > >
> > > Runfeng Jin
> > >
> > >
> > > Runfeng Jin <[email protected]> wrote on Sat, Jun 11, 2022 at 20:33:
> > > Hello!
> > > I have tried PETSC_DEFAULT for eps_ncv, but it still costs much time.
> > > Is there anything else I can do? The attachment is the log when using
> > > PETSC_DEFAULT for eps_ncv.
> > >
> > > Thank you!
> > >
> > > Runfeng Jin
> > >
> > > Jose E. Roman <[email protected]> wrote on Fri, Jun 10, 2022 at 20:50:
> > > The value -eps_ncv 5000 is huge.
> > > Better let SLEPc use the default value.
> > >
> > > Jose
> > >
> > >
> > > > On Jun 10, 2022, at 14:24, Jin Runfeng <[email protected]> wrote:
> > > >
> > > > Hello!
> > > > I want to acquire the 3 smallest eigenvalues, and the attachment is
> > > > the -log_view output. I can see that EPSSolve really takes most of
> > > > the time, but I cannot see why it costs so much. Can you see
> > > > something in it?
> > > >
> > > > Thank you!
> > > >
> > > > Runfeng Jin
> > > >
> > > > On Jun 4, 2022, at 1:37 AM, Jose E. Roman <[email protected]> wrote:
> > > > Convergence depends on the distribution of the eigenvalues you want
> > > > to compute. On the other hand, the cost also depends on the time it
> > > > takes to build the preconditioner. Use -log_view to see the cost of
> > > > the different steps of the computation.
> > > >
> > > > Jose
> > > >
> > > >
> > > > > On Jun 3, 2022, at 18:50, jsfaraway <[email protected]> wrote:
> > > > >
> > > > > Hello!
> > > > >
> > > > > I am trying to use EPSGD to compute a matrix's smallest
> > > > > eigenvalue, and I find a strange thing. There are two matrices,
> > > > > A (900000*900000) and B (90000*90000). While solving A took 371
> > > > > iterations and only 30.83s, solving B took 22 iterations and
> > > > > 38885s! What could be the reason for this? Or what can I do to
> > > > > find the reason?
> > > > >
> > > > > I use "-eps_type gd -eps_ncv 300 -eps_nev 3 -eps_smallest_real".
> > > > > And one difference I can tell is that matrix B has many small
> > > > > values, whose absolute value is less than 1e-6. Could this be the
> > > > > reason?
> > > > >
> > > > > Thank you!
> > > > >
> > > > > Runfeng Jin
> > > > <log_view.txt>
> > > <File2_lower-But-Smaller-Matrix.txt><File1_fatesr-But-Larger-MATRIX.txt>
> > <file2_nodebug_MatrixB.txt><file1_nodebug_MatrixA.txt>
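P.S. For completeness, this is roughly how I do B's preallocation with PetscSplitOwnership, as discussed above. A minimal sketch; the per-row nonzero estimates (30 diagonal-block, 20 off-diagonal-block) are made up for illustration, and the real values should come from the actual sparsity pattern to avoid mallocs during assembly. Note also that PetscSplitOwnership balances row counts, not nonzero counts, so a balanced row split can still leave the nonzeros, and hence the memory, unbalanced:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat      A;
  PetscInt N = 2547;              /* global size, as for matrix B */
  PetscInt n = PETSC_DECIDE;      /* let PETSc choose the local size */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Compute a balanced local row count from the global size. */
  PetscCall(PetscSplitOwnership(PETSC_COMM_WORLD, &n, &N));

  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, n, n, N, N));
  PetscCall(MatSetType(A, MATMPIAIJ));
  /* Illustrative per-row estimates (30/20 are placeholders). */
  PetscCall(MatMPIAIJSetPreallocation(A, 30, NULL, 20, NULL));

  /* ... MatSetValues() / MatAssemblyBegin()/MatAssemblyEnd() as usual ... */

  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}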

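Likewise, here is the EPS setup I am converging on after Jose's advice: ask for the 3 smallest eigenvalues with GD and let SLEPc pick ncv (PETSC_DEFAULT) instead of forcing -eps_ncv 5000. A minimal sketch assuming A is an already-assembled Mat (the function name is mine):

#include <slepceps.h>

/* Sketch only: GD solver for the 3 smallest real eigenvalues,
   with ncv and mpd left to SLEPc's defaults. */
static PetscErrorCode solve_smallest(Mat A)
{
  EPS eps;

  PetscFunctionBeginUser;
  PetscCall(EPSCreate(PetscObjectComm((PetscObject)A), &eps));
  PetscCall(EPSSetOperators(eps, A, NULL));      /* standard problem */
  PetscCall(EPSSetType(eps, EPSGD));
  PetscCall(EPSSetWhichEigenpairs(eps, EPS_SMALLEST_REAL));
  /* nev = 3; ncv and mpd chosen by SLEPc, not -eps_ncv 5000 */
  PetscCall(EPSSetDimensions(eps, 3, PETSC_DEFAULT, PETSC_DEFAULT));
  PetscCall(EPSSetFromOptions(eps));             /* allow -eps_* overrides */
  PetscCall(EPSSolve(eps));
  PetscCall(EPSDestroy(&eps));
  PetscFunctionReturn(0);
}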