Also, you mentioned that you are using 10 levels. That is very strange with GAMG. You can run with -info and grep for GAMG to see the size and the number of nonzeros on each level. GAMG should coarsen at a rate of about 2^D to 3^D, so 10 levels would imply a very large fine-grid problem; I suspect something strange is going on with the coarsening.

Mark
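For example, something along these lines (a rough sketch; ./ex56 and the process count are just placeholders for your own executable and run configuration):

    mpiexec -n 8 ./ex56 -pc_type gamg -info 2>&1 | grep GAMG
    mpiexec -n 8 ./ex56 -pc_type gamg -ksp_view

The -info output from GAMG's setup should report, for each level, the global size and the average nonzeros per row, so you can check whether the coarsening rate is in the 2^D to 3^D range; -ksp_view also prints the number of levels and the operator size on each level.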
On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <[email protected]> wrote:

> Fande:
> I will explore it and get back to you.
> Does anyone know how to profile memory usage?
> Hong
>
> Thanks, Hong,
>>
>> I just briefly went through the code. I was wondering if it is possible to destroy "c->ptap" (which caches a lot of intermediate data) to release the memory after the coarse matrix is assembled. I understand you may still want to reuse these data structures by default, but for my simulation the preconditioner is fixed and there is no reason to keep the "c->ptap".
>>
>> It would be great if we could have this optional functionality.
>>
>> Fande Kong,
>>
>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong <[email protected]> wrote:
>>
>>> We use the nonscalable implementation as the default, and switch to scalable for matrices over finer grids. You may use the option '-matptap_via scalable' to force the scalable PtAP implementation for all PtAP. Let me know if it works.
>>> Hong
>>>
>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. <[email protected]> wrote:
>>>
>>>> See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically for "large" problems, which is determined by a heuristic.
>>>>
>>>> Barry
>>>>
>>>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <[email protected]> wrote:
>>>> >
>>>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong <[email protected]> wrote:
>>>> > Fande:
>>>> > Hong,
>>>> > Thanks for your improvements on PtAP; it is critical for MG-type algorithms.
>>>> >
>>>> > On Wed, May 3, 2017 at 10:17 AM Hong <[email protected]> wrote:
>>>> > Mark,
>>>> > Below is a copy of my email sent to you on Feb 27:
>>>> >
>>>> > I implemented a scalable MatPtAP and compared three implementations using ex56.c on the ALCF Cetus machine (this machine has small memory, 1 GB/core):
>>>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>>>> > - scalable PtAP: do sparse axpy without using the PN array
>>>> >
>>>> > What does PN mean here?
>>>> > The global number of columns of P.
>>>> >
>>>> > - hypre PtAP.
>>>> >
>>>> > The results are attached. Summary:
>>>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
>>>> > - scalable PtAP is 4x faster than hypre PtAP
>>>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>>> >
>>>> > I was wondering how much more memory PETSc PtAP uses than hypre. I am implementing an AMG algorithm based on PETSc right now, and it is working well, but we found a bottleneck with PtAP. For the same P and A, PETSc PtAP fails to generate the coarse matrix because it runs out of memory, while hypre can still generate it.
>>>> >
>>>> > I do not want to just use the HYPRE one, because we would have to duplicate matrices if I used HYPRE PtAP.
>>>> >
>>>> > It would be nice if you have already done some comparisons of the memory usage of these implementations.
>>>> > Do you encounter memory issues with scalable PtAP?
>>>> >
>>>> > By default do we use the scalable PtAP? Do we have to specify some options to use the scalable version of PtAP? If so, it would be nice to use the scalable version by default. I am totally missing something here.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Fande
>>>> >
>>>> > Karl had a student in the summer who improved MatPtAP(). Do you use the latest version of petsc?
>>>> > HYPRE may use less memory than PETSc because it does not save and reuse the matrices.
>>>> >
>>>> > I do not understand why generating the coarse matrix fails due to running out of memory. Do you use a direct solver on the coarse grid?
>>>> > Hong
>>>> >
>>>> > Based on the above observation, I set the default PtAP algorithm to 'nonscalable'.
>>>> > When PN > the locally estimated number of nonzeros of C = PtAP, the default switches to 'scalable'.
>>>> > The user can override the default.
>>>> >
>>>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>>>> >   MatPtAP            3.6224e+01  (nonscalable for small mats, scalable for larger ones)
>>>> >   scalable MatPtAP   4.6129e+01
>>>> >   hypre              1.9389e+02
>>>> >
>>>> > This work is on petsc-master. Give it a try. If you encounter any problem, let me know.
>>>> >
>>>> > Hong
>>>> >
>>>> > On Wed, May 3, 2017 at 10:01 AM, Mark Adams <[email protected]> wrote:
>>>> > (Hong), what is the current state of optimizing RAP for scaling?
>>>> >
>>>> > Nate is driving 3D elasticity problems at scale with GAMG and we are working out performance problems. They are hitting problems at ~1.5B dof on a basic Cray (an XC30, I think).
>>>> >
>>>> > Thanks,
>>>> > Mark
>>>> >
>>>>
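As a concrete example of the option Hong mentions above (a sketch only; ./app and the process count stand in for your own executable and run configuration):

    mpiexec -n 1000 ./app -pc_type gamg -matptap_via scalable -log_view

This forces the memory-lean scalable PtAP on every level instead of letting MatPtAP_MPIAIJ_MPIAIJ() pick the nonscalable variant through its heuristic; the -log_view summary (event times and per-object memory) can then be compared between runs with and without the option.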
