Thanks, Hong. I just briefly went through the code. I was wondering if it is possible to destroy "c->ptap" (which caches a lot of intermediate data) to release the memory after the coarse matrix is assembled. I understand you may still want to reuse these data structures by default, but for my simulation the preconditioner is fixed, so there is no reason to keep "c->ptap".
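For concreteness, the kind of functionality I have in mind might look roughly like the sketch below. MatFreeIntermediateDataStructures() and AssembleCoarseAndTrim() are made-up names used only to illustrate the request; no such API exists as of this writing.

    #include <petscmat.h>

    /* Hypothetical usage: after the last PtAP in a fixed-preconditioner
       setup, discard the cached intermediate data ("c->ptap") to reclaim
       memory. The cached products would have to be rebuilt if MatPtAP
       were ever called on C again. */
    static PetscErrorCode AssembleCoarseAndTrim(Mat A, Mat P, Mat *C)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, C);CHKERRQ(ierr);
      ierr = MatFreeIntermediateDataStructures(*C);CHKERRQ(ierr); /* hypothetical call */
      PetscFunctionReturn(0);
    }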
It would be great if we could have this optional functionality.

Fande Kong

On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong <[email protected]> wrote:
> We use the nonscalable implementation as the default, and switch to
> scalable for matrices over finer grids. You may use the option
> '-matptap_via scalable' to force the scalable PtAP implementation for
> all PtAP. Let me know if it works.
> Hong
>
> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. <[email protected]> wrote:
>>
>> See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically for
>> "large" problems, which is determined by some heuristic.
>>
>> Barry
>>
>>
>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <[email protected]> wrote:
>> >
>> >
>> >
>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong <[email protected]> wrote:
>> > Fande:
>> > Hong,
>> > Thanks for your improvements on PtAP, which are critical for MG-type
>> > algorithms.
>> >
>> > On Wed, May 3, 2017 at 10:17 AM Hong <[email protected]> wrote:
>> > Mark,
>> > Below is a copy of my email sent to you on Feb 27:
>> >
>> > I implemented scalable MatPtAP and did comparisons of three
>> > implementations using ex56.c on the ALCF Cetus machine (this machine
>> > has small memory, 1GB/core):
>> > - nonscalable PtAP: uses an array of length PN to do dense axpy
>> > - scalable PtAP: does sparse axpy without use of the PN array
>> >
>> > What does PN mean here?
>> > Global number of columns of P.
>> >
>> > - hypre PtAP.
>> >
>> > The results are attached. Summary:
>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
>> > - scalable PtAP is 4x faster than hypre PtAP
>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>> >
>> > I was wondering how much more memory PETSc PtAP uses than hypre? I am
>> > implementing an AMG algorithm based on PETSc right now, and it is
>> > working well. But we found a bottleneck with PtAP. For the same P and
>> > A, PETSc PtAP fails to generate a coarse matrix because it runs out of
>> > memory, while hypre can still generate the coarse matrix.
>> >
>> > I do not want to just use the HYPRE one because we would have to
>> > duplicate matrices if I used HYPRE PtAP.
>> >
>> > It would be nice if you guys have already done some comparisons of the
>> > memory usage of these implementations.
>> > Do you encounter memory issues with scalable PtAP?
>> >
>> > By default do we use the scalable PtAP? Do we have to specify some
>> > options to use the scalable version of PtAP? If so, it would be nice
>> > to use the scalable version by default. I am totally missing something
>> > here.
>> >
>> > Thanks,
>> >
>> > Fande
>> >
>> >
>> > Karl had a student in the summer who improved MatPtAP(). Do you use
>> > the latest version of PETSc?
>> > HYPRE may use less memory than PETSc because it does not save and
>> > reuse the matrices.
>> >
>> > I do not understand why generating the coarse matrix fails due to
>> > running out of memory. Do you use a direct solver at the coarse grid?
>> > Hong
>> >
>> > Based on the above observations, I set the default PtAP algorithm to
>> > 'nonscalable'.
>> > When PN > local estimated nonzeros of C=PtAP, the default switches to
>> > 'scalable'.
>> > The user can override the default.
>> >
>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>> > MatPtAP          3.6224e+01  (nonscalable for small mats, scalable for larger ones)
>> > scalable MatPtAP 4.6129e+01
>> > hypre            1.9389e+02
>> >
>> > This work is on petsc-master. Give it a try. If you encounter any
>> > problem, let me know.
>> >
>> > Hong
>> >
>> > On Wed, May 3, 2017 at 10:01 AM, Mark Adams <[email protected]> wrote:
>> > (Hong), what is the current state of optimizing RAP for scaling?
>> >
>> > Nate is driving 3D elasticity problems at scale with GAMG, and we are
>> > working out performance problems. They are hitting problems at ~1.5B
>> > dof on a basic Cray (an XC30, I think).
>> >
>> > Thanks,
>> > Mark
>> >
>>
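For anyone picking this thread up later: the '-matptap_via scalable' option named above can be passed on the command line (e.g. appended to an mpiexec invocation of ex56) or set programmatically before the preconditioner is built. A minimal sketch of the programmatic route; whether this is the right place to set it for your solver stack is an assumption:

    #include <petscsys.h>

    /* Force the scalable PtAP kernel for all PtAP products, equivalent
       to passing -matptap_via scalable on the command line. Must run
       before the solver/preconditioner setup that triggers MatPtAP. */
    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      ierr = PetscOptionsSetValue(NULL, "-matptap_via", "scalable");CHKERRQ(ierr);
      /* ... set up matrices and call MatPtAP or PCGAMG here ... */
      ierr = PetscFinalize();
      return ierr;
    }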

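The switching heuristic Barry and Hong describe boils down to a single comparison. A minimal sketch with made-up names; the actual test lives inside MatPtAP_MPIAIJ_MPIAIJ() and its details may differ:

    #include <petscsys.h>

    /* The nonscalable kernel does dense axpy into a work array of length
       PN (the global number of columns of P), so its memory cost grows
       with the global problem size. Switch to the sparse-axpy (scalable)
       kernel once that array would exceed the estimated local nonzeros
       of C = Pt*A*P. */
    static const char *PtAPChooseAlg(PetscInt PN, PetscInt nz_C_local_est)
    {
      return (PN > nz_C_local_est) ? "scalable" : "nonscalable";
    }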