We use the nonscalable implementation by default, and switch to the scalable one for matrices over finer grids. You may use the option '-matptap_via scalable' to force the scalable PtAP implementation for all PtAP calls. Let me know if it works.

Hong
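A minimal sketch of setting this programmatically rather than on the command line, assuming A and P are already assembled MPIAIJ matrices; the function name CoarsenOperator and the fill estimate of 2.0 are illustrative only, and these are the 2018-era PETSc calls, so option and routine names may differ in later releases:

```c
#include <petscmat.h>

/* Compute C = P^T * A * P, forcing the scalable PtAP algorithm. */
PetscErrorCode CoarsenOperator(Mat A, Mat P, Mat *C)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* Same effect as passing -matptap_via scalable on the command line;
     must be set before MatPtAP() is called. */
  ierr = PetscOptionsSetValue(NULL, "-matptap_via", "scalable");CHKERRQ(ierr);
  /* The fill estimate (2.0 here) is only a guess for the symbolic phase. */
  ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, C);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
```

Equivalently, leave the code unchanged and run the application with -matptap_via scalable.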
On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:

  See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically for "large" problems, which is determined by some heuristic.

  Barry

> On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
> On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong <hzh...@mcs.anl.gov> wrote:
> Fande:
> Hong,
> Thanks for your improvements to PtAP, which is critical for MG-type algorithms.
>
> On Wed, May 3, 2017 at 10:17 AM Hong <hzh...@mcs.anl.gov> wrote:
> Mark,
> Below is a copy of my email sent to you on Feb 27:
>
> I implemented scalable MatPtAP and did comparisons of three implementations using ex56.c on the ALCF Cetus machine (this machine has small memory, 1 GB/core):
> - nonscalable PtAP: use an array of length PN to do dense axpy
> - scalable PtAP: do sparse axpy without use of the PN array
>
> What does PN mean here?
> Global number of columns of P.
>
> - hypre PtAP.
>
> The results are attached. Summary:
> - nonscalable PtAP is 2x faster than scalable PtAP, 8x faster than hypre PtAP
> - scalable PtAP is 4x faster than hypre PtAP
> - hypre uses less memory (see job.ne399.n63.np1000.sh)
>
> I was wondering how much more memory PETSc PtAP uses than hypre? I am implementing an AMG algorithm based on PETSc right now, and it is working well. But we hit a bottleneck with PtAP. For the same P and A, PETSc PtAP fails to generate a coarse matrix due to running out of memory, while hypre can still generate the coarse matrix.
>
> I do not want to just use the HYPRE one because we would have to duplicate matrices if we used HYPRE PtAP.
>
> It would be nice if you have already done some comparisons of the memory usage of these implementations.
> Do you encounter a memory issue with scalable PtAP?
>
> Do we use the scalable PtAP by default? Do we have to specify some options to use the scalable version of PtAP? If so, it would be nice to use the scalable version by default. I am totally missing something here.
>
> Thanks,
>
> Fande
>
> Karl had a student in the summer who improved MatPtAP(). Do you use the latest version of petsc?
> HYPRE may use less memory than PETSc because it does not save and reuse the matrices.
>
> I do not understand why generating the coarse matrix fails due to running out of memory. Do you use a direct solver on the coarse grid?
> Hong
>
> Based on the above observations, I set the default PtAP algorithm as 'nonscalable'.
> When PN > the locally estimated number of nonzeros of C = PtAP, the default switches to 'scalable'.
> The user can override the default.
>
> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>   MatPtAP           3.6224e+01  (nonscalable for small mats, scalable for larger ones)
>   scalable MatPtAP  4.6129e+01
>   hypre             1.9389e+02
>
> This work is in petsc-master. Give it a try. If you encounter any problem, let me know.
>
> Hong
>
> On Wed, May 3, 2017 at 10:01 AM, Mark Adams <mfad...@lbl.gov> wrote:
> (Hong), what is the current state of optimizing RAP for scaling?
>
> Nate is driving 3D elasticity problems at scale with GAMG and we are working out performance problems. They are hitting problems at ~1.5B dof on a basic Cray (XC30, I think).
>
> Thanks,
> Mark
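The dense vs. sparse axpy distinction in Hong's Feb 27 summary above (a work array of length PN, the global number of columns of P, versus a merge over each row's nonzeros) is the heart of the nonscalable/scalable trade-off. The toy kernels below are only a sketch of that idea, not the actual PETSc source; they assume column indices are nonnegative and sorted in ascending order within each row:

```c
#include <stdlib.h>

/* One sparse row in compressed form. */
typedef struct { int nz; const int *cols; const double *vals; } SparseRow;

/* "Nonscalable" style: scatter sum_k coef[k]*rows[k] into a dense buffer of
 * length PN, then compress.  Fast per row, but the buffer (and the compress
 * scan) grows with the global number of columns of P. */
static int axpy_dense(int nrows, const double *coef, const SparseRow *rows,
                      int PN, double *dense /* length PN, pre-zeroed */,
                      int *out_cols, double *out_vals)
{
  int k, j, nz = 0;
  for (k = 0; k < nrows; k++)
    for (j = 0; j < rows[k].nz; j++)
      dense[rows[k].cols[j]] += coef[k] * rows[k].vals[j];
  for (j = 0; j < PN; j++) {              /* compress back to sparse form */
    if (dense[j] != 0.0) { out_cols[nz] = j; out_vals[nz] = dense[j]; nz++; }
    dense[j] = 0.0;                       /* reset for the next output row */
  }
  return nz;                              /* number of nonzeros written */
}

/* "Scalable" style: n-way merge of the rows by sorted column index; memory
 * stays proportional to the row nonzeros, with no array of length PN. */
static int axpy_sparse(int nrows, const double *coef, const SparseRow *rows,
                       int *out_cols, double *out_vals)
{
  int *pos = (int *)calloc((size_t)nrows, sizeof(int));  /* per-row cursors */
  int nz = 0;
  for (;;) {
    int k, mincol = -1;
    double sum = 0.0;
    for (k = 0; k < nrows; k++)           /* smallest pending column index */
      if (pos[k] < rows[k].nz &&
          (mincol < 0 || rows[k].cols[pos[k]] < mincol))
        mincol = rows[k].cols[pos[k]];
    if (mincol < 0) break;                /* all rows exhausted */
    for (k = 0; k < nrows; k++)           /* accumulate into that column */
      if (pos[k] < rows[k].nz && rows[k].cols[pos[k]] == mincol)
        sum += coef[k] * rows[k].vals[pos[k]++];
    out_cols[nz] = mincol; out_vals[nz++] = sum;
  }
  free(pos);
  return nz;
}
```

The memory behavior reported in the thread follows this shape: the dense variant pays O(PN) work storage per process regardless of how sparse the coarse operator is, which is why the default switches to the merge-based variant once PN exceeds the estimated local nonzeros of C = PtAP.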