We use the nonscalable implementation by default, and switch to the scalable
one for matrices over finer grids. You may use the option
'-matptap_via scalable' to force the scalable PtAP implementation for all
PtAP calls. Let me know if it works.
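For example, an invocation like the following (the program name and the other
options are illustrative, drawn from the rest of this thread; only
-matptap_via itself is the option in question) forces it for every PtAP in
the run:

    mpiexec -n 1000 ./ex56 -pc_type gamg -matptap_via scalable -log_view

The same choice can also be made from code by calling
PetscOptionsSetValue(NULL, "-matptap_via", "scalable") before the operator
hierarchy is set up.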
Hong

On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
<bsm...@mcs.anl.gov> wrote:

  See MatPtAP_MPIAIJ_MPIAIJ(). It switches to the scalable algorithm
automatically for "large" problems, as determined by a heuristic.

   Barry


> On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users 
> <petsc-users@mcs.anl.gov> wrote:
>
>
>
> On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
> <hzh...@mcs.anl.gov> wrote:
> Fande:
> Hong,
> Thanks for your improvements to PtAP, which is critical for MG-type algorithms.
>
> On Wed, May 3, 2017 at 10:17 AM Hong 
> <hzh...@mcs.anl.gov> wrote:
> Mark,
> Below is the copy of my email sent to you on Feb 27:
>
> I implemented a scalable MatPtAP and compared three implementations using 
> ex56.c on the ALCF Cetus machine (this machine has small memory, 1 GB/core):
> - nonscalable PtAP: use an array of length PN to do dense axpy
> - scalable PtAP: do sparse axpy without using a PN-length array (a toy sketch 
>   follows this list)
>
> What does PN mean here?
> Global number of columns of P.
>
> - hypre PtAP.
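>
> As a toy illustration (plain C, not PETSc source) of the two accumulation
> styles in the list above, with PN as defined just above:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> #define PN 1000000 /* stands in for the global number of columns of P */
>
> /* nonscalable style: dense axpy into an O(PN) work array, O(1) per entry */
> static void axpy_dense(double *work, double alpha, int nz,
>                        const int *cols, const double *vals)
> {
>   for (int k = 0; k < nz; k++) work[cols[k]] += alpha * vals[k];
> }
>
> /* scalable style: sparse axpy into a small sorted (col,val) list, so the
>    memory is proportional to the row's nonzeros, not to PN */
> static int axpy_sparse(int cnz, int *ccols, double *cvals, double alpha,
>                        int nz, const int *cols, const double *vals)
> {
>   for (int k = 0; k < nz; k++) {
>     int j = 0;
>     while (j < cnz && ccols[j] < cols[k]) j++;
>     if (j < cnz && ccols[j] == cols[k]) { cvals[j] += alpha * vals[k]; continue; }
>     memmove(&ccols[j+1], &ccols[j], (size_t)(cnz-j)*sizeof(int));
>     memmove(&cvals[j+1], &cvals[j], (size_t)(cnz-j)*sizeof(double));
>     ccols[j] = cols[k]; cvals[j] = alpha * vals[k]; cnz++;
>   }
>   return cnz;
> }
>
> int main(void)
> {
>   const int    cols[] = {3, 999999};
>   const double vals[] = {1.0, 2.0};
>   double *work = calloc(PN, sizeof(double)); /* the memory the scalable path avoids */
>   int ccols[16]; double cvals[16]; int cnz = 0;
>
>   axpy_dense(work, 0.5, 2, cols, vals);
>   cnz = axpy_sparse(cnz, ccols, cvals, 0.5, 2, cols, vals);
>   printf("dense work[999999] = %g, sparse row has %d entries\n", work[999999], cnz);
>   free(work);
>   return 0;
> }
>
> The dense variant is fast but needs a length-PN work array on every process,
> which is what limits its memory scaling; the sparse variant only touches the
> nonzeros of the current row.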
>
> The results are attached. Summary:
> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
> - scalable PtAP is 4x faster than hypre PtAP
> - hypre uses less memory (see 
> job.ne399.n63.np1000.sh)
>
> I was wondering how much more memory PETSc PtAP uses than hypre. I am 
> implementing an AMG algorithm based on PETSc right now, and it is working 
> well. But we have found a bottleneck with PtAP: for the same P and A, PETSc 
> PtAP fails to generate the coarse matrix because it runs out of memory, while 
> hypre can still generate it.
>
> I do not want to just use the HYPRE one, because we would have to duplicate 
> matrices if we used HYPRE PtAP.
>
> It would be nice if you have already done some comparisons of the memory 
> usage of these implementations.
> Do you encounter a memory issue with the scalable PtAP?
>
> Do we use the scalable PtAP by default? Do we have to specify some options 
> to use the scalable version of PtAP? If so, it would be nice to use the 
> scalable version by default. I am totally missing something here.
>
> Thanks,
>
> Fande
>
>
> Karl had a student in the summer who improved MatPtAP(). Do you use the 
> latest version of petsc?
> HYPRE may use less memory than PETSc because it does not save and reuse the 
> matrices.
>
> I do not understand why generating the coarse matrix fails due to running 
> out of memory. Do you use a direct solver at the coarse grid?
> Hong
>
> Based on the above observations, I set the default PtAP algorithm to 
> 'nonscalable'. When PN > the local estimated number of nonzeros of C = PtAP, 
> the default switches to 'scalable'.
> Users can override the default.
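>
> As a rough sketch of that selection rule (the names below are made up for
> illustration; the real logic lives inside MatPtAP_MPIAIJ_MPIAIJ()):
>
> #include <stdio.h>
>
> typedef enum { PTAP_NONSCALABLE, PTAP_SCALABLE } PtAPAlg;
>
> /* default choice: scalable only when the dense PN-length work array would
>    exceed the estimated local size of C = PtAP; a user option wins outright */
> static PtAPAlg choose_ptap_alg(long PN, long est_local_nnz_C,
>                                int user_forced, PtAPAlg user_choice)
> {
>   if (user_forced) return user_choice;          /* e.g. -matptap_via scalable */
>   return (PN > est_local_nnz_C) ? PTAP_SCALABLE : PTAP_NONSCALABLE;
> }
>
> int main(void)
> {
>   /* hypothetical sizes, just to exercise the rule */
>   printf("%d\n", choose_ptap_alg(1000000L, 50000L, 0, PTAP_NONSCALABLE)); /* 1 = scalable */
>   printf("%d\n", choose_ptap_alg(10000L, 50000L, 0, PTAP_NONSCALABLE));   /* 0 = nonscalable */
>   return 0;
> }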
>
> For the case of np=8000, ne=599 (see 
> job.ne599.n500.np8000.sh), I get
> - MatPtAP: 3.6224e+01 (nonscalable for small mats, scalable for larger ones)
> - scalable MatPtAP: 4.6129e+01
> - hypre: 1.9389e+02
>
> This work is in petsc-master. Give it a try. If you encounter any problems, 
> let me know.
>
> Hong
>
> On Wed, May 3, 2017 at 10:01 AM, Mark Adams 
> <mfad...@lbl.gov> wrote:
> (Hong), what is the current state of optimizing RAP for scaling?
>
> Nate is driving 3D elasticity problems at scale with GAMG, and we are 
> working out performance problems. They are hitting problems at ~1.5B dofs 
> on a basic Cray (XC30, I think).
>
> Thanks,
> Mark
>
