On Sun, Jan 17, 2016 at 10:13 AM, Griffith, Boyce Eugene <[email protected]> wrote:
> Barry --
>
> Another random thought --- are these smallish direct solves things that make sense to (try to) offload to a GPU?
>

Possibly, but the only clear-cut wins are for BLAS3, so we would need to stack up the identical solves. (Rough sketches of this and of the custom PCSetUp_ASM() approach Barry describes below are appended at the end of this message.)

   Matt

> Thanks,
>
> -- Boyce
>
> > On Jan 16, 2016, at 10:46 PM, Barry Smith <[email protected]> wrote:
> >
> >
> >   Boyce,
> >
> >    Of course anything is possible in software. But I expect an optimization to not rebuild common submatrices/factorizations requires a custom PCSetUp_ASM() rather than some PETSc option that we could add (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE).
> >
> >    I would start by copying PCSetUp_ASM(), stripping out all the setup stuff that doesn't relate to your code, and then mark identical domains so you don't need to call MatGetSubMatrices() on those domains and don't create a new KSP for each one of those subdomains (but reuse a common one). The PCApply_ASM() should hopefully be reusable so long as you have created the full array of KSP objects (some of which will be common). If you increase the reference counts of the common KSP in PCSetUp_ASM() (and maybe the common submatrices) then PCDestroy_ASM() should also work unchanged.
> >
> >   Good luck,
> >
> >   Barry
> >
> >> On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene <[email protected]> wrote:
> >>
> >>
> >>> On Jan 16, 2016, at 7:00 PM, Barry Smith <[email protected]> wrote:
> >>>
> >>>
> >>>   Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves, and the matrix-vector products, as it should be. Not much can be done to speed these up except running on machines with high memory bandwidth.
> >>
> >> Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant-coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks?
> >>
> >> Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance.
> >>
> >> -- Boyce
> >>
> >>>   If you are using the master branch of PETSc, two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers' time and flops, etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example, open -a Safari filename.xml) or email the file.
> >>>
> >>>   Barry
> >>>
> >>>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S <[email protected]> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Jan 16, 2016, at 1:13 PM, Barry Smith <[email protected]> wrote:
> >>>>>
> >>>>>   Either way is fine so long as I don't have to install a ton of stuff, which it sounds like I won’t.
> >>>>
> >>>> http://hpctoolkit.org/download/hpcviewer/
> >>>>
> >>>> Unzip HPCViewer for Mac OS X with command line and drag the unzipped folder to Applications. You will be able to fire up HPCViewer from Launchpad. Point it to this attached directory. You will be able to see three different kinds of profiling under Calling Context View, Callers View, and Flat View.
> >>>>
> >>>>
> >>>>
> >>>> <hpctoolkit-main2d-database.zip>
> >>>>
> >>>
> >>
> >
> >

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
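P.S. To make the "stack up the identical solves" remark concrete: once the identical little blocks have been grouped, one factorization can be applied to all of their right-hand sides at once, so the triangular solves become a single multi-RHS (BLAS3-style) operation instead of many small BLAS2 solves. Below is a rough, untested CPU sketch using LAPACKE; the block size n, the count nrhs, and the matrix/right-hand-side values are made up for illustration. A GPU version would follow the same pattern with batched or multi-RHS routines.

#include <stdio.h>
#include <stdlib.h>
#include <lapacke.h>

/* Toy illustration of "stacking" identical subdomain solves: factor the
   common block matrix once, then apply the factorization to all
   right-hand sides in one multi-RHS call instead of looping over many
   single-RHS (BLAS2) solves.  Sizes and values are made up. */
int main(void)
{
  const lapack_int n    = 64;   /* size of one subdomain block    */
  const lapack_int nrhs = 200;  /* number of identical subdomains */

  double     *A    = malloc((size_t)n * n * sizeof(double));
  double     *B    = malloc((size_t)n * nrhs * sizeof(double));
  lapack_int *ipiv = malloc((size_t)n * sizeof(lapack_int));
  lapack_int  i, j, info;

  /* A: diagonally dominant stand-in for the common subdomain matrix;
     B: one column per (identical) subdomain right-hand side. */
  for (j = 0; j < n; j++)
    for (i = 0; i < n; i++)
      A[i + j * n] = (i == j) ? n : 1.0 / (1.0 + i + j);
  for (j = 0; j < nrhs; j++)
    for (i = 0; i < n; i++)
      B[i + j * n] = 1.0 + j;

  /* One LU factorization for the whole family of identical blocks. */
  info = LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, A, n, ipiv);
  if (info) { fprintf(stderr, "dgetrf failed: %d\n", (int)info); return 1; }

  /* One multi-RHS solve: the triangular solves are applied to a fat
     right-hand-side block, which is BLAS3-like work. */
  info = LAPACKE_dgetrs(LAPACK_COL_MAJOR, 'N', n, nrhs, A, n, ipiv, B, n);
  if (info) { fprintf(stderr, "dgetrs failed: %d\n", (int)info); return 1; }

  printf("solved %d identical %dx%d blocks in one shot\n",
         (int)nrhs, (int)n, (int)n);
  free(A); free(B); free(ipiv);
  return 0;
}

The point is just that the per-block cost is amortized: one factorization, then triangular solves against a block of right-hand sides, which is the regime where a GPU (or a CPU) can actually run near peak.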

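P.P.S. And a rough sketch of the KSP-sharing loop Barry describes for a custom PCSetUp_ASM(). This is not the actual PETSc source; BuildSharedSubKSPs() and the tag[] array are invented for illustration (tag[i] == tag[j] meaning the application knows blocks i and j are identical, e.g. both constant-coefficient Stokes). In a real modified PCSetUp_ASM() you would also skip the MatGetSubMatrices() extraction for any domain whose tag has already been seen; here the submatrices are taken as given.

#include <petscksp.h>

/* Illustrative helper (not PETSc source): build the subdomain KSP array so
   that blocks marked identical share one KSP, and hence one LU
   factorization.  tag[] is hypothetical application-provided bookkeeping:
   tag[i] == tag[j] means subdomains i and j have the same submatrix. */
static PetscErrorCode BuildSharedSubKSPs(PetscInt nblocks, const PetscInt tag[],
                                         Mat submat[], KSP ksp[])
{
  PetscInt       i, j;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  for (i = 0; i < nblocks; i++) {
    KSP shared = NULL;

    /* Was an identical block already set up?  If so, reuse its KSP. */
    for (j = 0; j < i; j++) {
      if (tag[j] == tag[i]) { shared = ksp[j]; break; }
    }

    if (shared) {
      /* Bump the reference count so that the usual per-block KSPDestroy()
         calls in PCDestroy_ASM() remain correct. */
      ierr   = PetscObjectReference((PetscObject)shared);CHKERRQ(ierr);
      ksp[i] = shared;
    } else {
      /* First block with this tag: create its own KSP with a direct
         subdomain solve, as PCASM would. */
      PC subpc;
      ierr = KSPCreate(PETSC_COMM_SELF, &ksp[i]);CHKERRQ(ierr);
      ierr = KSPSetType(ksp[i], KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp[i], &subpc);CHKERRQ(ierr);
      ierr = PCSetType(subpc, PCLU);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp[i], submat[i], submat[i]);CHKERRQ(ierr);
    }
  }
  PetscFunctionReturn(0);
}

Because each repeated entry of ksp[] is just an extra reference to the shared KSP, the existing PCApply_ASM() loop over the KSP array and the KSPDestroy() calls in PCDestroy_ASM() should keep working unchanged, as Barry notes.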