Hi, Philip,
That branch was merged into petsc/main today. Let me know once you have new profiling results.
Thanks.
--Junchao Zhang

On Mon, Oct 16, 2023 at 9:33 AM Fackler, Philip <fackle...@ornl.gov> wrote:

> Junchao,
>
> I've attached updated timing plots (red and blue are swapped from before;
> yellow is the new one). There is an improvement for the NE_3 case only with
> CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI
> cases, MatShift doesn't show up (I assume because we're using different
> preconditioner arguments). So, there must be some other primary culprit.
> I'll try to get updated profiling data to you soon.
>
> Thanks,
>
> *Philip Fackler*
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Fackler, Philip via Xolotl-psi-development <xolotl-psi-developm...@lists.sourceforge.net>
> *Sent:* Wednesday, October 11, 2023 11:31
> *To:* Junchao Zhang <junchao.zh...@gmail.com>
> *Cc:* petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; xolotl-psi-developm...@lists.sourceforge.net <xolotl-psi-developm...@lists.sourceforge.net>
> *Subject:* Re: [Xolotl-psi-development] [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface
>
> I'm on it.
>
> *Philip Fackler*
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zh...@gmail.com>
> *Sent:* Wednesday, October 11, 2023 10:14
> *To:* Fackler, Philip <fackle...@ornl.gov>
> *Cc:* petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; xolotl-psi-developm...@lists.sourceforge.net <xolotl-psi-developm...@lists.sourceforge.net>; Blondel, Sophie <sblon...@utk.edu>
> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface
>
> Hi, Philip,
> Could you try this branch jczhang/2023-10-05/feature-support-matshift-aijkokkos ?
>
> Thanks.
> --Junchao Zhang
>
> On Thu, Oct 5, 2023 at 4:52 PM Fackler, Philip <fackle...@ornl.gov> wrote:
>
> Aha! That makes sense. Thank you.
>
> *Philip Fackler*
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zh...@gmail.com>
> *Sent:* Thursday, October 5, 2023 17:29
> *To:* Fackler, Philip <fackle...@ornl.gov>
> *Cc:* petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; xolotl-psi-developm...@lists.sourceforge.net <xolotl-psi-developm...@lists.sourceforge.net>; Blondel, Sophie <sblon...@utk.edu>
> *Subject:* [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching to COO interface
>
> Wait a moment, it seems it was because we do not have a GPU implementation
> of MatShift... Let me see how to add it.
> --Junchao Zhang
>
> On Thu, Oct 5, 2023 at 10:58 AM Junchao Zhang <junchao.zh...@gmail.com> wrote:
>
> Hi, Philip,
> I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues()
> instead of the COO interface? MatSetValues() needs to copy the data from
> device to host and thus is expensive.
> Do you have profiling results with COO enabled?
>
> [image: Screenshot 2023-10-05 at 10.55.29 AM.png]
>
> --Junchao Zhang
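[For reference: the "COO interface" discussed above is PETSc's coordinate-format assembly path, MatSetPreallocationCOO() plus MatSetValuesCOO(). A minimal sketch is below; the matrix size, indices, and values are made up for illustration and are not taken from Xolotl.]

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      /* Three illustrative nonzeros of a 4x4 matrix in coordinate (COO) form. */
      PetscInt    coo_i[] = {0, 1, 3};
      PetscInt    coo_j[] = {0, 2, 3};
      PetscScalar coo_v[] = {1.0, 2.0, 3.0};

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 4, 4));
      PetscCall(MatSetFromOptions(A)); /* e.g. run with -mat_type aijkokkos */

      /* Hand the sparsity pattern to PETSc once, up front... */
      PetscCall(MatSetPreallocationCOO(A, 3, coo_i, coo_j));
      /* ...then each assembly only passes new values; no MatAssemblyBegin/End
         is needed after MatSetValuesCOO(). */
      PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));

      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }

[With a device matrix type such as aijkokkos, MatSetValuesCOO() performs the value insertion on the GPU, which is the advantage over MatSetValues() that the message above describes.]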
> On Mon, Oct 2, 2023 at 9:52 AM Junchao Zhang <junchao.zh...@gmail.com> wrote:
>
> Hi, Philip,
> I will look into the tarballs and get back to you.
> Thanks.
> --Junchao Zhang
>
> On Mon, Oct 2, 2023 at 9:41 AM Fackler, Philip via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
> We finally have xolotl ported to use the new COO interface and the
> aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port
> to our previous version (using MatSetValuesStencil and the default Mat and
> Vec implementations), we expected to see an improvement in performance for
> both the "serial" and "cuda" builds (here I'm referring to the kokkos
> configuration).
>
> Attached are two plots that show timings for three different cases. All of
> these were run on Ascent (the Summit-like training system) with 6 MPI tasks
> (on a single node). The CUDA cases were given one GPU per task (and used
> CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases
> we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent
> as possible.
>
> The performance of RHSJacobian (where the bulk of computation happens in
> xolotl) behaved basically as expected (better than expected in the serial
> build). The NE_3 case in CUDA was the only one that performed worse, but
> not surprisingly, since its workload for the GPUs is much smaller. We've
> still got more optimization to do on this.
>
> The real surprise was how much worse the overall solve times were. This
> seems to be due simply to switching to the kokkos-based implementation. I'm
> wondering if there are any changes we can make in configuration or runtime
> arguments to help with PETSc's performance here. Any help looking into this
> would be appreciated.
>
> The tarballs linked here
> <https://drive.google.com/file/d/19X_L3SVkGBM9YUzXnRR_kVWFG0JFwqZ3/view?usp=drive_link>
> and here
> <https://drive.google.com/file/d/15yDBN7-YlO1g6RJNPYNImzr611i1Ffhv/view?usp=drive_link>
> are profiling databases which, once extracted, can be viewed with
> hpcviewer. I don't know how helpful that will be, but hopefully it can give
> you some direction.
>
> Thanks for your help,
>
> *Philip Fackler*
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
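[For reference: MatShift(), which the thread above identifies as lacking a GPU implementation at the time, adds a multiple of the identity to a matrix, Y = Y + a*I. A minimal sketch of the call is below; the matrix size, diagonal values, and shift value are illustrative only and are not taken from Xolotl.]

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat      A;
      PetscInt i, rstart, rend;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 4, 4));
      PetscCall(MatSetFromOptions(A)); /* e.g. run with -mat_type aijkokkos */
      PetscCall(MatSetUp(A));

      /* Put something on the diagonal so the shift updates existing entries. */
      PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
      for (i = rstart; i < rend; i++) PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
      PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
      PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

      /* A = A + 1.0*I; without a device implementation this falls back to a
         host path, which is the extra cost discussed earlier in the thread. */
      PetscCall(MatShift(A, 1.0));

      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }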