I guess I am really doing two things here:

1) The solver I intend to use is SuperLU. I believe Barry got LU working in OMP threads a few years ago; my problems now are in Krylov. I could live with what I have now and just get Sherry to make SuperLU_dist not use MPI in serial. SuperLU does hang now.

2) While I am doing this, grab the low-hanging fruit and expand this model to work with Krylov. GMRES has more problems, but it looks like Richardson/cuSparse-ILU only has one problem: the convergence testing is hosed.

I am open to other models. I have a specific problem and would like, as much as possible, to contribute to PETSc along the way.
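For reference, the per-split solver choices above would be selected with options along these lines (assuming the default fieldsplit_0_, fieldsplit_1_, ... prefixes; this is just a sketch, not something I have tested in this configuration). For the direct solve with SuperLU on a split:

  -fieldsplit_0_ksp_type preonly -fieldsplit_0_pc_type lu -fieldsplit_0_pc_factor_mat_solver_type superlu

and for the Richardson/cuSparse-ILU variant:

  -fieldsplit_0_ksp_type richardson -fieldsplit_0_pc_type ilu -fieldsplit_0_pc_factor_mat_solver_type cusparse

(and similarly for the other splits).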
On Thu, Jan 21, 2021 at 12:01 PM Mark Adams <mfad...@lbl.gov> wrote:

> On Thu, Jan 21, 2021 at 11:25 AM Jed Brown <j...@jedbrown.org> wrote:
>
>> Mark Adams <mfad...@lbl.gov> writes:
>>
>> > Yes, the problem is that each KSP solver is running in an OMP thread
>>
>> There can be more or less splits than OMP_NUM_THREADS. Each thread is
>> still calling blocking operations.
>>
>> This is a concurrency problem, not a parallel efficiency problem. It can
>> be solved with async interfaces
>
> I don't know how to do that. I want a GPU solver, probably superLU, and am
> starting with cuSparse ilu to get something running
>
>> or by making as many threads as splits and ensuring that you don't spin
>> (lest contention kill performance).
>
> I don't get correctness with Richardson with > 1 OMP threads currently.
> This is on IBM with GNU.
>
>> OpenMP is pretty orthogonal and probably not a good fit.
>
> Do you have an alternative?
>
>> > (So at this point it only works for SELF and it's Landau so it is all
>> > I need). It looks like MPI reductions called with a comm_self are not
>> > thread safe (e.g., they could say, this is one proc, thus, just copy
>> > send --> recv, but they don't)
>> >
>> > On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <knep...@gmail.com>
>> > wrote:
>> >
>> >> On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <mfad...@lbl.gov> wrote:
>> >>
>> >>> It looks like PETSc is just too clever for me. I am trying to get a
>> >>> different MPI_Comm into each block, but PETSc is thwarting me:
>> >>>
>> >> It looks like you are using SELF. Is that what you want? Do you want a
>> >> bunch of comms with the same group, but independent somehow? I am
>> >> confused.
>> >>
>> >>   Matt
>> >>
>> >>>     if (jac->use_openmp) {
>> >>>       ierr = KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr);
>> >>>       PetscPrintf(PETSC_COMM_SELF,"In PCFieldSplitSetFields_FieldSplit with -------------- link: %p. Comms %p %p\n",
>> >>>                   ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp));
>> >>>     } else {
>> >>>       ierr = KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);
>> >>>     }
>> >>>
>> >>> produces:
>> >>>
>> >>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0
>> >>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0
>> >>>
>> >>> How can I work around this?
>> >>>
>> >>> On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <mfad...@lbl.gov> wrote:
>> >>>
>> >>>> On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <bsm...@petsc.dev> wrote:
>> >>>>
>> >>>>> On Jan 20, 2021, at 3:09 PM, Mark Adams <mfad...@lbl.gov> wrote:
>> >>>>>
>> >>>>> So I put in a temporary hack to get the first Fieldsplit apply to
>> >>>>> NOT use OMP and it sort of works.
>> >>>>>
>> >>>>> Preonly/lu is fine. GMRES calls vector creates/dups in every solve
>> >>>>> so that is a big problem.
>> >>>>>
>> >>>>> It should definitely not be creating vectors "in every" solve. But
>> >>>>> it does do lazy allocation of needed restarted vectors which may
>> >>>>> make it look like it is creating "every" vectors in every solve.
>> >>>>> You can use -ksp_gmres_preallocate to force it to create all the
>> >>>>> restart vectors up front at KSPSetUp().
>> >>>>>
>> >>>> Well, I run the first solve w/o OMP and I see Vec dups in cuSparse
>> >>>> Vecs in the 2nd solve.
>> >>>>
>> >>>>> Why is creating vectors "at every solve" a problem? It is not
>> >>>>> thread safe I guess?
>> >>>>>
>> >>>> It dies when it looks at the options database, in a Free in the
>> >>>> get-options method to be exact (see stacks).
>> >>>>
>> >>>> ======= Backtrace: =========
>> >>>> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]
>> >>>>
>> >>>>> Richardson works except the convergence test gets confused,
>> >>>>> presumably because MPI reductions with PETSC_COMM_SELF are not
>> >>>>> threadsafe.
>> >>>>>
>> >>>>> One fix for the norms might be to create each subdomain solver
>> >>>>> with a different communicator.
>> >>>>>
>> >>>>> Yes you could do that. It might actually be the correct thing to
>> >>>>> do also, if you have multiple threads call MPI reductions on the
>> >>>>> same communicator that would be a problem. Each KSP should get a
>> >>>>> new MPI_Comm.
>> >>>>>
>> >>>> OK. I will only do this.
>> >>>>
>> >> --
>> >> What most experimenters take for granted before they begin their
>> >> experiments is infinitely more interesting than any results to which
>> >> their experiments lead.
>> >> -- Norbert Wiener
>> >>
>> >> https://www.cse.buffalo.edu/~knepley/
>> >> <http://www.cse.buffalo.edu/~knepley/>
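
Following up on Barry's suggestion above that each KSP should get a new MPI_Comm, here is a minimal sketch of what I have in mind for the per-split KSPs (the helper name and nsplits are just for illustration; this is not the actual PCFieldSplit code):

  #include <petscksp.h>

  /* Sketch: create one KSP per split, each on a freshly duplicated
     communicator. Passing MPI_COMM_SELF directly makes all the KSPs end up
     on the same inner PETSc comm (hence the identical Comms printed above);
     duplicating first gives each split its own comm, so reductions issued
     from different OMP threads never share a communicator. */
  static PetscErrorCode CreateSplitKSPs(PetscInt nsplits, KSP ksps[])
  {
    PetscErrorCode ierr;
    PetscInt       i;

    PetscFunctionBegin;
    for (i = 0; i < nsplits; i++) {
      MPI_Comm comm;
      ierr = MPI_Comm_dup(MPI_COMM_SELF, &comm);CHKERRQ(ierr);
      ierr = KSPCreate(comm, &ksps[i]);CHKERRQ(ierr);
      /* comm must outlive the KSP; MPI_Comm_free() it after KSPDestroy() */
    }
    PetscFunctionReturn(0);
  }

Each OMP thread would then call KSPSolve() on its own ksps[i], so the norms in the convergence test are reduced on distinct communicators.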