OK, the problem is probably this default:

    PetscMPIInt PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_FUNNELED;

There is an example that sets PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE; this is what I need.
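A minimal sketch of that change, assuming the PETSC_MPI_THREAD_REQUIRED global visible from petscsys.h must be assigned before PetscInitialize() (which is where PETSc calls MPI_Init_thread()):

    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      /* Assumed usage: override the default (MPI_THREAD_FUNNELED) before
         PetscInitialize() so MPI is initialized with full thread support. */
      PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      /* ... create solvers, run the OMP-threaded solves ... */
      ierr = PetscFinalize();
      return ierr;
    }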
On Thu, Jan 21, 2021 at 2:26 PM Mark Adams <mfad...@lbl.gov> wrote:

> On Thu, Jan 21, 2021 at 2:11 PM Matthew Knepley <knep...@gmail.com> wrote:
>
>> On Thu, Jan 21, 2021 at 2:02 PM Mark Adams <mfad...@lbl.gov> wrote:
>>
>>> On Thu, Jan 21, 2021 at 1:44 PM Matthew Knepley <knep...@gmail.com> wrote:
>>>
>>>> On Thu, Jan 21, 2021 at 11:16 AM Mark Adams <mfad...@lbl.gov> wrote:
>>>>
>>>>> Yes, the problem is that each KSP solver is running in an OMP thread
>>>>> (so at this point it only works for SELF, and it's Landau, so it is all I
>>>>> need). It looks like MPI reductions called with a comm_self are not thread
>>>>> safe (e.g., they could say: this is one proc, so just copy send --> recv,
>>>>> but they don't).
>>>>
>>>> Instead of using SELF, how about Comm_dup() for each thread?
>>>
>>> OK, raw MPI_Comm_dup. I tried PetscCommDup. Let me try this.
>>> Thanks,
>>
>> You would have to dup them all outside the OMP section, since it is not
>> threadsafe. Then each thread uses one, I think.
>
> Yea, sure. I do it in SetUp.
>
> Well, that worked to get *different Comms*, finally, but I still get the
> same problem: the number of iterations differs wildly. This is two species
> and two threads (13 SNES iterations, and it is not deterministic). Way below
> is one thread (8 iterations) and fairly uniform iteration counts.
>
> Maybe this MPI is just not thread safe at all. Let me look into it.
> Thanks anyway,
>
>   0 SNES Function norm 4.974994975313e-03
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x80017c60. Comms pc=0x67ad27c0 ksp=*0x7ffe1600* newcomm=0x8014b6e0
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7ffdabc0. Comms pc=0x67ad27c0 ksp=*0x7fff70d0* newcomm=0x7ffe9980
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 282
>   1 SNES Function norm 1.836376279964e-05
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 19
>   2 SNES Function norm 3.059930074740e-07
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 15
>   3 SNES Function norm 4.744275398121e-08
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 4
>   4 SNES Function norm 4.014828563316e-08
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 456
>   5 SNES Function norm 5.670836337808e-09
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 2
>   6 SNES Function norm 2.410421401323e-09
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 18
>   7 SNES Function norm 6.533948191791e-10
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 458
>   8 SNES Function norm 1.008133815842e-10
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 9
>   9 SNES Function norm 1.690450876038e-11
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 4
>  10 SNES Function norm 1.336383986009e-11
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 463
>  11 SNES Function norm 1.873022410774e-12
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 113
>  12 SNES Function norm 1.801834606518e-13
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 1
>  13 SNES Function norm 1.004397317339e-13
> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 13
>
>
>   0 SNES Function norm 4.974994975313e-03
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e265010. Comms pc=0x56450340 ksp=0x6e2168d0 newcomm=0x6e265090
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e25bc40. Comms pc=0x56450340 ksp=0x6e22c1d0 newcomm=0x6e21e8f0
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 282
>   1 SNES Function norm 1.836376279963e-05
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 380
>   2 SNES Function norm 3.018499983019e-07
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 387
>   3 SNES Function norm 1.826353175637e-08
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 391
>   4 SNES Function norm 1.378600599548e-09
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 392
>   5 SNES Function norm 1.077289085611e-10
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 394
>   6 SNES Function norm 8.571891727748e-12
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 395
>   7 SNES Function norm 6.897647643450e-13
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 395
>   8 SNES Function norm 5.606434614114e-14
> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 8
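The dup-outside-the-OMP-section pattern described in the quoted exchange above might look like the following sketch; the names (SolveBlocksThreaded, nblocks, A, b, x) are illustrative, not the actual PCFIELDSPLIT code:

    #include <petscksp.h>

    /* Hedged sketch only: duplicate one communicator per block while still
       serial (so the dups happen outside the OMP section), create each
       block's KSP on its own duplicate, and let each OMP thread drive
       exactly one solver, so no two threads ever share a communicator. */
    static PetscErrorCode SolveBlocksThreaded(PetscInt nblocks, Mat *A, Vec *b, Vec *x)
    {
      MPI_Comm      *comms;
      KSP           *ksps;
      PetscErrorCode ierr;
      PetscInt       i;

      PetscFunctionBegin;
      ierr = PetscMalloc2(nblocks, &comms, nblocks, &ksps);CHKERRQ(ierr);
      for (i = 0; i < nblocks; i++) {                                /* serial setup */
        ierr = MPI_Comm_dup(MPI_COMM_SELF, &comms[i]);CHKERRQ(ierr); /* one comm per block */
        ierr = KSPCreate(comms[i], &ksps[i]);CHKERRQ(ierr);
        ierr = KSPSetOperators(ksps[i], A[i], A[i]);CHKERRQ(ierr);
        ierr = KSPSetUp(ksps[i]);CHKERRQ(ierr);                      /* no object creation inside the threads */
      }
    #pragma omp parallel for
      for (i = 0; i < nblocks; i++) {
        KSPSolve(ksps[i], b[i], x[i]); /* each thread: its own KSP on its own comm (error checks omitted here) */
      }
      for (i = 0; i < nblocks; i++) {                                /* serial cleanup */
        ierr = KSPDestroy(&ksps[i]);CHKERRQ(ierr);
        ierr = MPI_Comm_free(&comms[i]);CHKERRQ(ierr);
      }
      ierr = PetscFree2(comms, ksps);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }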
>>>>> On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <knep...@gmail.com> wrote:
>>>>>
>>>>>> On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>
>>>>>>> It looks like PETSc is just too clever for me. I am trying to get a
>>>>>>> different MPI_Comm into each block, but PETSc is thwarting me:
>>>>>>
>>>>>> It looks like you are using SELF. Is that what you want? Do you want
>>>>>> a bunch of comms with the same group, but independent somehow? I am
>>>>>> confused.
>>>>>>
>>>>>>   Matt
>>>>>>
>>>>>>> if (jac->use_openmp) {
>>>>>>>   ierr = KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr);
>>>>>>>   PetscPrintf(PETSC_COMM_SELF,"In PCFieldSplitSetFields_FieldSplit with -------------- link: %p. Comms %p %p\n",
>>>>>>>               ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp));
>>>>>>> } else {
>>>>>>>   ierr = KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);
>>>>>>> }
>>>>>>>
>>>>>>> produces:
>>>>>>>
>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>>
>>>>>>> How can I work around this?
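A plausible explanation for the identical comms above is that PETSc caches the inner communicator it attaches to MPI_COMM_SELF, so every KSPCreate(MPI_COMM_SELF, ...) reports the same one. A hedged sketch of a workaround, in line with the raw MPI_Comm_dup that gets used later in the thread (ilink->comm is a hypothetical field added only for illustration):

    if (jac->use_openmp) {
      /* Sketch only: give each split its own raw duplicate of MPI_COMM_SELF.
         This runs in (serial) setup code, so calling MPI_Comm_dup here is safe.
         ilink->comm is a hypothetical field, not part of PCFIELDSPLIT. */
      ierr = MPI_Comm_dup(MPI_COMM_SELF, &ilink->comm);CHKERRQ(ierr);
      ierr = KSPCreate(ilink->comm, &ilink->ksp);CHKERRQ(ierr);   /* distinct comm per block */
    } else {
      ierr = KSPCreate(PetscObjectComm((PetscObject)pc), &ilink->ksp);CHKERRQ(ierr);
    }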
>>>>>>> On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>
>>>>>>>> On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <bsm...@petsc.dev> wrote:
>>>>>>>>
>>>>>>>>> On Jan 20, 2021, at 3:09 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>>>
>>>>>>>>> So I put in a temporary hack to get the first Fieldsplit apply to
>>>>>>>>> NOT use OMP, and it sort of works.
>>>>>>>>>
>>>>>>>>> Preonly/lu is fine. GMRES calls vector creates/dups in every
>>>>>>>>> solve, so that is a big problem.
>>>>>>>>>
>>>>>>>>> It should definitely not be creating vectors "in every" solve.
>>>>>>>>> But it does do lazy allocation of the needed restart vectors, which
>>>>>>>>> may make it look like it is creating vectors in "every" solve. You can
>>>>>>>>> use -ksp_gmres_preallocate to force it to create all the restart
>>>>>>>>> vectors up front at KSPSetUp().
>>>>>>>>
>>>>>>>> Well, I ran the first solve w/o OMP and I see Vec dups in cuSparse
>>>>>>>> Vecs in the 2nd solve.
>>>>>>>>
>>>>>>>>> Why is creating vectors "at every solve" a problem? It is not
>>>>>>>>> thread safe, I guess?
>>>>>>>>
>>>>>>>> It dies when it looks at the options database, in a Free in the
>>>>>>>> get-options method to be exact (see the stack below).
>>>>>>>>
>>>>>>>> ======= Backtrace: =========
>>>>>>>> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]
>>>>>>>>
>>>>>>>>> Richardson works, except the convergence test gets confused,
>>>>>>>>> presumably because MPI reductions with PETSC_COMM_SELF are not
>>>>>>>>> threadsafe.
>>>>>>>>>
>>>>>>>>> One fix for the norms might be to create each subdomain solver
>>>>>>>>> with a different communicator.
>>>>>>>>>
>>>>>>>>> Yes, you could do that. It might actually be the correct thing
>>>>>>>>> to do also; if you have multiple threads calling MPI reductions on
>>>>>>>>> the same communicator, that would be a problem. Each KSP should get
>>>>>>>>> a new MPI_Comm.
>>>>>>>>
>>>>>>>> OK. I will only do this.
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which
>>>>>> their experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>>>> https://www.cse.buffalo.edu/~knepley/
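In API form, the -ksp_gmres_preallocate suggestion could also be applied to each split solver before the threaded solves begin; a hedged sketch (the helper name and its arguments are illustrative):

    #include <petscksp.h>

    /* Sketch: configure a per-split GMRES solver so all of its work vectors
       are created at KSPSetUp(), i.e. before any OMP-threaded solve,
       mirroring what the -ksp_gmres_preallocate option does. */
    static PetscErrorCode SetUpSplitSolver(KSP ksp, Mat A)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPGMRES);CHKERRQ(ierr);
      ierr = KSPGMRESSetPreAllocateVectors(ksp);CHKERRQ(ierr);  /* same effect as -ksp_gmres_preallocate */
      ierr = KSPSetUp(ksp);CHKERRQ(ierr);                       /* allocate restart vectors now, not mid-solve */
      PetscFunctionReturn(0);
    }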