OK, the problem is probably this default:

    PetscMPIInt PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_FUNNELED;

There is an example that sets PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE; this is what I need.
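A minimal sketch of that change, assuming the PETSC_MPI_THREAD_REQUIRED global visible from petscsys.h must be assigned before PetscInitialize() (which is where PETSc calls MPI_Init_thread()):

    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      /* Assumed usage: override the default (MPI_THREAD_FUNNELED) before
         PetscInitialize() so MPI is initialized with full thread support. */
      PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      /* ... create solvers, run the OMP-threaded solves ... */
      ierr = PetscFinalize();
      return ierr;
    }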
On Thu, Jan 21, 2021 at 2:26 PM Mark Adams <mfad...@lbl.gov> wrote:

> On Thu, Jan 21, 2021 at 2:11 PM Matthew Knepley <knep...@gmail.com> wrote:
>
>> On Thu, Jan 21, 2021 at 2:02 PM Mark Adams <mfad...@lbl.gov> wrote:
>>
>>> On Thu, Jan 21, 2021 at 1:44 PM Matthew Knepley <knep...@gmail.com> wrote:
>>>
>>>> On Thu, Jan 21, 2021 at 11:16 AM Mark Adams <mfad...@lbl.gov> wrote:
>>>>
>>>>> Yes, the problem is that each KSP solver is running in an OMP thread
>>>>> (so at this point it only works for SELF, and it's Landau, so it is all I
>>>>> need). It looks like MPI reductions called with a comm_self are not thread
>>>>> safe (e.g., they could say: this is one proc, so just copy send --> recv,
>>>>> but they don't).
>>>>
>>>> Instead of using SELF, how about Comm_dup() for each thread?
>>>
>>> OK, raw MPI_Comm_dup. I tried PetscCommDup. Let me try this.
>>> Thanks,
>>
>> You would have to dup them all outside the OMP section, since it is not
>> threadsafe. Then each thread uses one, I think.
>
> Yea, sure. I do it in SetUp.
>
> Well, that worked to get *different Comms*, finally, but I still get the
> same problem: the number of iterations differs wildly. This is two species
> and two threads (13 SNES iterations, and it is not deterministic). Way below
> is one thread (8 iterations) and fairly uniform iteration counts.
>
> Maybe this MPI is just not thread safe at all. Let me look into it.
> Thanks anyway,
>
>   0 SNES Function norm 4.974994975313e-03
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x80017c60. Comms pc=0x67ad27c0 ksp=*0x7ffe1600* newcomm=0x8014b6e0
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7ffdabc0. Comms pc=0x67ad27c0 ksp=*0x7fff70d0* newcomm=0x7ffe9980
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 282
>   1 SNES Function norm 1.836376279964e-05
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 19
>   2 SNES Function norm 3.059930074740e-07
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 15
>   3 SNES Function norm 4.744275398121e-08
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 4
>   4 SNES Function norm 4.014828563316e-08
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 456
>   5 SNES Function norm 5.670836337808e-09
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 2
>   6 SNES Function norm 2.410421401323e-09
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 18
>   7 SNES Function norm 6.533948191791e-10
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 458
>   8 SNES Function norm 1.008133815842e-10
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 9
>   9 SNES Function norm 1.690450876038e-11
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 4
>  10 SNES Function norm 1.336383986009e-11
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 463
>  11 SNES Function norm 1.873022410774e-12
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 113
>  12 SNES Function norm 1.801834606518e-13
>     Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 1
>  13 SNES Function norm 1.004397317339e-13
> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 13
>
>
>   0 SNES Function norm 4.974994975313e-03
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e265010. Comms pc=0x56450340 ksp=0x6e2168d0 newcomm=0x6e265090
> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e25bc40. Comms pc=0x56450340 ksp=0x6e22c1d0 newcomm=0x6e21e8f0
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 282
>   1 SNES Function norm 1.836376279963e-05
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 380
>   2 SNES Function norm 3.018499983019e-07
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 387
>   3 SNES Function norm 1.826353175637e-08
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 391
>   4 SNES Function norm 1.378600599548e-09
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 392
>   5 SNES Function norm 1.077289085611e-10
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 394
>   6 SNES Function norm 8.571891727748e-12
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 395
>   7 SNES Function norm 6.897647643450e-13
>     Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 395
>   8 SNES Function norm 5.606434614114e-14
> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 8
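The dup-outside-the-OMP-section pattern described in the quoted exchange above might look like the following sketch; the names (SolveBlocksThreaded, nblocks, A, b, x) are illustrative, not the actual PCFIELDSPLIT code:

    #include <petscksp.h>

    /* Hedged sketch only: duplicate one communicator per block while still
       serial (so the dups happen outside the OMP section), create each
       block's KSP on its own duplicate, and let each OMP thread drive
       exactly one solver, so no two threads ever share a communicator. */
    static PetscErrorCode SolveBlocksThreaded(PetscInt nblocks, Mat *A, Vec *b, Vec *x)
    {
      MPI_Comm      *comms;
      KSP           *ksps;
      PetscErrorCode ierr;
      PetscInt       i;

      PetscFunctionBegin;
      ierr = PetscMalloc2(nblocks, &comms, nblocks, &ksps);CHKERRQ(ierr);
      for (i = 0; i < nblocks; i++) {                                /* serial setup */
        ierr = MPI_Comm_dup(MPI_COMM_SELF, &comms[i]);CHKERRQ(ierr); /* one comm per block */
        ierr = KSPCreate(comms[i], &ksps[i]);CHKERRQ(ierr);
        ierr = KSPSetOperators(ksps[i], A[i], A[i]);CHKERRQ(ierr);
        ierr = KSPSetUp(ksps[i]);CHKERRQ(ierr);                      /* no object creation inside the threads */
      }
    #pragma omp parallel for
      for (i = 0; i < nblocks; i++) {
        KSPSolve(ksps[i], b[i], x[i]); /* each thread: its own KSP on its own comm (error checks omitted here) */
      }
      for (i = 0; i < nblocks; i++) {                                /* serial cleanup */
        ierr = KSPDestroy(&ksps[i]);CHKERRQ(ierr);
        ierr = MPI_Comm_free(&comms[i]);CHKERRQ(ierr);
      }
      ierr = PetscFree2(comms, ksps);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }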
>>>>> On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <knep...@gmail.com> wrote:
>>>>>
>>>>>> On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>
>>>>>>> It looks like PETSc is just too clever for me. I am trying to get a
>>>>>>> different MPI_Comm into each block, but PETSc is thwarting me:
>>>>>>
>>>>>> It looks like you are using SELF. Is that what you want? Do you want
>>>>>> a bunch of comms with the same group, but independent somehow? I am
>>>>>> confused.
>>>>>>
>>>>>>   Matt
>>>>>>
>>>>>>> if (jac->use_openmp) {
>>>>>>>   ierr = KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr);
>>>>>>>   PetscPrintf(PETSC_COMM_SELF,"In PCFieldSplitSetFields_FieldSplit with -------------- link: %p. Comms %p %p\n",
>>>>>>>               ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp));
>>>>>>> } else {
>>>>>>>   ierr = KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);
>>>>>>> }
>>>>>>>
>>>>>>> produces:
>>>>>>>
>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>>
>>>>>>> How can I work around this?
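A plausible explanation for the identical comms above is that PETSc caches the inner communicator it attaches to MPI_COMM_SELF, so every KSPCreate(MPI_COMM_SELF, ...) reports the same one. A hedged sketch of a workaround, in line with the raw MPI_Comm_dup that gets used later in the thread (ilink->comm is a hypothetical field added only for illustration):

    if (jac->use_openmp) {
      /* Sketch only: give each split its own raw duplicate of MPI_COMM_SELF.
         This runs in (serial) setup code, so calling MPI_Comm_dup here is safe.
         ilink->comm is a hypothetical field, not part of PCFIELDSPLIT. */
      ierr = MPI_Comm_dup(MPI_COMM_SELF, &ilink->comm);CHKERRQ(ierr);
      ierr = KSPCreate(ilink->comm, &ilink->ksp);CHKERRQ(ierr);   /* distinct comm per block */
    } else {
      ierr = KSPCreate(PetscObjectComm((PetscObject)pc), &ilink->ksp);CHKERRQ(ierr);
    }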
>>>>>>> On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>
>>>>>>>> On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <bsm...@petsc.dev> wrote:
>>>>>>>>
>>>>>>>>> On Jan 20, 2021, at 3:09 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>>>
>>>>>>>>> So I put in a temporary hack to get the first Fieldsplit apply to
>>>>>>>>> NOT use OMP, and it sort of works.
>>>>>>>>>
>>>>>>>>> Preonly/lu is fine. GMRES calls vector creates/dups in every
>>>>>>>>> solve, so that is a big problem.
>>>>>>>>>
>>>>>>>>> It should definitely not be creating vectors "in every" solve.
>>>>>>>>> But it does do lazy allocation of the needed restart vectors, which
>>>>>>>>> may make it look like it is creating vectors in "every" solve. You can
>>>>>>>>> use -ksp_gmres_preallocate to force it to create all the restart
>>>>>>>>> vectors up front at KSPSetUp().
>>>>>>>>
>>>>>>>> Well, I ran the first solve w/o OMP and I see Vec dups in cuSparse
>>>>>>>> Vecs in the 2nd solve.
>>>>>>>>
>>>>>>>>> Why is creating vectors "at every solve" a problem? It is not
>>>>>>>>> thread safe, I guess?
>>>>>>>>
>>>>>>>> It dies when it looks at the options database, in a Free in the
>>>>>>>> get-options method to be exact (see the stack below).
>>>>>>>>
>>>>>>>> ======= Backtrace: =========
>>>>>>>> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]
>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]
>>>>>>>>
>>>>>>>>> Richardson works, except the convergence test gets confused,
>>>>>>>>> presumably because MPI reductions with PETSC_COMM_SELF are not
>>>>>>>>> threadsafe.
>>>>>>>>>
>>>>>>>>> One fix for the norms might be to create each subdomain solver
>>>>>>>>> with a different communicator.
>>>>>>>>>
>>>>>>>>> Yes, you could do that. It might actually be the correct thing
>>>>>>>>> to do also; if you have multiple threads calling MPI reductions on
>>>>>>>>> the same communicator, that would be a problem. Each KSP should get
>>>>>>>>> a new MPI_Comm.
>>>>>>>>
>>>>>>>> OK. I will only do this.
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which
>>>>>> their experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>>>> https://www.cse.buffalo.edu/~knepley/
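In API form, the -ksp_gmres_preallocate suggestion could also be applied to each split solver before the threaded solves begin; a hedged sketch (the helper name and its arguments are illustrative):

    #include <petscksp.h>

    /* Sketch: configure a per-split GMRES solver so all of its work vectors
       are created at KSPSetUp(), i.e. before any OMP-threaded solve,
       mirroring what the -ksp_gmres_preallocate option does. */
    static PetscErrorCode SetUpSplitSolver(KSP ksp, Mat A)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPGMRES);CHKERRQ(ierr);
      ierr = KSPGMRESSetPreAllocateVectors(ksp);CHKERRQ(ierr);  /* same effect as -ksp_gmres_preallocate */
      ierr = KSPSetUp(ksp);CHKERRQ(ierr);                       /* allocate restart vectors now, not mid-solve */
      PetscFunctionReturn(0);
    }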