And I guess I am really doing two things here.

1) The solver I intend to use is SuperLU. I believe Barry got LU working
in OMP threads a few years ago. My problems now are in Krylov. I could
live with what I have now and just get Sherry to make SuperLU_dist not
use MPI in serial; SuperLU does hang right now.

2) While I am doing this, grab the low-hanging fruit and expand this
model to work with Krylov. GMRES has more problems, but with
Richardson/cuSparse-ILU the only problem seems to be that convergence
testing is hosed. (A rough sketch of the model I mean follows this list.)
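
To be concrete about the model in 2), something like the sketch below is
what I mean: one KSP per split, with the block solves driven concurrently
from OMP threads. This is only a sketch; the names (SolveBlocksThreaded,
nsplits, subksp, subb, subx) are made up for illustration, and plain
KSPSolve is of course not guaranteed to be thread safe today, which is
the whole problem.

#include <petscksp.h>

/* Sketch only: one KSP per field-split block, block solves driven
   concurrently from OpenMP threads. Each KSP should also live on its own
   MPI communicator (see the second sketch below). */
PetscErrorCode SolveBlocksThreaded(PetscInt nsplits, KSP subksp[], Vec subb[], Vec subx[])
{
  PetscErrorCode ierr = 0;
  PetscInt       i;

  #pragma omp parallel for schedule(static) reduction(max:ierr)
  for (i = 0; i < nsplits; i++) {
    /* Cannot CHKERRQ inside the parallel region, so just record the
       worst error code and check it after the loop. */
    PetscErrorCode terr = KSPSolve(subksp[i], subb[i], subx[i]);
    if (terr > ierr) ierr = terr;
  }
  if (ierr) SETERRQ(PETSC_COMM_SELF, ierr, "a threaded block solve failed");
  return 0;
}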

I am open to other models. I have a specific problem and would like, as
much as possible, to contribute to PETSc along the way.
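
To follow the suggestion in the thread below that each KSP should get a
new MPI_Comm, here is roughly what I am thinking for the
PCFieldSplitSetFields_FieldSplit snippet quoted further down (again just
a sketch; jac, ilink, and ierr come from that snippet, and I am assuming
a plain MPI_Comm_dup of MPI_COMM_SELF is enough to hand each split a
communicator PETSc has not seen before, and therefore its own inner
communicator):

  if (jac->use_openmp) {
    MPI_Comm scomm;
    /* Duplicate MPI_COMM_SELF per split so each threaded KSP gets its own
       communicator. PETSc caches its inner communicator on the outer comm,
       which is why creating every split directly on MPI_COMM_SELF printed
       the same "Comms" address for both links below. */
    ierr = MPI_Comm_dup(MPI_COMM_SELF,&scomm);CHKERRQ(ierr);
    ierr = KSPCreate(scomm,&ilink->ksp);CHKERRQ(ierr);
    /* scomm needs an MPI_Comm_free when the link is destroyed. */
  } else {
    ierr = KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);
  }

(I am also assuming MPI was initialized with MPI_THREAD_MULTIPLE, since
the threads still make concurrent MPI calls, just on different
communicators.)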

On Thu, Jan 21, 2021 at 12:01 PM Mark Adams <mfad...@lbl.gov> wrote:

>
>
> On Thu, Jan 21, 2021 at 11:25 AM Jed Brown <j...@jedbrown.org> wrote:
>
>> Mark Adams <mfad...@lbl.gov> writes:
>>
>> > Yes, the problem is that each KSP solver is running in an OMP thread
>>
>> There can be more or less splits than OMP_NUM_THREADS. Each thread is
>> still calling blocking operations.
>>
>> This is a concurrency problem, not a parallel efficiency problem. It can
>> be solved with async interfaces
>
>
> I don't know how to do that. I want a GPU solver, probably SuperLU, and am
> starting with cuSparse ILU to get something running.
>
> or by making as many threads as splits and ensuring that you don't spin
>> (lest contention kill performance).
>
>
> I don't get correctness with Richardson with > 1 OMP threads currently.
> This is on IBM with GNU.
>
>
>> OpenMP is pretty orthogonal and probably not a good fit.
>>
>
> Do you have an alternative?
>
>
>>
>> > (So at this point it only works for SELF and it's Landau, so it is all I
>> > need.) It looks like MPI reductions called with a comm_self are not thread
>> > safe (e.g., they could say: this is one proc, thus just copy send --> recv;
>> > but they don't).
>> >
>> > On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <knep...@gmail.com>
>> wrote:
>> >
>> >> On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <mfad...@lbl.gov> wrote:
>> >>
>> >>> It looks like PETSc is just too clever for me. I am trying to get a
>> >>> different MPI_Comm into each block, but PETSc is thwarting me:
>> >>>
>> >>
>> >> It looks like you are using SELF. Is that what you want? Do you want a
>> >> bunch of comms with the same group, but independent somehow? I am
>> confused.
>> >>
>> >>    Matt
>> >>
>> >>
>> >>>   if (jac->use_openmp) {
>> >>>     ierr = KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr);
>> >>>     PetscPrintf(PETSC_COMM_SELF,"In PCFieldSplitSetFields_FieldSplit with -------------- link: %p. Comms %p %p\n",ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp));
>> >>>   } else {
>> >>>     ierr = KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);
>> >>>   }
>> >>>
>> >>> produces:
>> >>>
>> >>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0
>> >>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0
>> >>>
>> >>> How can I work around this?
>> >>>
>> >>>
>> >>> On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <mfad...@lbl.gov> wrote:
>> >>>
>> >>>>
>> >>>>
>> >>>> On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <bsm...@petsc.dev>
>> wrote:
>> >>>>
>> >>>>>
>> >>>>>
>> >>>>> On Jan 20, 2021, at 3:09 PM, Mark Adams <mfad...@lbl.gov> wrote:
>> >>>>>
>> >>>>> So I put in a temporary hack to get the first FieldSplit apply to NOT
>> >>>>> use OMP, and it sort of works.
>> >>>>>
>> >>>>> Preonly/LU is fine. GMRES calls vector creates/dups in every solve, so
>> >>>>> that is a big problem.
>> >>>>>
>> >>>>>
>> >>>>>   It should definitely not be creating vectors "in every" solve. But
>> >>>>> it does do lazy allocation of the needed restart vectors, which may
>> >>>>> make it look like it is creating vectors in "every" solve. You can use
>> >>>>> -ksp_gmres_preallocate to force it to create all the restart vectors
>> >>>>> up front at KSPSetUp().
>> >>>>>
>> >>>>
>> >>>> Well, I run the first solve w/o OMP and I see Vec dups of cuSparse
>> >>>> Vecs in the 2nd solve.
>> >>>>
>> >>>>
>> >>>>>
>> >>>>>   Why is creating vectors "at every solve" a problem? It is not
>> thread
>> >>>>> safe I guess?
>> >>>>>
>> >>>>
>> >>>> It dies when it looks at the options database, in a free inside the
>> >>>> get-options method to be exact (see the stack trace below).
>> >>>>
>> >>>> ======= Backtrace: =========
>> >>>> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]
>> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]
>> >>>>
>> >>>>
>> >>>>
>> >>>>>
>> >>>>> Richardson works except that the convergence test gets confused,
>> >>>>> presumably because MPI reductions on PETSC_COMM_SELF are not thread
>> >>>>> safe.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> One fix for the norms might be to create each subdomain solver with a
>> >>>>> different communicator.
>> >>>>>
>> >>>>>
>> >>>>>    Yes, you could do that. It might actually also be the correct
>> >>>>> thing to do; if you have multiple threads calling MPI reductions on
>> >>>>> the same communicator, that would be a problem. Each KSP should get a
>> >>>>> new MPI_Comm.
>> >>>>>
>> >>>>
>> >>>> OK. I will only do this.
>> >>>>
>> >>>>
>> >>
>> >> --
>> >> What most experimenters take for granted before they begin their
>> >> experiments is infinitely more interesting than any results to which
>> their
>> >> experiments lead.
>> >> -- Norbert Wiener
>> >>
>> >> https://www.cse.buffalo.edu/~knepley/
>> >>
>>
>
