Great, thanks for letting us know. We should change our example to use individual MPI communicators as well.
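Roughly, each thread would manage its own objects on a duplicated communicator, along the lines of the sketch below. This is only a sketch, not code from any existing example; the thread body solveOne(), the toy objective FormFunctionGradient(), the thread count, and the problem size are placeholders for illustration.

/* Sketch: each thread drives an independent TAO solve on its own duplicated
   communicator (PETSc built with --with-threadsafety). Placeholder problem:
   minimize f(x) = 1/2 ||x||^2. */
#include <petsctao.h>
#include <thread>
#include <vector>

static PetscErrorCode FormFunctionGradient(Tao tao, Vec X, PetscReal *f, Vec G, void *ctx)
{
  PetscScalar    dot;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = VecDot(X, X, &dot);CHKERRQ(ierr);
  *f   = 0.5*PetscRealPart(dot);
  ierr = VecCopy(X, G);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

static void solveOne(PetscInt n)
{
  MPI_Comm       raw, comm;
  int            tag, mpierr;
  Tao            tao;
  Vec            x;
  PetscErrorCode ierr;

  mpierr = MPI_Comm_dup(MPI_COMM_SELF, &raw);                 /* private MPI communicator for this thread */
  if (mpierr != MPI_SUCCESS) return;
  ierr = PetscCommDuplicate(raw, &comm, &tag);CHKERRABORT(PETSC_COMM_SELF, ierr);

  ierr = VecCreateSeq(comm, n, &x);CHKERRABORT(comm, ierr);
  ierr = VecSet(x, 1.0);CHKERRABORT(comm, ierr);

  ierr = TaoCreate(comm, &tao);CHKERRABORT(comm, ierr);
  ierr = TaoSetType(tao, TAOBLMVM);CHKERRABORT(comm, ierr);
  ierr = TaoSetInitialVector(tao, x);CHKERRABORT(comm, ierr);  /* TaoSetSolution() in newer releases */
  ierr = TaoSetObjectiveAndGradientRoutine(tao, FormFunctionGradient, NULL);CHKERRABORT(comm, ierr);
  ierr = TaoSolve(tao);CHKERRABORT(comm, ierr);

  ierr = TaoDestroy(&tao);CHKERRABORT(comm, ierr);
  ierr = VecDestroy(&x);CHKERRABORT(comm, ierr);
  ierr = PetscCommDestroy(&comm);CHKERRABORT(PETSC_COMM_SELF, ierr);
  MPI_Comm_free(&raw);
}

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;   /* once, from the main thread */
  {
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; t++) workers.emplace_back(solveOne, (PetscInt)100);
    for (auto &w : workers) w.join();
  }
  ierr = PetscFinalize();
  return ierr;
}

The key point is that every PETSc object a thread touches lives on that thread's own communicator, so no split-phase reduction state is shared between threads.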
   Barry

> On Dec 20, 2018, at 3:52 PM, Krzysztof Kamieniecki <k...@kamieniecki.com> wrote:
> 
> Based on Barry's encouragement I spent the extra time to get back to v3.10.2, and this code seems to work:
> 
> int mpierr;
> PetscErrorCode ierr;
> MPI_Comm mpiCommRaw;
> int tag = 0;
> mpierr = MPI_Comm_dup(MPI_COMM_SELF, &mpiCommRaw); if (mpierr != MPI_SUCCESS) throw PSI::Exception("MPI ERROR: MPI_Comm_dup");
> 
> MPI_Comm mpiComm;
> ierr = PetscCommDuplicate(mpiCommRaw, &mpiComm, &tag); CHKERRABORT(PETSC_COMM_SELF, ierr);
> 
> Tao tao;
> ierr = TaoCreate(mpiComm, &tao); CHKERRABORT(mpiComm, ierr);
> 
> ...
> ierr = TaoSolve(tao); CHKERRABORT(mpiComm, ierr);
> ...
> 
> ierr = TaoDestroy(&tao); CHKERRABORT(mpiComm, ierr);
> 
> ierr = PetscCommDestroy(&mpiComm); CHKERRABORT(PETSC_COMM_SELF, ierr);
> 
> mpierr = MPI_Comm_free(&mpiCommRaw); if (mpierr != MPI_SUCCESS) throw PSI::Exception("MPI ERROR: MPI_Comm_free");
> 
> 
> On Thu, Dec 20, 2018 at 4:06 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> 
> > On Dec 20, 2018, at 2:24 PM, Krzysztof Kamieniecki via petsc-users <petsc-users@mcs.anl.gov> wrote:
> > 
> > Hi Alp,
> > 
> > Thanks! This worked. I reverted back to v3.9.4 and, after removing the monitors (which caused an error in PetscViewerASCIIPopTab), it seems to be passing tests for now.
> > 
> > (For the future peanut gallery) I misread what PetscCommDuplicate does: it does not duplicate PETSc communicators that already "wrap" MPI communicators, so I may look into MPI and creating a completely independent MPI_Comm for each thread.
> >    Yes, this should work.
> > 
> > Best Regards,
> > Krys
> > 
> > On Thu, Dec 20, 2018 at 12:16 PM Dener, Alp <ade...@anl.gov> wrote:
> > Hi Krys,
> > 
> >> On Dec 20, 2018, at 10:59 AM, Krzysztof Kamieniecki via petsc-users <petsc-users@mcs.anl.gov> wrote:
> >> 
> >> That example seems to have critical sections around certain Vec calls, and it looks like my problem occurs in VecDotBegin/VecDotEnd, which is called by TAO/BLMVM.
> > 
> > The quasi-Newton matrix objects in BLMVM have asynchronous dot products in the matrix-free forward and inverse product formulations. This is a relatively recent performance optimization. If avoiding this split-phase communication would solve the problem, and you don't need other recent PETSc features, you could revert to 3.9 and use the old version of BLMVM, which uses straight VecDot operations instead.
> > 
> > Unfortunately I don't know enough about multithreading to definitively say whether that will actually solve the problem or not. Other members of the community can probably provide a more complete answer on that.
> > 
> >> 
> >> I assume PetscSplitReductionGet is pulling the PetscSplitReduction for PETSC_COMM_SELF, which is shared across the whole process?
> >> 
> >> I tried PetscCommDuplicate/PetscCommDestroy but that does not seem to help.
> >> 
> >> PetscErrorCode VecDotBegin(Vec x,Vec y,PetscScalar *result)
> >> {
> >>   PetscErrorCode      ierr;
> >>   PetscSplitReduction *sr;
> >>   MPI_Comm            comm;
> >> 
> >>   PetscFunctionBegin;
> >>   PetscValidHeaderSpecific(x,VEC_CLASSID,1);
> >>   PetscValidHeaderSpecific(y,VEC_CLASSID,1);
> >>   ierr = PetscObjectGetComm((PetscObject)x,&comm);CHKERRQ(ierr);
> >>   ierr = PetscSplitReductionGet(comm,&sr);CHKERRQ(ierr);
> >>   if (sr->state != STATE_BEGIN) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ORDER,"Called before all VecxxxEnd() called");
> >>   if (sr->numopsbegin >= sr->maxops) {
> >>     ierr = PetscSplitReductionExtend(sr);CHKERRQ(ierr);
> >>   }
> >>   sr->reducetype[sr->numopsbegin] = PETSC_SR_REDUCE_SUM;
> >>   sr->invecs[sr->numopsbegin]     = (void*)x;
> >>   if (!x->ops->dot_local) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_SUP,"Vector does not suppport local dots");
> >>   ierr = PetscLogEventBegin(VEC_ReduceArithmetic,0,0,0,0);CHKERRQ(ierr);
> >>   ierr = (*x->ops->dot_local)(x,y,sr->lvalues+sr->numopsbegin++);CHKERRQ(ierr);
> >>   ierr = PetscLogEventEnd(VEC_ReduceArithmetic,0,0,0,0);CHKERRQ(ierr);
> >>   PetscFunctionReturn(0);
> >> }
> >> 
> >> 
> >> On Thu, Dec 20, 2018 at 11:26 AM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> >> 
> >>    The code src/ksp/ksp/examples/tutorials/ex61f.F90 demonstrates working with multiple threads each managing their own collection of PETSc objects. Hope this helps.
> >> 
> >>    Barry
> >> 
> >> 
> >> > On Dec 20, 2018, at 9:28 AM, Krzysztof Kamieniecki via petsc-users <petsc-users@mcs.anl.gov> wrote:
> >> > 
> >> > Hello All,
> >> > 
> >> > I have an embarrassingly parallel problem that I would like to use TAO on, is there some way to do this with threads as opposed to multiple processes?
> >> > 
> >> > I compiled PETSc with the following flags
> >> > ./configure \
> >> >   --prefix=${DEP_INSTALL_DIR} \
> >> >   --with-threadsafety --with-log=0 --download-concurrencykit \
> >> >   --with-openblas=1 \
> >> >   --with-openblas-dir=${DEP_INSTALL_DIR} \
> >> >   --with-mpi=0 \
> >> >   --with-shared=0 \
> >> >   --with-debugging=0 COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3'
> >> > 
> >> > When I run TAO in multiple threads I get the error "Called VecxxxEnd() in a different order or with a different vector than VecxxxBegin()"
> >> > 
> >> > Thanks,
> >> > Krys
> >> > 
> > 
> > — 
> > Alp
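For reference, the collision described in this thread (every thread reaching the single PetscSplitReduction attached to the shared PETSC_COMM_SELF) can be sketched in isolation roughly as follows. This is not code from the thread, the function name dotOnSharedSelf() is a placeholder, and whether the interleaving actually fires on a given run depends on timing.

/* Condensed illustration of the shared-communicator problem: both threads'
   vectors live on PETSC_COMM_SELF, so their VecDotBegin()/VecDotEnd() pairs
   use the same PetscSplitReduction and can interleave, producing
   "Called VecxxxEnd() in a different order or with a different vector than VecxxxBegin()". */
#include <petscvec.h>
#include <thread>

static void dotOnSharedSelf(PetscInt n)
{
  Vec            x;
  PetscScalar    result;
  PetscErrorCode ierr;

  ierr = VecCreateSeq(PETSC_COMM_SELF, n, &x);CHKERRABORT(PETSC_COMM_SELF, ierr);
  ierr = VecSet(x, 1.0);CHKERRABORT(PETSC_COMM_SELF, ierr);
  ierr = VecDotBegin(x, x, &result);CHKERRABORT(PETSC_COMM_SELF, ierr);
  ierr = VecDotEnd(x, x, &result);CHKERRABORT(PETSC_COMM_SELF, ierr);
  ierr = VecDestroy(&x);CHKERRABORT(PETSC_COMM_SELF, ierr);
}

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  std::thread t1(dotOnSharedSelf, (PetscInt)8), t2(dotOnSharedSelf, (PetscInt)8);
  t1.join(); t2.join();
  ierr = PetscFinalize();
  return ierr;
}

Giving each thread its own duplicated communicator, as in the code Krzysztof posted above, gives each thread its own PetscSplitReduction and avoids this interleaving.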