Great, thanks for letting us know. We should change our example to use 
individual MPI communicators as well.

    Barry


> On Dec 20, 2018, at 3:52 PM, Krzysztof Kamieniecki <k...@kamieniecki.com> wrote:
> 
> Based on Barry's encouragement I spent the extra time to get back to v3.10.2 
> and this code seems to work:
> 
> PetscErrorCode ierr;
> int mpierr;
> MPI_Comm mpiCommRaw;
> int tag = 0;
> 
> /* give this thread its own, completely independent MPI communicator */
> mpierr = MPI_Comm_dup(MPI_COMM_SELF, &mpiCommRaw);
> if (mpierr != MPI_SUCCESS) throw PSI::Exception("MPI ERROR: MPI_Comm_dup");
> 
> /* attach a PETSc inner communicator to it for the PETSc objects below */
> MPI_Comm mpiComm;
> ierr = PetscCommDuplicate(mpiCommRaw, &mpiComm, &tag); CHKERRABORT(PETSC_COMM_SELF, ierr);
> 
> Tao tao;
> ierr = TaoCreate(mpiComm, &tao); CHKERRABORT(mpiComm, ierr);
> 
> ...
> ierr = TaoSolve(tao); CHKERRABORT(mpiComm, ierr);
> ...
> 
> ierr = TaoDestroy(&tao); CHKERRABORT(mpiComm, ierr);
> 
> ierr = PetscCommDestroy(&mpiComm); CHKERRABORT(PETSC_COMM_SELF, ierr);
> 
> mpierr = MPI_Comm_free(&mpiCommRaw);
> if (mpierr != MPI_SUCCESS) throw PSI::Exception("MPI ERROR: MPI_Comm_free");
> 
> 
> On Thu, Dec 20, 2018 at 4:06 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> 
> 
> > On Dec 20, 2018, at 2:24 PM, Krzysztof Kamieniecki via petsc-users <petsc-users@mcs.anl.gov> wrote:
> > 
> > Hi Alp,
> > 
> > Thanks! This worked. I reverted to v3.9.4 and, after removing the monitors
> > (which caused an error in PetscViewerASCIIPopTab), it seems to be passing
> > tests for now.
> > 
> > (For the future peanut gallery) I misread what PetscCommDuplicate does: it
> > does not duplicate PETSc communicators that already "wrap" MPI
> > communicators, so I may look into MPI and create a completely independent
> > MPI_Comm for each thread.
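> > 
> > An illustrative standalone check of that behaviour (not code from this
> > thread; written against the documented PETSc and MPI APIs, assuming an
> > MPI-enabled build): repeated PetscCommDuplicate calls on one outer
> > communicator hand back the same inner PETSc communicator, while
> > MPI_Comm_dup always produces a brand-new, merely congruent communicator,
> > which is what per-thread isolation needs.
> > 
> > #include <petscsys.h>
> > 
> > int main(int argc, char **argv)
> > {
> >   MPI_Comm       a, b, c, d;
> >   PetscMPIInt    tag;
> >   int            cmp;
> >   PetscErrorCode ierr;
> > 
> >   ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
> > 
> >   /* PetscCommDuplicate reuses the inner communicator attached to PETSC_COMM_SELF */
> >   ierr = PetscCommDuplicate(PETSC_COMM_SELF, &a, &tag); CHKERRQ(ierr);
> >   ierr = PetscCommDuplicate(PETSC_COMM_SELF, &b, &tag); CHKERRQ(ierr);  /* a == b */
> > 
> >   /* MPI_Comm_dup creates a genuinely new communicator every time */
> >   MPI_Comm_dup(MPI_COMM_SELF, &c);
> >   MPI_Comm_dup(MPI_COMM_SELF, &d);
> >   MPI_Comm_compare(c, d, &cmp);  /* MPI_CONGRUENT, not MPI_IDENT */
> >   ierr = PetscPrintf(PETSC_COMM_SELF, "same inner comm: %d, dup'd comms congruent: %d\n",
> >                      (int)(a == b), (int)(cmp == MPI_CONGRUENT)); CHKERRQ(ierr);
> > 
> >   ierr = PetscCommDestroy(&a); CHKERRQ(ierr);
> >   ierr = PetscCommDestroy(&b); CHKERRQ(ierr);
> >   MPI_Comm_free(&c);
> >   MPI_Comm_free(&d);
> >   return PetscFinalize();
> > }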
> 
>    Yes, this should work.
> > 
> > Best Regards,
> > Krys
> > 
> > On Thu, Dec 20, 2018 at 12:16 PM Dener, Alp <ade...@anl.gov> wrote:
> > Hi Krys,
> > 
> >> On Dec 20, 2018, at 10:59 AM, Krzysztof Kamieniecki via petsc-users <petsc-users@mcs.anl.gov> wrote:
> >> 
> >> That example seems to have critical sections around certain Vec calls, and
> >> it looks like my problem occurs in VecDotBegin/VecDotEnd, which are called
> >> by TAO/BLMVM.
> > 
> > The quasi-Newton matrix objects in BLMVM use asynchronous dot products in
> > the matrix-free forward and inverse product formulations. This is a
> > relatively recent performance optimization. If avoiding this split-phase
> > communication would solve the problem, and you don’t need other recent
> > PETSc features, you could revert to 3.9 and use the old version of BLMVM,
> > which uses straight VecDot operations instead.
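> > 
> > For reference, an illustrative fragment (not Alp's code) contrasting the two
> > patterns, assuming x, y and z are Vecs living on the same communicator, a and
> > b are PetscScalars, and ierr is a PetscErrorCode:
> > 
> > /* Split-phase: post several local reductions, then complete them together.
> >    All Begin/End pairs on one communicator share a single PetscSplitReduction
> >    object, which is why interleaving them from several threads can trip the
> >    ordering check. */
> > ierr = VecDotBegin(x, y, &a); CHKERRQ(ierr);
> > ierr = VecDotBegin(x, z, &b); CHKERRQ(ierr);
> > ierr = VecDotEnd(x, y, &a); CHKERRQ(ierr);
> > ierr = VecDotEnd(x, z, &b); CHKERRQ(ierr);
> > 
> > /* Blocking: each dot product completes immediately; no shared Begin/End state. */
> > ierr = VecDot(x, y, &a); CHKERRQ(ierr);
> > ierr = VecDot(x, z, &b); CHKERRQ(ierr);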
> > 
> > Unfortunately I don’t know enough about multithreading to definitively say 
> > whether that will actually solve the problem or not. Other members of the 
> > community can probably provide a more complete answer on that.
> > 
> >> 
> >> I assume PetscSplitReductionGet is pulling the PetscSplitReduction for
> >> PETSC_COMM_SELF, which is shared across the whole process?
> >> 
> >> I tried PetscCommDuplicate/PetscCommDestroy but that does not seem to help.
> >> 
> >> PetscErrorCode  VecDotBegin(Vec x,Vec y,PetscScalar *result)
> >> {
> >>   PetscErrorCode      ierr;
> >>   PetscSplitReduction *sr;
> >>   MPI_Comm            comm;
> >> 
> >>   PetscFunctionBegin;
> >>   PetscValidHeaderSpecific(x,VEC_CLASSID,1);
> >>   PetscValidHeaderSpecific(y,VEC_CLASSID,1);
> >>   ierr = PetscObjectGetComm((PetscObject)x,&comm);CHKERRQ(ierr);
> >>   ierr = PetscSplitReductionGet(comm,&sr);CHKERRQ(ierr);
> >>   if (sr->state != STATE_BEGIN) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ORDER,"Called before all VecxxxEnd() called");
> >>   if (sr->numopsbegin >= sr->maxops) {
> >>     ierr = PetscSplitReductionExtend(sr);CHKERRQ(ierr);
> >>   }
> >>   sr->reducetype[sr->numopsbegin] = PETSC_SR_REDUCE_SUM;
> >>   sr->invecs[sr->numopsbegin]     = (void*)x;
> >>   if (!x->ops->dot_local) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_SUP,"Vector does not suppport local dots");
> >>   ierr = PetscLogEventBegin(VEC_ReduceArithmetic,0,0,0,0);CHKERRQ(ierr);
> >>   ierr = (*x->ops->dot_local)(x,y,sr->lvalues+sr->numopsbegin++);CHKERRQ(ierr);
> >>   ierr = PetscLogEventEnd(VEC_ReduceArithmetic,0,0,0,0);CHKERRQ(ierr);
> >>   PetscFunctionReturn(0);
> >> }
> >> 
> >> 
> >> 
> >> 
> >> On Thu, Dec 20, 2018 at 11:26 AM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> >> 
> >>    The code src/ksp/ksp/examples/tutorials/ex61f.F90 demonstrates working 
> >> with multiple threads each managing their own collection of PETSc objects. 
> >> Hope this helps.
> >> 
> >>     Barry
> >> 
> >> 
> >> > On Dec 20, 2018, at 9:28 AM, Krzysztof Kamieniecki via petsc-users <petsc-users@mcs.anl.gov> wrote:
> >> > 
> >> > Hello All,
> >> > 
> >> > I have an embarrassingly parallel problem that I would like to use TAO
> >> > on. Is there some way to do this with threads as opposed to multiple
> >> > processes?
> >> > 
> >> > I compiled PETSc with the following flags:
> >> > ./configure \
> >> > --prefix=${DEP_INSTALL_DIR} \
> >> > --with-threadsafety --with-log=0 --download-concurrencykit \
> >> > --with-openblas=1 \
> >> > --with-openblas-dir=${DEP_INSTALL_DIR} \
> >> > --with-mpi=0 \
> >> > --with-shared=0 \
> >> > --with-debugging=0 COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3' 
> >> > 
> >> > When I run TAO in multiple threads I get the error "Called VecxxxEnd() 
> >> > in a different order or with a different vector than VecxxxBegin()"
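> >> > 
> >> > A hypothetical minimal reproducer of this kind of failure (illustrative
> >> > sketch only, not the actual application code, which drives TAO/BLMVM):
> >> > two threads doing split-phase dot products on Vecs that all live on
> >> > PETSC_COMM_SELF, so they share one PetscSplitReduction and their
> >> > Begin/End calls can interleave.
> >> > 
> >> > #include <petscvec.h>
> >> > #include <thread>
> >> > 
> >> > // Both threads' Vecs live on PETSC_COMM_SELF, so their VecDotBegin/VecDotEnd
> >> > // calls share one PetscSplitReduction and can interleave, which is exactly
> >> > // what the "different order / different vector" check guards against.
> >> > // Error checking omitted for brevity.
> >> > static void worker()
> >> > {
> >> >   Vec         x, y;
> >> >   PetscScalar d;
> >> >   for (int i = 0; i < 1000; ++i) {
> >> >     VecCreateSeq(PETSC_COMM_SELF, 16, &x);
> >> >     VecDuplicate(x, &y);
> >> >     VecSet(x, 1.0);
> >> >     VecSet(y, 2.0);
> >> >     VecDotBegin(x, y, &d);   // registers x with the shared reduction
> >> >     VecDotEnd(x, y, &d);     // may now find another thread's entry first
> >> >     VecDestroy(&x);
> >> >     VecDestroy(&y);
> >> >   }
> >> > }
> >> > 
> >> > int main(int argc, char **argv)
> >> > {
> >> >   PetscErrorCode ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
> >> >   std::thread t1(worker), t2(worker);
> >> >   t1.join();
> >> >   t2.join();
> >> >   return PetscFinalize();
> >> > }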
> >> > 
> >> > Thanks,
> >> > Krys
> >> > 
> >> 
> > 
> > —
> > Alp
> > 
> 
