On Sat, Jan 7, 2017 at 4:59 PM, Manuel Valera <mval...@mail.sdsu.edu> wrote:
> I would have to think and code a MWE for this problem before sending it, since the model is much bigger than the PETSc solver. Attached here is a screenshot of the debugger as Barry taught me, is that the stack trace you need?
>
> The ucmsMain.f90:522 that shows up is the call (from all processes) to the routine that updates the RHS vector (from root) and scatters it (from all processes).

Yes, so one process is here and the other has moved on, so there is a mismatch in calls. You could do what Barry suggests, but I think it would be better to just step through your main routine once (it's slow going) and see where the divergence happens.

   Matt

> This routine is itself inside a double loop that occurs in all processes, but the only call from all processes to the solver is this one; the rest of the loop, which involves correcting for velocities, pressure and temperature, all happens in the root node.
>
> Sorry for the convoluted program design, this is the first beta version of the model working in parallel and it was the best I could come up with; I suppose it makes more sense in serial.
>
> Thanks
>
> On Sat, Jan 7, 2017 at 2:24 PM, Matthew Knepley <knep...@gmail.com> wrote:
>
>> On Sat, Jan 7, 2017 at 4:20 PM, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>>
>>> Thank you Matthew,
>>>
>>> On Sat, Jan 7, 2017 at 1:49 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>>
>>>> On Sat, Jan 7, 2017 at 3:32 PM, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>>>>
>>>>> Hi Devs, hope you are having a great weekend,
>>>>>
>>>>> I could finally parallelize my linear solver and implement it into the rest of the code in a way that only the linear system is solved in parallel. Great news for my team, but there is a catch: I don't see any speedup in the linear system. I don't know if it's the MPI on the cluster we are using, but I'm not sure how to debug it.
>>>>
>>>> We need to see -log_view output for any performance question.
>>>>
>>>>> On the other hand, and because of this issue, I was trying to do -log_summary or -log_view, and I noticed the program in this context hangs when it is time to produce the log. If I debug this with 2 cores, process 0 exits normally but process 1 hangs in the VecScatterBegin() with SCATTER_REVERSE way back in the code,
>>>>
>>>> You are calling a collective routine from only 1 process.
>>>>
>>>>    Matt
>>>
>>> I am pretty confident this is not the case,
>>
>> This is still the simplest explanation. Can you send the stack trace for the 2 process run?
>>
>>> the calls to VecScatterCreateToZero and VecScatterBegin are made in all processes, the program goes through all of the iterations of the linear solver, writes output correctly and even closes all the PETSc objects without complaining; the freeze occurs at the very end, when the log is to be produced.
>>
>> If you can send us a code to run, we can likely find the error.
>>
>>   Thanks,
>>
>>     Matt
>>
>>> Thanks,
>>>
>>> Manuel
>>>
>>>>> and even after destroying all associated objects and calling PetscFinalize(), so I'm really clueless as to why this is, as it only happens for the -log_* or -ksp_view options.
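To make the kind of mismatch Matt is describing concrete, here is a minimal Fortran sketch; it is illustrative only and not taken from Manuel's code. The names ctr, bp0 and bp2 follow the snippets quoted later in this thread, and the guard condition is hypothetical.

      ! Hypothetical guard that happens to be true on some ranks and false on
      ! others (e.g. a loop bound or branch that only root skips):
      if (inner_iteration_needed) then
         ! VecScatterBegin/End are collective on the communicator behind ctr,
         ! so every rank must reach them.  A rank that gets here while another
         ! rank has skipped the branch blocks waiting for data that never
         ! arrives; the rank that skipped it runs ahead, and the deadlock only
         ! becomes visible at the next collective operation it reaches, for
         ! example the output produced by -log_view or -ksp_view during
         ! PetscFinalize().
         call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
         call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
      endif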
>>>>> my -ksp_view shows this:
>>>>>
>>>>> KSP Object: 2 MPI processes
>>>>>   type: gcr
>>>>>     GCR: restart = 30
>>>>>     GCR: restarts performed = 20
>>>>>   maximum iterations=10000, initial guess is zero
>>>>>   tolerances:  relative=1e-14, absolute=1e-50, divergence=10000.
>>>>>   right preconditioning
>>>>>   using UNPRECONDITIONED norm type for convergence test
>>>>> PC Object: 2 MPI processes
>>>>>   type: bjacobi
>>>>>     block Jacobi: number of blocks = 2
>>>>>     Local solve is same for all blocks, in the following KSP and PC objects:
>>>>>   KSP Object: (sub_) 1 MPI processes
>>>>>     type: preonly
>>>>>     maximum iterations=10000, initial guess is zero
>>>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>     left preconditioning
>>>>>     using NONE norm type for convergence test
>>>>>   PC Object: (sub_) 1 MPI processes
>>>>>     type: ilu
>>>>>       ILU: out-of-place factorization
>>>>>       0 levels of fill
>>>>>       tolerance for zero pivot 2.22045e-14
>>>>>       matrix ordering: natural
>>>>>       factor fill ratio given 1., needed 1.
>>>>>         Factored matrix follows:
>>>>>           Mat Object: 1 MPI processes
>>>>>             type: seqaij
>>>>>             rows=100000, cols=100000
>>>>>             package used to perform factorization: petsc
>>>>>             total: nonzeros=1675180, allocated nonzeros=1675180
>>>>>             total number of mallocs used during MatSetValues calls =0
>>>>>               not using I-node routines
>>>>>     linear system matrix = precond matrix:
>>>>>     Mat Object: 1 MPI processes
>>>>>       type: seqaij
>>>>>       rows=100000, cols=100000
>>>>>       total: nonzeros=1675180, allocated nonzeros=1675180
>>>>>       total number of mallocs used during MatSetValues calls =0
>>>>>         not using I-node routines
>>>>>   linear system matrix = precond matrix:
>>>>>   Mat Object: 2 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=200000, cols=200000
>>>>>     total: nonzeros=3373340, allocated nonzeros=3373340
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>>
>>>>> And I configured my PC object as:
>>>>>
>>>>>   call PCSetType(mg,PCHYPRE,ierr)
>>>>>   call PCHYPRESetType(mg,'boomeramg',ierr)
>>>>>
>>>>>   call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_nodal_coarsen','1',ierr)
>>>>>   call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_vec_interp_variant','1',ierr)
>>>>>
>>>>> What are your thoughts?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Manuel
>>>>>
>>>>> On Fri, Jan 6, 2017 at 1:58 PM, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>>>>>
>>>>>> Awesome, that did it, thanks once again.
>>>>>>
>>>>>> On Fri, Jan 6, 2017 at 1:53 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>>>
>>>>>>>    Take the scatter out of the if () since everyone does it and get rid of the VecView().
>>>>>>>
>>>>>>>    Does this work? If not, where is it hanging?
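In code, Barry's suggestion corresponds to something like the following sketch. It is assembled from the snippets quoted in this thread rather than copied from the model, reusing the same names (bp0, bp2, ctr, nbdp, ind, Rhs, rankl).

      if (rankl == 0) then
         ! Only root holds entries of the sequential vector bp0, so only root
         ! sets them.  VecSetValues and VecAssemblyBegin/End on a VECSEQ
         ! vector are local operations and may stay inside the branch.
         call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
         call VecAssemblyBegin(bp0,ierr); call VecAssemblyEnd(bp0,ierr)
         CHKERRQ(ierr)
      endif

      ! The scatter is collective: every rank calls it, outside the branch,
      ! with no VecView() on bp0 in between.
      call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
      call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
      CHKERRQ(ierr)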
>>>>>>> > On Jan 6, 2017, at 3:29 PM, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>>>>>>> >
>>>>>>> > Thanks Dave,
>>>>>>> >
>>>>>>> > I think it is interesting that it never gave an error on this; after adding the VecAssembly calls it still shows the same behavior, without complaining. I did:
>>>>>>> >
>>>>>>> > if(rankl==0)then
>>>>>>> >
>>>>>>> >   call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>>> >   call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr);
>>>>>>> >   CHKERRQ(ierr)
>>>>>>> >
>>>>>>> > endif
>>>>>>> >
>>>>>>> > call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>> > call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>> > print*,"done! "
>>>>>>> > CHKERRQ(ierr)
>>>>>>> >
>>>>>>> > CHKERRQ(ierr)
>>>>>>> >
>>>>>>> > Thanks.
>>>>>>> >
>>>>>>> > On Fri, Jan 6, 2017 at 12:44 PM, Dave May <dave.mayhe...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > On 6 January 2017 at 20:24, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>>>>>>> > Great help Barry, I had totally overlooked that option (it is explicit in the VecScatterBegin help page but not in VecScatterCreateToZero's, as I read later).
>>>>>>> >
>>>>>>> > So I used that and it works partially: it scatters the values assigned in root but not the rest, and if I call VecScatterBegin from outside root it hangs. The code currently looks like this:
>>>>>>> >
>>>>>>> > call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)
>>>>>>> >
>>>>>>> > call PetscObjectSetName(bp0, 'bp0:',ierr)
>>>>>>> >
>>>>>>> > if(rankl==0)then
>>>>>>> >
>>>>>>> >   call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>>> >
>>>>>>> >   call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>> >
>>>>>>> > You need to call
>>>>>>> >
>>>>>>> > VecAssemblyBegin(bp0);
>>>>>>> > VecAssemblyEnd(bp0);
>>>>>>> >
>>>>>>> > after your last call to VecSetValues() before you can do any operations with bp0.
>>>>>>> >
>>>>>>> > With your current code, the call to VecView should produce an error if you used the error checking macro CHKERRQ(ierr) (as should VecScatter{Begin,End}).
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Dave
>>>>>>> >
>>>>>>> >   call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>> >   call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>> >   print*,"done! "
>>>>>>> >   CHKERRQ(ierr)
>>>>>>> >
>>>>>>> > endif
>>>>>>> >
>>>>>>> > ! call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>> > ! call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>>>>>> >
>>>>>>> > call VecView(bp2,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>> >
>>>>>>> > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>>>> >
>>>>>>> > call exit()
>>>>>>> >
>>>>>>> > And the output is: (with bp the right answer)
>>>>>>> >
>>>>>>> > Vec Object:bp: 2 MPI processes
>>>>>>> >   type: mpi
>>>>>>> > Process [0]
>>>>>>> > 1.
>>>>>>> > 2.
>>>>>>> > Process [1]
>>>>>>> > 4.
>>>>>>> > 3.
>>>>>>> > Vec Object:bp2: 2 MPI processes (before scatter)
>>>>>>> >   type: mpi
>>>>>>> > Process [0]
>>>>>>> > 0.
>>>>>>> > 0.
>>>>>>> > Process [1]
>>>>>>> > 0.
>>>>>>> > 0.
>>>>>>> > Vec Object:bp0: 1 MPI processes
>>>>>>> >   type: seq
>>>>>>> > 1.
>>>>>>> > 2.
>>>>>>> > 4.
>>>>>>> > 3.
>>>>>>> > done!
>>>>>>> > Vec Object:bp2: 2 MPI processes (after scatter)
>>>>>>> >   type: mpi
>>>>>>> > Process [0]
>>>>>>> > 1.
>>>>>>> > 2.
>>>>>>> > Process [1]
>>>>>>> > 0.
>>>>>>> > 0.
>>>>>>> >
>>>>>>> > Thanks immensely for your help,
>>>>>>> >
>>>>>>> > Manuel
>>>>>>> >
>>>>>>> > On Thu, Jan 5, 2017 at 4:39 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>>>> >
>>>>>>> > > On Jan 5, 2017, at 6:21 PM, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>>>>>>> > >
>>>>>>> > > Hello Devs, it's me again,
>>>>>>> > >
>>>>>>> > > I'm trying to distribute a vector to all the participating processes; the vector would originally be in root as a sequential vector and I would like to scatter it. What would be the best call to do this?
>>>>>>> > >
>>>>>>> > > I already know how to gather a distributed vector to root with VecScatterCreateToZero; this would be the inverse operation,
>>>>>>> >
>>>>>>> >    Use the same VecScatter object but with SCATTER_REVERSE; note you need to reverse the two vector arguments as well.
>>>>>>> >
>>>>>>> > > I'm currently trying with VecScatterCreate() and as of now I'm doing the following:
>>>>>>> > >
>>>>>>> > > if(rank==0)then
>>>>>>> > >
>>>>>>> > >   call VecCreate(PETSC_COMM_SELF,bp0,ierr); CHKERRQ(ierr)  !if i use WORLD
>>>>>>> > >                                                            !freezes in SetSizes
>>>>>>> > >   call VecSetSizes(bp0,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>>>>>>> > >   call VecSetType(bp0,VECSEQ,ierr)
>>>>>>> > >   call VecSetFromOptions(bp0,ierr); CHKERRQ(ierr)
>>>>>>> > >
>>>>>>> > >   call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>>>> > >
>>>>>>> > >   !call VecSet(bp0,5.0D0,ierr); CHKERRQ(ierr)
>>>>>>> > >
>>>>>>> > >   call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>> > >
>>>>>>> > >   call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr)  !rhs
>>>>>>> > >
>>>>>>> > >   do i=0,nbdp-1,1
>>>>>>> > >     ind(i+1) = i
>>>>>>> > >   enddo
>>>>>>> > >
>>>>>>> > >   call ISCreateGeneral(PETSC_COMM_SELF,nbdp,ind,PETSC_COPY_VALUES,locis,ierr)
>>>>>>> > >
>>>>>>> > >   !call VecScatterCreate(bp0,PETSC_NULL_OBJECT,bp2,is,ctr,ierr)  !if i use SELF
>>>>>>> > >                                                                  !freezes here.
>>>>>>> > >
>>>>>>> > >   call VecScatterCreate(bp0,locis,bp2,PETSC_NULL_OBJECT,ctr,ierr)
>>>>>>> > >
>>>>>>> > > endif
>>>>>>> > >
>>>>>>> > > bp2 being the receptor MPI vector to scatter to
>>>>>>> > >
>>>>>>> > > But it freezes in VecScatterCreate when trying to use more than one processor. What would be a better approach?
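For reference, the approach Barry points to above can be sketched as follows, reusing the names bp2, bp0 and ctr from the quoted code; this is an illustration of the calls involved, not an excerpt from the model.

      ! Both scatter-creation routines are collective on the communicator of
      ! bp2, so neither may be called from inside an if(rank==0) branch.
      ! VecScatterCreateToZero also creates bp0 for you: a sequential vector
      ! holding all of bp2 on rank 0 and of length zero on the other ranks,
      ! which makes the manual VecCreate/ISCreateGeneral/VecScatterCreate
      ! sequence above unnecessary.
      call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)

      ! Gather the distributed bp2 onto root:
      call VecScatterBegin(ctr,bp2,bp0,INSERT_VALUES,SCATTER_FORWARD,ierr)
      call VecScatterEnd(ctr,bp2,bp0,INSERT_VALUES,SCATTER_FORWARD,ierr)

      ! Distribute root's bp0 back out to bp2: same scatter context, but with
      ! SCATTER_REVERSE and the two vector arguments reversed.
      call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
      call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)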
>>>>>>> > >
>>>>>>> > > Thanks once again,
>>>>>>> > >
>>>>>>> > > Manuel
>>>>>>> > >
>>>>>>> > > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>>>>>>> > > Thanks, I had no idea how to debug and read those logs; that solved this issue at least (I was sending a message from root to everyone else, but trying to receive from everyone else including root).
>>>>>>> > >
>>>>>>> > > Until next time, many thanks,
>>>>>>> > >
>>>>>>> > > Manuel
>>>>>>> > >
>>>>>>> > > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>>>>>> > > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>>>>>>> > > I did a PetscBarrier just before calling the vector creation routine and I'm pretty sure I'm calling it from every processor; the code looks like this:
>>>>>>> > >
>>>>>>> > > From the gdb trace.
>>>>>>> > >
>>>>>>> > > Proc 0: Is in some MPI routine you call yourself, line 113
>>>>>>> > >
>>>>>>> > > Proc 1: Is in VecCreate(), line 130
>>>>>>> > >
>>>>>>> > > You need to fix your communication code.
>>>>>>> > >
>>>>>>> > >    Matt
>>>>>>> > >
>>>>>>> > > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>>>> > >
>>>>>>> > > print*,'entering POInit from',rank
>>>>>>> > > !call exit()
>>>>>>> > >
>>>>>>> > > call PetscObjsInit()
>>>>>>> > >
>>>>>>> > > And output gives:
>>>>>>> > >
>>>>>>> > > entering POInit from 0
>>>>>>> > > entering POInit from 1
>>>>>>> > > entering POInit from 2
>>>>>>> > > entering POInit from 3
>>>>>>> > >
>>>>>>> > > Still hangs in the same way,
>>>>>>> > >
>>>>>>> > > Thanks,
>>>>>>> > >
>>>>>>> > > Manuel
>>>>>>> > >
>>>>>>> > > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>>>>>>> > > Thanks for the answers!
>>>>>>> > >
>>>>>>> > > Here's the screenshot of what I got from bt in gdb (great hint on how to debug in PETSc, didn't know that).
>>>>>>> > >
>>>>>>> > > I don't really know what to look at here,
>>>>>>> > >
>>>>>>> > > Thanks,
>>>>>>> > >
>>>>>>> > > Manuel
>>>>>>> > >
>>>>>>> > > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <dave.mayhe...@gmail.com> wrote:
>>>>>>> > > Are you certain ALL ranks in PETSC_COMM_WORLD call these function(s)? These functions cannot be inside if statements like
>>>>>>> > > if (rank == 0){
>>>>>>> > >   VecCreateMPI(...)
>>>>>>> > > }
>>>>>>> > >
>>>>>>> > > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>>>>>>> > > Thanks Dave for the quick answer, appreciate it,
>>>>>>> > >
>>>>>>> > > I just tried that and it didn't make a difference, any other suggestions?
>>>>>>> > >
>>>>>>> > > Thanks,
>>>>>>> > > Manuel
>>>>>>> > >
>>>>>>> > > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <dave.mayhe...@gmail.com> wrote:
>>>>>>> > > You need to swap the order of your function calls.
>>>>>>> > > Call VecSetSizes() before VecSetType().
>>>>>>> > >
>>>>>>> > > Thanks,
>>>>>>> > > Dave
>>>>>>> > >
>>>>>>> > > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <mval...@mail.sdsu.edu> wrote:
>>>>>>> > > Hello all, happy new year,
>>>>>>> > >
>>>>>>> > > I'm working on parallelizing my code. It worked and provided some results when I just called more than one processor, but it created artifacts because I didn't need one image of the whole program in each processor, conflicting with each other.
>>>>>>> > >
>>>>>>> > > Since the pressure solver is the main part I need in parallel, I'm choosing MPI to run everything on the root processor until it's time to solve for pressure; at this point I'm trying to create a distributed vector using either
>>>>>>> > >
>>>>>>> > >   call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr)
>>>>>>> > > or
>>>>>>> > >   call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
>>>>>>> > >   call VecSetType(xp,VECMPI,ierr)
>>>>>>> > >   call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>>>>>>> > >
>>>>>>> > > In both cases the program hangs at this point, something that never happened with the naive approach I described before. I've made sure the global size, nbdp, is the same in every processor. What can be wrong?
>>>>>>> > >
>>>>>>> > > Thanks for your kind help,
>>>>>>> > >
>>>>>>> > > Manuel.

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
   -- Norbert Wiener
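As a closing reference, the creation order Dave describes above can be sketched like this, reusing the names xp and nbdp from the quoted code; it is an illustration of the call sequence, not a verified fix for the model.

      ! Every rank in PETSC_COMM_WORLD must make these (collective) calls;
      ! set the sizes before the type, or let VecCreateMPI do both at once.
      call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
      call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
      call VecSetType(xp,VECMPI,ierr); CHKERRQ(ierr)

      ! or, equivalently:
      ! call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr)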