Re: [petsc-users] [petsc-maint] petsc ksp solver hangs

2019-09-30 Thread Smith, Barry F. via petsc-users


   This is just a memory leak in hypre; you might report it to them.  

   Memory leaks don't cause hangs.

   Barry


> On Sep 30, 2019, at 12:32 AM, Michael Wick  
> wrote:
> 
> Hi Barry:
> 
> Thanks! I can capture an issue from my local run, although I am not 100% sure 
> this is what causes the code to hang.
> 
> When I run with -pc_hypre_boomeramg_relax_type_all Chebyshev, valgrind 
> captures a memory leak:
> 
> ==4410== 192 bytes in 8 blocks are indirectly lost in loss record 1 of 5
> ==4410==at 0x4C2FB55: calloc (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==4410==by 0x73FED84: hypre_HostMalloc (hypre_memory.c:192)
> ==4410==by 0x73FEE53: hypre_MAllocWithInit (hypre_memory.c:301)
> ==4410==by 0x73FEF1A: hypre_CAlloc (hypre_memory.c:338)
> ==4410==by 0x726E4C4: hypre_ParCSRRelax_Cheby_Setup (par_cheby.c:70)
> ==4410==by 0x7265A4C: hypre_BoomerAMGSetup (par_amg_setup.c:2738)
> ==4410==by 0x7240F96: HYPRE_BoomerAMGSetup (HYPRE_parcsr_amg.c:52)
> ==4410==by 0x694FFC2: PCSetUp_HYPRE (hypre.c:322)
> ==4410==by 0x69DBE0F: PCSetUp (precon.c:923)
> ==4410==by 0x6B2BDDC: KSPSetUp (itfunc.c:381)
> ==4410==by 0x6B2DABF: KSPSolve (itfunc.c:612)
> 
> Best,
> 
> Mike
> 
> 
> On Sun, Sep 29, 2019 at 9:24 AM Smith, Barry F.  wrote:
> 
>    If you have TotalView or DDT or some other parallel debugger, you can wait 
> until it is "hanging" and then send a signal to one or more of the processes 
> to stop them, and from this get the stack trace. You'll have to figure out 
> how that is done for your debugger.
> 
>    If you can start your 72 rank job in "interactive" mode, you can launch it 
> with the option -start_in_debugger noxterm -debugger_nodes 0; then it will 
> only start the debugger on the first rank. Now wait until it hangs, press 
> Control-C, and then you can type bt to get the traceback.
> 
>   Barry
> 
>   Note it is possible to run 72 rank jobs even on a 
> laptop/workstation/non-cluster (so long as they don't use too much memory 
> and don't take too long to get to the hang point), and then you can use the 
> debugger as I indicated above.
> 
> 
> > On Sep 28, 2019, at 5:32 AM, Michael Wick via petsc-maint 
> >  wrote:
> > 
> > I attached a debugger to my run. The code just hangs without throwing an 
> > error message, interestingly. I use 72 processors. I turned on the KSP 
> > monitor, and I can see it hangs either at the beginning or the end of a KSP 
> > iteration. I also used valgrind to debug my code on my local machine, which 
> > does not detect any issue. I use fgmres + fieldsplit, which is really a 
> > standard option.
> > 
> > Do you have any suggestions?
> > 
> > On Fri, Sep 27, 2019 at 8:17 PM Zhang, Junchao  wrote:
> > How many MPI ranks did you use? If it is done on your desktop, you can just 
> > attach a debugger to an MPI process to see what is going on.
> > 
> > --Junchao Zhang
> > 
> > 
> > On Fri, Sep 27, 2019 at 4:24 PM Michael Wick via petsc-maint 
> >  wrote:
> > Hi PETSc:
> > 
> > I have been experiencing code stagnation at certain KSP iterations. This 
> > happens rather randomly, which means the code may stop in the middle of a 
> > KSP solve and hang there.
> > 
> > I have used valgrind and detected nothing. I just wonder if you have any 
> > suggestions.
> > 
> > Thanks!!!
> > M
> 



Re: [petsc-users] [petsc-maint] petsc ksp solver hangs

2019-09-29 Thread Smith, Barry F. via petsc-users


   If you have TotalView or DDT or some other parallel debugger, you can wait 
until it is "hanging" and then send a signal to one or more of the processes 
to stop them, and from this get the stack trace. You'll have to figure out how 
that is done for your debugger.
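
   If you don't have a parallel debugger handy, attaching plain gdb to a single 
hung rank gives much the same stack. A rough sketch (here ./app is only a 
placeholder for your executable name):

      gdb -p $(pgrep -f ./app | head -n 1)
      (gdb) bt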

   If you can start your 72 rank job in "interactive" mode, you can launch it 
with the option -start_in_debugger noxterm -debugger_nodes 0; then it will only 
start the debugger on the first rank. Now wait until it hangs, press Control-C, 
and then you can type bt to get the traceback.
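
   For example, a launch along these lines (the launcher and the name ./app are 
placeholders for whatever your job actually uses):

      mpiexec -n 72 ./app -start_in_debugger noxterm -debugger_nodes 0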

  Barry

  Note it is possible to run 72 rank jobs even on a 
laptop/workstation/non-cluster (so long as they don't use too much memory and 
don't take too long to get to the hang point), and then you can use the 
debugger as I indicated above.


> On Sep 28, 2019, at 5:32 AM, Michael Wick via petsc-maint 
>  wrote:
> 
> I attached a debugger to my run. The code just hangs without throwing an 
> error message, interestingly. I use 72 processors. I turned on the KSP 
> monitor, and I can see it hangs either at the beginning or the end of a KSP 
> iteration. I also used valgrind to debug my code on my local machine, which 
> does not detect any issue. I use fgmres + fieldsplit, which is really a 
> standard option.
> 
> Do you have any suggestions?
> 
> On Fri, Sep 27, 2019 at 8:17 PM Zhang, Junchao  wrote:
> How many MPI ranks did you use? If it is done on your desktop, you can just 
> attach a debugger to an MPI process to see what is going on.
> 
> --Junchao Zhang
> 
> 
> On Fri, Sep 27, 2019 at 4:24 PM Michael Wick via petsc-maint 
>  wrote:
> Hi PETSc:
> 
> I have been experiencing code stagnation at certain KSP iterations. This 
> happens rather randomly, which means the code may stop in the middle of a KSP 
> solve and hang there.
> 
> I have used valgrind and detected nothing. I just wonder if you have any 
> suggestions.
> 
> Thanks!!!
> M



Re: [petsc-users] [petsc-maint] petsc ksp solver hangs

2019-09-29 Thread Mark Adams via petsc-users
On Sun, Sep 29, 2019 at 1:30 AM Michael Wick via petsc-maint <
petsc-ma...@mcs.anl.gov> wrote:

> Thank you all for the reply.
>
> I am trying to get the backtrace. However, the code hangs totally
> randomly, and it hangs only when I run large simulations (e.g. 72 CPUs for
> this one). I am trying very hard to get the error message.
>
> So far, I can pinpoint that the issue is related to hypre and a static
> build of the PETSc library. Switching to a dynamic build works fine so far.
> Also, using a plain gmres works. Has anyone seen similar issues before?
>

I've never heard of a problem like this. You might try deleting your
architecture directory (essentially a make clean) and reconfiguring.
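
A rough sketch of that, assuming PETSC_ARCH=arch-linux-c-opt (substitute your
own arch name and your usual configure options):

   cd $PETSC_DIR
   rm -rf arch-linux-c-opt
   ./configure PETSC_ARCH=arch-linux-c-opt --with-debugging=yes
   make all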

If dynamic builds work, is there any reason not to just do that and move on?


>
> On Sat, Sep 28, 2019 at 6:28 AM Stefano Zampini 
> wrote:
>
>> In my experience, a hanging execution may result from SETERRQ being
>> called with the wrong communicator. Anyway, it would be useful to get the
>> output of -log_trace.
>>
>> Also, does it hang when -pc_type none is specified?
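>>
>> For example (with ./app standing in for your executable):
>>
>>    mpiexec -n 72 ./app -log_trace
>>    mpiexec -n 72 ./app -pc_type none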
>>
>> On Sat, Sep 28, 2019, at 16:22, Zhang, Junchao via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>>
>>> Does it hang with 2 or 4 processes? Which PETSc version do you use
>>> (using the latest is easier for us to debug)? Did you configure PETSc with
>>> --with-debugging=yes COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g"?
>>> After attaching gdb to one process, you can use bt to see its stack
>>> trace.
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Sat, Sep 28, 2019 at 5:33 AM Michael Wick <
>>> michael.wick.1...@gmail.com> wrote:
>>>
 I attached a debugger to my run. The code just hangs without throwing
 an error message, interestingly. I use 72 processors. I turned on the KSP
 monitor, and I can see it hangs either at the beginning or the end of a KSP
 iteration. I also used valgrind to debug my code on my local machine, which
 does not detect any issue. I use fgmres + fieldsplit, which is really a
 standard option.

 Do you have any suggestions?

 On Fri, Sep 27, 2019 at 8:17 PM Zhang, Junchao 
 wrote:

> How many MPI ranks did you use? If it is done on your desktop, you can
> just attach a debugger to an MPI process to see what is going on.
>
> --Junchao Zhang
>
>
> On Fri, Sep 27, 2019 at 4:24 PM Michael Wick via petsc-maint <
> petsc-ma...@mcs.anl.gov> wrote:
>
>> Hi PETSc:
>>
>> I have been experiencing code stagnation at certain KSP iterations.
>> This happens rather randomly, which means the code may stop in the middle
>> of a KSP solve and hang there.
>>
>> I have used valgrind and detected nothing. I just wonder if you have
>> any suggestions.
>>
>> Thanks!!!
>> M
>>
>


Re: [petsc-users] [petsc-maint] petsc ksp solver hangs

2019-09-28 Thread Zhang, Junchao via petsc-users
Does it hang with 2 or 4 processes? Which PETSc version do you use (using the 
latest is easier for us to debug)? Did you configure PETSc with 
--with-debugging=yes COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g"?
After attaching gdb to one process, you can use bt to see its stack trace.
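
For reference, that debug configure might look like this (keep whatever other
options you normally pass):

   ./configure --with-debugging=yes COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g"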

--Junchao Zhang


On Sat, Sep 28, 2019 at 5:33 AM Michael Wick 
<michael.wick.1...@gmail.com> wrote:
I attached a debugger to my run. The code just hangs without throwing an error 
message, interestingly. I use 72 processors. I turned on the KSP monitor, and I 
can see it hangs either at the beginning or the end of a KSP iteration. I also 
used valgrind to debug my code on my local machine, which does not detect any 
issue. I use fgmres + fieldsplit, which is really a standard option.

Do you have any suggestions?

On Fri, Sep 27, 2019 at 8:17 PM Zhang, Junchao 
<jczh...@mcs.anl.gov> wrote:
How many MPI ranks did you use? If it is done on your desktop, you can just 
attach a debugger to an MPI process to see what is going on.

--Junchao Zhang


On Fri, Sep 27, 2019 at 4:24 PM Michael Wick via petsc-maint 
<petsc-ma...@mcs.anl.gov> wrote:
Hi PETSc:

I have been experiencing code stagnation at certain KSP iterations. This 
happens rather randomly, which means the code may stop in the middle of a KSP 
solve and hang there.

I have used valgrind and detected nothing. I just wonder if you have any 
suggestions.

Thanks!!!
M