Re: [petsc-users] Automatically re-solving after MUMPS error

Barry Smith Thu, 01 Oct 2015 17:14:27 -0700
  Excellent, we'll make the change in PETSc and do our own testing, and 
eventually add functionality to PETSc for returning cleanly from a failed MUMPS call.
> On Oct 1, 2015, at 7:04 PM, Matt Landreman <matt.landre...@gmail.com> wrote:
> 
> Hi Barry,
> Your suggestion of removing the "if (mumps->CleanUpMUMPS)" in mumps.c did 
> resolve the problem for me.
> Thanks,
> -Matt
> 
> On Wed, Sep 30, 2015 at 6:28 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> 
>   Matt,
> 
>    Please try the following: in mumps.c, edit
> 
> #undef __FUNCT__
> #define __FUNCT__ "MatDestroy_MUMPS"
> PetscErrorCode MatDestroy_MUMPS(Mat A)
> {
>   Mat_MUMPS      *mumps=(Mat_MUMPS*)A->spptr;
>   PetscErrorCode ierr;
> 
>   PetscFunctionBegin;
>   if (mumps->CleanUpMUMPS) {
> 
>   Remove this if () test and always execute the cleanup code that 
> follows it. Let us know whether this resolves the problem.
> 
> Thanks
> 
>    Barry
> 
> This CleanUpMUMPS flag has always been goofy and definitely needs to be 
> removed; the only question is whether some other changes are needed when 
> it is removed.
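A minimal plain-C sketch of why a cleanup guarded by such a flag can leak, assuming (as the symptoms in this thread suggest) that the flag is set only once factorization succeeds. FactorCtx, factor(), destroy() and live_buffers are hypothetical names; this is not the actual PETSc MatDestroy_MUMPS() code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical factor context; NOT PETSc's Mat_MUMPS. */
typedef struct {
  double *triples;  /* workspace allocated before factorization starts */
  int     cleanup;  /* set only after factorization succeeds */
} FactorCtx;

static long live_buffers = 0;  /* workspaces allocated but not yet freed */

/* Allocate the workspace, then "factor". On failure, cleanup stays 0. */
static int factor(FactorCtx *ctx, size_t n, int succeed)
{
  assert(ctx->triples == NULL && ctx->cleanup == 0);  /* fresh context */
  ctx->triples = malloc(n * sizeof *ctx->triples);
  if (!ctx->triples) return -1;
  live_buffers++;
  if (!succeed) return -9;  /* mimic MUMPS INFO(1) = -9 */
  ctx->cleanup = 1;
  return 0;
}

/* guarded != 0 mimics the old "if (cleanup)" test; guarded == 0 always frees. */
static void destroy(FactorCtx *ctx, int guarded)
{
  if (guarded && !ctx->cleanup) return;  /* failed factorization: workspace leaks */
  if (ctx->triples) {
    free(ctx->triples);
    ctx->triples = NULL;
    live_buffers--;
  }
  ctx->cleanup = 0;
}
```

With guarded set to 1 (the old behavior), a failed factor() leaves its workspace allocated after destroy(); with guarded set to 0 (the change suggested above), the workspace is freed whether or not factorization succeeded.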
> 
> 
> > On Sep 30, 2015, at 4:59 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> >
> >
> >  Matt,
> >
> >   Yes, you must be right. The MatDestroy() on the "partially factored" 
> > matrix should clean up everything properly, but it sounds like it does 
> > not. I'll look at it right now, but I only have a few minutes; if I can't 
> > resolve it quickly, it may take a day or two.
> >
> >
> >  Barry
> >
> >> On Sep 30, 2015, at 4:10 PM, Matt Landreman <matt.landre...@gmail.com> 
> >> wrote:
> >>
> >> Hi Barry,
> >> I tried adding PetscMallocDump after SNESDestroy as you suggested. When 
> >> MUMPS fails, PetscMallocDump shows a number of mallocs that are absent 
> >> when MUMPS succeeds, the largest coming from 
> >> MatConvertToTriples_mpiaij_mpiaij() (line 638 in 
> >> petsc-3.6.0/src/mat/impls/aij/mpi/mumps/mumps.c). The total memory 
> >> reported by PetscMallocDump after SNESDestroy is substantially (>20x) 
> >> larger when MUMPS fails than when MUMPS succeeds, and this amount 
> >> increases uniformly with each MUMPS failure. So I think some of the 
> >> MUMPS-related structures are not being deallocated by SNESDestroy if 
> >> MUMPS generates an error.
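PetscMallocDump() can list unfreed allocations because PETSc's tracing allocator records each allocation it hands out. The sketch below is a much simpler, hypothetical analogue in plain C (track_malloc(), track_free() and leaky_attempt() are invented names, not PETSc API) showing how a leak on every failed attempt appears as outstanding memory that grows by the same amount each time, the pattern reported here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of each block; the union keeps user data aligned. */
typedef union {
  size_t      size;
  max_align_t align;
} Header;

static size_t outstanding_bytes = 0;  /* bytes allocated and not yet freed */

static void *track_malloc(size_t n)
{
  Header *h = malloc(sizeof(Header) + n);
  if (!h) return NULL;
  h->size = n;
  outstanding_bytes += n;
  return h + 1;  /* user memory starts right after the header */
}

static void track_free(void *p)
{
  if (!p) return;
  Header *h = (Header *)p - 1;
  assert(outstanding_bytes >= h->size);
  outstanding_bytes -= h->size;
  free(h);
}

/* Hypothetical solve attempt: on failure it returns without freeing its workspace. */
static int leaky_attempt(size_t bytes, int fail)
{
  void *work = track_malloc(bytes);
  if (!work) return -1;
  if (fail) return -9;  /* workspace leaked: outstanding_bytes grows by `bytes` */
  track_free(work);
  return 0;
}
```

After three failed calls to leaky_attempt(1000, 1), outstanding_bytes is 3000, while a successful call leaves it unchanged: a uniform increase per failure points at a leak on the error path rather than at allocator fragmentation.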
> >> Thanks,
> >> -Matt
> >>
> >> On Wed, Sep 30, 2015 at 2:16 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> >>
> >>> On Sep 30, 2015, at 1:06 PM, Matt Landreman <matt.landre...@gmail.com> 
> >>> wrote:
> >>>
> >>> PETSc developers,
> >>>
> >>> I tried implementing a system for automatically increasing MUMPS 
> >>> ICNTL(14), along the lines described in this recent thread. If SNESSolve 
> >>> returns ierr .ne. 0 due to MUMPS error -9, I call SNESDestroy, 
> >>> re-initialize SNES, call MatMumpsSetIcntl with a larger value of 
> >>> ICNTL(14), call SNESSolve again, and repeat as needed. The procedure 
> >>> works, but the peak memory required (as measured by the HPC system) is 
> >>> 50%-100% higher if the MUMPS solve has to be repeated than when MUMPS 
> >>> works on the first try (by starting with a large ICNTL(14)), even 
> >>> though SNESDestroy is called between the attempts. Are there some 
> >>> PETSc or MUMPS structures which would not be deallocated immediately by 
> >>> SNESDestroy?  If so, how do I deallocate them?
> >>
> >>   They should all be destroyed automatically for you. You can call 
> >> PetscMallocDump() after the SNES is destroyed to check whether all of that 
> >> memory has been properly freed.
> >>
> >>   My guess is that your new malloc() with the bigger workspace cannot 
> >> "reuse" the space that was previously freed, so to the OS it looks like 
> >> you are using a lot more space even though, in terms of physical memory, 
> >> you are not using more.
> >>
> >>  Barry
> >>
> >>>
> >>> Thanks,
> >>> Matt Landreman
> >>>
> >>>
> >>> On Tue, Sep 15, 2015 at 7:47 AM, David Knezevic 
> >>> <david.kneze...@akselos.com> wrote:
> >>> On Tue, Sep 15, 2015 at 7:29 PM, Matthew Knepley <knep...@gmail.com> 
> >>> wrote:
> >>> On Tue, Sep 15, 2015 at 4:30 AM, David Knezevic 
> >>> <david.kneze...@akselos.com> wrote:
> >>> In some cases, I get MUMPS error -9, i.e.:
> >>> [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
> >>> INFO(1)=-9, INFO(2)=98927
> >>>
> >>> This is easily fixed by re-running the executable with 
> >>> -mat_mumps_icntl_14 on the command line.
> >>>
> >>> However, I would like to update my code to do this automatically, 
> >>> i.e., detect the -9 error and re-run with the appropriate option. Is 
> >>> there a recommended way to do this? It seems to me that I could use a 
> >>> PETSc error handler (e.g. PetscPushErrorHandler) to call a function 
> >>> that sets the appropriate option and solves again; is that right? Are 
> >>> there any examples that illustrate this type of thing?
> >>>
> >>> I would not use the error handler. I would just check the ierr return 
> >>> code from the solver. I think you need the
> >>> INFO output, for which you can use MatMumpsGetInfo().
> >>>
> >>>
> >>> OK, that sounds good (and much simpler than what I had in mind), thanks 
> >>> for the help!
> >>>
> >>> David
> >>>
> >>>
> >>
> >>
> >
> 
>