Re: [petsc-users] [KSP] PETSc not reporting a KSP fail when true residual is NaN

2022-03-07 Thread Barry Smith

   The fix for the problem Giovane encountered is in 
https://gitlab.com/petsc/petsc/-/merge_requests/4934 



> On Mar 3, 2022, at 11:24 AM, Giovane Avancini  wrote:
> 
> Sorry for my late reply Barry,
> 
> Sure, I can share the code with you, but unfortunately I don't know how to 
> make Docker images. If you don't mind, you can clone the code from GitHub 
> using this link: g...@github.com:giavancini/runPFEM.git
> It can be easily compiled with CMake, and you can see the dependencies in 
> README.md. Please let me know if you need any other information.
> 
> Kind regards,
> 
> Giovane
> 
> On Fri, Feb 25, 2022 at 6:22 PM, Barry Smith wrote:
> 
>  Hmm, this is going to be tricky to debug why the Inf/NaN is not found 
> when it should be. 
> 
>  In a debugger you can catch/trap floating point exceptions (how to do 
> this depends on your debugger) and then step through the code after that to 
> see why PETSc KSP is not properly noting the Inf/NaN and returning. This may 
> be cumbersome to do if you don't know PETSc well. Is your code easy to build, 
> and would you be willing to share it with me so I can run it and debug 
> directly? If you know how to make Docker images or something, you might be 
> able to give it to me easily.
> 
>   Barry
> 
> 
>> On Feb 25, 2022, at 3:59 PM, Giovane Avancini wrote:
>> 
>> Mark, Matthew and Barry,
>> 
>> Thank you all for the quick responses.
>> 
>> Others might have a better idea, but you could run with '-info :ksp' and see 
>> if you see any messages like "Linear solver has created a not a number (NaN) 
>> as the residual norm, declaring divergence \n"
>> You could also run with -log_trace and see if it is using 
>> KSPConvergedDefault. I'm not sure if this is the method used given your 
>> parameters, but I think it is.
>> Mark, I ran with both options. I didn't get any messages like "Linear solver 
>> has created a not a number..." when using -info :ksp. When turning on 
>> -log_trace, I could verify that it is using KSPConvergedDefault, but what 
>> does that mean exactly? When FGMRES converges with the true residual being 
>> NaN, I get the following message: [0] KSPConvergedDefault(): Linear solver 
>> has converged. Residual norm 8.897908325511e-05 is less than relative 
>> tolerance 1.e-08 times initial right hand side norm 
>> 1.466597558465e+04 at iteration 53. No information about NaN whatsoever.
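
As a side note, one way to see the problem directly, independent of what
KSPConvergedDefault reports, is to recompute the true residual after the solve
and test its norm for Inf/NaN. Below is a minimal, hedged sketch; the helper
name CheckTrueResidual is made up (it is not part of runPFEM or PETSc), while
the individual PETSc calls are standard API.

  #include <petscksp.h>

  /* Hypothetical helper: recompute r = b - A x after KSPSolve() and flag an
     Inf/NaN true residual norm explicitly. */
  PetscErrorCode CheckTrueResidual(Mat A, Vec b, Vec x, PetscBool *isbad)
  {
    Vec            r;
    PetscReal      rnorm;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = VecDuplicate(b, &r);CHKERRQ(ierr);
    ierr = MatMult(A, x, r);CHKERRQ(ierr);      /* r = A x     */
    ierr = VecAYPX(r, -1.0, b);CHKERRQ(ierr);   /* r = b - A x */
    ierr = VecNorm(r, NORM_2, &rnorm);CHKERRQ(ierr);
    *isbad = PetscIsInfOrNanReal(rnorm) ? PETSC_TRUE : PETSC_FALSE;
    ierr = PetscPrintf(PETSC_COMM_WORLD, "true residual norm %g\n", (double)rnorm);CHKERRQ(ierr);
    ierr = VecDestroy(&r);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }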
>> 
>> We check for NaN or Inf, for example, in KSPCheckDot(). If you have the KSP 
>> set to error 
>> (https://petsc.org/main/docs/manualpages/KSP/KSPSetErrorIfNotConverged.html), 
>> then we throw an error, but the return codes do not seem to be checked in 
>> your implementation. If not, then we set the flag for divergence.
>> Matthew, I do not check the return code in this case because I don't want 
>> PETSc to stop if an error occurs during the solving step. I just want to 
>> know that it didn't converge and treat this error inside my code. The 
>> problem is that the flag for divergence is not always being set when FGMRES 
>> is not converging. I was just wondering why it was set during time step 921 
>> and why not for time step 922 as well.
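
For reference, a minimal sketch of handling the failure on the application
side rather than erroring out: with KSPSetErrorIfNotConverged() left at its
default, KSPSolve() returns normally and the code inspects the convergence
reason itself. The wrapper name SolveAndCheck is an assumption; KSPSolve,
KSPGetConvergedReason and KSP_DIVERGED_NANORINF are PETSc API.

  #include <petscksp.h>

  /* Hypothetical wrapper: solve, then query the convergence reason instead of
     letting PETSc raise an error. reason < 0 means divergence; the Inf/NaN
     case is KSP_DIVERGED_NANORINF. */
  PetscErrorCode SolveAndCheck(KSP ksp, Vec b, Vec x, PetscBool *converged)
  {
    KSPConvergedReason reason;
    PetscErrorCode     ierr;

    PetscFunctionBeginUser;
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
    *converged = (PetscBool)(reason > 0);
    if (reason == KSP_DIVERGED_NANORINF) {
      ierr = PetscPrintf(PETSC_COMM_WORLD, "KSP diverged with an Inf/NaN residual norm\n");CHKERRQ(ierr);
    }
    PetscFunctionReturn(0);
  }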
>> 
>> Thanks for the complete report. It looks like we may be missing a check in 
>> our FGMRES implementation, allowing the iteration to continue after a 
>> NaN/Inf. 
>> 
>> I will explain how we handle the checking and then attach a patch that 
>> you can apply to see if it resolves the problem. Whenever our KSP solvers 
>> compute a norm, we check after that calculation to verify that the norm is 
>> not an Inf or NaN. This is an inexpensive global check across all MPI ranks 
>> because immediately after the norm computation all ranks that share the KSP 
>> have the same value. If the norm is an Inf or NaN, we "short-circuit" the 
>> KSP solve and return immediately with an appropriate not-converged code. A 
>> quick eyeball inspection of the FGMRES code found a missing check. 
>> 
>>You can apply the attached patch file in the PETSC_DIR with 
>> 
>> patch -p1 < fgmres.patch
>> make libs
>> 
>> then rerun your code and see if it now handles the Inf/NaN correctly. If so 
>> we'll patch our release branch with the fix.
>> Thank you for checking this, Barry. I applied the patch exactly the way you 
>> instructed; however, the problem is still happening. Is there a way to check 
>> whether the patch was in fact applied? You can see the terminal output in 
>> the attached screenshot.
>> 
>> Kind regards,
>> 
>> Giovane
>> 

Re: [petsc-users] SLEPc solve: progress info and abort option

2022-03-07 Thread Matthew Knepley
On Mon, Mar 7, 2022 at 6:23 AM Jose E. Roman  wrote:

>
>
> > On Mar 7, 2022, at 12:00 PM, Varun Hiremath wrote:
> >
> > Thanks, Matt and Jose! I have added a custom function to KSPMonitorSet,
> and that improves the response time for the abort option; however, it is
> still a bit slow for very big problems. I think that is probably because I
> am using the MUMPS direct solver, so a large amount of time is likely spent
> inside MUMPS. And I am guessing there is no way to get the progress info of
> MUMPS from PETSc?
>

Yes, we do not have a way of looking into MUMPS. You might see if they have
a suggestion.

  Thanks,

 Matt


> > Jose, for the progress bar I am using the number of converged
> eigenvalues (nconv) as obtained using the EPSMonitorSet function. But this is
> slow, as it is called only once per iteration, and typically many
> eigenvalues converge within an iteration, so is there any way to get more
> detailed/finer info on the solver progress?
>
> It is typical that Krylov solvers converge several eigenvalues at once.
> You can look at the residual norm of the first unconverged eigenvalue to see
> "how far" you are from convergence. But convergence may be irregular. You
> can also try reducing the ncv parameter, so that the monitor is called more
> often, but this will probably slow down convergence.
>
> Jose
>
>
> >
> > Many thanks for your help.
> >
> > Thanks,
> > Varun
> >
> > On Fri, Mar 4, 2022 at 11:36 AM Jose E. Roman wrote:
> > Yes, assuming that the eigensolver is calling KSPSolve(), you can set a
> monitor with KSPMonitorSet(). This will be called more often than the
> callback for EPSSetStoppingTestFunction().
> >
> > Jose
> >
> > > On Mar 4, 2022, at 8:16 PM, Matthew Knepley wrote:
> > >
> > >
> > > On Fri, Mar 4, 2022 at 2:07 PM Varun Hiremath wrote:
> > > Hi All,
> > >
> > > We use SLEPc to compute eigenvalues of big problems which typically
> takes a long time. We want to add a progress bar to inform the user of the
> estimated time remaining to finish the computation. In addition, we also
> want to add an option for the user to abort the computation midway if
> needed.
> > >
> > > To some extent, I am able to do these by attaching a custom function
> to EPSSetStoppingTestFunction and using nconv/nev as an indication of
> progress, and throwing an exception when the user decides to abort the
> computation. However, since this function gets called only once every
> iteration, for very big problems it takes a long time for the program to
> respond. I was wondering if there is any other function to which I can
> attach, which gets called more frequently and can provide more fine-grained
> information on the progress.
> > >
> > > I believe (Jose can correct me) that the bulk of the time in an
> iterate would be in the linear solve. You can insert something into a
> KSPMonitor. If you know the convergence tolerance and assume a linear
> convergence rate I guess you could estimate the "amount done".
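
One hedged way to turn that assumption into a number (an editorial sketch, not
something PETSc computes for you): if the KSP residual norm decreases roughly
geometrically from its initial value r_0 toward the target rtol * r_0, then at
a current norm r_k the fraction of the solve completed is approximately
log(r_0 / r_k) / log(1 / rtol), i.e. orders of magnitude already gained divided
by orders of magnitude required.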
> > >
> > >   Thanks,
> > >
> > >  Matt
> > >
> > > Thanks,
> > > Varun
> > >
> > >
> > > --
> > > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > > -- Norbert Wiener
> > >
> > > https://www.cse.buffalo.edu/~knepley/
> >
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] SLEPc solve: progress info and abort option

2022-03-07 Thread Jose E. Roman



> On Mar 7, 2022, at 12:00 PM, Varun Hiremath wrote:
> 
> Thanks, Matt and Jose! I have added a custom function to KSPMonitorSet, and 
> that improves the response time for the abort option; however, it is still a 
> bit slow for very big problems. I think that is probably because I am 
> using the MUMPS direct solver, so a large amount of time is likely spent 
> inside MUMPS. And I am guessing there is no way to get the progress info of 
> MUMPS from PETSc?
> 
> Jose, for the progress bar I am using the number of converged eigenvalues 
> (nconv) as obtained using the EPSMonitorSet function. But this is slow, as it 
> is called only once per iteration, and typically many eigenvalues converge 
> within an iteration, so is there any way to get more detailed/finer info on 
> the solver progress?

It is typical that Krylov solvers converge several eigenvalues at once. You can 
look at the residual norm of the first unconverged eigenvalue to see "how far" 
you are from convergence. But convergence may be irregular. You can also try 
reducing the ncv parameter, so that the monitor is called more often, but this 
will probably slow down convergence.

Jose
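
To make that concrete, here is a minimal sketch of such a monitor. The name
ProgressMonitor and the message format are assumptions; EPSMonitorSet,
EPSGetDimensions, and the monitor signature are SLEPc API. It reports nconv
against the number of requested eigenvalues and the error estimate of the
first unconverged one.

  #include <slepceps.h>

  /* Hypothetical progress monitor: called once per outer iteration with the
     current error estimates. */
  static PetscErrorCode ProgressMonitor(EPS eps, PetscInt its, PetscInt nconv,
                                        PetscScalar *eigr, PetscScalar *eigi,
                                        PetscReal *errest, PetscInt nest, void *ctx)
  {
    PetscInt       nev;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = EPSGetDimensions(eps, &nev, NULL, NULL);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "EPS iteration %d: %d/%d converged, next error estimate %g\n",
                       (int)its, (int)nconv, (int)nev,
                       (nconv < nest) ? (double)errest[nconv] : 0.0);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  /* Attach it before EPSSolve(), e.g.: EPSMonitorSet(eps, ProgressMonitor, NULL, NULL); */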


> 
> Many thanks for your help.
> 
> Thanks,
> Varun
> 
> On Fri, Mar 4, 2022 at 11:36 AM Jose E. Roman wrote:
> Yes, assuming that the eigensolver is calling KSPSolve(), you can set a 
> monitor with KSPMonitorSet(). This will be called more often than the 
> callback for EPSSetStoppingTestFunction(). 
> 
> Jose
> 
> > On Mar 4, 2022, at 8:16 PM, Matthew Knepley wrote:
> > 
> > 
> > On Fri, Mar 4, 2022 at 2:07 PM Varun Hiremath wrote:
> > Hi All,
> > 
> > We use SLEPc to compute eigenvalues of big problems which typically takes a 
> > long time. We want to add a progress bar to inform the user of the 
> > estimated time remaining to finish the computation. In addition, we also 
> > want to add an option for the user to abort the computation midway if 
> > needed. 
> > 
> > To some extent, I am able to do these by attaching a custom function to 
> > EPSSetStoppingTestFunction and using nconv/nev as an indication of 
> > progress, and throwing an exception when the user decides to abort the 
> > computation. However, since this function gets called only once every 
> > iteration, for very big problems it takes a long time for the program to 
> > respond. I was wondering if there is any other function to which I can 
> > attach, which gets called more frequently and can provide more fine-grained 
> > information on the progress.
> > 
> > I believe (Jose can correct me) that the bulk of the time in an iterate 
> > would be in the linear solve. You can insert something into a KSPMonitor. 
> > If you know the convergence tolerance and assume a linear convergence rate 
> > I guess you could estimate the "amount done".
> > 
> >   Thanks,
> > 
> >  Matt
> >  
> > Thanks,
> > Varun
> > 
> > 
> > -- 
> > What most experimenters take for granted before they begin their 
> > experiments is infinitely more interesting than any results to which their 
> > experiments lead.
> > -- Norbert Wiener
> > 
> > https://www.cse.buffalo.edu/~knepley/ 
> > 
> 



Re: [petsc-users] SLEPc solve: progress info and abort option

2022-03-07 Thread Varun Hiremath
Thanks, Matt and Jose! I have added a custom function to KSPMonitorSet, and
that improves the response time for the abort option; however, it is
still a bit slow for very big problems. I think that is probably
because I am using the MUMPS direct solver, so a large amount of time
is likely spent inside MUMPS. And I am guessing there is no way to get the
progress info of MUMPS from PETSc?
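
For reference, a minimal sketch of that kind of abort monitor, under the
assumption of an application-side flag: AbortCtx, AbortMonitor, and
abort_requested are made-up names, while KSPMonitorSet, the monitor signature,
and SETERRQ are PETSc API. Because the flag is polled on every inner KSP
iteration, the solve can be interrupted without waiting for the next outer
eigensolver iteration.

  #include <petscksp.h>

  typedef struct {
    volatile int *abort_requested;   /* assumed to be set asynchronously by the application */
  } AbortCtx;

  /* Called once per KSP iteration; raising an error here unwinds the solve. */
  static PetscErrorCode AbortMonitor(KSP ksp, PetscInt it, PetscReal rnorm, void *ctx)
  {
    AbortCtx *actx = (AbortCtx *)ctx;

    PetscFunctionBeginUser;
    if (*actx->abort_requested) {
      SETERRQ(PetscObjectComm((PetscObject)ksp), PETSC_ERR_USER, "Computation aborted by user");
    }
    PetscFunctionReturn(0);
  }

  /* Registration: KSPMonitorSet(ksp, AbortMonitor, &abort_ctx, NULL); */

In a C++ application one could instead throw an exception from the monitor and
catch it around EPSSolve(), which is essentially the approach described for
the stopping test in the quoted message below.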

Jose, for the progress bar I am using the number of converged eigenvalues
(nconv) as obtained using the EPSMonitorSet function. But this is slow, as it
is called only once per iteration, and typically many eigenvalues converge
within an iteration, so is there any way to get more detailed/finer info on
the solver progress?

Many thanks for your help.

Thanks,
Varun

On Fri, Mar 4, 2022 at 11:36 AM Jose E. Roman  wrote:

> Yes, assuming that the eigensolver is calling KSPSolve(), you can set a
> monitor with KSPMonitorSet(). This will be called more often than the
> callback for EPSSetStoppingTestFunction().
>
> Jose
>
> > On Mar 4, 2022, at 8:16 PM, Matthew Knepley wrote:
> >
> >
> > On Fri, Mar 4, 2022 at 2:07 PM Varun Hiremath wrote:
> > Hi All,
> >
> > We use SLEPc to compute eigenvalues of big problems which typically
> takes a long time. We want to add a progress bar to inform the user of the
> estimated time remaining to finish the computation. In addition, we also
> want to add an option for the user to abort the computation midway if
> needed.
> >
> > To some extent, I am able to do these by attaching a custom function to
> EPSSetStoppingTestFunction and using nconv/nev as an indication of
> progress, and throwing an exception when the user decides to abort the
> computation. However, since this function gets called only once every
> iteration, for very big problems it takes a long time for the program to
> respond. I was wondering if there is any other function to which I can
> attach, which gets called more frequently and can provide more fine-grained
> information on the progress.
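
A minimal sketch of that stopping-test approach, assuming a hypothetical
user_abort flag: MyStoppingTest and user_abort are made-up names, while
EPSSetStoppingTestFunction, EPSStoppingBasic, and EPS_CONVERGED_USER are SLEPc
API. It prints nconv/nev as coarse progress and asks the solver to stop
cleanly, rather than throwing, when the flag is set.

  #include <slepceps.h>

  static volatile int user_abort = 0;   /* assumed to be set elsewhere when the user aborts */

  /* Hypothetical stopping test: report coarse progress, stop early on request,
     otherwise defer to the default test. */
  static PetscErrorCode MyStoppingTest(EPS eps, PetscInt its, PetscInt max_it,
                                       PetscInt nconv, PetscInt nev,
                                       EPSConvergedReason *reason, void *ctx)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = PetscPrintf(PETSC_COMM_WORLD, "progress: %d of %d eigenvalues converged\n",
                       (int)nconv, (int)nev);CHKERRQ(ierr);
    if (user_abort) {
      *reason = EPS_CONVERGED_USER;   /* stop cleanly with whatever has converged so far */
      PetscFunctionReturn(0);
    }
    ierr = EPSStoppingBasic(eps, its, max_it, nconv, nev, reason, ctx);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  /* Registration: EPSSetStoppingTestFunction(eps, MyStoppingTest, NULL, NULL); */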
> >
> > I believe (Jose can correct me) that the bulk of the time in an iterate
> would be in the linear solve. You can insert something into a KSPMonitor.
> If you know the convergence tolerance and assume a linear convergence rate
> I guess you could estimate the "amount done".
> >
> >   Thanks,
> >
> >  Matt
> >
> > Thanks,
> > Varun
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
>
>