On Sat, Sep 26, 2020 at 1:07 PM Matthew Knepley <knep...@gmail.com> wrote:
> On Sat, Sep 26, 2020 at 11:17 AM Mark McClure <m...@resfrac.com> wrote:
>
>> Thank you, all for the explanations.
>>
>> Following Matt's suggestion, we'll use -g (and not use -with-debugging=0)
>> all future compiles to all users, so in future, we can provide better
>> information.
>>
>> Second, Chris is going to boil our function down to minimum stub and
>> share in case there is some subtle issue with the way the functions are
>> being called.
>>
>> Third, I have question/request - Petsc is, in fact, detecting an error.
>> As far as I can tell, this is not an uncontrolled 'seg fault'. It seems to
>> me that maybe Petsc could choose to return out from the function when it
>> detects this error, returning an error code, rather than dumping the core
>> and terminating the program. If Petsc simply returned out with an error
>> message, this would resolve the problem for us. After the Petsc call, we
>> check for Petsc error messages. If Petsc returns an error - that's fine -
>> we use a direct solver as a backup, and the simulation continues. So - I am
>> not sure whether this is feasible - but if Petsc could return out with an
>> error message - rather than dumping the core and terminating the program -
>> then that would effectively resolve the issue for us. Would this change be
>> possible?
>>
>
> At some level, I think it is currently doing what you want. CHKERRQ()
> simply returns an error code from that function call, printing an error
> message. Suppressing the message is harder I think,
>

He does not need this.

> but for now, if you know what function call is causing the error, you can
> just catch the (ierr != 0) yourself instead of using CHKERRQ.
>

This is what I suggested earlier but maybe I was not clear enough.

Your code calls something like

  ierr = SNESSolve(....); CHKERRQ(ierr);

You can replace this with:

  ierr = SNESSolve(....);
  if (ierr) {
    ....
  }

I suggested something earlier to do here. Maybe call KSPView.
You could even destroy the solver, create a new one from scratch, and see
if that works.

Mark

> The drawback here is that we might not have cleaned up
> all the state so that restarting makes sense. It should be possible to
> just kill the solve, reset the solver, and retry, although it is not clear
> to me at first glance if MPI will be in an okay state.
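
To make the check-and-fall-back idea concrete, here is a minimal sketch in plain C. It only illustrates the control flow: test the return code yourself instead of passing it to CHKERRQ, and switch to the backup solver when it is nonzero. The names ErrorCode, iterative_solve_stub, direct_solve_stub and solve_with_fallback are hypothetical stand-ins, not PETSc calls. In real code the first stub would be your SNESSolve call and the second your direct solver. The caveat quoted above still applies: this does not show whether the solver state or MPI is reusable after a failure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an error-code type: 0 means success and
   nonzero means failure. This is an assumption, not the PETSc type. */
typedef int ErrorCode;

/* Hypothetical stub for the iterative solve (where SNESSolve would go).
   It fails on request so that the fallback path can be exercised. */
ErrorCode iterative_solve_stub(int should_fail, double *x)
{
  if (should_fail) return 1; /* report failure to the caller, do not abort */
  *x = 1.0;
  return 0;
}

/* Hypothetical stub for the application's backup direct solver. */
ErrorCode direct_solve_stub(double *x)
{
  *x = 2.0;
  return 0;
}

/* Try the iterative solve first. On a nonzero return code, handle it here
   and fall back to the direct solver instead of returning the error
   immediately, which is what a CHKERRQ-style macro would do. */
ErrorCode solve_with_fallback(int should_fail, double *x, int *used_fallback)
{
  ErrorCode ierr;

  assert(x != NULL && used_fallback != NULL);
  *used_fallback = 0;

  ierr = iterative_solve_stub(should_fail, x);
  if (ierr) {
    fprintf(stderr, "iterative solve failed (code %d), using direct solver\n", ierr);
    *used_fallback = 1;
    ierr = direct_solve_stub(x);
  }
  return ierr; /* nonzero only if the backup solver failed too */
}
```

Calling solve_with_fallback(1, &x, &fb) takes the failure branch and leaves fb set to 1, so the caller can tell which solver produced x and decide whether to rebuild the iterative solver before the next step.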