On Sat, Sep 26, 2020 at 5:44 PM Mark Adams <mfad...@lbl.gov> wrote:

>
> On Sat, Sep 26, 2020 at 1:07 PM Matthew Knepley <knep...@gmail.com> wrote:
>
>> On Sat, Sep 26, 2020 at 11:17 AM Mark McClure <m...@resfrac.com> wrote:
>>
>>> Thank you, all for the explanations.
>>>
>>> Following Matt's suggestion, we'll use -g (and not use
>>> -with-debugging=0) all future compiles to all users, so in future, we can
>>> provide better information.
>>>
>>> Second, Chris is going to boil our function down to minimum stub and
>>> share in case there is some subtle issue with the way the functions are
>>> being called.
>>>
>>> Third, I have question/request - Petsc is, in fact, detecting an error.
>>> As far as I can tell, this is not an uncontrolled 'seg fault'. It seems to
>>> me that maybe Petsc could choose to return out from the function when it
>>> detects this error, returning an error code, rather than dumping the core
>>> and terminating the program. If Petsc simply returned out with an error
>>> message, this would resolve the problem for us. After the Petsc call, we
>>> check for Petsc error messages. If Petsc returns an error - that's fine -
>>> we use a direct solver as a backup, and the simulation continues. So - I am
>>> not sure whether this is feasible - but if Petsc could return out with an
>>> error message - rather than dumping the core and terminating the program -
>>> then that would effectively resolve the issue for us. Would this change be
>>> possible?
>>>
>>
>> At some level, I think it is currently doing what you want. CHKERRQ()
>> simply returns an error code from that function call, printing an error
>> message. Suppressing the message is harder I think,
>>
>
> He does not need this.
>
>
>> but for now, if you know what function call is causing the error, you can
>> just catch the (ierr != 0) yourself instead of using CHKERRQ.
>>
>
> This is what I suggested earlier but maybe I was not clear enough.
>
> Your code calls something like
>
> ierr = SNESSolve(....); CHKERRQ(ierr);
>
> You can replace this with:
>
> ierr = SNESSolve(....);
> if (ierr) {

How to deal with CHKERRQ(ierr); inside SNESSolve()?

> ....
> }
>
> I suggested something earlier to do here. Maybe call KSPView. You could
> even destroy the solver and start the solver from scratch and see if that
> works.
>
> Mark
>
>
>> The drawback here is that we might not have cleaned up
>> all the state so that restarting makes sense. It should be possible to
>> just kill the solve, reset the solver, and retry, although it is not clear
>> to me at first glance if MPI will be in an okay state.
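
On the CHKERRQ(ierr) question: going by Matt's description above, CHKERRQ() prints a message and returns the error code from the function it sits in, so a check inside the solver should only pass the code one level up rather than terminate the program. Below is a minimal sketch of that propagation pattern. It is not PETSc code: CHECK, ErrCode, the mock_* functions and fallback_direct_solve are hypothetical stand-ins made up for illustration, and it says nothing about signal handling or core dumps.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins, NOT the PETSc API. They only mimic how an
   error code travels up through nested CHKERRQ-style checks. */
typedef int ErrCode;

/* Like CHKERRQ as described above: on a nonzero code, print a message
   and return the code to the caller. */
#define CHECK(ierr)                                                 \
  do {                                                              \
    if (ierr) {                                                     \
      fprintf(stderr, "error %d in %s()\n", (int)(ierr), __func__); \
      return (ierr);                                                \
    }                                                               \
  } while (0)

/* Innermost routine: fails with code 77 when asked to. */
ErrCode inner_kernel(int fail) { return fail ? 77 : 0; }

/* Plays the role of a linear solve called from the nonlinear solve. */
ErrCode mock_linear_solve(int fail)
{
  ErrCode ierr = inner_kernel(fail); CHECK(ierr);
  return 0;
}

/* Plays the role of SNESSolve(): its internal CHECK only hands the
   code up to whoever called it. */
ErrCode mock_nonlinear_solve(int fail)
{
  ErrCode ierr = mock_linear_solve(fail); CHECK(ierr);
  return 0;
}

/* Hypothetical backup path, e.g. a direct solver. Always succeeds here. */
ErrCode fallback_direct_solve(void) { return 0; }

/* Application level: test the code by hand instead of using CHECK,
   so control stays in this function and the backup can run. */
ErrCode solve_with_fallback(int fail, int *used_fallback)
{
  assert(used_fallback != NULL);
  *used_fallback = 0;

  ErrCode ierr = mock_nonlinear_solve(fail);
  if (ierr) {
    *used_fallback = 1;
    ierr = fallback_direct_solve();
  }
  return ierr;
}
```

In the real code the same shape would be ierr = SNESSolve(....); followed by if (ierr) { .... } as Mark wrote. The messages printed by the inner checks still appear, and whether the solver objects can be reused afterwards is the open point Matt raises above.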