Ok, having looked at this a bit more, I'm inclined to support Junchao's approach, but there seems to be concern that, even if the standards support it, there could be issues in some scenarios.
I don't have enough information to dispute this. But if this was put in many years ago, I'm interested in what the other MPI libraries do now, e.g. does Trilinos etc. use MPI_ABORT in the signal handler? If not, do users report hangs on terminate?

________________________________
From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Friday, June 5, 2020 7:26 PM
To: Hudson, Stephen Tobias P <shud...@anl.gov>
Cc: Lisandro Dalcin <dalc...@gmail.com>; petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] Terminating a process running petsc via petsc4py without mpi_abort

On Fri, Jun 5, 2020 at 3:39 PM Hudson, Stephen Tobias P via petsc-users <petsc-users@mcs.anl.gov> wrote:

> It seems I do have to bypass Python's somewhat limited multiprocessing interface. E.g. self.process._popen._send_signal(signal.SIGINT), which works, but I am bypassing the API.
>
> I would support allowing the user to configure at run-time the signal handling for SIGTERM to exit without MPI_ABORT. I think I understand MPI_ABORT being the default; I've experienced hangs due to errors on single processes.

"hangs due to errors on single processes". If the single processes call exit(), then there will be no hang.

________________________________
From: Hudson, Stephen Tobias P <shud...@anl.gov>
Sent: Friday, June 5, 2020 2:41 PM
To: Lisandro Dalcin <dalc...@gmail.com>
Cc: Balay, Satish <ba...@mcs.anl.gov>; petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

Thanks, I will experiment with this. I am working through the multiprocessing interface, but I can see that the routines provided there are pretty much wrappers to the process signal functions. I guess the alternative is SIGKILL.
Steve

________________________________
From: Lisandro Dalcin <dalc...@gmail.com>
Sent: Thursday, June 4, 2020 4:54 PM
To: Hudson, Stephen Tobias P <shud...@anl.gov>
Cc: Balay, Satish <ba...@mcs.anl.gov>; petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

(1) You can use PETSc.Sys.pushErrorHandler("abort"), but it will not help you. What you really need is to override PETSc's default signal handling.

(2) While it is true that PETSc overrides the signal handler, you can override it again from Python after "from petsc4py import PETSc".

For implementing (2), maybe you should try sending SIGINT and not SIGTERM, such that you can do the following:

    from petsc4py import PETSc

    import signal
    signal.signal(signal.SIGINT, signal.default_int_handler)

    ...

    if __name__ == "__main__":
        try:
            main()
        except KeyboardInterrupt:
            # Triggered if Ctrl+C or signaled with SIGINT
            ...  # do cleanup if needed

Otherwise, you just need

    signal.signal(signal.SIGINT, signal.SIG_DFL)

PS: I'm not in favor of changing PETSc's current signal handling behavior. This particular issue is fixable with two lines of Python code:

    from signal import signal, SIGINT, SIG_DFL
    signal(SIGINT, SIG_DFL)

On Thu, 4 Jun 2020 at 23:39, Hudson, Stephen Tobias P <shud...@anl.gov> wrote:

Lisandro,

I don't see an interface to set this through petsc4py. Is it possible?
Thanks,
Steve

________________________________
From: Hudson, Stephen Tobias P <shud...@anl.gov>
Sent: Thursday, June 4, 2020 2:47 PM
To: Balay, Satish <ba...@mcs.anl.gov>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; Lisandro Dalcin <dalc...@gmail.com>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

Sounds good. I will have a look at how to set this through petsc4py.

Thanks
Steve

________________________________
From: Satish Balay <ba...@mcs.anl.gov>
Sent: Thursday, June 4, 2020 2:32 PM
To: Hudson, Stephen Tobias P <shud...@anl.gov>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; Lisandro Dalcin <dalc...@gmail.com>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

I don't completely understand the issue here. How is sequential run different than parallel run? In both cases - a PetscErrorHandler is likely getting invoked.
One can change this behavior with:

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscPushErrorHandler.html

And there are a few default error handlers to choose from:

PETSC_EXTERN PetscErrorCode PetscTraceBackErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscIgnoreErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscEmacsClientErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscMPIAbortErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscAbortErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscAttachDebuggerErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscReturnErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);

Some of them are accessible via command line options, for example:

-on_error_abort or -on_error_mpiabort

Or perhaps you want to completely disable the signal handler with:

-no_signal_handler

cc: petsc-users

Satish

On Thu, 4 Jun 2020, Hudson, Stephen Tobias P wrote:

> Satish,
>
> We are having issues caused by MPI_abort getting called when we try to
> terminate a sub-process running petsc4py. Ideally we would always use a
> serial build of petsc/petsc4py in this mode, but many users will have a
> parallel build. We need to be able to send a terminate signal that just kills
> the process.
>
> Is there a way to turn off the mpi_abort?
>
> Thanks,
>
> Steve
>
>

--
Lisandro Dalcin
============
Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/
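
For reference, the approach Lisandro describes in the thread (reset the signal handler from Python after importing petsc4py) can be sketched end to end as below. This is only an illustrative sketch under some assumptions, not a tested recipe for any particular PETSc build: it assumes a POSIX system, it assumes that resetting SIGTERM works the same way as the SIGINT reset shown in the thread, and the names CHILD_SOURCE and run_and_terminate are made up for this example. The petsc4py import is wrapped in try/except so the sketch also runs where petsc4py is not installed.

```python
import signal
import subprocess
import sys

# Source of the child process. It imports petsc4py if available (PETSc
# installs its own signal handlers at that point), then restores the OS
# default handlers so that SIGINT/SIGTERM simply kill the process.
CHILD_SOURCE = r"""
import signal, time
try:
    from petsc4py import PETSc  # assumption: may not be installed
except ImportError:
    pass
signal.signal(signal.SIGINT, signal.SIG_DFL)
signal.signal(signal.SIGTERM, signal.SIG_DFL)
print("ready", flush=True)
time.sleep(60)  # stand-in for the real work
"""


def run_and_terminate():
    """Start the child, wait until its handlers are reset, then terminate it.

    Returns the child's return code (negative signal number on POSIX).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # Wait for the "ready" marker so the signal is not sent too early.
        for line in proc.stdout:
            if line.strip() == "ready":
                break
        proc.terminate()  # sends SIGTERM on POSIX
        return proc.wait(timeout=30)
    finally:
        if proc.poll() is None:
            proc.kill()  # last resort, equivalent to SIGKILL
        proc.stdout.close()


if __name__ == "__main__":
    print("child return code:", run_and_terminate())
```

With the default handler restored, the parent should see a negative return code equal to the signal number, which indicates the child was killed by the signal rather than exiting through an error path. The same reset could be placed at the top of a multiprocessing target function; subprocess is used here only to keep the sketch self-contained.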