OK, having looked at this a bit more, I'm inclined to support Junchao's 
approach, but there seems to be a concern that, even if the standards support 
it, there could be issues in some scenarios.

I don't have enough information to dispute this. But if this was put in many 
years ago, I'm interested in what other MPI-based libraries do now, e.g. does 
Trilinos use MPI_ABORT in its signal handler? If not, do users report hangs on 
termination?
________________________________
From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Friday, June 5, 2020 7:26 PM
To: Hudson, Stephen Tobias P <shud...@anl.gov>
Cc: Lisandro Dalcin <dalc...@gmail.com>; petsc-users@mcs.anl.gov 
<petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] Terminating a process running petsc via petsc4py 
without mpi_abort



On Fri, Jun 5, 2020 at 3:39 PM Hudson, Stephen Tobias P via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
It seems I do have to bypass Python's somewhat limited multiprocessing 
interface, e.g.:

self.process._popen._send_signal(signal.SIGINT)

which works, but I am bypassing the public API.
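
One way to avoid the private _popen attribute, sketched here under the assumption of a POSIX system: multiprocessing.Process exposes the child's pid, and os.kill can deliver SIGINT to it. The worker below is only a stand-in for the petsc4py job (petsc4py itself is not imported), and the names _worker and interrupt_child, as well as the exit code 3, are made up for this illustration.

```python
import multiprocessing
import os
import signal
import sys
import time


def _worker(ready):
    # Restore Python's default SIGINT behaviour (raise KeyboardInterrupt).
    # In the real application this would come after
    # `from petsc4py import PETSc`, which is omitted here.
    signal.signal(signal.SIGINT, signal.default_int_handler)
    try:
        ready.set()          # tell the parent the handler is installed
        while True:
            time.sleep(0.1)  # stand-in for the real computation
    except KeyboardInterrupt:
        sys.exit(3)          # arbitrary exit code, chosen for this demo


def interrupt_child(timeout=10.0):
    """Start a child, send it SIGINT via os.kill, return its exit code."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX system
    ready = ctx.Event()
    process = ctx.Process(target=_worker, args=(ready,))
    process.start()
    if not ready.wait(timeout):
        process.kill()
        process.join()
        raise RuntimeError("child did not start in time")
    os.kill(process.pid, signal.SIGINT)  # public API, no _popen access
    process.join(timeout)
    return process.exitcode
```

The ready event is there so that the signal is not sent before the child has installed its handler.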

I would support allowing the user to configure at run time the signal handling 
for SIGTERM so that it exits without MPI_ABORT. I think I understand why 
MPI_ABORT is the default; I've experienced hangs due to errors on single 
processes.

"hangs due to errors on single processes": if the single processes call exit(), 
then there will be no hang.


________________________________
From: Hudson, Stephen Tobias P <shud...@anl.gov>
Sent: Friday, June 5, 2020 2:41 PM
To: Lisandro Dalcin <dalc...@gmail.com>
Cc: Balay, Satish <ba...@mcs.anl.gov>; petsc-users@mcs.anl.gov 
<petsc-users@mcs.anl.gov>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

Thanks, I will experiment with this.

I am working through the multiprocessing interface, but I can see that the 
routines provided there are pretty much wrappers to the process signal 
functions.

I guess the alternative is SIGKILL.

Steve
________________________________
From: Lisandro Dalcin <dalc...@gmail.com>
Sent: Thursday, June 4, 2020 4:54 PM
To: Hudson, Stephen Tobias P <shud...@anl.gov>
Cc: Balay, Satish <ba...@mcs.anl.gov>; petsc-users@mcs.anl.gov 
<petsc-users@mcs.anl.gov>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

(1) You can use PETSc.Sys.pushErrorHandler("abort"), but it will not help you. 
What you really need is to override PETSc's default signal handling.

(2) While it is true that PETSc overrides the signal handler, you can override 
it again from Python after from petsc4py import PETSc.

For implementing (2), maybe you should try sending SIGINT and not SIGTERM, so 
that you can do the following:

from petsc4py import PETSc

import signal
signal.signal(signal.SIGINT, signal.default_int_handler)

...

if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt: # Triggered if Ctrl+C or signaled with SIGINT
        ... # do cleanup if needed

Otherwise, you just need signal.signal(signal.SIGINT, signal.SIG_DFL)


PS: I'm not in favor of changing PETSc's current signal handling behavior.
This particular issue is fixable with two lines of Python code:

from signal import signal, SIGINT, SIG_DFL
signal(SIGINT, SIG_DFL)
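
The same idea should carry over to SIGTERM, which is what multiprocessing's terminate() sends on POSIX. The following is a rough sketch rather than tested petsc4py code: petsc4py is not imported, _library_style_handler is a hypothetical stand-in for whatever handler a library installs at import time, and the "fork" start method assumes a POSIX system.

```python
import multiprocessing
import os
import signal
import time


def _library_style_handler(signum, frame):
    # Hypothetical stand-in for a handler installed by a library at import.
    os._exit(99)


def _worker(ready):
    # Simulate the library taking over SIGTERM ...
    signal.signal(signal.SIGTERM, _library_style_handler)
    # ... and then put SIGTERM back to the operating system's default
    # action, which simply ends the process.
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    ready.set()              # tell the parent the handlers are in place
    while True:
        time.sleep(0.1)      # stand-in for the real computation


def terminate_child(timeout=10.0):
    """Start a child, call terminate() on it, return its exit code."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX system
    ready = ctx.Event()
    process = ctx.Process(target=_worker, args=(ready,))
    process.start()
    if not ready.wait(timeout):
        process.kill()
        process.join()
        raise RuntimeError("child did not start in time")
    process.terminate()      # sends SIGTERM on POSIX
    process.join(timeout)
    return process.exitcode
```

A negative exit code equal to -signal.SIGTERM indicates that the child was ended by the signal itself rather than by the stand-in handler.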



On Thu, 4 Jun 2020 at 23:39, Hudson, Stephen Tobias P <shud...@anl.gov> wrote:
Lisandro,

I don't see an interface to set this through petsc4py. Is it possible?

Thanks,
Steve
________________________________
From: Hudson, Stephen Tobias P <shud...@anl.gov>
Sent: Thursday, June 4, 2020 2:47 PM
To: Balay, Satish <ba...@mcs.anl.gov>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; Lisandro Dalcin 
<dalc...@gmail.com>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

Sounds good. I will have a look at how to set this through petsc4py.

Thanks
Steve
________________________________
From: Satish Balay <ba...@mcs.anl.gov>
Sent: Thursday, June 4, 2020 2:32 PM
To: Hudson, Stephen Tobias P <shud...@anl.gov>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; Lisandro Dalcin 
<dalc...@gmail.com>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

I don't completely understand the issue here. How is a sequential run different 
from a parallel run?

In both cases, a PetscErrorHandler is likely getting invoked. One can change 
this behavior with:

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscPushErrorHandler.html

And there are a few default error handlers to choose from:


PETSC_EXTERN PetscErrorCode PetscTraceBackErrorHandler(MPI_Comm,int,
    const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscIgnoreErrorHandler(MPI_Comm,int,
    const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscEmacsClientErrorHandler(MPI_Comm,int,
    const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscMPIAbortErrorHandler(MPI_Comm,int,
    const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscAbortErrorHandler(MPI_Comm,int,
    const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscAttachDebuggerErrorHandler(MPI_Comm,int,
    const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscReturnErrorHandler(MPI_Comm,int,
    const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);

Some of them are accessible via command-line options, for example: 
-on_error_abort or -on_error_mpiabort

Or perhaps you want to completely disable the signal handler with: 
-no_signal_handler
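
When the PETSc code runs in a child process launched from Python, one possible way to pass such an option without touching the child's command line is the PETSC_OPTIONS environment variable, which PETSc reads at initialization. The sketch below only builds the environment; the helper name env_with_petsc_option is made up for this example, and whether a given petsc4py build honors the option this way is an assumption that should be checked.

```python
import os


def env_with_petsc_option(option, base_env=None):
    """Return a copy of the environment with `option` added to PETSC_OPTIONS."""
    # Copy so that the caller's (or the current process's) environment
    # is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PETSC_OPTIONS", "").strip()
    # Append to any options that are already set instead of replacing them.
    env["PETSC_OPTIONS"] = f"{existing} {option}".strip()
    return env
```

The returned dictionary could then be given as the env argument of subprocess.Popen or subprocess.run when starting the child.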

cc: petsc-users

Satish

On Thu, 4 Jun 2020, Hudson, Stephen Tobias P wrote:

> Satish,
>
> We are having issues caused by MPI_abort getting called when we try to 
> terminate a sub-process running petsc4py. Ideally we would always use a 
> serial build of petsc/petsc4py in this mode, but many users will have a 
> parallel build. We need to be able to send a terminate signal that just kills 
> the process.
>
> Is there a way to turn off the mpi_abort?
>
> Thanks,
>
> Steve
>
>



--
Lisandro Dalcin
============
Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/
