A question was raised on this list a short while ago about potentially incorrect behavior by ORTE/OMPI when SIGUSR2 is sent to application procs. I have spent some time chasing this down, and it does -not- appear to be an issue within our systems.
What I have found is that if you send a SIGUSR1/2 to mpirun, mpirun and the daemons correctly forward the signal to the application processes. Neither mpirun nor the daemons respond to it themselves.

If the application process has defined its own signal handler to trap USR1/2, then the handler runs as expected. Everything seems to work fine - the daemon does -not- get a callback, nor does it take any action in response to the proc having received this signal - unless the process' signal handler orders the process to exit! In this case, the environment reports to the orted that the process exited during a signal handler, which results in a terminated-by-signal status.

You can, of course, get around this by simply not exiting from within the signal handler. Instead, set a flag and return from the handler, then have an appropriate routine check the flag and exit. I have done that in several codes and would be happy to advise you on how to do it. With this technique, you clear the signal and the environment will not report you as terminated-by-signal.

However, if the application process has -not- defined its own signal handler, some native environments terminate the process when it receives SIGUSR1/2! This occurred for me under SLURM on the odin cluster, and under TM on our RRZ cluster. I cannot say it is a universal situation and would welcome more feedback from people with access to other environments.

This termination is dutifully reported to the orted, which notes that the proc was terminated-by-signal. The orted does not check to see -which- signal was used to terminate the proc. By our own design requirements, the response to a termination-by-signal of a process is to abort the job. If we want to modify that, it would be simple to say "except if it was a SIGUSR1/2 signal".
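
To illustrate what such an exception might look like, here is a minimal sketch in C. It is -not- the orted's actual code: status_should_abort_job and wait_status_after_signal are hypothetical names, and the sketch assumes the termination status arrives in the form returned by waitpid(). The demo helper relies on the POSIX default action for SIGUSR1/2, which is to terminate the process when no handler is installed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not ORTE code): decide whether a child's wait
 * status should abort the job. Termination by any signal aborts,
 * except SIGUSR1 and SIGUSR2. */
static bool status_should_abort_job(int status)
{
    if (!WIFSIGNALED(status)) {
        return false;               /* not killed by a signal */
    }
    int sig = WTERMSIG(status);     /* which signal terminated the proc */
    return sig != SIGUSR1 && sig != SIGUSR2;
}

/* Hypothetical demo helper: fork a child that raises `sig` with the
 * default disposition and store its wait status in *status.
 * Returns 0 on success, -1 on failure. */
static int wait_status_after_signal(int sig, int *status)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, sig);
        sigprocmask(SIG_UNBLOCK, &set, NULL);  /* in case it was inherited blocked */
        signal(sig, SIG_DFL);                  /* no handler installed */
        raise(sig);
        _exit(0);   /* reached only if the signal did not terminate us */
    }
    return (waitpid(pid, status, 0) == pid) ? 0 : -1;
}
```

Calling wait_status_after_signal(SIGUSR2, &status) and passing the result to status_should_abort_job(status) shows the difference from today's rule, under which any WIFSIGNALED status aborts the job.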
I have no issue with making that change, but please note that it -is- a change in our defined behavior, and a change from what has been our behavior since the beginning of the project. Let me know if you want to change the design requirement and we can take care of it.

Thanks
Ralph
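
P.S. For anyone who wants a starting point for the set-a-flag-and-return approach mentioned above, here is a minimal sketch in C. All names (usr_signal_seen, on_usr_signal, install_usr_handlers, shutdown_requested) are made up for illustration, and the choice of SA_RESTART is an assumption that may not suit every application.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set by the handler, polled by the main loop. sig_atomic_t is the
 * type that is safe to write from a signal handler. */
static volatile sig_atomic_t usr_signal_seen = 0;

/* Handler: record which signal arrived and return. It does not call
 * exit(), so the process is not reported as terminated-by-signal. */
static void on_usr_signal(int sig)
{
    usr_signal_seen = sig;
}

/* Install the handler for SIGUSR1 and SIGUSR2.
 * Returns 0 on success, -1 on failure. */
static int install_usr_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* assumption: restart interrupted syscalls */
    if (sigaction(SIGUSR1, &sa, NULL) != 0) {
        return -1;
    }
    if (sigaction(SIGUSR2, &sa, NULL) != 0) {
        return -1;
    }
    return 0;
}

/* Checked from normal (non-handler) code, e.g. once per iteration. */
static bool shutdown_requested(void)
{
    return usr_signal_seen != 0;
}
```

The application would call install_usr_handlers() once at startup, test shutdown_requested() at a convenient point in its main loop, and when it returns true clean up and exit normally from there (calling MPI_Finalize first if it is an MPI code).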