Steve,
This is indeed strange. The mechanism you describe works for me.
Here is my simple test:
---------------------- mpi-sig.c ----------------------
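#include "mpi.h"
#include <signal.h>
#include <stdio.h>

/* Written by the handler; volatile sig_atomic_t keeps the store safe. */
static volatile sig_atomic_t last_sig = 0;

static void warn(int sig) {
    last_sig = sig;  /* printf() is not async-signal-safe, so only record it */
}

int main(int argc, char **argv) {
    signal(SIGUSR2, warn);  /* install the handler before MPI_Init() */
    MPI_Init(&argc, &argv);
    while (1) {  /* spin until killed, reporting each SIGUSR2 received */
        if (last_sig) {
            printf("Got signal %d\n", (int) last_sig);
            fflush(stdout);
            last_sig = 0;
        }
    }
    MPI_Finalize();  /* never reached; kept for completeness */
    return 0;
}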
-------------------------------------------------------
Whenever I send it SIGUSR2 with kill -SIGUSR2, I get the message "Got signal 12"
(the handler is called).
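For comparison, the same handler can be installed with sigaction() instead of signal(). POSIX pins down sigaction()'s semantics exactly, whereas signal() has historically differed between systems (for example, glibc gives it one-shot System V semantics when _DEFAULT_SOURCE is not in effect). This is a minimal sketch assuming a POSIX system; install_handler(), record_signal() and got_sig are hypothetical names, and raise() stands in for an external kill -SIGUSR2:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; volatile sig_atomic_t is the type ISO C lets a
   handler write safely. */
static volatile sig_atomic_t got_sig = 0;

/* Only record the signal: printf() is not async-signal-safe. */
static void record_signal(int sig) {
    got_sig = sig;
}

/* Install record_signal() for `sig` with sigaction().
   Returns 0 on success, -1 on failure (errno is set by sigaction()). */
static int install_handler(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);  /* block nothing extra while the handler runs */
    sa.sa_flags = 0;           /* no SA_RESETHAND: the handler stays installed */
    return sigaction(sig, &sa, NULL);
}
```

In the MPI test, install_handler(SIGUSR2) could take the place of the signal() call before MPI_Init(), with the main loop polling got_sig.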
If I remove the call to signal(), I get the same message you get:
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 25067 on node bullx1 exited on
signal 12 (User defined signal 2).
--------------------------------------------------------------------------
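That report is what a launcher sees when a child is terminated by a signal's default action, which for SIGUSR2 is to terminate the process. As an illustration of the POSIX mechanism only (not mpirun's actual code), the sketch below forks a child with no handler for the signal, sends it the signal, and reads the cause of death with waitpid(), WIFSIGNALED() and WTERMSIG(); child_termination_signal() is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that keeps the default action for `sig` (assumed to be
   "terminate"), send it `sig`, and return the number of the signal that
   killed it: 0 if it exited normally, -1 on error. */
int child_termination_signal(int sig) {
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, sig);
    /* Block `sig` across fork() so it stays pending until the child is ready. */
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0) {
        /* Child: no handler, so the default action applies. */
        signal(sig, SIG_DFL);
        sigprocmask(SIG_UNBLOCK, &block, NULL);  /* pending `sig` is delivered here */
        for (;;)
            pause();
    }

    kill(pid, sig);
    sigprocmask(SIG_SETMASK, &old, NULL);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status)) {
        int term = WTERMSIG(status);
        /* Same kind of information mpirun prints for a rank killed by a signal. */
        printf("process %ld exited on signal %d (%s)\n",
               (long) pid, term, strsignal(term));
        return term;
    }
    return 0;
}
```

Blocking the signal before fork() keeps it pending until the child has reset its disposition, so the result does not depend on which process the scheduler runs first.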
Maybe you should first check that this simple test works for you, then figure
out how it differs from your code.
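One hypothesis worth checking (nothing in this thread confirms it): the symptom of the process dying instead of entering the handler is consistent with SIGUSR2 being back at SIG_DFL when the signal arrives, for instance because some code called signal() or sigaction() for it after the application did. The current disposition can be queried without changing it by passing a NULL new action to sigaction(). A sketch; handler_is_installed(), report_disposition() and app_handler() are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>

typedef void (*sig_handler_t)(int);

/* Stands in for the application's handler (warn() in the test above). */
static void app_handler(int sig) {
    (void) sig;
}

/* Return 1 if `expected` is the current handler for `sig`, 0 if some other
   disposition is installed (e.g. SIG_DFL), -1 if the query fails. */
int handler_is_installed(int sig, sig_handler_t expected) {
    struct sigaction cur;
    if (sigaction(sig, NULL, &cur) != 0)  /* NULL new action: query only */
        return -1;
    if (cur.sa_flags & SA_SIGINFO)        /* a three-argument handler is not ours */
        return 0;
    return cur.sa_handler == expected;
}

/* Print one diagnostic line; call it at several points in the program. */
void report_disposition(int sig, sig_handler_t expected, const char *where) {
    int r = handler_is_installed(sig, expected);
    printf("%s: signal %d handler %s\n", where, sig,
           r == 1 ? "still installed" :
           r == 0 ? "replaced" : "could not be queried");
}
```

Calling report_disposition(SIGUSR2, warn, "after MPI_Init") right after MPI_Init(), and again after the library's own setup, would show at which point (if any) the handler disappears.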
Sylvain
On Wed, 25 Aug 2010, Steve Wise wrote:
> On 08/25/2010 12:43 PM, Ralph Castain wrote:
>> On Aug 25, 2010, at 11:26 AM, Steve Wise wrote:
>>> On 08/25/2010 11:33 AM, Ralph Castain wrote:
>>>> We don't use it - mpirun traps it and then propagates it by default to
>>>> all remote procs.
>>> So I should send the signal to the mpirun process?
>> Yes - however, note that it will be propagated to ALL processes in the job.
>> If you only want one proc to get the signal, you can just do a "kill" on
>> that specific process on its node. We don't trap signals on the application
>> procs themselves, so your proc can do whatever it wants with it.
> Something is funny then. When I send SIGUSR2 to the process itself -or- to
> the mpirun proc, it just kills the process and doesn't get to my sig handler.
> And the same library works when I run the job using mvapich2.
> I'll keep digging.
> Thanks!
> Steve.
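Ralph's description of mpirun (trap the signal, then propagate it to every process in the job) can be pictured with a single-node toy launcher. This is a sketch under stated assumptions, not Open MPI's implementation: the real mpirun relays signals to remote nodes through its daemons, whereas here the "launcher" forks its children locally and relays with kill(). launch_and_forward(), forward_signal(), child_handler(), set_handler() and MAX_CHILDREN are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 8

static pid_t children[MAX_CHILDREN];
static volatile sig_atomic_t nchildren = 0;

/* Launcher side: relay the signal to every child (kill() is async-signal-safe). */
static void forward_signal(int sig) {
    int saved_errno = errno;
    for (int i = 0; i < nchildren; i++)
        kill(children[i], sig);
    errno = saved_errno;
}

/* Application side: catch the signal and report it through the exit status. */
static void child_handler(int sig) {
    _exit(sig);
}

static int set_handler(int sig, void (*handler)(int), struct sigaction *old) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, old);
}

/* Fork n children that catch `sig`, send `sig` to this launcher, and let
   forward_signal() relay it. Returns how many children reported catching it. */
int launch_and_forward(int n, int sig) {
    sigset_t block, old_mask, empty;
    struct sigaction old_sa;

    if (n < 1 || n > MAX_CHILDREN)
        return -1;
    sigemptyset(&block);
    sigaddset(&block, sig);
    sigemptyset(&empty);

    /* Keep `sig` blocked until every process has its handler in place;
       children inherit this mask across fork(). */
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    nchildren = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            set_handler(sig, child_handler, NULL);
            for (;;)
                sigsuspend(&empty);  /* unblock atomically and wait */
        }
        children[nchildren++] = pid;
    }

    set_handler(sig, forward_signal, &old_sa);
    sigprocmask(SIG_UNBLOCK, &block, NULL);
    raise(sig);  /* stands in for "kill -SIGUSR2 <mpirun pid>" */

    int caught = 0;
    for (int i = 0; i < nchildren; i++) {
        int status;
        if (waitpid(children[i], &status, 0) == children[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == sig)
            caught++;
    }
    sigaction(sig, &old_sa, NULL);               /* restore previous disposition */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);   /* and previous signal mask */
    return caught;
}
```

With this sketch, launch_and_forward(3, SIGUSR2) returns 3: the launcher catches the signal it raises on itself and each child's handler exits with the signal number as its status. A child without its own handler would instead be terminated by the default action, which is what produces an "exited on signal 12" report like the one quoted earlier.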
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel