Thanks for the bug report!

I've just changed the behavior: if a signal's existing action is neither SIG_DFL nor SIG_IGN, Open MPI now emits a warning and does *not* intercept that signal. The opal_signal MCA parameter selects which signals are intercepted; it defaults to the integer values of SIGABRT, SIGBUS, SIGFPE, and SIGSEGV on your system.
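
For anyone who wants to see the idea in isolation, here is a rough sketch of that check using plain POSIX sigaction(). It is not the actual code in opal/util/stacktrace.c; the names install_if_unclaimed() and print_stack_sketch() are made up for illustration, and treating any SA_SIGINFO-style action as "already claimed" is a conservative assumption on my part.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a stack-trace printing handler. */
void print_stack_sketch(int signo)
{
    (void)signo;
    /* A real handler would print a backtrace here. */
}

/*
 * Install `handler` for `signo` only if the current action is SIG_DFL
 * or SIG_IGN.  Returns 1 if the handler was installed, 0 if the signal
 * was left alone because something else already handles it, and -1 if
 * sigaction() itself failed.
 */
int install_if_unclaimed(int signo, void (*handler)(int))
{
    struct sigaction old, act;

    /* Query the current action without changing it. */
    if (sigaction(signo, NULL, &old) != 0) {
        return -1;
    }

    /* Assumption: an SA_SIGINFO action counts as an existing handler. */
    if ((old.sa_flags & SA_SIGINFO) ||
        (old.sa_handler != SIG_DFL && old.sa_handler != SIG_IGN)) {
        fprintf(stderr, "warning: signal %d already has a handler; "
                        "not intercepting it\n", signo);
        return 0;
    }

    memset(&act, 0, sizeof(act));
    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);
    return sigaction(signo, &act, NULL) == 0 ? 1 : -1;
}
```

Calling install_if_unclaimed(SIGSEGV, print_stack_sketch) after a runtime such as the JVM has registered its own SIGSEGV handler would print the warning and leave that handler in place. Note that there is a small window between the query and the install, so this is only a best-effort check.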

We can probably get this into OMPI v1.3.2.


On Mar 19, 2009, at 11:13 AM, Kees Verstoep wrote:

Hi,

Currently, opal_util_register_stackhandlers() in opal/util/stacktrace.c
calls sigaction() with NULL as the third argument, meaning you don't look
at any previously installed signal handlers, and always override
them with print_stackframe().

But there are actually realistic scenarios where an application actively
uses these signals, and also wants to use MPI. As an example, the default
opal "signal" parameter settings are such that SIGSEGV is redirected.
Typically, indeed, SIGSEGV indicates a bug somewhere, and the stacktrace
from Open MPI is a nice bonus. However, the Sun Java JDK uses SIGSEGV to
detect when stacks should be automatically extended, and it stops working
rather ungracefully when that handler gets replaced.

(BTW, we stumbled on this recently when we added an MPI backend for our
Ibis grid programming environment. It took a bit of time to figure out
what was happening, since we got no usable stacktrace for the thread that
got bitten. We suspected a bug in our native code mapping at first,
but MPICH did not have this problem.)

In most cases, you can of course work around it by manually changing
the opal "signal" list, but it would be nicer if Open MPI would detect
the situation, and e.g. only install the stack printer when there is
no handler yet, or at least warn about the possible clash.

Thanks!
Kees Verstoep


--
Jeff Squyres
Cisco Systems
