David Maxwell has finally gotten the Coverity automation stuff
working to download our nightly trunk tarballs and run them through
their tool. I was skimming through the results this morning and
noticed one that looked odd to me: in osc_rdma_component.c:619, we have:
619 if (NULL == datatype) {
620 opal_output(ompi_osc_base_output,
621 "Error recreating datatype.
Aborting.");
622 ompi_mpi_abort(module->m_comm, 1, false);
623 }
And then shortly after line 623, we use/dereference datatype.
Coverity marked this as "possible NULL dereference".
"Hah", I thought, "Coverity doesn't realize that ompi_mpi_abort()
will always abort, so this is a false positive."
Just for the heckuvit, I checked: ompi_mpi_abort does *not* guarantee
to abort. Indeed, there are two cases where it may actually return:
1. We have logic in ompi_mpi_abort to prevent recursive invocation
(ompi_mpi_abort.c:60):
/* Protection for recursive invocation */
if (have_been_invoked) {
return OMPI_SUCCESS;
}
have_been_invoked = true;
I added this check in r13354, but I don't remember the exact case in
which it was happening. :-\ I'm wondering if we should loop forever
(perhaps over sleep or something) instead of returning...
Do anyone know how it could be that ompi_mpi_abort() could be invoked
*recursively*? I can imagine in a THREAD_MULTIPLE scenario that
multiple threads could call ompi_mpi_abort (which is I *think* is why
I added that logic). In that case, do we have enough protection, or
do we really need to use atomic operations to test have_been_invoked?
2. In several places, we call orte_errmgr.error_detected(). I think
that we will *always* have the orte "proxy" errmgr component loaded
in MPI processes, in which case errmgr.error_detected() will always
call exit(), so I think we're ok there.
--
Jeff Squyres
Cisco Systems