Thanks for catching this. I tested my patch but not through valgrind. Thanks also for figuring out the line break issue. I figured that was coming from the device but didn't track it down.
Jeff

On Sat, Feb 22, 2014 at 2:07 PM, Jed Brown <j...@jedbrown.org> wrote:
> Jeff Hammond <jeff.scie...@gmail.com> writes:
>
>> https://trac.mpich.org/projects/mpich/ticket/2038 has the patches.
>
> Although I thought I once had an account on Trac, it doesn't seem to
> know about me any more. Anyway, this patch passes an undefined
> abort_str on to MPID_Abort.
>
>     char abort_str[100], comm_name[MPI_MAX_OBJECT_NAME];
>     ...
>     if (!MPIR_CVAR_SUPPRESS_ABORT_MESSAGE)
>         /* FIXME: This is not internationalized */
>         MPIU_Snprintf(abort_str, 100, "application called MPI_Abort(%s, %d) - process %d", comm_name, errorcode, comm_ptr->rank);
>     mpi_errno = MPID_Abort( comm_ptr, mpi_errno, errorcode, abort_str );
>
>
> ==27285== Conditional jump or move depends on uninitialised value(s)
> ==27285==    at 0x56F2AE8: vfprintf (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x56F5630: buffered_vfprintf (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x56F06BD: vfprintf (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x4E96336: MPIU_Error_printf (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
> ==27285==    by 0x4EC0D93: MPID_Abort (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
> ==27285==    by 0x40795B6: MPI_Abort (in /home/jed/usr/mpich-clang/lib/libpmpich.so.12.0.0)
> ==27285==    by 0x400808: main (in /home/jed/lang/mpi/a.out)
> ==27285==
> ==27285== Syscall param write(buf) points to uninitialised byte(s)
> ==27285==    at 0x5783470: __write_nocancel (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x571E472: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x571DB32: new_do_write (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x571EA85: _IO_file_xsputn@@GLIBC_2.2.5 (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x56F56C5: buffered_vfprintf (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x56F06BD: vfprintf (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x4E96336: MPIU_Error_printf (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
> ==27285==    by 0x4EC0D93: MPID_Abort (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
> ==27285==    by 0x40795B6: MPI_Abort (in /home/jed/usr/mpich-clang/lib/libpmpich.so.12.0.0)
> ==27285==    by 0x400808: main (in /home/jed/lang/mpi/a.out)
> ==27285==  Address 0xffeffd130 is on thread 1's stack
>
> So I fix this:
>
> diff --git i/src/mpi/init/abort.c w/src/mpi/init/abort.c
> index f0b4cdc..bb1a63b 100644
> --- i/src/mpi/init/abort.c
> +++ w/src/mpi/init/abort.c
> @@ -74,7 +74,7 @@ int MPI_Abort(MPI_Comm comm, int errorcode)
>      int mpi_errno = MPI_SUCCESS;
>      MPID_Comm *comm_ptr = NULL;
>      /* FIXME: 100 is arbitrary and may not be long enough */
> -    char abort_str[100], comm_name[MPI_MAX_OBJECT_NAME];
> +    char abort_str[100] = "", comm_name[MPI_MAX_OBJECT_NAME];
>      int len = MPI_MAX_OBJECT_NAME;
>      MPID_MPI_STATE_DECL(MPID_STATE_MPI_ABORT);
>
>
> and now I can sort of suppress the output:
>
> $ MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=1 ./a.out
>
> $
>
> so it prints a blank line which may not be acceptable if it is producing
> a stream, but is otherwise fine. Passing abort_str=NULL is already used
> for something else ("internal ABORT"), but the following cleans up the
> output.
>
> diff --git i/src/mpid/ch3/src/mpid_abort.c w/src/mpid/ch3/src/mpid_abort.c
> index f0877ca..74b8a56 100644
> --- i/src/mpid/ch3/src/mpid_abort.c
> +++ w/src/mpid/ch3/src/mpid_abort.c
> @@ -94,7 +94,7 @@ int MPID_Abort(MPID_Comm * comm, int mpi_errno, int exit_code,
>  #elif defined(MPIDI_DEV_IMPLEMENTS_ABORT)
>      MPIDI_CH3I_PMI_Abort(exit_code, error_msg);
>  #else
> -    MPIU_Error_printf("%s\n", error_msg);
> +    if (error_msg[0]) MPIU_Error_printf("%s\n", error_msg);
>      fflush(stderr);
>  #endif
>
> If this is acceptable, a similar change should be applied to the other
> devices.

--
Jeff Hammond
jeff.scie...@gmail.com
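
For reference, the two quoted changes can be tried outside MPICH with a small standalone sketch. It is not the MPICH code: suppress_abort_message, format_abort_message and report_abort are hypothetical names standing in for MPIR_CVAR_SUPPRESS_ABORT_MESSAGE, the MPIU_Snprintf call in MPI_Abort, and the MPIU_Error_printf call in MPID_Abort. It only illustrates the pattern: give the buffer a defined (empty) value before the conditional format, and skip the print when the message is empty so no blank line is emitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the MPIR_CVAR_SUPPRESS_ABORT_MESSAGE control
 * variable; here it is just a global flag. */
static int suppress_abort_message = 0;

/* Build the abort message. The buffer is always given a defined value:
 * the empty string when suppression is on, the formatted text otherwise. */
static void format_abort_message(char *out, size_t len, const char *comm_name,
                                 int errorcode, int rank)
{
    assert(out != NULL && len > 0);
    out[0] = '\0';
    if (!suppress_abort_message)
        snprintf(out, len, "application called MPI_Abort(%s, %d) - process %d",
                 comm_name, errorcode, rank);
}

/* Print the message only when it is non-empty, so a suppressed message
 * does not produce a blank line. Returns the number of bytes written
 * (0 when the print is skipped). */
static int report_abort(FILE *stream, const char *error_msg)
{
    int written = 0;
    if (error_msg[0])
        written = fprintf(stream, "%s\n", error_msg);
    fflush(stream);
    return written;
}
```

With the flag set, format_abort_message leaves an empty string and report_abort writes nothing; with it cleared, the usual "application called MPI_Abort(...)" line is written. Running a caller of these functions under valgrind should no longer report reads of uninitialised bytes from the message buffer, since every byte that is read has been written first.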