On Feb 21, 2014, at 5:01 PM, Jeff Hammond <[email protected]> wrote:
> On Fri, Feb 21, 2014 at 4:32 PM, Barry Smith <[email protected]> wrote:
>>
>> Jeff,
>>
>> Thanks. This is certainly a useful thing.
>
> It's only half a solution right now. Hacking Hydra is a bit more
> difficult for me. Not sure how long before I can solve that in a
> manner that the MPICH folks find acceptable.
>
>> I never meant to kick a hornet’s nest with my initial email. I was
>> taught by my postdoctoral advisor that any library or package that had
>> stdout or stderr output hardwired that could not be turned off without
>> losing functionality was rude and poorly thought out but then that guy
>> probably never amounted to anything I guess so I should just ignore him
>> since he doesn’t represent main stream thought.
>
> A colleague of mine suggested that libraries shouldn't be calling
> MPI_Abort but rather return an error code to the application and let
> them decide how to handle it, but he learned MPI from Bill Gropp, so
> he might not know anything ;-)

Actually "the library" isn’t “calling” MPI_Abort; the library’s default error handler is what eventually calls MPI_Abort(). The library returns error codes to the application code, and the application code is free to handle them any way it likes, as well as to set its own error handlers.

Barry

>
> I apologize for being unpleasant earlier.
>
> Best,
>
> Jeff
>
>
>>
>> Barry
>>
>> On Feb 21, 2014, at 3:10 PM, Jeff Hammond <[email protected]> wrote:
>>
>>> Barry:
>>>
>>> Would the following behavior be acceptable to you? I have only made
>>> the changes in MPI but am looking at the process manager now.
>>>
>>> Jeff
>>>
>>>
>>> # Without the process manager
>>>
>>> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=0
>>> alcfwl181:build jhammond$ ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=1
>>> alcfwl181:build jhammond$ ./a.out
>>>
>>> alcfwl181:build jhammond$ unset MPIR_CVAR_SUPPRESS_ABORT_MESSAGE
>>> alcfwl181:build jhammond$ ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>
>>> # With the process manager
>>>
>>> alcfwl181:build jhammond$ mpiexec -n 1 -env
>>> MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>
>>> ===================================================================================
>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> = PID 61023 RUNNING AT alcfwl181.alcf.anl.gov
>>> = EXIT CODE: 1
>>> = CLEANING UP REMAINING PROCESSES
>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> alcfwl181:build jhammond$ mpiexec -n 1 -env
>>> MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.out
>>>
>>>
>>> ===================================================================================
>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> = PID 61026 RUNNING AT alcfwl181.alcf.anl.gov
>>> = EXIT CODE: 1
>>> = CLEANING UP REMAINING PROCESSES
>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> alcfwl181:build jhammond$ mpiexec -n 1 ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>
>>> ===================================================================================
>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> = PID 61032 RUNNING AT alcfwl181.alcf.anl.gov
>>> = EXIT CODE: 1
>>> = CLEANING UP REMAINING PROCESSES
>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>>
>>>
>>>
>>> On Thu, Feb 20, 2014 at 11:33 AM, Barry Smith <[email protected]> wrote:
>>>>
>>>> Is there any way to turn off MPICH (and others) printing messages about
>>>> MPI_Abort? We have already prepared and presented useful error messages
>>>> to the user about the situation and would like to avoid having these
>>>> additional messages printed (that often make the situation look worse than
>>>> it is)
>>>>
>>>> Thanks
>>>>
>>>> Barry
>>>>
>>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>>> [cli_0]: aborting job:
>>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>>>
>>>> ===================================================================================
>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>> = EXIT CODE: 56
>>>> = CLEANING UP REMAINING PROCESSES
>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>> ===================================================================================
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> discuss mailing list [email protected]
>>>> To manage subscription options or unsubscribe:
>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>
>>>
>>>
>>> --
>>> Jeff Hammond
>>> [email protected]
>
>
>
> --
> Jeff Hammond
> [email protected]
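
The arrangement Barry describes in his reply (library calls return error codes, and only the default error handler ends in an abort) can be sketched in plain C. This is a minimal illustration under stated assumptions, not the code of any real library: the names lib_set_errhandler, lib_solve, app_quiet_handler and demo_recover are hypothetical, abort() stands in for MPI_Abort() so that no MPI installation is needed, and the error code 56 simply mirrors the one in the original report.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical library API, for illustration only. */
typedef int (*lib_errhandler_fn)(int code, const char *msg);

/* Default handler: report the error and terminate. abort() is a stand-in
 * for MPI_Abort(); this is the only place the "library" ends the program. */
static int lib_default_handler(int code, const char *msg)
{
    fprintf(stderr, "lib error %d: %s\n", code, msg);
    abort();
    return code; /* not reached */
}

static lib_errhandler_fn lib_handler = lib_default_handler;

/* The application may install its own handler; NULL restores the default. */
void lib_set_errhandler(lib_errhandler_fn fn)
{
    lib_handler = fn ? fn : lib_default_handler;
}

/* A library routine: on failure it passes the error to the current handler
 * and returns whatever code the handler returns. */
int lib_solve(int n)
{
    if (n <= 0)
        return lib_handler(56, "n must be positive");
    return 0;
}

/* Application handler: prints nothing and hands the code back to the caller. */
int app_quiet_handler(int code, const char *msg)
{
    (void)msg;
    return code;
}

/* Application code: install the quiet handler, then deal with the returned
 * error code itself instead of being aborted. */
int demo_recover(void)
{
    lib_set_errhandler(app_quiet_handler);
    int rc = lib_solve(-1);
    assert(rc == 56); /* the error came back as a return code */
    return rc;
}
```

Calling demo_recover() from a main() returns 56 without terminating the process or printing anything; with the default handler left in place, the same lib_solve(-1) call would print a message and abort. For errors raised by MPI itself, the comparable switch is MPI_Comm_set_errhandler() with MPI_ERRORS_RETURN.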
