On Feb 21, 2014, at 5:01 PM, Jeff Hammond <[email protected]> wrote:

> On Fri, Feb 21, 2014 at 4:32 PM, Barry Smith <[email protected]> wrote:
>> 
>>   Jeff,
>> 
>>     Thanks. This is certainly a useful thing.
> 
> It's only half a solution right now.  Hacking Hydra is a bit more
> difficult for me.  Not sure how long before I can solve that in a
> manner that the MPICH folks find acceptable.
> 
>>      I never meant to kick a hornet’s nest with my initial email. I was 
>> taught by my postdoctoral advisor that any library or package that had 
>> stdout or stderr output hardwired that could not be turned off without 
>> losing functionality was rude and poorly thought out but then that guy 
>> probably never amounted to anything I guess so I should just ignore him 
>> since he doesn’t represent mainstream thought.
> 
> A colleague of mine suggested that libraries shouldn't be calling
> MPI_Abort but rather return an error code to the application and let
> them decide how to handle it, but he learned MPI from Bill Gropp, so
> he might not know anything ;-)

   Actually, "the library" isn’t "calling" MPI_Abort; the library’s default 
error handler is eventually calling MPI_Abort(). The library returns error 
codes to the application code, and the application code is free to handle them 
any way it likes as well as set its own error handlers.

   Barry
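
The split described above (error codes returned to the caller, plus a
replaceable default handler that ends in an abort) can be sketched in plain C.
This is only an illustration under stated assumptions: the names
lib_set_error_handler, lib_solve, and app_quiet_handler are hypothetical and
belong to no real library, and exit() stands in for MPI_Abort() so the sketch
builds without an MPI installation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical library API, invented for illustration only. */
typedef void (*lib_error_handler)(int code, const char *msg);

/* Default handler: report the error, then terminate the job.
   exit() is a stand-in for MPI_Abort(MPI_COMM_WORLD, code). */
static void lib_default_handler(int code, const char *msg)
{
    fprintf(stderr, "library error %d: %s\n", code, msg);
    exit(code);
}

static lib_error_handler lib_current_handler = lib_default_handler;

/* Install a new handler and return the previous one.
   Passing NULL restores the default handler. */
static lib_error_handler lib_set_error_handler(lib_error_handler h)
{
    lib_error_handler old = lib_current_handler;
    lib_current_handler = h ? h : lib_default_handler;
    return old;
}

/* A library routine: on failure it invokes the current handler and,
   if that handler returns, hands the error code back to the caller. */
static int lib_solve(int n)
{
    if (n < 0) {
        lib_current_handler(56, "negative problem size");
        return 56;
    }
    return 0;
}

/* Application-side handler: record the error silently and keep running. */
static int app_last_error = 0;

static void app_quiet_handler(int code, const char *msg)
{
    (void)msg;
    app_last_error = code;
}
```

An application that calls lib_set_error_handler(app_quiet_handler) before
lib_solve(-1) gets 56 back as an ordinary return value and keeps running;
without that call, the default handler prints its message and terminates the
process.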

> 
> I apologize for being unpleasant earlier.
> 
> Best,
> 
> Jeff
> 
> 
>> 
>>   Barry
>> 
>> On Feb 21, 2014, at 3:10 PM, Jeff Hammond <[email protected]> wrote:
>> 
>>> Barry:
>>> 
>>> Would the following behavior be acceptable to you?  I have only made
>>> the changes in MPI but am looking at the process manager now.
>>> 
>>> Jeff
>>> 
>>> 
>>> # Without the process manager
>>> 
>>> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=0
>>> alcfwl181:build jhammond$ ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=1
>>> alcfwl181:build jhammond$ ./a.out
>>> 
>>> alcfwl181:build jhammond$ unset MPIR_CVAR_SUPPRESS_ABORT_MESSAGE
>>> alcfwl181:build jhammond$ ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> 
>>> # With the process manager
>>> 
>>> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> 
>>> ===================================================================================
>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> =   PID 61023 RUNNING AT alcfwl181.alcf.anl.gov
>>> =   EXIT CODE: 1
>>> =   CLEANING UP REMAINING PROCESSES
>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.out
>>> 
>>> 
>>> ===================================================================================
>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> =   PID 61026 RUNNING AT alcfwl181.alcf.anl.gov
>>> =   EXIT CODE: 1
>>> =   CLEANING UP REMAINING PROCESSES
>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> alcfwl181:build jhammond$ mpiexec -n 1 ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> 
>>> ===================================================================================
>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> =   PID 61032 RUNNING AT alcfwl181.alcf.anl.gov
>>> =   EXIT CODE: 1
>>> =   CLEANING UP REMAINING PROCESSES
>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> 
>>> 
>>> 
>>> On Thu, Feb 20, 2014 at 11:33 AM, Barry Smith <[email protected]> wrote:
>>>> 
>>>>  Is there any way to turn off MPICH (and others) printing messages about 
>>>> MPI_Abort?  We have already prepared and presented useful error messages 
>>>> to the user about the situation and would like to avoid having these 
>>>> additional messages printed (that often make the situation look worse than 
>>>> it is)
>>>> 
>>>>   Thanks
>>>> 
>>>>  Barry
>>>> 
>>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>>> [cli_0]: aborting job:
>>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>>> 
>>>> ===================================================================================
>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>> =   EXIT CODE: 56
>>>> =   CLEANING UP REMAINING PROCESSES
>>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>> ===================================================================================
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> discuss mailing list     [email protected]
>>>> To manage subscription options or unsubscribe:
>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>> 
>>> 
>>> 
>>> --
>>> Jeff Hammond
>>> [email protected]
>> 
> 
> 
> 
> -- 
> Jeff Hammond
> [email protected]
