Good suggestion - fixed on trunk in r32189
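
For anyone reading along, here is a minimal sketch of the approach Paul
suggests below. It is not the actual r32189 change; the helper names
ignore_sigpipe() and write_all() are made up for illustration. With SIGPIPE
ignored, a write to a pipe whose reader has gone away fails with EPIPE
instead of raising a signal, so the caller has to check for that error and
stop writing.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: ignore SIGPIPE so that writing to a closed pipe
 * fails with EPIPE instead of delivering a signal.
 * Returns 0 on success, -1 on error. */
static int ignore_sigpipe(void)
{
    return (signal(SIGPIPE, SIG_IGN) == SIG_ERR) ? -1 : 0;
}

/* Hypothetical helper: write the whole buffer to fd.
 * Returns 0 on success, -1 on error with errno set. Once SIGPIPE is
 * ignored, a vanished reader shows up here as errno == EPIPE. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;      /* real error, e.g. EPIPE */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A tool in this position would presumably call ignore_sigpipe() once near the
start of main() and exit quietly on the first EPIPE. Note that a handler
installed afterwards by another library would replace the SIG_IGN
disposition.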

On Jul 9, 2014, at 2:30 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> I agree with Gilles that there is not a "bug", but I believe that OMPI could 
> do something better.
> 
> First, I'll show that
> a) this is not a new behavior
> b) it is not limited to "less".
> 
> $ (strace ompi_info -a | grep -m1 btl) 2>&1 | grep -e 'Open MPI:' -e SIGPIPE
> write(1, "                Open MPI: 1.4.5\n", 32) = 32
> --- SIGPIPE (Broken pipe) @ 0 (0) ---
> +++ killed by SIGPIPE +++
> 
> a) the ompi_info output says "Open MPI: 1.4.5" (thus not new by any stretch).
> b) the "-m1" argument to the inner "grep" says exit after the first match
> 
> The "strace" is to detect/report that SIGPIPE was received.
> The outer grep picks out the relevant info from the flood of strace output.
> 
> So, the "issue" today seems to be that mxm is catching the signal and 
> producing a backtrace.  This backtrace is NOT a desirable behavior.  This is 
> not intrinsically the "fault" of mxm, because there is no reason to believe 
> that ompi_info would never link to (or dlopen) another library that performs 
> backtraces.
> 
> So, I would suggest that ompi_info simply call "signal(SIGPIPE, SIG_IGN);" 
> to resolve this in a way not specific to mxm.
> 
> -Paul
> 
> 
> On Wed, Jul 9, 2014 at 3:47 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
> Mike,
> 
> how do you test ?
> i cannot reproduce a bug :
> 
> if you run ompi_info -a -l 9 | less
> 
> and i press 'q' at the early stage (i.e. before all output is written to the 
> pipe)
> then the less process exits, and ompi_info receives SIGPIPE and is killed 
> (which is normal unix behaviour)
> 
> now if i press the spacebar until the end of the output (e.g. i get the (END) 
> message from less)
> and then press 'q', then there is no problem.
> 
> strace -e signal ompi_info -a -l 9 | true
> will cause ompi_info to receive a SIGPIPE
> 
> strace -e signal dd if=/dev/zero bs=1M count=1 | true
> will cause dd to receive a SIGPIPE
> 
> unless i miss something, i would conclude there is no bug
> 
> Cheers,
> 
> Gilles
> 
> On 2014/07/09 19:33, Mike Dubman wrote:
>> mxm only intercepts signals and prints the stack trace.
>> happens on trunk as well.
>> only when "| less" is used.
>> 
>> On Tue, Jul 8, 2014 at 4:50 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
>> wrote:
>> 
>>> I'm unable to replicate.  Please provide more detail...?  Is this a
>>> problem in the MXM component?
>>> 
>>> On Jul 8, 2014, at 9:20 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>> 
>>>> $/usr/mpi/gcc/openmpi-1.8.2a1/bin/ompi_info -a -l 9|less
>>>> Caught signal 13 (Broken pipe)
>>>> ==== backtrace ====
>>>>  2 0x0000000000054cac mxm_handle_error()  /var/tmp/OFED_topdir/BUILD/mxm-3.2.2883/src/mxm/util/debug/debug.c:653
>>>>  3 0x0000000000054e74 mxm_error_signal_handler()  /var/tmp/OFED_topdir/BUILD/mxm-3.2.2883/src/mxm/util/debug/debug.c:628
>>>>  4 0x00000033fbe32920 killpg()  ??:0
>>>>  5 0x00000033fbedb650 __write_nocancel()  interp.c:0
>>>>  6 0x00000033fbe71d53 _IO_file_write@@GLIBC_2.2.5()  ??:0
>>>>  7 0x00000033fbe73305 _IO_do_write@@GLIBC_2.2.5()  ??:0
>>>>  8 0x00000033fbe719cd _IO_file_xsputn@@GLIBC_2.2.5()  ??:0
>>>>  9 0x00000033fbe48410 _IO_vfprintf()  ??:0
>>>> 10 0x00000033fbe4f40a printf()  ??:0
>>>> 11 0x000000000002bc84 opal_info_out()  /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:853
>>>> 12 0x000000000002c6bb opal_info_show_mca_group_params()  /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:658
>>>> 13 0x000000000002c882 opal_info_show_mca_group_params()  /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:716
>>>> 14 0x000000000002cc13 opal_info_show_mca_params()  /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:742
>>>> 15 0x000000000002d074 opal_info_do_params()  /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:485
>>>> 16 0x000000000040167b main()  ??:0
>>>> 17 0x00000033fbe1ecdd __libc_start_main()  ??:0
>>>> 18 0x0000000000401349 _start()  ??:0
>>>> ===================
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15075.php
>>> 
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/07/15076.php
>>> 
>> 
>> 
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15080.php
> 
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15082.php
> 
> 
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15085.php
