Good suggestion - fixed on trunk in r32189
On Jul 9, 2014, at 2:30 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> I agree with Gilles that there is not a "bug", but I believe that OMPI could
> do something better.
>
> First, I'll show that
> a) this is not a new behavior
> b) it is not limited to "less".
>
> $ (strace ompi_info -a | grep -m1 btl) 2>&1 | grep -e 'Open MPI:' -e SIGPIPE
> write(1, " Open MPI: 1.4.5\n", 32) = 32
> --- SIGPIPE (Broken pipe) @ 0 (0) ---
> +++ killed by SIGPIPE +++
>
> a) the ompi_info output says "Open MPI: 1.4.5" (thus not new by any stretch).
> b) the "-m1" argument to the inner "grep" says to exit after the first match.
>
> The "strace" is there to detect/report that SIGPIPE was received.
> The outer grep picks out the relevant info from the flood of strace output.
>
> So, the "issue" today seems to be that mxm is catching the signal and
> producing a backtrace. This backtrace is NOT a desirable behavior. This is
> not intrinsically the "fault" of mxm, because there is no reason to believe
> that ompi_info would never link to (or dlopen) another library that performs
> backtraces.
>
> So, I would suggest that ompi_info simply call "signal(SIGPIPE, SIG_IGN);" to
> resolve this in a way not specific to mxm.
>
> -Paul
>
>
> On Wed, Jul 9, 2014 at 3:47 AM, Gilles Gouaillardet
> <gilles.gouaillar...@iferc.org> wrote:
> Mike,
>
> How do you test?
> I cannot reproduce a bug:
>
> If I run ompi_info -a -l 9 | less
> and press 'q' at an early stage (i.e. before all output is written to the
> pipe), then the less process exits, and ompi_info receives SIGPIPE and
> crashes (which is normal Unix behaviour).
>
> Now if I press the spacebar until the end of the output (i.e. I get the (END)
> message from less) and then press 'q', then there is no problem.
>
> strace -e signal ompi_info -a -l 9 | true
> will cause ompi_info to receive a SIGPIPE.
>
> strace -e signal dd if=/dev/zero bs=1M count=1 | true
> will cause dd to receive a SIGPIPE.
>
> Unless I am missing something, I would conclude there is no bug.
>
> Cheers,
>
> Gilles
>
> On 2014/07/09 19:33, Mike Dubman wrote:
>> mxm only intercepts signals and prints the stack trace.
>> It happens on trunk as well, and only when "| less" is used.
>>
>> On Tue, Jul 8, 2014 at 4:50 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
>> wrote:
>>
>>> I'm unable to replicate. Please provide more detail...? Is this a
>>> problem in the MXM component?
>>>
>>> On Jul 8, 2014, at 9:20 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>
>>>> $/usr/mpi/gcc/openmpi-1.8.2a1/bin/ompi_info -a -l 9|less
>>>> Caught signal 13 (Broken pipe)
>>>> ==== backtrace ====
>>>> 2 0x0000000000054cac mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.2.2883/src/mxm/util/debug/debug.c:653
>>>> 3 0x0000000000054e74 mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.2.2883/src/mxm/util/debug/debug.c:628
>>>> 4 0x00000033fbe32920 killpg() ??:0
>>>> 5 0x00000033fbedb650 __write_nocancel() interp.c:0
>>>> 6 0x00000033fbe71d53 _IO_file_write@@GLIBC_2.2.5() ??:0
>>>> 7 0x00000033fbe73305 _IO_do_write@@GLIBC_2.2.5() ??:0
>>>> 8 0x00000033fbe719cd _IO_file_xsputn@@GLIBC_2.2.5() ??:0
>>>> 9 0x00000033fbe48410 _IO_vfprintf() ??:0
>>>> 10 0x00000033fbe4f40a printf() ??:0
>>>> 11 0x000000000002bc84 opal_info_out() /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:853
>>>> 12 0x000000000002c6bb opal_info_show_mca_group_params() /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:658
>>>> 13 0x000000000002c882 opal_info_show_mca_group_params() /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:716
>>>> 14 0x000000000002cc13 opal_info_show_mca_params() /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:742
>>>> 15 0x000000000002d074 opal_info_do_params() /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:485
>>>> 16 0x000000000040167b main() ??:0
>>>> 17 0x00000033fbe1ecdd __libc_start_main() ??:0
>>>> 18 0x0000000000401349 _start() ??:0
>>>> ===================
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15075.php
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15076.php
>>
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15080.php
>
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15082.php
>
> --
> Paul H. Hargrove phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15085.php