It is difficult to tell from the stack trace, as it happens in the ORTE threads. But I do get all the output I expect, and since the application I was running is hello_world, I'm almost certain it happens during MPI_Finalize.
George.

On Feb 7, 2014, at 03:38 , Ralph Castain <r...@open-mpi.org> wrote:

> Interesting - does it happen in finalize, or in the middle of execution?
>
> On Feb 6, 2014, at 5:57 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> Out of 150 runs I could reproduce it once. When it failed I got exactly the same assert:
>>
>> hello: ../../../../ompi/orte/mca/rml/base/rml_base_msg_handlers.c:75: orte_rml_base_post_recv: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (recv))->obj_magic_id' failed.
>>
>> A quick look at the code indicates it is in a rather obscure execution path, when one cancels a pending receive. The assert indicates that the receive object had already been destroyed (somewhere else) by the time it was removed from the orte_rml_base.posted_recvs queue.
>>
>> George.
>>
>> On Feb 7, 2014, at 02:22 , George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>>> A rather long configure line:
>>>
>>> ./configure --enable-picky --enable-debug --enable-coverage --disable-heterogeneous --enable-visibility --enable-contrib-no-build=vt --enable-mpirun-prefix-by-default --disable-mpi-cxx --with-cma --enable-static --enable-mca-no-build=plm-tm,ess-tm,ras-tm,plm-tm,ras-slurm,ess-slurm,plm-slurm,btl-sctp
>>>
>>> And hello_world.c from ompi-tests compiled using:
>>> mpicc -g --coverage hello.c -o hello
>>>
>>> George.
>>>
>>> On Feb 7, 2014, at 01:11 , Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Oh, should have noted: that's on both trunk and 1.7.4
>>>>
>>>> On Feb 6, 2014, at 4:10 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> Works for me on Mac and Linux/CentOS 6.2 as well
>>>>>
>>>>> On Feb 6, 2014, at 4:00 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>>>>
>>>>>> I'm unable to replicate on Linux/RHEL/64 bit with a trunk build. How did you configure?
>>>>>> Here's my configure:
>>>>>>
>>>>>> ./configure --prefix=/home/jsquyres/bogus --disable-vt --enable-mpirun-prefix-by-default --disable-mpi-fortran
>>>>>>
>>>>>> Does this happen with every run?
>>>>>>
>>>>>> On Feb 6, 2014, at 6:53 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>
>>>>>>> A singleton hello_world asserts with the following output:
>>>>>>>
>>>>>>> Warning :: opal_list_remove_item - the item 0x1211fc0 is not on the list 0x7f2cd9161ae0
>>>>>>> hello: ../../../../ompi/orte/mca/rml/base/rml_base_msg_handlers.c:75: orte_rml_base_post_recv: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (recv))->obj_magic_id' failed.
>>>>>>> [dancer:00698] *** Process received signal ***
>>>>>>> [dancer:00698] Signal: Aborted (6)
>>>>>>> [dancer:00698] Signal code: (-6)
>>>>>>> [dancer:00698] [ 0] /lib64/libpthread.so.0[0x3d8480f710]
>>>>>>> [dancer:00698] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3d83c32925]
>>>>>>> [dancer:00698] [ 2] /lib64/libc.so.6(abort+0x175)[0x3d83c34105]
>>>>>>> [dancer:00698] [ 3] /lib64/libc.so.6[0x3d83c2ba4e]
>>>>>>> [dancer:00698] [ 4] /lib64/libc.so.6(__assert_perror_fail+0x0)[0x3d83c2bb10]
>>>>>>> [dancer:00698] [ 5] /home/bosilca/opt/trunk/lib/libopen-rte.so.0(orte_rml_base_post_recv+0x252)[0x7f2cd8e76d55]
>>>>>>> [dancer:00698] [ 6] /home/bosilca/opt/trunk/lib/libopen-pal.so.0(+0xcca5d)[0x7f2cd89e8a5d]
>>>>>>> [dancer:00698] [ 7] /home/bosilca/opt/trunk/lib/libopen-pal.so.0(+0xcce53)[0x7f2cd89e8e53]
>>>>>>> [dancer:00698] [ 8] /home/bosilca/opt/trunk/lib/libopen-pal.so.0(opal_libevent2021_event_base_loop+0x4eb)[0x7f2cd89e99ea]
>>>>>>> [dancer:00698] [ 9] /home/bosilca/opt/trunk/lib/libopen-rte.so.0(+0x28725)[0x7f2cd8d1b725]
>>>>>>> [dancer:00698] [10] /lib64/libpthread.so.0[0x3d848079d1]
>>>>>>> [dancer:00698] [11] /lib64/libc.so.6(clone+0x6d)[0x3d83ce8b6d]
>>>>>>> [dancer:00698] *** End of error message ***
>>>>>>>
>>>>>>> The
same executable run via mpirun with a single process succeeds. This is with trunk; I did not try the release.
>>>>>>>
>>>>>>> George.
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> de...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>>> --
>>>>>> Jeff Squyres
>>>>>> jsquy...@cisco.com
>>>>>> For corporate legal information go to:
>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
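The obj_magic_id assertion quoted above is a debug-build liveness check: construction stamps a magic value into the object header and destruction clears it, so any later touch of a destroyed object trips the assert. A minimal standalone sketch of the pattern (hypothetical obj_t / obj_construct names, not OPAL's actual API):

```c
#include <stdint.h>

/* Debug-build magic value, the same constant the failing assert in
 * rml_base_msg_handlers.c compares against. */
#define OBJ_MAGIC ((0xdeafbeedULL << 32) + 0xdeafbeedULL)

/* Hypothetical object header: every live object carries the magic id. */
typedef struct {
    uint64_t obj_magic_id;
    /* ... payload would follow ... */
} obj_t;

static void obj_construct(obj_t *o) { o->obj_magic_id = OBJ_MAGIC; }
static void obj_destruct(obj_t *o)  { o->obj_magic_id = 0; }

/* The failing assert reduces to this check: a receive object that was
 * already destroyed elsewhere no longer carries the magic value. */
static int obj_is_live(const obj_t *o) {
    return o->obj_magic_id == OBJ_MAGIC;
}
```

In the reported failure the check fires while the posted receive is being unlinked from orte_rml_base.posted_recvs, which is consistent with the recv having been destructed on some other path first.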
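The opal_list_remove_item warning that precedes the assert reflects a defensive membership check: before unlinking, the list is walked to confirm the item is actually present, and a warning is emitted instead of scribbling on stale pointers when it is not. A minimal sketch of that idiom (hypothetical list_t / list_remove names, not opal_list's real interface):

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical intrusive singly-linked list with a dummy head node. */
typedef struct node { struct node *next; } node_t;
typedef struct { node_t head; } list_t;

static void list_init(list_t *l) { l->head.next = NULL; }

static void list_push(list_t *l, node_t *n) {
    n->next = l->head.next;
    l->head.next = n;
}

/* Returns 1 on success; returns 0 and warns if the item is not on the
 * list -- the situation the "item ... is not on the list" message
 * reports -- rather than corrupting the list's links. */
static int list_remove(list_t *l, node_t *n) {
    for (node_t *p = &l->head; p->next != NULL; p = p->next) {
        if (p->next == n) {
            p->next = n->next;
            n->next = NULL;
            return 1;
        }
    }
    fprintf(stderr, "Warning :: list_remove - item %p is not on list %p\n",
            (void *)n, (void *)l);
    return 0;
}
```

Seeing this warning immediately before the magic-id assert supports the reading that the recv was removed (and destructed) once already while the posted_recvs cleanup still held a reference to it.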