Interesting - does it happen in finalize, or in the middle of execution?

On Feb 6, 2014, at 5:57 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> Out of 150 runs I could reproduce it once. When it failed I got exactly the 
> same assert:
> 
> hello: ../../../../ompi/orte/mca/rml/base/rml_base_msg_handlers.c:75: 
> orte_rml_base_post_recv: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) 
> == ((opal_object_t *) (recv))->obj_magic_id' failed.
> 
> A quick look at the code indicates it is in a rather obscure execution path, 
> when one cancels a pending receive. The assert indicates that the receive 
> object was already destroyed (somewhere else) when it got removed from the 
> orte_rml_base.posted_recvs queue.
> 
> George.
> 
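To illustrate the kind of check that is firing here, the following is a minimal sketch of a magic-ID guard on a debug-build object. It is not the Open MPI implementation: every `sketch_*` name is hypothetical, and only the magic constant is taken from the assertion text above. The object is kept on the stack so that inspecting it after destruction stays well defined in the sketch; in the real failure the memory may already have been released.

```c
#include <assert.h>
#include <stdint.h>

/* Same constant as in the assertion message quoted in this thread. */
#define SKETCH_MAGIC_ID ((0xdeafbeedULL << 32) + 0xdeafbeedULL)

/* Hypothetical stand-in for an object header; NOT the real opal_object_t. */
typedef struct {
    uint64_t magic_id;  /* holds SKETCH_MAGIC_ID only while the object is live */
    int refcount;
} sketch_object_t;

/* Hypothetical posted-receive object that embeds the header. */
typedef struct {
    sketch_object_t super;
    int tag;
} sketch_recv_t;

/* Construction stamps the magic ID. */
static void sketch_construct(sketch_recv_t *r, int tag)
{
    r->super.magic_id = SKETCH_MAGIC_ID;
    r->super.refcount = 1;
    r->tag = tag;
}

/* Destruction clears the magic ID so later users can detect it. */
static void sketch_destruct(sketch_recv_t *r)
{
    r->super.magic_id = 0;
    r->super.refcount = 0;
}

/* The test that the failing assertion performs: is the magic ID intact? */
static int sketch_is_live(const sketch_recv_t *r)
{
    return r->super.magic_id == SKETCH_MAGIC_ID;
}

/* Cancel path: returns 0 on success, -1 if the object was already destroyed. */
static int sketch_cancel(sketch_recv_t *r)
{
    if (!sketch_is_live(r)) {
        return -1;  /* a debug build would assert here instead */
    }
    sketch_destruct(r);
    return 0;
}

/* Cancels the same receive twice; returns 1 if the second attempt is caught. */
int sketch_demo(void)
{
    sketch_recv_t recv;
    sketch_construct(&recv, 7);
    assert(sketch_is_live(&recv));

    int first = sketch_cancel(&recv);   /* succeeds and destroys the object */
    int second = sketch_cancel(&recv);  /* magic ID is gone, so it is rejected */
    return first == 0 && second == -1;
}
```

Calling `sketch_demo()` from a `main` exercises both paths. The second cancel corresponds to the situation described above, where the receive object has already been destroyed by the time another code path touches it.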
> 
> On Feb 7, 2014, at 02:22 , George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> A rather long configure line:
>> 
>> ./configure --enable-picky --enable-debug --enable-coverage 
>> --disable-heterogeneous --enable-visibility --enable-contrib-no-build=vt 
>> --enable-mpirun-prefix-by-default --disable-mpi-cxx --with-cma 
>> --enable-static 
>> --enable-mca-no-build=plm-tm,ess-tm,ras-tm,plm-tm,ras-slurm,ess-slurm,plm-slurm,btl-sctp
>> 
>> And hello_world.c from ompi-tests compiled using: 
>> mpicc -g --coverage hello.c -o hello
>> 
>> George.
>> 
>> 
>> On Feb 7, 2014, at 01:11 , Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> Oh, should have noted: that's on both trunk and 1.7.4
>>> 
>>> On Feb 6, 2014, at 4:10 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>>> Works for me on Mac and Linux/Centos6.2 as well
>>>> 
>>>> 
>>>> On Feb 6, 2014, at 4:00 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>>> wrote:
>>>> 
>>>>> I'm unable to replicate on Linux/RHEL/64 bit with a trunk build.  How did 
>>>>> you configure?  Here's my configure:
>>>>> 
>>>>> ./configure --prefix=/home/jsquyres/bogus --disable-vt 
>>>>> --enable-mpirun-prefix-by-default --disable-mpi-fortran
>>>>> 
>>>>> Does this happen with every run?
>>>>> 
>>>>> 
>>>>> On Feb 6, 2014, at 6:53 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>> 
>>>>>> A singleton hello_world asserts with the following output:
>>>>>> 
>>>>>> Warning :: opal_list_remove_item - the item 0x1211fc0 is not on the list 
>>>>>> 0x7f2cd9161ae0
>>>>>> hello: ../../../../ompi/orte/mca/rml/base/rml_base_msg_handlers.c:75: 
>>>>>> orte_rml_base_post_recv: Assertion `((0xdeafbeedULL << 32) + 
>>>>>> 0xdeafbeedULL) == ((opal_object_t *) (recv))->obj_magic_id' failed.
>>>>>> [dancer:00698] *** Process received signal ***
>>>>>> [dancer:00698] Signal: Aborted (6)
>>>>>> [dancer:00698] Signal code:  (-6)
>>>>>> [dancer:00698] [ 0] /lib64/libpthread.so.0[0x3d8480f710]
>>>>>> [dancer:00698] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3d83c32925]
>>>>>> [dancer:00698] [ 2] /lib64/libc.so.6(abort+0x175)[0x3d83c34105]
>>>>>> [dancer:00698] [ 3] /lib64/libc.so.6[0x3d83c2ba4e]
>>>>>> [dancer:00698] [ 4] 
>>>>>> /lib64/libc.so.6(__assert_perror_fail+0x0)[0x3d83c2bb10]
>>>>>> [dancer:00698] [ 5] 
>>>>>> /home/bosilca/opt/trunk/lib/libopen-rte.so.0(orte_rml_base_post_recv+0x252)[0x7f2cd8e76d55]
>>>>>> [dancer:00698] [ 6] 
>>>>>> /home/bosilca/opt/trunk/lib/libopen-pal.so.0(+0xcca5d)[0x7f2cd89e8a5d]
>>>>>> [dancer:00698] [ 7] 
>>>>>> /home/bosilca/opt/trunk/lib/libopen-pal.so.0(+0xcce53)[0x7f2cd89e8e53]
>>>>>> [dancer:00698] [ 8] 
>>>>>> /home/bosilca/opt/trunk/lib/libopen-pal.so.0(opal_libevent2021_event_base_loop+0x4eb)[0x7f2cd89e99ea]
>>>>>> [dancer:00698] [ 9] 
>>>>>> /home/bosilca/opt/trunk/lib/libopen-rte.so.0(+0x28725)[0x7f2cd8d1b725]
>>>>>> [dancer:00698] [10] /lib64/libpthread.so.0[0x3d848079d1]
>>>>>> [dancer:00698] [11] /lib64/libc.so.6(clone+0x6d)[0x3d83ce8b6d]
>>>>>> [dancer:00698] *** End of error message ***
>>>>>> 
>>>>>> The same executable run via mpirun with a single process succeeds. This 
>>>>>> is with trunk; I did not try with the release.
>>>>>> 
>>>>>> George.
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Jeff Squyres
>>>>> jsquy...@cisco.com
>>>>> For corporate legal information go to: 
>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>> 
