Keep in mind, too, that opal_object is the "base" object -- put in C++ terms, 
it's the abstract base class that all other classes derive from.  So it's rare 
that we would create an opal_object by itself.  opal_objects are usually created 
as part of some other, higher-level object.
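
To make the "base object" idea concrete, here is a rough sketch in plain C of 
how this kind of object system can work.  Every name in it (base_object_t, 
class_desc_t, OBJECT_NEW, and so on) is made up for illustration; it is not the 
actual OPAL code, only the general pattern: the base object is the first member 
of every derived struct, constructors run parent-first, and a debug build can 
record the file and line where the object was created.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT the OPAL source. */
typedef struct base_object base_object_t;
typedef void (*construct_fn)(base_object_t *);

/* Per-class descriptor: name, parent class, and optional constructor. */
typedef struct class_desc {
    const char *name;
    const struct class_desc *parent;   /* NULL for the base class */
    construct_fn construct;            /* may be NULL */
} class_desc_t;

/* The "base" object that every other object embeds. */
struct base_object {
    const class_desc_t *cls;
    const char *created_file;          /* debug aid: creation site */
    int created_line;
};

/* A derived object: the base object must be the first member. */
typedef struct item {
    base_object_t super;
    int value;
} item_t;

static int ctor_calls = 0;

static void base_construct(base_object_t *obj)
{
    (void)obj;
    ctor_calls++;
}

static void item_construct(base_object_t *obj)
{
    /* Valid cast: super is the first member of item_t. */
    ((item_t *)obj)->value = 42;
    ctor_calls++;
}

static const class_desc_t base_class = { "base_object", NULL, base_construct };
static const class_desc_t item_class = { "item", &base_class, item_construct };

/* Run constructors parent-first, then the class's own constructor. */
static void run_constructors(base_object_t *obj, const class_desc_t *cls)
{
    assert(obj != NULL);
    if (cls == NULL) {
        return;
    }
    run_constructors(obj, cls->parent);
    if (cls->construct != NULL) {
        cls->construct(obj);
    }
}

/* Allocate an object of the given class and record where it was created. */
static base_object_t *object_new(const class_desc_t *cls, size_t size,
                                 const char *file, int line)
{
    base_object_t *obj = calloc(1, size);
    if (obj == NULL) {
        return NULL;
    }
    obj->cls = cls;
    obj->created_file = file;
    obj->created_line = line;
    run_constructors(obj, cls);
    return obj;
}

#define OBJECT_NEW(type, cls) \
    ((type *)object_new(&(cls), sizeof(type), __FILE__, __LINE__))
```

With this sketch, OBJECT_NEW(item_t, item_class) runs base_construct and then 
item_construct on the same allocation, which is why a crash "inside the base 
object's construction" almost always belongs to whatever higher-level object 
was being built at the time.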

What's the full call stack of where Valgrind is showing the error?

Make sure you have the most recent valgrind (www.valgrind.org); the versions 
that ship in various distros may be somewhat old.  Newer valgrind versions show 
lots of things that older versions don't.  A new valgrind *might* be able to 
show some prior memory fault that is causing the issue...?
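
As an illustration of what "some prior memory fault" can look like, here is a 
small C sketch.  It is not a diagnosis of the TIPC BTL code; the names 
(record_t, buggy_reset, run_construct) are hypothetical.  It only shows how a 
too-large write in one place can leave a NULL function pointer that is not 
noticed until a constructor-style call much later.

```c
#include <stddef.h>
#include <string.h>

typedef void (*construct_fn)(void);

/* Hypothetical record: a small buffer followed by a function pointer. */
typedef struct record {
    char name[8];
    construct_fn construct;   /* used much later than name */
} record_t;

static int constructed = 0;

static void real_construct(void)
{
    constructed = 1;
}

/* The "real" error: the caller means to reset only name, but passes a
 * size that covers the whole record.  Copying from a blank record keeps
 * this demo well-defined C while still wiping the adjacent pointer. */
static void buggy_reset(record_t *r, size_t nbytes)
{
    static const record_t blank = { {0}, NULL };
    memcpy(r, &blank, nbytes);
}

/* The "symptom": real code would jump through the pointer here and
 * segfault at address (nil); this sketch reports -1 instead. */
static int run_construct(const record_t *r)
{
    if (r->construct == NULL) {
        return -1;
    }
    r->construct();
    return 0;
}
```

For a record initialized as { "tipc", real_construct }, run_construct returns 0; 
after buggy_reset(&r, sizeof r) it returns -1, even though the mistake was made 
in buggy_reset and not at the point of failure.  That gap between cause and 
symptom is what valgrind is good at closing.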


On Jul 4, 2011, at 7:45 AM, Xin He wrote:

> Hi,
> 
> I ran the program with valgrind, and it showed almost the same error. It 
> appeared that the segmentation fault happened during the initialization of 
> an opal_object.  That's why it puzzled me.
> 
> /Xin
> 
> 
> On 07/04/2011 01:40 PM, Jeff Squyres wrote:
>> Ah -- so this is in the template code.  I suspect this code might have bit 
>> rotted a bit.  :-\
>> 
>> If you run this through valgrind, does anything obvious show up?  I ask 
>> because this kind of error is typically a symptom of the real error.  I.e., 
>> the real error was some kind of memory corruption that occurred earlier, and 
>> this is the memory access that exposes that prior memory corruption.
>> 
>> 
>> On Jul 4, 2011, at 5:08 AM, Xin He wrote:
>> 
>>> Yes, it is an opal_object.
>>> 
>>> And this error seems to be caused by this code:
>>> 
>>> void mca_btl_template_proc_construct(mca_btl_template_proc_t* template_proc){
>>>     .......
>>>    .........
>>>     /* add to list of all proc instance */
>>>     OPAL_THREAD_LOCK(&mca_btl_template_component.template_lock);
>>>     opal_list_append(&mca_btl_template_component.template_procs, &template_proc->super);
>>>     OPAL_THREAD_UNLOCK(&mca_btl_template_component.template_lock);
>>> }
>>> 
>>> /Xin
>>> 
>>> On 07/02/2011 10:49 PM, Jeff Squyres (jsquyres) wrote:
>>>> Do you know which object it is that is being constructed?  When you compile 
>>>> with debugging enabled, there are strings in the object struct that identify 
>>>> the file and line where the object was created.
>>>> 
>>>> Sent from my phone. No type good.
>>>> 
>>>> On Jun 29, 2011, at 8:48 AM, "Xin He"
>>>> <xin.i...@ericsson.com>
>>>>  wrote:
>>>> 
>>>> 
>>>>> Hi,
>>>>> 
>>>>> As I advanced in my implementation of the TIPC BTL, I added the component 
>>>>> and tried to run the hello_c program to test it.
>>>>> 
>>>>> Then I got this segmentation fault. It seems to happen after the call to 
>>>>> "mca_btl_tipc_add_procs".
>>>>> 
>>>>> The error message displayed:
>>>>> 
>>>>> [oak:23192] *** Process received signal ***
>>>>> [oak:23192] Signal: Segmentation fault (11)
>>>>> [oak:23192] Signal code:  (128)
>>>>> [oak:23192] Failing at address: (nil)
>>>>> [oak:23192] [ 0] /lib/libpthread.so.0(+0xfb40) [0x7fec2a40fb40]
>>>>> [oak:23192] [ 1] /usr/lib/libmpi.so.0(+0x1e6c10) [0x7fec2b2afc10]
>>>>> [oak:23192] [ 2] /usr/lib/libmpi.so.0(+0x1e71f2) [0x7fec2b2b01f2]
>>>>> [oak:23192] [ 3] /usr/lib/openmpi/mca_pml_ob1.so(+0x59f2) [0x7fec264fc9f2]
>>>>> [oak:23192] [ 4] /usr/lib/openmpi/mca_pml_ob1.so(+0x5e5a) [0x7fec264fce5a]
>>>>> [oak:23192] [ 5] /usr/lib/openmpi/mca_pml_ob1.so(+0x2386) [0x7fec264f9386]
>>>>> [oak:23192] [ 6] /usr/lib/openmpi/mca_pml_ob1.so(+0x24a0) [0x7fec264f94a0]
>>>>> [oak:23192] [ 7] /usr/lib/openmpi/mca_pml_ob1.so(+0x22fb) [0x7fec264f92fb]
>>>>> [oak:23192] [ 8] /usr/lib/openmpi/mca_pml_ob1.so(+0x3a60) [0x7fec264faa60]
>>>>> [oak:23192] [ 9] /usr/lib/libmpi.so.0(+0x67f51) [0x7fec2b130f51]
>>>>> [oak:23192] [10] /usr/lib/libmpi.so.0(MPI_Init+0x173) [0x7fec2b161c33]
>>>>> [oak:23192] [11] hello_i(main+0x22) [0x400936]
>>>>> [oak:23192] [12] /lib/libc.so.6(__libc_start_main+0xfe) [0x7fec2a09bd8e]
>>>>> [oak:23192] [13] hello_i() [0x400859]
>>>>> [oak:23192] *** End of error message ***
>>>>> 
>>>>> I used gdb to check the stack:
>>>>> (gdb) bt
>>>>> #0  0x00007ffff7afac10 in opal_obj_run_constructors (object=0x6ca980)
>>>>>    at ../opal/class/opal_object.h:427
>>>>> #1  0x00007ffff7afb1f2 in opal_list_construct (list=0x6ca958) at class/opal_list.c:88
>>>>> #2  0x00007ffff2d479f2 in opal_obj_run_constructors (object=0x6ca958)
>>>>>    at ../../../../opal/class/opal_object.h:427
>>>>> #3  0x00007ffff2d47e5a in mca_pml_ob1_comm_construct (comm=0x6ca8c0)
>>>>>    at pml_ob1_comm.c:55
>>>>> #4  0x00007ffff2d44386 in opal_obj_run_constructors (object=0x6ca8c0)
>>>>>    at ../../../../opal/class/opal_object.h:427
>>>>> #5  0x00007ffff2d444a0 in opal_obj_new (cls=0x7ffff2f6c040)
>>>>>    at ../../../../opal/class/opal_object.h:477
>>>>> #6  0x00007ffff2d442fb in opal_obj_new_debug (type=0x7ffff2f6c040,
>>>>>    file=0x7ffff2d62840 "pml_ob1.c", line=182)
>>>>>    at ../../../../opal/class/opal_object.h:252
>>>>> #7  0x00007ffff2d45a60 in mca_pml_ob1_add_comm (comm=0x601060) at pml_ob1.c:182
>>>>> #8  0x00007ffff797bf51 in ompi_mpi_init (argc=1, argv=0x7fffffffdf58, requested=0,
>>>>>    provided=0x7fffffffde28) at runtime/ompi_mpi_init.c:770
>>>>> #9  0x00007ffff79acc33 in PMPI_Init (argc=0x7fffffffde5c, argv=0x7fffffffde50)
>>>>>    at pinit.c:84
>>>>> #10 0x0000000000400936 in main (argc=1, argv=0x7fffffffdf58) at hello_c.c:17
>>>>> 
>>>>> It seems the error happens while an object is being constructed. Any idea 
>>>>> why this is happening?
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> Best regards,
>>>>> Xin
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> 
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

