Keep in mind, too, that opal_object is the "base" object -- put in C++ terms, it's the abstract base class that all other classes derive from. So it's rare that we would create an opal_object by itself; opal_objects are usually created as part of some other, higher-level object.
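To make that concrete, here is a rough sketch of the pattern in plain C. It is not the real OPAL code: every name starting with sketch_ is hypothetical, and the real code in opal_object.h is organized differently. The idea is the same, though: a higher-level struct embeds the base object as its first member, and creating it runs the constructors from the base class down. That is also why opal_obj_run_constructors shows up several times in the backtrace quoted below.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the OPAL class system.
 * None of these names exist in Open MPI. */
typedef struct sketch_object sketch_object_t;
typedef void (*sketch_construct_fn)(sketch_object_t *);

typedef struct sketch_class {
    const char *name;
    const struct sketch_class *parent;  /* NULL for the base class */
    sketch_construct_fn construct;      /* may be NULL */
    size_t size;                        /* bytes to allocate per instance */
} sketch_class_t;

/* The "base" object: every other object starts with one of these. */
struct sketch_object {
    const sketch_class_t *cls;
    int refcount;
};

/* A higher-level object embeds the base object as its first member. */
typedef struct {
    sketch_object_t super;
    int length;
} sketch_list_t;

/* Constructor for the list part only. */
static void sketch_list_construct(sketch_object_t *obj)
{
    ((sketch_list_t *)obj)->length = 0;
}

static const sketch_class_t sketch_object_class = {
    "sketch_object", NULL, NULL, sizeof(sketch_object_t)
};

static const sketch_class_t sketch_list_class = {
    "sketch_list", &sketch_object_class, sketch_list_construct,
    sizeof(sketch_list_t)
};

/* Run constructors parent-first, so the base is set up before the
 * derived part touches it. Recursion is used here only for brevity. */
static void sketch_run_constructors(sketch_object_t *obj,
                                    const sketch_class_t *cls)
{
    if (cls == NULL) {
        return;
    }
    sketch_run_constructors(obj, cls->parent);
    if (cls->construct != NULL) {
        cls->construct(obj);
    }
}

/* Allocate an instance of cls and construct it; returns NULL if
 * the allocation fails. */
static sketch_object_t *sketch_new(const sketch_class_t *cls)
{
    assert(cls != NULL);
    sketch_object_t *obj = malloc(cls->size);
    if (obj == NULL) {
        return NULL;
    }
    obj->cls = cls;
    obj->refcount = 1;
    sketch_run_constructors(obj, cls);
    return obj;
}
```

Calling sketch_new(&sketch_list_class) hands back a sketch_list_t whose embedded base object was set up as a side effect. Nothing in the sketch ever asks for a bare sketch_object_t on its own, which is the point above.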
What's the full call stack of where Valgrind is showing the error?

Make sure you have the most recent valgrind (www.valgrind.org); the versions that ship in various distros may be somewhat old. Newer valgrind versions show lots of things that older versions don't. A new valgrind *might* be able to show some prior memory fault that is causing the issue...?

On Jul 4, 2011, at 7:45 AM, Xin He wrote:

> Hi,
>
> I ran the program with valgrind, and it showed almost the same error. It
> appeared that the segmentation fault happened during
> the initiation of an opal_object. That's why it puzzled me.
>
> /Xin
>
>
> On 07/04/2011 01:40 PM, Jeff Squyres wrote:
>> Ah -- so this is in the template code. I suspect this code might have bit
>> rotted a bit. :-\
>>
>> If you run this through valgrind, does anything obvious show up? I ask
>> because this kind of error is typically a symptom of the real error. I.e.,
>> the real error was some kind of memory corruption that occurred earlier, and
>> this is the memory access that exposes that prior memory corruption.
>>
>>
>> On Jul 4, 2011, at 5:08 AM, Xin He wrote:
>>
>>> Yes, it is a opal_object.
>>>
>>> And this error seems to be caused by these code:
>>>
>>> void mca_btl_template_proc_construct(mca_btl_template_proc_t*
>>> template_proc){
>>> .......
>>> .........
>>> /* add to list of all proc instance */
>>> OPAL_THREAD_LOCK(&mca_btl_template_component.template_lock);
>>>
>>> opal_list_append(&mca_btl_template_component.template_procs,&template_proc->super);
>>> OPAL_THREAD_UNLOCK(&mca_btl_template_component.template_lock);
>>> }
>>>
>>> /Xin
>>>
>>> On 07/02/2011 10:49 PM, Jeff Squyres (jsquyres) wrote:
>>>> Do u know which object it is that is being constructed? When you compile
>>>> with debugging enabled, theres strings in the object struct that identify
>>>> te file and line where the obj was created.
>>>>
>>>> Sent from my phone. No type good.
>>>>
>>>> On Jun 29, 2011, at 8:48 AM, "Xin He"
>>>> <xin.i...@ericsson.com>
>>>> wrote:
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> As I advanced in my implementation of TIPC BTL, I added the component and
>>>>> tried to run hello_c program to test.
>>>>>
>>>>> Then I got this segmentation fault. It seemed happening after the call
>>>>> "mca_btl_tipc_add_procs".
>>>>>
>>>>> The error message displayed:
>>>>>
>>>>> [oak:23192] *** Process received signal ***
>>>>> [oak:23192] Signal: Segmentation fault (11)
>>>>> [oak:23192] Signal code: (128)
>>>>> [oak:23192] Failing at address: (nil)
>>>>> [oak:23192] [ 0] /lib/libpthread.so.0(+0xfb40) [0x7fec2a40fb40]
>>>>> [oak:23192] [ 1] /usr/lib/libmpi.so.0(+0x1e6c10) [0x7fec2b2afc10]
>>>>> [oak:23192] [ 2] /usr/lib/libmpi.so.0(+0x1e71f2) [0x7fec2b2b01f2]
>>>>> [oak:23192] [ 3] /usr/lib/openmpi/mca_pml_ob1.so(+0x59f2) [0x7fec264fc9f2]
>>>>> [oak:23192] [ 4] /usr/lib/openmpi/mca_pml_ob1.so(+0x5e5a) [0x7fec264fce5a]
>>>>> [oak:23192] [ 5] /usr/lib/openmpi/mca_pml_ob1.so(+0x2386) [0x7fec264f9386]
>>>>> [oak:23192] [ 6] /usr/lib/openmpi/mca_pml_ob1.so(+0x24a0) [0x7fec264f94a0]
>>>>> [oak:23192] [ 7] /usr/lib/openmpi/mca_pml_ob1.so(+0x22fb) [0x7fec264f92fb]
>>>>> [oak:23192] [ 8] /usr/lib/openmpi/mca_pml_ob1.so(+0x3a60) [0x7fec264faa60]
>>>>> [oak:23192] [ 9] /usr/lib/libmpi.so.0(+0x67f51) [0x7fec2b130f51]
>>>>> [oak:23192] [10] /usr/lib/libmpi.so.0(MPI_Init+0x173) [0x7fec2b161c33]
>>>>> [oak:23192] [11] hello_i(main+0x22) [0x400936]
>>>>> [oak:23192] [12] /lib/libc.so.6(__libc_start_main+0xfe) [0x7fec2a09bd8e]
>>>>> [oak:23192] [13] hello_i() [0x400859]
>>>>> [oak:23192] *** End of error message ***
>>>>>
>>>>> I used gdb to check the stack:
>>>>> (gdb) bt
>>>>> #0 0x00007ffff7afac10 in opal_obj_run_constructors (object=0x6ca980)
>>>>> at ../opal/class/opal_object.h:427
>>>>> #1 0x00007ffff7afb1f2 in opal_list_construct (list=0x6ca958) at
>>>>> class/opal_list.c:88
>>>>> #2 0x00007ffff2d479f2 in opal_obj_run_constructors (object=0x6ca958)
>>>>> at ../../../../opal/class/opal_object.h:427
>>>>> #3 0x00007ffff2d47e5a in mca_pml_ob1_comm_construct (comm=0x6ca8c0)
>>>>> at pml_ob1_comm.c:55
>>>>> #4 0x00007ffff2d44386 in opal_obj_run_constructors (object=0x6ca8c0)
>>>>> at ../../../../opal/class/opal_object.h:427
>>>>> #5 0x00007ffff2d444a0 in opal_obj_new (cls=0x7ffff2f6c040)
>>>>> at ../../../../opal/class/opal_object.h:477
>>>>> #6 0x00007ffff2d442fb in opal_obj_new_debug (type=0x7ffff2f6c040,
>>>>> file=0x7ffff2d62840 "pml_ob1.c", line=182)
>>>>> at ../../../../opal/class/opal_object.h:252
>>>>> #7 0x00007ffff2d45a60 in mca_pml_ob1_add_comm (comm=0x601060) at
>>>>> pml_ob1.c:182
>>>>> #8 0x00007ffff797bf51 in ompi_mpi_init (argc=1, argv=0x7fffffffdf58,
>>>>> requested=0,
>>>>> provided=0x7fffffffde28) at runtime/ompi_mpi_init.c:770
>>>>> #9 0x00007ffff79acc33 in PMPI_Init (argc=0x7fffffffde5c,
>>>>> argv=0x7fffffffde50)
>>>>> at pinit.c:84
>>>>> #10 0x0000000000400936 in main (argc=1, argv=0x7fffffffdf58) at
>>>>> hello_c.c:17
>>>>>
>>>>> It seems the error happened when an object is constructed. Any idea why
>>>>> this is happening?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Best regards,
>>>>> Xin
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>>
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/