Patrick,

I'm unable to reproduce the buffer overrun with the latest trunk. I run valgrind (with the memcheck tool) on the trunk on a regular basis, and I have never noticed anything like this. Moreover, I went over the code, and I cannot see how we could overrun the buffer at the spot you pinpointed.
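A typical memcheck run over an MPI program looks something like the following; this is only an illustrative invocation, with ./test standing in for whatever binary is under test:

    mpirun -np 2 valgrind --tool=memcheck ./test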
Thanks,
  george.

On Aug 23, 2008, at 7:57 PM, Patrick Farrell wrote:
Hi,

I think I have found a buffer overrun in a function called by MPI::Init, though explanations of why I am wrong are welcome.

I am using the Open MPI included in Ubuntu Hardy, version 1.2.5, though I have inspected the latest trunk by eye and I don't believe the relevant code has changed.

I was trying to use Electric Fence, a memory debugging library, to debug a suspected buffer overrun in my own program. Electric Fence replaces malloc/free so that out-of-bounds accesses trigger an immediate segfault (a sketch of the underlying guard-page technique follows at the end of this message). While running my program under Electric Fence, the segfault was raised at:

    0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1)
        at class/opal_free_list.c:113
    113         OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
    (gdb) bt
    #0  0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1)
        at class/opal_free_list.c:113
    #1  0xb5cdd479 in opal_free_list_init (flist=0xb2b46a50, elem_size=56,
        elem_class=0xb2b46e20, num_elements_to_alloc=73,
        max_elements_to_alloc=-1, num_elements_per_alloc=1)
        at class/opal_free_list.c:78
    #2  0xb2b381aa in ompi_osc_pt2pt_component_init (enable_progress_threads=false,
        enable_mpi_threads=false) at osc_pt2pt_component.c:173
    #3  0xb792b67c in ompi_osc_base_find_available (enable_progress_threads=false,
        enable_mpi_threads=false) at base/osc_base_open.c:84
    #4  0xb78e6abe in ompi_mpi_init (argc=5, argv=0xbfd61f84, requested=0,
        provided=0xbfd61e78) at runtime/ompi_mpi_init.c:411
    #5  0xb7911a87 in PMPI_Init (argc=0xbfd61f00, argv=0xbfd61f04) at pinit.c:71
    #6  0x0811ca6c in MPI::Init ()
    #7  0x08118b8a in main ()

To investigate further, I replaced the OBJ_CONSTRUCT_INTERNAL macro with its definition from opal/class/opal_object.h and ran it again. The invalid memory access happens on the statement:

    ((opal_object_t *) (item))->obj_class = (flist->fl_elem_class);

Investigating further, I modified opal_free_list.c with the attached patch, which adds a few debugging printfs to show exactly what the code is doing. Their output is:

    mpidebug: allocating 216
    mpidebug: allocated at memory address 0xb62bdf28
    mpidebug: accessing address 0xb62be000
    [segfault]

Now, 0xb62be000 - 0xb62bdf28 = 216, which is exactly the size of the allocated buffer, so the faulting write lands on the first byte past its end. That is why I think this is a buffer overrun.
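For readers unfamiliar with Electric Fence: the core idea is to place each allocation so that it ends at a page boundary, with an inaccessible guard page mapped immediately after it; the very first out-of-bounds access then faults at the exact offending address, which is how the write to 0xb62be000 above was trapped. Below is a minimal POSIX sketch of that technique, not Electric Fence's actual code: guarded_malloc is a made-up name, and the real library also handles alignment, free(), and underrun detection.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Guard-page allocator sketch: the returned buffer ends exactly at a
     * page boundary, and the page after it is mapped PROT_NONE, so writing
     * even one byte past the end raises SIGSEGV immediately. */
    static void *guarded_malloc(size_t size)
    {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t span = ((size + page - 1) / page) * page; /* pages for the data */

        unsigned char *base = mmap(NULL, span + page, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return NULL;

        /* Make the trailing page a guard page. */
        if (mprotect(base + span, page, PROT_NONE) != 0)
            return NULL;

        /* Push the user buffer flush against the guard page. */
        return base + span - size;
    }

    int main(void)
    {
        unsigned char *buf = guarded_malloc(216);
        printf("buffer at %p, guard page at %p\n",
               (void *)buf, (void *)(buf + 216));
        memset(buf, 0, 216);   /* fine: stays inside the buffer */
        buf[216] = 0;          /* one byte past the end: SIGSEGV */
        return 0;
    }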
Steps to reproduce:

a) Install Electric Fence
b) Compile the following program with "mpiCC -o test ./test.cpp":

    #include <stdlib.h>
    #include <unistd.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI::Init(argc, argv);
        MPI::Finalize();
        return 0;
    }

c) gdb ./test
d) set environment LD_PRELOAD /usr/lib/libefence.so.0.0
e) run
   (see the note after the patch for a non-gdb equivalent)

Hope this helps,

Patrick Farrell

--
Patrick Farrell
PhD student
Imperial College London

--- openmpi-1.2.5/opal/class/opal_free_list.c          2008-08-23 18:35:03.000000000 +0100
+++ openmpi-1.2.5-modified/opal/class/opal_free_list.c 2008-08-23 18:31:47.000000000 +0100
@@ -90,9 +90,12 @@
     if (flist->fl_max_to_alloc > 0 &&
         flist->fl_num_allocated + num_elements > flist->fl_max_to_alloc)
         return OPAL_ERR_TEMP_OUT_OF_RESOURCE;
+    fprintf(stderr, "mpidebug: allocating %d\n", (num_elements * flist->fl_elem_size) + sizeof(opal_list_item_t) + CACHE_LINE_SIZE);
 
     alloc_ptr = (unsigned char *)malloc((num_elements * flist->fl_elem_size) +
                                         sizeof(opal_list_item_t) + CACHE_LINE_SIZE);
+    fprintf(stderr, "mpidebug: allocated at memory address %p\n", alloc_ptr);
+
     if(NULL == alloc_ptr)
         return OPAL_ERR_TEMP_OUT_OF_RESOURCE;
 
@@ -110,7 +113,16 @@
     for(i=0; i<num_elements; i++) {
         opal_free_list_item_t* item = (opal_free_list_item_t*)ptr;
         if (NULL != flist->fl_elem_class) {
-            OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
+            do {
+                if (0 == (flist->fl_elem_class)->cls_initialized) {
+                    opal_class_initialize((flist->fl_elem_class));
+                }
+                fprintf(stderr, "mpidebug: accessing address %p\n", &((opal_object_t *) (item))->obj_class);
+                ((opal_object_t *) (item))->obj_class = (flist->fl_elem_class);
+                fprintf(stderr, "mpidebug: accessing address %p\n", &((opal_object_t *) (item))->obj_reference_count);
+                ((opal_object_t *) (item))->obj_reference_count = 1;
+                opal_obj_run_constructors((opal_object_t *) (item));
+            } while (0);
         }
         opal_list_append(&(flist->super), &(item->super));
         ptr += flist->fl_elem_size;
@@ -119,5 +131,3 @@
 
     return OPAL_SUCCESS;
 }
-
-
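Note on step e): the same experiment can be run without gdb by preloading the library directly, using the same path as in step d (adjust for your system):

    LD_PRELOAD=/usr/lib/libefence.so.0.0 ./test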