Patrick,

I'm unable to reproduce the buffer overrun with the latest trunk. I run valgrind (with the memchecker tool) on the trunk on a regular basis, and I have never noticed anything like this. Moreover, I went over the code, and I cannot see how we could overrun the buffer in the code you pinpointed.

  Thanks,
    george.

On Aug 23, 2008, at 7:57 PM, Patrick Farrell wrote:

Hi,

I think I have found a buffer overrun in a function
called by MPI::Init, though explanations of why I am
wrong are welcome.

I am using the openmpi package included in Ubuntu Hardy (version 1.2.5),
though I have inspected the latest trunk by eye and I don't believe the
relevant code has changed.

I was trying to use Electric Fence, a memory debugging library,
to debug a suspected buffer overrun in my own program.
Electric Fence works by replacing malloc/free in such a way
that a bounds violation triggers a segfault at the offending instruction.
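
To illustrate what I mean (this is just a made-up sketch of how the tool
behaves, not code from Open MPI): with Electric Fence's default settings
the byte immediately past each allocation lies on an inaccessible guard
page, so a minimal off-by-one like the following segfaults on the marked
write when run with libefence preloaded:

#include <stdlib.h>

int main(void)
{
    char *buf = malloc(16);

    /* Valid offsets are 0..15; Electric Fence puts a guard page right
       after the buffer, so this off-by-one write segfaults immediately. */
    buf[16] = 0;

    free(buf);
    return 0;
}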
While running my program under Electric Fence, I found
that I got a segfault issued at:

0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1) at class/opal_free_list.c:113
113 OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
(gdb) bt
#0 0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1) at class/opal_free_list.c:113
#1 0xb5cdd479 in opal_free_list_init (flist=0xb2b46a50, elem_size=56, elem_class=0xb2b46e20, num_elements_to_alloc=73, max_elements_to_alloc=-1, num_elements_per_alloc=1) at class/opal_free_list.c:78
#2 0xb2b381aa in ompi_osc_pt2pt_component_init (enable_progress_threads=false, enable_mpi_threads=false) at osc_pt2pt_component.c:173
#3 0xb792b67c in ompi_osc_base_find_available (enable_progress_threads=false, enable_mpi_threads=false) at base/osc_base_open.c:84
#4 0xb78e6abe in ompi_mpi_init (argc=5, argv=0xbfd61f84, requested=0, provided=0xbfd61e78) at runtime/ompi_mpi_init.c:411
#5 0xb7911a87 in PMPI_Init (argc=0xbfd61f00, argv=0xbfd61f04) at pinit.c:71
#6 0x0811ca6c in MPI::Init ()
#7 0x08118b8a in main ()

To investigate further, I replaced the OBJ_CONSTRUCT_INTERNAL
macro with its definition in opal/class/opal_object.h, and ran it again.
It appears that the invalid memory access is happening
on the instruction

((opal_object_t *) (item))->obj_class = (flist->fl_elem_class);

Investigating further, I modified the source of opal_free_list.c
with the attached patch (included below, after my signature). It adds
a few debugging printfs to diagnose exactly what the code is doing;
the do { ... } while (0) block in the patch is just the expansion of
OBJ_CONSTRUCT_INTERNAL with the printfs inserted. The output of the
debugging statements is:

mpidebug: allocating 216
mpidebug: allocated at memory address 0xb62bdf28
mpidebug: accessing address 0xb62be000
[segfault]

Now, 0xb62be000 - 0xb62bdf28 = 216, which is exactly the size of
the buffer that was allocated, so the faulting access begins at the
first byte past the end of the allocation. Since Electric Fence (as I
understand it) places an inaccessible guard page immediately after
each allocation, this is exactly the kind of access it is designed to
catch, and so I think this is a buffer overrun.
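
As a trivial sanity check of that arithmetic, the following standalone
snippet (addresses copied from the mpidebug output above) prints 216:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uintptr_t alloc_start = 0xb62bdf28UL; /* start of the 216-byte allocation */
    uintptr_t fault_addr  = 0xb62be000UL; /* address of the faulting write */

    /* The write begins at the first byte past the end of the buffer,
       i.e. on Electric Fence's guard page. */
    printf("offset = %lu\n", (unsigned long)(fault_addr - alloc_start));
    return 0;
}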

Steps to reproduce:

a) Install Electric Fence
b) Compile the following program

#include <stdlib.h>
#include <unistd.h>

#include <mpi.h>

int main(int argc, char **argv)
{
 MPI::Init(argc, argv);
 MPI::Finalize();

 return 0;
}

with

mpiCC -o test ./test.cpp

c) gdb ./test
d) set environment LD_PRELOAD /usr/lib/libefence.so.0.0
e) run
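
(If it is more convenient, the same run can presumably be done without
gdb by pre-loading the library directly, e.g.

LD_PRELOAD=/usr/lib/libefence.so.0.0 ./test

but the backtrace from gdb is the easiest way to see where the
segfault happens.)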

Hope this helps,

Patrick Farrell

--
Patrick Farrell
PhD student
Imperial College London
--- openmpi-1.2.5/opal/class/opal_free_list.c 2008-08-23 18:35:03.000000000 +0100
+++ openmpi-1.2.5-modified/opal/class/opal_free_list.c 2008-08-23 18:31:47.000000000 +0100
@@ -90,9 +90,12 @@
if (flist->fl_max_to_alloc > 0 && flist->fl_num_allocated + num_elements > flist->fl_max_to_alloc)
        return OPAL_ERR_TEMP_OUT_OF_RESOURCE;

+ fprintf(stderr, "mpidebug: allocating %d\n", (num_elements * flist->fl_elem_size) + sizeof(opal_list_item_t) + CACHE_LINE_SIZE);
    alloc_ptr = (unsigned char *)malloc((num_elements * flist->fl_elem_size) +
                                        sizeof(opal_list_item_t) +
                                        CACHE_LINE_SIZE);
+ fprintf(stderr, "mpidebug: allocated at memory address %p\n", alloc_ptr);
+
    if(NULL == alloc_ptr)
        return OPAL_ERR_TEMP_OUT_OF_RESOURCE;

@@ -110,7 +113,16 @@
    for(i=0; i<num_elements; i++) {
        opal_free_list_item_t* item = (opal_free_list_item_t*)ptr;
        if (NULL != flist->fl_elem_class) {
-            OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
+            do {
+                if (0 == (flist->fl_elem_class)->cls_initialized) {
+                    opal_class_initialize((flist->fl_elem_class));
+                }
+ fprintf(stderr, "mpidebug: accessing address %p\n", &((opal_object_t *) (item))->obj_class);
+                ((opal_object_t *) (item))->obj_class = (flist->fl_elem_class);
+ fprintf(stderr, "mpidebug: accessing address %p\n", &((opal_object_t *) (item))->obj_reference_count);
+                ((opal_object_t *) (item))->obj_reference_count = 1;
+                opal_obj_run_constructors((opal_object_t *) (item));
+            } while (0);
        }
        opal_list_append(&(flist->super), &(item->super));
        ptr += flist->fl_elem_size;
@@ -119,5 +131,3 @@
    return OPAL_SUCCESS;
}

-
-