For the mailing lists -- Dorian found a bug here: our one-sided code assumes that the datatype will be "small" (i.e., a packed version of the datatype will fit within the eager limit fragment size in OMPI). Dorian's datatype is much larger than that, leading to an amorphous failure. I've filed a bug about this:

    https://svn.open-mpi.org/trac/ompi/ticket/1905
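
To illustrate the failure mode (not the actual fix), here is a minimal sketch of the kind of bounds check the copy is missing. The helper name copy_packed_ddt and the capacity parameter are hypothetical and do not exist in OMPI; the sketch only assumes that the payload area has some fixed capacity, and it refuses the copy when the packed datatype would not fit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Open MPI: copy a packed datatype
 * description into a fixed-capacity payload only if it fits.
 * Returns 0 on success, -1 if the copy would run past the payload. */
static int copy_packed_ddt(unsigned char *payload, size_t capacity,
                           size_t written_data,
                           const void *packed_ddt, size_t packed_ddt_len)
{
    /* Written so that the subtraction cannot underflow. */
    if (written_data > capacity ||
        packed_ddt_len > capacity - written_data) {
        return -1;
    }
    memcpy(payload + written_data, packed_ddt, packed_ddt_len);
    return 0;
}
```

With the values from Dorian's gdb session below (written_data = 36, packed_ddt_len = 44852), such a check would reject the copy for any payload capacity smaller than 44888 bytes instead of writing past the end of the buffer.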


On Apr 23, 2009, at 3:36 PM, doriankrause wrote:

Hi,

I'm currently looking at this bug:
http://www.open-mpi.org/community/lists/users/2008/12/7611.php
I'm using the 1.3.2 tarball.

Valgrind tells me that there is an invalid write (of size 1) in
osc_pt2pt_data_move.c at line 229 which is the
statement

    memcpy((unsigned char*) buffer->payload + written_data,
           packed_ddt, packed_ddt_len);

in the function ompi_osc_pt2pt_sendreq_send.

I have

(gdb) p packed_ddt_len
$2 = 44852

and

(gdb) p written_data
$3 = 36

but I can't figure out what the actual size of buffer->payload is. I have

(gdb) p *buffer
$6 = {mpireq = {super = {super = {super = {
          obj_magic_id = 16046253926196952813, obj_class = 0x4f5240,
          obj_reference_count = 1,
          cls_init_file_name = 0x2efe0b "class/opal_free_list.c",
          cls_init_lineno = 114}, opal_list_next = 0x0, opal_list_prev = 0x0,
        item_free = 1, opal_list_item_refcount = 0,
        opal_list_item_belong_to = 0x0}}, request = 0x5a35, status = {
      MPI_SOURCE = 23094, MPI_TAG = 23095, MPI_ERROR = 23096, _count = 23097,
      _cancelled = 23098}, cbfunc = 0x4e6cc5 <ompi_osc_pt2pt_sendreq_send_cb>,
    cbdata = 0x8681080}, payload = 0x86bc0d8, len = 23102}

Is len the size of payload?

In osc_pt2pt_component.c I found the statement

    /* adjust size to be multiple of ompi_ptr_t to avoid alignment issues*/
    aligned_size = sizeof(ompi_osc_pt2pt_buffer_t) +
        (sizeof(ompi_osc_pt2pt_buffer_t) % sizeof(ompi_ptr_t)) +
        mca_osc_pt2pt_component.p2p_c_eager_size;
    OBJ_CONSTRUCT(&mca_osc_pt2pt_component.p2p_c_buffers, opal_free_list_t);
    opal_free_list_init(&mca_osc_pt2pt_component.p2p_c_buffers,
                        aligned_size,
                        OBJ_CLASS(ompi_osc_pt2pt_buffer_t),
                        1, -1, 1);

but this doesn't help me to understand ...
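
One way to read the expression above, as a rough sketch only: each free-list item appears to be sized as the buffer header, plus some padding, plus p2p_c_eager_size bytes. Assuming the payload occupies whatever follows the header and padding (an assumption; the quoted code does not show where payload points), the room available for data would be about the eager size. The function names and the sizes used in the note after the code are hypothetical.

```c
#include <stddef.h>

/* Mirrors the shape of the quoted aligned_size expression:
 * header + (header % pointer size) + eager size. */
static size_t aligned_item_size(size_t header_size, size_t ptr_size,
                                size_t eager_size)
{
    return header_size + (header_size % ptr_size) + eager_size;
}

/* Bytes left for payload in an item, assuming the payload starts
 * right after the header and padding (an assumption, see above). */
static size_t payload_capacity(size_t item_size, size_t header_size,
                               size_t padding)
{
    size_t overhead = header_size + padding;
    return item_size > overhead ? item_size - overhead : 0;
}
```

For example, with a made-up 96-byte header, 8-byte pointers and a 4096-byte eager size, the item would be 4192 bytes with 4096 bytes of payload, far less than the 36 + 44852 bytes the memcpy above tries to fill.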


Can you help with this? Where can I find the allocation routine for the
buffer?
Or do you know why there could be an invalid write?

Thanks + Best regards,
Dorian


_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Jeff Squyres
Cisco Systems
