I'm trying to port Open MPI to an OS that uses threads instead of processes. Currently, during MPI_Finalize, I see munmap being called first with the address 0x200000c00000 and later with 0x200000c00008.

mca_btl_vader_component_close():
    munmap (mca_btl_vader_component.my_segment, mca_btl_vader_component.segment_size);

mca_btl_vader_component_init():
    if (MCA_BTL_VADER_XPMEM != mca_btl_vader_component.single_copy_mechanism) {
        opal_shmem_segment_create (&component->seg_ds, sm_file, component->segment_size);
        component->my_segment = opal_shmem_segment_attach (&component->seg_ds);
    } else {
        component->my_segment = mmap (NULL, component->segment_size, PROT_READ | PROT_WRITE,
                                      MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    }

But opal_shmem_segment_attach() (from the shmem/mmap module) ends with:
    /* update returned base pointer with an offset that hides our stuff */
    return (ds_buf->seg_base_addr + sizeof(opal_shmem_seg_hdr_t));
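
If I read that right, my_segment ends up pointing sizeof(opal_shmem_seg_hdr_t) bytes past the real mapping base, which would match the ...008 address above. Here is a small stand-alone sketch of that mismatch (my own illustration, not the Open MPI code; the header struct is a stand-in, sized 8 bytes like the offset I observe):

    #include <stdio.h>
    #include <sys/mman.h>

    /* stand-in for the real header, 8 bytes like the observed offset */
    typedef struct { long seg_cpid; } opal_shmem_seg_hdr_t;

    int main (void)
    {
        size_t seg_size = 1 << 20;
        void *mapping = mmap (NULL, seg_size, PROT_READ | PROT_WRITE,
                              MAP_ANONYMOUS | MAP_SHARED, -1, 0);    /* e.g. 0x200000c00000 */
        if (MAP_FAILED == mapping) {
            return 1;
        }

        /* what attach() hands back: just past the header, e.g. 0x200000c00008 */
        void *user_ptr = (char *) mapping + sizeof (opal_shmem_seg_hdr_t);
        printf ("mapping %p, attach() result %p\n", mapping, user_ptr);

        /* munmap (user_ptr, seg_size) would target the wrong range (and fail
         * for a non-page-aligned address); only the original base unmaps cleanly */
        munmap (mapping, seg_size);
        return 0;
    }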

So mca_btl_vader_component_close() should, in that case, call opal_shmem_segment_detach() instead of munmap(). Or, since the shmem/mmap module's cleanup code has already run by that point, could/should vader simply skip its cleanup part?
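
Something like the following is what I have in mind for close() -- only a sketch that mirrors the init branch above, the exact guard may need to differ:

    if (MCA_BTL_VADER_XPMEM == mca_btl_vader_component.single_copy_mechanism) {
        /* we created this mapping ourselves with mmap(), so munmap() it */
        munmap (mca_btl_vader_component.my_segment,
                mca_btl_vader_component.segment_size);
    } else {
        /* the mapping belongs to the shmem/mmap module; let it undo the
         * attach (and the header offset) itself */
        opal_shmem_segment_detach (&mca_btl_vader_component.seg_ds);
    }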

Maybe I should first ask how that setup/cleanup works on a normal Linux system. Is mmap called twice, with vader and the shmem/mmap module each using a different address (so that vader's munmap is indeed required in that case)?

Second question.
With two threads in one process, I saw both threads attempt opal_shmem_segment_detach() and munmap() on the same mmap-ed address. I 'fixed' that by replacing "ds_buf->seg_cpid = getpid()" with gettid(), so that each thread munmap-s only the address it allocated itself. Is that correct? Or is it possible that the second thread might still try to access data at that address?
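
For reference, this is the kind of ownership check I mean -- a sketch only, the real guard in the shmem/mmap module is structured differently, and the struct below is just a stand-in for the relevant opal_shmem_ds_t fields (note gettid() only has a glibc wrapper since 2.30; older systems need syscall(SYS_gettid)):

    #define _GNU_SOURCE
    #include <unistd.h>      /* gettid(), glibc >= 2.30 */
    #include <sys/mman.h>

    /* stand-in for the opal_shmem_ds_t fields involved */
    struct seg_ds {
        pid_t  seg_cpid;       /* recorded at segment_create() time */
        void  *seg_base_addr;
        size_t seg_size;
    };

    static void segment_cleanup (struct seg_ds *ds_buf)
    {
        /* with seg_cpid filled in from gettid() instead of getpid(), only the
         * thread that created the segment tears the mapping down */
        if (gettid () == ds_buf->seg_cpid) {
            munmap (ds_buf->seg_base_addr, ds_buf->seg_size);
        }
    }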

BR Justin
