Vader is for intra-node communication only, right? So for inter-node communication some other mechanism will be used anyway. Why would it be even better to write a new btl? To avoid a memcpy (knem would use one, if I understand you correctly; I guess the code assumes that multiple processes on the same node have isolated address spaces).

Fork + execve was one of the first problems, yes. I replaced that with OSv-specific calls (fork is ignored, and instead of execve the given binary is started in a new thread). The global variables required an OSv modification - the guys from http://osv.io/ took care of that (I was surprised that, in the end, the patches were really small and elegant). So while there are no real processes, the new binary / ELF file is loaded at a different address than the rest of the OS - so it has separate global variables, and a separate environ too. Other resources like file descriptors are still shared.
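
Very roughly, the idea is something like this (just an illustrative sketch; osv_run_elf() is a made-up placeholder, not the real OSv API):

/* sketch only: start an MPI task as a thread instead of fork()+execve().
 * osv_run_elf() is a hypothetical placeholder, not the real OSv API. */
#include <pthread.h>

/* hypothetical: load the given ELF at its own address and run it */
extern int osv_run_elf (const char *path, char *const argv[], char *const envp[]);

struct task {
    const char *path;
    char *const *argv;
    char *const *envp;   /* per-task environment, separate 'environ' per ELF */
};

static void *run_task (void *arg)
{
    struct task *t = arg;
    osv_run_elf (t->path, t->argv, t->envp);
    return NULL;
}

static int launch_task (struct task *t)
{
    pthread_t tid;
    /* instead of fork() + execve() */
    return pthread_create (&tid, NULL, run_task, t);
}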

BR Justin

On 15. 12. 2015 14:55, Gilles Gouaillardet wrote:
Justin,

at first glance, vader should be symmetric (e.g. call opal_shmem_segment_detach() instead of munmap()).
Nathan, can you please comment?
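
something along these lines in mca_btl_vader_component_close() - an untested sketch, assuming the seg_ds filled in by opal_shmem_segment_create() at init time is still kept in the component:

/* untested sketch of a symmetric cleanup, mirroring the init path */
if (MCA_BTL_VADER_XPMEM == mca_btl_vader_component.single_copy_mechanism) {
    munmap (mca_btl_vader_component.my_segment,
            mca_btl_vader_component.segment_size);
} else {
    opal_shmem_segment_detach (&mca_btl_vader_component.seg_ds);
    opal_shmem_unlink (&mca_btl_vader_component.seg_ds);
}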

using tid instead of pid should also do the trick
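
for example, something like this in the mmap shmem module - a minimal self-contained sketch of the ownership check, not the actual module code (on Linux/glibc the tid would come from syscall(SYS_gettid)):

/* sketch: tag the segment with the creating thread's id, and let only
 * that thread unmap it.  not the real shmem/mmap module code. */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef struct {
    pid_t  seg_cpid;        /* creator's thread id (was: process id) */
    void  *seg_base_addr;
    size_t seg_size;
} seg_ds_sketch_t;

static pid_t my_tid (void)
{
    return (pid_t) syscall (SYS_gettid);
}

static void seg_mark_owner (seg_ds_sketch_t *ds)
{
    ds->seg_cpid = my_tid ();           /* was: ds->seg_cpid = getpid() */
}

static void seg_cleanup (seg_ds_sketch_t *ds)
{
    /* only the thread that created the segment unmaps it */
    if (my_tid () == ds->seg_cpid) {
        munmap (ds->seg_base_addr, ds->seg_size);
    }
}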

that being said, a more elegant approach would be to create a new module in the shmem framework: basically, create = malloc, attach = return the malloc'ed address, detach = noop, destroy = free.
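
roughly like this - a sketch only; the exact prototypes and opal_shmem_ds_t field names should be taken from opal/mca/shmem/shmem.h and the existing mmap module:

/* sketch of a heap-based shmem module for a single address space */
#include <stdlib.h>
#include "opal/constants.h"
#include "opal/mca/shmem/shmem.h"

static int seg_create (opal_shmem_ds_t *ds_buf, const char *file_name, size_t size)
{
    (void) file_name;                        /* no backing file needed */
    ds_buf->seg_size = size;
    ds_buf->seg_base_addr = malloc (size);   /* create = malloc */
    return (NULL == ds_buf->seg_base_addr) ? OPAL_ERR_OUT_OF_RESOURCE : OPAL_SUCCESS;
}

static void *seg_attach (opal_shmem_ds_t *ds_buf)
{
    return ds_buf->seg_base_addr;            /* attach = return the malloc'ed address */
}

static int seg_detach (opal_shmem_ds_t *ds_buf)
{
    (void) ds_buf;                           /* detach = noop */
    return OPAL_SUCCESS;
}

static int seg_unlink (opal_shmem_ds_t *ds_buf)
{
    free (ds_buf->seg_base_addr);            /* destroy = free */
    ds_buf->seg_base_addr = NULL;
    return OPAL_SUCCESS;
}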

and an even better approach would be to write your own btl that replaces vader. basically, vader can use the knem module to write into another process' address space. since your os is thread-only, the knem invocation would become a simple memcpy.
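
i.e. the core of the single-copy path would just be something like this (illustrative only, not the real btl interface):

/* illustrative fragment: with threads sharing one address space, a
 * single-copy get needs no knem/CMA/XPMEM - the peer's segment is
 * directly addressable */
#include <stdint.h>
#include <string.h>

static inline void thread_btl_get (void *local_address,
                                   uint64_t remote_address,
                                   size_t size)
{
    memcpy (local_address, (const void *)(uintptr_t) remote_address, size);
}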

makes sense ?


as a side note,
ompi uses global variables, and orted forks and execs MPI tasks after setting some environment variables. it seems porting ompi to this new os was not so painful, yet I would have expected some issues with the global variables, and some race conditions with the environment.
did you already solve these issues ?

Cheers,

Gilles

On Tuesday, December 15, 2015, Justin Cinkelj <justin.cink...@xlab.si> wrote:

    I'm trying to port Open MPI to an OS with threads instead of
    processes. Currently, during MPI_Finalize, I get an attempt to call
    munmap, first with the address 0x200000c00000 and later 0x200000c00008.

    mca_btl_vader_component_close():
        munmap (mca_btl_vader_component.my_segment,
                mca_btl_vader_component.segment_size);

    mca_btl_vader_component_init():
        if (MCA_BTL_VADER_XPMEM != mca_btl_vader_component.single_copy_mechanism) {
            opal_shmem_segment_create (&component->seg_ds, sm_file,
                                       component->segment_size);
            component->my_segment = opal_shmem_segment_attach (&component->seg_ds);
        } else {
            component->my_segment = mmap (NULL, component->segment_size,
                                          PROT_READ | PROT_WRITE,
                                          MAP_ANONYMOUS | MAP_SHARED, -1, 0);
        }

    But opal_shmem_segment_attach (from the mmap module) ends with:
        /* update returned base pointer with an offset that hides our stuff */
        return (ds_buf->seg_base_addr + sizeof(opal_shmem_seg_hdr_t));

    So mca_btl_vader_component_close() should in that case call
    opal_shmem_segment_detach() instead of munmap().
    Or actually, since the shmem_mmap module cleanup has already run at
    that point, could/should vader just skip its cleanup part entirely?

    Maybe I should ask first how this setup/cleanup works on a normal
    Linux system. Is mmap called twice, with vader and the shmem_mmap
    module each using a different address (so the vader munmap is indeed
    required in that case)?

    Second question.
    With two threads in one process, I got an attempt to call
    opal_shmem_segment_detach() and munmap() on the same mmap-ed address
    from both threads. I 'fixed' that by replacing "ds_buf->seg_cpid =
    getpid()" with gettid(), so that each thread munmaps only the address
    it allocated itself. Is that correct? Or is it possible that the
    second thread might still try to access data at that address?

    BR Justin



