All atomics must be done through not just "the same btl" but the same btl
MODULE, since atomics from two IB HCAs, for instance, are not necessarily
coherent. So, how is the "best" one to be selected?
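
To make the constraint concrete, here is a minimal sketch of pinning all
atomics to a single module. It does not answer the question of which module
is "best": it simply takes the first candidate that advertises both atomic
flags, which is an arbitrary policy chosen for illustration. The struct, the
function name, and the flag bit values are all hypothetical stand-ins, not
the real Open MPI definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values: the real MCA_BTL_FLAGS_ATOMIC_OPS and
 * MCA_BTL_FLAGS_ATOMIC_FOPS values are not given in this thread. */
#define SKETCH_FLAGS_ATOMIC_OPS  0x1u
#define SKETCH_FLAGS_ATOMIC_FOPS 0x2u

/* Hypothetical stand-in for a btl module: a name and its flags only. */
struct sketch_module {
    const char *name;
    uint32_t    flags;
};

/* Return the first module that supports both atomic ops and fetching
 * atomic ops, or NULL if none does. The caller would then route every
 * atomic (including atomics on self) through that one module. */
const struct sketch_module *
sketch_pick_atomic_module(const struct sketch_module *modules, size_t count)
{
    const uint32_t needed = SKETCH_FLAGS_ATOMIC_OPS | SKETCH_FLAGS_ATOMIC_FOPS;

    assert(modules != NULL || count == 0);

    for (size_t i = 0; i < count; ++i) {
        if ((modules[i].flags & needed) == needed) {
            return &modules[i];
        }
    }
    return NULL;
}
```

With two HCAs that both advertise the flags, this returns the first one and
never mixes in the second, which is the property that matters here; a real
selection policy would presumably also weigh reachability and latency.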

-Paul [Sent from my phone]
On Nov 5, 2014 7:15 AM, "Nathan Hjelm" <hje...@lanl.gov> wrote:

>
> In the new osc component I don't try to handle that case. All atomics
> have to be done through the same btl (including atomics on self). I did
> this because with the default setup of Gemini they can not be mixed. If
> it is possible to mix them with other networks I would be happy to add
> an atomic flag for that.
>
> -Nathan
>
> On Wed, Nov 05, 2014 at 03:41:58AM -0500, Joshua Ladd wrote:
> >    Quick question. Out of curiosity, how do you handle the (common) case
> >    of mixing network atomics with CPU atomics? Say for a single target
> >    with two initiators, one initiator is on host with the target, so goes
> >    through the SM BTL, and the other initiator is off host, so goes
> >    through the network BTL.
> >
> >    Josh
> >    On Tue, Nov 4, 2014 at 6:46 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
> >
> >      What: Completely revamp the BTL RDMA interface (btl_put, btl_get) to
> >      better match what is needed for MPI one-sided.
> >
> >      Why: I am preparing to push an enhanced MPI-3 one-sided component
> >      that makes use of network rdma and atomic operations to provide a
> >      fast, truly one-sided implementation. Before I can push this
> >      component I want to change the btl interface to:
> >
> >       - Provide access to network atomic operations. I only need add and
> >         cswap but the interface can be extended to any number of
> >         operations.
> >
> >         The new interface provides three new functions: btl_atomic_op,
> >         btl_atomic_fop, and btl_atomic_cswap. Additionally there are two
> >         new btl_flags to indicate available atomic support:
> >         MCA_BTL_FLAGS_ATOMIC_OPS, and MCA_BTL_FLAGS_ATOMIC_FOPS. The
> >         btl_atomics_flags field has been added to indicate which atomic
> >         operations are supported (see mca_btl_base_atomic_op_t). At this
> >         time I only added support for 64-bit integer atomics but I am
> >         open to adding support for 32-bit as well.
> >
> >       - Provide an interface that will allow simultaneous put/get
> >         operations without extra calls into the btl. The current
> >         interface requires the btl user to call prepare_src/prepare_dst
> >         before every rdma operation. In some cases this is a complete
> >         waste (vader, sm with CMA, knem, or xpmem).
> >
> >         I separated the registration of memory from the segment info.
> >         More information is provided below. The new put/get functions
> >         have the following signatures:
> >
> >      typedef int (*mca_btl_base_module_put_fn_t) (
> >          struct mca_btl_base_module_t *btl,
> >          struct mca_btl_base_endpoint_t *endpoint, void *local_address,
> >          uint64_t remote_address,
> >          struct mca_btl_base_registration_handle_t *local_handle,
> >          struct mca_btl_base_registration_handle_t *remote_handle,
> >          size_t size, int flags, int order,
> >          mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext,
> >          void *cbdata);
> >
> >      typedef int (*mca_btl_base_module_get_fn_t) (
> >          struct mca_btl_base_module_t *btl,
> >          struct mca_btl_base_endpoint_t *endpoint, void *local_address,
> >          uint64_t remote_address,
> >          struct mca_btl_base_registration_handle_t *local_handle,
> >          struct mca_btl_base_registration_handle_t *remote_handle,
> >          size_t size, int flags, int order,
> >          mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext,
> >          void *cbdata);
> >
> >      typedef void (*mca_btl_base_rdma_completion_fn_t)(
> >          struct mca_btl_base_module_t* module,
> >          struct mca_btl_base_endpoint_t* endpoint,
> >          void *local_address,
> >          struct mca_btl_base_registration_handle_t *local_handle,
> >          void *context,
> >          void *cbdata,
> >          int status);
> >
> >         I may modify the completion function to provide more information
> >         on the completed operation (size).
> >
> >       - Allow the registration of an entire region even if the region
> >         can not be modified with a single rdma operation. At this time
> >         prepare_src and prepare_dst may modify the size and register a
> >         smaller region. This will not work.
> >
> >         This is done in the new interface through the new
> >         btl_register_mem, and btl_deregister_mem interfaces. The
> >         btl_register_mem interface returns a registration handle of size
> >         btl_registration_handle_size that can be used as either the
> >         local_handle or remote_handle to any rdma/atomic function. BTLs
> >         that do not provide these functions do not require registration
> >         for rdma/atomic operations.
> >
> >      typedef struct mca_btl_base_registration_handle_t *
> >      (*mca_btl_base_module_register_mem_fn_t)(
> >          struct mca_btl_base_module_t *btl,
> >          struct mca_btl_base_endpoint_t *endpoint, void *base,
> >          size_t size, uint32_t flags);
> >
> >       - Expose the limitations of the put and get operations so the
> >         caller can make decisions before trying a get or put operation.
> >         Two examples: the Gemini interconnect has an alignment
> >         restriction on get; openib devices may have a limit on how large
> >         a single get/put operation can be. The current interface sort of
> >         gives the put limit but it is tied to the rdma pipeline protocol.
> >
> >         This is done in the new interface by providing btl_get_limit,
> >         btl_get_alignment, btl_put_limit, and btl_put_alignment.
> >         Operations that violate these restrictions should return
> >         OPAL_ERR_BAD_PARAM (operation over limit) or
> >         OPAL_ERR_NOT_SUPPORTED (operation not supported due to alignment
> >         restrictions with either the source or destination buffer).
> >
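
For illustration, the limit and alignment checks described in the last item
might look roughly like the sketch below from the caller's side. This is
only a sketch under stated assumptions: the struct, the function names, and
the numeric error values are hypothetical stand-ins (the real fields would
live in the btl module and the real codes are OPAL_ERR_BAD_PARAM and
OPAL_ERR_NOT_SUPPORTED), and it assumes alignment is expressed in bytes with
0 or 1 meaning "no restriction".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder return codes standing in for OPAL_SUCCESS,
 * OPAL_ERR_BAD_PARAM and OPAL_ERR_NOT_SUPPORTED. */
enum {
    SKETCH_SUCCESS           =  0,
    SKETCH_ERR_BAD_PARAM     = -1,
    SKETCH_ERR_NOT_SUPPORTED = -2
};

/* Hypothetical subset of a btl module: only the four proposed fields. */
struct sketch_btl_limits {
    size_t get_limit;      /* largest single get, in bytes */
    size_t get_alignment;  /* assumed: bytes, 0 or 1 = unrestricted */
    size_t put_limit;      /* largest single put, in bytes */
    size_t put_alignment;  /* assumed: bytes, 0 or 1 = unrestricted */
};

static int sketch_is_aligned(uint64_t value, size_t alignment)
{
    return alignment <= 1 || (value % alignment) == 0;
}

/* Shared check: over-limit sizes are a bad parameter, misaligned
 * source or destination addresses are "not supported". */
static int sketch_check(size_t limit, size_t alignment,
                        const void *local_address,
                        uint64_t remote_address, size_t size)
{
    if (size > limit) {
        return SKETCH_ERR_BAD_PARAM;
    }
    if (!sketch_is_aligned((uint64_t)(uintptr_t)local_address, alignment) ||
        !sketch_is_aligned(remote_address, alignment)) {
        return SKETCH_ERR_NOT_SUPPORTED;
    }
    return SKETCH_SUCCESS;
}

int sketch_check_get(const struct sketch_btl_limits *btl,
                     const void *local_address,
                     uint64_t remote_address, size_t size)
{
    assert(btl != NULL);
    return sketch_check(btl->get_limit, btl->get_alignment,
                        local_address, remote_address, size);
}

int sketch_check_put(const struct sketch_btl_limits *btl,
                     const void *local_address,
                     uint64_t remote_address, size_t size)
{
    assert(btl != NULL);
    return sketch_check(btl->put_limit, btl->put_alignment,
                        local_address, remote_address, size);
}
```

A caller could run such a check first and fall back to splitting the
transfer or to a different protocol, rather than discovering the restriction
from a failed get or put.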
> >      This is a big change and I do not expect everyone to like 100% of
> >      these changes. I welcome any feedback people have.
> >
> >      When: Tuesday, Nov 17, 2015. This is during SC so there will be
> >      time for face-to-face discussion if anyone has any concerns or would
> >      like to see something changed.
> >
> >      The proposed new btl interface as well as updated versions of
> >      pml/ob1, btl/openib, btl/self, btl/scif, btl/sm, btl/tcp, btl/ugni,
> >      and btl/vader can be found in my btlmod branch at:
> >
> >      https://github.com/hjelmn/ompi/tree/btlmod
> >
> >      Other btls (smcuda, and usnic) still need to be updated to provide
> >      the new interface. Unmodified btls will not build.
> >
> >      If there are no objections I will push the btl modifications into
> >      the master two weeks from today (Nov 17). Please take a look and let
> >      me know what you think.
> >
> >      _______________________________________________
> >      devel mailing list
> >      de...@open-mpi.org
> >      Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >      Link to this post:
> >      http://www.open-mpi.org/community/lists/devel/2014/11/16193.php
