Copying the Open MPI folks on this thread.

  - Matt


On Wed, 2006-04-19 at 12:05 -0700, Sean Hefty wrote:
> I'd like to get some feedback regarding the following approach to supporting
> multicast groups in userspace, and in particular for MPI.  Based on side
> conversations, I need to know if this approach would meet the needs of MPI
> developers.
> 
> To join / leave a multicast group, my proposal is to add the following APIs to
> the rdma_cm.  (Note I haven't implemented this yet, so I'm just assuming that
> it's possible at this point.)
> 
> /* Asynchronously join a multicast group. */
> int rdma_set_option(struct rdma_cm_id *id, int level, int optname,
>                         void *optval, size_t optlen);
> 
> /* Retrieve multicast group information - not usually called. */
> int rdma_get_option(struct rdma_cm_id *id, int level, int optname,
>                         void *optval, size_t optlen);
> 
> /*
>  * Post a message on the QP associated with the cm_id for the
>  * specified multicast address.
>  */
> int rdma_sendto(struct rdma_cm_id *id, struct ibv_send_wr *send_wr,
>                   struct sockaddr *to);
> 
> ---
> 
> As an example of how these APIs would be used:
> 
> /* The cm_id provides event handling and context. */
> rdma_create_id(&id, context);
> 
> /* Bind to a local interface to attach to a local device. */
> rdma_bind_addr(id, local_addr);
> 
> /* Allocate a PD, CQs, etc. */
> pd = ibv_alloc_pd(id->verbs);
> ...
> 
> /*
>  * Create a UD QP associated with the cm_id.
>  * TBD: automatically transition the QP to RTS for UD QP types?
>  */
> rdma_create_qp(id, pd, init_attr);
> 
> /* Bind to multicast group. */
> mcast_ip = 224.0.74.71; /* some fine mcast addr */
> ip_mreq.imr_multiaddr = mcast_ip.in_addr;
> rdma_set_option(id, RDMA_PROTO_IP, IP_ADD_MEMBERSHIP, &ip_mreq,
>                   sizeof(ip_mreq));
> 
> /* Wait for join to complete. */
> rdma_get_cm_event(&event);
> if (event->event == RDMA_CM_EVENT_JOIN_COMPLETE)
>       /* join worked - we could call rdma_get_option() here */
>       /* The rdma_cm attached the QP to the multicast group for us. */
> ...
> rdma_ack_cm_event(event);
> 
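> (As a sketch, the rdma_get_option() call mentioned above might return the
> group's MCMember record; reusing IP_ADD_MEMBERSHIP as the query key here
> is just an assumption, since no get-side optname is defined yet:)
> 
> struct ib_sa_mcmember_rec rec;
> rdma_get_option(id, RDMA_PROTO_IP, IP_ADD_MEMBERSHIP, &rec, sizeof(rec));
> 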
> /*
>  * Format a send wr.  The ah, remote_qpn, and remote_qkey are
>  * filled out by the rdma_cm based on the provided destination
>  * address.
>  */
> rdma_sendto(id, send_wr, &mcast_ip);
> 
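> For reference, a minimal UD send_wr that the caller might build before
> calling rdma_sendto().  This is only a sketch: buf, len, and mr are
> assumed to come from the usual allocation / registration steps, and the
> ah, remote_qpn, and remote_qkey fields are left for the rdma_cm to fill:
> 
> struct ibv_sge sge = {
>         .addr   = (uintptr_t) buf,
>         .length = len,
>         .lkey   = mr->lkey,
> };
> struct ibv_send_wr send_wr = {
>         .wr_id      = 1,
>         .sg_list    = &sge,
>         .num_sge    = 1,
>         .opcode     = IBV_WR_SEND,
>         .send_flags = IBV_SEND_SIGNALED,
> };
> 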
> ---
> 
> The multicast group information is created / managed by the rdma_cm.  The
> rdma_cm defines the mgid, q_key, p_key, sl, flowlabel, tclass, and joinstate.
> Except for mgid, these would most likely match the values used by the ipoib
> broadcast group.  The mgid mapping would be similar to that used by ipoib.
> The actual MCMember record would be available to the user by calling
> rdma_get_option.
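> 
> For illustration, the ipoib-style IPv4-to-MGID mapping (per RFC 4391)
> looks roughly like the sketch below.  The flags / scope byte (0x12:
> transient, link-local) is an assumption matching the typical broadcast
> group; the rdma_cm would derive the real scope and P_Key from the port:
> 
> void ipv4_to_mgid(uint32_t group /* host order */, uint16_t pkey,
>                   uint8_t mgid[16])
> {
>         memset(mgid, 0, 16);
>         mgid[0] = 0xff;                  /* IB multicast prefix */
>         mgid[1] = 0x12;                  /* flags = transient, scope = link-local */
>         mgid[2] = 0x40;                  /* IPoIB IPv4 signature 0x401b */
>         mgid[3] = 0x1b;
>         mgid[4] = pkey >> 8;             /* P_Key, high byte first */
>         mgid[5] = pkey & 0xff;
>         mgid[12] = (group >> 24) & 0x0f; /* low-order 28 bits of the group */
>         mgid[13] = (group >> 16) & 0xff;
>         mgid[14] = (group >> 8) & 0xff;
>         mgid[15] = group & 0xff;
> }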
> 
> I don't believe that there would be any restriction on the use of the QP
> that is attached to the multicast group, but it would take more work to
> support more than one multicast group per QP.  The purpose of the
> rdma_sendto() routine is to map a given IP address to an allocated address
> handle and Qkey.  At this point, rdma_sendto would only work for multicast
> addresses that have been joined by the user.
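> 
> A rough sketch of what rdma_sendto() might do internally.  The mcast_lookup
> helper and mcast_info struct are hypothetical stand-ins for however the
> rdma_cm ends up tracking joined groups; the posting itself is standard
> libibverbs:
> 
> int rdma_sendto(struct rdma_cm_id *id, struct ibv_send_wr *send_wr,
>                 struct sockaddr *to)
> {
>         struct mcast_info *mc;          /* hypothetical per-join state */
>         struct ibv_send_wr *bad_wr;
> 
>         mc = mcast_lookup(id, to);      /* find the joined group, if any */
>         if (!mc)
>                 return -EINVAL;         /* only joined addresses resolve */
> 
>         send_wr->wr.ud.ah = mc->ah;             /* AH created at join time */
>         send_wr->wr.ud.remote_qpn = 0xffffff;   /* the multicast QPN */
>         send_wr->wr.ud.remote_qkey = mc->qkey;
> 
>         return ibv_post_send(id->qp, send_wr, &bad_wr);
> }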
> 
> If a user wanted more control over the multicast group, we could support a
> call such as:
> 
> struct ib_mreq {
>       struct ib_sa_mcmember_rec       rec;
>       ib_sa_comp_mask                 comp_mask;
> };
> 
> rdma_set_option(id, RDMA_PROTO_IB, IB_ADD_MEMBERSHIP, &ib_mreq,
>                   sizeof(ib_mreq));
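> 
> Usage might look something like the following.  The comp_mask bit names
> are assumed to mirror the kernel's IB_SA_MCMEMBER_REC_* definitions, and
> the values themselves are purely illustrative:
> 
> struct ib_mreq mreq;
> 
> memset(&mreq, 0, sizeof mreq);
> mreq.rec.mgid = mgid;                   /* group the user wants to join */
> mreq.rec.qkey = htonl(0x12345678);      /* caller-chosen Q_Key */
> mreq.rec.sl = 1;                        /* caller-chosen service level */
> mreq.comp_mask = IB_SA_MCMEMBER_REC_MGID | IB_SA_MCMEMBER_REC_QKEY |
>                  IB_SA_MCMEMBER_REC_SL;
> 
> rdma_set_option(id, RDMA_PROTO_IB, IB_ADD_MEMBERSHIP, &mreq, sizeof(mreq));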
> 
> Thoughts?
> 
> - Sean