Copying the Open MPI folks on this thread. - Matt
On Wed, 2006-04-19 at 12:05 -0700, Sean Hefty wrote:
> I'd like to get some feedback regarding the following approach to
> supporting multicast groups in userspace, and in particular for MPI.
> Based on side conversations, I need to know if this approach would meet
> the needs of MPI developers.
>
> To join / leave a multicast group, my proposal is to add the following
> APIs to the rdma_cm.  (Note I haven't implemented this yet, so I'm just
> assuming that it's possible at this point.)
>
> /* Asynchronously join a multicast group. */
> int rdma_set_option(struct rdma_cm_id *id, int level, int optname,
>                     void *optval, size_t optlen);
>
> /* Retrieve multicast group information - not usually called. */
> int rdma_get_option(struct rdma_cm_id *id, int level, int optname,
>                     void *optval, size_t optlen);
>
> /*
>  * Post a message on the QP associated with the cm_id for the
>  * specified multicast address.
>  */
> int rdma_sendto(struct rdma_cm_id *id, struct ibv_send_wr *send_wr,
>                 struct sockaddr *to);
>
> ---
>
> As an example of how these APIs would be used:
>
> /* The cm_id provides event handling and context. */
> rdma_create_id(&id, context);
>
> /* Bind to a local interface to attach to a local device. */
> rdma_bind_addr(id, local_addr);
>
> /* Allocate a PD, CQs, etc. */
> pd = ibv_alloc_pd(id->verbs);
> ...
>
> /*
>  * Create a UD QP associated with the cm_id.
>  * TBD: automatically transition the QP to RTS for UD QP types?
>  */
> rdma_create_qp(id, pd, init_attr);
>
> /* Bind to multicast group. */
> mcast_ip = 224.0.0.74.71; /* some fine mcast addr */
> ip_mreq.imr_multiaddr = mcast_ip.in_addr;
> rdma_set_option(id, RDMA_PROTO_IP, IP_ADD_MEMBERSHIP, &ip_mreq,
>                 sizeof(ip_mreq));
>
> /* Wait for join to complete. */
> rdma_get_cm_event(&event);
> if (event->event == RDMA_CM_EVENT_JOIN_COMPLETE)
>     /* join worked - we could call rdma_get_option() here */
>     /* The rdma_cm attached the QP to the multicast group for us. */
> ...
> rdma_ack_cm_event(event);
>
> /*
>  * Format a send wr.  The ah, remote_qpn, and remote_qkey are
>  * filled out by the rdma_cm based on the provided destination
>  * address.
>  */
> rdma_sendto(id, send_wr, &mcast_ip);
>
> ---
>
> The multicast group information is created / managed by the rdma_cm.
> The rdma_cm defines the mgid, q_key, p_key, sl, flowlabel, tclass, and
> joinstate.  Except for mgid, these would most likely match the values
> used by the ipoib broadcast group.  The mgid mapping would be similar to
> that used by ipoib.  The actual MCMember record would be available to
> the user by calling rdma_get_option.
>
> I don't believe that there would be any restriction on the use of the QP
> that is attached to the multicast group, but it would take more work to
> support more than one multicast group per QP.  The purpose of the
> rdma_sendto() routine is to map a given IP address to an allocated
> address handle and Qkey.  At this point, rdma_sendto would only work for
> multicast addresses that have been joined by the user.
>
> If a user wanted more control over the multicast group, we could support
> a call such as:
>
> struct ib_mreq {
>         struct ib_sa_mcmember_rec rec;
>         ib_sa_comp_mask comp_mask;
> };
>
> rdma_set_option(id, RDMA_PROTO_IB, IB_ADD_MEMBERSHIP, &ib_mreq,
>                 sizeof(ib_mreq));
>
> Thoughts?
>
> - Sean
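
For readers who don't have the ipoib mapping in their heads: the proposal says the mgid mapping "would be similar to that used by ipoib", so here is a rough sketch of how ipoib builds an MGID from an IPv4 multicast address, as I understand RFC 4391. The function name ipv4_mcast_to_mgid is made up for illustration and is not part of the proposed rdma_cm API. Whether the rdma_cm would reuse the same 0x401b signature, scope, and P_Key placement is an assumption; the message above does not say.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper (not part of rdma_cm): build an ipoib-style
 * MGID for an IPv4 multicast group.
 *
 *   group_be  IPv4 group address in network byte order
 *   pkey      partition key of the ipoib broadcast group
 *   scope     IB multicast scope nibble (0x2 = link-local)
 *   mgid      16-byte output buffer
 */
void ipv4_mcast_to_mgid(uint32_t group_be, uint16_t pkey,
                        uint8_t scope, uint8_t mgid[16])
{
        uint32_t addr = ntohl(group_be);

        memset(mgid, 0, 16);
        mgid[0] = 0xff;                         /* multicast GID prefix */
        mgid[1] = 0x10 | (scope & 0x0f);        /* flags nibble, scope nibble */
        mgid[2] = 0x40;                         /* IPv4-over-IB signature, high byte */
        mgid[3] = 0x1b;                         /* IPv4-over-IB signature, low byte */
        mgid[4] = (uint8_t)(pkey >> 8);         /* P_Key, high byte */
        mgid[5] = (uint8_t)(pkey & 0xff);       /* P_Key, low byte */
        /* Bytes 6..11 stay zero; only the low 28 bits of the group are kept. */
        mgid[12] = (addr >> 24) & 0x0f;
        mgid[13] = (addr >> 16) & 0xff;
        mgid[14] = (addr >> 8) & 0xff;
        mgid[15] = addr & 0xff;
}
```

With P_Key 0xffff and link-local scope, the all-hosts group 224.0.0.1 comes out as ff12:401b:ffff:0000:0000:0000:0000:0001. Since only 28 bits of the IP address survive, the fixed 224/4 prefix is dropped and each joined group still gets a distinct MGID.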