> The big hack is that the SRQ number needs to be transmitted to the remote
> side. This patch hijacks the private data, so it's not acceptable. Ideally
> the SRQ number should be transmitted either in the REQ or REP packet
> (depending on which side is the sender or the receiver) alongside the QP
> number. But that would need a change in the specs. Any suggestions?
> 
> Also a good chunk of the patch is to deal with the XRC verbs API. I wonder
> whether XRC could/should be more integrated into the existing verbs:
> - sender should not need a domain,
> - there should be 2 types of xrc QPs (send and receive) instead of one,
> - *_xrc_rcv_qp verbs should be abstracted under the cover in libibverbs,

I've spent some time reading over the XRC patches in Roland's git tree and the
XRC patches to OFED's version of libibverbs.  These are some of the ideas that
I've jotted down for supporting XRC through the librdmacm and mainline
libibverbs, in no particular order.  (There may very well be implementation
issues with these.)

1. The IB CM needs to be updated to connect XRC INI QPs to XRC TGT QPs.  This 
should be fairly simple.

2. As mentioned, there's no standard way of obtaining the SRQN.  I will submit 
a comment to the IBTA on this.  My recommendation will be to use the IB CM SIDR 
protocol.
        2a. As an optimization, the IB CM REP could optionally return an SRQN
            in the EECN.
        2b. It may be useful if the SIDR REQ carried the XRC INI/TGT QPNs.

3. I don't see an easy way to hide the 'XRC domain'.  However, looking at the
existing libibverbs and librdmacm APIs, it may be simpler for the user, and
better for API compatibility, if it were abstracted behind a PD (struct
ibv_pd).  For example, a kernel XRC domain could be created the first time an
XRC object is allocated on a PD.  In order to share an XRC domain among
multiple processes, we would need a new call (ibv_share_pd?  ibv_modify_pd?).
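
To make the "created the first time an XRC object is allocated" idea concrete,
here is a rough sketch of the lazy creation.  Everything named sketch_* is
made up for illustration; none of it is the real struct ibv_pd or any existing
libibverbs call, and the reference counting is only my assumption about how
teardown might be tracked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel XRC domain (not a real verbs type). */
struct sketch_xrc_domain {
    int refcount;                       /* XRC objects currently using it */
};

/* Hypothetical stand-in for a PD that hides an XRC domain behind it. */
struct sketch_pd {
    struct sketch_xrc_domain *xrcd;     /* NULL until the first XRC object */
    struct sketch_xrc_domain storage;   /* backing storage for the domain */
};

static void sketch_pd_init(struct sketch_pd *pd)
{
    assert(pd != NULL);
    pd->xrcd = NULL;
    pd->storage.refcount = 0;
}

/*
 * Called whenever an XRC object (QP or SRQ) is allocated on the PD.
 * The domain is created on first use and shared by later allocations.
 */
static struct sketch_xrc_domain *sketch_pd_get_xrcd(struct sketch_pd *pd)
{
    assert(pd != NULL);
    if (pd->xrcd == NULL)
        pd->xrcd = &pd->storage;        /* "create" the domain lazily */
    pd->xrcd->refcount++;
    return pd->xrcd;
}
```

A PD that never allocates an XRC object never gets a domain, so existing users
of the PD calls would not see any change in behavior.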

4. There doesn't seem to be a strong reason to expose the XRC TGT QP to user 
space.  A kernel XRC component could accept and manage XRC target connections.

5. Assuming that XRC TGT QP is not exposed, libibverbs would use IBV_QPT_XRC 
only as the send side QP.  In order to support this QPT through the librdmacm, 
we would need to know what port space XRC QPs use, to know what SID range to 
map to, if any.  I will submit a comment to the IBTA on how XRC makes use of 
the RDMA IP CM Service.

6. The XRC SRQ is harder to fit under the existing APIs.  The one idea I had
was to treat the XRC SRQ as a QP (struct ibv_qp) rather than as an SRQ
(struct ibv_srq).  This would require defining an IBV_QPT_SRQ type.  The SRQN
ends up being a QPN for all purposes, though I don't know if this would cause
other issues if the SRQNs and real QPNs overlap.


With these changes, the librdmacm usage model would be:

Passive side:
        rdma_create_ep()  /* qp_type = IBV_QPT_SRQ */
        rdma_listen()  /* listen for SIDR REQ */
        rdma_get_cm_event()
        rdma_accept()

Active side:
        rdma_getaddrinfo() /* qp_type = IBV_QPT_XRC */
        rdma_create_ep()
        rdma_connect() /* connects to XRC TGT QP */

        rdma_getaddrinfo() /* qp_type = IBV_QPT_SRQ */
        rdma_create_ep()
        rdma_connect() /* resolves XRC SRQN */

This doesn't include code necessary to share the 'XRC domain' among multiple 
processes on the passive side.  On the passive side, the kernel XRC component 
would respond to IB CM REQs based on whether there was an associated SRQ listen.
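
As a rough illustration of that passive-side behavior, the sketch below keeps
a table of SRQ listens, answers an SIDR-style lookup with the SRQN, and accepts
an IB CM REQ only when a matching SRQ listen exists.  All of the sketch_* names,
the fixed-size table, and keying the lookup on a service ID are assumptions of
mine; this is not an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LISTENS 8

/* Hypothetical record of one SRQ listen: a service ID bound to an SRQN. */
struct sketch_srq_listen {
    uint64_t service_id;
    uint32_t srqn;
    bool in_use;
};

/* Hypothetical table kept by the kernel XRC component. */
struct sketch_xrc_listener {
    struct sketch_srq_listen slots[SKETCH_MAX_LISTENS];
};

enum sketch_verdict { SKETCH_REJECT = 0, SKETCH_ACCEPT = 1 };

/* Register an SRQ listen; returns false if the table is full. */
static bool sketch_add_listen(struct sketch_xrc_listener *l,
                              uint64_t service_id, uint32_t srqn)
{
    assert(l != NULL);
    for (size_t i = 0; i < SKETCH_MAX_LISTENS; i++) {
        if (!l->slots[i].in_use) {
            l->slots[i].service_id = service_id;
            l->slots[i].srqn = srqn;
            l->slots[i].in_use = true;
            return true;
        }
    }
    return false;
}

/* SIDR-style resolution: look up the SRQN for a service ID. */
static bool sketch_resolve_srqn(const struct sketch_xrc_listener *l,
                                uint64_t service_id, uint32_t *srqn)
{
    assert(l != NULL && srqn != NULL);
    for (size_t i = 0; i < SKETCH_MAX_LISTENS; i++) {
        if (l->slots[i].in_use && l->slots[i].service_id == service_id) {
            *srqn = l->slots[i].srqn;
            return true;
        }
    }
    return false;
}

/* REQ handling: accept only if an SRQ listen is associated with the ID. */
static enum sketch_verdict sketch_handle_req(const struct sketch_xrc_listener *l,
                                             uint64_t service_id)
{
    uint32_t srqn;
    return sketch_resolve_srqn(l, service_id, &srqn) ? SKETCH_ACCEPT
                                                     : SKETCH_REJECT;
}
```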

I realize this model would deviate from the OFED libibverbs APIs, but I don't 
want to break the librdmacm ABI, or necessarily add an entire new set of APIs 
just to support XRC.

- Sean