Re: [PATCH] rdma cm + XRC

Jason Gunthorpe Wed, 11 Aug 2010 18:55:37 -0700

On Wed, Aug 11, 2010 at 05:04:00PM -0700, Hefty, Sean wrote:
> > Maybe 3 functions, since you already have create_ep:
> > create_id_ep - takes rdma_addrinfo, allocates PD/XRC, rdma_cm_id
> > create_qp_ep - takes rdma_addrinfo, allocates QP, CQ, etc
> > create_ep - just calls both the above. Very simplified
> > (not sure on the names)
> 
> This is similar to what I was thinking, except I would just use the
> existing create_qp.
> 
> I need to give adding PDs to rdma_addrinfo more thought.  It seems
> that if AF_IB were ever accepted, then device specific addressing
> could be used, rather than relying on mapped values.  As an
> alternative, we could define a function like:


Even so, the point of passing it into rdma_getaddrinfo is to restrict
the device rdma_getaddrinfo selects. It doesn't matter that you can get
back to the PD from the addrinfo if that PD doesn't have the resources
you need attached to it. Again, I'm thinking from the app perspective,
where juggling multiple PDs isn't really done and thus multiple
connections on different HCAs are not supported by the app. This model
should be supportable without introducing random failures when
rdma_getaddrinfo returns things that use other devices.

> struct ibv_context *rdma_get_device(rdma_addrinfo *res);
> 
> Internally, this would just end up doing a lot of the same work that
> create_id_ep mentioned above would do.

Indeed - so why bother? create_id_ep gets you the ID, bound to a
device with a verbs handle. Follow-up calls can use the convention that
0 for the PD means 'use the global default PD'; otherwise an app can
allocate a new PD, or find an existing one using the verbs handle
provided.
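
A minimal sketch of that convention, assuming stand-in types rather than
the real verbs structures: demo_device, demo_pd and demo_resolve_pd are
hypothetical names used only for illustration, and nothing here is an
existing librdmacm call. It also rejects a PD that belongs to a different
device, which is the failure the restriction above is meant to avoid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct ibv_context and struct ibv_pd, so the
 * sketch compiles without libibverbs. */
struct demo_pd;

struct demo_device {
    int id;
    struct demo_pd *default_pd;   /* assumed per-device default PD */
};

struct demo_pd {
    struct demo_device *dev;      /* device this PD was allocated on */
};

/* Resolve the PD for a follow-up call.
 * NULL (0) selects the device's default PD. An explicit PD is accepted
 * only if it belongs to the same device; otherwise NULL is returned. */
struct demo_pd *demo_resolve_pd(struct demo_device *dev, struct demo_pd *pd)
{
    assert(dev != NULL);
    if (pd == NULL)
        return dev->default_pd;
    return pd->dev == dev ? pd : NULL;
}
```

A caller that never juggles PDs would always pass NULL and get the default
back; one that allocates its own PD passes it in and gets a clear failure
if it came from another device.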

> > It looks to me like the main use model for this is peer-peer, so each
> > side would establish their send half independently and message routing
> > would be app specific. This means the CM initiator side should be the
> > side that has the INI QP and the CM target side should be the side
> > with TGT - ?
 
> This is why I questioned what the desired behavior should be (from
> an API perspective).  If the main usage model is peer-peer, then the
> librdmacm _could_ allocate and connect XRC INI and TGT QPs as pairs,
> so that bidirectional traffic was possible.  (For example, the
> rdma_cm could respond to a CM REQ with a CM REP and a CM REQ for the
> return path.)

If the model is peer-peer, having half duplex connections is ideal -
peer-peer would mean that either side could start setting things up at
any time, and dealing with the inherent races is a huge messy
problem. Having each side control when its half of the connection
starts up cleans things up tremendously.

> I'm not saying this wouldn't end up in an implementation mess.  The
> easiest thing to do is just perform a unidirectional connect and
> leave the SRQs up to the user.  XRC just seems hideous to use from
> an application programmer viewpoint, but it seems worth exploring if
> an app could make use of it without significant changes from what
> they would do for RC QPs.

Well.. I admit I can't think of many uses for XRC - but I can think of
two reasonable approaches for non peer-peer apps:
 - Create an RC QP and build the XRC QPs by exchanging messages within
   the RC QP - avoid the CM protocol entirely. XRC would then be a
   secondary channel to the RC channel, probably for buffer size sorting
   purposes.
 - Create the TGT QP on the initiator side and pass its info in the
   private message and do a double QP setup. More complex - does
   rdmacm provide hooks to do this?
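
For the second approach, the "info in the private message" could look
roughly like the sketch below. The layout is an assumption: demo_xrc_priv
and its two fields (TGT QP number, SRQ number) are hypothetical, as are the
helper names. In a real app the packed buffer would be handed over through
the private_data / private_data_len fields of struct rdma_conn_param.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload: what the initiator tells the target about the
 * TGT QP it created locally for the return path. */
struct demo_xrc_priv {
    uint32_t tgt_qpn;   /* assumed: number of the initiator's TGT QP */
    uint32_t srq_num;   /* assumed: SRQ number the peer should address */
};

enum { DEMO_XRC_PRIV_LEN = 8 };   /* two big-endian 32-bit fields */

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Serialize into buf in network byte order.
 * Returns the number of bytes written, or -1 if buf is too small. */
int demo_pack_priv(const struct demo_xrc_priv *in, uint8_t *buf, size_t len)
{
    assert(in != NULL && buf != NULL);
    if (len < DEMO_XRC_PRIV_LEN)
        return -1;
    put_be32(buf, in->tgt_qpn);
    put_be32(buf + 4, in->srq_num);
    return DEMO_XRC_PRIV_LEN;
}

/* Parse a received private-data buffer.
 * Returns 0 on success, or -1 if the buffer is too short. */
int demo_unpack_priv(const uint8_t *buf, size_t len, struct demo_xrc_priv *out)
{
    assert(buf != NULL && out != NULL);
    if (len < DEMO_XRC_PRIV_LEN)
        return -1;
    out->tgt_qpn = get_be32(buf);
    out->srq_num = get_be32(buf + 4);
    return 0;
}
```

This only covers carrying the numbers across; whether rdmacm offers a hook
to complete the second QP setup from them is exactly the open question.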

Jason
