> On Oct 28, 2015, at 4:10 PM, Jason Gunthorpe 
> <jguntho...@obsidianresearch.com> wrote:
> 
> On Wed, Oct 28, 2015 at 03:56:08PM -0400, Chuck Lever wrote:
> 
>> A key question is whether connection loss guarantees that the
>> server is fenced, for all device types, from existing
>> registered MRs. After reconnect, each MR must be registered
>> again before it can be accessed remotely. Is this true for the
>> Linux IB core, and all kernel providers, when using FRWR?
> 
> MR validation is not linked to a QP in any way. The memory is not
> fully fenced until the invalidate completes, or the MR unregister
> completes. Nothing else is good enough.

IBTA spec states:

> MW access operations (i.e. RDMA Write, RDMA Reads, and Atomics)
> are only allowed if the Type 2B MW is in the Valid state and the
> QP Number (QPN) and PD of the QP performing the MW access
> operation matches the QPN and PD associated with the Bound
> Type 2B MW.

Once the QP is out of RTS, there can be no incoming RDMA
requests that match the R_key, QPN, PD tuple. I think you
are saying that the QP state change has the same problem
as not waiting for an invalidation to complete.


>> After a connection loss, the Linux kernel RPC/RDMA client
>> creates a new QP as it reconnects, thus I’d expect the QPN to
>> be different on the new connection. That should be enough to
>> prevent access to MRs that were registered with the previous
>> QP and PD, right?
> 
> No, the NFS implementation creates a single PD for everything and any
> QP in the PD can access all the MRs. This is another security issue of
> a different sort.

I’m speaking only of the client at the moment.


> If there was one PD per QP then the above would be true, since the MR
> is linked to the PD.

There is a per-connection struct rpcrdma_ia that contains
both a PD and a QP. Therefore there is one PD and only one
QP (on the client) per connection.

Transport reconnect replaces the QP, but not the PD. See
rpcrdma_ep_connect().


> Even so, moving a QP out of RTR is not a synchronous operation, and
> until the CQ is drained, the disposition of ongoing RDMA is not
> defined.
> 
> Basically: You can't avoid actually doing a blocking invalidate
> operation. The core layer must allow for this if it is going to async
> cancel RPCs.

Disappointing, but understood.


> FWIW, the same is true on the send side too, if the RPC had send
> buffers and gets canceled, you have to block until a CQ linked to that
> send is seen.

By “you have to block” you mean the send buffer cannot be reused
until the Send WR is known to have completed, and new Send WRs
cannot be posted until it is known that enough send queue resources
are available.

The connection recovery logic in rpcrdma_ep_connect should flush
pending CQs. New RPCs are blocked until a new connection is
established, although I’m not certain we are careful to ensure
the hardware has truly relinquished the send buffer before it is
made available for re-use. A known issue.


--
Chuck Lever


