On Thu, Aug 20, 2015 at 11:34:58PM -0700, Christoph Hellwig wrote:
> How is this going to work for drivers that might consume multiple
> MRs per request like SRP or similar upcoming block drivers? Unless
> you want to allocate a potentially large number of MRs for each
> request that scheme doesn't work.
There are at least two approaches, and it depends on how flow control
to the driving layer works out. Look at what the ULP does when the
existing MR pool exhausts:

- Exhaustion is not allowed. In this model every slot must truly
  handle every required action without blocking. The ULP somehow
  wrangles things so pool exhaustion is not possible. The NFS client
  is a good example. Where the NFS client went wrong is that the MR
  alone is not enough: issuing a request also requires SQE/CQE
  resources, and failing to track those caused hard-to-find bugs.

- Exhaustion is allowed, and somehow the ULP is able to stop
  processing. In this case you'd just swap MRs for slots in the
  pool, probably having pools of different kinds of slots to
  optimize resource use. Pool draw-down includes SQE/CQE/etc
  resources as well. A multiple-rkey MR case would just draw down
  the required slots from the pool.

I suspect the client side tends to lean toward the first option and
the target side toward the second - targets can always do back
pressure flow control by simply halting RQE processing, and it makes
a lot of sense on a target to globally pool slots across all client
QPs.

This idea of a slot is just a higher level structure we can hang
other stuff off - like the sg/mr decision, the iWarp RDMA READ
change, SQE accounting. We don't need to start with everything, but
I'm looking at Sagi's notes on trying to factor the lkey side code
paths and thinking a broader abstraction than a raw MR is needed to
solve that.

> FYI, I have working early patches to do per-WR completion callback,
> I'll post them after I get them into a slightly better shape.

Interesting..

> As for your grand schemes: I like some of the idea there, but we
> need to get there gradually. I'd much prefer to finish Sagi's simple
> scheme, get my completion work in, add abstractions for RDMA READ and
> WRITE scatterlist mapping and build things up slowly.
Yes, absolutely, we have to go slowly - but exploring how we can fit
this together in some other way can help guide some of the smaller
choices. Sagi could drop the lkey side; getting the rkey side in
order would be nice enough.

Something like this is a direction to address the lkey side. Ie we
could 1:1 replace MR with 'slot' and use that to factor the lkey
code paths. Over time slot can grow organically to factor more code.

Slot would be a new object for the core, one that is guaranteed to
last from post to completion. That seems like exactly the sort of
object a completion callback scheme would benefit from: guaranteed
memory to hang callback pointers/etc off.

Jason
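P.S. To make the slot idea concrete, here is a rough userspace
sketch - all names are invented for illustration, nothing here is an
existing kernel API. It shows the two properties discussed above:
pool draw-down accounts for SQE budget as well as the MR, so
exhaustion means "stop issuing"; and because the slot lives from
post until completion, a per-WR completion callback can hang off it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch only - names are invented, not a real API.
 * A "slot" bundles everything one posted request consumes: the MR,
 * the SQE budget it will use, and - since the slot is guaranteed to
 * live from post until completion - a per-WR completion callback. */

struct slot;
typedef void (*slot_done_t)(struct slot *s, int status);

struct slot {
	void *mr;          /* stand-in for a struct ib_mr */
	int sqes_needed;   /* send queue entries this request will post */
	slot_done_t done;  /* set at post time, invoked at completion */
	void *ulp_context; /* ULP-private data, valid post->completion */
	int in_use;
};

struct slot_pool {
	struct slot slots[16];
	int sqe_budget;    /* shared send-queue capacity */
};

/* Draw down a slot only if both a free slot and enough SQE budget
 * exist; NULL means "exhausted" and the ULP must stop issuing
 * (option two above).  A multiple-rkey request would call this once
 * per required rkey. */
static struct slot *slot_get(struct slot_pool *p, int sqes,
			     slot_done_t cb, void *ctx)
{
	size_t i;

	if (p->sqe_budget < sqes)
		return NULL;
	for (i = 0; i < 16; i++) {
		if (!p->slots[i].in_use) {
			p->slots[i].in_use = 1;
			p->slots[i].sqes_needed = sqes;
			p->slots[i].done = cb;
			p->slots[i].ulp_context = ctx;
			p->sqe_budget -= sqes;
			return &p->slots[i];
		}
	}
	return NULL;
}

/* What a shared CQ handler would do per CQE: recover the slot, run
 * its callback, and return its resources to the pool. */
static void slot_complete(struct slot_pool *p, struct slot *s, int status)
{
	s->done(s, status);
	p->sqe_budget += s->sqes_needed;
	s->in_use = 0;
}

/* Example callback for the usage below: records the status where
 * the ULP can see it. */
static void example_done(struct slot *s, int status)
{
	*(int *)s->ulp_context = status;
}
```

The point of the shape is that one slot_get() answers the whole
"can this post complete without blocking" question at once, instead
of the MR and SQE accounting being tracked separately.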