[PATCH WIP 00/43] New fast registration API

2015-07-21 Thread Sagi Grimberg
Hi all,

So I went ahead and tried to implement some of the stuff
we've been talking about. I figured I'd send out a WIP version
to try and communicate early where this is heading.

In order to have a sane patchset I followed an
add-new/port-existing/drop-old scheme.

The set starts with:
- Convert ib_create_mr API to ib_alloc_mr as Christoph suggested (1)
- Add vendor drivers support for ib_alloc_mr (2-7)
- Port ULPs to use ib_alloc_mr (8-12)
- Drop alloc_fast_reg_mr API (core + vendor drivers) (13-20)

Continues with:
- Allocate vendor private page lists (21-27)
- Add a new fast registration API that will replace existing frwr (28)
- Add support for the new API in relevant vendor drivers (29-35)
  * it's a bit hacky since I just bluntly duplicated the registration
    routines; keep in mind that this is transient until we drop the old API...
- Port ULPs to use the new API (iser, isert, xprtrdma for now) (36-38)
  this is on top of Chuck's nfs-rdma-for-4.3 and updated iser/isert code

The set should end with:
- Complete ULPs porting (svcrdma, rds, srp)
- Drop old fast registration API - FRWR (core + vendor drivers)
- Still have the huge-pages bit to work out.

I also added arbitrary sg list registration support to mlx5 and iser
as less intrusive API additions (39-43), just to show the concept.

This set was lightly tested on the ported ULPs over mlx5 (didn't have a
chance to test mlx4 yet).

The main reasons for this preview are:
- Help with testing (especially on devices that I don't have access to,
  e.g. cxgb3, cxgb4, ocrdma, nes, qib). I probably have bugs there
  as I have only compile-tested them so far.
- Help with porting of the rest of the ULPs (rds, srp, svcrdma)
- Early code review

What I've noticed from this effort is that several drivers keep
shadow mapped page lists for device-specific settings. At registration
time, these drivers iterate over the page list and set the mapped page list
entries with some extra information. I'd expect these drivers not to use
the core function that maps an SG list to pages, but their own function, which
will allow them to drop the page list duplication. I haven't done that yet.
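
To make the duplication concrete, here is a minimal user-space sketch, not
kernel code. Everything in it is an assumption made for illustration:
struct sg_ent, the 4K page size, the DEV_ENTRY_PRESENT bit and the function
names are hypothetical and do not come from the patches. map_two_pass()
models a driver that lets a generic helper build the page list and then
copies it into a device-format shadow list; map_one_pass() models a driver
that walks the SG list itself and writes device-format entries directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  ((uint64_t)4096)   /* assumed page size */
#define DEV_ENTRY_PRESENT ((uint64_t)0x1)    /* hypothetical per-entry device bit */

/* Hypothetical, simplified stand-in for a DMA-mapped scatterlist entry. */
struct sg_ent {
    uint64_t addr;  /* DMA address of the segment */
    uint32_t len;   /* segment length in bytes */
};

/*
 * Generic helper: expand an SG list into plain page addresses.
 * Assumes segments do not share a page with each other.
 * Returns the number of pages written, or -1 if they do not fit.
 */
static int sg_to_pages(const struct sg_ent *sg, size_t nents,
                       uint64_t *pages, size_t max_pages)
{
    size_t n = 0;

    assert(sg != NULL && pages != NULL);
    for (size_t i = 0; i < nents; i++) {
        uint64_t end = sg[i].addr + sg[i].len;

        for (uint64_t p = sg[i].addr & ~(SKETCH_PAGE_SIZE - 1);
             p < end; p += SKETCH_PAGE_SIZE) {
            if (n == max_pages)
                return -1;
            pages[n++] = p;
        }
    }
    return (int)n;
}

/* Two passes: generic page list first, then a device-format shadow copy. */
static int map_two_pass(const struct sg_ent *sg, size_t nents,
                        uint64_t *pages, uint64_t *shadow, size_t max_pages)
{
    int n = sg_to_pages(sg, nents, pages, max_pages);

    for (int i = 0; i < n; i++)
        shadow[i] = pages[i] | DEV_ENTRY_PRESENT;
    return n;
}

/* One pass: the driver walks the SG list and writes device entries directly. */
static int map_one_pass(const struct sg_ent *sg, size_t nents,
                        uint64_t *dev_pages, size_t max_pages)
{
    size_t n = 0;

    assert(sg != NULL && dev_pages != NULL);
    for (size_t i = 0; i < nents; i++) {
        uint64_t end = sg[i].addr + sg[i].len;

        for (uint64_t p = sg[i].addr & ~(SKETCH_PAGE_SIZE - 1);
             p < end; p += SKETCH_PAGE_SIZE) {
            if (n == max_pages)
                return -1;
            dev_pages[n++] = p | DEV_ENTRY_PRESENT;
        }
    }
    return (int)n;
}
```

Both variants produce the same device-format entries; the one-pass variant
simply does not need the intermediate pages[] array.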

Comments and review are welcome (and needed!).

Sorry for the long series, but it's kinda transverse...

The code/patches can be found in:
https://github.com/sagigrimberg/linux/tree/fastreg_api_wip

Sagi Grimberg (43):
  IB: Modify ib_create_mr API
  IB/mlx4: Support ib_alloc_mr verb
  ocrdma: Support ib_alloc_mr verb
  iw_cxgb4: Support ib_alloc_mr verb
  cxgb3: Support ib_alloc_mr verb
  nes: Support ib_alloc_mr verb
  qib: Support ib_alloc_mr verb
  IB/iser: Convert to ib_alloc_mr
  iser-target: Convert to ib_alloc_mr
  IB/srp: Convert to ib_alloc_mr
  xprtrdma, svcrdma: Convert to ib_alloc_mr
  RDS: Convert to ib_alloc_mr
  mlx5: Drop mlx5_ib_alloc_fast_reg_mr
  mlx4: Drop mlx4_ib_alloc_fast_reg_mr
  ocrdma: Drop ocrdma_alloc_frmr
  qib: Drop qib_alloc_fast_reg_mr
  nes: Drop nes_alloc_fast_reg_mr
  cxgb4: Drop c4iw_alloc_fast_reg_mr
  cxgb3: Drop iwch_alloc_fast_reg_mr
  IB/core: Drop ib_alloc_fast_reg_mr
  mlx5: Allocate a private page list in ib_alloc_mr
  mlx4: Allocate a private page list in ib_alloc_mr
  ocrdma: Allocate a private page list in ib_alloc_mr
  cxgb3: Allocate a private page list in ib_alloc_mr
  cxgb4: Allocate a private page list in ib_alloc_mr
  qib: Allocate a private page list in ib_alloc_mr
  nes: Allocate a private page list in ib_alloc_mr
  IB/core: Introduce new fast registration API
  mlx5: Support the new memory registration API
  mlx4: Support the new memory registration API
  ocrdma: Support the new memory registration API
  cxgb3: Support the new memory registration API
  cxgb4: Support the new memory registration API
  nes: Support the new memory registration API
  qib: Support the new memory registration API
  iser: Port to new fast registration api
  xprtrdma: Port to new memory registration API
  iser-target: Port to new memory registration API
  IB/core: Add arbitrary sg_list support
  mlx5: Allocate private context for arbitrary scatterlist registration
  mlx5: Add arbitrary sg list support
  iser: Accept arbitrary sg lists mapping if the device supports it
  iser: Move unaligned counter increment

 drivers/infiniband/core/verbs.c             | 164 ++
 drivers/infiniband/hw/cxgb3/iwch_provider.c |  35 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h |   2 +
 drivers/infiniband/hw/cxgb3/iwch_qp.c       |  48 +++
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h      |  12 +-
 drivers/infiniband/hw/cxgb4/mem.c           |  38 +-
 drivers/infiniband/hw/cxgb4/provider.c      |   3 +-
 drivers/infiniband/hw/cxgb4/qp.c            |  75 +-
 drivers/infiniband/hw/mlx4/main.c           |   3 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h        |  14 +-
 drivers/infiniband/hw/mlx4/mr.c             |  74 +-
 drivers/infiniband/hw/mlx4/qp.c             |  27
 drivers/infiniband/hw/mlx5/main.c           |   5 +-
 drivers/infin

Re: [PATCH WIP 00/43] New fast registration API

2015-07-22 Thread Christoph Hellwig
Thanks Sagi,

this looks pretty good in general, various nitpicks notwithstanding.

The one thing I'm curious about is how we can support SRP with its
multiple MR support without too much boilerplate code.  One option
would be to pass an array of MRs to the map routines, and while
most callers would just pass in one it would handle multiple for those
drivers that supply them.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH WIP 00/43] New fast registration API

2015-07-22 Thread Jason Gunthorpe
On Wed, Jul 22, 2015 at 10:10:23AM -0700, Christoph Hellwig wrote:
> The one thing I'm curious about is how we can support SRP with its
> multiple MR support without too much boilerplate code.  One option
> would be to pass an array of MRs to the map routines, and while
> most callers would just pass in one it would handle multiple for those
> drivers that supply them.

What is SRP trying to accomplish with that?

The only reason that springs to mind is to emulate IB_MR_MAP_ARB_SG ?

Jason


Re: [PATCH WIP 00/43] New fast registration API

2015-07-22 Thread Sagi Grimberg

On 7/22/2015 8:10 PM, Christoph Hellwig wrote:
> Thanks Sagi,
>
> this looks pretty good in general, various nitpicks notwithstanding.
>
> The one thing I'm curious about is how we can support SRP with its
> multiple MR support without too much boilerplate code.  One option
> would be to pass an array of MRs to the map routines, and while
> most callers would just pass in one it would handle multiple for those
> drivers that supply them.

We can do that, but I'd prefer not to pollute the API just for this
single use case. What we can do, is add a pool API that would take care
of that. But even then we might end up with different strategies as not
all ULPs can use it the same way (protocol constraints)...

Today SRP has this logic that registers multiple SG aligned partials.
We can just have it pass a partial SG list to what we have today instead
of building the page vectors...

Or if we can come up with something that will keep the API trivial, we
can take care of that too.


Re: [PATCH WIP 00/43] New fast registration API

2015-07-23 Thread Christoph Hellwig
On Wed, Jul 22, 2015 at 11:27:02AM -0600, Jason Gunthorpe wrote:
> What is SRP trying to accomplish with that?
> 
> The only reason that springs to mind is to emulate IB_MR_MAP_ARB_SG ?

It's not emulating IB_MR_MAP_ARB_SG, it simply allows multiple memory
registrations per I/O request.  Be that to support gappy SGLs in a
generic way, or to allow larger I/O sizes than the HCA MR size.
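
As a rough illustration of those two cases, the following user-space sketch
(not kernel code) counts how many registrations one I/O would need. The
names struct sg_ent and count_mrs(), the 4K page size and the max_pages
limit are hypothetical, chosen only for this example. It assumes a
page-list based MR can cover a run of SG entries only if every entry but
the first starts on a page boundary and every entry but the last ends on
one, and that a single MR holds at most max_pages pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                               /* assumed 4K pages */
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (SKETCH_PAGE_SIZE - 1)

/* Hypothetical, simplified stand-in for a DMA-mapped scatterlist entry. */
struct sg_ent {
    uint64_t addr;  /* DMA address of the segment */
    uint32_t len;   /* segment length in bytes, must be non-zero */
};

/* Number of pages touched by a single SG entry. */
static size_t ent_pages(const struct sg_ent *e)
{
    uint64_t first = e->addr >> SKETCH_PAGE_SHIFT;
    uint64_t last = (e->addr + e->len + SKETCH_PAGE_MASK) >> SKETCH_PAGE_SHIFT;

    return (size_t)(last - first);
}

/*
 * Count the MRs needed to register the whole SG list.
 * A new MR is started at every gap and whenever the current MR is full.
 * Returns -1 if a single entry is larger than one MR; splitting inside
 * an entry is deliberately left out of this sketch.
 */
static int count_mrs(const struct sg_ent *sg, size_t nents, size_t max_pages)
{
    size_t mrs = 0, pages_in_mr = 0;

    assert(max_pages > 0);
    for (size_t i = 0; i < nents; i++) {
        size_t np;
        int gap = 0;

        assert(sg[i].len > 0);
        np = ent_pages(&sg[i]);
        if (np > max_pages)
            return -1;

        if (pages_in_mr > 0) {
            uint64_t prev_end = sg[i - 1].addr + sg[i - 1].len;

            /* A gap: the previous entry ends, or this one starts, mid-page. */
            gap = (prev_end & SKETCH_PAGE_MASK) != 0 ||
                  (sg[i].addr & SKETCH_PAGE_MASK) != 0;
        }

        if (pages_in_mr == 0 || gap || pages_in_mr + np > max_pages) {
            mrs++;
            pages_in_mr = 0;
        }
        pages_in_mr += np;
    }
    return (int)mrs;
}
```

With three page-aligned 4K entries and max_pages of 8 this yields a single
MR; shortening the middle entry to 2K introduces a gap and yields two, as
does lowering max_pages to 2 for the aligned list.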


Re: [PATCH WIP 00/43] New fast registration API

2015-07-23 Thread Christoph Hellwig
On Wed, Jul 22, 2015 at 08:42:32PM +0300, Sagi Grimberg wrote:
> We can do that, but I'd prefer not to pollute the API just for this
> single use case. What we can do, is add a pool API that would take care
> of that. But even then we might end up with different strategies as not
> all ULPs can use it the same way (protocol constraints)...
> 
> Today SRP has this logic that registers multiple SG aligned partials.
> We can just have it pass a partial SG list to what we have today instead
> of building the page vectors...
> 
> Or if we can come up with something that will keep the API trivial, we
> can take care of that too.


Supporting an array or list of MRs seems pretty easy.  If you ignore the
weird fallback to physical DMA when an MR fails, the SRP memory
registration code isn't significantly more complex than that in iSER for
example.  And I think NFS needs the same support as well, as it allows
using additional MRs when detecting a gap.


Re: [PATCH WIP 00/43] New fast registration API

2015-07-23 Thread Sagi Grimberg

On 7/23/2015 12:28 PM, Christoph Hellwig wrote:
> On Wed, Jul 22, 2015 at 08:42:32PM +0300, Sagi Grimberg wrote:
>> We can do that, but I'd prefer not to pollute the API just for this
>> single use case. What we can do, is add a pool API that would take care
>> of that. But even then we might end up with different strategies as not
>> all ULPs can use it the same way (protocol constraints)...
>>
>> Today SRP has this logic that registers multiple SG aligned partials.
>> We can just have it pass a partial SG list to what we have today instead
>> of building the page vectors...
>>
>> Or if we can come up with something that will keep the API trivial, we
>> can take care of that too.
>
> Supporting an array or list of MRs seems pretty easy.

I'm missing the simplicity here...


> If you ignore the
> weird fallback to physical DMA when an MR fails, the SRP memory
> registration code isn't significantly more complex than that in iSER for
> example.  And I think NFS needs the same support as well, as it allows
> using additional MRs when detecting a gap.

This kinda changes the semantics a bit. With this we need to return
how many MRs were used for the registration. It will also make it a bit
sloppy as the actual mapping is driven from the drivers (which use their
internal buffers).

Don't you think that a separate pool API is better for addressing this?