RE: [PATCH 1/3] IB/uverbs: reject invalid or unknown opcodes

2015-08-19 Thread Hefty, Sean
> AFAIK, this path is rarely (never?) actually used. I think all the
> drivers we have can post directly from userspace.

I didn't think the ipath or qib drivers post from userspace.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V8 6/6] IB/ucma: HW Device hot-removal support

2015-08-19 Thread Jason Gunthorpe
On Wed, Aug 19, 2015 at 04:59:11PM +0300, Yishai Hadas wrote:
> On 8/18/2015 8:50 PM, Jason Gunthorpe wrote:
> >On Thu, Aug 13, 2015 at 06:32:07PM +0300, Yishai Hadas wrote:
> >>@@ -501,10 +586,24 @@ static ssize_t ucma_destroy_id(struct ucma_file 
> >>*file, const char __user *inbuf,
> >>+   if (!ctx->closing) {
> >>+   mutex_unlock(&mut);
> >>+   ucma_put_ctx(ctx);
> >>+   wait_for_completion(&ctx->comp);
> >>+   rdma_destroy_id(ctx->cm_id);
> >
> >Suggest nulling cm_id after it is destroyed in all places, this code
> >is very complicated, I'd rather see a nice clean risk of
> >null-pointer-deref than an undetected use-after free if it gets messed
> >up.
> 
> It can be helpful for debugging but usually nulling is not done when it's
> not really needed, because it is considered redundant.
> Currently it's not the usage in this module and in cma.c when calling
> rdma_destroy_id.

Well, 'not really needed' should be something simple to prove, and
this is not at all simple.

> >Isn't the list_for_each_safe version needed if list_del/kfree is called
> >within the body?
> No need for a safe version here.

Okay

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] IB/uverbs: reject invalid or unknown opcodes

2015-08-19 Thread Jason Gunthorpe
On Wed, Aug 19, 2015 at 07:48:02PM +0200, Christoph Hellwig wrote:
> On Wed, Aug 19, 2015 at 11:46:14AM -0600, Jason Gunthorpe wrote:
> > Reviewed-by: Jason Gunthorpe 
> > 
> > AFAIK, this path is rarely (never?) actually used. I think all the
> > drivers we have can post directly from userspace.
> 
> Oh, interesting.  Is there any chance to deprecate it?  Not having
> to care for the uvers command would really help with some of the
> upcoming changes I have in my mind.

Hmm, we'd need a survey of the userspace side to see if it is rarely
or never...

And we'd have to talk to the soft XXX guys to see if they plan to use
it..

I always like to see cruft go away, especially if it is broken cruft..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] IB: remove xrc_remote_srq_num from struct ib_send_wr

2015-08-19 Thread Jason Gunthorpe
On Wed, Aug 19, 2015 at 06:37:34PM +0200, Christoph Hellwig wrote:
> The field is only initialized in mlx, but never used.
> 
> If we want to add proper XRC support it should be done with a new
> struct ib_xrc_wr.
> 
> This shrinks the various WR structures by another 4 bytes.

Reviewed-by: Jason Gunthorpe 

I've never really been entirely clear on how much of this stuff needs
to be kernel side. Even the switch doesn't really make alot of sense..

> +++ b/drivers/infiniband/hw/mlx5/qp.c
> @@ -2634,7 +2634,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct 
> ib_send_wr *wr,
>   switch (ibqp->qp_type) {
>   case IB_QPT_XRC_INI:
>   xrc = seg;
> - xrc->xrc_srqn = htonl(wr->xrc_remote_srq_num);


Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] IB/uverbs: reject invalid or unknown opcodes

2015-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2015 at 11:46:14AM -0600, Jason Gunthorpe wrote:
> Reviewed-by: Jason Gunthorpe 
> 
> AFAIK, this path is rarely (never?) actually used. I think all the
> drivers we have can post directly from userspace.

Oh, interesting.  Is there any chance to deprecate it?  Not having
to care for the uvers command would really help with some of the
upcoming changes I have in my mind.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] IB/uverbs: reject invalid or unknown opcodes

2015-08-19 Thread Jason Gunthorpe
On Wed, Aug 19, 2015 at 06:37:32PM +0200, Christoph Hellwig wrote:
> We have many WR opcodes that are only supported in kernel space
> and/or require optional information to be copied into the WR
> structure.  Reject all those not explicitly handled so that we
> can't pass invalid information to drivers.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christoph Hellwig 
>  drivers/infiniband/core/uverbs_cmd.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)

Oh yes, this is absolutely needed for -stable.

Reviewed-by: Jason Gunthorpe 

AFAIK, this path is rarely (never?) actually used. I think all the
drivers we have can post directly from userspace.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API

2015-08-19 Thread Jason Gunthorpe
On Wed, Aug 19, 2015 at 02:56:24PM +0300, Sagi Grimberg wrote:
> On 7/23/2015 8:55 PM, Jason Gunthorpe wrote:
> >On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote:
> >>>I was hoping we'd move the DMA flush and translate into here and make
> >>>it mandatory. Is there any reason not to do that?
> >>
> >>The reason I didn't added it in was so the ULPs can make sure they meet
> >>the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his
> >>SG list set partials and iSER to detect gaps (they need to dma map
> >>for that).
> >
> >The ULP can always get the sg list's virtual address to check for
> >gaps. Page aligned gaps are always OK.
> 
> So I had a go with moving the DMA mapping into ib_map_mr_sg() and
> it turns out mapping somewhat poorly if the ULP _may_ register memory
> or just send sg_lists (like storage targets over IB/iWARP). So the ULP
> will sometimes use the DMA mapping and sometimes it won't... feels
> kinda off to me...

You need to split the rkey and lkey API flows to pull this off - the
rkey side never needs to touch a sg, while the lkey side should always
try and use a sg first. I keep saying this: they have
fundamentally different ULP usages.

> 1. if register - call ib_map_mr_sg (which calls dma_map_sg)
>else do dma_map_sg
> 2. if registered - call ib_dma_unmap_sg (which calles dma_unmap_sg)
>else do dma_unmap_sg

>From what I've seen in the ULPs the flow control is generally such
that the MR is 'consumed' even if it isn't used by a send.

So lkey usage is simply split into things that absolutely don't need a
MR, and things that maybe do. The maybe side can go ahead and always
consume the MR resource, but optimize the implementation to a SG list
to avoid a performance hit.

Then the whole API becomes symmetric. The ULP says, 'here is a
scatterlist list and a lkey MR, make me a ib_sg list' and the core
either packes it as is into the sg, or it spins up the MR and packs
that.

This lets the unmap be symmetric, as the core always dma_unmaps, but
only tears down the MR if it was used.

The cost is the lkey MR slot is always consumed, which should be OK
because SQE flow control bounds the number of concurrent MRs required,
so consuming a SQE but not a MR doesn't provide an advantage.

> Also, at the moment, when ULPs are doing either FRWR or FMRs
> its a pain to get a non-intrusive conversion.

Without FMR sharing API entry points it is going to be hard to unify
them..

ie the map and alloc API side certainly could be shared..

> I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave
> it to the ULP like it does today (at least in the first stage...)

I'm fine with first stage, but we still really do need to figure how
how to get better code sharing in our API here..

Maybe we can do the rkey side right away until we can figure out how
to harmonize the rkey sg/mr usage?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API

2015-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2015 at 07:09:18PM +0300, Sagi Grimberg wrote:
> Ok, I was also thinking on moving the access flags
> to the work request again.

Yes, with the current code I don't think we need it in the MR.

> I'd prefer to get this right with a different helper like Steve
> suggested:
> int rdma_access_flags(int mr_roles);

We can start with that.  In the long run we really want to have
two higher level helpers to RDMA READ a scatterlist:

 - one for iWARP that uses an FR and RDMA READ WITH INVALIDATE
 - one of IB-like transports that just uses a READ with the
   local lkey

Right now every ULP that wants to support iWarp needs to duplicate
that code.  This leads to some curious situations like the NFS
server apparently always using FRs if available for this if my
reading of svc_rdma_accept() is correct, or the weird parallel
code pathes for IB vs iWarp in RDS:

hch@brick:~/work/linux/net/rds$ ls ib*
ib.c  ib_cm.c  ib.h  ib_rdma.c  ib_recv.c  ib_ring.c  ib_send.c
ib_stats.c  ib_sysctl.c
hch@brick:~/work/linux/net/rds$ ls iw*
iw.c  iw_cm.c  iw.h  iw_rdma.c  iw_recv.c  iw_ring.c  iw_send.c
iw_stats.c  iw_sysctl.c

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


shrink struct ib_send_wr

2015-08-19 Thread Christoph Hellwig
This series shrinks the WR size by splitting out the different WR
types.

Patch number two is too large for the mailinglist, so if you didn't
get it grab it here:


http://git.infradead.org/users/hch/rdma.git/commitdiff/30e522ee6c1d7adb614d7308f09fbfd71c6d3e07

or the full git tree at:

git://git.infradead.org/users/hch/rdma.git wr-cleanup

Changes since the version sent around a couple of times:

 - new patch 1 which rejects invalid opcodes.  Patch 2 was doing
   this implicitly except for UD QPs, but I think we should a)
   do this explicitly and b) ensure this goes into 4.2 and -stable
   as I can see quite a lot of harm from submitting such malformed
   operations
 - patch 2 now covers all drivers including those in staging to
   side step any sort of discussions on the staging tree.
 - patch 2 now explicitly replaces the weird overloading in the mlx5
   driver with an explicit embedding of struct ib_send_wr, similar
   to what we do for all other MRs.
 - new patch to drop another unused send_wr field.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] IB: remove xrc_remote_srq_num from struct ib_send_wr

2015-08-19 Thread Christoph Hellwig
The field is only initialized in mlx, but never used.

If we want to add proper XRC support it should be done with a new
struct ib_xrc_wr.

This shrinks the various WR structures by another 4 bytes.

Signed-off-by: Christoph Hellwig 
---
 drivers/infiniband/hw/mlx5/qp.c | 1 -
 include/rdma/ib_verbs.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 04df156..83a290f 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -2634,7 +2634,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr,
switch (ibqp->qp_type) {
case IB_QPT_XRC_INI:
xrc = seg;
-   xrc->xrc_srqn = htonl(wr->xrc_remote_srq_num);
seg += sizeof(*xrc);
size += sizeof(*xrc) / 16;
/* fall through */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9b29c78..b855189 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1100,7 +1100,6 @@ struct ib_send_wr {
__be32  imm_data;
u32 invalidate_rkey;
} ex;
-   u32 xrc_remote_srq_num; /* XRC TGT QPs only */
 };
 
 struct ib_rdma_wr {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] IB/uverbs: reject invalid or unknown opcodes

2015-08-19 Thread Christoph Hellwig
We have many WR opcodes that are only supported in kernel space
and/or require optional information to be copied into the WR
structure.  Reject all those not explicitly handled so that we
can't pass invalid information to drivers.

Cc: sta...@vger.kernel.org
Signed-off-by: Christoph Hellwig 
---
 drivers/infiniband/core/uverbs_cmd.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index a15318a..f9f3921 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2372,6 +2372,12 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
next->send_flags = user_wr->send_flags;
 
if (is_ud) {
+   if (next->opcode != IB_WR_SEND &&
+   next->opcode != IB_WR_SEND_WITH_IMM) {
+   ret = -EINVAL;
+   goto out_put;
+   }
+
next->wr.ud.ah = idr_read_ah(user_wr->wr.ud.ah,
 file->ucontext);
if (!next->wr.ud.ah) {
@@ -2413,7 +2419,8 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
next->wr.atomic.rkey = user_wr->wr.atomic.rkey;
break;
default:
-   break;
+   ret = -EINVAL;
+   goto out_put;
}
}
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API

2015-08-19 Thread Sagi Grimberg




Keep it out for now.


Ok, I was also thinking on moving the access flags
to the work request again. It doesn't make much sense there
unless I go with what Jason suggested with ib_map_mr_[lkey|rkey]
to protect against remote access for lkeys in IB which to me, sounds
redundant at this point given that ULPs will set the access according
to iWARP anyway.

I'd prefer to get this right with a different helper like Steve
suggested:
int rdma_access_flags(int mr_roles);

This way we don't need to protect against it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V8 6/6] IB/ucma: HW Device hot-removal support

2015-08-19 Thread Yishai Hadas

On 8/18/2015 8:50 PM, Jason Gunthorpe wrote:

On Thu, Aug 13, 2015 at 06:32:07PM +0300, Yishai Hadas wrote:

@@ -501,10 +586,24 @@ static ssize_t ucma_destroy_id(struct ucma_file *file, 
const char __user *inbuf,
+   if (!ctx->closing) {
+   mutex_unlock(&mut);
+   ucma_put_ctx(ctx);
+   wait_for_completion(&ctx->comp);
+   rdma_destroy_id(ctx->cm_id);


Suggest nulling cm_id after it is destroyed in all places, this code
is very complicated, I'd rather see a nice clean risk of
null-pointer-deref than an undetected use-after free if it gets messed
up.


It can be helpful for debugging but usually nulling is not done when 
it's not really needed, because it is considered redundant.
Currently it's not the usage in this module and in cma.c when calling 
rdma_destroy_id.



 >> + list_for_each_entry(con_req_eve, &ctx->file->event_list, list) {

+   if (con_req_eve->cm_id == cm_id &&
+   con_req_eve->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) {
+   list_del(&con_req_eve->list);


Isn't the list_for_each_safe version needed if list_del/kfree is called
within the body?

No need for a safe version here.

The safe version is needed only when a loop continues iterating after 
list_del && kfree, which is not the case here. When the entry is found 
there is a "break" in the code and the iteration is stopped. see next 
few lines in the patch.


See also below same usage as part of mlx4_remove_device.
http://lxr.free-electrons.com/source/drivers/net/ethernet/mellanox/mlx4/intf.c#L70

By the way, even if the code continues iterating the list (which is not 
the case here ...) the code is safe as the free is done from a task that 
calls first rdma_destroy_id which internally waits on same mutex that 
the loop called with as part of cma_remove_id_dev(i.e 
id_priv->handler_mutex). When the loop ends and the mutex is released, 
all the other tasks can continue running and free the entry. (see the 
comment in rdma_destroy_id "Wait for any active callback to finish ...")




The locking looks much saner now, thanks Haggaie.


Yes, thanks as well.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Supported uverbs opcodes?

2015-08-19 Thread Christoph Hellwig
What opcodes are supposed to be submitted by users?

Currently we do not define opcodes in the UAPI and kinda rely that
userspace uses the same ones as the kernel.

For thos defines by libibverbs (RDMA_WRITE, RDMA_WRITE_WITH_IMM,
SEND, SEND_WITH_IMM, RDMA_READ, ATOMIC_CMP_AND_SWP and
ATOMIC_FETCH_AND_ADD) this happens to work fine.

Additionally uvers takes care of SEND_WITH_INV (but not READ_WITH_INV)
and copies the right fields into the kernel WR, but it also allows
all other opcodes without taking care of their fields.

My send_wr split that I'm going to repost soon will disallow all
those not handled explicitly as the current situration seems
dangerous.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V2 4/4] IB/core: RoCE GID management separate cleanup and release

2015-08-19 Thread Matan Barak
On Sun, Aug 16, 2015 at 7:48 PM, Doug Ledford  wrote:
> On 08/16/2015 12:24 PM, Doug Ledford wrote:
>> On 08/16/2015 04:19 AM, Matan Barak wrote:
>>>
>>>
>>> On 8/15/2015 5:14 PM, Doug Ledford wrote:
 On 08/06/2015 01:14 PM, Matan Barak wrote:
> Separate the cleanup and release IB cache functions.
> The cleanup function delete all GIDs and unregister the event channel,
> while the release function frees all memory. The cleanup function is
> called after all clients were removed (in the device un-registration
> process).
>
> The release function is called after the device was put.
> This is in order to avoid use-after-free errors if ther vendor
> driver's teardown code uses IB cache.
>
> Squash-into: 'IB/core: Add RoCE GID table management'
> Signed-off-by: Matan Barak 

 I had originally used the v1 of this patch instead of v2.  In order to
 make sure I didn't miss anything, I hand compared the final results of
 the v1 patch to what this patch would have created if I had used it.  I
 found a few things worth nothing, comments inline below.

> @@ -903,20 +924,28 @@ int ib_cache_setup_one(struct ib_device *device)
> (rdma_end_port(device) -
>  rdma_start_port(device) + 1),
> GFP_KERNEL);
> -err = gid_table_setup_one(device);
> -
> -if (!device->cache.pkey_cache || !device->cache.gid_cache ||
> +if (!device->cache.pkey_cache ||
>   !device->cache.lmc_cache) {
>   printk(KERN_WARNING "Couldn't allocate cache "
>  "for %s\n", device->name);
> -err = -ENOMEM;
> -goto err;
> +/* pkey_cache is freed here because we later access it's
> + * elements.
> + */
> +kfree(device->cache.pkey_cache);
> +device->cache.pkey_cache = NULL;
> +return -ENOMEM;

 This function is unnecessarily convoluted IMO.  A simple change to using
 kzalloc() for the pkey_cache alloc eliminates part of the issue.  As a
 result, I fixed this function up to be different than either patch.
 Please look over my results and make sure you agree with my changes.

>   }
>
> -for (p = 0; p <= rdma_end_port(device) -
> rdma_start_port(device); ++p) {
> +for (p = 0; p <= rdma_end_port(device) -
> rdma_start_port(device); ++p)
>   device->cache.pkey_cache[p] = NULL;

 For instance this goes away entirely by using kzalloc().

 The rest looked ok.  I have to fixup the ordering changes between sysfs
 and cache to make the v1 patch match the v2 patch.


>>>
>>> Thanks for the squashing and re-ordering these patches.
>>> I think you're missing if (device->cache.pkey_cache) over the for pkey
>>> free loop in ib_cache_release_one.
>>
>> I think you're right, but I think that particular issue exists in both
>> versions of the code (mine and yours).  In particular, a failure during
>> ib_register_device should result in a call to ib_dealloc_device which
>> will result in a call to ib_cache_release_one, which will happen whether
>> ib_cache_setup_one returns an error or not.  So, ib_cache_release_one
>> must check pkey_cache before releasing the ports because it can't assume
>> the original setup succeeded.
>>
>>> Otherwise, the allocation of
>>> device->cache.pkey_cache might has failed and you dereference an invalid
>>> address device->cache.pkey_cache[p].
>>>
>>> In addition, ib_device_release calls ib_cache_release_one with device
>>> instead of dev (which has the wrong type).
>>> ib_register_device calls ib_cache_clean_one that I can't really find :)
>>
>> Both of these turned up in compile tests.  I've since resolved them.
>> Hand merge issues ;-)
>>
>
> I've pushed what I think is a fixed version to github.  Please feel free
> to review.
>
> --
> Doug Ledford 
>   GPG KeyID: 0E572FDD
>
>

I think that after applying Dan Carpenter's "[patch] IB/core: missing
curly braces in ib_find_gid()" it's fine :-)
Thanks.

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API

2015-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2015 at 02:56:24PM +0300, Sagi Grimberg wrote:
> So I had a go with moving the DMA mapping into ib_map_mr_sg() and
> it turns out mapping somewhat poorly if the ULP _may_ register memory
> or just send sg_lists (like storage targets over IB/iWARP). So the ULP
> will sometimes use the DMA mapping and sometimes it won't... feels
> kinda off to me...

Yes, it's odd.

> it's much saner to do:
> 1. dma_map_sg
> 2. register / send-sg-list
> 3. unregister (if needed)
> 4. dma_unmap_sg
> 
> then:
> 1. if register - call ib_map_mr_sg (which calls dma_map_sg)
>else do dma_map_sg
> 2. if registered - call ib_dma_unmap_sg (which calles dma_unmap_sg)
>else do dma_unmap_sg
> 
> this kinda forces ULP to completely separate these code paths with
> with very little sharing.
> 
> Also, at the moment, when ULPs are doing either FRWR or FMRs
> its a pain to get a non-intrusive conversion.
> 
> I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave
> it to the ULP like it does today (at least in the first stage...)

Keep it out for now.  I think we need to move the dma mapping into
the RDMA care rather sooner than later, but that must also include
ib_post_send/recv, so it's better done separately.

After having a look at the mess some drivers (ipath,qib,hfi & ehca)
cause with abuse of dma_map_ops I've got an even strong opion on
the whole subject now.  However I think we'll get more things done
if we split them into smaller steps.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API

2015-08-19 Thread Sagi Grimberg

On 7/23/2015 8:55 PM, Jason Gunthorpe wrote:

On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote:

I was hoping we'd move the DMA flush and translate into here and make
it mandatory. Is there any reason not to do that?


The reason I didn't added it in was so the ULPs can make sure they meet
the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his
SG list set partials and iSER to detect gaps (they need to dma map
for that).


The ULP can always get the sg list's virtual address to check for
gaps. Page aligned gaps are always OK.


So I had a go with moving the DMA mapping into ib_map_mr_sg() and
it turns out mapping somewhat poorly if the ULP _may_ register memory
or just send sg_lists (like storage targets over IB/iWARP). So the ULP
will sometimes use the DMA mapping and sometimes it won't... feels
kinda off to me...

it's much saner to do:
1. dma_map_sg
2. register / send-sg-list
3. unregister (if needed)
4. dma_unmap_sg

then:
1. if register - call ib_map_mr_sg (which calls dma_map_sg)
   else do dma_map_sg
2. if registered - call ib_dma_unmap_sg (which calles dma_unmap_sg)
   else do dma_unmap_sg

this kinda forces ULP to completely separate these code paths with
with very little sharing.

Also, at the moment, when ULPs are doing either FRWR or FMRs
its a pain to get a non-intrusive conversion.

I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave
it to the ULP like it does today (at least in the first stage...)

Thoughts?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html