Re: [PATCH V6 1/9] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth

2015-07-26 Thread Sagi Grimberg

On 7/24/2015 10:14 PM, Jason Gunthorpe wrote:

On Fri, Jul 24, 2015 at 01:40:17PM -0500, Steve Wise wrote:

Huh. How does this relate to the max_page_list_len argument:

  struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)

Shouldn't max_fast_reg_page_list_len be checked during the above?

Ie does this still make sense:

drivers/infiniband/ulp/iser/iser_verbs.c:   desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE + 1);

?

The only ULP that checks this is SRP, so basically, all our ULPs are
probably quietly broken? cxgb3 has a limit of 10 (!?!?!!)



Yea seems like some drivers need to enforce this in ib_alloc_fast_reg_mr()
as well as ib_alloc_fast_reg_page_list(), and ULPs need to not exceed the
device max.


Great, Sagi, can you incorporate that in your series so that
ib_alloc_mr's max_entries is checked against
max_fast_reg_page_list_len and EINVAL's if it is too great?


Yes. I'll take care of that.
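
As a rough illustration of the check agreed on above, here is a minimal
userspace sketch in C. It is not the kernel code: struct demo_device_attr
and both helper functions are hypothetical names invented for this example.
Only the max_fast_reg_page_list_len attribute and the EINVAL behaviour come
from the thread.

```c
#include <errno.h>

/* Hypothetical stand-in for the device attributes discussed in the thread. */
struct demo_device_attr {
	unsigned int max_fast_reg_page_list_len;
};

/*
 * Return 0 if max_entries registration entries may be requested on this
 * device, or -EINVAL if the request is empty or exceeds the device limit.
 */
static int demo_check_mr_entries(const struct demo_device_attr *attr,
				 unsigned int max_entries)
{
	if (max_entries == 0 ||
	    max_entries > attr->max_fast_reg_page_list_len)
		return -EINVAL;
	return 0;
}

/* A ULP could clamp its SG table size rather than assume a fixed value. */
static unsigned int demo_clamp_sg_tablesize(const struct demo_device_attr *attr,
					    unsigned int wanted)
{
	unsigned int limit = attr->max_fast_reg_page_list_len;

	return wanted < limit ? wanted : limit;
}
```

With a device limit of 10 (the cxgb3 figure quoted above), a request for 129
entries would be rejected, and a ULP asking for 128 would be clamped to 10.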
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V6 4/9] svcrdma: Use max_sge_rd for destination read depths

2015-07-26 Thread Christoph Hellwig
On Sun, Jul 26, 2015 at 12:58:59PM +0300, Sagi Grimberg wrote:
 With the above patch change, we have no more users of the recently created 
 rdma_cap_read_multi_sge().  Should I add a patch to remove it?
 
 Yes please.

And in the long run this is another argument for killing the system-wide
REMOTE_WRITE phys MR and requiring memory registrations for iWarp.


Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API

2015-07-26 Thread Sagi Grimberg

On 7/23/2015 10:08 PM, Jason Gunthorpe wrote:

On Thu, Jul 23, 2015 at 01:07:56PM +0300, Sagi Grimberg wrote:

On 7/22/2015 10:05 PM, Jason Gunthorpe wrote:
The reason I named it max_entries is that it might not be pages but
real SG elements. It stands for maximum registration entries.

Do you have a better name?


I wouldn't try and be both..

Use 'max_num_sg' and document that no aggregate scatterlist with
length larger than 'max_num_sg*PAGE_SIZE' or with more entries than
max_num_sg can be submitted?

Maybe document with ARB_SG that it is not length limited?


OK, I can do that.
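
The two limits proposed for 'max_num_sg' can be sketched as a small
userspace C check. This is only an illustration under assumptions: struct
demo_sg, DEMO_PAGE_SIZE (fixed at 4K here) and demo_check_sg() are
hypothetical names, not part of any kernel API.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/* Hypothetical, simplified scatterlist entry: only the length matters here. */
struct demo_sg {
	unsigned long length;
};

/*
 * Reject a scatterlist that has more entries than max_num_sg, or whose
 * aggregate length is larger than max_num_sg * DEMO_PAGE_SIZE.
 */
static int demo_check_sg(const struct demo_sg *sg, size_t nents,
			 size_t max_num_sg)
{
	unsigned long total = 0;
	size_t i;

	if (nents > max_num_sg)
		return -EINVAL;

	for (i = 0; i < nents; i++)
		total += sg[i].length;

	if (total > max_num_sg * DEMO_PAGE_SIZE)
		return -EINVAL;

	return 0;
}
```

A device that is not length limited (the ARB_SG case mentioned above) would
skip the second test and keep only the entry-count test.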


Re: [PATCH V6 4/9] svcrdma: Use max_sge_rd for destination read depths

2015-07-26 Thread Sagi Grimberg



@@ -1059,6 +1062,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 		ntohs(((struct sockaddr_in *)&newxprt->sc_cm_id->
 		       route.addr.dst_addr)->sin_port),
 		newxprt->sc_max_sge,
+		newxprt->sc_max_sge_rd,
 		newxprt->sc_sq_depth,
 		newxprt->sc_max_requests,
 		newxprt->sc_ord);



With the above patch change, we have no more users of the recently created 
rdma_cap_read_multi_sge().  Should I add a patch to remove it?


Yes please.


Re: [PATCH 00/10] IB: Replace safe uses for ib_get_dma_mr with pd-local_dma_lkey

2015-07-26 Thread Sagi Grimberg



If we want security by default then I propose not only to change the default
value of register_always from false into true but also to change the default
value of prefer_fr from false into true such that fast registration becomes
the default instead of FMR.


Yes, I was frowning at that stuff too.. We are trying to get rid of
FMR, so nothing should prefer it over FRWR...

Sagi, perhaps that belongs in your MR unification series?


I don't see how this fits in.


Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API

2015-07-26 Thread Sagi Grimberg

On 7/23/2015 9:51 PM, Jason Gunthorpe wrote:

On Thu, Jul 23, 2015 at 07:47:14PM +0300, Sagi Grimberg wrote:


So we force ULPs to think about what they are doing properly, and we
get a chance to actually force lkey to be local use only for IB.


The lkey/rkey decision is passed in the fastreg post_send().


That is too late to check the access flags.


Why? the access permissions are kept in the mr context?


Sure, one could do if (key == mr->lkey) .. check lkey flags in the
post, but that seems silly considering we want the post inlined..


Why should we check the lkey/rkey access flags in the post?




I can move it to the post interface if it makes more sense.
the access is kind of out of place in the mapping routine anyway...


All the dma routines have an access equivalent during map, I don't
think it is out of place..

To my mind, the map is the point where the MR should crystallize into
an rkey or lkey MR, not at the post.


I'm not sure I understand why the lkey/rkey should be set at the map
routine. To me, it seems more natural to map_mr_sg and then either
register the lkey or the rkey.

It's easy enough to move the key arg to ib_map_mr_sg, but I don't see a
good reason why at the moment.


Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Sagi Grimberg

On 7/24/2015 7:18 PM, Steve Wise wrote:

This is in preparation for adding new FRMR-only IO handlers
for devices that support FRMR and not PI.


Steve,

I've given this some thought and I think we should avoid splitting
logic from PI and iWARP. The reason (other than code duplication) is
that currently the iser target supports only up to 1MB IOs. I have some
code (not done yet) to support larger IOs by using multiple
registrations per IO (with or without PI).
With a little tweaking I think we can get iwarp to fit in too...

So, do you mind if I take a crack at it?


Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Christoph Hellwig
On Sun, Jul 26, 2015 at 01:08:16PM +0300, Sagi Grimberg wrote:
 I've given this some thought and I think we should avoid splitting
 logic from PI and iWARP. The reason (other than code duplication) is
 that currently the iser target supports only up to 1MB IOs. I have some
 code (not done yet) to support larger IOs by using multiple
 registrations per IO (with or without PI).

Just curious: How is this going to work with iSER only having a single
rkey/offset/len field?


Re: [PATCH] mlx5: Expose correct page_size_cap in device attributes

2015-07-26 Thread Sagi Grimberg

On 7/24/2015 12:48 AM, Jason Gunthorpe wrote:

On Thu, Jul 23, 2015 at 05:41:38PM -0400, Doug Ledford wrote:


I assume this prevents the driver from working at all on certain arches
(like ppc with 64k page size)?


Nothing uses page_size_cap correctly, so it has no impact.

Sagi, that is a good point, your generic code for the cleanup series
really should check that PAGE_SIZE is in page_size_cap and at least
fail the mr allocation if it isn't...


Yea, that's doable...
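
A minimal sketch of such a check, written as plain userspace C. It assumes
page_size_cap is a bitmask in which bit n set means a page size of 2^n bytes
is supported; demo_check_page_size() is a hypothetical helper, not an
existing kernel function.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 if page_size is a power of two whose bit is set in
 * page_size_cap, otherwise -EINVAL (the MR allocation would then fail).
 */
static int demo_check_page_size(uint64_t page_size_cap, uint64_t page_size)
{
	/* A valid page size has exactly one bit set. */
	if (page_size == 0 || (page_size & (page_size - 1)) != 0)
		return -EINVAL;

	if ((page_size_cap & page_size) == 0)
		return -EINVAL;

	return 0;
}
```

Under that assumption, a device advertising only 4K pages would fail the
check on an architecture built with 64K pages, which is the ppc case raised
above.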


Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Sagi Grimberg

On 7/26/2015 1:43 PM, Christoph Hellwig wrote:

On Sun, Jul 26, 2015 at 01:08:16PM +0300, Sagi Grimberg wrote:

I've given this some thought and I think we should avoid splitting
logic from PI and iWARP. The reason (other than code duplication) is
that currently the iser target supports only up to 1MB IOs. I have some
code (not done yet) to support larger IOs by using multiple
registrations per IO (with or without PI).


Just curious: How is this going to work with iSER only having a single
rkey/offset/len field?



Good question,

On the wire iser sends a single rkey, but the target is allowed to
transfer the data however it wants to.

Say that the local target HCA supports registering only 32 pages (128K
bytes with 4K pages) and the initiator sent:
rkey=0x1234
address=0x
length=512K

The target would allocate a 512K buffer and:
register offset 0-128K to lkey=0x1
register offset 128K-256K to lkey=0x2
register offset 256K-384K to lkey=0x3
register offset 384K-512K to lkey=0x4

then constructs sg_list as:
sg_list[0] = {addr=buf, length=128K, lkey=0x1}
sg_list[1] = {addr=buf+128K, length=128K, lkey=0x2}
sg_list[2] = {addr=buf+256K, length=128K, lkey=0x3}
sg_list[3] = {addr=buf+384K, length=128K, lkey=0x4}

Then set rdma_read wr with:
rdma_r_wr.sg_list=sg_list
rdma_r_wr.rdma.addr=0x
rdma_r_wr.rdma.rkey=0x1234

post_send(rdma_r_wr);

Ideally, the post contains a chain of all 4 registrations and the
rdma_read (and an opportunistic good scsi response).
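
The splitting step in the example above can be sketched in userspace C as
follows. This is an illustration only: struct demo_sge and
demo_build_sg_list() are hypothetical names, and the lkeys are simply
numbered from first_lkey instead of coming from real registrations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scatter/gather element. */
struct demo_sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/*
 * Split the buffer [buf, buf + len) into chunks of at most max_reg_bytes,
 * one chunk per registration. Placeholder lkeys are first_lkey,
 * first_lkey + 1, and so on. Returns the number of entries written, or 0
 * if the chunks do not fit in max_sge entries.
 */
static size_t demo_build_sg_list(uint64_t buf, uint64_t len,
				 uint32_t max_reg_bytes, uint32_t first_lkey,
				 struct demo_sge *sg_list, size_t max_sge)
{
	uint64_t off = 0;
	size_t n = 0;

	if (max_reg_bytes == 0)
		return 0;

	while (off < len) {
		uint64_t chunk = len - off;

		if (chunk > max_reg_bytes)
			chunk = max_reg_bytes;
		if (n == max_sge)
			return 0;

		sg_list[n].addr = buf + off;
		sg_list[n].length = (uint32_t)chunk;
		sg_list[n].lkey = first_lkey + (uint32_t)n;
		off += chunk;
		n++;
	}

	return n;
}
```

For a 512K buffer and a 128K registration limit this produces the four
entries listed in the example, with lkeys 0x1 through 0x4.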


Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API

2015-07-26 Thread Sagi Grimberg


I would like to see the kdoc for ib_map_mr_sg explain exactly what is
required of the caller, maybe just hoist this bit from the
ib_sg_to_pages


I'll add the kdoc.


Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API

2015-07-26 Thread Sagi Grimberg

On 7/23/2015 8:55 PM, Jason Gunthorpe wrote:

On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote:

I was hoping we'd move the DMA flush and translate into here and make
it mandatory. Is there any reason not to do that?


The reason I didn't add it was so that the ULPs can make sure they meet
the restrictions of ib_map_mr_sg(). That allows SRP to iterate over its
SG list and set partials, and iSER to detect gaps (they need to dma map
for that).


The ULP can always get the sg list's virtual address to check for
gaps. Page aligned gaps are always OK.


I guess I can pull DMA mapping in there, but we will need an opposite
routine ib_umap_mr_sg(), since it would be weird for the ULP to do the
dma unmap without doing the map...



BTW, the logic in ib_sg_to_pages should be checking that directly, as
coded, it won't work with swiotlb:

// Only the first SG entry can start unaligned
if (i && page_addr != dma_addr)
	return EINVAL;
// Only the last SG entry can end unaligned
if (((page_addr + dma_len) & PAGE_MASK) != end_dma_addr)
	if (!is_last)
		return EINVAL;

Don't use sg->offset after dma mapping.

The biggest problem with checking the virtual address is
swiotlb. However, if swiotlb is used this API is basically broken as
swiotlb downgrades everything to a 2k alignment, which means we only
ever get 1 s/g entry.


Can you explain what you mean by "downgrades everything to a 2k
alignment"? If the ULP is responsible for PAGE_SIZE alignment, then
how would this get out of alignment with swiotlb?
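
For reference, the alignment rule quoted earlier in this message (only the
first SG entry may start unaligned, only the last may end unaligned) can be
written as a self-contained userspace C sketch that looks only at DMA
addresses and lengths. struct demo_dma_sg, DEMO_PAGE_SIZE (assumed 4K) and
demo_check_sg_alignment() are hypothetical names for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL			/* assumed page size */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Hypothetical, simplified view of a DMA-mapped SG entry. */
struct demo_dma_sg {
	uint64_t dma_addr;
	uint64_t dma_len;
};

/*
 * Return 0 if the list can be described by one page list, -EINVAL if an
 * entry other than the first starts unaligned or an entry other than the
 * last ends unaligned.
 */
static int demo_check_sg_alignment(const struct demo_dma_sg *sg, size_t nents)
{
	size_t i;

	for (i = 0; i < nents; i++) {
		uint64_t start = sg[i].dma_addr;
		uint64_t end = start + sg[i].dma_len;

		/* Only the first SG entry can start unaligned. */
		if (i != 0 && (start & ~DEMO_PAGE_MASK) != 0)
			return -EINVAL;

		/* Only the last SG entry can end unaligned. */
		if (i != nents - 1 && (end & ~DEMO_PAGE_MASK) != 0)
			return -EINVAL;
	}

	return 0;
}
```

The sketch never consults a virtual address or an sg offset, so it follows
the advice above to rely only on the values obtained after DMA mapping.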


Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Sagi Grimberg



Ideally, the post contains a chain of all 4 registrations and the
rdma_read (and an opportunistic good scsi response).


Just to be clear: This example is for IB only, correct?  IW would
require rkeys with REMOTE_WRITE and 4 read wrs.


My assumption is that it would depend on max_sge_rd.

IB only? iWARP by definition isn't capable of doing rdma_read to
more than one scatter? Anyway, we'll need to calculate the number
of RDMA_READs.


And you're ignoring invalidation wrs (or read-with-inv) in the example...


Yes, didn't want to inflate the example too much...
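
The dependency on max_sge_rd amounts to a ceiling division. A minimal
sketch, assuming each RDMA_READ work request can carry at most max_sge_rd
scatter entries; demo_num_rdma_reads() is a hypothetical helper name.

```c
#include <stddef.h>

/*
 * Number of RDMA_READ work requests needed to cover nr_sge local scatter
 * entries when each request can carry at most max_sge_rd of them.
 * Returns 0 when max_sge_rd is 0, since no read can be built.
 */
static size_t demo_num_rdma_reads(size_t nr_sge, size_t max_sge_rd)
{
	if (max_sge_rd == 0)
		return 0;

	return (nr_sge + max_sge_rd - 1) / max_sge_rd;
}
```

With the four registrations from the earlier example, a device with
max_sge_rd of 4 or more needs one read, while a device limited to a single
read SGE needs four.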


[PATCH v2 12/13] IB/cma: Share ib_cm_ids between rdma_cm_ids

2015-07-26 Thread Haggai Eran
Use ib_cm_insert_listen to create listening IB CM IDs or share existing
ones if needed. When given a request on a specific CM ID, the code now
matches the request to the RDMA CM ID based on the request parameters, so
it no longer needs to rely on the ib_cm's private data matching
capabilities.

Signed-off-by: Haggai Eran hagg...@mellanox.com
---
 drivers/infiniband/core/cma.c | 60 ---
 1 file changed, 5 insertions(+), 55 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1c43b58a8eb2..ca547ff2bb95 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1765,42 +1765,6 @@ __be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr)
 }
 EXPORT_SYMBOL(rdma_get_service_id);
 
-static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
-				 struct ib_cm_compare_data *compare)
-{
-	struct cma_hdr *cma_data, *cma_mask;
-	__be32 ip4_addr;
-	struct in6_addr ip6_addr;
-
-	memset(compare, 0, sizeof *compare);
-	cma_data = (void *) compare->data;
-	cma_mask = (void *) compare->mask;
-
-	switch (addr->sa_family) {
-	case AF_INET:
-		ip4_addr = ((struct sockaddr_in *) addr)->sin_addr.s_addr;
-		cma_set_ip_ver(cma_data, 4);
-		cma_set_ip_ver(cma_mask, 0xF);
-		if (!cma_any_addr(addr)) {
-			cma_data->dst_addr.ip4.addr = ip4_addr;
-			cma_mask->dst_addr.ip4.addr = htonl(~0);
-		}
-		break;
-	case AF_INET6:
-		ip6_addr = ((struct sockaddr_in6 *) addr)->sin6_addr;
-		cma_set_ip_ver(cma_data, 6);
-		cma_set_ip_ver(cma_mask, 0xF);
-		if (!cma_any_addr(addr)) {
-			cma_data->dst_addr.ip6 = ip6_addr;
-			memset(&cma_mask->dst_addr.ip6, 0xFF,
-			       sizeof cma_mask->dst_addr.ip6);
-		}
-		break;
-	default:
-		break;
-	}
-}
-
 static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
 {
 	struct rdma_id_private *id_priv = iw_id->context;
@@ -1954,33 +1918,19 @@ out:
 
 static int cma_ib_listen(struct rdma_id_private *id_priv)
 {
-	struct ib_cm_compare_data compare_data;
 	struct sockaddr *addr;
 	struct ib_cm_id *id;
 	__be64 svc_id;
-	int ret;
 
-	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv);
+	addr = cma_src_addr(id_priv);
+	svc_id = rdma_get_service_id(&id_priv->id, addr);
+	id = ib_cm_insert_listen(id_priv->id.device, cma_req_handler, svc_id,
+				 0);
 	if (IS_ERR(id))
 		return PTR_ERR(id);
-
 	id_priv->cm_id.ib = id;
 
-	addr = cma_src_addr(id_priv);
-	svc_id = rdma_get_service_id(&id_priv->id, addr);
-	if (cma_any_addr(addr) && !id_priv->afonly)
-		ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, NULL);
-	else {
-		cma_set_compare_data(id_priv->id.ps, addr, &compare_data);
-		ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, &compare_data);
-	}
-
-	if (ret) {
-		ib_destroy_cm_id(id_priv->cm_id.ib);
-		id_priv->cm_id.ib = NULL;
-	}
-
-	return ret;
+	return 0;
 }
 
 static int cma_iw_listen(struct rdma_id_private *id_priv, int backlog)
-- 
1.7.11.2



[PATCH v2 09/13] IB/cma: Add net_dev and private data checks to RDMA CM

2015-07-26 Thread Haggai Eran
Instead of relying on the ib_cm module to check an incoming CM request's
private data header, add these checks to the RDMA CM module. This allows a
following patch to clean up the ib_cm interface and remove the code that
looks into the private headers. It will also allow supporting namespaces in
RDMA CM by making these checks namespace aware later on.

Signed-off-by: Haggai Eran hagg...@mellanox.com
---
 drivers/infiniband/core/cma.c | 184 +-
 1 file changed, 181 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f2d799209412..ed3d63ad94ac 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -300,7 +300,7 @@ static enum rdma_cm_state cma_exch(struct rdma_id_private *id_priv,
 	return old;
 }
 
-static inline u8 cma_get_ip_ver(struct cma_hdr *hdr)
+static inline u8 cma_get_ip_ver(const struct cma_hdr *hdr)
 {
 	return hdr->ip_version >> 4;
 }
@@ -1016,7 +1016,7 @@ static int cma_save_ip_info(struct sockaddr *src_addr,
 		cma_save_ip6_info(src_addr, dst_addr, hdr, port);
 		break;
 	default:
-		return -EINVAL;
+		return -EAFNOSUPPORT;
 	}
 
 	return 0;
@@ -1040,6 +1040,181 @@ static int cma_save_net_info(struct sockaddr *src_addr,
 	return cma_save_ip_info(src_addr, dst_addr, ib_event, service_id);
 }
 
+struct cma_req_info {
+	struct ib_device *device;
+	int port;
+	const union ib_gid *local_gid;
+	__be64 service_id;
+	u16 pkey;
+};
+
+static int cma_save_req_info(const struct ib_cm_event *ib_event,
+			     struct cma_req_info *req)
+{
+	const struct ib_cm_req_event_param *req_param =
+		&ib_event->param.req_rcvd;
+	const struct ib_cm_sidr_req_event_param *sidr_param =
+		&ib_event->param.sidr_req_rcvd;
+
+	switch (ib_event->event) {
+	case IB_CM_REQ_RECEIVED:
+		req->device	= req_param->listen_id->device;
+		req->port	= req_param->port;
+		req->local_gid	= &req_param->primary_path->sgid;
+		req->service_id	= req_param->primary_path->service_id;
+		req->pkey	= req_param->bth_pkey;
+		break;
+	case IB_CM_SIDR_REQ_RECEIVED:
+		req->device	= sidr_param->listen_id->device;
+		req->port	= sidr_param->port;
+		req->local_gid	= NULL;
+		req->service_id	= sidr_param->service_id;
+		req->pkey	= sidr_param->bth_pkey;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
+					  const struct cma_req_info *req)
+{
+	struct sockaddr_storage listen_addr_storage;
+	struct sockaddr *listen_addr = (struct sockaddr *)&listen_addr_storage;
+	struct net_device *net_dev;
+	int err;
+
+	err = cma_save_ip_info(listen_addr, NULL, ib_event, req->service_id);
+	if (err)
+		return ERR_PTR(err);
+
+	net_dev = ib_get_net_dev_by_params(req->device, req->port, req->pkey,
+					   req->local_gid, listen_addr);
+	if (!net_dev)
+		return ERR_PTR(-ENODEV);
+
+	return net_dev;
+}
+
+static enum rdma_port_space rdma_ps_from_service_id(__be64 service_id)
+{
+	return (be64_to_cpu(service_id) >> 16) & 0x;
+}
+
+static bool cma_match_private_data(struct rdma_id_private *id_priv,
+				   const struct cma_hdr *hdr)
+{
+	struct sockaddr *addr = cma_src_addr(id_priv);
+	__be32 ip4_addr;
+	struct in6_addr ip6_addr;
+
+	if (cma_any_addr(addr) && !id_priv->afonly)
+		return true;
+
+	switch (addr->sa_family) {
+	case AF_INET:
+		ip4_addr = ((struct sockaddr_in *)addr)->sin_addr.s_addr;
+		if (cma_get_ip_ver(hdr) != 4)
+			return false;
+		if (!cma_any_addr(addr) &&
+		    hdr->dst_addr.ip4.addr != ip4_addr)
+			return false;
+		break;
+	case AF_INET6:
+		ip6_addr = ((struct sockaddr_in6 *)addr)->sin6_addr;
+		if (cma_get_ip_ver(hdr) != 6)
+			return false;
+		if (!cma_any_addr(addr) &&
+		    memcmp(&hdr->dst_addr.ip6, &ip6_addr, sizeof(ip6_addr)))
+			return false;
+		break;
+	case AF_IB:
+		return true;
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+static bool cma_match_net_dev(const struct rdma_id_private *id_priv,
+			      const struct net_device *net_dev)
+{
+	const struct rdma_addr *addr = &id_priv->id.route.addr;
+
+	if (!net_dev)
+   /* 

Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Steve Wise

On 7/26/2015 12:40 PM, Sagi Grimberg wrote:



Ideally, the post contains a chain of all 4 registrations and the
rdma_read (and an opportunistic good scsi response).


Just to be clear: This example is for IB only, correct?  IW would
require rkeys with REMOTE_WRITE and 4 read wrs.


My assumption is that it would depend on max_sge_rd.



yea.


IB only? iWARP by definition isn't capable of doing rdma_read to
more than one scatter? Anyway, we'll need to calculate the number
of RDMA_READs.



The wire protocol limits the destination to a single stg/to/len (aka 
rkey/addr/len).  Devices/fw/sw could implement some magic to support a 
single stg/to/len that maps to a scatter gather list of stags/tos/lens.


And you're ignoring invalidation wrs (or read-with-inv) in the 
example...


Yes, didn't want to inflate the example too much...




Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Steve Wise

On 7/26/2015 5:08 AM, Sagi Grimberg wrote:

On 7/24/2015 7:18 PM, Steve Wise wrote:

This is in preparation for adding new FRMR-only IO handlers
for devices that support FRMR and not PI.


Steve,

I've given this some thought and I think we should avoid splitting
logic from PI and iWARP. The reason (other than code duplication) is
that currently the iser target supports only up to 1MB IOs. I have some
code (not done yet) to support larger IOs by using multiple
registrations per IO (with or without PI).
With a little tweaking I think we can get iwarp to fit in too...

So, do you mind if I take a crack at it?


Sure, go ahead.  Let me know how I can help.  Certainly I can test it 
for you.  I'm very keen to get this in for 4.3 if possible...





[PATCH v2 13/13] IB/cm: Remove compare_data checks

2015-07-26 Thread Haggai Eran
Now that there are no ib_cm clients using the compare_data feature for
matching IB CM requests' private data, remove the compare_data parameter of
ib_cm_listen and remove the code implementing the feature.

Signed-off-by: Haggai Eran hagg...@mellanox.com
---
 drivers/infiniband/core/cm.c| 109 ++--
 drivers/infiniband/core/ucm.c   |   3 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c   |   2 +-
 include/rdma/ib_cm.h|  14 +---
 5 files changed, 23 insertions(+), 107 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index a05c17b336aa..73803a55edd6 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -222,7 +222,6 @@ struct cm_id_private {
/* todo: use alternate port on send failure */
struct cm_av av;
struct cm_av alt_av;
-   struct ib_cm_compare_data *compare_data;
 
void *private_data;
__be64 tid;
@@ -443,40 +442,6 @@ static struct cm_id_private * cm_acquire_id(__be32 local_id, __be32 remote_id)
 	return cm_id_priv;
 }
 
-static void cm_mask_copy(u32 *dst, const u32 *src, const u32 *mask)
-{
-	int i;
-
-	for (i = 0; i < IB_CM_COMPARE_SIZE; i++)
-		dst[i] = src[i] & mask[i];
-}
-
-static int cm_compare_data(struct ib_cm_compare_data *src_data,
-			   struct ib_cm_compare_data *dst_data)
-{
-	u32 src[IB_CM_COMPARE_SIZE];
-	u32 dst[IB_CM_COMPARE_SIZE];
-
-	if (!src_data || !dst_data)
-		return 0;
-
-	cm_mask_copy(src, src_data->data, dst_data->mask);
-	cm_mask_copy(dst, dst_data->data, src_data->mask);
-	return memcmp(src, dst, sizeof(src));
-}
-
-static int cm_compare_private_data(u32 *private_data,
-				   struct ib_cm_compare_data *dst_data)
-{
-	u32 src[IB_CM_COMPARE_SIZE];
-
-	if (!dst_data)
-		return 0;
-
-	cm_mask_copy(src, private_data, dst_data->mask);
-	return memcmp(src, dst_data->data, sizeof(src));
-}
-
 /*
  * Trivial helpers to strip endian annotation and compare; the
  * endianness doesn't actually matter since we just need a stable
@@ -509,18 +474,14 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 	struct cm_id_private *cur_cm_id_priv;
 	__be64 service_id = cm_id_priv->id.service_id;
 	__be64 service_mask = cm_id_priv->id.service_mask;
-	int data_cmp;
 
 	while (*link) {
 		parent = *link;
 		cur_cm_id_priv = rb_entry(parent, struct cm_id_private,
 					  service_node);
-		data_cmp = cm_compare_data(cm_id_priv->compare_data,
-					   cur_cm_id_priv->compare_data);
 		if ((cur_cm_id_priv->id.service_mask & service_id) ==
 		    (service_mask & cur_cm_id_priv->id.service_id) &&
-		    (cm_id_priv->id.device == cur_cm_id_priv->id.device) &&
-		    !data_cmp)
+		    (cm_id_priv->id.device == cur_cm_id_priv->id.device))
 			return cur_cm_id_priv;
 
 		if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
@@ -531,8 +492,6 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 			link = &(*link)->rb_left;
 		else if (be64_gt(service_id, cur_cm_id_priv->id.service_id))
 			link = &(*link)->rb_right;
-		else if (data_cmp < 0)
-			link = &(*link)->rb_left;
 		else
 			link = &(*link)->rb_right;
 	}
@@ -542,20 +501,16 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 }
 
 static struct cm_id_private * cm_find_listen(struct ib_device *device,
-					     __be64 service_id,
-					     u32 *private_data)
+					     __be64 service_id)
 {
 	struct rb_node *node = cm.listen_service_table.rb_node;
 	struct cm_id_private *cm_id_priv;
-	int data_cmp;
 
 	while (node) {
 		cm_id_priv = rb_entry(node, struct cm_id_private, service_node);
-		data_cmp = cm_compare_private_data(private_data,
-						   cm_id_priv->compare_data);
 		if ((cm_id_priv->id.service_mask & service_id) ==
 		     cm_id_priv->id.service_id &&
-		    (cm_id_priv->id.device == device) && !data_cmp)
+		    (cm_id_priv->id.device == device))
 			return cm_id_priv;
 
 		if (device < cm_id_priv->id.device)
@@ -566,8 +521,6 @@ static struct cm_id_private * cm_find_listen(struct ib_device *device,
 			node = node->rb_left;
else if (be64_gt(service_id, 

[PATCH v2 10/13] IB/cma: Validate routing of incoming requests

2015-07-26 Thread Haggai Eran
Pass incoming request parameters through the relevant IPv4/IPv6 routing
tables and make sure the network stack is configured to handle such
requests.

Signed-off-by: Haggai Eran hagg...@mellanox.com
---
 drivers/infiniband/core/cma.c | 95 +--
 1 file changed, 92 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index ed3d63ad94ac..42f412fde064 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -46,6 +46,8 @@
 
 #include <net/tcp.h>
 #include <net/ipv6.h>
+#include <net/ip_fib.h>
+#include <net/ip6_route.h>
 
 #include <rdma/rdma_cm.h>
 #include <rdma/rdma_cm_ib.h>
@@ -1078,15 +1080,97 @@ static int cma_save_req_info(const struct ib_cm_event *ib_event,
 	return 0;
 }
 
+static bool validate_ipv4_net_dev(struct net_device *net_dev,
+				  const struct sockaddr_in *dst_addr,
+				  const struct sockaddr_in *src_addr)
+{
+	__be32 daddr = dst_addr->sin_addr.s_addr,
+	       saddr = src_addr->sin_addr.s_addr;
+	struct fib_result res;
+	struct flowi4 fl4;
+	int err;
+	bool ret;
+
+	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr) ||
+	    ipv4_is_lbcast(daddr) || ipv4_is_zeronet(saddr) ||
+	    ipv4_is_zeronet(daddr) || ipv4_is_loopback(daddr) ||
+	    ipv4_is_loopback(saddr))
+		return false;
+
+	memset(&fl4, 0, sizeof(fl4));
+	fl4.flowi4_iif = net_dev->ifindex;
+	fl4.daddr = daddr;
+	fl4.saddr = saddr;
+
+	rcu_read_lock();
+	err = fib_lookup(dev_net(net_dev), &fl4, &res, 0);
+	if (err)
+		return false;
+
+	ret = FIB_RES_DEV(res) == net_dev;
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static bool validate_ipv6_net_dev(struct net_device *net_dev,
+				  const struct sockaddr_in6 *dst_addr,
+				  const struct sockaddr_in6 *src_addr)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	const int strict = ipv6_addr_type(&dst_addr->sin6_addr) &
+			   IPV6_ADDR_LINKLOCAL;
+	struct rt6_info *rt = rt6_lookup(dev_net(net_dev), &dst_addr->sin6_addr,
+					 &src_addr->sin6_addr, net_dev->ifindex,
+					 strict);
+	bool ret;
+
+	if (!rt)
+		return false;
+
+	ret = rt->rt6i_idev->dev == net_dev;
+	ip6_rt_put(rt);
+
+	return ret;
+#else
+	return false;
+#endif
+}
+
+static bool validate_net_dev(struct net_device *net_dev,
+			     const struct sockaddr *daddr,
+			     const struct sockaddr *saddr)
+{
+	const struct sockaddr_in *daddr4 = (const struct sockaddr_in *)daddr;
+	const struct sockaddr_in *saddr4 = (const struct sockaddr_in *)saddr;
+	const struct sockaddr_in6 *daddr6 = (const struct sockaddr_in6 *)daddr;
+	const struct sockaddr_in6 *saddr6 = (const struct sockaddr_in6 *)saddr;
+
+	switch (daddr->sa_family) {
+	case AF_INET:
+		return saddr->sa_family == AF_INET &&
+		       validate_ipv4_net_dev(net_dev, daddr4, saddr4);
+
+	case AF_INET6:
+		return saddr->sa_family == AF_INET6 &&
+		       validate_ipv6_net_dev(net_dev, daddr6, saddr6);
+
+	default:
+		return false;
+	}
+}
+
 static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
 					  const struct cma_req_info *req)
 {
-	struct sockaddr_storage listen_addr_storage;
-	struct sockaddr *listen_addr = (struct sockaddr *)&listen_addr_storage;
+	struct sockaddr_storage listen_addr_storage, src_addr_storage;
+	struct sockaddr *listen_addr = (struct sockaddr *)&listen_addr_storage,
+			*src_addr = (struct sockaddr *)&src_addr_storage;
 	struct net_device *net_dev;
 	int err;
 
-	err = cma_save_ip_info(listen_addr, NULL, ib_event, req->service_id);
+	err = cma_save_ip_info(listen_addr, src_addr, ib_event,
+			       req->service_id);
 	if (err)
 		return ERR_PTR(err);
 
@@ -1095,6 +1179,11 @@ static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
 	if (!net_dev)
 		return ERR_PTR(-ENODEV);
 
+	if (!validate_net_dev(net_dev, listen_addr, src_addr)) {
+		dev_put(net_dev);
+		return ERR_PTR(-EHOSTUNREACH);
+	}
+
 	return net_dev;
 }
 
-- 
1.7.11.2



[PATCH v2 01/13] IB/core: lock client data with lists_rwsem

2015-07-26 Thread Haggai Eran
An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove() without that lock.

Mark client data that is undergoing de-registration with a new going_down
flag in the client data context. Lock the client data list with lists_rwsem
for write in addition to using the spinlock, so that functions calling the
callback would be able to lock only lists_rwsem for read and let callbacks
sleep.

Since ib_unregister_client() now marks the client data context, no need for
remove() to search the context again, so pass the client data directly to
remove() callbacks.

Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
Signed-off-by: Haggai Eran hagg...@mellanox.com
---
 drivers/infiniband/core/cache.c   |  2 +-
 drivers/infiniband/core/cm.c  |  7 ++--
 drivers/infiniband/core/cma.c |  7 ++--
 drivers/infiniband/core/device.c  | 53 +--
 drivers/infiniband/core/mad.c |  2 +-
 drivers/infiniband/core/multicast.c   |  7 ++--
 drivers/infiniband/core/sa_query.c|  6 ++--
 drivers/infiniband/core/ucm.c |  6 ++--
 drivers/infiniband/core/user_mad.c|  6 ++--
 drivers/infiniband/core/uverbs_main.c |  6 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  7 ++--
 drivers/infiniband/ulp/srp/ib_srp.c   |  6 ++--
 drivers/infiniband/ulp/srpt/ib_srpt.c |  5 ++-
 include/rdma/ib_verbs.h   |  4 ++-
 net/rds/ib.c  |  5 ++-
 net/rds/iw.c  |  5 ++-
 16 files changed, 82 insertions(+), 52 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 871da832d016..c93af66cc091 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -394,7 +394,7 @@ err:
kfree(device->cache.lmc_cache);
 }
 
-static void ib_cache_cleanup_one(struct ib_device *device)
+static void ib_cache_cleanup_one(struct ib_device *device, void *client_data)
 {
int p;
 
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 3a972ebf3c0d..82d5c4362aa8 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -58,7 +58,7 @@ MODULE_DESCRIPTION("InfiniBand CM");
 MODULE_LICENSE("Dual BSD/GPL");
 
 static void cm_add_one(struct ib_device *device);
-static void cm_remove_one(struct ib_device *device);
+static void cm_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client cm_client = {
.name   = "cm",
@@ -3886,9 +3886,9 @@ free:
kfree(cm_dev);
 }
 
-static void cm_remove_one(struct ib_device *ib_device)
+static void cm_remove_one(struct ib_device *ib_device, void *client_data)
 {
-   struct cm_device *cm_dev;
+   struct cm_device *cm_dev = client_data;
struct cm_port *port;
struct ib_port_modify port_modify = {
.clr_port_cap_mask = IB_PORT_CM_SUP
@@ -3896,7 +3896,6 @@ static void cm_remove_one(struct ib_device *ib_device)
unsigned long flags;
int i;
 
-   cm_dev = ib_get_client_data(ib_device, &cm_client);
if (!cm_dev)
return;
 
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 143ded2bbe7c..6b6cdfa5d231 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -94,7 +94,7 @@ const char *rdma_event_msg(enum rdma_cm_event_type event)
 EXPORT_SYMBOL(rdma_event_msg);
 
 static void cma_add_one(struct ib_device *device);
-static void cma_remove_one(struct ib_device *device);
+static void cma_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client cma_client = {
.name   = "cma",
@@ -3551,11 +3551,10 @@ static void cma_process_remove(struct cma_device *cma_dev)
wait_for_completion(&cma_dev->comp);
 }
 
-static void cma_remove_one(struct ib_device *device)
+static void cma_remove_one(struct ib_device *device, void *client_data)
 {
-   struct cma_device *cma_dev;
+   struct cma_device *cma_dev = client_data;
 
-   cma_dev = ib_get_client_data(device, &cma_client);
if (!cma_dev)
return;
 
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index f08d438205ed..623d8e191ced 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -50,6 +50,9 @@ struct ib_client_data {
struct list_head  list;
struct ib_client *client;
void *data;
+   /* The device or client is going down. Do not call client or device
+* callbacks other than remove(). */
+   bool  going_down;
 };
 
 struct workqueue_struct *ib_wq;

[PATCH v2 06/13] IB/cma: Refactor RDMA IP CM private-data parsing code

2015-07-26 Thread Haggai Eran
When receiving a connection request, rdma_cm needs to associate the request
with a network device, in order to disambiguate requests. To do this, it
needs to know the request's destination IP. For this the module needs to
allow getting this information from the private data in the request packet,
instead of relying on the information already being in the listening RDMA
CM ID.

When creating a new incoming connection ID, the code in
cma_save_ip{4,6}_info can no longer rely on the listener's private data to
find the port number, so it reads it from the requested service ID.
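
As a rough illustration of reading the port from the service ID, here is a sketch that assumes the RDMA IP CM service ID layout as I understand it: a fixed 0x0000000001 prefix, one byte of IP protocol number, and a 16-bit port in the low bits. The helper names are hypothetical and the value is taken in host byte order, whereas the kernel works on a big-endian `__be64`.

```c
#include <stdint.h>

/* Assumed layout of an RDMA IP CM service ID, in host byte order:
 *   bits 63..24  fixed prefix 0x0000000001
 *   bits 23..16  IP protocol number (6 = TCP, 17 = UDP)
 *   bits 15..0   destination port
 */
#define TOY_SID_PREFIX_MASK 0xFFFFFFFFFF000000ULL
#define TOY_SID_PREFIX      0x0000000001000000ULL

/* Nonzero if the service ID carries the RDMA IP CM prefix. */
static int sid_is_ip_cm(uint64_t sid)
{
	return (sid & TOY_SID_PREFIX_MASK) == TOY_SID_PREFIX;
}

/* IP protocol number encoded in the service ID. */
static uint8_t sid_protocol(uint64_t sid)
{
	return (uint8_t)((sid >> 16) & 0xFF);
}

/* Port number encoded in the low 16 bits of the service ID. */
static uint16_t sid_port(uint64_t sid)
{
	return (uint16_t)(sid & 0xFFFF);
}
```

With this layout the listener's own address is no longer needed to recover the port: the request carries it.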

Signed-off-by: Guy Shapiro gu...@mellanox.com
Signed-off-by: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yotam Kenneth yota...@mellanox.com
Signed-off-by: Shachar Raindel rain...@mellanox.com
---
 drivers/infiniband/core/cma.c | 170 ++
 1 file changed, 105 insertions(+), 65 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 6b6cdfa5d231..cf5c48b0b7d5 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -870,107 +870,138 @@ static inline int cma_any_port(struct sockaddr *addr)
return !cma_port(addr);
 }
 
-static void cma_save_ib_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
+static void cma_save_ib_info(struct sockaddr *src_addr,
+struct sockaddr *dst_addr,
+struct rdma_cm_id *listen_id,
 struct ib_sa_path_rec *path)
 {
struct sockaddr_ib *listen_ib, *ib;
 
listen_ib = (struct sockaddr_ib *) &listen_id->route.addr.src_addr;
-   ib = (struct sockaddr_ib *) &id->route.addr.src_addr;
-   ib->sib_family = listen_ib->sib_family;
-   if (path) {
-   ib->sib_pkey = path->pkey;
-   ib->sib_flowinfo = path->flow_label;
-   memcpy(&ib->sib_addr, &path->sgid, 16);
-   } else {
-   ib->sib_pkey = listen_ib->sib_pkey;
-   ib->sib_flowinfo = listen_ib->sib_flowinfo;
-   ib->sib_addr = listen_ib->sib_addr;
-   }
-   ib->sib_sid = listen_ib->sib_sid;
-   ib->sib_sid_mask = cpu_to_be64(0xULL);
-   ib->sib_scope_id = listen_ib->sib_scope_id;
-
-   if (path) {
-   ib = (struct sockaddr_ib *) &id->route.addr.dst_addr;
-   ib->sib_family = listen_ib->sib_family;
-   ib->sib_pkey = path->pkey;
-   ib->sib_flowinfo = path->flow_label;
-   memcpy(&ib->sib_addr, &path->dgid, 16);
+   if (src_addr) {
+   ib = (struct sockaddr_ib *)src_addr;
+   ib->sib_family = AF_IB;
+   if (path) {
+   ib->sib_pkey = path->pkey;
+   ib->sib_flowinfo = path->flow_label;
+   memcpy(&ib->sib_addr, &path->sgid, 16);
+   ib->sib_sid = path->service_id;
+   ib->sib_scope_id = 0;
+   } else {
+   ib->sib_pkey = listen_ib->sib_pkey;
+   ib->sib_flowinfo = listen_ib->sib_flowinfo;
+   ib->sib_addr = listen_ib->sib_addr;
+   ib->sib_sid = listen_ib->sib_sid;
+   ib->sib_scope_id = listen_ib->sib_scope_id;
+   }
+   ib->sib_sid_mask = cpu_to_be64(0xULL);
+   }
+   if (dst_addr) {
+   ib = (struct sockaddr_ib *)dst_addr;
+   ib->sib_family = AF_IB;
+   if (path) {
+   ib->sib_pkey = path->pkey;
+   ib->sib_flowinfo = path->flow_label;
+   memcpy(&ib->sib_addr, &path->dgid, 16);
+   }
}
}
 }
 
-static __be16 ss_get_port(const struct sockaddr_storage *ss)
-{
-   if (ss->ss_family == AF_INET)
-   return ((struct sockaddr_in *)ss)->sin_port;
-   else if (ss->ss_family == AF_INET6)
-   return ((struct sockaddr_in6 *)ss)->sin6_port;
-   BUG();
-}
-
-static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
- struct cma_hdr *hdr)
+static void cma_save_ip4_info(struct sockaddr *src_addr,
+ struct sockaddr *dst_addr,
+ struct cma_hdr *hdr,
+ __be16 local_port)
 {
struct sockaddr_in *ip4;
 
-   ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
-   ip4->sin_family = AF_INET;
-   ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
-   ip4->sin_port = ss_get_port(&listen_id->route.addr.src_addr);
+   if (src_addr) {
+   ip4 = (struct sockaddr_in *)src_addr;
+   ip4->sin_family = AF_INET;
+   ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
+   ip4->sin_port = local_port;
+   }
 
-   ip4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
-   ip4->sin_family = AF_INET;
-   ip4-sin_addr.s_addr = 

[PATCH v2 04/13] IB/cm: Expose service ID in request events

2015-07-26 Thread Haggai Eran
Expose the service ID on an incoming CM or SIDR request to the event
handler. This will allow the RDMA CM module to de-multiplex connection
requests based on the information encoded in the service ID.

Acked-by: Sean Hefty sean.he...@intel.com
Signed-off-by: Haggai Eran hagg...@mellanox.com
---
 drivers/infiniband/core/cm.c | 3 +++
 include/rdma/ib_cm.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 82d5c4362aa8..93e9e2f34fc6 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1268,6 +1268,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
primary_path->packet_life_time =
cm_req_get_primary_local_ack_timeout(req_msg);
primary_path->packet_life_time -= (primary_path->packet_life_time > 0);
+   primary_path->service_id = req_msg->service_id;
 
if (req_msg->alt_local_lid) {
memset(alt_path, 0, sizeof *alt_path);
@@ -1289,6 +1290,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
alt_path->packet_life_time =
cm_req_get_alt_local_ack_timeout(req_msg);
alt_path->packet_life_time -= (alt_path->packet_life_time > 0);
+   alt_path->service_id = req_msg->service_id;
}
 }
 
@@ -2992,6 +2994,7 @@ static void cm_format_sidr_req_event(struct cm_work *work,
param = &work->cm_event.param.sidr_req_rcvd;
param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
param->listen_id = listen_id;
+   param->service_id = sidr_req_msg->service_id;
param->port = work->port->port_num;
work->cm_event.private_data = &sidr_req_msg->private_data;
 }
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 39ed2d2fbd51..1b567bbc3ad4 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -223,6 +223,7 @@ struct ib_cm_apr_event_param {
 
 struct ib_cm_sidr_req_event_param {
struct ib_cm_id *listen_id;
+   __be64  service_id;
u8  port;
u16 pkey;
 };
-- 
1.7.11.2



Re: [PATCH v3 06/15] xprtrdma: Clean up rpcrdma_ia_open()

2015-07-26 Thread Christoph Hellwig
Jason has patches that provide a local_dma_lkey in the PD that is always
available.  Do you need this cleanup for the next merge window?  If not,
it might be worth postponing it to avoid merge conflicts, especially
as I assume the NFS changes will go in through Trond.

On Mon, Jul 20, 2015 at 03:03:20PM -0400, Chuck Lever wrote:
 Untangle the end of rpcrdma_ia_open() by moving DMA MR set-up, which
 is different for each registration method, to the .ro_open functions.
 
 This is refactoring only. No behavior change is expected.
 
 Signed-off-by: Chuck Lever chuck.le...@oracle.com
 Tested-by: Devesh Sharma devesh.sha...@avagotech.com
 ---
  net/sunrpc/xprtrdma/fmr_ops.c  |   19 +++
  net/sunrpc/xprtrdma/frwr_ops.c |5 +++
  net/sunrpc/xprtrdma/physical_ops.c |   25 ++-
  net/sunrpc/xprtrdma/verbs.c|   60 
 +++-
  net/sunrpc/xprtrdma/xprt_rdma.h|3 +-
  5 files changed, 67 insertions(+), 45 deletions(-)
 
 diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
 index f1e8daf..cb25c89 100644
 --- a/net/sunrpc/xprtrdma/fmr_ops.c
 +++ b/net/sunrpc/xprtrdma/fmr_ops.c
 @@ -39,6 +39,25 @@ static int
  fmr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
   struct rpcrdma_create_data_internal *cdata)
  {
 + struct ib_device_attr *devattr = &ia->ri_devattr;
 + struct ib_mr *mr;
 +
 + /* Obtain an lkey to use for the regbufs, which are
 +  * protected from remote access.
 +  */
 + if (devattr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
 + ia->ri_dma_lkey = ia->ri_device->local_dma_lkey;
 + } else {
 + mr = ib_get_dma_mr(ia->ri_pd, IB_ACCESS_LOCAL_WRITE);
 + if (IS_ERR(mr)) {
 + pr_err("%s: ib_get_dma_mr for failed with %lX\n",
 +__func__, PTR_ERR(mr));
 + return -ENOMEM;
 + }
 + ia->ri_dma_mr = mr;
 + ia->ri_dma_lkey = ia->ri_dma_mr->lkey;
 + }
 +
   return 0;
  }
  
 diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
 index 04ea914..63f282e 100644
 --- a/net/sunrpc/xprtrdma/frwr_ops.c
 +++ b/net/sunrpc/xprtrdma/frwr_ops.c
 @@ -189,6 +189,11 @@ frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
   struct ib_device_attr *devattr = &ia->ri_devattr;
   int depth, delta;
  
 + /* Obtain an lkey to use for the regbufs, which are
 +  * protected from remote access.
 +  */
 + ia->ri_dma_lkey = ia->ri_device->local_dma_lkey;
 +
   ia->ri_max_frmr_depth =
   min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
 devattr->max_fast_reg_page_list_len);
 diff --git a/net/sunrpc/xprtrdma/physical_ops.c 
 b/net/sunrpc/xprtrdma/physical_ops.c
 index 41985d0..72cf8b1 100644
 --- a/net/sunrpc/xprtrdma/physical_ops.c
 +++ b/net/sunrpc/xprtrdma/physical_ops.c
 @@ -23,6 +23,29 @@ static int
  physical_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
struct rpcrdma_create_data_internal *cdata)
  {
 + struct ib_device_attr *devattr = &ia->ri_devattr;
 + struct ib_mr *mr;
 +
 + /* Obtain an rkey to use for RPC data payloads.
 +  */
 + mr = ib_get_dma_mr(ia->ri_pd,
 +IB_ACCESS_LOCAL_WRITE |
 +IB_ACCESS_REMOTE_WRITE |
 +IB_ACCESS_REMOTE_READ);
 + if (IS_ERR(mr)) {
 + pr_err("%s: ib_get_dma_mr for failed with %lX\n",
 +__func__, PTR_ERR(mr));
 + return -ENOMEM;
 + }
 + ia->ri_dma_mr = mr;
 +
 + /* Obtain an lkey to use for regbufs.
 +  */
 + if (devattr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
 + ia->ri_dma_lkey = ia->ri_device->local_dma_lkey;
 + else
 + ia->ri_dma_lkey = ia->ri_dma_mr->lkey;
 +
   return 0;
  }
  
 @@ -51,7 +74,7 @@ physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
   struct rpcrdma_ia *ia = &r_xprt->rx_ia;
  
   rpcrdma_map_one(ia->ri_device, seg, rpcrdma_data_dir(writing));
 - seg->mr_rkey = ia->ri_bind_mem->rkey;
 + seg->mr_rkey = ia->ri_dma_mr->rkey;
   seg-mr_base = seg-mr_dma;
   seg-mr_nsegs = 1;
   return 1;
 diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
 index da184f9..8516d98 100644
 --- a/net/sunrpc/xprtrdma/verbs.c
 +++ b/net/sunrpc/xprtrdma/verbs.c
 @@ -493,9 +493,11 @@ rpcrdma_clean_cq(struct ib_cq *cq)
  int
  rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
  {
 - int rc, mem_priv;
   struct rpcrdma_ia *ia = &xprt->rx_ia;
   struct ib_device_attr *devattr = &ia->ri_devattr;
 + int rc;
 +
 + ia->ri_dma_mr = NULL;
  
   ia->ri_id = rpcrdma_create_id(xprt, ia, addr);
   if (IS_ERR(ia->ri_id)) {
 @@ -519,11 +521,6 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
   goto out3;
   }
  

Re: [PATCH V6 9/9] isert: Support iWARP transports using FRMRs

2015-07-26 Thread Sagi Grimberg

On 7/24/2015 10:24 PM, Jason Gunthorpe wrote:

On Fri, Jul 24, 2015 at 01:48:09PM -0500, Steve Wise wrote:

The use of FRWR for RDMA READ should be iWarp specific, IB shouldn't
pay that overhead. I am expecting to see a cap_rdma_read_rkey or
something in here ?


Ok.  But cap_rdma_read_rkey() doesn't really describe the
requirement.  The requirement is rkey + REMOTE_WRITE.  So it is more
like rdma_cap_read_requires_remote_write() which is ugly and too
long (but descriptive)...


I don't care much what name you pick, just jam something like this in
the description


I think we can just do if (signature || iwarp) use fastreg else
use local_dma_lkey.



  If set then RDMA_READ must be performed by mapping the local
  buffers through a rkey MR with ACCESS_REMOTE_WRITE enabled.
  The rkey of this MR should be passed in as the sg_lists's lkey for
  IB_WR_RDMA_READ_WITH_INV.


I think this would be an incremental patch and not as part of iwarp
support.

Question though, wouldn't it be better to do a single RDMA_READ to say
4 registered keys rather than RDMA_READ_WITH_INV for each?


[PATCH v2 11/13] IB/cma: Use found net_dev for passive connections

2015-07-26 Thread Haggai Eran
When receiving a new connection in cma_req_handler, we actually already
know the net_dev that is used for the connection's creation. Instead of
calling cma_translate_addr to resolve the new connection id's source
address, just use the net_dev that was found.

Signed-off-by: Haggai Eran hagg...@mellanox.com
---
 drivers/infiniband/core/cma.c | 74 +++
 1 file changed, 47 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 42f412fde064..1c43b58a8eb2 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1273,33 +1273,31 @@ static struct rdma_id_private *cma_find_listener(
 }
 
 static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id,
-struct ib_cm_event *ib_event)
+struct ib_cm_event *ib_event,
+struct net_device **net_dev)
 {
struct cma_req_info req;
struct rdma_bind_list *bind_list;
struct rdma_id_private *id_priv;
-   struct net_device *net_dev;
int err;
 
err = cma_save_req_info(ib_event, &req);
if (err)
return ERR_PTR(err);
 
-   net_dev = cma_get_net_dev(ib_event, &req);
-   if (IS_ERR(net_dev)) {
-   if (PTR_ERR(net_dev) == -EAFNOSUPPORT) {
+   *net_dev = cma_get_net_dev(ib_event, &req);
+   if (IS_ERR(*net_dev)) {
+   if (PTR_ERR(*net_dev) == -EAFNOSUPPORT) {
/* Assuming the protocol is AF_IB */
-   net_dev = NULL;
+   *net_dev = NULL;
} else {
-   return ERR_PTR(PTR_ERR(net_dev));
+   return ERR_PTR(PTR_ERR(*net_dev));
}
}
 
bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id),
cma_port_from_service_id(req.service_id));
-   id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, net_dev);
-
-   dev_put(net_dev);
+   id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, *net_dev);
 
return id_priv;
 }
@@ -1549,7 +1547,8 @@ out:
 }
 
 static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
-  struct ib_cm_event *ib_event)
+  struct ib_cm_event *ib_event,
+  struct net_device *net_dev)
 {
struct rdma_id_private *id_priv;
struct rdma_cm_id *id;
@@ -1581,14 +1580,15 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
if (rt->num_paths == 2)
rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path;
 
-   if (cma_any_addr(cma_src_addr(id_priv))) {
-   rt->addr.dev_addr.dev_type = ARPHRD_INFINIBAND;
-   rdma_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid);
-   ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey));
-   } else {
-   ret = cma_translate_addr(cma_src_addr(id_priv), &rt->addr.dev_addr);
+   if (net_dev) {
+   ret = rdma_copy_addr(&rt->addr.dev_addr, net_dev, NULL);
if (ret)
goto err;
+   } else {
+   /* An AF_IB connection */
+   WARN_ON_ONCE(ss_family != AF_IB);
+
+   cma_translate_ib((struct sockaddr_ib *)cma_src_addr(id_priv), &rt->addr.dev_addr);
}
rdma_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid);
 
@@ -1601,7 +1601,8 @@ err:
 }
 
 static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
- struct ib_cm_event *ib_event)
+ struct ib_cm_event *ib_event,
+ struct net_device *net_dev)
 {
struct rdma_id_private *id_priv;
struct rdma_cm_id *id;
@@ -1620,10 +1621,17 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
  ib_event->param.sidr_req_rcvd.service_id))
goto err;
 
-   if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) {
-   ret = cma_translate_addr(cma_src_addr(id_priv), &id->route.addr.dev_addr);
+   if (net_dev) {
+   ret = rdma_copy_addr(&id->route.addr.dev_addr, net_dev, NULL);
if (ret)
goto err;
+   } else {
+   /* An AF_IB connection */
+   WARN_ON_ONCE(ss_family != AF_IB);
+
+   if (!cma_any_addr(cma_src_addr(id_priv)))
+   cma_translate_ib((struct sockaddr_ib *)cma_src_addr(id_priv),
+&id->route.addr.dev_addr);
}
 
id_priv-state = 

[PATCH v2 03/13] IB/ipoib: Return IPoIB devices matching connection parameters

2015-07-26 Thread Haggai Eran
From: Guy Shapiro gu...@mellanox.com

Implement the get_net_dev_by_params callback, which returns a network
device to ib_core according to the connection parameters. Check the ipoib
device and iterate over all child devices to look for a match.

For each IPoIB device we iterate through all upper devices when searching
for a matching IP, in order to support bonding.
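
The upper-device walk can be pictured with a toy model. This is only a sketch under simplifying assumptions: struct toy_netdev is hypothetical, each device has one IPv4 address and at most one upper device, and the RCU locking used by the patch is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a net_device. The kernel allows several
 * upper devices per device; this model keeps a single chain. */
struct toy_netdev {
	const char *name;
	uint32_t ipv4;             /* 0 means no address configured */
	struct toy_netdev *upper;  /* e.g. a bond on top of an IPoIB device */
	int refcount;
};

/* Check the base device first, then each upper device in turn.
 * On a match, take a reference and return the device; otherwise NULL. */
static struct toy_netdev *toy_find_by_addr(struct toy_netdev *base,
					   uint32_t addr)
{
	if (addr == 0)
		return NULL;

	for (struct toy_netdev *dev = base; dev; dev = dev->upper) {
		if (dev->ipv4 == addr) {
			dev->refcount++;
			return dev;
		}
	}
	return NULL;
}
```

With bonding, the IP address lives on the bond rather than on the IPoIB slave, so a lookup that only inspected the base device would miss it.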

Signed-off-by: Guy Shapiro gu...@mellanox.com
Signed-off-by: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yotam Kenneth yota...@mellanox.com
Signed-off-by: Shachar Raindel rain...@mellanox.com
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 229 +-
 1 file changed, 228 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index cca1a0c91ec4..36536ce5a3e2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -48,6 +48,9 @@
 
 #include <linux/jhash.h>
 #include <net/arp.h>
+#include <net/addrconf.h>
+#include <linux/inetdevice.h>
+#include <rdma/ib_cache.h>
 
 #define DRV_VERSION "1.0.0"
 
@@ -91,11 +94,16 @@ struct ib_sa_client ipoib_sa_client;
 static void ipoib_add_one(struct ib_device *device);
 static void ipoib_remove_one(struct ib_device *device, void *client_data);
 static void ipoib_neigh_reclaim(struct rcu_head *rp);
+static struct net_device *ipoib_get_net_dev_by_params(
+   struct ib_device *dev, u8 port, u16 pkey,
+   const union ib_gid *gid, const struct sockaddr *addr,
+   void *client_data);
 
 static struct ib_client ipoib_client = {
.name   = "ipoib",
.add= ipoib_add_one,
-   .remove = ipoib_remove_one
+   .remove = ipoib_remove_one,
+   .get_net_dev_by_params = ipoib_get_net_dev_by_params,
 };
 
 int ipoib_open(struct net_device *dev)
@@ -222,6 +230,225 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
return 0;
 }
 
+/* Called with an RCU read lock taken */
+static bool ipoib_is_dev_match_addr_rcu(const struct sockaddr *addr,
+   struct net_device *dev)
+{
+   struct net *net = dev_net(dev);
+   struct in_device *in_dev;
+   struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
+   struct sockaddr_in6 *addr_in6 = (struct sockaddr_in6 *)addr;
+   __be32 ret_addr;
+
+   switch (addr->sa_family) {
+   case AF_INET:
+   in_dev = in_dev_get(dev);
+   if (!in_dev)
+   return false;
+
+   ret_addr = inet_confirm_addr(net, in_dev, 0,
+addr_in->sin_addr.s_addr,
+RT_SCOPE_HOST);
+   in_dev_put(in_dev);
+   if (ret_addr)
+   return true;
+
+   break;
+   case AF_INET6:
+   if (IS_ENABLED(CONFIG_IPV6) &&
+   ipv6_chk_addr(net, &addr_in6->sin6_addr, dev, 1))
+   return true;
+
+   break;
+   }
+   return false;
+}
+
+/**
+ * Find the master net_device on top of the given net_device.
+ * @dev: base IPoIB net_device
+ *
+ * Returns the master net_device with a reference held, or the same net_device
+ * if no master exists.
+ */
+static struct net_device *ipoib_get_master_net_dev(struct net_device *dev)
+{
+   struct net_device *master;
+
+   rcu_read_lock();
+   master = netdev_master_upper_dev_get_rcu(dev);
+   if (master)
+   dev_hold(master);
+   rcu_read_unlock();
+
+   if (master)
+   return master;
+
+   dev_hold(dev);
+   return dev;
+}
+
+/**
+ * Find a net_device matching the given address, which is an upper device of
+ * the given net_device.
+ * @addr: IP address to look for.
+ * @dev: base IPoIB net_device
+ *
+ * If found, returns the net_device with a reference held. Otherwise return
+ * NULL.
+ */
+static struct net_device *ipoib_get_net_dev_match_addr(
+   const struct sockaddr *addr, struct net_device *dev)
+{
+   struct net_device *upper,
+ *result = NULL;
+   struct list_head *iter;
+
+   rcu_read_lock();
+   if (ipoib_is_dev_match_addr_rcu(addr, dev)) {
+   dev_hold(dev);
+   result = dev;
+   goto out;
+   }
+
+   netdev_for_each_all_upper_dev_rcu(dev, upper, iter) {
+   if (ipoib_is_dev_match_addr_rcu(addr, upper)) {
+   dev_hold(upper);
+   result = upper;
+   break;
+   }
+   }
+out:
+   rcu_read_unlock();
+   return result;
+}
+
+/* returns the number of IPoIB netdevs on top a given ipoib device matching a
+ * pkey_index and address, if one exists.
+ *
+ * @found_net_dev: contains a matching net_device if the return value >= 1,
+ * with a reference held. */
+static int ipoib_match_gid_pkey_addr(struct 

Re: [PATCH v3 04/15] xprtrdma: Don't fall back to PHYSICAL memory registration

2015-07-26 Thread Christoph Hellwig
On Mon, Jul 20, 2015 at 03:03:02PM -0400, Chuck Lever wrote:
 PHYSICAL memory registration uses a single rkey for all of the
 client's memory, thus is insecure. It is still useful in some cases
 for testing.
 
 Retain the ability to select PHYSICAL memory registration capability
 via /proc/sys/sunrpc/rdma_memreg_strategy, but don't fall back to it
 if the HCA does not support FRWR or FMR.
 
 This means amso1100 no longer works out of the box with NFS/RDMA.
 When using amso1100 HCAs, set the memreg_strategy sysctl to 6 before
 performing NFS/RDMA mounts.

Looks good,

Reviewed-by: Christoph Hellwig h...@lst.de


Re: [PATCH v3 01/15] xprtrdma: Make xprt_setup_rdma() agnostic to family of server address

2015-07-26 Thread Christoph Hellwig
On Mon, Jul 20, 2015 at 03:02:33PM -0400, Chuck Lever wrote:
 In particular, recognize when an IPv6 connection is bound.
 
 Signed-off-by: Chuck Lever chuck.le...@oracle.com
 Tested-by: Devesh Sharma devesh.sha...@avagotech.com

Looks good,

Reviewed-by: Christoph Hellwig h...@lst.de


Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Steve Wise

On 7/26/2015 6:00 AM, Sagi Grimberg wrote:

On 7/26/2015 1:43 PM, Christoph Hellwig wrote:

On Sun, Jul 26, 2015 at 01:08:16PM +0300, Sagi Grimberg wrote:

I've given this some thought and I think we should avoid splitting
logic from PI and iWARP. The reason (other than code duplication) is
that currently the iser target supports only up to 1MB IOs. I have some
code (not done yet) to support larger IOs by using multiple
registrations per IO (with or without PI).


Just curious: How is this going to work with iSER only having a single
rkey/offset/len field?



Good question,

On the wire iser sends a single rkey, but the target is allowed to
transfer the data however it wants to.

Say that the local target HCA supports only 32 pages (128K bytes for 4K
pages) registration and the initiator sent:
rkey=0x1234
address=0x
length=512K

The target would allocate a 512K buffer and:
register offset 0-128K to lkey=0x1
register offset 128K-256K to lkey=0x2
register offset 256K-384K to lkey=0x3
register offset 384K-512K to lkey=0x4

then constructs sg_list as:
sg_list[0] = {addr=buf, length=128K, lkey=0x1}
sg_list[1] = {addr=buf+128K, length=128K, lkey=0x2}
sg_list[2] = {addr=buf+256K, length=128K, lkey=0x3}
sg_list[3] = {addr=buf+384K, length=128K, lkey=0x4}

Then set rdma_read wr with:
rdma_r_wr.sg_list=sg_list
rdma_r_wr.rdma.addr=0x
rdma_r_wr.rdma.rkey=0x1234

post_send(rdma_r_wr);

Ideally, the post contains a chain of all 4 registrations and the
rdma_read (and an opportunistic good scsi response).
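
The splitting step in the example above can be sketched in plain C. This is an illustration only: struct toy_sge and toy_build_sg_list() are hypothetical names, the lkeys are assumed to come from registrations done elsewhere, and no verbs calls are made.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter/gather entry: address, length, local key. */
struct toy_sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/* Split [buf, buf + len) into pieces of at most max_reg bytes, one per
 * registration, and fill sg[] with them. lkeys[i] is the key assumed to
 * cover piece i. Returns the number of entries written, or 0 if the
 * buffer needs more than max_sge entries (or max_reg is 0). */
static size_t toy_build_sg_list(uint64_t buf, uint32_t len, uint32_t max_reg,
				const uint32_t *lkeys, struct toy_sge *sg,
				size_t max_sge)
{
	size_t n = 0;
	uint32_t off = 0;

	if (max_reg == 0)
		return 0;

	while (off < len) {
		uint32_t chunk = len - off < max_reg ? len - off : max_reg;

		if (n == max_sge)
			return 0;

		sg[n].addr = buf + off;
		sg[n].length = chunk;
		sg[n].lkey = lkeys[n];
		off += chunk;
		n++;
	}
	return n;
}
```

For the 512K request with a 128K registration limit this yields four entries, which would then be attached to a single RDMA READ work request that carries the initiator's one rkey.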


Just to be clear: This example is for IB only, correct?  IW would 
require rkeys with REMOTE_WRITE and 4 read wrs.  And you're ignoring 
invalidation wrs (or read-with-inv) in the example...




Re: [PATCH v3 06/15] xprtrdma: Clean up rpcrdma_ia_open()

2015-07-26 Thread Christoph Hellwig
On Sun, Jul 26, 2015 at 02:21:23PM -0400, Chuck Lever wrote:
 No, this patch is not strictly needed in 4.3, but my read of
 Jason's series is that he does not touch xprtrdma. I don't
 believe there will be a merge conflict.
 
 The goal of this patch is to move xprtrdma forward so it will
 be straightforward to use pd->local_dma_key for RPC send and
 receive buffers. That's a change that can be added after both
 this patch and Jason's series is merged.
 
 I prefer keeping this patch separate, because that makes it
 simpler to review and test this refactor. I don't see a reason
 to delay it, but I can do that if it is needed.

You're right, Jason didn't touch xprtrdma. Sorry for the noise.


Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Christoph Hellwig
On Sun, Jul 26, 2015 at 02:00:51PM +0300, Sagi Grimberg wrote:
 On the wire iser sends a single rkey, but the target is allowed to
 transfer the data however it wants to.

So you're trying to get above the limit of a single RDMA READ, not
above the limit for memory registration in the initiator?  In that
case your explanation makes sense, that's just not what I expected
to be the limiting factor.


Re: [PATCH v3 06/15] xprtrdma: Clean up rpcrdma_ia_open()

2015-07-26 Thread Chuck Lever
Hi Christoph-


On Jul 26, 2015, at 12:53 PM, Christoph Hellwig h...@infradead.org wrote:

 Jason has patches that provide a local_dma_lkey in the PD that is always
 available.  Do you need this clean up for the next merge window?  If not
 it might be worth to postponed it to avoid merge conflicts, specially
 as I assume the NFS changes will go in through Trond.

No, this patch is not strictly needed in 4.3, but my read of
Jason’s series is that he does not touch xprtrdma. I don’t
believe there will be a merge conflict.

The goal of this patch is to move xprtrdma forward so it will
be straightforward to use pd->local_dma_key for RPC send and
receive buffers. That’s a change that can be added after both
this patch and Jason’s series is merged.

I prefer keeping this patch separate, because that makes it
simpler to review and test this refactor. I don’t see a reason
to delay it, but I can do that if it is needed.


 On Mon, Jul 20, 2015 at 03:03:20PM -0400, Chuck Lever wrote:
 Untangle the end of rpcrdma_ia_open() by moving DMA MR set-up, which
 is different for each registration method, to the .ro_open functions.
 
 This is refactoring only. No behavior change is expected.
 
 Signed-off-by: Chuck Lever chuck.le...@oracle.com
 Tested-by: Devesh Sharma devesh.sha...@avagotech.com
 ---
 net/sunrpc/xprtrdma/fmr_ops.c  |   19 +++
 net/sunrpc/xprtrdma/frwr_ops.c |5 +++
 net/sunrpc/xprtrdma/physical_ops.c |   25 ++-
 net/sunrpc/xprtrdma/verbs.c|   60 
 +++-
 net/sunrpc/xprtrdma/xprt_rdma.h|3 +-
 5 files changed, 67 insertions(+), 45 deletions(-)
 
 diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
 index f1e8daf..cb25c89 100644
 --- a/net/sunrpc/xprtrdma/fmr_ops.c
 +++ b/net/sunrpc/xprtrdma/fmr_ops.c
 @@ -39,6 +39,25 @@ static int
 fmr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
  struct rpcrdma_create_data_internal *cdata)
 {
 +struct ib_device_attr *devattr = ia-ri_devattr;
 +struct ib_mr *mr;
 +
 +/* Obtain an lkey to use for the regbufs, which are
 + * protected from remote access.
 + */
 +if (devattr-device_cap_flags  IB_DEVICE_LOCAL_DMA_LKEY) {
 +ia-ri_dma_lkey = ia-ri_device-local_dma_lkey;
 +} else {
 +mr = ib_get_dma_mr(ia-ri_pd, IB_ACCESS_LOCAL_WRITE);
 +if (IS_ERR(mr)) {
 +pr_err(%s: ib_get_dma_mr for failed with %lX\n,
 +   __func__, PTR_ERR(mr));
 +return -ENOMEM;
 +}
 +ia-ri_dma_lkey = ia-ri_dma_mr-lkey;
 +ia-ri_dma_mr = mr;
 +}
 +
  return 0;
 }
 
 diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
 index 04ea914..63f282e 100644
 --- a/net/sunrpc/xprtrdma/frwr_ops.c
 +++ b/net/sunrpc/xprtrdma/frwr_ops.c
 @@ -189,6 +189,11 @@ frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
  struct ib_device_attr *devattr = &ia->ri_devattr;
  int depth, delta;
 
 +/* Obtain an lkey to use for the regbufs, which are
 + * protected from remote access.
 + */
 +ia->ri_dma_lkey = ia->ri_device->local_dma_lkey;
 +
  ia->ri_max_frmr_depth =
  min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
        devattr->max_fast_reg_page_list_len);
 diff --git a/net/sunrpc/xprtrdma/physical_ops.c 
 b/net/sunrpc/xprtrdma/physical_ops.c
 index 41985d0..72cf8b1 100644
 --- a/net/sunrpc/xprtrdma/physical_ops.c
 +++ b/net/sunrpc/xprtrdma/physical_ops.c
 @@ -23,6 +23,29 @@ static int
 physical_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
   struct rpcrdma_create_data_internal *cdata)
 {
 +struct ib_device_attr *devattr = &ia->ri_devattr;
 +struct ib_mr *mr;
 +
 +/* Obtain an rkey to use for RPC data payloads.
 + */
 +mr = ib_get_dma_mr(ia->ri_pd,
 +   IB_ACCESS_LOCAL_WRITE |
 +   IB_ACCESS_REMOTE_WRITE |
 +   IB_ACCESS_REMOTE_READ);
 +if (IS_ERR(mr)) {
 +pr_err("%s: ib_get_dma_mr for failed with %lX\n",
 +   __func__, PTR_ERR(mr));
 +return -ENOMEM;
 +}
 +ia->ri_dma_mr = mr;
 +
 +/* Obtain an lkey to use for regbufs.
 + */
 +if (devattr->device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
 +ia->ri_dma_lkey = ia->ri_device->local_dma_lkey;
 +else
 +ia->ri_dma_lkey = ia->ri_dma_mr->lkey;
 +
  return 0;
 }
 
 @@ -51,7 +74,7 @@ physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
  struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 
  rpcrdma_map_one(ia->ri_device, seg, rpcrdma_data_dir(writing));
 -seg->mr_rkey = ia->ri_bind_mem->rkey;
 +seg->mr_rkey = ia->ri_dma_mr->rkey;
  seg->mr_base = seg->mr_dma;
  seg->mr_nsegs = 1;
  return 1;
 diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
 index da184f9..8516d98 100644
 --- 

[PATCH for-4.2] iw_cxgb4: gracefully handle unknown CQE status errors

2015-07-26 Thread Hariprasad Shenai
c4iw_poll_cq_one() shouldn't fail the poll operation just because
the CQE status is unknown.  Rather, it should map this to the
fatal error status and log the anomaly.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/infiniband/hw/cxgb4/cq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index c7aab48..92d5183 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -814,7 +814,7 @@ static int c4iw_poll_cq_one(struct c4iw_cq *chp, struct ib_wc *wc)
printk(KERN_ERR MOD
   "Unexpected cqe_status 0x%x for QPID=0x%0x\n",
   CQE_STATUS(cqe), CQE_QPID(cqe));
-   ret = -EINVAL;
+   wc->status = IB_WC_FATAL_ERR;
}
}
 out:
-- 
2.3.4
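
A minimal userspace sketch of the behavior the commit message above
describes, not the driver code itself. The enums hw_cqe_status and
wc_status, struct wc and map_cqe_status() are hypothetical names chosen
for illustration; the driver uses its own status definitions and the
IB_WC_* codes. The point is only that an unrecognized status is logged
and reported as a fatal completion, while the poll itself still succeeds.

```c
#include <stdio.h>

/* Hypothetical hardware completion codes, for illustration only. */
enum hw_cqe_status {
	HW_CQE_SUCCESS = 0,
	HW_CQE_ACCESS_ERR = 1,
	HW_CQE_BOUNDS_ERR = 2,
};

/* Hypothetical consumer-visible completion status values. */
enum wc_status {
	WC_SUCCESS,
	WC_LOC_ACCESS_ERR,
	WC_LOC_LEN_ERR,
	WC_FATAL_ERR,
};

struct wc {
	enum wc_status status;
};

/*
 * Translate a hardware status into a completion status.  Always returns 0:
 * an unknown status is logged and mapped to WC_FATAL_ERR, so the caller
 * still gets a completion to process instead of a failed poll.
 */
static int map_cqe_status(unsigned int hw_status, struct wc *wc)
{
	switch (hw_status) {
	case HW_CQE_SUCCESS:
		wc->status = WC_SUCCESS;
		break;
	case HW_CQE_ACCESS_ERR:
		wc->status = WC_LOC_ACCESS_ERR;
		break;
	case HW_CQE_BOUNDS_ERR:
		wc->status = WC_LOC_LEN_ERR;
		break;
	default:
		fprintf(stderr, "Unexpected cqe_status 0x%x\n", hw_status);
		wc->status = WC_FATAL_ERR;
		break;
	}
	return 0;
}
```

Returning an error from the poll path would hide the completion from the
consumer entirely; mapping it to a fatal status lets the consumer tear
down the QP through its normal error handling.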



[PATCH for-4.2] iw_cxgb4: set the default MPA version to 2

2015-07-26 Thread Hariprasad Shenai
This enables ORD/IRD negotiation, and it's about time to enable it by
default.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/infiniband/hw/cxgb4/cm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 3ad8dc7..75144d9 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -115,11 +115,11 @@ module_param(ep_timeout_secs, int, 0644);
 MODULE_PARM_DESC(ep_timeout_secs, "CM Endpoint operation timeout "
   "in seconds (default=60)");
 
-static int mpa_rev = 1;
+static int mpa_rev = 2;
 module_param(mpa_rev, int, 0644);
 MODULE_PARM_DESC(mpa_rev, "MPA Revision, 0 supports amso1100, "
		"1 is RFC0544 spec compliant, 2 is IETF MPA Peer Connect Draft"
-		" compliant (default=1)");
+		" compliant (default=2)");
 
 static int markers_enabled;
 module_param(markers_enabled, int, 0644);
-- 
2.3.4



[PATCH v2 02/13] IB/core: Find the network device matching connection parameters

2015-07-26 Thread Haggai Eran
From: Yotam Kenneth yota...@mellanox.com

In the case of IPoIB, and maybe in other cases, the network device is
managed by an upper-layer protocol (ULP). In order to expose this
network device to other users of the IB device, let ULPs implement
a callback that returns the network device matching a set of connection
parameters.

The IB device and port, together with the P_Key and the GID should
be enough to uniquely identify the ULP net device. However, in current
kernels there can be multiple IPoIB interfaces created with the same GID.
Furthermore, such a configuration may be desirable to support ipvlan-like
configurations for RDMA CM with IPoIB.  To resolve the device in these
cases the code will also take the IP address as an additional input.

Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
Signed-off-by: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yotam Kenneth yota...@mellanox.com
Signed-off-by: Shachar Raindel rain...@mellanox.com
Signed-off-by: Guy Shapiro gu...@mellanox.com
---
 drivers/infiniband/core/device.c | 46 
 include/rdma/ib_verbs.h  | 27 +++
 2 files changed, 73 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 623d8e191ced..124597732fe7 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -38,6 +38,7 @@
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/mutex.h>
+#include <linux/netdevice.h>
 #include <rdma/rdma_netlink.h>
 
 #include "core_priv.h"
@@ -781,6 +782,51 @@ int ib_find_pkey(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_find_pkey);
 
+/**
+ * ib_get_net_dev_by_params() - Return the appropriate net_dev
+ * for a received CM request
+ * @dev:   An RDMA device on which the request has been received.
+ * @port:  Port number on the RDMA device.
+ * @pkey:  The Pkey the request came on.
+ * @gid:   A GID that the net_dev uses to communicate.
+ * @addr:  Contains the IP address that the request specified as its
+ * destination.
+ */
+struct net_device *ib_get_net_dev_by_params(struct ib_device *dev,
+   u8 port,
+   u16 pkey,
+   const union ib_gid *gid,
+   const struct sockaddr *addr)
+{
+   struct net_device *net_dev = NULL;
+   struct ib_client_data *context;
+
+   if (!rdma_protocol_ib(dev, port))
+   return NULL;
+
+   down_read(&lists_rwsem);
+
+   list_for_each_entry(context, &dev->client_data_list, list) {
+   struct ib_client *client = context->client;
+
+   if (context->going_down)
+   continue;
+
+   if (client->get_net_dev_by_params) {
+   net_dev = client->get_net_dev_by_params(dev, port, pkey,
+   gid, addr,
+   context->data);
+   if (net_dev)
+   break;
+   }
+   }
+
+   up_read(&lists_rwsem);
+
+   return net_dev;
+}
+EXPORT_SYMBOL(ib_get_net_dev_by_params);
+
 static int __init ib_core_init(void)
 {
int ret;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5b83e0c10d55..b04d2b4d1792 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -48,6 +48,7 @@
 #include <linux/rwsem.h>
 #include <linux/scatterlist.h>
 #include <linux/workqueue.h>
+#include <linux/socket.h>
 #include <uapi/linux/if_ether.h>
 
 #include <linux/atomic.h>
@@ -1765,6 +1766,28 @@ struct ib_client {
void (*add)   (struct ib_device *);
void (*remove)(struct ib_device *, void *client_data);
 
+   /* Returns the net_dev belonging to this ib_client and matching the
+* given parameters.
+* @dev: An RDMA device that the net_dev use for communication.
+* @port:A physical port number on the RDMA device.
+* @pkey:P_Key that the net_dev uses if applicable.
+* @gid: A GID that the net_dev uses to communicate.
+* @addr:An IP address the net_dev is configured with.
+* @client_data: The device's client data set by ib_set_client_data().
+*
+* An ib_client that implements a net_dev on top of RDMA devices
+* (such as IP over IB) should implement this callback, allowing the
+* rdma_cm module to find the right net_dev for a given request.
+*
+* The caller is responsible for calling dev_put on the returned
+* netdev. */
+   struct net_device *(*get_net_dev_by_params)(
+   struct ib_device *dev,
+   u8 port,
+   u16 pkey,
+   const union ib_gid *gid,
+   const struct sockaddr *addr,
+

[PATCH v2 05/13] IB/cm: Share listening CM IDs

2015-07-26 Thread Haggai Eran
Enabling network namespaces for RDMA CM will allow processes on different
namespaces to listen on the same port. In order to leave namespace support
out of the CM layer, this requires that multiple RDMA CM IDs will be able
to share a single CM ID.

This patch adds infrastructure to retrieve an existing listening ib_cm_id,
based on its device and service ID, or create a new one if one does not
already exist. It also adds a reference count for such instances
(cm_id_private.listen_sharecount), and prevents cm_destroy_id from
destroying a CM if it is still shared. See the relevant discussion [1].

[1] Re: [PATCH v3 for-next 05/13] IB/cm: Reference count ib_cm_ids
http://www.spinics.net/lists/netdev/msg328860.html
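
The sharing scheme described above can be sketched in userspace roughly
as follows. This is an illustration under simplifying assumptions, not
the patch's implementation: struct listener, listener_get() and
listener_put() are hypothetical names, listeners are keyed by service ID
only (the patch also keys on the device), and there is no locking,
whereas the patch updates listen_sharecount under the cm.lock spinlock.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a listening CM ID; not the kernel's struct. */
struct listener {
	uint64_t service_id;
	int sharecount;		/* number of users sharing this listener */
	struct listener *next;
};

/* All current listeners.  Single-threaded sketch, so no lock is taken. */
static struct listener *listeners;

/* Return the listener for service_id, creating it if none exists yet. */
static struct listener *listener_get(uint64_t service_id)
{
	struct listener *l;

	for (l = listeners; l; l = l->next) {
		if (l->service_id == service_id) {
			l->sharecount++;	/* share the existing one */
			return l;
		}
	}

	l = calloc(1, sizeof(*l));
	if (!l)
		return NULL;
	l->service_id = service_id;
	l->sharecount = 1;
	l->next = listeners;
	listeners = l;
	return l;
}

/*
 * Drop one share.  Returns 1 if this was the last user and the listener
 * was unlinked and freed, 0 if it is still shared.
 */
static int listener_put(struct listener *l)
{
	struct listener **pp;

	if (--l->sharecount > 0)
		return 0;

	for (pp = &listeners; *pp; pp = &(*pp)->next) {
		if (*pp == l) {
			*pp = l->next;
			break;
		}
	}
	free(l);
	return 1;
}
```

The early return in listener_put() corresponds to the case where the
destroy path sees that the ID is still shared and leaves it in the
listener table.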

Reviewed-by: Jason Gunthorpe jguntho...@obsidianresearch.com
Signed-off-by: Haggai Eran hagg...@mellanox.com
---
 drivers/infiniband/core/cm.c | 126 ---
 include/rdma/ib_cm.h |   4 ++
 2 files changed, 124 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 93e9e2f34fc6..bcad4cf8404e 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -213,6 +213,9 @@ struct cm_id_private {
spinlock_t lock;/* Do not acquire inside cm.lock */
struct completion comp;
atomic_t refcount;
+   /* Number of clients sharing this ib_cm_id. Only valid for listeners.
+* Protected by the cm.lock spinlock. */
+   int listen_sharecount;
 
struct ib_mad_send_buf *msg;
struct cm_timewait_info *timewait_info;
@@ -859,9 +862,15 @@ retest:
spin_lock_irq(&cm_id_priv->lock);
switch (cm_id->state) {
case IB_CM_LISTEN:
-   cm_id->state = IB_CM_IDLE;
spin_unlock_irq(&cm_id_priv->lock);
+
spin_lock_irq(&cm.lock);
+   if (--cm_id_priv->listen_sharecount > 0) {
+   /* The id is still shared. */
+   cm_deref_id(cm_id_priv);
+   spin_unlock_irq(&cm.lock);
+   return;
+   }
rb_erase(&cm_id_priv->service_node, &cm.listen_service_table);
spin_unlock_irq(&cm.lock);
break;
@@ -941,11 +950,32 @@ void ib_destroy_cm_id(struct ib_cm_id *cm_id)
 }
 EXPORT_SYMBOL(ib_destroy_cm_id);
 
-int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
-struct ib_cm_compare_data *compare_data)
+/**
+ * __ib_cm_listen - Initiates listening on the specified service ID for
+ *   connection and service ID resolution requests.
+ * @cm_id: Connection identifier associated with the listen request.
+ * @service_id: Service identifier matched against incoming connection
+ *   and service ID resolution requests.  The service ID should be specified
+ *   network-byte order.  If set to IB_CM_ASSIGN_SERVICE_ID, the CM will
+ *   assign a service ID to the caller.
+ * @service_mask: Mask applied to service ID used to listen across a
+ *   range of service IDs.  If set to 0, the service ID is matched
+ *   exactly.  This parameter is ignored if %service_id is set to
+ *   IB_CM_ASSIGN_SERVICE_ID.
+ * @compare_data: This parameter is optional.  It specifies data that must
+ *   appear in the private data of a connection request for the specified
+ *   listen request.
+ * @lock: If set, lock the cm.lock spin-lock when adding the id to the
+ *   listener tree. When false, the caller must already hold the spin-lock,
+ *   and compare_data must be NULL.
+ */
+static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
+ __be64 service_mask,
+ struct ib_cm_compare_data *compare_data,
+ bool lock)
 {
struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
-   unsigned long flags;
+   unsigned long flags = 0;
int ret = 0;
 
service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
@@ -970,8 +1000,10 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
}
 
cm_id->state = IB_CM_LISTEN;
+   if (lock)
+   spin_lock_irqsave(&cm.lock, flags);
 
-   spin_lock_irqsave(&cm.lock, flags);
+   ++cm_id_priv->listen_sharecount;
if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
cm_id->service_mask = ~cpu_to_be64(0);
@@ -980,18 +1012,100 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
cm_id->service_mask = service_mask;
}
cur_cm_id_priv = cm_insert_listen(cm_id_priv);
-   spin_unlock_irqrestore(&cm.lock, flags);
 
if (cur_cm_id_priv) {
cm_id->state = IB_CM_IDLE;
+   --cm_id_priv->listen_sharecount;
kfree(cm_id_priv->compare_data);
cm_id_priv->compare_data = NULL;
 

[PATCH v2 08/13] IB/cm: Expose BTH P_Key in CM and SIDR request events

2015-07-26 Thread Haggai Eran
The rdma_cm module will later use the P_Key from the BTH to de-mux
requests.

See discussion at:
  http://www.spinics.net/lists/netdev/msg336067.html

Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
Cc: Liran Liss lir...@mellanox.com
Signed-off-by: Haggai Eran hagg...@mellanox.com
---
 drivers/infiniband/core/cm.c | 20 
 include/rdma/ib_cm.h |  6 ++
 2 files changed, 26 insertions(+)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index bcad4cf8404e..a05c17b336aa 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1408,6 +1408,24 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
}
 }
 
+static u16 cm_get_bth_pkey(struct cm_work *work)
+{
+   struct ib_device *ib_dev = work->port->cm_dev->ib_device;
+   u8 port_num = work->port->port_num;
+   u16 pkey_index = work->mad_recv_wc->wc->pkey_index;
+   u16 pkey;
+   int ret;
+
+   ret = ib_get_cached_pkey(ib_dev, port_num, pkey_index, &pkey);
+   if (ret) {
+   dev_warn_ratelimited(&ib_dev->dev, "ib_cm: Couldn't retrieve pkey for incoming request (port %d, pkey index %d). %d\n",
+port_num, pkey_index, ret);
+   return 0;
+   }
+
+   return pkey;
+}
+
 static void cm_format_req_event(struct cm_work *work,
struct cm_id_private *cm_id_priv,
struct ib_cm_id *listen_id)
@@ -1418,6 +1436,7 @@ static void cm_format_req_event(struct cm_work *work,
req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
param = &work->cm_event.param.req_rcvd;
param->listen_id = listen_id;
+   param->bth_pkey = cm_get_bth_pkey(work);
param->port = cm_id_priv->av.port->port_num;
param->primary_path = &work->path[0];
if (req_msg->alt_local_lid)
@@ -3109,6 +3128,7 @@ static void cm_format_sidr_req_event(struct cm_work *work,
param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
param->listen_id = listen_id;
param->service_id = sidr_req_msg->service_id;
+   param->bth_pkey = cm_get_bth_pkey(work);
param->port = work->port->port_num;
work->cm_event.private_data = &sidr_req_msg->private_data;
 }
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index f7fd22f10bae..5b54cf77862e 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -113,6 +113,10 @@ struct ib_cm_id;
 
 struct ib_cm_req_event_param {
struct ib_cm_id *listen_id;
+
+   /* P_Key that was used by the GMP's BTH header */
+   u16 bth_pkey;
+
u8  port;
 
struct ib_sa_path_rec   *primary_path;
@@ -224,6 +228,8 @@ struct ib_cm_apr_event_param {
 struct ib_cm_sidr_req_event_param {
struct ib_cm_id *listen_id;
__be64  service_id;
+   /* P_Key that was used by the GMP's BTH header */
+   u16 bth_pkey;
u8  port;
u16 pkey;
 };
-- 
1.7.11.2



[PATCH v2 07/13] IB/cma: Helper functions to access port space IDRs

2015-07-26 Thread Haggai Eran
Add helper functions to access the IDRs by port-space and port number.

Pass around the port-space enum in cma.c instead of using pointers to
port-space IDRs.
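
The idea of passing an enum and resolving the table inside small helpers
can be sketched in userspace as below. This is a rough illustration, not
the patch: a fixed array per port space stands in for the kernel IDR,
and the names port_space, port_table, ps_table(), ps_alloc(), ps_find()
and ps_remove() are hypothetical. Unlike the helpers in the patch, this
sketch also checks for an unknown port space.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port spaces, mirroring the four tables named above. */
enum port_space { PS_TCP, PS_UDP, PS_IPOIB, PS_IB };

#define NUM_PORTS 65536

/* Simple array standing in for an IDR: one owner pointer per port. */
struct port_table {
	void *owner[NUM_PORTS];
};

static struct port_table tcp_ps, udp_ps, ipoib_ps, ib_ps;

/* Map a port-space enum to its table; NULL for an unknown value. */
static struct port_table *ps_table(enum port_space ps)
{
	switch (ps) {
	case PS_TCP:
		return &tcp_ps;
	case PS_UDP:
		return &udp_ps;
	case PS_IPOIB:
		return &ipoib_ps;
	case PS_IB:
		return &ib_ps;
	default:
		return NULL;
	}
}

/* Claim exactly one port; returns the port or a negative errno. */
static int ps_alloc(enum port_space ps, void *owner, uint16_t port)
{
	struct port_table *t = ps_table(ps);

	if (!t)
		return -EINVAL;
	if (t->owner[port])
		return -ENOSPC;	/* port already bound in this space */
	t->owner[port] = owner;
	return port;
}

/* Look up the owner of a port, or NULL if it is free. */
static void *ps_find(enum port_space ps, uint16_t port)
{
	struct port_table *t = ps_table(ps);

	return t ? t->owner[port] : NULL;
}

/* Release a port; harmless if it was not bound. */
static void ps_remove(enum port_space ps, uint16_t port)
{
	struct port_table *t = ps_table(ps);

	if (t)
		t->owner[port] = NULL;
}
```

Callers then carry only the enum value, so a later change can add another
lookup key inside the helpers without touching every call site.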

Signed-off-by: Haggai Eran hagg...@mellanox.com
Signed-off-by: Yotam Kenneth yota...@mellanox.com
Signed-off-by: Shachar Raindel rain...@mellanox.com
Signed-off-by: Guy Shapiro gu...@mellanox.com
---
 drivers/infiniband/core/cma.c | 81 ---
 1 file changed, 60 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cf5c48b0b7d5..f2d799209412 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -113,6 +113,22 @@ static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
 static DEFINE_IDR(ib_ps);
 
+static struct idr *cma_idr(enum rdma_port_space ps)
+{
+   switch (ps) {
+   case RDMA_PS_TCP:
+   return &tcp_ps;
+   case RDMA_PS_UDP:
+   return &udp_ps;
+   case RDMA_PS_IPOIB:
+   return &ipoib_ps;
+   case RDMA_PS_IB:
+   return &ib_ps;
+   default:
+   return NULL;
+   }
+}
+
 struct cma_device {
struct list_headlist;
struct ib_device*device;
@@ -122,11 +138,33 @@ struct cma_device {
 };
 
 struct rdma_bind_list {
-   struct idr  *ps;
+   enum rdma_port_spaceps;
struct hlist_head   owners;
unsigned short  port;
 };
 
+static int cma_ps_alloc(enum rdma_port_space ps,
+   struct rdma_bind_list *bind_list, int snum)
+{
+   struct idr *idr = cma_idr(ps);
+
+   return idr_alloc(idr, bind_list, snum, snum + 1, GFP_KERNEL);
+}
+
+static struct rdma_bind_list *cma_ps_find(enum rdma_port_space ps, int snum)
+{
+   struct idr *idr = cma_idr(ps);
+
+   return idr_find(idr, snum);
+}
+
+static void cma_ps_remove(enum rdma_port_space ps, int snum)
+{
+   struct idr *idr = cma_idr(ps);
+
+   idr_remove(idr, snum);
+}
+
 enum {
CMA_OPTION_AFONLY,
 };
@@ -1069,7 +1107,7 @@ static void cma_release_port(struct rdma_id_private *id_priv)
mutex_lock(&lock);
hlist_del(&id_priv->node);
if (hlist_empty(&bind_list->owners)) {
-   idr_remove(bind_list->ps, bind_list->port);
+   cma_ps_remove(bind_list->ps, bind_list->port);
kfree(bind_list);
}
mutex_unlock(&lock);
@@ -2365,8 +2403,8 @@ static void cma_bind_port(struct rdma_bind_list *bind_list,
hlist_add_head(&id_priv->node, &bind_list->owners);
 }
 
-static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
- unsigned short snum)
+static int cma_alloc_port(enum rdma_port_space ps,
+ struct rdma_id_private *id_priv, unsigned short snum)
 {
struct rdma_bind_list *bind_list;
int ret;
@@ -2375,7 +2413,7 @@ static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
if (!bind_list)
return -ENOMEM;
 
-   ret = idr_alloc(ps, bind_list, snum, snum + 1, GFP_KERNEL);
+   ret = cma_ps_alloc(ps, bind_list, snum);
if (ret < 0)
goto err;
 
@@ -2388,7 +2426,8 @@ err:
return ret == -ENOSPC ? -EADDRNOTAVAIL : ret;
 }
 
-static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
+static int cma_alloc_any_port(enum rdma_port_space ps,
+ struct rdma_id_private *id_priv)
 {
static unsigned int last_used_port;
int low, high, remaining;
@@ -2399,7 +2438,7 @@ static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
rover = prandom_u32() % remaining + low;
 retry:
if (last_used_port != rover &&
-   !idr_find(ps, (unsigned short) rover)) {
+   !cma_ps_find(ps, (unsigned short)rover)) {
int ret = cma_alloc_port(ps, id_priv, rover);
/*
 * Remember previously used port number in order to avoid
@@ -2454,7 +2493,8 @@ static int cma_check_port(struct rdma_bind_list *bind_list,
return 0;
 }
 
-static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
+static int cma_use_port(enum rdma_port_space ps,
+   struct rdma_id_private *id_priv)
 {
struct rdma_bind_list *bind_list;
unsigned short snum;
@@ -2464,7 +2504,7 @@ static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
return -EACCES;
 
-   bind_list = idr_find(ps, snum);
+   bind_list = cma_ps_find(ps, snum);
if (!bind_list) {
ret = cma_alloc_port(ps, id_priv, snum);
} else {
@@ -2487,25 +2527,24 @@ static int cma_bind_listen(struct rdma_id_private *id_priv)
return ret;
 }
 
-static struct idr *cma_select_inet_ps(struct rdma_id_private *id_priv)
+static enum rdma_port_space 

[PATCH v2 00/13] Demux IB CM requests in the rdma_cm module

2015-07-26 Thread Haggai Eran
Thanks everyone for the review comments. I've updated the patch set
accordingly. The changes are listed below. In addition to the changes discussed
on the list I've made sure AF_IB continues to work by retrieving parameters
from the listener ID when an AF_IB request is detected.

Changes from v1:
- Patch 1: mark ib_client_data as going down instead of removing all client
  contexts during de-registration.
- Patch 2:
  * move kdoc to the function definition
  * do not call get_net_dev_by_params() on devices/clients that are going
    down
  * pass client data directly to the callback
- Patch 3:
  * pass client data directly to callback
  * fix a lockdep warning in ipoib_match_gid_pkey_addr()
  * remove a debugging print left over
  * set a rate limit to the duplicated IP address warning
- Patch 5:
  * change atomic_dec(&id->refcount) to cm_deref_id()
  * always update listen_sharecount under the cm.lock spinlock
- Patch 6: handle AF_IB requests by getting parameters from the listener
- Patch 8: new patch to expose BTH P_Key from ib_cm to rdma_cm
- Patch 9:
  * get P_Key used for de-mux from the BTH
  * use -EAFNOSUPPORT in cma_save_ip_info to designate a possible AF_IB
    connection request
  * pass a NULL netdev for AF_IB requests
- Patch 11: handle AF_IB connections by filling connection information from
  the listener id instead of from the net_dev
- Patch 12: fix mention of the old ib_cm_id_create_and_listen function in
  the changelog entry.

Changes from v0:
- Added a patch to prevent a race between ib_unregister_device() and
  ib_get_net_dev_by_params().
- Removed the patch that exported a UD GMP packet's GID from the GRH, and
  related code.
- Patch 3:
  * Add _rcu suffix to ipoib_is_dev_match_addr().
  * Add helper function to get the master netdev for bonding support.
  * Scan for matching net devices in two phases: first without looking at
    the IP address, and then looking at the IP address only when the first
    phase did not find a unique net device.
- Patch 5:
  * Do not init listen_sharecount = 1 for non-listening ib_cm_ids.
  * Remove code that sets a CM ID's state to IB_CM_IDLE right before
    destruction.
  * Rename ib_cm_id_create_and_listen() to ib_cm_insert_listen().
  * Do not increase reference counts when failing to add a shared CM ID due
    to having a different handler callback.
- Patch 9: Clean IPv4 net_dev validation function.
- Added patch 10: new patch to use the found net_dev in IB/cma for
  eliminating unneeded calls to cma_translate_addr.
- Patch 12: Remove the lock argument to __ib_cm_listen().

The rdma_cm module relies today on the ib_cm module to demux incoming
requests based on their service ID and IP address. The ib_cm module is the
wrong place to perform this task, as it can also be used with services that
do not adhere to the RDMA IP CM service as defined in the IBA
specifications. It is forced to use an opaque private data struct and mask
to compare incoming requests against.

This series moves that demux responsibility to the rdma_cm module. The
rdma_cm module can look into the private data attached to a CM request,
which contains the IP addresses related to the request. It uses the details of
the request to find the net device associated with the request, and uses
that net device to find the correct listening rdma_cm_id.

The series applies against Doug's for-v4.2 tree with the patch adding a
rwsem to IB core [2] applied.

The series is structured as follows:
Patch 1 prevents a possible race between ib_client.remove() callbacks from
ib_unregister_device(), and ib_client callbacks that rely on the
lists_rwsem locked for read, such as ib_get_net_dev_by_params(). Both
callbacks may call ib_get_client_data(), and the patch makes sure that the
remove callback doesn't free the client data while it is being used by the
other callback.

Patches 2-3 add the ability to look up a network device according to the IB
device, port, P_Key, GID and IP address. They find the matching IPoIB
interfaces, and return a matching net_device if one exists.
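
As a rough userspace illustration of that lookup (not the code in the
series), the core asks each registered client in turn and returns the
first net device a callback reports, skipping clients that are going
down or that do not implement the callback. All names here (net_dev,
ulp_dev, client, client_ctx, ulp_get_net_dev(), find_net_dev()) are
hypothetical, the example callback matches on the P_Key only, and the
locking is omitted, whereas the series walks the client list with
lists_rwsem held for read.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct net_dev {
	const char *name;
};

/* Hypothetical per-ULP state handed back to the callback as client data. */
struct ulp_dev {
	unsigned short pkey;
	struct net_dev nd;
};

/* Hypothetical client: the callback is optional and may be NULL. */
struct client {
	struct net_dev *(*get_net_dev)(int port, unsigned short pkey,
				       void *client_data);
};

/* One registration of a client on a device. */
struct client_ctx {
	struct client *client;
	void *data;
	int going_down;		/* set while the client is being removed */
};

/* Example callback: match on the P_Key only. */
static struct net_dev *ulp_get_net_dev(int port, unsigned short pkey,
				       void *client_data)
{
	struct ulp_dev *ulp = client_data;

	(void)port;	/* a real ULP would also compare port, GID and address */
	return ulp->pkey == pkey ? &ulp->nd : NULL;
}

/* Ask each usable client in turn and return the first match, if any. */
static struct net_dev *find_net_dev(struct client_ctx *ctx, size_t n,
				    int port, unsigned short pkey)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct net_dev *nd;

		if (ctx[i].going_down || !ctx[i].client->get_net_dev)
			continue;

		nd = ctx[i].client->get_net_dev(port, pkey, ctx[i].data);
		if (nd)
			return nd;
	}
	return NULL;
}
```

Skipping entries marked going_down is what keeps the lookup from calling
into a client whose data may be freed by a concurrent removal.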

Patches 4-5 make the necessary changes in ib_cm to allow RDMA CM to get the
information it needs out of CM and SIDR requests, and to share a single
ib_cm_id with multiple RDMA CM listeners.

Patches 6-7 do some preliminary refactoring to the rdma_cm module. They
allow extracting information out of incoming requests instead of retrieving
them from a listening CM ID, and add helper functions to access the port
space IDRs.

Finally, patches 8-12 change rdma_cm to demultiplex requests on its own, and
patch 13 cleans up the now unneeded code in ib_cm to compare against the
private data.

This series contains a subset of the RDMA CM namespaces patches [1]. The
changes from v4 of the relevant patches are:
- Patch 1
  * in addition to the IB device, port, P_Key and IP address, pass
    also the GID, so that future IPoIB devices with alias GIDs can be
    told apart.
  * return the matching net_device instead of a network namespace.
- Patch 2: use 

Re: [PATCH V6 6/9] isert: Rename IO functions to more descriptive names

2015-07-26 Thread Sagi Grimberg

On 7/26/2015 6:53 PM, Christoph Hellwig wrote:

On Sun, Jul 26, 2015 at 02:00:51PM +0300, Sagi Grimberg wrote:

On the wire iser sends a single rkey, but the target is allowed to
transfer the data however it wants to.


So you're trying to get above the limit of a single RDMA READ, not
above the limit for memory registration in the initiator?


Correct.


 In that case your explanation makes sense, that's just not what I expected
to be the limiting factor.



In the initiator case, there is no way to support a transfer size that
exceeds the device's registration length capabilities (unless we start
using higher-order atomic allocations, which we won't).