RE: [PATCH v6 0/4] Add network namespace support in the RDMA-CM

2015-08-27 Thread Hefty, Sean
This series looks reasonable to me
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] IB/ipoib: Suppress warning for send only join failures

2015-08-27 Thread Jason Gunthorpe
On Wed, Aug 26, 2015 at 05:41:08AM -0400, Hal Rosenstock wrote:
> On 8/25/2015 12:28 PM, Jason Gunthorpe wrote:
> > On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
> >>> - if (mcast->logcount++ < 20) {
> >>> - if (status == -ETIMEDOUT || status == -EAGAIN) {
> >>> + bool silent_fail =
> >>> + test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
> >>> + status == -EINVAL;
> >>
> >> Aren't there other reasons that send only join might have EINVAL
> >> indicated ?
> > 
> > Not sure, the layers below all eat the detailed error code. Hopefully
> > EINVAL isn't re-used.
> 
> AFAIR there are a number of reasons EINVAL could occur here in which
> case this makes this change overly silent. If so, this particular
> failure case of send only join failure due to SM rejection (perhaps
> ERR_REQ_INVALID SA status only) is best to be made unique and different
> from the other current EINVAL failures here.

That is way too much to undertake just to silence this message.

Unless you know the other EINVALs are likely to happen, I'd just
ignore this imperfection.

Jason


Re: [PATCH 1/2] IB/ipoib: Clean up send-only multicast joins

2015-08-27 Thread Jason Gunthorpe
On Wed, Aug 26, 2015 at 12:43:15PM -0400, Doug Ledford wrote:

> That still takes us back to the fact that the locking changes are
> unneeded.  I'm not opposed to them, but as you mentioned in your first
> email, they should go with the changes that require them, and none of
> the changes in the first patch require them.  Which means that if we
> want to keep them, it might be worth splitting them out and giving them
> their own patch with an explanation of why they are a benefit (lightly
> contended code, saves a release/reacquire on the failure path).

Let's just drop them; the cost of the restructuring was an added empty lock
grab on a non-error path.

Jason


Re: [PATCH for-next V3 1/8] IB/core: Change provider's API of create_cq to be extendible

2015-08-27 Thread Christoph Lameter
OK, we tested this patchset with Matan's timestamp-v2 branches from his repo
on github and the timestamps now work fine.

Can we please get the user space library bits into libibverbs and libmlx4?



Re: [PATCH for-next V3 7/8] IB/mlx4: Add mmap call to map the hardware clock

2015-08-27 Thread Christoph Lameter
Could you please post an updated patch that reflects the current state in
Matan's tree?




[PATCH 1/3] Add support for extended query device capabilities

2015-08-27 Thread Haggai Eran
From: Eli Cohen 

Add the verb ibv_query_device_ex, which is extensible and allows subsequent
commits to define additional device properties.

Signed-off-by: Eli Cohen 
Signed-off-by: Haggai Eran 
---
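A note for readers of the archive: the devinfo.c hunk in this patch tries the
extended query first and falls back to the legacy query when it fails. Below is
a minimal, self-contained sketch of that pattern. Every toy_* name is a
hypothetical stand-in made up for illustration, not a libibverbs type or call,
and unlike the patch (which keeps a separate device_attr variable) the sketch
stores the legacy result in the embedded orig_attr field.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real libibverbs structures. */
struct toy_device_attr {
	uint8_t phys_port_cnt;
};

struct toy_device_attr_ex {
	struct toy_device_attr orig_attr; /* legacy attributes, embedded */
	uint32_t comp_mask;               /* which extended fields are valid */
};

/* A toy provider; query_ex is NULL when the extended verb is unavailable. */
struct toy_context {
	int (*query)(struct toy_context *ctx, struct toy_device_attr *attr);
	int (*query_ex)(struct toy_context *ctx, struct toy_device_attr_ex *attr);
};

static int toy_query_device_ex(struct toy_context *ctx,
			       struct toy_device_attr_ex *attr)
{
	if (!ctx->query_ex)
		return ENOSYS;
	return ctx->query_ex(ctx, attr);
}

/* Try the extended query; on failure clear comp_mask and use the legacy one. */
static int toy_query_with_fallback(struct toy_context *ctx,
				   struct toy_device_attr_ex *out)
{
	assert(ctx != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	if (toy_query_device_ex(ctx, out) == 0)
		return 0;
	out->comp_mask = 0; /* no extended fields are valid */
	return ctx->query(ctx, &out->orig_attr);
}

/* Two sample providers used to exercise both paths. */
static int toy_legacy_query(struct toy_context *ctx,
			    struct toy_device_attr *attr)
{
	(void)ctx;
	attr->phys_port_cnt = 2;
	return 0;
}

static int toy_extended_query(struct toy_context *ctx,
			      struct toy_device_attr_ex *attr)
{
	(void)ctx;
	attr->orig_attr.phys_port_cnt = 2;
	attr->comp_mask = 0x1; /* pretend one extended field is valid */
	return 0;
}
```

A caller that only reads orig_attr behaves the same on both paths; comp_mask
tells it whether any extended field may be consulted.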
 Makefile.am   |   3 +-
 examples/devinfo.c|  16 --
 include/infiniband/driver.h   |   9 
 include/infiniband/kern-abi.h |  26 +-
 include/infiniband/verbs.h|  28 ++
 man/ibv_query_device_ex.3 |  47 +
 src/cmd.c | 118 --
 src/libibverbs.map|   2 +
 8 files changed, 202 insertions(+), 47 deletions(-)
 create mode 100644 man/ibv_query_device_ex.3

diff --git a/Makefile.am b/Makefile.am
index ef4df033581d..c85e98ae0662 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -62,7 +62,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1   \
 man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3    \
 man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3  \
 man/ibv_create_qp_ex.3 man/ibv_create_srq_ex.3 man/ibv_open_xrcd.3  \
-man/ibv_get_srq_num.3 man/ibv_open_qp.3
+man/ibv_get_srq_num.3 man/ibv_open_qp.3 \
+man/ibv_query_device_ex.3
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
 debian/ibverbs-utils.install debian/libibverbs1.install \
diff --git a/examples/devinfo.c b/examples/devinfo.c
index afa8c853868f..95e8f83753ca 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -208,6 +208,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
 {
struct ibv_context *ctx;
struct ibv_device_attr device_attr;
+   struct ibv_device_attr_ex attrx;
struct ibv_port_attr port_attr;
int rc = 0;
uint8_t port;
@@ -219,11 +220,18 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
rc = 1;
goto cleanup;
}
-   if (ibv_query_device(ctx, &device_attr)) {
-   fprintf(stderr, "Failed to query device props\n");
-   rc = 2;
-   goto cleanup;
+
+   if (ibv_query_device_ex(ctx, &attrx)) {
+   attrx.comp_mask = 0;
+   if (ibv_query_device(ctx, &device_attr)) {
+   fprintf(stderr, "Failed to query device props\n");
+   rc = 2;
+   goto cleanup;
+   }
+   } else {
+   device_attr = attrx.orig_attr;
}
+
if (ib_port && ib_port > device_attr.phys_port_cnt) {
fprintf(stderr, "Invalid port requested for device\n");
/* rc = 3 is taken by failure to clean up */
diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
index 5cc092bf9bd5..b78093ae6a8e 100644
--- a/include/infiniband/driver.h
+++ b/include/infiniband/driver.h
@@ -105,6 +105,15 @@ int ibv_cmd_query_device(struct ibv_context *context,
 struct ibv_device_attr *device_attr,
 uint64_t *raw_fw_ver,
 struct ibv_query_device *cmd, size_t cmd_size);
+int ibv_cmd_query_device_ex(struct ibv_context *context,
+   struct ibv_device_attr_ex *attr,
+   uint64_t *raw_fw_ver,
+   struct ibv_query_device_ex *cmd,
+   size_t cmd_core_size,
+   size_t cmd_size,
+   struct ibv_query_device_resp_ex *resp,
+   size_t resp_core_size,
+   size_t resp_size);
 int ibv_cmd_query_port(struct ibv_context *context, uint8_t port_num,
   struct ibv_port_attr *port_attr,
   struct ibv_query_port *cmd, size_t cmd_size);
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index 91b45d837239..af2a1bebf683 100644
--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -101,12 +101,20 @@ enum {
 
 #define IB_USER_VERBS_CMD_FLAG_EXTENDED0x80ul
 
+/* use this mask for creating extended commands that
+   correspond to old commands */
+#define IB_USER_VERBS_CMD_EXTENDED_MASK \
+   (IB_USER_VERBS_CMD_FLAG_EXTENDED << \
+IB_USER_VERBS_CMD_FLAGS_SHIFT)
+
 
 enum {
IB_USER_VERBS_CMD_CREATE_FLOW = (IB_USER_VERBS_CMD_FLAG_EXTENDED <<
 IB_USER_VERBS_CMD_FLAGS_SHIFT) +
IB_USER_VERBS_CMD_THRESHOLD,
-   IB_USER_VERBS_CMD_DESTROY_FLOW
+   IB_USER_VERBS_CMD_DESTROY_FLOW,
+   IB_USER_VERBS_CMD_QUERY_DEVICE_EX = IB_USER_VERBS_CMD_EXTENDED_MASK |
+   IB_USER_VERBS_CMD_QUERY_DEVICE,
 };
 
 /*
@@ -240,6 +248,19 @@ struct ibv_query_device_resp {
__u8  reserved[4];
 };
 
+struct ibv_query_device_ex {
+   struct ex_hdr   hdr;
+

[PATCH 0/3] libibverbs: On-demand paging support

2015-08-27 Thread Haggai Eran
This series adds userspace support for on-demand paging. The first patch adds
support for the new extended query device verb. Patch 2 adds the capability and
interface bits related to on-demand paging, and patch 3 adds example code to
the rc_pingpong program to use on-demand paging.

Eli Cohen (1):
  Add support for extended query device capabilities

Haggai Eran (1):
  Add on-demand paging support

Majd Dibbiny (1):
  libibverbs/examples: Support odp in rc_pingpong

 Makefile.am   |   3 +-
 examples/devinfo.c|  67 --
 examples/rc_pingpong.c|  31 +-
 include/infiniband/driver.h   |   9 +++
 include/infiniband/kern-abi.h |  36 +++-
 include/infiniband/verbs.h|  53 -
 man/ibv_query_device_ex.3 |  70 +++
 man/ibv_reg_mr.3  |   2 +
 src/cmd.c | 129 +-
 src/libibverbs.map|   2 +
 10 files changed, 352 insertions(+), 50 deletions(-)
 create mode 100644 man/ibv_query_device_ex.3

-- 
1.7.11.2



[PATCH 3/3] libibverbs/examples: Support odp in rc_pingpong

2015-08-27 Thread Haggai Eran
From: Majd Dibbiny 

Signed-off-by: Majd Dibbiny 
Signed-off-by: Haggai Eran 
---
 examples/rc_pingpong.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index ddfe8d007e1a..904ec83a633f 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -55,6 +55,7 @@ enum {
 };
 
 static int page_size;
+static int use_odp;
 
 struct pingpong_context {
struct ibv_context  *context;
@@ -315,6 +316,7 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
int use_event)
 {
struct pingpong_context *ctx;
+   int access_flags = IBV_ACCESS_LOCAL_WRITE;
 
ctx = calloc(1, sizeof *ctx);
if (!ctx)
@@ -355,7 +357,25 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
goto clean_comp_channel;
}
 
-   ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size, IBV_ACCESS_LOCAL_WRITE);
+   if (use_odp) {
+   const uint32_t rc_caps_mask = IBV_ODP_SUPPORT_SEND |
+ IBV_ODP_SUPPORT_RECV;
+   struct ibv_device_attr_ex attrx = {};
+
+   if (ibv_query_device_ex(ctx->context, &attrx)) {
+   fprintf(stderr, "Couldn't query device for its features\n");
+   goto clean_comp_channel;
+   }
+
+   if (!(attrx.odp_caps.general_caps & IBV_ODP_SUPPORT) ||
+   (attrx.odp_caps.per_transport_caps.rc_odp_caps & rc_caps_mask) != rc_caps_mask) {
+   fprintf(stderr, "The device isn't ODP capable or does not support RC send and receive with ODP\n");
+   goto clean_comp_channel;
+   }
+   access_flags |= IBV_ACCESS_ON_DEMAND;
+   }
+   ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size, access_flags);
+
if (!ctx->mr) {
fprintf(stderr, "Couldn't register MR\n");
goto clean_pd;
@@ -540,6 +560,7 @@ static void usage(const char *argv0)
printf("  -l, --sl=  service level value\n");
printf("  -e, --events   sleep on CQ events (default poll)\n");
printf("  -g, --gid-idx= local port gid index\n");
+   printf("  -o, --odp use on demand paging\n");
 }
 
 int main(int argc, char *argv[])
@@ -582,11 +603,13 @@ int main(int argc, char *argv[])
{ .name = "sl",   .has_arg = 1, .val = 'l' },
{ .name = "events",   .has_arg = 0, .val = 'e' },
{ .name = "gid-idx",  .has_arg = 1, .val = 'g' },
+   { .name = "odp",  .has_arg = 0, .val = 'o' },
{ 0 }
};
 
-   c = getopt_long(argc, argv, "p:d:i:s:m:r:n:l:eg:",
+   c = getopt_long(argc, argv, "p:d:i:s:m:r:n:l:eg:o",
long_options, NULL);
+
if (c == -1)
break;
 
@@ -643,6 +666,10 @@ int main(int argc, char *argv[])
gidx = strtol(optarg, NULL, 0);
break;
 
+   case 'o':
+   use_odp = 1;
+   break;
+
default:
usage(argv[0]);
return 1;
-- 
1.7.11.2



[PATCH 2/3] Add on-demand paging support

2015-08-27 Thread Haggai Eran
The on-demand paging feature allows registering memory regions without pinning
their pages. Unfortunately, the feature doesn't work with all transports and
all operations. This patch adds the ability to report on-demand paging
capabilities through ibv_query_device_ex.

The patch also adds the IBV_ACCESS_ON_DEMAND access flag to allow registration
of on-demand paging enabled memory regions.

Signed-off-by: Shachar Raindel 
Signed-off-by: Majd Dibbiny 
Signed-off-by: Haggai Eran 
---
 examples/devinfo.c| 51 +++
 include/infiniband/kern-abi.h | 12 +-
 include/infiniband/verbs.h| 25 -
 man/ibv_query_device_ex.3 | 23 +++
 man/ibv_reg_mr.3  |  2 ++
 src/cmd.c | 11 ++
 6 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/examples/devinfo.c b/examples/devinfo.c
index 95e8f83753ca..61cfdf520be6 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -204,6 +205,54 @@ static const char *link_layer_str(uint8_t link_layer)
}
 }
 
+void print_odp_trans_caps(uint32_t trans)
+{
+   uint32_t unknown_transport_caps = ~(IBV_ODP_SUPPORT_SEND |
+   IBV_ODP_SUPPORT_RECV |
+   IBV_ODP_SUPPORT_WRITE |
+   IBV_ODP_SUPPORT_READ |
+   IBV_ODP_SUPPORT_ATOMIC);
+
+   if (!trans) {
+   printf("\t\t\t\t\tNO SUPPORT\n");
+   } else {
+   if (trans & IBV_ODP_SUPPORT_SEND)
+   printf("\t\t\t\t\tSUPPORT_SEND\n");
+   if (trans & IBV_ODP_SUPPORT_RECV)
+   printf("\t\t\t\t\tSUPPORT_RECV\n");
+   if (trans & IBV_ODP_SUPPORT_WRITE)
+   printf("\t\t\t\t\tSUPPORT_WRITE\n");
+   if (trans & IBV_ODP_SUPPORT_READ)
+   printf("\t\t\t\t\tSUPPORT_READ\n");
+   if (trans & IBV_ODP_SUPPORT_ATOMIC)
+   printf("\t\t\t\t\tSUPPORT_ATOMIC\n");
+   if (trans & unknown_transport_caps)
+   printf("\t\t\t\t\tUnknown flags: 0x%" PRIX32 "\n",
+  trans & unknown_transport_caps);
+   }
+}
+
+void print_odp_caps(struct ibv_odp_caps caps)
+{
+   uint64_t unknown_general_caps = ~(IBV_ODP_SUPPORT);
+
+   /* general odp caps */
+   printf("\tgeneral_odp_caps:\n");
+   if (caps.general_caps & IBV_ODP_SUPPORT)
+   printf("\t\t\t\t\tODP_SUPPORT\n");
+   if (caps.general_caps & unknown_general_caps)
+   printf("\t\t\t\t\tUnknown flags: 0x%" PRIX64 "\n",
+  caps.general_caps & unknown_general_caps);
+
+   /* RC transport */
+   printf("\trc_odp_caps:\n");
+   print_odp_trans_caps(caps.per_transport_caps.rc_odp_caps);
+   printf("\tuc_odp_caps:\n");
+   print_odp_trans_caps(caps.per_transport_caps.uc_odp_caps);
+   printf("\tud_odp_caps:\n");
+   print_odp_trans_caps(caps.per_transport_caps.ud_odp_caps);
+}
+
 static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
 {
struct ibv_context *ctx;
@@ -296,6 +345,8 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
}
printf("\tmax_pkeys:\t\t\t%d\n", device_attr.max_pkeys);
printf("\tlocal_ca_ack_delay:\t\t%d\n", device_attr.local_ca_ack_delay);
+
+   print_odp_caps(attrx.odp_caps);
}
 
for (port = 1; port <= device_attr.phys_port_cnt; ++port) {
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index af2a1bebf683..1c0d0d30c612 100644
--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -254,11 +254,21 @@ struct ibv_query_device_ex {
__u32   reserved;
 };
 
+struct ibv_odp_caps_resp {
+   __u64 general_caps;
+   struct {
+   __u32 rc_odp_caps;
+   __u32 uc_odp_caps;
+   __u32 ud_odp_caps;
+   } per_transport_caps;
+   __u32 reserved;
+};
+
 struct ibv_query_device_resp_ex {
struct ibv_query_device_resp base;
__u32 comp_mask;
__u32 response_length;
-   __u64 reserved[3];
+   struct ibv_odp_caps_resp odp_caps;
 };
 
 struct ibv_query_port {
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index ff806bf8555d..ce56315b236e 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -168,9 +168,31 @@ struct ibv_device_attr {
uint8_t phys_port_cnt;
 };
 
+enum ibv_odp_transport_cap_bits {
+   IBV_ODP_SUPPORT_SEND = 1 << 0,
+   IBV_ODP_SUPPORT_RECV = 1 << 1,
+   IBV_ODP_SUPPORT_WRITE= 1 << 2,
+   IBV_ODP_SUPPORT_READ = 1 << 3,
+   IBV_ODP_

[PATCH v6 3/4] IB/cma: Add support for network namespaces

2015-08-27 Thread Haggai Eran
From: Guy Shapiro 

Add support for network namespaces in the ib_cma module. This is
accomplished by:

1. Adding network namespace parameter for rdma_create_id. This parameter is
   used to populate the network namespace field in rdma_id_private.
   rdma_create_id keeps a reference on the network namespace.
2. Using the network namespace from the rdma_id instead of init_net inside
   of ib_cma, when listening on an ID and when looking for an ID for an
   incoming request.
3. Decrementing the reference count for the appropriate network namespace
   when calling rdma_destroy_id.

In order to preserve the current behavior init_net is passed when calling
from other modules.

Signed-off-by: Guy Shapiro 
Signed-off-by: Haggai Eran 
Signed-off-by: Yotam Kenneth 
Signed-off-by: Shachar Raindel 
---
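A note for readers of the archive: the three steps in the commit message boil
down to a reference-counting contract. Creating an ID takes a reference on the
namespace, an ID created for an incoming request inherits the listener's
namespace, and destroying an ID drops the reference. The userspace model below
is only a sketch of that contract; all toy_* names are hypothetical and the
plain int counter stands in for the kernel's real namespace refcounting.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy namespace with a plain reference counter (illustrative only). */
struct toy_net {
	int refcount;
};

static struct toy_net *toy_get_net(struct toy_net *net)
{
	assert(net != NULL);
	net->refcount++;
	return net;
}

static void toy_put_net(struct toy_net *net)
{
	assert(net != NULL && net->refcount > 0);
	net->refcount--;
}

/* Toy connection-manager ID that remembers its namespace. */
struct toy_cm_id {
	struct toy_net *net;
};

/* Step 1: creation stores the namespace and holds a reference on it. */
static struct toy_cm_id *toy_create_id(struct toy_net *net)
{
	struct toy_cm_id *id = calloc(1, sizeof(*id));

	if (!id)
		return NULL;
	id->net = toy_get_net(net);
	return id;
}

/* Step 2: an ID for an incoming request uses the listener's namespace. */
static struct toy_cm_id *toy_new_conn_id(const struct toy_cm_id *listener)
{
	assert(listener != NULL);
	return toy_create_id(listener->net);
}

/* Step 3: destruction drops the reference taken at creation. */
static void toy_destroy_id(struct toy_cm_id *id)
{
	if (!id)
		return;
	toy_put_net(id->net);
	free(id);
}
```

With this pairing, the namespace cannot go away while any ID created in it, or
inherited from a listener in it, is still alive.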
 drivers/infiniband/core/cma.c  | 46 +-
 drivers/infiniband/core/ucma.c |  3 +-
 drivers/infiniband/ulp/iser/iser_verbs.c   |  2 +-
 drivers/infiniband/ulp/isert/ib_isert.c|  2 +-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h|  4 +-
 include/rdma/rdma_cm.h |  6 ++-
 net/9p/trans_rdma.c|  4 +-
 net/rds/ib.c   |  2 +-
 net/rds/ib_cm.c|  2 +-
 net/rds/iw.c   |  2 +-
 net/rds/iw_cm.c|  2 +-
 net/rds/rdma_transport.c   |  4 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |  4 +-
 net/sunrpc/xprtrdma/verbs.c|  3 +-
 14 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f40ca053fa3e..debf25ccf930 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -591,7 +591,8 @@ static int cma_disable_callback(struct rdma_id_private *id_priv,
return 0;
 }
 
-struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
+struct rdma_cm_id *rdma_create_id(struct net *net,
+ rdma_cm_event_handler event_handler,
  void *context, enum rdma_port_space ps,
  enum ib_qp_type qp_type)
 {
@@ -615,7 +616,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
INIT_LIST_HEAD(&id_priv->listen_list);
INIT_LIST_HEAD(&id_priv->mc_list);
get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
-   id_priv->id.route.addr.dev_addr.net = &init_net;
+   id_priv->id.route.addr.dev_addr.net = get_net(net);
 
return &id_priv->id;
 }
@@ -1257,7 +1258,7 @@ static bool cma_match_net_dev(const struct rdma_id_private *id_priv,
return addr->src_addr.ss_family == AF_IB;
 
return !addr->dev_addr.bound_dev_if ||
-  (net_eq(dev_net(net_dev), &init_net) &&
+  (net_eq(dev_net(net_dev), addr->dev_addr.net) &&
addr->dev_addr.bound_dev_if == net_dev->ifindex);
 }
 
@@ -1314,7 +1315,7 @@ static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id,
}
}
 
-   bind_list = cma_ps_find(&init_net,
+   bind_list = cma_ps_find(dev_net(*net_dev),
rdma_ps_from_service_id(req.service_id),
cma_port_from_service_id(req.service_id));
id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, *net_dev);
@@ -1386,6 +1387,7 @@ static void cma_cancel_operation(struct rdma_id_private *id_priv,
 static void cma_release_port(struct rdma_id_private *id_priv)
 {
struct rdma_bind_list *bind_list = id_priv->bind_list;
+   struct net *net = id_priv->id.route.addr.dev_addr.net;
 
if (!bind_list)
return;
@@ -1393,7 +1395,7 @@ static void cma_release_port(struct rdma_id_private *id_priv)
mutex_lock(&lock);
hlist_del(&id_priv->node);
if (hlist_empty(&bind_list->owners)) {
-   cma_ps_remove(&init_net, bind_list->ps, bind_list->port);
+   cma_ps_remove(net, bind_list->ps, bind_list->port);
kfree(bind_list);
}
mutex_unlock(&lock);
@@ -1452,6 +1454,7 @@ void rdma_destroy_id(struct rdma_cm_id *id)
cma_deref_id(id_priv->id.context);
 
kfree(id_priv->id.route.path_rec);
+   put_net(id_priv->id.route.addr.dev_addr.net);
kfree(id_priv);
 }
 EXPORT_SYMBOL(rdma_destroy_id);
@@ -1582,7 +1585,8 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
  ib_event->param.req_rcvd.primary_path->service_id;
int ret;
 
-   id = rdma_create_id(listen_id->event_handler, listen_id->context,
+   id = rdma_create_id(listen_id->route.addr.dev_addr.net,
+   listen_id->event_handler, listen_id->con

[PATCH v6 0/4] Add network namespace support in the RDMA-CM

2015-08-27 Thread Haggai Eran
Hi,

Now that the code for demuxing requests is inside rdma_cm, here are the patches
to add InfiniBand network namespace support again.

Changes from v5:
- removed patches that got in as part of the cleanup series.

RDMA-CM uses IP based addressing and routing to set up RDMA connections between
hosts. Currently, all of the IP interfaces and addresses used by the RDMA-CM
must reside in the init_net namespace. This restricts containers that use RDMA
to the host network namespace (that is, the kernel's init_net namespace
instance).

This patchset allows using network namespaces with the RDMA-CM.

Each RDMA-CM id keeps a reference to a network namespace.

This reference is based on the process network namespace at the time of the
creation of the object or inherited from the listener.

This network namespace is used to perform all IP and network related
operations. Specifically, the local device lookup, as well as the remote GID
address resolution are done in the context of the RDMA-CM object's namespace.
This allows outgoing connections to reach the right target, even if the same
IP address exists in multiple network namespaces. This can happen if each
network namespace resides on a different P_Key.

Additionally, the network namespace is used to split the listener port space
tables. From the user's point of view, each network namespace has unique,
completely independent tables for its port spaces. This allows running multiple
instances of a single service on the same machine, using containers. 

The functionality introduced by this series would come into play when the
transport is InfiniBand and IPoIB interfaces are assigned to each namespace.
Multiple IPoIB interfaces can be created and assigned to different RDMA-CM
capable containers, for example using pipework [1].

The patches apply against Doug's to-be-rebased tree for v4.3.

The patchset is structured as follows:

Patch 1 is a relatively trivial API extension, requiring the callers
of certain ib_addr functions to provide a network namespace, as needed.

Patches 2-4 add proper namespace support to the RDMA-CM module. This
includes adding multiple port space tables, adding a network namespace
parameter, and finally retrieving the namespace from the creating process.

[1] https://github.com/jpetazzo/pipework/pull/108

Guy Shapiro (3):
  IB/addr: Pass network namespace as a parameter
  IB/cma: Add support for network namespaces
  IB/ucma: Take the network namespace from the process

Haggai Eran (1):
  IB/cma: Separate port allocation to network namespaces

 drivers/infiniband/core/addr.c |  17 +--
 drivers/infiniband/core/cma.c  | 129 +++--
 drivers/infiniband/core/ucma.c |   4 +-
 drivers/infiniband/ulp/iser/iser_verbs.c   |   2 +-
 drivers/infiniband/ulp/isert/ib_isert.c|   2 +-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h|   4 +-
 include/rdma/ib_addr.h |  16 ++-
 include/rdma/rdma_cm.h |   6 +-
 net/9p/trans_rdma.c|   4 +-
 net/rds/ib.c   |   2 +-
 net/rds/ib_cm.c|   2 +-
 net/rds/iw.c   |   2 +-
 net/rds/iw_cm.c|   2 +-
 net/rds/rdma_transport.c   |   4 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |   4 +-
 net/sunrpc/xprtrdma/verbs.c|   3 +-
 16 files changed, 142 insertions(+), 61 deletions(-)

-- 
1.7.11.2



[PATCH v6 4/4] IB/ucma: Take the network namespace from the process

2015-08-27 Thread Haggai Eran
From: Guy Shapiro 

Add support for network namespaces from user space. This is done by passing
the network namespace of the process instead of init_net.

Signed-off-by: Haggai Eran 
Signed-off-by: Yotam Kenneth 
Signed-off-by: Shachar Raindel 
Signed-off-by: Guy Shapiro 
---
 drivers/infiniband/core/ucma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 82a17a7b7e6d..00402e6d505a 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -472,8 +473,8 @@ static ssize_t ucma_create_id(struct ucma_file *file, const char __user *inbuf,
return -ENOMEM;
 
ctx->uid = cmd.uid;
-   ctx->cm_id = rdma_create_id(&init_net, ucma_event_handler, ctx, cmd.ps,
-   qp_type);
+   ctx->cm_id = rdma_create_id(current->nsproxy->net_ns,
+   ucma_event_handler, ctx, cmd.ps, qp_type);
if (IS_ERR(ctx->cm_id)) {
ret = PTR_ERR(ctx->cm_id);
goto err1;
-- 
1.7.11.2



[PATCH v6 1/4] IB/addr: Pass network namespace as a parameter

2015-08-27 Thread Haggai Eran
From: Guy Shapiro 

Add network namespace support to the ib_addr module. For that, all the
address resolution and matching should be done using the appropriate
namespace instead of init_net.

This is achieved by:

1. Adding an explicit network namespace argument to exported functions that
   require a namespace.
2. Saving the namespace in the rdma_addr_client structure.
3. Using it when calling networking functions.

In order to preserve the behavior of calling modules, &init_net is
passed as the parameter in calls from other modules. This is modified as
namespace support is added on more levels.

Signed-off-by: Haggai Eran 
Signed-off-by: Yotam Kenneth 
Signed-off-by: Shachar Raindel 
Signed-off-by: Guy Shapiro 
---
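A note for readers of the archive: the core idea of this patch is that lookups
stop consulting a single global namespace and instead use the namespace carried
in the address structure. The toy model below sketches why that matters: the
same interface index can name different devices in different namespaces. All
toy_* names are hypothetical and only loosely mirror the kernel structures
touched by the diff.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEVS 4

/* Toy network device and namespace (illustrative only). */
struct toy_netdev {
	int ifindex;
	const char *name;
};

struct toy_net {
	struct toy_netdev devs[TOY_MAX_DEVS];
	size_t ndevs;
};

/* Toy address structure: it carries the namespace used for all lookups. */
struct toy_dev_addr {
	struct toy_net *net;
	int bound_dev_if;
};

/* Look up a device by index within ONE namespace, passed explicitly. */
static const struct toy_netdev *toy_dev_get_by_index(const struct toy_net *net,
						     int ifindex)
{
	assert(net != NULL);
	for (size_t i = 0; i < net->ndevs; i++)
		if (net->devs[i].ifindex == ifindex)
			return &net->devs[i];
	return NULL;
}

/* Resolve the bound device using the namespace stored in the address. */
static const struct toy_netdev *toy_translate(const struct toy_dev_addr *addr)
{
	assert(addr != NULL && addr->net != NULL);
	return toy_dev_get_by_index(addr->net, addr->bound_dev_if);
}
```

Callers that have no namespace of their own would keep passing the initial
namespace, which is how the patch preserves existing behavior.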
 drivers/infiniband/core/addr.c | 17 +
 drivers/infiniband/core/cma.c  |  1 +
 include/rdma/ib_addr.h | 16 +++-
 3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 746cdf56bc76..6ed9685efebd 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -128,7 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
int ret = -EADDRNOTAVAIL;
 
if (dev_addr->bound_dev_if) {
-   dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
+   dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
if (!dev)
return -ENODEV;
ret = rdma_copy_addr(dev_addr, dev, NULL);
@@ -138,7 +138,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 
switch (addr->sa_family) {
case AF_INET:
-   dev = ip_dev_find(&init_net,
+   dev = ip_dev_find(dev_addr->net,
((struct sockaddr_in *) addr)->sin_addr.s_addr);
 
if (!dev)
@@ -149,12 +149,11 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
*vlan_id = rdma_vlan_dev_vlan_id(dev);
dev_put(dev);
break;
-
 #if IS_ENABLED(CONFIG_IPV6)
case AF_INET6:
rcu_read_lock();
-   for_each_netdev_rcu(&init_net, dev) {
-   if (ipv6_chk_addr(&init_net,
+   for_each_netdev_rcu(dev_addr->net, dev) {
+   if (ipv6_chk_addr(dev_addr->net,
  &((struct sockaddr_in6 *) addr)->sin6_addr,
  dev, 1)) {
ret = rdma_copy_addr(dev_addr, dev, NULL);
@@ -236,7 +235,7 @@ static int addr4_resolve(struct sockaddr_in *src_in,
fl4.daddr = dst_ip;
fl4.saddr = src_ip;
fl4.flowi4_oif = addr->bound_dev_if;
-   rt = ip_route_output_key(&init_net, &fl4);
+   rt = ip_route_output_key(addr->net, &fl4);
if (IS_ERR(rt)) {
ret = PTR_ERR(rt);
goto out;
@@ -278,12 +277,12 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
fl6.saddr = src_in->sin6_addr;
fl6.flowi6_oif = addr->bound_dev_if;
 
-   dst = ip6_route_output(&init_net, NULL, &fl6);
+   dst = ip6_route_output(addr->net, NULL, &fl6);
if ((ret = dst->error))
goto put;
 
if (ipv6_addr_any(&fl6.saddr)) {
-   ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
+   ret = ipv6_dev_get_saddr(addr->net, ip6_dst_idev(dst)->dev,
 &fl6.daddr, 0, &fl6.saddr);
if (ret)
goto put;
@@ -476,6 +475,7 @@ int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgi
rdma_gid2ip(&dgid_addr._sockaddr, dgid);
 
memset(&dev_addr, 0, sizeof(dev_addr));
+   dev_addr.net = &init_net;
 
ctx.addr = &dev_addr;
init_completion(&ctx.comp);
@@ -510,6 +510,7 @@ int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id)
rdma_gid2ip(&gid_addr._sockaddr, sgid);
 
memset(&dev_addr, 0, sizeof(dev_addr));
+   dev_addr.net = &init_net;
ret = rdma_translate_ip(&gid_addr._sockaddr, &dev_addr, vlan_id);
if (ret)
return ret;
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index b1ab13f3e182..0530c6188e75 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -601,6 +601,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
INIT_LIST_HEAD(&id_priv->listen_list);
INIT_LIST_HEAD(&id_priv->mc_list);
get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
+   id_priv->id.route.addr.dev_addr.net = &init_net;
 
return &id_priv->id;
 }
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index fde33ac6b58a..3d9afc3bc601 100644
--- a/include/rdma/ib_addr.h
+++ b/includ

[PATCH v6 2/4] IB/cma: Separate port allocation to network namespaces

2015-08-27 Thread Haggai Eran
Keep a struct for each network namespace containing the IDRs for the RDMA
CM port spaces. The struct is created dynamically using the generic_net
mechanism.

This patch is internal infrastructure work for the following patches. In
this patch, init_net is statically used as the network namespace for
the new port-space API.

Signed-off-by: Haggai Eran 
Signed-off-by: Yotam Kenneth 
Signed-off-by: Shachar Raindel 
Signed-off-by: Guy Shapiro 
---
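A note for readers of the archive: replacing the four global port-space tables
with one set per namespace means a port number is only unique within a
(namespace, port space) pair. The sketch below models that with fixed-size
arrays instead of IDRs. Every toy_* name, the tiny port range, and the choice
of -ENOSPC for an occupied slot are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_port_space {
	TOY_PS_TCP,
	TOY_PS_UDP,
	TOY_PS_IPOIB,
	TOY_PS_IB,
	TOY_PS_COUNT
};

#define TOY_PORTS 16 /* tiny port range, just for the sketch */

/* Per-namespace state: one table per port space. */
struct toy_pernet {
	void *table[TOY_PS_COUNT][TOY_PORTS];
};

struct toy_net {
	struct toy_pernet cma;
};

/* Find the slot for (namespace, port space, port); NULL if out of range. */
static void **toy_slot(struct toy_net *net, enum toy_port_space ps, int port)
{
	assert(net != NULL);
	if ((int)ps < 0 || (int)ps >= TOY_PS_COUNT ||
	    port < 0 || port >= TOY_PORTS)
		return NULL;
	return &net->cma.table[ps][port];
}

/* Claim a port for an owner; fails if the slot is already occupied. */
static int toy_ps_alloc(struct toy_net *net, enum toy_port_space ps,
			void *owner, int port)
{
	void **slot = toy_slot(net, ps, port);

	if (!slot)
		return -EINVAL;
	if (*slot)
		return -ENOSPC;
	*slot = owner;
	return port;
}

static void *toy_ps_find(struct toy_net *net, enum toy_port_space ps, int port)
{
	void **slot = toy_slot(net, ps, port);

	return slot ? *slot : NULL;
}

static void toy_ps_remove(struct toy_net *net, enum toy_port_space ps, int port)
{
	void **slot = toy_slot(net, ps, port);

	if (slot)
		*slot = NULL;
}
```

Two namespaces can therefore bind the same port in the same port space without
conflict, while a second bind inside one namespace still fails.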
 drivers/infiniband/core/cma.c | 94 ---
 1 file changed, 70 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0530c6188e75..f40ca053fa3e 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -44,6 +44,8 @@
 #include 
 #include 
 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -110,22 +112,33 @@ static LIST_HEAD(dev_list);
 static LIST_HEAD(listen_any_list);
 static DEFINE_MUTEX(lock);
 static struct workqueue_struct *cma_wq;
-static DEFINE_IDR(tcp_ps);
-static DEFINE_IDR(udp_ps);
-static DEFINE_IDR(ipoib_ps);
-static DEFINE_IDR(ib_ps);
+static int cma_pernet_id;
 
-static struct idr *cma_idr(enum rdma_port_space ps)
+struct cma_pernet {
+   struct idr tcp_ps;
+   struct idr udp_ps;
+   struct idr ipoib_ps;
+   struct idr ib_ps;
+};
+
+static struct cma_pernet *cma_pernet(struct net *net)
+{
+   return net_generic(net, cma_pernet_id);
+}
+
+static struct idr *cma_pernet_idr(struct net *net, enum rdma_port_space ps)
 {
+   struct cma_pernet *pernet = cma_pernet(net);
+
switch (ps) {
case RDMA_PS_TCP:
-   return &tcp_ps;
+   return &pernet->tcp_ps;
case RDMA_PS_UDP:
-   return &udp_ps;
+   return &pernet->udp_ps;
case RDMA_PS_IPOIB:
-   return &ipoib_ps;
+   return &pernet->ipoib_ps;
case RDMA_PS_IB:
-   return &ib_ps;
+   return &pernet->ib_ps;
default:
return NULL;
}
@@ -145,24 +158,25 @@ struct rdma_bind_list {
unsigned short  port;
 };
 
-static int cma_ps_alloc(enum rdma_port_space ps,
+static int cma_ps_alloc(struct net *net, enum rdma_port_space ps,
struct rdma_bind_list *bind_list, int snum)
 {
-   struct idr *idr = cma_idr(ps);
+   struct idr *idr = cma_pernet_idr(net, ps);
 
return idr_alloc(idr, bind_list, snum, snum + 1, GFP_KERNEL);
 }
 
-static struct rdma_bind_list *cma_ps_find(enum rdma_port_space ps, int snum)
+static struct rdma_bind_list *cma_ps_find(struct net *net,
+ enum rdma_port_space ps, int snum)
 {
-   struct idr *idr = cma_idr(ps);
+   struct idr *idr = cma_pernet_idr(net, ps);
 
return idr_find(idr, snum);
 }
 
-static void cma_ps_remove(enum rdma_port_space ps, int snum)
+static void cma_ps_remove(struct net *net, enum rdma_port_space ps, int snum)
 {
-   struct idr *idr = cma_idr(ps);
+   struct idr *idr = cma_pernet_idr(net, ps);
 
idr_remove(idr, snum);
 }
@@ -1300,7 +1314,8 @@ static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id,
}
}
 
-   bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id),
+   bind_list = cma_ps_find(&init_net,
+   rdma_ps_from_service_id(req.service_id),
cma_port_from_service_id(req.service_id));
id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, *net_dev);
if (IS_ERR(id_priv)) {
@@ -1378,7 +1393,7 @@ static void cma_release_port(struct rdma_id_private *id_priv)
mutex_lock(&lock);
hlist_del(&id_priv->node);
if (hlist_empty(&bind_list->owners)) {
-   cma_ps_remove(bind_list->ps, bind_list->port);
+   cma_ps_remove(&init_net, bind_list->ps, bind_list->port);
kfree(bind_list);
}
mutex_unlock(&lock);
@@ -2663,7 +2678,7 @@ static int cma_alloc_port(enum rdma_port_space ps,
if (!bind_list)
return -ENOMEM;
 
-   ret = cma_ps_alloc(ps, bind_list, snum);
+   ret = cma_ps_alloc(&init_net, ps, bind_list, snum);
if (ret < 0)
goto err;
 
@@ -2688,7 +2703,7 @@ static int cma_alloc_any_port(enum rdma_port_space ps,
rover = prandom_u32() % remaining + low;
 retry:
if (last_used_port != rover &&
-   !cma_ps_find(ps, (unsigned short)rover)) {
+   !cma_ps_find(&init_net, ps, (unsigned short)rover)) {
int ret = cma_alloc_port(ps, id_priv, rover);
/*
 * Remember previously used port number in order to avoid
@@ -2754,7 +2769,7 @@ static int cma_use_port(enum rdma_port_space ps,
if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
return -EACCES;
 
-   bind_list = cma_ps_find(ps, snum);
+   b

RE: [PATCH] infiniband:cxgb4:Fix if statement check in the function pick_local_ip6adddrs

2015-08-27 Thread Steve Wise
Acked-by: Steve Wise 



RE: [PATCH] infiniband:cxgb4:Fix incorrect return statement in the function c4iw_reject_cr

2015-08-27 Thread Steve Wise

> -----Original Message-----
> From: Nicholas Krause [mailto:xerofo...@gmail.com]
> Sent: Wednesday, August 26, 2015 7:22 PM
> To: sw...@chelsio.com
> Cc: dledf...@redhat.com; sean.he...@intel.com; hal.rosenst...@gmail.com; linux-rdma@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: [PATCH] infiniband:cxgb4:Fix incorrect return statement in the function c4iw_reject_cr
> 
> This fixes the incorrect return statement in the function
> c4iw_reject_cr that returns the value zero directly to instead
> return the variable err as this function can fail when called
> and if so we will incorrectly return success rather than the
> correct status of a failed call to the caller of this particular
> function.
> 
> Signed-off-by: Nicholas Krause 
> ---

NAK.  

The return code for these cpl handlers indicates whether process_work() or
other callers need to free the skb.  They are supposed to return 0.
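
To illustrate the convention described above, here is a minimal userspace
sketch. It is not the cxgb4 code: struct fake_skb, fake_kfree_skb(),
dispatch() and both handlers are hypothetical names, and it assumes that a
return of 0 means "the caller frees the buffer" while non-zero means "the
handler kept it".

```c
#include <assert.h>

/* Hypothetical stand-in for struct sk_buff; it only records a free. */
struct fake_skb {
	int freed;
};

/* Assumed handler signature: 0 = caller frees, non-zero = handler kept it. */
typedef int (*cpl_handler_fn)(struct fake_skb *skb);

static void fake_kfree_skb(struct fake_skb *skb)
{
	assert(!skb->freed);	/* catch double frees in this sketch */
	skb->freed = 1;
}

/* Handler that is finished with the buffer, so it returns 0. */
static int handler_done(struct fake_skb *skb)
{
	(void)skb;
	return 0;
}

/* Handler that holds on to the buffer, so it returns non-zero. */
static struct fake_skb *kept;

static int handler_keeps(struct fake_skb *skb)
{
	kept = skb;
	return 1;
}

/* Dispatcher loosely modeled on the description of process_work(). */
static void dispatch(cpl_handler_fn fn, struct fake_skb *skb)
{
	if (fn(skb) == 0)
		fake_kfree_skb(skb);
}
```

Under that assumption, a handler that returned an error code after it was
done with the buffer would be read as "handler kept the skb", and nobody
would free it.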





[PATCH] IB/cma: Fix net_dev reference leak with failed requests

2015-08-27 Thread Haggai Eran
When no matching listening ID is found for a given request, the reference
to the net_dev that was used to find the request isn't released.

Fixes: 20c36836ecad ("IB/cma: Use found net_dev for passive connections")
Signed-off-by: Haggai Eran 
---
 drivers/infiniband/core/cma.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9b306d7b5c27..b1ab13f3e182 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1302,6 +1302,10 @@ static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id,
bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id),
cma_port_from_service_id(req.service_id));
id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, *net_dev);
+   if (IS_ERR(id_priv)) {
+   dev_put(*net_dev);
+   *net_dev = NULL;
+   }
 
return id_priv;
 }
-- 
1.7.11.2
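
The pattern the patch above applies can be sketched in plain userspace C.
This is only an illustration under stated assumptions: struct fake_net_dev,
fake_dev_hold(), fake_dev_put() and find_listener() are hypothetical names,
the reference count is a plain int, and failure is signalled with NULL
rather than an ERR_PTR value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device with a simple refcount. */
struct fake_net_dev {
	int refcnt;
};

static void fake_dev_hold(struct fake_net_dev *dev)
{
	dev->refcnt++;
}

static void fake_dev_put(struct fake_net_dev *dev)
{
	assert(dev->refcnt > 0);	/* catch unbalanced puts */
	dev->refcnt--;
}

/* Hypothetical listener entry, keyed by port only. */
struct fake_listener {
	int port;
};

/*
 * Takes a reference on dev and hands it to the caller through *net_dev.
 * If no listener matches, the reference is dropped here and *net_dev is
 * cleared, so the caller has nothing left to release.
 */
static struct fake_listener *find_listener(struct fake_listener *table,
					   size_t n, int port,
					   struct fake_net_dev *dev,
					   struct fake_net_dev **net_dev)
{
	size_t i;

	fake_dev_hold(dev);
	*net_dev = dev;

	for (i = 0; i < n; i++)
		if (table[i].port == port)
			return &table[i];

	fake_dev_put(*net_dev);
	*net_dev = NULL;
	return NULL;
}
```

On success the caller owns the reference returned in *net_dev and must drop
it later. On failure the count is already back to its previous value.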
