RE: [PATCH v6 0/4] Add network namespace support in the RDMA-CM
This series looks reasonable to me
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] IB/ipoib: Suppress warning for send only join failures
On Wed, Aug 26, 2015 at 05:41:08AM -0400, Hal Rosenstock wrote:
> On 8/25/2015 12:28 PM, Jason Gunthorpe wrote:
> > On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
> >>> -	if (mcast->logcount++ < 20) {
> >>> -		if (status == -ETIMEDOUT || status == -EAGAIN) {
> >>> +	bool silent_fail =
> >>> +		test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
> >>> +		status == -EINVAL;
> >>
> >> Aren't there other reasons that send only join might have EINVAL
> >> indicated ?
> >
> > Not sure, the layers below all eat the detailed error code. Hopefully
> > EINVAL isn't re-used.
>
> AFAIR there are a number of reasons EINVAL could occur here in which
> case this makes this change overly silent. If so, this particular
> failure case of send only join failure due to SM rejection (perhaps
> ERR_REQ_INVALID SA status only) is best to be made unique and different
> from the other current EINVAL failures here.

That is way too much to undertake just to silence this message. Unless
you know the other EINVALs are likely to happen, I'd just ignore this
imperfection.

Jason
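For readers following the quoted hunk, the condition being debated can be isolated as a small predicate. This is only a sketch: join_failure_is_silent is a hypothetical name, not a function in the IPoIB driver, and the real patch tests a flag bit on the mcast structure rather than taking a bool. It shows where the concern comes from: every -EINVAL on a send-only join is suppressed, whatever produced it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical predicate mirroring the quoted hunk: a failed join is kept
 * quiet only when it is a send-only join AND the status is -EINVAL.
 * The cause of the -EINVAL is not visible at this level.
 */
static bool join_failure_is_silent(bool sendonly, int status)
{
	/* Kernel-style status: zero or a negative errno value. */
	assert(status <= 0);

	return sendonly && status == -EINVAL;
}
```

Telling an SM rejection apart from other -EINVAL sources would need a distinct status from the lower layers, which is the extra work Jason argues is not worth it.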
Re: [PATCH 1/2] IB/ipoib: Clean up send-only multicast joins
On Wed, Aug 26, 2015 at 12:43:15PM -0400, Doug Ledford wrote:

> That still takes us back to the fact that the locking changes are
> unneeded. I'm not opposed to them, but as you mentioned in your first
> email, they should go with the changes that require them, and none of
> the changes in the first patch require them. Which means that if we
> want to keep them, it might be worth splitting them out and giving them
> their own patch with an explanation of why they are a benefit (lightly
> contended code, saves a release/reacquire on the failure path).

Let's just drop them, the cost for restructuring was an added empty lock
grab on a non-error path.

Jason
Re: [PATCH for-next V3 1/8] IB/core: Change provider's API of create_cq to be extendible
OK, we tested this patchset with Matan's timestamp-v2 branches from his
repo on GitHub, and the timestamps now work fine. Can we please get the
user space library bits into libibverbs and libmlx4?
Re: [PATCH for-next V3 7/8] IB/mlx4: Add mmap call to map the hardware clock
Could you please post an updated patch that reflects the current state in
Matan's tree?
[PATCH 1/3] Add support for extended query device capabilities
From: Eli Cohen Add the verb ibv_query_device_ex which is extensible and allows following commits to add new features to define additional properties. Signed-off-by: Eli Cohen Signed-off-by: Haggai Eran --- Makefile.am | 3 +- examples/devinfo.c| 16 -- include/infiniband/driver.h | 9 include/infiniband/kern-abi.h | 26 +- include/infiniband/verbs.h| 28 ++ man/ibv_query_device_ex.3 | 47 + src/cmd.c | 118 -- src/libibverbs.map| 2 + 8 files changed, 202 insertions(+), 47 deletions(-) create mode 100644 man/ibv_query_device_ex.3 diff --git a/Makefile.am b/Makefile.am index ef4df033581d..c85e98ae0662 100644 --- a/Makefile.am +++ b/Makefile.am @@ -62,7 +62,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1 \ man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3 \ man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3 \ man/ibv_create_qp_ex.3 man/ibv_create_srq_ex.3 man/ibv_open_xrcd.3 \ -man/ibv_get_srq_num.3 man/ibv_open_qp.3 +man/ibv_get_srq_num.3 man/ibv_open_qp.3 \ +man/ibv_query_device_ex.3 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \ debian/ibverbs-utils.install debian/libibverbs1.install \ diff --git a/examples/devinfo.c b/examples/devinfo.c index afa8c853868f..95e8f83753ca 100644 --- a/examples/devinfo.c +++ b/examples/devinfo.c @@ -208,6 +208,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port) { struct ibv_context *ctx; struct ibv_device_attr device_attr; + struct ibv_device_attr_ex attrx; struct ibv_port_attr port_attr; int rc = 0; uint8_t port; @@ -219,11 +220,18 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port) rc = 1; goto cleanup; } - if (ibv_query_device(ctx, &device_attr)) { - fprintf(stderr, "Failed to query device props\n"); - rc = 2; - goto cleanup; + + if (ibv_query_device_ex(ctx, &attrx)) { + attrx.comp_mask = 0; + if (ibv_query_device(ctx, &device_attr)) { + fprintf(stderr, "Failed to query device props\n"); + rc = 2; + goto cleanup; + } + 
} else { + device_attr = attrx.orig_attr; } + if (ib_port && ib_port > device_attr.phys_port_cnt) { fprintf(stderr, "Invalid port requested for device\n"); /* rc = 3 is taken by failure to clean up */ diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h index 5cc092bf9bd5..b78093ae6a8e 100644 --- a/include/infiniband/driver.h +++ b/include/infiniband/driver.h @@ -105,6 +105,15 @@ int ibv_cmd_query_device(struct ibv_context *context, struct ibv_device_attr *device_attr, uint64_t *raw_fw_ver, struct ibv_query_device *cmd, size_t cmd_size); +int ibv_cmd_query_device_ex(struct ibv_context *context, + struct ibv_device_attr_ex *attr, + uint64_t *raw_fw_ver, + struct ibv_query_device_ex *cmd, + size_t cmd_core_size, + size_t cmd_size, + struct ibv_query_device_resp_ex *resp, + size_t resp_core_size, + size_t resp_size); int ibv_cmd_query_port(struct ibv_context *context, uint8_t port_num, struct ibv_port_attr *port_attr, struct ibv_query_port *cmd, size_t cmd_size); diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h index 91b45d837239..af2a1bebf683 100644 --- a/include/infiniband/kern-abi.h +++ b/include/infiniband/kern-abi.h @@ -101,12 +101,20 @@ enum { #define IB_USER_VERBS_CMD_FLAG_EXTENDED0x80ul +/* use this mask for creating extended commands that + correspond to old commands */ +#define IB_USER_VERBS_CMD_EXTENDED_MASK \ + (IB_USER_VERBS_CMD_FLAG_EXTENDED << \ +IB_USER_VERBS_CMD_FLAGS_SHIFT) + enum { IB_USER_VERBS_CMD_CREATE_FLOW = (IB_USER_VERBS_CMD_FLAG_EXTENDED << IB_USER_VERBS_CMD_FLAGS_SHIFT) + IB_USER_VERBS_CMD_THRESHOLD, - IB_USER_VERBS_CMD_DESTROY_FLOW + IB_USER_VERBS_CMD_DESTROY_FLOW, + IB_USER_VERBS_CMD_QUERY_DEVICE_EX = IB_USER_VERBS_CMD_EXTENDED_MASK | + IB_USER_VERBS_CMD_QUERY_DEVICE, }; /* @@ -240,6 +248,19 @@ struct ibv_query_device_resp { __u8 reserved[4]; }; +struct ibv_query_device_ex { + struct ex_hdr hdr; +
[PATCH 0/3] libibverbs: On-demand paging support
This series adds userspace support for on-demand paging. The first patch
adds support for the new extended query device verb. Patch 2 adds the
capability and interface bits related to on-demand paging, and patch 3 adds
example code to the rc_pingpong program to use on-demand paging.

Eli Cohen (1):
  Add support for extended query device capabilities

Haggai Eran (1):
  Add on-demand paging support

Majd Dibbiny (1):
  libibverbs/examples: Support odp in rc_pingpong

 Makefile.am                   |   3 +-
 examples/devinfo.c            |  67 --
 examples/rc_pingpong.c        |  31 +-
 include/infiniband/driver.h   |   9 +++
 include/infiniband/kern-abi.h |  36 +++-
 include/infiniband/verbs.h    |  53 -
 man/ibv_query_device_ex.3     |  70 +++
 man/ibv_reg_mr.3              |   2 +
 src/cmd.c                     | 129 +-
 src/libibverbs.map            |   2 +
 10 files changed, 352 insertions(+), 50 deletions(-)
 create mode 100644 man/ibv_query_device_ex.3

--
1.7.11.2
[PATCH 3/3] libibverbs/examples: Support odp in rc_pingpong
From: Majd Dibbiny

Signed-off-by: Majd Dibbiny
Signed-off-by: Haggai Eran
---
 examples/rc_pingpong.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index ddfe8d007e1a..904ec83a633f 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -55,6 +55,7 @@ enum {
 };

 static int page_size;
+static int use_odp;

 struct pingpong_context {
 	struct ibv_context *context;
@@ -315,6 +316,7 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 					    int use_event)
 {
 	struct pingpong_context *ctx;
+	int access_flags = IBV_ACCESS_LOCAL_WRITE;

 	ctx = calloc(1, sizeof *ctx);
 	if (!ctx)
@@ -355,7 +357,25 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 		goto clean_comp_channel;
 	}

-	ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size, IBV_ACCESS_LOCAL_WRITE);
+	if (use_odp) {
+		const uint32_t rc_caps_mask = IBV_ODP_SUPPORT_SEND |
+					      IBV_ODP_SUPPORT_RECV;
+		struct ibv_device_attr_ex attrx = {};
+
+		if (ibv_query_device_ex(ctx->context, &attrx)) {
+			fprintf(stderr, "Couldn't query device for its features\n");
+			goto clean_comp_channel;
+		}
+
+		if (!(attrx.odp_caps.general_caps & IBV_ODP_SUPPORT) ||
+		    (attrx.odp_caps.per_transport_caps.rc_odp_caps & rc_caps_mask) != rc_caps_mask) {
+			fprintf(stderr, "The device isn't ODP capable or does not support RC send and receive with ODP\n");
+			goto clean_comp_channel;
+		}
+		access_flags |= IBV_ACCESS_ON_DEMAND;
+	}
+	ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size, access_flags);
+
 	if (!ctx->mr) {
 		fprintf(stderr, "Couldn't register MR\n");
 		goto clean_pd;
@@ -540,6 +560,7 @@ static void usage(const char *argv0)
 	printf(" -l, --sl= service level value\n");
 	printf(" -e, --events sleep on CQ events (default poll)\n");
 	printf(" -g, --gid-idx= local port gid index\n");
+	printf(" -o, --odp use on demand paging\n");
 }

 int main(int argc, char *argv[])
@@ -582,11 +603,13 @@ int main(int argc, char *argv[])
 			{ .name = "sl", .has_arg = 1, .val = 'l' },
 			{ .name = "events", .has_arg = 0, .val = 'e' },
 			{ .name = "gid-idx", .has_arg = 1, .val = 'g' },
+			{ .name = "odp", .has_arg = 0, .val = 'o' },
 			{ 0 }
 		};

-		c = getopt_long(argc, argv, "p:d:i:s:m:r:n:l:eg:",
+		c = getopt_long(argc, argv, "p:d:i:s:m:r:n:l:eg:o",
 				long_options, NULL);
+
 		if (c == -1)
 			break;

@@ -643,6 +666,10 @@ int main(int argc, char *argv[])
 			gidx = strtol(optarg, NULL, 0);
 			break;

+		case 'o':
+			use_odp = 1;
+			break;
+
 		default:
 			usage(argv[0]);
 			return 1;
--
1.7.11.2
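The capability test in the rc_pingpong patch follows a common pattern: require every bit of a mask, not just any one of them. A minimal standalone sketch, assuming the transport bit values shown in patch 2/3 of this series. The TOY_ and toy_ names are hypothetical stand-ins rather than libibverbs identifiers, and the value of the general-support bit is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Transport bits: values as listed in patch 2/3 (names shortened here). */
enum toy_odp_transport_cap_bits {
	TOY_ODP_SUPPORT_SEND  = 1 << 0,
	TOY_ODP_SUPPORT_RECV  = 1 << 1,
	TOY_ODP_SUPPORT_WRITE = 1 << 2,
	TOY_ODP_SUPPORT_READ  = 1 << 3,
};

/* ASSUMPTION: the general "ODP supported" bit value is made up for this sketch. */
#define TOY_ODP_GENERAL_SUPPORT (1ULL << 0)

/*
 * Hypothetical helper: true only if ODP is advertised at all and every bit
 * in `required` is also set in the RC transport capabilities.
 */
static bool toy_rc_odp_usable(uint64_t general_caps, uint32_t rc_caps,
			      uint32_t required)
{
	/* An empty mask would be trivially satisfied; treat it as a caller bug. */
	assert(required != 0);

	if (!(general_caps & TOY_ODP_GENERAL_SUPPORT))
		return false;

	/* "(caps & mask) == mask" demands all bits; "caps & mask" would accept any. */
	return (rc_caps & required) == required;
}
```

Comparing against the full mask matters here because a device that supports ODP for RC send but not RC receive would pass a plain "caps & mask" test while still being unusable for the ping-pong exchange.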
[PATCH 2/3] Add on-demand paging support
The on-demand paging feature allows registering memory regions without
pinning their pages. Unfortunately the feature doesn't work together with
all transports and all operations. This patch adds the ability to report
on-demand paging capabilities through ibv_query_device_ex. The patch also
adds the IBV_ACCESS_ON_DEMAND access flag to allow registration of
on-demand paging enabled memory regions.

Signed-off-by: Shachar Raindel
Signed-off-by: Majd Dibbiny
Signed-off-by: Haggai Eran
---
 examples/devinfo.c            | 51 +++
 include/infiniband/kern-abi.h | 12 +-
 include/infiniband/verbs.h    | 25 -
 man/ibv_query_device_ex.3     | 23 +++
 man/ibv_reg_mr.3              |  2 ++
 src/cmd.c                     | 11 ++
 6 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/examples/devinfo.c b/examples/devinfo.c
index 95e8f83753ca..61cfdf520be6 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -204,6 +205,54 @@ static const char *link_layer_str(uint8_t link_layer)
 	}
 }

+void print_odp_trans_caps(uint32_t trans)
+{
+	uint32_t unknown_transport_caps = ~(IBV_ODP_SUPPORT_SEND |
+					    IBV_ODP_SUPPORT_RECV |
+					    IBV_ODP_SUPPORT_WRITE |
+					    IBV_ODP_SUPPORT_READ |
+					    IBV_ODP_SUPPORT_ATOMIC);
+
+	if (!trans) {
+		printf("\t\t\t\t\tNO SUPPORT\n");
+	} else {
+		if (trans & IBV_ODP_SUPPORT_SEND)
+			printf("\t\t\t\t\tSUPPORT_SEND\n");
+		if (trans & IBV_ODP_SUPPORT_RECV)
+			printf("\t\t\t\t\tSUPPORT_RECV\n");
+		if (trans & IBV_ODP_SUPPORT_WRITE)
+			printf("\t\t\t\t\tSUPPORT_WRITE\n");
+		if (trans & IBV_ODP_SUPPORT_READ)
+			printf("\t\t\t\t\tSUPPORT_READ\n");
+		if (trans & IBV_ODP_SUPPORT_ATOMIC)
+			printf("\t\t\t\t\tSUPPORT_ATOMIC\n");
+		if (trans & unknown_transport_caps)
+			printf("\t\t\t\t\tUnknown flags: 0x%" PRIX32 "\n",
+			       trans & unknown_transport_caps);
+	}
+}
+
+void print_odp_caps(struct ibv_odp_caps caps)
+{
+	uint64_t unknown_general_caps = ~(IBV_ODP_SUPPORT);
+
+	/* general odp caps */
+	printf("\tgeneral_odp_caps:\n");
+	if (caps.general_caps & IBV_ODP_SUPPORT)
+		printf("\t\t\t\t\tODP_SUPPORT\n");
+	if (caps.general_caps & unknown_general_caps)
+		printf("\t\t\t\t\tUnknown flags: 0x%" PRIX64 "\n",
+		       caps.general_caps & unknown_general_caps);
+
+	/* RC transport */
+	printf("\trc_odp_caps:\n");
+	print_odp_trans_caps(caps.per_transport_caps.rc_odp_caps);
+	printf("\tuc_odp_caps:\n");
+	print_odp_trans_caps(caps.per_transport_caps.uc_odp_caps);
+	printf("\tud_odp_caps:\n");
+	print_odp_trans_caps(caps.per_transport_caps.ud_odp_caps);
+}
+
 static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
 {
 	struct ibv_context *ctx;
@@ -296,6 +345,8 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
 		}
 		printf("\tmax_pkeys:\t\t\t%d\n", device_attr.max_pkeys);
 		printf("\tlocal_ca_ack_delay:\t\t%d\n", device_attr.local_ca_ack_delay);
+
+		print_odp_caps(attrx.odp_caps);
 	}

 	for (port = 1; port <= device_attr.phys_port_cnt; ++port) {
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index af2a1bebf683..1c0d0d30c612 100644
--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -254,11 +254,21 @@ struct ibv_query_device_ex {
 	__u32 reserved;
 };

+struct ibv_odp_caps_resp {
+	__u64 general_caps;
+	struct {
+		__u32 rc_odp_caps;
+		__u32 uc_odp_caps;
+		__u32 ud_odp_caps;
+	} per_transport_caps;
+	__u32 reserved;
+};
+
 struct ibv_query_device_resp_ex {
 	struct ibv_query_device_resp base;
 	__u32 comp_mask;
 	__u32 response_length;
-	__u64 reserved[3];
+	struct ibv_odp_caps_resp odp_caps;
 };

 struct ibv_query_port {
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index ff806bf8555d..ce56315b236e 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -168,9 +168,31 @@ struct ibv_device_attr {
 	uint8_t phys_port_cnt;
 };

+enum ibv_odp_transport_cap_bits {
+	IBV_ODP_SUPPORT_SEND = 1 << 0,
+	IBV_ODP_SUPPORT_RECV = 1 << 1,
+	IBV_ODP_SUPPORT_WRITE = 1 << 2,
+	IBV_ODP_SUPPORT_READ = 1 << 3,
+	IBV_ODP_
[PATCH v6 3/4] IB/cma: Add support for network namespaces
From: Guy Shapiro Add support for network namespaces in the ib_cma module. This is accomplished by: 1. Adding network namespace parameter for rdma_create_id. This parameter is used to populate the network namespace field in rdma_id_private. rdma_create_id keeps a reference on the network namespace. 2. Using the network namespace from the rdma_id instead of init_net inside of ib_cma, when listening on an ID and when looking for an ID for an incoming request. 3. Decrementing the reference count for the appropriate network namespace when calling rdma_destroy_id. In order to preserve the current behavior init_net is passed when calling from other modules. Signed-off-by: Guy Shapiro Signed-off-by: Haggai Eran Signed-off-by: Yotam Kenneth Signed-off-by: Shachar Raindel --- drivers/infiniband/core/cma.c | 46 +- drivers/infiniband/core/ucma.c | 3 +- drivers/infiniband/ulp/iser/iser_verbs.c | 2 +- drivers/infiniband/ulp/isert/ib_isert.c| 2 +- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h| 4 +- include/rdma/rdma_cm.h | 6 ++- net/9p/trans_rdma.c| 4 +- net/rds/ib.c | 2 +- net/rds/ib_cm.c| 2 +- net/rds/iw.c | 2 +- net/rds/iw_cm.c| 2 +- net/rds/rdma_transport.c | 4 +- net/sunrpc/xprtrdma/svc_rdma_transport.c | 4 +- net/sunrpc/xprtrdma/verbs.c| 3 +- 14 files changed, 52 insertions(+), 34 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index f40ca053fa3e..debf25ccf930 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -591,7 +591,8 @@ static int cma_disable_callback(struct rdma_id_private *id_priv, return 0; } -struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, +struct rdma_cm_id *rdma_create_id(struct net *net, + rdma_cm_event_handler event_handler, void *context, enum rdma_port_space ps, enum ib_qp_type qp_type) { @@ -615,7 +616,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, INIT_LIST_HEAD(&id_priv->listen_list); INIT_LIST_HEAD(&id_priv->mc_list); 
get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num); - id_priv->id.route.addr.dev_addr.net = &init_net; + id_priv->id.route.addr.dev_addr.net = get_net(net); return &id_priv->id; } @@ -1257,7 +1258,7 @@ static bool cma_match_net_dev(const struct rdma_id_private *id_priv, return addr->src_addr.ss_family == AF_IB; return !addr->dev_addr.bound_dev_if || - (net_eq(dev_net(net_dev), &init_net) && + (net_eq(dev_net(net_dev), addr->dev_addr.net) && addr->dev_addr.bound_dev_if == net_dev->ifindex); } @@ -1314,7 +1315,7 @@ static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id, } } - bind_list = cma_ps_find(&init_net, + bind_list = cma_ps_find(dev_net(*net_dev), rdma_ps_from_service_id(req.service_id), cma_port_from_service_id(req.service_id)); id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, *net_dev); @@ -1386,6 +1387,7 @@ static void cma_cancel_operation(struct rdma_id_private *id_priv, static void cma_release_port(struct rdma_id_private *id_priv) { struct rdma_bind_list *bind_list = id_priv->bind_list; + struct net *net = id_priv->id.route.addr.dev_addr.net; if (!bind_list) return; @@ -1393,7 +1395,7 @@ static void cma_release_port(struct rdma_id_private *id_priv) mutex_lock(&lock); hlist_del(&id_priv->node); if (hlist_empty(&bind_list->owners)) { - cma_ps_remove(&init_net, bind_list->ps, bind_list->port); + cma_ps_remove(net, bind_list->ps, bind_list->port); kfree(bind_list); } mutex_unlock(&lock); @@ -1452,6 +1454,7 @@ void rdma_destroy_id(struct rdma_cm_id *id) cma_deref_id(id_priv->id.context); kfree(id_priv->id.route.path_rec); + put_net(id_priv->id.route.addr.dev_addr.net); kfree(id_priv); } EXPORT_SYMBOL(rdma_destroy_id); @@ -1582,7 +1585,8 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id, ib_event->param.req_rcvd.primary_path->service_id; int ret; - id = rdma_create_id(listen_id->event_handler, listen_id->context, + id = rdma_create_id(listen_id->route.addr.dev_addr.net, + 
listen_id->event_handler, listen_id->con
[PATCH v6 0/4] Add network namespace support in the RDMA-CM
Hi,

Now that the code for demuxing requests is inside rdma_cm, here are the
patches to add InfiniBand network namespace again.

Changes from v5:
- removed patches that got in as part of the cleanup series.

RDMA-CM uses IP based addressing and routing to set up RDMA connections
between hosts. Currently, all of the IP interfaces and addresses used by
the RDMA-CM must reside in the init_net namespace. This restricts the usage
of containers with RDMA to only work with the host network namespace (aka
the kernel init_net NS instance).

This patchset allows using network namespaces with the RDMA-CM.

Each RDMA-CM id keeps a reference to a network namespace. This reference is
based on the process network namespace at the time of the creation of the
object or inherited from the listener.

This network namespace is used to perform all IP and network related
operations. Specifically, the local device lookup, as well as the remote
GID address resolution are done in the context of the RDMA-CM object's
namespace. This allows outgoing connections to reach the right target, even
if the same IP address exists in multiple network namespaces. This can
happen if each network namespace resides on a different P_Key.

Additionally, the network namespace is used to split the listener port
space tables. From the user point of view, each network namespace has its
own completely independent tables for its port spaces. This allows running
multiple instances of a single service on the same machine, using
containers.

The functionality introduced by this series would come into play when the
transport is InfiniBand and IPoIB interfaces are assigned to each
namespace. Multiple IPoIB interfaces can be created and assigned to
different RDMA-CM capable containers, for example using pipework [1].

The patches apply against Doug's to-be-rebased tree for v4.3.

The patchset is structured as follows:

Patch 1 is a relatively trivial API extension, requiring the callers of
certain ib_addr functions to provide a network namespace, as needed.

Patches 2-4 add proper namespace support to the RDMA-CM module. This
includes adding multiple port space tables, adding a network namespace
parameter, and finally retrieving the namespace from the creating process.

[1] https://github.com/jpetazzo/pipework/pull/108

Guy Shapiro (3):
  IB/addr: Pass network namespace as a parameter
  IB/cma: Add support for network namespaces
  IB/ucma: Take the network namespace from the process

Haggai Eran (1):
  IB/cma: Separate port allocation to network namespaces

 drivers/infiniband/core/addr.c                     |  17 +--
 drivers/infiniband/core/cma.c                      | 129 +++--
 drivers/infiniband/core/ucma.c                     |   4 +-
 drivers/infiniband/ulp/iser/iser_verbs.c           |   2 +-
 drivers/infiniband/ulp/isert/ib_isert.c            |   2 +-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |   4 +-
 include/rdma/ib_addr.h                             |  16 ++-
 include/rdma/rdma_cm.h                             |   6 +-
 net/9p/trans_rdma.c                                |   4 +-
 net/rds/ib.c                                       |   2 +-
 net/rds/ib_cm.c                                    |   2 +-
 net/rds/iw.c                                       |   2 +-
 net/rds/iw_cm.c                                    |   2 +-
 net/rds/rdma_transport.c                           |   4 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c           |   4 +-
 net/sunrpc/xprtrdma/verbs.c                        |   3 +-
 16 files changed, 142 insertions(+), 61 deletions(-)

--
1.7.11.2
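The per-namespace split of the port space tables described in this cover letter can be illustrated with a userspace toy model. This is a sketch under stated assumptions, not the kernel code: the patches keep IDRs in a per-namespace struct looked up through net_generic(), whereas this uses fixed-size arrays indexed by a small integer, and every TOY_ and toy_ name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NS   4    /* hypothetical: number of namespaces modelled */
#define TOY_MAX_PORT 1024 /* hypothetical: size of each port table */

/* Toy stand-in for the per-namespace struct: one port table per namespace. */
struct toy_pernet {
	const void *owner[TOY_MAX_PORT]; /* NULL means the port is free */
};

static struct toy_pernet toy_tables[TOY_MAX_NS];

static int toy_in_range(int ns, int port)
{
	return ns >= 0 && ns < TOY_MAX_NS && port >= 0 && port < TOY_MAX_PORT;
}

/* Bind `owner` to `port` inside namespace `ns`: 0 on success, -1 otherwise. */
static int toy_ps_alloc(int ns, int port, const void *owner)
{
	assert(owner != NULL);

	if (!toy_in_range(ns, port) || toy_tables[ns].owner[port])
		return -1;
	toy_tables[ns].owner[port] = owner;
	return 0;
}

/* Look up the owner of `port` in namespace `ns`, or NULL if unbound. */
static const void *toy_ps_find(int ns, int port)
{
	return toy_in_range(ns, port) ? toy_tables[ns].owner[port] : NULL;
}

/* Release `port` in namespace `ns`; other namespaces are untouched. */
static void toy_ps_remove(int ns, int port)
{
	if (toy_in_range(ns, port))
		toy_tables[ns].owner[port] = NULL;
}
```

Because every operation takes the namespace as its first argument, two listeners can hold the same port number as long as they live in different namespaces, which is the property that lets several instances of one service run side by side in containers.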
[PATCH v6 4/4] IB/ucma: Take the network namespace from the process
From: Guy Shapiro

Add support for network namespaces from user space. This is done by passing
the network namespace of the process instead of init_net.

Signed-off-by: Haggai Eran
Signed-off-by: Yotam Kenneth
Signed-off-by: Shachar Raindel
Signed-off-by: Guy Shapiro
---
 drivers/infiniband/core/ucma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 82a17a7b7e6d..00402e6d505a 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -472,8 +473,8 @@ static ssize_t ucma_create_id(struct ucma_file *file, const char __user *inbuf,
 		return -ENOMEM;

 	ctx->uid = cmd.uid;
-	ctx->cm_id = rdma_create_id(&init_net, ucma_event_handler, ctx, cmd.ps,
-				    qp_type);
+	ctx->cm_id = rdma_create_id(current->nsproxy->net_ns,
+				    ucma_event_handler, ctx, cmd.ps, qp_type);
 	if (IS_ERR(ctx->cm_id)) {
 		ret = PTR_ERR(ctx->cm_id);
 		goto err1;
--
1.7.11.2
[PATCH v6 1/4] IB/addr: Pass network namespace as a parameter
From: Guy Shapiro Add network namespace support to the ib_addr module. For that, all the address resolution and matching should be done using the appropriate namespace instead of init_net. This is achieved by: 1. Adding an explicit network namespace argument to exported function that require a namespace. 2. Saving the namespace in the rdma_addr_client structure. 3. Using it when calling networking functions. In order to preserve the behavior of calling modules, &init_net is passed as the parameter in calls from other modules. This is modified as namespace support is added on more levels. Signed-off-by: Haggai Eran Signed-off-by: Yotam Kenneth Signed-off-by: Shachar Raindel Signed-off-by: Guy Shapiro --- drivers/infiniband/core/addr.c | 17 + drivers/infiniband/core/cma.c | 1 + include/rdma/ib_addr.h | 16 +++- 3 files changed, 25 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index 746cdf56bc76..6ed9685efebd 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -128,7 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr, int ret = -EADDRNOTAVAIL; if (dev_addr->bound_dev_if) { - dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if); + dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if); if (!dev) return -ENODEV; ret = rdma_copy_addr(dev_addr, dev, NULL); @@ -138,7 +138,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr, switch (addr->sa_family) { case AF_INET: - dev = ip_dev_find(&init_net, + dev = ip_dev_find(dev_addr->net, ((struct sockaddr_in *) addr)->sin_addr.s_addr); if (!dev) @@ -149,12 +149,11 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr, *vlan_id = rdma_vlan_dev_vlan_id(dev); dev_put(dev); break; - #if IS_ENABLED(CONFIG_IPV6) case AF_INET6: rcu_read_lock(); - for_each_netdev_rcu(&init_net, dev) { - if (ipv6_chk_addr(&init_net, + 
for_each_netdev_rcu(dev_addr->net, dev) { + if (ipv6_chk_addr(dev_addr->net, &((struct sockaddr_in6 *) addr)->sin6_addr, dev, 1)) { ret = rdma_copy_addr(dev_addr, dev, NULL); @@ -236,7 +235,7 @@ static int addr4_resolve(struct sockaddr_in *src_in, fl4.daddr = dst_ip; fl4.saddr = src_ip; fl4.flowi4_oif = addr->bound_dev_if; - rt = ip_route_output_key(&init_net, &fl4); + rt = ip_route_output_key(addr->net, &fl4); if (IS_ERR(rt)) { ret = PTR_ERR(rt); goto out; @@ -278,12 +277,12 @@ static int addr6_resolve(struct sockaddr_in6 *src_in, fl6.saddr = src_in->sin6_addr; fl6.flowi6_oif = addr->bound_dev_if; - dst = ip6_route_output(&init_net, NULL, &fl6); + dst = ip6_route_output(addr->net, NULL, &fl6); if ((ret = dst->error)) goto put; if (ipv6_addr_any(&fl6.saddr)) { - ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev, + ret = ipv6_dev_get_saddr(addr->net, ip6_dst_idev(dst)->dev, &fl6.daddr, 0, &fl6.saddr); if (ret) goto put; @@ -476,6 +475,7 @@ int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgi rdma_gid2ip(&dgid_addr._sockaddr, dgid); memset(&dev_addr, 0, sizeof(dev_addr)); + dev_addr.net = &init_net; ctx.addr = &dev_addr; init_completion(&ctx.comp); @@ -510,6 +510,7 @@ int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id) rdma_gid2ip(&gid_addr._sockaddr, sgid); memset(&dev_addr, 0, sizeof(dev_addr)); + dev_addr.net = &init_net; ret = rdma_translate_ip(&gid_addr._sockaddr, &dev_addr, vlan_id); if (ret) return ret; diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index b1ab13f3e182..0530c6188e75 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -601,6 +601,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, INIT_LIST_HEAD(&id_priv->listen_list); INIT_LIST_HEAD(&id_priv->mc_list); get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num); + id_priv->id.route.addr.dev_addr.net = &init_net; return &id_priv->id; } diff --git 
a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h index fde33ac6b58a..3d9afc3bc601 100644 --- a/include/rdma/ib_addr.h +++ b/includ
[PATCH v6 2/4] IB/cma: Separate port allocation to network namespaces
Keep a struct for each network namespace containing the IDRs for the RDMA CM port spaces. The struct is created dynamically using the generic_net mechanism. This patch is internal infrastructure work for the following patches. In this patch, init_net is statically used as the network namespace for the new port-space API. Signed-off-by: Haggai Eran Signed-off-by: Yotam Kenneth Signed-off-by: Shachar Raindel Signed-off-by: Guy Shapiro --- drivers/infiniband/core/cma.c | 94 --- 1 file changed, 70 insertions(+), 24 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 0530c6188e75..f40ca053fa3e 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -44,6 +44,8 @@ #include #include +#include +#include #include #include #include @@ -110,22 +112,33 @@ static LIST_HEAD(dev_list); static LIST_HEAD(listen_any_list); static DEFINE_MUTEX(lock); static struct workqueue_struct *cma_wq; -static DEFINE_IDR(tcp_ps); -static DEFINE_IDR(udp_ps); -static DEFINE_IDR(ipoib_ps); -static DEFINE_IDR(ib_ps); +static int cma_pernet_id; -static struct idr *cma_idr(enum rdma_port_space ps) +struct cma_pernet { + struct idr tcp_ps; + struct idr udp_ps; + struct idr ipoib_ps; + struct idr ib_ps; +}; + +static struct cma_pernet *cma_pernet(struct net *net) +{ + return net_generic(net, cma_pernet_id); +} + +static struct idr *cma_pernet_idr(struct net *net, enum rdma_port_space ps) { + struct cma_pernet *pernet = cma_pernet(net); + switch (ps) { case RDMA_PS_TCP: - return &tcp_ps; + return &pernet->tcp_ps; case RDMA_PS_UDP: - return &udp_ps; + return &pernet->udp_ps; case RDMA_PS_IPOIB: - return &ipoib_ps; + return &pernet->ipoib_ps; case RDMA_PS_IB: - return &ib_ps; + return &pernet->ib_ps; default: return NULL; } @@ -145,24 +158,25 @@ struct rdma_bind_list { unsigned short port; }; -static int cma_ps_alloc(enum rdma_port_space ps, +static int cma_ps_alloc(struct net *net, enum rdma_port_space ps, struct rdma_bind_list *bind_list, 
int snum) { - struct idr *idr = cma_idr(ps); + struct idr *idr = cma_pernet_idr(net, ps); return idr_alloc(idr, bind_list, snum, snum + 1, GFP_KERNEL); } -static struct rdma_bind_list *cma_ps_find(enum rdma_port_space ps, int snum) +static struct rdma_bind_list *cma_ps_find(struct net *net, + enum rdma_port_space ps, int snum) { - struct idr *idr = cma_idr(ps); + struct idr *idr = cma_pernet_idr(net, ps); return idr_find(idr, snum); } -static void cma_ps_remove(enum rdma_port_space ps, int snum) +static void cma_ps_remove(struct net *net, enum rdma_port_space ps, int snum) { - struct idr *idr = cma_idr(ps); + struct idr *idr = cma_pernet_idr(net, ps); idr_remove(idr, snum); } @@ -1300,7 +1314,8 @@ static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id, } } - bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id), + bind_list = cma_ps_find(&init_net, + rdma_ps_from_service_id(req.service_id), cma_port_from_service_id(req.service_id)); id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, *net_dev); if (IS_ERR(id_priv)) { @@ -1378,7 +1393,7 @@ static void cma_release_port(struct rdma_id_private *id_priv) mutex_lock(&lock); hlist_del(&id_priv->node); if (hlist_empty(&bind_list->owners)) { - cma_ps_remove(bind_list->ps, bind_list->port); + cma_ps_remove(&init_net, bind_list->ps, bind_list->port); kfree(bind_list); } mutex_unlock(&lock); @@ -2663,7 +2678,7 @@ static int cma_alloc_port(enum rdma_port_space ps, if (!bind_list) return -ENOMEM; - ret = cma_ps_alloc(ps, bind_list, snum); + ret = cma_ps_alloc(&init_net, ps, bind_list, snum); if (ret < 0) goto err; @@ -2688,7 +2703,7 @@ static int cma_alloc_any_port(enum rdma_port_space ps, rover = prandom_u32() % remaining + low; retry: if (last_used_port != rover && - !cma_ps_find(ps, (unsigned short)rover)) { + !cma_ps_find(&init_net, ps, (unsigned short)rover)) { int ret = cma_alloc_port(ps, id_priv, rover); /* * Remember previously used port number in order to avoid @@ -2754,7 +2769,7 @@ 
static int cma_use_port(enum rdma_port_space ps, if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE)) return -EACCES; - bind_list = cma_ps_find(ps, snum); + b
RE: [PATCH] infiniband:cxgb4:Fix if statement check in the function pick_local_ip6adddrs
Acked-by: Steve Wise
RE: [PATCH] infiniband:cxgb4:Fix incorrect return statement in the function c4iw_reject_cr
> -----Original Message-----
> From: Nicholas Krause [mailto:xerofo...@gmail.com]
> Sent: Wednesday, August 26, 2015 7:22 PM
> To: sw...@chelsio.com
> Cc: dledf...@redhat.com; sean.he...@intel.com; hal.rosenst...@gmail.com;
> linux-rdma@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: [PATCH] infiniband:cxgb4:Fix incorrect return statement in the
> function c4iw_reject_cr
>
> This fixes the incorrect return statement in the function
> c4iw_reject_cr that returns the value zero directly to instead
> return the variable err as this function can fail when called
> and if so we will incorrectly return success rather then the
> correct status of a failed call to the caller of this particular
> function.
>
> Signed-off-by: Nicholas Krause
> ---

NAK. The return code for these cpl handlers indicates if process_work() or
other callers need to free the skb. They are supposed to return 0.
[PATCH] IB/cma: Fix net_dev reference leak with failed requests
When no matching listening ID is found for a given request, the net_dev
that was used to find the request isn't released.

Fixes: 20c36836ecad ("IB/cma: Use found net_dev for passive connections")
Signed-off-by: Haggai Eran
---
 drivers/infiniband/core/cma.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9b306d7b5c27..b1ab13f3e182 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1302,6 +1302,10 @@ static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id,
 	bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id),
 				cma_port_from_service_id(req.service_id));
 	id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, *net_dev);
+	if (IS_ERR(id_priv)) {
+		dev_put(*net_dev);
+		*net_dev = NULL;
+	}
 	return id_priv;
 }

--
1.7.11.2
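The fix follows the usual rule that an error path must drop any reference taken earlier in the same function. A minimal userspace sketch of that rule, with a plain integer standing in for the net_device reference count. The toy_ names are hypothetical and are not kernel APIs; the have_listener flag stands in for the result of the listener lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device with a bare counter in place of a real reference count. */
struct toy_dev {
	int refcnt;
};

static void toy_dev_hold(struct toy_dev *dev) { dev->refcnt++; }
static void toy_dev_put(struct toy_dev *dev)  { dev->refcnt--; }

/*
 * Hypothetical lookup: takes a reference on `dev` and hands it to the caller
 * through *dev_out. If no listener matches, the reference is dropped again
 * and *dev_out is cleared so the caller cannot use or release it twice.
 * Returns 0 on success, -1 when no listener was found.
 */
static int toy_id_from_event(struct toy_dev *dev, int have_listener,
			     struct toy_dev **dev_out)
{
	assert(dev != NULL && dev_out != NULL);

	toy_dev_hold(dev);
	*dev_out = dev;

	if (!have_listener) {
		/* Error path: undo the hold taken above. */
		toy_dev_put(*dev_out);
		*dev_out = NULL;
		return -1;
	}
	return 0;
}
```

Clearing the output pointer along with the put mirrors the patch: a caller that inspects the pointer after a failure sees NULL rather than a device it no longer holds.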