Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and cap_mcast() for mcast-check
On 03/27/2015 06:47 PM, Jason Gunthorpe wrote: On Fri, Mar 27, 2015 at 01:05:08PM -0400, ira.weiny wrote: But it seems redundant, since mcast_add_one will already not add a port that is not IB, so mcast_event_handler is not callable. Something to do with RoCE/IB switching? I'm not sure about this either. This check seems to be necessary only on a per-port level. It does seem apparent that one can't go from Eth to IB. What happens if you go from IB to Eth on the port? Hmm... I see a mlx4_change_port_types which ultimately calls ib_unregister_device, which suggests the port type doesn't change at runtime (yay) Yeah, it seems mlx4 will reinitialize the device when the port link layer changes. I've taken a look at other HW; they directly return a static type or infer it from the transport type (I suppose this won't change dynamically). Thus I also agree the check inside mcast_event_handler() is unnecessary; maybe we can change that logic to WARN_ON(!cap_mcast()) ? Regards, Michael Wang So maybe these checks really are redundant? Jason -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
infiniband/ulp/srpt/ib_srpt.c:1082: bug report
Hello there, [linux-4.0-rc6/drivers/infiniband/ulp/srpt/ib_srpt.c:1082] - [linux-4.0-rc6/drivers/infiniband/ulp/srpt/ib_srpt.c:1098]: (warning) Possible null pointer dereference: ch - otherwise it is redundant to check it against null. struct ib_device *dev = ch->sport->sdev->device; ... BUG_ON(!ch); Suggest moving the init of dev until *after* ch has been sanity checked against NULL. Regards David Binderman
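The defect class cppcheck reports here can be sketched in a few lines of plain C. The struct names below are hypothetical stand-ins for the srpt structures, and assert() stands in for the kernel's BUG_ON(); the point is only the ordering: a pointer dereferenced in a declaration's initializer is touched before any later NULL check, so the check can never fire.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock-up of the srpt structures, just to illustrate
 * the warning: a pointer must be validated before it is used in an
 * initializer, not after. */
struct ib_dev  { int id; };
struct sport   { struct ib_dev *sdev; };
struct rdma_ch { struct sport *sport; };

/* Fixed ordering: check ch first, then dereference it. */
static struct ib_dev *ch_device(struct rdma_ch *ch)
{
	struct ib_dev *dev;

	assert(ch != NULL);      /* stand-in for BUG_ON(!ch) */
	dev = ch->sport->sdev;   /* safe: ch already validated */
	return dev;
}
```

With the original ordering (`struct ib_dev *dev = ch->sport->sdev;` before the check), a NULL `ch` would already have crashed by the time the check ran, which is exactly why the check is flagged as redundant.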
Re: [PATCH 1/1] OpenSM: command line option ignore-guids broken
On 3/27/2015 5:00 AM, Jens Domke wrote: this patch changes the documentation (--help and man page) from --ignore-guids to --ignore_guids, so that it matches the implementation Signed-off-by: Jens Domke jens.do...@tu-dresden.de Thanks. Applied. -- Hal
Re: [PATCH] IB/srpt: Suppress a compiler warning
On 3/30/2015 1:37 PM, Bart Van Assche wrote: Remove a BUG_ON(!ch) statement because it is superfluous - if the ch pointer were NULL then the assignment in the first line of srpt_map_sg_to_ib_sge() would trigger a kernel oops anyway. This patch suppresses the following compiler warning: Possible null pointer dereference: ch - otherwise it is redundant to check it against null. Reported-by: David Binderman dcb...@hotmail.com Signed-off-by: Bart Van Assche bart.vanass...@sandisk.com --- drivers/infiniband/ulp/srpt/ib_srpt.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 6e0a477..4e74fc8 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1095,7 +1095,6 @@ static int srpt_map_sg_to_ib_sge(struct srpt_rdma_ch *ch, int count, nrdma; int i, j, k; - BUG_ON(!ch); BUG_ON(!ioctx); cmd = ioctx->cmd; dir = cmd->data_direction; Acked-by: Sagi Grimberg sa...@mellanox.com
Re: [-stable] commit 377b513485fd (IB/core: Avoid leakage from kernel to user space)
On Fri, Mar 27, 2015 at 01:42:44PM +0100, Yann Droneaud wrote: Hi, Please add commit 377b513485fd (IB/core: Avoid leakage from kernel to user space) to -stable. It can be applied to v2.6.32 and later. Regards. -- Yann Droneaud OPTEYA Thanks, I'm queuing it for the 3.16 kernel. Cheers, -- Luís
[PATCH] IB/srpt: Suppress a compiler warning
Remove a BUG_ON(!ch) statement because it is superfluous - if the ch pointer were NULL then the assignment in the first line of srpt_map_sg_to_ib_sge() would trigger a kernel oops anyway. This patch suppresses the following compiler warning: Possible null pointer dereference: ch - otherwise it is redundant to check it against null. Reported-by: David Binderman dcb...@hotmail.com Signed-off-by: Bart Van Assche bart.vanass...@sandisk.com --- drivers/infiniband/ulp/srpt/ib_srpt.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 6e0a477..4e74fc8 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1095,7 +1095,6 @@ static int srpt_map_sg_to_ib_sge(struct srpt_rdma_ch *ch, int count, nrdma; int i, j, k; - BUG_ON(!ch); BUG_ON(!ioctx); cmd = ioctx->cmd; dir = cmd->data_direction; -- 2.1.4
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for iwarp-check
On 03/27/2015 06:29 PM, Jason Gunthorpe wrote: On Fri, Mar 27, 2015 at 01:16:31PM -0400, ira.weiny wrote: [snip] http://www.spinics.net/lists/linux-rdma/msg22565.html ''Unlike IB, the iWARP protocol only allows 1 target/sink SGE in an rdma read'' It is one of those annoying verbs is different on iWarp things. So the max sge in the query_verbs must only apply to send/rdma write on iWarp? I found that actually we don't have to touch this one, which is only used by the HW driver currently. I think we can leave these cases in the device driver, since vendors could have different ways to classify the usage of transport and link layer. Our purpose is to introduce the IB core management approach, which may not be applicable at the device level, so maybe we can just pass on them :-) Regards, Michael Wang Jason
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for iwarp-check
On 03/30/2015 06:13 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:47 +0100, Michael Wang wrote: Introduce helper has_iwarp() to help us check if an IB device supports the iWARP protocol. This is a needless redirection. Just stick with the original rdma_transport_is_iwarp(). Agree, will leave it there :-) Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- include/rdma/ib_verbs.h | 13 + net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 +- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index e796104..0ef9cd7 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1836,6 +1836,19 @@ static inline int has_mcast(struct ib_device *device) } /** + * has_iwarp - Check if a device supports the iWARP protocol. + * + * @device: Device to be checked + * + * Return 0 when a device has no port supporting + * the iWARP protocol. + */ +static inline int has_iwarp(struct ib_device *device) +{ +return rdma_transport_is_iwarp(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index a7b5891..48aeb5e 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -118,7 +118,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp, static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count) { -if (rdma_transport_is_iwarp(xprt->sc_cm_id->device)) +if (has_iwarp(xprt->sc_cm_id->device)) return 1; else return min_t(int, sge_count, xprt->sc_max_sge);
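The rdma_read_max_sge() logic quoted in the patch above can be restated as a tiny standalone sketch. The is_iwarp flag below stands in for the rdma_transport_is_iwarp()/has_iwarp() device query (an assumption for illustration): iWARP permits only a single target/sink SGE per RDMA READ, so the count collapses to 1 there, and is otherwise capped by the transport's advertised maximum.

```c
#include <assert.h>

/* Sketch of the svc_rdma_recvfrom.c read-SGE cap discussed above.
 * is_iwarp mimics the has_iwarp()/rdma_transport_is_iwarp() check;
 * sc_max_sge mimics the xprt's advertised maximum. */
static int rdma_read_max_sge(int is_iwarp, int sge_count, int sc_max_sge)
{
	if (is_iwarp)
		return 1; /* iWARP: only 1 target/sink SGE per RDMA READ */
	return sge_count < sc_max_sge ? sge_count : sc_max_sge; /* min_t */
}
```

This is why the helper is one of the few genuinely transport-specific checks in the ULP path: the cap comes from the iWARP protocol itself, not from a device attribute.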
Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
On Mar 30, 2015, at 10:18 AM, Steve Wise sw...@opengridcomputing.com wrote: Hey Chuck, Chelsio's QA regression tested this series on iw_cxgb4. Tests out good. Tests ran: spew, ffsb, xdd, fio, dbench, and cthon with both v3 and v4. Thanks, Steve. Who should I credit in the Tested-by tag? -- Chuck Lever chuck[dot]lever[at]oracle[dot]com
Re: [PATCH 01/11] IB/Verbs: Use helpers to check transport and link layer
On Mon, 2015-03-30 at 18:14 +0200, Michael Wang wrote: Hi, Doug Thanks for the comments :-) On 03/30/2015 05:56 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:40 +0100, Michael Wang wrote: We have so many places checking transport type and link layer type that it now makes sense to introduce some helpers to refine the lengthy code. This patch will introduce helpers: rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_port_ll_is_ib() rdma_port_ll_is_eth() and use them to save some code for us. If the end result is to do something like I proposed, then why take this intermediate step that just has to be backed out later? The problem is that I found there are still many places our new mechanism may not be able to cover, especially inside device drivers; this is just trying to collect the issues together as a base so we can gradually eliminate them. There is no gradually eliminate them to the suggestion I made. Remember, my suggestion was to remove the transport and link_layer items from the port settings and replace them with just one transport item that is a bitmask of the possible transport types. This can not be done gradually, it must be a complete change all at once as the two methods of setting things are incompatible. As there is only one out of tree driver that I know of, lustre, we can give them the information they need to make their driver work both before and after the change. Sure, if we finally do capture all the cases, we can just get rid of this one, but I guess it won't be that easy to directly jump into the next stage :-P As I can imagine, after this reform, the next stage could be introducing the new mechanism without changing device drivers, and the last stage is to ask vendors to adapt their code to the new mechanism. 
In other words, if our end goal is to have rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_transport_is_roce() rdma_transport_is_opa() Then we should skip doing rdma_port_ll_is_*() as the answers to these items would be implied by rdma_transport_is_roce() and such. Great if we achieved that ;-) but currently I just wondering maybe these helpers can only cover part of the cases where we check transport and link layer, there are still some cases we'll need the very rough helper to save some code and make things clean~ Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/agent.c | 2 +- drivers/infiniband/core/cm.c | 2 +- drivers/infiniband/core/cma.c | 27 --- drivers/infiniband/core/mad.c | 6 +++--- drivers/infiniband/core/multicast.c | 11 --- drivers/infiniband/core/sa_query.c| 14 +++--- drivers/infiniband/core/ucm.c | 3 +-- drivers/infiniband/core/user_mad.c| 2 +- drivers/infiniband/core/verbs.c | 5 ++--- drivers/infiniband/hw/mlx4/ah.c | 2 +- drivers/infiniband/hw/mlx4/cq.c | 4 +--- drivers/infiniband/hw/mlx4/mad.c | 14 -- drivers/infiniband/hw/mlx4/main.c | 8 +++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 +- drivers/infiniband/hw/mlx4/qp.c | 21 +++-- drivers/infiniband/hw/mlx4/sysfs.c| 6 ++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 6 +++--- include/rdma/ib_verbs.h | 24 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3 +-- 19 files changed, 79 insertions(+), 83 deletions(-) diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c index f6d2961..27f1bec 100644 --- a/drivers/infiniband/core/agent.c +++ b/drivers/infiniband/core/agent.c @@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num) goto error1; } -if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) { +if 
(rdma_port_ll_is_ib(device, port_num)) { /* Obtain send only MAD agent for SMI QP */ port_priv->agent[0] = ib_register_mad_agent(device, port_num, IB_QPT_SMI, NULL, 0, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index e28a494..2c72e9e 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3762,7 +3762,7 @@ static void cm_add_one(struct ib_device *ib_device) int ret; u8 i; -if (rdma_node_get_transport(ib_device->node_type) != RDMA_TRANSPORT_IB) +if (!rdma_transport_is_ib(ib_device)) return; cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) * diff --git
Re: [RFC PATCH 02/11] IB/Verbs: Use management helper tech_iboe() for iboe-check
On 03/30/2015 06:17 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:42 +0100, Michael Wang wrote: Introduce helper tech_iboe() to help us check if the port of an IB device is using RoCE/IBoE technology. Just use rdma_transport_is_roce() instead. Sounds good :-) will be in the next version. Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/cma.c | 6 ++ include/rdma/ib_verbs.h | 16 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 668e955..280cfe3 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -375,8 +375,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv->id.port_num) == dev_ll) { cma_dev = listen_id_priv->cma_dev; port = listen_id_priv->id.port_num; -if (rdma_transport_is_ib(cma_dev->device) && - rdma_port_ll_is_eth(cma_dev->device, port)) +if (tech_iboe(cma_dev->device, port)) ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL); else @@ -395,8 +394,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv->id.port_num == port) continue; if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) { -if (rdma_transport_is_ib(cma_dev->device) && - rdma_port_ll_is_eth(cma_dev->device, port)) +if (tech_iboe(cma_dev->device, port)) ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL); else ret = ib_find_cached_gid(cma_dev->device, &gid, &found_port, NULL); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 2bf9094..ca6d6bc 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1767,6 +1767,22 @@ static inline int rdma_port_ll_is_eth(struct ib_device *device, u8 port_num) == IB_LINK_LAYER_ETHERNET; } +/** + * tech_iboe - Check if the port of device using 
technology + * RoCE/IBoE. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device is not using technology + * RoCE/IBoE. + */ +static inline int tech_iboe(struct ib_device *device, u8 port_num) +{ +return rdma_transport_is_ib(device) && + rdma_port_ll_is_eth(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid);
Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and cap_mcast() for mcast-check
On 03/30/2015 06:11 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:46 +0100, Michael Wang wrote: Introduce helper has_mcast() and cap_mcast() to help us check if an IB device or its port supports Multicast. This probably needs rewording or rethinking. In truth, *all* rdma devices are multicast capable. *BUT*, IB/OPA devices require multicast registration done the IB way (including for sendonly multicast sends), while Ethernet devices do multicast the Ethernet way. These tests are really just for IB specific multicast registration and deregistration. Calling them has_mcast() and cap_mcast() is incorrect. Thanks for the explanation :-) Jason also mentioned we should use cap_ib_XX() instead; I'll use that naming so we can distinguish the management between Eth and IB/OPA. Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/multicast.c | 8 include/rdma/ib_verbs.h | 28 3 files changed, 33 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 276fb76..cbbc85b 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -3398,7 +3398,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) ib_detach_mcast(id->qp, &mc->multicast.ib->rec.mgid, be16_to_cpu(mc->multicast.ib->rec.mlid)); -if (rdma_transport_is_ib(id_priv->cma_dev->device)) { +if (has_mcast(id_priv->cma_dev->device)) { switch (rdma_port_get_link_layer(id->device, id->port_num)) { case IB_LINK_LAYER_INFINIBAND: ib_sa_free_multicast(mc->multicast.ib); diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c index 17573ff..ffeaf27 100644 --- a/drivers/infiniband/core/multicast.c +++ b/drivers/infiniband/core/multicast.c @@ -780,7 +780,7 @@ static void 
mcast_event_handler(struct ib_event_handler *handler, int index; dev = container_of(handler, struct mcast_device, event_handler); -if (!rdma_port_ll_is_ib(dev-device, event-element.port_num)) +if (!cap_mcast(dev-device, event-element.port_num)) return; index = event-element.port_num - dev-start_port; @@ -807,7 +807,7 @@ static void mcast_add_one(struct ib_device *device) int i; int count = 0; -if (!rdma_transport_is_ib(device)) +if (!has_mcast(device)) return; dev = kmalloc(sizeof *dev + device-phys_port_cnt * sizeof *port, @@ -823,7 +823,7 @@ static void mcast_add_one(struct ib_device *device) } for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (!rdma_port_ll_is_ib(device, dev-start_port + i)) +if (!cap_mcast(device, dev-start_port + i)) continue; port = dev-port[i]; port-dev = dev; @@ -861,7 +861,7 @@ static void mcast_remove_one(struct ib_device *device) flush_workqueue(mcast_wq); for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (rdma_port_ll_is_ib(device, dev-start_port + i)) { +if (cap_mcast(device, dev-start_port + i)) { port = dev-port[i]; deref_port(port); wait_for_completion(port-comp); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index fa8ffa3..e796104 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1823,6 +1823,19 @@ static inline int has_sa(struct ib_device *device) } /** + * has_mcast - Check if a device support Multicast. + * + * @device: Device to be checked + * + * Return 0 when a device has none port to support + * Multicast. + */ +static inline int has_mcast(struct ib_device *device) +{ +return rdma_transport_is_ib(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * @@ -1852,6 +1865,21 @@ static inline int cap_sa(struct ib_device *device, u8 port_num) return rdma_port_ll_is_ib(device, port_num); } +/** + * cap_mcast - Check if the port of device has the capability + * Multicast. 
+ * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device doesn't support + * Multicast. + */ +static inline int cap_mcast(struct ib_device *device, u8 port_num) +{ +return rdma_port_ll_is_ib(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid);
Re: [PATCH v1 00/10] Add network namespace support in the RDMA-CM
On Thu, Mar 19, 2015 at 5:12 PM, Or Gerlitz ogerl...@mellanox.com wrote: On 2/17/2015 5:53 PM, Or Gerlitz wrote: On 02/11/2015 05:06 PM, Shachar Raindel wrote: This patchset allows using network namespaces with the RDMA-CM. Each RDMA-CM and CM id keeps a reference to a network namespace. [...] Hi Sean, Did you have the chance to look at the patches that do the changes to the cm and cma code? Sean, ping... you are the maintainer of the rdma-cm, these patches have been here for many weeks, can you please take a look and provide your feedback. Sean, PING. Busy as you may be, the upstream cm/rdma-cm maintainer hat is sitting solid on your head, and it makes no sense for people doing development to submit patches touching these layers and get no feedback from you for months, no? Or.
Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa() for sa-check
On Mon, 2015-03-30 at 18:42 +0200, Michael Wang wrote: On 03/30/2015 06:16 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:46 +0100, Michael Wang wrote: Introduce helper has_sa() and cap_sa() to help us check if an IB device or its port supports Subnet Administrator. There's no functional reason to have both rdma_transport_is_ib and rdma_port_ll_is_ib, just use one. Then there is also no reason for both has_sa and cap_sa. Just use one. The has_sa() will be eliminated :-) rdma_transport_is_ib and rdma_port_ll_is_ib are actually just rough helpers to save some code; we can get rid of them when we no longer need them, but currently device drivers are still using them a lot, and I'm not sure if the new mechanism can cover all these cases... Sure it would. This is what I had suggested (well, close to this, I rearranged the order this time around): enum rdma_transport { RDMA_TRANSPORT_IB = 0x01, RDMA_TRANSPORT_OPA = 0x02, RDMA_TRANSPORT_IWARP = 0x04, RDMA_TRANSPORT_ROCE_V1 = 0x08, RDMA_TRANSPORT_ROCE_V2 = 0x10, }; struct ib_port { ... enum rdma_transport; ... }; static inline bool rdma_transport_is_ib(struct ib_port *port) { return port->transport & (RDMA_TRANSPORT_IB | RDMA_TRANSPORT_OPA); } static inline bool rdma_transport_is_opa(struct ib_port *port) { return port->transport & RDMA_TRANSPORT_OPA; } static inline bool rdma_transport_is_iwarp(struct ib_port *port) { return port->transport & RDMA_TRANSPORT_IWARP; } static inline bool rdma_transport_is_roce(struct ib_port *port) { return port->transport & (RDMA_TRANSPORT_ROCE_V1 | RDMA_TRANSPORT_ROCE_V2); } static inline bool rdma_ib_mgmt(struct ib_port *port) { return port->transport & (RDMA_TRANSPORT_IB | RDMA_TRANSPORT_OPA); } static inline bool rdma_opa_mgmt(struct ib_port *port) { return port->transport & RDMA_TRANSPORT_OPA; } If we use something like this, then the above is all you need. Then every place in the code that checks for something like has_sa or cap_sa can be replaced with rdma_ib_mgmt. 
When Ira updates his patches for this, he can check for rdma_opa_mgmt to enable jumbo MAD packets and whatever else he needs. Every place that does transport == IB and ll == Ethernet can become rdma_transport_is_roce. Every place that does transport == IB and ll == INFINIBAND becomes rdma_transport_is_ib. The code in multicast.c just needs to check rdma_ib_mgmt() (which happens to make perfect sense anyway as the code in multicast.c that is checking that we are on an IB interface is doing so because IB requires extra management of the multicast group joins/leaves). But, like I said, this is an all or nothing change, it isn't something we can ease into. -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD
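The bitmask scheme sketched in the mail above can be condensed into a compilable form. This is a sketch of the proposal, not the eventual upstream API; the helpers are exactly the grouped-mask tests quoted in the thread, applied to a minimal stand-in `struct ib_port`:

```c
#include <assert.h>
#include <stdbool.h>

/* Compilable condensation of Doug's proposal: one per-port transport
 * bitmask, with the grouped helpers derived from it. */
enum rdma_transport {
	RDMA_TRANSPORT_IB      = 0x01,
	RDMA_TRANSPORT_OPA     = 0x02,
	RDMA_TRANSPORT_IWARP   = 0x04,
	RDMA_TRANSPORT_ROCE_V1 = 0x08,
	RDMA_TRANSPORT_ROCE_V2 = 0x10,
};

struct ib_port {
	enum rdma_transport transport; /* exactly one bit set per port */
};

static bool rdma_transport_is_ib(struct ib_port *port)
{
	return port->transport & (RDMA_TRANSPORT_IB | RDMA_TRANSPORT_OPA);
}

static bool rdma_transport_is_roce(struct ib_port *port)
{
	return port->transport & (RDMA_TRANSPORT_ROCE_V1 | RDMA_TRANSPORT_ROCE_V2);
}

/* IB-style management (SA queries, MAD-based multicast join/leave)
 * applies to IB and OPA ports; RoCE and iWARP ports answer false. */
static bool rdma_ib_mgmt(struct ib_port *port)
{
	return port->transport & (RDMA_TRANSPORT_IB | RDMA_TRANSPORT_OPA);
}
```

The point of the grouping is visible in the masks: a single transport field answers both the "which wire protocol" question and the "which management style" question, so separate has_sa/cap_sa style helpers become unnecessary.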
Re: [PATCH for-next 3/9] net/mlx4_core: Set initial admin GUIDs for VFs
On Sun, Mar 29, 2015 at 04:51:27PM +0300, Or Gerlitz wrote: +void mlx4_set_random_admin_guid(struct mlx4_dev *dev, int entry, int port) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + u8 random_mac[6]; + char *raw_gid; + + /* hw GUID */ + if (entry == 0) + return; + + eth_random_addr(random_mac); + raw_gid = (char *)priv->mfunc.master.vf_admin[entry].vport[port].guid; raw_gid is actually a guid + raw_gid[0] = random_mac[0] ^ 2; eth_random_addr already guarantees the ULA bit is set to one (local), so this is wrong. IBA uses the EUI-64 system, not the IPv6 modification. + raw_gid[1] = random_mac[1]; + raw_gid[2] = random_mac[2]; + raw_gid[3] = 0xff; + raw_gid[4] = 0xfe; This should be 0xff for mapping a MAC to an EUI-64 But, it doesn't really make sense to use eth_random_addr (which doesn't have a special OUI) and not randomize every bit. get_random_bytes(&guid, sizeof(guid)); guid &= ~(1ULL << 56); guid |= 1ULL << 57; I also don't think the kernel should be generating random GUIDs. Either the SA should be consulted to do this, or the management stack should generate a cloud wide unique number. Jason
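The EUI-48 to EUI-64 expansion Jason is pointing at can be shown in isolation. This is an illustrative sketch, not the mlx4 patch: per IEEE, a 48-bit MAC is stretched to 64 bits by inserting 0xFF,0xFF in the middle (the 0xFF,0xFE form is the *modified* EUI-64 used by IPv6, which the review says does not apply to IBA GUIDs), and XOR-ing bit 0x02 of the first octet toggles the universal/local flag.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the MAC -> EUI-64 mapping discussed above.
 * eui[3]/eui[4] get the IEEE 0xFF,0xFF insertion (not 0xFF,0xFE,
 * which is the IPv6 "modified EUI-64" form the review rejects). */
static void mac_to_eui64(const uint8_t mac[6], uint8_t eui[8])
{
	eui[0] = mac[0] ^ 0x02;   /* toggle the universal/local bit */
	eui[1] = mac[1];
	eui[2] = mac[2];
	eui[3] = 0xff;            /* IEEE EUI-64 insertion... */
	eui[4] = 0xff;            /* ...both bytes are 0xff */
	eui[5] = mac[3];
	eui[6] = mac[4];
	eui[7] = mac[5];
}
```

The separate objection in the mail still stands independently of this mapping: if the source MAC is random anyway, deriving a GUID from it buys nothing over randomizing all 64 bits directly (with the universal/local semantics set explicitly).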
Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa() for sa-check
On 03/30/2015 06:16 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:46 +0100, Michael Wang wrote: Introduce helper has_sa() and cap_sa() to help us check if an IB device or it's port support Subnet Administrator. There's no functional reason to have both rdma_transport_is_ib and rdma_port_ll_is_ib, just use one. Then there is also no reason for both has_sa and cap_sa. Just use one. The has_sa() will be eliminated :-) rdma_transport_is_ib and rdma_port_ll_is_ib is actually just rough helper to save some code, we can get rid of them when we no longer need them, but currently device driver still using them a lot, I'm not sure if the new mechanism could take cover all these cases... Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/sa_query.c | 12 ++-- include/rdma/ib_verbs.h| 28 2 files changed, 34 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index d95d25f..89c27da 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -450,7 +450,7 @@ static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event struct ib_sa_port *port = sa_dev-port[event-element.port_num - sa_dev-start_port]; -if (!rdma_port_ll_is_ib(handler-device, port-port_num)) +if (!cap_sa(handler-device, port-port_num)) return; spin_lock_irqsave(port-ah_lock, flags); @@ -1154,7 +1154,7 @@ static void ib_sa_add_one(struct ib_device *device) struct ib_sa_device *sa_dev; int s, e, i; -if (!rdma_transport_is_ib(device)) +if (!has_sa(device)) return; if (device-node_type == RDMA_NODE_IB_SWITCH) @@ -1175,7 +1175,7 @@ static void ib_sa_add_one(struct ib_device *device) for (i = 0; i = e - s; ++i) { spin_lock_init(sa_dev-port[i].ah_lock); -if (!rdma_port_ll_is_ib(device, i + 1)) +if 
(!cap_sa(device, i + 1)) continue; sa_dev-port[i].sm_ah= NULL; @@ -1205,14 +1205,14 @@ static void ib_sa_add_one(struct ib_device *device) goto err; for (i = 0; i = e - s; ++i) -if (rdma_port_ll_is_ib(device, i + 1)) +if (cap_sa(device, i + 1)) update_sm_ah(sa_dev-port[i].update_task); return; err: while (--i = 0) -if (rdma_port_ll_is_ib(device, i + 1)) +if (cap_sa(device, i + 1)) ib_unregister_mad_agent(sa_dev-port[i].agent); kfree(sa_dev); @@ -1233,7 +1233,7 @@ static void ib_sa_remove_one(struct ib_device *device) flush_workqueue(ib_wq); for (i = 0; i = sa_dev-end_port - sa_dev-start_port; ++i) { -if (rdma_port_ll_is_ib(device, i + 1)) { +if (cap_sa(device, i + 1)) { ib_unregister_mad_agent(sa_dev-port[i].agent); if (sa_dev-port[i].sm_ah) kref_put(sa_dev-port[i].sm_ah-ref, free_sm_ah); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index c0a63f8..fa8ffa3 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1810,6 +1810,19 @@ static inline int has_cm(struct ib_device *device) } /** + * has_sa - Check if a device support Subnet Administrator. + * + * @device: Device to be checked + * + * Return 0 when a device has none port to support + * Subnet Administrator. + */ +static inline int has_sa(struct ib_device *device) +{ +return rdma_transport_is_ib(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * @@ -1824,6 +1837,21 @@ static inline int cap_smi(struct ib_device *device, u8 port_num) return rdma_port_ll_is_ib(device, port_num); } +/** + * cap_sa - Check if the port of device has the capability + * Subnet Administrator. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device don't support + * Subnet Administrator. 
+ */ +static inline int cap_sa(struct ib_device *device, u8 port_num) +{ +return rdma_port_ll_is_ib(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid);
Re: [PATCH 01/11] IB/Verbs: Use helpers to check transport and link layer
On 03/30/2015 06:22 PM, Doug Ledford wrote: On Mon, 2015-03-30 at 18:14 +0200, Michael Wang wrote: [snip] There is no gradually eliminate them to the suggestion I made. Remember, my suggestion was to remove the transport and link_layer items from the port settings and replace them with just one transport item that is a bitmask of the possible transport types. This can not be done gradually, it must be a complete change all at once as the two methods of setting things are incompatible. As there is only one out of tree driver that I know of, lustre, we can give them the information they need to make their driver work both before and after the change. Actually there is something that confuses me about transport and link layer here; basically we have defined: transport type RDMA_TRANSPORT_IB, RDMA_TRANSPORT_IWARP, RDMA_TRANSPORT_USNIC, RDMA_TRANSPORT_USNIC_UDP link layer IB_LINK_LAYER_INFINIBAND, IB_LINK_LAYER_ETHERNET, So we could have a table:

                    LL_INFINIBAND   LL_ETHERNET   UNCARE
   TRANSPORT_IB           1              2           3
   TRANSPORT_IWARP                                   4
   UNCARE                 5              6

In the current implementation I've found all these combinations in core or driver code, and I can see: rdma_transport_is_ib() covers 1, rdma_transport_is_iwarp() covers 4, rdma_transport_is_roce() covers 2. I'm just confused about how to take care of combinations 3, 5 and 6? Regards, Michael Wang Sure, if we finally do capture all the cases, we can just get rid of this one, but I guess it won't be that easy to directly jump into the next stage :-P As I can imagine, after this reform, the next stage could be introducing the new mechanism without changing device drivers, and the last stage is to ask vendors to adapt their code to the new mechanism. In other words, if our end goal is to have rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_transport_is_roce() rdma_transport_is_opa() Then we should skip doing rdma_port_ll_is_*() as the answers to these items would be implied by rdma_transport_is_roce() and such. 
Great if we achieved that ;-) but currently I just wondering maybe these helpers can only cover part of the cases where we check transport and link layer, there are still some cases we'll need the very rough helper to save some code and make things clean~ Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/agent.c | 2 +- drivers/infiniband/core/cm.c | 2 +- drivers/infiniband/core/cma.c | 27 --- drivers/infiniband/core/mad.c | 6 +++--- drivers/infiniband/core/multicast.c | 11 --- drivers/infiniband/core/sa_query.c| 14 +++--- drivers/infiniband/core/ucm.c | 3 +-- drivers/infiniband/core/user_mad.c| 2 +- drivers/infiniband/core/verbs.c | 5 ++--- drivers/infiniband/hw/mlx4/ah.c | 2 +- drivers/infiniband/hw/mlx4/cq.c | 4 +--- drivers/infiniband/hw/mlx4/mad.c | 14 -- drivers/infiniband/hw/mlx4/main.c | 8 +++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 +- drivers/infiniband/hw/mlx4/qp.c | 21 +++-- drivers/infiniband/hw/mlx4/sysfs.c| 6 ++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 6 +++--- include/rdma/ib_verbs.h | 24 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3 +-- 19 files changed, 79 insertions(+), 83 deletions(-) diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c index f6d2961..27f1bec 100644 --- a/drivers/infiniband/core/agent.c +++ b/drivers/infiniband/core/agent.c @@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num) goto error1; } -if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) { +if (rdma_port_ll_is_ib(device, port_num)) { /* Obtain send only MAD agent for SMI QP */ port_priv-agent[0] = ib_register_mad_agent(device, port_num, IB_QPT_SMI, NULL, 0, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index e28a494..2c72e9e 100644 --- 
a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3762,7 +3762,7 @@ static void cm_add_one(struct ib_device *ib_device) int ret; u8 i; -if (rdma_node_get_transport(ib_device-node_type) != RDMA_TRANSPORT_IB) +if (!rdma_transport_is_ib(ib_device))
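Doug's bitmask proposal above can be sketched in userspace C. This is a hypothetical illustration, not the kernel's actual API: the RDMA_TRANSPORT_* bit values and the `struct port` layout are invented for the example. The point is that one per-port capability bitmask answers both the transport question and the link-layer question, which is why the `rdma_port_ll_is_*()` helpers become redundant.

```c
#include <stdint.h>

/* Illustrative bit definitions -- not the kernel's real values. */
enum {
	RDMA_TRANSPORT_IB    = 1 << 0,	/* IB transport on an IB link */
	RDMA_TRANSPORT_IWARP = 1 << 1,	/* iWARP (always Ethernet) */
	RDMA_TRANSPORT_ROCE  = 1 << 2,	/* IB transport over Ethernet */
	RDMA_TRANSPORT_OPA   = 1 << 3,	/* Omni-Path */
};

struct port {
	uint32_t transport;	/* one bitmask replaces transport + link_layer */
};

static inline int rdma_transport_is_ib(const struct port *p)
{
	return p->transport & RDMA_TRANSPORT_IB;
}

static inline int rdma_transport_is_iwarp(const struct port *p)
{
	return p->transport & RDMA_TRANSPORT_IWARP;
}

static inline int rdma_transport_is_roce(const struct port *p)
{
	return p->transport & RDMA_TRANSPORT_ROCE;
}

/* The link layer is implied by the transport bits: RoCE and iWARP
 * run over Ethernet; plain IB transport implies an InfiniBand link. */
static inline int rdma_port_ll_is_eth(const struct port *p)
{
	return p->transport & (RDMA_TRANSPORT_ROCE | RDMA_TRANSPORT_IWARP);
}
```

With this shape, combination 2 in Michael's table (IB transport, Ethernet link) is simply the RoCE bit, so no separate link-layer query is needed.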
[PATCH v3 07/15] xprtrdma: Add a max_payload op for each memreg mode
The max_payload computation is generalized to ensure that the payload maximum is the lesser of RPC_MAX_DATA_SEGS and the number of data segments that can be transmitted in an inline buffer. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 13 ++ net/sunrpc/xprtrdma/frwr_ops.c | 13 ++ net/sunrpc/xprtrdma/physical_ops.c | 10 +++ net/sunrpc/xprtrdma/transport.c|5 +++- net/sunrpc/xprtrdma/verbs.c| 49 +++- net/sunrpc/xprtrdma/xprt_rdma.h|5 +++- 6 files changed, 59 insertions(+), 36 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index ffb7d93..eec2660 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -17,6 +17,19 @@ # define RPCDBG_FACILITY RPCDBG_TRANS #endif +/* Maximum scatter/gather per FMR */ +#define RPCRDMA_MAX_FMR_SGES (64) + +/* FMR mode conveys up to 64 pages of payload per chunk segment. + */ +static size_t +fmr_op_maxpages(struct rpcrdma_xprt *r_xprt) +{ + return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS, +rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES); +} + const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { + .ro_maxpages= fmr_op_maxpages, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 79173f9..73a5ac8 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -17,6 +17,19 @@ # define RPCDBG_FACILITY RPCDBG_TRANS #endif +/* FRWR mode conveys a list of pages per chunk segment. The + * maximum length of that list is the FRWR page list depth. 
+ */ +static size_t +frwr_op_maxpages(struct rpcrdma_xprt *r_xprt) +{ + struct rpcrdma_ia *ia = r_xprt-rx_ia; + + return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS, +rpcrdma_max_segments(r_xprt) * ia-ri_max_frmr_depth); +} + const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = { + .ro_maxpages= frwr_op_maxpages, .ro_displayname = frwr, }; diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c index b0922ac..28ade19 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -19,6 +19,16 @@ # define RPCDBG_FACILITY RPCDBG_TRANS #endif +/* PHYSICAL memory registration conveys one page per chunk segment. + */ +static size_t +physical_op_maxpages(struct rpcrdma_xprt *r_xprt) +{ + return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS, +rpcrdma_max_segments(r_xprt)); +} + const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { + .ro_maxpages= physical_op_maxpages, .ro_displayname = physical, }; diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 97f6562..da71a24 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -406,7 +406,10 @@ xprt_setup_rdma(struct xprt_create *args) xprt_rdma_connect_worker); xprt_rdma_format_addresses(xprt); - xprt-max_payload = rpcrdma_max_payload(new_xprt); + xprt-max_payload = new_xprt-rx_ia.ri_ops-ro_maxpages(new_xprt); + if (xprt-max_payload == 0) + goto out4; + xprt-max_payload = PAGE_SHIFT; dprintk(RPC: %s: transport data payload maximum: %zu bytes\n, __func__, xprt-max_payload); diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index c3319e1..da55cda 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -2212,43 +2212,24 @@ rpcrdma_ep_post_recv(struct rpcrdma_ia *ia, return rc; } -/* Physical mapping means one Read/Write list entry per-page. 
- * All list entries must fit within an inline buffer - * - * NB: The server must return a Write list for NFS READ, - * which has the same constraint. Factor in the inline - * rsize as well. +/* How many chunk list items fit within our inline buffers? */ -static size_t -rpcrdma_physical_max_payload(struct rpcrdma_xprt *r_xprt) +unsigned int +rpcrdma_max_segments(struct rpcrdma_xprt *r_xprt) { struct rpcrdma_create_data_internal *cdata = r_xprt-rx_data; - unsigned int inline_size, pages; - - inline_size = min_t(unsigned int, - cdata-inline_wsize, cdata-inline_rsize); - inline_size -= RPCRDMA_HDRLEN_MIN; - pages = inline_size / sizeof(struct rpcrdma_segment); - return pages PAGE_SHIFT; -} + int bytes, segments; -static size_t
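The ops-table dispatch introduced by this series can be modeled in standalone C. This is a simplified sketch: `rx_max_segments` stands in for the kernel's `rpcrdma_max_segments()` result, and the constants mirror (but are not guaranteed to match) `RPCRDMA_MAX_DATA_SEGS` and `RPCRDMA_MAX_FMR_SGES`. It shows how generic code computes the payload maximum through a per-mode `ro_maxpages` pointer instead of switching on the registration strategy.

```c
#include <stddef.h>

#define RPCRDMA_MAX_DATA_SEGS	64	/* illustrative value */
#define RPCRDMA_MAX_FMR_SGES	64	/* max scatter/gather per FMR */

struct rpcrdma_xprt;

struct rpcrdma_memreg_ops {
	size_t (*ro_maxpages)(struct rpcrdma_xprt *);
	const char *ro_displayname;
};

struct rpcrdma_xprt {
	const struct rpcrdma_memreg_ops *rx_ops;
	unsigned int rx_max_segments;	/* chunk list items per inline buffer */
};

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* FMR mode conveys up to 64 pages of payload per chunk segment. */
static size_t fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
{
	return min_size(RPCRDMA_MAX_DATA_SEGS,
			r_xprt->rx_max_segments * RPCRDMA_MAX_FMR_SGES);
}

/* PHYSICAL registration conveys exactly one page per chunk segment. */
static size_t physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
{
	return min_size(RPCRDMA_MAX_DATA_SEGS, r_xprt->rx_max_segments);
}

static const struct rpcrdma_memreg_ops fmr_ops = {
	.ro_maxpages	= fmr_op_maxpages,
	.ro_displayname	= "fmr",
};

static const struct rpcrdma_memreg_ops physical_ops = {
	.ro_maxpages	= physical_op_maxpages,
	.ro_displayname	= "physical",
};

/* Generic transport code dispatches without knowing the mode. */
static size_t max_payload_pages(struct rpcrdma_xprt *r_xprt)
{
	return r_xprt->rx_ops->ro_maxpages(r_xprt);
}
```

The caller then shifts the page count by PAGE_SHIFT to get the byte payload, as `xprt_setup_rdma()` does in the patch.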
[PATCH v3 09/15] xprtrdma: Add a deregister_external op for each memreg mode
There is very little common processing among the different external memory deregistration functions. Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 27 net/sunrpc/xprtrdma/frwr_ops.c | 36 net/sunrpc/xprtrdma/physical_ops.c | 10 net/sunrpc/xprtrdma/rpc_rdma.c | 11 +++-- net/sunrpc/xprtrdma/transport.c|4 +- net/sunrpc/xprtrdma/verbs.c| 81 net/sunrpc/xprtrdma/xprt_rdma.h|5 +- 7 files changed, 84 insertions(+), 90 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index 45fb646..888aa10 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -79,8 +79,35 @@ out_maperr: return rc; } +/* Use the ib_unmap_fmr() verb to prevent further remote + * access via RDMA READ or RDMA WRITE. + */ +static int +fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) +{ + struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct rpcrdma_mr_seg *seg1 = seg; + int rc, nsegs = seg-mr_nsegs; + LIST_HEAD(l); + + list_add(seg1-rl_mw-r.fmr-list, l); + rc = ib_unmap_fmr(l); + read_lock(ia-ri_qplock); + while (seg1-mr_nsegs--) + rpcrdma_unmap_one(ia, seg++); + read_unlock(ia-ri_qplock); + if (rc) + goto out_err; + return nsegs; + +out_err: + dprintk(RPC: %s: ib_unmap_fmr status %i\n, __func__, rc); + return nsegs; +} + const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { .ro_map = fmr_op_map, + .ro_unmap = fmr_op_unmap, .ro_maxpages= fmr_op_maxpages, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 23e4d99..35b725b 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -110,8 +110,44 @@ out_senderr: return rc; } +/* Post a LOCAL_INV Work Request to prevent further remote access + * via RDMA READ or RDMA WRITE. 
+ */ +static int +frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) +{ + struct rpcrdma_mr_seg *seg1 = seg; + struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct ib_send_wr invalidate_wr, *bad_wr; + int rc, nsegs = seg-mr_nsegs; + + seg1-rl_mw-r.frmr.fr_state = FRMR_IS_INVALID; + + memset(invalidate_wr, 0, sizeof(invalidate_wr)); + invalidate_wr.wr_id = (unsigned long)(void *)seg1-rl_mw; + invalidate_wr.opcode = IB_WR_LOCAL_INV; + invalidate_wr.ex.invalidate_rkey = seg1-rl_mw-r.frmr.fr_mr-rkey; + DECR_CQCOUNT(r_xprt-rx_ep); + + read_lock(ia-ri_qplock); + while (seg1-mr_nsegs--) + rpcrdma_unmap_one(ia, seg++); + rc = ib_post_send(ia-ri_id-qp, invalidate_wr, bad_wr); + read_unlock(ia-ri_qplock); + if (rc) + goto out_err; + return nsegs; + +out_err: + /* Force rpcrdma_buffer_get() to retry */ + seg1-rl_mw-r.frmr.fr_state = FRMR_IS_STALE; + dprintk(RPC: %s: ib_post_send status %i\n, __func__, rc); + return nsegs; +} + const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = { .ro_map = frwr_op_map, + .ro_unmap = frwr_op_unmap, .ro_maxpages= frwr_op_maxpages, .ro_displayname = frwr, }; diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c index 5a284ee..5b5a63a 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -44,8 +44,18 @@ physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, return 1; } +/* Unmap a memory region, but leave it registered. 
+ */ +static int +physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) +{ + rpcrdma_unmap_one(r_xprt-rx_ia, seg); + return 1; +} + const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { .ro_map = physical_op_map, + .ro_unmap = physical_op_unmap, .ro_maxpages= physical_op_maxpages, .ro_displayname = physical, }; diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c index 6ab1d03..2c53ea9 100644 --- a/net/sunrpc/xprtrdma/rpc_rdma.c +++ b/net/sunrpc/xprtrdma/rpc_rdma.c @@ -284,11 +284,12 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target, return (unsigned char *)iptr - (unsigned char *)headerp; out: - if (r_xprt-rx_ia.ri_memreg_strategy != RPCRDMA_FRMR) { - for (pos
[PATCH v3 11/15] xprtrdma: Add reset MRs memreg op
This method is invoked when a transport instance is about to be reconnected. Each Memory Region object is reset to its initial state. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 23 net/sunrpc/xprtrdma/frwr_ops.c | 51 ++ net/sunrpc/xprtrdma/physical_ops.c |6 ++ net/sunrpc/xprtrdma/verbs.c| 103 +--- net/sunrpc/xprtrdma/xprt_rdma.h|1 5 files changed, 83 insertions(+), 101 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index 825ce96..93261b0 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -146,10 +146,33 @@ out_err: return nsegs; } +/* After a disconnect, unmap all FMRs. + * + * This is invoked only in the transport connect worker in order + * to serialize with rpcrdma_register_fmr_external(). 
+ */ +static void +fmr_op_reset(struct rpcrdma_xprt *r_xprt) +{ + struct rpcrdma_buffer *buf = r_xprt-rx_buf; + struct rpcrdma_mw *r; + LIST_HEAD(list); + int rc; + + list_for_each_entry(r, buf-rb_all, mw_all) + list_add(r-r.fmr-list, list); + + rc = ib_unmap_fmr(list); + if (rc) + dprintk(RPC: %s: ib_unmap_fmr failed %i\n, + __func__, rc); +} + const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { .ro_map = fmr_op_map, .ro_unmap = fmr_op_unmap, .ro_maxpages= fmr_op_maxpages, .ro_init= fmr_op_init, + .ro_reset = fmr_op_reset, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 9168c15..c2bb29d 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -46,6 +46,18 @@ out_list_err: return rc; } +static void +__frwr_release(struct rpcrdma_mw *r) +{ + int rc; + + rc = ib_dereg_mr(r-r.frmr.fr_mr); + if (rc) + dprintk(RPC: %s: ib_dereg_mr status %i\n, + __func__, rc); + ib_free_fast_reg_page_list(r-r.frmr.fr_pgl); +} + /* FRWR mode conveys a list of pages per chunk segment. The * maximum length of that list is the FRWR page list depth. */ @@ -210,10 +222,49 @@ out_err: return nsegs; } +/* After a disconnect, a flushed FAST_REG_MR can leave an FRMR in + * an unusable state. Find FRMRs in this state and dereg / reg + * each. FRMRs that are VALID and attached to an rpcrdma_req are + * also torn down. + * + * This gives all in-use FRMRs a fresh rkey and leaves them INVALID. + * + * This is invoked only in the transport connect worker in order + * to serialize with rpcrdma_register_frmr_external(). 
+ */ +static void +frwr_op_reset(struct rpcrdma_xprt *r_xprt) +{ + struct rpcrdma_buffer *buf = r_xprt-rx_buf; + struct ib_device *device = r_xprt-rx_ia.ri_id-device; + unsigned int depth = r_xprt-rx_ia.ri_max_frmr_depth; + struct ib_pd *pd = r_xprt-rx_ia.ri_pd; + struct rpcrdma_mw *r; + int rc; + + list_for_each_entry(r, buf-rb_all, mw_all) { + if (r-r.frmr.fr_state == FRMR_IS_INVALID) + continue; + + __frwr_release(r); + rc = __frwr_init(r, pd, device, depth); + if (rc) { + dprintk(RPC: %s: mw %p left %s\n, + __func__, r, + (r-r.frmr.fr_state == FRMR_IS_STALE ? + stale : valid)); + continue; + } + + r-r.frmr.fr_state = FRMR_IS_INVALID; + } +} + const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = { .ro_map = frwr_op_map, .ro_unmap = frwr_op_unmap, .ro_maxpages= frwr_op_maxpages, .ro_init= frwr_op_init, + .ro_reset = frwr_op_reset, .ro_displayname = frwr, }; diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c index c372051..e060713 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -59,10 +59,16 @@ physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) return 1; } +static void +physical_op_reset(struct rpcrdma_xprt *r_xprt) +{ +} + const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { .ro_map = physical_op_map, .ro_unmap
[PATCH v3 10/15] xprtrdma: Add init MRs memreg op
This method is used when setting up a new transport instance to create a pool of Memory Region objects that will be used to register memory during operation. Memory Regions are not needed for physical registration, since -prepare and -release are no-ops for that mode. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 42 +++ net/sunrpc/xprtrdma/frwr_ops.c | 66 +++ net/sunrpc/xprtrdma/physical_ops.c |7 ++ net/sunrpc/xprtrdma/verbs.c| 104 +--- net/sunrpc/xprtrdma/xprt_rdma.h|1 5 files changed, 119 insertions(+), 101 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index 888aa10..825ce96 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -29,6 +29,47 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt) rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES); } +static int +fmr_op_init(struct rpcrdma_xprt *r_xprt) +{ + struct rpcrdma_buffer *buf = r_xprt-rx_buf; + int mr_access_flags = IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ; + struct ib_fmr_attr fmr_attr = { + .max_pages = RPCRDMA_MAX_FMR_SGES, + .max_maps = 1, + .page_shift = PAGE_SHIFT + }; + struct ib_pd *pd = r_xprt-rx_ia.ri_pd; + struct rpcrdma_mw *r; + int i, rc; + + INIT_LIST_HEAD(buf-rb_mws); + INIT_LIST_HEAD(buf-rb_all); + + i = (buf-rb_max_requests + 1) * RPCRDMA_MAX_SEGS; + dprintk(RPC: %s: initalizing %d FMRs\n, __func__, i); + + while (i--) { + r = kzalloc(sizeof(*r), GFP_KERNEL); + if (!r) + return -ENOMEM; + + r-r.fmr = ib_alloc_fmr(pd, mr_access_flags, fmr_attr); + if (IS_ERR(r-r.fmr)) + goto out_fmr_err; + + list_add(r-mw_list, buf-rb_mws); + list_add(r-mw_all, buf-rb_all); + } + return 0; + +out_fmr_err: + rc = PTR_ERR(r-r.fmr); + dprintk(RPC: %s: ib_alloc_fmr status %i\n, __func__, rc); + 
kfree(r); + return rc; +} + /* Use the ib_map_phys_fmr() verb to register a memory region * for remote access via RDMA READ or RDMA WRITE. */ @@ -109,5 +150,6 @@ const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { .ro_map = fmr_op_map, .ro_unmap = fmr_op_unmap, .ro_maxpages= fmr_op_maxpages, + .ro_init= fmr_op_init, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 35b725b..9168c15 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -17,6 +17,35 @@ # define RPCDBG_FACILITY RPCDBG_TRANS #endif +static int +__frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device, + unsigned int depth) +{ + struct rpcrdma_frmr *f = r-r.frmr; + int rc; + + f-fr_mr = ib_alloc_fast_reg_mr(pd, depth); + if (IS_ERR(f-fr_mr)) + goto out_mr_err; + f-fr_pgl = ib_alloc_fast_reg_page_list(device, depth); + if (IS_ERR(f-fr_pgl)) + goto out_list_err; + return 0; + +out_mr_err: + rc = PTR_ERR(f-fr_mr); + dprintk(RPC: %s: ib_alloc_fast_reg_mr status %i\n, + __func__, rc); + return rc; + +out_list_err: + rc = PTR_ERR(f-fr_pgl); + dprintk(RPC: %s: ib_alloc_fast_reg_page_list status %i\n, + __func__, rc); + ib_dereg_mr(f-fr_mr); + return rc; +} + /* FRWR mode conveys a list of pages per chunk segment. The * maximum length of that list is the FRWR page list depth. 
*/ @@ -29,6 +58,42 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt) rpcrdma_max_segments(r_xprt) * ia-ri_max_frmr_depth); } +static int +frwr_op_init(struct rpcrdma_xprt *r_xprt) +{ + struct rpcrdma_buffer *buf = r_xprt-rx_buf; + struct ib_device *device = r_xprt-rx_ia.ri_id-device; + unsigned int depth = r_xprt-rx_ia.ri_max_frmr_depth; + struct ib_pd *pd = r_xprt-rx_ia.ri_pd; + int i; + + INIT_LIST_HEAD(buf-rb_mws); + INIT_LIST_HEAD(buf-rb_all); + + i = (buf-rb_max_requests + 1) * RPCRDMA_MAX_SEGS; + dprintk(RPC: %s: initalizing %d FRMRs\n, __func__, i); + + while (i--) { + struct rpcrdma_mw *r; + int rc; + + r = kzalloc(sizeof(*r), GFP_KERNEL); + if (!r) + return -ENOMEM; + + rc =
[PATCH v3 12/15] xprtrdma: Add destroy MRs memreg op
Memory Region objects associated with a transport instance are destroyed before the instance is shutdown and destroyed. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 18 net/sunrpc/xprtrdma/frwr_ops.c | 14 ++ net/sunrpc/xprtrdma/physical_ops.c |6 net/sunrpc/xprtrdma/verbs.c| 52 +--- net/sunrpc/xprtrdma/xprt_rdma.h|1 + 5 files changed, 40 insertions(+), 51 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index 93261b0..e9ca594 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -168,11 +168,29 @@ fmr_op_reset(struct rpcrdma_xprt *r_xprt) __func__, rc); } +static void +fmr_op_destroy(struct rpcrdma_buffer *buf) +{ + struct rpcrdma_mw *r; + int rc; + + while (!list_empty(buf-rb_all)) { + r = list_entry(buf-rb_all.next, struct rpcrdma_mw, mw_all); + list_del(r-mw_all); + rc = ib_dealloc_fmr(r-r.fmr); + if (rc) + dprintk(RPC: %s: ib_dealloc_fmr failed %i\n, + __func__, rc); + kfree(r); + } +} + const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { .ro_map = fmr_op_map, .ro_unmap = fmr_op_unmap, .ro_maxpages= fmr_op_maxpages, .ro_init= fmr_op_init, .ro_reset = fmr_op_reset, + .ro_destroy = fmr_op_destroy, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index c2bb29d..121e400 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -260,11 +260,25 @@ frwr_op_reset(struct rpcrdma_xprt *r_xprt) } } +static void +frwr_op_destroy(struct rpcrdma_buffer *buf) +{ + struct rpcrdma_mw *r; + + while (!list_empty(buf-rb_all)) { + r = list_entry(buf-rb_all.next, struct rpcrdma_mw, mw_all); + list_del(r-mw_all); + __frwr_release(r); + kfree(r); + } +} + const struct rpcrdma_memreg_ops 
rpcrdma_frwr_memreg_ops = { .ro_map = frwr_op_map, .ro_unmap = frwr_op_unmap, .ro_maxpages= frwr_op_maxpages, .ro_init= frwr_op_init, .ro_reset = frwr_op_reset, + .ro_destroy = frwr_op_destroy, .ro_displayname = frwr, }; diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c index e060713..eb39011 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -64,11 +64,17 @@ physical_op_reset(struct rpcrdma_xprt *r_xprt) { } +static void +physical_op_destroy(struct rpcrdma_buffer *buf) +{ +} + const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { .ro_map = physical_op_map, .ro_unmap = physical_op_unmap, .ro_maxpages= physical_op_maxpages, .ro_init= physical_op_init, .ro_reset = physical_op_reset, + .ro_destroy = physical_op_destroy, .ro_displayname = physical, }; diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 1b2c1f4..a7fb314 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1199,47 +1199,6 @@ rpcrdma_destroy_req(struct rpcrdma_ia *ia, struct rpcrdma_req *req) kfree(req); } -static void -rpcrdma_destroy_fmrs(struct rpcrdma_buffer *buf) -{ - struct rpcrdma_mw *r; - int rc; - - while (!list_empty(buf-rb_all)) { - r = list_entry(buf-rb_all.next, struct rpcrdma_mw, mw_all); - list_del(r-mw_all); - list_del(r-mw_list); - - rc = ib_dealloc_fmr(r-r.fmr); - if (rc) - dprintk(RPC: %s: ib_dealloc_fmr failed %i\n, - __func__, rc); - - kfree(r); - } -} - -static void -rpcrdma_destroy_frmrs(struct rpcrdma_buffer *buf) -{ - struct rpcrdma_mw *r; - int rc; - - while (!list_empty(buf-rb_all)) { - r = list_entry(buf-rb_all.next, struct rpcrdma_mw, mw_all); - list_del(r-mw_all); - list_del(r-mw_list); - - rc =
[PATCH v3 13/15] xprtrdma: Add open memreg op
The open op determines the size of various transport data structures based on device capabilities and memory registration mode. Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c |8 ++ net/sunrpc/xprtrdma/frwr_ops.c | 48 +++ net/sunrpc/xprtrdma/physical_ops.c |8 ++ net/sunrpc/xprtrdma/verbs.c| 49 ++-- net/sunrpc/xprtrdma/xprt_rdma.h|3 ++ 5 files changed, 70 insertions(+), 46 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index e9ca594..e8a9837 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -20,6 +20,13 @@ /* Maximum scatter/gather per FMR */ #define RPCRDMA_MAX_FMR_SGES (64) +static int +fmr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep, + struct rpcrdma_create_data_internal *cdata) +{ + return 0; +} + /* FMR mode conveys up to 64 pages of payload per chunk segment. 
*/ static size_t @@ -188,6 +195,7 @@ fmr_op_destroy(struct rpcrdma_buffer *buf) const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { .ro_map = fmr_op_map, .ro_unmap = fmr_op_unmap, + .ro_open= fmr_op_open, .ro_maxpages= fmr_op_maxpages, .ro_init= fmr_op_init, .ro_reset = fmr_op_reset, diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 121e400..e17d54d 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -58,6 +58,53 @@ __frwr_release(struct rpcrdma_mw *r) ib_free_fast_reg_page_list(r-r.frmr.fr_pgl); } +static int +frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep, +struct rpcrdma_create_data_internal *cdata) +{ + struct ib_device_attr *devattr = ia-ri_devattr; + int depth, delta; + + ia-ri_max_frmr_depth = + min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS, + devattr-max_fast_reg_page_list_len); + dprintk(RPC: %s: device's max FR page list len = %u\n, + __func__, ia-ri_max_frmr_depth); + + /* Add room for frmr register and invalidate WRs. +* 1. FRMR reg WR for head +* 2. FRMR invalidate WR for head +* 3. N FRMR reg WRs for pagelist +* 4. N FRMR invalidate WRs for pagelist +* 5. FRMR reg WR for tail +* 6. FRMR invalidate WR for tail +* 7. The RDMA_SEND WR +*/ + depth = 7; + + /* Calculate N if the device max FRMR depth is smaller than +* RPCRDMA_MAX_DATA_SEGS. +*/ + if (ia-ri_max_frmr_depth RPCRDMA_MAX_DATA_SEGS) { + delta = RPCRDMA_MAX_DATA_SEGS - ia-ri_max_frmr_depth; + do { + depth += 2; /* FRMR reg + invalidate */ + delta -= ia-ri_max_frmr_depth; + } while (delta 0); + } + + ep-rep_attr.cap.max_send_wr *= depth; + if (ep-rep_attr.cap.max_send_wr devattr-max_qp_wr) { + cdata-max_requests = devattr-max_qp_wr / depth; + if (!cdata-max_requests) + return -EINVAL; + ep-rep_attr.cap.max_send_wr = cdata-max_requests * + depth; + } + + return 0; +} + /* FRWR mode conveys a list of pages per chunk segment. The * maximum length of that list is the FRWR page list depth. 
*/ @@ -276,6 +323,7 @@ frwr_op_destroy(struct rpcrdma_buffer *buf) const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = { .ro_map = frwr_op_map, .ro_unmap = frwr_op_unmap, + .ro_open= frwr_op_open, .ro_maxpages= frwr_op_maxpages, .ro_init= frwr_op_init, .ro_reset = frwr_op_reset, diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c index eb39011..0ba130b 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -19,6 +19,13 @@ # define RPCDBG_FACILITY RPCDBG_TRANS #endif +static int +physical_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep, +struct rpcrdma_create_data_internal *cdata) +{ + return 0; +} + /* PHYSICAL memory registration conveys one page per chunk segment. */ static size_t @@ -72,6 +79,7 @@ physical_op_destroy(struct rpcrdma_buffer *buf) const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { .ro_map = physical_op_map,
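The send-queue sizing in `frwr_op_open()` above can be checked with a standalone model. Assuming RPCRDMA_MAX_DATA_SEGS is 64 (as in kernels of this era), the function starts from 7 work requests (head reg/invalidate, one pagelist reg/invalidate pair, tail reg/invalidate, and the SEND) and adds a reg/invalidate pair for each additional FRMR needed when the device's fast-reg page-list depth is smaller than the maximum data segment count:

```c
#define RPCRDMA_MAX_DATA_SEGS 64	/* assumed value, see lead-in */

/* Mirror of the depth computation in frwr_op_open(). */
static int frwr_send_depth(unsigned int max_frmr_depth)
{
	int depth = 7;	/* head pair + one pagelist pair + tail pair + SEND */
	int delta;

	if (max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
		delta = RPCRDMA_MAX_DATA_SEGS - max_frmr_depth;
		do {
			depth += 2;	/* one extra FRMR reg + invalidate */
			delta -= max_frmr_depth;
		} while (delta > 0);
	}
	return depth;
}
```

For example, a device with a page-list depth of 30 needs two extra FRMRs to cover 64 segments, giving a per-request depth of 11; `max_send_wr` is then multiplied by this depth and clamped to the device's `max_qp_wr`.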
[PATCH 3/5 linux-next] IB/mlx4: remove unnecessary message level.
KERN_WARNING is implicitly included by pr_warn().

Signed-off-by: Fabian Frederick f...@skynet.be
---
 drivers/infiniband/hw/mlx4/main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index b972c0b..1298fe8 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1568,8 +1568,7 @@ static void reset_gids_task(struct work_struct *work)
			       MLX4_CMD_TIME_CLASS_B, MLX4_CMD_WRAPPED);
 		if (err)
-			pr_warn(KERN_WARNING
-				"set port %d command failed\n", gw->port);
+			pr_warn("set port %d command failed\n", gw->port);
 	}
 	mlx4_free_cmd_mailbox(dev, mailbox);
--
1.9.1
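A userspace model shows why the patch drops the explicit KERN_WARNING: the kernel's pr_warn() already pastes the level prefix onto the format string, so passing KERN_WARNING as well emits the two-byte prefix twice. The `"\001" "4"` encoding below matches the kernel's KERN_SOH plus the warning loglevel digit; the snprintf-into-a-buffer stand-in for printk is, of course, a simplification.

```c
#include <stdio.h>

/* Simplified userspace stand-ins for the kernel macros. */
#define KERN_SOH	"\001"
#define KERN_WARNING	KERN_SOH "4"

static char logbuf[128];

/* pr_warn() prepends the level itself via string-literal pasting. */
#define pr_warn(fmt, ...) \
	snprintf(logbuf, sizeof(logbuf), KERN_WARNING fmt, ##__VA_ARGS__)
```

Calling `pr_warn(KERN_WARNING "set port %d command failed\n", port)` produces a message beginning with the prefix bytes twice, which printk would misparse; `pr_warn("set port %d command failed\n", port)` yields a single, correct prefix.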
[PATCH v3 15/15] xprtrdma: Make rpcrdma_{un}map_one() into inline functions
These functions are called in a loop for each page transferred via RDMA READ or WRITE. Extract loop invariants and inline them to reduce CPU overhead. Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 10 ++-- net/sunrpc/xprtrdma/frwr_ops.c | 10 ++-- net/sunrpc/xprtrdma/physical_ops.c | 10 ++-- net/sunrpc/xprtrdma/verbs.c| 44 ++- net/sunrpc/xprtrdma/xprt_rdma.h| 45 ++-- 5 files changed, 73 insertions(+), 46 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index e8a9837..a91ba2c 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -85,6 +85,8 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, int nsegs, bool writing) { struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct ib_device *device = ia-ri_id-device; + enum dma_data_direction direction = rpcrdma_data_dir(writing); struct rpcrdma_mr_seg *seg1 = seg; struct rpcrdma_mw *mw = seg1-rl_mw; u64 physaddrs[RPCRDMA_MAX_DATA_SEGS]; @@ -97,7 +99,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, if (nsegs RPCRDMA_MAX_FMR_SGES) nsegs = RPCRDMA_MAX_FMR_SGES; for (i = 0; i nsegs;) { - rpcrdma_map_one(ia, seg, writing); + rpcrdma_map_one(device, seg, direction); physaddrs[i] = seg-mr_dma; len += seg-mr_len; ++seg; @@ -123,7 +125,7 @@ out_maperr: __func__, len, (unsigned long long)seg1-mr_dma, pageoff, i, rc); while (i--) - rpcrdma_unmap_one(ia, --seg); + rpcrdma_unmap_one(device, --seg); return rc; } @@ -135,14 +137,16 @@ fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) { struct rpcrdma_ia *ia = r_xprt-rx_ia; struct rpcrdma_mr_seg *seg1 = seg; + struct ib_device *device; int rc, nsegs = seg-mr_nsegs; LIST_HEAD(l); list_add(seg1-rl_mw-r.fmr-list, l); rc = ib_unmap_fmr(l); read_lock(ia-ri_qplock); + device = 
ia-ri_id-device; while (seg1-mr_nsegs--) - rpcrdma_unmap_one(ia, seg++); + rpcrdma_unmap_one(device, seg++); read_unlock(ia-ri_qplock); if (rc) goto out_err; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index ea59c1b..0a7b9df 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -178,6 +178,8 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, int nsegs, bool writing) { struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct ib_device *device = ia-ri_id-device; + enum dma_data_direction direction = rpcrdma_data_dir(writing); struct rpcrdma_mr_seg *seg1 = seg; struct rpcrdma_mw *mw = seg1-rl_mw; struct rpcrdma_frmr *frmr = mw-r.frmr; @@ -197,7 +199,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, if (nsegs ia-ri_max_frmr_depth) nsegs = ia-ri_max_frmr_depth; for (page_no = i = 0; i nsegs;) { - rpcrdma_map_one(ia, seg, writing); + rpcrdma_map_one(device, seg, direction); pa = seg-mr_dma; for (seg_len = seg-mr_len; seg_len 0; seg_len -= PAGE_SIZE) { frmr-fr_pgl-page_list[page_no++] = pa; @@ -247,7 +249,7 @@ out_senderr: ib_update_fast_reg_key(mr, --key); frmr-fr_state = FRMR_IS_INVALID; while (i--) - rpcrdma_unmap_one(ia, --seg); + rpcrdma_unmap_one(device, --seg); return rc; } @@ -261,6 +263,7 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) struct rpcrdma_ia *ia = r_xprt-rx_ia; struct ib_send_wr invalidate_wr, *bad_wr; int rc, nsegs = seg-mr_nsegs; + struct ib_device *device; seg1-rl_mw-r.frmr.fr_state = FRMR_IS_INVALID; @@ -271,8 +274,9 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) DECR_CQCOUNT(r_xprt-rx_ep); read_lock(ia-ri_qplock); + device = ia-ri_id-device; while (seg1-mr_nsegs--) - rpcrdma_unmap_one(ia, seg++); + rpcrdma_unmap_one(device, seg++); rc = ib_post_send(ia-ri_id-qp, invalidate_wr, bad_wr); read_unlock(ia-ri_qplock); if (rc) diff --git a/net/sunrpc/xprtrdma/physical_ops.c 
b/net/sunrpc/xprtrdma/physical_ops.c index 0ba130b..ba518af 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -50,7
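The refactoring in patch 15/15 is classic loop-invariant extraction: `rpcrdma_map_one()` used to re-derive the device pointer and the DMA direction for every page, and the patch computes both once per call and passes them in. The sketch below uses simplified stand-in types (the real `rpcrdma_data_dir()` mapping and struct layouts differ) just to show the hoisting pattern.

```c
struct ib_device { int id; };
struct rdma_cm_id { struct ib_device *device; };
struct rpcrdma_ia { struct rdma_cm_id *ri_id; };

enum dma_data_direction { DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* after the patch: invariants are parameters, not re-derived per page */
static int map_one(struct ib_device *device,
		   enum dma_data_direction dir, int *mapped)
{
	(void)device; (void)dir;	/* a real version would dma_map here */
	return ++(*mapped);
}

static int map_segments(struct rpcrdma_ia *ia, int writing, int nsegs)
{
	/* hoisted out of the loop, as fmr_op_map()/frwr_op_map() now do */
	struct ib_device *device = ia->ri_id->device;
	enum dma_data_direction dir =
		writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE; /* assumed mapping */
	int mapped = 0;

	for (int i = 0; i < nsegs; i++)
		map_one(device, dir, &mapped);
	return mapped;
}
```

Before the patch, each iteration chased `ia->ri_id->device` and evaluated the direction ternary; after it, the per-page body is small enough to inline profitably, which is the stated goal of the change.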
[PATCH v3 14/15] xprtrdma: Handle non-SEND completions via a callout
Allow each memory registration mode to plug in a callout that handles the completion of a memory registration operation. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/frwr_ops.c | 17 + net/sunrpc/xprtrdma/verbs.c | 16 ++-- net/sunrpc/xprtrdma/xprt_rdma.h |5 + 3 files changed, 28 insertions(+), 10 deletions(-) diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index e17d54d..ea59c1b 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -117,6 +117,22 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt) rpcrdma_max_segments(r_xprt) * ia-ri_max_frmr_depth); } +/* If FAST_REG or LOCAL_INV failed, indicate the frmr needs to be reset. */ +static void +frwr_sendcompletion(struct ib_wc *wc) +{ + struct rpcrdma_mw *r; + + if (likely(wc-status == IB_WC_SUCCESS)) + return; + + /* WARNING: Only wr_id and status are reliable at this point */ + r = (struct rpcrdma_mw *)(unsigned long)wc-wr_id; + dprintk(RPC: %s: frmr %p (stale), status %d\n, + __func__, r, wc-status); + r-r.frmr.fr_state = FRMR_IS_STALE; +} + static int frwr_op_init(struct rpcrdma_xprt *r_xprt) { @@ -148,6 +164,7 @@ frwr_op_init(struct rpcrdma_xprt *r_xprt) list_add(r-mw_list, buf-rb_mws); list_add(r-mw_all, buf-rb_all); + r-mw_sendcompletion = frwr_sendcompletion; } return 0; diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index b697b3e..cac06f2 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -186,7 +186,7 @@ static const char * const wc_status[] = { remote access error, remote operation error, transport retry counter exceeded, - RNR retrycounter exceeded, + RNR retry counter exceeded, local RDD violation error, remove invalid RD request, operation aborted, @@ -204,21 +204,17 
@@ static const char * const wc_status[] = { static void rpcrdma_sendcq_process_wc(struct ib_wc *wc) { - if (likely(wc-status == IB_WC_SUCCESS)) - return; - /* WARNING: Only wr_id and status are reliable at this point */ - if (wc-wr_id == 0ULL) { - if (wc-status != IB_WC_WR_FLUSH_ERR) + if (wc-wr_id == RPCRDMA_IGNORE_COMPLETION) { + if (wc-status != IB_WC_SUCCESS + wc-status != IB_WC_WR_FLUSH_ERR) pr_err(RPC: %s: SEND: %s\n, __func__, COMPLETION_MSG(wc-status)); } else { struct rpcrdma_mw *r; r = (struct rpcrdma_mw *)(unsigned long)wc-wr_id; - r-r.frmr.fr_state = FRMR_IS_STALE; - pr_err(RPC: %s: frmr %p (stale): %s\n, - __func__, r, COMPLETION_MSG(wc-status)); + r-mw_sendcompletion(wc); } } @@ -1622,7 +1618,7 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia, } send_wr.next = NULL; - send_wr.wr_id = 0ULL; /* no send cookie */ + send_wr.wr_id = RPCRDMA_IGNORE_COMPLETION; send_wr.sg_list = req-rl_send_iov; send_wr.num_sge = req-rl_niovs; send_wr.opcode = IB_WR_SEND; diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h index 9036fb4..54bcbe4 100644 --- a/net/sunrpc/xprtrdma/xprt_rdma.h +++ b/net/sunrpc/xprtrdma/xprt_rdma.h @@ -106,6 +106,10 @@ struct rpcrdma_ep { #define INIT_CQCOUNT(ep) atomic_set((ep)-rep_cqcount, (ep)-rep_cqinit) #define DECR_CQCOUNT(ep) atomic_sub_return(1, (ep)-rep_cqcount) +/* Force completion handler to ignore the signal + */ +#define RPCRDMA_IGNORE_COMPLETION (0ULL) + /* Registered buffer -- registered kmalloc'd memory for RDMA SEND/RECV * * The below structure appears at the front of a large region of kmalloc'd @@ -206,6 +210,7 @@ struct rpcrdma_mw { struct ib_fmr *fmr; struct rpcrdma_frmr frmr; } r; + void(*mw_sendcompletion)(struct ib_wc *); struct list_headmw_list; struct list_headmw_all; }; -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5 linux-next] iw_cxgb4: remove unnecessary message level.
KERN_ERR is implicitly included in pr_err() Signed-off-by: Fabian Frederick f...@skynet.be --- drivers/infiniband/hw/cxgb4/device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c index 8fb295e..59546b6 100644 --- a/drivers/infiniband/hw/cxgb4/device.c +++ b/drivers/infiniband/hw/cxgb4/device.c @@ -1355,7 +1355,7 @@ static void recover_lost_dbs(struct uld_ctx *ctx, struct qp_list *qp_list) t4_sq_host_wq_pidx(qp-wq), t4_sq_wq_size(qp-wq)); if (ret) { - pr_err(KERN_ERR MOD %s: Fatal error - + pr_err(MOD %s: Fatal error - DB overflow recovery failed - error syncing SQ qid %u\n, pci_name(ctx-lldi.pdev), qp-wq.sq.qid); @@ -1371,7 +1371,7 @@ static void recover_lost_dbs(struct uld_ctx *ctx, struct qp_list *qp_list) t4_rq_wq_size(qp-wq)); if (ret) { - pr_err(KERN_ERR MOD %s: Fatal error - + pr_err(MOD %s: Fatal error - DB overflow recovery failed - error syncing RQ qid %u\n, pci_name(ctx-lldi.pdev), qp-wq.rq.qid); -- 1.9.1
[PATCH v3 00/15] NFS/RDMA patches proposed for 4.1
This is a series of client-side patches for NFS/RDMA. In preparation for increasing the transport credit limit and maximum rsize/wsize, I've re-factored the memory registration logic into separate files, invoked via a method API. The series is available in the nfs-rdma-for-4.1 topic branch at git://linux-nfs.org/projects/cel/cel-2.6.git Changes since v2: - Rebased on 4.0-rc6 - One minor fix squashed into 01/15 - Tested-by tags added Changes since v1: - Rebased on 4.0-rc5 - Main optimizations postponed to 4.2 - Addressed review comments from Anna, Sagi, and Devesh --- Chuck Lever (15): SUNRPC: Introduce missing well-known netids xprtrdma: Display IPv6 addresses and port numbers correctly xprtrdma: Perform a full marshal on retransmit xprtrdma: Byte-align FRWR registration xprtrdma: Prevent infinite loop in rpcrdma_ep_create() xprtrdma: Add vector of ops for each memory registration strategy xprtrdma: Add a max_payload op for each memreg mode xprtrdma: Add a register_external op for each memreg mode xprtrdma: Add a deregister_external op for each memreg mode xprtrdma: Add init MRs memreg op xprtrdma: Add reset MRs memreg op xprtrdma: Add destroy MRs memreg op xprtrdma: Add open memreg op xprtrdma: Handle non-SEND completions via a callout xprtrdma: Make rpcrdma_{un}map_one() into inline functions include/linux/sunrpc/msg_prot.h|8 include/linux/sunrpc/xprtrdma.h|5 net/sunrpc/xprtrdma/Makefile |3 net/sunrpc/xprtrdma/fmr_ops.c | 208 +++ net/sunrpc/xprtrdma/frwr_ops.c | 353 ++ net/sunrpc/xprtrdma/physical_ops.c | 94 + net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++-- net/sunrpc/xprtrdma/transport.c| 61 ++- net/sunrpc/xprtrdma/verbs.c| 699 +++- net/sunrpc/xprtrdma/xprt_rdma.h| 90 - 10 files changed, 882 insertions(+), 726 deletions(-) create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c create mode 100644 net/sunrpc/xprtrdma/physical_ops.c -- Chuck Lever
[PATCH v3 02/15] xprtrdma: Display IPv6 addresses and port numbers correctly
Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/transport.c | 47 --- net/sunrpc/xprtrdma/verbs.c | 21 +++-- 2 files changed, 47 insertions(+), 21 deletions(-) diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 2e192ba..9be7f97 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -157,12 +157,47 @@ static struct ctl_table sunrpc_table[] = { static struct rpc_xprt_ops xprt_rdma_procs;/* forward reference */ static void +xprt_rdma_format_addresses4(struct rpc_xprt *xprt, struct sockaddr *sap) +{ + struct sockaddr_in *sin = (struct sockaddr_in *)sap; + char buf[20]; + + snprintf(buf, sizeof(buf), %08x, ntohl(sin-sin_addr.s_addr)); + xprt-address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL); + + xprt-address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA; +} + +static void +xprt_rdma_format_addresses6(struct rpc_xprt *xprt, struct sockaddr *sap) +{ + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap; + char buf[40]; + + snprintf(buf, sizeof(buf), %pi6, sin6-sin6_addr); + xprt-address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL); + + xprt-address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA6; +} + +static void xprt_rdma_format_addresses(struct rpc_xprt *xprt) { struct sockaddr *sap = (struct sockaddr *) rpcx_to_rdmad(xprt).addr; - struct sockaddr_in *sin = (struct sockaddr_in *)sap; - char buf[64]; + char buf[128]; + + switch (sap-sa_family) { + case AF_INET: + xprt_rdma_format_addresses4(xprt, sap); + break; + case AF_INET6: + xprt_rdma_format_addresses6(xprt, sap); + break; + default: + pr_err(rpcrdma: Unrecognized address family\n); + return; + } (void)rpc_ntop(sap, buf, sizeof(buf)); xprt-address_strings[RPC_DISPLAY_ADDR] = kstrdup(buf, 
GFP_KERNEL); @@ -170,16 +205,10 @@ xprt_rdma_format_addresses(struct rpc_xprt *xprt) snprintf(buf, sizeof(buf), %u, rpc_get_port(sap)); xprt-address_strings[RPC_DISPLAY_PORT] = kstrdup(buf, GFP_KERNEL); - xprt-address_strings[RPC_DISPLAY_PROTO] = rdma; - - snprintf(buf, sizeof(buf), %08x, ntohl(sin-sin_addr.s_addr)); - xprt-address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL); - snprintf(buf, sizeof(buf), %4hx, rpc_get_port(sap)); xprt-address_strings[RPC_DISPLAY_HEX_PORT] = kstrdup(buf, GFP_KERNEL); - /* netid */ - xprt-address_strings[RPC_DISPLAY_NETID] = rdma; + xprt-address_strings[RPC_DISPLAY_PROTO] = rdma; } static void diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 124676c..1aa55b7 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -50,6 +50,7 @@ #include linux/interrupt.h #include linux/slab.h #include linux/prefetch.h +#include linux/sunrpc/addr.h #include asm/bitops.h #include xprt_rdma.h @@ -424,7 +425,7 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event) struct rpcrdma_ia *ia = xprt-rx_ia; struct rpcrdma_ep *ep = xprt-rx_ep; #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) - struct sockaddr_in *addr = (struct sockaddr_in *) ep-rep_remote_addr; + struct sockaddr *sap = (struct sockaddr *)ep-rep_remote_addr; #endif struct ib_qp_attr *attr = ia-ri_qp_attr; struct ib_qp_init_attr *iattr = ia-ri_qp_init_attr; @@ -480,9 +481,8 @@ connected: wake_up_all(ep-rep_connect_wait); /*FALLTHROUGH*/ default: - dprintk(RPC: %s: %pI4:%u (ep 0x%p): %s\n, - __func__, addr-sin_addr.s_addr, - ntohs(addr-sin_port), ep, + dprintk(RPC: %s: %pIS:%u (ep 0x%p): %s\n, + __func__, sap, rpc_get_port(sap), ep, CONNECTION_MSG(event-event)); break; } @@ -491,19 +491,16 @@ connected: if (connstate == 1) { int ird = attr-max_dest_rd_atomic; int tird = ep-rep_remote_cma.responder_resources; - printk(KERN_INFO rpcrdma: connection to %pI4:%u - on %s, memreg %d slots %d ird %d%s\n, - addr-sin_addr.s_addr, - 
ntohs(addr-sin_port), + + pr_info(rpcrdma: connection to %pIS:%u on %s, memreg %d slots %d ird %d%s\n, + sap, rpc_get_port(sap), ia-ri_id-device-name,
[PATCH v3 01/15] SUNRPC: Introduce missing well-known netids
Signed-off-by: Chuck Lever chuck.le...@oracle.com --- include/linux/sunrpc/msg_prot.h |8 +++- include/linux/sunrpc/xprtrdma.h |5 - 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/include/linux/sunrpc/msg_prot.h b/include/linux/sunrpc/msg_prot.h index aadc6a0..8073713 100644 --- a/include/linux/sunrpc/msg_prot.h +++ b/include/linux/sunrpc/msg_prot.h @@ -142,12 +142,18 @@ typedef __be32rpc_fraghdr; (RPC_REPHDRSIZE + (2 + RPC_MAX_AUTH_SIZE/4)) /* - * RFC1833/RFC3530 rpcbind (v3+) well-known netid's. + * Well-known netids. See: + * + * http://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml */ #define RPCBIND_NETID_UDP udp #define RPCBIND_NETID_TCP tcp +#define RPCBIND_NETID_RDMA rdma +#define RPCBIND_NETID_SCTP sctp #define RPCBIND_NETID_UDP6 udp6 #define RPCBIND_NETID_TCP6 tcp6 +#define RPCBIND_NETID_RDMA6rdma6 +#define RPCBIND_NETID_SCTP6sctp6 #define RPCBIND_NETID_LOCALlocal /* diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h index 64a0a0a..c984c85 100644 --- a/include/linux/sunrpc/xprtrdma.h +++ b/include/linux/sunrpc/xprtrdma.h @@ -41,11 +41,6 @@ #define _LINUX_SUNRPC_XPRTRDMA_H /* - * rpcbind (v3+) RDMA netid. - */ -#define RPCBIND_NETID_RDMA rdma - -/* * Constants. Max RPC/NFS header is big enough to account for * additional marshaling buffers passed down by Linux client. *
[PATCH v3 05/15] xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
If a provider advertises a zero max_fast_reg_page_list_len, FRWR depth detection loops forever. Instead of just failing the mount, try other memory registration modes. Fixes: 0fc6c4e7bb28 (xprtrdma: mind the device's max fast . . .) Reported-by: Devesh Sharma devesh.sha...@emulex.com Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/verbs.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 60f3317..99752b5 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -618,9 +618,10 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg) if (memreg == RPCRDMA_FRMR) { /* Requires both frmr reg and local dma lkey */ - if ((devattr-device_cap_flags + if (((devattr-device_cap_flags (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) != - (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) { + (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) || + (devattr-max_fast_reg_page_list_len == 0)) { dprintk(RPC: %s: FRMR registration not supported by HCA\n, __func__); memreg = RPCRDMA_MTHCAFMR;
[PATCH v3 06/15] xprtrdma: Add vector of ops for each memory registration strategy
Instead of employing switch() statements, let's use the typical Linux kernel idiom for handling behavioral variation: virtual functions. Start by defining a vector of operations for each supported memory registration mode, and by adding a source file for each mode. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/Makefile |3 ++- net/sunrpc/xprtrdma/fmr_ops.c | 22 ++ net/sunrpc/xprtrdma/frwr_ops.c | 22 ++ net/sunrpc/xprtrdma/physical_ops.c | 24 net/sunrpc/xprtrdma/verbs.c| 11 +++ net/sunrpc/xprtrdma/xprt_rdma.h| 12 6 files changed, 89 insertions(+), 5 deletions(-) create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c create mode 100644 net/sunrpc/xprtrdma/physical_ops.c diff --git a/net/sunrpc/xprtrdma/Makefile b/net/sunrpc/xprtrdma/Makefile index da5136f..579f72b 100644 --- a/net/sunrpc/xprtrdma/Makefile +++ b/net/sunrpc/xprtrdma/Makefile @@ -1,6 +1,7 @@ obj-$(CONFIG_SUNRPC_XPRT_RDMA_CLIENT) += xprtrdma.o -xprtrdma-y := transport.o rpc_rdma.o verbs.o +xprtrdma-y := transport.o rpc_rdma.o verbs.o \ + fmr_ops.o frwr_ops.o physical_ops.o obj-$(CONFIG_SUNRPC_XPRT_RDMA_SERVER) += svcrdma.o diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c new file mode 100644 index 000..ffb7d93 --- /dev/null +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -0,0 +1,22 @@ +/* + * Copyright (c) 2015 Oracle. All rights reserved. + * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved. + */ + +/* Lightweight memory registration using Fast Memory Regions (FMR). + * Referred to sometimes as MTHCAFMR mode. + * + * FMR uses synchronous memory registration and deregistration. + * FMR registration is known to be fast, but FMR deregistration + * can take tens of usecs to complete. 
+ */ + +#include xprt_rdma.h + +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) +# define RPCDBG_FACILITY RPCDBG_TRANS +#endif + +const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { + .ro_displayname = fmr, +}; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c new file mode 100644 index 000..79173f9 --- /dev/null +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -0,0 +1,22 @@ +/* + * Copyright (c) 2015 Oracle. All rights reserved. + * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved. + */ + +/* Lightweight memory registration using Fast Registration Work + * Requests (FRWR). Also referred to sometimes as FRMR mode. + * + * FRWR features ordered asynchronous registration and deregistration + * of arbitrarily sized memory regions. This is the fastest and safest + * but most complex memory registration mode. + */ + +#include xprt_rdma.h + +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) +# define RPCDBG_FACILITY RPCDBG_TRANS +#endif + +const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = { + .ro_displayname = frwr, +}; diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c new file mode 100644 index 000..b0922ac --- /dev/null +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -0,0 +1,24 @@ +/* + * Copyright (c) 2015 Oracle. All rights reserved. + * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved. + */ + +/* No-op chunk preparation. All client memory is pre-registered. + * Sometimes referred to as ALLPHYSICAL mode. + * + * Physical registration is simple because all client memory is + * pre-registered and never deregistered. This mode is good for + * adapter bring up, but is considered not safe: the server is + * trusted not to abuse its access to client memory not involved + * in RDMA I/O. 
+ */ + +#include xprt_rdma.h + +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) +# define RPCDBG_FACILITY RPCDBG_TRANS +#endif + +const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { + .ro_displayname = physical, +}; diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 99752b5..c3319e1 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -492,10 +492,10 @@ connected: int ird = attr-max_dest_rd_atomic; int tird = ep-rep_remote_cma.responder_resources; - pr_info(rpcrdma: connection to %pIS:%u on %s, memreg %d slots %d ird %d%s\n, + pr_info(rpcrdma: connection to %pIS:%u on %s, memreg '%s', %d credits, %d responders%s\n, sap, rpc_get_port(sap), ia-ri_id-device-name, - ia-ri_memreg_strategy, + ia-ri_ops-ro_displayname,
[PATCH v3 04/15] xprtrdma: Byte-align FRWR registration
The RPC/RDMA transport's FRWR registration logic registers whole pages. This means areas in the first and last pages that are not involved in the RDMA I/O are needlessly exposed to the server. Buffered I/O is typically page-aligned, so not a problem there. But for direct I/O, which can be byte-aligned, and for reply chunks, which are nearly always smaller than a page, the transport could expose memory outside the I/O buffer. FRWR allows byte-aligned memory registration, so let's use it as it was intended. Reported-by: Sagi Grimberg sa...@mellanox.com Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/verbs.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 1aa55b7..60f3317 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1924,23 +1924,19 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg, offset_in_page((seg-1)-mr_offset + (seg-1)-mr_len)) break; } - dprintk(RPC: %s: Using frmr %p to map %d segments\n, - __func__, mw, i); + dprintk(RPC: %s: Using frmr %p to map %d segments (%d bytes)\n, + __func__, mw, i, len); frmr-fr_state = FRMR_IS_VALID; memset(fastreg_wr, 0, sizeof(fastreg_wr)); fastreg_wr.wr_id = (unsigned long)(void *)mw; fastreg_wr.opcode = IB_WR_FAST_REG_MR; - fastreg_wr.wr.fast_reg.iova_start = seg1-mr_dma; + fastreg_wr.wr.fast_reg.iova_start = seg1-mr_dma + pageoff; fastreg_wr.wr.fast_reg.page_list = frmr-fr_pgl; fastreg_wr.wr.fast_reg.page_list_len = page_no; fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT; - fastreg_wr.wr.fast_reg.length = page_no PAGE_SHIFT; - if (fastreg_wr.wr.fast_reg.length len) { - rc = -EIO; - goto out_err; - } + fastreg_wr.wr.fast_reg.length = len; /* Bump the key */ key = (u8)(mr-rkey 0x00FF);
[PATCH v3 08/15] xprtrdma: Add a register_external op for each memreg mode
There is very little common processing among the different external memory registration functions. Have rpcrdma_create_chunks() call the registration method directly. This removes a stack frame and a switch statement from the external registration path. Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 51 +++ net/sunrpc/xprtrdma/frwr_ops.c | 82 ++ net/sunrpc/xprtrdma/physical_ops.c | 17 net/sunrpc/xprtrdma/rpc_rdma.c |5 + net/sunrpc/xprtrdma/verbs.c| 168 +--- net/sunrpc/xprtrdma/xprt_rdma.h|6 + 6 files changed, 160 insertions(+), 169 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index eec2660..45fb646 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -29,7 +29,58 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt) rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES); } +/* Use the ib_map_phys_fmr() verb to register a memory region + * for remote access via RDMA READ or RDMA WRITE. 
+ */ +static int +fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, + int nsegs, bool writing) +{ + struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct rpcrdma_mr_seg *seg1 = seg; + struct rpcrdma_mw *mw = seg1-rl_mw; + u64 physaddrs[RPCRDMA_MAX_DATA_SEGS]; + int len, pageoff, i, rc; + + pageoff = offset_in_page(seg1-mr_offset); + seg1-mr_offset -= pageoff; /* start of page */ + seg1-mr_len += pageoff; + len = -pageoff; + if (nsegs RPCRDMA_MAX_FMR_SGES) + nsegs = RPCRDMA_MAX_FMR_SGES; + for (i = 0; i nsegs;) { + rpcrdma_map_one(ia, seg, writing); + physaddrs[i] = seg-mr_dma; + len += seg-mr_len; + ++seg; + ++i; + /* Check for holes */ + if ((i nsegs offset_in_page(seg-mr_offset)) || + offset_in_page((seg-1)-mr_offset + (seg-1)-mr_len)) + break; + } + + rc = ib_map_phys_fmr(mw-r.fmr, physaddrs, i, seg1-mr_dma); + if (rc) + goto out_maperr; + + seg1-mr_rkey = mw-r.fmr-rkey; + seg1-mr_base = seg1-mr_dma + pageoff; + seg1-mr_nsegs = i; + seg1-mr_len = len; + return i; + +out_maperr: + dprintk(RPC: %s: ib_map_phys_fmr %u@0x%llx+%i (%d) status %i\n, + __func__, len, (unsigned long long)seg1-mr_dma, + pageoff, i, rc); + while (i--) + rpcrdma_unmap_one(ia, --seg); + return rc; +} + const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { + .ro_map = fmr_op_map, .ro_maxpages= fmr_op_maxpages, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 73a5ac8..23e4d99 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -29,7 +29,89 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt) rpcrdma_max_segments(r_xprt) * ia-ri_max_frmr_depth); } +/* Post a FAST_REG Work Request to register a memory region + * for remote access via RDMA READ or RDMA WRITE. 
+ */ +static int +frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, + int nsegs, bool writing) +{ + struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct rpcrdma_mr_seg *seg1 = seg; + struct rpcrdma_mw *mw = seg1-rl_mw; + struct rpcrdma_frmr *frmr = mw-r.frmr; + struct ib_mr *mr = frmr-fr_mr; + struct ib_send_wr fastreg_wr, *bad_wr; + u8 key; + int len, pageoff; + int i, rc; + int seg_len; + u64 pa; + int page_no; + + pageoff = offset_in_page(seg1-mr_offset); + seg1-mr_offset -= pageoff; /* start of page */ + seg1-mr_len += pageoff; + len = -pageoff; + if (nsegs ia-ri_max_frmr_depth) + nsegs = ia-ri_max_frmr_depth; + for (page_no = i = 0; i nsegs;) { + rpcrdma_map_one(ia, seg, writing); + pa = seg-mr_dma; + for (seg_len = seg-mr_len; seg_len 0; seg_len -= PAGE_SIZE) { + frmr-fr_pgl-page_list[page_no++] = pa; + pa += PAGE_SIZE; + } + len += seg-mr_len; + ++seg; + ++i; + /* Check for holes */ + if ((i nsegs offset_in_page(seg-mr_offset)) || + offset_in_page((seg-1)-mr_offset + (seg-1)-mr_len)) + break; + } + dprintk(RPC: %s: Using frmr %p to map %d segments (%d bytes)\n, +
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check
On Mon, Mar 30, 2015 at 05:10:12PM +0200, Michael Wang wrote: I found that actually we don't have to touch this one, which is only used by HW drivers currently. I'm having a hard time understanding this; the code in question was in net/sunrpc/xprtrdma/svc_rdma_recvfrom.c, which is the NFS ULP, not a device driver. Regards, Jason
Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and, cap_mcast() for mcast-check
On Mon, Mar 30, 2015 at 10:30:36AM +0200, Michael Wang wrote: Thus I also agree the check inside mcast_event_handler() is unnecessary; maybe we can change that logic to WARN_ON(!cap_mcast())? Seems reasonable to me. Jason
Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and, cap_mcast() for mcast-check
On Mon, Mar 30, 2015 at 06:20:48PM +0200, Michael Wang wrote: On 03/30/2015 06:11 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:46 +0100, Michael Wang wrote: Introduce helper has_mcast() and cap_mcast() to help us check if an IB device or it's port support Multicast. This probably needs reworded or rethought. In truth, *all* rdma devices are multicast capable. *BUT*, IB/OPA devices require multicast registration done the IB way (including for sendonly multicast sends), while Ethernet devices do multicast the Ethernet way. These tests are really just for IB specific multicast registration and deregistration. Call it has_mcast() and cap_mcast() is incorrect. Thanks for the explanation :-) Jason also mentioned we should use cap_ib_XX() instead, I'll use that name then we can distinguish the management between Eth and IB/OPA. Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/multicast.c | 8 include/rdma/ib_verbs.h | 28 3 files changed, 33 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 276fb76..cbbc85b 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -3398,7 +3398,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) ib_detach_mcast(id-qp, mc-multicast.ib-rec.mgid, be16_to_cpu(mc-multicast.ib-rec.mlid)); -if (rdma_transport_is_ib(id_priv-cma_dev-device)) { +if (has_mcast(id_priv-cma_dev-device)) { You need a similar check in rdma_join_multicast. 
Ira switch (rdma_port_get_link_layer(id-device, id-port_num)) { case IB_LINK_LAYER_INFINIBAND: ib_sa_free_multicast(mc-multicast.ib); diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c index 17573ff..ffeaf27 100644 --- a/drivers/infiniband/core/multicast.c +++ b/drivers/infiniband/core/multicast.c @@ -780,7 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler, int index; dev = container_of(handler, struct mcast_device, event_handler); -if (!rdma_port_ll_is_ib(dev-device, event-element.port_num)) +if (!cap_mcast(dev-device, event-element.port_num)) return; index = event-element.port_num - dev-start_port; @@ -807,7 +807,7 @@ static void mcast_add_one(struct ib_device *device) int i; int count = 0; -if (!rdma_transport_is_ib(device)) +if (!has_mcast(device)) return; dev = kmalloc(sizeof *dev + device-phys_port_cnt * sizeof *port, @@ -823,7 +823,7 @@ static void mcast_add_one(struct ib_device *device) } for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (!rdma_port_ll_is_ib(device, dev-start_port + i)) +if (!cap_mcast(device, dev-start_port + i)) continue; port = dev-port[i]; port-dev = dev; @@ -861,7 +861,7 @@ static void mcast_remove_one(struct ib_device *device) flush_workqueue(mcast_wq); for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (rdma_port_ll_is_ib(device, dev-start_port + i)) { +if (cap_mcast(device, dev-start_port + i)) { port = dev-port[i]; deref_port(port); wait_for_completion(port-comp); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index fa8ffa3..e796104 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1823,6 +1823,19 @@ static inline int has_sa(struct ib_device *device) } /** + * has_mcast - Check if a device support Multicast. + * + * @device: Device to be checked + * + * Return 0 when a device has none port to support + * Multicast. 
+ */ +static inline int has_mcast(struct ib_device *device) +{ +return rdma_transport_is_ib(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * @@ -1852,6 +1865,21 @@ static inline int cap_sa(struct ib_device *device, u8 port_num) return rdma_port_ll_is_ib(device, port_num); } +/** + * cap_mcast - Check if the port of device has the capability + * Multicast. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device don't support + * Multicast. + */ +static inline int cap_mcast(struct ib_device *device, u8 port_num) +{ +return
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
From: Or Gerlitz gerlitz...@gmail.com Date: Mon, 30 Mar 2015 19:17:01 +0300 On Sun, Mar 29, 2015 at 4:51 PM, Or Gerlitz ogerl...@mellanox.com wrote: Under the existing implementation for virtual GIDs, if the SM is not reachable or incurs a delayed response, or if the VF is probed into a VM before its GUID is registered with the SM, there exists a window in time in which the VF sees an incorrect GID, i.e., not the GID that was intended by the admin. This results in exposing a temporal identity to the VF. Hi Roland, so your for-next branch is again way behind, still on 3.19, and while 4.0 is soon at rc6 we couldn't even rebase this series on it. It's really hard when your tree is only active once every nine weeks or so, e.g. a few days before/after each rc1. I'm not sure what you expect us to do; kernel development simply need not be like this. April 3rd-12th is a holiday here, and we would really like to know early this week what you intend to pull for 4.1 out of the pending things in linux-rdma. Roland, I have to genuinely agree with Or that your handling of patch integration is sub-par and really painful for anyone actually trying to get real work done here. If you simply don't have the time to devote to constantly reviewing patches as they come in, and doing so in a timely manner, please let someone who is actually interested and has the time take over. Only integrating people's work right before the merge window, and then disappearing for a long time, really isn't acceptable. Thanks!
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
Roland, I have to genuinely agree with Or, that your handling of patch integration is sub-par and really painful for anyone actually trying to get real work done here. If you simply don't have the time to devote to constantly reviewing patches as they come in, and doing so in a timely manner, please let someone who is actually interested and has the time to take over. It's a fair criticism, and certainly for at least the last year or so I have not had the time to do enough work as a maintainer. I have hope that some of the things that have been keeping me busy are dying down and that I'll have more time to spend on handling the RDMA tree, but that's just talk until I actually get more done. I really would like to get more people involved in handling the flow of patches but I'm not sure who has not only the interest and the time but also the judgement and expertise to take over. Certainly Or has been a long time contributor who has done a lot of great things, but I still worry about things like ABI stability and backwards compatibility. But I'm open to ideas. - R.
Re: [PATCH 01/11] IB/Verbs: Use helpers to check transport and link layer
On Fri, 2015-03-27 at 16:40 +0100, Michael Wang wrote: We have so much places to check transport type and link layer type, it's now make sense to introduce some helpers in order to refine the lengthy code. This patch will introduce helpers: rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_port_ll_is_ib() rdma_port_ll_is_eth() and use them to save some code for us. If the end result is to do something like I proposed, then why take this intermediate step that just has to be backed out later? In other words, if our end goal is to have rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_transport_is_roce() rdma_transport_is_opa() Then we should skip doing rdma_port_ll_is_*() as the answers to these items would be implied by rdma_transport_is_roce() and such. Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/agent.c | 2 +- drivers/infiniband/core/cm.c | 2 +- drivers/infiniband/core/cma.c | 27 --- drivers/infiniband/core/mad.c | 6 +++--- drivers/infiniband/core/multicast.c | 11 --- drivers/infiniband/core/sa_query.c| 14 +++--- drivers/infiniband/core/ucm.c | 3 +-- drivers/infiniband/core/user_mad.c| 2 +- drivers/infiniband/core/verbs.c | 5 ++--- drivers/infiniband/hw/mlx4/ah.c | 2 +- drivers/infiniband/hw/mlx4/cq.c | 4 +--- drivers/infiniband/hw/mlx4/mad.c | 14 -- drivers/infiniband/hw/mlx4/main.c | 8 +++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 +- drivers/infiniband/hw/mlx4/qp.c | 21 +++-- drivers/infiniband/hw/mlx4/sysfs.c| 6 ++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 6 +++--- include/rdma/ib_verbs.h | 24 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3 +-- 19 files changed, 79 insertions(+), 83 deletions(-) diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c index f6d2961..27f1bec 100644 --- a/drivers/infiniband/core/agent.c +++ 
b/drivers/infiniband/core/agent.c @@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num) goto error1; } -if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) { +if (rdma_port_ll_is_ib(device, port_num)) { /* Obtain send only MAD agent for SMI QP */ port_priv-agent[0] = ib_register_mad_agent(device, port_num, IB_QPT_SMI, NULL, 0, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index e28a494..2c72e9e 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3762,7 +3762,7 @@ static void cm_add_one(struct ib_device *ib_device) int ret; u8 i; -if (rdma_node_get_transport(ib_device-node_type) != RDMA_TRANSPORT_IB) +if (!rdma_transport_is_ib(ib_device)) return; cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) * diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d570030..668e955 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -375,8 +375,8 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv-id.port_num) == dev_ll) { cma_dev = listen_id_priv-cma_dev; port = listen_id_priv-id.port_num; -if (rdma_node_get_transport(cma_dev-device-node_type) == RDMA_TRANSPORT_IB -rdma_port_get_link_layer(cma_dev-device, port) == IB_LINK_LAYER_ETHERNET) +if (rdma_transport_is_ib(cma_dev-device) +rdma_port_ll_is_eth(cma_dev-device, port)) ret = ib_find_cached_gid(cma_dev-device, iboe_gid, found_port, NULL); else @@ -395,8 +395,8 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv-id.port_num == port) continue; if (rdma_port_get_link_layer(cma_dev-device, port) == dev_ll) { -if (rdma_node_get_transport(cma_dev-device-node_type) == RDMA_TRANSPORT_IB -rdma_port_get_link_layer(cma_dev-device, port) == IB_LINK_LAYER_ETHERNET) +if (rdma_transport_is_ib(cma_dev-device) +rdma_port_ll_is_eth(cma_dev-device, port)) ret = ib_find_cached_gid(cma_dev-device, iboe_gid, found_port, NULL); else ret 
= ib_find_cached_gid(cma_dev-device, gid, found_port, NULL); @@ -435,7 +435,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv) pkey =
Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and cap_mcast() for mcast-check
On Fri, 2015-03-27 at 16:46 +0100, Michael Wang wrote: Introduce helper has_mcast() and cap_mcast() to help us check if an IB device or its port supports Multicast. This probably needs to be reworded or rethought. In truth, *all* rdma devices are multicast capable. *BUT*, IB/OPA devices require multicast registration done the IB way (including for sendonly multicast sends), while Ethernet devices do multicast the Ethernet way. These tests are really just for IB-specific multicast registration and deregistration. Calling them has_mcast() and cap_mcast() is incorrect. Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/multicast.c | 8 include/rdma/ib_verbs.h | 28 3 files changed, 33 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 276fb76..cbbc85b 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -3398,7 +3398,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) ib_detach_mcast(id-qp, mc-multicast.ib-rec.mgid, be16_to_cpu(mc-multicast.ib-rec.mlid)); -if (rdma_transport_is_ib(id_priv-cma_dev-device)) { +if (has_mcast(id_priv-cma_dev-device)) { switch (rdma_port_get_link_layer(id-device, id-port_num)) { case IB_LINK_LAYER_INFINIBAND: ib_sa_free_multicast(mc-multicast.ib); diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c index 17573ff..ffeaf27 100644 --- a/drivers/infiniband/core/multicast.c +++ b/drivers/infiniband/core/multicast.c @@ -780,7 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler, int index; dev = container_of(handler, struct mcast_device, event_handler); -if (!rdma_port_ll_is_ib(dev-device, event-element.port_num)) +if (!cap_mcast(dev-device, event-element.port_num))
return; index = event-element.port_num - dev-start_port; @@ -807,7 +807,7 @@ static void mcast_add_one(struct ib_device *device) int i; int count = 0; -if (!rdma_transport_is_ib(device)) +if (!has_mcast(device)) return; dev = kmalloc(sizeof *dev + device-phys_port_cnt * sizeof *port, @@ -823,7 +823,7 @@ static void mcast_add_one(struct ib_device *device) } for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (!rdma_port_ll_is_ib(device, dev-start_port + i)) +if (!cap_mcast(device, dev-start_port + i)) continue; port = dev-port[i]; port-dev = dev; @@ -861,7 +861,7 @@ static void mcast_remove_one(struct ib_device *device) flush_workqueue(mcast_wq); for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (rdma_port_ll_is_ib(device, dev-start_port + i)) { +if (cap_mcast(device, dev-start_port + i)) { port = dev-port[i]; deref_port(port); wait_for_completion(port-comp); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index fa8ffa3..e796104 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1823,6 +1823,19 @@ static inline int has_sa(struct ib_device *device) } /** + * has_mcast - Check if a device support Multicast. + * + * @device: Device to be checked + * + * Return 0 when a device has none port to support + * Multicast. + */ +static inline int has_mcast(struct ib_device *device) +{ +return rdma_transport_is_ib(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * @@ -1852,6 +1865,21 @@ static inline int cap_sa(struct ib_device *device, u8 port_num) return rdma_port_ll_is_ib(device, port_num); } +/** + * cap_mcast - Check if the port of device has the capability + * Multicast. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device don't support + * Multicast. 
+ */ +static inline int cap_mcast(struct ib_device *device, u8 port_num) +{ +return rdma_port_ll_is_ib(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD signature.asc Description: This is a digitally signed message part
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for iwarp-check
On Fri, 2015-03-27 at 16:47 +0100, Michael Wang wrote: Introduce helper has_iwarp() to help us check if an IB device supports the iWARP protocol. This is a needless redirection. Just stick with the original rdma_transport_is_iwarp(). Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- include/rdma/ib_verbs.h | 13 + net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 +- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index e796104..0ef9cd7 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1836,6 +1836,19 @@ static inline int has_mcast(struct ib_device *device) } /** + * has_iwarp - Check if a device support IWARP protocol. + * + * @device: Device to be checked + * + * Return 0 when a device has none port to support + * IWARP protocol. + */ +static inline int has_iwarp(struct ib_device *device) +{ +return rdma_transport_is_iwarp(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index a7b5891..48aeb5e 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -118,7 +118,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp, static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count) { -if (rdma_transport_is_iwarp(xprt-sc_cm_id-device)) +if (has_iwarp(xprt-sc_cm_id-device)) return 1; else return min_t(int, sge_count, xprt-sc_max_sge); -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD
Re: [PATCH 01/11] IB/Verbs: Use helpers to check transport and link layer
Hi, Doug Thanks for the comments :-) On 03/30/2015 05:56 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:40 +0100, Michael Wang wrote: We have so much places to check transport type and link layer type, it's now make sense to introduce some helpers in order to refine the lengthy code. This patch will introduce helpers: rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_port_ll_is_ib() rdma_port_ll_is_eth() and use them to save some code for us. If the end result is to do something like I proposed, then why take this intermediate step that just has to be backed out later? The problem is that I found there are still many places our new mechanism may not be able to cover, especially inside the device drivers; this patch just tries to collect the issues together as a baseline so we can gradually eliminate them. Sure, if we finally do capture all the cases, we can just get rid of this one, but I guess it won't be that easy to jump directly into the next stage :-P As I imagine it, after this reform the next stage could be introducing the new mechanism without changing the device drivers, and the last stage is asking vendors to adapt their code to the new mechanism. In other words, if our end goal is to have rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_transport_is_roce() rdma_transport_is_opa() Then we should skip doing rdma_port_ll_is_*() as the answers to these items would be implied by rdma_transport_is_roce() and such.
Great if we achieved that ;-) but currently I just wondering maybe these helpers can only cover part of the cases where we check transport and link layer, there are still some cases we'll need the very rough helper to save some code and make things clean~ Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/agent.c | 2 +- drivers/infiniband/core/cm.c | 2 +- drivers/infiniband/core/cma.c | 27 --- drivers/infiniband/core/mad.c | 6 +++--- drivers/infiniband/core/multicast.c | 11 --- drivers/infiniband/core/sa_query.c| 14 +++--- drivers/infiniband/core/ucm.c | 3 +-- drivers/infiniband/core/user_mad.c| 2 +- drivers/infiniband/core/verbs.c | 5 ++--- drivers/infiniband/hw/mlx4/ah.c | 2 +- drivers/infiniband/hw/mlx4/cq.c | 4 +--- drivers/infiniband/hw/mlx4/mad.c | 14 -- drivers/infiniband/hw/mlx4/main.c | 8 +++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 +- drivers/infiniband/hw/mlx4/qp.c | 21 +++-- drivers/infiniband/hw/mlx4/sysfs.c| 6 ++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 6 +++--- include/rdma/ib_verbs.h | 24 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3 +-- 19 files changed, 79 insertions(+), 83 deletions(-) diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c index f6d2961..27f1bec 100644 --- a/drivers/infiniband/core/agent.c +++ b/drivers/infiniband/core/agent.c @@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num) goto error1; } -if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) { +if (rdma_port_ll_is_ib(device, port_num)) { /* Obtain send only MAD agent for SMI QP */ port_priv-agent[0] = ib_register_mad_agent(device, port_num, IB_QPT_SMI, NULL, 0, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index e28a494..2c72e9e 100644 --- 
a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3762,7 +3762,7 @@ static void cm_add_one(struct ib_device *ib_device) int ret; u8 i; -if (rdma_node_get_transport(ib_device-node_type) != RDMA_TRANSPORT_IB) +if (!rdma_transport_is_ib(ib_device)) return; cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) * diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d570030..668e955 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -375,8 +375,8 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv-id.port_num) == dev_ll) { cma_dev = listen_id_priv-cma_dev; port = listen_id_priv-id.port_num; -if (rdma_node_get_transport(cma_dev-device-node_type) == RDMA_TRANSPORT_IB -rdma_port_get_link_layer(cma_dev-device, port) == IB_LINK_LAYER_ETHERNET) +if (rdma_transport_is_ib(cma_dev-device) +rdma_port_ll_is_eth(cma_dev-device, port)) ret =
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
On Sun, Mar 29, 2015 at 4:51 PM, Or Gerlitz ogerl...@mellanox.com wrote: Under the existing implementation for virtual GIDs, if the SM is not reachable or incurs a delayed response, or if the VF is probed into a VM before its GUID is registered with the SM, there exists a window in time in which the VF sees an incorrect GID, i.e., not the GID that was intended by the admin. This results in exposing a temporary identity to the VF. Hi Roland, so your for-next branch is again way behind, still on 3.19, while 4.0 is soon @ rc6, and we couldn't even rebase this series on it. It's really hard when your tree is only really active once every nine weeks or so, e.g. only a few days before/after rc1's. I'm not sure what you expect us to do; kernel development simply need not be like this. April 3rd-12th is a holiday here, and we would really like to know early this week what you intend to pull for 4.1 out of the pending things in linux-rdma. Or. Moreover, a subsequent change in the alias GID causes a spec-incompliant change to the VF identity. Some guest operating systems, such as Windows, cannot tolerate such changes. This series solves the above problem by exposing the admin-desired value instead of the value that was approved by the SM. As long as the SM doesn't approve the GID, the VF will see its link as down. In addition, we request GIDs from the SM on demand, i.e., when a VF actually needs them, and release them when the GIDs are no longer in use. In cloud environments, this is useful for GID migrations, in which a GID is assigned to a VF on the destination HCA, while the VF on the source HCA is shut down (but the GID was not administratively released). For reasons of compatibility, an explicit admin request to set/change a GUID entry is done immediately, regardless of whether the VF is active or not. This allows administrators to change the GUID without the need to unbind/bind the VF.
In addition, the existing implementation doesn't support a persistency mechanism to retry a GUID request when the SM has rejected it for any reason. The PF driver shall keep trying to acquire the specified GUID indefinitely, using an exponential back-off scheme; this should be managed per GUID and be aligned with other incoming admin requests. This ability is needed especially for the on-demand GUID feature. In this case, we must manage the GUID's status per entry and handle cases in which some entries are temporarily rejected. The first patch adds the persistency support and is a pre-requisite for the series. Further patches make the change to use the admin VF behavior as described above. Finally, the default mode is changed to be HOST assigned instead of SM assigned. This is the expected operational mode, because it doesn't depend on SM availability as described above. Yishai and Or. Yishai Hadas (9): IB/mlx4: Alias GUID adding persistency support net/mlx4_core: Manage alias GUID per VF net/mlx4_core: Set initial admin GUIDs for VFs IB/mlx4: Manage admin alias GUID upon admin request IB/mlx4: Change init flow to request alias GUIDs for active VFs IB/mlx4: Request alias GUID on demand net/mlx4_core: Raise slave shutdown event upon FLR net/mlx4_core: Return the admin alias GUID upon host view request IB/mlx4: Change alias guids default to be host assigned drivers/infiniband/hw/mlx4/alias_GUID.c | 468 + drivers/infiniband/hw/mlx4/main.c | 26 ++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 14 +- drivers/infiniband/hw/mlx4/sysfs.c| 44 +-- drivers/net/ethernet/mellanox/mlx4/cmd.c | 42 ++- drivers/net/ethernet/mellanox/mlx4/eq.c |2 + drivers/net/ethernet/mellanox/mlx4/main.c | 39 +++ drivers/net/ethernet/mellanox/mlx4/mlx4.h |1 + include/linux/mlx4/device.h |4 + 9 files changed, 459 insertions(+), 181 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at
http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 02/11] IB/Verbs: Use management helper tech_iboe() for iboe-check
On Fri, 2015-03-27 at 16:42 +0100, Michael Wang wrote: Introduce helper tech_iboe() to help us check if the port of an IB device is using RoCE/IBoE technology. Just use rdma_transport_is_roce() instead. Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/cma.c | 6 ++ include/rdma/ib_verbs.h | 16 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 668e955..280cfe3 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -375,8 +375,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv-id.port_num) == dev_ll) { cma_dev = listen_id_priv-cma_dev; port = listen_id_priv-id.port_num; -if (rdma_transport_is_ib(cma_dev-device) -rdma_port_ll_is_eth(cma_dev-device, port)) +if (tech_iboe(cma_dev-device, port)) ret = ib_find_cached_gid(cma_dev-device, iboe_gid, found_port, NULL); else @@ -395,8 +394,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv-id.port_num == port) continue; if (rdma_port_get_link_layer(cma_dev-device, port) == dev_ll) { -if (rdma_transport_is_ib(cma_dev-device) -rdma_port_ll_is_eth(cma_dev-device, port)) +if (tech_iboe(cma_dev-device, port)) ret = ib_find_cached_gid(cma_dev-device, iboe_gid, found_port, NULL); else ret = ib_find_cached_gid(cma_dev-device, gid, found_port, NULL); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 2bf9094..ca6d6bc 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1767,6 +1767,22 @@ static inline int rdma_port_ll_is_eth(struct ib_device *device, u8 port_num) == IB_LINK_LAYER_ETHERNET; } +/** + * tech_iboe - Check if the port of device using technology + * RoCE/IBoE. 
+ * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device is not using technology + * RoCE/IBoE. + */ +static inline int tech_iboe(struct ib_device *device, u8 port_num) +{ +return rdma_transport_is_ib(device) +rdma_port_ll_is_eth(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD