Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and cap_mcast() for mcast-check
On 03/27/2015 06:47 PM, Jason Gunthorpe wrote: On Fri, Mar 27, 2015 at 01:05:08PM -0400, ira.weiny wrote: But it seems redundant, since mcast_add_one will already not add a port that is not IB, so mcast_event_handler is not callable. Something to do with RoCE/IB switching? I'm not sure about this either. This check seems to be necessary only on a per-port level. It does seem apparent that one can't go from Eth to IB. What happens if you go from IB to Eth on the port? Hmm... I see a mlx4_change_port_types which ultimately calls ib_unregister_device, which suggests the port type doesn't change at runtime (yay) Yeah, it seems mlx4 will reinitialize the device when the port link layer changes. I've taken a look at other HW; they directly return a static type or infer it from the transport type (I suppose this won't change dynamically). Thus I also agree the check inside mcast_event_handler() is unnecessary; maybe we can change that logic to WARN_ON(!cap_mcast()) ? Regards, Michael Wang So maybe these checks really are redundant? Jason -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
infiniband/ulp/srpt/ib_srpt.c:1082: bug report
Hello there, [linux-4.0-rc6/drivers/infiniband/ulp/srpt/ib_srpt.c:1082] - [linux-4.0-rc6/drivers/infiniband/ulp/srpt/ib_srpt.c:1098]: (warning) Possible null pointer dereference: ch - otherwise it is redundant to check it against null. struct ib_device *dev = ch->sport->sdev->device; ... BUG_ON(!ch); Suggest moving the init of dev until *after* ch has been sanity checked against NULL. Regards David Binderman
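The defect class cppcheck reports here can be sketched in a few lines of plain C. The struct names below are hypothetical stand-ins for the srpt structures, and assert() stands in for the kernel's BUG_ON(); the point is only the ordering: a pointer dereferenced in a declaration's initializer is touched before any later NULL check, so the check can never fire.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock-up of the srpt structures, just to illustrate
 * the warning: a pointer must be validated before it is used in an
 * initializer, not after. */
struct ib_dev  { int id; };
struct sport   { struct ib_dev *sdev; };
struct rdma_ch { struct sport *sport; };

/* Fixed ordering: check ch first, then dereference it. */
static struct ib_dev *ch_device(struct rdma_ch *ch)
{
	struct ib_dev *dev;

	assert(ch != NULL);      /* stand-in for BUG_ON(!ch) */
	dev = ch->sport->sdev;   /* safe: ch already validated */
	return dev;
}
```

With the original ordering (`struct ib_dev *dev = ch->sport->sdev;` before the check), a NULL `ch` would already have crashed by the time the check ran, which is exactly why the check is flagged as redundant.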
Re: [PATCH 1/1] OpenSM: command line option ignore-guids broken
On 3/27/2015 5:00 AM, Jens Domke wrote: this patch changes the documentation (--help and man page) from --ignore-guids to --ignore_guids, so that it matches the implementation Signed-off-by: Jens Domke jens.do...@tu-dresden.de Thanks. Applied. -- Hal
Re: [PATCH] IB/srpt: Suppress a compiler warning
On 3/30/2015 1:37 PM, Bart Van Assche wrote: Remove a BUG_ON(!ch) statement because it is superfluous - if the ch pointer were NULL then the assignment in the first line of srpt_map_sg_to_ib_sge() would trigger a kernel oops anyway. This patch suppresses the following compiler warning: Possible null pointer dereference: ch - otherwise it is redundant to check it against null. Reported-by: David Binderman dcb...@hotmail.com Signed-off-by: Bart Van Assche bart.vanass...@sandisk.com --- drivers/infiniband/ulp/srpt/ib_srpt.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 6e0a477..4e74fc8 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1095,7 +1095,6 @@ static int srpt_map_sg_to_ib_sge(struct srpt_rdma_ch *ch, int count, nrdma; int i, j, k; - BUG_ON(!ch); BUG_ON(!ioctx); cmd = ioctx->cmd; dir = cmd->data_direction; Acked-by: Sagi Grimberg sa...@mellanox.com
Re: [-stable] commit 377b513485fd (IB/core: Avoid leakage from kernel to user space)
On Fri, Mar 27, 2015 at 01:42:44PM +0100, Yann Droneaud wrote: Hi, Please add commit 377b513485fd (IB/core: Avoid leakage from kernel to user space) to -stable. It can be applied to v2.6.32 and later. Regards. -- Yann Droneaud OPTEYA Thanks, I'm queuing it for the 3.16 kernel. Cheers, -- Luís
[PATCH] IB/srpt: Suppress a compiler warning
Remove a BUG_ON(!ch) statement because it is superfluous - if the ch pointer were NULL then the assignment in the first line of srpt_map_sg_to_ib_sge() would trigger a kernel oops anyway. This patch suppresses the following compiler warning: Possible null pointer dereference: ch - otherwise it is redundant to check it against null. Reported-by: David Binderman dcb...@hotmail.com Signed-off-by: Bart Van Assche bart.vanass...@sandisk.com --- drivers/infiniband/ulp/srpt/ib_srpt.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 6e0a477..4e74fc8 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1095,7 +1095,6 @@ static int srpt_map_sg_to_ib_sge(struct srpt_rdma_ch *ch, int count, nrdma; int i, j, k; - BUG_ON(!ch); BUG_ON(!ioctx); cmd = ioctx->cmd; dir = cmd->data_direction; -- 2.1.4
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for iwarp-check
On 03/27/2015 06:29 PM, Jason Gunthorpe wrote: On Fri, Mar 27, 2015 at 01:16:31PM -0400, ira.weiny wrote: [snip] http://www.spinics.net/lists/linux-rdma/msg22565.html ''Unlike IB, the iWARP protocol only allows 1 target/sink SGE in an rdma read'' It is one of those annoying verbs is different on iWarp things. So the max sge in the query_verbs must only apply to send/rdma write on iWarp? I found that actually we don't have to touch this one, which is only used by the HW driver currently. I think we can leave these cases in the device driver, since vendors could have different ways to classify the usage of transport and link layer. Our purpose is to introduce the IB core management approach, which may not be applicable at the device level, so maybe we can just pass on them :-) Regards, Michael Wang Jason
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for iwarp-check
On 03/30/2015 06:13 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:47 +0100, Michael Wang wrote: Introduce helper has_iwarp() to help us check if an IB device supports the iWARP protocol. This is a needless redirection. Just stick with the original rdma_transport_is_iwarp(). Agree, will leave it there :-) Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- include/rdma/ib_verbs.h | 13 + net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 +- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index e796104..0ef9cd7 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1836,6 +1836,19 @@ static inline int has_mcast(struct ib_device *device) } /** + * has_iwarp - Check if a device supports the iWARP protocol. + * + * @device: Device to be checked + * + * Return 0 when a device has no port supporting + * the iWARP protocol. + */ +static inline int has_iwarp(struct ib_device *device) +{ +return rdma_transport_is_iwarp(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index a7b5891..48aeb5e 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -118,7 +118,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp, static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count) { -if (rdma_transport_is_iwarp(xprt->sc_cm_id->device)) +if (has_iwarp(xprt->sc_cm_id->device)) return 1; else return min_t(int, sge_count, xprt->sc_max_sge);
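The rdma_read_max_sge() logic quoted in the patch above can be restated as a tiny standalone sketch. The is_iwarp flag below stands in for the rdma_transport_is_iwarp()/has_iwarp() device query (an assumption for illustration): iWARP permits only a single target/sink SGE per RDMA READ, so the count collapses to 1 there, and is otherwise capped by the transport's advertised maximum.

```c
#include <assert.h>

/* Sketch of the svc_rdma_recvfrom.c read-SGE cap discussed above.
 * is_iwarp mimics the has_iwarp()/rdma_transport_is_iwarp() check;
 * sc_max_sge mimics the xprt's advertised maximum. */
static int rdma_read_max_sge(int is_iwarp, int sge_count, int sc_max_sge)
{
	if (is_iwarp)
		return 1; /* iWARP: only 1 target/sink SGE per RDMA READ */
	return sge_count < sc_max_sge ? sge_count : sc_max_sge; /* min_t */
}
```

This is why the helper is one of the few genuinely transport-specific checks in the ULP path: the cap comes from the iWARP protocol itself, not from a device attribute.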
Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
On Mar 30, 2015, at 10:18 AM, Steve Wise sw...@opengridcomputing.com wrote: Hey Chuck, Chelsio's QA regression tested this series on iw_cxgb4. Tests out good. Tests ran: spew, ffsb, xdd, fio, dbench, and cthon with both v3 and v4. Thanks, Steve. Who should I credit in the Tested-by tag? -- Chuck Lever chuck[dot]lever[at]oracle[dot]com
Re: [PATCH 01/11] IB/Verbs: Use helpers to check transport and link layer
On Mon, 2015-03-30 at 18:14 +0200, Michael Wang wrote: Hi, Doug Thanks for the comments :-) On 03/30/2015 05:56 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:40 +0100, Michael Wang wrote: We have so many places checking transport type and link layer type that it now makes sense to introduce some helpers to refine the lengthy code. This patch will introduce helpers: rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_port_ll_is_ib() rdma_port_ll_is_eth() and use them to save some code for us. If the end result is to do something like I proposed, then why take this intermediate step that just has to be backed out later? The problem is that I found there are still many places our new mechanism may not be able to cover, especially inside device drivers; this is just trying to collect the issues together as a base so we can gradually eliminate them. There is no gradually eliminate them to the suggestion I made. Remember, my suggestion was to remove the transport and link_layer items from the port settings and replace them with just one transport item that is a bitmask of the possible transport types. This can not be done gradually, it must be a complete change all at once as the two methods of setting things are incompatible. As there is only one out of tree driver that I know of, lustre, we can give them the information they need to make their driver work both before and after the change. Sure, if we finally do capture all the cases, we can just get rid of this one, but I guess it won't be that easy to directly jump into the next stage :-P As I can imagine, after this reform, the next stage could be introducing the new mechanism without changing device drivers, and the last stage is to ask vendors to adapt their code to the new mechanism. 
In other words, if our end goal is to have rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_transport_is_roce() rdma_transport_is_opa() Then we should skip doing rdma_port_ll_is_*() as the answers to these items would be implied by rdma_transport_is_roce() and such. Great if we achieved that ;-) but currently I just wondering maybe these helpers can only cover part of the cases where we check transport and link layer, there are still some cases we'll need the very rough helper to save some code and make things clean~ Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/agent.c | 2 +- drivers/infiniband/core/cm.c | 2 +- drivers/infiniband/core/cma.c | 27 --- drivers/infiniband/core/mad.c | 6 +++--- drivers/infiniband/core/multicast.c | 11 --- drivers/infiniband/core/sa_query.c| 14 +++--- drivers/infiniband/core/ucm.c | 3 +-- drivers/infiniband/core/user_mad.c| 2 +- drivers/infiniband/core/verbs.c | 5 ++--- drivers/infiniband/hw/mlx4/ah.c | 2 +- drivers/infiniband/hw/mlx4/cq.c | 4 +--- drivers/infiniband/hw/mlx4/mad.c | 14 -- drivers/infiniband/hw/mlx4/main.c | 8 +++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 +- drivers/infiniband/hw/mlx4/qp.c | 21 +++-- drivers/infiniband/hw/mlx4/sysfs.c| 6 ++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 6 +++--- include/rdma/ib_verbs.h | 24 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3 +-- 19 files changed, 79 insertions(+), 83 deletions(-) diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c index f6d2961..27f1bec 100644 --- a/drivers/infiniband/core/agent.c +++ b/drivers/infiniband/core/agent.c @@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num) goto error1; } -if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) { +if 
(rdma_port_ll_is_ib(device, port_num)) { /* Obtain send only MAD agent for SMI QP */ port_priv->agent[0] = ib_register_mad_agent(device, port_num, IB_QPT_SMI, NULL, 0, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index e28a494..2c72e9e 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3762,7 +3762,7 @@ static void cm_add_one(struct ib_device *ib_device) int ret; u8 i; -if (rdma_node_get_transport(ib_device->node_type) != RDMA_TRANSPORT_IB) +if (!rdma_transport_is_ib(ib_device)) return; cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) * diff --git
Re: [RFC PATCH 02/11] IB/Verbs: Use management helper tech_iboe() for iboe-check
On 03/30/2015 06:17 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:42 +0100, Michael Wang wrote: Introduce helper tech_iboe() to help us check if the port of an IB device is using RoCE/IBoE technology. Just use rdma_transport_is_roce() instead. Sounds good :-) will be in the next version. Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/cma.c | 6 ++ include/rdma/ib_verbs.h | 16 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 668e955..280cfe3 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -375,8 +375,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv->id.port_num) == dev_ll) { cma_dev = listen_id_priv->cma_dev; port = listen_id_priv->id.port_num; -if (rdma_transport_is_ib(cma_dev->device) && - rdma_port_ll_is_eth(cma_dev->device, port)) +if (tech_iboe(cma_dev->device, port)) ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL); else @@ -395,8 +394,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv->id.port_num == port) continue; if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) { -if (rdma_transport_is_ib(cma_dev->device) && - rdma_port_ll_is_eth(cma_dev->device, port)) +if (tech_iboe(cma_dev->device, port)) ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL); else ret = ib_find_cached_gid(cma_dev->device, &gid, &found_port, NULL); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 2bf9094..ca6d6bc 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1767,6 +1767,22 @@ static inline int rdma_port_ll_is_eth(struct ib_device *device, u8 port_num) == IB_LINK_LAYER_ETHERNET; } +/** + * tech_iboe - Check if the port of device using 
technology + * RoCE/IBoE. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device is not using technology + * RoCE/IBoE. + */ +static inline int tech_iboe(struct ib_device *device, u8 port_num) +{ +return rdma_transport_is_ib(device) && + rdma_port_ll_is_eth(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid);
Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and cap_mcast() for mcast-check
On 03/30/2015 06:11 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:46 +0100, Michael Wang wrote: Introduce helper has_mcast() and cap_mcast() to help us check if an IB device or its port supports Multicast. This probably needs rewording or rethinking. In truth, *all* rdma devices are multicast capable. *BUT*, IB/OPA devices require multicast registration done the IB way (including for sendonly multicast sends), while Ethernet devices do multicast the Ethernet way. These tests are really just for IB specific multicast registration and deregistration. Calling them has_mcast() and cap_mcast() is incorrect. Thanks for the explanation :-) Jason also mentioned we should use cap_ib_XX() instead; I'll use that naming so we can distinguish the management between Eth and IB/OPA. Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/multicast.c | 8 include/rdma/ib_verbs.h | 28 3 files changed, 33 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 276fb76..cbbc85b 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -3398,7 +3398,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) ib_detach_mcast(id->qp, &mc->multicast.ib->rec.mgid, be16_to_cpu(mc->multicast.ib->rec.mlid)); -if (rdma_transport_is_ib(id_priv->cma_dev->device)) { +if (has_mcast(id_priv->cma_dev->device)) { switch (rdma_port_get_link_layer(id->device, id->port_num)) { case IB_LINK_LAYER_INFINIBAND: ib_sa_free_multicast(mc->multicast.ib); diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c index 17573ff..ffeaf27 100644 --- a/drivers/infiniband/core/multicast.c +++ b/drivers/infiniband/core/multicast.c @@ -780,7 +780,7 @@ static void 
mcast_event_handler(struct ib_event_handler *handler, int index; dev = container_of(handler, struct mcast_device, event_handler); -if (!rdma_port_ll_is_ib(dev-device, event-element.port_num)) +if (!cap_mcast(dev-device, event-element.port_num)) return; index = event-element.port_num - dev-start_port; @@ -807,7 +807,7 @@ static void mcast_add_one(struct ib_device *device) int i; int count = 0; -if (!rdma_transport_is_ib(device)) +if (!has_mcast(device)) return; dev = kmalloc(sizeof *dev + device-phys_port_cnt * sizeof *port, @@ -823,7 +823,7 @@ static void mcast_add_one(struct ib_device *device) } for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (!rdma_port_ll_is_ib(device, dev-start_port + i)) +if (!cap_mcast(device, dev-start_port + i)) continue; port = dev-port[i]; port-dev = dev; @@ -861,7 +861,7 @@ static void mcast_remove_one(struct ib_device *device) flush_workqueue(mcast_wq); for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (rdma_port_ll_is_ib(device, dev-start_port + i)) { +if (cap_mcast(device, dev-start_port + i)) { port = dev-port[i]; deref_port(port); wait_for_completion(port-comp); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index fa8ffa3..e796104 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1823,6 +1823,19 @@ static inline int has_sa(struct ib_device *device) } /** + * has_mcast - Check if a device support Multicast. + * + * @device: Device to be checked + * + * Return 0 when a device has none port to support + * Multicast. + */ +static inline int has_mcast(struct ib_device *device) +{ +return rdma_transport_is_ib(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * @@ -1852,6 +1865,21 @@ static inline int cap_sa(struct ib_device *device, u8 port_num) return rdma_port_ll_is_ib(device, port_num); } +/** + * cap_mcast - Check if the port of device has the capability + * Multicast. 
+ * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device doesn't support + * Multicast. + */ +static inline int cap_mcast(struct ib_device *device, u8 port_num) +{ +return rdma_port_ll_is_ib(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid);
Re: [PATCH v1 00/10] Add network namespace support in the RDMA-CM
On Thu, Mar 19, 2015 at 5:12 PM, Or Gerlitz ogerl...@mellanox.com wrote: On 2/17/2015 5:53 PM, Or Gerlitz wrote: On 02/11/2015 05:06 PM, Shachar Raindel wrote: This patchset allows using network namespaces with the RDMA-CM. Each RDMA-CM and CM id keeps a reference to a network namespace. [...] Hi Sean, Did you have the chance to look at the patches that do the changes to the cm and cma code? Sean, ping... you are the maintainer of the rdma-cm, these patches have been here for many weeks, can you please take a look and provide your feedback. Sean, PING. Busy as you may be, the upstream cm/rdma-cm maintainer hat is sitting solid on your head, and it makes no sense for people doing development to submit patches touching these layers and get no feedback from you for months, no? Or.
Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa() for sa-check
On Mon, 2015-03-30 at 18:42 +0200, Michael Wang wrote: On 03/30/2015 06:16 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:46 +0100, Michael Wang wrote: Introduce helper has_sa() and cap_sa() to help us check if an IB device or its port supports Subnet Administrator. There's no functional reason to have both rdma_transport_is_ib and rdma_port_ll_is_ib, just use one. Then there is also no reason for both has_sa and cap_sa. Just use one. The has_sa() will be eliminated :-) rdma_transport_is_ib and rdma_port_ll_is_ib are actually just rough helpers to save some code; we can get rid of them when we no longer need them, but currently device drivers are still using them a lot, and I'm not sure if the new mechanism can cover all these cases... Sure it would. This is what I had suggested (well, close to this, I rearranged the order this time around): enum rdma_transport { RDMA_TRANSPORT_IB = 0x01, RDMA_TRANSPORT_OPA = 0x02, RDMA_TRANSPORT_IWARP = 0x04, RDMA_TRANSPORT_ROCE_V1 = 0x08, RDMA_TRANSPORT_ROCE_V2 = 0x10, }; struct ib_port { ... enum rdma_transport; ... }; static inline bool rdma_transport_is_ib(struct ib_port *port) { return port->transport & (RDMA_TRANSPORT_IB | RDMA_TRANSPORT_OPA); } static inline bool rdma_transport_is_opa(struct ib_port *port) { return port->transport & RDMA_TRANSPORT_OPA; } static inline bool rdma_transport_is_iwarp(struct ib_port *port) { return port->transport & RDMA_TRANSPORT_IWARP; } static inline bool rdma_transport_is_roce(struct ib_port *port) { return port->transport & (RDMA_TRANSPORT_ROCE_V1 | RDMA_TRANSPORT_ROCE_V2); } static inline bool rdma_ib_mgmt(struct ib_port *port) { return port->transport & (RDMA_TRANSPORT_IB | RDMA_TRANSPORT_OPA); } static inline bool rdma_opa_mgmt(struct ib_port *port) { return port->transport & RDMA_TRANSPORT_OPA; } If we use something like this, then the above is all you need. Then every place in the code that checks for something like has_sa or cap_sa can be replaced with rdma_ib_mgmt. 
When Ira updates his patches for this, he can check for rdma_opa_mgmt to enable jumbo MAD packets and whatever else he needs. Every place that does transport == IB and ll == Ethernet can become rdma_transport_is_roce. Every place that does transport == IB and ll == INFINIBAND becomes rdma_transport_is_ib. The code in multicast.c just needs to check rdma_ib_mgmt() (which happens to make perfect sense anyway as the code in multicast.c that is checking that we are on an IB interface is doing so because IB requires extra management of the multicast group joins/leaves). But, like I said, this is an all or nothing change, it isn't something we can ease into. -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD
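The bitmask scheme sketched in the mail above can be condensed into a compilable form. This is a sketch of the proposal, not the eventual upstream API; the helpers are exactly the grouped-mask tests quoted in the thread, applied to a minimal stand-in `struct ib_port`:

```c
#include <assert.h>
#include <stdbool.h>

/* Compilable condensation of Doug's proposal: one per-port transport
 * bitmask, with the grouped helpers derived from it. */
enum rdma_transport {
	RDMA_TRANSPORT_IB      = 0x01,
	RDMA_TRANSPORT_OPA     = 0x02,
	RDMA_TRANSPORT_IWARP   = 0x04,
	RDMA_TRANSPORT_ROCE_V1 = 0x08,
	RDMA_TRANSPORT_ROCE_V2 = 0x10,
};

struct ib_port {
	enum rdma_transport transport; /* exactly one bit set per port */
};

static bool rdma_transport_is_ib(struct ib_port *port)
{
	return port->transport & (RDMA_TRANSPORT_IB | RDMA_TRANSPORT_OPA);
}

static bool rdma_transport_is_roce(struct ib_port *port)
{
	return port->transport & (RDMA_TRANSPORT_ROCE_V1 | RDMA_TRANSPORT_ROCE_V2);
}

/* IB-style management (SA queries, MAD-based multicast join/leave)
 * applies to IB and OPA ports; RoCE and iWARP ports answer false. */
static bool rdma_ib_mgmt(struct ib_port *port)
{
	return port->transport & (RDMA_TRANSPORT_IB | RDMA_TRANSPORT_OPA);
}
```

The point of the grouping is visible in the masks: a single transport field answers both the "which wire protocol" question and the "which management style" question, so separate has_sa/cap_sa style helpers become unnecessary.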
Re: [PATCH for-next 3/9] net/mlx4_core: Set initial admin GUIDs for VFs
On Sun, Mar 29, 2015 at 04:51:27PM +0300, Or Gerlitz wrote: +void mlx4_set_random_admin_guid(struct mlx4_dev *dev, int entry, int port) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + u8 random_mac[6]; + char *raw_gid; + + /* hw GUID */ + if (entry == 0) + return; + + eth_random_addr(random_mac); + raw_gid = (char *)priv->mfunc.master.vf_admin[entry].vport[port].guid; raw_gid is actually a guid + raw_gid[0] = random_mac[0] ^ 2; eth_random_addr already guarantees the ULA bit is set to one (local), so this is wrong. IBA uses the EUI-64 system, not the IPv6 modification. + raw_gid[1] = random_mac[1]; + raw_gid[2] = random_mac[2]; + raw_gid[3] = 0xff; + raw_gid[4] = 0xfe; This should be 0xff for mapping a MAC to an EUI-64 But, it doesn't really make sense to use eth_random_addr (which doesn't have a special OUI) and not randomize every bit. get_random_bytes(&guid, sizeof(guid)); guid &= ~(1ULL << 56); guid |= 1ULL << 57; I also don't think the kernel should be generating random GUIDs. Either the SA should be consulted to do this, or the management stack should generate a cloud wide unique number. Jason
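The EUI-48 to EUI-64 expansion Jason is pointing at can be shown in isolation. This is an illustrative sketch, not the mlx4 patch: per IEEE, a 48-bit MAC is stretched to 64 bits by inserting 0xFF,0xFF in the middle (the 0xFF,0xFE form is the *modified* EUI-64 used by IPv6, which the review says does not apply to IBA GUIDs), and XOR-ing bit 0x02 of the first octet toggles the universal/local flag.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the MAC -> EUI-64 mapping discussed above.
 * eui[3]/eui[4] get the IEEE 0xFF,0xFF insertion (not 0xFF,0xFE,
 * which is the IPv6 "modified EUI-64" form the review rejects). */
static void mac_to_eui64(const uint8_t mac[6], uint8_t eui[8])
{
	eui[0] = mac[0] ^ 0x02;   /* toggle the universal/local bit */
	eui[1] = mac[1];
	eui[2] = mac[2];
	eui[3] = 0xff;            /* IEEE EUI-64 insertion... */
	eui[4] = 0xff;            /* ...both bytes are 0xff */
	eui[5] = mac[3];
	eui[6] = mac[4];
	eui[7] = mac[5];
}
```

The separate objection in the mail still stands independently of this mapping: if the source MAC is random anyway, deriving a GUID from it buys nothing over randomizing all 64 bits directly (with the universal/local semantics set explicitly).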
Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa() for sa-check
On 03/30/2015 06:16 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:46 +0100, Michael Wang wrote: Introduce helper has_sa() and cap_sa() to help us check if an IB device or it's port support Subnet Administrator. There's no functional reason to have both rdma_transport_is_ib and rdma_port_ll_is_ib, just use one. Then there is also no reason for both has_sa and cap_sa. Just use one. The has_sa() will be eliminated :-) rdma_transport_is_ib and rdma_port_ll_is_ib is actually just rough helper to save some code, we can get rid of them when we no longer need them, but currently device driver still using them a lot, I'm not sure if the new mechanism could take cover all these cases... Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/sa_query.c | 12 ++-- include/rdma/ib_verbs.h| 28 2 files changed, 34 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index d95d25f..89c27da 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -450,7 +450,7 @@ static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event struct ib_sa_port *port = sa_dev-port[event-element.port_num - sa_dev-start_port]; -if (!rdma_port_ll_is_ib(handler-device, port-port_num)) +if (!cap_sa(handler-device, port-port_num)) return; spin_lock_irqsave(port-ah_lock, flags); @@ -1154,7 +1154,7 @@ static void ib_sa_add_one(struct ib_device *device) struct ib_sa_device *sa_dev; int s, e, i; -if (!rdma_transport_is_ib(device)) +if (!has_sa(device)) return; if (device-node_type == RDMA_NODE_IB_SWITCH) @@ -1175,7 +1175,7 @@ static void ib_sa_add_one(struct ib_device *device) for (i = 0; i = e - s; ++i) { spin_lock_init(sa_dev-port[i].ah_lock); -if (!rdma_port_ll_is_ib(device, i + 1)) +if 
(!cap_sa(device, i + 1)) continue; sa_dev-port[i].sm_ah= NULL; @@ -1205,14 +1205,14 @@ static void ib_sa_add_one(struct ib_device *device) goto err; for (i = 0; i = e - s; ++i) -if (rdma_port_ll_is_ib(device, i + 1)) +if (cap_sa(device, i + 1)) update_sm_ah(sa_dev-port[i].update_task); return; err: while (--i = 0) -if (rdma_port_ll_is_ib(device, i + 1)) +if (cap_sa(device, i + 1)) ib_unregister_mad_agent(sa_dev-port[i].agent); kfree(sa_dev); @@ -1233,7 +1233,7 @@ static void ib_sa_remove_one(struct ib_device *device) flush_workqueue(ib_wq); for (i = 0; i = sa_dev-end_port - sa_dev-start_port; ++i) { -if (rdma_port_ll_is_ib(device, i + 1)) { +if (cap_sa(device, i + 1)) { ib_unregister_mad_agent(sa_dev-port[i].agent); if (sa_dev-port[i].sm_ah) kref_put(sa_dev-port[i].sm_ah-ref, free_sm_ah); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index c0a63f8..fa8ffa3 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1810,6 +1810,19 @@ static inline int has_cm(struct ib_device *device) } /** + * has_sa - Check if a device support Subnet Administrator. + * + * @device: Device to be checked + * + * Return 0 when a device has none port to support + * Subnet Administrator. + */ +static inline int has_sa(struct ib_device *device) +{ +return rdma_transport_is_ib(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * @@ -1824,6 +1837,21 @@ static inline int cap_smi(struct ib_device *device, u8 port_num) return rdma_port_ll_is_ib(device, port_num); } +/** + * cap_sa - Check if the port of device has the capability + * Subnet Administrator. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device don't support + * Subnet Administrator. 
+ */ +static inline int cap_sa(struct ib_device *device, u8 port_num) +{ +return rdma_port_ll_is_ib(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid);
Re: [PATCH 01/11] IB/Verbs: Use helpers to check transport and link layer
On 03/30/2015 06:22 PM, Doug Ledford wrote: On Mon, 2015-03-30 at 18:14 +0200, Michael Wang wrote: [snip] There is no gradually eliminate them to the suggestion I made. Remember, my suggestion was to remove the transport and link_layer items from the port settings and replace them with just one transport item that is a bitmask of the possible transport types. This can not be done gradually, it must be a complete change all at once as the two methods of setting things are incompatible. As there is only one out of tree driver that I know of, lustre, we can give them the information they need to make their driver work both before and after the change. Actually there is something that confuses me about transport and link layer here; basically we have defined: transport type RDMA_TRANSPORT_IB, RDMA_TRANSPORT_IWARP, RDMA_TRANSPORT_USNIC, RDMA_TRANSPORT_USNIC_UDP link layer IB_LINK_LAYER_INFINIBAND, IB_LINK_LAYER_ETHERNET, So we could have a table:

                    LL_INFINIBAND   LL_ETHERNET   UNCARE
   TRANSPORT_IB           1              2           3
   TRANSPORT_IWARP                                   4
   UNCARE                 5              6

In the current implementation I've found all these combinations in core or driver code, and I can see: rdma_transport_is_ib() covers 1, rdma_transport_is_iwarp() covers 4, rdma_transport_is_roce() covers 2. I'm just confused about how to take care of combinations 3, 5 and 6? Regards, Michael Wang Sure, if we finally do capture all the cases, we can just get rid of this one, but I guess it won't be that easy to directly jump into the next stage :-P As I can imagine, after this reform, the next stage could be introducing the new mechanism without changing device drivers, and the last stage is to ask vendors to adapt their code to the new mechanism. In other words, if our end goal is to have rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_transport_is_roce() rdma_transport_is_opa() Then we should skip doing rdma_port_ll_is_*() as the answers to these items would be implied by rdma_transport_is_roce() and such. 
Great if we achieved that ;-) but currently I just wondering maybe these helpers can only cover part of the cases where we check transport and link layer, there are still some cases we'll need the very rough helper to save some code and make things clean~ Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/agent.c | 2 +- drivers/infiniband/core/cm.c | 2 +- drivers/infiniband/core/cma.c | 27 --- drivers/infiniband/core/mad.c | 6 +++--- drivers/infiniband/core/multicast.c | 11 --- drivers/infiniband/core/sa_query.c| 14 +++--- drivers/infiniband/core/ucm.c | 3 +-- drivers/infiniband/core/user_mad.c| 2 +- drivers/infiniband/core/verbs.c | 5 ++--- drivers/infiniband/hw/mlx4/ah.c | 2 +- drivers/infiniband/hw/mlx4/cq.c | 4 +--- drivers/infiniband/hw/mlx4/mad.c | 14 -- drivers/infiniband/hw/mlx4/main.c | 8 +++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 +- drivers/infiniband/hw/mlx4/qp.c | 21 +++-- drivers/infiniband/hw/mlx4/sysfs.c| 6 ++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 6 +++--- include/rdma/ib_verbs.h | 24 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3 +-- 19 files changed, 79 insertions(+), 83 deletions(-) diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c index f6d2961..27f1bec 100644 --- a/drivers/infiniband/core/agent.c +++ b/drivers/infiniband/core/agent.c @@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num) goto error1; } -if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) { +if (rdma_port_ll_is_ib(device, port_num)) { /* Obtain send only MAD agent for SMI QP */ port_priv-agent[0] = ib_register_mad_agent(device, port_num, IB_QPT_SMI, NULL, 0, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index e28a494..2c72e9e 100644 --- 
a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3762,7 +3762,7 @@ static void cm_add_one(struct ib_device *ib_device) int ret; u8 i; -if (rdma_node_get_transport(ib_device-node_type) != RDMA_TRANSPORT_IB) +if (!rdma_transport_is_ib(ib_device))
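Doug's bitmask proposal above can be sketched in userspace C. This is a hypothetical illustration, not the kernel's actual API: the RDMA_TRANSPORT_* bit values and the `struct port` layout are invented for the example. The point is that one per-port capability bitmask answers both the transport question and the link-layer question, which is why the `rdma_port_ll_is_*()` helpers become redundant.

```c
#include <stdint.h>

/* Illustrative bit definitions -- not the kernel's real values. */
enum {
	RDMA_TRANSPORT_IB    = 1 << 0,	/* IB transport on an IB link */
	RDMA_TRANSPORT_IWARP = 1 << 1,	/* iWARP (always Ethernet) */
	RDMA_TRANSPORT_ROCE  = 1 << 2,	/* IB transport over Ethernet */
	RDMA_TRANSPORT_OPA   = 1 << 3,	/* Omni-Path */
};

struct port {
	uint32_t transport;	/* one bitmask replaces transport + link_layer */
};

static inline int rdma_transport_is_ib(const struct port *p)
{
	return p->transport & RDMA_TRANSPORT_IB;
}

static inline int rdma_transport_is_iwarp(const struct port *p)
{
	return p->transport & RDMA_TRANSPORT_IWARP;
}

static inline int rdma_transport_is_roce(const struct port *p)
{
	return p->transport & RDMA_TRANSPORT_ROCE;
}

/* The link layer is implied by the transport bits: RoCE and iWARP
 * run over Ethernet; plain IB transport implies an InfiniBand link. */
static inline int rdma_port_ll_is_eth(const struct port *p)
{
	return p->transport & (RDMA_TRANSPORT_ROCE | RDMA_TRANSPORT_IWARP);
}
```

With this shape, combination 2 in Michael's table (IB transport, Ethernet link) is simply the RoCE bit, so no separate link-layer query is needed.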
[PATCH v3 07/15] xprtrdma: Add a max_payload op for each memreg mode
The max_payload computation is generalized to ensure that the payload maximum is the lesser of RPC_MAX_DATA_SEGS and the number of data segments that can be transmitted in an inline buffer. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 13 ++ net/sunrpc/xprtrdma/frwr_ops.c | 13 ++ net/sunrpc/xprtrdma/physical_ops.c | 10 +++ net/sunrpc/xprtrdma/transport.c|5 +++- net/sunrpc/xprtrdma/verbs.c| 49 +++- net/sunrpc/xprtrdma/xprt_rdma.h|5 +++- 6 files changed, 59 insertions(+), 36 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index ffb7d93..eec2660 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -17,6 +17,19 @@ # define RPCDBG_FACILITY RPCDBG_TRANS #endif +/* Maximum scatter/gather per FMR */ +#define RPCRDMA_MAX_FMR_SGES (64) + +/* FMR mode conveys up to 64 pages of payload per chunk segment. + */ +static size_t +fmr_op_maxpages(struct rpcrdma_xprt *r_xprt) +{ + return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS, +rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES); +} + const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { + .ro_maxpages= fmr_op_maxpages, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 79173f9..73a5ac8 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -17,6 +17,19 @@ # define RPCDBG_FACILITY RPCDBG_TRANS #endif +/* FRWR mode conveys a list of pages per chunk segment. The + * maximum length of that list is the FRWR page list depth. 
+ */ +static size_t +frwr_op_maxpages(struct rpcrdma_xprt *r_xprt) +{ + struct rpcrdma_ia *ia = r_xprt-rx_ia; + + return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS, +rpcrdma_max_segments(r_xprt) * ia-ri_max_frmr_depth); +} + const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = { + .ro_maxpages= frwr_op_maxpages, .ro_displayname = frwr, }; diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c index b0922ac..28ade19 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -19,6 +19,16 @@ # define RPCDBG_FACILITY RPCDBG_TRANS #endif +/* PHYSICAL memory registration conveys one page per chunk segment. + */ +static size_t +physical_op_maxpages(struct rpcrdma_xprt *r_xprt) +{ + return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS, +rpcrdma_max_segments(r_xprt)); +} + const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { + .ro_maxpages= physical_op_maxpages, .ro_displayname = physical, }; diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 97f6562..da71a24 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -406,7 +406,10 @@ xprt_setup_rdma(struct xprt_create *args) xprt_rdma_connect_worker); xprt_rdma_format_addresses(xprt); - xprt-max_payload = rpcrdma_max_payload(new_xprt); + xprt-max_payload = new_xprt-rx_ia.ri_ops-ro_maxpages(new_xprt); + if (xprt-max_payload == 0) + goto out4; + xprt-max_payload = PAGE_SHIFT; dprintk(RPC: %s: transport data payload maximum: %zu bytes\n, __func__, xprt-max_payload); diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index c3319e1..da55cda 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -2212,43 +2212,24 @@ rpcrdma_ep_post_recv(struct rpcrdma_ia *ia, return rc; } -/* Physical mapping means one Read/Write list entry per-page. 
- * All list entries must fit within an inline buffer - * - * NB: The server must return a Write list for NFS READ, - * which has the same constraint. Factor in the inline - * rsize as well. +/* How many chunk list items fit within our inline buffers? */ -static size_t -rpcrdma_physical_max_payload(struct rpcrdma_xprt *r_xprt) +unsigned int +rpcrdma_max_segments(struct rpcrdma_xprt *r_xprt) { struct rpcrdma_create_data_internal *cdata = r_xprt-rx_data; - unsigned int inline_size, pages; - - inline_size = min_t(unsigned int, - cdata-inline_wsize, cdata-inline_rsize); - inline_size -= RPCRDMA_HDRLEN_MIN; - pages = inline_size / sizeof(struct rpcrdma_segment); - return pages PAGE_SHIFT; -} + int bytes, segments; -static size_t
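The ops-table dispatch introduced by this series can be modeled in standalone C. This is a simplified sketch: `rx_max_segments` stands in for the kernel's `rpcrdma_max_segments()` result, and the constants mirror (but are not guaranteed to match) `RPCRDMA_MAX_DATA_SEGS` and `RPCRDMA_MAX_FMR_SGES`. It shows how generic code computes the payload maximum through a per-mode `ro_maxpages` pointer instead of switching on the registration strategy.

```c
#include <stddef.h>

#define RPCRDMA_MAX_DATA_SEGS	64	/* illustrative value */
#define RPCRDMA_MAX_FMR_SGES	64	/* max scatter/gather per FMR */

struct rpcrdma_xprt;

struct rpcrdma_memreg_ops {
	size_t (*ro_maxpages)(struct rpcrdma_xprt *);
	const char *ro_displayname;
};

struct rpcrdma_xprt {
	const struct rpcrdma_memreg_ops *rx_ops;
	unsigned int rx_max_segments;	/* chunk list items per inline buffer */
};

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* FMR mode conveys up to 64 pages of payload per chunk segment. */
static size_t fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
{
	return min_size(RPCRDMA_MAX_DATA_SEGS,
			r_xprt->rx_max_segments * RPCRDMA_MAX_FMR_SGES);
}

/* PHYSICAL registration conveys exactly one page per chunk segment. */
static size_t physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
{
	return min_size(RPCRDMA_MAX_DATA_SEGS, r_xprt->rx_max_segments);
}

static const struct rpcrdma_memreg_ops fmr_ops = {
	.ro_maxpages	= fmr_op_maxpages,
	.ro_displayname	= "fmr",
};

static const struct rpcrdma_memreg_ops physical_ops = {
	.ro_maxpages	= physical_op_maxpages,
	.ro_displayname	= "physical",
};

/* Generic transport code dispatches without knowing the mode. */
static size_t max_payload_pages(struct rpcrdma_xprt *r_xprt)
{
	return r_xprt->rx_ops->ro_maxpages(r_xprt);
}
```

The caller then shifts the page count by PAGE_SHIFT to get the byte payload, as `xprt_setup_rdma()` does in the patch.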
[PATCH v3 09/15] xprtrdma: Add a deregister_external op for each memreg mode
There is very little common processing among the different external memory deregistration functions. Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 27 net/sunrpc/xprtrdma/frwr_ops.c | 36 net/sunrpc/xprtrdma/physical_ops.c | 10 net/sunrpc/xprtrdma/rpc_rdma.c | 11 +++-- net/sunrpc/xprtrdma/transport.c|4 +- net/sunrpc/xprtrdma/verbs.c| 81 net/sunrpc/xprtrdma/xprt_rdma.h|5 +- 7 files changed, 84 insertions(+), 90 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index 45fb646..888aa10 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -79,8 +79,35 @@ out_maperr: return rc; } +/* Use the ib_unmap_fmr() verb to prevent further remote + * access via RDMA READ or RDMA WRITE. + */ +static int +fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) +{ + struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct rpcrdma_mr_seg *seg1 = seg; + int rc, nsegs = seg-mr_nsegs; + LIST_HEAD(l); + + list_add(seg1-rl_mw-r.fmr-list, l); + rc = ib_unmap_fmr(l); + read_lock(ia-ri_qplock); + while (seg1-mr_nsegs--) + rpcrdma_unmap_one(ia, seg++); + read_unlock(ia-ri_qplock); + if (rc) + goto out_err; + return nsegs; + +out_err: + dprintk(RPC: %s: ib_unmap_fmr status %i\n, __func__, rc); + return nsegs; +} + const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { .ro_map = fmr_op_map, + .ro_unmap = fmr_op_unmap, .ro_maxpages= fmr_op_maxpages, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 23e4d99..35b725b 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -110,8 +110,44 @@ out_senderr: return rc; } +/* Post a LOCAL_INV Work Request to prevent further remote access + * via RDMA READ or RDMA WRITE. 
+ */ +static int +frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) +{ + struct rpcrdma_mr_seg *seg1 = seg; + struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct ib_send_wr invalidate_wr, *bad_wr; + int rc, nsegs = seg-mr_nsegs; + + seg1-rl_mw-r.frmr.fr_state = FRMR_IS_INVALID; + + memset(invalidate_wr, 0, sizeof(invalidate_wr)); + invalidate_wr.wr_id = (unsigned long)(void *)seg1-rl_mw; + invalidate_wr.opcode = IB_WR_LOCAL_INV; + invalidate_wr.ex.invalidate_rkey = seg1-rl_mw-r.frmr.fr_mr-rkey; + DECR_CQCOUNT(r_xprt-rx_ep); + + read_lock(ia-ri_qplock); + while (seg1-mr_nsegs--) + rpcrdma_unmap_one(ia, seg++); + rc = ib_post_send(ia-ri_id-qp, invalidate_wr, bad_wr); + read_unlock(ia-ri_qplock); + if (rc) + goto out_err; + return nsegs; + +out_err: + /* Force rpcrdma_buffer_get() to retry */ + seg1-rl_mw-r.frmr.fr_state = FRMR_IS_STALE; + dprintk(RPC: %s: ib_post_send status %i\n, __func__, rc); + return nsegs; +} + const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = { .ro_map = frwr_op_map, + .ro_unmap = frwr_op_unmap, .ro_maxpages= frwr_op_maxpages, .ro_displayname = frwr, }; diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c index 5a284ee..5b5a63a 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -44,8 +44,18 @@ physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, return 1; } +/* Unmap a memory region, but leave it registered. 
+ */ +static int +physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) +{ + rpcrdma_unmap_one(r_xprt-rx_ia, seg); + return 1; +} + const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { .ro_map = physical_op_map, + .ro_unmap = physical_op_unmap, .ro_maxpages= physical_op_maxpages, .ro_displayname = physical, }; diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c index 6ab1d03..2c53ea9 100644 --- a/net/sunrpc/xprtrdma/rpc_rdma.c +++ b/net/sunrpc/xprtrdma/rpc_rdma.c @@ -284,11 +284,12 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target, return (unsigned char *)iptr - (unsigned char *)headerp; out: - if (r_xprt-rx_ia.ri_memreg_strategy != RPCRDMA_FRMR) { - for (pos
[PATCH v3 11/15] xprtrdma: Add reset MRs memreg op
This method is invoked when a transport instance is about to be reconnected. Each Memory Region object is reset to its initial state. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 23 net/sunrpc/xprtrdma/frwr_ops.c | 51 ++ net/sunrpc/xprtrdma/physical_ops.c |6 ++ net/sunrpc/xprtrdma/verbs.c| 103 +--- net/sunrpc/xprtrdma/xprt_rdma.h|1 5 files changed, 83 insertions(+), 101 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index 825ce96..93261b0 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -146,10 +146,33 @@ out_err: return nsegs; } +/* After a disconnect, unmap all FMRs. + * + * This is invoked only in the transport connect worker in order + * to serialize with rpcrdma_register_fmr_external(). 
+ */ +static void +fmr_op_reset(struct rpcrdma_xprt *r_xprt) +{ + struct rpcrdma_buffer *buf = r_xprt-rx_buf; + struct rpcrdma_mw *r; + LIST_HEAD(list); + int rc; + + list_for_each_entry(r, buf-rb_all, mw_all) + list_add(r-r.fmr-list, list); + + rc = ib_unmap_fmr(list); + if (rc) + dprintk(RPC: %s: ib_unmap_fmr failed %i\n, + __func__, rc); +} + const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { .ro_map = fmr_op_map, .ro_unmap = fmr_op_unmap, .ro_maxpages= fmr_op_maxpages, .ro_init= fmr_op_init, + .ro_reset = fmr_op_reset, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 9168c15..c2bb29d 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -46,6 +46,18 @@ out_list_err: return rc; } +static void +__frwr_release(struct rpcrdma_mw *r) +{ + int rc; + + rc = ib_dereg_mr(r-r.frmr.fr_mr); + if (rc) + dprintk(RPC: %s: ib_dereg_mr status %i\n, + __func__, rc); + ib_free_fast_reg_page_list(r-r.frmr.fr_pgl); +} + /* FRWR mode conveys a list of pages per chunk segment. The * maximum length of that list is the FRWR page list depth. */ @@ -210,10 +222,49 @@ out_err: return nsegs; } +/* After a disconnect, a flushed FAST_REG_MR can leave an FRMR in + * an unusable state. Find FRMRs in this state and dereg / reg + * each. FRMRs that are VALID and attached to an rpcrdma_req are + * also torn down. + * + * This gives all in-use FRMRs a fresh rkey and leaves them INVALID. + * + * This is invoked only in the transport connect worker in order + * to serialize with rpcrdma_register_frmr_external(). 
+ */ +static void +frwr_op_reset(struct rpcrdma_xprt *r_xprt) +{ + struct rpcrdma_buffer *buf = r_xprt-rx_buf; + struct ib_device *device = r_xprt-rx_ia.ri_id-device; + unsigned int depth = r_xprt-rx_ia.ri_max_frmr_depth; + struct ib_pd *pd = r_xprt-rx_ia.ri_pd; + struct rpcrdma_mw *r; + int rc; + + list_for_each_entry(r, buf-rb_all, mw_all) { + if (r-r.frmr.fr_state == FRMR_IS_INVALID) + continue; + + __frwr_release(r); + rc = __frwr_init(r, pd, device, depth); + if (rc) { + dprintk(RPC: %s: mw %p left %s\n, + __func__, r, + (r-r.frmr.fr_state == FRMR_IS_STALE ? + stale : valid)); + continue; + } + + r-r.frmr.fr_state = FRMR_IS_INVALID; + } +} + const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = { .ro_map = frwr_op_map, .ro_unmap = frwr_op_unmap, .ro_maxpages= frwr_op_maxpages, .ro_init= frwr_op_init, + .ro_reset = frwr_op_reset, .ro_displayname = frwr, }; diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c index c372051..e060713 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -59,10 +59,16 @@ physical_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) return 1; } +static void +physical_op_reset(struct rpcrdma_xprt *r_xprt) +{ +} + const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { .ro_map = physical_op_map, .ro_unmap
[PATCH v3 10/15] xprtrdma: Add init MRs memreg op
This method is used when setting up a new transport instance to create a pool of Memory Region objects that will be used to register memory during operation. Memory Regions are not needed for physical registration, since -prepare and -release are no-ops for that mode. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 42 +++ net/sunrpc/xprtrdma/frwr_ops.c | 66 +++ net/sunrpc/xprtrdma/physical_ops.c |7 ++ net/sunrpc/xprtrdma/verbs.c| 104 +--- net/sunrpc/xprtrdma/xprt_rdma.h|1 5 files changed, 119 insertions(+), 101 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index 888aa10..825ce96 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -29,6 +29,47 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt) rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES); } +static int +fmr_op_init(struct rpcrdma_xprt *r_xprt) +{ + struct rpcrdma_buffer *buf = r_xprt-rx_buf; + int mr_access_flags = IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ; + struct ib_fmr_attr fmr_attr = { + .max_pages = RPCRDMA_MAX_FMR_SGES, + .max_maps = 1, + .page_shift = PAGE_SHIFT + }; + struct ib_pd *pd = r_xprt-rx_ia.ri_pd; + struct rpcrdma_mw *r; + int i, rc; + + INIT_LIST_HEAD(buf-rb_mws); + INIT_LIST_HEAD(buf-rb_all); + + i = (buf-rb_max_requests + 1) * RPCRDMA_MAX_SEGS; + dprintk(RPC: %s: initalizing %d FMRs\n, __func__, i); + + while (i--) { + r = kzalloc(sizeof(*r), GFP_KERNEL); + if (!r) + return -ENOMEM; + + r-r.fmr = ib_alloc_fmr(pd, mr_access_flags, fmr_attr); + if (IS_ERR(r-r.fmr)) + goto out_fmr_err; + + list_add(r-mw_list, buf-rb_mws); + list_add(r-mw_all, buf-rb_all); + } + return 0; + +out_fmr_err: + rc = PTR_ERR(r-r.fmr); + dprintk(RPC: %s: ib_alloc_fmr status %i\n, __func__, rc); + 
kfree(r); + return rc; +} + /* Use the ib_map_phys_fmr() verb to register a memory region * for remote access via RDMA READ or RDMA WRITE. */ @@ -109,5 +150,6 @@ const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { .ro_map = fmr_op_map, .ro_unmap = fmr_op_unmap, .ro_maxpages= fmr_op_maxpages, + .ro_init= fmr_op_init, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 35b725b..9168c15 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -17,6 +17,35 @@ # define RPCDBG_FACILITY RPCDBG_TRANS #endif +static int +__frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device, + unsigned int depth) +{ + struct rpcrdma_frmr *f = r-r.frmr; + int rc; + + f-fr_mr = ib_alloc_fast_reg_mr(pd, depth); + if (IS_ERR(f-fr_mr)) + goto out_mr_err; + f-fr_pgl = ib_alloc_fast_reg_page_list(device, depth); + if (IS_ERR(f-fr_pgl)) + goto out_list_err; + return 0; + +out_mr_err: + rc = PTR_ERR(f-fr_mr); + dprintk(RPC: %s: ib_alloc_fast_reg_mr status %i\n, + __func__, rc); + return rc; + +out_list_err: + rc = PTR_ERR(f-fr_pgl); + dprintk(RPC: %s: ib_alloc_fast_reg_page_list status %i\n, + __func__, rc); + ib_dereg_mr(f-fr_mr); + return rc; +} + /* FRWR mode conveys a list of pages per chunk segment. The * maximum length of that list is the FRWR page list depth. 
*/ @@ -29,6 +58,42 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt) rpcrdma_max_segments(r_xprt) * ia-ri_max_frmr_depth); } +static int +frwr_op_init(struct rpcrdma_xprt *r_xprt) +{ + struct rpcrdma_buffer *buf = r_xprt-rx_buf; + struct ib_device *device = r_xprt-rx_ia.ri_id-device; + unsigned int depth = r_xprt-rx_ia.ri_max_frmr_depth; + struct ib_pd *pd = r_xprt-rx_ia.ri_pd; + int i; + + INIT_LIST_HEAD(buf-rb_mws); + INIT_LIST_HEAD(buf-rb_all); + + i = (buf-rb_max_requests + 1) * RPCRDMA_MAX_SEGS; + dprintk(RPC: %s: initalizing %d FRMRs\n, __func__, i); + + while (i--) { + struct rpcrdma_mw *r; + int rc; + + r = kzalloc(sizeof(*r), GFP_KERNEL); + if (!r) + return -ENOMEM; + + rc =
[PATCH v3 12/15] xprtrdma: Add destroy MRs memreg op
Memory Region objects associated with a transport instance are destroyed before the instance is shutdown and destroyed. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 18 net/sunrpc/xprtrdma/frwr_ops.c | 14 ++ net/sunrpc/xprtrdma/physical_ops.c |6 net/sunrpc/xprtrdma/verbs.c| 52 +--- net/sunrpc/xprtrdma/xprt_rdma.h|1 + 5 files changed, 40 insertions(+), 51 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index 93261b0..e9ca594 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -168,11 +168,29 @@ fmr_op_reset(struct rpcrdma_xprt *r_xprt) __func__, rc); } +static void +fmr_op_destroy(struct rpcrdma_buffer *buf) +{ + struct rpcrdma_mw *r; + int rc; + + while (!list_empty(buf-rb_all)) { + r = list_entry(buf-rb_all.next, struct rpcrdma_mw, mw_all); + list_del(r-mw_all); + rc = ib_dealloc_fmr(r-r.fmr); + if (rc) + dprintk(RPC: %s: ib_dealloc_fmr failed %i\n, + __func__, rc); + kfree(r); + } +} + const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { .ro_map = fmr_op_map, .ro_unmap = fmr_op_unmap, .ro_maxpages= fmr_op_maxpages, .ro_init= fmr_op_init, .ro_reset = fmr_op_reset, + .ro_destroy = fmr_op_destroy, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index c2bb29d..121e400 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -260,11 +260,25 @@ frwr_op_reset(struct rpcrdma_xprt *r_xprt) } } +static void +frwr_op_destroy(struct rpcrdma_buffer *buf) +{ + struct rpcrdma_mw *r; + + while (!list_empty(buf-rb_all)) { + r = list_entry(buf-rb_all.next, struct rpcrdma_mw, mw_all); + list_del(r-mw_all); + __frwr_release(r); + kfree(r); + } +} + const struct rpcrdma_memreg_ops 
rpcrdma_frwr_memreg_ops = { .ro_map = frwr_op_map, .ro_unmap = frwr_op_unmap, .ro_maxpages= frwr_op_maxpages, .ro_init= frwr_op_init, .ro_reset = frwr_op_reset, + .ro_destroy = frwr_op_destroy, .ro_displayname = frwr, }; diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c index e060713..eb39011 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -64,11 +64,17 @@ physical_op_reset(struct rpcrdma_xprt *r_xprt) { } +static void +physical_op_destroy(struct rpcrdma_buffer *buf) +{ +} + const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { .ro_map = physical_op_map, .ro_unmap = physical_op_unmap, .ro_maxpages= physical_op_maxpages, .ro_init= physical_op_init, .ro_reset = physical_op_reset, + .ro_destroy = physical_op_destroy, .ro_displayname = physical, }; diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 1b2c1f4..a7fb314 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1199,47 +1199,6 @@ rpcrdma_destroy_req(struct rpcrdma_ia *ia, struct rpcrdma_req *req) kfree(req); } -static void -rpcrdma_destroy_fmrs(struct rpcrdma_buffer *buf) -{ - struct rpcrdma_mw *r; - int rc; - - while (!list_empty(buf-rb_all)) { - r = list_entry(buf-rb_all.next, struct rpcrdma_mw, mw_all); - list_del(r-mw_all); - list_del(r-mw_list); - - rc = ib_dealloc_fmr(r-r.fmr); - if (rc) - dprintk(RPC: %s: ib_dealloc_fmr failed %i\n, - __func__, rc); - - kfree(r); - } -} - -static void -rpcrdma_destroy_frmrs(struct rpcrdma_buffer *buf) -{ - struct rpcrdma_mw *r; - int rc; - - while (!list_empty(buf-rb_all)) { - r = list_entry(buf-rb_all.next, struct rpcrdma_mw, mw_all); - list_del(r-mw_all); - list_del(r-mw_list); - - rc =
[PATCH v3 13/15] xprtrdma: Add open memreg op
The open op determines the size of various transport data structures based on device capabilities and memory registration mode. Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c |8 ++ net/sunrpc/xprtrdma/frwr_ops.c | 48 +++ net/sunrpc/xprtrdma/physical_ops.c |8 ++ net/sunrpc/xprtrdma/verbs.c| 49 ++-- net/sunrpc/xprtrdma/xprt_rdma.h|3 ++ 5 files changed, 70 insertions(+), 46 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index e9ca594..e8a9837 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -20,6 +20,13 @@ /* Maximum scatter/gather per FMR */ #define RPCRDMA_MAX_FMR_SGES (64) +static int +fmr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep, + struct rpcrdma_create_data_internal *cdata) +{ + return 0; +} + /* FMR mode conveys up to 64 pages of payload per chunk segment. 
*/ static size_t @@ -188,6 +195,7 @@ fmr_op_destroy(struct rpcrdma_buffer *buf) const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { .ro_map = fmr_op_map, .ro_unmap = fmr_op_unmap, + .ro_open= fmr_op_open, .ro_maxpages= fmr_op_maxpages, .ro_init= fmr_op_init, .ro_reset = fmr_op_reset, diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 121e400..e17d54d 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -58,6 +58,53 @@ __frwr_release(struct rpcrdma_mw *r) ib_free_fast_reg_page_list(r-r.frmr.fr_pgl); } +static int +frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep, +struct rpcrdma_create_data_internal *cdata) +{ + struct ib_device_attr *devattr = ia-ri_devattr; + int depth, delta; + + ia-ri_max_frmr_depth = + min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS, + devattr-max_fast_reg_page_list_len); + dprintk(RPC: %s: device's max FR page list len = %u\n, + __func__, ia-ri_max_frmr_depth); + + /* Add room for frmr register and invalidate WRs. +* 1. FRMR reg WR for head +* 2. FRMR invalidate WR for head +* 3. N FRMR reg WRs for pagelist +* 4. N FRMR invalidate WRs for pagelist +* 5. FRMR reg WR for tail +* 6. FRMR invalidate WR for tail +* 7. The RDMA_SEND WR +*/ + depth = 7; + + /* Calculate N if the device max FRMR depth is smaller than +* RPCRDMA_MAX_DATA_SEGS. +*/ + if (ia-ri_max_frmr_depth RPCRDMA_MAX_DATA_SEGS) { + delta = RPCRDMA_MAX_DATA_SEGS - ia-ri_max_frmr_depth; + do { + depth += 2; /* FRMR reg + invalidate */ + delta -= ia-ri_max_frmr_depth; + } while (delta 0); + } + + ep-rep_attr.cap.max_send_wr *= depth; + if (ep-rep_attr.cap.max_send_wr devattr-max_qp_wr) { + cdata-max_requests = devattr-max_qp_wr / depth; + if (!cdata-max_requests) + return -EINVAL; + ep-rep_attr.cap.max_send_wr = cdata-max_requests * + depth; + } + + return 0; +} + /* FRWR mode conveys a list of pages per chunk segment. The * maximum length of that list is the FRWR page list depth. 
*/ @@ -276,6 +323,7 @@ frwr_op_destroy(struct rpcrdma_buffer *buf) const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = { .ro_map = frwr_op_map, .ro_unmap = frwr_op_unmap, + .ro_open= frwr_op_open, .ro_maxpages= frwr_op_maxpages, .ro_init= frwr_op_init, .ro_reset = frwr_op_reset, diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c index eb39011..0ba130b 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -19,6 +19,13 @@ # define RPCDBG_FACILITY RPCDBG_TRANS #endif +static int +physical_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep, +struct rpcrdma_create_data_internal *cdata) +{ + return 0; +} + /* PHYSICAL memory registration conveys one page per chunk segment. */ static size_t @@ -72,6 +79,7 @@ physical_op_destroy(struct rpcrdma_buffer *buf) const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { .ro_map = physical_op_map,
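The send-queue sizing in `frwr_op_open()` above can be checked with a standalone model. Assuming RPCRDMA_MAX_DATA_SEGS is 64 (as in kernels of this era), the function starts from 7 work requests (head reg/invalidate, one pagelist reg/invalidate pair, tail reg/invalidate, and the SEND) and adds a reg/invalidate pair for each additional FRMR needed when the device's fast-reg page-list depth is smaller than the maximum data segment count:

```c
#define RPCRDMA_MAX_DATA_SEGS 64	/* assumed value, see lead-in */

/* Mirror of the depth computation in frwr_op_open(). */
static int frwr_send_depth(unsigned int max_frmr_depth)
{
	int depth = 7;	/* head pair + one pagelist pair + tail pair + SEND */
	int delta;

	if (max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
		delta = RPCRDMA_MAX_DATA_SEGS - max_frmr_depth;
		do {
			depth += 2;	/* one extra FRMR reg + invalidate */
			delta -= max_frmr_depth;
		} while (delta > 0);
	}
	return depth;
}
```

For example, a device with a page-list depth of 30 needs two extra FRMRs to cover 64 segments, giving a per-request depth of 11; `max_send_wr` is then multiplied by this depth and clamped to the device's `max_qp_wr`.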
[PATCH 3/5 linux-next] IB/mlx4: remove unnecessary message level.
KERN_WARNING is implicitly included by pr_warn().

Signed-off-by: Fabian Frederick f...@skynet.be
---
 drivers/infiniband/hw/mlx4/main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index b972c0b..1298fe8 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1568,8 +1568,7 @@ static void reset_gids_task(struct work_struct *work)
			       MLX4_CMD_TIME_CLASS_B, MLX4_CMD_WRAPPED);
 		if (err)
-			pr_warn(KERN_WARNING
-				"set port %d command failed\n", gw->port);
+			pr_warn("set port %d command failed\n", gw->port);
 	}
 	mlx4_free_cmd_mailbox(dev, mailbox);
--
1.9.1
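A userspace model shows why the patch drops the explicit KERN_WARNING: the kernel's pr_warn() already pastes the level prefix onto the format string, so passing KERN_WARNING as well emits the two-byte prefix twice. The `"\001" "4"` encoding below matches the kernel's KERN_SOH plus the warning loglevel digit; the snprintf-into-a-buffer stand-in for printk is, of course, a simplification.

```c
#include <stdio.h>

/* Simplified userspace stand-ins for the kernel macros. */
#define KERN_SOH	"\001"
#define KERN_WARNING	KERN_SOH "4"

static char logbuf[128];

/* pr_warn() prepends the level itself via string-literal pasting. */
#define pr_warn(fmt, ...) \
	snprintf(logbuf, sizeof(logbuf), KERN_WARNING fmt, ##__VA_ARGS__)
```

Calling `pr_warn(KERN_WARNING "set port %d command failed\n", port)` produces a message beginning with the prefix bytes twice, which printk would misparse; `pr_warn("set port %d command failed\n", port)` yields a single, correct prefix.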
[PATCH v3 15/15] xprtrdma: Make rpcrdma_{un}map_one() into inline functions
These functions are called in a loop for each page transferred via RDMA READ or WRITE. Extract loop invariants and inline them to reduce CPU overhead. Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 10 ++-- net/sunrpc/xprtrdma/frwr_ops.c | 10 ++-- net/sunrpc/xprtrdma/physical_ops.c | 10 ++-- net/sunrpc/xprtrdma/verbs.c| 44 ++- net/sunrpc/xprtrdma/xprt_rdma.h| 45 ++-- 5 files changed, 73 insertions(+), 46 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index e8a9837..a91ba2c 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -85,6 +85,8 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, int nsegs, bool writing) { struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct ib_device *device = ia-ri_id-device; + enum dma_data_direction direction = rpcrdma_data_dir(writing); struct rpcrdma_mr_seg *seg1 = seg; struct rpcrdma_mw *mw = seg1-rl_mw; u64 physaddrs[RPCRDMA_MAX_DATA_SEGS]; @@ -97,7 +99,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, if (nsegs RPCRDMA_MAX_FMR_SGES) nsegs = RPCRDMA_MAX_FMR_SGES; for (i = 0; i nsegs;) { - rpcrdma_map_one(ia, seg, writing); + rpcrdma_map_one(device, seg, direction); physaddrs[i] = seg-mr_dma; len += seg-mr_len; ++seg; @@ -123,7 +125,7 @@ out_maperr: __func__, len, (unsigned long long)seg1-mr_dma, pageoff, i, rc); while (i--) - rpcrdma_unmap_one(ia, --seg); + rpcrdma_unmap_one(device, --seg); return rc; } @@ -135,14 +137,16 @@ fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) { struct rpcrdma_ia *ia = r_xprt-rx_ia; struct rpcrdma_mr_seg *seg1 = seg; + struct ib_device *device; int rc, nsegs = seg-mr_nsegs; LIST_HEAD(l); list_add(seg1-rl_mw-r.fmr-list, l); rc = ib_unmap_fmr(l); read_lock(ia-ri_qplock); + device = 
ia-ri_id-device; while (seg1-mr_nsegs--) - rpcrdma_unmap_one(ia, seg++); + rpcrdma_unmap_one(device, seg++); read_unlock(ia-ri_qplock); if (rc) goto out_err; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index ea59c1b..0a7b9df 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -178,6 +178,8 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, int nsegs, bool writing) { struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct ib_device *device = ia-ri_id-device; + enum dma_data_direction direction = rpcrdma_data_dir(writing); struct rpcrdma_mr_seg *seg1 = seg; struct rpcrdma_mw *mw = seg1-rl_mw; struct rpcrdma_frmr *frmr = mw-r.frmr; @@ -197,7 +199,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, if (nsegs ia-ri_max_frmr_depth) nsegs = ia-ri_max_frmr_depth; for (page_no = i = 0; i nsegs;) { - rpcrdma_map_one(ia, seg, writing); + rpcrdma_map_one(device, seg, direction); pa = seg-mr_dma; for (seg_len = seg-mr_len; seg_len 0; seg_len -= PAGE_SIZE) { frmr-fr_pgl-page_list[page_no++] = pa; @@ -247,7 +249,7 @@ out_senderr: ib_update_fast_reg_key(mr, --key); frmr-fr_state = FRMR_IS_INVALID; while (i--) - rpcrdma_unmap_one(ia, --seg); + rpcrdma_unmap_one(device, --seg); return rc; } @@ -261,6 +263,7 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) struct rpcrdma_ia *ia = r_xprt-rx_ia; struct ib_send_wr invalidate_wr, *bad_wr; int rc, nsegs = seg-mr_nsegs; + struct ib_device *device; seg1-rl_mw-r.frmr.fr_state = FRMR_IS_INVALID; @@ -271,8 +274,9 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) DECR_CQCOUNT(r_xprt-rx_ep); read_lock(ia-ri_qplock); + device = ia-ri_id-device; while (seg1-mr_nsegs--) - rpcrdma_unmap_one(ia, seg++); + rpcrdma_unmap_one(device, seg++); rc = ib_post_send(ia-ri_id-qp, invalidate_wr, bad_wr); read_unlock(ia-ri_qplock); if (rc) diff --git a/net/sunrpc/xprtrdma/physical_ops.c 
b/net/sunrpc/xprtrdma/physical_ops.c index 0ba130b..ba518af 100644 --- a/net/sunrpc/xprtrdma/physical_ops.c +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -50,7
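The refactoring in patch 15/15 is classic loop-invariant extraction: `rpcrdma_map_one()` used to re-derive the device pointer and the DMA direction for every page, and the patch computes both once per call and passes them in. The sketch below uses simplified stand-in types (the real `rpcrdma_data_dir()` mapping and struct layouts differ) just to show the hoisting pattern.

```c
struct ib_device { int id; };
struct rdma_cm_id { struct ib_device *device; };
struct rpcrdma_ia { struct rdma_cm_id *ri_id; };

enum dma_data_direction { DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* after the patch: invariants are parameters, not re-derived per page */
static int map_one(struct ib_device *device,
		   enum dma_data_direction dir, int *mapped)
{
	(void)device; (void)dir;	/* a real version would dma_map here */
	return ++(*mapped);
}

static int map_segments(struct rpcrdma_ia *ia, int writing, int nsegs)
{
	/* hoisted out of the loop, as fmr_op_map()/frwr_op_map() now do */
	struct ib_device *device = ia->ri_id->device;
	enum dma_data_direction dir =
		writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE; /* assumed mapping */
	int mapped = 0;

	for (int i = 0; i < nsegs; i++)
		map_one(device, dir, &mapped);
	return mapped;
}
```

Before the patch, each iteration chased `ia->ri_id->device` and evaluated the direction ternary; after it, the per-page body is small enough to inline profitably, which is the stated goal of the change.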
[PATCH v3 14/15] xprtrdma: Handle non-SEND completions via a callout
Allow each memory registration mode to plug in a callout that handles the completion of a memory registration operation. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/frwr_ops.c | 17 + net/sunrpc/xprtrdma/verbs.c | 16 ++-- net/sunrpc/xprtrdma/xprt_rdma.h |5 + 3 files changed, 28 insertions(+), 10 deletions(-) diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index e17d54d..ea59c1b 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -117,6 +117,22 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt) rpcrdma_max_segments(r_xprt) * ia-ri_max_frmr_depth); } +/* If FAST_REG or LOCAL_INV failed, indicate the frmr needs to be reset. */ +static void +frwr_sendcompletion(struct ib_wc *wc) +{ + struct rpcrdma_mw *r; + + if (likely(wc-status == IB_WC_SUCCESS)) + return; + + /* WARNING: Only wr_id and status are reliable at this point */ + r = (struct rpcrdma_mw *)(unsigned long)wc-wr_id; + dprintk(RPC: %s: frmr %p (stale), status %d\n, + __func__, r, wc-status); + r-r.frmr.fr_state = FRMR_IS_STALE; +} + static int frwr_op_init(struct rpcrdma_xprt *r_xprt) { @@ -148,6 +164,7 @@ frwr_op_init(struct rpcrdma_xprt *r_xprt) list_add(r-mw_list, buf-rb_mws); list_add(r-mw_all, buf-rb_all); + r-mw_sendcompletion = frwr_sendcompletion; } return 0; diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index b697b3e..cac06f2 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -186,7 +186,7 @@ static const char * const wc_status[] = { remote access error, remote operation error, transport retry counter exceeded, - RNR retrycounter exceeded, + RNR retry counter exceeded, local RDD violation error, remove invalid RD request, operation aborted, @@ -204,21 +204,17 
@@ static const char * const wc_status[] = { static void rpcrdma_sendcq_process_wc(struct ib_wc *wc) { - if (likely(wc-status == IB_WC_SUCCESS)) - return; - /* WARNING: Only wr_id and status are reliable at this point */ - if (wc-wr_id == 0ULL) { - if (wc-status != IB_WC_WR_FLUSH_ERR) + if (wc-wr_id == RPCRDMA_IGNORE_COMPLETION) { + if (wc-status != IB_WC_SUCCESS + wc-status != IB_WC_WR_FLUSH_ERR) pr_err(RPC: %s: SEND: %s\n, __func__, COMPLETION_MSG(wc-status)); } else { struct rpcrdma_mw *r; r = (struct rpcrdma_mw *)(unsigned long)wc-wr_id; - r-r.frmr.fr_state = FRMR_IS_STALE; - pr_err(RPC: %s: frmr %p (stale): %s\n, - __func__, r, COMPLETION_MSG(wc-status)); + r-mw_sendcompletion(wc); } } @@ -1622,7 +1618,7 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia, } send_wr.next = NULL; - send_wr.wr_id = 0ULL; /* no send cookie */ + send_wr.wr_id = RPCRDMA_IGNORE_COMPLETION; send_wr.sg_list = req-rl_send_iov; send_wr.num_sge = req-rl_niovs; send_wr.opcode = IB_WR_SEND; diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h index 9036fb4..54bcbe4 100644 --- a/net/sunrpc/xprtrdma/xprt_rdma.h +++ b/net/sunrpc/xprtrdma/xprt_rdma.h @@ -106,6 +106,10 @@ struct rpcrdma_ep { #define INIT_CQCOUNT(ep) atomic_set((ep)-rep_cqcount, (ep)-rep_cqinit) #define DECR_CQCOUNT(ep) atomic_sub_return(1, (ep)-rep_cqcount) +/* Force completion handler to ignore the signal + */ +#define RPCRDMA_IGNORE_COMPLETION (0ULL) + /* Registered buffer -- registered kmalloc'd memory for RDMA SEND/RECV * * The below structure appears at the front of a large region of kmalloc'd @@ -206,6 +210,7 @@ struct rpcrdma_mw { struct ib_fmr *fmr; struct rpcrdma_frmr frmr; } r; + void(*mw_sendcompletion)(struct ib_wc *); struct list_headmw_list; struct list_headmw_all; }; -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5 linux-next] iw_cxgb4: remove unnecessary message level.
KERN_ERR is implicitly included in pr_err() Signed-off-by: Fabian Frederick f...@skynet.be --- drivers/infiniband/hw/cxgb4/device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c index 8fb295e..59546b6 100644 --- a/drivers/infiniband/hw/cxgb4/device.c +++ b/drivers/infiniband/hw/cxgb4/device.c @@ -1355,7 +1355,7 @@ static void recover_lost_dbs(struct uld_ctx *ctx, struct qp_list *qp_list) t4_sq_host_wq_pidx(qp-wq), t4_sq_wq_size(qp-wq)); if (ret) { - pr_err(KERN_ERR MOD %s: Fatal error - + pr_err(MOD %s: Fatal error - DB overflow recovery failed - error syncing SQ qid %u\n, pci_name(ctx-lldi.pdev), qp-wq.sq.qid); @@ -1371,7 +1371,7 @@ static void recover_lost_dbs(struct uld_ctx *ctx, struct qp_list *qp_list) t4_rq_wq_size(qp-wq)); if (ret) { - pr_err(KERN_ERR MOD %s: Fatal error - + pr_err(MOD %s: Fatal error - DB overflow recovery failed - error syncing RQ qid %u\n, pci_name(ctx-lldi.pdev), qp-wq.rq.qid); -- 1.9.1
[PATCH v3 00/15] NFS/RDMA patches proposed for 4.1
This is a series of client-side patches for NFS/RDMA. In preparation for increasing the transport credit limit and maximum rsize/wsize, I've re-factored the memory registration logic into separate files, invoked via a method API. The series is available in the nfs-rdma-for-4.1 topic branch at git://linux-nfs.org/projects/cel/cel-2.6.git Changes since v2: - Rebased on 4.0-rc6 - One minor fix squashed into 01/15 - Tested-by tags added Changes since v1: - Rebased on 4.0-rc5 - Main optimizations postponed to 4.2 - Addressed review comments from Anna, Sagi, and Devesh --- Chuck Lever (15): SUNRPC: Introduce missing well-known netids xprtrdma: Display IPv6 addresses and port numbers correctly xprtrdma: Perform a full marshal on retransmit xprtrdma: Byte-align FRWR registration xprtrdma: Prevent infinite loop in rpcrdma_ep_create() xprtrdma: Add vector of ops for each memory registration strategy xprtrdma: Add a max_payload op for each memreg mode xprtrdma: Add a register_external op for each memreg mode xprtrdma: Add a deregister_external op for each memreg mode xprtrdma: Add init MRs memreg op xprtrdma: Add reset MRs memreg op xprtrdma: Add destroy MRs memreg op xprtrdma: Add open memreg op xprtrdma: Handle non-SEND completions via a callout xprtrdma: Make rpcrdma_{un}map_one() into inline functions include/linux/sunrpc/msg_prot.h|8 include/linux/sunrpc/xprtrdma.h|5 net/sunrpc/xprtrdma/Makefile |3 net/sunrpc/xprtrdma/fmr_ops.c | 208 +++ net/sunrpc/xprtrdma/frwr_ops.c | 353 ++ net/sunrpc/xprtrdma/physical_ops.c | 94 + net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++-- net/sunrpc/xprtrdma/transport.c| 61 ++- net/sunrpc/xprtrdma/verbs.c| 699 +++- net/sunrpc/xprtrdma/xprt_rdma.h| 90 - 10 files changed, 882 insertions(+), 726 deletions(-) create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c create mode 100644 net/sunrpc/xprtrdma/physical_ops.c -- Chuck Lever
[PATCH v3 02/15] xprtrdma: Display IPv6 addresses and port numbers correctly
Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/transport.c | 47 --- net/sunrpc/xprtrdma/verbs.c | 21 +++-- 2 files changed, 47 insertions(+), 21 deletions(-) diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 2e192ba..9be7f97 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -157,12 +157,47 @@ static struct ctl_table sunrpc_table[] = { static struct rpc_xprt_ops xprt_rdma_procs;/* forward reference */ static void +xprt_rdma_format_addresses4(struct rpc_xprt *xprt, struct sockaddr *sap) +{ + struct sockaddr_in *sin = (struct sockaddr_in *)sap; + char buf[20]; + + snprintf(buf, sizeof(buf), %08x, ntohl(sin-sin_addr.s_addr)); + xprt-address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL); + + xprt-address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA; +} + +static void +xprt_rdma_format_addresses6(struct rpc_xprt *xprt, struct sockaddr *sap) +{ + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap; + char buf[40]; + + snprintf(buf, sizeof(buf), %pi6, sin6-sin6_addr); + xprt-address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL); + + xprt-address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA6; +} + +static void xprt_rdma_format_addresses(struct rpc_xprt *xprt) { struct sockaddr *sap = (struct sockaddr *) rpcx_to_rdmad(xprt).addr; - struct sockaddr_in *sin = (struct sockaddr_in *)sap; - char buf[64]; + char buf[128]; + + switch (sap-sa_family) { + case AF_INET: + xprt_rdma_format_addresses4(xprt, sap); + break; + case AF_INET6: + xprt_rdma_format_addresses6(xprt, sap); + break; + default: + pr_err(rpcrdma: Unrecognized address family\n); + return; + } (void)rpc_ntop(sap, buf, sizeof(buf)); xprt-address_strings[RPC_DISPLAY_ADDR] = kstrdup(buf, 
GFP_KERNEL); @@ -170,16 +205,10 @@ xprt_rdma_format_addresses(struct rpc_xprt *xprt) snprintf(buf, sizeof(buf), %u, rpc_get_port(sap)); xprt-address_strings[RPC_DISPLAY_PORT] = kstrdup(buf, GFP_KERNEL); - xprt-address_strings[RPC_DISPLAY_PROTO] = rdma; - - snprintf(buf, sizeof(buf), %08x, ntohl(sin-sin_addr.s_addr)); - xprt-address_strings[RPC_DISPLAY_HEX_ADDR] = kstrdup(buf, GFP_KERNEL); - snprintf(buf, sizeof(buf), %4hx, rpc_get_port(sap)); xprt-address_strings[RPC_DISPLAY_HEX_PORT] = kstrdup(buf, GFP_KERNEL); - /* netid */ - xprt-address_strings[RPC_DISPLAY_NETID] = rdma; + xprt-address_strings[RPC_DISPLAY_PROTO] = rdma; } static void diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 124676c..1aa55b7 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -50,6 +50,7 @@ #include linux/interrupt.h #include linux/slab.h #include linux/prefetch.h +#include linux/sunrpc/addr.h #include asm/bitops.h #include xprt_rdma.h @@ -424,7 +425,7 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event) struct rpcrdma_ia *ia = xprt-rx_ia; struct rpcrdma_ep *ep = xprt-rx_ep; #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) - struct sockaddr_in *addr = (struct sockaddr_in *) ep-rep_remote_addr; + struct sockaddr *sap = (struct sockaddr *)ep-rep_remote_addr; #endif struct ib_qp_attr *attr = ia-ri_qp_attr; struct ib_qp_init_attr *iattr = ia-ri_qp_init_attr; @@ -480,9 +481,8 @@ connected: wake_up_all(ep-rep_connect_wait); /*FALLTHROUGH*/ default: - dprintk(RPC: %s: %pI4:%u (ep 0x%p): %s\n, - __func__, addr-sin_addr.s_addr, - ntohs(addr-sin_port), ep, + dprintk(RPC: %s: %pIS:%u (ep 0x%p): %s\n, + __func__, sap, rpc_get_port(sap), ep, CONNECTION_MSG(event-event)); break; } @@ -491,19 +491,16 @@ connected: if (connstate == 1) { int ird = attr-max_dest_rd_atomic; int tird = ep-rep_remote_cma.responder_resources; - printk(KERN_INFO rpcrdma: connection to %pI4:%u - on %s, memreg %d slots %d ird %d%s\n, - addr-sin_addr.s_addr, - 
ntohs(addr-sin_port), + + pr_info(rpcrdma: connection to %pIS:%u on %s, memreg %d slots %d ird %d%s\n, + sap, rpc_get_port(sap), ia-ri_id-device-name,
[PATCH v3 01/15] SUNRPC: Introduce missing well-known netids
Signed-off-by: Chuck Lever chuck.le...@oracle.com --- include/linux/sunrpc/msg_prot.h |8 +++- include/linux/sunrpc/xprtrdma.h |5 - 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/include/linux/sunrpc/msg_prot.h b/include/linux/sunrpc/msg_prot.h index aadc6a0..8073713 100644 --- a/include/linux/sunrpc/msg_prot.h +++ b/include/linux/sunrpc/msg_prot.h @@ -142,12 +142,18 @@ typedef __be32rpc_fraghdr; (RPC_REPHDRSIZE + (2 + RPC_MAX_AUTH_SIZE/4)) /* - * RFC1833/RFC3530 rpcbind (v3+) well-known netid's. + * Well-known netids. See: + * + * http://www.iana.org/assignments/rpc-netids/rpc-netids.xhtml */ #define RPCBIND_NETID_UDP udp #define RPCBIND_NETID_TCP tcp +#define RPCBIND_NETID_RDMA rdma +#define RPCBIND_NETID_SCTP sctp #define RPCBIND_NETID_UDP6 udp6 #define RPCBIND_NETID_TCP6 tcp6 +#define RPCBIND_NETID_RDMA6rdma6 +#define RPCBIND_NETID_SCTP6sctp6 #define RPCBIND_NETID_LOCALlocal /* diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h index 64a0a0a..c984c85 100644 --- a/include/linux/sunrpc/xprtrdma.h +++ b/include/linux/sunrpc/xprtrdma.h @@ -41,11 +41,6 @@ #define _LINUX_SUNRPC_XPRTRDMA_H /* - * rpcbind (v3+) RDMA netid. - */ -#define RPCBIND_NETID_RDMA rdma - -/* * Constants. Max RPC/NFS header is big enough to account for * additional marshaling buffers passed down by Linux client. *
[PATCH v3 05/15] xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
If a provider advertises a zero max_fast_reg_page_list_len, FRWR depth detection loops forever. Instead of just failing the mount, try other memory registration modes. Fixes: 0fc6c4e7bb28 (xprtrdma: mind the device's max fast . . .) Reported-by: Devesh Sharma devesh.sha...@emulex.com Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/verbs.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 60f3317..99752b5 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -618,9 +618,10 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg) if (memreg == RPCRDMA_FRMR) { /* Requires both frmr reg and local dma lkey */ - if ((devattr-device_cap_flags + if (((devattr-device_cap_flags (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) != - (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) { + (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) || + (devattr-max_fast_reg_page_list_len == 0)) { dprintk(RPC: %s: FRMR registration not supported by HCA\n, __func__); memreg = RPCRDMA_MTHCAFMR;
[PATCH v3 06/15] xprtrdma: Add vector of ops for each memory registration strategy
Instead of employing switch() statements, let's use the typical Linux kernel idiom for handling behavioral variation: virtual functions. Start by defining a vector of operations for each supported memory registration mode, and by adding a source file for each mode. Signed-off-by: Chuck Lever chuck.le...@oracle.com Reviewed-by: Sagi Grimberg sa...@mellanox.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/Makefile |3 ++- net/sunrpc/xprtrdma/fmr_ops.c | 22 ++ net/sunrpc/xprtrdma/frwr_ops.c | 22 ++ net/sunrpc/xprtrdma/physical_ops.c | 24 net/sunrpc/xprtrdma/verbs.c| 11 +++ net/sunrpc/xprtrdma/xprt_rdma.h| 12 6 files changed, 89 insertions(+), 5 deletions(-) create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c create mode 100644 net/sunrpc/xprtrdma/physical_ops.c diff --git a/net/sunrpc/xprtrdma/Makefile b/net/sunrpc/xprtrdma/Makefile index da5136f..579f72b 100644 --- a/net/sunrpc/xprtrdma/Makefile +++ b/net/sunrpc/xprtrdma/Makefile @@ -1,6 +1,7 @@ obj-$(CONFIG_SUNRPC_XPRT_RDMA_CLIENT) += xprtrdma.o -xprtrdma-y := transport.o rpc_rdma.o verbs.o +xprtrdma-y := transport.o rpc_rdma.o verbs.o \ + fmr_ops.o frwr_ops.o physical_ops.o obj-$(CONFIG_SUNRPC_XPRT_RDMA_SERVER) += svcrdma.o diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c new file mode 100644 index 000..ffb7d93 --- /dev/null +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -0,0 +1,22 @@ +/* + * Copyright (c) 2015 Oracle. All rights reserved. + * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved. + */ + +/* Lightweight memory registration using Fast Memory Regions (FMR). + * Referred to sometimes as MTHCAFMR mode. + * + * FMR uses synchronous memory registration and deregistration. + * FMR registration is known to be fast, but FMR deregistration + * can take tens of usecs to complete. 
+ */ + +#include xprt_rdma.h + +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) +# define RPCDBG_FACILITY RPCDBG_TRANS +#endif + +const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { + .ro_displayname = fmr, +}; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c new file mode 100644 index 000..79173f9 --- /dev/null +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -0,0 +1,22 @@ +/* + * Copyright (c) 2015 Oracle. All rights reserved. + * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved. + */ + +/* Lightweight memory registration using Fast Registration Work + * Requests (FRWR). Also referred to sometimes as FRMR mode. + * + * FRWR features ordered asynchronous registration and deregistration + * of arbitrarily sized memory regions. This is the fastest and safest + * but most complex memory registration mode. + */ + +#include xprt_rdma.h + +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) +# define RPCDBG_FACILITY RPCDBG_TRANS +#endif + +const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = { + .ro_displayname = frwr, +}; diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c new file mode 100644 index 000..b0922ac --- /dev/null +++ b/net/sunrpc/xprtrdma/physical_ops.c @@ -0,0 +1,24 @@ +/* + * Copyright (c) 2015 Oracle. All rights reserved. + * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved. + */ + +/* No-op chunk preparation. All client memory is pre-registered. + * Sometimes referred to as ALLPHYSICAL mode. + * + * Physical registration is simple because all client memory is + * pre-registered and never deregistered. This mode is good for + * adapter bring up, but is considered not safe: the server is + * trusted not to abuse its access to client memory not involved + * in RDMA I/O. 
+ */ + +#include xprt_rdma.h + +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) +# define RPCDBG_FACILITY RPCDBG_TRANS +#endif + +const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = { + .ro_displayname = physical, +}; diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 99752b5..c3319e1 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -492,10 +492,10 @@ connected: int ird = attr-max_dest_rd_atomic; int tird = ep-rep_remote_cma.responder_resources; - pr_info(rpcrdma: connection to %pIS:%u on %s, memreg %d slots %d ird %d%s\n, + pr_info(rpcrdma: connection to %pIS:%u on %s, memreg '%s', %d credits, %d responders%s\n, sap, rpc_get_port(sap), ia-ri_id-device-name, - ia-ri_memreg_strategy, + ia-ri_ops-ro_displayname,
[PATCH v3 04/15] xprtrdma: Byte-align FRWR registration
The RPC/RDMA transport's FRWR registration logic registers whole pages. This means areas in the first and last pages that are not involved in the RDMA I/O are needlessly exposed to the server. Buffered I/O is typically page-aligned, so not a problem there. But for direct I/O, which can be byte-aligned, and for reply chunks, which are nearly always smaller than a page, the transport could expose memory outside the I/O buffer. FRWR allows byte-aligned memory registration, so let's use it as it was intended. Reported-by: Sagi Grimberg sa...@mellanox.com Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/verbs.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 1aa55b7..60f3317 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1924,23 +1924,19 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg, offset_in_page((seg-1)-mr_offset + (seg-1)-mr_len)) break; } - dprintk(RPC: %s: Using frmr %p to map %d segments\n, - __func__, mw, i); + dprintk(RPC: %s: Using frmr %p to map %d segments (%d bytes)\n, + __func__, mw, i, len); frmr-fr_state = FRMR_IS_VALID; memset(fastreg_wr, 0, sizeof(fastreg_wr)); fastreg_wr.wr_id = (unsigned long)(void *)mw; fastreg_wr.opcode = IB_WR_FAST_REG_MR; - fastreg_wr.wr.fast_reg.iova_start = seg1-mr_dma; + fastreg_wr.wr.fast_reg.iova_start = seg1-mr_dma + pageoff; fastreg_wr.wr.fast_reg.page_list = frmr-fr_pgl; fastreg_wr.wr.fast_reg.page_list_len = page_no; fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT; - fastreg_wr.wr.fast_reg.length = page_no PAGE_SHIFT; - if (fastreg_wr.wr.fast_reg.length len) { - rc = -EIO; - goto out_err; - } + fastreg_wr.wr.fast_reg.length = len; /* Bump the key */ key = (u8)(mr-rkey 0x00FF);
[PATCH v3 08/15] xprtrdma: Add a register_external op for each memreg mode
There is very little common processing among the different external memory registration functions. Have rpcrdma_create_chunks() call the registration method directly. This removes a stack frame and a switch statement from the external registration path. Signed-off-by: Chuck Lever chuck.le...@oracle.com Tested-by: Devesh Sharma devesh.sha...@emulex.com Tested-by: Meghana Cheripady meghana.cherip...@emulex.com Tested-by: Veeresh U. Kokatnur veeres...@chelsio.com --- net/sunrpc/xprtrdma/fmr_ops.c | 51 +++ net/sunrpc/xprtrdma/frwr_ops.c | 82 ++ net/sunrpc/xprtrdma/physical_ops.c | 17 net/sunrpc/xprtrdma/rpc_rdma.c |5 + net/sunrpc/xprtrdma/verbs.c| 168 +--- net/sunrpc/xprtrdma/xprt_rdma.h|6 + 6 files changed, 160 insertions(+), 169 deletions(-) diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c index eec2660..45fb646 100644 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ b/net/sunrpc/xprtrdma/fmr_ops.c @@ -29,7 +29,58 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt) rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES); } +/* Use the ib_map_phys_fmr() verb to register a memory region + * for remote access via RDMA READ or RDMA WRITE. 
+ */ +static int +fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, + int nsegs, bool writing) +{ + struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct rpcrdma_mr_seg *seg1 = seg; + struct rpcrdma_mw *mw = seg1-rl_mw; + u64 physaddrs[RPCRDMA_MAX_DATA_SEGS]; + int len, pageoff, i, rc; + + pageoff = offset_in_page(seg1-mr_offset); + seg1-mr_offset -= pageoff; /* start of page */ + seg1-mr_len += pageoff; + len = -pageoff; + if (nsegs RPCRDMA_MAX_FMR_SGES) + nsegs = RPCRDMA_MAX_FMR_SGES; + for (i = 0; i nsegs;) { + rpcrdma_map_one(ia, seg, writing); + physaddrs[i] = seg-mr_dma; + len += seg-mr_len; + ++seg; + ++i; + /* Check for holes */ + if ((i nsegs offset_in_page(seg-mr_offset)) || + offset_in_page((seg-1)-mr_offset + (seg-1)-mr_len)) + break; + } + + rc = ib_map_phys_fmr(mw-r.fmr, physaddrs, i, seg1-mr_dma); + if (rc) + goto out_maperr; + + seg1-mr_rkey = mw-r.fmr-rkey; + seg1-mr_base = seg1-mr_dma + pageoff; + seg1-mr_nsegs = i; + seg1-mr_len = len; + return i; + +out_maperr: + dprintk(RPC: %s: ib_map_phys_fmr %u@0x%llx+%i (%d) status %i\n, + __func__, len, (unsigned long long)seg1-mr_dma, + pageoff, i, rc); + while (i--) + rpcrdma_unmap_one(ia, --seg); + return rc; +} + const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { + .ro_map = fmr_op_map, .ro_maxpages= fmr_op_maxpages, .ro_displayname = fmr, }; diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 73a5ac8..23e4d99 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -29,7 +29,89 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt) rpcrdma_max_segments(r_xprt) * ia-ri_max_frmr_depth); } +/* Post a FAST_REG Work Request to register a memory region + * for remote access via RDMA READ or RDMA WRITE. 
+ */ +static int +frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, + int nsegs, bool writing) +{ + struct rpcrdma_ia *ia = r_xprt-rx_ia; + struct rpcrdma_mr_seg *seg1 = seg; + struct rpcrdma_mw *mw = seg1-rl_mw; + struct rpcrdma_frmr *frmr = mw-r.frmr; + struct ib_mr *mr = frmr-fr_mr; + struct ib_send_wr fastreg_wr, *bad_wr; + u8 key; + int len, pageoff; + int i, rc; + int seg_len; + u64 pa; + int page_no; + + pageoff = offset_in_page(seg1-mr_offset); + seg1-mr_offset -= pageoff; /* start of page */ + seg1-mr_len += pageoff; + len = -pageoff; + if (nsegs ia-ri_max_frmr_depth) + nsegs = ia-ri_max_frmr_depth; + for (page_no = i = 0; i nsegs;) { + rpcrdma_map_one(ia, seg, writing); + pa = seg-mr_dma; + for (seg_len = seg-mr_len; seg_len 0; seg_len -= PAGE_SIZE) { + frmr-fr_pgl-page_list[page_no++] = pa; + pa += PAGE_SIZE; + } + len += seg-mr_len; + ++seg; + ++i; + /* Check for holes */ + if ((i nsegs offset_in_page(seg-mr_offset)) || + offset_in_page((seg-1)-mr_offset + (seg-1)-mr_len)) + break; + } + dprintk(RPC: %s: Using frmr %p to map %d segments (%d bytes)\n, +
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check
On Mon, Mar 30, 2015 at 05:10:12PM +0200, Michael Wang wrote: I found that actually we don't have to touch this one, which is only used by HW drivers currently. I'm having a hard time understanding this; the code in question was in net/sunrpc/xprtrdma/svc_rdma_recvfrom.c, which is the NFS ULP, not a device driver. Regards, Jason
Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and, cap_mcast() for mcast-check
On Mon, Mar 30, 2015 at 10:30:36AM +0200, Michael Wang wrote: Thus I also agree the check inside mcast_event_handler() is unnecessary; maybe we can change that logic to WARN_ON(!cap_mcast())? Seems reasonable to me. Jason
Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and, cap_mcast() for mcast-check
On Mon, Mar 30, 2015 at 06:20:48PM +0200, Michael Wang wrote: On 03/30/2015 06:11 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:46 +0100, Michael Wang wrote: Introduce helper has_mcast() and cap_mcast() to help us check if an IB device or it's port support Multicast. This probably needs reworded or rethought. In truth, *all* rdma devices are multicast capable. *BUT*, IB/OPA devices require multicast registration done the IB way (including for sendonly multicast sends), while Ethernet devices do multicast the Ethernet way. These tests are really just for IB specific multicast registration and deregistration. Call it has_mcast() and cap_mcast() is incorrect. Thanks for the explanation :-) Jason also mentioned we should use cap_ib_XX() instead, I'll use that name then we can distinguish the management between Eth and IB/OPA. Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/multicast.c | 8 include/rdma/ib_verbs.h | 28 3 files changed, 33 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 276fb76..cbbc85b 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -3398,7 +3398,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) ib_detach_mcast(id-qp, mc-multicast.ib-rec.mgid, be16_to_cpu(mc-multicast.ib-rec.mlid)); -if (rdma_transport_is_ib(id_priv-cma_dev-device)) { +if (has_mcast(id_priv-cma_dev-device)) { You need a similar check in rdma_join_multicast. 
Ira switch (rdma_port_get_link_layer(id-device, id-port_num)) { case IB_LINK_LAYER_INFINIBAND: ib_sa_free_multicast(mc-multicast.ib); diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c index 17573ff..ffeaf27 100644 --- a/drivers/infiniband/core/multicast.c +++ b/drivers/infiniband/core/multicast.c @@ -780,7 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler, int index; dev = container_of(handler, struct mcast_device, event_handler); -if (!rdma_port_ll_is_ib(dev-device, event-element.port_num)) +if (!cap_mcast(dev-device, event-element.port_num)) return; index = event-element.port_num - dev-start_port; @@ -807,7 +807,7 @@ static void mcast_add_one(struct ib_device *device) int i; int count = 0; -if (!rdma_transport_is_ib(device)) +if (!has_mcast(device)) return; dev = kmalloc(sizeof *dev + device-phys_port_cnt * sizeof *port, @@ -823,7 +823,7 @@ static void mcast_add_one(struct ib_device *device) } for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (!rdma_port_ll_is_ib(device, dev-start_port + i)) +if (!cap_mcast(device, dev-start_port + i)) continue; port = dev-port[i]; port-dev = dev; @@ -861,7 +861,7 @@ static void mcast_remove_one(struct ib_device *device) flush_workqueue(mcast_wq); for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (rdma_port_ll_is_ib(device, dev-start_port + i)) { +if (cap_mcast(device, dev-start_port + i)) { port = dev-port[i]; deref_port(port); wait_for_completion(port-comp); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index fa8ffa3..e796104 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1823,6 +1823,19 @@ static inline int has_sa(struct ib_device *device) } /** + * has_mcast - Check if a device support Multicast. + * + * @device: Device to be checked + * + * Return 0 when a device has none port to support + * Multicast. 
+ */ +static inline int has_mcast(struct ib_device *device) +{ +return rdma_transport_is_ib(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * @@ -1852,6 +1865,21 @@ static inline int cap_sa(struct ib_device *device, u8 port_num) return rdma_port_ll_is_ib(device, port_num); } +/** + * cap_mcast - Check if the port of device has the capability + * Multicast. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device don't support + * Multicast. + */ +static inline int cap_mcast(struct ib_device *device, u8 port_num) +{ +return
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
From: Or Gerlitz gerlitz...@gmail.com Date: Mon, 30 Mar 2015 19:17:01 +0300 On Sun, Mar 29, 2015 at 4:51 PM, Or Gerlitz ogerl...@mellanox.com wrote: Under the existing implementation for virtual GIDs, if the SM is not reachable or incurs a delayed response, or if the VF is probed into a VM before its GUID is registered with the SM, there exists a window in time in which the VF sees an incorrect GID, i.e., not the GID that was intended by the admin. This results in exposing a temporal identity to the VF. Hi Roland, so your for-next branch is again way behind, still on 3.19, and while 4.0 is soon at rc6 we couldn't even rebase this series on it. It's really hard when your tree is only active once every nine weeks or so, e.g. a few days before/after each rc1. I'm not sure what you expect us to do; kernel development simply need not be like this. April 3rd-12th is a holiday here, and we would really like to know early this week what you intend to pull for 4.1 out of the pending things in linux-rdma. Roland, I have to genuinely agree with Or that your handling of patch integration is sub-par and really painful for anyone actually trying to get real work done here. If you simply don't have the time to devote to constantly reviewing patches as they come in, and doing so in a timely manner, please let someone who is actually interested and has the time take over. Only integrating people's work right before the merge window, and then disappearing for a long time, really isn't acceptable. Thanks!
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
Roland, I have to genuinely agree with Or, that your handling of patch integration is sub-par and really painful for anyone actually trying to get real work done here. If you simply don't have the time to devote to constantly reviewing patches as they come in, and doing so in a timely manner, please let someone who is actually interested and has the time to take over. It's a fair criticism, and certainly for at least the last year or so I have not had the time to do enough work as a maintainer. I have hope that some of the things that have been keeping me busy are dying down and that I'll have more time to spend on handling the RDMA tree, but that's just talk until I actually get more done. I really would like to get more people involved in handling the flow of patches but I'm not sure who has not only the interest and the time but also the judgement and expertise to take over. Certainly Or has been a long time contributor who has done a lot of great things, but I still worry about things like ABI stability and backwards compatibility. But I'm open to ideas. - R.
Re: [PATCH 01/11] IB/Verbs: Use helpers to check transport and link layer
On Fri, 2015-03-27 at 16:40 +0100, Michael Wang wrote: We have so much places to check transport type and link layer type, it's now make sense to introduce some helpers in order to refine the lengthy code. This patch will introduce helpers: rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_port_ll_is_ib() rdma_port_ll_is_eth() and use them to save some code for us. If the end result is to do something like I proposed, then why take this intermediate step that just has to be backed out later? In other words, if our end goal is to have rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_transport_is_roce() rdma_transport_is_opa() Then we should skip doing rdma_port_ll_is_*() as the answers to these items would be implied by rdma_transport_is_roce() and such. Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/agent.c | 2 +- drivers/infiniband/core/cm.c | 2 +- drivers/infiniband/core/cma.c | 27 --- drivers/infiniband/core/mad.c | 6 +++--- drivers/infiniband/core/multicast.c | 11 --- drivers/infiniband/core/sa_query.c| 14 +++--- drivers/infiniband/core/ucm.c | 3 +-- drivers/infiniband/core/user_mad.c| 2 +- drivers/infiniband/core/verbs.c | 5 ++--- drivers/infiniband/hw/mlx4/ah.c | 2 +- drivers/infiniband/hw/mlx4/cq.c | 4 +--- drivers/infiniband/hw/mlx4/mad.c | 14 -- drivers/infiniband/hw/mlx4/main.c | 8 +++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 +- drivers/infiniband/hw/mlx4/qp.c | 21 +++-- drivers/infiniband/hw/mlx4/sysfs.c| 6 ++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 6 +++--- include/rdma/ib_verbs.h | 24 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3 +-- 19 files changed, 79 insertions(+), 83 deletions(-) diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c index f6d2961..27f1bec 100644 --- a/drivers/infiniband/core/agent.c +++ 
b/drivers/infiniband/core/agent.c @@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num) goto error1; } -if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) { +if (rdma_port_ll_is_ib(device, port_num)) { /* Obtain send only MAD agent for SMI QP */ port_priv-agent[0] = ib_register_mad_agent(device, port_num, IB_QPT_SMI, NULL, 0, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index e28a494..2c72e9e 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3762,7 +3762,7 @@ static void cm_add_one(struct ib_device *ib_device) int ret; u8 i; -if (rdma_node_get_transport(ib_device-node_type) != RDMA_TRANSPORT_IB) +if (!rdma_transport_is_ib(ib_device)) return; cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) * diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d570030..668e955 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -375,8 +375,8 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv-id.port_num) == dev_ll) { cma_dev = listen_id_priv-cma_dev; port = listen_id_priv-id.port_num; -if (rdma_node_get_transport(cma_dev-device-node_type) == RDMA_TRANSPORT_IB -rdma_port_get_link_layer(cma_dev-device, port) == IB_LINK_LAYER_ETHERNET) +if (rdma_transport_is_ib(cma_dev-device) +rdma_port_ll_is_eth(cma_dev-device, port)) ret = ib_find_cached_gid(cma_dev-device, iboe_gid, found_port, NULL); else @@ -395,8 +395,8 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv-id.port_num == port) continue; if (rdma_port_get_link_layer(cma_dev-device, port) == dev_ll) { -if (rdma_node_get_transport(cma_dev-device-node_type) == RDMA_TRANSPORT_IB -rdma_port_get_link_layer(cma_dev-device, port) == IB_LINK_LAYER_ETHERNET) +if (rdma_transport_is_ib(cma_dev-device) +rdma_port_ll_is_eth(cma_dev-device, port)) ret = ib_find_cached_gid(cma_dev-device, iboe_gid, found_port, NULL); else ret 
= ib_find_cached_gid(cma_dev-device, gid, found_port, NULL); @@ -435,7 +435,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv) pkey =
Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and cap_mcast() for mcast-check
On Fri, 2015-03-27 at 16:46 +0100, Michael Wang wrote: Introduce helper has_mcast() and cap_mcast() to help us check if an IB device or its port supports Multicast. This probably needs to be reworded or rethought. In truth, *all* rdma devices are multicast capable. *BUT*, IB/OPA devices require multicast registration done the IB way (including for sendonly multicast sends), while Ethernet devices do multicast the Ethernet way. These tests are really just for IB-specific multicast registration and deregistration. Calling them has_mcast() and cap_mcast() is incorrect. Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/multicast.c | 8 include/rdma/ib_verbs.h | 28 3 files changed, 33 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 276fb76..cbbc85b 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -3398,7 +3398,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) ib_detach_mcast(id-qp, mc-multicast.ib-rec.mgid, be16_to_cpu(mc-multicast.ib-rec.mlid)); -if (rdma_transport_is_ib(id_priv-cma_dev-device)) { +if (has_mcast(id_priv-cma_dev-device)) { switch (rdma_port_get_link_layer(id-device, id-port_num)) { case IB_LINK_LAYER_INFINIBAND: ib_sa_free_multicast(mc-multicast.ib); diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c index 17573ff..ffeaf27 100644 --- a/drivers/infiniband/core/multicast.c +++ b/drivers/infiniband/core/multicast.c @@ -780,7 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler, int index; dev = container_of(handler, struct mcast_device, event_handler); -if (!rdma_port_ll_is_ib(dev-device, event-element.port_num)) +if (!cap_mcast(dev-device, event-element.port_num))
return; index = event-element.port_num - dev-start_port; @@ -807,7 +807,7 @@ static void mcast_add_one(struct ib_device *device) int i; int count = 0; -if (!rdma_transport_is_ib(device)) +if (!has_mcast(device)) return; dev = kmalloc(sizeof *dev + device-phys_port_cnt * sizeof *port, @@ -823,7 +823,7 @@ static void mcast_add_one(struct ib_device *device) } for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (!rdma_port_ll_is_ib(device, dev-start_port + i)) +if (!cap_mcast(device, dev-start_port + i)) continue; port = dev-port[i]; port-dev = dev; @@ -861,7 +861,7 @@ static void mcast_remove_one(struct ib_device *device) flush_workqueue(mcast_wq); for (i = 0; i = dev-end_port - dev-start_port; i++) { -if (rdma_port_ll_is_ib(device, dev-start_port + i)) { +if (cap_mcast(device, dev-start_port + i)) { port = dev-port[i]; deref_port(port); wait_for_completion(port-comp); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index fa8ffa3..e796104 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1823,6 +1823,19 @@ static inline int has_sa(struct ib_device *device) } /** + * has_mcast - Check if a device support Multicast. + * + * @device: Device to be checked + * + * Return 0 when a device has none port to support + * Multicast. + */ +static inline int has_mcast(struct ib_device *device) +{ +return rdma_transport_is_ib(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * @@ -1852,6 +1865,21 @@ static inline int cap_sa(struct ib_device *device, u8 port_num) return rdma_port_ll_is_ib(device, port_num); } +/** + * cap_mcast - Check if the port of device has the capability + * Multicast. + * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device don't support + * Multicast. 
+ */ +static inline int cap_mcast(struct ib_device *device, u8 port_num) +{ +return rdma_port_ll_is_ib(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD signature.asc Description: This is a digitally signed message part
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for iwarp-check
On Fri, 2015-03-27 at 16:47 +0100, Michael Wang wrote: Introduce helper has_iwarp() to help us check if an IB device supports the iWARP protocol. This is a needless redirection. Just stick with the original rdma_transport_is_iwarp(). Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- include/rdma/ib_verbs.h | 13 + net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 +- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index e796104..0ef9cd7 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1836,6 +1836,19 @@ static inline int has_mcast(struct ib_device *device) } /** + * has_iwarp - Check if a device support IWARP protocol. + * + * @device: Device to be checked + * + * Return 0 when a device has none port to support + * IWARP protocol. + */ +static inline int has_iwarp(struct ib_device *device) +{ +return rdma_transport_is_iwarp(device); +} + +/** * cap_smi - Check if the port of device has the capability * Subnet Management Interface. * diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index a7b5891..48aeb5e 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -118,7 +118,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp, static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count) { -if (rdma_transport_is_iwarp(xprt-sc_cm_id-device)) +if (has_iwarp(xprt-sc_cm_id-device)) return 1; else return min_t(int, sge_count, xprt-sc_max_sge); -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD
Re: [PATCH 01/11] IB/Verbs: Use helpers to check transport and link layer
Hi, Doug Thanks for the comments :-) On 03/30/2015 05:56 PM, Doug Ledford wrote: On Fri, 2015-03-27 at 16:40 +0100, Michael Wang wrote: We have so much places to check transport type and link layer type, it's now make sense to introduce some helpers in order to refine the lengthy code. This patch will introduce helpers: rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_port_ll_is_ib() rdma_port_ll_is_eth() and use them to save some code for us. If the end result is to do something like I proposed, then why take this intermediate step that just has to be backed out later? The problem is that I found there are still many places our new mechanism may not be able to cover, especially inside the device drivers; this patch just tries to collect the issues together as a baseline so we can gradually eliminate them. Sure, if we finally do capture all the cases, we can just get rid of this one, but I guess it won't be that easy to jump directly into the next stage :-P As I imagine it, after this reform the next stage could be introducing the new mechanism without changing the device drivers, and the last stage is asking vendors to adapt their code to the new mechanism. In other words, if our end goal is to have rdma_transport_is_ib() rdma_transport_is_iwarp() rdma_transport_is_roce() rdma_transport_is_opa() Then we should skip doing rdma_port_ll_is_*() as the answers to these items would be implied by rdma_transport_is_roce() and such.
Great if we achieved that ;-) but currently I just wondering maybe these helpers can only cover part of the cases where we check transport and link layer, there are still some cases we'll need the very rough helper to save some code and make things clean~ Regards, Michael Wang Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/agent.c | 2 +- drivers/infiniband/core/cm.c | 2 +- drivers/infiniband/core/cma.c | 27 --- drivers/infiniband/core/mad.c | 6 +++--- drivers/infiniband/core/multicast.c | 11 --- drivers/infiniband/core/sa_query.c| 14 +++--- drivers/infiniband/core/ucm.c | 3 +-- drivers/infiniband/core/user_mad.c| 2 +- drivers/infiniband/core/verbs.c | 5 ++--- drivers/infiniband/hw/mlx4/ah.c | 2 +- drivers/infiniband/hw/mlx4/cq.c | 4 +--- drivers/infiniband/hw/mlx4/mad.c | 14 -- drivers/infiniband/hw/mlx4/main.c | 8 +++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 +- drivers/infiniband/hw/mlx4/qp.c | 21 +++-- drivers/infiniband/hw/mlx4/sysfs.c| 6 ++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 6 +++--- include/rdma/ib_verbs.h | 24 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3 +-- 19 files changed, 79 insertions(+), 83 deletions(-) diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c index f6d2961..27f1bec 100644 --- a/drivers/infiniband/core/agent.c +++ b/drivers/infiniband/core/agent.c @@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num) goto error1; } -if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) { +if (rdma_port_ll_is_ib(device, port_num)) { /* Obtain send only MAD agent for SMI QP */ port_priv-agent[0] = ib_register_mad_agent(device, port_num, IB_QPT_SMI, NULL, 0, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index e28a494..2c72e9e 100644 --- 
a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3762,7 +3762,7 @@ static void cm_add_one(struct ib_device *ib_device) int ret; u8 i; -if (rdma_node_get_transport(ib_device-node_type) != RDMA_TRANSPORT_IB) +if (!rdma_transport_is_ib(ib_device)) return; cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) * diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d570030..668e955 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -375,8 +375,8 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv-id.port_num) == dev_ll) { cma_dev = listen_id_priv-cma_dev; port = listen_id_priv-id.port_num; -if (rdma_node_get_transport(cma_dev-device-node_type) == RDMA_TRANSPORT_IB -rdma_port_get_link_layer(cma_dev-device, port) == IB_LINK_LAYER_ETHERNET) +if (rdma_transport_is_ib(cma_dev-device) +rdma_port_ll_is_eth(cma_dev-device, port)) ret =
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
On Sun, Mar 29, 2015 at 4:51 PM, Or Gerlitz ogerl...@mellanox.com wrote: Under the existing implementation for virtual GIDs, if the SM is not reachable or incurs a delayed response, or if the VF is probed into a VM before its GUID is registered with the SM, there exists a window in time in which the VF sees an incorrect GID, i.e., not the GID that was intended by the admin. This results in exposing a temporary identity to the VF. Hi Roland, so your for-next branch is again way behind, still on 3.19, while 4.0 is soon @ rc6, and we couldn't even rebase this series on it. It's really hard when your tree is only really active once every nine weeks or so, e.g. only a few days before/after rc1's. I'm not sure what you expect us to do; kernel development simply need not be like this. April 3rd-12th is a holiday here, and we would really like to know early this week what you intend to pull for 4.1 out of the pending things in linux-rdma. Or. Moreover, a subsequent change in the alias GID causes a spec-incompliant change to the VF identity. Some guest operating systems, such as Windows, cannot tolerate such changes. This series solves the above problem by exposing the admin-desired value instead of the value that was approved by the SM. As long as the SM doesn't approve the GID, the VF will see its link as down. In addition, we request GIDs from the SM on demand, i.e., when a VF actually needs them, and release them when the GIDs are no longer in use. In cloud environments, this is useful for GID migrations, in which a GID is assigned to a VF on the destination HCA, while the VF on the source HCA is shut down (but the GID was not administratively released). For reasons of compatibility, an explicit admin request to set/change a GUID entry is done immediately, regardless of whether the VF is active or not. This allows administrators to change the GUID without the need to unbind/bind the VF.
In addition, the existing implementation doesn't support a persistency mechanism to retry a GUID request when the SM has rejected it for any reason. The PF driver shall keep trying to acquire the specified GUID indefinitely, using an exponential back-off scheme; this should be managed per GUID and be aligned with other incoming admin requests. This ability is needed especially for the on-demand GUID feature. In this case, we must manage the GUID's status per entry and handle cases in which some entries are temporarily rejected. The first patch adds the persistency support and is a pre-requisite for the series. Further patches make the change to use the admin VF behavior as described above. Finally, the default mode is changed to be HOST assigned instead of SM assigned. This is the expected operational mode, because it doesn't depend on SM availability as described above. Yishai and Or. Yishai Hadas (9): IB/mlx4: Alias GUID adding persistency support net/mlx4_core: Manage alias GUID per VF net/mlx4_core: Set initial admin GUIDs for VFs IB/mlx4: Manage admin alias GUID upon admin request IB/mlx4: Change init flow to request alias GUIDs for active VFs IB/mlx4: Request alias GUID on demand net/mlx4_core: Raise slave shutdown event upon FLR net/mlx4_core: Return the admin alias GUID upon host view request IB/mlx4: Change alias guids default to be host assigned drivers/infiniband/hw/mlx4/alias_GUID.c | 468 + drivers/infiniband/hw/mlx4/main.c | 26 ++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 14 +- drivers/infiniband/hw/mlx4/sysfs.c| 44 +-- drivers/net/ethernet/mellanox/mlx4/cmd.c | 42 ++- drivers/net/ethernet/mellanox/mlx4/eq.c |2 + drivers/net/ethernet/mellanox/mlx4/main.c | 39 +++ drivers/net/ethernet/mellanox/mlx4/mlx4.h |1 + include/linux/mlx4/device.h |4 + 9 files changed, 459 insertions(+), 181 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at
http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 02/11] IB/Verbs: Use management helper tech_iboe() for iboe-check
On Fri, 2015-03-27 at 16:42 +0100, Michael Wang wrote: Introduce helper tech_iboe() to help us check if the port of an IB device is using RoCE/IBoE technology. Just use rdma_transport_is_roce() instead. Cc: Jason Gunthorpe jguntho...@obsidianresearch.com Cc: Doug Ledford dledf...@redhat.com Cc: Ira Weiny ira.we...@intel.com Cc: Sean Hefty sean.he...@intel.com Signed-off-by: Michael Wang yun.w...@profitbricks.com --- drivers/infiniband/core/cma.c | 6 ++ include/rdma/ib_verbs.h | 16 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 668e955..280cfe3 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -375,8 +375,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv-id.port_num) == dev_ll) { cma_dev = listen_id_priv-cma_dev; port = listen_id_priv-id.port_num; -if (rdma_transport_is_ib(cma_dev-device) -rdma_port_ll_is_eth(cma_dev-device, port)) +if (tech_iboe(cma_dev-device, port)) ret = ib_find_cached_gid(cma_dev-device, iboe_gid, found_port, NULL); else @@ -395,8 +394,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, listen_id_priv-id.port_num == port) continue; if (rdma_port_get_link_layer(cma_dev-device, port) == dev_ll) { -if (rdma_transport_is_ib(cma_dev-device) -rdma_port_ll_is_eth(cma_dev-device, port)) +if (tech_iboe(cma_dev-device, port)) ret = ib_find_cached_gid(cma_dev-device, iboe_gid, found_port, NULL); else ret = ib_find_cached_gid(cma_dev-device, gid, found_port, NULL); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 2bf9094..ca6d6bc 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1767,6 +1767,22 @@ static inline int rdma_port_ll_is_eth(struct ib_device *device, u8 port_num) == IB_LINK_LAYER_ETHERNET; } +/** + * tech_iboe - Check if the port of device using technology + * RoCE/IBoE. 
+ * + * @device: Device to be checked + * @port_num: Port number of the device + * + * Return 0 when port of the device is not using technology + * RoCE/IBoE. + */ +static inline int tech_iboe(struct ib_device *device, u8 port_num) +{ +return rdma_transport_is_ib(device) +rdma_port_ll_is_eth(device, port_num); +} + int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid); -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD