RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

2015-03-26 Thread Devesh Sharma
Hi Chuck,

I have validated these set of patches with ocrdma device, iozone passes with 
these.

-Regards
Devesh

 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Anna Schumaker
 Sent: Friday, March 27, 2015 12:10 AM
 To: Chuck Lever; linux-rdma@vger.kernel.org; linux-...@vger.kernel.org
 Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
 
 Hey Chuck,
 
 I didn't see anything that needs to be fixed up in these patches.  Are they 
 ready
 for me?
 
 Anna
 
 On 03/24/2015 04:30 PM, Chuck Lever wrote:
  This is a series of client-side patches for NFS/RDMA. In preparation
  for increasing the transport credit limit and maximum rsize/wsize,
  I've re-factored the memory registration logic into separate files,
  invoked via a method API.
 
  The two main optimizations in v1 of this series have been dropped.
  Sagi Grimberg didn't like the complexity of the solution, and there
  isn't enough time to rework it, test the new version, and get it
  reviewed before the 4.1 merge window opens. I'm going to prepare these
  for 4.2.
 
  Fixes suggested by reviewers have been included before the refactoring
  patches to make it easier to backport them to previous kernels.
 
  The series is available in the nfs-rdma-for-4.1 topic branch at
 
  git://linux-nfs.org/projects/cel/cel-2.6.git
 
  Changes since v1:
  - Rebased on 4.0-rc5
  - Main optimizations postponed to 4.2
  - Addressed review comments from Anna, Sagi, and Devesh
 
  ---
 
  Chuck Lever (15):
SUNRPC: Introduce missing well-known netids
xprtrdma: Display IPv6 addresses and port numbers correctly
xprtrdma: Perform a full marshal on retransmit
xprtrdma: Byte-align FRWR registration
xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
xprtrdma: Add vector of ops for each memory registration strategy
xprtrdma: Add a max_payload op for each memreg mode
xprtrdma: Add a register_external op for each memreg mode
xprtrdma: Add a deregister_external op for each memreg mode
xprtrdma: Add init MRs memreg op
xprtrdma: Add reset MRs memreg op
xprtrdma: Add destroy MRs memreg op
xprtrdma: Add open memreg op
xprtrdma: Handle non-SEND completions via a callout
xprtrdma: Make rpcrdma_{un}map_one() into inline functions
 
 
   include/linux/sunrpc/msg_prot.h|8
   net/sunrpc/xprtrdma/Makefile   |3
   net/sunrpc/xprtrdma/fmr_ops.c  |  208 +++
   net/sunrpc/xprtrdma/frwr_ops.c |  353 ++
   net/sunrpc/xprtrdma/physical_ops.c |   94 +
   net/sunrpc/xprtrdma/rpc_rdma.c |   87 ++--
   net/sunrpc/xprtrdma/transport.c|   61 ++-
   net/sunrpc/xprtrdma/verbs.c|  699 
  +++-
   net/sunrpc/xprtrdma/xprt_rdma.h|   90 -
   9 files changed, 882 insertions(+), 721 deletions(-)  create mode
  100644 net/sunrpc/xprtrdma/fmr_ops.c  create mode 100644
  net/sunrpc/xprtrdma/frwr_ops.c  create mode 100644
  net/sunrpc/xprtrdma/physical_ops.c
 
  --
  Chuck Lever
  --
  To unsubscribe from this list: send the line unsubscribe linux-nfs
  in the body of a message to majord...@vger.kernel.org More majordomo
  info at  http://vger.kernel.org/majordomo-info.html
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

2015-03-26 Thread Devesh Sharma
 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Devesh Sharma
 Sent: Friday, March 27, 2015 11:13 AM
 To: Anna Schumaker; Chuck Lever; linux-rdma@vger.kernel.org; linux-
 n...@vger.kernel.org
 Subject: RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
 
 Hi Chuck,
 
 I have validated these set of patches with ocrdma device, iozone passes with
 these.


Thanks to Meghna.

 
 -Regards
 Devesh
 
  -Original Message-
  From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
  ow...@vger.kernel.org] On Behalf Of Anna Schumaker
  Sent: Friday, March 27, 2015 12:10 AM
  To: Chuck Lever; linux-rdma@vger.kernel.org; linux-...@vger.kernel.org
  Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
 
  Hey Chuck,
 
  I didn't see anything that needs to be fixed up in these patches.  Are
  they ready for me?
 
  Anna
 
  On 03/24/2015 04:30 PM, Chuck Lever wrote:
   This is a series of client-side patches for NFS/RDMA. In preparation
   for increasing the transport credit limit and maximum rsize/wsize,
   I've re-factored the memory registration logic into separate files,
   invoked via a method API.
  
   The two main optimizations in v1 of this series have been dropped.
   Sagi Grimberg didn't like the complexity of the solution, and there
   isn't enough time to rework it, test the new version, and get it
   reviewed before the 4.1 merge window opens. I'm going to prepare
   these for 4.2.
  
   Fixes suggested by reviewers have been included before the
   refactoring patches to make it easier to backport them to previous 
   kernels.
  
   The series is available in the nfs-rdma-for-4.1 topic branch at
  
   git://linux-nfs.org/projects/cel/cel-2.6.git
  
   Changes since v1:
   - Rebased on 4.0-rc5
   - Main optimizations postponed to 4.2
   - Addressed review comments from Anna, Sagi, and Devesh
  
   ---
  
   Chuck Lever (15):
 SUNRPC: Introduce missing well-known netids
 xprtrdma: Display IPv6 addresses and port numbers correctly
 xprtrdma: Perform a full marshal on retransmit
 xprtrdma: Byte-align FRWR registration
 xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
 xprtrdma: Add vector of ops for each memory registration strategy
 xprtrdma: Add a max_payload op for each memreg mode
 xprtrdma: Add a register_external op for each memreg mode
 xprtrdma: Add a deregister_external op for each memreg mode
 xprtrdma: Add init MRs memreg op
 xprtrdma: Add reset MRs memreg op
 xprtrdma: Add destroy MRs memreg op
 xprtrdma: Add open memreg op
 xprtrdma: Handle non-SEND completions via a callout
 xprtrdma: Make rpcrdma_{un}map_one() into inline functions
  
  
include/linux/sunrpc/msg_prot.h|8
net/sunrpc/xprtrdma/Makefile   |3
net/sunrpc/xprtrdma/fmr_ops.c  |  208 +++
net/sunrpc/xprtrdma/frwr_ops.c |  353 ++
net/sunrpc/xprtrdma/physical_ops.c |   94 +
net/sunrpc/xprtrdma/rpc_rdma.c |   87 ++--
net/sunrpc/xprtrdma/transport.c|   61 ++-
net/sunrpc/xprtrdma/verbs.c|  699 
   +++-
net/sunrpc/xprtrdma/xprt_rdma.h|   90 -
9 files changed, 882 insertions(+), 721 deletions(-)  create mode
   100644 net/sunrpc/xprtrdma/fmr_ops.c  create mode 100644
   net/sunrpc/xprtrdma/frwr_ops.c  create mode 100644
   net/sunrpc/xprtrdma/physical_ops.c
  
   --
   Chuck Lever
   --
   To unsubscribe from this list: send the line unsubscribe linux-nfs
   in the body of a message to majord...@vger.kernel.org More majordomo
   info at  http://vger.kernel.org/majordomo-info.html
  
 
  --
  To unsubscribe from this list: send the line unsubscribe linux-rdma
  in the body of a message to majord...@vger.kernel.org More majordomo
  info at http://vger.kernel.org/majordomo-info.html
   칻  ~    +-  ݶ  w  ˛   m b  kvf   ^n r   z   h   G   h ( 階 ݢj 
   m z
 ޖ   f   h   ~ m
N�r��yb�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

Questions about inline-receive in Mellanox cards

2015-03-26 Thread Anuj Kalia
Hi everyone.

I have been trying to get inline-receive functionality to work on
Connect-IB cards. However, it seems like libmlx5 (from Mellanox OFED
2.4) does not support inline-receive. libmlx4 seems to support it, but
I believe libmlx4 is not compatible with Connect-IB cards. Further,
the Mellanox OFED user manual says that Connect-IB cards provide
inline-receive - which makes it sound like ConnectX-3 cards do not,
but then libmlx4 supports it... Does this mean that ConnectX-3 cards
also provide inline-receive?

Can someone please help me with this confusion? Also, is
inline-receive supported with UD transport?

Thanks!
Anuj Kalia
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 RESEND] IB/Verbs: Use helpers to refine the checking on transport and link layer

2015-03-26 Thread Michael Wang
On 03/26/2015 05:27 PM, Doug Ledford wrote:
 On Thu, 2015-03-26 at 17:04 +0100, Michael Wang wrote:
 [snip]

 Few more questions here is:
 1. when to setup? (maybe inside ib_register_device() before doing 
 client-add() callback?)
 I don't think we can set it up here.  The driver's have to set it up.
 After all, the mlx4 driver will have to decide for itself what the port
 transport is and tell us, we can't tell it.
 2. how to setup? (still infer from the transport and link layer like we 
 currently do?)
 Find each point in each driver where they currently set the link layer
 and transport fields today, and replace that with setting the new
 transport bitmask instead.

Forgive me but I'm not very familiar with the process of such changing...

So shall we do this for all the vendors? or provide a transition method
and asking them to do this later by themselves?

The questions is just wondering how the transition method could be, but
if we have to do the changes for vendor, that sounds like a tough job...


 3. in case if a device has ports with different link layer type (please 
 correct
 me if this will never happen), then only one bitmask may not be enough to
 present the transport of all the ports? (maybe create a bitmask per 
 port?)
 [snip]

 That preserves the user space ABI and all user programs keep working,
 while we update to an internal representation that makes more sense for
 how things have evolved.

I get your point :-) to reassemble the information in old style, maybe
we can temporarily reserve the old mechanism, and do reform for
the user space in another separate patch set, also cleanup the old
stuff too.

Regards,
Michael Wang


 Regards,
 Michael Wang

 If we do this, then the only thing we have to fix up to preserve ABI
 with user space is to make sure that any time we export an ibv_device
 struct and any time we import the same, we convert from our new internal
 representation to the old representation that user space expects.  And
 we also need to make a few changes in the sysfs code to display the
 properties as things expect.  But, that would allow us to fix up what I
 see as a problem right now, which is that we hide the information we
 need to know what sort of device we are working on in two different
 fields: the transport and the link layer.  Instead, just use one field
 with enough variants that we can store all of the relevant information
 we need in that one field.  This has the benefit that any comparisons
 that happen in hot paths will now always be a single bitwise comparison
 and will no longer need to hit two separate variables for two separate
 compares.





--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 RESEND] IB/Verbs: Use helpers to refine the checking on transport and link layer

2015-03-26 Thread Doug Ledford
On Thu, 2015-03-26 at 17:04 +0100, Michael Wang wrote:
 Hi, Doug
 
 Thanks for the excellent comments :-)
 
 On 03/26/2015 03:09 PM, Doug Ledford wrote:
  On Wed, 2015-03-25 at 16:09 +0100, Michael Wang wrote:
  [snip]
 
  [snip]
 
  So, I would suggest that we fix things up thusly:
 
  enum transport {
  TRANSPORT_IB=1,
  TRANSPORT_IWARP=2,
  TRANSPORT_ROCE=4,
  TRANSPORT_OPA=8,
  TRANSPORT_USNIC=10,
  };
 
  #define HAS_SA(ibdev) ((ibdev)-transport  (TRANSPORT_IB|TRANSPORT_OPA))
  #define HAS_JUMBO_SA(ibdev) ((ibdev)-transport  TRANSPORT_OPA))
 
  or possibly
 
  static bool ib_dev_has_sa(struct ibv_device *ibdev)
  {
  return ibdev-transport  (TRANSPORT_IB | TRANSPORT_OPA);
  }
 
 The idea sounds interesting, and here my silly questions come :-P
 
 So are you suggesting that we add a new bitmask 'transport' into 'struct 
 ib_device'
 in kernel, and setup it at very beginning?
 
 Few more questions here is:
 1. when to setup? (maybe inside ib_register_device() before doing 
 client-add() callback?)

I don't think we can set it up here.  The driver's have to set it up.
After all, the mlx4 driver will have to decide for itself what the port
transport is and tell us, we can't tell it.

 2. how to setup? (still infer from the transport and link layer like we 
 currently do?)

Find each point in each driver where they currently set the link layer
and transport fields today, and replace that with setting the new
transport bitmask instead.

 3. in case if a device has ports with different link layer type (please 
 correct
 me if this will never happen), then only one bitmask may not be enough to
 present the transport of all the ports? (maybe create a bitmask per port?)

Correct, a bitmask per port.  And we can remove the existing transport
and link layer elements of the struct and replace it with just the new
transport.  Then, whenever we need to copy a struct to user space, we
have a helper that looks something like this:

static void inline ib_set_user_transport(struct ib_device *ibdev,
 struct user_ibv_device *uibdev)
{
switch(ibdev-port[port]-transport) {
case TRANSPORT_IB:
case TRANSPORT_OPA:
uibdev-port[port]-link_layer = INFINIBAND;
uibdev-port[port]-transport = INFINIBAND;
break;
case TRANSPORT_IWARP:
uibdev-port[port]-link_layer = INFINIBAND;
uibdev-port[port]-transport = IWARP;
break;
case TRANSPORT_ROCE:
uibdev-port[port]-link_layer = ETHERNET;
uibdev-port[port]-transport = INFINIBAND;
break;
case TRANSPORT_USNIC:
uibdev-port[port]-link_layer = ETHERNET;
uibdev-port[port]-transport = whatever USNIC uses today;
break;
default:
pr_err(ibdev, unknown transport type %x\n,
   ibdev-port[port]-transport);
}
}

That preserves the user space ABI and all user programs keep working,
while we update to an internal representation that makes more sense for
how things have evolved.

 Regards,
 Michael Wang
 
 
  If we do this, then the only thing we have to fix up to preserve ABI
  with user space is to make sure that any time we export an ibv_device
  struct and any time we import the same, we convert from our new internal
  representation to the old representation that user space expects.  And
  we also need to make a few changes in the sysfs code to display the
  properties as things expect.  But, that would allow us to fix up what I
  see as a problem right now, which is that we hide the information we
  need to know what sort of device we are working on in two different
  fields: the transport and the link layer.  Instead, just use one field
  with enough variants that we can store all of the relevant information
  we need in that one field.  This has the benefit that any comparisons
  that happen in hot paths will now always be a single bitwise comparison
  and will no longer need to hit two separate variables for two separate
  compares.
 
 
 
 


-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD




signature.asc
Description: This is a digitally signed message part


Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

2015-03-26 Thread Chuck Lever

On Mar 26, 2015, at 1:39 PM, Anna Schumaker anna.schuma...@netapp.com wrote:

 Hey Chuck,
 
 I didn't see anything that needs to be fixed up in these patches.  Are they 
 ready for me?

Thanks for the review. IMO we can go one of two routes:

 - Wait for HCA vendors to test this latest version of the series, or

 - Merge it now, and simply apply any needed fixes on top before the 4.1
window opens.

What do you prefer? Is it possible to get this series in front of the
zero-day test folks before you merge?


 Anna
 
 On 03/24/2015 04:30 PM, Chuck Lever wrote:
 This is a series of client-side patches for NFS/RDMA. In preparation
 for increasing the transport credit limit and maximum rsize/wsize,
 I've re-factored the memory registration logic into separate files,
 invoked via a method API.
 
 The two main optimizations in v1 of this series have been dropped.
 Sagi Grimberg didn't like the complexity of the solution, and there
 isn't enough time to rework it, test the new version, and get it
 reviewed before the 4.1 merge window opens. I'm going to prepare
 these for 4.2.
 
 Fixes suggested by reviewers have been included before the
 refactoring patches to make it easier to backport them to previous
 kernels.
 
 The series is available in the nfs-rdma-for-4.1 topic branch at
 
 git://linux-nfs.org/projects/cel/cel-2.6.git
 
 Changes since v1:
 - Rebased on 4.0-rc5
 - Main optimizations postponed to 4.2
 - Addressed review comments from Anna, Sagi, and Devesh
 
 ---
 
 Chuck Lever (15):
  SUNRPC: Introduce missing well-known netids
  xprtrdma: Display IPv6 addresses and port numbers correctly
  xprtrdma: Perform a full marshal on retransmit
  xprtrdma: Byte-align FRWR registration
  xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
  xprtrdma: Add vector of ops for each memory registration strategy
  xprtrdma: Add a max_payload op for each memreg mode
  xprtrdma: Add a register_external op for each memreg mode
  xprtrdma: Add a deregister_external op for each memreg mode
  xprtrdma: Add init MRs memreg op
  xprtrdma: Add reset MRs memreg op
  xprtrdma: Add destroy MRs memreg op
  xprtrdma: Add open memreg op
  xprtrdma: Handle non-SEND completions via a callout
  xprtrdma: Make rpcrdma_{un}map_one() into inline functions
 
 
 include/linux/sunrpc/msg_prot.h|8 
 net/sunrpc/xprtrdma/Makefile   |3 
 net/sunrpc/xprtrdma/fmr_ops.c  |  208 +++
 net/sunrpc/xprtrdma/frwr_ops.c |  353 ++
 net/sunrpc/xprtrdma/physical_ops.c |   94 +
 net/sunrpc/xprtrdma/rpc_rdma.c |   87 ++--
 net/sunrpc/xprtrdma/transport.c|   61 ++-
 net/sunrpc/xprtrdma/verbs.c|  699 
 +++-
 net/sunrpc/xprtrdma/xprt_rdma.h|   90 -
 9 files changed, 882 insertions(+), 721 deletions(-)
 create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c
 create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c
 create mode 100644 net/sunrpc/xprtrdma/physical_ops.c
 
 --
 Chuck Lever
 --
 To unsubscribe from this list: send the line unsubscribe linux-nfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-nfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1

2015-03-26 Thread Anna Schumaker
Hey Chuck,

I didn't see anything that needs to be fixed up in these patches.  Are they 
ready for me?

Anna

On 03/24/2015 04:30 PM, Chuck Lever wrote:
 This is a series of client-side patches for NFS/RDMA. In preparation
 for increasing the transport credit limit and maximum rsize/wsize,
 I've re-factored the memory registration logic into separate files,
 invoked via a method API.
 
 The two main optimizations in v1 of this series have been dropped.
 Sagi Grimberg didn't like the complexity of the solution, and there
 isn't enough time to rework it, test the new version, and get it
 reviewed before the 4.1 merge window opens. I'm going to prepare
 these for 4.2.
 
 Fixes suggested by reviewers have been included before the
 refactoring patches to make it easier to backport them to previous
 kernels.
 
 The series is available in the nfs-rdma-for-4.1 topic branch at
 
 git://linux-nfs.org/projects/cel/cel-2.6.git
 
 Changes since v1:
 - Rebased on 4.0-rc5
 - Main optimizations postponed to 4.2
 - Addressed review comments from Anna, Sagi, and Devesh
 
 ---
 
 Chuck Lever (15):
   SUNRPC: Introduce missing well-known netids
   xprtrdma: Display IPv6 addresses and port numbers correctly
   xprtrdma: Perform a full marshal on retransmit
   xprtrdma: Byte-align FRWR registration
   xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
   xprtrdma: Add vector of ops for each memory registration strategy
   xprtrdma: Add a max_payload op for each memreg mode
   xprtrdma: Add a register_external op for each memreg mode
   xprtrdma: Add a deregister_external op for each memreg mode
   xprtrdma: Add init MRs memreg op
   xprtrdma: Add reset MRs memreg op
   xprtrdma: Add destroy MRs memreg op
   xprtrdma: Add open memreg op
   xprtrdma: Handle non-SEND completions via a callout
   xprtrdma: Make rpcrdma_{un}map_one() into inline functions
 
 
  include/linux/sunrpc/msg_prot.h|8 
  net/sunrpc/xprtrdma/Makefile   |3 
  net/sunrpc/xprtrdma/fmr_ops.c  |  208 +++
  net/sunrpc/xprtrdma/frwr_ops.c |  353 ++
  net/sunrpc/xprtrdma/physical_ops.c |   94 +
  net/sunrpc/xprtrdma/rpc_rdma.c |   87 ++--
  net/sunrpc/xprtrdma/transport.c|   61 ++-
  net/sunrpc/xprtrdma/verbs.c|  699 
 +++-
  net/sunrpc/xprtrdma/xprt_rdma.h|   90 -
  9 files changed, 882 insertions(+), 721 deletions(-)
  create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c
  create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c
  create mode 100644 net/sunrpc/xprtrdma/physical_ops.c
 
 --
 Chuck Lever
 --
 To unsubscribe from this list: send the line unsubscribe linux-nfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache

2015-03-26 Thread Somnath Kotur
Hi Matan/Moni,
Could either of you please respond to both of Bart's 
queries?

Thanks
Somnath

 -Original Message-
 From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
 Sent: Thursday, March 26, 2015 5:13 AM
 To: Somnath Kotur; rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org; Matan Barak
 Subject: Re: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
 
 On 03/25/2015 02:19 PM, Somnath Kotur wrote:
  +   if (cache-data_vec[ix].attr.ndev 
  +   cache-data_vec[ix].attr.ndev != old_net_dev)
 
 A few lines earlier the memory old_net_dev points at was freed. If two
 instances of this function run concurrently, what prevents that the
 old_net_dev memory has been reallocated and hence that attr.ndev ==
 old_net_dev although both pointers refer(red) to different network devices
 ?
 
  +   ACCESS_ONCE(cache-data_vec[ix].seq) = orig_seq;
 
 Invoking write_gid() is only safe if the caller serializes write_gid() calls.
 Apparently the cache-lock mutex is used for that purpose. So why is it
 necessary to use ACCESS_ONCE() here ? Why is it needed to prevent that
 the compiler coalesces this write with another write into the same structure
 ?
 
  +   /* Make sure the sequence number we remeber was read
 
 This looks like a typo - shouldn't the above read remember ?
 
 BTW, the style of that comment is recommended only for networking code
 and not for IB code. Have you verified this patch with checkpatch ?
 
  +   mutex_lock(cache-lock);
  +
  +   for (ix = 0; ix  cache-sz; ix++)
  +   if (cache-data_vec[ix].attr.ndev == ndev)
  +   write_gid(ib_dev, port, cache, ix, zgid, zattr);
  +
  +   mutex_unlock(cache-lock);
  +   return 0;
 
 The traditional Linux kernel coding style is one blank line before
 mutex_lock() and after mutex_unlock() but not after mutex_lock() nor
 before mutex_unlock().
 
  +   orig_seq = ACCESS_ONCE(cache-data_vec[index].seq);
  +   /* Make sure we read the sequence number before copying the
  +* gid to local storage. */
  +   smp_rmb();
 
 Please use READ_ONCE() instead of ACCESS_ONCE() as recommended in
 linux/compiler.h.
 
  +static void free_roce_gid_cache(struct ib_device *ib_dev, u8 port) {
  +   int i;
  +   struct ib_roce_gid_cache *cache =
  +   ib_dev-cache.roce_gid_cache[port - 1];
  +
  +   if (!cache)
  +   return;
  +
  +   for (i = 0; i  cache-sz; ++i) {
  +   if (memcmp(cache-data_vec[i].gid, zgid,
  +  sizeof(cache-data_vec[i].gid)))
  +   write_gid(ib_dev, port, cache, i, zgid, zattr);
  +   }
   +  kfree(cache-data_vec);
   +  kfree(cache);
   +}
 
 Overwriting data just before it is freed is not useful. Please use
 CONFIG_SLUB_DEBUG=y to debug use-after-free issues instead of such
 code.
 
 Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 RESEND] IB/Verbs: Use helpers to refine the checking on transport and link layer

2015-03-26 Thread Jason Gunthorpe
On Thu, Mar 26, 2015 at 05:58:20PM +0100, Michael Wang wrote:

 The questions is just wondering how the transition method could be, but
 if we have to do the changes for vendor, that sounds like a tough job...

I would see changing how the information is represented in the struct
as a follow on issue. The first patch should go through and replace
all direct access to the link layer/transport/etc with an
appropriately narrow is_XX() test like Doug was suggesting.

That means looking at each code site and determining what it needs,
making a is_XX for it and a kdoc describing exactly what is needed for
the test to return true.

The follow on patch can then rework the is_XX and drop the link
layer/transport stuff..

Some ideas for is_XX:
  IB compatible SA
  QP0 SMP mechanism
  IB SMP format
  OPA SMP format
  QP1 GMP mechanism
  IB compatible CM
  GID addressing
  IP/IPv6 addressing
  Ethernet VLAN
  ...

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 RESEND] IB/Verbs: Use helpers to refine the checking on transport and link layer

2015-03-26 Thread Doug Ledford
On Wed, 2015-03-25 at 16:09 +0100, Michael Wang wrote:
 My sincerely apologies for the corrupted mails, and thanks for Dan's kindly
 remind :-)
 
 There are too many lengthy code to check the transport type of IB device,
 or the link layer type of it's port, this patch set try to use some helper to
 refine and save us some code.
 
 TODO:
 Currently we inferred from the transport type and link layer type to 
 identify
 the way of management, it will be better if we can directly get the 
 indicator
 from vendor.
 
 Sean proposed one suggestion:
 https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23339.html
 
 It may need a big work to adapt current implementation to utilize
 these flags elegantly.
 
 Also the performance concern on query_port() need to be addressed, may be
 some new callback like query_mgmt() could works.
 
 Michael Wang (2):
 [PATCH 1/2] IB/Verbs: Use helpers to check transport and link layer
 [PATCH 2/2] IB/Verbs: Use helpers to check IBoE technology
 
 ---
  drivers/infiniband/core/agent.c   |2 -
  drivers/infiniband/core/cm.c  |2 -
  drivers/infiniband/core/cma.c |   33 
 --
  drivers/infiniband/core/mad.c |6 ++---
  drivers/infiniband/core/multicast.c   |   11 +++---
  drivers/infiniband/core/sa_query.c|   14 ++--
  drivers/infiniband/core/ucm.c |3 --
  drivers/infiniband/core/user_mad.c|2 -
  drivers/infiniband/core/verbs.c   |5 +---
  drivers/infiniband/hw/mlx4/ah.c   |2 -
  drivers/infiniband/hw/mlx4/cq.c   |4 ---
  drivers/infiniband/hw/mlx4/mad.c  |   14 +++-
  drivers/infiniband/hw/mlx4/main.c |8 ++-
  drivers/infiniband/hw/mlx4/mlx4_ib.h  |2 -
  drivers/infiniband/hw/mlx4/qp.c   |   21 ++-
  drivers/infiniband/hw/mlx4/sysfs.c|6 +
  drivers/infiniband/ulp/ipoib/ipoib_main.c |6 ++---
  include/rdma/ib_verbs.h   |   30 +++
  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c   |3 --
  19 files changed, 87 insertions(+), 87 deletions(-)
 

I think this is a step in the right direction.  However, as I brought up
in a different thread, I think it would be best if we start to clear up
some of the funny things in this space, such as calling iWARP link layer
InfiniBand.

One thing we didn't discuss before is that, even if changing these items
around in user space would break user space, changing them around in the
kernel won't break anything except maybe Lustre (which we can inform the
authors about the change so they can anticipate it and do the right
thing in their code) because we can fix up our return values we pass to
user space with no impact to them as it's not on a hot path.  We can
then emulate the old way of setting all these variables to user space
while fixing them up in kernel space.

So, I would suggest that we fix things up thusly:

enum transport {
TRANSPORT_IB=1,
TRANSPORT_IWARP=2,
TRANSPORT_ROCE=4,
TRANSPORT_OPA=8,
TRANSPORT_USNIC=10,
};

#define HAS_SA(ibdev) ((ibdev)-transport  (TRANSPORT_IB|TRANSPORT_OPA))
#define HAS_JUMBO_SA(ibdev) ((ibdev)-transport  TRANSPORT_OPA))

or possibly

static bool ib_dev_has_sa(struct ibv_device *ibdev)
{
return ibdev-transport  (TRANSPORT_IB | TRANSPORT_OPA);
}

If we do this, then the only thing we have to fix up to preserve ABI
with user space is to make sure that any time we export an ibv_device
struct and any time we import the same, we convert from our new internal
representation to the old representation that user space expects.  And
we also need to make a few changes in the sysfs code to display the
properties as things expect.  But, that would allow us to fix up what I
see as a problem right now, which is that we hide the information we
need to know what sort of device we are working on in two different
fields: the transport and the link layer.  Instead, just use one field
with enough variants that we can store all of the relevant information
we need in that one field.  This has the benefit that any comparisons
that happen in hot paths will now always be a single bitwise comparison
and will no longer need to hit two separate variables for two separate
compares.



-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD




signature.asc
Description: This is a digitally signed message part