Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-06 Thread Sagi Grimberg

On 7/6/2015 2:22 AM, Steve Wise wrote:

The semantics for MR access flags are not consistent across RDMA
protocols.  So rather than have applications try to glean what they
need, have them pass in the intended roles and attributes for the MR to
be allocated, and let the RDMA core select the appropriate access flags
given the roles, attributes, and device capabilities.

We introduce rdma_mr_roles and rdma_mr_attributes, which enumerate the
possible roles and attributes for an MR.  These are documented in the
enums themselves.

New services exported:

rdma_device_access_flags() - given the intended MR roles and attributes
passed in, return the ib_access_flags bits for the device.

rdma_get_dma_mr() - allocate a DMA MR using the application's intended
MR roles and MR attributes.  This uses rdma_device_access_flags().

rdma_fast_reg_access_flags() - return the ib_access_flags bits needed
for a fast register WR given the application's intended MR roles and
MR attributes.  This uses rdma_device_access_flags().

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---

  drivers/infiniband/core/verbs.c |   30 +++
  include/rdma/ib_verbs.h |  106 +++
  2 files changed, 136 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index bac3fb4..f42595c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int 
mr_access_flags)
  }
  EXPORT_SYMBOL(ib_get_dma_mr);

+int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
+{
+   int access_flags = attrs;
+
+   if (roles & RDMA_MRR_RECV)
+   access_flags |= IB_ACCESS_LOCAL_WRITE;
+
+   if (roles & RDMA_MRR_WRITE_DEST)
+   access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
+
+   if (roles & RDMA_MRR_READ_DEST) {
+   access_flags |= IB_ACCESS_LOCAL_WRITE;
+   if (rdma_protocol_iwarp(pd->device,
+   rdma_start_port(pd->device)))
+   access_flags |= IB_ACCESS_REMOTE_WRITE;
+   }
+
+   if (roles & RDMA_MRR_READ_SOURCE)
+   access_flags |= IB_ACCESS_REMOTE_READ;
+
+   if (roles & RDMA_MRR_ATOMIC_DEST)
+   access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;
+
+   if (roles & RDMA_MRR_MW_BIND)
+   access_flags |= IB_ACCESS_MW_BIND;
+
+   return access_flags;
+}
+EXPORT_SYMBOL(rdma_device_access_flags);
+
  struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd,
 struct ib_phys_buf *phys_buf_array,
 int num_phys_buf,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 986fddb..da1eadf 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2494,6 +2494,112 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, 
int wc_cnt)
  struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);

  /**
+ * rdma_mr_roles - possible roles an RDMA MR will be used for
+ *
+ * This allows a transport independent RDMA application to
+ * create MRs that are usable for all the desired roles w/o
+ * having to understand which access rights are needed.
+ */
+enum {
+
+   /* lkey used in a ib_recv_wr sge */
+   RDMA_MRR_RECV   = 1,
+
+   /* lkey used for IB_WR_SEND* ops in the ib_send_wr sge */
+   RDMA_MRR_SEND   = (1 << 1),
+
+   /* rkey used for IB_WR_RDMA_READ* ops in ib_send_wr wr.rdma.rkey */
+   RDMA_MRR_READ_SOURCE= (1 << 2),
+
+   /* lkey used for IB_WR_RDMA_READ* ops in the ib_send_wr sge */
+   RDMA_MRR_READ_DEST  = (1 << 3),
+
+   /* lkey used for IB_WR_RDMA_WRITE* ops in the ib_send_wr sge */
+   RDMA_MRR_WRITE_SOURCE   = (1 << 4),
+
+   /* rkey used for IB_WR_RDMA_WRITE* ops in ib_send_wr wr.rdma.rkey */
+   RDMA_MRR_WRITE_DEST = (1 << 5),
+
+   /*
+    * lkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in the ib_send_wr sge
+    */
+   RDMA_MRR_ATOMIC_SOURCE  = (1 << 6),
+
+   /*
+    * rkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in ib_send_wr
+    * wr.atomic.rkey
+    */
+   RDMA_MRR_ATOMIC_DEST= (1 << 7),
+
+   /* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */
+   RDMA_MRR_MW_BIND= (1 << 8),
+};
+
+/**
+ * rdma_mr_attributes - attributes for rdma memory regions
+ */
+enum {
+   RDMA_MRA_ZERO_BASED = IB_ZERO_BASED,
+   RDMA_MRA_ACCESS_ON_DEMAND   = IB_ACCESS_ON_DEMAND,
+};
+
+/**
+ * rdma_device_access_flags - Returns the device-specific MR access flags.
+ * @pd: The protection domain associated with the memory region.
+ * @roles: The intended roles of the MR
+ * @attrs: The desired attributes of the MR
+ *
+ * Use the intended roles from @roles along with @attrs and the device
+ * 

Re: [PATCH V5 4/5] RDMA/isert: Set REMOTE_WRITE on DMA MRs to support iWARP devices

2015-07-06 Thread Sagi Grimberg

On 7/5/2015 8:45 PM, Steve Wise wrote:

iWARP devices require REMOTE_WRITE for MRs used as the destination of
an RDMA READ.  So if the device protocol is iWARP, then set REMOTE_WRITE
when allocating the DMA MR.

Signed-off-by: Steve Wise sw...@opengridcomputing.com


Reviewed-by: Sagi Grimberg sa...@mellanox.com
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 3/5] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth

2015-07-06 Thread Sagi Grimberg

On 7/5/2015 8:44 PM, Steve Wise wrote:

Currently the sg tablesize, which dictates fast register page list
depth to use, does not take into account the limits of the rdma device.
So adjust it once we discover the device fastreg max depth limit.  Also
adjust the max_sectors based on the resulting sg tablesize.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---

  drivers/infiniband/ulp/iser/iscsi_iser.c |9 +
  1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 6a594aa..de8730d 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -640,6 +640,15 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
   SHOST_DIX_GUARD_CRC);
}

+   /*
+* Limit the sg_tablesize and max_sectors based on the device
+* max fastreg page list length.
+*/
+   shost->sg_tablesize = min_t(unsigned short, shost->sg_tablesize,
+   ib_conn->device->dev_attr.max_fast_reg_page_list_len);
+   shost->max_sectors = min_t(unsigned int,
+   1024, (shost->sg_tablesize * PAGE_SIZE) >> 9);
+


The min statement is meaningless for max_sectors - you take the min of
the default sg_tablesize and the frpl length, so the maximum
sg_tablesize is 128, which already corresponds to 1024 max_sectors.


Re: [PATCH V5 5/5] RDMA/isert: Limit read depth based on the device max_sge_rd capability

2015-07-06 Thread Sagi Grimberg

On 7/5/2015 8:45 PM, Steve Wise wrote:

Use the device's max_sge_rd capability to compute the target's read sge
depth.  Save both the read and write max_sge values in the isert_conn
struct, and use these when creating RDMA_WRITE/READ work requests.

Signed-off-by: Steve Wise sw...@opengridcomputing.com


Reviewed-by: Sagi Grimberg sa...@mellanox.com


Re: [PATCH v3] libibverbs init.c: conditionally emit warning if no userspace driver found

2015-07-06 Thread Jeff Squyres (jsquyres)
On Jun 17, 2015, at 10:25 AM, Doug Ledford dledf...@redhat.com wrote:
 
 The patch is accepted, I just haven’t pushed it out yet.

Is there a timeline for when this patch will be available in the upstream git 
repo and released in a new version of libibverbs?

I ask because we'd like to see this patch get into upstream distro libibverbs 
releases.  Once that happens, we can start planning the end of the horrible 
hackarounds we had to put into place (e.g., in Open MPI) to suppress the 
misleading libibverbs output.

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/


Re: [PATCH for-next V6 3/5] IB/uverbs: Enable device removal when there are active user space applications

2015-07-06 Thread Yishai Hadas

On 6/30/2015 9:40 PM, Jason Gunthorpe wrote:

On Tue, Jun 30, 2015 at 01:26:05PM +0300, Yishai Hadas wrote:

  struct ib_uverbs_device {
-   struct kref ref;
+   struct kref comp_ref;
+   struct kref free_ref;


So.. I was looking at this, and there is something wrong with the
existing code.

This old code:

cdev_del(&uverbs_dev->cdev);
[..]
wait_for_completion(&uverbs_dev->comp);
-   kfree(uverbs_dev);

Has built in to it an assumption that when cdev_del returns there can
be no possible open() running. Which doesn't appear to be true, cdev
calls open unlocked and relies on refcounting to make everything work
out.


The patch that introduces this bug was added 5 years ago by Alex Chiang 
and Signed-off-by: Roland Dreier.


Look at commit ID 2a72f212263701b927559f6850446421d5906c41; it can also 
be seen at: 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2a72f212263701b 



Before this commit there was a device look-up table that was protected 
by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When 
it was dropped and container_of was used instead, it enabled the race 
with remove_one as dev might be freed just after:
dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev), but 
before the kref_get().


In addition, this buggy patch added some dead code: container_of(x,y,z) 
can never return NULL, so dev can never be NULL.
As a result, the comment above ib_uverbs_open saying the open method 
will either immediately run or fail with -ENXIO is wrong, as the -ENXIO 
case can never happen.


 static int ib_uverbs_open(struct inode *inode, struct file *filp)
 {
@@ -631,13 +628,10 @@ static int ib_uverbs_open(struct inode *inode, 
struct file *filp)

struct ib_uverbs_file *file;
int ret;

-   spin_lock(&map_lock);
-   dev = dev_table[iminor(inode) - IB_UVERBS_BASE_MINOR];
+   dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev);
if (dev)
kref_get(&dev->ref);
-   spin_unlock(&map_lock);
-
-   if (!dev)
+   else
return -ENXIO;


Doug/Jason,
AFAIK V6 addressed all open comments raised by Jason, including the 
last one that asked to use 2 separate krefs for both complete and free; 
it did not introduce the problem above.


I believe that we should go forward and take the series. Please consider 
that this series fixes an existing oops in patch #1 and adds missing 
functionality in the kernel: enabling device removal while there are 
active user space clients.


To fix the existing 5-year-old bug, an orthogonal patch that fixes the 
buggy patch should be sent.


Alex/Roland:
Please review the above. Is there any chance you'll contribute a patch 
that solves this problem? Any comments?






RE: [PATCH for-4.2 0/1] RDMA/core: Fixes for port mapper client registration

2015-07-06 Thread Nikolova, Tatyana E
Hello Steve,

 how can I reproduce the original bug/issue and then test this fix on 
 iw_cxgb4?

Have iwpmd running and start an rdma application. 
Leave the rdma application running and restart the iwpmd. 
When you stop the application, if you have the debug enabled, you will see that 
the remove mapping request from the client driver to the port mapper service 
isn't going through (without the fix, it isn't allowed, because the client is 
treated as unregistered).

Tatyana

-Original Message-
From: linux-rdma-ow...@vger.kernel.org 
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Steve Wise
Sent: Monday, July 06, 2015 12:21 PM
To: Nikolova, Tatyana E; 'Doug Ledford'
Cc: Lacombe, John S; linux-rdma@vger.kernel.org
Subject: RE: [PATCH for-4.2 0/1] RDMA/core: Fixes for port mapper client 
registration



 -Original Message-
 From: Tatyana Nikolova [mailto:tatyana.e.nikol...@intel.com]
 Sent: Thursday, July 02, 2015 12:46 PM
 To: Doug Ledford
 Cc: john.s.laco...@intel.com; sw...@opengridcomputing.com; 
 linux-rdma@vger.kernel.org
 Subject: [PATCH for-4.2 0/1] RDMA/core: Fixes for port mapper client 
 registration
 
 This patch has been reworked, but is very similar to the previously 
 submitted (10/16/2014) patch, which was forgotten:
 
 [PATCH 1/1] RDMA/core: Fixes for port mapper client registration 
 https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg21512.htm
 l
 
 This patch is addressing the same issue as the previous one - the 
 clients which have provided their mapping info to the user space 
 service after it is restarted, are treated as unregistered and are not 
 allowed to make remove mapping requests.
 
 If the user space port mapper service is restarted, it notifies the 
 port mapper clients and they provide the mapping information they have 
 in use. At this time with the fix the registration type of the client 
 is set to IWPM_REG_INCOMPL and the client is allowed to make remove 
 mapping requests to the user space service.
 
 There are some improvements in the current patch:
 1) Adding functions to set/check registration type
 2) New clients (calling iwpm_init()) have their registration set to 
 IWPM_REG_UNDEF
 3) If the port mapper user space service is not available, then the client
registration stays IWPM_REG_UNDEF and the registration type won't be
checked until the service becomes available (obviously no mappings are
possible, if the user space service isn't running).
 
 For these changes to take effect no provider changes are necessary.
 The patch contains fixes for the current registration logic.
 

Hey Tatyana, how can I reproduce the original bug/issue and then test this fix 
on iw_cxgb4? 







RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-06 Thread Steve Wise


 -Original Message-
 From: Haggai Eran [mailto:hagg...@mellanox.com]
 Sent: Monday, July 06, 2015 2:09 AM
 To: Steve Wise; dledf...@redhat.com
 Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; 
 linux-rdma@vger.kernel.org; e...@mellanox.com; target-
 de...@vger.kernel.org; linux-...@vger.kernel.org; 
 trond.mykleb...@primarydata.com; bfie...@fieldses.org
 Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
 
 On 06/07/2015 02:22, Steve Wise wrote:
  +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
  +{
  +   int access_flags = attrs;
  +
  +   if (roles & RDMA_MRR_RECV)
  +   access_flags |= IB_ACCESS_LOCAL_WRITE;
  +
  +   if (roles & RDMA_MRR_WRITE_DEST)
  +   access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
  +
  +   if (roles & RDMA_MRR_READ_DEST) {
  +   access_flags |= IB_ACCESS_LOCAL_WRITE;
  +   if (rdma_protocol_iwarp(pd->device,
  +   rdma_start_port(pd->device)))
  +   access_flags |= IB_ACCESS_REMOTE_WRITE;
  +   }
  +
  +   if (roles & RDMA_MRR_READ_SOURCE)
  +   access_flags |= IB_ACCESS_REMOTE_READ;
  +
  +   if (roles & RDMA_MRR_ATOMIC_DEST)
  +   access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;
 
 I think you need LOCAL_WRITE for ATOMIC_SOURCE in order to receive the
 results of the atomic operation.
 

Ok.  I'm not familiar with atomics.  I'll verify with the IB spec and update 
the code as needed.


  +
  +   if (roles & RDMA_MRR_MW_BIND)
  +   access_flags |= IB_ACCESS_MW_BIND;
  +
  +   return access_flags;
  +}
  +EXPORT_SYMBOL(rdma_device_access_flags);



Re: [PATCH v7 1/4] IB/netlink: Add defines for local service requests through netlink

2015-07-06 Thread Jason Gunthorpe
   +/* Local service status attribute */
   +enum {
   + LS_NLA_STATUS_SUCCESS = 0,
   + LS_NLA_STATUS_EINVAL,
   + LS_NLA_STATUS_ENODATA,
   + LS_NLA_STATUS_MAX
   +};
  
  So, this is never used, there seems to be a bunch of never used stuff
  - please audit everything and get rid of the cruft before re-sending 
  anything.
 
 Well, EINVAL and ENODATA are not used by the kernel, but used by the
 application (ibacm). Should this file
 (include/uapi/rdma/rdma_netlink.h) contain only defines used by both
 kernel and application? In that case, the kernel may see
 unrecognized status value if it ever decides to check the status
 code when the response status is ERR.

Get rid of the status value completely, it serves no current
purpose. If in future we need something we can always add a new
attribute.

Don't pollute the kernel headers with ibacm implementation details.

  We need a way to encode three reply types, I suggest:
   RDMA_NL_LS_F_ERR
 For some reason the listener could not respond. The kernel should
 issue the request on its own. Identical to a timeout.
   Good flags, but an empty reply with no LS_NLA_TYPE_PATH_RECORDs
 The listener responded and there are no paths. Return no paths
 failure to requestor.
   Good flags, and up to 6 LS_NLA_TYPE_PATH_RECORDs
 Success
 
 Current implementation can be easily modified to handle these three cases.

Are you going to use this scheme or do you have a different idea?

  There are only two remaining problems I see with the actual netlink
  schema:
   1) There is no easy indication what port the request is coming
  from. User space absolutely needs that to be able to forward a
  request on to the proper SA. Yes, we could look at the SGID, but
  with gid aliases that seems like alot of work for a performant
  API. Ideas? Include the hardware port GUID? Port Number? Device
  Name?
  Not sure, but does need to be addressed.
 
 We can pass the device name and port number to the user
 application. The device and port_num are two of the parameters to
 ib_sa_path_rec_get().

That might be best, a ifindex would be even better, but we don't have
one...

Jason


Re: [PATCH v3] libibverbs init.c: conditionally emit warning if no userspace driver found

2015-07-06 Thread Doug Ledford
On 07/06/2015 09:35 AM, Jeff Squyres (jsquyres) wrote:
 On Jun 17, 2015, at 10:25 AM, Doug Ledford dledf...@redhat.com
 wrote:
 
 The patch is accepted, I just haven’t pushed it out yet.
 
 Is there a timeline for when this patch will be available in the
 upstream git repo and released in a new version of libibverbs?
 
 I ask because we'd like to see this patch get into upstream distro
 libibverbs releases.  Once that happens, we can start planning the
 end of the horrible hackarounds we had to put into place (e.g., in
 Open MPI) to suppress the misleading libibverbs output.
 
 Thanks!
 

The goal I had in mind was to work through this merge window until we
are in RCs on kernel stuff, then shift my focus to libibverbs and try to
work through a few of the outstanding issues for it and maybe have a
release prior to the next merge window for the kernel.

-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD






Re: [PATCH for-next V6 3/5] IB/uverbs: Enable device removal when there are active user space applications

2015-07-06 Thread Jason Gunthorpe
On Mon, Jul 06, 2015 at 05:08:08PM +0300, Yishai Hadas wrote:

 The patch that introduces this bug was added 5 years ago by Alex Chiang and
 Signed-off-by: Roland Dreier.
 
 Look at commit ID:2a72f212263701b927559f6850446421d5906c41, it can be seen
 also at:
 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2a72f212263701b

Perhaps, this one also looks involved as well:

commit 055422ddbb0a7610c5f57a056743d7336a39e90f
Author: Alexander Chiang achi...@hp.com
Date:   Tue Feb 2 19:07:49 2010 +

IB/uverbs: Convert *cdev to cdev in struct ib_uverbs_device

Instead of storing a pointer to a cdev, embed the entire struct cdev.

Embedding the cdev without using a parent kobject looks like the root
mistake.

 AFAIK V6 addressed all opened comments raised by Jason, including the last
 one that asked to use 2 separate krefs for both complete and free, it didn't
 introduced the problem above.

It does make it worse though: previously the module locking made it
unlikely to ever hit any problem here, but now we have a naked, fully
exposed race where release races with kfree, resulting in a
use-after-free. I'd think hitting it is quite likely if the new
feature is being used, and subtle memory corruption is not something
we want to see in the kernel.

So, I'd say, yes, it is an old bug, but it was unlikely to hit. This
patch series makes it much more likely, so it needs to be fixed.

In any event, I'm not sure what you are complaining about - this
series absolutely reworks the lifetime model of ib_uverbs_device, why
on earth do you think it is OK to have a buggy new implementation just
because the previous version was buggy? *Especially* when someone
takes the time to point out the mistake and tells you exactly how to
fix it, and it is *trivial* to do?

Even worse: I went through and audited the lifetime of V6's new model,
and I think that is *absolutely* something you should have done before
sending V1 :(

Jason


RE: [PATCH for-4.2 0/1] RDMA/core: Fixes for port mapper client registration

2015-07-06 Thread Steve Wise


 -Original Message-
 From: Tatyana Nikolova [mailto:tatyana.e.nikol...@intel.com]
 Sent: Thursday, July 02, 2015 12:46 PM
 To: Doug Ledford
 Cc: john.s.laco...@intel.com; sw...@opengridcomputing.com; 
 linux-rdma@vger.kernel.org
 Subject: [PATCH for-4.2 0/1] RDMA/core: Fixes for port mapper client 
 registration
 
 This patch has been reworked, but is very similar to the previously
 submitted (10/16/2014) patch, which was forgotten:
 
 [PATCH 1/1] RDMA/core: Fixes for port mapper client registration
 https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg21512.html
 
 This patch is addressing the same issue as the previous one -
 the clients which have provided their mapping info
 to the user space service after it is restarted, are treated as
 unregistered and are not allowed to make remove mapping requests.
 
 If the user space port mapper service is restarted, it notifies
 the port mapper clients and they provide the mapping information they
 have in use. At this time with the fix the registration type of the client
 is set to IWPM_REG_INCOMPL and the client is allowed to make remove
 mapping requests to the user space service.
 
 There are some improvements in the current patch:
 1) Adding functions to set/check registration type
 2) New clients (calling iwpm_init()) have their registration set to 
 IWPM_REG_UNDEF
 3) If the port mapper user space service is not available, then the client
registration stays IWPM_REG_UNDEF and the registration type won't be
checked until the service becomes available (obviously no mappings are
possible, if the user space service isn't running).
 
 For these changes to take effect no provider changes are necessary.
 The patch contains fixes for the current registration logic.
 

Hey Tatyana, how can I reproduce the original bug/issue and then test this fix 
on iw_cxgb4? 







Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-06 Thread Sagi Grimberg

On 7/6/2015 5:37 PM, Steve Wise wrote:




-Original Message-
From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il]
Sent: Monday, July 06, 2015 2:54 AM
To: Steve Wise; dledf...@redhat.com
Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; 
linux-rdma@vger.kernel.org; e...@mellanox.com; target-
de...@vger.kernel.org; linux-...@vger.kernel.org; 
trond.mykleb...@primarydata.com; bfie...@fieldses.org
Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

On 7/6/2015 2:22 AM, Steve Wise wrote:

The semantics for MR access flags are not consistent across RDMA
protocols.  So rather than have applications try and glean what they
need, have them pass in the intended roles and attributes for the MR to
be allocated and let the RDMA core select the appropriate access flags
given the roles, attributes, and device capabilities.

We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the
possible roles and attributes for a MR.  These are documented in the
enums themselves.

New services exported:

rdma_device_access_flags() - given the intended MR roles and attributes
passed in, return the ib_access_flags bits for the device.

rdma_get_dma_mr() - allocate a dma mr using the applications intended
MR roles and MR attributes.  This uses rdma_device_access_flags().

rdma_fast_reg_access_flags() - return the ib_access_flags bits needed
for a fast register WR given the applications intended MR roles and
MR attributes.  This uses rdma_device_access_flags().

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---

   drivers/infiniband/core/verbs.c |   30 +++
   include/rdma/ib_verbs.h |  106 
+++
   2 files changed, 136 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index bac3fb4..f42595c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int 
mr_access_flags)
   }
   EXPORT_SYMBOL(ib_get_dma_mr);

+int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
+{
+   int access_flags = attrs;
+
+   if (roles & RDMA_MRR_RECV)
+   access_flags |= IB_ACCESS_LOCAL_WRITE;
+
+   if (roles & RDMA_MRR_WRITE_DEST)
+   access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
+
+   if (roles & RDMA_MRR_READ_DEST) {
+   access_flags |= IB_ACCESS_LOCAL_WRITE;
+   if (rdma_protocol_iwarp(pd->device,
+   rdma_start_port(pd->device)))
+   access_flags |= IB_ACCESS_REMOTE_WRITE;
+   }
+
+   if (roles & RDMA_MRR_READ_SOURCE)
+   access_flags |= IB_ACCESS_REMOTE_READ;
+
+   if (roles & RDMA_MRR_ATOMIC_DEST)
+   access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;
+
+   if (roles & RDMA_MRR_MW_BIND)
+   access_flags |= IB_ACCESS_MW_BIND;
+
+   return access_flags;
+}
+EXPORT_SYMBOL(rdma_device_access_flags);
+
   struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd,
 struct ib_phys_buf *phys_buf_array,
 int num_phys_buf,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 986fddb..da1eadf 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2494,6 +2494,112 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
   struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);

   /**
+ * rdma_mr_roles - possible roles an RDMA MR will be used for
+ *
+ * This allows a transport independent RDMA application to
+ * create MRs that are usable for all the desired roles w/o
+ * having to understand which access rights are needed.
+ */
+enum {
+
+	/* lkey used in a ib_recv_wr sge */
+	RDMA_MRR_RECV		= 1,
+
+	/* lkey used for IB_WR_SEND* ops in the ib_send_wr sge */
+	RDMA_MRR_SEND		= (1 << 1),
+
+	/* rkey used for IB_WR_RDMA_READ* ops in ib_send_wr wr.rdma.rkey */
+	RDMA_MRR_READ_SOURCE	= (1 << 2),
+
+	/* lkey used for IB_WR_RDMA_READ* ops in the ib_send_wr sge */
+	RDMA_MRR_READ_DEST	= (1 << 3),
+
+	/* lkey used for IB_WR_RDMA_WRITE* ops in the ib_send_wr sge */
+	RDMA_MRR_WRITE_SOURCE	= (1 << 4),
+
+	/* rkey used for IB_WR_RDMA_WRITE* ops in ib_send_wr wr.rdma.rkey */
+	RDMA_MRR_WRITE_DEST	= (1 << 5),
+
+	/*
+	 * lkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in the ib_send_wr sge
+	 */
+	RDMA_MRR_ATOMIC_SOURCE	= (1 << 6),
+
+	/*
+	 * rkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in ib_send_wr
+	 * wr.atomic.rkey
+	 */
+	RDMA_MRR_ATOMIC_DEST	= (1 << 7),
+
+	/* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */
+	RDMA_MRR_MW_BIND	= (1 << 8),
+};
+
+/**
+ * 

RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-06 Thread Steve Wise


 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org 
 [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Sagi Grimberg
 Sent: Monday, July 06, 2015 11:18 AM
 To: Steve Wise; dledf...@redhat.com
 Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; 
 linux-rdma@vger.kernel.org; e...@mellanox.com; target-
 de...@vger.kernel.org; linux-...@vger.kernel.org; 
 trond.mykleb...@primarydata.com; bfie...@fieldses.org
 Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
 
 On 7/6/2015 5:37 PM, Steve Wise wrote:
 
 
  -Original Message-
  From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il]
  Sent: Monday, July 06, 2015 2:54 AM
  To: Steve Wise; dledf...@redhat.com
  Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; 
  linux-rdma@vger.kernel.org; e...@mellanox.com; target-
  de...@vger.kernel.org; linux-...@vger.kernel.org; 
  trond.mykleb...@primarydata.com; bfie...@fieldses.org
  Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
 
  On 7/6/2015 2:22 AM, Steve Wise wrote:
  The semantics for MR access flags are not consistent across RDMA
  protocols.  So rather than have applications try and glean what they
  need, have them pass in the intended roles and attributes for the MR to
  be allocated and let the RDMA core select the appropriate access flags
  given the roles, attributes, and device capabilities.
 
  We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the
  possible roles and attributes for a MR.  These are documented in the
  enums themselves.
 
  New services exported:
 
  rdma_device_access_flags() - given the intended MR roles and attributes
  passed in, return the ib_access_flags bits for the device.
 
  rdma_get_dma_mr() - allocate a dma mr using the applications intended
  MR roles and MR attributes.  This uses rdma_device_access_flags().
 
  rdma_fast_reg_access_flags() - return the ib_access_flags bits needed
  for a fast register WR given the applications intended MR roles and
  MR attributes.  This uses rdma_device_access_flags().
 
  Signed-off-by: Steve Wise sw...@opengridcomputing.com
  ---
 
 drivers/infiniband/core/verbs.c |   30 +++
   include/rdma/ib_verbs.h |  106 +++
 2 files changed, 136 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
  index bac3fb4..f42595c 100644
  --- a/drivers/infiniband/core/verbs.c
  +++ b/drivers/infiniband/core/verbs.c
 @@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
 }
 EXPORT_SYMBOL(ib_get_dma_mr);
 
 +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
 +{
 +	int access_flags = attrs;
 +
 +	if (roles & RDMA_MRR_RECV)
 +		access_flags |= IB_ACCESS_LOCAL_WRITE;
 +
 +	if (roles & RDMA_MRR_WRITE_DEST)
 +		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
 +
 +	if (roles & RDMA_MRR_READ_DEST) {
 +		access_flags |= IB_ACCESS_LOCAL_WRITE;
 +		if (rdma_protocol_iwarp(pd->device,
 +					rdma_start_port(pd->device)))
 +			access_flags |= IB_ACCESS_REMOTE_WRITE;
 +	}
 +
 +	if (roles & RDMA_MRR_READ_SOURCE)
 +		access_flags |= IB_ACCESS_REMOTE_READ;
 +
 +	if (roles & RDMA_MRR_ATOMIC_DEST)
 +		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;
 +
 +	if (roles & RDMA_MRR_MW_BIND)
 +		access_flags |= IB_ACCESS_MW_BIND;
 +
 +	return access_flags;
 +}
  +EXPORT_SYMBOL(rdma_device_access_flags);
  +
 struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd,
 struct ib_phys_buf *phys_buf_array,
 int num_phys_buf,
  diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
  index 986fddb..da1eadf 100644
  --- a/include/rdma/ib_verbs.h
  +++ b/include/rdma/ib_verbs.h
 @@ -2494,6 +2494,112 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
 struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);
 
 /**
  + * rdma_mr_roles - possible roles an RDMA MR will be used for
  + *
  + * This allows a transport independent RDMA application to
  + * create MRs that are usable for all the desired roles w/o
  + * having to understand which access rights are needed.
  + */
 +enum {
 +
 +	/* lkey used in a ib_recv_wr sge */
 +	RDMA_MRR_RECV		= 1,
 +
 +	/* lkey used for IB_WR_SEND* ops in the ib_send_wr sge */
 +	RDMA_MRR_SEND		= (1 << 1),
 +
 +	/* rkey used for IB_WR_RDMA_READ* ops in ib_send_wr wr.rdma.rkey */
 +	RDMA_MRR_READ_SOURCE	= (1 << 2),
 +
 +	/* lkey used for IB_WR_RDMA_READ* ops in the ib_send_wr sge */
 +	RDMA_MRR_READ_DEST	= (1 << 3),
 +
 +	/* lkey used for IB_WR_RDMA_WRITE* ops in the ib_send_wr sge */
 +	RDMA_MRR_WRITE_SOURCE	= (1 << 4),
  +
  + /* rkey 

RE: [PATCH v7 1/4] IB/netlink: Add defines for local service requests through netlink

2015-07-06 Thread Wan, Kaike
 From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
 Sent: Friday, July 03, 2015 5:16 PM
 To: Wan, Kaike
 Cc: linux-rdma@vger.kernel.org; Fleck, John; Weiny, Ira
 Subject: Re: [PATCH v7 1/4] IB/netlink: Add defines for local service requests
 through netlink
 
 On Tue, Jun 30, 2015 at 09:45:52AM -0400, kaike@intel.com wrote:
  @@ -7,12 +7,14 @@ enum {
  RDMA_NL_RDMA_CM = 1,
  RDMA_NL_NES,
  RDMA_NL_C4IW,
  +   RDMA_NL_SA,
 
 I think this should be RDMA_NL_LS to be consistent with the rest, the SA
 resolve OP should be something like:
 
   RDMA_NL_GET_TYPE(RDMA_NL_LS,RDMA_NL_LS_OP_RESOLVE_PATH)
 
 I'd probably also add a comment:
 
  +RDMA_NL_LS,   /* RDMA Local Services */
 
 I have no idea what 'local services' means, it seems like a silly name for 
 this,
 but whatever.

Changed.

 
  +/* Local service group opcodes */
  +enum {
  +   RDMA_NL_LS_OP_RESOLVE = 0,
  +   RDMA_NL_LS_OP_SET_TIMEOUT,
  +   RDMA_NL_LS_NUM_OPS
  +};
 
 I think you should document the schema for each operation here in a
 comment for future readers.

Added.

 
  +/* Local service netlink message flags */
 +#define RDMA_NL_LS_F_OK	0x0100	/* Success response */
 +#define RDMA_NL_LS_F_ERR	0x0200	/* Failed response */
 
 These overlap with other generic netlink flags, that seems OK, but didn't look
 too hard.
 
 Drop RDMA_NL_LS_F_OK, we don't need OK and ERR. !ERR == OK.

RDMA_NL_LS_F_OK dropped.

 
 You might need a RDMA_NL_LS_F_REPLY however, see below.

Added.

 
  +/* Local service attribute type */
 +#define RDMA_NLA_F_MANDATORY	(1 << 13)
 +#define RDMA_NLA_TYPE_MASK	~(NLA_F_NESTED | NLA_F_NET_BYTEORDER | \
 +				  RDMA_NLA_F_MANDATORY)
 
 More brackets for this macro:
 
 #define RDMA_NLA_TYPE_MASK	(~(NLA_F_NESTED | NLA_F_NET_BYTEORDER | \
 				   RDMA_NLA_F_MANDATORY))

Added.

 
  +/* Local service status attribute */
  +enum {
  +   LS_NLA_STATUS_SUCCESS = 0,
  +   LS_NLA_STATUS_EINVAL,
  +   LS_NLA_STATUS_ENODATA,
  +   LS_NLA_STATUS_MAX
  +};
 
 So, this is never used, there seems to be a bunch of never used stuff
 - please audit everything and get rid of the cruft before re-sending anything.

Well,  EINVAL and ENODATA are not used by the kernel, but used by the 
application (ibacm). Should this file (include/uapi/rdma/rdma_netlink.h) 
contain only defines used by both kernel and application? In that case, the 
kernel may see unrecognized status value if it ever decides to check the status 
code when the response status is ERR.

 
 We need a way to encode three reply types, I suggest:
  RDMA_NL_LS_F_ERR
For some reason the listener could not respond. The kernel should
issue the request on its own. Identical to a timeout.
  Good flags, but an empty reply with no LS_NLA_TYPE_PATH_RECORDs
The listener responded and there are no paths. Return no paths
failure to requestor.
  Good flags, and up to 6 LS_NLA_TYPE_PATH_RECORDs
Success

Current implementation can be easily modified to handle these three cases.

 
  +struct rdma_nla_ls_status {
  +   __u32   status;
  +};
 
 Never used, drop it

OK.

 
  +
  +/* Local service pathrecord attribute: struct ib_path_rec_data */
  +
 +/* Local service timeout attribute */
 +struct rdma_nla_ls_timeout {
  +   __u32   timeout;
  +};
 
 I don't think the SET_TIMEOUT schema makes much sense, there is not much
 reason for a nested attribute, just use the rdma_nla_ls_timeout as the value.
 If we need extension then we can add optional attributes after it later
 without breaking.

OK

 
 +/* Local Service ServiceID attribute */
 +struct rdma_nla_ls_service_id {
  +   __u64   service_id;
  +};
 
 Drop struct, just use u64

OK

 
 +/* Local Service DGID/SGID attribute: big endian */
 +struct rdma_nla_ls_gid {
 +	__u8	gid[16];
  +};
 
 Wish there was a common GID type we could use, but OK..
 
 +/* Local Service Traffic Class attribute */
 +struct rdma_nla_ls_tclass {
 +	__u8	tclass;
  +};
 
 Drop

OK.

 
  +/* Local Service path use attribute */ enum {
  +   LS_NLA_PATH_USE_ALL = 0,
  +   LS_NLA_PATH_USE_UNIDIRECTIONAL,
  +   LS_NLA_PATH_USE_GMP,
  +   LS_NLA_PATH_USE_MAX
  +};
 
 Document how these work

Done.

 
  +
  +struct rdma_nla_ls_path_use {
 +	__u8	use;
  +};
  +
  +/* Local Service Pkey attribute*/
  +struct rdma_nla_ls_pkey {
  +   __u16   pkey;
  +};
  +
 +/* Local Service Qos Class attribute */
 +struct rdma_nla_ls_qos_class {
 +	__u16	qos_class;
  +};
 
 Drop all of these

Done.

 
 There are only two remaining problems I see with the actual netlink
 schema:
  1) There is no easy indication what port the request is coming
 from. User space absolutely needs that to be able to forward a
 request on to the proper SA. Yes, we could look at the SGID, but
 with gid aliases that seems like a lot of work for a performant
 API. Ideas? Include the hardware port GUID? Port Number? Device
 

Re: [PATCH v2 2/2] x86/mm/pat, drivers/media/ivtv: move pat warn and replace WARN() with pr_warn()

2015-07-06 Thread Luis R. Rodriguez
On Mon, Jun 29, 2015 at 05:52:18AM -0400, Andy Walls wrote:
 On June 29, 2015 2:55:05 AM EDT, Ingo Molnar mi...@kernel.org wrote:
 
 * Andy Walls a...@silverblocksystems.net wrote:
 
  On Fri, 2015-06-26 at 10:45 +0200, Ingo Molnar wrote:
   * Luis R. Rodriguez mcg...@suse.com wrote:
   
On Thu, Jun 25, 2015 at 08:51:47AM +0200, Ingo Molnar wrote:
 
 * Luis R. Rodriguez mcg...@do-not-panic.com wrote:
 
  From: Luis R. Rodriguez mcg...@suse.com
  
 On built-in kernels this warning will always splat as this is part
 of the module init. Fix that by shifting the PAT requirement check
 out under the code that does the quasi-probe for the device. This
 device driver relies on an existing driver to find its own devices:
 it looks for that device driver and its own found devices, then
 uses driver_for_each_device() to try to see if it can probe each of
 those devices as a framebuffer device with ivtvfb_init_card(). We
 tuck the PAT requirement check onto the ivtvfb_init_card() call,
 making the check at least require an ivtv device present before
 complaining.
  
 Reported-by: Fengguang Wu fengguang...@intel.com [0-day test robot]
  Signed-off-by: Luis R. Rodriguez mcg...@suse.com
  ---
   drivers/media/pci/ivtv/ivtvfb.c | 15 +--
   1 file changed, 9 insertions(+), 6 deletions(-)
  
 diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
  index 4cb365d..8b95eef 100644
  --- a/drivers/media/pci/ivtv/ivtvfb.c
  +++ b/drivers/media/pci/ivtv/ivtvfb.c
  @@ -38,6 +38,8 @@
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
  02111-1307  USA
*/
   
 +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
  +
  #include <linux/module.h>
  #include <linux/kernel.h>
  #include <linux/fb.h>
 @@ -1171,6 +1173,13 @@ static int ivtvfb_init_card(struct ivtv *itv)
  {
  	int rc;
  
 +#ifdef CONFIG_X86_64
 +	if (pat_enabled()) {
 +		pr_warn("ivtvfb needs PAT disabled, boot with nopat kernel parameter\n");
 +		return -ENODEV;
 +	}
 +#endif
 +
  	if (itv->osd_info) {
  		IVTVFB_ERR("Card %d already initialised\n", ivtvfb_card_id);
  		return -EBUSY;
 
 Same argument as for ipath: why not make arch_phys_wc_add() fail on
 PAT and return -1, and check it in arch_phys_wc_del()?

The arch_phys_wc_add() is a no-op for PAT systems, but for PAT to work
we not only need to add this in where we replace the MTRR call, we also
need to convert ioremap_nocache() calls to ioremap_wc(), but only if
things were split up already.
   
  
  Hi Ingo,
  
 We don't need to do that: for such legacy drivers we can fall back to
 UC just fine, and inform the user that by booting with 'nopat' the old
 behavior will be back...
  
 This is really a user experience decision.

 IMO anyone who is still using ivtvfb and an old conventional PCI
 PVR-350 to render, at SDTV resolution, an X Desktop display or video
 playback on a television screen, isn't going to give a hoot about
 modern things like PAT.  The user will simply want the framebuffer
 performance they are accustomed to having with their system.  UC will
 probably yield unsatisfactory performance for an ivtvfb framebuffer.

 With that in mind, I would think it better to obviously and clearly
 disable the ivtvfb framebuffer module with PAT enabled, so the user
 will check the log and read the steps needed to obtain acceptable
 performance.

 Maybe that's me just wanting to head off the poor ivtvfb performance
 with latest kernel bug reports.

 Whatever the decision, my stock response to bug reports related to
 this will always be "What do the logs say?".
 
 So what if that frame buffer is their only (working) frame buffer? A
 slow framebuffer is still much better at giving people logs to look at
 than a non-working one.
 
 Thanks,
 
  Ingo
 
 Hi Ingo,
 
 For an old DVR setup, I can see that being the case.
 
 So a sluggish framebuffer is better than none.

OK, so I knew I had considered this strategy before, I did and here's
the recap:

I originally had proposed this same exact thing with my first patch set but [0]
had set to require a new __arch_phys_wc_add() which forces the setting without
checking if pat_enabled() is true. This was set out as a transient API in hopes
that device drivers that require work but hadn't been converted to split the
ioremap'd calls would later be modified / fixed with the split. Here's an
update for the status of each driver:

Driver  File

fusion  drivers/message/fusion/mptbase.c
   This code was commented out; the code was just removed, so this no longer applies.

ivtv

Re: IBV_SEND_INLINE Behavior Differing from Spec

2015-07-06 Thread Christopher Mitchell
Roland,

My libmlx4 is named libmlx4-rdmav2.so and timestamped 2014-01-01 (from
the package libmlx4-1, 1.0.5-1ubuntu1 for Ubuntu Trusty 14.04). I use
a mutex-protected FIFO of send buffers to make sure only one thread is
able to acquire a particular send buffer at a time, and my application
does not modify a send buffer after it passes it to my message-sending
method. I've also replicated this with a single-threaded version of
the application; something as simple as zeroing out the first byte of
the send buffer on the line after ibv_post_send() is enough to
occasionally trigger this behavior.

Thanks,
Christopher

On Sun, Jul 5, 2015 at 12:28 PM, Roland Dreier rol...@purestorage.com wrote:
 On Fri, Jul 3, 2015 at 2:50 PM, Christopher Mitchell
 christop...@cemetech.net wrote:
 The manpage for ibv_post_send indicates that if you specify the
 IBV_SEND_INLINE flag, that you can immediately re-use the buffer after
 ibv_post_send() returns. However, I have found that if I modify the
 buffer immediately after ibv_post_send(), the changes are rarely
 reflected in the presumably already-posted message. I'm using
 ConnectX-3 cards and RC connections, if that's relevant. Is this a
 known issue, and/or is it possible I'm missing something obvious?

 I can't see any way to get the behavior you're seeing.  What version
 of libmlx4 are you using?

 The reason I say I don't see how that can happen is that in
 mlx4_post_send() in libmlx4, if IBV_SEND_INLINE is set, the code
 memcpys the data into the descriptor and does not keep any reference
 to the pointers passed in through the sg ilst.  So I don't see how the
 hardware could even detect that you change the data after
 ibv_post_send().

 Or what do you mean about after ibv_post_send()?  Are you possibly
 changing the data from another thread and racing with the memcpy
 before ibv_post_send() returns?

  - R.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 0/5] iSER support for iWARP

2015-07-06 Thread Nicholas A. Bellinger
Hi Steve  Co,

On Sun, 2015-07-05 at 12:44 -0500, Steve Wise wrote:
 The following series implements support for iWARP transports in the iSER
 initiator and target.  This is based on Doug's k.o/for-4.2 branch.
 
 I've tested this on cxgb4 and mlx4 hardware.
 
 Changes since V4:
 
 iser: fixed compiler warning
 
 isert: back to setting REMOTE_WRITE only for iWARP devices
 
 Changes since V3:
 
 Fixed commit messages based on feedback.
 
 iser: adjust max_sectors
 
 isert: split into 2 patches
 
 isert: always set REMOTE_WRITE for dma mrs
 
 Changes since V2:
 
 The transport independent work is removed from this series and will
 be submitted in a subsequent series.  This V3 series now enables iWARP
 using existing core services.
 
 Changes since V1:
 
 Introduce and use transport-independent RDMA core services for allocating
 DMA MRs and computing fast register access flags.
 
 Correctly set the device max_sge_rd capability in several rdma device
 drivers.
 
 isert: use device capability max_sge_rd for the read sge depth.
 
 isert: change max_sge to max_write_sge in struct isert_conn.
 
 ---
 
 Sagi Grimberg (1):
   mlx4, mlx5, mthca: Expose max_sge_rd correctly
 
 Steve Wise (4):
   RDMA/isert: Limit read depth based on the device max_sge_rd capability
   RDMA/isert: Set REMOTE_WRITE on DMA MRs to support iWARP devices
   RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max 
 depth
   ipath,qib: Expose max_sge_rd correctly
 
 
  drivers/infiniband/hw/ipath/ipath_verbs.c|1 
  drivers/infiniband/hw/mlx4/main.c|1 
  drivers/infiniband/hw/mlx5/main.c|1 
  drivers/infiniband/hw/mthca/mthca_provider.c |1 
  drivers/infiniband/hw/qib/qib_verbs.c|1 
  drivers/infiniband/ulp/iser/iscsi_iser.c |9 
  drivers/infiniband/ulp/isert/ib_isert.c  |   53 ++
  drivers/infiniband/ulp/isert/ib_isert.h  |3 +
  8 files changed, 60 insertions(+), 10 deletions(-)
 

Very excited to see iWARP support for iser-initiator + iser-target.  ;)

No objections to taking the iser-target changes through Doug's
infiniband.git.  For the target pieces:

Acked-by: Nicholas Bellinger n...@linux-iscsi.org

Also Doug, since this series is sufficiently small enough, and allows a class
of hardware that people are expecting to function with iser to 'just work', I'd
recommend to go ahead and push this series for v4.2-rc code.

Thank you,

--nab




Re: [PATCH] rds: rds_ib_device.refcount overflow

2015-07-06 Thread Haggai Eran
On 24/06/2015 07:54, Wengang Wang wrote:
 A drop of the rds_ib_device.refcount is missing in case rds_ib_alloc_fmr
 failed (mr pool running out). This leads to the refcount overflowing.
 
 A complaint at line 117 (see following) is seen. From vmcore:
 s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is
 -2147475448. That is the evidence the mr pool is used up, so
 rds_ib_alloc_fmr is very likely to return ERR_PTR(-EAGAIN).
 
 115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
 116 {
 117 	BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
 118 	if (atomic_dec_and_test(&rds_ibdev->refcount))
 119 		queue_work(rds_wq, &rds_ibdev->free_work);
 120 }
 
 The fix is to drop the refcount when rds_ib_alloc_fmr failed.
 
 Signed-off-by: Wengang Wang wen.gang.w...@oracle.com
 ---
  net/rds/ib_rdma.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
 index 273b8bf..657ba9f 100644
 --- a/net/rds/ib_rdma.c
 +++ b/net/rds/ib_rdma.c
 @@ -759,8 +759,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
   }
  
   ibmr = rds_ib_alloc_fmr(rds_ibdev);
 - if (IS_ERR(ibmr))
 + if (IS_ERR(ibmr)) {
 + rds_ib_dev_put(rds_ibdev);
   return ibmr;
 + }
  
   ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
   if (ret == 0)
 

It seems like the function indeed is missing a put on the rds_ibdev in
that case.

Reviewed-by: Haggai Eran hagg...@mellanox.com

You may also want to add:
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")


Re: [PATCH] rds: rds_ib_device.refcount overflow

2015-07-06 Thread Wengang Wang

Haggai,

Thanks for review! I will add the message you suggested and re-post.

thanks,
wengang

On 2015-07-06 14:18, Haggai Eran wrote:

On 24/06/2015 07:54, Wengang Wang wrote:

A drop of the rds_ib_device.refcount is missing in case rds_ib_alloc_fmr
failed (mr pool running out). This leads to the refcount overflowing.

A complaint at line 117 (see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up, so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117 	BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118 	if (atomic_dec_and_test(&rds_ibdev->refcount))
119 		queue_work(rds_wq, &rds_ibdev->free_work);
120 }

The fix is to drop the refcount when rds_ib_alloc_fmr failed.

Signed-off-by: Wengang Wang wen.gang.w...@oracle.com
---
  net/rds/ib_rdma.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 273b8bf..657ba9f 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -759,8 +759,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
}
  
  	ibmr = rds_ib_alloc_fmr(rds_ibdev);

-   if (IS_ERR(ibmr))
+   if (IS_ERR(ibmr)) {
+   rds_ib_dev_put(rds_ibdev);
return ibmr;
+   }
  
  	ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);

if (ret == 0)


It seems like the function indeed is missing a put on the rds_ibdev in
that case.

Reviewed-by: Haggai Eran hagg...@mellanox.com

You may also want to add:
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")




[PATCH] rds: rds_ib_device.refcount overflow

2015-07-06 Thread Wengang Wang
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")

A drop of the rds_ib_device.refcount is missing in case rds_ib_alloc_fmr
failed (mr pool running out). This leads to the refcount overflowing.

A complaint at line 117 (see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up, so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117 	BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118 	if (atomic_dec_and_test(&rds_ibdev->refcount))
119 		queue_work(rds_wq, &rds_ibdev->free_work);
120 }

The fix is to drop the refcount when rds_ib_alloc_fmr failed.

Signed-off-by: Wengang Wang wen.gang.w...@oracle.com
Reviewed-by: Haggai Eran hagg...@mellanox.com
---
 net/rds/ib_rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 273b8bf..657ba9f 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -759,8 +759,10 @@ void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
}
 
ibmr = rds_ib_alloc_fmr(rds_ibdev);
-   if (IS_ERR(ibmr))
+   if (IS_ERR(ibmr)) {
+   rds_ib_dev_put(rds_ibdev);
return ibmr;
+   }
 
ret = rds_ib_map_fmr(rds_ibdev, ibmr, sg, nents);
if (ret == 0)
-- 
2.1.0



Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-06 Thread Haggai Eran
On 06/07/2015 02:22, Steve Wise wrote:
 +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
 +{
 +	int access_flags = attrs;
 +
 +	if (roles & RDMA_MRR_RECV)
 +		access_flags |= IB_ACCESS_LOCAL_WRITE;
 +
 +	if (roles & RDMA_MRR_WRITE_DEST)
 +		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
 +
 +	if (roles & RDMA_MRR_READ_DEST) {
 +		access_flags |= IB_ACCESS_LOCAL_WRITE;
 +		if (rdma_protocol_iwarp(pd->device,
 +					rdma_start_port(pd->device)))
 +			access_flags |= IB_ACCESS_REMOTE_WRITE;
 +	}
 +
 +	if (roles & RDMA_MRR_READ_SOURCE)
 +		access_flags |= IB_ACCESS_REMOTE_READ;
 +
 +	if (roles & RDMA_MRR_ATOMIC_DEST)
 +		access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;

I think you need LOCAL_WRITE for ATOMIC_SOURCE in order to receive the
results of the atomic operation.

 +
 +	if (roles & RDMA_MRR_MW_BIND)
 +		access_flags |= IB_ACCESS_MW_BIND;
 +
 + return access_flags;
 +}
 +EXPORT_SYMBOL(rdma_device_access_flags);



RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-06 Thread Steve Wise


 -Original Message-
 From: Christoph Hellwig [mailto:h...@infradead.org]
 Sent: Monday, July 06, 2015 12:25 AM
 To: Steve Wise
 Cc: dledf...@redhat.com; sa...@mellanox.com; ogerl...@mellanox.com; 
 r...@mellanox.com; linux-rdma@vger.kernel.org;
 e...@mellanox.com; target-de...@vger.kernel.org; linux-...@vger.kernel.org; 
 trond.mykleb...@primarydata.com; bfie...@fieldses.org
 Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
 
 On Sun, Jul 05, 2015 at 06:22:00PM -0500, Steve Wise wrote:
  The semantics for MR access flags are not consistent across RDMA
  protocols.  So rather than have applications try and glean what they
  need, have them pass in the intended roles and attributes for the MR to
  be allocated and let the RDMA core select the appropriate access flags
  given the roles, attributes, and device capabilities.
 
  We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the
  possible roles and attributes for a MR.  These are documented in the
  enums themselves.
 
  New services exported:
 
  rdma_device_access_flags() - given the intended MR roles and attributes
  passed in, return the ib_access_flags bits for the device.
 
  rdma_get_dma_mr() - allocate a dma mr using the applications intended
  MR roles and MR attributes.  This uses rdma_device_access_flags().
 
  rdma_fast_reg_access_flags() - return the ib_access_flags bits needed
  for a fast register WR given the applications intended MR roles and
  MR attributes.  This uses rdma_device_access_flags().
 
  Signed-off-by: Steve Wise sw...@opengridcomputing.com
  ---
 
   drivers/infiniband/core/verbs.c |   30 +++
    include/rdma/ib_verbs.h |  106 +++
   2 files changed, 136 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
  index bac3fb4..f42595c 100644
  --- a/drivers/infiniband/core/verbs.c
  +++ b/drivers/infiniband/core/verbs.c
 @@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
   }
   EXPORT_SYMBOL(ib_get_dma_mr);
 
  +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
  +{
  +   int access_flags = attrs;
 
 Please add an assert for the values that are allowed for attrs.
 
 It also would be highly useful to add a kerneldoc comment describing
 the function and the parameters.  Also __bitwise sparse tricks
 to ensure the right flags are passed wouldn't be a too bad idea.
 

Can you explain the __bitwise sparse tricks?  Or point me to an example.  

  +/**
  + * rdma_mr_attributes - attributes for rdma memory regions
  + */
  +enum {
  +   RDMA_MRA_ZERO_BASED = IB_ZERO_BASED,
  +   RDMA_MRA_ACCESS_ON_DEMAND   = IB_ACCESS_ON_DEMAND,
  +};
 
 Why not specify these as a separate numerical namespace?
 

To avoid having to translate them. 

  +/**
  + * rdma_device_access_flags - Returns the device-specific MR access flags.
  + * @pd: The protection domain associated with the memory region.
  + * @roles: The intended roles of the MR
  + * @attrs: The desired attributes of the MR
  + *
  + * Use the intended roles from @roles along with @attrs and the device
  + * capabilities to generate the needed access rights.
  + *
  + * Return: the ib_access_flags value needed to allocate the MR.
  + */
  +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs);
 
 Oh, here we have kerneldoc comments, just in the wrong place.  Please
 move them to the function defintion.

Ok.




RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-06 Thread Steve Wise


 -Original Message-
 From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il]
 Sent: Monday, July 06, 2015 2:59 AM
 To: Steve Wise; dledf...@redhat.com
 Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; 
 linux-rdma@vger.kernel.org; e...@mellanox.com; target-
 de...@vger.kernel.org; linux-...@vger.kernel.org; 
 trond.mykleb...@primarydata.com; bfie...@fieldses.org
 Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
 
 On 7/6/2015 2:22 AM, Steve Wise wrote:
  The semantics for MR access flags are not consistent across RDMA
  protocols.  So rather than have applications try and glean what they
  need, have them pass in the intended roles and attributes for the MR to
  be allocated and let the RDMA core select the appropriate access flags
  given the roles, attributes, and device capabilities.
 
  We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the
  possible roles and attributes for a MR.  These are documented in the
  enums themselves.
 
  New services exported:
 
  rdma_device_access_flags() - given the intended MR roles and attributes
  passed in, return the ib_access_flags bits for the device.
 
  rdma_get_dma_mr() - allocate a dma mr using the applications intended
  MR roles and MR attributes.  This uses rdma_device_access_flags().
 
  rdma_fast_reg_access_flags() - return the ib_access_flags bits needed
  for a fast register WR given the applications intended MR roles and
  MR attributes.  This uses rdma_device_access_flags().
 
  Signed-off-by: Steve Wise sw...@opengridcomputing.com
  ---
 
drivers/infiniband/core/verbs.c |   30 +++
include/rdma/ib_verbs.h |  106 
  +++
2 files changed, 136 insertions(+), 0 deletions(-)
 
  diff --git a/drivers/infiniband/core/verbs.c 
  b/drivers/infiniband/core/verbs.c
  index bac3fb4..f42595c 100644
  --- a/drivers/infiniband/core/verbs.c
  +++ b/drivers/infiniband/core/verbs.c
  @@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int 
  mr_access_flags)
}
EXPORT_SYMBOL(ib_get_dma_mr);
 
  +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
  +{
  +   int access_flags = attrs;
  +
   +   if (roles & RDMA_MRR_RECV)
   +   access_flags |= IB_ACCESS_LOCAL_WRITE;
   +
   +   if (roles & RDMA_MRR_WRITE_DEST)
   +   access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
   +
   +   if (roles & RDMA_MRR_READ_DEST) {
   +   access_flags |= IB_ACCESS_LOCAL_WRITE;
   +   if (rdma_protocol_iwarp(pd->device,
   +   rdma_start_port(pd->device)))
   +   access_flags |= IB_ACCESS_REMOTE_WRITE;
   +   }
   +
   +   if (roles & RDMA_MRR_READ_SOURCE)
   +   access_flags |= IB_ACCESS_REMOTE_READ;
   +
   +   if (roles & RDMA_MRR_ATOMIC_DEST)
   +   access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;
   +
   +   if (roles & RDMA_MRR_MW_BIND)
   +   access_flags |= IB_ACCESS_MW_BIND;
  +
  +   return access_flags;
  +}
  +EXPORT_SYMBOL(rdma_device_access_flags);
  +
struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd,
   struct ib_phys_buf *phys_buf_array,
   int num_phys_buf,
  diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
  index 986fddb..da1eadf 100644
  --- a/include/rdma/ib_verbs.h
  +++ b/include/rdma/ib_verbs.h
  @@ -2494,6 +2494,112 @@ static inline int ib_req_ncomp_notif(struct ib_cq 
  *cq, int wc_cnt)
struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);
 
/**
  + * rdma_mr_roles - possible roles an RDMA MR will be used for
  + *
  + * This allows a transport independent RDMA application to
  + * create MRs that are usable for all the desired roles w/o
  + * having to understand which access rights are needed.
  + */
  +enum {
  +
  +   /* lkey used in a ib_recv_wr sge */
  +   RDMA_MRR_RECV   = 1,
  +
  +   /* lkey used for IB_WR_SEND* ops in the ib_send_wr sge */
   +   RDMA_MRR_SEND   = (1 << 1),
   +
   +   /* rkey used for IB_WR_RDMA_READ* ops in ib_send_wr wr.rdma.rkey */
   +   RDMA_MRR_READ_SOURCE= (1 << 2),
   +
   +   /* lkey used for IB_WR_RDMA_READ* ops in the ib_send_wr sge */
   +   RDMA_MRR_READ_DEST  = (1 << 3),
   +
   +   /* lkey used for IB_WR_RDMA_WRITE* ops in the ib_send_wr sge */
   +   RDMA_MRR_WRITE_SOURCE   = (1 << 4),
   +
   +   /* rkey used for IB_WR_RDMA_WRITE* ops in ib_send_wr wr.rdma.rkey */
   +   RDMA_MRR_WRITE_DEST = (1 << 5),
   +
   +   /*
   +    * lkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in the ib_send_wr sge
   +    */
   +   RDMA_MRR_ATOMIC_SOURCE  = (1 << 6),
   +
   +   /*
   +    * rkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in ib_send_wr
   +    * wr.atomic.rkey
   +    */
   +   RDMA_MRR_ATOMIC_DEST= (1 << 7),
  +
  +   /* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */
  +   RDMA_MRR_MW_BIND= 

RE: [PATCH V3 0/5] Transport-independent MRs

2015-07-06 Thread Steve Wise


 -Original Message-
 From: Christoph Hellwig [mailto:h...@infradead.org]
 Sent: Monday, July 06, 2015 12:26 AM
 To: Steve Wise
 Cc: dledf...@redhat.com; sa...@mellanox.com; ogerl...@mellanox.com; 
 r...@mellanox.com; linux-rdma@vger.kernel.org;
 e...@mellanox.com; target-de...@vger.kernel.org; linux-...@vger.kernel.org; 
 trond.mykleb...@primarydata.com; bfie...@fieldses.org
 Subject: Re: [PATCH V3 0/5] Transport-independent MRs
 
 On Sun, Jul 05, 2015 at 06:21:53PM -0500, Steve Wise wrote:
  This series introduces transport-independent RDMA core services for
  allocating DMA MRs and computing fast register access flags.  Included are
  changes to the iSER and NFSRDMA ULPs to make use of the new services.
 
 Can you convert all of them and remove the old APIs?

I can.  These are the only transport-independent ULPs at this point.  But if 
we want to remove ib_get_dma_mr(), then I can do this.



RE: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-06 Thread Steve Wise


 -Original Message-
 From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il]
 Sent: Monday, July 06, 2015 2:54 AM
 To: Steve Wise; dledf...@redhat.com
 Cc: sa...@mellanox.com; ogerl...@mellanox.com; r...@mellanox.com; 
 linux-rdma@vger.kernel.org; e...@mellanox.com; target-
 de...@vger.kernel.org; linux-...@vger.kernel.org; 
 trond.mykleb...@primarydata.com; bfie...@fieldses.org
 Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
 
 On 7/6/2015 2:22 AM, Steve Wise wrote:
  The semantics for MR access flags are not consistent across RDMA
  protocols.  So rather than have applications try and glean what they
  need, have them pass in the intended roles and attributes for the MR to
  be allocated and let the RDMA core select the appropriate access flags
  given the roles, attributes, and device capabilities.
 
  We introduce rdma_mr_roles and rdma_mr_attributes that enumerate the
  possible roles and attributes for a MR.  These are documented in the
  enums themselves.
 
  New services exported:
 
  rdma_device_access_flags() - given the intended MR roles and attributes
  passed in, return the ib_access_flags bits for the device.
 
  rdma_get_dma_mr() - allocate a dma mr using the applications intended
  MR roles and MR attributes.  This uses rdma_device_access_flags().
 
  rdma_fast_reg_access_flags() - return the ib_access_flags bits needed
  for a fast register WR given the applications intended MR roles and
  MR attributes.  This uses rdma_device_access_flags().
 
  Signed-off-by: Steve Wise sw...@opengridcomputing.com
  ---
 
drivers/infiniband/core/verbs.c |   30 +++
include/rdma/ib_verbs.h |  106 
  +++
2 files changed, 136 insertions(+), 0 deletions(-)
 
  diff --git a/drivers/infiniband/core/verbs.c 
  b/drivers/infiniband/core/verbs.c
  index bac3fb4..f42595c 100644
  --- a/drivers/infiniband/core/verbs.c
  +++ b/drivers/infiniband/core/verbs.c
  @@ -1144,6 +1144,36 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int 
  mr_access_flags)
}
EXPORT_SYMBOL(ib_get_dma_mr);
 
  +int rdma_device_access_flags(struct ib_pd *pd, int roles, int attrs)
  +{
  +   int access_flags = attrs;
  +
   +   if (roles & RDMA_MRR_RECV)
   +   access_flags |= IB_ACCESS_LOCAL_WRITE;
   +
   +   if (roles & RDMA_MRR_WRITE_DEST)
   +   access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
   +
   +   if (roles & RDMA_MRR_READ_DEST) {
   +   access_flags |= IB_ACCESS_LOCAL_WRITE;
   +   if (rdma_protocol_iwarp(pd->device,
   +   rdma_start_port(pd->device)))
   +   access_flags |= IB_ACCESS_REMOTE_WRITE;
   +   }
   +
   +   if (roles & RDMA_MRR_READ_SOURCE)
   +   access_flags |= IB_ACCESS_REMOTE_READ;
   +
   +   if (roles & RDMA_MRR_ATOMIC_DEST)
   +   access_flags |= IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_ATOMIC;
   +
   +   if (roles & RDMA_MRR_MW_BIND)
   +   access_flags |= IB_ACCESS_MW_BIND;
  +
  +   return access_flags;
  +}
  +EXPORT_SYMBOL(rdma_device_access_flags);
  +
struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd,
   struct ib_phys_buf *phys_buf_array,
   int num_phys_buf,
  diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
  index 986fddb..da1eadf 100644
  --- a/include/rdma/ib_verbs.h
  +++ b/include/rdma/ib_verbs.h
  @@ -2494,6 +2494,112 @@ static inline int ib_req_ncomp_notif(struct ib_cq 
  *cq, int wc_cnt)
struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);
 
/**
  + * rdma_mr_roles - possible roles an RDMA MR will be used for
  + *
  + * This allows a transport independent RDMA application to
  + * create MRs that are usable for all the desired roles w/o
  + * having to understand which access rights are needed.
  + */
  +enum {
  +
  +   /* lkey used in a ib_recv_wr sge */
  +   RDMA_MRR_RECV   = 1,
  +
  +   /* lkey used for IB_WR_SEND* ops in the ib_send_wr sge */
   +   RDMA_MRR_SEND   = (1 << 1),
   +
   +   /* rkey used for IB_WR_RDMA_READ* ops in ib_send_wr wr.rdma.rkey */
   +   RDMA_MRR_READ_SOURCE= (1 << 2),
   +
   +   /* lkey used for IB_WR_RDMA_READ* ops in the ib_send_wr sge */
   +   RDMA_MRR_READ_DEST  = (1 << 3),
   +
   +   /* lkey used for IB_WR_RDMA_WRITE* ops in the ib_send_wr sge */
   +   RDMA_MRR_WRITE_SOURCE   = (1 << 4),
   +
   +   /* rkey used for IB_WR_RDMA_WRITE* ops in ib_send_wr wr.rdma.rkey */
   +   RDMA_MRR_WRITE_DEST = (1 << 5),
   +
   +   /*
   +    * lkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in the ib_send_wr sge
   +    */
   +   RDMA_MRR_ATOMIC_SOURCE  = (1 << 6),
   +
   +   /*
   +    * rkey used for IB_WR_ATOMIC/MASKED_ATOMIC* ops in ib_send_wr
   +    * wr.atomic.rkey
   +    */
   +   RDMA_MRR_ATOMIC_DEST= (1 << 7),
  +
  +   /* MR used for a IB_WR_MW_BIND in ib_send_wr wr.bind_mw.bind_info.mr */
  +   RDMA_MRR_MW_BIND= 

RE: [PATCH V5 3/5] RDMA/iser: Limit sg tablesize and max_sectors to device fastreg max depth

2015-07-06 Thread Steve Wise


 -Original Message-
 From: Sagi Grimberg [mailto:sa...@dev.mellanox.co.il]
 Sent: Monday, July 06, 2015 2:51 AM
 To: Steve Wise; dledf...@redhat.com
 Cc: infinip...@intel.com; sa...@mellanox.com; ogerl...@mellanox.com; 
 r...@mellanox.com; linux-rdma@vger.kernel.org;
 e...@mellanox.com; target-de...@vger.kernel.org
 Subject: Re: [PATCH V5 3/5] RDMA/iser: Limit sg tablesize and max_sectors to 
 device fastreg max depth
 
 On 7/5/2015 8:44 PM, Steve Wise wrote:
  Currently the sg tablesize, which dictates fast register page list
  depth to use, does not take into account the limits of the rdma device.
  So adjust it once we discover the device fastreg max depth limit.  Also
  adjust the max_sectors based on the resulting sg tablesize.
 
  Signed-off-by: Steve Wise sw...@opengridcomputing.com
  ---
 
drivers/infiniband/ulp/iser/iscsi_iser.c |9 +
1 files changed, 9 insertions(+), 0 deletions(-)
 
  diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
  b/drivers/infiniband/ulp/iser/iscsi_iser.c
  index 6a594aa..de8730d 100644
  --- a/drivers/infiniband/ulp/iser/iscsi_iser.c
  +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
  @@ -640,6 +640,15 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
 SHOST_DIX_GUARD_CRC);
  }
 
  +   /*
  +* Limit the sg_tablesize and max_sectors based on the device
  +* max fastreg page list length.
  +*/
   +   shost->sg_tablesize = min_t(unsigned short, shost->sg_tablesize,
   +   ib_conn->device->dev_attr.max_fast_reg_page_list_len);
   +   shost->max_sectors = min_t(unsigned int,
   +   1024, (shost->sg_tablesize * PAGE_SIZE) >> 9);
  +
 
 The min statement is meaningless for max_sectors - you do a min between
 the default sg_tablesize and the frpl length - so the maximum sg_tablesize is
 128, which gives 1024 max_sectors.

I'm not following.  What if 
ib_conn->device->dev_attr.max_fast_reg_page_list_len is, say, 32?  Then 
shost->sg_tablesize is set to 32, and max_sectors is set to (32*4K) >> 9 == 256 
512B sectors.

  
