Re: [PATCH for-next 1/4] IB/mlx5: Add create_cq extended command

2015-11-10 Thread Matan Barak
On Tue, Nov 10, 2015 at 1:22 AM, Jason Gunthorpe
 wrote:
> On Mon, Nov 09, 2015 at 06:30:54PM +0200, Matan Barak wrote:
>> In order to create a CQ that supports timestamp, mlx5 needs to
>> support the extended create CQ command with the timestamp flag.
>>
>> Signed-off-by: Matan Barak 
>>  drivers/infiniband/hw/mlx5/cq.c   | 7 +++
>>  drivers/infiniband/hw/mlx5/main.c | 3 ++-
>>  2 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/hw/mlx5/cq.c 
>> b/drivers/infiniband/hw/mlx5/cq.c
>> index 2d0dbbf..674f857 100644
>> +++ b/drivers/infiniband/hw/mlx5/cq.c
>> @@ -743,6 +743,10 @@ static void destroy_cq_kernel(struct mlx5_ib_dev *dev, 
>> struct mlx5_ib_cq *cq)
>>   mlx5_db_free(dev->mdev, &cq->db);
>>  }
>>
>> +enum {
>> + CQ_CREATE_FLAGS_SUPPORTED = IB_CQ_FLAGS_TIMESTAMP_COMPLETION
>> +};
>> +
>>  struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
>>   const struct ib_cq_init_attr *attr,
>>   struct ib_ucontext *context,
>> @@ -766,6 +770,9 @@ struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
>>   if (entries < 0)
>>   return ERR_PTR(-EINVAL);
>>
>> + if (attr->flags & ~CQ_CREATE_FLAGS_SUPPORTED)
>> + return ERR_PTR(-EINVAL);
>
> And this is what I was just mentioning to Eli, EINVAL is not the right
> return, and this same comment applies to the places where the above
> was copy and pasted into drivers during the ex patching.
>
> Try for EOPNOTSUPP maybe?
>

Agree. Thanks, I'll fix.

> Jason

Matan

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 1/4] IB/mlx5: Add create_cq extended command

2015-11-10 Thread Matan Barak
On Tue, Nov 10, 2015 at 1:24 AM, Jason Gunthorpe
 wrote:
> On Mon, Nov 09, 2015 at 06:30:54PM +0200, Matan Barak wrote:
>> @@ -1385,7 +1385,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
>>   (1ull << IB_USER_VERBS_CMD_CREATE_XSRQ) |
>>   (1ull << IB_USER_VERBS_CMD_OPEN_QP);
>>   dev->ib_dev.uverbs_ex_cmd_mask =
>> - (1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE);
>> + (1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE) |
>> + (1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ);
>
> Eli posted a series that gets rid of this stuff, can you please
> coordinate?
>

It'll create a dependency between this series and Eli's series, but
I'll change that.

> Jason

Matan

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 2/4] IB/core: Add ib_is_udata_cleared

2015-11-10 Thread Matan Barak
On Mon, Nov 9, 2015 at 9:18 PM, Leon Romanovsky  wrote:
> On Mon, Nov 09, 2015 at 06:30:55PM +0200, Matan Barak wrote:
>>
>> +static inline bool ib_is_udata_cleared(struct ib_udata *udata,
>> +char cleared_char,
>> +size_t offset,
>> +size_t len)
>> +{
>> + short i;
>> +
>> + for (i = 0; i < len; i++) {
> You are comparing "len" which is declared as size_t which is "unsigned" int 
> and "i" which is "short".

Thanks, I'll change i to be unsigned.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 3/4] IB/mlx5: Add support querying timestamp related fields in query_device

2015-11-10 Thread Matan Barak
On Mon, Nov 9, 2015 at 9:26 PM, Leon Romanovsky  wrote:
> On Mon, Nov 09, 2015 at 06:30:56PM +0200, Matan Barak wrote:
>> +
>> + if (uhw->outlen) {
>> + err = ib_copy_to_udata(uhw, &resp, resp.response_length);
>> + if (err)
>> + return err;
>> + }
>> +
>>   return 0;
> What do you think about to rewrite this part of code to be something
> like that?
> +   int ret = 0;
> .
> +   if (uhw->outlen)
> +   ret = ib_copy_to_udata(uhw, &resp, resp.response_length);
> +   return ret;

I'll change that. Thanks.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] staging/rdma/hfi1: set Gen3 half-swing for integrated devices

2015-11-10 Thread Dan Carpenter
Gar...  No.  Please please get rid of the PC() macro.  It makes the code
impossible to understand because instead of hitting CTRL-[ you have
decode it and then manually type out

:cs find g CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_SHIFT

which is the length of a typical college essay.  I meant just put a
comment like this:

/*
 * In the hardware spec these are prefixed with:
 * CCE_PCIE_CTRL_...
 * But it is too long to use in code.
 */
#define XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_MASK 0x1ull

Or probably even better:

#define CCE_PCIE_CTRL (CCE + 0x00C0)
#define LANE_BUNDLE_MASK0x3ull /* CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_MASK */
#define LANE_BUNDLE_SHIFT   0  /* CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_SHIFT 
*/
#define LANE_DELAY_MASK 0xFull /* CCE_PCIE_CTRL_PCIE_LANE_DELAY_MASK */
#define LANE_DELAY_SHIFT2  /* CCE_PCIE_CTRL_PCIE_LANE_DELAY_SHIFT */
#define MARGIN_OVERWRITE_SHIFT  8  /* 
CCE_PCIE_CTRL_XMT_MARGIN_OVERWRITE_ENABLE_SHIFT */
#define MARGIN_SHIFT9  /* CCE_PCIE_CTRL_XMT_MARGIN_SHIFT */
#define MARGIN_G1G2_OVERWRITE_MASK 0x1ull /* 
CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_MASK */
#define MARGIN_G1G2_OVERWRITE_SHIFT 12/* 
CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_SHIFT */
#define MARGIN_G1G2_MASK0x7ull /* 
CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_MASK */
#define MARGIN_G1G2_SHIFT   13 /* 
CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_SHIFT */

Those lines go over the 80 character limit but it's fine.

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Matan Barak
On Tue, Nov 10, 2015 at 1:15 AM, Jason Gunthorpe
 wrote:
> On Tue, Nov 10, 2015 at 01:05:31AM +0200, Eli Cohen wrote:
>> On Mon, Nov 09, 2015 at 03:35:31PM -0700, Jason Gunthorpe wrote:
>> > On Mon, Nov 09, 2015 at 08:48:36AM +0200, Haggai Eran wrote:
>> > > On 08/11/2015 17:04, Matan Barak wrote:
>> > > >> @@ -704,6 +719,10 @@ static ssize_t ib_uverbs_write(struct file 
>> > > >> *filp, const char __user *buf,
>> > > >> > }
>> > > >> >
>> > > >> > command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
>> > > >> > +   if (verify_command_mask(ib_dev, command)) {
>> > > >> > +   ret = -EINVAL;
>> > > > Why did you replace ENOSYS with EINVAL?
>> > >
>> > > I think ENOSYS is meant only to represent a system call number that
>> > > isn't supported. There was a change in checkpatch that now warns about
>> > > it [1]. I'm not sure the intention was to fix existing uses though.
>> >
>> > Within verbs we should have two kinds of return - not supported
>> > request, and supported request with invalid parameters.
>> >
>> > Maybe use EOPNOTSUPP ?
>> >
>>
>> What about Matan's concern regarding legacy code? Maybe we should
>> leave ENOSYS as that's how it is all over the IB stack code.
>>
>> quote:
>> I think it could break old applications:
>> err = extended_verb(the_first_extension_we_added);
>> if (err == ENOSYS)
>>   err = legacy_verb();
>> if (err)
>> return err;
>>
>> Such applications used the first extension (that was added during the
>> addition of the extended verb) and when they realized it's not
>> supported, they dropped to the legacy verb. This change can now cause
>> the return of -EINVAL an early termination with an error.
>
> Since the change is to make the kernel do the above fall back
> internally, this specific example doesn't make alot of sense to worry
> about. Ie the extended verb won't fail anymore, and if it does the
> legacy one won't work anyhow.
>

The kernel will do the above fallback if the command is a legacy
command wrapped in an extended command (i.e - no extra flags).
If an application uses one of the extended values, and fall back to
legacy verb on ENOSYS, it'll behave differently after this change:
Re-posting this example:

err = extended_verb(*the_first_extension_we_added*); /* Kernel won't
fall back to legacy verbs here, as we use an actual extended request
*/
if (err == ENOSYS)
  err = legacy_verb();
if (err)
return err;

> But if there is something out there that does care about ENOSYS we
> should try to keep it, but don't convert ENOSYS to EINVAL.
>
> Also, when the driver tests the ex flags for support it should be
> returning EOPNOTSUPP or such not EINVAL.. Return codes for the ex
> stuff could stand a good sanity audit.
>
> Jason

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/srp: Fix possible send queue overflow

2015-11-10 Thread Sagi Grimberg




Hi Doug,

Kind reminder for picking this up for 4.4


Doug?

Are you planning to pick this up? Note that this patch
is stable material as well.


Doug? any plans for this patch?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/2] Handle mlx4 max_sge_rd correctly

2015-11-10 Thread Sagi Grimberg



On 28/10/2015 13:28, Sagi Grimberg wrote:

This addresses a specific mlx4 issue where the max_sge_rd
is actually smaller than max_sge (rdma reads with max_sge
entries completes with error).

The second patch removes the explicit work-around from the
iser target code.

Changes from v1:
- Fixed driver rdma segment size to be 16 bytes

Changes from v0:
- Used a dedicated enumeration MLX4_MAX_SGE_RD and added
   a root cause analysis to patch change log.

- Fixed isert qp creation to be max_sge but construct rdma
   work request with the minimum of max_sge and max_sge_rd
   as non-rdma sends (login rsp) take 2 sges (and some devices
   have max_sge_rd = 1.

Sagi Grimberg (2):
   mlx4: Expose correct max_sge_rd limit
   iser-target: Remove explicit mlx4 work-around

  drivers/infiniband/hw/mlx4/main.c   |2 +-
  drivers/infiniband/ulp/isert/ib_isert.c |   13 +++--
  include/linux/mlx4/device.h |   11 +++
  3 files changed, 15 insertions(+), 11 deletions(-)



Doug,

Any reply on this patchset?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg
Instead of each ULP being aware of iWARP/IB protocol in order
to determine the rdma_read access flags, have it accessible
as an attribute in the ib_device.

Patch 2,3 fixes RDS and svcrdma which gave remote access to rdma_reads
unconditionally.

This patchset goes on top of Christoph's device attributes merge into
struct ib_device.

Sagi Grimberg (3):
  IB/core: Expose a device attribute for rdma_read access flags
  svcrdma: Use device rdma_read_access_flags
  RDS_IW: Use device rdma_read_access_flags

 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  2 ++
 drivers/infiniband/hw/cxgb4/provider.c   |  2 ++
 drivers/infiniband/hw/mlx4/main.c|  1 +
 drivers/infiniband/hw/mlx5/main.c|  1 +
 drivers/infiniband/hw/mthca/mthca_provider.c |  1 +
 drivers/infiniband/hw/nes/nes_verbs.c|  2 ++
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   |  1 +
 drivers/infiniband/hw/qib/qib_verbs.c|  1 +
 drivers/staging/rdma/hfi1/verbs.c|  2 ++
 include/rdma/ib_verbs.h  |  1 +
 net/rds/iw_send.c| 17 ++---
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  2 +-
 12 files changed, 25 insertions(+), 8 deletions(-)

-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Sagi Grimberg
Different devices may require different access permissions
for rdma reads e.g. for Infiniband devices, local write access
suffices while iWARP devices require remote write permissions as
well.

This situation generates extra logic for ULPs that need to be aware
of the underlying device to determine rdma read access. Instead,
add a device attribute for ULPs to use without being aware of the
underlying device.

Also, set rdma_read_access_flags in the relevant device drivers:
mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE
cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE

Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/hw/cxgb3/iwch_provider.c  | 2 ++
 drivers/infiniband/hw/cxgb4/provider.c   | 2 ++
 drivers/infiniband/hw/mlx4/main.c| 1 +
 drivers/infiniband/hw/mlx5/main.c| 1 +
 drivers/infiniband/hw/mthca/mthca_provider.c | 1 +
 drivers/infiniband/hw/nes/nes_verbs.c| 2 ++
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   | 1 +
 drivers/infiniband/hw/qib/qib_verbs.c| 1 +
 drivers/staging/rdma/hfi1/verbs.c| 2 ++
 include/rdma/ib_verbs.h  | 1 +
 10 files changed, 14 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c 
b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index db4c08c53ae3..483ae3c8a13b 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1467,6 +1467,8 @@ int iwch_register_device(struct iwch_dev *dev)
dev->ibdev.max_pd = dev->attr.max_pds;
dev->ibdev.local_ca_ack_delay = 0;
dev->ibdev.max_fast_reg_page_list_len = T3_MAX_FASTREG_DEPTH;
+   ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE |
+   IB_ACCESS_REMOTE_WRITE;
 
ret = ib_register_device(&dev->ibdev, NULL);
if (ret)
diff --git a/drivers/infiniband/hw/cxgb4/provider.c 
b/drivers/infiniband/hw/cxgb4/provider.c
index 84e9b5db0341..001db1cc737c 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -564,6 +564,8 @@ int c4iw_register_device(struct c4iw_dev *dev)
dev->ibdev.max_pd = T4_MAX_NUM_PD;
dev->ibdev.local_ca_ack_delay = 0;
dev->ibdev.max_fast_reg_page_list_len = t4_max_fr_depth(use_dsgl);
+   dev->ibdev.rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE |
+   IB_ACCESS_REMOTE_WRITE;
 
ret = ib_register_device(&dev->ibdev, NULL);
if (ret)
diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 46305dc27f9f..82af524dfb4e 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -524,6 +524,7 @@ static int mlx4_ib_init_device_flags(struct ib_device 
*ibdev)
ibdev->max_map_per_fmr = dev->dev->caps.max_fmr_maps;
ibdev->hca_core_clock = dev->dev->caps.hca_core_clock * 1000UL;
ibdev->timestamp_mask = 0xULL;
+   ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE;
 
if (!mlx4_is_slave(dev->dev))
err = mlx4_get_internal_clock_params(dev->dev, &clock_params);
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 5b733221ca5f..b29f79a03494 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -287,6 +287,7 @@ static int mlx5_ib_init_device_flags(struct ib_device 
*ibdev)
ibdev->max_total_mcast_qp_attach = ibdev->max_mcast_qp_attach *
   ibdev->max_mcast_grp;
ibdev->max_map_per_fmr = INT_MAX; /* no limit in ConnectIB */
+   ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE;
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
if (MLX5_CAP_GEN(mdev, pg))
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c 
b/drivers/infiniband/hw/mthca/mthca_provider.c
index 28d7a8bee3f7..ea1882bbe7a5 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -110,6 +110,7 @@ static int mthca_init_device_flags(struct ib_device *ibdev)
ibdev->max_mcast_qp_attach = MTHCA_QP_PER_MGM;
ibdev->max_total_mcast_qp_attach = ibdev->max_mcast_qp_attach *
   ibdev->max_mcast_grp;
+   ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE;
/*
 * If Sinai memory key optimization is being used, then only
 * the 8-bit key portion will change.  For other HCAs, the
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c 
b/drivers/infiniband/hw/nes/nes_verbs.c
index 2ef911953e38..adcea173f8dc 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -3868,6 +3868,8 @@ struct nes_ib_device *nes_init_ofa_device(struct 
net_device *netdev)
nesibdev->ibdev.max_mw = nesibdev->max_mr;
nesibdev->ibdev.max_pd = nesibdev->max_pd;
nesibdev->ibdev.max_sge_rd = 1

[PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg
Instead of hard-coding remote access (which is not secured
issue in IB).

Signed-off-by: Sagi Grimberg 
---
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c 
b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 303f194970f9..692c0142c7b6 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -238,7 +238,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
read = min_t(int, nents << PAGE_SHIFT, rs_length);
 
frmr->direction = DMA_FROM_DEVICE;
-   frmr->access_flags = (IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE);
+   frmr->access_flags = xprt->sc_cm_id->device->rdma_read_access_flags;
frmr->sg_nents = nents;
 
for (pno = 0; pno < nents; pno++) {
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 3/3] RDS_IW: Use device rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg
Instead of hard-coding remote access (which is not secured
issue in IB).

Signed-off-by: Sagi Grimberg 
---
 net/rds/iw_send.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/rds/iw_send.c b/net/rds/iw_send.c
index e20bd503f4bd..47d684541342 100644
--- a/net/rds/iw_send.c
+++ b/net/rds/iw_send.c
@@ -763,7 +763,8 @@ out:
 
 static int rds_iw_build_send_reg(struct rds_iw_send_work *send,
 struct scatterlist *sg,
-int sg_nents)
+int sg_nents,
+int access_flags)
 {
int n;
 
@@ -776,7 +777,7 @@ static int rds_iw_build_send_reg(struct rds_iw_send_work 
*send,
send->s_reg_wr.wr.num_sge = 0;
send->s_reg_wr.mr = send->s_mr;
send->s_reg_wr.key = send->s_mr->rkey;
-   send->s_reg_wr.access = IB_ACCESS_REMOTE_WRITE;
+   send->s_reg_wr.access = access_flags;
 
ib_update_fast_reg_key(send->s_mr, send->s_remap_count++);
 
@@ -786,6 +787,7 @@ static int rds_iw_build_send_reg(struct rds_iw_send_work 
*send,
 int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
 {
struct rds_iw_connection *ic = conn->c_transport_data;
+   struct ib_device *device = ic->i_cm_id->device;
struct rds_iw_send_work *send = NULL;
struct rds_iw_send_work *first;
struct rds_iw_send_work *prev;
@@ -803,11 +805,11 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct 
rm_rdma_op *op)
int num_sge;
int sg_nents;
 
-   rds_iwdev = ib_get_client_data(ic->i_cm_id->device, &rds_iw_client);
+   rds_iwdev = ib_get_client_data(device, &rds_iw_client);
 
/* map the message the first time we see it */
if (!op->op_mapped) {
-   op->op_count = ib_dma_map_sg(ic->i_cm_id->device,
+   op->op_count = ib_dma_map_sg(device,
 op->op_sg, op->op_nents, 
(op->op_write) ?
 DMA_TO_DEVICE : DMA_FROM_DEVICE);
rdsdebug("ic %p mapping op %p: %d\n", ic, op, op->op_count);
@@ -896,12 +898,12 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct 
rm_rdma_op *op)
 
for (j = 0; j < send->s_rdma_wr.wr.num_sge &&
 scat != &op->op_sg[op->op_count]; j++) {
-   len = ib_sg_dma_len(ic->i_cm_id->device, scat);
+   len = ib_sg_dma_len(device, scat);
 
if (send->s_rdma_wr.wr.opcode == 
IB_WR_RDMA_READ_WITH_INV)
sg_nents++;
else {
-   send->s_sge[j].addr = 
ib_sg_dma_address(ic->i_cm_id->device, scat);
+   send->s_sge[j].addr = ib_sg_dma_address(device, 
scat);
send->s_sge[j].length = len;
send->s_sge[j].lkey = rds_iw_local_dma_lkey(ic);
}
@@ -947,7 +949,8 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct 
rm_rdma_op *op)
 */
if (!op->op_write) {
ret = rds_iw_build_send_reg(&ic->i_sends[fr_pos],
-   &op->op_sg[0], sg_nents);
+   &op->op_sg[0], sg_nents,
+   device->rdma_read_access_flags);
if (ret) {
printk(KERN_WARNING "RDS/IW: failed to reg send mem\n");
goto out;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH infiniband-diags] ibportstate.c: Fix unsigned comparison warnings

2015-11-10 Thread Hal Rosenstock

src/ibportstate.c:450: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:454: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:458: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:462: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:483: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:500: warning: comparison of unsigned expression < 0 is always 
false
src/ibportstate.c:504: warning: comparison of unsigned expression < 0 is always 
false

Signed-off-by: Hal Rosenstock 
---
diff --git a/src/ibportstate.c b/src/ibportstate.c
index 47e9133..cb47aa9 100644
--- a/src/ibportstate.c
+++ b/src/ibportstate.c
@@ -447,40 +447,40 @@ int main(int argc, char **argv)
val = strtoull(argv[i], 0, 0);
switch (j) {
case SPEED:
-   if (val < 0 || val > 15)
+   if (val > 15)
IBEXIT("invalid speed value %ld", val);
break;
case ESPEED:
-   if (val < 0 || val > 31)
+   if (val > 31)
IBEXIT("invalid extended speed value 
%ld", val);
break;
case FDR10SPEED:
-   if (val < 0 || val > 1)
+   if (val > 1)
IBEXIT("invalid fdr10 speed value %ld", 
val);
break;
case WIDTH:
-   if (val < 0 || (val > 15 && val != 255))
+   if ((val > 15 && val != 255))
IBEXIT("invalid width value %ld", val);
break;
case VLS:
-   if (val <= 0 || val > 5)
+   if (val == 0 || val > 5)
IBEXIT("invalid vls value %ld", val);
break;
case MTU:
-   if (val <= 0 || val > 5)
+   if (val == 0 || val > 5)
IBEXIT("invalid mtu value %ld", val);
break;
case LID:
-   if (val <= 0 || val >= 0xC000)
+   if (val == 0 || val >= 0xC000)
IBEXIT("invalid lid value 0x%lx", val);
break;
case SMLID:
-   if (val <= 0 || val >= 0xC000)
+   if (val == 0 || val >= 0xC000)
IBEXIT("invalid smlid value 0x%lx",
val);
break;
case LMC:
-   if (val < 0 || val > 7)
+   if (val > 7)
IBEXIT("invalid lmc value %ld", val);
break;
case MKEY:
@@ -497,11 +497,11 @@ int main(int argc, char **argv)
/* All 64-bit values are legal */
break;
case MKEYLEASE:
-   if (val < 0 || val > 0x)
+   if (val > 0x)
IBEXIT("invalid mkey lease time %ld", 
val);
break;
case MKEYPROT:
-   if (val < 0 || val > 3)
+   if (val > 3)
IBEXIT("invalid mkey protection bit 
setting %ld", val);
}
*port_args[j].val = val;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH infiniband-diags] Remove unused pisize parameter from dump_portinfo in ibdiag_common

2015-11-10 Thread Hal Rosenstock

src/ibdiag_common.c: In function ?dump_portinfo?:
src/ibdiag_common.c:856: warning: unused parameter ?pisize?

Signed-off-by: Hal Rosenstock 
---
diff --git a/include/ibdiag_common.h b/include/ibdiag_common.h
index af5b3c4..21b0522 100644
--- a/include/ibdiag_common.h
+++ b/include/ibdiag_common.h
@@ -150,7 +150,7 @@ int vsnprint_field(char *buf, size_t n, enum MAD_FIELDS f, 
int spacing,
   const char *format, va_list va_args);
 int snprint_field(char *buf, size_t n, enum MAD_FIELDS f, int spacing,
  const char *format, ...);
-void dump_portinfo(void *pi, int pisize, int tabs);
+void dump_portinfo(void *pi, int tabs);
 
 /**
  * Some common command line parsing
diff --git a/src/ibdiag_common.c b/src/ibdiag_common.c
index 5ec0167..3ebdbb9 100644
--- a/src/ibdiag_common.c
+++ b/src/ibdiag_common.c
@@ -853,7 +853,7 @@ int snprint_field(char *buf, size_t n, enum MAD_FIELDS f, 
int spacing,
return ret;
 }
 
-void dump_portinfo(void *pi, int pisize, int tabs)
+void dump_portinfo(void *pi, int tabs)
 {
int field, i;
char val[64];
diff --git a/src/saquery.c b/src/saquery.c
index cc8d8dc..bd70b7b 100644
--- a/src/saquery.c
+++ b/src/saquery.c
@@ -304,7 +304,7 @@ static void dump_one_portinfo_record(void *data, struct 
query_params *p)
   "\t\tOptions.0x%x\n"
   "\tPortInfo dump:\n",
   cl_ntoh16(pir->lid), pir->port_num, pir->options);
-   dump_portinfo(pi, sizeof(*pi), 2);
+   dump_portinfo(pi, 2);
 }
 
 static void dump_one_mcmember_record(void *data, struct query_params *p)
diff --git a/src/smpquery.c b/src/smpquery.c
index 187ef61..2bd7132 100644
--- a/src/smpquery.c
+++ b/src/smpquery.c
@@ -161,7 +161,7 @@ static char *port_info(ib_portid_t * dest, char **argv, int 
argc)
return "port info query failed";
 
printf("# Port info: %s port %d\n", portid2str(dest), orig_portnum);
-   dump_portinfo(data, sizeof data, 0);
+   dump_portinfo(data, 0);
return 0;
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH infiniband-diags] ibsendtrap.c: Eliminate unused parameter from send_trap

2015-11-10 Thread Hal Rosenstock

src/ibsendtrap.c: In function ?send_trap?:
src/ibsendtrap.c:124: warning: unused parameter ?name?

Signed-off-by: Hal Rosenstock 
---
diff --git a/src/ibsendtrap.c b/src/ibsendtrap.c
index 46fd6e7..659f2d2 100644
--- a/src/ibsendtrap.c
+++ b/src/ibsendtrap.c
@@ -121,8 +121,7 @@ static void build_trap129(ib_mad_notice_attr_t * n, 
ib_portid_t * port)
n->data_details.ntc_129_131.port_num = (uint8_t) error_port;
 }
 
-static int send_trap(const char *name,
-void (*build) (ib_mad_notice_attr_t *, ib_portid_t *))
+static int send_trap(void (*build) (ib_mad_notice_attr_t *, ib_portid_t *))
 {
ib_portid_t sm_port;
ib_portid_t selfportid;
@@ -169,7 +168,7 @@ int process_send_trap(char *trap_name)
 
for (i = 0; traps[i].trap_name; i++)
if (strcmp(traps[i].trap_name, trap_name) == 0)
-   return send_trap(trap_name, traps[i].build_func);
+   return send_trap(traps[i].build_func);
ibdiag_show_usage();
return 1;
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Christoph Hellwig
On Tue, Nov 10, 2015 at 12:44:13PM +0200, Sagi Grimberg wrote:
> Different devices may require different access permissions
> for rdma reads e.g. for Infiniband devices, local write access
> suffices while iWARP devices require remote write permissions as
> well.
> 
> This situation generates extra logic for ULPs that need to be aware
> of the underlying device to determine rdma read access. Instead,
> add a device attribute for ULPs to use without being aware of the
> underlying device.
> 
> Also, set rdma_read_access_flags in the relevant device drivers:
> mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE
> cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE

>From all I can tell nes also is a iWarp driver.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 3/3] RDS_IW: Use device rdma_read_access_flags

2015-11-10 Thread Christoph Hellwig
Looks reasonable, although currently this code is only used for iWarp
anyway.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags

2015-11-10 Thread Christoph Hellwig
On Tue, Nov 10, 2015 at 12:44:12PM +0200, Sagi Grimberg wrote:
> Instead of each ULP being aware of iWARP/IB protocol in order
> to determine the rdma_read access flags, have it accessible
> as an attribute in the ib_device.
> 
> Patch 2,3 fixes RDS and svcrdma which gave remote access to rdma_reads
> unconditionally.
> 
> This patchset goes on top of Christoph's device attributes merge into
> struct ib_device.

FYI, I've updated the git branch to be based on current linus' tree
which required a few bit to be fixed.  I'd also like to note that while
everyone but Or seemed to be generally fine with it I'd really prefer
and actualy revivewed-by or acked-by tag.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Christoph Hellwig
On Tue, Nov 10, 2015 at 12:44:14PM +0200, Sagi Grimberg wrote:
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -238,7 +238,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
>   read = min_t(int, nents << PAGE_SHIFT, rs_length);
>  
>   frmr->direction = DMA_FROM_DEVICE;
> - frmr->access_flags = (IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE);
> + frmr->access_flags = xprt->sc_cm_id->device->rdma_read_access_flags;
>   frmr->sg_nents = nents;

I think you can simply drop the access_flags member and use
rdma_read_access_flags directly later in the function.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Christoph Hellwig
Oh, and while we're at it.  Can someone explain why we're even
using rdma_read_chunk_frmr for IB?  It seems to work around the
fact tat iWarp only allow a single RDMA READ SGE, but it's used
whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
wrong.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg



On 10/11/2015 13:38, Christoph Hellwig wrote:

On Tue, Nov 10, 2015 at 12:44:14PM +0200, Sagi Grimberg wrote:

--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -238,7 +238,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
read = min_t(int, nents << PAGE_SHIFT, rs_length);

frmr->direction = DMA_FROM_DEVICE;
-   frmr->access_flags = (IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE);
+   frmr->access_flags = xprt->sc_cm_id->device->rdma_read_access_flags;
frmr->sg_nents = nents;


I think you can simply drop the access_flags member and use
rdma_read_access_flags directly later in the function.



I will.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg



On 10/11/2015 13:41, Christoph Hellwig wrote:

Oh, and while we're at it.  Can someone explain why we're even
using rdma_read_chunk_frmr for IB?  It seems to work around the
fact tat iWarp only allow a single RDMA READ SGE, but it's used
whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
wrong.


I think Steve can answer it better than I can. I think that it is
just to have a single code path for both IB and iWARP. I agree that
the condition seems wrong and for small transfers rdma_read_chunk_frmr
is really a performance loss.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Yann Droneaud
Hi,

Le mardi 10 novembre 2015 à 12:44 +0200, Sagi Grimberg a écrit :
> Different devices may require different access permissions
> for rdma reads e.g. for Infiniband devices, local write access
> suffices while iWARP devices require remote write permissions as
> well.
> 
> This situation generates extra logic for ULPs that need to be aware
> of the underlying device to determine rdma read access. Instead,
> add a device attribute for ULPs to use without being aware of the
> underlying device.
> 
> Also, set rdma_read_access_flags in the relevant device drivers:
> mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE
> cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE
> 

Why were those hw providers not modified to
enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to
set it for them ?

Regards.

-- 
Yann Droneaud
OPTEYA

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 3/3] RDS_IW: Use device rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg

Looks reasonable, although currently this code is only used for iWarp
anyway.


I know... I'm hoping this will change at some point, and when it does,
it will get it right hopefully.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Sagi Grimberg



 From all I can tell nes also is a iWarp driver.


It is.. I don't know why I treated it as IB :)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg




FYI, I've updated the git branch to be based on current linus' tree
which required a few bit to be fixed.  I'd also like to note that while
everyone but Or seemed to be generally fine with it I'd really prefer
and actualy revivewed-by or acked-by tag.


You can add:

Tested-by: Sagi Grimberg 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Christoph Hellwig
On Tue, Nov 10, 2015 at 01:46:40PM +0200, Sagi Grimberg wrote:
> 
> 
> On 10/11/2015 13:41, Christoph Hellwig wrote:
> >Oh, and while we're at it.  Can someone explain why we're even
> >using rdma_read_chunk_frmr for IB?  It seems to work around the
> >fact tat iWarp only allow a single RDMA READ SGE, but it's used
> >whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
> >wrong.
> 
> I think Steve can answer it better than I can. I think that it is
> just to have a single code path for both IB and iWARP. I agree that
> the condition seems wrong and for small transfers rdma_read_chunk_frmr
> is really a performance loss.

Well, the code path already exists, but only is used fi
IB_DEVICE_MEM_MGT_EXTENSIONS isn't set.  Below is an untested patch
that demonstrates how I think svcrdma should setup the reads.  Note
that this also allows to entirely remove it's allphys MR.

Note that as a followon this would also allow to remove the
non-READ_W_INV code path from rdma_read_chunk_frmr as a future
step.


diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index f869807..35fa638 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -147,7 +147,6 @@ struct svcxprt_rdma {
struct ib_qp *sc_qp;
struct ib_cq *sc_rq_cq;
struct ib_cq *sc_sq_cq;
-   struct ib_mr *sc_phys_mr;   /* MR for server memory */
int  (*sc_reader)(struct svcxprt_rdma *,
  struct svc_rqst *,
  struct svc_rdma_op_ctxt *,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c 
b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index ff4f01e..a410cb6 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -160,7 +160,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
goto err;
atomic_inc(&xprt->sc_dma_used);
 
-   /* The lkey here is either a local dma lkey or a dma_mr lkey */
+   /* The lkey here is the local dma lkey */
ctxt->sge[pno].lkey = xprt->sc_dma_lkey;
ctxt->sge[pno].length = len;
ctxt->count++;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 9f3eb89..2780da4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -887,8 +887,6 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt 
*xprt)
struct ib_cq_init_attr cq_attr = {};
struct ib_qp_init_attr qp_attr;
struct ib_device *dev;
-   int uninitialized_var(dma_mr_acc);
-   int need_dma_mr = 0;
int ret = 0;
int i;
 
@@ -986,68 +984,41 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt 
*xprt)
}
newxprt->sc_qp = newxprt->sc_cm_id->qp;
 
-   /*
-* Use the most secure set of MR resources based on the
-* transport type and available memory management features in
-* the device. Here's the table implemented below:
-*
-*  FastGlobal  DMA Remote WR
-*  Reg LKEYMR  Access
-*  Sup'd   Sup'd   Needed  Needed
-*
-* IWARPN   N   Y   Y
-*  N   Y   Y   Y
-*  Y   N   Y   N
-*  Y   Y   N   -
-*
-* IB   N   N   Y   N
-*  N   Y   N   -
-*  Y   N   Y   N
-*  Y   Y   N   -
-*
-* NB:  iWARP requires remote write access for the data sink
-*  of an RDMA_READ. IB does not.
-*/
-   newxprt->sc_reader = rdma_read_chunk_lcl;
-   if (dev->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
-   newxprt->sc_frmr_pg_list_len =
-   dev->max_fast_reg_page_list_len;
-   newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_FAST_REG;
-   newxprt->sc_reader = rdma_read_chunk_frmr;
-   }
+   if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num)) {
+   /*
+* iWARP requires remote write access for the data sink, and
+* only supports a single SGE for RDMA_READ requests, so we'll
+* have to use a memory registration for each RDMA_READ.
+*/
+   if (!(dev->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS)) {
+   /*
+* We're an iWarp device but don't support FRs.
+* Tought luck, better exit early as we have little
+* chance of doing something useful.
+*/
+   goto errout;
+   }
 
- 

[PATCH] IB: start documenting device capabilities

2015-11-10 Thread Christoph Hellwig
Just IB_DEVICE_LOCAL_DMA_LKEY and IB_DEVICE_MEM_MGT_EXTENSIONS for now
as I'm most familar with those.

Signed-off-by: Christoph Hellwig 
---
 include/rdma/ib_verbs.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 45ce36e..6034f92 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -120,6 +120,14 @@ enum ib_device_cap_flags {
IB_DEVICE_RC_RNR_NAK_GEN= (1<<12),
IB_DEVICE_SRQ_RESIZE= (1<<13),
IB_DEVICE_N_NOTIFY_CQ   = (1<<14),
+
+   /*
+* This device supports a per-device lkey or stag that can be
+* used without performing a memory registration for the local
+* memory.  Note that ULPs should never check this flag, but
+* instead of use the local_dma_lkey flag in the ib_pd structure,
+* which will always contain a usable lkey.
+*/
IB_DEVICE_LOCAL_DMA_LKEY= (1<<15),
IB_DEVICE_RESERVED  = (1<<16), /* old SEND_W_INV */
IB_DEVICE_MEM_WINDOW= (1<<17),
@@ -133,6 +141,16 @@ enum ib_device_cap_flags {
IB_DEVICE_UD_IP_CSUM= (1<<18),
IB_DEVICE_UD_TSO= (1<<19),
IB_DEVICE_XRC   = (1<<20),
+
+   /*
+* This device supports the IB "base memory management extension",
+* which includes support for fast registrations (IB_WR_REG_MR,
+* IB_WR_LOCAL_INV and IB_WR_SEND_WITH_INV verbs).  This flag should
+* also be set by any iWarp device which must support FRs to comply
+* to the iWarp verbs spec.  iWarp devices also support the
+* IB_WR_RDMA_READ_WITH_INV verb for RDMA READs that invalidate the
+* stag.
+*/
IB_DEVICE_MEM_MGT_EXTENSIONS= (1<<21),
IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
IB_DEVICE_MEM_WINDOW_TYPE_2A= (1<<23),
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Sagi Grimberg

Hi Yann,



Why were those hw providers not modified to
enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to
set it for them ?


Do you mean that ULPs will set IB_ACCESS_LOCAL_WRITE and
iWARP providers executing the memory registration will add
IB_ACCESS_REMOTE_WRITE? That's even better I think! Given that
changing the access rights under the user legs is something we'd want
to do. is it?

Given that for IB memory registration is not even needed for rdma_read,
I'd like to move away from code like:

if (rdma_protocol_iwarp())
register memory for rdma_read
else
go ahead with ib_sge for rdma_read

I've maid some initial experimentation on trying to abstract this
aspect from ULPs but it's far from being complete and needs more work.

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Sagi Grimberg



On 10/11/2015 14:28, Sagi Grimberg wrote:

Hi Yann,



Why were those hw providers not modified to
enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to
set it for them ?


Do you mean that ULPs will set IB_ACCESS_LOCAL_WRITE and
iWARP providers executing the memory registration will add
IB_ACCESS_REMOTE_WRITE? That's even better I think! Given that
changing the access rights under the user legs is something we'd want
to do. is it?


Oh, forgot that the providers do not know how this key will be used.
Thanks Tom for reminding me.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Tom Talpey

On 11/10/2015 6:51 AM, Yann Droneaud wrote:

Le mardi 10 novembre 2015 à 12:44 +0200, Sagi Grimberg a écrit :

Also, set rdma_read_access_flags in the relevant device drivers:
mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE
cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE



Why were those hw providers not modified to
enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to
set it for them ?


Because the Linux verbs don't tell the driver whether the registration
is for an RDMA Read sink buffer.

Sagi, the Windows NDKPI has an NDK_MR_FLAG_RDMA_READ_SINK attribute
which the upper layer can use to convey this information, I've mentioned
it here before.
https://msdn.microsoft.com/en-us/library/windows/hardware/hh439908(v=vs.85).aspx

When this approach is used, the upper layer doesn't have to be aware
at all of the lower layer's details.

Tom.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Sagi Grimberg



Sagi, the Windows NDKPI has an NDK_MR_FLAG_RDMA_READ_SINK attribute
which the upper layer can use to convey this information, I've mentioned
it here before.
https://msdn.microsoft.com/en-us/library/windows/hardware/hh439908(v=vs.85).aspx



Thanks for the tip Tom.



When this approach is used, the upper layer doesn't have to be aware
at all of the lower layer's details.


Yea, we could do that too. Any preferences from other people?
I'm pretty indifferent on which way to go...

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB: start documenting device capabilities

2015-11-10 Thread Sagi Grimberg



+
+   /*
+* This device supports the IB "base memory management extension",
+* which includes support for fast registrations (IB_WR_REG_MR,
+* IB_WR_LOCAL_INV and IB_WR_SEND_WITH_INV verbs).  This flag should
+* also be set by any iWarp device which must support FRs to comply
+* to the iWarp verbs spec.  iWarp devices also support the
+* IB_WR_RDMA_READ_WITH_INV verb for RDMA READs that invalidate the
+* stag.
+*/


Kinda weird that READ_WITH_INV came in without a device cap for it.

Looks good,

Reviewed-by: Sagi Grimberg 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Christoph Hellwig
On Tue, Nov 10, 2015 at 02:42:44PM +0200, Sagi Grimberg wrote:
> >When this approach is used, the upper layer doesn't have to be aware
> >at all of the lower layer's details.
> 
> Yea, we could do that too. Any preferences from other people?
> I'm pretty indifferent on which way to go...

Yes, that's the way to go.  I though we had agree on doing that as
one of the next step of the MR API cleanups, but decided to postpone
it past the first iteration for some reason that made it non-trivial.

I have to say I still don't like the Windows API either, though as it
still keeps the criptic {local,remote}_{read,write} flags.

Basically what we want is two-dimension selection:

/*
 * If %true this is the target of RDMA READ/WRITE operations
 * from the remote system.  If %false this is the sink / source
 * for RDMA READ/WRITE operations issued by the local system.
 */
enum ib_mr_scope { IB_MR_REMOTE, IB_MR_LOCAL };

/*
 * Operation we're registering for.  Can be OR'ed together
 * when allowing RDMA READs and WRITEs on a single MR.
 */
enum ib_mr_dir { IB_MR_RDMA_READ = 1 << 0, IB_MR_RDMA_WRITE = 1 << 1 };

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Christoph Hellwig
FYI, this is the API I'd aim for (only SRP and no HW driver converted
yet):

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 0e21367..7ea695c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1484,14 +1484,15 @@ EXPORT_SYMBOL(ib_check_mr_status);
 int ib_map_mr_sg(struct ib_mr *mr,
 struct scatterlist *sg,
 int sg_nents,
-unsigned int page_size)
+unsigned int page_size,
+unsigned int flags)
 {
if (unlikely(!mr->device->map_mr_sg))
return -ENOSYS;
 
mr->page_size = page_size;
 
-   return mr->device->map_mr_sg(mr, sg, sg_nents);
+   return mr->device->map_mr_sg(mr, sg, sg_nents, flags);
 }
 EXPORT_SYMBOL(ib_map_mr_sg);
 
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 62b6cba..d77a5b4 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1314,7 +1314,6 @@ static int srp_map_finish_fr(struct srp_map_state *state,
struct srp_target_port *target = ch->target;
struct srp_device *dev = target->srp_host->srp_dev;
struct ib_send_wr *bad_wr;
-   struct ib_reg_wr wr;
struct srp_fr_desc *desc;
u32 rkey;
int n, err;
@@ -1342,20 +1341,17 @@ static int srp_map_finish_fr(struct srp_map_state 
*state,
ib_update_fast_reg_key(desc->mr, rkey);
 
n = ib_map_mr_sg(desc->mr, state->sg, state->sg_nents,
-dev->mr_page_size);
+dev->mr_page_size,
+/*
+ * XXX: add a bool write argument to this function,
+ * so that we only need to open up the required
+ * permissions.
+ */
+IB_MR_REMOTE | IB_MR_RDMA_READ | IB_MR_RDMA_WRITE);
if (unlikely(n < 0))
return n;
 
-   wr.wr.next = NULL;
-   wr.wr.opcode = IB_WR_REG_MR;
-   wr.wr.wr_id = FAST_REG_WR_ID_MASK;
-   wr.wr.num_sge = 0;
-   wr.wr.send_flags = 0;
-   wr.mr = desc->mr;
-   wr.key = desc->mr->rkey;
-   wr.access = (IB_ACCESS_LOCAL_WRITE |
-IB_ACCESS_REMOTE_READ |
-IB_ACCESS_REMOTE_WRITE);
+   desc->mr->wr.wr_id = FAST_REG_WR_ID_MASK;
 
*state->fr.next++ = desc;
state->nmdesc++;
@@ -1363,7 +1359,7 @@ static int srp_map_finish_fr(struct srp_map_state *state,
srp_map_desc(state, desc->mr->iova,
 desc->mr->length, desc->mr->rkey);
 
-   err = ib_post_send(ch->qp, &wr.wr, &bad_wr);
+   err = ib_post_send(ch->qp, &desc->mr->wr, &bad_wr);
if (unlikely(err))
return err;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 83d6ee8..b168b3a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1105,18 +1105,6 @@ static inline struct ib_ud_wr *ud_wr(struct ib_send_wr 
*wr)
return container_of(wr, struct ib_ud_wr, wr);
 }
 
-struct ib_reg_wr {
-   struct ib_send_wr   wr;
-   struct ib_mr*mr;
-   u32 key;
-   int access;
-};
-
-static inline struct ib_reg_wr *reg_wr(struct ib_send_wr *wr)
-{
-   return container_of(wr, struct ib_reg_wr, wr);
-}
-
 struct ib_bind_mw_wr {
struct ib_send_wr   wr;
struct ib_mw*mw;
@@ -1314,7 +1302,18 @@ struct ib_qp {
enum ib_qp_type qp_type;
 };
 
+enum ib_mr_flags {
+   /* scope: either remote or local */
+   IB_MR_REMOTE,
+   IB_MR_LOCAL,
+
+   /* direction: one or both can be ORed into the scope above */
+   IB_MR_RDMA_READ = (1 << 10),
+   IB_MR_RDMA_WRITE= (1 << 11)
+};
+
 struct ib_mr {
+   struct ib_send_wr  wr;
struct ib_device  *device;
struct ib_pd  *pd;
struct ib_uobject *uobject;
@@ -1326,6 +1325,11 @@ struct ib_mr {
atomic_t   usecnt; /* count number of MWs */
 };
 
+static inline struct ib_mr *wr_to_mr(struct ib_send_wr *wr)
+{
+   return container_of(wr, struct ib_mr, wr);
+}
+
 struct ib_mw {
struct ib_device*device;
struct ib_pd*pd;
@@ -1706,7 +1710,8 @@ struct ib_device {
   u32 max_num_sg);
int(*map_mr_sg)(struct ib_mr *mr,
struct scatterlist *sg,
-   int sg_nents);
+   int sg_nents,
+   unsigned int flags);
int(*rereg_phys_mr)(struct ib_mr *mr,
int mr_rereg_mask,
struc

[PATCH 1/2] staging/rdma/hfi1: add common routine for queuing acks

2015-11-10 Thread Mike Marciniszyn
This patch is a prelimary patch required to
coalesce acks.

The routine to "schedule" a QP for sending a NAK is
now centralized in rc_defer_ack().  The flag is changed
for clarity since the all acks will potentially use
the deferral  mechanism.

Reviewed-by: Dennis Dalessandro 
Signed-off-by: Mike Marciniszyn 
---
 drivers/staging/rdma/hfi1/driver.c |4 ++-
 drivers/staging/rdma/hfi1/rc.c |   42 +---
 drivers/staging/rdma/hfi1/verbs.h  |   12 ++
 3 files changed, 24 insertions(+), 34 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/driver.c 
b/drivers/staging/rdma/hfi1/driver.c
index ce69141..e8e0f55 100644
--- a/drivers/staging/rdma/hfi1/driver.c
+++ b/drivers/staging/rdma/hfi1/driver.c
@@ -721,8 +721,8 @@ static inline void process_rcv_qp_work(struct hfi1_packet 
*packet)
 */
list_for_each_entry_safe(qp, nqp, &rcd->qp_wait_list, rspwait) {
list_del_init(&qp->rspwait);
-   if (qp->r_flags & HFI1_R_RSP_NAK) {
-   qp->r_flags &= ~HFI1_R_RSP_NAK;
+   if (qp->r_flags & HFI1_R_RSP_DEFERED_ACK) {
+   qp->r_flags &= ~HFI1_R_RSP_DEFERED_ACK;
hfi1_send_rc_ack(rcd, qp, 0);
}
if (qp->r_flags & HFI1_R_RSP_SEND) {
diff --git a/drivers/staging/rdma/hfi1/rc.c b/drivers/staging/rdma/hfi1/rc.c
index 0b19206..6fe0104 100644
--- a/drivers/staging/rdma/hfi1/rc.c
+++ b/drivers/staging/rdma/hfi1/rc.c
@@ -1608,6 +1608,16 @@ bail:
return;
 }
 
+static inline void rc_defered_ack(struct hfi1_ctxtdata *rcd,
+ struct hfi1_qp *qp)
+{
+   if (list_empty(&qp->rspwait)) {
+   qp->r_flags |= HFI1_R_RSP_DEFERED_ACK;
+   atomic_inc(&qp->refcount);
+   list_add_tail(&qp->rspwait, &rcd->qp_wait_list);
+   }
+}
+
 /**
  * rc_rcv_error - process an incoming duplicate or error RC packet
  * @ohdr: the other headers for this packet
@@ -1650,11 +1660,7 @@ static noinline int rc_rcv_error(struct 
hfi1_other_headers *ohdr, void *data,
 * in the receive queue have been processed.
 * Otherwise, we end up propagating congestion.
 */
-   if (list_empty(&qp->rspwait)) {
-   qp->r_flags |= HFI1_R_RSP_NAK;
-   atomic_inc(&qp->refcount);
-   list_add_tail(&qp->rspwait, &rcd->qp_wait_list);
-   }
+   rc_defered_ack(rcd, qp);
}
goto done;
}
@@ -2337,11 +2343,7 @@ rnr_nak:
qp->r_nak_state = IB_RNR_NAK | qp->r_min_rnr_timer;
qp->r_ack_psn = qp->r_psn;
/* Queue RNR NAK for later */
-   if (list_empty(&qp->rspwait)) {
-   qp->r_flags |= HFI1_R_RSP_NAK;
-   atomic_inc(&qp->refcount);
-   list_add_tail(&qp->rspwait, &rcd->qp_wait_list);
-   }
+   rc_defered_ack(rcd, qp);
return;
 
 nack_op_err:
@@ -2349,11 +2351,7 @@ nack_op_err:
qp->r_nak_state = IB_NAK_REMOTE_OPERATIONAL_ERROR;
qp->r_ack_psn = qp->r_psn;
/* Queue NAK for later */
-   if (list_empty(&qp->rspwait)) {
-   qp->r_flags |= HFI1_R_RSP_NAK;
-   atomic_inc(&qp->refcount);
-   list_add_tail(&qp->rspwait, &rcd->qp_wait_list);
-   }
+   rc_defered_ack(rcd, qp);
return;
 
 nack_inv_unlck:
@@ -2363,11 +2361,7 @@ nack_inv:
qp->r_nak_state = IB_NAK_INVALID_REQUEST;
qp->r_ack_psn = qp->r_psn;
/* Queue NAK for later */
-   if (list_empty(&qp->rspwait)) {
-   qp->r_flags |= HFI1_R_RSP_NAK;
-   atomic_inc(&qp->refcount);
-   list_add_tail(&qp->rspwait, &rcd->qp_wait_list);
-   }
+   rc_defered_ack(rcd, qp);
return;
 
 nack_acc_unlck:
@@ -2421,13 +2415,7 @@ void hfi1_rc_hdrerr(
 * Otherwise, we end up
 * propagating congestion.
 */
-   if (list_empty(&qp->rspwait)) {
-   qp->r_flags |= HFI1_R_RSP_NAK;
-   atomic_inc(&qp->refcount);
-   list_add_tail(
-   &qp->rspwait,
-   &rcd->qp_wait_list);
-   }
+   rc_defered_ack(rcd, qp);
} /* Out of sequence NAK */
} /* QP Request NAKs */
 }
diff --git a/drivers/staging/rdma/hfi1/verbs.h 
b/drivers/staging/rdma/hfi1/verbs.h
index e4a8a0d..c5e6f47 100644
--- a/drivers/staging/rdma/hfi1/verbs.h
+++ b/drivers/staging/rdma/hfi1/verbs.h
@@ -547,11 +547,13 @@ struct hfi1_qp {
 /*
  * Bit definitions for r_flags.
  */
-#define HFI1_R_REUSE_SGE 0x01
-#define HFI1_R_RDMAR_SEQ 0x02
-#define HFI1_R_RSP_NAK

[PATCH 2/2] staging/rdma/hfi1: add ACK coalescing logic

2015-11-10 Thread Mike Marciniszyn
Implement ACK coalesing logic using a 8 bit counter.

The algorithm is send pio ack when:
- fecn present
- this is the first packet in an interrupt session
- counter is >= HFI1_PSN_CREDIT

Otherwise the ack is defered.

Reviewed-by: Dennis Dalessandro 
Signed-off-by: Mike Marciniszyn 
---
 drivers/staging/rdma/hfi1/qp.c|1 +
 drivers/staging/rdma/hfi1/rc.c|   29 +++--
 drivers/staging/rdma/hfi1/verbs.h |7 ---
 3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/qp.c b/drivers/staging/rdma/hfi1/qp.c
index df1fa56..8f867ba 100644
--- a/drivers/staging/rdma/hfi1/qp.c
+++ b/drivers/staging/rdma/hfi1/qp.c
@@ -378,6 +378,7 @@ static void reset_qp(struct hfi1_qp *qp, enum ib_qp_type 
type)
}
qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;
qp->r_nak_state = 0;
+   qp->r_adefered = 0;
qp->r_aflags = 0;
qp->r_flags = 0;
qp->s_head = 0;
diff --git a/drivers/staging/rdma/hfi1/rc.c b/drivers/staging/rdma/hfi1/rc.c
index 6fe0104..0193497 100644
--- a/drivers/staging/rdma/hfi1/rc.c
+++ b/drivers/staging/rdma/hfi1/rc.c
@@ -1618,6 +1618,17 @@ static inline void rc_defered_ack(struct hfi1_ctxtdata 
*rcd,
}
 }
 
+static inline void rc_cancel_ack(struct hfi1_qp *qp)
+{
+   qp->r_adefered = 0;
+   if (list_empty(&qp->rspwait))
+   return;
+   list_del_init(&qp->rspwait);
+   qp->r_flags &= ~HFI1_R_RSP_DEFERED_ACK;
+   if (atomic_dec_and_test(&qp->refcount))
+   wake_up(&qp->wait);
+}
+
 /**
  * rc_rcv_error - process an incoming duplicate or error RC packet
  * @ohdr: the other headers for this packet
@@ -2335,8 +2346,22 @@ send_last:
qp->r_ack_psn = psn;
qp->r_nak_state = 0;
/* Send an ACK if requested or required. */
-   if (psn & (1 << 31))
-   goto send_ack;
+   if (psn & IB_BTH_REQ_ACK) {
+   if (packet->numpkt == 0) {
+   rc_cancel_ack(qp);
+   goto send_ack;
+   }
+   if (qp->r_adefered >= HFI1_PSN_CREDIT) {
+   rc_cancel_ack(qp);
+   goto send_ack;
+   }
+   if (unlikely(is_fecn)) {
+   rc_cancel_ack(qp);
+   goto send_ack;
+   }
+   qp->r_adefered++;
+   rc_defered_ack(rcd, qp);
+   }
return;
 
 rnr_nak:
diff --git a/drivers/staging/rdma/hfi1/verbs.h 
b/drivers/staging/rdma/hfi1/verbs.h
index c5e6f47..6d2012b 100644
--- a/drivers/staging/rdma/hfi1/verbs.h
+++ b/drivers/staging/rdma/hfi1/verbs.h
@@ -120,9 +120,9 @@ struct hfi1_packet;
 
 #define HFI1_VENDOR_IPGcpu_to_be16(0xFFA0)
 
-#define IB_BTH_REQ_ACK (1 << 31)
-#define IB_BTH_SOLICITED   (1 << 23)
-#define IB_BTH_MIG_REQ (1 << 22)
+#define IB_BTH_REQ_ACK BIT(31)
+#define IB_BTH_SOLICITED   BIT(23)
+#define IB_BTH_MIG_REQ BIT(22)
 
 #define IB_GRH_VERSION 6
 #define IB_GRH_VERSION_MASK0xF
@@ -484,6 +484,7 @@ struct hfi1_qp {
u32 r_psn;  /* expected rcv packet sequence number */
u32 r_msn;  /* message sequence number */
 
+   u8 r_adefered; /* number of acks defered */
u8 r_state; /* opcode of last packet received */
u8 r_flags;
u8 r_head_ack_queue;/* index into s_ack_queue[] */

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Chuck Lever

> On Nov 10, 2015, at 7:04 AM, Christoph Hellwig  wrote:
> 
>> On Tue, Nov 10, 2015 at 01:46:40PM +0200, Sagi Grimberg wrote:
>> 
>> 
>>> On 10/11/2015 13:41, Christoph Hellwig wrote:
>>> Oh, and while we're at it.  Can someone explain why we're even
>>> using rdma_read_chunk_frmr for IB?  It seems to work around the
>>> fact tat iWarp only allow a single RDMA READ SGE, but it's used
>>> whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
>>> wrong.
>> 
>> I think Steve can answer it better than I can. I think that it is
>> just to have a single code path for both IB and iWARP. I agree that
>> the condition seems wrong and for small transfers rdma_read_chunk_frmr
>> is really a performance loss.
> 
> Well, the code path already exists, but only is used fi
> IB_DEVICE_MEM_MGT_EXTENSIONS isn't set.  Below is an untested patch
> that demonstrates how I think svcrdma should setup the reads.  Note
> that this also allows to entirely remove it's allphys MR.

Two comments:

1. RFCs and changes in sunrpc must be copied to linux-nfs.

2. Is there a reason IB is not using the allphys MR for RDMA Read?
That would be much more efficient. On the NFS server that MR isn't
exposed, thus it is safe to use.


> Note that as a followon this would also allow to remove the
> non-READ_W_INV code path from rdma_read_chunk_frmr as a future
> step.
> 
> 
> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> index f869807..35fa638 100644
> --- a/include/linux/sunrpc/svc_rdma.h
> +++ b/include/linux/sunrpc/svc_rdma.h
> @@ -147,7 +147,6 @@ struct svcxprt_rdma {
>struct ib_qp *sc_qp;
>struct ib_cq *sc_rq_cq;
>struct ib_cq *sc_sq_cq;
> -struct ib_mr *sc_phys_mr;/* MR for server memory */
>int (*sc_reader)(struct svcxprt_rdma *,
>  struct svc_rqst *,
>  struct svc_rdma_op_ctxt *,
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c 
> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index ff4f01e..a410cb6 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -160,7 +160,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
>goto err;
>atomic_inc(&xprt->sc_dma_used);
> 
> -/* The lkey here is either a local dma lkey or a dma_mr lkey */
> +/* The lkey here is the local dma lkey */
>ctxt->sge[pno].lkey = xprt->sc_dma_lkey;
>ctxt->sge[pno].length = len;
>ctxt->count++;
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
> b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 9f3eb89..2780da4 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -887,8 +887,6 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt 
> *xprt)
>struct ib_cq_init_attr cq_attr = {};
>struct ib_qp_init_attr qp_attr;
>struct ib_device *dev;
> -int uninitialized_var(dma_mr_acc);
> -int need_dma_mr = 0;
>int ret = 0;
>int i;
> 
> @@ -986,68 +984,41 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt 
> *xprt)
>}
>newxprt->sc_qp = newxprt->sc_cm_id->qp;
> 
> -/*
> - * Use the most secure set of MR resources based on the
> - * transport type and available memory management features in
> - * the device. Here's the table implemented below:
> - *
> - *FastGlobalDMARemote WR
> - *RegLKEYMRAccess
> - *Sup'dSup'dNeededNeeded
> - *
> - * IWARPNNYY
> - *NYYY
> - *YNYN
> - *YYN-
> - *
> - * IBNNYN
> - *NYN-
> - *YNYN
> - *YYN-
> - *
> - * NB:iWARP requires remote write access for the data sink
> - *of an RDMA_READ. IB does not.
> - */
> -newxprt->sc_reader = rdma_read_chunk_lcl;
> -if (dev->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
> -newxprt->sc_frmr_pg_list_len =
> -dev->max_fast_reg_page_list_len;
> -newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_FAST_REG;
> -newxprt->sc_reader = rdma_read_chunk_frmr;
> -}
> +if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num)) {
> +/*
> + * iWARP requires remote write access for the data sink, and
> + * only supports a single SGE for RDMA_READ requests, so we'll
> + * have to use a memory registration for each RDMA_READ.
> + */
> +if (!(dev->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS)) {
> +/*
> + * We're an iWarp device but don't support FRs.
> + * Tought luck, better exit early as we have little
> + * chance of doing something useful.
> + */
> +goto errout;
> +}
> 
> -/*
> -

Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Tom Talpey

On 11/10/2015 9:34 AM, Chuck Lever wrote:

2. Is there a reason IB is not using the allphys MR for RDMA Read?
That would be much more efficient. On the NFS server that MR isn't
exposed, thus it is safe to use.


True, but only if the layout of the buffer's physical pages do not
exceed the local RDMA Read scatter limit. If the size is large then
the privileged MR will require additional RDMA Read operations, which
can be more expensive. It's certainly a valid optimization, if the
buffer "fits".

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Eli Cohen
On Mon, Nov 09, 2015 at 04:30:59PM -0700, Jason Gunthorpe wrote:
> On Tue, Nov 10, 2015 at 01:24:31AM +0200, Eli Cohen wrote:
> > > Also, when the driver tests the ex flags for support it should be
> > > returning EOPNOTSUPP or such not EINVAL.. Return codes for the ex
> > > stuff could stand a good sanity audit.
> > > 
> > 
> > #define EOPNOTSUPP  95  /* Operation not supported on
> > transport endpoint */
> > 
> > This does not seem like an ideal choise either. I think ENOSYS in this
> > case is a better choise.
> 
> Call it ENOTSUP then:
> 
>ENOTSUP Operation not supported (POSIX.1)
> 
> Same value on Linux.
> 

Yes, that's better. I will resend.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Eli Cohen
On Tue, Nov 10, 2015 at 05:23:10PM +0200, Eli Cohen wrote:
> > 
> > Call it ENOTSUP then:
> > 
> >ENOTSUP Operation not supported (POSIX.1)
> > 
> > Same value on Linux.
> > 
> 
> Yes, that's better. I will resend.


Well it seems like ENOTSUP is defined only here:

./arch/parisc/include/uapi/asm/errno.h:115:#define ENOTSUP 252 /* Function 
not implemented (POSIX.4 / HPUX) */

Which obviously I cannot use. Jason, do you have another idea?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Steve Wise

On 11/10/2015 5:46 AM, Sagi Grimberg wrote:



On 10/11/2015 13:41, Christoph Hellwig wrote:

Oh, and while we're at it.  Can someone explain why we're even
using rdma_read_chunk_frmr for IB?  It seems to work around the
fact tat iWarp only allow a single RDMA READ SGE, but it's used
whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
wrong.


I think Steve can answer it better than I can. I think that it is
just to have a single code path for both IB and iWARP. I agree that
the condition seems wrong and for small transfers rdma_read_chunk_frmr
is really a performance loss.


This was probably just an oversight/mistake.  The focus was on enabling 
iWARP at the time.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags

2015-11-10 Thread Steve Wise

On 11/10/2015 5:31 AM, Christoph Hellwig wrote:

On Tue, Nov 10, 2015 at 12:44:12PM +0200, Sagi Grimberg wrote:

Instead of each ULP being aware of iWARP/IB protocol in order
to determine the rdma_read access flags, have it accessible
as an attribute in the ib_device.

Patch 2,3 fixes RDS and svcrdma which gave remote access to rdma_reads
unconditionally.

This patchset goes on top of Christoph's device attributes merge into
struct ib_device.

FYI, I've updated the git branch to be based on current linus' tree
which required a few bit to be fixed.  I'd also like to note that while
everyone but Or seemed to be generally fine with it I'd really prefer
and actualy revivewed-by or acked-by tag.
--


Acked-by: Steve Wise 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Chuck Lever

> On Nov 10, 2015, at 6:38 AM, Christoph Hellwig  wrote:
> 
> On Tue, Nov 10, 2015 at 12:44:14PM +0200, Sagi Grimberg wrote:
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -238,7 +238,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
>>  read = min_t(int, nents << PAGE_SHIFT, rs_length);
>> 
>>  frmr->direction = DMA_FROM_DEVICE;
>> -frmr->access_flags = (IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE);
>> +frmr->access_flags = xprt->sc_cm_id->device->rdma_read_access_flags;
>>  frmr->sg_nents = nents;
> 
> I think you can simply drop the access_flags member and use
> rdma_read_access_flags directly later in the function.

> Oh, and while we're at it.  Can someone explain why we're even
> using rdma_read_chunk_frmr for IB?  It seems to work around the
> fact tat iWarp only allow a single RDMA READ SGE, but it's used
> whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
> wrong.

Christoph: I agree on both points.

Sagi: the overall approach looks fine to me. Will wait for you
to repost with Christoph’s first suggestion addressed.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


srp state in current mainline

2015-11-10 Thread Christoph Hellwig
I've just tried forward porting some work affecting SRP from a 4.1-ish
base, and started to run into error ASAP on current Linus' HEAD and also
4.3.  In current HEAD memory registrations on the client seem to fail,
probably due to the MR rework, but even on 4.3 I run into crazy
corruption reports from xfstests, which mostly seem to be slab
poisoning.  I'm not sure at this point if they are caused by the
target or initiator, but I'd like to share them.  This is a simply
xfstests run using XFS on a remote LIO ramdisk.

4.3 (actually -rc, but I didn't see any change since):

[   86.316719] run fstests generic/018 at 2015-11-10 08:52:56
[   86.558749] XFS (sdb): Mounting V4 Filesystem
[   86.798915] XFS (sdb): Ending clean mount
[   86.887999] XFS (sdc): Mounting V4 Filesystem
[   86.894340] XFS (sdc): Ending clean mount
[   86.980603] XFS (sdc): Unmounting Filesystem
[   86.992347] XFS (sdc): Mounting V4 Filesystem
[   86.999572] XFS (sdc): Ending clean mount
[   87.052941] XFS (sdc): Unmounting Filesystem
[   87.065080] XFS (sdc): Mounting V4 Filesystem
[   87.070402] XFS (sdc): Ending clean mount
[   87.124831] XFS (sdc): Unmounting Filesystem
[   87.136828] XFS (sdc): Mounting V4 Filesystem
[   87.144746] XFS (sdc): Ending clean mount
[   87.157052] XFS (sdc): Metadata corruption detected at 
xfs_agi_read_verify+0x4a/0xe0 [xfs], block 0x2
[   87.166374] XFS (sdc): Unmount and run xfs_repair
[   87.171161] XFS (sdc): First 64 bytes of corrupted metadata buffer:
[   87.177515] 880314c2ce00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
6b  
[   87.186312] 880314c2ce10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
6b  
[   87.195107] 880314c2ce20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
6b  
[   87.203902] 880314c2ce30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
6b  
[   87.212716] XFS (sdc): metadata I/O error: block 0x2
("xfs_trans_read_buf_map") error 117 numblks 1
[   87.221816] XFS (sdc): xfs_do_force_shutdown(0x1) called from line
315 of file fs/xfs/xfs_trans_buf.c.  Return address = 0xa067138c
   87.221828] XFS (sdc): I/O Error Detected. Shutting down filesystem
   [   87.228132] XFS (sdc): Please umount the filesystem and rectify
   the problem(s)
   [   87.328890] XFS (sdc): xfs_log_force: error -5 returned.
   [   87.328897] XFS (sdc): Unmounting Filesystem
   [   87.328906] XFS (sdc): xfs_log_force: error -5 returned.
   [   87.328921] XFS (sdc): xfs_log_force: error -5 returned.
   [   87.432013] run fstests generic/020 at 2015-11-10 08:52:57

and then a couple times more until xfstests eventually gives up.

On current Linus' HEAD tree:

[  128.786534] run fstests generic/001 at 2015-11-10 09:05:06
[  132.914320] XFS (sdb): Unmounting Filesystem
[  132.914463] [ cut here ]
[  132.914474] WARNING: CPU: 3 PID: 1795 at 
drivers/infiniband/ulp/srp/ib_srp.c:1262 srp_map_desc+0x64/ 0x80 [ib_srp]()
[  132.914476] Modules linked in: xfs libcrc32c ib_srp(O) scsi_transport_srp 
nfsd auth_rpcgss oid_regis
try nfs_acl nfs lockd grace fscache sunrpc intel_powerclamp coretemp kvm_intel 
kvm irqbypass crct10dif_
pclmul crc32_pclmul sha256_ssse3 sha256_generic hmac drbg mgag200 ttm 
ansi_cprng drm_kms_helper aesni_i
ntel drm aes_x86_64 lrw gf128mul snd_pcm glue_helper ablk_helper snd_timer 
evdev cryptd ipmi_devintf i7
core_edac shpchp iTCO_wdt iTCO_vendor_support psmouse snd soundcore 
i2c_algo_bit i2c_core lpc_ich serio
_raw edac_core dcdbas acpi_power_meter pcspkr mfd_core acpi_cpufreq button 
tpm_tis ipmi_si ipmi_msghand
ler tpm ib_ipoib ib_umad rdma_ucm rdma_cm iw_cm ib_uverbs ib_cm mlx4_ib ib_sa 
ib_mad ib_core ib_addr au
tofs4 ext4 crc16 mbcache jbd2 sd_mod sg sr_mod cdrom ata_generic crc32c_intel 
ehci_pci uhci_hcd
[  132.914541]  ehci_hcd mptsas ata_piix scsi_transport_sas mptscsih
libata mptbase mlx4_core usbcore s
csi_mod usb_common bnx2
[  132.914552] CPU: 3 PID: 1795 Comm: xfsaild/sdb Tainted: G  IO 4.3.0+ 
#28
[  132.914554] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS 3.0.0 
01/31/2011
[  132.914556]  a05aa460 812c9453  
8106ef51
[  132.914559]  880313acfa60 880313acfa30 0006153efe80 
0001
[  132.914562]  880620a310c0 a05a3a14 08012207 
880313acfa60
[  132.914566] Call Trace:
[  132.914572]  [] ? dump_stack+0x40/0x5d
[  132.914578]  [] ? warn_slowpath_common+0x81/0xb0
[  132.914582]  [] ? srp_map_desc+0x64/0x80 [ib_srp]
[  132.914585]  [] ? srp_map_finish_fr+0x159/0x1f0 [ib_srp]
[  132.914589]  [] ? srp_map_idb.isra.39+0xf1/0x150 [ib_srp]
[  132.914593]  [] ? srp_queuecommand+0xc21/0xc70 [ib_srp]
[  132.914600]  [] ? scsi_init_sgtable+0x3f/0x70 [scsi_mod]
[  132.914607]  [] ? scsi_dispatch_cmd+0xd8/0x1f0 [scsi_mod]
[  132.914613]  [] ? scsi_request_fn+0x46a/0x600 [scsi_mod]
[  132.914619]  [] ? __blk_run_queue+0x2f/0x40
[  132.914624]  [] ? cfq_insert_request+0x2f3/0x530
[  13

Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 05:40:26PM +0200, Eli Cohen wrote:
> On Tue, Nov 10, 2015 at 05:23:10PM +0200, Eli Cohen wrote:
> > > 
> > > Call it ENOTSUP then:
> > > 
> > >ENOTSUP Operation not supported (POSIX.1)
> > > 
> > > Same value on Linux.
> > 
> > Yes, that's better. I will resend.
> 
> 
> Well it seems like ENOTSUP is defined only here:
> 
> ./arch/parisc/include/uapi/asm/errno.h:115:#define ENOTSUP 252 /* 
> Function not implemented (POSIX.4 / HPUX) */

> Which obviously I cannot use. Jason, do you have another idea?

I would stick with EOPNOTSUPP in the kernel and userspace can view
that as ENOTSUP. My remark was more along the lines that EOPNOTSUPP is
not socket specific because it encompasses ENOTSUP as well on Linux,
due to having the same value.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 12:25:41PM +0200, Matan Barak wrote:
> The kernel will do the above fallback if the command is a legacy
> command wrapped in an extended command (i.e - no extra flags).
> If an application uses one of the extended values, and fall back to
> legacy verb on ENOSYS, it'll behave differently after this change:
> Re-posting this example:

Again, that doesn't seem like a likely case today, the extended verbs
added so far don't strike me as optional, and even so, if someone is
coding that kind of fall back it is incorrect to just look for ENOSYS,
trying to turn on an optional feature could fail for many different
reasons.

If someone is actually doing this, then we can work around it, but
the kernel seems to be a bit more fluid on error returns - probably
because people get them so wrong so often :(

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 ib-next 0/3] IB core uverbs support for leagacy commands

2015-11-10 Thread Eli Cohen
Hi Doug,

this patcheset is addresses comments from the community. Specifically if the 
verbs is not supported by a hardware driver, we return -EOPNOTSUPP.

Eli

Eli Cohen (3):
  IB/core: Avoid duplicate code
  IB/core: IB/core: Allow legacy verbs through extended interfaces
  IB/core: Modify conditional on ucontext existence

 drivers/infiniband/core/uverbs_main.c | 70 +--
 1 file changed, 34 insertions(+), 36 deletions(-)

-- 
2.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 ib-next 3/3] IB/core: Modify conditional on ucontext existence

2015-11-10 Thread Eli Cohen
Since we allow to call legacy verbs using their extended counterpart,
the check on ucontext has to move up to a common area in case this verb
is ever extended.

Signed-off-by: Eli Cohen 
---
 drivers/infiniband/core/uverbs_main.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c 
b/drivers/infiniband/core/uverbs_main.c
index ecb907385b85..5a0416050074 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -724,6 +724,12 @@ static ssize_t ib_uverbs_write(struct file *filp, const 
char __user *buf,
goto out;
}
 
+   if (!file->ucontext &&
+   command != IB_USER_VERBS_CMD_GET_CONTEXT) {
+   ret = -EINVAL;
+   goto out;
+   }
+
flags = (hdr.command &
 IB_USER_VERBS_CMD_FLAGS_MASK) >> IB_USER_VERBS_CMD_FLAGS_SHIFT;
 
@@ -734,12 +740,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const 
char __user *buf,
goto out;
}
 
-   if (!file->ucontext &&
-   command != IB_USER_VERBS_CMD_GET_CONTEXT) {
-   ret = -EINVAL;
-   goto out;
-   }
-
if (hdr.in_words * 4 != count) {
ret = -EINVAL;
goto out;
-- 
2.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Eli Cohen
When an extended verbs is an extension to a legacy verb, the original
functionality is preserved. Hence we do not require each hardware driver
to set the extended capability. This will allow to use the extended verb
in its simple form with drivers that do not support the extended
capability.

Signed-off-by: Eli Cohen 
---

Changes from previous version:
If verify_command_mask fails return -EOPNOTSUPP.

 drivers/infiniband/core/uverbs_main.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c 
b/drivers/infiniband/core/uverbs_main.c
index e93ba9125198..ecb907385b85 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -672,6 +672,21 @@ out:
return ev_file;
 }
 
+static int verify_command_mask(struct ib_device *ib_dev, __u32 command)
+{
+   u64 mask;
+
+   if (command <= IB_USER_VERBS_CMD_OPEN_QP)
+   mask = ib_dev->uverbs_cmd_mask;
+   else
+   mask = ib_dev->uverbs_ex_cmd_mask;
+
+   if (mask & ((u64)1 << command))
+   return 0;
+
+   return -1;
+}
+
 static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 size_t count, loff_t *pos)
 {
@@ -704,6 +719,10 @@ static ssize_t ib_uverbs_write(struct file *filp, const 
char __user *buf,
}
 
command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
+   if (verify_command_mask(ib_dev, command)) {
+   ret = -EOPNOTSUPP;
+   goto out;
+   }
 
flags = (hdr.command &
 IB_USER_VERBS_CMD_FLAGS_MASK) >> IB_USER_VERBS_CMD_FLAGS_SHIFT;
@@ -721,11 +740,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const 
char __user *buf,
goto out;
}
 
-   if (!(ib_dev->uverbs_cmd_mask & (1ull << command))) {
-   ret = -ENOSYS;
-   goto out;
-   }
-
if (hdr.in_words * 4 != count) {
ret = -EINVAL;
goto out;
@@ -753,11 +767,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const 
char __user *buf,
goto out;
}
 
-   if (!(ib_dev->uverbs_ex_cmd_mask & (1ull << command))) {
-   ret = -ENOSYS;
-   goto out;
-   }
-
if (count < (sizeof(hdr) + sizeof(ex_hdr))) {
ret = -EINVAL;
goto out;
-- 
2.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 ib-next 1/3] IB/core: Avoid duplicate code

2015-11-10 Thread Eli Cohen
Move the check on the validity of the command to a common area.

Signed-off-by: Eli Cohen 
---
 drivers/infiniband/core/uverbs_main.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c 
b/drivers/infiniband/core/uverbs_main.c
index e3ef28861be6..e93ba9125198 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -678,6 +678,7 @@ static ssize_t ib_uverbs_write(struct file *filp, const 
char __user *buf,
struct ib_uverbs_file *file = filp->private_data;
struct ib_device *ib_dev;
struct ib_uverbs_cmd_hdr hdr;
+   __u32 command;
__u32 flags;
int srcu_key;
ssize_t ret;
@@ -696,20 +697,18 @@ static ssize_t ib_uverbs_write(struct file *filp, const 
char __user *buf,
goto out;
}
 
+   if (hdr.command & ~(__u32)(IB_USER_VERBS_CMD_FLAGS_MASK |
+  IB_USER_VERBS_CMD_COMMAND_MASK)) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
+
flags = (hdr.command &
 IB_USER_VERBS_CMD_FLAGS_MASK) >> IB_USER_VERBS_CMD_FLAGS_SHIFT;
 
if (!flags) {
-   __u32 command;
-
-   if (hdr.command & ~(__u32)(IB_USER_VERBS_CMD_FLAGS_MASK |
-  IB_USER_VERBS_CMD_COMMAND_MASK)) {
-   ret = -EINVAL;
-   goto out;
-   }
-
-   command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
-
if (command >= ARRAY_SIZE(uverbs_cmd_table) ||
!uverbs_cmd_table[command]) {
ret = -EINVAL;
@@ -738,21 +737,11 @@ static ssize_t ib_uverbs_write(struct file *filp, const 
char __user *buf,
 hdr.out_words * 4);
 
} else if (flags == IB_USER_VERBS_CMD_FLAG_EXTENDED) {
-   __u32 command;
-
struct ib_uverbs_ex_cmd_hdr ex_hdr;
struct ib_udata ucore;
struct ib_udata uhw;
size_t written_count = count;
 
-   if (hdr.command & ~(__u32)(IB_USER_VERBS_CMD_FLAGS_MASK |
-  IB_USER_VERBS_CMD_COMMAND_MASK)) {
-   ret = -EINVAL;
-   goto out;
-   }
-
-   command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
-
if (command >= ARRAY_SIZE(uverbs_ex_cmd_table) ||
!uverbs_ex_cmd_table[command]) {
ret = -ENOSYS;
-- 
2.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 12:44:13PM +0200, Sagi Grimberg wrote:
> Different devices may require different access permissions
> for rdma reads e.g. for Infiniband devices, local write access
> suffices while iWARP devices require remote write permissions as
> well.
> 
> This situation generates extra logic for ULPs that need to be aware
> of the underlying device to determine rdma read access. Instead,
> add a device attribute for ULPs to use without being aware of the
> underlying device.
> 
> Also, set rdma_read_access_flags in the relevant device drivers:
> mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE
> cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE

IB devices *should* be using a local_dma_lkey instead of a memory
registration for the RDMA READ target memory.

iWarp devices *must* use a memory registration instead.

Having an API that encourages ULPs to do useless MR work on IB does
not seem like a great idea, but if the ULP needs to create a MR anyhow
(SG limits or whatever), then it makes some sense.. But not in absence
of the 'need a mr' general check..

Finally, please don't add rdma_read_access_flags - we went to a lot of
trouble to add the cap stuff, and this stuff is already covered by it.
No need to change every driver.

I'd suggest something like

  unsigned int rdma_cap_rdma_read_mr_flags(const struct ib_pd *pd)
  {
 if (rdma_protocol_iwarp(pd->device, rdma_start_port(pd->device)))
   return IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
 return IB_ACCESS_LOCAL_WRITE;
  }

  bool rdma_cap_need_rdma_read_mr(const struct ib_pd *pd) ...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 12:51:10PM +0100, Yann Droneaud wrote:
> Why were those hw providers not modified to
> enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to
> set it for them ?

iWarp hardware simply cannot do it it all.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 08:00:09PM +0200, Eli Cohen wrote:
> When an extended verbs is an extension to a legacy verb, the original
> functionality is preserved. Hence we do not require each hardware driver
> to set the extended capability. This will allow to use the extended verb
> in its simple form with drivers that do not support the extended
> capability.

Can we please get rid of uverbs_cmd_mask and uverbs_ex_cmd_mask ?

The driver should set the function pointers it does not support to
NULL. We don't need the bitmask because we already know the answer.

For performance, I recommend having the core replace the NULL op
pointers from the driver with a stub that returns ENOTSUP, and then
drop all the checking in the fast path.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB: start documenting device capabilities

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 01:19:43PM +0100, Christoph Hellwig wrote:
> Just IB_DEVICE_LOCAL_DMA_LKEY and IB_DEVICE_MEM_MGT_EXTENSIONS for now
> as I'm most familar with those.
> 
> Signed-off-by: Christoph Hellwig 

Looks right to me

Reviewed-By: Jason Gunthorpe 

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 04:04:32AM -0800, Christoph Hellwig wrote:
> On Tue, Nov 10, 2015 at 01:46:40PM +0200, Sagi Grimberg wrote:
> > 
> > 
> > On 10/11/2015 13:41, Christoph Hellwig wrote:
> > >Oh, and while we're at it.  Can someone explain why we're even
> > >using rdma_read_chunk_frmr for IB?  It seems to work around the
> > >fact tat iWarp only allow a single RDMA READ SGE, but it's used
> > >whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
> > >wrong.
> > 
> > I think Steve can answer it better than I can. I think that it is
> > just to have a single code path for both IB and iWARP. I agree that
> > the condition seems wrong and for small transfers rdma_read_chunk_frmr
> > is really a performance loss.
> 
> Well, the code path already exists, but only is used fi
> IB_DEVICE_MEM_MGT_EXTENSIONS isn't set.  Below is an untested patch
> that demonstrates how I think svcrdma should setup the reads.  Note
> that this also allows to entirely remove it's allphys MR.
> 
> Note that as a followon this would also allow to remove the
> non-READ_W_INV code path from rdma_read_chunk_frmr as a future
> step.

I like this, my only comment is we should have a rdma_cap for this
behavior, rdma_cap_needs_rdma_read_mr(pd) or something?

> + if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num)) {

Use here

> + /*
> +  * iWARP requires remote write access for the data sink, and
> +  * only supports a single SGE for RDMA_READ requests, so we'll
> +  * have to use a memory registration for each RDMA_READ.
> +  */
> + if (!(dev->device_cap_flags &
> IB_DEVICE_MEM_MGT_EXTENSIONS)) {

Lets enforce this in the core, if rdma_cap_needs_rdma_read_mr is set
the the device must also set IB_DEVICE_MEM_MGT_EXTENSIONS, check at
device creation time.

> + } else if (rdma_ib_or_roce(dev, newxprt->sc_cm_id->port_num)) {
> + /*
> +  * For IB or RoCE life is easy, no unsafe write access is
> +  * required and multiple SGEs are supported, so we don't need
> +  * to use MRs.
> +  */
> + newxprt->sc_reader = rdma_read_chunk_lcl;
> + } else {
> + /*
> +  * Neither iWarp nor IB-ish, we're out of luck.
> +  */
>   goto errout;

No need for the else, !rdma_cap_needs_rdma_read_mr means pd->local_dma_lkey is 
okay
to use.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 05:41:47AM -0800, Christoph Hellwig wrote:
> FYI, this is the API I'd aim for (only SRP and no HW driver converted
> yet):

>   n = ib_map_mr_sg(desc->mr, state->sg, state->sg_nents,
> -  dev->mr_page_size);
> +  dev->mr_page_size,
> +  /*
> +   * XXX: add a bool write argument to this function,
> +   * so that we only need to open up the required
> +   * permissions.
> +   */
> +  IB_MR_REMOTE | IB_MR_RDMA_READ |
> IB_MR_RDMA_WRITE);

I would call it IB_RDMA_LKEY and IB_RDMA_RKEY. We have other places in
the API where lkey/rkey is used and it makes a lot more sense to think
about a MR as being either a lkey or rkey usable MR - since this is
effectively what we are doing here with these ops.

IB_MR_RKEY | IB_MR_RDMA_READ == The resulting key can be used
in a wr.rdma.rkey field

IB_MR_LKEY | IB_MR_SEND == The key can be used in wr.sg_list[].lkey

etc

It is an error to use a IB_MR_LKEY in a rkey field, etc..

> +enum ib_mr_flags {
> + /* scope: either remote or local */
> + IB_MR_REMOTE,
> + IB_MR_LOCAL,
> +
> + /* direction: one or both can be ORed into the scope above */
> + IB_MR_RDMA_READ = (1 << 10),
> + IB_MR_RDMA_WRITE= (1 << 11)

Don't forget SEND too.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Bart Van Assche
On 11/10/2015 07:40 AM, Eli Cohen wrote:
> On Tue, Nov 10, 2015 at 05:23:10PM +0200, Eli Cohen wrote:
>>>
>>> Call it ENOTSUP then:
>>>
>>> ENOTSUP Operation not supported (POSIX.1)
>>>
>>> Same value on Linux.
>>>
>>
>> Yes, that's better. I will resend.
> 
> 
> Well it seems like ENOTSUP is defined only here:
> 
> ./arch/parisc/include/uapi/asm/errno.h:115:#define ENOTSUP 252 /* 
> Function not implemented (POSIX.4 / HPUX) */
> 
> Which obviously I cannot use. Jason, do you have another idea?

How about using ENOTSUPP ?

$ PAGER= git grep 'define ENOTSUPP' include
include/linux/errno.h:#define ENOTSUPP 524 /* Operation is not supported */

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Eli Cohen
On Tue, Nov 10, 2015 at 11:21:07AM -0700, Jason Gunthorpe wrote:
> On Tue, Nov 10, 2015 at 08:00:09PM +0200, Eli Cohen wrote:
> > When an extended verbs is an extension to a legacy verb, the original
> > functionality is preserved. Hence we do not require each hardware driver
> > to set the extended capability. This will allow to use the extended verb
> > in its simple form with drivers that do not support the extended
> > capability.
> 
> Can we please get rid of uverbs_cmd_mask and uverbs_ex_cmd_mask ?
> 
> The driver should set the function pointers it does not support to
> NULL. We don't need the bitmask because we already know the answer.
> 
> For performance, I recommend having the core replace the NULL op
> pointers from the driver with a stub that returns ENOTSUP, and then
> drop all the checking in the fast path.
>

Yes, we can do this but I think this should be the subject for another
patch, agree?

Regarding using stabs, it may be nice but I don't think performance is
the issue here. Most verbs implementations involve relatively long i/o
operations anyway so the gain will not be noticable.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Eli Cohen
On Tue, Nov 10, 2015 at 11:23:35AM -0800, Bart Van Assche wrote:
> 
> How about using ENOTSUPP ?
> 
> $ PAGER= git grep 'define ENOTSUPP' include
> include/linux/errno.h:#define ENOTSUPP 524 /* Operation is not supported */
> 

Appearently this looks a better option but the following appears as a
icomment above this group: /* Defined for the NFSv3 protocol */

I don't mind using this if we can all agree that this is the best
choise.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 10:02:26PM +0200, Eli Cohen wrote:
> On Tue, Nov 10, 2015 at 11:23:35AM -0800, Bart Van Assche wrote:
> > 
> > How about using ENOTSUPP ?
> > 
> > $ PAGER= git grep 'define ENOTSUPP' include
> > include/linux/errno.h:#define ENOTSUPP 524 /* Operation is not supported */
> > 
> 
> Appearently this looks a better option but the following appears as a
> icomment above this group: /* Defined for the NFSv3 protocol */
> 
> I don't mind using this if we can all agree that this is the best
> choise.

We cannot use ENOTSUPP, it is not supported by userspace, there was a
list discussion about that a while back.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 09:57:13PM +0200, Eli Cohen wrote:

> Yes, we can do this but I think this should be the subject for another
> patch, agree?

Sure

> Regarding using stabs, it may be nice but I don't think performance is
> the issue here. Most verbs implementations involve relatively long i/o
> operations anyway so the gain will not be noticable.

Intel keeps optimizing this common path since they use it for post..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Tom Talpey

On 11/10/2015 1:25 PM, Jason Gunthorpe wrote:

On Tue, Nov 10, 2015 at 04:04:32AM -0800, Christoph Hellwig wrote:

On Tue, Nov 10, 2015 at 01:46:40PM +0200, Sagi Grimberg wrote:



On 10/11/2015 13:41, Christoph Hellwig wrote:

Oh, and while we're at it.  Can someone explain why we're even
using rdma_read_chunk_frmr for IB?  It seems to work around the
fact tat iWarp only allow a single RDMA READ SGE, but it's used
whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS, which seems
wrong.


I think Steve can answer it better than I can. I think that it is
just to have a single code path for both IB and iWARP. I agree that
the condition seems wrong and for small transfers rdma_read_chunk_frmr
is really a performance loss.


Well, the code path already exists, but only is used fi
IB_DEVICE_MEM_MGT_EXTENSIONS isn't set.  Below is an untested patch
that demonstrates how I think svcrdma should setup the reads.  Note
that this also allows to entirely remove it's allphys MR.

Note that as a followon this would also allow to remove the
non-READ_W_INV code path from rdma_read_chunk_frmr as a future
step.


I like this, my only comment is we should have a rdma_cap for this
behavior, rdma_cap_needs_rdma_read_mr(pd) or something?


Windows NDKPI has this, it's the oh-so-succinctly-named flag
NDK_ADAPTER_FLAG_RDMA_READ_SINK_NOT_REQUIRED. The ULP is free
to ignore it and pass the NDK_MR_FLAG_RDMA_READ_SINK flag anyway,
the provider is expected to ignore it if not needed.




+   if (rdma_protocol_iwarp(dev, newxprt->sc_cm_id->port_num)) {


Use here


+   /*
+* iWARP requires remote write access for the data sink, and
+* only supports a single SGE for RDMA_READ requests, so we'll
+* have to use a memory registration for each RDMA_READ.
+*/
+   if (!(dev->device_cap_flags &
IB_DEVICE_MEM_MGT_EXTENSIONS)) {


Lets enforce this in the core, if rdma_cap_needs_rdma_read_mr is set
the the device must also set IB_DEVICE_MEM_MGT_EXTENSIONS, check at
device creation time.


+   } else if (rdma_ib_or_roce(dev, newxprt->sc_cm_id->port_num)) {
+   /*
+* For IB or RoCE life is easy, no unsafe write access is
+* required and multiple SGEs are supported, so we don't need
+* to use MRs.
+*/
+   newxprt->sc_reader = rdma_read_chunk_lcl;
+   } else {
+   /*
+* Neither iWarp nor IB-ish, we're out of luck.
+*/
goto errout;


No need for the else, !rdma_cap_needs_rdma_read_mr means pd->local_dma_lkey is 
okay
to use.


Hmm, agreed, but it must still be acceptable to perform a registration
instead of using the local_dma_lkey. As mentioned earlier, there are
scatter limits and other implications for all-physical addressing that
an upper layer may choose to avoid.

Tom.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 04:00:43PM -0500, Tom Talpey wrote:

> Hmm, agreed, but it must still be acceptable to perform a registration
> instead of using the local_dma_lkey. As mentioned earlier, there are
> scatter limits and other implications for all-physical addressing that
> an upper layer may choose to avoid.

It is always acceptable to use a lkey MR instead of the local dma
lkey, but ULPs should prefer to use the local dma lkey if possible,
for performance reasons.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Tom Talpey

On 11/10/2015 4:17 PM, Jason Gunthorpe wrote:

On Tue, Nov 10, 2015 at 04:00:43PM -0500, Tom Talpey wrote:


Hmm, agreed, but it must still be acceptable to perform a registration
instead of using the local_dma_lkey. As mentioned earlier, there are
scatter limits and other implications for all-physical addressing that
an upper layer may choose to avoid.


It is always acceptable to use a lkey MR instead of the local dma
lkey, but ULPs should prefer to use the local dma lkey if possible,
for performance reasons.


Sure, the key words are "if possible". For example, a single 1MB
RDMA Read is unlikely to be possible with the dma lkey, assuming
worst-case physical page scatter it would need 256 SGEs. But 1MB
can be registered easily.

In any case, my point was that the ULP gets to choose, so, it looks
like we agree.

Tom.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Chuck Lever

> On Nov 10, 2015, at 4:30 PM, Tom Talpey  wrote:
> 
> On 11/10/2015 4:17 PM, Jason Gunthorpe wrote:
>> On Tue, Nov 10, 2015 at 04:00:43PM -0500, Tom Talpey wrote:
>> 
>>> Hmm, agreed, but it must still be acceptable to perform a registration
>>> instead of using the local_dma_lkey. As mentioned earlier, there are
>>> scatter limits and other implications for all-physical addressing that
>>> an upper layer may choose to avoid.
>> 
>> It is always acceptable to use a lkey MR instead of the local dma
>> lkey, but ULPs should prefer to use the local dma lkey if possible,
>> for performance reasons.
> 
> Sure, the key words are "if possible". For example, a single 1MB
> RDMA Read is unlikely to be possible with the dma lkey, assuming
> worst-case physical page scatter it would need 256 SGEs. But 1MB
> can be registered easily.
> 
> In any case, my point was that the ULP gets to choose, so, it looks
> like we agree.

I’d like to see our NFS server use the local DMA lkey where it
makes sense, to avoid the cost of registering and invalidating
memory.

I have to agree with Tom that once the device’s s/g limit is
exceeded, the server has to post an RDMA Read WR every few
pages, and appears to get expensive for large NFS requests.

The current mechanism of statically choosing either FRMR or
local DMA depending on the device is probably not adequate.
Maybe we could post all the Read WRs via a single chain? Or
stick with FRMR when the amount of data to read is significant.

I’ve also tested Christoph’s patch. The logic currently in
rdma_read_chunk_lcl does not seem up to the task. Once the
device’s s/g limit is exceeded, the server starts throwing
local length exceptions, and the client workload hangs.

None of this is impossible to fix, but there is some work to
do here.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] staging/rdma/hfi1: set Gen3 half-swing for integrated devices

2015-11-10 Thread ira.weiny
On Tue, Nov 10, 2015 at 12:59:29PM +0300, Dan Carpenter wrote:
> Gar...  No.  Please please get rid of the PC() macro.  It makes the code
> impossible to understand because instead of hitting CTRL-[ you have
> decode it and then manually type out
> 
> :cs find g CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_SHIFT
> 
> which is the length of a typical college essay.  I meant just put a
> comment like this:
> 
> /*
>  * In the hardware spec these are prefixed with:
>  * CCE_PCIE_CTRL_...
>  * But it is too long to use in code.
>  */
> #define XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_MASK 0x1ull
> 
> Or probably even better:
> 
> #define CCE_PCIE_CTRL (CCE + 0x00C0)
> #define LANE_BUNDLE_MASK  0x3ull /* CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_MASK */
> #define LANE_BUNDLE_SHIFT 0  /* CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_SHIFT 
> */
> #define LANE_DELAY_MASK   0xFull /* 
> CCE_PCIE_CTRL_PCIE_LANE_DELAY_MASK */
> #define LANE_DELAY_SHIFT  2  /* CCE_PCIE_CTRL_PCIE_LANE_DELAY_SHIFT */
> #define MARGIN_OVERWRITE_SHIFT8  /* 
> CCE_PCIE_CTRL_XMT_MARGIN_OVERWRITE_ENABLE_SHIFT */
> #define MARGIN_SHIFT  9  /* CCE_PCIE_CTRL_XMT_MARGIN_SHIFT */
> #define MARGIN_G1G2_OVERWRITE_MASK 0x1ull /* 
> CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_MASK */
> #define MARGIN_G1G2_OVERWRITE_SHIFT 12/* 
> CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_SHIFT */
> #define MARGIN_G1G2_MASK  0x7ull /* 
> CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_MASK */
> #define MARGIN_G1G2_SHIFT 13 /* 
> CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_SHIFT */
> 
> Those lines go over the 80 character limit but it's fine.

My apologies for not understanding what you meant.  I took your meaning to be
that we had to honor the checkpatch checks so while the PC macro was
undesirable it was ok if I just made some comments...

FWIW I don't like the PC macro either.  But we have a tool which is generating
these names to be identical to the hardware spec.  And we really want to
preserve those as a reference back to the spec.  Creating additional names which
are in the code is a bit cumbersome but what if we do something like this:


...
#define CCE_PCIE_CTRL (CCE + 0x00C0)
#define CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_MASK 0x3ull
#define CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_SHIFT 0
...




#define LANE_BUNDLE_MASKCCE_PCIE_CTRL_PCIE_LANE_BUNDLE_MASK
#define LANE_BUNDLE_SHIFT   CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_SHIFT
...


?



An alternative would be to define some helper functions such as:

static inline u64 extract_xmt_margin_g1g2(u64 reg)
{
return (reg >> CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_SHIFT)
& CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_MASK;
}

...


...
xmt_margin = extract_xmt_margin_g1g2(pcie_ctrl);
...

I prefer the second option as it preserves the register names right in the
code.  So you can reference the hardware spec without looking anything up in a
header file.

I again apologize for misunderstanding your previous meaning.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 06:01:01PM -0500, Chuck Lever wrote:
> The current mechanism of statically choosing either FRMR or
> local DMA depending on the device is probably not adequate.
> Maybe we could post all the Read WRs via a single chain? Or
> stick with FRMR when the amount of data to read is significant.

Yes, those are the right ways to go..

IMHO, the break point for when it makes sense to switch from a RDMA
READ chain to a MR is going to be a RDMA core tunable. The performance
curve won't have much to do with the ULP.

But certainly, if a requests fits in the SG list then it should just
use the local dma lkey directly, and consider allocating an
appropriate S/G list size when creating the QP.

The special thing about iwarp becomes simply defeating the use of the
local dma lkey for RDMA READ.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] staging/rdma/hfi1: Reduce number of parameters passed to send handlers

2015-11-10 Thread ira . weiny
From: Dennis Dalessandro 

The current send function handlers are passed a bunch of parameters that are
already part of the data structure that is passed in first (qp). This patch
removes all of this and just passes the QP.

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rdma/hfi1/diag.c  | 27 ++-
 drivers/staging/rdma/hfi1/hfi.h   | 20 ++
 drivers/staging/rdma/hfi1/ruc.c   | 15 ++-
 drivers/staging/rdma/hfi1/verbs.c | 55 ++-
 drivers/staging/rdma/hfi1/verbs.h | 23 ++--
 5 files changed, 71 insertions(+), 69 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 88414d720469..0aaad7412842 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -1618,14 +1618,12 @@ int snoop_recv_handler(struct hfi1_packet *packet)
 /*
  * Handle snooping and capturing packets when sdma is being used.
  */
-int snoop_send_dma_handler(struct hfi1_qp *qp, struct ahg_ib_header *ibhdr,
-  u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
-  u32 plen, u32 dwords, u64 pbc)
+int snoop_send_dma_handler(struct hfi1_qp *qp, struct hfi1_pkt_state *ps,
+  u64 pbc)
 {
-   pr_alert("Snooping/Capture of  Send DMA Packets Is Not Supported!\n");
+   pr_alert("Snooping/Capture of Send DMA Packets Is Not Supported!\n");
snoop_dbg("Unsupported Operation");
-   return hfi1_verbs_send_dma(qp, ibhdr, hdrwords, ss, len, plen, dwords,
- 0);
+   return hfi1_verbs_send_dma(qp, ps, 0);
 }
 
 /*
@@ -1633,12 +1631,16 @@ int snoop_send_dma_handler(struct hfi1_qp *qp, struct 
ahg_ib_header *ibhdr,
  * bypass packets. The only way to send a bypass packet currently is to use the
  * diagpkt interface. When that interface is enable snoop/capture is not.
  */
-int snoop_send_pio_handler(struct hfi1_qp *qp, struct ahg_ib_header *ahdr,
-  u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
-  u32 plen, u32 dwords, u64 pbc)
+int snoop_send_pio_handler(struct hfi1_qp *qp, struct hfi1_pkt_state *ps,
+  u64 pbc)
 {
-   struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
-   struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+   struct ahg_ib_header *ahdr = qp->s_hdr;
+   u32 hdrwords = qp->s_hdrwords;
+   struct hfi1_sge_state *ss = qp->s_cur_sge;
+   u32 len = qp->s_cur_size;
+   u32 dwords = (len + 3) >> 2;
+   u32 plen = hdrwords + dwords + 2; /* includes pbc */
+   struct hfi1_pportdata *ppd = ps->ppd;
struct snoop_packet *s_packet = NULL;
u32 *hdr = (u32 *)&ahdr->ibh;
u32 length = 0;
@@ -1783,8 +1785,7 @@ int snoop_send_pio_handler(struct hfi1_qp *qp, struct 
ahg_ib_header *ahdr,
break;
}
 out:
-   return hfi1_verbs_send_pio(qp, ahdr, hdrwords, ss, len, plen, dwords,
- md.u.pbc);
+   return hfi1_verbs_send_pio(qp, ps, md.u.pbc);
 }
 
 /*
diff --git a/drivers/staging/rdma/hfi1/hfi.h b/drivers/staging/rdma/hfi1/hfi.h
index b1655be34851..f633ca2a6ee4 100644
--- a/drivers/staging/rdma/hfi1/hfi.h
+++ b/drivers/staging/rdma/hfi1/hfi.h
@@ -1048,12 +1048,10 @@ struct hfi1_devdata {
 * Handlers for outgoing data so that snoop/capture does not
 * have to have its hooks in the send path
 */
-   int (*process_pio_send)(struct hfi1_qp *qp, struct ahg_ib_header *ibhdr,
-   u32 hdrwords, struct hfi1_sge_state *ss,
-   u32 len, u32 plen, u32 dwords, u64 pbc);
-   int (*process_dma_send)(struct hfi1_qp *qp, struct ahg_ib_header *ibhdr,
-   u32 hdrwords, struct hfi1_sge_state *ss,
-   u32 len, u32 plen, u32 dwords, u64 pbc);
+   int (*process_pio_send)(struct hfi1_qp *qp, struct hfi1_pkt_state *ps,
+   u64 pbc);
+   int (*process_dma_send)(struct hfi1_qp *qp, struct hfi1_pkt_state *ps,
+   u64 pbc);
void (*pio_inline_send)(struct hfi1_devdata *dd, struct pio_buf *pbuf,
u64 pbc, const void *from, size_t count);
 
@@ -1401,12 +1399,10 @@ void reset_link_credits(struct hfi1_devdata *dd);
 void assign_remote_cm_au_table(struct hfi1_devdata *dd, u8 vcu);
 
 int snoop_recv_handler(struct hfi1_packet *packet);
-int snoop_send_dma_handler(struct hfi1_qp *qp, struct ahg_ib_header *ibhdr,
-  u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
-  u32 plen, u32 dwords, u64 pbc);
-int snoop_send_pio_handler(struct hfi1_qp *qp, struct ahg_ib_header *ibhdr,
-  u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
-   

[PATCH] staging/rdma/hfi1: Handle packets with invalid RHF on context 0

2015-11-10 Thread ira . weiny
From: Niranjana Vishwanathapura 

Context 0 (which handles the error packets) can potentially receive an invalid
rhf. Hence, it can not depend on RHF sequence number and can only use DMA_RTAIL
mechanism. Detect such packets with invalid rhf using rhf sequence counting
mechanism and drop them.

As DMA_RTAIL mechanism has performance penalties, do not use context 0 for
performance critical verbs path. Use context 0 for VL15 (MAD), multicast and
error packets.

Reviewed-by: Arthur Kepner 
Reviewed-by: Mike Marciniszyn 
Reviewed-by: Dennis Dalessandro 
Reviewed-by: Dean Luick 
Reviewed-by: Mitko Haralanov 
Signed-off-by: Niranjana Vishwanathapura 
Signed-off-by: Mike Marciniszyn 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rdma/hfi1/chip.c   |  74 -
 drivers/staging/rdma/hfi1/driver.c | 108 -
 drivers/staging/rdma/hfi1/hfi.h|   8 ++-
 drivers/staging/rdma/hfi1/init.c   |   9 +++-
 4 files changed, 146 insertions(+), 53 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index b7c7d1cac4e1..ca411386026b 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -121,8 +121,8 @@ struct flag_table {
 #define SEC_SC_HALTED  0x4 /* per-context only */
 #define SEC_SPC_FREEZE 0x8 /* per-HFI only */
 
-#define VL15CTXT  1
 #define MIN_KERNEL_KCTXTS 2
+#define FIRST_KERNEL_KCTXT1
 #define NUM_MAP_REGS 32
 
 /* Bit offset into the GUID which carries HFI id information */
@@ -7748,8 +7748,8 @@ void hfi1_rcvctrl(struct hfi1_devdata *dd, unsigned int 
op, int ctxt)
& RCV_TID_CTRL_TID_BASE_INDEX_MASK)
<< RCV_TID_CTRL_TID_BASE_INDEX_SHIFT);
write_kctxt_csr(dd, ctxt, RCV_TID_CTRL, reg);
-   if (ctxt == VL15CTXT)
-   write_csr(dd, RCV_VL15, VL15CTXT);
+   if (ctxt == HFI1_CTRL_CTXT)
+   write_csr(dd, RCV_VL15, HFI1_CTRL_CTXT);
}
if (op & HFI1_RCVCTRL_CTXT_DIS) {
write_csr(dd, RCV_VL15, 0);
@@ -8870,7 +8870,7 @@ static int request_msix_irqs(struct hfi1_devdata *dd)
int first_general, last_general;
int first_sdma, last_sdma;
int first_rx, last_rx;
-   int first_cpu, restart_cpu, curr_cpu;
+   int first_cpu, curr_cpu;
int rcv_cpu, sdma_cpu;
int i, ret = 0, possible;
int ht;
@@ -8909,22 +8909,19 @@ static int request_msix_irqs(struct hfi1_devdata *dd)
topology_sibling_cpumask(cpumask_first(local_mask)));
for (i = possible/ht; i < possible; i++)
cpumask_clear_cpu(i, def);
-   /* reset possible */
-   possible = cpumask_weight(def);
/* def now has full cores on chosen node*/
first_cpu = cpumask_first(def);
if (nr_cpu_ids >= first_cpu)
first_cpu++;
-   restart_cpu = first_cpu;
-   curr_cpu = restart_cpu;
+   curr_cpu = first_cpu;
 
-   for (i = first_cpu; i < dd->n_krcv_queues + first_cpu; i++) {
+   /*  One context is reserved as control context */
+   for (i = first_cpu; i < dd->n_krcv_queues + first_cpu - 1; i++) {
cpumask_clear_cpu(curr_cpu, def);
cpumask_set_cpu(curr_cpu, rcv);
-   if (curr_cpu >= possible)
-   curr_cpu = restart_cpu;
-   else
-   curr_cpu++;
+   curr_cpu = cpumask_next(curr_cpu, def);
+   if (curr_cpu >= nr_cpu_ids)
+   break;
}
/* def mask has non-rcv, rcv has recv mask */
rcv_cpu = cpumask_first(rcv);
@@ -9024,12 +9021,20 @@ static int request_msix_irqs(struct hfi1_devdata *dd)
if (sdma_cpu >= nr_cpu_ids)
sdma_cpu = cpumask_first(def);
} else if (handler == receive_context_interrupt) {
-   dd_dev_info(dd, "rcv ctxt %d cpu %d\n",
-   rcd->ctxt, rcv_cpu);
-   cpumask_set_cpu(rcv_cpu, dd->msix_entries[i].mask);
-   rcv_cpu = cpumask_next(rcv_cpu, rcv);
-   if (rcv_cpu >= nr_cpu_ids)
-   rcv_cpu = cpumask_first(rcv);
+   dd_dev_info(dd, "rcv ctxt %d cpu %d\n", rcd->ctxt,
+   (rcd->ctxt == HFI1_CTRL_CTXT) ?
+   cpumask_first(def) : rcv_cpu);
+   if (rcd->ctxt == HFI1_CTRL_CTXT) {
+   /* map to first default */
+   cpumask_set_cpu(cpumask_first(def),
+   dd->msix_entries[i].mask);
+   } else {
+   cpumask_set_cpu(rcv_cpu,
+   

[PATCH 0/8] Fix hfi1_ioctl locking

2015-11-10 Thread ira . weiny
From: Ira Weiny 

It was identified that hfi1_ioctl may sleep with a spin lock held.

This was identified publicly here:

http://www.spinics.net/lists/linux-rdma/msg29926.html

As well as by our internal development.

This series cleans up the code and parameter checks, as well as fixing the
locking.


Dennis Dalessandro (2):
  staging/rdma/hfi1: Reduce snoop locking scope in IOCTL handler.
  staging/rdma/hfi1: Return immediately on error

Ira Weiny (5):
  staging/rdma/hfi1: Fix camel case variables
  staging/rdma/hfi1: Return early from hfi1_ioctl parameter errors
  staging/rdma/hfi1: remove unneeded goto done
  staging/rdma/hfi1: return early if setlink state was specified
  staging/rdma/hfi1: Further clean up hfi1_ioctl parameter checks

Jubin John (1):
  staging/rdma/hfi1: diag.c Correct code style issues

 drivers/staging/rdma/hfi1/diag.c | 465 +++
 1 file changed, 223 insertions(+), 242 deletions(-)

-- 
1.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/8] staging/rdma/hfi1: diag.c Correct code style issues

2015-11-10 Thread ira . weiny
From: Jubin John 

Using the latest checkpatch correct the checks on diag.c

Reviewed-by: Dennis Dalessandro 
Reviewed-by: Mike Marciniszyn 
Signed-off-by: Jubin John 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rdma/hfi1/diag.c | 81 ++--
 1 file changed, 37 insertions(+), 44 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 0aaad7412842..c5cc2db1b559 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -78,8 +78,8 @@
hfi1_cdbg(SNOOP, fmt, ##__VA_ARGS__)
 
 /* Snoop option mask */
-#define SNOOP_DROP_SEND(1 << 0)
-#define SNOOP_USE_METADATA (1 << 1)
+#define SNOOP_DROP_SENDBIT(0)
+#define SNOOP_USE_METADATA BIT(1)
 
 static u8 snoop_flags;
 
@@ -135,7 +135,7 @@ static struct cdev diagpkt_cdev;
 static struct device *diagpkt_device;
 
 static ssize_t diagpkt_write(struct file *fp, const char __user *data,
-size_t count, loff_t *off);
+size_t count, loff_t *off);
 
 static const struct file_operations diagpkt_file_ops = {
.owner = THIS_MODULE,
@@ -177,37 +177,37 @@ struct hfi1_link_info {
 #define HFI1_SNOOP_IOCGETLINKSTATE \
_IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ)
 #define HFI1_SNOOP_IOCSETLINKSTATE \
-   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+1)
+   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 1)
 #define HFI1_SNOOP_IOCCLEARQUEUE \
-   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+2)
+   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 2)
 #define HFI1_SNOOP_IOCCLEARFILTER \
-   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+3)
+   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 3)
 #define HFI1_SNOOP_IOCSETFILTER \
-   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+4)
+   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 4)
 #define HFI1_SNOOP_IOCGETVERSION \
-   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+5)
+   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 5)
 #define HFI1_SNOOP_IOCSET_OPTS \
-   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+6)
+   _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 6)
 
 /*
  * These offsets +6/+7 could change, but these are already known and used
  * IOCTL numbers so don't change them without a good reason.
  */
 #define HFI1_SNOOP_IOCGETLINKSTATE_EXTRA \
-   _IOWR(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+6, \
+   _IOWR(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 6, \
struct hfi1_link_info)
 #define HFI1_SNOOP_IOCSETLINKSTATE_EXTRA \
-   _IOWR(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+7, \
+   _IOWR(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 7, \
struct hfi1_link_info)
 
 static int hfi1_snoop_open(struct inode *in, struct file *fp);
 static ssize_t hfi1_snoop_read(struct file *fp, char __user *data,
-   size_t pkt_len, loff_t *off);
+  size_t pkt_len, loff_t *off);
 static ssize_t hfi1_snoop_write(struct file *fp, const char __user *data,
-size_t count, loff_t *off);
+   size_t count, loff_t *off);
 static long hfi1_ioctl(struct file *fp, unsigned int cmd, unsigned long arg);
 static unsigned int hfi1_snoop_poll(struct file *fp,
-   struct poll_table_struct *wait);
+   struct poll_table_struct *wait);
 static int hfi1_snoop_release(struct inode *in, struct file *fp);
 
 struct hfi1_packet_filter_command {
@@ -323,14 +323,12 @@ static void hfi1_snoop_remove(struct hfi1_devdata *dd)
 
 void hfi1_diag_remove(struct hfi1_devdata *dd)
 {
-
hfi1_snoop_remove(dd);
if (atomic_dec_and_test(&diagpkt_count))
hfi1_cdev_cleanup(&diagpkt_cdev, &diagpkt_device);
hfi1_cdev_cleanup(&dd->diag_cdev, &dd->diag_device);
 }
 
-
 /*
  * Allocated structure shared between the credit return mechanism and
  * diagpkt_send().
@@ -393,7 +391,7 @@ static ssize_t diagpkt_send(struct diag_pkt *dp)
 
if (dp->version != _DIAG_PKT_VERS) {
dd_dev_err(dd, "Invalid version %u for diagpkt_write\n",
-   dp->version);
+  dp->version);
ret = -EINVAL;
goto bail;
}
@@ -440,7 +438,7 @@ static ssize_t diagpkt_send(struct diag_pkt *dp)
}
 
if (copy_from_user(tmpbuf,
-  (const void __user *) (unsigned long) dp->data,
+  (const void __user *)(unsigned long)dp->data,
   dp->len)) {
ret = -EFAULT;
goto bail;
@@ -530,9 +528,9 @@ static ssize_t diagpkt_send(struct diag_pkt *dp)
 * NOTE: PRC_FILL_ERR is at best informational and cannot
 

[PATCH 2/8] staging/rdma/hfi1: Fix camel case variables

2015-11-10 Thread ira . weiny
From: Ira Weiny 

physState, linkState, and devState should be phys_state, link_state, and
dev_state

Signed-off-by: Dennis Dalessandro 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rdma/hfi1/diag.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index c5cc2db1b559..c2839473f983 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -964,9 +964,9 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
void *filter_value = NULL;
long ret = 0;
int value = 0;
-   u8 physState = 0;
-   u8 linkState = 0;
-   u16 devState = 0;
+   u8 phys_state = 0;
+   u8 link_state = 0;
+   u16 dev_state = 0;
unsigned long flags = 0;
unsigned long *argp = NULL;
struct hfi1_packet_filter_command filter_cmd = {0};
@@ -1030,31 +1030,31 @@ static long hfi1_ioctl(struct file *fp, unsigned int 
cmd, unsigned long arg)
}
 
/* What we want to transition to */
-   physState = (value >> 4) & 0xF;
-   linkState = value & 0xF;
+   phys_state = (value >> 4) & 0xF;
+   link_state = value & 0xF;
snoop_dbg("Setting link state 0x%x", value);
 
-   switch (linkState) {
+   switch (link_state) {
case IB_PORT_NOP:
-   if (physState == 0)
+   if (phys_state == 0)
break;
/* fall through */
case IB_PORT_DOWN:
-   switch (physState) {
+   switch (phys_state) {
case 0:
-   devState = HLS_DN_DOWNDEF;
+   dev_state = HLS_DN_DOWNDEF;
break;
case 2:
-   devState = HLS_DN_POLL;
+   dev_state = HLS_DN_POLL;
break;
case 3:
-   devState = HLS_DN_DISABLE;
+   dev_state = HLS_DN_DISABLE;
break;
default:
ret = -EINVAL;
goto done;
}
-   ret = set_link_state(ppd, devState);
+   ret = set_link_state(ppd, dev_state);
break;
case IB_PORT_ARMED:
ret = set_link_state(ppd, HLS_UP_ARMED);
-- 
1.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/8] staging/rdma/hfi1: Further clean up hfi1_ioctl parameter checks

2015-11-10 Thread ira . weiny
From: Ira Weiny 

Final clean up of the if/then/else clause for the parameter checks of
hfi1_ioctl

Signed-off-by: Dennis Dalessandro 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rdma/hfi1/diag.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index a489a79dd3b6..43f08080480c 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -974,24 +974,28 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
struct hfi1_pportdata *ppd = NULL;
unsigned int index;
struct hfi1_link_info link_info;
+   int read_cmd, write_cmd, read_ok, write_ok;
 
dd = hfi1_dd_from_sc_inode(fp->f_inode);
if (dd == NULL)
return -ENODEV;
 
mode_flag = dd->hfi1_snoop.mode_flag;
+   read_cmd = _IOC_DIR(cmd) & _IOC_READ;
+   write_cmd = _IOC_DIR(cmd) & _IOC_WRITE;
+   write_ok = access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd));
+   read_ok = access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd));
 
-   if (((_IOC_DIR(cmd) & _IOC_READ)
-   && !access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd)))
-   || ((_IOC_DIR(cmd) & _IOC_WRITE)
-   && !access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd {
+   if ((read_cmd && !write_ok) || (write_cmd && !read_ok))
return -EFAULT;
-   } else if (!capable(CAP_SYS_ADMIN)) {
+
+   if (!capable(CAP_SYS_ADMIN))
return -EPERM;
-   } else if ((mode_flag & HFI1_PORT_CAPTURE_MODE) &&
-  (cmd != HFI1_SNOOP_IOCCLEARQUEUE) &&
-  (cmd != HFI1_SNOOP_IOCCLEARFILTER) &&
-  (cmd != HFI1_SNOOP_IOCSETFILTER)) {
+
+   if ((mode_flag & HFI1_PORT_CAPTURE_MODE) &&
+   (cmd != HFI1_SNOOP_IOCCLEARQUEUE) &&
+   (cmd != HFI1_SNOOP_IOCCLEARFILTER) &&
+   (cmd != HFI1_SNOOP_IOCSETFILTER))
/* Capture devices are allowed only 3 operations
 * 1.Clear capture queue
 * 2.Clear capture filter
@@ -999,10 +1003,10 @@ static long hfi1_ioctl(struct file *fp, unsigned int 
cmd, unsigned long arg)
 * Other are invalid.
 */
return -EINVAL;
-   } else if (cmd == HFI1_SNOOP_IOCSETLINKSTATE) {
+
+   if (cmd == HFI1_SNOOP_IOCSETLINKSTATE)
/* We do not support the old setlink state */
return -EINVAL;
-   }
 
spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
 
-- 
1.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 8/8] staging/rdma/hfi1: Return immediately on error

2015-11-10 Thread ira . weiny
From: Dennis Dalessandro 

Now that the spinlock is not taken throughout hfi1_ioctl it is safe to return
early rather than setting a variable and falling through the switch.

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rdma/hfi1/diag.c | 59 
 1 file changed, 23 insertions(+), 36 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 34a8c4da71d2..0f01618cc5b7 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -1015,20 +1015,16 @@ static long hfi1_ioctl(struct file *fp, unsigned int 
cmd, unsigned long arg)
if (copy_from_user(&link_info,
   (struct hfi1_link_info __user *)arg,
   sizeof(link_info)))
-   ret = -EFAULT;
+   return -EFAULT;
 
value = link_info.port_state;
index = link_info.port_number;
-   if (index > dd->num_pports - 1) {
-   ret = -EINVAL;
-   break;
-   }
+   if (index > dd->num_pports - 1)
+   return -EINVAL;
 
ppd = &dd->pport[index];
-   if (!ppd) {
-   ret = -EINVAL;
-   break;
-   }
+   if (!ppd)
+   return -EINVAL;
 
/* What we want to transition to */
phys_state = (value >> 4) & 0xF;
@@ -1052,7 +1048,7 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
dev_state = HLS_DN_DISABLE;
break;
default:
-   ret = -EINVAL;
+   return -EINVAL;
}
ret = set_link_state(ppd, dev_state);
break;
@@ -1067,8 +1063,7 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
send_idle_sma(dd, SMA_IDLE_ACTIVE);
break;
default:
-   ret = -EINVAL;
-   break;
+   return -EINVAL;
}
 
if (ret)
@@ -1081,7 +1076,7 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
if (copy_from_user(&link_info,
   (struct hfi1_link_info __user *)arg,
   sizeof(link_info)))
-   ret = -EFAULT;
+   return -EFAULT;
index = link_info.port_number;
} else {
ret = __get_user(index, (int __user *)arg);
@@ -1089,16 +1084,13 @@ static long hfi1_ioctl(struct file *fp, unsigned int 
cmd, unsigned long arg)
break;
}
 
-   if (index > dd->num_pports - 1) {
-   ret = -EINVAL;
-   break;
-   }
+   if (index > dd->num_pports - 1)
+   return -EINVAL;
 
ppd = &dd->pport[index];
-   if (!ppd) {
-   ret = -EINVAL;
-   break;
-   }
+   if (!ppd)
+   return -EINVAL;
+
value = hfi1_ibphys_portstate(ppd);
value <<= 4;
value |= driver_lstate(ppd);
@@ -1115,7 +1107,7 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
ppd->link_width_active;
if (copy_to_user((struct hfi1_link_info __user *)arg,
 &link_info, sizeof(link_info)))
-   ret = -EFAULT;
+   return -EFAULT;
} else {
ret = __put_user(value, (int __user *)arg);
}
@@ -1146,14 +1138,12 @@ static long hfi1_ioctl(struct file *fp, unsigned int 
cmd, unsigned long arg)
/* just copy command structure */
argp = (unsigned long *)arg;
if (copy_from_user(&filter_cmd, (void __user *)argp,
-  sizeof(filter_cmd))) {
-   ret = -EFAULT;
-   break;
-   }
+  sizeof(filter_cmd)))
+   return -EFAULT;
+
if (filter_cmd.opcode >= HFI1_MAX_FILTERS) {
pr_alert("Invalid opcode in request\n");
-   ret = -EINVAL;
-   break;
+   return -EINVAL;

[PATCH 4/8] staging/rdma/hfi1: remove unneeded goto done

2015-11-10 Thread ira . weiny
From: Ira Weiny 

This goto done is followed by an if (ret) break in the outer switch clause.  It
is unnecessary.

Signed-off-by: Dennis Dalessandro 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rdma/hfi1/diag.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 2bb857b2a103..556a47591989 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -1053,7 +1053,6 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
break;
default:
ret = -EINVAL;
-   goto done;
}
ret = set_link_state(ppd, dev_state);
break;
@@ -1201,7 +1200,7 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
ret = -ENOTTY;
break;
}
-done:
+
spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
return ret;
 }
-- 
1.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/8] staging/rdma/hfi1: Reduce snoop locking scope in IOCTL handler.

2015-11-10 Thread ira . weiny
From: Dennis Dalessandro 

This patch avoids issues while calling into copy from/to user while holding the
lock by only taking the lock when it is absolutely required.

The only commands which require the snoop lock are: *Set Filter *Clear Filter
*Clear Queue

Reported-by: Alexey Khoroshilov 
Reviewed-by: Mike Marciniszyn 
Signed-off-by: Dennis Dalessandro 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rdma/hfi1/diag.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 43f08080480c..34a8c4da71d2 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -1008,8 +1008,6 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
/* We do not support the old setlink state */
return -EINVAL;
 
-   spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
-
switch (cmd) {
case HFI1_SNOOP_IOCSETLINKSTATE_EXTRA:
memset(&link_info, 0, sizeof(link_info));
@@ -1125,11 +1123,14 @@ static long hfi1_ioctl(struct file *fp, unsigned int 
cmd, unsigned long arg)
 
case HFI1_SNOOP_IOCCLEARQUEUE:
snoop_dbg("Clearing snoop queue");
+   spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
drain_snoop_list(&dd->hfi1_snoop.queue);
+   spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
break;
 
case HFI1_SNOOP_IOCCLEARFILTER:
snoop_dbg("Clearing filter");
+   spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
if (dd->hfi1_snoop.filter_callback) {
/* Drain packets first */
drain_snoop_list(&dd->hfi1_snoop.queue);
@@ -1137,6 +1138,7 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
}
kfree(dd->hfi1_snoop.filter_value);
dd->hfi1_snoop.filter_value = NULL;
+   spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
break;
 
case HFI1_SNOOP_IOCSETFILTER:
@@ -1173,13 +1175,14 @@ static long hfi1_ioctl(struct file *fp, unsigned int 
cmd, unsigned long arg)
break;
}
/* Drain packets first */
+   spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
drain_snoop_list(&dd->hfi1_snoop.queue);
dd->hfi1_snoop.filter_callback =
hfi1_filters[filter_cmd.opcode].filter;
/* just in case we see back to back sets */
kfree(dd->hfi1_snoop.filter_value);
dd->hfi1_snoop.filter_value = filter_value;
-
+   spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
break;
case HFI1_SNOOP_IOCGETVERSION:
value = SNOOP_CAPTURE_VERSION;
@@ -1203,7 +1206,6 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
break;
}
 
-   spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
return ret;
 }
 
-- 
1.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/8] staging/rdma/hfi1: Return early from hfi1_ioctl parameter errors

2015-11-10 Thread ira . weiny
From: Ira Weiny 

Rather than have a switch in a large else clause make the parameter checks
return immediately.

Signed-off-by: Dennis Dalessandro 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rdma/hfi1/diag.c | 348 +++
 1 file changed, 173 insertions(+), 175 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index c2839473f983..2bb857b2a103 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -979,17 +979,15 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
if (dd == NULL)
return -ENODEV;
 
-   spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
-
mode_flag = dd->hfi1_snoop.mode_flag;
 
if (((_IOC_DIR(cmd) & _IOC_READ)
&& !access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd)))
|| ((_IOC_DIR(cmd) & _IOC_WRITE)
&& !access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd {
-   ret = -EFAULT;
+   return -EFAULT;
} else if (!capable(CAP_SYS_ADMIN)) {
-   ret = -EPERM;
+   return -EPERM;
} else if ((mode_flag & HFI1_PORT_CAPTURE_MODE) &&
   (cmd != HFI1_SNOOP_IOCCLEARQUEUE) &&
   (cmd != HFI1_SNOOP_IOCCLEARFILTER) &&
@@ -1000,208 +998,208 @@ static long hfi1_ioctl(struct file *fp, unsigned int 
cmd, unsigned long arg)
 * 3.Set capture filter
 * Other are invalid.
 */
+   return -EINVAL;
+   }
+
+   spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
+
+   switch (cmd) {
+   case HFI1_SNOOP_IOCSETLINKSTATE:
+   snoop_dbg("HFI1_SNOOP_IOCSETLINKSTATE is not valid");
ret = -EINVAL;
-   } else {
-   switch (cmd) {
-   case HFI1_SNOOP_IOCSETLINKSTATE:
-   snoop_dbg("HFI1_SNOOP_IOCSETLINKSTATE is not valid");
+   break;
+
+   case HFI1_SNOOP_IOCSETLINKSTATE_EXTRA:
+   memset(&link_info, 0, sizeof(link_info));
+
+   if (copy_from_user(&link_info,
+  (struct hfi1_link_info __user *)arg,
+  sizeof(link_info)))
+   ret = -EFAULT;
+
+   value = link_info.port_state;
+   index = link_info.port_number;
+   if (index > dd->num_pports - 1) {
ret = -EINVAL;
break;
+   }
 
-   case HFI1_SNOOP_IOCSETLINKSTATE_EXTRA:
-   memset(&link_info, 0, sizeof(link_info));
+   ppd = &dd->pport[index];
+   if (!ppd) {
+   ret = -EINVAL;
+   break;
+   }
 
-   if (copy_from_user(&link_info,
-   (struct hfi1_link_info __user *)arg,
-   sizeof(link_info)))
-   ret = -EFAULT;
+   /* What we want to transition to */
+   phys_state = (value >> 4) & 0xF;
+   link_state = value & 0xF;
+   snoop_dbg("Setting link state 0x%x", value);
 
-   value = link_info.port_state;
-   index = link_info.port_number;
-   if (index > dd->num_pports - 1) {
-   ret = -EINVAL;
+   switch (link_state) {
+   case IB_PORT_NOP:
+   if (phys_state == 0)
break;
-   }
-
-   ppd = &dd->pport[index];
-   if (!ppd) {
-   ret = -EINVAL;
-   break;
-   }
-
-   /* What we want to transition to */
-   phys_state = (value >> 4) & 0xF;
-   link_state = value & 0xF;
-   snoop_dbg("Setting link state 0x%x", value);
-
-   switch (link_state) {
-   case IB_PORT_NOP:
-   if (phys_state == 0)
-   break;
-   /* fall through */
-   case IB_PORT_DOWN:
-   switch (phys_state) {
-   case 0:
-   dev_state = HLS_DN_DOWNDEF;
-   break;
-   case 2:
-   dev_state = HLS_DN_POLL;
-   break;
-   case 3:
-   dev_state = HLS_DN_DISABLE;
-   break;
-   default:
-

[PATCH 5/8] staging/rdma/hfi1: return early if setlink state was specified

2015-11-10 Thread ira . weiny
From: Ira Weiny 

Set link state was not supported and so we can return early in the parameter
checks rather than falling through the switch clause.

Signed-off-by: Dennis Dalessandro 
Signed-off-by: Ira Weiny 
---
 drivers/staging/rdma/hfi1/diag.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 556a47591989..a489a79dd3b6 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -999,16 +999,14 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, 
unsigned long arg)
 * Other are invalid.
 */
return -EINVAL;
+   } else if (cmd == HFI1_SNOOP_IOCSETLINKSTATE) {
+   /* We do not support the old setlink state */
+   return -EINVAL;
}
 
spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
 
switch (cmd) {
-   case HFI1_SNOOP_IOCSETLINKSTATE:
-   snoop_dbg("HFI1_SNOOP_IOCSETLINKSTATE is not valid");
-   ret = -EINVAL;
-   break;
-
case HFI1_SNOOP_IOCSETLINKSTATE_EXTRA:
memset(&link_info, 0, sizeof(link_info));
 
-- 
1.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] IB/hfi1: might sleep under spinlock in hfi1_ioctl()

2015-11-10 Thread ira.weiny
On Fri, Oct 30, 2015 at 07:58:18PM -0400, ira. weiny wrote:
> On Sat, Oct 31, 2015 at 12:32:29AM +0300, Alexey Khoroshilov wrote:
> > Hello,
> > 
> > hfi1_ioctl() contains many calls to might sleep functions with
> > dd->hfi1_snoop.snoop_lock spinlock held (for example, access_ok,
> > copy_from_user, kzalloc(GFP_KERNEL), etc.).
> > 
> > Should dd->hfi1_snoop.snoop_lock be acquired just before updating state?
> 
> I believe you are correct.
> 
> I am currently in the process of pushing fixes to the staging tree.
> 
> We have a patch which fixes this queued up but it depends on at least one 
> other
> patch in my queue.
> 
> I will do my best to get this submitted soon.

I have just posted a series which addresses this problem as well as doing
general clean up on hfi1_ioctl.  The specific fix is contained in this patch.

[PATCH 7/8] staging/rdma/hfi1: Reduce snoop locking scope in IOCTL handler.

Thanks for the report,
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 5/9] staging/rdma/hfi1: Add function stubs for TID caching

2015-11-10 Thread ira.weiny
> > 
> > Ok, patch 7-9 of this series do not depend on this patch nor number 6.  I 
> > will
> > resend those 3 while I figure out what to do about these 2:
> > 
> > staging/rdma/hfi1: Add function stubs for TID caching
> > staging/rdma/hfi1: Implement Expected Receive TID caching
> > 
> > Frankly this was an attempt to reduce the size of "Implement Expected 
> > Receive
> > TID caching".  I obviously did something wrong.
> > 
> > I really don't know that I can split these up any more without causing 
> > issues
> > with bisecting the code.
> 
> I strongly doubt that you created this new feature all "at once", it
> took a set of steps to get to your final destination.  So show that
> work, like your math professor told you...
> 

The original author and I have been going through the code to see what we can
do.  We have identified a couple of other pieces which can be split.

One question.  Is it ok to have functionality which is added which is unused in
a preliminary patch?  I believe this is ok as long as the code compiles but I
just wanted to make sure.  While there are different operations added in this
patch it is broken to not use them as a set.  So we need to have a series which
implement the pieces with a final patch which exposes the set of operations.

Is this acceptable?

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] staging/rdma/hfi1: Reduce number of parameters passed to send handlers

2015-11-10 Thread Leon Romanovsky
On Wed, Nov 11, 2015 at 12:34:37AM -0500, ira.we...@intel.com wrote:
> From: Dennis Dalessandro 
> 
> +int snoop_send_dma_handler(struct hfi1_qp *qp, struct hfi1_pkt_state *ps,
> +u64 pbc)
>  {
> - pr_alert("Snooping/Capture of  Send DMA Packets Is Not Supported!\n");
> + pr_alert("Snooping/Capture of Send DMA Packets Is Not Supported!\n");
>   snoop_dbg("Unsupported Operation");
Is it really necessary to do both print alert to message log (pr_alert) and to
trace buffer (snoop_dbg)?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/13] staging/rdma/hfi1: Add space between concatenated string elements

2015-11-10 Thread Jubin John
Space between concantenated string elements is more human
readable and fixes the checkpatch issue:
CHECK: Concatenated strings should use spaces between elements

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/chip.c   |   10 +-
 drivers/staging/rdma/hfi1/driver.c |2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index 9db141d..eb281e9 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -8865,8 +8865,8 @@ static int request_intx_irq(struct hfi1_devdata *dd)
 {
int ret;
 
-   snprintf(dd->intx_name, sizeof(dd->intx_name), DRIVER_NAME"_%d",
-   dd->unit);
+   snprintf(dd->intx_name, sizeof(dd->intx_name), DRIVER_NAME "_%d",
+dd->unit);
ret = request_irq(dd->pcidev->irq, general_interrupt,
  IRQF_SHARED, dd->intx_name, dd);
if (ret)
@@ -8968,7 +8968,7 @@ static int request_msix_irqs(struct hfi1_devdata *dd)
handler = general_interrupt;
arg = dd;
snprintf(me->name, sizeof(me->name),
-   DRIVER_NAME"_%d", dd->unit);
+DRIVER_NAME "_%d", dd->unit);
err_info = "general";
} else if (first_sdma <= i && i < last_sdma) {
idx = i - first_sdma;
@@ -8976,7 +8976,7 @@ static int request_msix_irqs(struct hfi1_devdata *dd)
handler = sdma_interrupt;
arg = sde;
snprintf(me->name, sizeof(me->name),
-   DRIVER_NAME"_%d sdma%d", dd->unit, idx);
+DRIVER_NAME "_%d sdma%d", dd->unit, idx);
err_info = "sdma";
remap_sdma_interrupts(dd, idx, i);
} else if (first_rx <= i && i < last_rx) {
@@ -8996,7 +8996,7 @@ static int request_msix_irqs(struct hfi1_devdata *dd)
thread = receive_context_thread;
arg = rcd;
snprintf(me->name, sizeof(me->name),
-   DRIVER_NAME"_%d kctxt%d", dd->unit, idx);
+DRIVER_NAME "_%d kctxt%d", dd->unit, idx);
err_info = "receive context";
remap_receive_available_interrupt(dd, idx, i);
} else {
diff --git a/drivers/staging/rdma/hfi1/driver.c 
b/drivers/staging/rdma/hfi1/driver.c
index ce69141..e0cc196 100644
--- a/drivers/staging/rdma/hfi1/driver.c
+++ b/drivers/staging/rdma/hfi1/driver.c
@@ -158,7 +158,7 @@ const char *get_unit_name(int unit)
 {
static char iname[16];
 
-   snprintf(iname, sizeof(iname), DRIVER_NAME"_%u", unit);
+   snprintf(iname, sizeof(iname), DRIVER_NAME "_%u", unit);
return iname;
 }
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/13] staging/rdma/hfi1: rework is_a0() and is_bx()

2015-11-10 Thread Jubin John
From: Mike Marciniszyn 

The current is_bx() will incorrectly match on other steppings.

is_a0() is removed in favor of is_ax().

Reviewed-by: Mark Debbage 
Signed-off-by: Mike Marciniszyn 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/chip.c |   46 ++
 drivers/staging/rdma/hfi1/chip.h |1 -
 drivers/staging/rdma/hfi1/firmware.c |2 +-
 drivers/staging/rdma/hfi1/hfi.h  |4 +-
 drivers/staging/rdma/hfi1/init.c |2 +-
 drivers/staging/rdma/hfi1/pcie.c |4 +-
 6 files changed, 25 insertions(+), 34 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index eb281e9..75416d8 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -1871,13 +1871,6 @@ static struct cntr_entry port_cntrs[PORT_CNTR_LAST] = {
 
 /*  */
 
-/* return true if this is chip revision revision a0 */
-int is_a0(struct hfi1_devdata *dd)
-{
-   return ((dd->revision >> CCE_REVISION_CHIP_REV_MINOR_SHIFT)
-   & CCE_REVISION_CHIP_REV_MINOR_MASK) == 0;
-}
-
 /* return true if this is chip revision revision a */
 int is_ax(struct hfi1_devdata *dd)
 {
@@ -1893,7 +1886,7 @@ int is_bx(struct hfi1_devdata *dd)
u8 chip_rev_minor =
dd->revision >> CCE_REVISION_CHIP_REV_MINOR_SHIFT
& CCE_REVISION_CHIP_REV_MINOR_MASK;
-   return !!(chip_rev_minor & 0x10);
+   return (chip_rev_minor & 0xF0) == 0x10;
 }
 
 /*
@@ -2188,9 +2181,8 @@ static void handle_cce_err(struct hfi1_devdata *dd, u32 
unused, u64 reg)
dd_dev_info(dd, "CCE Error: %s\n",
cce_err_status_string(buf, sizeof(buf), reg));
 
-   if ((reg & CCE_ERR_STATUS_CCE_CLI2_ASYNC_FIFO_PARITY_ERR_SMASK)
-   && is_a0(dd)
-   && (dd->icode != ICODE_FUNCTIONAL_SIMULATOR)) {
+   if ((reg & CCE_ERR_STATUS_CCE_CLI2_ASYNC_FIFO_PARITY_ERR_SMASK) &&
+   is_ax(dd) && (dd->icode != ICODE_FUNCTIONAL_SIMULATOR)) {
/* this error requires a manual drop into SPC freeze mode */
/* then a fix up */
start_freeze_handling(dd->pport, FREEZE_SELF);
@@ -2250,7 +2242,7 @@ static void handle_rxe_err(struct hfi1_devdata *dd, u32 
unused, u64 reg)
 * Freeze mode recovery is disabled for the errors
 * in RXE_FREEZE_ABORT_MASK
 */
-   if (is_a0(dd) && (reg & RXE_FREEZE_ABORT_MASK))
+   if (is_ax(dd) && (reg & RXE_FREEZE_ABORT_MASK))
flags = FREEZE_ABORT;
 
start_freeze_handling(dd->pport, flags);
@@ -2353,7 +2345,7 @@ static void handle_egress_err(struct hfi1_devdata *dd, 
u32 unused, u64 reg)
 
if (reg & ALL_TXE_EGRESS_FREEZE_ERR)
start_freeze_handling(dd->pport, 0);
-   if (is_a0(dd) && (reg &
+   if (is_ax(dd) && (reg &
SEND_EGRESS_ERR_STATUS_TX_CREDIT_RETURN_VL_ERR_SMASK)
&& (dd->icode != ICODE_FUNCTIONAL_SIMULATOR))
start_freeze_handling(dd->pport, 0);
@@ -3048,7 +3040,7 @@ static void adjust_lcb_for_fpga_serdes(struct 
hfi1_devdata *dd)
/* else this is _p */
 
version = emulator_rev(dd);
-   if (!is_a0(dd))
+   if (!is_ax(dd))
version = 0x2d; /* all B0 use 0x2d or higher settings */
 
if (version <= 0x12) {
@@ -3334,7 +3326,7 @@ void handle_freeze(struct work_struct *work)
write_csr(dd, CCE_CTRL, CCE_CTRL_SPC_UNFREEZE_SMASK);
wait_for_freeze_status(dd, 0);
 
-   if (is_a0(dd)) {
+   if (is_ax(dd)) {
write_csr(dd, CCE_CTRL, CCE_CTRL_SPC_FREEZE_SMASK);
wait_for_freeze_status(dd, 1);
write_csr(dd, CCE_CTRL, CCE_CTRL_SPC_UNFREEZE_SMASK);
@@ -3862,7 +3854,7 @@ void handle_verify_cap(struct work_struct *work)
 *  REPLAY_BUF_MBE_SMASK
 *  FLIT_INPUT_BUF_MBE_SMASK
 */
-   if (is_a0(dd)) {/* fixed in B0 */
+   if (is_ax(dd)) {/* fixed in B0 */
reg = read_csr(dd, DC_LCB_CFG_LINK_KILL_EN);
reg |= DC_LCB_CFG_LINK_KILL_EN_REPLAY_BUF_MBE_SMASK
| DC_LCB_CFG_LINK_KILL_EN_FLIT_INPUT_BUF_MBE_SMASK;
@@ -7297,8 +7289,8 @@ static int set_buffer_control(struct hfi1_devdata *dd,
 */
use_all_mask = 0;
if ((be16_to_cpu(new_bc->overall_shared_limit) <
-   be16_to_cpu(cur_bc.overall_shared_limit))
-   || (is_a0(dd) && any_shared_limit_changing)) {
+be16_to_cpu(cur_bc.overall_shared_limit)) ||
+   (is_ax(dd) && any_shared_limit_changing)) {
set_global_shared(dd, 0);
cur_bc.overall_shared_limit = 0;
use_all_mask = 1;
@@ -7472,7 +7464,7 @@ int 

[PATCH 12/13] staging/rdma/hfi1: Read EFI variable for device description

2015-11-10 Thread Jubin John
From: Dean Luick 

Read an EFI variable for the device description.  Create the
infrastructure for additional variable reads.

Reviewed-by: Easwar Hariharan 
Signed-off-by: Dean Luick 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/Makefile |2 +-
 drivers/staging/rdma/hfi1/chip.c   |   38 ++--
 drivers/staging/rdma/hfi1/efivar.c |  176 
 drivers/staging/rdma/hfi1/efivar.h |   60 
 4 files changed, 267 insertions(+), 9 deletions(-)
 create mode 100644 drivers/staging/rdma/hfi1/efivar.c
 create mode 100644 drivers/staging/rdma/hfi1/efivar.h

diff --git a/drivers/staging/rdma/hfi1/Makefile 
b/drivers/staging/rdma/hfi1/Makefile
index 2e5daa6..68c5a31 100644
--- a/drivers/staging/rdma/hfi1/Makefile
+++ b/drivers/staging/rdma/hfi1/Makefile
@@ -7,7 +7,7 @@
 #
 obj-$(CONFIG_INFINIBAND_HFI1) += hfi1.o
 
-hfi1-y := chip.o cq.o device.o diag.o dma.o driver.o eprom.o file_ops.o 
firmware.o \
+hfi1-y := chip.o cq.o device.o diag.o dma.o driver.o efivar.o eprom.o 
file_ops.o firmware.o \
init.o intr.o keys.o mad.o mmap.o mr.o pcie.o pio.o pio_copy.o \
qp.o qsfp.o rc.o ruc.o sdma.o srq.o sysfs.o trace.o twsi.o \
uc.o ud.o user_pages.o user_sdma.o verbs_mcast.o verbs.o
diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index 15ae4c9..dcaa61c 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -63,6 +63,7 @@
 #include "pio.h"
 #include "sdma.h"
 #include "eprom.h"
+#include "efivar.h"
 
 #define NUM_IB_PORTS 1
 
@@ -10524,6 +10525,32 @@ static void asic_should_init(struct hfi1_devdata *dd)
spin_unlock_irqrestore(&hfi1_devs_lock, flags);
 }
 
+/*
+ * Set dd->boardname.  Use a generic name if a name is not returned from
+ * EFI variable space.
+ *
+ * Return 0 on success, -ENOMEM if space could not be allocated.
+ */
+static int obtain_boardname(struct hfi1_devdata *dd)
+{
+   /* generic board description */
+   const char generic[] =
+   "Intel Omni-Path Host Fabric Interface Adapter 100 Series";
+   unsigned long size;
+   int ret;
+
+   ret = read_hfi1_efi_var(dd, "description", &size,
+   (void **)&dd->boardname);
+   if (ret) {
+   dd_dev_err(dd, "Board description not found\n");
+   /* use generic description */
+   dd->boardname = kstrdup(generic, GFP_KERNEL);
+   if (!dd->boardname)
+   return -ENOMEM;
+   }
+   return 0;
+}
+
 /**
  * Allocate and initialize the device structure for the hfi.
  * @dev: the pci_dev for hfi1_ib device
@@ -10721,18 +10748,13 @@ struct hfi1_devdata *hfi1_init_dd(struct pci_dev 
*pdev,
 
parse_platform_config(dd);
 
-   /* add board names as they are defined */
-   dd->boardname = kmalloc(64, GFP_KERNEL);
-   if (!dd->boardname)
+   ret = obtain_boardname(dd);
+   if (ret)
goto bail_cleanup;
-   snprintf(dd->boardname, 64, "Board ID 0x%llx",
-dd->revision >> CCE_REVISION_BOARD_ID_LOWER_NIBBLE_SHIFT
-   & CCE_REVISION_BOARD_ID_LOWER_NIBBLE_MASK);
 
snprintf(dd->boardversion, BOARD_VERS_MAX,
-"ChipABI %u.%u, %s, ChipRev %u.%u, SW Compat %llu\n",
+"ChipABI %u.%u, ChipRev %u.%u, SW Compat %llu\n",
 HFI1_CHIP_VERS_MAJ, HFI1_CHIP_VERS_MIN,
-dd->boardname,
 (u32)dd->majrev,
 (u32)dd->minrev,
 (dd->revision >> CCE_REVISION_SW_SHIFT)
diff --git a/drivers/staging/rdma/hfi1/efivar.c 
b/drivers/staging/rdma/hfi1/efivar.c
new file mode 100644
index 000..476391d
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/efivar.c
@@ -0,0 +1,176 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in
+ *the

[PATCH 11/13] staging/rdma/hfi1: Remove unneeded variable index

2015-11-10 Thread Jubin John
From: Dean Luick 

The variable "index" increments the same as dd->ndevcntrs.
Just use the later.  Remove uneeded usage of "index" in the
fill loop - it is not used there or later in the function.

Reviewed-by: Dennis Dalessandro 
Signed-off-by: Dean Luick 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/chip.c |   19 +++
 1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index 5bd4841..15ae4c9 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -8327,7 +8327,7 @@ mod_timer(&dd->synth_stats_timer, jiffies + HZ * 
SYNTH_CNT_TIME);
 #define C_MAX_NAME 13 /* 12 chars + one for /0 */
 static int init_cntrs(struct hfi1_devdata *dd)
 {
-   int i, rcv_ctxts, index, j;
+   int i, rcv_ctxts, j;
size_t sz;
char *p;
char name[C_MAX_NAME];
@@ -8344,7 +8344,6 @@ static int init_cntrs(struct hfi1_devdata *dd)
/* size names and determine how many we have*/
dd->ndevcntrs = 0;
sz = 0;
-   index = 0;
 
for (i = 0; i < DEV_CNTR_LAST; i++) {
hfi1_dbg_early("Init cntr %s\n", dev_cntrs[i].name);
@@ -8355,7 +8354,7 @@ static int init_cntrs(struct hfi1_devdata *dd)
 
if (dev_cntrs[i].flags & CNTR_VL) {
hfi1_dbg_early("\tProcessing VL cntr\n");
-   dev_cntrs[i].offset = index;
+   dev_cntrs[i].offset = dd->ndevcntrs;
for (j = 0; j < C_VL_COUNT; j++) {
memset(name, '\0', C_MAX_NAME);
snprintf(name, C_MAX_NAME, "%s%d",
@@ -8365,13 +8364,12 @@ static int init_cntrs(struct hfi1_devdata *dd)
sz++;
hfi1_dbg_early("\t\t%s\n", name);
dd->ndevcntrs++;
-   index++;
}
} else if (dev_cntrs[i].flags & CNTR_SDMA) {
hfi1_dbg_early(
   "\tProcessing per SDE counters chip 
enginers %u\n",
   dd->chip_sdma_engines);
-   dev_cntrs[i].offset = index;
+   dev_cntrs[i].offset = dd->ndevcntrs;
for (j = 0; j < dd->chip_sdma_engines; j++) {
memset(name, '\0', C_MAX_NAME);
snprintf(name, C_MAX_NAME, "%s%d",
@@ -8380,24 +8378,22 @@ static int init_cntrs(struct hfi1_devdata *dd)
sz++;
hfi1_dbg_early("\t\t%s\n", name);
dd->ndevcntrs++;
-   index++;
}
} else {
/* +1 for newline  */
sz += strlen(dev_cntrs[i].name) + 1;
+   dev_cntrs[i].offset = dd->ndevcntrs;
dd->ndevcntrs++;
-   dev_cntrs[i].offset = index;
-   index++;
hfi1_dbg_early("\tAdding %s\n", dev_cntrs[i].name);
}
}
 
/* allocate space for the counter values */
-   dd->cntrs = kcalloc(index, sizeof(u64), GFP_KERNEL);
+   dd->cntrs = kcalloc(dd->ndevcntrs, sizeof(u64), GFP_KERNEL);
if (!dd->cntrs)
goto bail;
 
-   dd->scntrs = kcalloc(index, sizeof(u64), GFP_KERNEL);
+   dd->scntrs = kcalloc(dd->ndevcntrs, sizeof(u64), GFP_KERNEL);
if (!dd->scntrs)
goto bail;
 
@@ -8409,7 +8405,7 @@ static int init_cntrs(struct hfi1_devdata *dd)
goto bail;
 
/* fill in the names */
-   for (p = dd->cntrnames, i = 0, index = 0; i < DEV_CNTR_LAST; i++) {
+   for (p = dd->cntrnames, i = 0; i < DEV_CNTR_LAST; i++) {
if (dev_cntrs[i].flags & CNTR_DISABLED) {
/* Nothing */
} else {
@@ -8439,7 +8435,6 @@ static int init_cntrs(struct hfi1_devdata *dd)
p += strlen(dev_cntrs[i].name);
*p++ = '\n';
}
-   index++;
}
}
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/13] staging/rdma/hfi: Remove rcv bubbles code

2015-11-10 Thread Jubin John
From: Ira Weiny 

Rcv bubbles were improperly calculated for HFIs, fix that here.

Reviewed-by: Mike Marciniszyn 
Reviewed-by: Arthur Kepner 
Signed-off-by: Ira Weiny 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/mad.c |   64 +-
 1 files changed, 2 insertions(+), 62 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/mad.c b/drivers/staging/rdma/hfi1/mad.c
index a4b7033..a23ddc4 100644
--- a/drivers/staging/rdma/hfi1/mad.c
+++ b/drivers/staging/rdma/hfi1/mad.c
@@ -2271,34 +2271,8 @@ static void a0_portstatus(struct hfi1_pportdata *ppd,
 {
if (!is_bx(ppd->dd)) {
unsigned long vl;
-   int vfi = 0;
u64 max_vl_xmit_wait = 0, tmp;
u32 vl_all_mask = VL_MASK_ALL;
-   u64 rcv_data, rcv_bubble;
-
-   rcv_data = be64_to_cpu(rsp->port_rcv_data);
-   rcv_bubble = be64_to_cpu(rsp->port_rcv_bubble);
-   /* In the measured time period, calculate the total number
-* of flits that were received. Subtract out one false
-* rcv_bubble increment for every 32 received flits but
-* don't let the number go negative.
-*/
-   if (rcv_bubble >= (rcv_data>>5)) {
-   rcv_bubble -= (rcv_data>>5);
-   rsp->port_rcv_bubble = cpu_to_be64(rcv_bubble);
-   }
-   for_each_set_bit(vl, (unsigned long *)&(vl_select_mask),
-8 * sizeof(vl_select_mask)) {
-   rcv_data = be64_to_cpu(rsp->vls[vfi].port_vl_rcv_data);
-   rcv_bubble =
-   be64_to_cpu(rsp->vls[vfi].port_vl_rcv_bubble);
-   if (rcv_bubble >= (rcv_data>>5)) {
-   rcv_bubble -= (rcv_data>>5);
-   rsp->vls[vfi].port_vl_rcv_bubble =
-   cpu_to_be64(rcv_bubble);
-   }
-   vfi++;
-   }
 
for_each_set_bit(vl, (unsigned long *)&(vl_all_mask),
 8 * sizeof(vl_all_mask)) {
@@ -2363,8 +2337,6 @@ static int pma_get_opa_portstatus(struct opa_pma_mad *pmp,
  CNTR_INVALID_VL));
rsp->port_rcv_data = cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_FLITS,
 CNTR_INVALID_VL));
-   rsp->port_rcv_bubble =
-   cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_BBL, CNTR_INVALID_VL));
rsp->port_xmit_pkts = cpu_to_be64(read_dev_cntr(dd, C_DC_XMIT_PKTS,
  CNTR_INVALID_VL));
rsp->port_rcv_pkts = cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_PKTS,
@@ -2434,9 +2406,6 @@ static int pma_get_opa_portstatus(struct opa_pma_mad *pmp,
 
tmp = read_dev_cntr(dd, C_DC_RX_FLIT_VL, idx_from_vl(vl));
rsp->vls[vfi].port_vl_rcv_data = cpu_to_be64(tmp);
-   rsp->vls[vfi].port_vl_rcv_bubble =
-   cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_BBL_VL,
-   idx_from_vl(vl)));
 
rsp->vls[vfi].port_vl_rcv_pkts =
cpu_to_be64(read_dev_cntr(dd, C_DC_RX_PKT_VL,
@@ -2519,32 +2488,8 @@ static void a0_datacounters(struct hfi1_devdata *dd, 
struct _port_dctrs *rsp,
if (!is_bx(dd)) {
unsigned long vl;
int vfi = 0;
-   u64 rcv_data, rcv_bubble, sum_vl_xmit_wait = 0;
-
-   rcv_data = be64_to_cpu(rsp->port_rcv_data);
-   rcv_bubble = be64_to_cpu(rsp->port_rcv_bubble);
-   /* In the measured time period, calculate the total number
-* of flits that were received. Subtract out one false
-* rcv_bubble increment for every 32 received flits but
-* don't let the number go negative.
-*/
-   if (rcv_bubble >= (rcv_data>>5)) {
-   rcv_bubble -= (rcv_data>>5);
-   rsp->port_rcv_bubble = cpu_to_be64(rcv_bubble);
-   }
-   for_each_set_bit(vl, (unsigned long *)&(vl_select_mask),
-   8 * sizeof(vl_select_mask)) {
-   rcv_data = be64_to_cpu(rsp->vls[vfi].port_vl_rcv_data);
-   rcv_bubble =
-   be64_to_cpu(rsp->vls[vfi].port_vl_rcv_bubble);
-   if (rcv_bubble >= (rcv_data>>5)) {
-   rcv_bubble -= (rcv_data>>5);
-   rsp->vls[vfi].port_vl_rcv_bubble =
-   cpu_to_be64(rcv_bubble);
-   }
-   vfi++;
-   }
-   vfi = 0;
+   u64 sum_vl_xmit_wait = 0;
+
for_ea

[PATCH 08/13] staging/rdma/hfi1: change krcvqs module parameter type from byte to uint

2015-11-10 Thread Jubin John
From: Mark F. Brown 

The krcvqs parameter is displayed incorrectly in sysfs.
The workaround is to set the param type as uint.

Reviewed-by: Mike Marciniszyn 
Reviewed-by: Mitko Haralanov 
Signed-off-by: Mark F. Brown 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/hfi.h  |2 +-
 drivers/staging/rdma/hfi1/init.c |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/hfi.h b/drivers/staging/rdma/hfi1/hfi.h
index fa2a284..a17fc87 100644
--- a/drivers/staging/rdma/hfi1/hfi.h
+++ b/drivers/staging/rdma/hfi1/hfi.h
@@ -1651,7 +1651,7 @@ extern unsigned int hfi1_cu;
 extern unsigned int user_credit_return_threshold;
 extern uint num_rcv_contexts;
 extern unsigned n_krcvqs;
-extern u8 krcvqs[];
+extern uint krcvqs[];
 extern int krcvqsset;
 extern uint kdeth_qp;
 extern uint loopback;
diff --git a/drivers/staging/rdma/hfi1/init.c b/drivers/staging/rdma/hfi1/init.c
index 0385a17..e65b999 100644
--- a/drivers/staging/rdma/hfi1/init.c
+++ b/drivers/staging/rdma/hfi1/init.c
@@ -86,9 +86,9 @@ module_param_named(num_rcv_contexts, num_rcv_contexts, uint, 
S_IRUGO);
 MODULE_PARM_DESC(
num_rcv_contexts, "Set max number of user receive contexts to use");
 
-u8 krcvqs[RXE_NUM_DATA_VL];
+uint krcvqs[RXE_NUM_DATA_VL];
 int krcvqsset;
-module_param_array(krcvqs, byte, &krcvqsset, S_IRUGO);
+module_param_array(krcvqs, uint, &krcvqsset, S_IRUGO);
 MODULE_PARM_DESC(krcvqs, "Array of the number of kernel receive queues by VL");
 
 /* computed based on above array */
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 10/13] staging/rdma/hfi1: adding per SDMA engine stats to hfistats

2015-11-10 Thread Jubin John
From: Vennila Megavannan 

Added the following per sdma engine stats:
  - SendDmaDescFetchedCnt
  - software maintained count of SDMA interrupts
 (SDmaInt, SDmaIdleInt, SDmaProgressInt)
  - software maintained counts of SDMA error cases

Reviewed-by: Dennis Dalessandro 
Signed-off-by: Mike Marciniszyn 
Signed-off-by: Vennila Megavannan 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/chip.c   |  110 +++-
 drivers/staging/rdma/hfi1/chip.h   |5 +
 drivers/staging/rdma/hfi1/chip_registers.h |1 +
 drivers/staging/rdma/hfi1/hfi.h|1 +
 drivers/staging/rdma/hfi1/sdma.c   |9 ++-
 drivers/staging/rdma/hfi1/sdma.h   |7 ++
 6 files changed, 129 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index aac47ad..5bd4841 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -1296,10 +1296,58 @@ static u64 dev_access_u32_csr(const struct cntr_entry 
*entry,
void *context, int vl, int mode, u64 data)
 {
struct hfi1_devdata *dd = context;
+   u64 csr = entry->csr;
 
-   if (vl != CNTR_INVALID_VL)
-   return 0;
-   return read_write_csr(dd, entry->csr, mode, data);
+   if (entry->flags & CNTR_SDMA) {
+   if (vl == CNTR_INVALID_VL)
+   return 0;
+   csr += 0x100 * vl;
+   } else {
+   if (vl != CNTR_INVALID_VL)
+   return 0;
+   }
+   return read_write_csr(dd, csr, mode, data);
+}
+
+static u64 access_sde_err_cnt(const struct cntr_entry *entry,
+ void *context, int idx, int mode, u64 data)
+{
+   struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+
+   if (dd->per_sdma && idx < dd->num_sdma)
+   return dd->per_sdma[idx].err_cnt;
+   return 0;
+}
+
+static u64 access_sde_int_cnt(const struct cntr_entry *entry,
+ void *context, int idx, int mode, u64 data)
+{
+   struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+
+   if (dd->per_sdma && idx < dd->num_sdma)
+   return dd->per_sdma[idx].sdma_int_cnt;
+   return 0;
+}
+
+static u64 access_sde_idle_int_cnt(const struct cntr_entry *entry,
+  void *context, int idx, int mode, u64 data)
+{
+   struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+
+   if (dd->per_sdma && idx < dd->num_sdma)
+   return dd->per_sdma[idx].idle_int_cnt;
+   return 0;
+}
+
+static u64 access_sde_progress_int_cnt(const struct cntr_entry *entry,
+  void *context, int idx, int mode,
+  u64 data)
+{
+   struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+
+   if (dd->per_sdma && idx < dd->num_sdma)
+   return dd->per_sdma[idx].progress_int_cnt;
+   return 0;
 }
 
 static u64 dev_access_u64_csr(const struct cntr_entry *entry, void *context,
@@ -1728,6 +1776,22 @@ static struct cntr_entry dev_cntrs[DEV_CNTR_LAST] = {
access_sw_kmem_wait),
 [C_SW_SEND_SCHED] = CNTR_ELEM("SendSched", 0, 0, CNTR_NORMAL,
access_sw_send_schedule),
+[C_SDMA_DESC_FETCHED_CNT] = CNTR_ELEM("SDEDscFdCn",
+ SEND_DMA_DESC_FETCHED_CNT, 0,
+ CNTR_NORMAL | CNTR_32BIT | CNTR_SDMA,
+ dev_access_u32_csr),
+[C_SDMA_INT_CNT] = CNTR_ELEM("SDMAInt", 0, 0,
+CNTR_NORMAL | CNTR_32BIT | CNTR_SDMA,
+access_sde_int_cnt),
+[C_SDMA_ERR_CNT] = CNTR_ELEM("SDMAErrCt", 0, 0,
+CNTR_NORMAL | CNTR_32BIT | CNTR_SDMA,
+access_sde_err_cnt),
+[C_SDMA_IDLE_INT_CNT] = CNTR_ELEM("SDMAIdInt", 0, 0,
+ CNTR_NORMAL | CNTR_32BIT | CNTR_SDMA,
+ access_sde_idle_int_cnt),
+[C_SDMA_PROGRESS_INT_CNT] = CNTR_ELEM("SDMAPrIntCn", 0, 0,
+ CNTR_NORMAL | CNTR_32BIT | CNTR_SDMA,
+ access_sde_progress_int_cnt)
 };
 
 static struct cntr_entry port_cntrs[PORT_CNTR_LAST] = {
@@ -2520,6 +2584,7 @@ static void handle_sdma_eng_err(struct hfi1_devdata *dd,
dd_dev_err(sde->dd, "CONFIG SDMA(%u) source: %u status 0x%llx\n",
   sde->this_idx, source, (unsigned long long)status);
 #endif
+   sde->err_cnt++;
sdma_engine_error(sde, status);
 }
 
@@ -7885,6 +7950,20 @@ u32 hfi1_read_cntrs(struct hfi1_devdata *dd, loff_t pos, 
char **namep,
dd->cntrs[entry->offset + j] =
val;
}
+ 

[PATCH 13/13] staging/rdma/hfi1: Adjust EPROM partitions, add EPROM commands

2015-11-10 Thread Jubin John
From: Dean Luick 

Add a new EPROM partition, adjusting partition placement.

Add EPROM range commands as a supserset of the partition
commands.  Remove old partition commands.

Enhance EPROM erase, creating a range function and using the
largest erase (sub) commands when possible.

Reviewed-by: Dennis Dalessandro 
Signed-off-by: Dean Luick 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/eprom.c|  119 +++---
 drivers/staging/rdma/hfi1/file_ops.c |   18 ++
 include/uapi/rdma/hfi/hfi1_user.h|   10 +--
 3 files changed, 77 insertions(+), 70 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/eprom.c 
b/drivers/staging/rdma/hfi1/eprom.c
index b61d3ae..fb620c9 100644
--- a/drivers/staging/rdma/hfi1/eprom.c
+++ b/drivers/staging/rdma/hfi1/eprom.c
@@ -53,17 +53,26 @@
 #include "eprom.h"
 
 /*
- * The EPROM is logically divided into two partitions:
+ * The EPROM is logically divided into three partitions:
  * partition 0: the first 128K, visible from PCI ROM BAR
- * partition 1: the rest
+ * partition 1: 4K config file (sector size)
+ * partition 2: the rest
  */
 #define P0_SIZE (128 * 1024)
+#define P1_SIZE   (4 * 1024)
 #define P1_START P0_SIZE
+#define P2_START (P0_SIZE + P1_SIZE)
+
+/* erase sizes supported by the controller */
+#define SIZE_4KB (4 * 1024)
+#define MASK_4KB (SIZE_4KB - 1)
 
-/* largest erase size supported by the controller */
 #define SIZE_32KB (32 * 1024)
 #define MASK_32KB (SIZE_32KB - 1)
 
+#define SIZE_64KB (64 * 1024)
+#define MASK_64KB (SIZE_64KB - 1)
+
 /* controller page size, in bytes */
 #define EP_PAGE_SIZE 256
 #define EEP_PAGE_MASK (EP_PAGE_SIZE - 1)
@@ -75,10 +84,12 @@
 #define CMD_READ_DATA(addr)((0x03 << CMD_SHIFT) | addr)
 #define CMD_READ_SR1   ((0x05 << CMD_SHIFT))
 #define CMD_WRITE_ENABLE   ((0x06 << CMD_SHIFT))
+#define CMD_SECTOR_ERASE_4KB(addr)  ((0x20 << CMD_SHIFT) | addr)
 #define CMD_SECTOR_ERASE_32KB(addr) ((0x52 << CMD_SHIFT) | addr)
 #define CMD_CHIP_ERASE ((0x60 << CMD_SHIFT))
 #define CMD_READ_MANUF_DEV_ID  ((0x90 << CMD_SHIFT))
 #define CMD_RELEASE_POWERDOWN_NOID  ((0xab << CMD_SHIFT))
+#define CMD_SECTOR_ERASE_64KB(addr) ((0xd8 << CMD_SHIFT) | addr)
 
 /* controller interface speeds */
 #define EP_SPEED_FULL 0x2  /* full speed */
@@ -188,28 +199,43 @@ static int erase_chip(struct hfi1_devdata *dd)
 }
 
 /*
- * Erase a range using the 32KB erase command.
+ * Erase a range.
  */
-static int erase_32kb_range(struct hfi1_devdata *dd, u32 start, u32 end)
+static int erase_range(struct hfi1_devdata *dd, u32 start, u32 len)
 {
+   u32 end = start + len;
int ret = 0;
 
if (end < start)
return -EINVAL;
 
-   if ((start & MASK_32KB) || (end & MASK_32KB)) {
+   /* check the end points for the minimum erase */
+   if ((start & MASK_4KB) || (end & MASK_4KB)) {
dd_dev_err(dd,
-   "%s: non-aligned range (0x%x,0x%x) for a 32KB erase\n",
+   "%s: non-aligned range (0x%x,0x%x) for a 4KB erase\n",
__func__, start, end);
return -EINVAL;
}
 
write_enable(dd);
 
-   for (; start < end; start += SIZE_32KB) {
+   while (start < end) {
write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_WRITE_ENABLE);
-   write_csr(dd, ASIC_EEP_ADDR_CMD,
-   CMD_SECTOR_ERASE_32KB(start));
+   /* check in order of largest to smallest */
+   if (((start & MASK_64KB) == 0) && (start + SIZE_64KB <= end)) {
+   write_csr(dd, ASIC_EEP_ADDR_CMD,
+ CMD_SECTOR_ERASE_64KB(start));
+   start += SIZE_64KB;
+   } else if (((start & MASK_32KB) == 0) &&
+  (start + SIZE_32KB <= end)) {
+   write_csr(dd, ASIC_EEP_ADDR_CMD,
+ CMD_SECTOR_ERASE_32KB(start));
+   start += SIZE_32KB;
+   } else {/* 4KB will work */
+   write_csr(dd, ASIC_EEP_ADDR_CMD,
+ CMD_SECTOR_ERASE_4KB(start));
+   start += SIZE_4KB;
+   }
ret = wait_for_not_busy(dd);
if (ret)
goto done;
@@ -309,6 +335,18 @@ done:
return ret;
 }
 
+/* convert an range composite to a length, in bytes */
+static inline u32 extract_rlen(u32 composite)
+{
+   return (composite & 0x) * EP_PAGE_SIZE;
+}
+
+/* convert an range composite to a start, in bytes */
+static inline u32 extract_rstart(u32 composite)
+{
+   return (composite >> 16) * EP_PAGE_SIZE;
+}
+
 /*
  * Perform the given operation on the EPROM.  Called from user space.  The
  * user credentials have already been checked.
@@ -319,6 +357,8 @@ int handle_eprom_command(const struct hfi1_cmd *cmd)
 {

[PATCH 09/13] staging/rdma/hfi1: Change default krcvqs

2015-11-10 Thread Jubin John
Change the default number of krcvqs to number of numa nodes + 1
based on the performance data collected.

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/chip.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index 75416d8..aac47ad 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -9174,7 +9174,7 @@ static int set_up_context_variables(struct hfi1_devdata 
*dd)
if (n_krcvqs)
num_kernel_contexts = n_krcvqs + MIN_KERNEL_KCTXTS;
else
-   num_kernel_contexts = num_online_nodes();
+   num_kernel_contexts = num_online_nodes() + 1;
num_kernel_contexts =
max_t(int, MIN_KERNEL_KCTXTS, num_kernel_contexts);
/*
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/13] staging/rdma/hfi1: Move s_sde to the read mostly portion of the hfi1_qp structure

2015-11-10 Thread Jubin John
From: Harish Chegondi 

This would reduce L2 cache misses on s_sde in the _hfi1_schedule_send
function when invoked from post_send thereby improving performance of
post_send.

Reviewed-by: Mike Marciniszyn 
Signed-off-by: Harish Chegondi 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/verbs.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/verbs.h 
b/drivers/staging/rdma/hfi1/verbs.h
index 600bd1b..638094d 100644
--- a/drivers/staging/rdma/hfi1/verbs.h
+++ b/drivers/staging/rdma/hfi1/verbs.h
@@ -441,6 +441,7 @@ struct hfi1_qp {
struct hfi1_swqe *s_wq;  /* send work queue */
struct hfi1_mmap_info *ip;
struct ahg_ib_header *s_hdr; /* next packet header to send */
+   struct sdma_engine *s_sde; /* current sde */
u8 s_sc;/* SC[0..4] for next packet */
unsigned long timeout_jiffies;  /* computed from timeout */
 
@@ -504,7 +505,6 @@ struct hfi1_qp {
struct hfi1_swqe *s_wqe;
struct hfi1_sge_state s_sge; /* current send request data */
struct hfi1_mregion *s_rdma_mr;
-   struct sdma_engine *s_sde; /* current sde */
u32 s_cur_size; /* size of send packet in bytes */
u32 s_len;  /* total length of s_sge */
u32 s_rdma_read_len;/* total length of s_rdma_read_sge */
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/13] staging/rdma/hfi1: Use BIT macro

2015-11-10 Thread Jubin John
This patch fixes the checkpatch issue:
CHECK: Prefer using the BIT macro

Reviewed-by: Dean Luick 
Reviewed-by: Ira Weiny 
Reviewed-by: Mike Marciniszyn 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/chip.h   |   48 ++--
 drivers/staging/rdma/hfi1/common.h |4 +-
 drivers/staging/rdma/hfi1/diag.c   |4 +-
 drivers/staging/rdma/hfi1/hfi.h|   22 
 drivers/staging/rdma/hfi1/init.c   |2 +-
 drivers/staging/rdma/hfi1/mad.c|4 +-
 drivers/staging/rdma/hfi1/qp.h |2 +-
 drivers/staging/rdma/hfi1/qsfp.h   |   10 +++---
 drivers/staging/rdma/hfi1/sdma.c   |8 +++---
 drivers/staging/rdma/hfi1/verbs.h  |6 ++--
 10 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.h b/drivers/staging/rdma/hfi1/chip.h
index ebf9041..ccfaf6d 100644
--- a/drivers/staging/rdma/hfi1/chip.h
+++ b/drivers/staging/rdma/hfi1/chip.h
@@ -241,18 +241,18 @@
 #define HCMD_SUCCESS 2
 
 /* DC_DC8051_DBG_ERR_INFO_SET_BY_8051.ERROR - error flags */
-#define SPICO_ROM_FAILED   (1 <<  0)
-#define UNKNOWN_FRAME  (1 <<  1)
-#define TARGET_BER_NOT_MET (1 <<  2)
-#define FAILED_SERDES_INTERNAL_LOOPBACK (1 <<  3)
-#define FAILED_SERDES_INIT (1 <<  4)
-#define FAILED_LNI_POLLING (1 <<  5)
-#define FAILED_LNI_DEBOUNCE(1 <<  6)
-#define FAILED_LNI_ESTBCOMM(1 <<  7)
-#define FAILED_LNI_OPTEQ   (1 <<  8)
-#define FAILED_LNI_VERIFY_CAP1 (1 <<  9)
-#define FAILED_LNI_VERIFY_CAP2 (1 << 10)
-#define FAILED_LNI_CONFIGLT(1 << 11)
+#define SPICO_ROM_FAILED   BIT(0)
+#define UNKNOWN_FRAME  BIT(1)
+#define TARGET_BER_NOT_MET BIT(2)
+#define FAILED_SERDES_INTERNAL_LOOPBACKBIT(3)
+#define FAILED_SERDES_INIT BIT(4)
+#define FAILED_LNI_POLLING BIT(5)
+#define FAILED_LNI_DEBOUNCEBIT(6)
+#define FAILED_LNI_ESTBCOMMBIT(7)
+#define FAILED_LNI_OPTEQ   BIT(8)
+#define FAILED_LNI_VERIFY_CAP1 BIT(9)
+#define FAILED_LNI_VERIFY_CAP2 BIT(10)
+#define FAILED_LNI_CONFIGLTBIT(11)
 
 #define FAILED_LNI (FAILED_LNI_POLLING | FAILED_LNI_DEBOUNCE \
| FAILED_LNI_ESTBCOMM | FAILED_LNI_OPTEQ \
@@ -261,16 +261,16 @@
| FAILED_LNI_CONFIGLT)
 
 /* DC_DC8051_DBG_ERR_INFO_SET_BY_8051.HOST_MSG - host message flags */
-#define HOST_REQ_DONE (1 << 0)
-#define BC_PWR_MGM_MSG(1 << 1)
-#define BC_SMA_MSG(1 << 2)
-#define BC_BCC_UNKOWN_MSG (1 << 3)
-#define BC_IDLE_UNKNOWN_MSG   (1 << 4)
-#define EXT_DEVICE_CFG_REQ(1 << 5)
-#define VERIFY_CAP_FRAME  (1 << 6)
-#define LINKUP_ACHIEVED   (1 << 7)
-#define LINK_GOING_DOWN   (1 << 8)
-#define LINK_WIDTH_DOWNGRADED  (1 << 9)
+#define HOST_REQ_DONE  BIT(0)
+#define BC_PWR_MGM_MSG BIT(1)
+#define BC_SMA_MSG BIT(2)
+#define BC_BCC_UNKNOWN_MSG BIT(3)
+#define BC_IDLE_UNKNOWN_MSGBIT(4)
+#define EXT_DEVICE_CFG_REQ BIT(5)
+#define VERIFY_CAP_FRAME   BIT(6)
+#define LINKUP_ACHIEVEDBIT(7)
+#define LINK_GOING_DOWNBIT(8)
+#define LINK_WIDTH_DOWNGRADED  BIT(9)
 
 /* DC_DC8051_CFG_EXT_DEV_1.REQ_TYPE - 8051 host requests */
 #define HREQ_LOAD_CONFIG   0x01
@@ -334,14 +334,14 @@
  * the CSR fields hold multiples of this value.
  */
 #define RCV_SHIFT 3
-#define RCV_INCREMENT (1 << RCV_SHIFT)
+#define RCV_INCREMENT BIT(RCV_SHIFT)
 
 /*
  * Receive header queue entry increment - the CSR holds multiples of
  * this value.
  */
 #define HDRQ_SIZE_SHIFT 5
-#define HDRQ_INCREMENT (1 << HDRQ_SIZE_SHIFT)
+#define HDRQ_INCREMENT BIT(HDRQ_SIZE_SHIFT)
 
 /*
  * Freeze handling flags
diff --git a/drivers/staging/rdma/hfi1/common.h 
b/drivers/staging/rdma/hfi1/common.h
index 5e20323..e7616fb 100644
--- a/drivers/staging/rdma/hfi1/common.h
+++ b/drivers/staging/rdma/hfi1/common.h
@@ -348,10 +348,10 @@ struct hfi1_message_header {
 #define HFI1_QPN_MASK 0xFF
 #define HFI1_FECN_SHIFT 31
 #define HFI1_FECN_MASK 1
-#define HFI1_FECN_SMASK (1 << HFI1_FECN_SHIFT)
+#define HFI1_FECN_SMASK BIT(HFI1_FECN_SHIFT)
 #define HFI1_BECN_SHIFT 30
 #define HFI1_BECN_MASK 1
-#define HFI1_BECN_SMASK (1 << HFI1_BECN_SHIFT)
+#define HFI1_BECN_SMASK BIT(HFI1_BECN_SHIFT)
 #define HFI1_MULTICAST_LID_BASE 0xC000
 
 static inline __u64 rhf_to_cpu(const __le32 *rbuf)
diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 88414d7..076a419 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -78,8 +78,8 @@
hfi1_cdbg(SNOOP, fmt, ##__VA_ARGS__)
 
 /* Snoop option mask */
-#define SNOOP_DROP_SEND(1 << 0)
-#define SNOOP_USE_METADATA (1 << 1)
+#define SNOOP_DROP_SENDBIT(0)
+#define SNOOP_USE_METADATA BIT(1)
 
 s

[PATCH 03/13] staging/rdma/hfi1: remove RxCtxRHQS from hfi1stats

2015-11-10 Thread Jubin John
From: Vennila Megavannan 

Removed the RxCtxRHQS counter being dumped into dev_cntrs

Reviewed-by: Dennis Dalessandro 
Signed-off-by: Vennila Megavannan 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/chip.c   |2 --
 drivers/staging/rdma/hfi1/chip.h   |1 -
 drivers/staging/rdma/hfi1/chip_registers.h |1 -
 3 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index 16302d6..9db141d 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -1587,8 +1587,6 @@ static struct cntr_entry dev_cntrs[DEV_CNTR_LAST] = {
 [C_RX_TID_FLGMS] = RXE32_DEV_CNTR_ELEM(RxTidFLGMs,
RCV_TID_FLOW_GEN_MISMATCH_CNT,
CNTR_NORMAL),
-[C_RX_CTX_RHQS] = RXE32_DEV_CNTR_ELEM(RxCtxRHQS, RCV_CONTEXT_RHQ_STALL,
-   CNTR_NORMAL),
 [C_RX_CTX_EGRS] = RXE32_DEV_CNTR_ELEM(RxCtxEgrS, RCV_CONTEXT_EGR_STALL,
CNTR_NORMAL),
 [C_RCV_TID_FLSMS] = RXE32_DEV_CNTR_ELEM(RxTidFLSMs,
diff --git a/drivers/staging/rdma/hfi1/chip.h b/drivers/staging/rdma/hfi1/chip.h
index ccfaf6d..6946fae 100644
--- a/drivers/staging/rdma/hfi1/chip.h
+++ b/drivers/staging/rdma/hfi1/chip.h
@@ -721,7 +721,6 @@ enum {
C_RX_TID_FULL,
C_RX_TID_INVALID,
C_RX_TID_FLGMS,
-   C_RX_CTX_RHQS,
C_RX_CTX_EGRS,
C_RCV_TID_FLSMS,
C_CCE_PCI_CR_ST,
diff --git a/drivers/staging/rdma/hfi1/chip_registers.h 
b/drivers/staging/rdma/hfi1/chip_registers.h
index bf45de2..5056c84 100644
--- a/drivers/staging/rdma/hfi1/chip_registers.h
+++ b/drivers/staging/rdma/hfi1/chip_registers.h
@@ -379,7 +379,6 @@
 #define DC_LCB_STS_ROUND_TRIP_LTP_CNT (DC_LCB_CSRS + 0x04B0)
 #define RCV_BUF_OVFL_CNT 10
 #define RCV_CONTEXT_EGR_STALL 22
-#define RCV_CONTEXT_RHQ_STALL 21
 #define RCV_DATA_PKT_CNT 0
 #define RCV_DWORD_CNT 1
 #define RCV_TID_FLOW_GEN_MISMATCH_CNT 20
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/13] staging/rdma/hfi1: Fix downgrade race

2015-11-10 Thread Jubin John
From: Dean Luick 

A link downgrade can race with link up. Avoid the race
in two ways. First, by having the downgrade application logic
take the link state mutex for all of its checking. Second, by
waiting for the link to move out of the going up state.

Reviewed-by: Dennis Dalessandro 
Signed-off-by: Dean Luick 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/chip.c |   31 ---
 1 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index e489819..16302d6 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -3907,18 +3907,32 @@ void handle_verify_cap(struct work_struct *work)
  */
 void apply_link_downgrade_policy(struct hfi1_pportdata *ppd, int 
refresh_widths)
 {
-   int skip = 1;
int do_bounce = 0;
-   u16 lwde = ppd->link_width_downgrade_enabled;
+   int tries;
+   u16 lwde;
u16 tx, rx;
 
+   /* use the hls lock to avoid a race with actual link up */
+   tries = 0;
+retry:
mutex_lock(&ppd->hls_lock);
/* only apply if the link is up */
-   if (ppd->host_link_state & HLS_UP)
-   skip = 0;
-   mutex_unlock(&ppd->hls_lock);
-   if (skip)
-   return;
+   if (!(ppd->host_link_state & HLS_UP)) {
+   /* still going up..wait and retry */
+   if (ppd->host_link_state & HLS_GOING_UP) {
+   if (++tries < 1000) {
+   mutex_unlock(&ppd->hls_lock);
+   usleep_range(100, 120); /* arbitrary */
+   goto retry;
+   }
+   dd_dev_err(ppd->dd,
+  "%s: giving up waiting for link state 
change\n",
+  __func__);
+   }
+   goto done;
+   }
+
+   lwde = ppd->link_width_downgrade_enabled;
 
if (refresh_widths) {
get_link_widths(ppd->dd, &tx, &rx);
@@ -3956,6 +3970,9 @@ void apply_link_downgrade_policy(struct hfi1_pportdata 
*ppd, int refresh_widths)
do_bounce = 1;
}
 
+done:
+   mutex_unlock(&ppd->hls_lock);
+
if (do_bounce) {
set_link_down_reason(ppd, OPA_LINKDOWN_REASON_WIDTH_POLICY, 0,
  OPA_LINKDOWN_REASON_WIDTH_POLICY);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] staging/rdma/hfi1: set Gen3 half-swing for integrated devices

2015-11-10 Thread Dan Carpenter
That's fine.  I feel like the first option is better because those long
names really are cumbersome.  I sometimes cut and paste bugs caused by
using really long names because it's hard to spot the differences
between two really long names which are 80% the same filler text.  (I'm
using a static checker so the checker finds the bugs and I have to stare
at it for a very long time to spot the characters which are different).

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] IB/core: Fix LinkDownReason define for consistency

2015-11-10 Thread Jubin John
From: Easwar Hariharan 

LinkDownReason LocalMediaNotInstalled lacked an underscore
and was inconsistent with other defines in the same family.
This patch fixes this.

Reviewed-by: Ira Weiny 
Signed-off-by: Easwar Hariharan 
Signed-off-by: Jubin John 
---
 include/rdma/opa_port_info.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/rdma/opa_port_info.h b/include/rdma/opa_port_info.h
index a0fa975..2b95c2c 100644
--- a/include/rdma/opa_port_info.h
+++ b/include/rdma/opa_port_info.h
@@ -97,7 +97,7 @@
 #define OPA_LINKDOWN_REASON_WIDTH_POLICY   41
 /* 42-48 reserved */
 #define OPA_LINKDOWN_REASON_DISCONNECTED   49
-#define OPA_LINKDOWN_REASONLOCAL_MEDIA_NOT_INSTALLED   50
+#define OPA_LINKDOWN_REASON_LOCAL_MEDIA_NOT_INSTALLED  50
 #define OPA_LINKDOWN_REASON_NOT_INSTALLED  51
 #define OPA_LINKDOWN_REASON_CHASSIS_CONFIG 52
 /* 53 reserved */
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html