Re: [PATCH for-next 2/5] IB/core: Add support for extended query device caps

2014-11-06 Thread Eli Cohen
On Tue, Nov 04, 2014 at 02:35:09PM +0200, Haggai Eran wrote:
 On 03/11/2014 10:02, Eli Cohen wrote:
  +
 +	if (ucore->outlen < sizeof(resp))
 +		return -ENOSPC;
 
 This check may cause compatibility problems when running a newer kernel
 with old userspace. The userspace code will have a smaller
 ib_uverbs_ex_query_device_resp struct, so the verb will always fail. A
 possible solution is to drop this check, and modify ib_copy_to_udata so
 that it only copies up to ucore->outlen bytes.
 
Makes sense. Will fix that in V1.
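
For illustration, a minimal sketch of that kind of change, assuming
ib_copy_to_udata() currently does a fixed-length copy_to_user() into
udata->outbuf (sketch only, not the actual patch):

static inline int ib_copy_to_udata(struct ib_udata *udata, void *src, size_t len)
{
	/* copy no more than userspace asked for, so an older (smaller)
	 * response struct keeps working against a newer kernel */
	return copy_to_user(udata->outbuf, src,
			    min_t(size_t, len, udata->outlen)) ? -EFAULT : 0;
}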

  +
  +   if (cmd.comp_mask)
  +   return -EINVAL;
 
 This check may make it difficult for userspace to use this verb. If
 running an older kernel with a newer userspace, the userspace will need
 to run the verb multiple times to find out which combination of
 comp_mask bits is actually supported. I think a better way would be to
 drop this check, and let userspace rely on the returned comp_mask in the
 ib_uverbs_ex_query_device_resp struct to determine which features are
 supported by the current kernel.

Agree - this should hold true for any extended query. Will fix in v1.
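
As an illustration of the userspace side of that contract, a sketch (the
helper names use_masked_atomics() and fall_back_to_base_caps() are made up;
only the struct and flag names come from this series):

	struct ib_uverbs_ex_query_device_resp resp = {};

	/* ... issue the extended query-device command; the kernel fills resp ... */

	if (resp.comp_mask & IB_UVERBS_EX_QUERY_DEV_MASKED_ATOMIC)
		use_masked_atomics(&resp.atomics);	/* this kernel reports the feature */
	else
		fall_back_to_base_caps(&resp.base);	/* older kernel: base fields only */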


[PATCH 2/2] rping: ignore flushed completions

2014-11-06 Thread Hariprasad Shenai
Based on original work by Steve Wise st...@opengridcomputing.com

Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 examples/rping.c |   15 ++-
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/examples/rping.c b/examples/rping.c
index f0414de..58b642e 100644
--- a/examples/rping.c
+++ b/examples/rping.c
@@ -277,15 +277,20 @@ static int rping_cq_event_handler(struct rping_cb *cb)
struct ibv_wc wc;
struct ibv_recv_wr *bad_wr;
int ret;
+   int flushed = 0;
 
	while ((ret = ibv_poll_cq(cb->cq, 1, &wc)) == 1) {
ret = 0;
 
if (wc.status) {
-			if (wc.status != IBV_WC_WR_FLUSH_ERR)
-				fprintf(stderr,
-					"cq completion failed status %d\n",
-					wc.status);
+			if (wc.status == IBV_WC_WR_FLUSH_ERR) {
+				flushed = 1;
+				continue;
+
+			}
+			fprintf(stderr,
+				"cq completion failed status %d\n",
+				wc.status);
ret = -1;
goto error;
}
@@ -334,7 +339,7 @@ static int rping_cq_event_handler(struct rping_cb *cb)
		fprintf(stderr, "poll error %d\n", ret);
goto error;
}
-   return 0;
+   return flushed;
 
 error:
	cb->state = ERROR;
-- 
1.7.1



[PATCH 1/2] rping: Fixes race, where ibv context was getting freed before memory was deregistered

2014-11-06 Thread Hariprasad Shenai
While running rping as a client without a server on the other end,
rping_test_client fails and the ibv context gets freed before the memory is
deregistered. This patch fixes it.

Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 examples/rping.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/examples/rping.c b/examples/rping.c
index 949cbe6..f0414de 100644
--- a/examples/rping.c
+++ b/examples/rping.c
@@ -1055,18 +1055,19 @@ static int rping_run_client(struct rping_cb *cb)
ret = rping_connect_client(cb);
if (ret) {
		fprintf(stderr, "connect error %d\n", ret);
-   goto err2;
+   goto err3;
}
 
ret = rping_test_client(cb);
if (ret) {
		fprintf(stderr, "rping client failed: %d\n", ret);
-   goto err3;
+   goto err4;
}
 
ret = 0;
-err3:
+err4:
	rdma_disconnect(cb->cm_id);
+err3:
	pthread_join(cb->cqthread, NULL);
 err2:
rping_free_buffers(cb);
-- 
1.7.1



[PATCH net 2/2] net/mlx5_core: Fix race on driver load

2014-11-06 Thread Eli Cohen
When events arrive at driver load, the event handler gets called even before
the spinlock and list are initialized. Fix this by moving the initialization
before EQs creation.

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 88b2ffa0edfb..ecc6341e728a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -855,14 +855,14 @@ static int init_one(struct pci_dev *pdev,
 	dev->profile = profile[prof_sel];
 	dev->event = mlx5_core_event;
 
+	INIT_LIST_HEAD(&priv->ctx_list);
+	spin_lock_init(&priv->ctx_lock);
 	err = mlx5_dev_init(dev, pdev);
 	if (err) {
 		dev_err(&pdev->dev, "mlx5_dev_init failed %d\n", err);
 		goto out;
}
 
-	INIT_LIST_HEAD(&priv->ctx_list);
-	spin_lock_init(&priv->ctx_lock);
 	err = mlx5_register_device(dev);
 	if (err) {
 		dev_err(&pdev->dev, "mlx5_register_device failed %d\n", err);
-- 
2.1.2



[PATCH net 0/2] mlx5_core fixes for 3.18

2014-11-06 Thread Eli Cohen
Hi Dave,
the following two patches fix races that could lead to a kernel panic in some cases.

Thanks,
Eli

Eli Cohen (2):
  net/mlx5_core: Fix race in create EQ
  net/mlx5_core: Fix race on driver load

 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 7 +++
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ++--
 2 files changed, 5 insertions(+), 6 deletions(-)

-- 
2.1.2



[PATCH net 1/2] net/mlx5_core: Fix race in create EQ

2014-11-06 Thread Eli Cohen
After the EQ is created, it can possibly generate interrupts, and the interrupt
handler references eq->dev. It is therefore required to set eq->dev before
calling request_irq(), so that if an event is generated before request_irq()
returns, we have a valid eq->dev field.

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index a278238a2db6..ad2c96a02a53 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -374,15 +374,14 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 	snprintf(eq->name, MLX5_MAX_EQ_NAME, "%s@pci:%s",
 		 name, pci_name(dev->pdev));
 	eq->eqn = out.eq_number;
+	eq->irqn = vecidx;
+	eq->dev = dev;
+	eq->doorbell = uar->map + MLX5_EQ_DOORBEL_OFFSET;
 	err = request_irq(table->msix_arr[vecidx].vector, mlx5_msix_handler, 0,
 			  eq->name, eq);
if (err)
goto err_eq;
 
-	eq->irqn = vecidx;
-	eq->dev = dev;
-	eq->doorbell = uar->map + MLX5_EQ_DOORBEL_OFFSET;
-
err = mlx5_debug_eq_add(dev, eq);
if (err)
goto err_irq;
-- 
2.1.2



[PATCH for-next 1/2] IB/uverbs: Enable device removal when there are active user space applications

2014-11-06 Thread Yishai Hadas
Enables uverbs_remove_one to succeed despite the fact that there are
running IB applications working with the given ib device. This functionality
enables a HW device to be unbound/reset even while running user space
applications are still using it.

It exposes a new IB kernel API named 'disassociate_ucontext' which lets a
driver detach its HW resources from a given user context without
crashing/terminating the application. In case a driver implements the above
API and registers with ib_uverbs, there will be no dependency between its
device and its uverbs_device. Upon calling remove_one of ib_uverbs, the call
returns after disassociating the open HW resources, without waiting for
clients to disconnect. In case a driver does not implement this API, there is
no change to the current behaviour: uverbs_remove_one returns only when the
last client has disconnected and the reference count on the uverbs device has
dropped to 0.

In case the lower driver device was removed, any application will continue
working over some zombie HCA; further calls will end with an immediate error.

Signed-off-by: Yishai Hadas yish...@mellanox.com
Signed-off-by: Jack Morgenstein ja...@mellanox.com

---
 drivers/infiniband/core/uverbs.h  |9 +
 drivers/infiniband/core/uverbs_cmd.c  |8 +
 drivers/infiniband/core/uverbs_main.c |  317 +++--
 include/rdma/ib_verbs.h   |2 +
 4 files changed, 280 insertions(+), 56 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 643c08a..e485e67 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -94,6 +94,12 @@ struct ib_uverbs_device {
struct cdev cdev;
struct rb_root  xrcd_tree;
 	struct mutex				xrcd_tree_mutex;
+	struct mutex				disassociate_mutex; /* protect lists of files */
+	int					disassociated;
+	int					disassociated_supported;
+	struct srcu_struct			disassociate_srcu;
+	struct list_head			uverbs_file_list;
+	struct list_head			uverbs_events_file_list;
 };
 
 struct ib_uverbs_event_file {
@@ -105,6 +111,7 @@ struct ib_uverbs_event_file {
wait_queue_head_t   poll_wait;
struct fasync_struct   *async_queue;
 	struct list_head			event_list;
+	struct list_head			list;
 };
 
 struct ib_uverbs_file {
@@ -114,6 +121,8 @@ struct ib_uverbs_file {
struct ib_ucontext *ucontext;
struct ib_event_handler event_handler;
 	struct ib_uverbs_event_file	       *async_file;
+	struct list_head			list;
+	int					fatal_event_raised;
 };
 
 struct ib_uverbs_event {
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 5ba2a86..0b19361 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -38,6 +38,7 @@
 #include <linux/slab.h>
 
 #include <asm/uaccess.h>
+#include <linux/sched.h>
 
 #include "uverbs.h"
 #include "core_priv.h"
@@ -326,6 +327,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&ucontext->xrcd_list);
 	INIT_LIST_HEAD(&ucontext->rule_list);
 	ucontext->closing = 0;
+	ucontext->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
 
 	resp.num_comp_vectors = file->device->num_comp_vectors;
 
@@ -1286,6 +1288,12 @@ ssize_t ib_uverbs_create_comp_channel(struct ib_uverbs_file *file,
return -EFAULT;
}
 
+	/* Taking ref count on uverbs_file to make sure that file won't be freed till
+	 * that event file is closed. It will enable accessing the uverbs_device fields
+	 * as part of closing the events file and making sure that uverbs device is
+	 * available by that time as well.
+	 * Note: similar is already done for the async event file.
+	 */
+	kref_get(&file->ref);
fd_install(resp.fd, filp);
return in_len;
 }
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 71ab83f..d718d64 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -133,7 +133,12 @@ static void ib_uverbs_release_dev(struct kref *ref)
struct ib_uverbs_device *dev =
container_of(ref, struct ib_uverbs_device, ref);
 
-	complete(&dev->comp);
+	if (dev->disassociated) {
+		cleanup_srcu_struct(&dev->disassociate_srcu);
+		kfree(dev);
+	} else {
+		complete(&dev->comp);
+	}
 }
 
 static void ib_uverbs_release_event_file(struct kref 

[PATCH for-next 2/2] IB/mlx4_ib: Disassociate support

2014-11-06 Thread Yishai Hadas
Implements the IB core disassociate_ucontext API. The driver detaches the HW
resources for a given user context to prevent a dependency between application
termination and device disconnection. This is done by managing the VMAs that
were mapped to the HW BARs, such as the doorbell and blueflame pages. When a
detach is needed, they are remapped to an arbitrary kernel page returned by the
zap API.

Signed-off-by: Yishai Hadas yish...@mellanox.com
Signed-off-by: Jack Morgenstein ja...@mellanox.com

---
 drivers/infiniband/hw/mlx4/main.c|  119 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   12 
 2 files changed, 130 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bda5994..76151b2 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -645,7 +645,7 @@ static struct ib_ucontext *mlx4_ib_alloc_ucontext(struct ib_device *ibdev,
 		resp.cqe_size	      = dev->dev->caps.cqe_size;
}
 
-   context = kmalloc(sizeof *context, GFP_KERNEL);
+   context = kzalloc(sizeof *context, GFP_KERNEL);
if (!context)
return ERR_PTR(-ENOMEM);
 
@@ -682,21 +682,134 @@ static int mlx4_ib_dealloc_ucontext(struct ib_ucontext *ibcontext)
return 0;
 }
 
+static void  mlx4_ib_vma_open(struct vm_area_struct *area)
+{
+	/* vma_open is called when a new VMA is created on top of our VMA.
+	 * This is done through either mremap flow or split_vma (usually due
+	 * to mlock, madvise, munmap, etc.) We do not support a clone of the
+	 * vma, as this VMA is strongly hardware related.  Therefore we set
+	 * the vm_ops of the newly created/cloned VMA to NULL, to prevent it
+	 * from calling us again and trying to do incorrect actions.  We
+	 * assume that the original vma size is exactly a single page that
+	 * there will be no splitting operations on.
+	 */
+	area->vm_ops = NULL;
+}
+
+static void  mlx4_ib_vma_close(struct vm_area_struct *area)
+{
+	struct mlx4_ib_vma_private_data *mlx4_ib_vma_priv_data;
+
+	/* It's guaranteed that all VMAs opened on a FD are closed before the
+	 * file itself is closed, therefore no sync is needed with the regular
+	 * closing flow (e.g. mlx4_ib_dealloc_ucontext).  However we do need a
+	 * sync with accessing the vma as part of mlx4_ib_disassociate_ucontext.
+	 * The close operation is usually called under mm->mmap_sem except when
+	 * the process is exiting.  The exiting case is handled explicitly as
+	 * part of mlx4_ib_disassociate_ucontext.
+	 */
+	mlx4_ib_vma_priv_data = (struct mlx4_ib_vma_private_data *)area->vm_private_data;
+
+	/* set the vma context pointer to null in the mlx4_ib driver's private
+	 * data, to protect against a race condition in
+	 * mlx4_ib_disassociate_ucontext().
+	 */
+	mlx4_ib_vma_priv_data->vma = NULL;
+}
+
+static const struct vm_operations_struct mlx4_ib_vm_ops = {
+   .open = mlx4_ib_vma_open,
+   .close = mlx4_ib_vma_close
+};
+
+static void mlx4_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
+{
+	int i;
+	int ret = 0;
+	struct vm_area_struct *vma;
+	struct mlx4_ib_ucontext *context = to_mucontext(ibcontext);
+	struct task_struct *owning_process  = NULL;
+	struct mm_struct   *owning_mm       = NULL;
+
+	owning_process = get_pid_task(ibcontext->tgid, PIDTYPE_PID);
+	if (!owning_process)
+		return;
+
+	owning_mm = get_task_mm(owning_process);
+	if (!owning_mm) {
+		pr_info("no mm, disassociate ucontext is pending task termination\n");
+		while (1) {
+			/* make sure that task is dead before returning, it may
+			 * prevent a rare case of module down in parallel to a
+			 * call to mlx4_ib_vma_close.
+			 */
+			put_task_struct(owning_process);
+			msleep(1);
+			owning_process = get_pid_task(ibcontext->tgid, PIDTYPE_PID);
+			if (!owning_process || owning_process->state == TASK_DEAD) {
+				pr_info("disassociate ucontext done, task was terminated\n");
+				/* in case task was dead need to release the task struct */
+				if (owning_process)
+					put_task_struct(owning_process);
+				return;
+			}
+		}
+	}
+
+	/* need to protect from a race on closing the vma as part of mlx4_ib_vma_close */
+	down_read(&owning_mm->mmap_sem);
+	for (i = 0; i < HW_BAR_COUNT; i++) {
+		vma = context->hw_bar_info[i].vma;
+		if (!vma)
+			continue;
+
+		ret = zap_vma_ptes(context->hw_bar_info[i].vma, 

[PATCH for-next 0/2] HW Device hot-removal support

2014-11-06 Thread Yishai Hadas
Currently, if there is any user space application using an IB device,
it is impossible to unload the HW device driver for this device.

Similarly, if the device is hot-unplugged or reset, the device driver
hardware removal flow blocks until all user contexts are destroyed.

This patchset removes the above limitations. The IB-core and uverbs
layers are still required to remain loaded as long as there are user
applications using the verbs API. However, the hardware device drivers
are not blocked any more by the user space activity.

To support this, the hardware device needs to expose a new kernel API
named 'disassociate_ucontext'. The device driver is given a ucontext
to detach from, and it should block this user context from any future
hardware access. At the IB-core level, we use this interface to
deactivate all ucontexts that address a specific device when handling a
remove_one callback for it.
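
As a rough sketch of how a driver opts in (the exact hook name in struct
ib_device follows patch 1/2; the body here is only a placeholder):

static void my_disassociate_ucontext(struct ib_ucontext *ibcontext)
{
	/* revoke all HW access for this user context: zap BAR mappings,
	 * release doorbells, etc. - see the mlx4 patch for a real flow */
}

	...
	ibdev->disassociate_ucontext = my_disassociate_ucontext;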

The first patch introduces the new API between the HW device driver and
the IB core. For devices which implement the functionality, IB core
will use it in remove_one, disassociating any active ucontext from the
hardware device. Drivers that do not implement it behave as they do today:
remove_one blocks until all ucontexts referring to the device are destroyed
before returning.

The second patch provides an implementation of this API for the mlx4
driver.

Yishai Hadas (2):
  IB/uverbs: Enable device removal when there are active user space
applications
  IB/mlx4_ib: Disassociate support

 drivers/infiniband/core/uverbs.h  |9 +
 drivers/infiniband/core/uverbs_cmd.c  |8 +
 drivers/infiniband/core/uverbs_main.c |  317 +++--
 drivers/infiniband/hw/mlx4/main.c |  119 -
 drivers/infiniband/hw/mlx4/mlx4_ib.h  |   12 ++
 include/rdma/ib_verbs.h   |2 +
 6 files changed, 410 insertions(+), 57 deletions(-)



[PATCH] IB/srp: Fix a 32-bit compiler warning

2014-11-06 Thread Bart Van Assche
The result of a pointer subtraction has type ptrdiff_t. Hence change a
%ld format specifier into %td. This change avoids that the following
warning is printed on 32-bit systems:

warning: format '%ld' expects argument of type 'long int', but argument 5 has type 'int' [-Wformat=]
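
A tiny standalone illustration of the rule (not part of the patch):

#include <stdio.h>

int main(void)
{
	int arr[8];
	int *a = &arr[5], *b = &arr[1];

	/* a - b has type ptrdiff_t; %td matches it on 32- and 64-bit targets
	 * alike, while %ld warns on 32-bit, where ptrdiff_t is plain int */
	printf("diff = %td\n", a - b);
	return 0;
}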

Reported-by: Wu Fengguang fengguang...@intel.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Sebastian Parschauer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 89e4560..577eb01 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1747,7 +1747,7 @@ static void srp_process_rsp(struct srp_rdma_ch *ch, struct srp_rsp *rsp)
}
if (!scmnd) {
 		shost_printk(KERN_ERR, target->scsi_host,
-			     "Null scmnd for RSP w/tag %#016llx received on ch %ld / QP %#x\n",
+			     "Null scmnd for RSP w/tag %#016llx received on ch %td / QP %#x\n",
 			     rsp->tag, ch - target->ch, ch->qp->qp_num);
 
 		spin_lock_irqsave(&ch->lock, flags);
-- 
2.1.2



[PATCH v1 for-next 5/5] IB/mlx4: Modify mlx4 to comply with extended atomic definitions

2014-11-06 Thread Eli Cohen
Set the extended masked atomic capabilities. For ConnectX devices the argument
size is fixed to 8 bytes and the bit boundary is 64.

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/hw/mlx4/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 8b72cf392b34..7de8cf12a605 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -223,6 +223,9 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->atomic_cap	   = dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_ATOMIC ?
 		IB_ATOMIC_HCA : IB_ATOMIC_NONE;
 	props->masked_atomic_cap   = props->atomic_cap;
+	props->log_atomic_arg_sizes = 8;
+	props->max_fa_bit_boundary = 64;
+	props->log_max_atomic_inline = 8;
 	props->max_pkeys	   = dev->dev->caps.pkey_table_len[1];
 	props->max_mcast_grp	   = dev->dev->caps.num_mgms + dev->dev->caps.num_amgms;
 	props->max_mcast_qp_attach = dev->dev->caps.num_qp_per_mgm;
-- 
2.1.2



[PATCH v1 for-next 4/5] IB/mlx5: Add extended atomic support

2014-11-06 Thread Eli Cohen
Connect-IB extended atomic operations provide masked compare and swap and
multi-field fetch and add operations with argument sizes larger than 64 bits.

Also, Connect-IB supports BE replies to atomic operations; add that
to the advertised capabilities.

Add the required functionality to mlx5 and publish capabilities.

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c  | 47 +-
 drivers/infiniband/hw/mlx5/qp.c| 26 ++--
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   | 51 +++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 21 +++---
 include/linux/mlx5/device.h|  4 +-
 include/linux/mlx5/driver.h| 55 ++
 include/linux/mlx5/mlx5_ifc.h  | 20 ++
 7 files changed, 194 insertions(+), 30 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 1ba6c42e4df8..3c6fa99c4256 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -151,6 +151,47 @@ static void free_comp_eqs(struct mlx5_ib_dev *dev)
spin_unlock(table-lock);
 }
 
+static void update_atomic_caps(struct mlx5_caps *caps,
+			       struct ib_device_attr *props)
+{
+	struct mlx5_atomic_caps *atom = &caps->atom;
+	unsigned long last;
+	unsigned long arg;
+	int tmp;
+
+	tmp = MLX5_ATOMIC_OPS_CMP_SWAP | MLX5_ATOMIC_OPS_FETCH_ADD;
+	if (((atom->atomic_ops & tmp) == tmp) && (atom->atomic_sizes_qp & 8)) {
+		if (atom->requestor_endianess)
+			props->atomic_cap = IB_ATOMIC_HCA;
+		else
+			props->atomic_cap = IB_ATOMIC_HCA_REPLY_BE;
+	} else {
+		props->atomic_cap = IB_ATOMIC_NONE;
+	}
+
+	tmp = MLX5_ATOMIC_OPS_MASKED_CMP_SWAP |
+	      MLX5_ATOMIC_OPS_MASKED_FETCH_ADD;
+	if (((atom->atomic_ops & tmp) == tmp)) {
+		if (atom->requestor_endianess)
+			props->masked_atomic_cap = IB_ATOMIC_HCA;
+		else
+			props->masked_atomic_cap = IB_ATOMIC_HCA_REPLY_BE;
+	} else {
+		props->masked_atomic_cap = IB_ATOMIC_NONE;
+	}
+	if ((props->atomic_cap != IB_ATOMIC_NONE) ||
+	    (props->masked_atomic_cap != IB_ATOMIC_NONE)) {
+		props->log_atomic_arg_sizes = caps->atom.atomic_sizes_qp;
+		props->max_fa_bit_boundary = 64;
+		arg = (unsigned long)props->log_atomic_arg_sizes;
+		last = find_last_bit(&arg, sizeof(arg));
+		props->log_max_atomic_inline = min_t(unsigned long, last, 6);
+	} else {
+		props->log_atomic_arg_sizes = 0;
+		props->max_fa_bit_boundary = 0;
+		props->log_max_atomic_inline = 0;
+	}
+}
+
 static int mlx5_ib_query_device(struct ib_device *ibdev,
struct ib_device_attr *props)
 {
@@ -235,8 +276,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	props->max_srq_sge	   = max_rq_sg - 1;
 	props->max_fast_reg_page_list_len = (unsigned int)-1;
 	props->local_ca_ack_delay  = gen->local_ca_ack_delay;
-	props->atomic_cap	   = IB_ATOMIC_NONE;
-	props->masked_atomic_cap   = IB_ATOMIC_NONE;
+	update_atomic_caps(&dev->mdev->caps, props);
 	props->max_pkeys	   = be16_to_cpup((__be16 *)(out_mad->data + 28));
 	props->max_mcast_grp	   = 1 << gen->log_max_mcg;
 	props->max_mcast_qp_attach = gen->max_qp_mcg;
@@ -1374,6 +1414,9 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 			(1ull << IB_USER_VERBS_CMD_CLOSE_XRCD);
}
 
+	dev->ib_dev.uverbs_ex_cmd_mask	|=
+		(1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE);
+
err = init_node_data(dev);
if (err)
goto err_eqs;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 9ca39ad68cb8..47ca93ce214f 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1254,7 +1254,27 @@ int mlx5_ib_destroy_qp(struct ib_qp *qp)
return 0;
 }
 
-static __be32 to_mlx5_access_flags(struct mlx5_ib_qp *qp, const struct ib_qp_attr *attr,
+static u32 atomic_mode_qp(struct mlx5_ib_dev *dev)
+{
+	struct mlx5_atomic_caps *acaps = &dev->mdev->caps.atom;
+	unsigned long mask;
+	unsigned long tmp;
+
+	mask = acaps->atomic_sizes_qp & acaps->atomic_sizes_dc;
+
+	tmp = find_last_bit(&mask, 8 * sizeof(mask));
+	if (tmp < 2 || tmp >= 16)
+		return MLX5_ATOMIC_MODE_NONE << 16;
+
+	if (tmp == 2)
+		return MLX5_ATOMIC_MODE_CX << 16;
+
+	return tmp << 16;
+}
+
+static __be32 to_mlx5_access_flags(struct mlx5_ib_dev *dev,
+  struct mlx5_ib_qp *qp,
+  const struct ib_qp_attr *attr,

Re: [PATCH v3 01/11] blk-mq: Add blk_mq_unique_tag()

2014-11-06 Thread Bart Van Assche

On 11/05/14 19:54, Christoph Hellwig wrote:

On Wed, Nov 05, 2014 at 01:37:14PM +0100, Bart Van Assche wrote:

That's strange. I have compared the patches that are already in your tree
with the patches I had posted myself with a diff tool. These patches look
identical to what I had posted except for one CC tag that has been left out.
If I try to apply the three patches that have not yet been included in your
tree (9/11..11/11) on top the drivers-for-3.19 branch then these patches
apply fine. Anyway, I have rebased my tree on top of your drivers-for-3.19
branch, added a few other patches (including one block layer patch that has
not yet been posted) and retested the SRP initiator driver against the
traditional SCSI core and also against the scsi-mq core. The result can be
found here: https://github.com/bvanassche/linux/commits/srp-multiple-hwq-v4.
Can you please retry applying patches 9/11..11/11 on top of the
drivers-for-3.19 branch?


I've pulled in the three remaining patches from the series from that
tree.  If you want me to pull in the remaining trivial srp patch as well
please give me a Reviewed-by: and I'll also pull it in.


Thanks !

Regarding the remaining SRP patch: Roland has already been asked to pull 
that patch (see also 
http://thread.gmane.org/gmane.linux.drivers.rdma/22018).


Bart.


[PATCH v1 for-next 3/5] IB/core: Extend atomic operations

2014-11-06 Thread Eli Cohen
Further enhance the extended atomic operations support that was introduced
in commit 5e80ba8ff0bd ("IB/core: Add support for masked atomic operations").

1. Allow arbitrary argument sizes. The original extended atomics commit defined
64-bit arguments. This patch allows arbitrary arguments whose size is a power
of 2 bytes.

2. Add the option to define response for atomic operations in network order.
enum ib_atomic_cap is extended to have big endian variants.

The device attributes struct defines three new fields:

log_atomic_arg_sizes - a bit mask which encodes which argument sizes are
supported. A set bit at location n (zero based) means an argument of size 2^n
bytes is supported.

max_fa_bit_boundary - Max fetch and add bit boundary. Multi field fetch and add
operations use a bit mask that defines bit locations where carry bit is not
passed to the next higher order bit. So, if this field has the value 64, it
means that the max value subject to fetch and add is 64 bits which means no
carry from bit 63 to 64 or from bit 127 to 128 etc.

log_max_atomic_inline - atomic arguments can be inline in the WQE or be
referenced through a memory key. This value defines the max inline argument
size possible.
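
For example, a consumer could decode the new fields roughly like this (sketch
only; attr is an already-filled struct ib_device_attr):

static void print_atomic_arg_sizes(const struct ib_device_attr *attr)
{
	int n;

	for (n = 0; n < 32; n++)
		if (attr->log_atomic_arg_sizes & (1U << n))
			pr_info("atomic argument size of %lu bytes supported\n",
				1UL << n);
}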

Signed-off-by: Eli Cohen e...@mellanox.com
---
Changes from v0:
Do not enforce comp_mask to the known masks defined by
~IB_UVERBS_EX_QUERY_DEV_MAX_MASK.

 drivers/infiniband/core/uverbs_cmd.c | 14 ++
 include/rdma/ib_verbs.h  |  7 ++-
 include/uapi/rdma/ib_user_verbs.h| 14 ++
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 74ad0d0de92b..0bc215fa2a85 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -445,6 +445,8 @@ ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
 
 	memset(&resp, 0, sizeof resp);
 	copy_query_dev_fields(file, &resp, &attr);
+	if (resp.atomic_cap > IB_ATOMIC_GLOB)
+		resp.atomic_cap = IB_ATOMIC_NONE;
 
 	if (copy_to_user((void __user *) (unsigned long) cmd.response,
 			 &resp, sizeof resp))
@@ -3286,6 +3288,18 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 	copy_query_dev_fields(file, &resp.base, &attr);
resp.comp_mask = 0;
 
+	if (cmd.comp_mask & IB_UVERBS_EX_QUERY_DEV_MASKED_ATOMIC) {
+   resp.atomics.masked_atomic_cap = attr.masked_atomic_cap;
+   resp.atomics.log_atomic_arg_sizes = attr.log_atomic_arg_sizes;
+   resp.atomics.max_fa_bit_boundary = attr.max_fa_bit_boundary;
+   resp.atomics.log_max_atomic_inline = attr.log_max_atomic_inline;
+   resp.comp_mask |= IB_UVERBS_EX_QUERY_DEV_MASKED_ATOMIC;
+   } else {
+   resp.atomics.masked_atomic_cap = IB_ATOMIC_NONE;
+   resp.atomics.log_atomic_arg_sizes = 0;
+   resp.atomics.max_fa_bit_boundary = 0;
+   resp.atomics.log_max_atomic_inline = 0;
+   }
 	err = ib_copy_to_udata(ucore, &resp, sizeof(resp));
if (err)
return err;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 97a999f9e4d8..2b65e31ca298 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -140,7 +140,9 @@ enum ib_signature_guard_cap {
 enum ib_atomic_cap {
IB_ATOMIC_NONE,
IB_ATOMIC_HCA,
-   IB_ATOMIC_GLOB
+   IB_ATOMIC_GLOB,
+   IB_ATOMIC_HCA_REPLY_BE,
+   IB_ATOMIC_GLOB_REPLY_BE,
 };
 
 struct ib_device_attr {
@@ -186,6 +188,9 @@ struct ib_device_attr {
u8  local_ca_ack_delay;
int sig_prot_cap;
int sig_guard_cap;
+	u32			log_atomic_arg_sizes; /* bit-mask of supported sizes */
+   u32 max_fa_bit_boundary;
+   u32 log_max_atomic_inline;
 };
 
 enum ib_mtu {
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index ed8c3d9da42c..ec98fe636f2b 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -202,13 +202,27 @@ struct ib_uverbs_query_device_resp {
__u8  reserved[4];
 };
 
+enum {
+	IB_UVERBS_EX_QUERY_DEV_MASKED_ATOMIC	= 1 << 0,
+	IB_UVERBS_EX_QUERY_DEV_LAST		= 1 << 1,
+	IB_UVERBS_EX_QUERY_DEV_MAX_MASK		= IB_UVERBS_EX_QUERY_DEV_LAST - 1,
+};
+
 struct ib_uverbs_ex_query_device {
__u32 comp_mask;
 };
 
+struct ib_uverbs_ex_atomic_caps {
+   __u32 masked_atomic_cap;
+   __u32 log_atomic_arg_sizes; /* bit-mask of supported sizes */
+   __u32 max_fa_bit_boundary;
+   __u32 log_max_atomic_inline;
+};
+
 struct ib_uverbs_ex_query_device_resp {
struct ib_uverbs_query_device_resp base;
__u32 comp_mask;
+   struct ib_uverbs_ex_atomic_caps atomics;
 };
 
 struct ib_uverbs_query_port {
-- 
2.1.2


[PATCH v1 for-next 1/5] IB/mlx5: Fix sparse warnings

2014-11-06 Thread Eli Cohen
1. Add required __acquire/__release statements to balance spinlock usage.
2. Change the index parameter of begin_wqe() to be unsigned to match supplied
argument type.

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index e261a53f9a02..9ca39ad68cb8 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1011,9 +1011,14 @@ static void mlx5_ib_lock_cqs(struct mlx5_ib_cq *send_cq, struct mlx5_ib_cq *recv
}
} else {
 			spin_lock_irq(&send_cq->lock);
+			__acquire(&recv_cq->lock);
 		}
 	} else if (recv_cq) {
 		spin_lock_irq(&recv_cq->lock);
+		__acquire(&send_cq->lock);
+	} else {
+		__acquire(&send_cq->lock);
+		__acquire(&recv_cq->lock);
}
 }
 
@@ -1033,10 +1038,15 @@ static void mlx5_ib_unlock_cqs(struct mlx5_ib_cq *send_cq, struct mlx5_ib_cq *re
 			spin_unlock_irq(&recv_cq->lock);
 		}
 	} else {
+		__release(&recv_cq->lock);
 		spin_unlock_irq(&send_cq->lock);
 	}
 	} else if (recv_cq) {
+		__release(&send_cq->lock);
 		spin_unlock_irq(&recv_cq->lock);
+	} else {
+		__release(&recv_cq->lock);
+		__release(&send_cq->lock);
}
 }
 
@@ -2411,7 +2421,7 @@ static u8 get_fence(u8 fence, struct ib_send_wr *wr)
 
 static int begin_wqe(struct mlx5_ib_qp *qp, void **seg,
 struct mlx5_wqe_ctrl_seg **ctrl,
-struct ib_send_wr *wr, int *idx,
+struct ib_send_wr *wr, unsigned *idx,
 int *size, int nreq)
 {
int err = 0;
@@ -2737,6 +2747,8 @@ out:
 
 	if (bf->need_lock)
 		spin_lock(&bf->lock);
+	else
+		__acquire(&bf->lock);
 
/* TBD enable WC */
 	if (0 && nreq == 1 && bf->uuarn && inl && size > 1 && size <= bf->buf_size / 16) {
@@ -2753,6 +2765,8 @@ out:
 		bf->offset ^= bf->buf_size;
 	if (bf->need_lock)
 		spin_unlock(&bf->lock);
+	else
+		__release(&bf->lock);
}
 
 	spin_unlock_irqrestore(&qp->sq.lock, flags);
-- 
2.1.2



[PATCH v1 for-next 2/5] IB/core: Add support for extended query device caps

2014-11-06 Thread Eli Cohen
Add extensible query device capabilities verb to allow adding new features.
ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy
capability fields to be used by both ib_uverbs_query_device and
ib_uverbs_ex_query_device.

Signed-off-by: Eli Cohen e...@mellanox.com
---
Changes from v0:
1. Allow userspace to pass response buffer smaller than the kernel's.
2. Do not enforce comp_mask at input of query device.
3. Modify ib_copy_to_udata to copy the minimum size between the caller's
   request and the size provided by userspace.

 drivers/infiniband/core/uverbs.h  |   1 +
 drivers/infiniband/core/uverbs_cmd.c  | 121 ++
 drivers/infiniband/core/uverbs_main.c |   3 +-
 include/rdma/ib_verbs.h   |   5 +-
 include/uapi/rdma/ib_user_verbs.h |  12 +++-
 5 files changed, 98 insertions(+), 44 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 643c08a025a5..b716b0815644 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -258,5 +258,6 @@ IB_UVERBS_DECLARE_CMD(close_xrcd);
 
 IB_UVERBS_DECLARE_EX_CMD(create_flow);
 IB_UVERBS_DECLARE_EX_CMD(destroy_flow);
+IB_UVERBS_DECLARE_EX_CMD(query_device);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 5ba2a86aab6a..74ad0d0de92b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -378,6 +378,52 @@ err:
return ret;
 }
 
+static void copy_query_dev_fields(struct ib_uverbs_file *file,
+ struct ib_uverbs_query_device_resp *resp,
+ struct ib_device_attr *attr)
+{
+	resp->fw_ver			= attr->fw_ver;
+	resp->node_guid			= file->device->ib_dev->node_guid;
+	resp->sys_image_guid		= attr->sys_image_guid;
+	resp->max_mr_size		= attr->max_mr_size;
+	resp->page_size_cap		= attr->page_size_cap;
+	resp->vendor_id			= attr->vendor_id;
+	resp->vendor_part_id		= attr->vendor_part_id;
+	resp->hw_ver			= attr->hw_ver;
+	resp->max_qp			= attr->max_qp;
+	resp->max_qp_wr			= attr->max_qp_wr;
+	resp->device_cap_flags		= attr->device_cap_flags;
+	resp->max_sge			= attr->max_sge;
+	resp->max_sge_rd		= attr->max_sge_rd;
+	resp->max_cq			= attr->max_cq;
+	resp->max_cqe			= attr->max_cqe;
+	resp->max_mr			= attr->max_mr;
+	resp->max_pd			= attr->max_pd;
+	resp->max_qp_rd_atom		= attr->max_qp_rd_atom;
+	resp->max_ee_rd_atom		= attr->max_ee_rd_atom;
+	resp->max_res_rd_atom		= attr->max_res_rd_atom;
+	resp->max_qp_init_rd_atom	= attr->max_qp_init_rd_atom;
+	resp->max_ee_init_rd_atom	= attr->max_ee_init_rd_atom;
+	resp->atomic_cap		= attr->atomic_cap;
+	resp->max_ee			= attr->max_ee;
+	resp->max_rdd			= attr->max_rdd;
+	resp->max_mw			= attr->max_mw;
+	resp->max_raw_ipv6_qp		= attr->max_raw_ipv6_qp;
+	resp->max_raw_ethy_qp		= attr->max_raw_ethy_qp;
+	resp->max_mcast_grp		= attr->max_mcast_grp;
+	resp->max_mcast_qp_attach	= attr->max_mcast_qp_attach;
+	resp->max_total_mcast_qp_attach	= attr->max_total_mcast_qp_attach;
+	resp->max_ah			= attr->max_ah;
+	resp->max_fmr			= attr->max_fmr;
+	resp->max_map_per_fmr		= attr->max_map_per_fmr;
+	resp->max_srq			= attr->max_srq;
+	resp->max_srq_wr		= attr->max_srq_wr;
+	resp->max_srq_sge		= attr->max_srq_sge;
+	resp->max_pkeys			= attr->max_pkeys;
+	resp->local_ca_ack_delay	= attr->local_ca_ack_delay;
+	resp->phys_port_cnt		= file->device->ib_dev->phys_port_cnt;
+}
+
 ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
   const char __user *buf,
   int in_len, int out_len)
@@ -398,47 +444,7 @@ ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
return ret;
 
 	memset(&resp, 0, sizeof resp);
-
-	resp.fw_ver		       = attr.fw_ver;
-	resp.node_guid		       = file->device->ib_dev->node_guid;
-   resp.sys_image_guid= attr.sys_image_guid;
-   resp.max_mr_size   = attr.max_mr_size;
-   resp.page_size_cap = attr.page_size_cap;
-   resp.vendor_id = attr.vendor_id;
-   resp.vendor_part_id= attr.vendor_part_id;
-   resp.hw_ver= attr.hw_ver;
-   resp.max_qp= attr.max_qp;
-   resp.max_qp_wr = attr.max_qp_wr;
-   resp.device_cap_flags  = attr.device_cap_flags;
-   resp.max_sge

Re: [PATCH] IB/srp: Fix a 32-bit compiler warning

2014-11-06 Thread Christoph Hellwig
On Thu, Nov 06, 2014 at 03:18:12PM +0100, Bart Van Assche wrote:
 The result of a pointer subtraction has type ptrdiff_t. Hence change a
 %ld format specifier into %td. This change avoids that the following
 warning is printed on 32-bit systems:
 
 warning: format '%ld' expects argument of type 'long int', but argument 5 has type 'int' [-Wformat=]

Thanks.  Given that this is a new warning in the patches I merged I'll
add this one as well.


[PATCH for-next 0/5] Dynamically Connected support

2014-11-06 Thread Eli Cohen
Hi Roland,

the following series of patches introduces a new transport service named DC.
Support is added at the IB/core layer, the uverbs interface to userspace, and mlx5 for
Connect-IB devices. Details on the new transport can be found in the first
patch in the series.

Eli

Eli Cohen (5):
  IB/core: Add DC transport support
  IB/uverbs: Add userspace interface to DC verbs
  mlx5_core: Add DC support at mlx5 core layer
  mlx5_ib: Add support for DC
  mlx5_core: Update mlx5_command_str with DC commands

 drivers/infiniband/core/uverbs.h  |  11 +
 drivers/infiniband/core/uverbs_cmd.c  | 474 +++---
 drivers/infiniband/core/uverbs_main.c |  35 +-
 drivers/infiniband/core/verbs.c   |  87 
 drivers/infiniband/hw/mlx5/main.c |  19 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |  24 ++
 drivers/infiniband/hw/mlx5/qp.c   | 289 -
 drivers/infiniband/hw/mlx5/user.h |  41 ++
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
 drivers/net/ethernet/mellanox/mlx5/core/debugfs.c | 110 +
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  15 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c|   6 +
 drivers/net/ethernet/mellanox/mlx5/core/qp.c  | 185 +
 include/linux/mlx5/device.h   |  12 +
 include/linux/mlx5/driver.h   |  24 +-
 include/linux/mlx5/mlx5_ifc.h | 179 
 include/linux/mlx5/qp.h   |  39 +-
 include/rdma/ib_verbs.h   |  88 +++-
 include/uapi/rdma/ib_user_verbs.h | 126 +-
 19 files changed, 1710 insertions(+), 75 deletions(-)

-- 
2.1.2



[PATCH for-next 1/5] IB/core: Add DC transport support

2014-11-06 Thread Eli Cohen
The Dynamically Connected (DC) Transport Service provides a reliable
datagram-like model that allows a single sender to target multiple destinations
from the same QP, keeping the communication resource footprint essentially
independent of system size. DC supports RDMA read and write operations, as well
as atomic variable updates.  With this transport a DC initiator QP may be used
to target multiple remote DC Targets, in one or more remote processes.  As far
as reachability is concerned, the DC model is somewhat similar to the
Unreliable Datagram (UD) model in the sense that each WR submitted to the DC SQ
carries the information that identifies the remote destination. DC contexts are
then dynamically tied to each other across the network to create a temporary
RC-equivalent connection that is used to reliably deliver one or more messages.
This dynamic connection is created in-band and pipelined with the subsequent
data communication thus eliminating most of the cost associated with the 3-way
handshake of the Connection Manager protocol used for connecting RC QPs. When
all WRs posted to that remote network address are acknowledged, the initiator
sends a disconnect request to the responder, thereby releasing the responder
resources.
A DC initiator is yet another type of QP, identified by a new transport type,
IB_QPT_DC_INI. The target end is represented by a new object of type ib_dct.

This patch extends the verbs API with the following new APIs:
ib_create_dct - Create a DC target
ib_destroy_dct - Destroy a DC target
ib_query_dct - Query a DC target
ib_arm_dct - Arm a DC target to generate an asynchronous event on DC key
violation. Once an event is generated, the DC target moves to a fired state and
will not generate further key violation events unless re-armed.

ib_modify_qp_ex - an extension of ib_modify_qp which allows passing the 64-bit
DC key.
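
A rough consumer-side sketch of the new target verbs (the exact signatures and
the ib_dct_init_attr fields shown here are assumptions; only the verb names
come from this patch):

	struct ib_dct_init_attr init_attr = {
		.cq     = cq,		/* assumed: CQ used by the DC target */
		.srq    = srq,		/* assumed: DC targets receive via an SRQ */
		.dc_key = MY_DC_KEY,	/* assumed: the 64-bit access key */
	};
	struct ib_dct *dct;

	dct = ib_create_dct(pd, &init_attr);
	if (IS_ERR(dct))
		return PTR_ERR(dct);

	ib_arm_dct(dct);	/* arm once; re-arm after each key-violation event */
	...
	ib_destroy_dct(dct);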

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/core/verbs.c | 87 +
 include/rdma/ib_verbs.h | 87 -
 2 files changed, 172 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index c2b89cc5dbca..c2b2d00c9794 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -521,6 +521,9 @@ static const struct {
 		[IB_QPT_RC]  = (IB_QP_PKEY_INDEX		|
 				IB_QP_PORT			|
 				IB_QP_ACCESS_FLAGS),
+		[IB_QPT_DC_INI]  = (IB_QP_PKEY_INDEX		|
+				IB_QP_PORT			|
+				IB_QP_DC_KEY),
 		[IB_QPT_XRC_INI] = (IB_QP_PKEY_INDEX		|
 				IB_QP_PORT			|
 				IB_QP_ACCESS_FLAGS),
@@ -549,6 +552,9 @@ static const struct {
 		[IB_QPT_RC]  = (IB_QP_PKEY_INDEX		|
 				IB_QP_PORT			|
 				IB_QP_ACCESS_FLAGS),
+		[IB_QPT_DC_INI]  = (IB_QP_PKEY_INDEX		|
+				IB_QP_PORT			|
+				IB_QP_DC_KEY),
 		[IB_QPT_XRC_INI] = (IB_QP_PKEY_INDEX		|
 				IB_QP_PORT			|
 				IB_QP_ACCESS_FLAGS),
@@ -574,6 +580,8 @@ static const struct {
 				IB_QP_RQ_PSN			|
 				IB_QP_MAX_DEST_RD_ATOMIC	|
 				IB_QP_MIN_RNR_TIMER),
+		[IB_QPT_DC_INI]  = (IB_QP_AV			|
+				IB_QP_PATH_MTU),
 		[IB_QPT_XRC_INI] = (IB_QP_AV			|
 				IB_QP_PATH_MTU			|
 				IB_QP_DEST_QPN			|
@@ -600,6 +608,8 @@ static const struct {
 		[IB_QPT_RC]  = (IB_QP_ALT_PATH			|
 				IB_QP_ACCESS_FLAGS		|
 				IB_QP_PKEY_INDEX),
+		[IB_QPT_DC_INI]  = (IB_QP_PKEY_INDEX		|
+				IB_QP_DC_KEY),
 		[IB_QPT_XRC_INI] = (IB_QP_ALT_PATH 

[PATCH for-next 3/5] mlx5_core: Add DC support at mlx5 core layer

2014-11-06 Thread Eli Cohen
Update debugfs, implement DC commands and handle events.

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/debugfs.c | 110 +
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  15 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c|   6 +
 drivers/net/ethernet/mellanox/mlx5/core/qp.c  | 185 ++
 include/linux/mlx5/device.h   |  12 ++
 include/linux/mlx5/driver.h   |  24 ++-
 include/linux/mlx5/mlx5_ifc.h | 179 +
 include/linux/mlx5/qp.h   |  39 -
 8 files changed, 566 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
index 10e1f1a18255..3e115dee235a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
@@ -35,6 +35,7 @@
 #include <linux/mlx5/qp.h>
 #include <linux/mlx5/cq.h>
 #include <linux/mlx5/driver.h>
+#include <linux/mlx5/mlx5_ifc.h>
 #include "mlx5_core.h"
 
 enum {
@@ -62,6 +63,22 @@ static char *qp_fields[] = {
 };
 
 enum {
+   DCT_PID,
+   DCT_STATE,
+   DCT_MTU,
+   DCT_KEY_VIOL,
+   DCT_CQN,
+};
+
+static char *dct_fields[] = {
+	[DCT_PID]	= "pid",
+	[DCT_STATE]	= "state",
+	[DCT_MTU]	= "mtu",
+	[DCT_KEY_VIOL]	= "key_violations",
+	[DCT_CQN]	= "cqn",
+};
+
+enum {
EQ_NUM_EQES,
EQ_INTR,
EQ_LOG_PG_SZ,
@@ -122,6 +139,26 @@ void mlx5_qp_debugfs_cleanup(struct mlx5_core_dev *dev)
debugfs_remove_recursive(dev-priv.qp_debugfs);
 }
 
+int mlx5_dct_debugfs_init(struct mlx5_core_dev *dev)
+{
+   if (!mlx5_debugfs_root)
+   return 0;
+
+	dev->priv.dct_debugfs = debugfs_create_dir("DCTs",  dev->priv.dbg_root);
+	if (!dev->priv.dct_debugfs)
+   return -ENOMEM;
+
+   return 0;
+}
+
+void mlx5_dct_debugfs_cleanup(struct mlx5_core_dev *dev)
+{
+   if (!mlx5_debugfs_root)
+   return;
+
+	debugfs_remove_recursive(dev->priv.dct_debugfs);
+}
+
 int mlx5_eq_debugfs_init(struct mlx5_core_dev *dev)
 {
if (!mlx5_debugfs_root)
@@ -355,6 +392,51 @@ out:
return param;
 }
 
+static u64 dct_read_field(struct mlx5_core_dev *dev, struct mlx5_core_dct *dct,
+ int index, int *is_str)
+{
+   void *out;
+   void *dctc;
+   int out_sz = MLX5_ST_SZ_BYTES(query_dct_out);
+   u64 param = 0;
+   int err;
+
+   out = kzalloc(out_sz, GFP_KERNEL);
+   if (!out)
+   return param;
+
+   err = mlx5_core_dct_query(dev, dct, out);
+   if (err) {
+		mlx5_core_warn(dev, "failed to query dct\n");
+   goto out;
+   }
+
+   dctc = MLX5_ADDR_OF(query_dct_out, out, dct_context_entry);
+   *is_str = 0;
+   switch (index) {
+   case DCT_PID:
+		param = dct->pid;
+   break;
+   case DCT_STATE:
+   param = (u64)mlx5_dct_state_str(MLX5_GET(dctc, dctc, state));
+   *is_str = 1;
+   break;
+   case DCT_MTU:
+   param = ib_mtu_enum_to_int(MLX5_GET(dctc, dctc, mtu));
+   break;
+   case DCT_KEY_VIOL:
+   param = MLX5_GET(dctc, dctc, dc_access_key_violation_count);
+   break;
+   case DCT_CQN:
+   param = MLX5_GET(dctc, dctc, cqn);
+   break;
+   }
+
+out:
+   kfree(out);
+   return param;
+}
+
 static u64 eq_read_field(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 int index)
 {
@@ -457,6 +539,10 @@ static ssize_t dbg_read(struct file *filp, char __user *buf, size_t count,
 		field = cq_read_field(d->dev, d->object, desc->i);
break;
 
+   case MLX5_DBG_RSC_DCT:
+		field = dct_read_field(d->dev, d->object, desc->i, &is_str);
+   break;
+
default:
 		mlx5_core_warn(d->dev, "invalid resource type %d\n", d->type);
return -EINVAL;
@@ -558,6 +644,30 @@ void mlx5_debug_qp_remove(struct mlx5_core_dev *dev, struct mlx5_core_qp *qp)
 	rem_res_tree(qp->dbg);
 }
 
+int mlx5_debug_dct_add(struct mlx5_core_dev *dev, struct mlx5_core_dct *dct)
+{
+   int err;
+
+   if (!mlx5_debugfs_root)
+   return 0;
+
+	err = add_res_tree(dev, MLX5_DBG_RSC_DCT, dev->priv.dct_debugfs,
+			   &dct->dbg, dct->dctn, dct_fields,
+			   ARRAY_SIZE(dct_fields), dct);
+	if (err)
+		dct->dbg = NULL;
+
+   return err;
+}
+
+void mlx5_debug_dct_remove(struct mlx5_core_dev *dev, struct mlx5_core_dct *dct)
+{
+   if (!mlx5_debugfs_root)
+   return;
+
+	if (dct->dbg)
+		rem_res_tree(dct->dbg);
+}
 
 int mlx5_debug_eq_add(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 {
diff --git 

[PATCH for-next 2/5] IB/uverbs: Add userspace interface to DC verbs

2014-11-06 Thread Eli Cohen
Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/core/uverbs.h  |  11 +
 drivers/infiniband/core/uverbs_cmd.c  | 474 +-
 drivers/infiniband/core/uverbs_main.c |  35 ++-
 include/rdma/ib_verbs.h   |   1 +
 include/uapi/rdma/ib_user_verbs.h | 126 -
 5 files changed, 584 insertions(+), 63 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index b716b0815644..3343696df6b1 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -163,6 +163,10 @@ struct ib_ucq_object {
u32 async_events_reported;
 };
 
+struct ib_udct_object {
+   struct ib_uevent_object uevent;
+};
+
 extern spinlock_t ib_uverbs_idr_lock;
 extern struct idr ib_uverbs_pd_idr;
 extern struct idr ib_uverbs_mr_idr;
@@ -173,6 +177,7 @@ extern struct idr ib_uverbs_qp_idr;
 extern struct idr ib_uverbs_srq_idr;
 extern struct idr ib_uverbs_xrcd_idr;
 extern struct idr ib_uverbs_rule_idr;
+extern struct idr ib_uverbs_dct_idr;
 
 void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
 
@@ -189,6 +194,7 @@ void ib_uverbs_release_uevent(struct ib_uverbs_file *file,
 void ib_uverbs_comp_handler(struct ib_cq *cq, void *cq_context);
 void ib_uverbs_cq_event_handler(struct ib_event *event, void *context_ptr);
 void ib_uverbs_qp_event_handler(struct ib_event *event, void *context_ptr);
+void ib_uverbs_dct_event_handler(struct ib_event *event, void *context_ptr);
 void ib_uverbs_srq_event_handler(struct ib_event *event, void *context_ptr);
 void ib_uverbs_event_handler(struct ib_event_handler *handler,
 struct ib_event *event);
@@ -259,5 +265,10 @@ IB_UVERBS_DECLARE_CMD(close_xrcd);
 IB_UVERBS_DECLARE_EX_CMD(create_flow);
 IB_UVERBS_DECLARE_EX_CMD(destroy_flow);
 IB_UVERBS_DECLARE_EX_CMD(query_device);
+IB_UVERBS_DECLARE_EX_CMD(create_dct);
+IB_UVERBS_DECLARE_EX_CMD(destroy_dct);
+IB_UVERBS_DECLARE_EX_CMD(query_dct);
+IB_UVERBS_DECLARE_EX_CMD(arm_dct);
+IB_UVERBS_DECLARE_EX_CMD(modify_qp);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 0bc215fa2a85..e2a1f691315c 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -56,6 +56,7 @@ static struct uverbs_lock_class ah_lock_class = { .name = "AH-uobj" };
 static struct uverbs_lock_class srq_lock_class = { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
 static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
+static struct uverbs_lock_class dct_lock_class = { .name = "DCT-uobj" };
 
 /*
  * The ib_uobject locking scheme is as follows:
@@ -258,6 +259,16 @@ static void put_qp_write(struct ib_qp *qp)
 	put_uobj_write(qp->uobject);
 }
 
+static struct ib_dct *idr_read_dct(int dct_handle, struct ib_ucontext *context)
+{
+	return idr_read_obj(&ib_uverbs_dct_idr, dct_handle, context, 0);
+}
+
+static void put_dct_read(struct ib_dct *dct)
+{
+	put_uobj_read(dct->uobject);
+}
+
 static struct ib_srq *idr_read_srq(int srq_handle, struct ib_ucontext *context)
 {
 	return idr_read_obj(&ib_uverbs_srq_idr, srq_handle, context, 0);
@@ -325,6 +336,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&ucontext->ah_list);
 	INIT_LIST_HEAD(&ucontext->xrcd_list);
 	INIT_LIST_HEAD(&ucontext->rule_list);
+	INIT_LIST_HEAD(&ucontext->dct_list);
 	ucontext->closing = 0;
 
 	resp.num_comp_vectors = file->device->num_comp_vectors;
@@ -1990,86 +2002,79 @@ static int modify_qp_mask(enum ib_qp_type qp_type, int mask)
}
 }
 
-ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
-   const char __user *buf, int in_len,
-   int out_len)
+static ssize_t modify_qp(struct ib_uverbs_file *file,
+struct ib_uverbs_modify_qp_ex *cmd,
+struct ib_udata *udata)
 {
-   struct ib_uverbs_modify_qp cmd;
-	struct ib_udata		   udata;
struct ib_qp  *qp;
struct ib_qp_attr *attr;
intret;
 
-	if (copy_from_user(&cmd, buf, sizeof cmd))
-   return -EFAULT;
-
-	INIT_UDATA(&udata, buf + sizeof cmd, NULL, in_len - sizeof cmd,
-		   out_len);
-
attr = kmalloc(sizeof *attr, GFP_KERNEL);
if (!attr)
return -ENOMEM;
 
-	qp = idr_read_qp(cmd.qp_handle, file->ucontext);
+	qp = idr_read_qp(cmd->qp_handle, file->ucontext);
if (!qp) {
ret = -EINVAL;
goto out;
}
 
-	attr->qp_state	      = cmd.qp_state;
-	attr->cur_qp_state    = cmd.cur_qp_state;
-	attr->path_mtu	      = cmd.path_mtu;
-	attr->path_mig_state  = cmd.path_mig_state;
-	attr->qkey	      = cmd.qkey;

[PATCH for-next 5/5] mlx5_core: Update mlx5_command_str with DC commands

2014-11-06 Thread Eli Cohen
Add support for DC commands and make a few other minor fixes.

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 368c6c5ea014..37786ea6ad6c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -289,10 +289,10 @@ const char *mlx5_command_str(int command)
 		return "TEARDOWN_HCA";
 
case MLX5_CMD_OP_ENABLE_HCA:
-		return "MLX5_CMD_OP_ENABLE_HCA";
+		return "ENABLE_HCA";
 
 	case MLX5_CMD_OP_DISABLE_HCA:
-		return "MLX5_CMD_OP_DISABLE_HCA";
+		return "DISABLE_HCA";
 
case MLX5_CMD_OP_QUERY_PAGES:
 		return "QUERY_PAGES";
@@ -390,6 +390,21 @@ const char *mlx5_command_str(int command)
case MLX5_CMD_OP_RESIZE_SRQ:
 		return "RESIZE_SRQ";
 
+	case MLX5_CMD_OP_CREATE_DCT:
+		return "CREATE_DCT";
+
+	case MLX5_CMD_OP_DESTROY_DCT:
+		return "DESTROY_DCT";
+
+	case MLX5_CMD_OP_DRAIN_DCT:
+		return "DRAIN_DCT";
+
+	case MLX5_CMD_OP_QUERY_DCT:
+		return "QUERY_DCT";
+
+	case MLX5_CMD_OP_ARM_DCT_FOR_KEY_VIOLATION:
+		return "ARM_DCT";
+
case MLX5_CMD_OP_ALLOC_PD:
 		return "ALLOC_PD";
 
@@ -415,7 +430,7 @@ const char *mlx5_command_str(int command)
 		return "DEALLOC_XRCD";
 
case MLX5_CMD_OP_ACCESS_REG:
-		return "MLX5_CMD_OP_ACCESS_REG";
+		return "ACCESS_REG";
 
 	default: return "unknown command opcode";
}
-- 
2.1.2



[PATCH for-next 4/5] mlx5_ib: Add support for DC

2014-11-06 Thread Eli Cohen
Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c|  19 +++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  24 +++
 drivers/infiniband/hw/mlx5/qp.c  | 289 ++-
 drivers/infiniband/hw/mlx5/user.h|  41 +
 4 files changed, 370 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 3c6fa99c4256..c805385d878d 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -283,6 +283,11 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
 					   props->max_mcast_grp;
 	props->max_map_per_fmr = INT_MAX; /* no limit in ConnectIB */
+	if (gen->flags & MLX5_DEV_CAP_FLAG_DCT) {
+		props->device_cap_flags |= IB_DEVICE_DC_TRANSPORT;
+		props->dc_rd_req = 1 << gen->log_max_ra_req_dc;
+		props->dc_rd_res = 1 << gen->log_max_ra_res_dc;
+	}
 
 out:
kfree(in_mad);
@@ -1405,6 +1410,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
 	dev->ib_dev.check_mr_status	= mlx5_ib_check_mr_status;
+	dev->ib_dev.uverbs_ex_cmd_mask	|=
+		(1ull << IB_USER_VERBS_EX_CMD_MODIFY_QP);
 
 	if (mdev->caps.gen.flags & MLX5_DEV_CAP_FLAG_XRC) {
 		dev->ib_dev.alloc_xrcd = mlx5_ib_alloc_xrcd;
@@ -1417,6 +1424,18 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.uverbs_ex_cmd_mask	|=
 		(1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE);
 
+	if (mdev->caps.gen.flags & MLX5_DEV_CAP_FLAG_DCT) {
+		dev->ib_dev.create_dct = mlx5_ib_create_dct;
+		dev->ib_dev.destroy_dct = mlx5_ib_destroy_dct;
+		dev->ib_dev.query_dct = mlx5_ib_query_dct;
+		dev->ib_dev.arm_dct = mlx5_ib_arm_dct;
+		dev->ib_dev.uverbs_ex_cmd_mask |=
+			(1ull << IB_USER_VERBS_EX_CMD_CREATE_DCT)   |
+			(1ull << IB_USER_VERBS_EX_CMD_DESTROY_DCT)  |
+			(1ull << IB_USER_VERBS_EX_CMD_QUERY_DCT)    |
+			(1ull << IB_USER_VERBS_EX_CMD_ARM_DCT);
+	}
+
err = init_node_data(dev);
if (err)
goto err_eqs;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 386780f0d1e1..5a78f2c60867 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -194,6 +194,11 @@ struct mlx5_ib_qp {
 	bool			signature_en;
 };
 
+struct mlx5_ib_dct {
+   struct ib_dct   ibdct;
+	struct mlx5_core_dct	mdct;
+};
+
 struct mlx5_ib_cq_buf {
struct mlx5_buf buf;
struct ib_umem  *umem;
@@ -444,6 +449,16 @@ static inline struct mlx5_ib_fast_reg_page_list *to_mfrpl(struct ib_fast_reg_pag
return container_of(ibfrpl, struct mlx5_ib_fast_reg_page_list, ibfrpl);
 }
 
+static inline struct mlx5_ib_dct *to_mibdct(struct mlx5_core_dct *mdct)
+{
+   return container_of(mdct, struct mlx5_ib_dct, mdct);
+}
+
+static inline struct mlx5_ib_dct *to_mdct(struct ib_dct *ibdct)
+{
+   return container_of(ibdct, struct mlx5_ib_dct, ibdct);
+}
+
 struct mlx5_ib_ah {
	struct ib_ah		ibah;
struct mlx5_av  av;
@@ -482,6 +497,8 @@ struct ib_qp *mlx5_ib_create_qp(struct ib_pd *pd,
struct ib_udata *udata);
 int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
  int attr_mask, struct ib_udata *udata);
+int mlx5_ib_modify_qp_ex(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+int attr_mask, struct ib_udata *udata);
 int mlx5_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr, int 
qp_attr_mask,
 struct ib_qp_init_attr *qp_init_attr);
 int mlx5_ib_destroy_qp(struct ib_qp *qp);
@@ -524,6 +541,13 @@ struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device *ibdev,
  struct ib_ucontext *context,
  struct ib_udata *udata);
 int mlx5_ib_dealloc_xrcd(struct ib_xrcd *xrcd);
+struct ib_dct *mlx5_ib_create_dct(struct ib_pd *pd,
+ struct ib_dct_init_attr *attr,
+ struct ib_udata *uhw);
+int mlx5_ib_destroy_dct(struct ib_dct *dct, struct ib_udata *uhw);
+int mlx5_ib_query_dct(struct ib_dct *dct, struct ib_dct_attr *attr,
+ struct ib_udata *uhw);
+int mlx5_ib_arm_dct(struct ib_dct *dct, struct ib_udata *uhw);
 int mlx5_vector2eqn(struct mlx5_ib_dev *dev, int vector, int *eqn, int *irqn);
 int mlx5_ib_get_buf_offset(u64 addr, int page_shift, u32 *offset);
 int 
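
For reference, the to_mibdct()/to_mdct() helpers in the hunk above are just the
standard container_of() pattern: recover the wrapping mlx5_ib_dct from a pointer
to one of its embedded members.  A minimal stand-alone sketch of the idiom (the
struct layouts below are placeholders, not the real driver types):

#include <stddef.h>
#include <assert.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct ib_dct        { int dummy; };
struct mlx5_core_dct { int dummy; };

struct mlx5_ib_dct {
	struct ib_dct        ibdct;
	struct mlx5_core_dct mdct;
};

int main(void)
{
	struct mlx5_ib_dct dct;
	struct ib_dct *ibdct = &dct.ibdct;

	/* walk back from the embedded member to the containing struct */
	assert(container_of(ibdct, struct mlx5_ib_dct, ibdct) == &dct);
	return 0;
}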

Re: [PATCH for-next 1/5] IB/core: Add DC transport support

2014-11-06 Thread Or Gerlitz

On 11/6/2014 5:52 PM, Eli Cohen wrote:

 ib_modify_qp_ex - is an extension to ib_modify_qp which allows passing the
 64-bit DC key.


We don't add any such new kernel verb, so this is probably just a leftover in
the change-log from an earlier internal version, right?



Re: [PATCHv2 net-next 0/3] RDMA/cxgb4,cxgb4vf,cxgb4i,csiostor: Cleanup macros

2014-11-06 Thread Hariprasad S
On Wed, Nov 05, 2014 at 14:54:43 -0500, David Miller wrote:
 From: Hariprasad Shenai haripra...@chelsio.com
 Date: Tue,  4 Nov 2014 08:20:54 +0530
 
  It's not really the hardware which generates these hardware constant 
  symbolic
  macros/register defines of course, it's scripts developed by the hardware 
  team.
  Various patches have ended up changing the style of the symbolic 
  macros/register
  defines and some of them used the macros/register defines that matches the
  output of the script from the hardware team.
 
 We've told you that we don't care what format your internal whatever uses
 for these macros.
 
 We have standards, tastes, and desires and reasons for naming macros
 in a certain way in upstream kernel code.
 
 I consider it flat out unacceptable to use macros with one letter
 prefixes like S_.  You simply should not do this.
 

Okay. We’ll clean up all of the macros to match the files' original style. We
do need to change the sense of the *_MASK macros since they don’t match how we 
use them as field tokens.  Also the *_SHIFT, *_MASK and *_GET names are
sucking up space and making lines wrap unnecessarily, creating readability
problems.  Can we change these to *_S, *_M and *_G?  E.g.:

-#define  INGPADBOUNDARY_MASK    0x0070U
-#define  INGPADBOUNDARY_SHIFT   4
-#define  INGPADBOUNDARY(x)      ((x) << INGPADBOUNDARY_SHIFT)
-#define  INGPADBOUNDARY_GET(x)  (((x) & INGPADBOUNDARY_MASK) \
-                                 >> INGPADBOUNDARY_SHIFT)
+#define  INGPADBOUNDARY_M       0x0007U
+#define  INGPADBOUNDARY_S       4
+#define  INGPADBOUNDARY(x)      ((x) << INGPADBOUNDARY_S)
+#define  INGPADBOUNDARY_G(x)    (((x) >> INGPADBOUNDARY_S) & \
+                                 INGPADBOUNDARY_M)
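
For anyone skimming the thread, a minimal stand-alone sketch of how such
accessors read once renamed.  The _S/_M/_G names follow the example above
(plus a _V "value" form of the kind used later in this series); the values
are purely illustrative, not actual t4_regs.h contents:

#include <stdint.h>
#include <stdio.h>

/* _S = shift, _M = unshifted mask, _V = place a value, _G = get a field */
#define INGPADBOUNDARY_S	4
#define INGPADBOUNDARY_M	0x7U
#define INGPADBOUNDARY_V(x)	((x) << INGPADBOUNDARY_S)
#define INGPADBOUNDARY_G(x)	(((x) >> INGPADBOUNDARY_S) & INGPADBOUNDARY_M)

int main(void)
{
	uint32_t reg = 0;

	reg |= INGPADBOUNDARY_V(5);	/* place the field into a register word */
	printf("ingpadboundary = %u\n", INGPADBOUNDARY_G(reg));	/* prints 5 */
	return 0;
}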


 
Thanks,
Hari


NFSoRDMA bi-weekly meeting minutes (11/6)

2014-11-06 Thread Shirley Ma
Attendees:

Jeff Becker (NASA)
Wendy Cheng (Intel)
Rupert Dance (Soft Forge)
Steve Dickson (Red Hat)
Chuck Lever (Oracle)
Doug Ledford (RedHat)
Shirley Ma (Oracle)
Sachin Prabhu (RedHat)
Devesh Sharma (Emulex)
Anna Schumaker (Net App)
Steve Wise (OpenGridComputing, Chelsio)

Yan Burman (Mellanox) missed the call because of the daylight time change. :(

Moderator:
Shirley Ma (Oracle)

The NFSoRDMA developers' bi-weekly meeting helps organize NFSoRDMA development
and test effort across the different participating groups, to speed up NFSoRDMA
upstream kernel work and the development of NFSoRDMA diagnostic/debugging tools.
Hopefully the quality of NFSoRDMA upstream patches can be improved by testing
them with a quorum of HW vendors.

Today's meeting notes:
1. OFA Interop event (Rupert)
The Interop event went pretty well. The test covered IB, RoCE and iWARP with
different vendors' HW and the upstream/OFED stack. NFSoRDMA over IB was included
in this test event; NFSoRDMA over RoCE couldn't be tested since the modules were
not in the stack yet. The detailed report will come in a few weeks.

2. Upstream bugs: (Chuck, Anna, Shirley)
The 3.17 kernel has a bug in connection teardown; it was hit consistently with
multiple EQs enabled in xprtrdma when Shirley ran a multithreaded fio random
read/write workload. Chuck has a nice patch for this bug, and Shirley has
validated the fix by stressing fio overnight. Anna will check whether it can be
pushed to the stable tree, since it blocks multi-threaded NFSoRDMA workloads.
Here is the link to the bug report:
https://bugzilla.linux-nfs.org/show_bug.cgi?id=276

3. Performance test and analyze tools: (Sachin, Chuck, Wendy, Shirley, SteveW)
Discussed about several tools on analyzing NFSoRDMA performance for both 
latency and bandwidth:
-- SystemTap: Sachin has started looking at how to use SystemTap; it takes some
time to study the tool and to create probe scripts for the NFS, RPC and
xprtrdma layers.
-- Ftrace: enable trace modules and functions to report execution-flow latency.
-- perf: reports per-function/API latency along the execution flow and CPU usage.
-- /proc/self/mountstats: reports total execution time, RTT and wait time for
each RPC. The execution-time latency includes wake-up and wait time, which
depend on how busy the system is. The RPC RTT latency itself is reasonable.

NFSoRDMA performance depends on both the implementation and the protocol; we
don't yet know how much of the performance gap comes from each. RPC seems slow,
and pNFS might perform better since it supports multiple queue pairs. Chuck will
increase the RPC credit limit to see how much performance is gained there. Our
goal is to look at the implementation issues first, then the protocol.

Feel free to reply here with anything missing or incorrect. See you on Nov. 20th.

11/20/2014
@7:30am PT
@8:30am MT
@9:30am CT
@10:30am ET
@Bangalore @9:00pm
@Israel @6:30pm

Duration: 1 hour

Call-in number:
Israel: +972 37219638
Bangalore: +91 8039890080 (180030109800)
France  Colombes +33 1 5760 +33 176728936
US: 8666824770,  408-7744073

Conference Code: 2308833
Passcode: 63767362 (it's NFSoRDMA, in case you couldn't remember)

Thanks everyone for joining the call and providing valuable inputs/work to the 
community to make NFSoRDMA better.

Cheers,
Shirley


Re: [PATCH net 0/2] mlx5_core fixes for 3.18

2014-11-06 Thread David Miller
From: Eli Cohen e...@dev.mellanox.co.il
Date: Thu,  6 Nov 2014 12:51:20 +0200

 the following two patches fix races that could lead to kernel panic in some
 cases.

Series applied, thanks Eli.


Re: [PATCHv2 net-next 0/3] RDMA/cxgb4,cxgb4vf,cxgb4i,csiostor: Cleanup macros

2014-11-06 Thread David Miller
From: Hariprasad S haripra...@chelsio.com
Date: Thu, 6 Nov 2014 21:45:10 +0530

 On Wed, Nov 05, 2014 at 14:54:43 -0500, David Miller wrote:
 From: Hariprasad Shenai haripra...@chelsio.com
 Date: Tue,  4 Nov 2014 08:20:54 +0530
 
  It's not really the hardware which generates these hardware constant 
  symbolic
  macros/register defines of course, it's scripts developed by the hardware 
  team.
  Various patches have ended up changing the style of the symbolic 
  macros/register
  defines and some of them used the macros/register defines that matches the
  output of the script from the hardware team.
 
 We've told you that we don't care what format your internal whatever uses
 for these macros.
 
 We have standards, tastes, and desires and reasons for naming macros
 in a certain way in upstream kernel code.
 
 I consider it flat out unacceptable to use macros with one letter
 prefixes like S_.  You simply should not do this.
 
 
 Okay. We’ll clean up all of the macros to match the files' original style. We
 do need to change the sense of the *_MASK macros since they don’t match how 
 we 
 use them as field tokens.  Also the *_SHIFT, *_MASK and *_GET names are
 sucking up space and making lines wrap unnecessarily, creating readability
 problems.  Can we change these to *_S, *_M and *_G?  E.g.:

That's fine.


[PATCH v2 1/6] ib/mad: Add function to support format specifiers for node description

2014-11-06 Thread ira . weiny
From: Ira Weiny ira.we...@intel.com

ib_build_node_desc - prints src node description into dest while mapping format
specifiers

Specifiers supported:
%h system hostname
%d device name

Define a default Node Description format to be "%h %d"

Original work done by Mike Heinz.

The function signature is generic to support some devices which are not
processing an ib_smp object when calling this function.

Reviewed-by: John Fleck john.fl...@intel.com
Reviewed-by: Michael Heinz michael.william.he...@intel.com
Reviewed-by: Mike Marciniszyn mike.marcinis...@intel.com
Signed-off-by: Ira Weiny ira.we...@intel.com

---

Changes from V1
remove unnecessary ib_smi include

 drivers/infiniband/core/mad.c | 37 +
 include/rdma/ib_mad.h | 17 +
 include/rdma/ib_verbs.h   |  6 --
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 74c30f4..93cf8a0 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -39,6 +39,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/utsname.h>
 #include <rdma/ib_cache.h>
 
 #include "mad_priv.h"
@@ -996,6 +997,42 @@ int ib_get_mad_data_offset(u8 mgmt_class)
 }
 EXPORT_SYMBOL(ib_get_mad_data_offset);
 
+void ib_build_node_desc(char *dest, char *src, int dest_len,
+			struct ib_device *dev)
+{
+	char *end = dest + dest_len-1;
+	char *field;
+
+	while (*src && (dest < end)) {
+		if (*src != '%') {
+			*dest++ = *src++;
+		} else {
+			src++;
+			switch (*src) {
+			case 'h':
+				field = init_utsname()->nodename;
+				src++;
+				while (*field && (*field != '.') && (dest < end))
+					*dest++ = *field++;
+				break;
+			case 'd':
+				field = dev->name;
+				src++;
+				while (*field && (dest < end))
+					*dest++ = *field++;
+				break;
+			default:
+				src++;
+			}
+		}
+	}
+	if (dest < end)
+		*dest = 0;
+	else
+		*end = 0;
+}
+EXPORT_SYMBOL(ib_build_node_desc);
+
 int ib_is_mad_class_rmpp(u8 mgmt_class)
 {
if ((mgmt_class == IB_MGMT_CLASS_SUBN_ADM) ||
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 9bb99e9..975642e 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -677,4 +677,21 @@ void ib_free_send_mad(struct ib_mad_send_buf *send_buf);
  */
 int ib_mad_kernel_rmpp_agent(struct ib_mad_agent *agent);
 
+#define IB_DEFAULT_ND_FORMAT "%h %d"
+/**
+ * ib_build_node_desc - prints src node description into dest while mapping
+ * format specifiers
+ *
+ * Specifiers supported:
+ * %h system hostname
+ * %d device name
+ *
+ * @dest: destination buffer
+ * @src: source buffer
+ * @dest_len: destination buffer length
+ * @dev: ib_device
+ */
+void ib_build_node_desc(char *dest, char *src, int dest_len,
+   struct ib_device *dev);
+
 #endif /* IB_MAD_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 470a011..f3ec6de 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -55,6 +55,8 @@
 
 extern struct workqueue_struct *ib_wq;
 
+#define IB_DEVICE_DESC_MAX 64
+
 union ib_gid {
u8  raw[16];
struct {
@@ -351,7 +353,7 @@ enum ib_device_modify_flags {
 
 struct ib_device_modify {
u64 sys_image_guid;
-	char	node_desc[64];
+	char	node_desc[IB_DEVICE_DESC_MAX];
 };
 
 enum ib_port_modify_flags {
@@ -1625,7 +1627,7 @@ struct ib_device {
u64  uverbs_cmd_mask;
u64  uverbs_ex_cmd_mask;
 
-   char node_desc[64];
+   char node_desc[IB_DEVICE_DESC_MAX];
__be64   node_guid;
u32  local_dma_lkey;
u8   node_type;
-- 
1.8.2



RE: [PATCH v2 1/6] ib/mad: Add function to support format specifiers for node description

2014-11-06 Thread Hefty, Sean
 +void ib_build_node_desc(char *dest, char *src, int dest_len,
 + struct ib_device *dev)
 +{
 + char *end = dest + dest_len-1;
 + char *field;
 +
 +	while (*src && (dest < end)) {
 + if (*src != '%') {
 + *dest++ = *src++;
 + } else {
 + src++;
 + switch (*src) {
 + case 'h':
 +	field = init_utsname()->nodename;
 + src++;
 +	while (*field && (*field != '.') && (dest < end))
 + *dest++ = *field++;
 + break;

Indentation is off

 + case 'd':
 +	field = dev->name;
 + src++;
 +	while (*field && (dest < end))
 + *dest++ = *field++;
 + break;
 + default:
 + src++;
 + }
 + }

src++ is called in every case and could be moved outside

 + }
 +	if (dest < end)
 + *dest = 0;
 + else
 + *end = 0;

*dest = '\0'; should be sufficient in all cases
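
For illustration, a minimal userspace sketch of how the loop might look with
both of those suggestions applied (src advanced once after the switch, single
'\0' terminator).  This is only a sketch with stand-in hostname/device strings,
not the actual follow-up patch:

#include <stdio.h>

static void build_node_desc(char *dest, const char *src, int dest_len,
			    const char *hostname, const char *devname)
{
	char *end = dest + dest_len - 1;
	const char *field;

	while (*src && dest < end) {
		if (*src != '%') {
			*dest++ = *src++;
			continue;
		}
		switch (*++src) {		/* look at the specifier */
		case 'h':
			for (field = hostname; *field && *field != '.' && dest < end;)
				*dest++ = *field++;
			break;
		case 'd':
			for (field = devname; *field && dest < end;)
				*dest++ = *field++;
			break;
		default:
			break;			/* unknown specifier: drop it */
		}
		if (*src)
			src++;			/* advance past the specifier once */
	}
	*dest = '\0';				/* single terminator is enough */
}

int main(void)
{
	char desc[64];

	build_node_desc(desc, "%h %d", sizeof(desc), "node1.example.org", "mlx5_0");
	printf("%s\n", desc);			/* prints "node1 mlx5_0" */
	return 0;
}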




[PATCHv3 net-next 0/3] RDMA/cxgb4,cxgb4vf,cxgb4i,csiostor: Cleanup macros

2014-11-06 Thread Hariprasad Shenai
Hi,

This series moves the debugfs code to a new file debugfs.c and cleans up
macros/register defines.

Various patches have ended up changing the style of the symbolic macros/register
defines, and some of them used macros/register defines that match the output of
the script from the hardware team.

As a result, the current kernel.org files are a mix of different macro styles.
Since these macro/register defines are used by five different drivers, a
few patch series have ended up adding duplicate macro/register define entries
with different styles. This makes these register define/macro files a complete
mess, and we want to make them clean and consistent.

We will post a few more series covering the remaining macros so that they all
follow the same, consistent style.

The patches series is created against 'net-next' tree.
And includes patches on cxgb4, cxgb4vf, iw_cxgb4, csiostor and cxgb4i driver.

We have included all the maintainers of the respective drivers. Kindly review
the changes and let us know if you have any review comments.

Thanks

V3: Use suffix instead of prefix for macros/register defines
V2: Changes the description and cover-letter content to answer David Miller's
question

Hariprasad Shenai (3):
  cxgb4: Add cxgb4_debugfs.c, move all debugfs code to new file
  cxgb4: Cleanup macros so they follow the same style and look
consistent
  cxgb4: Cleanup macros so they follow the same style and look
consistent, part 2

 drivers/infiniband/hw/cxgb4/cm.c   |   56 +++---
 drivers/infiniband/hw/cxgb4/cq.c   |8 +-
 drivers/infiniband/hw/cxgb4/mem.c  |   14 +-
 drivers/infiniband/hw/cxgb4/qp.c   |   26 ++--
 drivers/net/ethernet/chelsio/cxgb4/Makefile|1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |3 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.h |6 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  158 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h |   52 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|  173 +---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h |   15 +-
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |   32 ++--
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c |  127 +++---
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h   |   72 +++--
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h  |  142 
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c |   32 ++--
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h |2 +-
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c |  150 +-
 drivers/scsi/csiostor/csio_attr.c  |8 +-
 drivers/scsi/csiostor/csio_hw.c|   14 +-
 drivers/scsi/csiostor/csio_hw_t4.c |   15 +-
 drivers/scsi/csiostor/csio_hw_t5.c |   21 ++-
 drivers/scsi/csiostor/csio_init.c  |6 +-
 drivers/scsi/csiostor/csio_lnode.c |   18 +-
 drivers/scsi/csiostor/csio_mb.c|  172 ++--
 drivers/scsi/csiostor/csio_scsi.c  |   24 ++--
 drivers/scsi/csiostor/csio_wr.h|2 +-
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c |   35 ++--
 28 files changed, 816 insertions(+), 568 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h



[PATCHv3 net-next 1/3] cxgb4: Add cxgb4_debugfs.c, move all debugfs code to new file

2014-11-06 Thread Hariprasad Shenai
Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/net/ethernet/chelsio/cxgb4/Makefile|1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  158 
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h |   52 +++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|   97 +
 5 files changed, 217 insertions(+), 92 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h

diff --git a/drivers/net/ethernet/chelsio/cxgb4/Makefile 
b/drivers/net/ethernet/chelsio/cxgb4/Makefile
index 1df65c9..b852807 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/Makefile
+++ b/drivers/net/ethernet/chelsio/cxgb4/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_CHELSIO_T4) += cxgb4.o
 
 cxgb4-objs := cxgb4_main.o l2t.o t4_hw.o sge.o
 cxgb4-$(CONFIG_CHELSIO_T4_DCB) +=  cxgb4_dcb.o
+cxgb4-$(CONFIG_DEBUG_FS) += cxgb4_debugfs.o
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 3c481b2..dad1ea9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -1085,4 +1085,5 @@ void t4_db_dropped(struct adapter *adapter);
 int t4_fwaddrspace_write(struct adapter *adap, unsigned int mbox,
 u32 addr, u32 val);
 void t4_sge_decode_idma_state(struct adapter *adapter, int state);
+void t4_free_mem(void *addr);
 #endif /* __CXGB4_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
new file mode 100644
index 000..e86b5fe
--- /dev/null
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -0,0 +1,158 @@
+/*
+ * This file is part of the Chelsio T4 Ethernet driver for Linux.
+ *
+ * Copyright (c) 2003-2014 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/seq_file.h>
+#include <linux/debugfs.h>
+#include <linux/string_helpers.h>
+#include <linux/sort.h>
+
+#include "cxgb4.h"
+#include "t4_regs.h"
+#include "t4fw_api.h"
+#include "cxgb4_debugfs.h"
+#include "l2t.h"
+
+static ssize_t mem_read(struct file *file, char __user *buf, size_t count,
+			loff_t *ppos)
+{
+	loff_t pos = *ppos;
+	loff_t avail = file_inode(file)->i_size;
+	unsigned int mem = (uintptr_t)file->private_data & 3;
+	struct adapter *adap = file->private_data - mem;
+	__be32 *data;
+	int ret;
+
+	if (pos < 0)
+		return -EINVAL;
+	if (pos >= avail)
+		return 0;
+	if (count > avail - pos)
+		count = avail - pos;
+
+	data = t4_alloc_mem(count);
+	if (!data)
+		return -ENOMEM;
+
+	spin_lock(&adap->win0_lock);
+	ret = t4_memory_rw(adap, 0, mem, pos, count, data, T4_MEMORY_READ);
+	spin_unlock(&adap->win0_lock);
+	if (ret) {
+		t4_free_mem(data);
+		return ret;
+	}
+	ret = copy_to_user(buf, data, count);
+
+	t4_free_mem(data);
+	if (ret)
+		return -EFAULT;
+
+	*ppos = pos + count;
+	return count;
+}
+
+static const struct file_operations mem_debugfs_fops = {
+   .owner   = THIS_MODULE,
+   .open= simple_open,
+   .read= mem_read,
+   .llseek  = default_llseek,
+};
+
+static void add_debugfs_mem(struct adapter *adap, const char *name,
+   unsigned int idx, unsigned int size_mb)
+{
+   struct dentry *de;
+
+   de = 
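
As an aside, mem_read() above recovers both the adapter pointer and the
memory-region index from file->private_data by tagging the low bits of an
aligned pointer (the debugfs private data is evidently "adap + idx", given
how mem_read() undoes it).  A minimal stand-alone sketch of that idiom, with
stand-in types rather than the real driver structures:

#include <stdint.h>
#include <assert.h>

struct adapter { int dummy; };	/* at least 4-byte aligned, low 2 bits free */

static void *tag_ptr(struct adapter *adap, unsigned int idx)
{
	assert(idx < 4 && ((uintptr_t)adap & 3) == 0);
	return (char *)adap + idx;		/* stash idx in the low bits */
}

static unsigned int tag_idx(void *priv)
{
	return (uintptr_t)priv & 3;		/* what mem_read() calls "mem" */
}

static struct adapter *tag_adap(void *priv)
{
	return (struct adapter *)((char *)priv - tag_idx(priv));
}

int main(void)
{
	static struct adapter a;
	void *priv = tag_ptr(&a, 2);

	assert(tag_idx(priv) == 2);
	assert(tag_adap(priv) == &a);
	return 0;
}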

[PATCHv3 net-next 2/3] cxgb4: Cleanup macros so they follow the same style and look consistent

2014-11-06 Thread Hariprasad Shenai
Various patches have ended up changing the style of the symbolic macros/register
defines to different styles.

As a result, the current kernel.org files are a mix of different macro styles.
Since these macro/register defines are used by different drivers, a
few patch series have ended up adding duplicate macro/register define entries
with different styles. This makes these register define/macro files a complete
mess, and we want to make them clean and consistent. This patch cleans up a part
of it.

Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |   32 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|   16 +++--
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c |6 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h   |   72 +++-
 drivers/scsi/csiostor/csio_hw_t4.c |   15 ++--
 drivers/scsi/csiostor/csio_hw_t5.c |   21 +++---
 drivers/scsi/csiostor/csio_init.c  |6 +-
 7 files changed, 106 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index e86b5fe..c98a350 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -128,30 +128,30 @@ int t4_setup_debugfs(struct adapter *adap)
  t4_debugfs_files,
  ARRAY_SIZE(t4_debugfs_files));
 
-	i = t4_read_reg(adap, MA_TARGET_MEM_ENABLE);
-	if (i & EDRAM0_ENABLE) {
-		size = t4_read_reg(adap, MA_EDRAM0_BAR);
-		add_debugfs_mem(adap, "edc0", MEM_EDC0, EDRAM_SIZE_GET(size));
+	i = t4_read_reg(adap, MA_TARGET_MEM_ENABLE_A);
+	if (i & EDRAM0_ENABLE_F) {
+		size = t4_read_reg(adap, MA_EDRAM0_BAR_A);
+		add_debugfs_mem(adap, "edc0", MEM_EDC0, EDRAM0_SIZE_G(size));
 	}
-	if (i & EDRAM1_ENABLE) {
-		size = t4_read_reg(adap, MA_EDRAM1_BAR);
-		add_debugfs_mem(adap, "edc1", MEM_EDC1, EDRAM_SIZE_GET(size));
+	if (i & EDRAM1_ENABLE_F) {
+		size = t4_read_reg(adap, MA_EDRAM1_BAR_A);
+		add_debugfs_mem(adap, "edc1", MEM_EDC1, EDRAM1_SIZE_G(size));
 	}
 	if (is_t4(adap->params.chip)) {
-		size = t4_read_reg(adap, MA_EXT_MEMORY_BAR);
-		if (i & EXT_MEM_ENABLE)
+		size = t4_read_reg(adap, MA_EXT_MEMORY_BAR_A);
+		if (i & EXT_MEM_ENABLE_F)
 			add_debugfs_mem(adap, "mc", MEM_MC,
-					EXT_MEM_SIZE_GET(size));
+					EXT_MEM_SIZE_G(size));
 	} else {
-		if (i & EXT_MEM_ENABLE) {
-			size = t4_read_reg(adap, MA_EXT_MEMORY_BAR);
+		if (i & EXT_MEM0_ENABLE_F) {
+			size = t4_read_reg(adap, MA_EXT_MEMORY0_BAR_A);
 			add_debugfs_mem(adap, "mc0", MEM_MC0,
-					EXT_MEM_SIZE_GET(size));
+					EXT_MEM0_SIZE_G(size));
 		}
-		if (i & EXT_MEM1_ENABLE) {
-			size = t4_read_reg(adap, MA_EXT_MEMORY1_BAR);
+		if (i & EXT_MEM1_ENABLE_F) {
+			size = t4_read_reg(adap, MA_EXT_MEMORY1_BAR_A);
 			add_debugfs_mem(adap, "mc1", MEM_MC1,
-					EXT_MEM_SIZE_GET(size));
+					EXT_MEM1_SIZE_G(size));
 		}
}
return 0;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 172f68b..a2d6e50 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3802,7 +3802,7 @@ int cxgb4_read_tpte(struct net_device *dev, u32 stag, 
__be32 *tpte)
 {
struct adapter *adap;
u32 offset, memtype, memaddr;
-   u32 edc0_size, edc1_size, mc0_size, mc1_size;
+   u32 edc0_size, edc1_size, mc0_size, mc1_size, size;
u32 edc0_end, edc1_end, mc0_end, mc1_end;
int ret;
 
@@ -3816,9 +3816,12 @@ int cxgb4_read_tpte(struct net_device *dev, u32 stag, 
__be32 *tpte)
 * and EDC1.  Some cards will have neither MC0 nor MC1, most cards have
 * MC0, and some have both MC0 and MC1.
 */
-	edc0_size = EDRAM_SIZE_GET(t4_read_reg(adap, MA_EDRAM0_BAR)) << 20;
-	edc1_size = EDRAM_SIZE_GET(t4_read_reg(adap, MA_EDRAM1_BAR)) << 20;
-	mc0_size = EXT_MEM_SIZE_GET(t4_read_reg(adap, MA_EXT_MEMORY_BAR)) << 20;
+	size = t4_read_reg(adap, MA_EDRAM0_BAR_A);
+	edc0_size = EDRAM0_SIZE_G(size) << 20;
+	size = t4_read_reg(adap, MA_EDRAM1_BAR_A);
+	edc1_size = EDRAM1_SIZE_G(size) << 20;
+	size = t4_read_reg(adap, MA_EXT_MEMORY0_BAR_A);
+	mc0_size = EXT_MEM0_SIZE_G(size) << 20;
 
 

[PATCHv3 net-next 3/3] cxgb4: Cleanup macros so they follow the same style and look consistent, part 2

2014-11-06 Thread Hariprasad Shenai
Various patches have ended up changing the style of the symbolic macros/register
defines to different styles.

As a result, the current kernel.org files are a mix of different macro styles.
Since these macro/register defines are used by different drivers, a
few patch series have ended up adding duplicate macro/register define entries
with different styles. This makes these register define/macro files a complete
mess, and we want to make them clean and consistent. This patch cleans up a part
of it.

Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/infiniband/hw/cxgb4/cm.c   |   56 
 drivers/infiniband/hw/cxgb4/cq.c   |8 +-
 drivers/infiniband/hw/cxgb4/mem.c  |   14 +-
 drivers/infiniband/hw/cxgb4/qp.c   |   26 ++--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.h |6 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|   60 
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h |   15 +-
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |   32 ++--
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c |  121 +++---
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h  |  142 +
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c |   32 ++--
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h |2 +-
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c |  150 +-
 drivers/scsi/csiostor/csio_attr.c  |8 +-
 drivers/scsi/csiostor/csio_hw.c|   14 +-
 drivers/scsi/csiostor/csio_lnode.c |   18 +-
 drivers/scsi/csiostor/csio_mb.c|  172 ++--
 drivers/scsi/csiostor/csio_scsi.c  |   24 ++--
 drivers/scsi/csiostor/csio_wr.h|2 +-
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c |   35 ++--
 21 files changed, 509 insertions(+), 430 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index fb61f66..a07d8e1 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -472,10 +472,10 @@ static void send_flowc(struct c4iw_ep *ep, struct sk_buff *skb)
 	skb = get_skb(skb, flowclen, GFP_KERNEL);
 	flowc = (struct fw_flowc_wr *)__skb_put(skb, flowclen);
 
-	flowc->op_to_nparams = cpu_to_be32(FW_WR_OP(FW_FLOWC_WR) |
-					   FW_FLOWC_WR_NPARAMS(8));
-	flowc->flowid_len16 = cpu_to_be32(FW_WR_LEN16(DIV_ROUND_UP(flowclen,
-					  16)) | FW_WR_FLOWID(ep->hwtid));
+	flowc->op_to_nparams = cpu_to_be32(FW_WR_OP_V(FW_FLOWC_WR) |
+					   FW_FLOWC_WR_NPARAMS_V(8));
+	flowc->flowid_len16 = cpu_to_be32(FW_WR_LEN16_V(DIV_ROUND_UP(flowclen,
+					  16)) | FW_WR_FLOWID_V(ep->hwtid));
 
 	flowc->mnemval[0].mnemonic = FW_FLOWC_MNEM_PFNVFN;
 	flowc->mnemval[0].val = cpu_to_be32(FW_PFVF_CMD_PFN
@@ -803,16 +803,16 @@ static void send_mpa_req(struct c4iw_ep *ep, struct sk_buff *skb,
 	req = (struct fw_ofld_tx_data_wr *)skb_put(skb, wrlen);
 	memset(req, 0, wrlen);
 	req->op_to_immdlen = cpu_to_be32(
-		FW_WR_OP(FW_OFLD_TX_DATA_WR) |
-		FW_WR_COMPL(1) |
-		FW_WR_IMMDLEN(mpalen));
+		FW_WR_OP_V(FW_OFLD_TX_DATA_WR) |
+		FW_WR_COMPL_F |
+		FW_WR_IMMDLEN_V(mpalen));
 	req->flowid_len16 = cpu_to_be32(
-		FW_WR_FLOWID(ep->hwtid) |
-		FW_WR_LEN16(wrlen >> 4));
+		FW_WR_FLOWID_V(ep->hwtid) |
+		FW_WR_LEN16_V(wrlen >> 4));
 	req->plen = cpu_to_be32(mpalen);
 	req->tunnel_to_proxy = cpu_to_be32(
-		FW_OFLD_TX_DATA_WR_FLUSH(1) |
-		FW_OFLD_TX_DATA_WR_SHOVE(1));
+		FW_OFLD_TX_DATA_WR_FLUSH_F |
+		FW_OFLD_TX_DATA_WR_SHOVE_F);
 
 	mpa = (struct mpa_message *)(req + 1);
 	memcpy(mpa->key, MPA_KEY_REQ, sizeof(mpa->key));
@@ -897,16 +897,16 @@ static int send_mpa_reject(struct c4iw_ep *ep, const void *pdata, u8 plen)
 	req = (struct fw_ofld_tx_data_wr *)skb_put(skb, wrlen);
 	memset(req, 0, wrlen);
 	req->op_to_immdlen = cpu_to_be32(
-		FW_WR_OP(FW_OFLD_TX_DATA_WR) |
-		FW_WR_COMPL(1) |
-		FW_WR_IMMDLEN(mpalen));
+		FW_WR_OP_V(FW_OFLD_TX_DATA_WR) |
+		FW_WR_COMPL_F |
+		FW_WR_IMMDLEN_V(mpalen));
 	req->flowid_len16 = cpu_to_be32(
-		FW_WR_FLOWID(ep->hwtid) |
-		FW_WR_LEN16(wrlen >> 4));
+		FW_WR_FLOWID_V(ep->hwtid) |
+		FW_WR_LEN16_V(wrlen >> 4));
 	req->plen = cpu_to_be32(mpalen);
 	req->tunnel_to_proxy = cpu_to_be32(
-		FW_OFLD_TX_DATA_WR_FLUSH(1) |
-		FW_OFLD_TX_DATA_WR_SHOVE(1));
+  

[PATCH 1/2] iw_cxgb4: Fixes locking issue in process_mpa_request

2014-11-06 Thread Hariprasad Shenai
=
[ INFO: possible recursive locking detected ]
3.17.0+ #3 Tainted: GE
-
kworker/u64:3/299 is trying to acquire lock:
 (&epc->mutex){+.+.+.}, at: [a074e07a]
process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]

but task is already holding lock:
 (&epc->mutex){+.+.+.}, at: [a074e34e] rx_data+0x9e/0x1f0 [iw_cxgb4]

other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   
   lock(&epc->mutex);
   lock(&epc->mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u64:3/299:
 #0:  (%siw_cxgb4){.+.+.+}, at: [8106f14d]
process_one_work+0x13d/0x4d0
 #1:  (skb_work){+.+.+.}, at: [8106f14d] process_one_work+0x13d/0x4d0
 #2:  (&epc->mutex){+.+.+.}, at: [a074e34e] rx_data+0x9e/0x1f0
[iw_cxgb4]

stack backtrace:
CPU: 2 PID: 299 Comm: kworker/u64:3 Tainted: GE  3.17.0+ #3
Hardware name: Dell Inc. PowerEdge T110/0X744K, BIOS 1.2.1 01/28/2010
Workqueue: iw_cxgb4 process_work [iw_cxgb4]
 8800b91593d0 8800b8a2f9f8 815df107 0001
 8800b9158750 8800b8a2fa28 8109f0e2 8800bb768a00
 8800b91593d0 8800b9158750  8800b8a2fa88
Call Trace:
 [815df107] dump_stack+0x49/0x62
 [8109f0e2] print_deadlock_bug+0xf2/0x100
 [810a0f04] validate_chain+0x454/0x700
 [810a1574] __lock_acquire+0x3c4/0x580
 [a074e07a] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
 [810a17cc] lock_acquire+0x9c/0x110
 [a074e07a] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
 [815e111b] mutex_lock_nested+0x4b/0x360
 [a074e07a] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
 [810c181a] ? del_timer_sync+0xaa/0xd0
 [810c1770] ? try_to_del_timer_sync+0x70/0x70
 [a074e07a] process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
 [a074a3ec] ? update_rx_credits+0xec/0x140 [iw_cxgb4]
 [a074e381] rx_data+0xd1/0x1f0 [iw_cxgb4]
 [8109ff23] ? mark_held_locks+0x73/0xa0
 [815e4b90] ? _raw_spin_unlock_irqrestore+0x40/0x70
 [810a020d] ? trace_hardirqs_on_caller+0xfd/0x1c0
 [810a02dd] ? trace_hardirqs_on+0xd/0x10
 [a074c931] process_work+0x51/0x80 [iw_cxgb4]
 [8106f1c8] process_one_work+0x1b8/0x4d0
 [8106f14d] ? process_one_work+0x13d/0x4d0
 [8106f600] worker_thread+0x120/0x3c0
 [8106f4e0] ? process_one_work+0x4d0/0x4d0
 [81074a0e] kthread+0xde/0x100
 [815e4b40] ? _raw_spin_unlock_irq+0x30/0x40
 [81074930] ? __init_kthread_worker+0x70/0x70
 [815e512c] ret_from_fork+0x7c/0xb0
 [81074930] ? __init_kthread_worker+0x70/0x70
===

Based on original work by Steve Wise sw...@opengridcomputing.com

Signed-off-by: Steve Wise sw...@opengridcomputing.com
Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/infiniband/hw/cxgb4/cm.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index fb61f66..ce87fd3 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1640,7 +1640,8 @@ static void process_mpa_request(struct c4iw_ep *ep, struct sk_buff *skb)
 	__state_set(&ep->com, MPA_REQ_RCVD);
 
 	/* drive upcall */
-	mutex_lock(&ep->parent_ep->com.mutex);
+	mutex_lock_nested(&ep->parent_ep->com.mutex,
+			  SINGLE_DEPTH_NESTING);
 	if (ep->parent_ep->com.state != DEAD) {
 		if (connect_request_upcall(ep))
 			abort_connection(ep, skb, GFP_KERNEL);
-- 
1.7.1



[PATCH 0/2] iw_cxgb4: Fixes locking issue and MR limit for T4/T5 adapter

2014-11-06 Thread Hariprasad Shenai
Hi, 

This patch series fixes a locking issue and limits MRs to < 8GB for Chelsio
T4/T5 adapters.

The patch series is created against the 'infiniband' tree.
And includes patches on the iw_cxgb4 driver.

We have included all the maintainers of the respective drivers. Kindly review
the changes and let us know if you have any review comments.

Thanks

Hariprasad Shenai (2):
  iw_cxgb4: Fixes locking issue in process_mpa_request
  iw_cxgb4: limit MRs to < 8GB for T4/T5 devices

 drivers/infiniband/hw/cxgb4/cm.c  |3 ++-
 drivers/infiniband/hw/cxgb4/mem.c |   22 ++
 2 files changed, 24 insertions(+), 1 deletions(-)



[PATCH 2/2] iw_cxgb4: limit MRs to < 8GB for T4/T5 devices

2014-11-06 Thread Hariprasad Shenai
T4/T5 hardware can't handle MRs >= 8GB due to a hardware bug.  So limit
registrations to < 8GB for these devices.

Based on original work by Steve Wise sw...@opengridcomputing.com

Signed-off-by: Steve Wise sw...@opengridcomputing.com
Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/infiniband/hw/cxgb4/mem.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/mem.c 
b/drivers/infiniband/hw/cxgb4/mem.c
index ec7a298..d5dd3f2 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -50,6 +50,13 @@ static int inline_threshold = C4IW_INLINE_THRESHOLD;
 module_param(inline_threshold, int, 0644);
MODULE_PARM_DESC(inline_threshold, "inline vs dsgl threshold (default=128)");
 
+static int mr_exceeds_hw_limits(struct c4iw_dev *dev, u64 length)
+{
+	return (is_t4(dev->rdev.lldi.adapter_type) ||
+		is_t5(dev->rdev.lldi.adapter_type)) &&
+		length >= 8*1024*1024*1024ULL;
+}
+
 static int _c4iw_write_mem_dma_aligned(struct c4iw_rdev *rdev, u32 addr,
   u32 len, dma_addr_t data, int wait)
 {
@@ -536,6 +543,11 @@ int c4iw_reregister_phys_mem(struct ib_mr *mr, int 
mr_rereg_mask,
return ret;
}
 
+   if (mr_exceeds_hw_limits(rhp, total_size)) {
+   kfree(page_list);
+   return -EINVAL;
+   }
+
ret = reregister_mem(rhp, php, mh, shift, npages);
kfree(page_list);
if (ret)
@@ -596,6 +608,12 @@ struct ib_mr *c4iw_register_phys_mem(struct ib_pd *pd,
if (ret)
goto err;
 
+   if (mr_exceeds_hw_limits(rhp, total_size)) {
+   kfree(page_list);
+   ret = -EINVAL;
+   goto err;
+   }
+
ret = alloc_pbl(mhp, npages);
if (ret) {
kfree(page_list);
@@ -699,6 +717,10 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 
start, u64 length,
 
php = to_c4iw_pd(pd);
	rhp = php->rhp;
+
+   if (mr_exceeds_hw_limits(rhp, length))
+   return ERR_PTR(-EINVAL);
+
mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
if (!mhp)
return ERR_PTR(-ENOMEM);
-- 
1.7.1
