Re: [PATCH for-next 2/5] IB/core: Add support for extended query device caps
On Tue, Nov 04, 2014 at 02:35:09PM +0200, Haggai Eran wrote:
> On 03/11/2014 10:02, Eli Cohen wrote:
>> +
>> +	if (ucore->outlen < sizeof(resp))
>> +		return -ENOSPC;
>
> This check may cause compatibility problems when running a newer kernel
> with old userspace. The userspace code will have a smaller
> ib_uverbs_ex_query_device_resp struct, so the verb will always fail. A
> possible solution is to drop this check, and modify ib_copy_to_udata so
> that it only copies up to ucore->outlen bytes.

Makes sense. Will fix that in V1.

>> +
>> +	if (cmd.comp_mask)
>> +		return -EINVAL;
>
> This check may make it difficult for userspace to use this verb. If
> running an older kernel with a newer userspace, the userspace will need
> to run the verb multiple times to find out which combination of
> comp_mask bits is actually supported. I think a better way would be to
> drop this check, and let userspace rely on the returned comp_mask in the
> ib_uverbs_ex_query_device_resp struct to determine which features are
> supported by the current kernel.

Agree - this should hold true for any extended query. Will fix in v1.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] rping: ignore flushed completions
Based on original work by Steve Wise <st...@opengridcomputing.com>

Signed-off-by: Hariprasad Shenai <haripra...@chelsio.com>
---
 examples/rping.c | 15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/examples/rping.c b/examples/rping.c
index f0414de..58b642e 100644
--- a/examples/rping.c
+++ b/examples/rping.c
@@ -277,15 +277,20 @@ static int rping_cq_event_handler(struct rping_cb *cb)
 	struct ibv_wc wc;
 	struct ibv_recv_wr *bad_wr;
 	int ret;
+	int flushed = 0;
 
 	while ((ret = ibv_poll_cq(cb->cq, 1, &wc)) == 1) {
 		ret = 0;
 		if (wc.status) {
-			if (wc.status != IBV_WC_WR_FLUSH_ERR)
-				fprintf(stderr,
-					"cq completion failed status %d\n",
-					wc.status);
+			if (wc.status == IBV_WC_WR_FLUSH_ERR) {
+				flushed = 1;
+				continue;
+
+			}
+			fprintf(stderr,
+				"cq completion failed status %d\n",
+				wc.status);
 			ret = -1;
 			goto error;
 		}
@@ -334,7 +339,7 @@ static int rping_cq_event_handler(struct rping_cb *cb)
 		fprintf(stderr, "poll error %d\n", ret);
 		goto error;
 	}
-	return 0;
+	return flushed;
 
 error:
 	cb->state = ERROR;
--
1.7.1
[PATCH 1/2] rping: Fixes race, where ibv context was getting freed before memory was deregistered
While running rping as a client without a server on the other end,
rping_test_client fails and the ibv context was getting freed before
memory was deregistered. This patch fixes it.

Signed-off-by: Hariprasad Shenai <haripra...@chelsio.com>
---
 examples/rping.c | 7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/examples/rping.c b/examples/rping.c
index 949cbe6..f0414de 100644
--- a/examples/rping.c
+++ b/examples/rping.c
@@ -1055,18 +1055,19 @@ static int rping_run_client(struct rping_cb *cb)
 	ret = rping_connect_client(cb);
 	if (ret) {
 		fprintf(stderr, "connect error %d\n", ret);
-		goto err2;
+		goto err3;
 	}
 
 	ret = rping_test_client(cb);
 	if (ret) {
 		fprintf(stderr, "rping client failed: %d\n", ret);
-		goto err3;
+		goto err4;
 	}
 
 	ret = 0;
-err3:
+err4:
 	rdma_disconnect(cb->cm_id);
+err3:
 	pthread_join(cb->cqthread, NULL);
 err2:
 	rping_free_buffers(cb);
--
1.7.1
[PATCH net 2/2] net/mlx5_core: Fix race on driver load
When events arrive at driver load, the event handler gets called even
before the spinlock and list are initialized. Fix this by moving the
initialization before EQs creation.

Signed-off-by: Eli Cohen <e...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 88b2ffa0edfb..ecc6341e728a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -855,14 +855,14 @@ static int init_one(struct pci_dev *pdev,
 	dev->profile = &profile[prof_sel];
 	dev->event = mlx5_core_event;
 
+	INIT_LIST_HEAD(&priv->ctx_list);
+	spin_lock_init(&priv->ctx_lock);
 	err = mlx5_dev_init(dev, pdev);
 	if (err) {
 		dev_err(&pdev->dev, "mlx5_dev_init failed %d\n", err);
 		goto out;
 	}
 
-	INIT_LIST_HEAD(&priv->ctx_list);
-	spin_lock_init(&priv->ctx_lock);
 	err = mlx5_register_device(dev);
 	if (err) {
 		dev_err(&pdev->dev, "mlx5_register_device failed %d\n", err);
--
2.1.2
[PATCH net 0/2] mlx5_core fixes for 3.18
Hi Dave,

the following two patches fix races that could lead to kernel panic in
some cases.

Thanks,
Eli

Eli Cohen (2):
  net/mlx5_core: Fix race in create EQ
  net/mlx5_core: Fix race on driver load

 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 7 +++----
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ++--
 2 files changed, 5 insertions(+), 6 deletions(-)

--
2.1.2
[PATCH net 1/2] net/mlx5_core: Fix race in create EQ
After the EQ is created, it can possibly generate interrupts and the
interrupt handler is referencing eq->dev. It is therefore required to
set eq->dev before calling request_irq() so if an event is generated
before request_irq() returns, we will have a valid eq->dev field.

Signed-off-by: Eli Cohen <e...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index a278238a2db6..ad2c96a02a53 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -374,15 +374,14 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 	snprintf(eq->name, MLX5_MAX_EQ_NAME, "%s@pci:%s",
 		 name, pci_name(dev->pdev));
 	eq->eqn = out.eq_number;
+	eq->irqn = vecidx;
+	eq->dev = dev;
+	eq->doorbell = uar->map + MLX5_EQ_DOORBEL_OFFSET;
 	err = request_irq(table->msix_arr[vecidx].vector, mlx5_msix_handler, 0,
 			  eq->name, eq);
 	if (err)
 		goto err_eq;
 
-	eq->irqn = vecidx;
-	eq->dev = dev;
-	eq->doorbell = uar->map + MLX5_EQ_DOORBEL_OFFSET;
-
 	err = mlx5_debug_eq_add(dev, eq);
 	if (err)
 		goto err_irq;
--
2.1.2
[PATCH for-next 1/2] IB/uverbs: Enable device removal when there are active user space applications
Enables uverbs_remove_one to succeed even when there are running IB
applications working with the given ib device. This functionality
enables a HW device to be unbound/reset even while running user space
applications are using it.

It exposes a new IB kernel API named 'disassociate_ucontext' which lets
a driver detach its HW resources from a given user context without
crashing/terminating the application. In case a driver implements the
above API and registers with ib_uverbs, there will be no dependency
between its device and its uverbs_device. Upon calling remove_one of
ib_uverbs, the call should return after disassociating the open HW
resources without waiting for clients to disconnect. In case a driver
didn't implement this API, there is no change to the current behaviour:
uverbs_remove_one will return only when the last client has disconnected
and the reference count on the uverbs device has become 0.

In case the lower driver device was removed, any application will
continue working over some zombie HCA; further calls will end with an
immediate error.

Signed-off-by: Yishai Hadas <yish...@mellanox.com>
Signed-off-by: Jack Morgenstein <ja...@mellanox.com>
---
 drivers/infiniband/core/uverbs.h      |   9 +
 drivers/infiniband/core/uverbs_cmd.c  |   8 +
 drivers/infiniband/core/uverbs_main.c | 317 +++++++++++++++++++-------
 include/rdma/ib_verbs.h               |   2 +
 4 files changed, 280 insertions(+), 56 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 643c08a..e485e67 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -94,6 +94,12 @@ struct ib_uverbs_device {
 	struct cdev			cdev;
 	struct rb_root			xrcd_tree;
 	struct mutex			xrcd_tree_mutex;
+	struct mutex			disassociate_mutex; /* protect lists of files */
+	int				disassociated;
+	int				disassociated_supported;
+	struct srcu_struct		disassociate_srcu;
+	struct list_head		uverbs_file_list;
+	struct list_head		uverbs_events_file_list;
 };
 
 struct ib_uverbs_event_file {
@@ -105,6 +111,7 @@ struct ib_uverbs_event_file {
 	wait_queue_head_t		poll_wait;
 	struct fasync_struct	       *async_queue;
 	struct list_head		event_list;
+	struct list_head		list;
 };
 
 struct ib_uverbs_file {
@@ -114,6 +121,8 @@ struct ib_uverbs_file {
 	struct ib_ucontext	       *ucontext;
 	struct ib_event_handler		event_handler;
 	struct ib_uverbs_event_file    *async_file;
+	struct list_head		list;
+	int				fatal_event_raised;
 };
 
 struct ib_uverbs_event {
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 5ba2a86..0b19361 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -38,6 +38,7 @@
 #include <linux/slab.h>
 
 #include <asm/uaccess.h>
+#include <linux/sched.h>
 
 #include "uverbs.h"
 #include "core_priv.h"
@@ -326,6 +327,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&ucontext->xrcd_list);
 	INIT_LIST_HEAD(&ucontext->rule_list);
 	ucontext->closing = 0;
+	ucontext->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
 
 	resp.num_comp_vectors = file->device->num_comp_vectors;
 
@@ -1286,6 +1288,12 @@ ssize_t ib_uverbs_create_comp_channel(struct ib_uverbs_file *file,
 		return -EFAULT;
 	}
 
+	/* Taking a ref count on uverbs_file to make sure that the file won't
+	 * be freed until the event file is closed. It will enable accessing
+	 * the uverbs_device fields as part of closing the events file and
+	 * making sure that the uverbs device is available by that time as
+	 * well. Note: similar is already done for the async event file.
+	 */
+	kref_get(&file->ref);
 	fd_install(resp.fd, filp);
 
 	return in_len;
 }
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 71ab83f..d718d64 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -133,7 +133,12 @@ static void ib_uverbs_release_dev(struct kref *ref)
 	struct ib_uverbs_device *dev =
 		container_of(ref, struct ib_uverbs_device, ref);
 
-	complete(&dev->comp);
+	if (dev->disassociated) {
+		cleanup_srcu_struct(&dev->disassociate_srcu);
+		kfree(dev);
+	} else {
+		complete(&dev->comp);
+	}
 }
 
 static void ib_uverbs_release_event_file(struct kref
[PATCH for-next 2/2] IB/mlx4_ib: Disassociate support
Implements the IB core disassociate_ucontext API. The driver detaches
the HW resources for a given user context to prevent a dependency
between application termination and device disconnecting. This is done
by managing the VMAs that were mapped to the HW bars such as the
doorbell and blueflame pages. When a detach is needed, they are
remapped to an arbitrary kernel page returned by the zap API.

Signed-off-by: Yishai Hadas <yish...@mellanox.com>
Signed-off-by: Jack Morgenstein <ja...@mellanox.com>
---
 drivers/infiniband/hw/mlx4/main.c    | 119 +++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  12 ++++
 2 files changed, 130 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bda5994..76151b2 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -645,7 +645,7 @@ static struct ib_ucontext *mlx4_ib_alloc_ucontext(struct ib_device *ibdev,
 		resp.cqe_size	      = dev->dev->caps.cqe_size;
 	}
 
-	context = kmalloc(sizeof *context, GFP_KERNEL);
+	context = kzalloc(sizeof *context, GFP_KERNEL);
 	if (!context)
 		return ERR_PTR(-ENOMEM);
 
@@ -682,21 +682,134 @@ static int mlx4_ib_dealloc_ucontext(struct ib_ucontext *ibcontext)
 	return 0;
 }
 
+static void mlx4_ib_vma_open(struct vm_area_struct *area)
+{
+	/* vma_open is called when a new VMA is created on top of our VMA.
+	 * This is done through either mremap flow or split_vma (usually due
+	 * to mlock, madvise, munmap, etc.). We do not support a clone of the
+	 * vma, as this VMA is strongly hardware related. Therefore we set the
+	 * vm_ops of the newly created/cloned VMA to NULL, to prevent it from
+	 * calling us again and trying to do incorrect actions. We assume that
+	 * the original vma size is exactly a single page that there will be
+	 * no splitting operations on.
+	 */
+	area->vm_ops = NULL;
+}
+
+static void mlx4_ib_vma_close(struct vm_area_struct *area)
+{
+	struct mlx4_ib_vma_private_data *mlx4_ib_vma_priv_data;
+
+	/* It's guaranteed that all VMAs opened on a FD are closed before the
+	 * file itself is closed, therefore no sync is needed with the regular
+	 * closing flow (e.g. mlx4_ib_dealloc_ucontext). However a sync is
+	 * needed with accessing the vma as part of
+	 * mlx4_ib_disassociate_ucontext. The close operation is usually
+	 * called under mm->mmap_sem except when the process is exiting. The
+	 * exiting case is handled explicitly as part of
+	 * mlx4_ib_disassociate_ucontext.
+	 */
+	mlx4_ib_vma_priv_data = (struct mlx4_ib_vma_private_data *)
+					area->vm_private_data;
+
+	/* set the vma context pointer to null in the mlx4_ib driver's private
+	 * data, to protect against a race condition in
+	 * mlx4_ib_disassociate_ucontext().
+	 */
+	mlx4_ib_vma_priv_data->vma = NULL;
+}
+
+static const struct vm_operations_struct mlx4_ib_vm_ops = {
+	.open = mlx4_ib_vma_open,
+	.close = mlx4_ib_vma_close
+};
+
+static void mlx4_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
+{
+	int i;
+	int ret = 0;
+	struct vm_area_struct *vma;
+	struct mlx4_ib_ucontext *context = to_mucontext(ibcontext);
+	struct task_struct *owning_process = NULL;
+	struct mm_struct *owning_mm = NULL;
+
+	owning_process = get_pid_task(ibcontext->tgid, PIDTYPE_PID);
+	if (!owning_process)
+		return;
+
+	owning_mm = get_task_mm(owning_process);
+	if (!owning_mm) {
+		pr_info("no mm, disassociate ucontext is pending task termination\n");
+		while (1) {
+			/* make sure that the task is dead before returning; it
+			 * may prevent a rare case of module down in parallel
+			 * to a call to mlx4_ib_vma_close.
+			 */
+			put_task_struct(owning_process);
+			msleep(1);
+			owning_process = get_pid_task(ibcontext->tgid,
+						      PIDTYPE_PID);
+			if (!owning_process ||
+			    owning_process->state == TASK_DEAD) {
+				pr_info("disassociate ucontext done, task was terminated\n");
+				/* in case the task was dead, need to release
+				 * the task struct
+				 */
+				if (owning_process)
+					put_task_struct(owning_process);
+				return;
+			}
+		}
+	}
+
+	/* need to protect from a race on closing the vma as part of
+	 * mlx4_ib_vma_close
+	 */
+	down_read(&owning_mm->mmap_sem);
+	for (i = 0; i < HW_BAR_COUNT; i++) {
+		vma = context->hw_bar_info[i].vma;
+		if (!vma)
+			continue;
+
+		ret = zap_vma_ptes(context->hw_bar_info[i].vma,
[PATCH for-next 0/2] HW Device hot-removal support
Currently, if there is any user space application using an IB device, it
is impossible to unload the HW device driver for this device. Similarly,
if the device is hot-unplugged or reset, the device driver's hardware
removal flow blocks until all user contexts are destroyed.

This patchset removes the above limitations. The IB-core and uverbs
layers are still required to remain loaded as long as there are user
applications using the verbs API. However, the hardware device drivers
are no longer blocked by user space activity.

To support this, the hardware device needs to expose a new kernel API
named 'disassociate_ucontext'. The device driver is given a ucontext to
detach from, and it should block this user context from any future
hardware access. At the IB-core level, we use this interface to
deactivate all ucontexts that address a specific device when handling
its remove_one callback.

The first patch introduces the new API between the HW device driver and
the IB core. For devices which implement the functionality, IB core will
use it in remove_one, disassociating any active ucontext from the
hardware device. Other drivers that didn't implement it will behave as
today: remove_one will block until all ucontexts referring to the device
are destroyed before returning. The second patch provides an
implementation of this API for the mlx4 driver.

Yishai Hadas (2):
  IB/uverbs: Enable device removal when there are active user space
    applications
  IB/mlx4_ib: Disassociate support

 drivers/infiniband/core/uverbs.h      |   9 +
 drivers/infiniband/core/uverbs_cmd.c  |   8 +
 drivers/infiniband/core/uverbs_main.c | 317 +++++++++++++++++++-------
 drivers/infiniband/hw/mlx4/main.c     | 119 +++++++++++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h  |  12 ++
 include/rdma/ib_verbs.h               |   2 +
 6 files changed, 410 insertions(+), 57 deletions(-)
[PATCH] IB/srp: Fix a 32-bit compiler warning
The result of a pointer subtraction has type ptrdiff_t. Hence change a
%ld format specifier into %td. This change avoids the following warning
being printed on 32-bit systems:

warning: format '%ld' expects argument of type 'long int', but argument
5 has type 'int' [-Wformat=]

Reported-by: Wu Fengguang <fengguang...@intel.com>
Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Cc: Sagi Grimberg <sa...@mellanox.com>
Cc: Sebastian Parschauer <sebastian.rie...@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 89e4560..577eb01 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1747,7 +1747,7 @@ static void srp_process_rsp(struct srp_rdma_ch *ch, struct srp_rsp *rsp)
 	}
 	if (!scmnd) {
 		shost_printk(KERN_ERR, target->scsi_host,
-			     "Null scmnd for RSP w/tag %#016llx received on ch %ld / QP %#x\n",
+			     "Null scmnd for RSP w/tag %#016llx received on ch %td / QP %#x\n",
			     rsp->tag, ch - target->ch, ch->qp->qp_num);
 
 		spin_lock_irqsave(&ch->lock, flags);
--
2.1.2
[PATCH v1 for-next 5/5] IB/mlx4: Modify mlx4 to comply with extended atomic definitions
Set the extended masked atomic capabilities. For ConnectX devices the
argument size is fixed to 8 bytes and the bit boundary is 64.

Signed-off-by: Eli Cohen <e...@mellanox.com>
---
 drivers/infiniband/hw/mlx4/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 8b72cf392b34..7de8cf12a605 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -223,6 +223,9 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->atomic_cap	   = dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_ATOMIC ?
 		IB_ATOMIC_HCA : IB_ATOMIC_NONE;
 	props->masked_atomic_cap   = props->atomic_cap;
+	props->log_atomic_arg_sizes = 8;
+	props->max_fa_bit_boundary = 64;
+	props->log_max_atomic_inline = 8;
 	props->max_pkeys	   = dev->dev->caps.pkey_table_len[1];
 	props->max_mcast_grp	   = dev->dev->caps.num_mgms +
 		dev->dev->caps.num_amgms;
 	props->max_mcast_qp_attach = dev->dev->caps.num_qp_per_mgm;
--
2.1.2
[PATCH v1 for-next 4/5] IB/mlx5: Add extended atomic support
Connect-IB extended atomic operations provide masked compare and swap
and multi field fetch and add operations with argument sizes bigger than
64 bits. Also, Connect-IB supports BE replies to atomic operations; add
that to the advertised capabilities. Add the required functionality to
mlx5 and publish the capabilities.

Signed-off-by: Eli Cohen <e...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c              | 47 +++++++++++++++--
 drivers/infiniband/hw/mlx5/qp.c                | 26 ++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   | 51 ++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 21 +++++---
 include/linux/mlx5/device.h                    |  4 +-
 include/linux/mlx5/driver.h                    | 55 +++++++++++++++++++
 include/linux/mlx5/mlx5_ifc.h                  | 20 +++++++
 7 files changed, 194 insertions(+), 30 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 1ba6c42e4df8..3c6fa99c4256 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -151,6 +151,47 @@ static void free_comp_eqs(struct mlx5_ib_dev *dev)
 	spin_unlock(&table->lock);
 }
 
+static void update_atomic_caps(struct mlx5_caps *caps,
+			       struct ib_device_attr *props)
+{
+	struct mlx5_atomic_caps *atom = &caps->atom;
+	unsigned long last;
+	unsigned long arg;
+	int tmp;
+
+	tmp = MLX5_ATOMIC_OPS_CMP_SWAP | MLX5_ATOMIC_OPS_FETCH_ADD;
+	if (((atom->atomic_ops & tmp) == tmp) && (atom->atomic_sizes_qp & 8)) {
+		if (atom->requestor_endianess)
+			props->atomic_cap = IB_ATOMIC_HCA;
+		else
+			props->atomic_cap = IB_ATOMIC_HCA_REPLY_BE;
+	} else {
+		props->atomic_cap = IB_ATOMIC_NONE;
+	}
+
+	tmp = MLX5_ATOMIC_OPS_MASKED_CMP_SWAP | MLX5_ATOMIC_OPS_MASKED_FETCH_ADD;
+	if (((atom->atomic_ops & tmp) == tmp)) {
+		if (atom->requestor_endianess)
+			props->masked_atomic_cap = IB_ATOMIC_HCA;
+		else
+			props->masked_atomic_cap = IB_ATOMIC_HCA_REPLY_BE;
+	} else {
+		props->masked_atomic_cap = IB_ATOMIC_NONE;
+	}
+	if ((props->atomic_cap != IB_ATOMIC_NONE) ||
+	    (props->masked_atomic_cap != IB_ATOMIC_NONE)) {
+		props->log_atomic_arg_sizes = caps->atom.atomic_sizes_qp;
+		props->max_fa_bit_boundary = 64;
+		arg = (unsigned long)props->log_atomic_arg_sizes;
+		last = find_last_bit(&arg, sizeof(arg));
+		props->log_max_atomic_inline = min_t(unsigned long, last, 6);
+	} else {
+		props->log_atomic_arg_sizes = 0;
+		props->max_fa_bit_boundary = 0;
+		props->log_max_atomic_inline = 0;
+	}
+}
+
 static int mlx5_ib_query_device(struct ib_device *ibdev,
 				struct ib_device_attr *props)
 {
@@ -235,8 +276,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	props->max_srq_sge	   = max_rq_sg - 1;
 	props->max_fast_reg_page_list_len = (unsigned int)-1;
 	props->local_ca_ack_delay  = gen->local_ca_ack_delay;
-	props->atomic_cap	   = IB_ATOMIC_NONE;
-	props->masked_atomic_cap   = IB_ATOMIC_NONE;
+	update_atomic_caps(&dev->mdev->caps, props);
 	props->max_pkeys	   = be16_to_cpup((__be16 *)(out_mad->data + 28));
 	props->max_mcast_grp	   = 1 << gen->log_max_mcg;
 	props->max_mcast_qp_attach = gen->max_qp_mcg;
@@ -1374,6 +1414,9 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 			(1ull << IB_USER_VERBS_CMD_CLOSE_XRCD);
 	}
 
+	dev->ib_dev.uverbs_ex_cmd_mask |=
+		(1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE);
+
 	err = init_node_data(dev);
 	if (err)
 		goto err_eqs;
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 9ca39ad68cb8..47ca93ce214f 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1254,7 +1254,27 @@ int mlx5_ib_destroy_qp(struct ib_qp *qp)
 	return 0;
 }
 
-static __be32 to_mlx5_access_flags(struct mlx5_ib_qp *qp, const struct ib_qp_attr *attr,
+static u32 atomic_mode_qp(struct mlx5_ib_dev *dev)
+{
+	struct mlx5_atomic_caps *acaps = &dev->mdev->caps.atom;
+	unsigned long mask;
+	unsigned long tmp;
+
+	mask = acaps->atomic_sizes_qp & acaps->atomic_sizes_dc;
+
+	tmp = find_last_bit(&mask, 8 * sizeof(mask));
+	if (tmp < 2 || tmp >= 16)
+		return MLX5_ATOMIC_MODE_NONE << 16;
+
+	if (tmp == 2)
+		return MLX5_ATOMIC_MODE_CX << 16;
+
+	return tmp << 16;
+}
+
+static __be32 to_mlx5_access_flags(struct mlx5_ib_dev *dev,
+				   struct mlx5_ib_qp *qp,
+				   const struct ib_qp_attr *attr,
Re: [PATCH v3 01/11] blk-mq: Add blk_mq_unique_tag()
On 11/05/14 19:54, Christoph Hellwig wrote:
> On Wed, Nov 05, 2014 at 01:37:14PM +0100, Bart Van Assche wrote:
>> That's strange. I have compared the patches that are already in your
>> tree with the patches I had posted myself with a diff tool. These
>> patches look identical to what I had posted, except for one CC tag
>> that has been left out. If I try to apply the three patches that have
>> not yet been included in your tree (9/11..11/11) on top of the
>> drivers-for-3.19 branch, then these patches apply fine. Anyway, I have
>> rebased my tree on top of your drivers-for-3.19 branch, added a few
>> other patches (including one block layer patch that has not yet been
>> posted) and retested the SRP initiator driver against the traditional
>> SCSI core and also against the scsi-mq core. The result can be found
>> here: https://github.com/bvanassche/linux/commits/srp-multiple-hwq-v4.
>> Can you please retry to apply patches 9/11..11/11 on top of the
>> drivers-for-3.19 branch?
>
> I've pulled in the three remaining patches from the series from that
> tree. If you want me to pull in the remaining trivial srp patch as
> well, please give me a Reviewed-by: and I'll also pull it in.

Thanks! Regarding the remaining SRP patch: Roland has already been asked
to pull that patch (see also
http://thread.gmane.org/gmane.linux.drivers.rdma/22018).

Bart.
[PATCH v1 for-next 3/5] IB/core: Extend atomic operations
Further enhance the extended atomic operations support as was introduced
in commit 5e80ba8ff0bd ("IB/core: Add support for masked atomic
operations").

1. Allow arbitrary argument sizes. The original extended atomics commit
   defined 64 bit arguments. This patch allows arbitrary arguments which
   are a power of 2 bytes in size.
2. Add the option to define the response for atomic operations in
   network order. enum ib_atomic_cap is extended to have big endian
   variants.

The device attributes struct defines three new fields:

log_atomic_arg_sizes - a bit mask which encodes which argument sizes are
supported. A set bit at location n (zero based) means an argument of
size 2 ^ n is supported.

max_fa_bit_boundary - max fetch and add bit boundary. Multi field fetch
and add operations use a bit mask that defines bit locations where the
carry bit is not passed to the next higher order bit. So, if this field
has the value 64, it means that the max value subject to fetch and add
is 64 bits, which means no carry from bit 63 to 64 or from bit 127 to
128, etc.

log_max_atomic_inline - atomic arguments can be inline in the WQE or be
referenced through a memory key. This value defines the max inline
argument size possible.

Signed-off-by: Eli Cohen <e...@mellanox.com>
---
Changes from v0:
	Do not enforce comp_mask to the known masks defined by
	~IB_UVERBS_EX_QUERY_DEV_MAX_MASK.

 drivers/infiniband/core/uverbs_cmd.c | 14 ++++++++++++++
 include/rdma/ib_verbs.h              |  7 ++++++-
 include/uapi/rdma/ib_user_verbs.h    | 14 ++++++++++++++
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 74ad0d0de92b..0bc215fa2a85 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -445,6 +445,8 @@ ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
 	memset(&resp, 0, sizeof resp);
 	copy_query_dev_fields(file, &resp, &attr);
+	if (resp.atomic_cap > IB_ATOMIC_GLOB)
+		resp.atomic_cap = IB_ATOMIC_NONE;
 
 	if (copy_to_user((void __user *) (unsigned long) cmd.response,
 			 &resp, sizeof resp))
@@ -3286,6 +3288,18 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 	copy_query_dev_fields(file, &resp.base, &attr);
 	resp.comp_mask = 0;
+	if (cmd.comp_mask & IB_UVERBS_EX_QUERY_DEV_MASKED_ATOMIC) {
+		resp.atomics.masked_atomic_cap = attr.masked_atomic_cap;
+		resp.atomics.log_atomic_arg_sizes = attr.log_atomic_arg_sizes;
+		resp.atomics.max_fa_bit_boundary = attr.max_fa_bit_boundary;
+		resp.atomics.log_max_atomic_inline = attr.log_max_atomic_inline;
+		resp.comp_mask |= IB_UVERBS_EX_QUERY_DEV_MASKED_ATOMIC;
+	} else {
+		resp.atomics.masked_atomic_cap = IB_ATOMIC_NONE;
+		resp.atomics.log_atomic_arg_sizes = 0;
+		resp.atomics.max_fa_bit_boundary = 0;
+		resp.atomics.log_max_atomic_inline = 0;
+	}
 
 	err = ib_copy_to_udata(ucore, &resp, sizeof(resp));
 	if (err)
 		return err;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 97a999f9e4d8..2b65e31ca298 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -140,7 +140,9 @@ enum ib_signature_guard_cap {
 enum ib_atomic_cap {
 	IB_ATOMIC_NONE,
 	IB_ATOMIC_HCA,
-	IB_ATOMIC_GLOB
+	IB_ATOMIC_GLOB,
+	IB_ATOMIC_HCA_REPLY_BE,
+	IB_ATOMIC_GLOB_REPLY_BE,
 };
 
 struct ib_device_attr {
@@ -186,6 +188,9 @@ struct ib_device_attr {
 	u8			local_ca_ack_delay;
 	int			sig_prot_cap;
 	int			sig_guard_cap;
+	u32			log_atomic_arg_sizes; /* bit-mask of supported sizes */
+	u32			max_fa_bit_boundary;
+	u32			log_max_atomic_inline;
 };
 
 enum ib_mtu {
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index ed8c3d9da42c..ec98fe636f2b 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -202,13 +202,27 @@ struct ib_uverbs_query_device_resp {
 	__u8  reserved[4];
 };
 
+enum {
+	IB_UVERBS_EX_QUERY_DEV_MASKED_ATOMIC	= 1 << 0,
+	IB_UVERBS_EX_QUERY_DEV_LAST		= 1 << 1,
+	IB_UVERBS_EX_QUERY_DEV_MAX_MASK		= IB_UVERBS_EX_QUERY_DEV_LAST - 1,
+};
+
 struct ib_uverbs_ex_query_device {
 	__u32 comp_mask;
 };
 
+struct ib_uverbs_ex_atomic_caps {
+	__u32 masked_atomic_cap;
+	__u32 log_atomic_arg_sizes; /* bit-mask of supported sizes */
+	__u32 max_fa_bit_boundary;
+	__u32 log_max_atomic_inline;
+};
+
 struct ib_uverbs_ex_query_device_resp {
 	struct ib_uverbs_query_device_resp base;
 	__u32 comp_mask;
+	struct ib_uverbs_ex_atomic_caps atomics;
 };
 
 struct ib_uverbs_query_port {
--
2.1.2
[PATCH v1 for-next 1/5] IB/mlx5: Fix sparse warnings
1. Add required __acquire/__release statements to balance spinlock
   usage.
2. Change the index parameter of begin_wqe() to be unsigned to match the
   supplied argument type.

Signed-off-by: Eli Cohen <e...@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index e261a53f9a02..9ca39ad68cb8 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1011,9 +1011,14 @@ static void mlx5_ib_lock_cqs(struct mlx5_ib_cq *send_cq, struct mlx5_ib_cq *recv
 			}
 		} else {
 			spin_lock_irq(&send_cq->lock);
+			__acquire(&recv_cq->lock);
 		}
 	} else if (recv_cq) {
 		spin_lock_irq(&recv_cq->lock);
+		__acquire(&send_cq->lock);
+	} else {
+		__acquire(&send_cq->lock);
+		__acquire(&recv_cq->lock);
 	}
 }
 
@@ -1033,10 +1038,15 @@ static void mlx5_ib_unlock_cqs(struct mlx5_ib_cq *send_cq, struct mlx5_ib_cq *re
 			spin_unlock_irq(&recv_cq->lock);
 		}
 	} else {
+		__release(&recv_cq->lock);
 		spin_unlock_irq(&send_cq->lock);
 	}
 	} else if (recv_cq) {
+		__release(&send_cq->lock);
 		spin_unlock_irq(&recv_cq->lock);
+	} else {
+		__release(&recv_cq->lock);
+		__release(&send_cq->lock);
 	}
 }
 
@@ -2411,7 +2421,7 @@ static u8 get_fence(u8 fence, struct ib_send_wr *wr)
 
 static int begin_wqe(struct mlx5_ib_qp *qp, void **seg,
 		     struct mlx5_wqe_ctrl_seg **ctrl,
-		     struct ib_send_wr *wr, int *idx,
+		     struct ib_send_wr *wr, unsigned *idx,
 		     int *size, int nreq)
 {
 	int err = 0;
@@ -2737,6 +2747,8 @@ out:
 
 		if (bf->need_lock)
 			spin_lock(&bf->lock);
+		else
+			__acquire(&bf->lock);
 
 		/* TBD enable WC */
 		if (0 && nreq == 1 && bf->uuarn && inl && size > 1 &&
		    size <= bf->buf_size / 16) {
 
@@ -2753,6 +2765,8 @@ out:
 		bf->offset ^= bf->buf_size;
 		if (bf->need_lock)
 			spin_unlock(&bf->lock);
+		else
+			__release(&bf->lock);
 	}
 
 	spin_unlock_irqrestore(&qp->sq.lock, flags);
--
2.1.2
[PATCH v1 for-next 2/5] IB/core: Add support for extended query device caps
Add an extensible query device capabilities verb to allow adding new features. ib_uverbs_ex_query_device is added, and copy_query_dev_fields is used to copy the capability fields shared by both ib_uverbs_query_device and ib_uverbs_ex_query_device.

Signed-off-by: Eli Cohen e...@mellanox.com
---
Changes from v0:
1. Allow userspace to pass a response buffer smaller than the kernel's.
2. Do not enforce comp_mask at input of query device.
3. Modify ib_copy_to_udata to copy the minimum of the size requested by
   the caller and the size provided by userspace.

 drivers/infiniband/core/uverbs.h      |   1 +
 drivers/infiniband/core/uverbs_cmd.c  | 121 ++
 drivers/infiniband/core/uverbs_main.c |   3 +-
 include/rdma/ib_verbs.h               |   5 +-
 include/uapi/rdma/ib_user_verbs.h     |  12 +++-
 5 files changed, 98 insertions(+), 44 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 643c08a025a5..b716b0815644 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -258,5 +258,6 @@ IB_UVERBS_DECLARE_CMD(close_xrcd);
 IB_UVERBS_DECLARE_EX_CMD(create_flow);
 IB_UVERBS_DECLARE_EX_CMD(destroy_flow);
+IB_UVERBS_DECLARE_EX_CMD(query_device);

 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 5ba2a86aab6a..74ad0d0de92b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -378,6 +378,52 @@ err:
 	return ret;
 }

+static void copy_query_dev_fields(struct ib_uverbs_file *file,
+				  struct ib_uverbs_query_device_resp *resp,
+				  struct ib_device_attr *attr)
+{
+	resp->fw_ver = attr->fw_ver;
+	resp->node_guid = file->device->ib_dev->node_guid;
+	resp->sys_image_guid = attr->sys_image_guid;
+	resp->max_mr_size = attr->max_mr_size;
+	resp->page_size_cap = attr->page_size_cap;
+	resp->vendor_id = attr->vendor_id;
+	resp->vendor_part_id = attr->vendor_part_id;
+	resp->hw_ver = attr->hw_ver;
+	resp->max_qp = attr->max_qp;
+	resp->max_qp_wr = attr->max_qp_wr;
+	resp->device_cap_flags = attr->device_cap_flags;
+	resp->max_sge = attr->max_sge;
+	resp->max_sge_rd = attr->max_sge_rd;
+	resp->max_cq = attr->max_cq;
+	resp->max_cqe = attr->max_cqe;
+	resp->max_mr = attr->max_mr;
+	resp->max_pd = attr->max_pd;
+	resp->max_qp_rd_atom = attr->max_qp_rd_atom;
+	resp->max_ee_rd_atom = attr->max_ee_rd_atom;
+	resp->max_res_rd_atom = attr->max_res_rd_atom;
+	resp->max_qp_init_rd_atom = attr->max_qp_init_rd_atom;
+	resp->max_ee_init_rd_atom = attr->max_ee_init_rd_atom;
+	resp->atomic_cap = attr->atomic_cap;
+	resp->max_ee = attr->max_ee;
+	resp->max_rdd = attr->max_rdd;
+	resp->max_mw = attr->max_mw;
+	resp->max_raw_ipv6_qp = attr->max_raw_ipv6_qp;
+	resp->max_raw_ethy_qp = attr->max_raw_ethy_qp;
+	resp->max_mcast_grp = attr->max_mcast_grp;
+	resp->max_mcast_qp_attach = attr->max_mcast_qp_attach;
+	resp->max_total_mcast_qp_attach = attr->max_total_mcast_qp_attach;
+	resp->max_ah = attr->max_ah;
+	resp->max_fmr = attr->max_fmr;
+	resp->max_map_per_fmr = attr->max_map_per_fmr;
+	resp->max_srq = attr->max_srq;
+	resp->max_srq_wr = attr->max_srq_wr;
+	resp->max_srq_sge = attr->max_srq_sge;
+	resp->max_pkeys = attr->max_pkeys;
+	resp->local_ca_ack_delay = attr->local_ca_ack_delay;
+	resp->phys_port_cnt = file->device->ib_dev->phys_port_cnt;
+}
+
 ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
			       const char __user *buf,
			       int in_len, int out_len)
@@ -398,47 +444,7 @@ ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
 		return ret;

 	memset(&resp, 0, sizeof resp);
-
-	resp.fw_ver = attr.fw_ver;
-	resp.node_guid = file->device->ib_dev->node_guid;
-	resp.sys_image_guid = attr.sys_image_guid;
-	resp.max_mr_size = attr.max_mr_size;
-	resp.page_size_cap = attr.page_size_cap;
-	resp.vendor_id = attr.vendor_id;
-	resp.vendor_part_id = attr.vendor_part_id;
-	resp.hw_ver = attr.hw_ver;
-	resp.max_qp = attr.max_qp;
-	resp.max_qp_wr = attr.max_qp_wr;
-	resp.device_cap_flags = attr.device_cap_flags;
-	resp.max_sge
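The v1 change to ib_copy_to_udata amounts to clamping the copy length to what userspace said it can hold. A userspace sketch of that semantic (names and types are illustrative, not the kernel's):

```c
#include <string.h>

/* Sketch of the v1 ib_copy_to_udata semantics: copy at most
 * min(kernel response size, userspace-supplied outlen), so an old
 * userspace with a smaller response struct gets a valid (shorter)
 * reply instead of -ENOSPC. */
struct udata {
	void *outbuf;
	size_t outlen;		/* what userspace said its buffer can hold */
};

static int copy_to_udata(struct udata *udata, const void *src, size_t len)
{
	size_t n = len < udata->outlen ? len : udata->outlen;

	memcpy(udata->outbuf, src, n);	/* never overruns the user buffer */
	return 0;
}
```

An "old userspace" with an 8-byte response struct then simply receives the first 8 bytes of a 16-byte kernel response, and the verb succeeds.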
Re: [PATCH] IB/srp: Fix a 32-bit compiler warning
On Thu, Nov 06, 2014 at 03:18:12PM +0100, Bart Van Assche wrote:

> The result of a pointer subtraction has type ptrdiff_t. Hence change a
> %ld format specifier into %td. This change prevents the following
> warning from being printed on 32-bit systems:
>
>   warning: format '%ld' expects argument of type 'long int', but
>   argument 5 has type 'int' [-Wformat=]

Thanks. Given that this is a new warning in the patches I merged, I'll add this one as well.
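A standalone illustration of the fix. This only demonstrates the %td length modifier for ptrdiff_t; it is not the srp code:

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Subtracting two pointers yields a ptrdiff_t, and %td is its printf
 * length modifier, so the same format string is correct on both 32-bit
 * and 64-bit builds -- unlike %ld, which warns on 32-bit targets where
 * ptrdiff_t is int-sized. */
static void fmt_offset(char *out, size_t outsz, const char *base,
		       const char *cur)
{
	ptrdiff_t off = cur - base;

	snprintf(out, outsz, "offset %td", off);
}
```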
[PATCH for-next 0/5] Dynamically Connected support
Hi Roland, the following series of patches introduces a new transport service named DC. Support is added at IB/core layer, uverbs interface to userspace and mlx5 for Connect-IB devices. Details on the new transport can be found in the first patch in the series. Eli Eli Cohen (5): IB/core: Add DC transport support IB/uverbs: Add userspace interface to DC verbs mlx5_core: Add DC support at mlx5 core layer mlx5_ib: Add support for DC mlx5_core: Update mlx5_command_str with DC commands drivers/infiniband/core/uverbs.h | 11 + drivers/infiniband/core/uverbs_cmd.c | 474 +++--- drivers/infiniband/core/uverbs_main.c | 35 +- drivers/infiniband/core/verbs.c | 87 drivers/infiniband/hw/mlx5/main.c | 19 + drivers/infiniband/hw/mlx5/mlx5_ib.h | 24 ++ drivers/infiniband/hw/mlx5/qp.c | 289 - drivers/infiniband/hw/mlx5/user.h | 41 ++ drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 21 +- drivers/net/ethernet/mellanox/mlx5/core/debugfs.c | 110 + drivers/net/ethernet/mellanox/mlx5/core/eq.c | 15 +- drivers/net/ethernet/mellanox/mlx5/core/main.c| 6 + drivers/net/ethernet/mellanox/mlx5/core/qp.c | 185 + include/linux/mlx5/device.h | 12 + include/linux/mlx5/driver.h | 24 +- include/linux/mlx5/mlx5_ifc.h | 179 include/linux/mlx5/qp.h | 39 +- include/rdma/ib_verbs.h | 88 +++- include/uapi/rdma/ib_user_verbs.h | 126 +- 19 files changed, 1710 insertions(+), 75 deletions(-) -- 2.1.2 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH for-next 1/5] IB/core: Add DC transport support
The Dynamically Connected (DC) Transport Service provides a reliable, datagram-like model that allows a single sender to target multiple destinations from the same QP, keeping the communication resource footprint essentially independent of system size. DC supports RDMA read and write operations, as well as atomic variable updates. With this transport, a DC initiator QP may be used to target multiple remote DC targets, in one or more remote processes.

As far as reachability is concerned, the DC model is somewhat similar to the Unreliable Datagram (UD) model in the sense that each WR submitted to the DC SQ carries the information that identifies the remote destination. DC contexts are then dynamically tied to each other across the network to create a temporary RC-equivalent connection that is used to reliably deliver one or more messages. This dynamic connection is created in-band and pipelined with the subsequent data communication, thus eliminating most of the cost associated with the 3-way handshake of the Connection Manager protocol used for connecting RC QPs. When all WRs posted to that remote network address have been acknowledged, the initiator sends a disconnect request to the responder, thereby releasing the responder's resources.

A DC initiator is yet another type of QP, identified by a new transport type, IB_QPT_DC_INI. The target end is represented by a new object of type ib_dct. This patch extends the verbs API with the following new APIs:

ib_create_dct - create a DC target
ib_destroy_dct - destroy a DC target
ib_query_dct - query a DC target
ib_arm_dct - arm a DC target to generate an asynchronous event on a DC key violation. Once an event is generated, the DC target moves to a fired state and will not generate further key violation events unless re-armed.
ib_modify_qp_ex - an extension of ib_modify_qp which allows passing the 64-bit DC key.
Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/infiniband/core/verbs.c | 87 + include/rdma/ib_verbs.h | 87 - 2 files changed, 172 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index c2b89cc5dbca..c2b2d00c9794 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -521,6 +521,9 @@ static const struct { [IB_QPT_RC] = (IB_QP_PKEY_INDEX | IB_QP_PORT | IB_QP_ACCESS_FLAGS), + [IB_QPT_DC_INI] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_DC_KEY), [IB_QPT_XRC_INI] = (IB_QP_PKEY_INDEX | IB_QP_PORT | IB_QP_ACCESS_FLAGS), @@ -549,6 +552,9 @@ static const struct { [IB_QPT_RC] = (IB_QP_PKEY_INDEX | IB_QP_PORT | IB_QP_ACCESS_FLAGS), + [IB_QPT_DC_INI] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_DC_KEY), [IB_QPT_XRC_INI] = (IB_QP_PKEY_INDEX | IB_QP_PORT | IB_QP_ACCESS_FLAGS), @@ -574,6 +580,8 @@ static const struct { IB_QP_RQ_PSN | IB_QP_MAX_DEST_RD_ATOMIC | IB_QP_MIN_RNR_TIMER), + [IB_QPT_DC_INI] = (IB_QP_AV | + IB_QP_PATH_MTU), [IB_QPT_XRC_INI] = (IB_QP_AV | IB_QP_PATH_MTU | IB_QP_DEST_QPN | @@ -600,6 +608,8 @@ static const struct { [IB_QPT_RC] = (IB_QP_ALT_PATH | IB_QP_ACCESS_FLAGS | IB_QP_PKEY_INDEX), +[IB_QPT_DC_INI] = (IB_QP_PKEY_INDEX | +IB_QP_DC_KEY), [IB_QPT_XRC_INI] = (IB_QP_ALT_PATH
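The arm/fire semantics of ib_arm_dct described in the change-log can be modeled as a small state machine: one event per arm, with further violations suppressed until userspace re-arms the target. This is purely illustrative, not the driver implementation:

```c
/* Illustrative model of DCT key-violation arming semantics. */
enum dct_state { DCT_ARMED, DCT_FIRED };

struct dct {
	enum dct_state state;
	int events;			/* async events actually delivered */
};

/* Returns 1 if an event is generated for this violation, 0 if suppressed. */
static int dct_key_violation(struct dct *dct)
{
	if (dct->state != DCT_ARMED)
		return 0;		/* fired: stay silent until re-armed */
	dct->state = DCT_FIRED;
	dct->events++;
	return 1;
}

static void dct_arm(struct dct *dct)
{
	dct->state = DCT_ARMED;		/* sketch of what ib_arm_dct would do */
}
```

The fired state prevents a misbehaving peer from flooding the consumer with key-violation events; the consumer decides when it is ready for the next one by re-arming.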
[PATCH for-next 3/5] mlx5_core: Add DC support at mlx5 core layer
Update debugfs, implement DC commands and handle events. Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/net/ethernet/mellanox/mlx5/core/debugfs.c | 110 + drivers/net/ethernet/mellanox/mlx5/core/eq.c | 15 +- drivers/net/ethernet/mellanox/mlx5/core/main.c| 6 + drivers/net/ethernet/mellanox/mlx5/core/qp.c | 185 ++ include/linux/mlx5/device.h | 12 ++ include/linux/mlx5/driver.h | 24 ++- include/linux/mlx5/mlx5_ifc.h | 179 + include/linux/mlx5/qp.h | 39 - 8 files changed, 566 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c index 10e1f1a18255..3e115dee235a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c @@ -35,6 +35,7 @@ #include linux/mlx5/qp.h #include linux/mlx5/cq.h #include linux/mlx5/driver.h +#include linux/mlx5/mlx5_ifc.h #include mlx5_core.h enum { @@ -62,6 +63,22 @@ static char *qp_fields[] = { }; enum { + DCT_PID, + DCT_STATE, + DCT_MTU, + DCT_KEY_VIOL, + DCT_CQN, +}; + +static char *dct_fields[] = { + [DCT_PID] = pid, + [DCT_STATE] = state, + [DCT_MTU] = mtu, + [DCT_KEY_VIOL] = key_violations, + [DCT_CQN] = cqn, +}; + +enum { EQ_NUM_EQES, EQ_INTR, EQ_LOG_PG_SZ, @@ -122,6 +139,26 @@ void mlx5_qp_debugfs_cleanup(struct mlx5_core_dev *dev) debugfs_remove_recursive(dev-priv.qp_debugfs); } +int mlx5_dct_debugfs_init(struct mlx5_core_dev *dev) +{ + if (!mlx5_debugfs_root) + return 0; + + dev-priv.dct_debugfs = debugfs_create_dir(DCTs, dev-priv.dbg_root); + if (!dev-priv.dct_debugfs) + return -ENOMEM; + + return 0; +} + +void mlx5_dct_debugfs_cleanup(struct mlx5_core_dev *dev) +{ + if (!mlx5_debugfs_root) + return; + + debugfs_remove_recursive(dev-priv.dct_debugfs); +} + int mlx5_eq_debugfs_init(struct mlx5_core_dev *dev) { if (!mlx5_debugfs_root) @@ -355,6 +392,51 @@ out: return param; } +static u64 dct_read_field(struct mlx5_core_dev *dev, struct mlx5_core_dct *dct, + int index, int *is_str) 
+{ + void *out; + void *dctc; + int out_sz = MLX5_ST_SZ_BYTES(query_dct_out); + u64 param = 0; + int err; + + out = kzalloc(out_sz, GFP_KERNEL); + if (!out) + return param; + + err = mlx5_core_dct_query(dev, dct, out); + if (err) { + mlx5_core_warn(dev, failed to query dct\n); + goto out; + } + + dctc = MLX5_ADDR_OF(query_dct_out, out, dct_context_entry); + *is_str = 0; + switch (index) { + case DCT_PID: + param = dct-pid; + break; + case DCT_STATE: + param = (u64)mlx5_dct_state_str(MLX5_GET(dctc, dctc, state)); + *is_str = 1; + break; + case DCT_MTU: + param = ib_mtu_enum_to_int(MLX5_GET(dctc, dctc, mtu)); + break; + case DCT_KEY_VIOL: + param = MLX5_GET(dctc, dctc, dc_access_key_violation_count); + break; + case DCT_CQN: + param = MLX5_GET(dctc, dctc, cqn); + break; + } + +out: + kfree(out); + return param; +} + static u64 eq_read_field(struct mlx5_core_dev *dev, struct mlx5_eq *eq, int index) { @@ -457,6 +539,10 @@ static ssize_t dbg_read(struct file *filp, char __user *buf, size_t count, field = cq_read_field(d-dev, d-object, desc-i); break; + case MLX5_DBG_RSC_DCT: + field = dct_read_field(d-dev, d-object, desc-i, is_str); + break; + default: mlx5_core_warn(d-dev, invalid resource type %d\n, d-type); return -EINVAL; @@ -558,6 +644,30 @@ void mlx5_debug_qp_remove(struct mlx5_core_dev *dev, struct mlx5_core_qp *qp) rem_res_tree(qp-dbg); } +int mlx5_debug_dct_add(struct mlx5_core_dev *dev, struct mlx5_core_dct *dct) +{ + int err; + + if (!mlx5_debugfs_root) + return 0; + + err = add_res_tree(dev, MLX5_DBG_RSC_DCT, dev-priv.dct_debugfs, + dct-dbg, dct-dctn, dct_fields, + ARRAY_SIZE(dct_fields), dct); + if (err) + dct-dbg = NULL; + + return err; +} + +void mlx5_debug_dct_remove(struct mlx5_core_dev *dev, struct mlx5_core_dct *dct) +{ + if (!mlx5_debugfs_root) + return; + + if (dct-dbg) + rem_res_tree(dct-dbg); +} int mlx5_debug_eq_add(struct mlx5_core_dev *dev, struct mlx5_eq *eq) { diff --git
[PATCH for-next 2/5] IB/uverbs: Add userspace interface to DC verbs
Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/infiniband/core/uverbs.h | 11 + drivers/infiniband/core/uverbs_cmd.c | 474 +- drivers/infiniband/core/uverbs_main.c | 35 ++- include/rdma/ib_verbs.h | 1 + include/uapi/rdma/ib_user_verbs.h | 126 - 5 files changed, 584 insertions(+), 63 deletions(-) diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h index b716b0815644..3343696df6b1 100644 --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -163,6 +163,10 @@ struct ib_ucq_object { u32 async_events_reported; }; +struct ib_udct_object { + struct ib_uevent_object uevent; +}; + extern spinlock_t ib_uverbs_idr_lock; extern struct idr ib_uverbs_pd_idr; extern struct idr ib_uverbs_mr_idr; @@ -173,6 +177,7 @@ extern struct idr ib_uverbs_qp_idr; extern struct idr ib_uverbs_srq_idr; extern struct idr ib_uverbs_xrcd_idr; extern struct idr ib_uverbs_rule_idr; +extern struct idr ib_uverbs_dct_idr; void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj); @@ -189,6 +194,7 @@ void ib_uverbs_release_uevent(struct ib_uverbs_file *file, void ib_uverbs_comp_handler(struct ib_cq *cq, void *cq_context); void ib_uverbs_cq_event_handler(struct ib_event *event, void *context_ptr); void ib_uverbs_qp_event_handler(struct ib_event *event, void *context_ptr); +void ib_uverbs_dct_event_handler(struct ib_event *event, void *context_ptr); void ib_uverbs_srq_event_handler(struct ib_event *event, void *context_ptr); void ib_uverbs_event_handler(struct ib_event_handler *handler, struct ib_event *event); @@ -259,5 +265,10 @@ IB_UVERBS_DECLARE_CMD(close_xrcd); IB_UVERBS_DECLARE_EX_CMD(create_flow); IB_UVERBS_DECLARE_EX_CMD(destroy_flow); IB_UVERBS_DECLARE_EX_CMD(query_device); +IB_UVERBS_DECLARE_EX_CMD(create_dct); +IB_UVERBS_DECLARE_EX_CMD(destroy_dct); +IB_UVERBS_DECLARE_EX_CMD(query_dct); +IB_UVERBS_DECLARE_EX_CMD(arm_dct); +IB_UVERBS_DECLARE_EX_CMD(modify_qp); #endif /* UVERBS_H */ diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c index 0bc215fa2a85..e2a1f691315c 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -56,6 +56,7 @@ static struct uverbs_lock_class ah_lock_class = { .name = AH-uobj }; static struct uverbs_lock_class srq_lock_class = { .name = SRQ-uobj }; static struct uverbs_lock_class xrcd_lock_class = { .name = XRCD-uobj }; static struct uverbs_lock_class rule_lock_class = { .name = RULE-uobj }; +static struct uverbs_lock_class dct_lock_class = { .name = DCT-uobj }; /* * The ib_uobject locking scheme is as follows: @@ -258,6 +259,16 @@ static void put_qp_write(struct ib_qp *qp) put_uobj_write(qp-uobject); } +static struct ib_dct *idr_read_dct(int dct_handle, struct ib_ucontext *context) +{ + return idr_read_obj(ib_uverbs_dct_idr, dct_handle, context, 0); +} + +static void put_dct_read(struct ib_dct *dct) +{ + put_uobj_read(dct-uobject); +} + static struct ib_srq *idr_read_srq(int srq_handle, struct ib_ucontext *context) { return idr_read_obj(ib_uverbs_srq_idr, srq_handle, context, 0); @@ -325,6 +336,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file, INIT_LIST_HEAD(ucontext-ah_list); INIT_LIST_HEAD(ucontext-xrcd_list); INIT_LIST_HEAD(ucontext-rule_list); + INIT_LIST_HEAD(ucontext-dct_list); ucontext-closing = 0; resp.num_comp_vectors = file-device-num_comp_vectors; @@ -1990,86 +2002,79 @@ static int modify_qp_mask(enum ib_qp_type qp_type, int mask) } } -ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file, - const char __user *buf, int in_len, - int out_len) +static ssize_t modify_qp(struct ib_uverbs_file *file, +struct ib_uverbs_modify_qp_ex *cmd, +struct ib_udata *udata) { - struct ib_uverbs_modify_qp cmd; - struct ib_udataudata; struct ib_qp *qp; struct ib_qp_attr *attr; intret; - if (copy_from_user(cmd, buf, sizeof cmd)) - return -EFAULT; - - INIT_UDATA(udata, buf + sizeof cmd, NULL, in_len - sizeof cmd, - out_len); - attr = kmalloc(sizeof *attr, GFP_KERNEL); if (!attr) return 
-ENOMEM; - qp = idr_read_qp(cmd.qp_handle, file-ucontext); + qp = idr_read_qp(cmd-qp_handle, file-ucontext); if (!qp) { ret = -EINVAL; goto out; } - attr-qp_state= cmd.qp_state; - attr-cur_qp_state= cmd.cur_qp_state; - attr-path_mtu= cmd.path_mtu; - attr-path_mig_state = cmd.path_mig_state; - attr-qkey= cmd.qkey;
[PATCH for-next 5/5] mlx5_core: Update mlx5_command_str with DC commands
Add support for DC commands and make a few other minor fixes.

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 368c6c5ea014..37786ea6ad6c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -289,10 +289,10 @@ const char *mlx5_command_str(int command)
 		return "TEARDOWN_HCA";

 	case MLX5_CMD_OP_ENABLE_HCA:
-		return "MLX5_CMD_OP_ENABLE_HCA";
+		return "ENABLE_HCA";

 	case MLX5_CMD_OP_DISABLE_HCA:
-		return "MLX5_CMD_OP_DISABLE_HCA";
+		return "DISABLE_HCA";

 	case MLX5_CMD_OP_QUERY_PAGES:
 		return "QUERY_PAGES";

@@ -390,6 +390,21 @@ const char *mlx5_command_str(int command)
 	case MLX5_CMD_OP_RESIZE_SRQ:
 		return "RESIZE_SRQ";

+	case MLX5_CMD_OP_CREATE_DCT:
+		return "CREATE_DCT";
+
+	case MLX5_CMD_OP_DESTROY_DCT:
+		return "DESTROY_DCT";
+
+	case MLX5_CMD_OP_DRAIN_DCT:
+		return "DRAIN_DCT";
+
+	case MLX5_CMD_OP_QUERY_DCT:
+		return "QUERY_DCT";
+
+	case MLX5_CMD_OP_ARM_DCT_FOR_KEY_VIOLATION:
+		return "ARM_DCT";
+
 	case MLX5_CMD_OP_ALLOC_PD:
 		return "ALLOC_PD";

@@ -415,7 +430,7 @@ const char *mlx5_command_str(int command)
 		return "DEALLOC_XRCD";

 	case MLX5_CMD_OP_ACCESS_REG:
-		return "MLX5_CMD_OP_ACCESS_REG";
+		return "ACCESS_REG";

 	default:
 		return "unknown command opcode";
	}
-- 
2.1.2
[PATCH for-next 4/5] mlx5_ib: Add support for DC
Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/infiniband/hw/mlx5/main.c| 19 +++ drivers/infiniband/hw/mlx5/mlx5_ib.h | 24 +++ drivers/infiniband/hw/mlx5/qp.c | 289 ++- drivers/infiniband/hw/mlx5/user.h| 41 + 4 files changed, 370 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 3c6fa99c4256..c805385d878d 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -283,6 +283,11 @@ static int mlx5_ib_query_device(struct ib_device *ibdev, props-max_total_mcast_qp_attach = props-max_mcast_qp_attach * props-max_mcast_grp; props-max_map_per_fmr = INT_MAX; /* no limit in ConnectIB */ + if (gen-flags MLX5_DEV_CAP_FLAG_DCT) { + props-device_cap_flags |= IB_DEVICE_DC_TRANSPORT; + props-dc_rd_req = 1 gen-log_max_ra_req_dc; + props-dc_rd_res = 1 gen-log_max_ra_res_dc; + } out: kfree(in_mad); @@ -1405,6 +1410,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev) dev-ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list; dev-ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list; dev-ib_dev.check_mr_status = mlx5_ib_check_mr_status; + dev-ib_dev.uverbs_ex_cmd_mask |= + (1ull IB_USER_VERBS_EX_CMD_MODIFY_QP); if (mdev-caps.gen.flags MLX5_DEV_CAP_FLAG_XRC) { dev-ib_dev.alloc_xrcd = mlx5_ib_alloc_xrcd; @@ -1417,6 +1424,18 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev) dev-ib_dev.uverbs_ex_cmd_mask |= (1ull IB_USER_VERBS_EX_CMD_QUERY_DEVICE); + if (mdev-caps.gen.flags MLX5_DEV_CAP_FLAG_DCT) { + dev-ib_dev.create_dct = mlx5_ib_create_dct; + dev-ib_dev.destroy_dct = mlx5_ib_destroy_dct; + dev-ib_dev.query_dct = mlx5_ib_query_dct; + dev-ib_dev.arm_dct = mlx5_ib_arm_dct; + dev-ib_dev.uverbs_ex_cmd_mask |= + (1ull IB_USER_VERBS_EX_CMD_CREATE_DCT) | + (1ull IB_USER_VERBS_EX_CMD_DESTROY_DCT) | + (1ull IB_USER_VERBS_EX_CMD_QUERY_DCT)| + (1ull IB_USER_VERBS_EX_CMD_ARM_DCT); + } + err = init_node_data(dev); if (err) goto err_eqs; diff --git 
a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 386780f0d1e1..5a78f2c60867 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -194,6 +194,11 @@ struct mlx5_ib_qp { boolsignature_en; }; +struct mlx5_ib_dct { + struct ib_dct ibdct; + struct mlx5_core_dctmdct; +}; + struct mlx5_ib_cq_buf { struct mlx5_buf buf; struct ib_umem *umem; @@ -444,6 +449,16 @@ static inline struct mlx5_ib_fast_reg_page_list *to_mfrpl(struct ib_fast_reg_pag return container_of(ibfrpl, struct mlx5_ib_fast_reg_page_list, ibfrpl); } +static inline struct mlx5_ib_dct *to_mibdct(struct mlx5_core_dct *mdct) +{ + return container_of(mdct, struct mlx5_ib_dct, mdct); +} + +static inline struct mlx5_ib_dct *to_mdct(struct ib_dct *ibdct) +{ + return container_of(ibdct, struct mlx5_ib_dct, ibdct); +} + struct mlx5_ib_ah { struct ib_ahibah; struct mlx5_av av; @@ -482,6 +497,8 @@ struct ib_qp *mlx5_ib_create_qp(struct ib_pd *pd, struct ib_udata *udata); int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, struct ib_udata *udata); +int mlx5_ib_modify_qp_ex(struct ib_qp *ibqp, struct ib_qp_attr *attr, +int attr_mask, struct ib_udata *udata); int mlx5_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr, int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr); int mlx5_ib_destroy_qp(struct ib_qp *qp); @@ -524,6 +541,13 @@ struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device *ibdev, struct ib_ucontext *context, struct ib_udata *udata); int mlx5_ib_dealloc_xrcd(struct ib_xrcd *xrcd); +struct ib_dct *mlx5_ib_create_dct(struct ib_pd *pd, + struct ib_dct_init_attr *attr, + struct ib_udata *uhw); +int mlx5_ib_destroy_dct(struct ib_dct *dct, struct ib_udata *uhw); +int mlx5_ib_query_dct(struct ib_dct *dct, struct ib_dct_attr *attr, + struct ib_udata *uhw); +int mlx5_ib_arm_dct(struct ib_dct *dct, struct ib_udata *uhw); int mlx5_vector2eqn(struct mlx5_ib_dev *dev, int vector, int *eqn, int *irqn); int 
mlx5_ib_get_buf_offset(u64 addr, int page_shift, u32 *offset); int
Re: [PATCH for-next 1/5] IB/core: Add DC transport support
On 11/6/2014 5:52 PM, Eli Cohen wrote:

> ib_modify_qp_ex - is an extension to ib_modify_qp which allows to pass
> the 64 bit DC key.

We don't add such a new kernel verb; probably just a leftover in the change-log from an earlier internal version, right?
Re: [PATCHv2 net-next 0/3] RDMA/cxgb4,cxgb4vf,cxgb4i,csiostor: Cleanup macros
On Wed, Nov 05, 2014 at 14:54:43 -0500, David Miller wrote:

> From: Hariprasad Shenai haripra...@chelsio.com
> Date: Tue, 4 Nov 2014 08:20:54 +0530
>
>> It's not really the hardware which generates these hardware constant
>> symbolic macros/register defines of course, it's scripts developed by
>> the hardware team. Various patches have ended up changing the style of
>> the symbolic macros/register defines, and some of them used the
>> macros/register defines that match the output of the script from the
>> hardware team.
>
> We've told you that we don't care what format your internal whatever
> uses for these macros.
>
> We have standards, tastes, and desires and reasons for naming macros
> in a certain way in upstream kernel code.
>
> I consider it flat out unacceptable to use macros with one letter
> prefixes like S_. You simply should not do this.

Okay. We'll clean up all of the macros to match the files' original style. We do need to change the sense of the *_MASK macros, since they don't match how we use them as field tokens.

Also, the *_SHIFT, *_MASK and *_GET names are sucking up space and making lines wrap unnecessarily, creating readability problems. Can we change these to *_S, *_M and *_G? E.g.:

-#define INGPADBOUNDARY_MASK  0x0070U
-#define INGPADBOUNDARY_SHIFT 4
-#define INGPADBOUNDARY(x)    ((x) << INGPADBOUNDARY_SHIFT)
-#define INGPADBOUNDARY_GET(x) (((x) & INGPADBOUNDARY_MASK) >> \
-			       INGPADBOUNDARY_SHIFT)
+#define INGPADBOUNDARY_M     0x0007U
+#define INGPADBOUNDARY_S     4
+#define INGPADBOUNDARY(x)    ((x) << INGPADBOUNDARY_S)
+#define INGPADBOUNDARY_G(x)  (((x) >> INGPADBOUNDARY_S) & \
+			      INGPADBOUNDARY_M)

Thanks,
Hari
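A quick sanity check of the proposed *_S/*_M/*_G convention (note that _M is the field mask after shifting down, unlike the old pre-shifted *_MASK). Values mirror the INGPADBOUNDARY example above; the exact macro spellings are only illustrative:

```c
/* _M masks the field once it has been shifted down to bit 0; the value
 * macro shifts a field up into register position; _G extracts the field
 * from a full register value. */
#define INGPADBOUNDARY_M    0x7U
#define INGPADBOUNDARY_S    4
#define INGPADBOUNDARY(x)   ((x) << INGPADBOUNDARY_S)
#define INGPADBOUNDARY_G(x) (((x) >> INGPADBOUNDARY_S) & INGPADBOUNDARY_M)
```

With the old convention, `GET` needed the pre-shifted mask `0x70` and two operations in the "wrong" order; the new `_G` form composes naturally with other fields packed into the same register.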
NFSoRDMA bi-weekly meeting minutes (11/6)
Attendees:

Jeff Becker (NASA)
Wendy Cheng (Intel)
Rupert Dance (Soft Forge)
Steve Dickson (Red Hat)
Chuck Lever (Oracle)
Doug Ledford (Red Hat)
Shirley Ma (Oracle)
Sachin Prabhu (Red Hat)
Devesh Sharma (Emulex)
Anna Schumaker (NetApp)
Steve Wise (OpenGridComputing, Chelsio)

Yan Burman (Mellanox) missed the call because of the daylight time change. :(

Moderator: Shirley Ma (Oracle)

The NFSoRDMA developers' bi-weekly meeting helps organize the NFSoRDMA development and test effort across different resources, to speed up NFSoRDMA upstream kernel work and the development of NFSoRDMA diagnosing/debugging tools. Hopefully the quality of NFSoRDMA upstream patches can be improved by being tested with a quorum of HW vendors.

Today's meeting notes:

1. OFA Interop event (Rupert)
The Interop event went pretty well. The tests covered IB, RoCE and iWARP with different vendors' HW and the upstream/OFED stacks. NFSoRDMA over IB was included in this test event; however, NFSoRDMA over RoCE could not be tested since the modules were not in the stack yet. The detailed report will come in a few weeks.

2. Upstream bugs (Chuck, Anna, Shirley)
The 3.17 kernel has a bug in tearing down connections. This bug was hit consistently when enabling multiple EQs in xprtrdma while Shirley ran a fio multi-threaded random read/write workload. Chuck has a nice patch for this bug, and Shirley has validated the fix by running the fio stress overnight. Anna will check the possibility of pushing it to the stable tree, since it blocks multi-threaded NFSoRDMA workloads. Here is the link to the bug report: https://bugzilla.linux-nfs.org/show_bug.cgi?id=276

3. Performance test and analysis tools (Sachin, Chuck, Wendy, Shirley, SteveW)
Discussed several tools for analyzing NFSoRDMA performance, both latency and bandwidth:
-- SystemTap: Sachin is starting to look at how to use SystemTap; it will take some time to study the tool and create the probe scripts for the NFS, RPC and xprtrdma layers.
-- Ftrace: enable trace modules and functions to report execution-flow latency.
-- perf: report per-API execution-flow latency and CPU usage.
-- /proc/self/mountstats: reports total execution time, RTT and wait time for each RPC. The execution-time latency comes from wake-up and wait, which depends on how busy the system is; the RPC RTT latency itself is reasonable.

NFSoRDMA performance depends on both the implementation and the protocol. We don't yet know how much of the performance gap comes from the implementation versus the protocol. RPC seems slow; pNFS might perform better by supporting multiple queue pairs. Chuck will increase the RPC credit limit to see how much performance is gained there. Our performance goal is to look at the implementation issues first, then the protocols.

Feel free to reply here with anything missing or incorrect. See you on Nov. 20th.

10/23/2014
@7:30am PT, @8:30am MT, @9:30am CT, @10:30am ET, @9:00pm Bangalore, @6:30pm Israel
Duration: 1 hour
Call-in number:
Israel: +972 37219638
Bangalore: +91 8039890080 (180030109800)
France Colombes: +33 1 5760 +33 176728936
US: 8666824770, 408-7744073
Conference Code: 2308833
Passcode: 63767362 (it's NFSoRDMA, in case you couldn't remember)

Thanks everyone for joining the call and providing valuable inputs/work to the community to make NFSoRDMA better.

Cheers,
Shirley
Re: [PATCH net 0/2] mlx5_core fixes for 3.18
From: Eli Cohen e...@dev.mellanox.co.il
Date: Thu, 6 Nov 2014 12:51:20 +0200

> the following two patches fix races that could lead to kernel panic in
> some cases.

Series applied, thanks Eli.
Re: [PATCHv2 net-next 0/3] RDMA/cxgb4,cxgb4vf,cxgb4i,csiostor: Cleanup macros
From: Hariprasad S haripra...@chelsio.com Date: Thu, 6 Nov 2014 21:45:10 +0530 On Wed, Nov 05, 2014 at 14:54:43 -0500, David Miller wrote: From: Hariprasad Shenai haripra...@chelsio.com Date: Tue, 4 Nov 2014 08:20:54 +0530 It's not really the hardware which generates these hardware constant symbolic macros/register defines of course, it's scripts developed by the hardware team. Various patches have ended up changing the style of the symbolic macros/register defines and some of them used the macros/register defines that matches the output of the script from the hardware team. We've told you that we don't care what format your internal whatever uses for these macros. We have standards, tastes, and desires and reasons for naming macros in a certain way in upstream kernel code. I consider it flat out unacceptable to use macros with one letter prefixes like S_. You simply should not do this. Okay. We’ll clean up all of the macros to match the files' original style. We do need to change the sense of the *_MASK macros since they don’t match how we use them as field tokens. Also the *_SHIFT, *_MASK and *_GET names are sucking up space and making lines wrap unnecessarily, creating readability problems. Can we change these to *_S, *_M and *_G? E.g.: That's fine.
[PATCH v2 1/6] ib/mad: Add function to support format specifiers for node description
From: Ira Weiny ira.we...@intel.com

ib_build_node_desc - prints src node description into dest while mapping format specifiers

Specifiers supported:
	%h	system hostname
	%d	device name

Define a default Node Description format to be %h %d

Original work done by Mike Heinz. The function signature is generic to support some devices which are not processing an ib_smp object when calling this function.

Reviewed-by: John Fleck john.fl...@intel.com
Reviewed-by: Michael Heinz michael.william.he...@intel.com
Reviewed-by: Mike Marciniszyn mike.marcinis...@intel.com
Signed-off-by: Ira Weiny ira.we...@intel.com
---
Changes from V1:
	remove unnecessary ib_smi include

 drivers/infiniband/core/mad.c | 37 +
 include/rdma/ib_mad.h         | 17 +
 include/rdma/ib_verbs.h       |  6 --
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 74c30f4..93cf8a0 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -39,6 +39,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/utsname.h>

 #include <rdma/ib_cache.h>
 #include "mad_priv.h"
@@ -996,6 +997,42 @@ int ib_get_mad_data_offset(u8 mgmt_class)
 }
 EXPORT_SYMBOL(ib_get_mad_data_offset);

+void ib_build_node_desc(char *dest, char *src, int dest_len,
+			struct ib_device *dev)
+{
+	char *end = dest + dest_len-1;
+	char *field;
+
+	while (*src && (dest < end)) {
+		if (*src != '%') {
+			*dest++ = *src++;
+		} else {
+			src++;
+			switch (*src) {
+			case 'h':
+				field = init_utsname()->nodename;
+				src++;
+				while (*field && (*field != '.') && (dest < end))
+					*dest++ = *field++;
+				break;
+			case 'd':
+				field = dev->name;
+				src++;
+				while (*field && (dest < end))
+					*dest++ = *field++;
+				break;
+			default:
+				src++;
+			}
+		}
+	}
+	if (dest < end)
+		*dest = 0;
+	else
+		*end = 0;
+}
+EXPORT_SYMBOL(ib_build_node_desc);
+
 int ib_is_mad_class_rmpp(u8 mgmt_class)
 {
 	if ((mgmt_class == IB_MGMT_CLASS_SUBN_ADM) ||
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 9bb99e9..975642e 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -677,4 +677,21 @@
 */
 int ib_mad_kernel_rmpp_agent(struct ib_mad_agent *agent);

+#define IB_DEFAULT_ND_FORMAT "%h %d"
+
+/**
+ * ib_build_node_desc - prints src node description into dest while mapping
+ * format specifiers
+ *
+ * Specifiers supported:
+ *	%h	system hostname
+ *	%d	device name
+ *
+ * @dest: destination buffer
+ * @src: source buffer
+ * @dest_len: destination buffer length
+ * @dev: ib_device
+ */
+void ib_build_node_desc(char *dest, char *src, int dest_len,
+			struct ib_device *dev);
+
 #endif /* IB_MAD_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 470a011..f3ec6de 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -55,6 +55,8 @@

 extern struct workqueue_struct *ib_wq;

+#define IB_DEVICE_DESC_MAX 64
+
 union ib_gid {
	u8	raw[16];
	struct {
@@ -351,7 +353,7 @@ enum ib_device_modify_flags {

 struct ib_device_modify {
	u64	sys_image_guid;
-	char	node_desc[64];
+	char	node_desc[IB_DEVICE_DESC_MAX];
 };

 enum ib_port_modify_flags {
@@ -1625,7 +1627,7 @@ struct ib_device {
	u64	uverbs_cmd_mask;
	u64	uverbs_ex_cmd_mask;

-	char	node_desc[64];
+	char	node_desc[IB_DEVICE_DESC_MAX];
	__be64	node_guid;
	u32	local_dma_lkey;
	u8	node_type;
-- 
1.8.2
RE: [PATCH v2 1/6] ib/mad: Add function to support format specifiers for node description
+void ib_build_node_desc(char *dest, char *src, int dest_len,
+			struct ib_device *dev)
+{
+	char *end = dest + dest_len - 1;
+	char *field;
+
+	while (*src && (dest < end)) {
+		if (*src != '%') {
+			*dest++ = *src++;
+		} else {
+			src++;
+			switch (*src) {
+			case 'h':
+				field = init_utsname()->nodename;
+				src++;
+				while (*field && (*field != '.') && (dest < end))
+					*dest++ = *field++;
+				break;

Indentation is off.

+			case 'd':
+				field = dev->name;
+				src++;
+				while (*field && (dest < end))
+					*dest++ = *field++;
+				break;
+			default:
+				src++;
+			}
+		}

src++ is called in every case and could be moved outside the switch.

+	}
+	if (dest < end)
+		*dest = 0;
+	else
+		*end = 0;

*dest = '\0'; should be sufficient in all cases.
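To make the suggested cleanup concrete, here is a userspace sketch of the expansion logic with the review comments applied: the `src++` that every case performs is hoisted out of the switch, and a single `'\0'` store terminates the buffer. `build_node_desc`, `hostname` and `devname` are hypothetical stand-ins for the kernel function and for `init_utsname()->nodename` / `dev->name`; this is a model for discussion, not the driver code.

```c
#include <assert.h>
#include <string.h>

/* Userspace model of ib_build_node_desc() with the review suggestions
 * applied.  dest_len must be >= 1 so the terminating '\0' always fits. */
void build_node_desc(char *dest, const char *src, int dest_len,
		     const char *hostname, const char *devname)
{
	char *end = dest + dest_len - 1;	/* last writable byte */
	const char *field;

	while (*src && dest < end) {
		if (*src != '%') {
			*dest++ = *src++;	/* literal character */
			continue;
		}
		src++;				/* skip the '%' */
		if (!*src)			/* format ended at '%' */
			break;
		switch (*src) {
		case 'h':			/* hostname up to first '.' */
			for (field = hostname;
			     *field && *field != '.' && dest < end; field++)
				*dest++ = *field;
			break;
		case 'd':			/* full device name */
			for (field = devname; *field && dest < end; field++)
				*dest++ = *field;
			break;
		default:			/* unknown specifier: drop it */
			break;
		}
		src++;		/* consume specifier char, common to all cases */
	}
	*dest = '\0';	/* dest <= end always holds here, so this is in bounds */
}
```

The single terminating store works because the loop never advances `dest` past `end`, which is itself the last byte of the buffer.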
[PATCHv3 net-next 0/3] RDMA/cxgb4,cxgb4vf,cxgb4i,csiostor: Cleanup macros
Hi,

This series moves the debugfs code into a new file, cxgb4_debugfs.c, and cleans up register defines/macros. Various patches have ended up changing the style of the symbolic register defines/macros, and some of them used defines/macros that match the output of the script from the hardware team. As a result, the current kernel.org files are a mix of different macro styles. Since these register defines/macros are used by five different drivers, a few patch series have ended up adding duplicate entries with different styles. This has made the register define/macro files a complete mess, and we want to make them clean and consistent. We will post a few more series covering the remaining macros so that they all follow the same style.

The patch series is created against the 'net-next' tree and includes patches for the cxgb4, cxgb4vf, iw_cxgb4, csiostor and cxgb4i drivers. We have included all the maintainers of the respective drivers. Kindly review the changes and let us know in case of any review comments.
Thanks

V3: Use suffix instead of prefix for macros/register defines
V2: Changed the description and cover-letter content to answer David Miller's question

Hariprasad Shenai (3):
  cxgb4: Add cxgb4_debugfs.c, move all debugfs code to new file
  cxgb4: Cleanup macros so they follow the same style and look consistent
  cxgb4: Cleanup macros so they follow the same style and look consistent, part 2

 drivers/infiniband/hw/cxgb4/cm.c                   |  56 +++---
 drivers/infiniband/hw/cxgb4/cq.c                   |   8 +-
 drivers/infiniband/hw/cxgb4/mem.c                  |  14 +-
 drivers/infiniband/hw/cxgb4/qp.c                   |  26 ++--
 drivers/net/ethernet/chelsio/cxgb4/Makefile        |   1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |   3 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.h     |   6 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 158 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h |  52 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 173 +---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h     |  15 +-
 drivers/net/ethernet/chelsio/cxgb4/sge.c           |  32 ++--
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c         | 127 +++---
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h       |  72 +++--
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h      | 142
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c         |  32 ++--
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h |   2 +-
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c     | 150 +-
 drivers/scsi/csiostor/csio_attr.c                  |   8 +-
 drivers/scsi/csiostor/csio_hw.c                    |  14 +-
 drivers/scsi/csiostor/csio_hw_t4.c                 |  15 +-
 drivers/scsi/csiostor/csio_hw_t5.c                 |  21 ++-
 drivers/scsi/csiostor/csio_init.c                  |   6 +-
 drivers/scsi/csiostor/csio_lnode.c                 |  18 +-
 drivers/scsi/csiostor/csio_mb.c                    | 172 ++--
 drivers/scsi/csiostor/csio_scsi.c                  |  24 ++--
 drivers/scsi/csiostor/csio_wr.h                    |   2 +-
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c                 |  35 ++--
 28 files changed, 816 insertions(+), 568 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h
[PATCHv3 net-next 1/3] cxgb4: Add cxgb4_debugfs.c, move all debugfs code to new file
Signed-off-by: Hariprasad Shenai haripra...@chelsio.com --- drivers/net/ethernet/chelsio/cxgb4/Makefile|1 + drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |1 + drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 158 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h | 52 +++ drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 97 + 5 files changed, 217 insertions(+), 92 deletions(-) create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c create mode 100644 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h diff --git a/drivers/net/ethernet/chelsio/cxgb4/Makefile b/drivers/net/ethernet/chelsio/cxgb4/Makefile index 1df65c9..b852807 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/Makefile +++ b/drivers/net/ethernet/chelsio/cxgb4/Makefile @@ -6,3 +6,4 @@ obj-$(CONFIG_CHELSIO_T4) += cxgb4.o cxgb4-objs := cxgb4_main.o l2t.o t4_hw.o sge.o cxgb4-$(CONFIG_CHELSIO_T4_DCB) += cxgb4_dcb.o +cxgb4-$(CONFIG_DEBUG_FS) += cxgb4_debugfs.o diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h index 3c481b2..dad1ea9 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h @@ -1085,4 +1085,5 @@ void t4_db_dropped(struct adapter *adapter); int t4_fwaddrspace_write(struct adapter *adap, unsigned int mbox, u32 addr, u32 val); void t4_sge_decode_idma_state(struct adapter *adapter, int state); +void t4_free_mem(void *addr); #endif /* __CXGB4_H__ */ diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c new file mode 100644 index 000..e86b5fe --- /dev/null +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c @@ -0,0 +1,158 @@ +/* + * This file is part of the Chelsio T4 Ethernet driver for Linux. + * + * Copyright (c) 2003-2014 Chelsio Communications, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */
+
+#include <linux/seq_file.h>
+#include <linux/debugfs.h>
+#include <linux/string_helpers.h>
+#include <linux/sort.h>
+
+#include "cxgb4.h"
+#include "t4_regs.h"
+#include "t4fw_api.h"
+#include "cxgb4_debugfs.h"
+#include "l2t.h"
+
+static ssize_t mem_read(struct file *file, char __user *buf, size_t count,
+			loff_t *ppos)
+{
+	loff_t pos = *ppos;
+	loff_t avail = file_inode(file)->i_size;
+	unsigned int mem = (uintptr_t)file->private_data & 3;
+	struct adapter *adap = file->private_data - mem;
+	__be32 *data;
+	int ret;
+
+	if (pos < 0)
+		return -EINVAL;
+	if (pos >= avail)
+		return 0;
+	if (count > avail - pos)
+		count = avail - pos;
+
+	data = t4_alloc_mem(count);
+	if (!data)
+		return -ENOMEM;
+
+	spin_lock(&adap->win0_lock);
+	ret = t4_memory_rw(adap, 0, mem, pos, count, data, T4_MEMORY_READ);
+	spin_unlock(&adap->win0_lock);
+	if (ret) {
+		t4_free_mem(data);
+		return ret;
+	}
+	ret = copy_to_user(buf, data, count);
+
+	t4_free_mem(data);
+	if (ret)
+		return -EFAULT;
+
+	*ppos = pos + count;
+	return count;
+}
+
+static const struct file_operations mem_debugfs_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = simple_open,
+	.read	 = mem_read,
+	.llseek	 = default_llseek,
+};
+
+static void add_debugfs_mem(struct adapter *adap, const char *name,
+			    unsigned int idx, unsigned int size_mb)
+{
+	struct dentry *de;
+
+	de =
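The offset/count handling at the top of mem_read() above can be modeled in plain userspace C: a negative offset is rejected, a read at or past EOF returns 0, and a request that runs past EOF is shortened. `clamp_read` is a hypothetical helper written for illustration, not part of the driver.

```c
#include <assert.h>

/* Model of the read-window clamping in mem_read(): given file size
 * `avail`, offset `pos` and requested `count`, return how many bytes
 * the handler will copy (0 at EOF, negative errno-style on error). */
long long clamp_read(long long pos, long long count, long long avail)
{
	if (pos < 0)
		return -22;		/* the kernel returns -EINVAL here */
	if (pos >= avail)
		return 0;		/* at or past EOF: nothing to read */
	if (count > avail - pos)
		count = avail - pos;	/* shorten to what is available */
	return count;
}
```

This is the standard shape of a `read()` handler for a fixed-size file: short reads are legal, and returning 0 (not an error) at EOF is what makes tools like `dd` and `cat` terminate cleanly on the debugfs node.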
[PATCHv3 net-next 2/3] cxgb4: Cleanup macros so they follow the same style and look consistent
Various patches have ended up changing the style of the symbolic register defines/macros to different styles. As a result, the current kernel.org files are a mix of different macro styles. Since these register defines/macros are used by different drivers, a few patch series have ended up adding duplicate entries with different styles. This has made the register define/macro files a complete mess, and we want to make them clean and consistent. This patch cleans up a part of it.

Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 32 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 16 +++--
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c         |  6 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h       | 72 +++-
 drivers/scsi/csiostor/csio_hw_t4.c                 | 15 ++--
 drivers/scsi/csiostor/csio_hw_t5.c                 | 21 +++---
 drivers/scsi/csiostor/csio_init.c                  |  6 +-
 7 files changed, 106 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index e86b5fe..c98a350 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -128,30 +128,30 @@ int t4_setup_debugfs(struct adapter *adap)
 			 t4_debugfs_files,
 			 ARRAY_SIZE(t4_debugfs_files));

-	i = t4_read_reg(adap, MA_TARGET_MEM_ENABLE);
-	if (i & EDRAM0_ENABLE) {
-		size = t4_read_reg(adap, MA_EDRAM0_BAR);
-		add_debugfs_mem(adap, "edc0", MEM_EDC0, EDRAM_SIZE_GET(size));
+	i = t4_read_reg(adap, MA_TARGET_MEM_ENABLE_A);
+	if (i & EDRAM0_ENABLE_F) {
+		size = t4_read_reg(adap, MA_EDRAM0_BAR_A);
+		add_debugfs_mem(adap, "edc0", MEM_EDC0, EDRAM0_SIZE_G(size));
 	}
-	if (i & EDRAM1_ENABLE) {
-		size = t4_read_reg(adap, MA_EDRAM1_BAR);
-		add_debugfs_mem(adap, "edc1", MEM_EDC1, EDRAM_SIZE_GET(size));
+	if (i & EDRAM1_ENABLE_F) {
+		size = t4_read_reg(adap, MA_EDRAM1_BAR_A);
+		add_debugfs_mem(adap, "edc1", MEM_EDC1, EDRAM1_SIZE_G(size));
 	}
 	if (is_t4(adap->params.chip)) {
-		size = t4_read_reg(adap, MA_EXT_MEMORY_BAR);
-		if (i & EXT_MEM_ENABLE)
+		size = t4_read_reg(adap, MA_EXT_MEMORY_BAR_A);
+		if (i & EXT_MEM_ENABLE_F)
 			add_debugfs_mem(adap, "mc", MEM_MC,
-					EXT_MEM_SIZE_GET(size));
+					EXT_MEM_SIZE_G(size));
 	} else {
-		if (i & EXT_MEM_ENABLE) {
-			size = t4_read_reg(adap, MA_EXT_MEMORY_BAR);
+		if (i & EXT_MEM0_ENABLE_F) {
+			size = t4_read_reg(adap, MA_EXT_MEMORY0_BAR_A);
 			add_debugfs_mem(adap, "mc0", MEM_MC0,
-					EXT_MEM_SIZE_GET(size));
+					EXT_MEM0_SIZE_G(size));
 		}
-		if (i & EXT_MEM1_ENABLE) {
-			size = t4_read_reg(adap, MA_EXT_MEMORY1_BAR);
+		if (i & EXT_MEM1_ENABLE_F) {
+			size = t4_read_reg(adap, MA_EXT_MEMORY1_BAR_A);
 			add_debugfs_mem(adap, "mc1", MEM_MC1,
-					EXT_MEM_SIZE_GET(size));
+					EXT_MEM1_SIZE_G(size));
 		}
 	}
 	return 0;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 172f68b..a2d6e50 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3802,7 +3802,7 @@ int cxgb4_read_tpte(struct net_device *dev, u32 stag, __be32 *tpte)
 {
 	struct adapter *adap;
 	u32 offset, memtype, memaddr;
-	u32 edc0_size, edc1_size, mc0_size, mc1_size;
+	u32 edc0_size, edc1_size, mc0_size, mc1_size, size;
 	u32 edc0_end, edc1_end, mc0_end, mc1_end;
 	int ret;

@@ -3816,9 +3816,12 @@ int cxgb4_read_tpte(struct net_device *dev, u32 stag, __be32 *tpte)
 	 * and EDC1. Some cards will have neither MC0 nor MC1, most cards have
 	 * MC0, and some have both MC0 and MC1.
 	 */
-	edc0_size = EDRAM_SIZE_GET(t4_read_reg(adap, MA_EDRAM0_BAR)) << 20;
-	edc1_size = EDRAM_SIZE_GET(t4_read_reg(adap, MA_EDRAM1_BAR)) << 20;
-	mc0_size = EXT_MEM_SIZE_GET(t4_read_reg(adap, MA_EXT_MEMORY_BAR)) << 20;
+	size = t4_read_reg(adap, MA_EDRAM0_BAR_A);
+	edc0_size = EDRAM0_SIZE_G(size) << 20;
+	size = t4_read_reg(adap, MA_EDRAM1_BAR_A);
+	edc1_size = EDRAM1_SIZE_G(size) << 20;
+	size = t4_read_reg(adap, MA_EXT_MEMORY0_BAR_A);
+	mc0_size = EXT_MEM0_SIZE_G(size) << 20;
[PATCHv3 net-next 3/3] cxgb4: Cleanup macros so they follow the same style and look consistent, part 2
Various patches have ended up changing the style of the symbolic register defines/macros to different styles. As a result, the current kernel.org files are a mix of different macro styles. Since these register defines/macros are used by different drivers, a few patch series have ended up adding duplicate entries with different styles. This has made the register define/macro files a complete mess, and we want to make them clean and consistent. This patch cleans up a part of it.

Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/infiniband/hw/cxgb4/cm.c                   |  56
 drivers/infiniband/hw/cxgb4/cq.c                   |   8 +-
 drivers/infiniband/hw/cxgb4/mem.c                  |  14 +-
 drivers/infiniband/hw/cxgb4/qp.c                   |  26 ++--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.h     |   6 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    |  60
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h     |  15 +-
 drivers/net/ethernet/chelsio/cxgb4/sge.c           |  32 ++--
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c         | 121 +++---
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h      | 142 +
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c         |  32 ++--
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h |   2 +-
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c     | 150 +-
 drivers/scsi/csiostor/csio_attr.c                  |   8 +-
 drivers/scsi/csiostor/csio_hw.c                    |  14 +-
 drivers/scsi/csiostor/csio_lnode.c                 |  18 +-
 drivers/scsi/csiostor/csio_mb.c                    | 172 ++--
 drivers/scsi/csiostor/csio_scsi.c                  |  24 ++--
 drivers/scsi/csiostor/csio_wr.h                    |   2 +-
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c                 |  35 ++--
 21 files changed, 509 insertions(+), 430 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index fb61f66..a07d8e1 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -472,10 +472,10 @@ static void send_flowc(struct c4iw_ep *ep, struct sk_buff *skb)
 	skb = get_skb(skb, flowclen, GFP_KERNEL);
 	flowc = (struct fw_flowc_wr *)__skb_put(skb, flowclen);

-	flowc->op_to_nparams = cpu_to_be32(FW_WR_OP(FW_FLOWC_WR) |
-					   FW_FLOWC_WR_NPARAMS(8));
-	flowc->flowid_len16 = cpu_to_be32(FW_WR_LEN16(DIV_ROUND_UP(flowclen,
-					  16)) | FW_WR_FLOWID(ep->hwtid));
+	flowc->op_to_nparams = cpu_to_be32(FW_WR_OP_V(FW_FLOWC_WR) |
+					   FW_FLOWC_WR_NPARAMS_V(8));
+	flowc->flowid_len16 = cpu_to_be32(FW_WR_LEN16_V(DIV_ROUND_UP(flowclen,
+					  16)) | FW_WR_FLOWID_V(ep->hwtid));

 	flowc->mnemval[0].mnemonic = FW_FLOWC_MNEM_PFNVFN;
 	flowc->mnemval[0].val = cpu_to_be32(FW_PFVF_CMD_PFN
@@ -803,16 +803,16 @@ static void send_mpa_req(struct c4iw_ep *ep, struct sk_buff *skb,
 	req = (struct fw_ofld_tx_data_wr *)skb_put(skb, wrlen);
 	memset(req, 0, wrlen);
 	req->op_to_immdlen = cpu_to_be32(
-		FW_WR_OP(FW_OFLD_TX_DATA_WR) |
-		FW_WR_COMPL(1) |
-		FW_WR_IMMDLEN(mpalen));
+		FW_WR_OP_V(FW_OFLD_TX_DATA_WR) |
+		FW_WR_COMPL_F |
+		FW_WR_IMMDLEN_V(mpalen));
 	req->flowid_len16 = cpu_to_be32(
-		FW_WR_FLOWID(ep->hwtid) |
-		FW_WR_LEN16(wrlen >> 4));
+		FW_WR_FLOWID_V(ep->hwtid) |
+		FW_WR_LEN16_V(wrlen >> 4));
 	req->plen = cpu_to_be32(mpalen);
 	req->tunnel_to_proxy = cpu_to_be32(
-		FW_OFLD_TX_DATA_WR_FLUSH(1) |
-		FW_OFLD_TX_DATA_WR_SHOVE(1));
+		FW_OFLD_TX_DATA_WR_FLUSH_F |
+		FW_OFLD_TX_DATA_WR_SHOVE_F);

 	mpa = (struct mpa_message *)(req + 1);
 	memcpy(mpa->key, MPA_KEY_REQ, sizeof(mpa->key));
@@ -897,16 +897,16 @@ static int send_mpa_reject(struct c4iw_ep *ep, const void *pdata, u8 plen)
 	req = (struct fw_ofld_tx_data_wr *)skb_put(skb, wrlen);
 	memset(req, 0, wrlen);
 	req->op_to_immdlen = cpu_to_be32(
-		FW_WR_OP(FW_OFLD_TX_DATA_WR) |
-		FW_WR_COMPL(1) |
-		FW_WR_IMMDLEN(mpalen));
+		FW_WR_OP_V(FW_OFLD_TX_DATA_WR) |
+		FW_WR_COMPL_F |
+		FW_WR_IMMDLEN_V(mpalen));
 	req->flowid_len16 = cpu_to_be32(
-		FW_WR_FLOWID(ep->hwtid) |
-		FW_WR_LEN16(wrlen >> 4));
+		FW_WR_FLOWID_V(ep->hwtid) |
+		FW_WR_LEN16_V(wrlen >> 4));
 	req->plen = cpu_to_be32(mpalen);
 	req->tunnel_to_proxy = cpu_to_be32(
-		FW_OFLD_TX_DATA_WR_FLUSH(1) |
-		FW_OFLD_TX_DATA_WR_SHOVE(1));
+
[PATCH 1/2] iw_cxgb4: Fixes locking issue in process_mpa_request
=============================================
[ INFO: possible recursive locking detected ]
3.17.0+ #3 Tainted: G E
---------------------------------------------
kworker/u64:3/299 is trying to acquire lock:
 (&epc->mutex){+.+.+.}, at: [a074e07a] process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]

but task is already holding lock:
 (&epc->mutex){+.+.+.}, at: [a074e34e] rx_data+0x9e/0x1f0 [iw_cxgb4]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
  lock(&epc->mutex);
  lock(&epc->mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u64:3/299:
 #0: ("%s""iw_cxgb4"){.+.+.+}, at: [8106f14d] process_one_work+0x13d/0x4d0
 #1: (skb_work){+.+.+.}, at: [8106f14d] process_one_work+0x13d/0x4d0
 #2: (&epc->mutex){+.+.+.}, at: [a074e34e] rx_data+0x9e/0x1f0 [iw_cxgb4]

stack backtrace:
CPU: 2 PID: 299 Comm: kworker/u64:3 Tainted: G E 3.17.0+ #3
Hardware name: Dell Inc. PowerEdge T110/0X744K, BIOS 1.2.1 01/28/2010
Workqueue: iw_cxgb4 process_work [iw_cxgb4]
 8800b91593d0 8800b8a2f9f8 815df107 0001
 8800b9158750 8800b8a2fa28 8109f0e2 8800bb768a00
 8800b91593d0 8800b9158750 8800b8a2fa88
Call Trace:
 [815df107] dump_stack+0x49/0x62
 [8109f0e2] print_deadlock_bug+0xf2/0x100
 [810a0f04] validate_chain+0x454/0x700
 [810a1574] __lock_acquire+0x3c4/0x580
 [a074e07a] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
 [810a17cc] lock_acquire+0x9c/0x110
 [a074e07a] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
 [815e111b] mutex_lock_nested+0x4b/0x360
 [a074e07a] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
 [810c181a] ? del_timer_sync+0xaa/0xd0
 [810c1770] ? try_to_del_timer_sync+0x70/0x70
 [a074e07a] process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
 [a074a3ec] ? update_rx_credits+0xec/0x140 [iw_cxgb4]
 [a074e381] rx_data+0xd1/0x1f0 [iw_cxgb4]
 [8109ff23] ? mark_held_locks+0x73/0xa0
 [815e4b90] ? _raw_spin_unlock_irqrestore+0x40/0x70
 [810a020d] ? trace_hardirqs_on_caller+0xfd/0x1c0
 [810a02dd] ? trace_hardirqs_on+0xd/0x10
 [a074c931] process_work+0x51/0x80 [iw_cxgb4]
 [8106f1c8] process_one_work+0x1b8/0x4d0
 [8106f14d] ? process_one_work+0x13d/0x4d0
 [8106f600] worker_thread+0x120/0x3c0
 [8106f4e0] ? process_one_work+0x4d0/0x4d0
 [81074a0e] kthread+0xde/0x100
 [815e4b40] ? _raw_spin_unlock_irq+0x30/0x40
 [81074930] ? __init_kthread_worker+0x70/0x70
 [815e512c] ret_from_fork+0x7c/0xb0
 [81074930] ? __init_kthread_worker+0x70/0x70
===============================================

Based on original work by Steve Wise sw...@opengridcomputing.com

Signed-off-by: Steve Wise sw...@opengridcomputing.com
Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/infiniband/hw/cxgb4/cm.c | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index fb61f66..ce87fd3 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1640,7 +1640,8 @@ static void process_mpa_request(struct c4iw_ep *ep, struct sk_buff *skb)
 	__state_set(&ep->com, MPA_REQ_RCVD);

 	/* drive upcall */
-	mutex_lock(&ep->parent_ep->com.mutex);
+	mutex_lock_nested(&ep->parent_ep->com.mutex,
+			  SINGLE_DEPTH_NESTING);
 	if (ep->parent_ep->com.state != DEAD) {
 		if (connect_request_upcall(ep))
 			abort_connection(ep, skb, GFP_KERNEL);
--
1.7.1
[PATCH 0/2] iw_cxgb4: Fixes locking issue and MR limit for T4/T5 adapter
Hi,

This patch series fixes a locking issue and limits MRs to 8GB for Chelsio T4/T5 adapters.

The patch series is created against the 'infiniband' tree and includes patches for the iw_cxgb4 driver. We have included all the maintainers of the respective drivers. Kindly review the changes and let us know in case of any review comments.

Thanks

Hariprasad Shenai (2):
  iw_cxgb4: Fixes locking issue in process_mpa_request
  iw_cxgb4: limit MRs to 8GB for T4/T5 devices

 drivers/infiniband/hw/cxgb4/cm.c  |  3 ++-
 drivers/infiniband/hw/cxgb4/mem.c | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletions(-)
[PATCH 2/2] iw_cxgb4: limit MRs to 8GB for T4/T5 devices
T4/T5 hardware can't handle MRs >= 8GB due to a hardware bug. So limit registrations to 8GB for these devices.

Based on original work by Steve Wise sw...@opengridcomputing.com

Signed-off-by: Steve Wise sw...@opengridcomputing.com
Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/infiniband/hw/cxgb4/mem.c | 22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index ec7a298..d5dd3f2 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -50,6 +50,13 @@ static int inline_threshold = C4IW_INLINE_THRESHOLD;
 module_param(inline_threshold, int, 0644);
 MODULE_PARM_DESC(inline_threshold, "inline vs dsgl threshold (default=128)");

+static int mr_exceeds_hw_limits(struct c4iw_dev *dev, u64 length)
+{
+	return (is_t4(dev->rdev.lldi.adapter_type) ||
+		is_t5(dev->rdev.lldi.adapter_type)) &&
+		length >= 8*1024*1024*1024ULL;
+}
+
 static int _c4iw_write_mem_dma_aligned(struct c4iw_rdev *rdev, u32 addr,
 				       u32 len, dma_addr_t data, int wait)
 {
@@ -536,6 +543,11 @@ int c4iw_reregister_phys_mem(struct ib_mr *mr, int mr_rereg_mask,
 			return ret;
 	}

+	if (mr_exceeds_hw_limits(rhp, total_size)) {
+		kfree(page_list);
+		return -EINVAL;
+	}
+
 	ret = reregister_mem(rhp, php, &mh, shift, npages);
 	kfree(page_list);
 	if (ret)
@@ -596,6 +608,12 @@ struct ib_mr *c4iw_register_phys_mem(struct ib_pd *pd,
 		if (ret)
 			goto err;

+		if (mr_exceeds_hw_limits(rhp, total_size)) {
+			kfree(page_list);
+			ret = -EINVAL;
+			goto err;
+		}
+
 		ret = alloc_pbl(mhp, npages);
 		if (ret) {
 			kfree(page_list);
@@ -699,6 +717,10 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	php = to_c4iw_pd(pd);
 	rhp = php->rhp;
+
+	if (mr_exceeds_hw_limits(rhp, length))
+		return ERR_PTR(-EINVAL);
+
 	mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
 	if (!mhp)
 		return ERR_PTR(-ENOMEM);
--
1.7.1
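The check added by this patch is a simple predicate, and its boundary behavior (reject exactly at 8GB, and only on T4/T5 parts) can be modeled in userspace. In this sketch `mr_exceeds_hw_limits_model` is a hypothetical stand-in for the driver's `mr_exceeds_hw_limits()`, with plain booleans replacing the `is_t4()`/`is_t5()` adapter-type tests on `c4iw_dev`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MR_HW_LIMIT (8ULL * 1024 * 1024 * 1024)	/* 8GB, the T4/T5 MR ceiling */

/* Model of the patch's check: a registration is rejected only when the
 * adapter is a T4 or T5 part AND the region is 8GB or larger. */
bool mr_exceeds_hw_limits_model(bool is_t4, bool is_t5, uint64_t length)
{
	return (is_t4 || is_t5) && length >= MR_HW_LIMIT;
}
```

Note the `>=`: the commit message says the hardware cannot handle MRs of 8GB or more, so exactly 8GB is already over the limit, while 8GB minus one byte is still accepted.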