Re: [PATCH 13/15] IB/srpt: Detect session shutdown reliably
On 01/06/2016 03:39 PM, Sagi Grimberg wrote: On 05/01/2016 16:26, Bart Van Assche wrote: The Last WQE Reached event is only generated after one or more work requests have been queued on the QP associated with a session. Since session shutdown can start before any work requests have been queued, use a zero-length RDMA write to wait until a QP has been drained. Is it just me or is there (a lot) more going on here other than this patch description? Hello Sagi, I will make the patch description more detailed. Sorry if some of this code is hard to follow but that's because of the high level of concurrency in the SRP target driver. Some time ago I documented how session management in the SCST ib_srpt driver works. This driver follows the same model. These notes can be found here: http://sourceforge.net/p/scst/svn/HEAD/tree/trunk/srpt/session-management.txt. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/15] IB/srpt: Fix srpt_handle_cmd() error paths
On 01/06/2016 03:31 PM, Sagi Grimberg wrote: On 05/01/2016 16:25, Bart Van Assche wrote: @@ -1518,8 +1517,7 @@ static int srpt_handle_cmd(struct srpt_rdma_ch *ch, if (srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &data_len)) { pr_err("0x%llx: parsing SRP descriptor table failed.\n", srp_cmd->tag); -ret = TCM_INVALID_CDB_FIELD; -goto send_sense; +goto put; Do you still need to call target_put_sess_cmd() in this failure? Good catch. I will update this patch before I repost this patch series. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/15] IB/srpt: Fix srpt_close_session()
On 01/06/2016 03:21 PM, Sagi Grimberg wrote: Avoid that srpt_close_session() waits if it doesn't have to wait. Can you explain when it doesn't have to wait? is it possible that srpt_release_channel_work() was already triggered? isn't that a problem? Hello Sagi, The target core can decide to shut down an RDMA channel or a channel shutdown can be the result of the reception of a DREQ message. It is in the latter case that srpt_release_channel_work() can have finished before srpt_close_session() is called. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 13/15] IB/srpt: Detect session shutdown reliably
On 01/06/2016 06:21 AM, Christoph Hellwig wrote: On Tue, Jan 05, 2016 at 03:26:49PM +0100, Bart Van Assche wrote: The Last WQE Reached event is only generated after one or more work requests have been queued on the QP associated with a session. Since session shutdown can start before any work requests have been queued, use a zero-length RDMA write to wait until a QP has been drained. We actually ran into the same issue with a SRPT-derived work in progress driver recently.. @@ -2314,14 +2346,13 @@ static void srpt_cm_timewait_exit(struct srpt_rdma_ch *ch) { pr_info("Received CM TimeWait exit for ch %s-%d.\n", ch->sess_name, ch->qp->qp_num); + srpt_close_ch(ch); } static void srpt_cm_rep_error(struct srpt_rdma_ch *ch) { pr_info("Received CM REP error for ch %s-%d.\n", ch->sess_name, ch->qp->qp_num); } /** @@ -2329,33 +2360,7 @@ static void srpt_cm_rep_error(struct srpt_rdma_ch *ch) */ static void srpt_cm_dreq_recv(struct srpt_rdma_ch *ch) { + srpt_disconnect_ch(ch); } /** @@ -2364,7 +2369,7 @@ static void srpt_cm_dreq_recv(struct srpt_rdma_ch *ch) static void srpt_cm_drep_recv(struct srpt_rdma_ch *ch) { pr_info("Received InfiniBand DREP message for cm_id %p.\n", ch->cm_id); + srpt_close_ch(ch); } Is there any good reason to keep these one-liner helpers around? Hello Christoph, Not really. I will inline these functions. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/15] IB/srpt: Fix how aborted commands are processed
On 01/06/2016 06:13 AM, Christoph Hellwig wrote: pr_debug("Aborting cmd with state %d and tag %lld\n", state, ioctx->cmd.tag); @@ -1299,14 +1291,16 @@ static int srpt_abort_cmd(struct srpt_send_ioctx *ioctx) case SRPT_STATE_NEW: case SRPT_STATE_DATA_IN: case SRPT_STATE_MGMT: + case SRPT_STATE_DONE: /* * Do nothing - defer abort processing until * srpt_queue_response() is invoked. */ - WARN_ON(!(ioctx->cmd.transport_state & CMD_T_ABORTED)); Seems like this depends on your target core changes? Maybe it would be better to respin the series to got just on top of Doug's RDMA tree, as I think we're more likely to get this series merged for 4.5 than the target core changes.. This series indeed depends on my recently posted target core patch series. I will rebase, retest and repost this patch series. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 15/15] IB/srpt: Fix a rare crash in srpt_close_session()
Keep the ib_srpt session as long as srpt_close_session() may access it. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 669ae5c..7548bd5 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -2228,6 +2228,8 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id, pr_debug("Establish connection sess=%p name=%s cm_id=%p\n", ch->sess, ch->sess_name, ch->cm_id); + kref_get(&ch->kref); + /* create srp_login_response */ rsp->opcode = SRP_LOGIN_RSP; rsp->tag = req->tag; @@ -2991,6 +2993,8 @@ static void srpt_close_session(struct se_session *se_sess) srpt_disconnect_ch(ch); + kref_put(&ch->kref, srpt_free_ch); + if (!wait) return; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 14/15] IB/srpt: Fix srpt_write_pending()
The only allowed return values for the write_pending() callback function are 0, -EAGAIN and -ENOMEM. Since attempting to perform RDMA over a disconnecting channel will result in an IB error completion anyway, remove the code that checks the channel state from srpt_write_pending(). Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 33 - 1 file changed, 4 insertions(+), 29 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index cacb697..669ae5c 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -2522,39 +2522,14 @@ out_unmap: */ static int srpt_write_pending(struct se_cmd *se_cmd) { - struct srpt_rdma_ch *ch; - struct srpt_send_ioctx *ioctx; + struct srpt_send_ioctx *ioctx = + container_of(se_cmd, struct srpt_send_ioctx, cmd); + struct srpt_rdma_ch *ch = ioctx->ch; enum srpt_command_state new_state; - int ret; - - ioctx = container_of(se_cmd, struct srpt_send_ioctx, cmd); new_state = srpt_set_cmd_state(ioctx, SRPT_STATE_NEED_DATA); WARN_ON(new_state == SRPT_STATE_DONE); - - ch = ioctx->ch; - BUG_ON(!ch); - - switch (ch->state) { - case CH_CONNECTING: - WARN(true, "unexpected channel state %d\n", ch->state); - ret = -EINVAL; - goto out; - case CH_LIVE: - break; - case CH_DISCONNECTING: - case CH_DRAINING: - case CH_DISCONNECTED: - pr_debug("cmd with tag %lld: channel disconnecting\n", -ioctx->cmd.tag); - srpt_set_cmd_state(ioctx, SRPT_STATE_DATA_IN); - ret = -EINVAL; - goto out; - } - ret = srpt_xfer_data(ch, ioctx); - -out: - return ret; + return srpt_xfer_data(ch, ioctx); } static u8 tcm_to_srp_tsk_mgmt_status(const int tcm_mgmt_status) -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 13/15] IB/srpt: Detect session shutdown reliably
The Last WQE Reached event is only generated after one or more work requests have been queued on the QP associated with a session. Since session shutdown can start before any work requests have been queued, use a zero-length RDMA write to wait until a QP has been drained. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 274 +- drivers/infiniband/ulp/srpt/ib_srpt.h | 18 ++- 2 files changed, 150 insertions(+), 142 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 6ec130d..cacb697 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -91,10 +91,11 @@ MODULE_PARM_DESC(srpt_service_guid, " instead of using the node_guid of the first HCA."); static struct ib_client srpt_client; -static void srpt_release_channel(struct srpt_rdma_ch *ch); +static void srpt_free_ch(struct kref *kref); static int srpt_queue_status(struct se_cmd *cmd); static void srpt_recv_done(struct ib_cq *cq, struct ib_wc *wc); static void srpt_send_done(struct ib_cq *cq, struct ib_wc *wc); +static void srpt_zerolength_write_done(struct ib_cq *cq, struct ib_wc *wc); static bool srpt_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state new) { @@ -170,6 +171,23 @@ static void srpt_srq_event(struct ib_event *event, void *ctx) pr_info("SRQ event %d\n", event->event); } +static const char *get_ch_state_name(enum rdma_ch_state s) +{ + switch (s) { + case CH_CONNECTING: + return "connecting"; + case CH_LIVE: + return "live"; + case CH_DISCONNECTING: + return "disconnecting"; + case CH_DRAINING: + return "draining"; + case CH_DISCONNECTED: + return "disconnected"; + } + return "???"; +} + /** * srpt_qp_event() - QP event callback function. */ @@ -183,11 +201,9 @@ static void srpt_qp_event(struct ib_event *event, struct srpt_rdma_ch *ch) ib_cm_notify(ch->cm_id, event->event); break; case IB_EVENT_QP_LAST_WQE_REACHED: - if (srpt_set_ch_state(ch, CH_RELEASING)) - srpt_release_channel(ch); - else - pr_debug("%s: state %d - ignored LAST_WQE.\n", -ch->sess_name, ch->state); + pr_debug("%s-%d, state %s: received Last WQE event.\n", +ch->sess_name, ch->qp->qp_num, +get_ch_state_name(ch->state)); break; default: pr_err("received unrecognized IB QP event %d\n", event->event); @@ -789,6 +805,37 @@ out: } /** + * srpt_zerolength_write() - Perform a zero-length RDMA write. + * + * A quote from the InfiniBand specification: C9-88: For an HCA responder + * using Reliable Connection service, for each zero-length RDMA READ or WRITE + * request, the R_Key shall not be validated, even if the request includes + * Immediate data. + */ +static int srpt_zerolength_write(struct srpt_rdma_ch *ch) +{ + struct ib_send_wr wr, *bad_wr; + + memset(&wr, 0, sizeof(wr)); + wr.opcode = IB_WR_RDMA_WRITE; + wr.wr_cqe = &ch->zw_cqe; + wr.send_flags = IB_SEND_SIGNALED; + return ib_post_send(ch->qp, &wr, &bad_wr); +} + +static void srpt_zerolength_write_done(struct ib_cq *cq, struct ib_wc *wc) +{ + struct srpt_rdma_ch *ch = cq->cq_context; + + WARN(wc->status == IB_WC_SUCCESS, "%s-%d: QP not in error state\n", +ch->sess_name, ch->qp->qp_num); + if (srpt_set_ch_state(ch, CH_DISCONNECTED)) + schedule_work(&ch->release_work); + else + WARN_ONCE("%s-%d\n", ch->sess_name, ch->qp->qp_num); +} + +/** * srpt_get_desc_tbl() - Parse the data descriptors of an SRP_CMD request. * @ioctx: Pointer to the I/O context associated with the request. * @srp_cmd: Pointer to the SRP_CMD request data. @@ -1819,111 +1866,102 @@ static void srpt_destroy_ch_ib(struct srpt_rdma_ch *ch) } /** - * __srpt_close_ch() - Close an RDMA channel by setting the QP error state. + * srpt_close_ch() - Close an RDMA channel. * - * Reset the QP and make sure all resources associated with the channel will - * be deallocated at an appropriate time. + * Make sure all resources associated with the channel will be deallocated at + * an appropriate time. * - * Note: The caller must hold ch->sport->sdev->spinlock. + * Returns true if and only if the channel state has been modified into + * CH_DRAINING. */ -static void __srpt_close_ch(struct srpt_rdma_ch *ch) +static bool srpt_close_ch(struct srpt_rdma
[PATCH 12/15] IB/srpt: Eliminate srpt_find_channel()
In the CM REQ message handler, store the channel pointer in cm_id->context such that the function srpt_find_channel() is no longer needed. Additionally, make the CM event messages more informative. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 113 +- 1 file changed, 43 insertions(+), 70 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 2ae3c1b..6ec130d 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1893,25 +1893,14 @@ static int srpt_shutdown_session(struct se_session *se_sess) * ib_destroy_cm_id(), which locks the cm_id spinlock and hence waits until * this function has finished). */ -static void srpt_drain_channel(struct ib_cm_id *cm_id) +static void srpt_drain_channel(struct srpt_rdma_ch *ch) { - struct srpt_device *sdev; - struct srpt_rdma_ch *ch; int ret; bool do_reset = false; WARN_ON_ONCE(irqs_disabled()); - sdev = cm_id->context; - BUG_ON(!sdev); - spin_lock_irq(&sdev->spinlock); - list_for_each_entry(ch, &sdev->rch_list, list) { - if (ch->cm_id == cm_id) { - do_reset = srpt_set_ch_state(ch, CH_DRAINING); - break; - } - } - spin_unlock_irq(&sdev->spinlock); + do_reset = srpt_set_ch_state(ch, CH_DRAINING); if (do_reset) { if (ch->sess) @@ -1925,34 +1914,6 @@ static void srpt_drain_channel(struct ib_cm_id *cm_id) } /** - * srpt_find_channel() - Look up an RDMA channel. - * @cm_id: Pointer to the CM ID of the channel to be looked up. - * - * Return NULL if no matching RDMA channel has been found. - */ -static struct srpt_rdma_ch *srpt_find_channel(struct srpt_device *sdev, - struct ib_cm_id *cm_id) -{ - struct srpt_rdma_ch *ch; - bool found; - - WARN_ON_ONCE(irqs_disabled()); - BUG_ON(!sdev); - - found = false; - spin_lock_irq(&sdev->spinlock); - list_for_each_entry(ch, &sdev->rch_list, list) { - if (ch->cm_id == cm_id) { - found = true; - break; - } - } - spin_unlock_irq(&sdev->spinlock); - - return found ? ch : NULL; -} - -/** * srpt_release_channel() - Release channel resources. * * Schedules the actual release because: @@ -2160,6 +2121,7 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id, memcpy(ch->t_port_id, req->target_port_id, 16); ch->sport = &sdev->port[param->port - 1]; ch->cm_id = cm_id; + cm_id->context = ch; /* * Avoid QUEUE_FULL conditions by limiting the number of buffers used * for the SRP protocol to the command queue size. @@ -2304,10 +2266,23 @@ out: return ret; } -static void srpt_cm_rej_recv(struct ib_cm_id *cm_id) +static void srpt_cm_rej_recv(struct srpt_rdma_ch *ch, +enum ib_cm_rej_reason reason, +const u8 *private_data, +u8 private_data_len) { - pr_info("Received IB REJ for cm_id %p.\n", cm_id); - srpt_drain_channel(cm_id); + char *priv = kmalloc(private_data_len * 3 + 1, GFP_KERNEL); + int i; + + if (priv) { + priv[0] = '\0'; + for (i = 0; i < private_data_len; i++) + sprintf(priv + 3 * i, "%02x ", private_data[i]); + } + pr_info("Received CM REJ for ch %s-%d; reason %d; private data %s.\n", + ch->sess_name, ch->qp->qp_num, reason, priv ? : "(?)"); + kfree(priv); + srpt_drain_channel(ch); } /** @@ -2316,14 +2291,10 @@ static void srpt_cm_rej_recv(struct ib_cm_id *cm_id) * An IB_CM_RTU_RECEIVED message indicates that the connection is established * and that the recipient may begin transmitting (RTU = ready to use). */ -static void srpt_cm_rtu_recv(struct ib_cm_id *cm_id) +static void srpt_cm_rtu_recv(struct srpt_rdma_ch *ch) { - struct srpt_rdma_ch *ch; int ret; - ch = srpt_find_channel(cm_id->context, cm_id); - BUG_ON(!ch); - if (srpt_set_ch_state(ch, CH_LIVE)) { struct srpt_recv_ioctx *ioctx, *ioctx_tmp; @@ -2339,31 +2310,30 @@ static void srpt_cm_rtu_recv(struct ib_cm_id *cm_id) } } -static void srpt_cm_timewait_exit(struct ib_cm_id *cm_id) +static void srpt_cm_timewait_exit(struct srpt_rdma_ch *ch) { - pr_info("Received IB TimeWait exit for cm_id %p.\n", cm_id); - srpt_drain_channel(cm_id); + pr_info("Received CM TimeWait exit for ch %s-%d.\n", ch->sess_name, +
[PATCH 11/15] IB/srpt: Fix how aborted commands are processed
srpt_abort_cmd() must not be called in state SRPT_STATE_DATA_IN. Issue a warning if this occurs. srpt_abort_cmd() must not invoke target_put_sess_cmd() for commands in state SRPT_STATE_DONE because the srpt_abort_cmd() callers already do this when necessary. Hence remove this call. If an RDMA read fails the corresponding SCSI command must fail. Hence add a transport_generic_request_failure() call. Remove an incorrect srpt_abort_cmd() call from srpt_rdma_write_done(). Avoid that srpt_send_done() calls srpt_abort_cmd() for finished SCSI commands. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 39 ++- 1 file changed, 15 insertions(+), 24 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 395bc1b..2ae3c1b 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1273,25 +1273,17 @@ static int srpt_abort_cmd(struct srpt_send_ioctx *ioctx) case SRPT_STATE_NEED_DATA: ioctx->state = SRPT_STATE_DATA_IN; break; - case SRPT_STATE_DATA_IN: case SRPT_STATE_CMD_RSP_SENT: case SRPT_STATE_MGMT_RSP_SENT: ioctx->state = SRPT_STATE_DONE; break; default: + WARN_ONCE(true, "%s: unexpected I/O context state %d\n", + __func__, state); break; } spin_unlock_irqrestore(&ioctx->spinlock, flags); - if (state == SRPT_STATE_DONE) { - struct srpt_rdma_ch *ch = ioctx->ch; - - BUG_ON(ch->sess == NULL); - - target_put_sess_cmd(&ioctx->cmd); - goto out; - } - pr_debug("Aborting cmd with state %d and tag %lld\n", state, ioctx->cmd.tag); @@ -1299,14 +1291,16 @@ static int srpt_abort_cmd(struct srpt_send_ioctx *ioctx) case SRPT_STATE_NEW: case SRPT_STATE_DATA_IN: case SRPT_STATE_MGMT: + case SRPT_STATE_DONE: /* * Do nothing - defer abort processing until * srpt_queue_response() is invoked. */ - WARN_ON(!(ioctx->cmd.transport_state & CMD_T_ABORTED)); break; case SRPT_STATE_NEED_DATA: - /* DMA_TO_DEVICE (write) - RDMA read error. */ + pr_debug("tag %#llx: RDMA read error\n", ioctx->cmd.tag); + transport_generic_request_failure(&ioctx->cmd, + TCM_CHECK_CONDITION_ABORT_CMD); break; case SRPT_STATE_CMD_RSP_SENT: /* @@ -1314,18 +1308,16 @@ static int srpt_abort_cmd(struct srpt_send_ioctx *ioctx) * not been received in time. */ srpt_unmap_sg_to_ib_sge(ioctx->ch, ioctx); - target_put_sess_cmd(&ioctx->cmd); + transport_generic_free_cmd(&ioctx->cmd, 0); break; case SRPT_STATE_MGMT_RSP_SENT: - srpt_set_cmd_state(ioctx, SRPT_STATE_DONE); - target_put_sess_cmd(&ioctx->cmd); + transport_generic_free_cmd(&ioctx->cmd, 0); break; default: WARN(1, "Unexpected command state (%d)", state); break; } -out: return state; } @@ -1365,9 +1357,14 @@ static void srpt_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc) container_of(wc->wr_cqe, struct srpt_send_ioctx, rdma_cqe); if (unlikely(wc->status != IB_WC_SUCCESS)) { + /* +* Note: if an RDMA write error completion is received that +* means that a SEND also has been posted. Defer further +* processing of the associated command until the send error +* completion has been received. +*/ pr_info("RDMA_WRITE for ioctx 0x%p failed with status %d\n", ioctx, wc->status); - srpt_abort_cmd(ioctx); } } @@ -1716,15 +1713,10 @@ static void srpt_send_done(struct ib_cq *cq, struct ib_wc *wc) atomic_inc(&ch->sq_wr_avail); - if (wc->status != IB_WC_SUCCESS) { + if (wc->status != IB_WC_SUCCESS) pr_info("sending response for ioctx 0x%p failed" " with status %d\n", ioctx, wc->status); - atomic_dec(&ch->req_lim); - srpt_abort_cmd(ioctx); - goto out; - } - if (state != SRPT_STATE_DONE) { srpt_unmap_sg_to_ib_sge(ch, ioctx); transport_generic_free_cmd(&ioctx->cmd, 0); @@ -1733,7 +1725,6 @@ sta
[PATCH 09/15] IB/srpt: Fix srpt_close_session()
Avoid that srpt_close_session() waits if it doesn't have to wait. Additionally, increase the time during which srpt_close_session() waits until closing a session has finished. This makes it easier to detect session shutdown bugs. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 31 --- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 0ff4ed6..1857d17 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1981,8 +1981,8 @@ static void srpt_release_channel_work(struct work_struct *w) struct se_session *se_sess; ch = container_of(w, struct srpt_rdma_ch, release_work); - pr_debug("ch = %p; ch->sess = %p; release_done = %p\n", ch, ch->sess, -ch->release_done); + pr_debug("%s: %s-%d; release_done = %p\n", __func__, ch->sess_name, +ch->qp->qp_num, ch->release_done); sdev = ch->sport->sdev; BUG_ON(!sdev); @@ -2006,11 +2006,10 @@ static void srpt_release_channel_work(struct work_struct *w) ch->rsp_size, DMA_TO_DEVICE); spin_lock_irq(&sdev->spinlock); - list_del(&ch->list); - spin_unlock_irq(&sdev->spinlock); - + list_del_init(&ch->list); if (ch->release_done) complete(ch->release_done); + spin_unlock_irq(&sdev->spinlock); wake_up(&sdev->ch_releaseQ); @@ -3033,24 +3032,26 @@ static void srpt_release_cmd(struct se_cmd *se_cmd) static void srpt_close_session(struct se_session *se_sess) { DECLARE_COMPLETION_ONSTACK(release_done); - struct srpt_rdma_ch *ch; - struct srpt_device *sdev; - unsigned long res; - - ch = se_sess->fabric_sess_ptr; - WARN_ON(ch->sess != se_sess); + struct srpt_rdma_ch *ch = se_sess->fabric_sess_ptr; + struct srpt_device *sdev = ch->sport->sdev; + bool wait; - pr_debug("ch %p state %d\n", ch, ch->state); + pr_debug("ch %s-%d state %d\n", ch->sess_name, ch->qp->qp_num, +ch->state); - sdev = ch->sport->sdev; spin_lock_irq(&sdev->spinlock); BUG_ON(ch->release_done); ch->release_done = &release_done; + wait = !list_empty(&ch->list); __srpt_close_ch(ch); spin_unlock_irq(&sdev->spinlock); - res = wait_for_completion_timeout(&release_done, 60 * HZ); - WARN_ON(res == 0); + if (!wait) + return; + + while (wait_for_completion_timeout(&release_done, 180 * HZ) == 0) + pr_info("%s(%s-%d state %d): still waiting ...\n", __func__, + ch->sess_name, ch->qp->qp_num, ch->state); } /** -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 10/15] IB/srpt: Fix srpt_handle_cmd() error paths
The target core function that should be called if target_submit_cmd() fails is target_put_sess_cmd(). Additionally, change the return type of srpt_handle_cmd() from int into void. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 1857d17..395bc1b 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1482,15 +1482,14 @@ static int srpt_check_stop_free(struct se_cmd *cmd) /** * srpt_handle_cmd() - Process SRP_CMD. */ -static int srpt_handle_cmd(struct srpt_rdma_ch *ch, - struct srpt_recv_ioctx *recv_ioctx, - struct srpt_send_ioctx *send_ioctx) +static void srpt_handle_cmd(struct srpt_rdma_ch *ch, + struct srpt_recv_ioctx *recv_ioctx, + struct srpt_send_ioctx *send_ioctx) { struct se_cmd *cmd; struct srp_cmd *srp_cmd; u64 data_len; enum dma_data_direction dir; - sense_reason_t ret; int rc; BUG_ON(!send_ioctx); @@ -1518,8 +1517,7 @@ static int srpt_handle_cmd(struct srpt_rdma_ch *ch, if (srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &data_len)) { pr_err("0x%llx: parsing SRP descriptor table failed.\n", srp_cmd->tag); - ret = TCM_INVALID_CDB_FIELD; - goto send_sense; + goto put; } rc = target_submit_cmd(cmd, ch->sess, srp_cmd->cdb, @@ -1527,14 +1525,16 @@ static int srpt_handle_cmd(struct srpt_rdma_ch *ch, scsilun_to_int(&srp_cmd->lun), data_len, TCM_SIMPLE_TAG, dir, TARGET_SCF_ACK_KREF); if (rc != 0) { - ret = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; - goto send_sense; + pr_debug("target_submit_cmd() returned %d for tag %#llx\n", rc, +srp_cmd->tag); + goto put; } - return 0; + return; -send_sense: - transport_send_check_condition_and_sense(cmd, ret, 0); - return -1; +put: + send_ioctx->state = SRPT_STATE_DONE; + target_put_sess_cmd(cmd); + return; } static int srp_tmr_to_tcm(int fn) -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/15] IB/srpt: Simplify srpt_shutdown_session()
The target core guarantees that shutdown_session() is only invoked once per session. This means that the ib_srpt target driver doesn't have to track whether or not shutdown_session() has been called. Additionally, ensure that target_sess_cmd_list_set_waiting() is called before target_wait_for_sess_cmds() by moving it into srpt_release_channel_work(). Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 14 +- drivers/infiniband/ulp/srpt/ib_srpt.h | 1 - 2 files changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index a27414d..0ff4ed6 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1887,19 +1887,6 @@ static void srpt_close_ch(struct srpt_rdma_ch *ch) */ static int srpt_shutdown_session(struct se_session *se_sess) { - struct srpt_rdma_ch *ch = se_sess->fabric_sess_ptr; - unsigned long flags; - - spin_lock_irqsave(&ch->spinlock, flags); - if (ch->in_shutdown) { - spin_unlock_irqrestore(&ch->spinlock, flags); - return true; - } - - ch->in_shutdown = true; - target_sess_cmd_list_set_waiting(se_sess); - spin_unlock_irqrestore(&ch->spinlock, flags); - return true; } @@ -2003,6 +1990,7 @@ static void srpt_release_channel_work(struct work_struct *w) se_sess = ch->sess; BUG_ON(!se_sess); + target_sess_cmd_list_set_waiting(se_sess); target_wait_for_sess_cmds(se_sess); transport_deregister_session_configfs(se_sess); diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h b/drivers/infiniband/ulp/srpt/ib_srpt.h index a98b86b..d0de468 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.h +++ b/drivers/infiniband/ulp/srpt/ib_srpt.h @@ -288,7 +288,6 @@ struct srpt_rdma_ch { u8 sess_name[36]; struct work_struct release_work; struct completion *release_done; - boolin_shutdown; }; /** -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/15] IB/srpt: Simplify channel state management
The only allowed channel state changes are those that change the channel state into a state with a higher numerical value. This allows to merge the functions srpt_set_ch_state() and srpt_test_and_set_ch_state() into a single function. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 42 +-- 1 file changed, 10 insertions(+), 32 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 9cb1a14..a27414d 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -96,37 +96,21 @@ static int srpt_queue_status(struct se_cmd *cmd); static void srpt_recv_done(struct ib_cq *cq, struct ib_wc *wc); static void srpt_send_done(struct ib_cq *cq, struct ib_wc *wc); -static enum rdma_ch_state -srpt_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state new_state) +static bool srpt_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state new) { unsigned long flags; enum rdma_ch_state prev; + bool changed = false; spin_lock_irqsave(&ch->spinlock, flags); prev = ch->state; - ch->state = new_state; - spin_unlock_irqrestore(&ch->spinlock, flags); - return prev; -} - -/** - * srpt_test_and_set_ch_state() - Test and set the channel state. - * - * Returns true if and only if the channel state has been set to the new state. - */ -static bool -srpt_test_and_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state old, - enum rdma_ch_state new) -{ - unsigned long flags; - enum rdma_ch_state prev; - - spin_lock_irqsave(&ch->spinlock, flags); - prev = ch->state; - if (prev == old) + if (new > prev) { ch->state = new; + changed = true; + } spin_unlock_irqrestore(&ch->spinlock, flags); - return prev == old; + + return changed; } /** @@ -199,8 +183,7 @@ static void srpt_qp_event(struct ib_event *event, struct srpt_rdma_ch *ch) ib_cm_notify(ch->cm_id, event->event); break; case IB_EVENT_QP_LAST_WQE_REACHED: - if (srpt_test_and_set_ch_state(ch, CH_DRAINING, - CH_RELEASING)) + if (srpt_set_ch_state(ch, CH_RELEASING)) srpt_release_channel(ch); else pr_debug("%s: state %d - ignored LAST_WQE.\n", @@ -1946,12 +1929,7 @@ static void srpt_drain_channel(struct ib_cm_id *cm_id) spin_lock_irq(&sdev->spinlock); list_for_each_entry(ch, &sdev->rch_list, list) { if (ch->cm_id == cm_id) { - do_reset = srpt_test_and_set_ch_state(ch, - CH_CONNECTING, CH_DRAINING) || - srpt_test_and_set_ch_state(ch, - CH_LIVE, CH_DRAINING) || - srpt_test_and_set_ch_state(ch, - CH_DISCONNECTING, CH_DRAINING); + do_reset = srpt_set_ch_state(ch, CH_DRAINING); break; } } @@ -2368,7 +2346,7 @@ static void srpt_cm_rtu_recv(struct ib_cm_id *cm_id) ch = srpt_find_channel(cm_id->context, cm_id); BUG_ON(!ch); - if (srpt_test_and_set_ch_state(ch, CH_CONNECTING, CH_LIVE)) { + if (srpt_set_ch_state(ch, CH_LIVE)) { struct srpt_recv_ioctx *ioctx, *ioctx_tmp; ret = srpt_ch_qp_rts(ch, ch->qp); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 06/15] IB/srpt: Simplify srpt_handle_tsk_mgmt()
Let the target core check task existence instead of the SRP target driver. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 54 ++- 1 file changed, 2 insertions(+), 52 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index fc19203..9cb1a14 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1554,47 +1554,6 @@ send_sense: return -1; } -/** - * srpt_rx_mgmt_fn_tag() - Process a task management function by tag. - * @ch: RDMA channel of the task management request. - * @fn: Task management function to perform. - * @req_tag: Tag of the SRP task management request. - * @mgmt_ioctx: I/O context of the task management request. - * - * Returns zero if the target core will process the task management - * request asynchronously. - * - * Note: It is assumed that the initiator serializes tag-based task management - * requests. - */ -static int srpt_rx_mgmt_fn_tag(struct srpt_send_ioctx *ioctx, u64 tag) -{ - struct srpt_device *sdev; - struct srpt_rdma_ch *ch; - struct srpt_send_ioctx *target; - int ret, i; - - ret = -EINVAL; - ch = ioctx->ch; - BUG_ON(!ch); - BUG_ON(!ch->sport); - sdev = ch->sport->sdev; - BUG_ON(!sdev); - spin_lock_irq(&sdev->spinlock); - for (i = 0; i < ch->rq_size; ++i) { - target = ch->ioctx_ring[i]; - if (target->cmd.se_lun == ioctx->cmd.se_lun && - target->cmd.tag == tag && - srpt_get_cmd_state(target) != SRPT_STATE_DONE) { - ret = 0; - /* now let the target core abort &target->cmd; */ - break; - } - } - spin_unlock_irq(&sdev->spinlock); - return ret; -} - static int srp_tmr_to_tcm(int fn) { switch (fn) { @@ -1628,7 +1587,6 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch, struct srp_tsk_mgmt *srp_tsk; struct se_cmd *cmd; struct se_session *sess = ch->sess; - uint32_t tag = 0; int tcm_tmr; int rc; @@ -1649,18 +1607,10 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch, TMR_TASK_MGMT_FUNCTION_NOT_SUPPORTED; goto fail; } - if (srp_tsk->tsk_mgmt_func == SRP_TSK_ABORT_TASK) { - rc = srpt_rx_mgmt_fn_tag(send_ioctx, srp_tsk->task_tag); - if (rc < 0) { - send_ioctx->cmd.se_tmr_req->response = - TMR_TASK_DOES_NOT_EXIST; - goto fail; - } - tag = srp_tsk->task_tag; - } rc = target_submit_tmr(&send_ioctx->cmd, sess, NULL, scsilun_to_int(&srp_tsk->lun), srp_tsk, tcm_tmr, - GFP_KERNEL, tag, TARGET_SCF_ACK_KREF); + GFP_KERNEL, srp_tsk->task_tag, + TARGET_SCF_ACK_KREF); if (rc != 0) { send_ioctx->cmd.se_tmr_req->response = TMR_FUNCTION_REJECTED; goto fail; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/15] IB/srpt: Use scsilun_to_int()
Just like other target drivers, use scsilun_to_int() to unpack SCSI LUN numbers. This patch only changes the behavior of ib_srpt for LUN numbers >= 16384. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 92 +++ 1 file changed, 6 insertions(+), 86 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index fd94780..fc19203 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1488,80 +1488,6 @@ static int srpt_build_tskmgmt_rsp(struct srpt_rdma_ch *ch, return resp_len; } -#define NO_SUCH_LUN ((uint64_t)-1LL) - -/* - * SCSI LUN addressing method. See also SAM-2 and the section about - * eight byte LUNs. - */ -enum scsi_lun_addr_method { - SCSI_LUN_ADDR_METHOD_PERIPHERAL = 0, - SCSI_LUN_ADDR_METHOD_FLAT = 1, - SCSI_LUN_ADDR_METHOD_LUN = 2, - SCSI_LUN_ADDR_METHOD_EXTENDED_LUN = 3, -}; - -/* - * srpt_unpack_lun() - Convert from network LUN to linear LUN. - * - * Convert an 2-byte, 4-byte, 6-byte or 8-byte LUN structure in network byte - * order (big endian) to a linear LUN. Supports three LUN addressing methods: - * peripheral, flat and logical unit. See also SAM-2, section 4.9.4 (page 40). - */ -static uint64_t srpt_unpack_lun(const uint8_t *lun, int len) -{ - uint64_t res = NO_SUCH_LUN; - int addressing_method; - - if (unlikely(len < 2)) { - pr_err("Illegal LUN length %d, expected 2 bytes or more\n", - len); - goto out; - } - - switch (len) { - case 8: - if ((*((__be64 *)lun) & -cpu_to_be64(0xLL)) != 0) - goto out_err; - break; - case 4: - if (*((__be16 *)&lun[2]) != 0) - goto out_err; - break; - case 6: - if (*((__be32 *)&lun[2]) != 0) - goto out_err; - break; - case 2: - break; - default: - goto out_err; - } - - addressing_method = (*lun) >> 6; /* highest two bits of byte 0 */ - switch (addressing_method) { - case SCSI_LUN_ADDR_METHOD_PERIPHERAL: - case SCSI_LUN_ADDR_METHOD_FLAT: - case SCSI_LUN_ADDR_METHOD_LUN: - res = *(lun + 1) | (((*lun) & 0x3f) << 8); - break; - - case SCSI_LUN_ADDR_METHOD_EXTENDED_LUN: - default: - pr_err("Unimplemented LUN addressing method %u\n", - addressing_method); - break; - } - -out: - return res; - -out_err: - pr_err("Support for multi-level LUNs has not yet been implemented\n"); - goto out; -} - static int srpt_check_stop_free(struct se_cmd *cmd) { struct srpt_send_ioctx *ioctx = container_of(cmd, @@ -1579,7 +1505,6 @@ static int srpt_handle_cmd(struct srpt_rdma_ch *ch, { struct se_cmd *cmd; struct srp_cmd *srp_cmd; - uint64_t unpacked_lun; u64 data_len; enum dma_data_direction dir; sense_reason_t ret; @@ -1614,11 +1539,10 @@ static int srpt_handle_cmd(struct srpt_rdma_ch *ch, goto send_sense; } - unpacked_lun = srpt_unpack_lun((uint8_t *)&srp_cmd->lun, - sizeof(srp_cmd->lun)); rc = target_submit_cmd(cmd, ch->sess, srp_cmd->cdb, - &send_ioctx->sense_data[0], unpacked_lun, data_len, - TCM_SIMPLE_TAG, dir, TARGET_SCF_ACK_KREF); + &send_ioctx->sense_data[0], + scsilun_to_int(&srp_cmd->lun), data_len, + TCM_SIMPLE_TAG, dir, TARGET_SCF_ACK_KREF); if (rc != 0) { ret = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; goto send_sense; @@ -1704,7 +1628,6 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch, struct srp_tsk_mgmt *srp_tsk; struct se_cmd *cmd; struct se_session *sess = ch->sess; - uint64_t unpacked_lun; uint32_t tag = 0; int tcm_tmr; int rc; @@ -1726,9 +1649,6 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch, TMR_TASK_MGMT_FUNCTION_NOT_SUPPORTED; goto fail; } - unpacked_lun = srpt_unpack_lun((uint8_t *)&srp_tsk->lun, - sizeof(srp_tsk->lun)); - if (srp_tsk->tsk_mgmt_func == SRP_TSK_ABORT_TASK) { rc = srpt_rx_mgmt_fn_tag(send_ioctx, srp_tsk->task_tag); if (rc < 0) { @@ -1738,9 +1658,9 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch, }
[PATCH 04/15] IB/srpt: Introduce target_reverse_dma_direction()
Use the function target_reverse_dma_direction() instead of reimplementing it. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 17 ++--- 1 file changed, 2 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index cbcc73e..fd94780 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -96,19 +96,6 @@ static int srpt_queue_status(struct se_cmd *cmd); static void srpt_recv_done(struct ib_cq *cq, struct ib_wc *wc); static void srpt_send_done(struct ib_cq *cq, struct ib_wc *wc); -/** - * opposite_dma_dir() - Swap DMA_TO_DEVICE and DMA_FROM_DEVICE. - */ -static inline -enum dma_data_direction opposite_dma_dir(enum dma_data_direction dir) -{ - switch (dir) { - case DMA_TO_DEVICE: return DMA_FROM_DEVICE; - case DMA_FROM_DEVICE: return DMA_TO_DEVICE; - default:return dir; - } -} - static enum rdma_ch_state srpt_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state new_state) { @@ -1048,7 +1035,7 @@ static void srpt_unmap_sg_to_ib_sge(struct srpt_rdma_ch *ch, dir = ioctx->cmd.data_direction; BUG_ON(dir == DMA_NONE); ib_dma_unmap_sg(ch->sport->sdev->device, sg, ioctx->sg_cnt, - opposite_dma_dir(dir)); + target_reverse_dma_direction(&ioctx->cmd)); ioctx->mapped_sg_count = 0; } } @@ -1085,7 +1072,7 @@ static int srpt_map_sg_to_ib_sge(struct srpt_rdma_ch *ch, ioctx->sg_cnt = sg_cnt = cmd->t_data_nents; count = ib_dma_map_sg(ch->sport->sdev->device, sg, sg_cnt, - opposite_dma_dir(dir)); + target_reverse_dma_direction(cmd)); if (unlikely(!count)) return -EAGAIN; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/15] IB/srpt: Inline srpt_get_ch_state()
The callers of srpt_get_ch_state() can access ch->state safely without using locking. Hence inline this function. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 42 ++- 1 file changed, 12 insertions(+), 30 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 7abad26..cbcc73e 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -109,17 +109,6 @@ enum dma_data_direction opposite_dma_dir(enum dma_data_direction dir) } } -static enum rdma_ch_state srpt_get_ch_state(struct srpt_rdma_ch *ch) -{ - unsigned long flags; - enum rdma_ch_state state; - - spin_lock_irqsave(&ch->spinlock, flags); - state = ch->state; - spin_unlock_irqrestore(&ch->spinlock, flags); - return state; -} - static enum rdma_ch_state srpt_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state new_state) { @@ -216,7 +205,7 @@ static void srpt_srq_event(struct ib_event *event, void *ctx) static void srpt_qp_event(struct ib_event *event, struct srpt_rdma_ch *ch) { pr_debug("QP event %d on cm_id=%p sess_name=%s state=%d\n", -event->event, ch->cm_id, ch->sess_name, srpt_get_ch_state(ch)); +event->event, ch->cm_id, ch->sess_name, ch->state); switch (event->event) { case IB_EVENT_COMM_EST: @@ -228,7 +217,7 @@ static void srpt_qp_event(struct ib_event *event, struct srpt_rdma_ch *ch) srpt_release_channel(ch); else pr_debug("%s: state %d - ignored LAST_WQE.\n", -ch->sess_name, srpt_get_ch_state(ch)); +ch->sess_name, ch->state); break; default: pr_err("received unrecognized IB QP event %d\n", event->event); @@ -1784,7 +1773,6 @@ static void srpt_handle_new_iu(struct srpt_rdma_ch *ch, struct srpt_send_ioctx *send_ioctx) { struct srp_cmd *srp_cmd; - enum rdma_ch_state ch_state; BUG_ON(!ch); BUG_ON(!recv_ioctx); @@ -1793,13 +1781,12 @@ static void srpt_handle_new_iu(struct srpt_rdma_ch *ch, recv_ioctx->ioctx.dma, srp_max_req_size, DMA_FROM_DEVICE); - ch_state = srpt_get_ch_state(ch); - if (unlikely(ch_state == CH_CONNECTING)) { + if (unlikely(ch->state == CH_CONNECTING)) { list_add_tail(&recv_ioctx->wait_list, &ch->cmd_wait_list); goto out; } - if (unlikely(ch_state != CH_LIVE)) + if (unlikely(ch->state != CH_LIVE)) goto out; srp_cmd = recv_ioctx->ioctx.buf; @@ -1908,7 +1895,7 @@ static void srpt_send_done(struct ib_cq *cq, struct ib_wc *wc) out: while (!list_empty(&ch->cmd_wait_list) && - srpt_get_ch_state(ch) == CH_LIVE && + ch->state == CH_LIVE && (ioctx = srpt_get_send_ioctx(ch)) != NULL) { struct srpt_recv_ioctx *recv_ioctx; @@ -2314,17 +2301,14 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id, && param->port == ch->sport->port && param->listen_id == ch->sport->sdev->cm_id && ch->cm_id) { - enum rdma_ch_state ch_state; - - ch_state = srpt_get_ch_state(ch); - if (ch_state != CH_CONNECTING - && ch_state != CH_LIVE) + if (ch->state != CH_CONNECTING + && ch->state != CH_LIVE) continue; /* found an existing channel */ pr_debug("Found existing channel %s" " cm_id= %p state= %d\n", -ch->sess_name, ch->cm_id, ch_state); +ch->sess_name, ch->cm_id, ch->state); __srpt_close_ch(ch); @@ -2566,7 +2550,7 @@ static void srpt_cm_dreq_recv(struct ib_cm_id *cm_id) ch = srpt_find_channel(cm_id->context, cm_id); BUG_ON(!ch); - pr_debug("cm_id= %p ch->state= %d\n", cm_id, srpt_get_ch_state(ch)); + pr_debug("cm_id= %p ch->state= %d\n", cm_id, ch->state); spin_lock_irqsave(&ch->spinlock, flags); switch (ch->state) { @@ -2750,7 +2734,6 @@ static in
[PATCH 02/15] IB/srpt: Inline srpt_sdev_name()
srpt_sdev_name() is too trivial to keep it as a separate function. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index fe9474f..7abad26 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -109,16 +109,6 @@ enum dma_data_direction opposite_dma_dir(enum dma_data_direction dir) } } -/** - * srpt_sdev_name() - Return the name associated with the HCA. - * - * Examples are ib0, ib1, ... - */ -static inline const char *srpt_sdev_name(struct srpt_device *sdev) -{ - return sdev->device->name; -} - static enum rdma_ch_state srpt_get_ch_state(struct srpt_rdma_ch *ch) { unsigned long flags; @@ -182,7 +172,7 @@ static void srpt_event_handler(struct ib_event_handler *handler, return; pr_debug("ASYNC event= %d on device= %s\n", event->event, -srpt_sdev_name(sdev)); +sdev->device->name); switch (event->event) { case IB_EVENT_PORT_ERR: @@ -3089,7 +3079,7 @@ static void srpt_add_one(struct ib_device *device) if (srpt_refresh_port(sport)) { pr_err("MAD registration failed for %s-%d.\n", - srpt_sdev_name(sdev), i); + sdev->device->name, i); goto err_ring; } snprintf(sport->port_guid, sizeof(sport->port_guid), -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/15] IB/srpt: Add parentheses around sizeof argument
Although sizeof is an operator and hence in many cases parentheses can be left out, the recommended kernel coding style is to surround the sizeof argument with parentheses. This patch does not change any functionality. This patch has been generated by running the following shell command: sed -i 's/sizeof \([^ );,]*\)/sizeof(\1)/g' drivers/infiniband/ulp/srpt/*.[ch] Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 42 +-- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 0c3ed7c..fe9474f 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -281,7 +281,7 @@ static void srpt_get_class_port_info(struct ib_dm_mad *mad) struct ib_class_port_info *cif; cif = (struct ib_class_port_info *)mad->data; - memset(cif, 0, sizeof *cif); + memset(cif, 0, sizeof(*cif)); cif->base_version = 1; cif->class_version = 1; cif->resp_time_value = 20; @@ -340,7 +340,7 @@ static void srpt_get_ioc(struct srpt_port *sport, u32 slot, return; } - memset(iocp, 0, sizeof *iocp); + memset(iocp, 0, sizeof(*iocp)); strcpy(iocp->id_string, SRPT_ID_STRING); iocp->guid = cpu_to_be64(srpt_service_guid); iocp->vendor_id = cpu_to_be32(sdev->dev_attr.vendor_id); @@ -390,7 +390,7 @@ static void srpt_get_svc_entries(u64 ioc_guid, } svc_entries = (struct ib_dm_svc_entries *)mad->data; - memset(svc_entries, 0, sizeof *svc_entries); + memset(svc_entries, 0, sizeof(*svc_entries)); svc_entries->service_entries[0].id = cpu_to_be64(ioc_guid); snprintf(svc_entries->service_entries[0].name, sizeof(svc_entries->service_entries[0].name), @@ -483,7 +483,7 @@ static void srpt_mad_recv_handler(struct ib_mad_agent *mad_agent, rsp->ah = ah; dm_mad = rsp->mad; - memcpy(dm_mad, mad_wc->recv_buf.mad, sizeof *dm_mad); + memcpy(dm_mad, mad_wc->recv_buf.mad, sizeof(*dm_mad)); dm_mad->mad_hdr.method = IB_MGMT_METHOD_GET_RESP; dm_mad->mad_hdr.status = 0; @@ -531,7 +531,7 @@ static int srpt_refresh_port(struct srpt_port *sport) struct ib_port_attr port_attr; int ret; - memset(&port_modify, 0, sizeof port_modify); + memset(&port_modify, 0, sizeof(port_modify)); port_modify.set_port_cap_mask = IB_PORT_DEVICE_MGMT_SUP; port_modify.clr_port_cap_mask = 0; @@ -552,7 +552,7 @@ static int srpt_refresh_port(struct srpt_port *sport) goto err_query_port; if (!sport->mad_agent) { - memset(®_req, 0, sizeof reg_req); + memset(®_req, 0, sizeof(reg_req)); reg_req.mgmt_class = IB_MGMT_CLASS_DEVICE_MGMT; reg_req.mgmt_class_version = IB_MGMT_BASE_VERSION; set_bit(IB_MGMT_METHOD_GET, reg_req.method_mask); @@ -902,14 +902,14 @@ static int srpt_get_desc_tbl(struct srpt_send_ioctx *ioctx, db = (struct srp_direct_buf *)(srp_cmd->add_data + add_cdb_offset); - memcpy(ioctx->rbufs, db, sizeof *db); + memcpy(ioctx->rbufs, db, sizeof(*db)); *data_len = be32_to_cpu(db->len); } else if (((srp_cmd->buf_fmt & 0xf) == SRP_DATA_DESC_INDIRECT) || ((srp_cmd->buf_fmt >> 4) == SRP_DATA_DESC_INDIRECT)) { idb = (struct srp_indirect_buf *)(srp_cmd->add_data + add_cdb_offset); - ioctx->n_rbuf = be32_to_cpu(idb->table_desc.len) / sizeof *db; + ioctx->n_rbuf = be32_to_cpu(idb->table_desc.len) / sizeof(*db); if (ioctx->n_rbuf > (srp_cmd->data_out_desc_cnt + srp_cmd->data_in_desc_cnt)) { @@ -928,7 +928,7 @@ static int srpt_get_desc_tbl(struct srpt_send_ioctx *ioctx, ioctx->rbufs = &ioctx->single_rbuf; else { ioctx->rbufs = - kmalloc(ioctx->n_rbuf * sizeof *db, GFP_ATOMIC); + kmalloc(ioctx->n_rbuf * sizeof(*db), GFP_ATOMIC); if (!ioctx->rbufs) { ioctx->n_rbuf = 0; ret = -ENOMEM; @@ -937,7 +937,7 @@ static int srpt_get_desc_tbl(struct srpt_send_ioctx *ioctx, } db = idb->desc_list; - memcpy(ioctx->rbufs, db, ioctx->n_rbuf * sizeof *db); + memcpy(ioctx->rbufs, db, ioctx->n_rbuf * sizeof(*db)); *data_len = be32
[PATCH 00/15] Various ib_srpt patches
The following series of patches is what I came up with while testing the most recent version of my SCSI target patch series (see also http://thread.gmane.org/gmane.linux.scsi.target.devel/10905): 0001-IB-srpt-Add-parentheses-around-sizeof-argument.patch 0002-IB-srpt-Inline-srpt_sdev_name.patch 0003-IB-srpt-Inline-srpt_get_ch_state.patch 0004-IB-srpt-Introduce-target_reverse_dma_direction.patch 0005-IB-srpt-Use-scsilun_to_int.patch 0006-IB-srpt-Simplify-srpt_handle_tsk_mgmt.patch 0007-IB-srpt-Simplify-channel-state-management.patch 0008-IB-srpt-Simplify-srpt_shutdown_session.patch 0009-IB-srpt-Fix-srpt_close_session.patch 0010-IB-srpt-Fix-srpt_handle_cmd-error-paths.patch 0011-IB-srpt-Fix-how-aborted-commands-are-processed.patch 0012-IB-srpt-Eliminate-srpt_find_channel.patch 0013-IB-srpt-Detect-session-shutdown-reliably.patch 0014-IB-srpt-Fix-srpt_write_pending.patch 0015-IB-srpt-Fix-a-rare-crash-in-srpt_close_session.patch -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/sysfs: Fix sparse warning on attr_id
On 01/04/2016 04:44 AM, ira.we...@intel.com wrote: From: Ira Weiny Attributed ID was declared as an int while the value should really be big endian 16. Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c") Reported-by: Bart Van Assche Signed-off-by: Ira Weiny Reviewed-by: Bart Van Assche -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] IB/srpt: Fix a race condition related to SRP login
On 01/03/2016 12:25 PM, Sagi Grimberg wrote: But now the first I/O(s) could be lost if no other I/O comes in, right? I suspect that we need to keep this loop to protect against such corner cases. It can happen theoretically, but why do we even bother? Why not just post the recv buffer after we moved in to CH_LIVE? This why we let the RC transport handle the "Recv-Not-Ready" NAK by retrying? Making wait list processing in the RTU handler safe would require to introduce additional locking. Posting receive buffers after the RTU event has been received would introduce another race condition because then it can happen that an initiator starts sending data before the receive buffers have been posted. A possible solution would be to trigger the receive handler after the RTU event has been received in such a way that all invocations of the receive handler are still serialized. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] IB/cm: Fix a recently introduced deadlock
ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock that can be locked from inside an interrupt handler. Hence do not enable interrupts inside cm_enter_timewait() if called with interrupts disabled. This patch fixes e.g. the following deadlock: = [ INFO: inconsistent lock state ] 4.4.0-rc7+ #1 Tainted: GE - inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes: (&(&cm_id_priv->lock)->rlock){?.+...}, at: [] cm_establish+0x 74/0x1b0 [ib_cm] {HARDIRQ-ON-W} state was registered at: [] mark_held_locks+0x71/0x90 [] trace_hardirqs_on_caller+0xa7/0x1c0 [] trace_hardirqs_on+0xd/0x10 [] _raw_spin_unlock_irq+0x2b/0x40 [] cm_enter_timewait+0xae/0x100 [ib_cm] [] ib_send_cm_drep+0xb6/0x190 [ib_cm] [] srp_cm_handler+0x128/0x1a0 [ib_srp] [] cm_process_work+0x20/0xf0 [ib_cm] [] cm_dreq_handler+0x135/0x2c0 [ib_cm] [] cm_work_handler+0x75/0xd0 [ib_cm] [] process_one_work+0x1bd/0x460 [] worker_thread+0x118/0x420 [] kthread+0xe4/0x100 [] ret_from_fork+0x3f/0x70 irq event stamp: 1672286 hardirqs last enabled at (1672283): [] poll_idle+0x10/0x80 hardirqs last disabled at (1672284): [] common_interrupt+0x84/0x89 softirqs last enabled at (1672286): [] _local_bh_enable+0x1c/0x50 softirqs last disabled at (1672285): [] irq_enter+0x47/0x70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(&(&cm_id_priv->lock)->rlock); lock(&(&cm_id_priv->lock)->rlock); *** DEADLOCK *** no locks held by swapper/8/0. stack backtrace: CPU: 8 PID: 0 Comm: swapper/8 Tainted: GE 4.4.0-rc7+ #1 Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014 88045af5e950 88046e503a88 81251c1b 0007 0006 0003 88045af5ddc0 88046e503ad8 810a32f4 0001 Call Trace: [] dump_stack+0x4f/0x74 [] print_usage_bug+0x184/0x190 [] mark_lock_irq+0xf2/0x290 [] mark_lock+0x115/0x1b0 [] mark_irqflags+0x15c/0x170 [] __lock_acquire+0x1ef/0x560 [] lock_acquire+0x62/0x80 [] _raw_spin_lock_irqsave+0x43/0x60 [] cm_establish+0x74/0x1b0 [ib_cm] [] ib_cm_notify+0x31/0x100 [ib_cm] [] srpt_qp_event+0x54/0xd0 [ib_srpt] [] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib] [] mlx4_qp_event+0x69/0xd0 [mlx4_core] [] mlx4_eq_int+0x51e/0xd50 [mlx4_core] [] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core] [] handle_irq_event_percpu+0x40/0x110 [] handle_irq_event+0x3f/0x70 [] handle_edge_irq+0x79/0x120 [] handle_irq+0x5d/0x130 [] do_IRQ+0x6d/0x130 [] common_interrupt+0x89/0x89 [] cpuidle_enter_state+0xcf/0x200 [] cpuidle_enter+0x12/0x20 [] call_cpuidle+0x36/0x60 [] cpuidle_idle_call+0x63/0x110 [] cpu_idle_loop+0xfa/0x130 [] cpu_startup_entry+0xe/0x10 [] start_secondary+0x83/0x90 Fixes: commit be4b499323bf ("IB/cm: Do not queue work to a device that's going away") Signed-off-by: Bart Van Assche Cc: Erez Shitrit Cc: stable --- drivers/infiniband/core/cm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 0a26dd6..d6d2b35 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -782,11 +782,11 @@ static void cm_enter_timewait(struct cm_id_private *cm_id_priv) wait_time = cm_convert_to_ms(cm_id_priv->av.timeout); /* Check if the device started its remove_one */ - spin_lock_irq(&cm.lock); + spin_lock_irqsave(&cm.lock, flags); if (!cm_dev->going_down) queue_delayed_work(cm.wq, &cm_id_priv->timewait_info->work.work, msecs_to_jiffies(wait_time)); - spin_unlock_irq(&cm.lock); + spin_unlock_irqrestore(&cm.lock, flags); cm_id_priv->timewait_info = NULL; } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] IB/srpt: Fix a race condition related to SRP login
Since patch "IB/srpt: chain RDMA READ/WRITE requests" there are two loops that process the command wait list. ch->cmd_wait_list is accessed without locking which means that all code that accesses this list must be serialized. Since processing of the RTU event happens from another context than IB WC processing, remove the wait list processing code from the RTU handler. Fixes: commit a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1"). Signed-off-by: Bart Van Assche Cc: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 8 1 file changed, 8 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index b42cf0d..48b27dc 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -2540,15 +2540,7 @@ static void srpt_cm_rtu_recv(struct ib_cm_id *cm_id) BUG_ON(!ch); if (srpt_test_and_set_ch_state(ch, CH_CONNECTING, CH_LIVE)) { - struct srpt_recv_ioctx *ioctx, *ioctx_tmp; - ret = srpt_ch_qp_rts(ch, ch->qp); - - list_for_each_entry_safe(ioctx, ioctx_tmp, &ch->cmd_wait_list, -wait_list) { - list_del(&ioctx->wait_list); - srpt_handle_new_iu(ch, ioctx, NULL); - } if (ret) srpt_close_ch(ch); } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] IB/srpt: Fix the RDMA completion handlers
Avoid that the following kernel crash is triggered when processing an RDMA completion: BUG: unable to handle kernel paging request at 00010198 IP: [] __lock_acquire+0xa2/0x560 Call Trace: [] lock_acquire+0x62/0x80 [] _raw_spin_lock_irqsave+0x43/0x60 [] srpt_rdma_read_done+0x57/0x120 [ib_srpt] [] __ib_process_cq+0x43/0xc0 [ib_core] [] ib_cq_poll_work+0x25/0x70 [ib_core] [] process_one_work+0x1bd/0x460 [] worker_thread+0x118/0x420 [] kthread+0xe4/0x100 [] ret_from_fork+0x3f/0x70 Fixes: commit 59fae4deaad3 ("IB/srpt: chain RDMA READ/WRITE requests"). Signed-off-by: Bart Van Assche Reviewed-by: Christoph Hellwig --- drivers/infiniband/ulp/srpt/ib_srpt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 105512d..b42cf0d 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1395,7 +1395,7 @@ static void srpt_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc) { struct srpt_rdma_ch *ch = cq->cq_context; struct srpt_send_ioctx *ioctx = - container_of(wc->wr_cqe, struct srpt_send_ioctx, ioctx.cqe); + container_of(wc->wr_cqe, struct srpt_send_ioctx, rdma_cqe); WARN_ON(ioctx->n_rdma <= 0); atomic_add(ioctx->n_rdma, &ch->sq_wr_avail); @@ -1418,7 +1418,7 @@ static void srpt_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc) static void srpt_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc) { struct srpt_send_ioctx *ioctx = - container_of(wc->wr_cqe, struct srpt_send_ioctx, ioctx.cqe); + container_of(wc->wr_cqe, struct srpt_send_ioctx, rdma_cqe); if (unlikely(wc->status != IB_WC_SUCCESS)) { pr_info("RDMA_WRITE for ioctx 0x%p failed with status %d\n", -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] irq_poll: Fix irq_poll_sched()
The IRQ_POLL_F_SCHED bit is set as long as polling is ongoing. This means that irq_poll_sched() must proceed if this bit has not yet been set. Fixes: commit ea51190c0315 ("irq_poll: fold irq_poll_sched_prep into irq_poll_sched"). Signed-off-by: Bart Van Assche Reviewed-by: Christoph Hellwig --- lib/irq_poll.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/irq_poll.c b/lib/irq_poll.c index 2836620..836f7db 100644 --- a/lib/irq_poll.c +++ b/lib/irq_poll.c @@ -29,7 +29,7 @@ void irq_poll_sched(struct irq_poll *iop) if (test_bit(IRQ_POLL_F_DISABLE, &iop->state)) return; - if (!test_and_set_bit(IRQ_POLL_F_SCHED, &iop->state)) + if (test_and_set_bit(IRQ_POLL_F_SCHED, &iop->state)) return; local_irq_save(flags); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] Three SRP driver patches for kernel 4.5
Hello Doug, This series of three patches fixes new issues that exist in the git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git/k.o/for-4.5 branch. It would be appreciated if these patches could be included in the kernel 4.5 RDMA pull request. 0001-irq_poll-Fix-irq_poll_sched.patch 0002-IB-srpt-Fix-the-RDMA-completion-handlers.patch 0003-IB-srpt-Fix-a-race-condition-related-to-SRP-login.patch Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git/k.o/for-4.5 crash during boot
On 12/30/2015 01:34 PM, Leon Romanovsky wrote: On Wed, Dec 30, 2015 at 02:22:23PM +0200, Or Gerlitz wrote: On 12/30/2015 2:04 PM, Bart Van Assche wrote: Hello Christoph, Can you check whether the branch in the subject of this e-mail works fine on your setup (commit 59caaed7a7) ? On my test setup (Dell R430 with two ConnectX-3 adapters) this branch crashes during boot in get_counter_table() (see also the attached screenshot). Can you please check with Hal's fix that was posted here 1-2 days ago, hopefully this should make your system to work Or referenced to this patch [1], it should fix your crash. [1] https://patchwork.kernel.org/patch/7929551/ Hello Or and Leon, That patch indeed fixes the crash I had reported. Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/core: Allocating larger memory than required for cma_configfs
On 12/30/2015 03:14 PM, Matan Barak wrote: We were allocating larger memory space than requried for cma_dev_group->default_ports_group. Please change the subject into something like "Do not allocate more ...". Please also fix the spelling error in the patch description. Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git/k.o/for-4.5 crash during boot
Hello Christoph, Can you check whether the branch in the subject of this e-mail works fine on your setup (commit 59caaed7a7) ? On my test setup (Dell R430 with two ConnectX-3 adapters) this branch crashes during boot in get_counter_table() (see also the attached screenshot). Thanks, Bart.
Re: [PATCH for-next 2/3] IB/core: Change per-entry lock in RoCE GID table to one lock
On 12/30/2015 07:01 AM, Or Gerlitz wrote: On 10/28/2015 4:52 PM, Matan Barak wrote: @@ -134,16 +138,14 @@ static int write_gid(struct ib_device *ib_dev, u8 port, { int ret = 0; struct net_device *old_net_dev; -unsigned long flags; /* in rdma_cap_roce_gid_table, this funciton should be protected by a * sleep-able lock. */ -write_lock_irqsave(&table->data_vec[ix].lock, flags); if (rdma_cap_roce_gid_table(ib_dev, port)) { table->data_vec[ix].props |= GID_TABLE_ENTRY_INVALID; -write_unlock_irqrestore(&table->data_vec[ix].lock, flags); +write_unlock_irq(&table->rwlock); /* GID_TABLE_WRITE_ACTION_MODIFY currently isn't supported by * RoCE providers and thus only updates the cache. */ @@ -153,7 +155,7 @@ static int write_gid(struct ib_device *ib_dev, u8 port, else if (action == GID_TABLE_WRITE_ACTION_DEL) ret = ib_dev->del_gid(ib_dev, port, ix, &table->data_vec[ix].context); -write_lock_irqsave(&table->data_vec[ix].lock, flags); +write_lock_irq(&table->rwlock); } sparse complains on drivers/infiniband/core/cache.c:186:17: warning: context imbalance in 'write_gid' - unexpected unlock is this false positive? Hello Or, sparse expects __release() and __acquire() annotations for functions that unlock a lock object that has been locked by its caller. See e.g. http://lists.kernelnewbies.org/pipermail/kernelnewbies/2011-October/003541.html. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] IB/core: Remove a set-but-not-used variable from ib_sg_to_pages()
Detected this by building the IB core with W=1. See also patch "IB core: Fix ib_sg_to_pages()" (commit 8f5ba10ed40a). Signed-off-by: Bart Van Assche Cc: Sagi Grimberg Cc: Christoph Hellwig --- drivers/infiniband/core/verbs.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 545906d..c90ed29 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1530,7 +1530,7 @@ int ib_sg_to_pages(struct ib_mr *mr, int (*set_page)(struct ib_mr *, u64)) { struct scatterlist *sg; - u64 last_end_dma_addr = 0, last_page_addr = 0; + u64 last_end_dma_addr = 0; unsigned int last_page_off = 0; u64 page_mask = ~((u64)mr->page_size - 1); int i, ret; @@ -1572,7 +1572,6 @@ next_page: mr->length += dma_len; last_end_dma_addr = end_dma_addr; - last_page_addr = end_dma_addr & page_mask; last_page_off = end_dma_addr & ~page_mask; } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] IB/srpt: chain RDMA READ/WRITE requests
On 12/07/2015 09:51 PM, Christoph Hellwig wrote: > Remove struct rdma_iu and instead allocate the struct ib_rdma_wr array > early and fill out directly. This allows us to chain the WRs, and thus > archive both less lock contention on the HCA workqueue as well as much > simpler error handling. Please consider folding the patch below into this patch. Thanks, Bart. [PATCH] IB/srpt: Fix a recently introduced kernel crash BUG: unable to handle kernel paging request at 00010198 IP: [] __lock_acquire+0xa2/0x560 Call Trace: [] lock_acquire+0x62/0x80 [] _raw_spin_lock_irqsave+0x43/0x60 [] srpt_rdma_read_done+0x57/0x120 [ib_srpt] [] __ib_process_cq+0x43/0xc0 [ib_core] [] ib_cq_poll_work+0x25/0x70 [ib_core] [] process_one_work+0x1bd/0x460 [] worker_thread+0x118/0x420 [] kthread+0xe4/0x100 [] ret_from_fork+0x3f/0x70 --- drivers/infiniband/ulp/srpt/ib_srpt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 8068aff..3daab39 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1395,7 +1395,7 @@ static void srpt_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc) { struct srpt_rdma_ch *ch = cq->cq_context; struct srpt_send_ioctx *ioctx = - container_of(wc->wr_cqe, struct srpt_send_ioctx, ioctx.cqe); + container_of(wc->wr_cqe, struct srpt_send_ioctx, rdma_cqe); WARN_ON(ioctx->n_rdma <= 0); atomic_add(ioctx->n_rdma, &ch->sq_wr_avail); @@ -1418,7 +1418,7 @@ static void srpt_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc) static void srpt_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc) { struct srpt_send_ioctx *ioctx = - container_of(wc->wr_cqe, struct srpt_send_ioctx, ioctx.cqe); + container_of(wc->wr_cqe, struct srpt_send_ioctx, rdma_cqe); if (unlikely(wc->status != IB_WC_SUCCESS)) { pr_info("RDMA_WRITE for ioctx 0x%p failed with status %d\n", -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/13] irq_poll: fold irq_poll_sched_prep into irq_poll_sched
On 12/07/2015 09:51 PM, Christoph Hellwig wrote: diff --git a/lib/irq_poll.c b/lib/irq_poll.c index 88af879..13cb149 100644 --- a/lib/irq_poll.c +++ b/lib/irq_poll.c @@ -21,13 +21,17 @@ static DEFINE_PER_CPU(struct list_head, blk_cpu_iopoll); * * Description: * Add this irq_poll structure to the pending poll list and trigger the - * raise of the blk iopoll softirq. The driver must already have gotten a - * successful return from irq_poll_sched_prep() before calling this. + * raise of the blk iopoll softirq. **/ void irq_poll_sched(struct irq_poll *iop) { unsigned long flags; + if (test_bit(IRQ_POLL_F_DISABLE, &iop->state)) + return; + if (!test_and_set_bit(IRQ_POLL_F_SCHED, &iop->state)) + return; + local_irq_save(flags); list_add_tail(&iop->list, this_cpu_ptr(&blk_cpu_iopoll)); __raise_softirq_irqoff(IRQ_POLL_SOFTIRQ); After having applied these changes the SRP initiator didn't receive any RDMA completions anymore. I could remedy that by changing "!test_and_set_bit()" into "test_and_set_bit()": diff --git a/lib/irq_poll.c b/lib/irq_poll.c index 43a3370..3a67019 100644 --- a/lib/irq_poll.c +++ b/lib/irq_poll.c @@ -29,7 +29,7 @@ void irq_poll_sched(struct irq_poll *iop) if (test_bit(IRQ_POLL_F_DISABLE, &iop->state)) return; - if (!test_and_set_bit(IRQ_POLL_F_SCHED, &iop->state)) + if (test_and_set_bit(IRQ_POLL_F_SCHED, &iop->state)) return; local_irq_save(flags); -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: completion queue abstraction V2
On 12/07/2015 09:51 PM, Christoph Hellwig wrote: This series adds a new RDMA core abstraction that insulated the ULPs from the nitty gritty details of CQ polling. See the individual patches for more details. Hello Christoph, After having tested the SRP initiator and target drivers with this patch series applied I have further feedback about this patch series. I will provide that feedback as replies to the individual patches. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/10] IB: remove the unused usecnt field from struct ib_mr
On 12/18/2015 02:55 PM, Christoph Hellwig wrote: Signed-off-by: Christoph Hellwig Shouldn't the description of this patch be changed into something like "Remove the usecnt field from ib_mr since it is always zero" ? Anyway: Reviewed-by: Bart Van Assche -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH rdma-next V1 3/7] IB/ulps: Avoid calling ib_query_device
On 12/18/2015 09:59 AM, Or Gerlitz wrote: Instead, use the cached copy of the attributes present on the device. Signed-off-by: Or Gerlitz --- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 19 --- drivers/infiniband/ulp/ipoib/ipoib_ethtool.c | 14 +++-- drivers/infiniband/ulp/ipoib/ipoib_main.c| 21 + drivers/infiniband/ulp/iser/iscsi_iser.c | 4 +-- drivers/infiniband/ulp/iser/iscsi_iser.h | 2 -- drivers/infiniband/ulp/iser/iser_memory.c| 9 +++--- drivers/infiniband/ulp/iser/iser_verbs.c | 38 ++ drivers/infiniband/ulp/isert/ib_isert.c | 47 +--- drivers/infiniband/ulp/isert/ib_isert.h | 1 - drivers/infiniband/ulp/srp/ib_srp.c | 32 ++- drivers/infiniband/ulp/srpt/ib_srpt.c| 15 - drivers/infiniband/ulp/srpt/ib_srpt.h| 3 -- 12 files changed, 63 insertions(+), 142 deletions(-) Shouldn't this patch be split per ULP driver to make review easier ? If you have a look at the MAINTAINERS file then you will see that you forgot to CC the SRP initiator maintainer. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH rdma-next V1 0/7] dev attr cleanup (less is more)
On 12/18/2015 09:59 AM, Or Gerlitz wrote: OK, Doug, this is my suggestion for the dev attr cleanup -- it has the advantages of leaving the attrs on a well defined location, a field in the IB device, the ability to get that through smaller patches, avoid touching any of the HW drivers, etc. Hello Or, This version of the dev attr cleanup patches (just like the previous version) makes kernel code access the attrs member of struct ib_device while user space code keeps calling the query_device function. If there would be a bug in one of the query_device implementations, e.g. that the correct data is only returned some time after device initialization has finished, then user space code will see other attribute values than kernel code. I think we should avoid such subtle and hard to debug behavior. Hence my proposal to modify ib_uverbs_ex_query_device() such that it fetches attribute information from the ib_device structure instead of by calling query_device(). Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH rdma-next 1/6] IB/core: Save the device attributes on the device structure
On 12/17/2015 06:41 PM, Jason Gunthorpe wrote: On Thu, Dec 17, 2015 at 03:44:19PM +0200, Sagi Grimberg wrote: + ret = ib_query_device(device, &device->attrs); + if (ret) { + printk(KERN_WARNING "Couldn't query the device attributes\n"); + goto out; + } + I thought we're all for removing the call altogether aren't we? I'd say just call device->query_device() instead. Christoph's patch even got rid of device->query_device(), which, IHMO, I prefer to see over this. It re-enforces that these values are constants and drivers cannot change them on the fly. I also would like to see the query_device() implementations to be removed from the hw drivers. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/13] irq_poll: remove unused data and max fields
On 12/07/2015 12:51 PM, Christoph Hellwig wrote: [ ... ] Reviewed-by: Bart Van Assche -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/13] IB/srpt: use the new CQ API
On 12/07/2015 12:51 PM, Christoph Hellwig wrote: [ ... ] Reviewed-by: Bart Van Assche -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] IB/srpt: chain RDMA READ/WRITE requests
On 12/07/2015 12:51 PM, Christoph Hellwig wrote: Remove struct rdma_iu and instead allocate the struct ib_rdma_wr array early and fill out directly. This allows us to chain the WRs, and thus archive both less lock contention on the HCA workqueue as well as much ^^^ Did you perhaps intend "achieve" ? struct srpt_send_ioctx { struct srpt_ioctx ioctx; struct srpt_rdma_ch *ch; - struct rdma_iu *rdma_ius; + struct ib_rdma_wr *rdma_ius; Please rename the "rdma_ius" member into "wr" or any other name that shows that this member is now a work request array and nothing else. Otherwise this patch looks fine to me. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 07/13] IB: add a proper completion queue abstraction
On 12/07/2015 12:51 PM, Christoph Hellwig wrote: This adds an abstraction that allows ULP to simply pass a completion ^^^ I think this should either be changed into either "an ULP" or "ULPs". +/** + * ib_process_direct_cq - process a CQ in caller context + * @cq:CQ to process + * @budget:number of CQEs to poll for + * + * This function is used to process all outstanding CQ entries on a + * %IB_POLL_DIRECT CQ. It does not offload CQ processing to a different + * context and does not ask from completion interrupts from the HCA. Should this perhaps be changed into "for" ? + * + * Note: for compatibility reasons -1 can be passed in %budget for unlimited + * polling. Do not use this feature in new code, it will be remove soon. ^^ Did you perhaps intend "removed" ? +struct ib_cq *ib_alloc_cq(struct ib_device *dev, void *private, + int nr_cqe, int comp_vector, enum ib_poll_context poll_ctx) +{ > [ ... ] + cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL); Why is the wc array allocated separately instead of being embedded in struct ib_cq ? I think the faster completion queues can be created the better so if it is possible to eliminate the above kmalloc() call I would prefer that. --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -457,10 +457,11 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct srp_target_port *target) static void srp_destroy_qp(struct srp_rdma_ch *ch) { static struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR }; - static struct ib_recv_wr wr = { .wr_id = SRP_LAST_WR_ID }; + static struct ib_recv_wr wr = { 0 }; struct ib_recv_wr *bad_wr; int ret; Is explicit initialization to "{ 0 }" really needed for static structures ? Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/13] irq_poll: mark __irq_poll_complete static
On 12/07/2015 12:51 PM, Christoph Hellwig wrote: [ ... ] Reviewed-by: Bart Van Assche -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/13] irq_poll: fold irq_poll_disable_pending into irq_poll_softirq
On 12/07/2015 12:51 PM, Christoph Hellwig wrote: > [ ... ] Reviewed-by: Bart Van Assche -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/13] irq_poll: make blk-iopoll available outside the block layer
On 12/07/2015 12:51 PM, Christoph Hellwig wrote: -enum { - IOPOLL_F_SCHED = 0, - IOPOLL_F_DISABLE= 1, -}; [ ... ] +enum { + IRQ_POLL_F_SCHED= 0, + IRQ_POLL_F_DISABLE = 1, +}; A nitpick: the values of these constants were aligned in the original code but no longer in the new code. Whether or not this gets addressed: Reviewed-by: Bart Van Assche -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/13] irq_poll: don't disable new irq_poll instances
On 12/07/2015 12:51 PM, Christoph Hellwig wrote: There is no good reason to start out disabled - drivers can control if the poll instance can be scheduled by simply not scheduling it yet. Reviewed-by: Bart Van Assche -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/13] IB/srp: use the new CQ API
On 12/07/2015 12:51 PM, Christoph Hellwig wrote: +static void srp_send_done(struct ib_cq *cq, struct ib_wc *wc) +{ + struct srp_iu *iu = container_of(wc->wr_cqe, struct srp_iu, cqe); + struct srp_rdma_ch *ch = cq->cq_context; + + if (likely(wc->status != IB_WC_SUCCESS)) { + srp_handle_qp_err(cq, wc, "SEND"); + return; + } + + list_add(&iu->list, &ch->free_tx); +} Please change likely() in the above code into unlikely(). Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/13] irq_poll: fold irq_poll_sched_prep into irq_poll_sched
On 12/07/2015 12:51 PM, Christoph Hellwig wrote: @@ -58,7 +62,7 @@ EXPORT_SYMBOL(__irq_poll_complete); * Description: * If a driver consumes less than the assigned budget in its run of the * iopoll handler, it'll end the polled mode by calling this function. The - * iopoll handler will not be invoked again before irq_poll_sched_prep() + * iopoll handler will not be invoked again before irq_poll_schedp() * is called. **/ void irq_poll_complete(struct irq_poll *iop) Is that a typo at the end of the modified line ("schedp") ? If this gets addressed: Reviewed-by: Bart Van Assche -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] SRP initiator related bug fixes
On 12/07/2015 01:07 PM, Doug Ledford wrote: On 12/07/2015 02:26 PM, Bart Van Assche wrote: On 12/01/2015 10:16 AM, Bart Van Assche wrote: This patch series contains six bug fixes either for the SRP initiator itself or for IB core functionality used by the SRP initiator. The order of these patches matches the order in which the corresponding bugs have been introduced. The patches in this series are: 0001-IB-srp-Fix-a-memory-leak.patch 0002-IB-srp-Fix-possible-send-queue-overflow.patch 0003-IB-srp-Initialize-dma_length-in-srp_map_idb.patch 0004-IB-srp-Fix-indirect-data-buffer-rkey-endianness.patch 0005-IB-core-Fix-ib_sg_to_pages.patch 0006-IB-srp-Fix-srp_map_sg_fr.patch (replying to my own e-mail) Hello Doug, A consensus has been achieved about this patch series. Since one of the patches in this series fixes a data corruption bug that has been introduced during the latest (v4.4) merge window I'd like to see these patches upstream soon. Please recommend how I should proceed. When I repost this patch series, should I send it to you, to Linus or perhaps to someone else ? I was just getting ready to pull this set in. The only one you actually changed, and therefore needs either a resubmit or I have to be careful to get the follow-on version in Patchworks is 5/6, yes? Hello Doug, You are correct, only for patch 5 the patch itself has been modified. The only change for the other patches is that Reviewed-by tags have been posted on the list. Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] SRP initiator related bug fixes
On 12/01/2015 10:16 AM, Bart Van Assche wrote: This patch series contains six bug fixes either for the SRP initiator itself or for IB core functionality used by the SRP initiator. The order of these patches matches the order in which the corresponding bugs have been introduced. The patches in this series are: 0001-IB-srp-Fix-a-memory-leak.patch 0002-IB-srp-Fix-possible-send-queue-overflow.patch 0003-IB-srp-Initialize-dma_length-in-srp_map_idb.patch 0004-IB-srp-Fix-indirect-data-buffer-rkey-endianness.patch 0005-IB-core-Fix-ib_sg_to_pages.patch 0006-IB-srp-Fix-srp_map_sg_fr.patch (replying to my own e-mail) Hello Doug, A consensus has been achieved about this patch series. Since one of the patches in this series fixes a data corruption bug that has been introduced during the latest (v4.4) merge window I'd like to see these patches upstream soon. Please recommend how I should proceed. When I repost this patch series, should I send it to you, to Linus or perhaps to someone else ? Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] SRP initiator related bug fixes
On 12/05/2015 03:17 AM, Christoph Hellwig wrote: Hi Bart, On Fri, Dec 04, 2015 at 03:08:15PM -0800, Bart Van Assche wrote: While preparing this patch series I noticed that none of the SCSI RDMA initiator drivers syncs RDMA buffers before performing RDMA. Does anyone know why something like the code below is not present in these drivers and why no dma_sync_sg operations are present in struct ib_dma_mapping_ops ? dma_map*/dma_unmap* have to do implicit syncs, so no explicit call is required. That's probably the reason why we don't need them in the weird RDMA shadows of these functions (for which I still don't understand the reason, while we're at it..). Hello Christoph, struct ib_dma_mapping_ops was added through commit "IB: Add DMA mapping functions to allow device drivers to interpose" (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9b513090a3c5e4964f9ac09016c1586988abb3d5). I'm not sure that these functions are still needed by the qib or ipath drivers since the .dma_address and .dma_len members have been removed some time ago from that structure (see also http://thread.gmane.org/gmane.linux.drivers.rdma/19730). It would be useful to have data available that shows the performance difference with and without struct ib_dma_mapping_ops for the qib and ipath drivers but I do not know whether such data is available publicly. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] SRP initiator related bug fixes
On 12/01/2015 10:16 AM, Bart Van Assche wrote: > [ ... ] Hello, While preparing this patch series I noticed that none of the SCSI RDMA initiator drivers syncs RDMA buffers before performing RDMA. Does anyone know why something like the code below is not present in these drivers and why no dma_sync_sg operations are present in struct ib_dma_mapping_ops ? Thanks, Bart. [PATCH] IB/srp: Sync scatterlists before and after DMA --- drivers/infiniband/ulp/srp/ib_srp.c | 10 ++ include/rdma/ib_verbs.h | 16 2 files changed, 26 insertions(+) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 3db9a65..23e3c25 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1780,6 +1780,7 @@ static int srp_post_recv(struct srp_rdma_ch *ch, struct srp_iu *iu) static void srp_process_rsp(struct srp_rdma_ch *ch, struct srp_rsp *rsp) { struct srp_target_port *target = ch->target; + struct ib_device *dev = target->srp_host->srp_dev->dev; struct srp_request *req; struct scsi_cmnd *scmnd; unsigned long flags; @@ -1828,6 +1829,11 @@ static void srp_process_rsp(struct srp_rdma_ch *ch, struct srp_rsp *rsp) else if (unlikely(rsp->flags & SRP_RSP_FLAG_DOOVER)) scsi_set_resid(scmnd, -be32_to_cpu(rsp->data_out_res_cnt)); + if (scmnd->sc_data_direction != DMA_NONE) + ib_dma_sync_sg_for_cpu(dev, scsi_sglist(scmnd), + scsi_sg_count(scmnd), + scmnd->sc_data_direction); + srp_free_req(ch, req, scmnd, be32_to_cpu(rsp->req_lim_delta)); @@ -2112,6 +2118,10 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd) ib_dma_sync_single_for_device(dev, iu->dma, target->max_iu_len, DMA_TO_DEVICE); + if (scmnd->sc_data_direction != DMA_NONE) + ib_dma_sync_sg_for_device(dev, scsi_sglist(scmnd), + scsi_sg_count(scmnd), + scmnd->sc_data_direction); if (srp_post_send(ch, iu, len)) { shost_printk(KERN_ERR, target->scsi_host, PFX "Send failed\n"); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 9a68a19..5f9cba7 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -2796,6 +2796,22 @@ static inline void ib_dma_sync_single_for_device(struct ib_device *dev, dma_sync_single_for_device(dev->dma_device, addr, size, dir); } +/* Prepare DMA region to be accessed by CPU */ +static inline void ib_dma_sync_sg_for_cpu(struct ib_device *dev, + struct scatterlist *sg, int nelems, + enum dma_data_direction dir) +{ + dma_sync_sg_for_cpu(dev->dma_device, sg, nelems, dir); +} + +/* Prepare DMA region to be accessed by HCA */ +static inline void ib_dma_sync_sg_for_device(struct ib_device *dev, +struct scatterlist *sg, int nelems, +enum dma_data_direction dir) +{ + dma_sync_sg_for_device(dev->dma_device, sg, nelems, dir); +} + /** * ib_dma_alloc_coherent - Allocate memory and map it for DMA * @dev: The device for which the DMA address is requested -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()
On 12/03/2015 01:18 AM, Christoph Hellwig wrote: > The patch looks good to me, but while we touch this area, how about > throwing in a few cosmetic fixes as well? How about the patch below ? In that version of the ib_sg_to_pages() fix these concerns have been addressed and additionally to more bugs have been fixed. [PATCH] IB core: Fix ib_sg_to_pages() Fix the code for detecting gaps. A gap occurs not only if the second or later scatterlist element is not aligned but also if any scatterlist element other than the last does not end at a page boundary. In the code for coalescing contiguous elements, ensure that mr->length is correct and that last_page_addr is up-to-date. Ensure that this function returns a negative error code instead of zero if the first set_page() call fails. Fixes: commit 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API") Reported-by: Christoph Hellwig --- drivers/infiniband/core/verbs.c | 43 + 1 file changed, 22 insertions(+), 21 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 043a60e..545906d 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1516,7 +1516,7 @@ EXPORT_SYMBOL(ib_map_mr_sg); * @sg_nents: number of entries in sg * @set_page: driver page assignment function pointer * - * Core service helper for drivers to covert the largest + * Core service helper for drivers to convert the largest * prefix of given sg list to a page vector. The sg list * prefix converted is the prefix that meet the requirements * of ib_map_mr_sg. @@ -1533,7 +1533,7 @@ int ib_sg_to_pages(struct ib_mr *mr, u64 last_end_dma_addr = 0, last_page_addr = 0; unsigned int last_page_off = 0; u64 page_mask = ~((u64)mr->page_size - 1); - int i; + int i, ret; mr->iova = sg_dma_address(&sgl[0]); mr->length = 0; @@ -1544,27 +1544,29 @@ int ib_sg_to_pages(struct ib_mr *mr, u64 end_dma_addr = dma_addr + dma_len; u64 page_addr = dma_addr & page_mask; - if (i && page_addr != dma_addr) { - if (last_end_dma_addr != dma_addr) { - /* gap */ - goto done; - - } else if (last_page_off + dma_len <= mr->page_size) { - /* chunk this fragment with the last */ - mr->length += dma_len; - last_end_dma_addr += dma_len; - last_page_off += dma_len; - continue; - } else { - /* map starting from the next page */ - page_addr = last_page_addr + mr->page_size; - dma_len -= mr->page_size - last_page_off; - } + /* +* For the second and later elements, check whether either the +* end of element i-1 or the start of element i is not aligned +* on a page boundary. +*/ + if (i && (last_page_off != 0 || page_addr != dma_addr)) { + /* Stop mapping if there is a gap. */ + if (last_end_dma_addr != dma_addr) + break; + + /* +* Coalesce this element with the last. If it is small +* enough just update mr->length. Otherwise start +* mapping from the next page. +*/ + goto next_page; } do { - if (unlikely(set_page(mr, page_addr))) - goto done; + ret = set_page(mr, page_addr); + if (unlikely(ret < 0)) + return i ? : ret; +next_page: page_addr += mr->page_size; } while (page_addr < end_dma_addr); @@ -1574,7 +1576,6 @@ int ib_sg_to_pages(struct ib_mr *mr, last_page_off = end_dma_addr & ~page_mask; } -done: return i; } EXPORT_SYMBOL(ib_sg_to_pages); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()
On 12/02/2015 01:31 AM, Sagi Grimberg wrote: > On 01/12/2015 21:10, Bart Van Assche wrote: >> On 12/01/2015 10:32 AM, Sagi Grimberg wrote: >> How ib_sg_to_pages() reports to its caller that mapping the first >> scatterlist element failed is not important to me. I included that >> change in this patch because I noticed that in the upstream SRP >> initiator does not consider ib_map_mr_sg() returning zero as an error. I >> think either ib_sg_to_pages() or ib_map_mr_sg() should be modified such >> that "no pages have been mapped" is handled as a mapping failure. > > I don't either, but the patch introduces inconsistency in a case where > the caller passed sg_nents=0 (which will return 0 and not -errno). > > Would it make better sense to have srp treat 0 return as error? note > that all other ULPs treat return_val != sg_nents as error. Hello Sagi, Hmm ... why would it be unacceptable to return 0 if sg_nents == 0 ? Regarding which component to modify if mapping the first page fails: for almost every kernel function I know a negative return value means failure and a return value >= 0 means success. Hence my proposal to change the return value of the ib_map_mr_sg() function if mapping the first page fails. How about the patch below ? Bart. [PATCH] IB core: Fix ib_sg_to_pages() Fix the code for detecting gaps. A gap occurs not only if the second or later scatterlist element is not aligned but also if any scatterlist element other than the last does not end at a page boundary. Ensure that this function returns a negative error code instead of zero if the first set_page() call fails. Fixes: commit 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API") Reported-by: Christoph Hellwig --- drivers/infiniband/core/verbs.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 043a60e..1972c50 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1516,7 +1516,7 @@ EXPORT_SYMBOL(ib_map_mr_sg); * @sg_nents: number of entries in sg * @set_page: driver page assignment function pointer * - * Core service helper for drivers to covert the largest + * Core service helper for drivers to convert the largest * prefix of given sg list to a page vector. The sg list * prefix converted is the prefix that meet the requirements * of ib_map_mr_sg. @@ -1533,7 +1533,7 @@ int ib_sg_to_pages(struct ib_mr *mr, u64 last_end_dma_addr = 0, last_page_addr = 0; unsigned int last_page_off = 0; u64 page_mask = ~((u64)mr->page_size - 1); - int i; + int i, ret; mr->iova = sg_dma_address(&sgl[0]); mr->length = 0; @@ -1544,11 +1544,10 @@ int ib_sg_to_pages(struct ib_mr *mr, u64 end_dma_addr = dma_addr + dma_len; u64 page_addr = dma_addr & page_mask; - if (i && page_addr != dma_addr) { + if (i && (page_addr != dma_addr || last_page_off != 0)) { if (last_end_dma_addr != dma_addr) { /* gap */ - goto done; - + break; } else if (last_page_off + dma_len <= mr->page_size) { /* chunk this fragment with the last */ mr->length += dma_len; @@ -1563,8 +1562,9 @@ int ib_sg_to_pages(struct ib_mr *mr, } do { - if (unlikely(set_page(mr, page_addr))) - goto done; + ret = set_page(mr, page_addr); + if (unlikely(ret < 0)) + return i ? : ret; page_addr += mr->page_size; } while (page_addr < end_dma_addr); @@ -1574,7 +1574,6 @@ int ib_sg_to_pages(struct ib_mr *mr, last_page_off = end_dma_addr & ~page_mask; } -done: return i; } EXPORT_SYMBOL(ib_sg_to_pages); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] IB core: Fix ib_sg_to_pages()
On 12/01/2015 10:32 AM, Sagi Grimberg wrote: Fix the code for detecting gaps and disable the code for chunking. A gap occurs not only if the second or later scatterlist element is not aligned but also if any scatterlist element other than the last does not end at a page boundary. Disable the code for chunking. So can you explain what went wrong with the code? If ib_sg_to_pages() supports chunking, then sg elements are allowed not to end at a page boundary if it is physically contig to the next sg and then the next is chunked. Care to explain how did ib_sg_to_pages mess up? With the upstream implementation, if the previous element ends at a page boundary (last_end_dma_addr == dma_addr) but the current element does not start at a page boundary (page_addr != dma_addr) then the current element is mapped. I think this is wrong because this condition indicates a gap and hence that ib_sg_to_pages() should stop iterating in this case. Ensure that this function returns a negative error code instead of zero if the first set_page() call fails. Umm, my recollection was to make ib_map_mr_sg return the largest prefix mapped. I don't mind a negative error in this case, but isn't zero an implicit error (given you didn't want to map 0 sg elements)? If we do agree on this we need to change ib_map_mr_sg documentation accordingly. How ib_sg_to_pages() reports to its caller that mapping the first scatterlist element failed is not important to me. I included that change in this patch because I noticed that in the upstream SRP initiator does not consider ib_map_mr_sg() returning zero as an error. I think either ib_sg_to_pages() or ib_map_mr_sg() should be modified such that "no pages have been mapped" is handled as a mapping failure. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] IB/srp: Fix indirect data buffer rkey endianness
On 12/01/2015 10:37 AM, Sagi Grimberg wrote: On 01/12/2015 20:18, Bart Van Assche wrote: Detected by sparse. Fixes: commit 330179f2fa93 ("IB/srp: Register the indirect data buffer descriptor") Signed-off-by: Bart Van Assche Cc: stable # v4.3+ Cc: Sagi Grimberg Cc: Christoph Hellwig Cc: Sebastian Parschauer --- drivers/infiniband/ulp/srp/ib_srp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 72fac20..fac1423 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1662,7 +1662,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch, return ret; req->nmdesc++; } else { -idb_rkey = target->global_mr->rkey; +idb_rkey = cpu_to_be32(target->global_mr->rkey); } Wouldn't it make more sense to define idb_rkey to be u32 and change endianness when assigning it to the indirect desc? Hello Sagi, That's possible, but that would cause the endianness of the indirect data buffer rkey to be changed three times - a first time in srp_map_desc(), a second time in srp_map_idb() and a third time in srp_map_data(). Hence the choice to fix the IDB rkey via the above patch. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] IB/srp: Fix srp_map_sg_fr()
On 12/01/2015 10:35 AM, Sagi Grimberg wrote: After dma_map_sg() has been called the return value of that function must be used as the number of elements in the scatterlist instead of scsi_sg_count(). Umm, but ib_map_mr_sg iterates on the sg list. Say you have sg_nents=3 and after mapping you got dma_nents=2 (2 entries are contig) and you pass that, ib_sg_to_pages will only iterate on 2 elements won't it? am I missing something? Hello Sagi, From https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt: With scatterlists, you map a region gathered from several regions by: int i, count = dma_map_sg(dev, sglist, nents, direction); struct scatterlist *sg; for_each_sg(sglist, sg, count, i) { hw_address[i] = sg_dma_address(sg); hw_len[i] = sg_dma_len(sg); } where nents is the number of entries in the sglist. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/6] IB/srp: Fix srp_map_sg_fr()
After dma_map_sg() has been called the return value of that function must be used as the number of elements in the scatterlist instead of scsi_sg_count(). Fixes: commit f7f7aab1a5c0 ("IB/srp: Convert to new registration API") Reported-by: Christoph Hellwig Signed-off-by: Bart Van Assche Cc: stable # v4.4+ Cc: Sagi Grimberg Cc: Sebastian Parschauer --- drivers/infiniband/ulp/srp/ib_srp.c | 19 --- drivers/infiniband/ulp/srp/ib_srp.h | 5 + 2 files changed, 9 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index fac1423..3db9a65 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1313,7 +1313,7 @@ reset_state: } static int srp_map_finish_fr(struct srp_map_state *state, -struct srp_rdma_ch *ch) +struct srp_rdma_ch *ch, int sg_nents) { struct srp_target_port *target = ch->target; struct srp_device *dev = target->srp_host->srp_dev; @@ -1328,10 +1328,10 @@ static int srp_map_finish_fr(struct srp_map_state *state, WARN_ON_ONCE(!dev->use_fast_reg); - if (state->sg_nents == 0) + if (sg_nents == 0) return 0; - if (state->sg_nents == 1 && target->global_mr) { + if (sg_nents == 1 && target->global_mr) { srp_map_desc(state, sg_dma_address(state->sg), sg_dma_len(state->sg), target->global_mr->rkey); @@ -1345,8 +1345,7 @@ static int srp_map_finish_fr(struct srp_map_state *state, rkey = ib_inc_rkey(desc->mr->rkey); ib_update_fast_reg_key(desc->mr, rkey); - n = ib_map_mr_sg(desc->mr, state->sg, state->sg_nents, -dev->mr_page_size); + n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, dev->mr_page_size); if (unlikely(n < 0)) return n; @@ -1452,16 +1451,15 @@ static int srp_map_sg_fr(struct srp_map_state *state, struct srp_rdma_ch *ch, state->fr.next = req->fr_list; state->fr.end = req->fr_list + ch->target->cmd_sg_cnt; state->sg = scat; - state->sg_nents = scsi_sg_count(req->scmnd); - while (state->sg_nents) { + while (count) { int i, n; - n = srp_map_finish_fr(state, ch); + n = srp_map_finish_fr(state, ch, count); if (unlikely(n < 0)) return n; - state->sg_nents -= n; + count -= n; for (i = 0; i < n; i++) state->sg = sg_next(state->sg); } @@ -1521,13 +1519,12 @@ static int srp_map_idb(struct srp_rdma_ch *ch, struct srp_request *req, if (dev->use_fast_reg) { state.sg = idb_sg; - state.sg_nents = 1; sg_set_buf(idb_sg, req->indirect_desc, idb_len); idb_sg->dma_address = req->indirect_dma_addr; /* hack! */ #ifdef CONFIG_NEED_SG_DMA_LENGTH idb_sg->dma_length = idb_sg->length; /* hack^2 */ #endif - ret = srp_map_finish_fr(&state, ch); + ret = srp_map_finish_fr(&state, ch, 1); if (ret < 0) return ret; } else if (dev->use_fmr) { diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index 87a2a91..f6af531 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -300,10 +300,7 @@ struct srp_map_state { dma_addr_t base_dma_addr; u32 dma_len; u32 total_len; - union { - unsigned intnpages; - int sg_nents; - }; + unsigned intnpages; unsigned intnmdesc; unsigned intndesc; }; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/6] IB core: Fix ib_sg_to_pages()
Fix the code for detecting gaps and disable the code for chunking. A gap occurs not only if the second or later scatterlist element is not aligned but also if any scatterlist element other than the last does not end at a page boundary. Disable the code for chunking. Ensure that this function returns a negative error code instead of zero if the first set_page() call fails. Fixes: commit 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API") Reported-by: Christoph Hellwig Signed-off-by: Bart Van Assche Cc: stable # v4.4+ Cc: Sagi Grimberg Cc: Sebastian Parschauer --- drivers/infiniband/core/verbs.c | 31 +++ 1 file changed, 7 insertions(+), 24 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 043a60e..95836c6 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1530,10 +1530,9 @@ int ib_sg_to_pages(struct ib_mr *mr, int (*set_page)(struct ib_mr *, u64)) { struct scatterlist *sg; - u64 last_end_dma_addr = 0, last_page_addr = 0; unsigned int last_page_off = 0; u64 page_mask = ~((u64)mr->page_size - 1); - int i; + int i, ret; mr->iova = sg_dma_address(&sgl[0]); mr->length = 0; @@ -1544,37 +1543,21 @@ int ib_sg_to_pages(struct ib_mr *mr, u64 end_dma_addr = dma_addr + dma_len; u64 page_addr = dma_addr & page_mask; - if (i && page_addr != dma_addr) { - if (last_end_dma_addr != dma_addr) { - /* gap */ - goto done; - - } else if (last_page_off + dma_len <= mr->page_size) { - /* chunk this fragment with the last */ - mr->length += dma_len; - last_end_dma_addr += dma_len; - last_page_off += dma_len; - continue; - } else { - /* map starting from the next page */ - page_addr = last_page_addr + mr->page_size; - dma_len -= mr->page_size - last_page_off; - } - } + /* gap */ + if (i && (last_page_off || page_addr != dma_addr)) + break; do { - if (unlikely(set_page(mr, page_addr))) - goto done; + ret = set_page(mr, page_addr); + if (unlikely(ret < 0)) + return i ? i : ret; page_addr += mr->page_size; } while (page_addr < end_dma_addr); mr->length += dma_len; - last_end_dma_addr = end_dma_addr; - last_page_addr = end_dma_addr & page_mask; last_page_off = end_dma_addr & ~page_mask; } -done: return i; } EXPORT_SYMBOL(ib_sg_to_pages); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/6] IB/srp: Fix indirect data buffer rkey endianness
Detected by sparse. Fixes: commit 330179f2fa93 ("IB/srp: Register the indirect data buffer descriptor") Signed-off-by: Bart Van Assche Cc: stable # v4.3+ Cc: Sagi Grimberg Cc: Christoph Hellwig Cc: Sebastian Parschauer --- drivers/infiniband/ulp/srp/ib_srp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 72fac20..fac1423 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1662,7 +1662,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch, return ret; req->nmdesc++; } else { - idb_rkey = target->global_mr->rkey; + idb_rkey = cpu_to_be32(target->global_mr->rkey); } indirect_hdr->table_desc.va = cpu_to_be64(req->indirect_dma_addr); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/6] IB/srp: Initialize dma_length in srp_map_idb
From: Christoph Hellwig Without this sg_dma_len will return 0 on architectures tha have the dma_length field. Fixes: commit f7f7aab1a5c0 ("IB/srp: Convert to new registration API") Signed-off-by: Christoph Hellwig --- drivers/infiniband/ulp/srp/ib_srp.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index c7a95d2..72fac20 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1524,6 +1524,9 @@ static int srp_map_idb(struct srp_rdma_ch *ch, struct srp_request *req, state.sg_nents = 1; sg_set_buf(idb_sg, req->indirect_desc, idb_len); idb_sg->dma_address = req->indirect_dma_addr; /* hack! */ +#ifdef CONFIG_NEED_SG_DMA_LENGTH + idb_sg->dma_length = idb_sg->length; /* hack^2 */ +#endif ret = srp_map_finish_fr(&state, ch); if (ret < 0) return ret; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/6] IB/srp: Fix a memory leak
If srp_connect_ch() returns a positive value then that is considered by its caller as a connection failure but this does not result in a scsi_host_put() call and additionally causes the srp_create_target() function to return a positive value while it should return a negative value. Avoid all this confusion and additionally fix a memory leak by ensuring that srp_connect_ch() always returns a value that is <= 0. This patch avoids that a rejected login triggers the following memory leak: unreferenced object 0x88021b24a220 (size 8): comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s) hex dump (first 8 bytes): 68 6f 73 74 35 38 00 a5 host58.. backtrace: [] kmemleak_alloc+0x7a/0xc0 [] __kmalloc_track_caller+0xfe/0x160 [] kvasprintf+0x5b/0x90 [] kvasprintf_const+0x8d/0xb0 [] kobject_set_name_vargs+0x3c/0xa0 [] dev_set_name+0x3c/0x40 [] scsi_host_alloc+0x327/0x4b0 [] srp_create_target+0x4e/0x8a0 [ib_srp] [] dev_attr_store+0x1b/0x20 [] sysfs_kf_write+0x4a/0x60 [] kernfs_fop_write+0x14e/0x180 [] __vfs_write+0x2f/0xf0 [] vfs_write+0xa4/0x100 [] SyS_write+0x54/0xc0 [] entry_SYSCALL_64_fastpath+0x12/0x6f Signed-off-by: Bart Van Assche Reviewed-by: Christoph Hellwig Reviewed-by: Sagi Grimberg Cc: Sebastian Parschauer Cc: stable --- drivers/infiniband/ulp/srp/ib_srp.c | 22 +- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 9909022..a074dad 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -994,16 +994,16 @@ static int srp_connect_ch(struct srp_rdma_ch *ch, bool multich) ret = srp_lookup_path(ch); if (ret) - return ret; + goto out; while (1) { init_completion(&ch->done); ret = srp_send_req(ch, multich); if (ret) - return ret; + goto out; ret = wait_for_completion_interruptible(&ch->done); if (ret < 0) - return ret; + goto out; /* * The CM event handling code will set status to @@ -1011,15 +1011,16 @@ static int srp_connect_ch(struct srp_rdma_ch *ch, bool multich) * back, or SRP_DLID_REDIRECT if we get a lid/qp * redirect REJ back. */ - switch (ch->status) { + ret = ch->status; + switch (ret) { case 0: ch->connected = true; - return 0; + goto out; case SRP_PORT_REDIRECT: ret = srp_lookup_path(ch); if (ret) - return ret; + goto out; break; case SRP_DLID_REDIRECT: @@ -1028,13 +1029,16 @@ static int srp_connect_ch(struct srp_rdma_ch *ch, bool multich) case SRP_STALE_CONN: shost_printk(KERN_ERR, target->scsi_host, PFX "giving up on stale connection\n"); - ch->status = -ECONNRESET; - return ch->status; + ret = -ECONNRESET; + goto out; default: - return ch->status; + goto out; } } + +out: + return ret <= 0 ? ret : -ENODEV; } static int srp_inv_rkey(struct srp_rdma_ch *ch, u32 rkey) -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/6] IB/srp: Fix possible send queue overflow
From: Sagi Grimberg When using work request based memory registration (fast_reg) we must reserve SQ entries for registration and invalidation in addition to send operations. Each IO consumes 3 SQ entries (registration, send, invalidation) so we need to allocate 3x larger send-queue instead of 2x. Signed-off-by: Sagi Grimberg CC: Stable --- drivers/infiniband/ulp/srp/ib_srp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index a074dad..c7a95d2 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -488,7 +488,7 @@ static int srp_create_ch_ib(struct srp_rdma_ch *ch) struct ib_qp *qp; struct ib_fmr_pool *fmr_pool = NULL; struct srp_fr_pool *fr_pool = NULL; - const int m = 1 + dev->use_fast_reg; + const int m = dev->use_fast_reg ? 3 : 1; struct ib_cq_init_attr cq_attr = {}; int ret; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/6] SRP initiator related bug fixes
This patch series contains six bug fixes either for the SRP initiator itself or for IB core functionality used by the SRP initiator. The order of these patches matches the order in which the corresponding bugs have been introduced. The patches in this series are: 0001-IB-srp-Fix-a-memory-leak.patch 0002-IB-srp-Fix-possible-send-queue-overflow.patch 0003-IB-srp-Initialize-dma_length-in-srp_map_idb.patch 0004-IB-srp-Fix-indirect-data-buffer-rkey-endianness.patch 0005-IB-core-Fix-ib_sg_to_pages.patch 0006-IB-srp-Fix-srp_map_sg_fr.patch -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] Fix compiler warnings about set-but-not-used variables
Recent gcc versions emit a warning if a variable is set but not used. Suppress these warnings by removing dummy assignments to unused arguments. Additionally, enable the compiler command-line option -Wno-unused-parameter and remove the '(void)arg' statements that thereby became superfluous. Signed-off-by: Bart Van Assche --- cmdif/tools_cif.c| 1 - cmdparser/my_getopt.c| 3 - common/compatibility.h | 7 + flint/Makefile.am| 2 +- flint/subcommands.cpp| 5 - mflash/Makefile.am | 2 +- mflash/mflash.c | 15 +- mlxconfig/Makefile.am| 2 +- mlxconfig/mlxcfg_param_lib.cpp | 19 - mlxconfig/mlxcfg_param_lib.h | 10 +- mlxconfig/mlxcfg_ui.cpp | 1 - mlxfwops/lib/Makefile.am | 2 +- mlxfwops/lib/flint_io.cpp| 5 +- mlxfwops/lib/fs2_ops.cpp | 31 -- mlxfwops/lib/fs3_ops.cpp | 17 +- mlxfwops/lib/fw_ops.cpp | 4 - mtcr_ul/Makefile.am | 2 +- mtcr_ul/mtcr_ib_ofed.c | 1 - mtcr_ul/mtcr_ul.c| 22 +- mtcr_ul/mtcr_ul_icmd_cif.c | 1 - small_utils/mcra.c | 1 - tools_layouts/cibfw_layouts.c| 249 -- tools_layouts/cx4fw_layouts.c| 95 tools_layouts/register_access_open_layouts.c | 49 -- tools_layouts/register_access_sib_layouts.c | 97 tools_layouts/tools_open_layouts.c | 685 --- 26 files changed, 25 insertions(+), 1303 deletions(-) diff --git a/cmdif/tools_cif.c b/cmdif/tools_cif.c index c6afc68..d767ada 100644 --- a/cmdif/tools_cif.c +++ b/cmdif/tools_cif.c @@ -132,7 +132,6 @@ const char* tcif_err2str(MError rc) { MError tcif_cr_mbox_supported(mfile* dev) { #if defined(__FreeBSD__) || defined(UEFI_BUILD) -(void)dev; return ME_NOT_IMPLEMENTED; #else return tools_cmdif_is_cr_mbox_supported(dev); diff --git a/cmdparser/my_getopt.c b/cmdparser/my_getopt.c index be6281d..6e64b35 100755 --- a/cmdparser/my_getopt.c +++ b/cmdparser/my_getopt.c @@ -394,9 +394,6 @@ /* Start processing options with ARGV-element 1 (since ARGV-element 0 is the program name); the sequence of previously skipped non-option ARGV-elements is empty. */ - // avoid compiler warnings - (void) argc; - (void) argv; first_nonopt = last_nonopt = tools_optind; nextchar = NULL; diff --git a/common/compatibility.h b/common/compatibility.h index 8dc85e3..8f3b4d5 100644 --- a/common/compatibility.h +++ b/common/compatibility.h @@ -215,6 +215,13 @@ #endif /* + * Windows (Microsoft C Compiler) + */ +#ifdef _MSC_VER +#pragma warning (disable: 4100) // 'identifier' : unreferenced formal parameter +#endif + +/* * Windows (CYGWIN) */ #if defined(__CYGWIN32__) diff --git a/flint/Makefile.am b/flint/Makefile.am index 96bb6ad..5ae5ee5 100755 --- a/flint/Makefile.am +++ b/flint/Makefile.am @@ -41,7 +41,7 @@ CMDIF_DIR = $(top_srcdir)/cmdif AM_CPPFLAGS = -I$(top_srcdir) -I$(srcdir) -I$(MTCR_DIR) -I$(MFLASH_DIR) -I$(COMMON_DIR) \ -I$(LAYOUTS_DIR) -I$(MFT_UTILS_DIR) -mstflint_CXXFLAGS = -Wall -W -g -MP -MD -pipe -DEXTERNAL +mstflint_CXXFLAGS = -Wall -Wextra -Wno-unused-parameter -g -MP -MD -pipe -DEXTERNAL bin_PROGRAMS = mstflint mstflint_SOURCES = flint.cpp flint.h subcommands.cpp subcommands.h\ diff --git a/flint/subcommands.cpp b/flint/subcommands.cpp index 91400a5..5b60334 100644 --- a/flint/subcommands.cpp +++ b/flint/subcommands.cpp @@ -981,10 +981,6 @@ bool SubCommand::unzipDataFile (std::vector data, std::vector& data, con bool SubCommand::checkGuidsFlags (chip_type_t ct, u_int16_t devType, u_int8_t fwType, bool guidsSpecified, bool macsSpecified, bool uidsSpecified, bool ibDev, bool ethDev) { -(void)ibDev; if (guidsSpecified || macsSpecified || uidsSpecified) { if (ct == CT_BRIDGEX) { if (macsSpecified || guidsSpecified) { diff --git a/mflash/Makefile.am b/mflash/Makefile.am index c45351a..12da860 100644 --- a/mflash/Makefile.am +++ b/mflash/Makefile.am @@ -34,7 +34,7 @@ AM_CPPFLAGS= -I. -I$(top_srcdir)/include/mtcr_ul -I$(top_srcdir)/common -I$(top_srcdir)/tools_layouts -I$(top_srcdir)/reg_access \ -I$(top_srcdir)/cmdif -I$(top_srcdir)/tools_res_mgmt -AM_CFLAGS = -MD -pipe -Wall -W -DMST_UL -g ${MFLASH_INBAND_FLAG} +AM_CFLAGS = -MD -pipe -Wall -Wextra -Wno-unused-parameter -DMST_UL -g ${MFLASH_INBAND_FLAG} noinst_LIBRARIES = libmflash.a diff --git a/mflash/mflash.c b/mflash/mflash.c index c039127..4174f39 100644 --- a/mflash/mflash.c +++ b/mflash/mflash.c @@ -100,9 +100,7 @@ #incl
[PATCH 3/4] crd_read_line(): Rework code to suppress a compiler warning
Avoid that gcc prints a warning about not checking the result of fgets(). Signed-off-by: Bart Van Assche --- mstdump/crd_lib/crdump.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mstdump/crd_lib/crdump.c b/mstdump/crd_lib/crdump.c index 451a944..3e3c4bc 100755 --- a/mstdump/crd_lib/crdump.c +++ b/mstdump/crd_lib/crdump.c @@ -504,8 +504,8 @@ static int crd_read_line(IN FILE *fd, OUT char *tmp) { if (!feof(fd)) { int c = fgetc(fd); if (c == '#') { -char* _ptr=fgets (tmp, 300, fd); -(void)_ptr;//avoid warning +if (fgets(tmp, 300, fd)) { +} tmp[0] = 0; continue; } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] Fix several automake warnings
Avoid that automake reports the following warning messages: configure.ac:13: warning: AM_INIT_AUTOMAKE: two- and three-arguments forms are deprecated. For more info, see: configure.ac:13: http://www.gnu.org/software/automake/manual/automake.html#Modernize-AM_005fINIT_005fAUTOMAKE-invocation small_utils/Makefile.am:50: warning: compiling 'mtserver.c' with per-target flags requires 'AM_PROG_CC_C_O' in 'configure.ac' warning: 'libdev_mgt.a': linking libraries using a non-POSIX archiver requires 'AM_PROG_AR' in 'configure.ac' warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS') Signed-off-by: Bart Van Assche --- cmdif/Makefile.am| 2 +- cmdparser/Makefile.am| 2 -- configure.ac | 4 +++- dev_mgt/Makefile.am | 2 +- flint/Makefile.am| 2 +- mflash/Makefile.am | 2 +- mft_utils/Makefile.am| 2 +- mlxconfig/Makefile.am| 2 +- mlxfwops/lib/Makefile.am | 2 +- mstdump/crd_lib/Makefile.am | 2 +- mstdump/crd_main/Makefile.am | 2 +- mtcr_ul/Makefile.am | 2 +- reg_access/Makefile.am | 2 +- small_utils/Makefile.am | 2 +- tools_layouts/Makefile.am| 2 +- tools_res_mgmt/Makefile.am | 2 +- 16 files changed, 17 insertions(+), 17 deletions(-) diff --git a/cmdif/Makefile.am b/cmdif/Makefile.am index 78aba47..8c03fe9 100644 --- a/cmdif/Makefile.am +++ b/cmdif/Makefile.am @@ -34,7 +34,7 @@ USER_DIR = $(top_srcdir) MTCR_DIR = $(USER_DIR)/include/mtcr_ul TOOLS_LAYOUTS_DIR = $(USER_DIR)/tools_layouts -INCLUDES = -I. -I../common -I../tools_layouts -I$(MTCR_DIR) -I.. -I$(USER_DIR)/mtcr_ul +AM_CPPFLAGS = -I. -I../common -I../tools_layouts -I$(MTCR_DIR) -I.. -I$(USER_DIR)/mtcr_ul AM_CFLAGS = -W -Wall -Werror -g -MP -MD $(COMPILER_FPIC) -DCMDIF_EXPORTS CMDIF_VERSION = 1 diff --git a/cmdparser/Makefile.am b/cmdparser/Makefile.am index 9696f71..952a074 100644 --- a/cmdparser/Makefile.am +++ b/cmdparser/Makefile.am @@ -32,8 +32,6 @@ # Makefile.am -- Process this file with automake to produce Makefile.in -INCLUDES = - AM_CPPFLAGS = -W -g -MP -MD -fPIC noinst_LIBRARIES = libcmdparser.a diff --git a/configure.ac b/configure.ac index be0f9d1..04fb61d 100644 --- a/configure.ac +++ b/configure.ac @@ -10,11 +10,13 @@ AC_SUBST([VERSION]) AC_CONFIG_AUX_DIR(config) AC_CONFIG_SRCDIR([README]) -AM_INIT_AUTOMAKE(mstflint, 4.1.0) +AM_INIT_AUTOMAKE([-Wall foreign]) dnl Checks for programs AC_PROG_CC +AM_PROG_CC_C_O AC_PROG_CXX +AM_PROG_AR AC_PROG_LIBTOOL AC_CONFIG_HEADERS( config.h ) diff --git a/dev_mgt/Makefile.am b/dev_mgt/Makefile.am index 703a27a..147f7aa 100644 --- a/dev_mgt/Makefile.am +++ b/dev_mgt/Makefile.am @@ -32,7 +32,7 @@ # Makefile.am -- Process this file with automake to produce Makefile.in -INCLUDES = -I$(srcdir) -I$(top_srcdir) -I$(top_srcdir)/common -I$(top_srcdir)/include/mtcr_ul +AM_CPPFLAGS = -I$(srcdir) -I$(top_srcdir) -I$(top_srcdir)/common -I$(top_srcdir)/include/mtcr_ul AM_CFLAGS = -W -Wall -g -MP -MD -Wswitch-enum $(COMPILER_FPIC) -DMTCR_EXPORT noinst_LIBRARIES = libdev_mgt.a diff --git a/flint/Makefile.am b/flint/Makefile.am index f001515..96bb6ad 100755 --- a/flint/Makefile.am +++ b/flint/Makefile.am @@ -38,7 +38,7 @@ LAYOUTS_DIR = $(top_srcdir)/tools_layouts MFT_UTILS_DIR = $(top_srcdir)/mft_utils CMDIF_DIR = $(top_srcdir)/cmdif -INCLUDES = -I$(top_srcdir) -I$(srcdir) -I$(MTCR_DIR) -I$(MFLASH_DIR) -I$(COMMON_DIR) \ +AM_CPPFLAGS = -I$(top_srcdir) -I$(srcdir) -I$(MTCR_DIR) -I$(MFLASH_DIR) -I$(COMMON_DIR) \ -I$(LAYOUTS_DIR) -I$(MFT_UTILS_DIR) mstflint_CXXFLAGS = -Wall -W -g -MP -MD -pipe -DEXTERNAL diff --git a/mflash/Makefile.am b/mflash/Makefile.am index 950cabd..c45351a 100644 --- a/mflash/Makefile.am +++ b/mflash/Makefile.am @@ -31,7 +31,7 @@ #-- # Makefile.am -- Process this file with automake to produce Makefile.in -INCLUDES= -I. -I$(top_srcdir)/include/mtcr_ul -I$(top_srcdir)/common -I$(top_srcdir)/tools_layouts -I$(top_srcdir)/reg_access \ +AM_CPPFLAGS= -I. -I$(top_srcdir)/include/mtcr_ul -I$(top_srcdir)/common -I$(top_srcdir)/tools_layouts -I$(top_srcdir)/reg_access \ -I$(top_srcdir)/cmdif -I$(top_srcdir)/tools_res_mgmt AM_CFLAGS = -MD -pipe -Wall -W -DMST_UL -g ${MFLASH_INBAND_FLAG} diff --git a/mft_utils/Makefile.am b/mft_utils/Makefile.am index ad9ad8e..ebb3bf3 100644 --- a/mft_utils/Makefile.am +++ b/mft_utils/Makefile.am @@ -32,7 +32,7 @@ # Makefile.am -- Process this file with automake to produce Makefile.in USER_DIR = $(top_srcdir) -INCLUDES = -I. -I$(USER_DIR)/common +AM_CPPFLAGS = -I. -I$(USER_DIR)/common AM_CFLAGS = -MD -pipe -Wall -W -Werror diff --git a/mlxconfig/Makefile.am b/mlxconfig/Makefile.am index f4ce674..2b2fd3e 100755 --- a/mlxconfig/Makefile.am +++ b/mlxconfig/Makefile.am @@ -42,7 +42,7 @@ UTILS_LIB = $(USER_DIR)/mft_utils/libmftutils.a CMDIF_DIR = $(USER_DIR)
[PATCH 2/4] Makefile.am: Remove the undefined variable MFT_EXT_LIBS_INC_DIR
This avoids that the option sequence "-I -I" is passed to the compiler, a sequence that causes weird error messages to be printed. Signed-off-by: Bart Van Assche --- mlxconfig/Makefile.am| 2 +- mlxfwops/lib/Makefile.am | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/mlxconfig/Makefile.am b/mlxconfig/Makefile.am index 2b2fd3e..c9189aa 100755 --- a/mlxconfig/Makefile.am +++ b/mlxconfig/Makefile.am @@ -43,7 +43,7 @@ CMDIF_DIR = $(USER_DIR)/cmdif AM_CPPFLAGS = -I. -I$(USER_DIR) -I$(top_srcdir)/include/mtcr_ul -I$(MTCR_DIR) -I$(COMMON_DIR) $(WIN64_INC)\ - -I$(MFT_EXT_LIBS_INC_DIR) -I $(LAYOUTS_DIR) -I$(MFT_EXT_LIBS_INC_DIR)/zlib -I $(UTILS_DIR) -I$(DEV_MGT_DIR) -I$(CMDIF_DIR) + -I $(LAYOUTS_DIR) -I $(UTILS_DIR) -I$(DEV_MGT_DIR) -I$(CMDIF_DIR) AM_CXXFLAGS = -Wall -W -g -MP -MD -pipe bin_PROGRAMS = mstconfig diff --git a/mlxfwops/lib/Makefile.am b/mlxfwops/lib/Makefile.am index 8690345..c04d333 100755 --- a/mlxfwops/lib/Makefile.am +++ b/mlxfwops/lib/Makefile.am @@ -41,7 +41,7 @@ UTILS_LIB = $(top_srcdir)/mft_utils UEFI_COMMON_DIR = $(top_srcdir)/mlxfwops/uefi_c AM_CPPFLAGS = -I$(srcdir) -I$(MTCR_INC_DIR) -I$(MFLASH_DIR) -I$(top_srcdir)/ext_libs/json -I$(MINIXZ_DIR)\ - -I$(COMMON_DIR) -I$(MFT_EXT_LIBS_INC_DIR)/zlib -I $(LAYOUTS_DIR) -I$(top_srcdir)/common -I$(UTILS_LIB) -I$(UEFI_COMMON_DIR) + -I$(COMMON_DIR) -I $(LAYOUTS_DIR) -I$(top_srcdir)/common -I$(UTILS_LIB) -I$(UEFI_COMMON_DIR) MLXFWOPS_VERSION = 1 -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] mstflint source code cleanup patches
These four patches bring the mstflint coding style closer to that of the other RDMA user space software. The patches in this series are: 0001-Fix-several-automake-warnings.patch 0002-Makefile.am-Remove-the-undefined-variable-MFT_EXT_LI.patch 0003-crd_read_line-Rework-code-to-suppress-a-compiler-war.patch 0004-Fix-compiler-warnings-about-set-but-not-used-variabl.patch -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: srp state in current mainline
On 11/25/2015 11:34 AM, Bart Van Assche wrote: > On 11/23/2015 05:14 PM, Bart Van Assche wrote: >> On 11/22/2015 07:31 AM, Christoph Hellwig wrote: >>> On Sun, Nov 22, 2015 at 05:26:28PM +0200, Sagi Grimberg wrote: >>>>> No. register_always=Y is already broken in 4.3, but >>>>> register_always=N is >>>>> now also broken in 4.4. >>>> >>>> OK, I'm confused so please let me understand slowly :) >>>> >>>> Your patch "ib_srp: initialize dma_length in srp_map_idb" solves >>>> the register_always=Y dma_length = 0 WARN_ON() on 4.4-rc, does it solve >>>> the data integrity errors too? >>> >>> No, it doesn't. >>> >>>> This code path is specific to srp because all other ULPs guarantee >>>> no-gaps... >>> >>> Yes. Life would be simpler if we could just set the virt_boundary >>> on SRP, and Bart has indicated that he's willing to at least looks at >>> this for the next merge window. >> >> Hello Christoph, >> >> Tomorrow I will try to reproduce this behavior on my test setup. I >> prepared a setup with kernel v4.4-rc2 and on which the SRP initiator and >> target are running on the same server. Tomorrow I will install xfstests >> and see whether these tests pass fine against an XFS filesystem. > > (replying to my own e-mail) > > I can reproduce this behavior with both LIO and SCST. I have modified > initiator and target code such that the target appends a data CRC at the > end of the SRP_RSP IU and such that the initiator checks that CRC. A CRC > mismatch was reported for the following SG-list by the initiator: > > scsi host10: srp_check_crc: bufflen 1024; resid 0; sg-list len 2; dir > DMA_TO_DEVICE; CRC mismatch (0x7916620b <> 0xde97b796); sg-list: > [0] 880407378348 len 512 > [1] 880407378000 len 512 > > I will check the memory registration code next. (again replying to my own e-mail) The patch below helps somewhat but is not sufficient to make the XFS tests pass. On Monday I will continue to work on this. Bart. [PATCH] IB core: Fix ib_sg_to_pages() Fix the code for detecting gaps and disable the code for chunking. Fixes: "IB/core: Introduce new fast registration API" (commit 4c67e2bfc8b7) Signed-off-by: Bart Van Assche --- drivers/infiniband/core/verbs.c | 23 +++ 1 file changed, 3 insertions(+), 20 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 043a60e..3f9c820 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1530,7 +1530,6 @@ int ib_sg_to_pages(struct ib_mr *mr, int (*set_page)(struct ib_mr *, u64)) { struct scatterlist *sg; - u64 last_end_dma_addr = 0, last_page_addr = 0; unsigned int last_page_off = 0; u64 page_mask = ~((u64)mr->page_size - 1); int i; @@ -1544,23 +1543,9 @@ int ib_sg_to_pages(struct ib_mr *mr, u64 end_dma_addr = dma_addr + dma_len; u64 page_addr = dma_addr & page_mask; - if (i && page_addr != dma_addr) { - if (last_end_dma_addr != dma_addr) { - /* gap */ - goto done; - - } else if (last_page_off + dma_len <= mr->page_size) { - /* chunk this fragment with the last */ - mr->length += dma_len; - last_end_dma_addr += dma_len; - last_page_off += dma_len; - continue; - } else { - /* map starting from the next page */ - page_addr = last_page_addr + mr->page_size; - dma_len -= mr->page_size - last_page_off; - } - } + /* gap */ + if (i && (last_page_off || page_addr != dma_addr)) + goto done; do { if (unlikely(set_page(mr, page_addr))) @@ -1569,8 +1554,6 @@ int ib_sg_to_pages(struct ib_mr *mr, } while (page_addr < end_dma_addr); mr->length += dma_len; - last_end_dma_addr = end_dma_addr; - last_page_addr = end_dma_addr & page_mask; last_page_off = end_dma_addr & ~page_mask; } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: srp state in current mainline
On 11/23/2015 05:14 PM, Bart Van Assche wrote: On 11/22/2015 07:31 AM, Christoph Hellwig wrote: On Sun, Nov 22, 2015 at 05:26:28PM +0200, Sagi Grimberg wrote: No. register_always=Y is already broken in 4.3, but register_always=N is now also broken in 4.4. OK, I'm confused so please let me understand slowly :) Your patch "ib_srp: initialize dma_length in srp_map_idb" solves the register_always=Y dma_length = 0 WARN_ON() on 4.4-rc, does it solve the data integrity errors too? No, it doesn't. This code path is specific to srp because all other ULPs guarantee no-gaps... Yes. Life would be simpler if we could just set the virt_boundary on SRP, and Bart has indicated that he's willing to at least looks at this for the next merge window. Hello Christoph, Tomorrow I will try to reproduce this behavior on my test setup. I prepared a setup with kernel v4.4-rc2 and on which the SRP initiator and target are running on the same server. Tomorrow I will install xfstests and see whether these tests pass fine against an XFS filesystem. (replying to my own e-mail) I can reproduce this behavior with both LIO and SCST. I have modified initiator and target code such that the target appends a data CRC at the end of the SRP_RSP IU and such that the initiator checks that CRC. A CRC mismatch was reported for the following SG-list by the initiator: scsi host10: srp_check_crc: bufflen 1024; resid 0; sg-list len 2; dir DMA_TO_DEVICE; CRC mismatch (0x7916620b <> 0xde97b796); sg-list: [0] 880407378348 len 512 [1] 880407378000 len 512 I will check the memory registration code next. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: srp state in current mainline
On 11/22/2015 07:31 AM, Christoph Hellwig wrote: On Sun, Nov 22, 2015 at 05:26:28PM +0200, Sagi Grimberg wrote: No. register_always=Y is already broken in 4.3, but register_always=N is now also broken in 4.4. OK, I'm confused so please let me understand slowly :) Your patch "ib_srp: initialize dma_length in srp_map_idb" solves the register_always=Y dma_length = 0 WARN_ON() on 4.4-rc, does it solve the data integrity errors too? No, it doesn't. This code path is specific to srp because all other ULPs guarantee no-gaps... Yes. Life would be simpler if we could just set the virt_boundary on SRP, and Bart has indicated that he's willing to at least looks at this for the next merge window. Hello Christoph, Tomorrow I will try to reproduce this behavior on my test setup. I prepared a setup with kernel v4.4-rc2 and on which the SRP initiator and target are running on the same server. Tomorrow I will install xfstests and see whether these tests pass fine against an XFS filesystem. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/srp: Fix possible send queue overflow
On 11/22/2015 05:37 AM, Christoph Hellwig wrote: On Tue, Nov 10, 2015 at 12:35:05PM +0200, Sagi Grimberg wrote: Are you planning to pick this up? Note that this patch is stable material as well. Doug? any plans for this patch? We should really get this in an into -stable. Bart, can you resend the series and if it doesn't get picked up next week send it directly to Linus? Hi Christoph, Are you referring to Sagi's patch only or also to my patch series that fixes error handling ? Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/9] IB: add a proper completion queue abstraction
On 11/23/2015 02:18 PM, Jason Gunthorpe wrote: On Mon, Nov 23, 2015 at 01:54:05PM -0800, Bart Van Assche wrote: What I don't see is how SRP handles things when the sendq fills up, ie the case where __srp_get_tx_iu() == NULL. It looks like the driver starts to panic and generates printks. I can't tell if it can survive that, but it doesn't look very good.. Hello Jason, From srp_cm_rep_handler(): target->scsi_host->can_queue = min(ch->req_lim - SRP_TSK_MGMT_SQ_SIZE, target->scsi_host->can_queue); In other words, the SCSI core is told to ensure that the number of outstanding SCSI commands is one less than the number of elements in the ch->free_tx list. And since the SRP initiator serializes task management requests it is guaranteed that __srp_get_tx_iu() won't fail due to ch->free_tx being empty. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/9] IB: add a proper completion queue abstraction
On 11/23/2015 01:28 PM, Jason Gunthorpe wrote: On Mon, Nov 23, 2015 at 01:04:25PM -0800, Bart Van Assche wrote: Considerable time ago the send queue in the SRP initiator driver was modified from signaled to non-signaled to reduce the number of interrupts triggered by the SRP initiator driver. The SRP initiator driver polls the send queue every time before a SCSI command is sent to the target. I think this is a pattern that is also useful for other ULP's so I'm not convinced that ib_process_cq_direct() should be deprecated :-) As I explained, that is a fine idea, but I can't see how SRP is able to correctly do sendq flow control without spinning on the poll, which it does not do. I'm guessing SRP is trying to drive sendq flow control from the recv side, like NFS was. This is wrong and should not be part of the common API. Does that make sense? Not really ... Please have a look at the SRP initiator source code. What the SRP initiator does is to poll the send queue before sending a new SCSI command to the target system starts. I think this approach could also be used in other ULP drivers if the send queue poll frequency is such that no send queue overflow occurs. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/9] IB: add a proper completion queue abstraction
On 11/23/2015 12:37 PM, Jason Gunthorpe wrote: On Sat, Nov 14, 2015 at 08:13:44AM +0100, Christoph Hellwig wrote: On Fri, Nov 13, 2015 at 03:06:36PM -0700, Jason Gunthorpe wrote: Looking at that thread and then at the patch a bit more.. +void ib_process_cq_direct(struct ib_cq *cq) [..] + __ib_process_cq(cq, INT_MAX); INT_MAX is not enough, it needs to loop. This is missing a ib_req_notify also. No. Direct cases _never_ calls ib_req_notify. Its for the case where the SRP case polls the send CQ only from the same context it sends for without any interrupt notification at al. Hurm, okay, that is not at all what I was thinking this was for.. So the only use of this function is to drain a send cq, in a state where it is guarenteed no new entries can be added, and only if the cq is not already event driven. I'd stick those notes in the comment.. Hum. I wonder if this is even a reasonable way to run a ULP. It is important that rx completions are not used to drive reaping of resources that are still committed to the send queue. ie do not trigger send buffer reuse based on a rx completion. So, if a ULP uses this API, how does it handle the sendq becoming full? As above, a ULP cannot use recvs to infer available sendq space. It must directly reap the sendq. So a correct ULP would have to spin calling ib_process_direct_cq until it makes enough progress to add more things to the sendq. I don't obviously see that in SRP - so I'm guessing it has buggered up sendq flow control? NFS had similar problems lately too, I wrote a long explanation to Chuck on this subject. That said, the demand poll almost seems like a reasonable way for a ULP to run the sendq, do the polls on send occasionally or when more space is needed to better amortize the reaping overhead at the cost of send latency. But API wise it needs to be able to switch over to a sleep if enough progress hasn't been made. So.. maybe also add to the comment that ib_process_cq_direct is deprecated and should not be used in new code until SRP gets sorted? Hello Jason, Considerable time ago the send queue in the SRP initiator driver was modified from signaled to non-signaled to reduce the number of interrupts triggered by the SRP initiator driver. The SRP initiator driver polls the send queue every time before a SCSI command is sent to the target. I think this is a pattern that is also useful for other ULP's so I'm not convinced that ib_process_cq_direct() should be deprecated :-) Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/9] IB: add a proper completion queue abstraction
On 11/22/15 06:57, Sagi Grimberg wrote: I think that bart wants to allow the caller to select cpu affinity per CQ. In this case ib_alloc_cq in workqueue mode would need to accept a affinity_hint from the caller (default to wild-card WORK_CPU_UNBOUND). Hmm, true. How would be set that hint from userspace? I'd really prefer to see a practical justification for it first. In order to assign CPUs from user-space we'd need an ethtool like interface for isert/srpt/. Given that this is something we don't want to get into right now, I assumed that Bart meant that srpt would take a "least used" approach from srpt driver (which isn't better taking the wild-card option I'd say), So I'll let Bart answer... Hello Christoph and Sagi, My intention is indeed to allow to control CPU affinity per CQ. One use case is to implement a least-used policy in RDMA drivers that use multiple completion queues. Another use case is to make CPU affinity configurable from user space through something similar to ethtool or via sysfs. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] IB/cma: Add a missing rcu_read_unlock()
Ensure that validate_ipv4_net_dev() calls rcu_read_unlock() if fib_lookup() fails. Detected by sparse. Compile-tested only. Fixes: "IB/cma: Validate routing of incoming requests" (commit f887f2ac87c2). Cc: Haggai Eran Cc: stable --- drivers/infiniband/core/cma.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 69e5477..f8dfc63 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1126,10 +1126,7 @@ static bool validate_ipv4_net_dev(struct net_device *net_dev, rcu_read_lock(); err = fib_lookup(dev_net(net_dev), &fl4, &res, 0); - if (err) - return false; - - ret = FIB_RES_DEV(res) == net_dev; + ret = err == 0 && FIB_RES_DEV(res) == net_dev; rcu_read_unlock(); return ret; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/9] IB: add a proper completion queue abstraction
On 11/20/2015 02:16 AM, Christoph Hellwig wrote: > On Wed, Nov 18, 2015 at 10:20:14AM -0800, Bart Van Assche wrote: >> Are you perhaps referring to the sysfs CPU mask that allows to control >> workqueue affinity ? > > I think he is referring to the defintion of WQ_UNBOUND: > >WQ_UNBOUND > > Work items queued to an unbound wq are served by the special > woker-pools which host workers which are not bound to any > specific CPU. This makes the wq behave as a simple execution > context provider without concurrency management. The unbound > worker-pools try to start execution of work items as soon as > possible. Unbound wq sacrifices locality but is useful for > the following cases. > > * Wide fluctuation in the concurrency level requirement is > expected and using bound wq may end up creating large number > of mostly unused workers across different CPUs as the issuer > hops through different CPUs. > > * Long running CPU intensive workloads which can be better > managed by the system scheduler. Hello Christoph, The comment about locality in the above quote is interesting. How about modifying patch 2/9 as indicated below ? The modification below does not change the behavior of this patch if ib_cq.w.cpu is not modified. And it allows users who care about locality and who want to skip the scheduler overhead by setting ib_cq.w.cpu to the index of the CPU they want the work to be processed on. Thanks, Bart. --- drivers/infiniband/core/cq.c | 11 ++- include/rdma/ib_verbs.h | 5 - 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c index bf2a079..4d80d8c 100644 --- a/drivers/infiniband/core/cq.c +++ b/drivers/infiniband/core/cq.c @@ -94,18 +94,18 @@ static void ib_cq_completion_softirq(struct ib_cq *cq, void *private) static void ib_cq_poll_work(struct work_struct *work) { - struct ib_cq *cq = container_of(work, struct ib_cq, work); + struct ib_cq *cq = container_of(work, struct ib_cq, w.work); int completed; completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE); if (completed >= IB_POLL_BUDGET_WORKQUEUE || ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0) - queue_work(ib_comp_wq, &cq->work); + queue_work_on(cq->w.cpu, ib_comp_wq, &cq->w.work); } static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private) { - queue_work(ib_comp_wq, &cq->work); + queue_work_on(cq->w.cpu, ib_comp_wq, &cq->w.work); } /** @@ -159,7 +159,8 @@ struct ib_cq *ib_alloc_cq(struct ib_device *dev, void *private, break; case IB_POLL_WORKQUEUE: cq->comp_handler = ib_cq_completion_workqueue; - INIT_WORK(&cq->work, ib_cq_poll_work); + INIT_WORK(&cq->w.work, ib_cq_poll_work); + cq->w.cpu = WORK_CPU_UNBOUND; ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); break; default: @@ -195,7 +196,7 @@ void ib_free_cq(struct ib_cq *cq) irq_poll_disable(&cq->iop); break; case IB_POLL_WORKQUEUE: - flush_work(&cq->work); + flush_work(&cq->w.work); break; default: WARN_ON_ONCE(1); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index f59a8d3..b1344f8 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1291,7 +1291,10 @@ struct ib_cq { struct ib_wc*wc; union { struct irq_poll iop; - struct work_struct work; + struct { + struct work_struct work; + int cpu; + } w; }; }; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] IB/srp: Fix a memory leak
If srp_connect_ch() returns a positive value then that is considered by its caller as a connection failure but this does not result in a scsi_host_put() call and additionally causes the srp_create_target() function to return a positive value while it should return a negative value. Avoid all this confusion and additionally fix a memory leak by ensuring that srp_connect_ch() always returns a value that is <= 0. This patch avoids that a rejected login triggers the following memory leak: unreferenced object 0x88021b24a220 (size 8): comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s) hex dump (first 8 bytes): 68 6f 73 74 35 38 00 a5 host58.. backtrace: [] kmemleak_alloc+0x7a/0xc0 [] __kmalloc_track_caller+0xfe/0x160 [] kvasprintf+0x5b/0x90 [] kvasprintf_const+0x8d/0xb0 [] kobject_set_name_vargs+0x3c/0xa0 [] dev_set_name+0x3c/0x40 [] scsi_host_alloc+0x327/0x4b0 [] srp_create_target+0x4e/0x8a0 [ib_srp] [] dev_attr_store+0x1b/0x20 [] sysfs_kf_write+0x4a/0x60 [] kernfs_fop_write+0x14e/0x180 [] __vfs_write+0x2f/0xf0 [] vfs_write+0xa4/0x100 [] SyS_write+0x54/0xc0 [] entry_SYSCALL_64_fastpath+0x12/0x6f Signed-off-by: Bart Van Assche Cc: Sagi Grimberg Cc: Sebastian Parschauer Cc: Christoph Hellwig Cc: stable --- drivers/infiniband/ulp/srp/ib_srp.c | 22 +- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index eda427f8..3f4786a 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -993,16 +993,16 @@ static int srp_connect_ch(struct srp_rdma_ch *ch, bool multich) ret = srp_lookup_path(ch); if (ret) - return ret; + goto out; while (1) { init_completion(&ch->done); ret = srp_send_req(ch, multich); if (ret) - return ret; + goto out; ret = wait_for_completion_interruptible(&ch->done); if (ret < 0) - return ret; + goto out; /* * The CM event handling code will set status to @@ -1010,15 +1010,16 @@ static int srp_connect_ch(struct srp_rdma_ch *ch, bool multich) * back, or SRP_DLID_REDIRECT if we get a lid/qp * redirect REJ back. */ - switch (ch->status) { + ret = ch->status; + switch (ret) { case 0: ch->connected = true; - return 0; + goto out; case SRP_PORT_REDIRECT: ret = srp_lookup_path(ch); if (ret) - return ret; + goto out; break; case SRP_DLID_REDIRECT: @@ -1027,13 +1028,16 @@ static int srp_connect_ch(struct srp_rdma_ch *ch, bool multich) case SRP_STALE_CONN: shost_printk(KERN_ERR, target->scsi_host, PFX "giving up on stale connection\n"); - ch->status = -ECONNRESET; - return ch->status; + ret = -ECONNRESET; + goto out; default: - return ch->status; + goto out; } } + +out: + return ret <= 0 ? ret : -ENODEV; } static int srp_inv_rkey(struct srp_rdma_ch *ch, u32 rkey, u32 send_flags) -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/9] IB: add a proper completion queue abstraction
On 11/17/2015 11:55 PM, Sagi Grimberg wrote: +static void ib_cq_poll_work(struct work_struct *work) +{ +struct ib_cq *cq = container_of(work, struct ib_cq, work); +int completed; + +completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE); +if (completed >= IB_POLL_BUDGET_WORKQUEUE || +ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0) +queue_work(ib_comp_wq, &cq->work); +} + +static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private) +{ +queue_work(ib_comp_wq, &cq->work); +} The above code will cause all polling to occur on the context of the CPU that received the completion interrupt. This approach is not powerful enough. For certain workloads throughput is higher if work completions are processed by another CPU core on the same CPU socket. Has it been considered to make the CPU core on which work completions are processed configurable ? The workqueue is unbound. This means that the functionality you are you are asking for exists. Hello Sagi, Are you perhaps referring to the sysfs CPU mask that allows to control workqueue affinity ? I expect that setting the CPU mask for an entire pool through sysfs will lead to suboptimal results. What I have learned by tuning target systems is that there is a significant performance difference (> 30% IOPS) between a configuration where each completion thread is pinned to exactly one CPU compared to allowing the scheduler to choose a CPU. Controlling the CPU affinity of worker threads with the taskset command is not possible since the function create_worker() in kernel/workqueue.c calls kthread_bind_mask(). That function sets PF_NO_SETAFFINITY. From sched.h: #define PF_NO_SETAFFINITY 0x0400 /* Userland is not allowed to meddle with cpus_allowed */ Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/9] srpt: chain RDMA READ/WRITE requests
On 11/18/2015 01:15 AM, Sagi Grimberg wrote: On 18/11/2015 03:17, Bart Van Assche wrote: On 11/13/2015 05:46 AM, Christoph Hellwig wrote: -ret = ib_post_send(ch->qp, &wr.wr, &bad_wr); -if (ret) -break; +if (i == n_rdma - 1) { +/* only get completion event for the last rdma read */ +if (dir == DMA_TO_DEVICE) +wr->wr.send_flags = IB_SEND_SIGNALED; +wr->wr.next = NULL; +} else { +wr->wr.next = &ioctx->rdma_ius[i + 1].wr; +} } +ret = ib_post_send(ch->qp, &ioctx->rdma_ius->wr, &bad_wr); if (ret) pr_err("%s[%d]: ib_post_send() returned %d for %d/%d\n", __func__, __LINE__, ret, i, n_rdma); Hello Christoph, Hi Bart, Chaining RDMA requests is a great idea. But it seems to me that this patch is based on the assumption that posting multiple RDMA requests either succeeds as a whole or fails as a whole. Sorry but I'm not sure that the verbs API guarantees this. In the ib_srpt driver a QP can be changed at any time into the error state and there might be drivers that report an immediate failure in that case. I'm not so sure it actually matters if some WRs succeeded. In the normal flow when srpt has enough available work requests (sq_wr_avail) they should all succeed otherwise we're in trouble. If the QP transitioned to ERROR state, then some failed, but those that succeeded will generate flush completions, and srpt should handle it correctly shouldn't it? I think even when chaining RDMA requests that we still need a mechanism to wait until ongoing RDMA transfers have finished if some but not all RDMA requests have been posted. I'm not an expert on srpt, can you explain how this mechanism will help? Hello Sagi, As you know events like a cable pull can cause some of the RDMA work requests to succeed and others to fail. It is essential that all RDMA work requests related to the same SCSI command have finished before the buffers these requests operate upon are reused. The purpose of the SRPT_RDMA_ABORT request is to wait for the RDMA requests that were posted without IB_SEND_SIGNALED and for which no error completion will be received. BTW, I think this consideration applies to all SCSI target drivers and not only to SRP target drivers. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/9] srpt: chain RDMA READ/WRITE requests
On 11/13/2015 05:46 AM, Christoph Hellwig wrote: - ret = ib_post_send(ch->qp, &wr.wr, &bad_wr); - if (ret) - break; + if (i == n_rdma - 1) { + /* only get completion event for the last rdma read */ + if (dir == DMA_TO_DEVICE) + wr->wr.send_flags = IB_SEND_SIGNALED; + wr->wr.next = NULL; + } else { + wr->wr.next = &ioctx->rdma_ius[i + 1].wr; + } } + ret = ib_post_send(ch->qp, &ioctx->rdma_ius->wr, &bad_wr); if (ret) pr_err("%s[%d]: ib_post_send() returned %d for %d/%d\n", __func__, __LINE__, ret, i, n_rdma); Hello Christoph, Chaining RDMA requests is a great idea. But it seems to me that this patch is based on the assumption that posting multiple RDMA requests either succeeds as a whole or fails as a whole. Sorry but I'm not sure that the verbs API guarantees this. In the ib_srpt driver a QP can be changed at any time into the error state and there might be drivers that report an immediate failure in that case. I think even when chaining RDMA requests that we still need a mechanism to wait until ongoing RDMA transfers have finished if some but not all RDMA requests have been posted. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/9] srp: use the new CQ API
On 11/13/2015 05:46 AM, Christoph Hellwig wrote: +static void srp_inv_rkey_done(struct ib_cq *cq, struct ib_wc *wc) +{ + srp_handle_qp_err(cq, wc, "INV RKEY"); +} > [ ... ] > +static void srp_reg_mr_done(struct ib_cq *cq, struct ib_wc *wc) +{ + srp_handle_qp_err(cq, wc, "FAST REG"); +} How about using names like srp_inv_rkey_err() and srp_reg_mr_err() to make clear that these completion functions are only called if an error occurred ? Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/9] srpt: use the new CQ API
On 11/13/2015 05:46 AM, Christoph Hellwig wrote: [ ... ] This patch contains two logical changes: - Conversion to the new CQ API. - Removal of the ib_srpt_compl thread. Had it been considered to implement these changes as two separate patches ? Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/9] srpt: use the new CQ API
On 11/13/2015 05:46 AM, Christoph Hellwig wrote: > [ ... ] The previous patch and this patch look like great work to me. However, this patch not only reworks the SRP target driver but also prevents users to move the SRP completion thread to another CPU core than the CPU core that processes the completion interrupts (with the help of e.g. the taskset command). Hence my request to make that CPU core configurable in the second patch of this patch series. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/9] IB: add a proper completion queue abstraction
On 11/13/2015 05:46 AM, Christoph Hellwig wrote: + * context and does not ask from completion interrupts from the HCA. Should this perhaps be changed into "for" ? + */ +void ib_process_cq_direct(struct ib_cq *cq) +{ + WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT); + + __ib_process_cq(cq, INT_MAX); +} +EXPORT_SYMBOL(ib_process_cq_direct); My proposal is to drop this function and to export __ib_process_cq() instead (with or without renaming). That will allow callers of this function to compare the poll budget with the number of completions that have been processed and use that information to decide whether or not to call this function again. +static void ib_cq_poll_work(struct work_struct *work) +{ + struct ib_cq *cq = container_of(work, struct ib_cq, work); + int completed; + + completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE); + if (completed >= IB_POLL_BUDGET_WORKQUEUE || + ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0) + queue_work(ib_comp_wq, &cq->work); +} + +static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private) +{ + queue_work(ib_comp_wq, &cq->work); +} The above code will cause all polling to occur on the context of the CPU that received the completion interrupt. This approach is not powerful enough. For certain workloads throughput is higher if work completions are processed by another CPU core on the same CPU socket. Has it been considered to make the CPU core on which work completions are processed configurable ? diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 62b6cba..3027824 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -457,10 +457,11 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct srp_target_port *target) static void srp_destroy_qp(struct srp_rdma_ch *ch) { static struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR }; - static struct ib_recv_wr wr = { .wr_id = SRP_LAST_WR_ID }; + static struct ib_recv_wr wr = { 0 }; struct ib_recv_wr *bad_wr; int ret; Since the 'wr' structure is static I don't think it needs to be zero-initialized explicitly. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available
On 11/17/2015 09:16 AM, Bart Van Assche wrote: On 11/13/2015 11:02 PM, Christoph Hellwig wrote: On Fri, Nov 13, 2015 at 11:19:24AM -0800, Bart Van Assche wrote: On 11/13/2015 05:46 AM, Christoph Hellwig wrote: The new name is irq_poll as iopoll is already taken. Better suggestions welcome. Would it be possible to provide more background information about this ? Which other kernel subsystem is using the name iopoll ? Take a look at include/linux/iopoll.h - I can't reaplly make much sense of it to be honest, but it's used in a quite a few places. How about renaming blk_iopoll into blk_poll ? That way the name still refers to the block layer. And although the current implementation performs polling from IRQ context future implementations maybe will allow polling from thread context. (replying to my own e-mail) Please ignore the previous comment - I just noticed that this mechanism is not limited to block devices. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available
On 11/13/2015 11:02 PM, Christoph Hellwig wrote: On Fri, Nov 13, 2015 at 11:19:24AM -0800, Bart Van Assche wrote: On 11/13/2015 05:46 AM, Christoph Hellwig wrote: The new name is irq_poll as iopoll is already taken. Better suggestions welcome. Would it be possible to provide more background information about this ? Which other kernel subsystem is using the name iopoll ? Take a look at include/linux/iopoll.h - I can't reaplly make much sense of it to be honest, but it's used in a quite a few places. How about renaming blk_iopoll into blk_poll ? That way the name still refers to the block layer. And although the current implementation performs polling from IRQ context future implementations maybe will allow polling from thread context. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/9] IB: add a helper to safely drain a QP
On 11/15/2015 01:34 AM, Sagi Grimberg wrote: This is taken from srp, and srp drains using a recv wr due to a race causing a use-after-free condition in srp which re-posts a recv buffer in the recv completion handler. Hello Sagi, Would it be possible to clarify this ? Does this refer to an existing race or a race that would only occur if the code would be modified ? Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ib_srp: initialize dma_length in srp_map_idb
On 11/16/2015 09:22 AM, Christoph Hellwig wrote: the code in this area changed enough since 4.3 that it won't easily apply. But a backport would still be very useful! In that case: Reviewed-by: Bart Van Assche -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ib_srp: initialize dma_length in srp_map_idb
On 11/15/2015 09:59 AM, Christoph Hellwig wrote: Without this sg_dma_len will return 0 on architectures tha have the dma_length field. Signed-off-by: Christoph Hellwig --- drivers/infiniband/ulp/srp/ib_srp.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 32f7962..445c0a6 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1520,6 +1520,9 @@ static int srp_map_idb(struct srp_rdma_ch *ch, struct srp_request *req, state.sg_nents = 1; sg_set_buf(idb_sg, req->indirect_desc, idb_len); idb_sg->dma_address = req->indirect_dma_addr; /* hack! */ +#ifdef CONFIG_NEED_SG_DMA_LENGTH + idb_sg->dma_length = idb_sg->length;/* hack^2 */ +#endif ret = srp_map_finish_fr(&state, ch); if (ret < 0) return ret; Hello Christoph, How about adding "Cc: stable" to this patch such that it not only will be integrated in kernel v4.4 but also in kernel v4.3.1 or later ? Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: srp state in current mainline
On 11/15/15 10:06, Christoph Hellwig wrote: FYI, I sent a patch for the zero S/G length issue. With this xfstests does fine for ext4 and btrfs. With XFS I still run into corruption warnings for the slab use after free poison pattern. I suspect that issue might be related to uniqueue XFS I/O patterns. One thing that might be related is that XFS can submit bios backed by kmalloced memory. I've also tested XFS on various other storage devices on the same kernel and didn't see an issue like that just to be sure. Thanks for submitting that patch. Did I understand correctly that page-aligned I/O works fine but I/O that is not aligned on a page boundary not ? Have you already had the time to verify whether the "IB/srp: Convert to new registration API" patch is the patch that introduced this issue ? BTW, another location where a buffer is used that is not aligned on a page boundary is the scsi_probe_and_add_lun() function. If a crash occurs while probing LUNs or lsscsi shows garbled data for SRP LUNs that might indicate a problem in the buffer mapping or registration code. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/9] IB: add a proper completion queue abstraction
On 11/13/2015 10:25 AM, Jason Gunthorpe wrote: On Fri, Nov 13, 2015 at 02:46:43PM +0100, Christoph Hellwig wrote: This adds an abstraction that allows ULP to simply pass a completion object and completion callback with each submitted WR and let the RDMA core handle the nitty gritty details of how to handle completion interrupts and poll the CQ. This looks pretty nice, I'd really like to look it over carefully after SC|15.. I know Bart and others have attempted to have switching between event and polling driven operation, but there were problems resolving the races. Would be nice to review that conversation.. Do you remember the details Bart? Hello Jason, I think this is the conversation you are referring to: "About a shortcoming of the verbs API" (http://thread.gmane.org/gmane.linux.drivers.rdma/5028). That conversation occurred five years ago, which means that you have an excellent memory :-) I doesn't seem to me like Christoph wanted to support dynamic switching between the IB_POLL_DIRECT, IB_POLL_SOFTIRQ and IB_POLL_WORKQUEUE polling modes. I think this should have been mentioned in the patch description. The implementation of this patch makes it clear that it is essential that all polling is serialized. The WC array that is used for polling is embedded in the CQ and is not protected against concurrent access. This means that it is essential that _ib_process_cq() calls are serialized. I need some more time to verify whether such serialization is always guaranteed by this patch. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available
On 11/13/2015 05:46 AM, Christoph Hellwig wrote: The new name is irq_poll as iopoll is already taken. Better suggestions welcome. Hello Christoph, Would it be possible to provide more background information about this ? Which other kernel subsystem is using the name iopoll ? I think the name blk-iopoll was chosen six years ago for this subsystem (see also https://lwn.net/Articles/346187/). If the conflicting subsystem is newer, how about renaming the other polling mechanism ? Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: srp state in current mainline
On 11/12/2015 09:59 AM, Christoph Hellwig wrote: [ 108.998574] WARNING: CPU: 0 PID: 1258 at kernel/sched/core.c:7389 __might_sleep+0xa7/0xb0() [ 108.998580] do not call blocking ops when !TASK_RUNNING; state=1 set Although this is most likely unrelated to the issue reported at the start of this thread, I have started working on a patch for this warning. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/mad: In validate_mad, validate CM method and attribute
On 11/12/2015 03:48 AM, Hal Rosenstock wrote: A new SRP target has been observed to respond to Send CM REQ with GetResp of CM REQ with bad status. This is non conformant with IBA spec but exposes a vulnerability in the current MAD/CM code which will respond to the incoming GetResp of CM REQ as if it was a valid incoming Send of CM REQ rather than tossing this on the floor. It also causes the MAD layer not to retransmit the original REQ even though it has not received a REP. Hello Hal, With which SRP target has this behavior been observed ? Has this patch been tested with the LIO SRP target ? Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: srp state in current mainline
On 11/10/2015 09:15 AM, Christoph Hellwig wrote: scsi host3: ib_srp: failed receive status WR flushed (5) for iu 880313f4ca40 Can you also post the logs from the target system from around the time this message was logged on the initiator system ? Usually this message means that the target system closed a QP. I'm looking for messages generated by the following statement in ib_srpt.c: pr_info("RDMA t %d for idx %u failed with status %d\n", opcode, index, wc->status); Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html