registering physical memory region
The InfiniBand specification describes a Register Physical Memory Region verb. The consumer can request an IOVA, and the IOVA that is returned may be the same as or different from the one requested. As input to the call that registers physical memory, we provide a physical buffer list (a list of the starting physical addresses of the pages).

When memory is allocated, its physical address is already associated with a virtual address. Even so, we can request that a new virtual address be returned for the region. What is the significance of the IOVA, given that a virtual address is already associated with the physical addresses we provide?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
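One way to picture the IOVA in the question above: it is the base address the adapter uses to refer to the registered region, and it need not match any CPU virtual address the pages already have. The sketch below is only a model of that translation, not code from any verbs provider. The names ex_phys_mr and ex_iova_to_phys, the 4096-byte page size and the zero first-byte offset are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical model of a registered physical memory region. */
struct ex_phys_mr {
	uint64_t iova;		/* I/O virtual address of the first byte */
	const uint64_t *pages;	/* physical buffer list: page start addresses */
	size_t npages;		/* number of entries in pages[] */
};

/*
 * Translate an I/O virtual address inside the region to a physical
 * address. Returns 0 on success, -1 if addr lies outside the region.
 */
int ex_iova_to_phys(const struct ex_phys_mr *mr, uint64_t addr, uint64_t *phys)
{
	uint64_t off;

	assert(mr != NULL && phys != NULL);

	if (addr < mr->iova)
		return -1;
	off = addr - mr->iova;
	if (off / EX_PAGE_SIZE >= mr->npages)
		return -1;
	/* The page index selects the buffer, the remainder is the offset in it. */
	*phys = mr->pages[off / EX_PAGE_SIZE] + off % EX_PAGE_SIZE;
	return 0;
}
```

In this model the pages can be scattered anywhere in physical memory, while the region still looks contiguous starting at whatever IOVA was chosen at registration time.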
Re: [PATCH 8/8] IB/srp: Make queue size configurable
On Tue, 2013-08-20 at 17:55 +0200, Bart Van Assche wrote:
> On 08/20/13 17:34, Sagi Grimberg wrote:
> > Question,
> > If srp now will allow larger queues while using a single global FMR pool
> > of size 1024, isn't it more likely now that in stress environment srp
> > will run out of FMRs to handle IO commands?
> > I mean that let's say that you have x scsi hosts with can_queue size of
> > 512 (+-) and all of them are running IO stress, is it possible that all
> > FMRs will be inuse and no FMR is available to register the next IO SG-list?
> > Did you try out such a scenario?
> >
> > I guess that in such a case IB core will return EAGAIN and SRP will
> > return SCSI_MLQUEUE_HOST_BUSY.
> > I think it is a good Idea to move FMR pools to be per connection rather
> > than a global pool, what do you think?
>
> That makes sense to me. And as long as the above has not yet been
> implemented I'm fine with dropping patch 8/8 from this patch set.

Don't drop it; most configs won't have all that many connections and shouldn't have an issue; even those that do will only see a potential slowdown when running with everything at once. We can address the FMR/BMME issues on top of this patch.
[PATCH opensm] osm_db_files.c : Fix resource leak guid2lid parser
From: Dan Ben Yosef

leaks the storage that "p_accum_val" and "p_key" points to.

Signed-off-by: Dan Ben Yosef
Reviewed-by: Vladimir Koushnir
Signed-off-by: Hal Rosenstock
---
diff --git a/opensm/osm_db_files.c b/opensm/osm_db_files.c
index 0d8f36c..513cf85 100644
--- a/opensm/osm_db_files.c
+++ b/opensm/osm_db_files.c
@@ -272,7 +272,7 @@ int osm_db_restore(IN osm_db_domain_t * p_domain)
 	boolean_t before_key;
 	char *p_first_word, *p_rest_of_line, *p_last;
 	char *p_key = NULL;
-	char *p_prev_val, *p_accum_val = NULL;
+	char *p_prev_val = NULL, *p_accum_val = NULL;
 	char *endptr = NULL;
 	unsigned int line_num;
 
@@ -371,12 +371,18 @@ int osm_db_restore(IN osm_db_domain_t * p_domain)
 					if (st_lookup(p_domain_imp->p_hash,
 						      (st_data_t) p_key,
 						      (void *)&p_prev_val)) {
+						/* if previously used we ignore this guid */
 						OSM_LOG(p_log, OSM_LOG_ERROR,
 							"ERR 6106: "
 							"Key:%s already exists in:%s with value:%s."
 							" Removing it\n",
 							p_key, p_domain_imp->file_name,
 							p_prev_val);
+						free(p_key);
+						p_key = NULL;
+						free(p_accum_val);
+						p_accum_val = NULL;
+						continue;
 					} else {
 						p_prev_val = NULL;
 					}
@@ -391,6 +397,10 @@ int osm_db_restore(IN osm_db_domain_t * p_domain)
 						OSM_LOG(p_log, OSM_LOG_ERROR,
 							"ERR 610B: "
 							"Key:%s is invalid\n", p_key);
+						free(p_key);
+						p_key = NULL;
+						free(p_accum_val);
+						p_accum_val = NULL;
 					} else {
 						/* store our key and value */
 						st_insert(p_domain_imp->p_hash,
@@ -404,6 +414,7 @@ int osm_db_restore(IN osm_db_domain_t * p_domain)
 							     strlen(sLine) + 1);
 					strcpy(p_accum_val, p_prev_val);
 					free(p_prev_val);
+					p_prev_val = NULL;
 					strcat(p_accum_val, sLine);
 				}
 			}	/* in key */
Re: [PATCH 8/8] IB/srp: Make queue size configurable
On 08/20/13 17:34, Sagi Grimberg wrote:
> On 8/20/2013 3:50 PM, Bart Van Assche wrote:
>> Certain storage configurations, e.g. a sufficiently large array of hard
>> disks in a RAID configuration, need a queue depth above 64 to achieve
>> optimal performance. Hence make the queue depth configurable.
>> [ ... ]
>
> I noticed this patch in your github and played with it, I agree that this
> patch is needed for a long time...
>
> Question,
> If srp now will allow larger queues while using a single global FMR pool
> of size 1024, isn't it more likely now that in stress environment srp
> will run out of FMRs to handle IO commands?
> I mean that let's say that you have x scsi hosts with can_queue size of
> 512 (+-) and all of them are running IO stress, is it possible that all
> FMRs will be inuse and no FMR is available to register the next IO SG-list?
> Did you try out such a scenario?
>
> I guess that in such a case IB core will return EAGAIN and SRP will
> return SCSI_MLQUEUE_HOST_BUSY.
> I think it is a good Idea to move FMR pools to be per connection rather
> than a global pool, what do you think?

That makes sense to me. And as long as the above has not yet been implemented I'm fine with dropping patch 8/8 from this patch set.

Bart.
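The exhaustion scenario discussed above can be modelled in user space. This is a rough sketch under stated assumptions, not ib_srp code: ex_pool stands in for an FMR pool, ex_queuecommand for the SRP queuecommand path, and EX_MLQUEUE_HOST_BUSY for the SCSI mid-layer return code (its value here is only illustrative). All of these names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the SCSI mid-layer "host busy" return code. */
#define EX_MLQUEUE_HOST_BUSY 0x1055

/* Hypothetical fixed-size pool of mapping resources (think: FMR pool). */
struct ex_pool {
	int size;	/* total entries in the pool */
	int in_use;	/* entries currently handed out */
};

/* Take one entry; returns -EAGAIN when the pool is exhausted. */
int ex_pool_get(struct ex_pool *pool)
{
	assert(pool != NULL);
	if (pool->in_use >= pool->size)
		return -EAGAIN;
	pool->in_use++;
	return 0;
}

/* Give one entry back, e.g. when the I/O completes. */
void ex_pool_put(struct ex_pool *pool)
{
	assert(pool != NULL);
	if (pool->in_use > 0)
		pool->in_use--;
}

/* Queue one command: map its SG list, or ask the mid-layer to retry later. */
int ex_queuecommand(struct ex_pool *pool)
{
	if (ex_pool_get(pool) == -EAGAIN)
		return EX_MLQUEUE_HOST_BUSY;
	return 0;
}
```

With one shared ex_pool, a busy host can starve the others; giving each connection its own ex_pool bounds the damage to that connection, which is the per-connection idea raised in the thread.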
[PATCH opensm] osm_db_files.c: Fix memory leak when deleting entries from osm db
From: Alex Netes

The key also should be freed.

Signed-off-by: Alex Netes
Signed-off-by: Hal Rosenstock
---
 opensm/osm_db_files.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/opensm/osm_db_files.c b/opensm/osm_db_files.c
index 94dc11c..0eef1d4 100644
--- a/opensm/osm_db_files.c
+++ b/opensm/osm_db_files.c
@@ -608,6 +608,7 @@ int osm_db_delete(IN osm_db_domain_t * p_domain, IN char *p_key)
 			p_key, p_domain_imp->file_name, p_prev_val);
 		res = 1;
 	} else {
+		free(p_key);
 		free(p_prev_val);
 		res = 0;
 	}
--
1.7.8.2
Re: [PATCH 8/8] IB/srp: Make queue size configurable
On 8/20/2013 3:50 PM, Bart Van Assche wrote: Certain storage configurations, e.g. a sufficiently large array of hard disks in a RAID configuration, need a queue depth above 64 to achieve optimal performance. Hence make the queue depth configurable. Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: David Dillow Cc: Vu Pham Cc: Sebastian Riemer Cc: Konrad Grzybowski --- drivers/infiniband/ulp/srp/ib_srp.c | 125 ++- drivers/infiniband/ulp/srp/ib_srp.h | 17 +++-- 2 files changed, 103 insertions(+), 39 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index ece1f2d..6de2323 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -299,16 +299,16 @@ static int srp_create_target_ib(struct srp_target_port *target) return -ENOMEM; recv_cq = ib_create_cq(target->srp_host->srp_dev->dev, - srp_recv_completion, NULL, target, SRP_RQ_SIZE, - target->comp_vector); + srp_recv_completion, NULL, target, + target->queue_size, target->comp_vector); if (IS_ERR(recv_cq)) { ret = PTR_ERR(recv_cq); goto err; } send_cq = ib_create_cq(target->srp_host->srp_dev->dev, - srp_send_completion, NULL, target, SRP_SQ_SIZE, - target->comp_vector); + srp_send_completion, NULL, target, + target->queue_size, target->comp_vector); if (IS_ERR(send_cq)) { ret = PTR_ERR(send_cq); goto err_recv_cq; @@ -317,8 +317,8 @@ static int srp_create_target_ib(struct srp_target_port *target) ib_req_notify_cq(recv_cq, IB_CQ_NEXT_COMP); init_attr->event_handler = srp_qp_event; - init_attr->cap.max_send_wr = SRP_SQ_SIZE; - init_attr->cap.max_recv_wr = SRP_RQ_SIZE; + init_attr->cap.max_send_wr = target->queue_size; + init_attr->cap.max_recv_wr = target->queue_size; init_attr->cap.max_recv_sge= 1; init_attr->cap.max_send_sge= 1; init_attr->sq_sig_type = IB_SIGNAL_ALL_WR; @@ -364,6 +364,10 @@ err: return ret; } +/* + * Note: this function may be called without srp_alloc_iu_bufs() having been + * invoked. 
Hence the target->[rt]x_ring checks. + */ static void srp_free_target_ib(struct srp_target_port *target) { int i; @@ -375,10 +379,18 @@ static void srp_free_target_ib(struct srp_target_port *target) target->qp = NULL; target->send_cq = target->recv_cq = NULL; - for (i = 0; i < SRP_RQ_SIZE; ++i) - srp_free_iu(target->srp_host, target->rx_ring[i]); - for (i = 0; i < SRP_SQ_SIZE; ++i) - srp_free_iu(target->srp_host, target->tx_ring[i]); + if (target->rx_ring) { + for (i = 0; i < target->queue_size; ++i) + srp_free_iu(target->srp_host, target->rx_ring[i]); + kfree(target->rx_ring); + target->rx_ring = NULL; + } + if (target->tx_ring) { + for (i = 0; i < target->queue_size; ++i) + srp_free_iu(target->srp_host, target->tx_ring[i]); + kfree(target->tx_ring); + target->tx_ring = NULL; + } } static void srp_path_rec_completion(int status, @@ -564,7 +576,11 @@ static void srp_free_req_data(struct srp_target_port *target) struct srp_request *req; int i; - for (i = 0, req = target->req_ring; i < SRP_CMD_SQ_SIZE; ++i, ++req) { + if (!target->req_ring) + return; + + for (i = 0; i < target->req_ring_size; ++i) { + req = &target->req_ring[i]; kfree(req->fmr_list); kfree(req->map_page); if (req->indirect_dma_addr) { @@ -574,6 +590,9 @@ static void srp_free_req_data(struct srp_target_port *target) } kfree(req->indirect_desc); } + + kfree(target->req_ring); + target->req_ring = NULL; } static int srp_alloc_req_data(struct srp_target_port *target) @@ -586,7 +605,12 @@ static int srp_alloc_req_data(struct srp_target_port *target) INIT_LIST_HEAD(&target->free_reqs); - for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) { + target->req_ring = kzalloc(target->req_ring_size * + sizeof(*target->req_ring), GFP_KERNEL); + if (!target->req_ring) + goto out; + + for (i = 0; i < target->req_ring_size; ++i) { req = &target->req_ring[i]; req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *), GFP_KERNEL); @@ -810,7 +834,7 @@ static void srp_terminate_io(struct srp_rport *rpor
[PATCH] IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards
Commit 36a8f01c ("IB/qib: Add congestion control agent implementation") caused statements to leak past the header guard. Correct with this update.

Reviewed-by: Marciniszyn, Mike
Signed-off-by: Ira Weiny
---
 drivers/infiniband/hw/qib/qib_mad.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_mad.h b/drivers/infiniband/hw/qib/qib_mad.h
index 57bd3fa..28874f8 100644
--- a/drivers/infiniband/hw/qib/qib_mad.h
+++ b/drivers/infiniband/hw/qib/qib_mad.h
@@ -415,7 +415,6 @@ struct cc_table_shadow {
 	struct ib_cc_table_entry_shadow entries[CC_TABLE_SHADOW_MAX];
 } __packed;
 
-#endif /* _QIB_MAD_H */
 /*
  * The PortSamplesControl.CounterMasks field is an array of 3 bit fields
  * which specify the N'th counter's capabilities. See ch. 16.1.3.2.
@@ -428,3 +427,5 @@ struct cc_table_shadow {
 		COUNTER_MASK(1, 2) | \
 		COUNTER_MASK(1, 3) | \
 		COUNTER_MASK(1, 4))
+
+#endif /* _QIB_MAD_H */
--
1.7.1
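For readers less familiar with the problem fixed above, here is a small sketch of a header whose macro definitions sit inside the include guard, so that including the file twice does not redefine anything. The header name, the EX_ prefix and the bit layout of EX_COUNTER_MASK are assumptions for illustration and are not taken from qib_mad.h.

```c
#ifndef EX_MAD_H
#define EX_MAD_H

#include <stdint.h>

/*
 * Hypothetical mask helper: place the 3-bit value q in slot n of a
 * 32-bit word. The slot layout is an assumption made for this sketch.
 */
#define EX_COUNTER_MASK(q, n) ((uint32_t)(q) << ((9 - (n)) * 3))

/* Slots 0..4 each set to 1. */
#define EX_COUNTER_MASK0_4 \
	(EX_COUNTER_MASK(1, 0) | \
	 EX_COUNTER_MASK(1, 1) | \
	 EX_COUNTER_MASK(1, 2) | \
	 EX_COUNTER_MASK(1, 3) | \
	 EX_COUNTER_MASK(1, 4))

/* Everything above is protected; nothing may follow the #endif. */
#endif /* EX_MAD_H */
```

Anything placed after the closing #endif would be seen again on every repeated include, which is what the patch corrects.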
[PATCH] infiniband-diags: fail configure if glib2 is not found
Signed-off-by: Ira Weiny
---
 README       |    1 +
 configure.ac |    2 ++
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/README b/README
index d19e3e9..10a11e2 100644
--- a/README
+++ b/README
@@ -14,6 +14,7 @@ Dependencies:
 	2) libibumad >= 1.3.7
 	3) opensm-libs >= 3.3.10
 	4) ib_umad kernel module
+	5) glib2
 
 Release notes v1.6.1 => 1.6.2
 
diff --git a/configure.ac b/configure.ac
index 4c37259..b43818b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -167,6 +167,8 @@ PKG_CHECK_MODULES([GLIB], [glib-2.0], ac_glib=yes, ac_glib=no)
 AM_CONDITIONAL([HAVE_GLIB], test "$ac_glib" = "yes")
 if test "$ac_glib" = "yes"; then
 	AC_DEFINE([HAVE_GLIB], 1, [Define to 1 to indicate GLIB support])
+else
+	AC_MSG_ERROR(glib not found; glib2 is required)
 fi
 
 dnl Begin libibnetdisc stuff
--
1.7.1
RE: [PATCH] Add new verb: uv_query_port_max_datagram()
> Where is the documentation for this? Multiple people have referred to it, but
> I don't see any mention of it in libibverbs.git.

This is an unmerged, yet to be accepted patch set. Extensions were added as part of adding support for XRC. Yishai Hadas posted v9 of the series on 8/1 - "Add extension and XRC QP support" is the subject.

- Sean
RE: [ceph-users] Help needed porting Ceph to RSockets
> I have added the patch and re-tested: I still encounter
> hangs of my application. I am not quite sure whether the
> I hit the same error on the shutdown because now I don't hit
> the error always, but only every now and then.

I guess this is at least some progress... :/

> WHen adding the patch to my code base (git tag v1.0.17) I notice
> an offset of "-34 lines". Which code base are you using?

This patch was generated against the tip of the git tree.
[PATCH v2 opensm] osm_subnet.c: Fix memory leak caused by commit dc0760cb8088fbe079e19682570a884ba01e94ff
From: Vladimir Koushnir

double strdup for p_opt->dump_files_dir is causing memory leak

Approach from Bart Van Assche

Signed-off-by: Vladimir Koushnir
Signed-off-by: Hal Rosenstock
---
Change since v1:
Eliminate cast by doing separate strdup

diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 7ab1671..d0835b9 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -1499,7 +1499,8 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
 	p_opt->dump_files_dir = getenv("OSM_TMP_DIR");
 	if (!p_opt->dump_files_dir || !(*p_opt->dump_files_dir))
 		p_opt->dump_files_dir = strdup(OSM_DEFAULT_TMP_DIR);
-	p_opt->dump_files_dir = strdup(p_opt->dump_files_dir);
+	else
+		p_opt->dump_files_dir = strdup(p_opt->dump_files_dir);
 	p_opt->log_file = strdup(OSM_DEFAULT_LOG_FILE);
 	p_opt->log_max_size = 0;
 	p_opt->partition_config_file = strdup(OSM_DEFAULT_PARTITION_CONFIG_FILE);
Re: [PATCH] Add new verb: uv_query_port_max_datagram()
On Aug 19, 2013, at 8:59 PM, "Hefty, Sean" wrote:

>> Any suggestions on how one adds a new driver call without breaking ABI?
>
> It could be built on the verbs extension mechanism.

Where is the documentation for this? Multiple people have referred to it, but I don't see any mention of it in libibverbs.git.

> Is it necessary to call into a provider library, versus simply dropping into
> the kernel?

I don't think I have much of an opinion here, other than: it would seem weird to not call the provider library, given that all other verbs do that.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [PATCH opensm] osm_subnet.c: Fix memory leak caused by commit dc0760cb8088fbe079e19682570a884ba01e94ff
On 08/20/13 15:00, Hal Rosenstock wrote:
> From: Vladimir Koushnir
>
> double strdup for p_opt->dump_files_dir is causing memory leak
>
> Signed-off-by: Vladimir Koushnir
> Signed-off-by: Hal Rosenstock
> ---
> diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
> index 7ab1671..4b5ef38 100644
> --- a/opensm/osm_subnet.c
> +++ b/opensm/osm_subnet.c
> @@ -1498,7 +1498,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
>  	p_opt->dump_files_dir = getenv("OSM_TMP_DIR");
>  	if (!p_opt->dump_files_dir || !(*p_opt->dump_files_dir))
> -		p_opt->dump_files_dir = strdup(OSM_DEFAULT_TMP_DIR);
> +		p_opt->dump_files_dir = (char *) OSM_DEFAULT_TMP_DIR;
>  	p_opt->dump_files_dir = strdup(p_opt->dump_files_dir);
>  	p_opt->log_file = strdup(OSM_DEFAULT_LOG_FILE);
>  	p_opt->log_max_size = 0;

How about avoiding the memory leak via the construct below, which has the advantage that no cast is necessary ?

	if (!p_opt->dump_files_dir || !(*p_opt->dump_files_dir))
		p_opt->dump_files_dir = strdup(OSM_DEFAULT_TMP_DIR);
	else
		p_opt->dump_files_dir = strdup(p_opt->dump_files_dir);
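The same single-strdup idea can be tried outside OpenSM. The following is a standalone sketch, assuming a hypothetical helper ex_dump_files_dir() and a made-up default path; it is not the OpenSM function. It shows that exactly one allocation happens on either branch, so nothing is leaked and no cast is needed.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() and setenv() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EX_DEFAULT_TMP_DIR "/var/log/"	/* hypothetical default */

/*
 * Return a heap-allocated copy of the directory named by the environment
 * variable env_name, or of the default when it is unset or empty.
 * Exactly one strdup() runs on either path; the caller frees the result.
 */
char *ex_dump_files_dir(const char *env_name)
{
	const char *dir;

	assert(env_name != NULL);
	dir = getenv(env_name);
	if (!dir || !*dir)
		return strdup(EX_DEFAULT_TMP_DIR);
	return strdup(dir);
}
```

Keeping the getenv() result in a separate const pointer also avoids overwriting the only reference to an earlier allocation, which is how the original double strdup leaked.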
[PATCH opensm] osm_subnet.c: Fix memory leak caused by commit dc0760cb8088fbe079e19682570a884ba01e94ff
From: Vladimir Koushnir

double strdup for p_opt->dump_files_dir is causing memory leak

Signed-off-by: Vladimir Koushnir
Signed-off-by: Hal Rosenstock
---
diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 7ab1671..4b5ef38 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -1498,7 +1498,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
 
 	p_opt->dump_files_dir = getenv("OSM_TMP_DIR");
 	if (!p_opt->dump_files_dir || !(*p_opt->dump_files_dir))
-		p_opt->dump_files_dir = strdup(OSM_DEFAULT_TMP_DIR);
+		p_opt->dump_files_dir = (char *) OSM_DEFAULT_TMP_DIR;
 	p_opt->dump_files_dir = strdup(p_opt->dump_files_dir);
 	p_opt->log_file = strdup(OSM_DEFAULT_LOG_FILE);
 	p_opt->log_max_size = 0;
Re: [PATCH] osm_port_info_rcv.c Issue a log message if we cannot read the MKey of a port
On 8/19/2013 6:46 AM, Line Holen wrote:
> On 08/16/13 15:47, Hal Rosenstock wrote:
>> On 8/14/2013 6:26 AM, Line Holen wrote:
>>> Signed-off-by: Line Holen
>>>
>>> ---
>>>
>>> diff --git a/opensm/osm_port_info_rcv.c b/opensm/osm_port_info_rcv.c
>>> index 7dcd15e..961b376 100644
>>> --- a/opensm/osm_port_info_rcv.c
>>> +++ b/opensm/osm_port_info_rcv.c
>>> @@ -85,7 +85,7 @@ static void pi_rcv_process_endport(IN osm_sm_t *
>>> sm, IN osm_physp_t * p_physp,
>>> osm_madw_context_t context;
>>> ib_api_status_t status;
>>> ib_net64_t port_guid;
>>> -uint8_t rate, mtu;
>>> +uint8_t rate, mtu, mpb;
>>> unsigned data_vls;
>>> cl_qmap_t *p_sm_tbl;
>>> osm_remote_sm_t *p_sm;
>>> @@ -126,6 +126,14 @@ static void pi_rcv_process_endport(IN osm_sm_t *
>>> sm, IN osm_physp_t * p_physp,
>>> }
>>> }
>>>
>>> +/* Check M_Key vs M_Key protect, can we control the port ? */
>>> +mpb = ib_port_info_get_mpb(p_pi);
>>> +if (mpb > 0 && p_pi->m_key == 0) {
>>> +OSM_LOG(sm->p_log, OSM_LOG_INFO,
>>> +"Port 0x%" PRIx64 " has unknown M_Key, protection level
>>> %u\n",
>>> +cl_ntoh64(port_guid), mpb);
>>> +}
>>> +
>> It looks to me like the only case here is when protect bits is 1 for
>> gets; all others fail. Is it more than that ?
> You are probably right -

I was referring to that only for protect bits of 1 does this seem to have potential value for gets as gets with protect bits of 1 with wrong Mkey return port info with 0 MKey. All other mpb cases fail.

> have to admit I haven't tried a higher
> protection level.

What protection level(s) have you tried ?

>>
>> Also, would this spam the OpenSM log ?
> It would print one additional message per heavy sweep.
> But if you have a system with unknown MKeys configured you would get
> many error
> messages as it is. With protection level 2 every MAD operation will
> generate
> an error I guess (either 3111 or 3120). And with protection level 1 set
> operations
> will fail, but this new message will let you know why it failed.

I think it would be a 3120 error (timeout) rather than bad status. I think that is what is meant in the IBA spec by fail (fail = no response). Have you seen 3111 or other than 3120 errors for this ?

-- Hal

> Line
>
>>
>> -- Hal
>>
>>> if (port_guid != sm->p_subn->sm_port_guid) {
>>> p_sm_tbl =&sm->p_subn->sm_guid_tbl;
>>> if (p_pi->capability_mask& IB_PORT_CAP_IS_SM) {
>>>
>
>
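The check being discussed can be tried in isolation. This sketch assumes that the two M_Key protect bits are the top two bits of the byte they share with the LMC, and it uses a trimmed-down hypothetical structure ex_port_info instead of the real PortInfo type. The helper names are made up for illustration as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: the two M_Key protect bits are the top bits of the byte. */
#define EX_MPB_MASK  0xC0u
#define EX_MPB_SHIFT 6

/* Hypothetical, trimmed-down view of the PortInfo fields used here. */
struct ex_port_info {
	uint64_t m_key;		/* 0 when the real M_Key was not disclosed */
	uint8_t mkey_lmc;	/* protect bits share this byte with the LMC */
};

/* Extract the M_Key protect bits (0..3). */
uint8_t ex_get_mpb(const struct ex_port_info *pi)
{
	assert(pi != NULL);
	return (uint8_t)((pi->mkey_lmc & EX_MPB_MASK) >> EX_MPB_SHIFT);
}

/* Nonzero when protection is enabled but the M_Key came back as 0. */
int ex_mkey_unknown(const struct ex_port_info *pi)
{
	return ex_get_mpb(pi) > 0 && pi->m_key == 0;
}
```

As noted in the thread, only a Get with protect bits of 1 is expected to return data with a zeroed M_Key, so in practice the condition would mostly fire for that level.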
[PATCH 8/8] IB/srp: Make queue size configurable
Certain storage configurations, e.g. a sufficiently large array of hard disks in a RAID configuration, need a queue depth above 64 to achieve optimal performance. Hence make the queue depth configurable.

Signed-off-by: Bart Van Assche
Cc: Roland Dreier
Cc: David Dillow
Cc: Vu Pham
Cc: Sebastian Riemer
Cc: Konrad Grzybowski
---
 drivers/infiniband/ulp/srp/ib_srp.c | 125 ++-
 drivers/infiniband/ulp/srp/ib_srp.h |  17 +++--
 2 files changed, 103 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index ece1f2d..6de2323 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -299,16 +299,16 @@ static int srp_create_target_ib(struct srp_target_port *target)
 		return -ENOMEM;
 
 	recv_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_recv_completion, NULL, target, SRP_RQ_SIZE,
-			       target->comp_vector);
+			       srp_recv_completion, NULL, target,
+			       target->queue_size, target->comp_vector);
 	if (IS_ERR(recv_cq)) {
 		ret = PTR_ERR(recv_cq);
 		goto err;
 	}
 
 	send_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_send_completion, NULL, target, SRP_SQ_SIZE,
-			       target->comp_vector);
+			       srp_send_completion, NULL, target,
+			       target->queue_size, target->comp_vector);
 	if (IS_ERR(send_cq)) {
 		ret = PTR_ERR(send_cq);
 		goto err_recv_cq;
@@ -317,8 +317,8 @@ static int srp_create_target_ib(struct srp_target_port *target)
 	ib_req_notify_cq(recv_cq, IB_CQ_NEXT_COMP);
 
 	init_attr->event_handler = srp_qp_event;
-	init_attr->cap.max_send_wr = SRP_SQ_SIZE;
-	init_attr->cap.max_recv_wr = SRP_RQ_SIZE;
+	init_attr->cap.max_send_wr = target->queue_size;
+	init_attr->cap.max_recv_wr = target->queue_size;
 	init_attr->cap.max_recv_sge = 1;
 	init_attr->cap.max_send_sge = 1;
 	init_attr->sq_sig_type = IB_SIGNAL_ALL_WR;
@@ -364,6 +364,10 @@ err:
 	return ret;
 }
 
+/*
+ * Note: this function may be called without srp_alloc_iu_bufs() having been
+ * invoked. Hence the target->[rt]x_ring checks.
+ */
 static void srp_free_target_ib(struct srp_target_port *target)
 {
 	int i;
@@ -375,10 +379,18 @@ static void srp_free_target_ib(struct srp_target_port *target)
 	target->qp = NULL;
 	target->send_cq = target->recv_cq = NULL;
 
-	for (i = 0; i < SRP_RQ_SIZE; ++i)
-		srp_free_iu(target->srp_host, target->rx_ring[i]);
-	for (i = 0; i < SRP_SQ_SIZE; ++i)
-		srp_free_iu(target->srp_host, target->tx_ring[i]);
+	if (target->rx_ring) {
+		for (i = 0; i < target->queue_size; ++i)
+			srp_free_iu(target->srp_host, target->rx_ring[i]);
+		kfree(target->rx_ring);
+		target->rx_ring = NULL;
+	}
+	if (target->tx_ring) {
+		for (i = 0; i < target->queue_size; ++i)
+			srp_free_iu(target->srp_host, target->tx_ring[i]);
+		kfree(target->tx_ring);
+		target->tx_ring = NULL;
+	}
 }
 
 static void srp_path_rec_completion(int status,
@@ -564,7 +576,11 @@ static void srp_free_req_data(struct srp_target_port *target)
 	struct srp_request *req;
 	int i;
 
-	for (i = 0, req = target->req_ring; i < SRP_CMD_SQ_SIZE; ++i, ++req) {
+	if (!target->req_ring)
+		return;
+
+	for (i = 0; i < target->req_ring_size; ++i) {
+		req = &target->req_ring[i];
 		kfree(req->fmr_list);
 		kfree(req->map_page);
 		if (req->indirect_dma_addr) {
@@ -574,6 +590,9 @@ static void srp_free_req_data(struct srp_target_port *target)
 		}
 		kfree(req->indirect_desc);
 	}
+
+	kfree(target->req_ring);
+	target->req_ring = NULL;
 }
 
 static int srp_alloc_req_data(struct srp_target_port *target)
@@ -586,7 +605,12 @@ static int srp_alloc_req_data(struct srp_target_port *target)
 
 	INIT_LIST_HEAD(&target->free_reqs);
 
-	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+	target->req_ring = kzalloc(target->req_ring_size *
+				   sizeof(*target->req_ring), GFP_KERNEL);
+	if (!target->req_ring)
+		goto out;
+
+	for (i = 0; i < target->req_ring_size; ++i) {
 		req = &target->req_ring[i];
 		req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
 					GFP_KERNEL);
@@ -810,7 +834,7 @@ static void srp_terminate_io(struct srp_rport *rport)
 	struct srp_target_port *tar
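The pattern used by the patch above, replacing fixed-size arrays with rings sized at run time and making the free path tolerate a ring that was never allocated, can be sketched in user space. This is an analogy with calloc()/free() under made-up names (ex_target, ex_iu, ex_alloc_ring, ex_free_ring), not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct ex_iu {
	int dummy;	/* placeholder payload */
};

/* Hypothetical target with a ring whose size is chosen at run time. */
struct ex_target {
	int queue_size;
	struct ex_iu **rx_ring;	/* NULL until ex_alloc_ring() succeeds */
};

/* Safe to call even if the ring was never allocated, and safe to repeat. */
void ex_free_ring(struct ex_target *t)
{
	int i;

	assert(t != NULL);
	if (!t->rx_ring)
		return;
	for (i = 0; i < t->queue_size; ++i)
		free(t->rx_ring[i]);	/* free(NULL) is a no-op */
	free(t->rx_ring);
	t->rx_ring = NULL;
}

/* Allocate queue_size entries; returns 0 on success, -1 on failure. */
int ex_alloc_ring(struct ex_target *t)
{
	int i;

	assert(t != NULL);
	if (t->queue_size <= 0)
		return -1;
	t->rx_ring = calloc((size_t)t->queue_size, sizeof(*t->rx_ring));
	if (!t->rx_ring)
		return -1;
	for (i = 0; i < t->queue_size; ++i) {
		t->rx_ring[i] = calloc(1, sizeof(*t->rx_ring[i]));
		if (!t->rx_ring[i]) {
			ex_free_ring(t);	/* releases what was allocated so far */
			return -1;
		}
	}
	return 0;
}
```

Resetting the pointer to NULL after freeing is what lets error paths call the free routine unconditionally, in the same spirit as the target->[rt]x_ring checks in the patch.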
[PATCH 7/8] IB/srp: Introduce srp_alloc_req_data()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche
Cc: Roland Dreier
Cc: David Dillow
Cc: Vu Pham
Cc: Sebastian Riemer
---
 drivers/infiniband/ulp/srp/ib_srp.c | 64 ++-
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index de4c3b7..ece1f2d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -576,6 +576,42 @@ static void srp_free_req_data(struct srp_target_port *target)
 	}
 }
 
+static int srp_alloc_req_data(struct srp_target_port *target)
+{
+	struct srp_device *srp_dev = target->srp_host->srp_dev;
+	struct ib_device *ibdev = srp_dev->dev;
+	struct srp_request *req;
+	dma_addr_t dma_addr;
+	int i, ret = -ENOMEM;
+
+	INIT_LIST_HEAD(&target->free_reqs);
+
+	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+		req = &target->req_ring[i];
+		req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
+					GFP_KERNEL);
+		req->map_page = kmalloc(SRP_FMR_SIZE * sizeof(void *),
+					GFP_KERNEL);
+		req->indirect_desc = kmalloc(target->indirect_size, GFP_KERNEL);
+		if (!req->fmr_list || !req->map_page || !req->indirect_desc)
+			goto out;
+
+		dma_addr = ib_dma_map_single(ibdev, req->indirect_desc,
+					     target->indirect_size,
+					     DMA_TO_DEVICE);
+		if (ib_dma_mapping_error(ibdev, dma_addr))
+			goto out;
+
+		req->indirect_dma_addr = dma_addr;
+		req->index = i;
+		list_add_tail(&req->list, &target->free_reqs);
+	}
+	ret = 0;
+
+out:
+	return ret;
+}
+
 /**
  * srp_del_scsi_host_attr() - Remove attributes defined in the host template.
  * @shost: SCSI host whose attributes to remove from sysfs.
@@ -2393,8 +2429,7 @@ static ssize_t srp_create_target(struct device *dev,
 	struct Scsi_Host *target_host;
 	struct srp_target_port *target;
 	struct ib_device *ibdev = host->srp_dev->dev;
-	dma_addr_t dma_addr;
-	int i, ret;
+	int ret;
 
 	target_host = scsi_host_alloc(&srp_template,
 				      sizeof (struct srp_target_port));
@@ -2450,28 +2485,9 @@ static ssize_t srp_create_target(struct device *dev,
 	INIT_WORK(&target->remove_work, srp_remove_work);
 	spin_lock_init(&target->lock);
 	INIT_LIST_HEAD(&target->free_tx);
-	INIT_LIST_HEAD(&target->free_reqs);
-	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
-		struct srp_request *req = &target->req_ring[i];
-
-		req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof (void *),
-					GFP_KERNEL);
-		req->map_page = kmalloc(SRP_FMR_SIZE * sizeof (void *),
-					GFP_KERNEL);
-		req->indirect_desc = kmalloc(target->indirect_size, GFP_KERNEL);
-		if (!req->fmr_list || !req->map_page || !req->indirect_desc)
-			goto err_free_mem;
-
-		dma_addr = ib_dma_map_single(ibdev, req->indirect_desc,
-					     target->indirect_size,
-					     DMA_TO_DEVICE);
-		if (ib_dma_mapping_error(ibdev, dma_addr))
-			goto err_free_mem;
-
-		req->indirect_dma_addr = dma_addr;
-		req->index = i;
-		list_add_tail(&req->list, &target->free_reqs);
-	}
+	ret = srp_alloc_req_data(target);
+	if (ret)
+		goto err_free_mem;
 
 	ib_query_gid(ibdev, host->port, 0, &target->path.sgid);
 
--
1.7.10.4
[PATCH 6/8] IB/srp: Make transport layer retry count configurable
Allow the InfiniBand RC retry count to be configured by the user as an option in the target login string. Reducing this retry count helps with reducing path failover time. [bvanassche: Rewrote patch description / changed default retry count] Signed-off-by: Vu Pham Signed-off-by: Bart Van Assche Acked-by: David Dillow Cc: Roland Dreier Cc: Sebastian Riemer --- Documentation/ABI/stable/sysfs-driver-ib_srp |2 ++ drivers/infiniband/ulp/srp/ib_srp.c | 24 +++- drivers/infiniband/ulp/srp/ib_srp.h |1 + 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp index 5c53d28..18e9b27 100644 --- a/Documentation/ABI/stable/sysfs-driver-ib_srp +++ b/Documentation/ABI/stable/sysfs-driver-ib_srp @@ -61,6 +61,8 @@ Description: Interface for making ib_srp connect to a new target. interrupt is handled by a different CPU then the comp_vector parameter can be used to spread the SRP completion workload over multiple CPU's. + * tl_retry_count, a number in the range 2..7 specifying the + IB RC retry count. 
What: /sys/class/infiniband_srp/srp--/ibdev Date: January 2, 2006 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 2b7ef6b..de4c3b7 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -458,7 +458,7 @@ static int srp_send_req(struct srp_target_port *target) req->param.responder_resources= 4; req->param.remote_cm_response_timeout = 20; req->param.local_cm_response_timeout = 20; - req->param.retry_count= 7; + req->param.retry_count= target->tl_retry_count; req->param.rnr_retry_count= 7; req->param.max_cm_retries = 15; @@ -1991,6 +1991,14 @@ static ssize_t show_comp_vector(struct device *dev, return sprintf(buf, "%d\n", target->comp_vector); } +static ssize_t show_tl_retry_count(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct srp_target_port *target = host_to_target(class_to_shost(dev)); + + return sprintf(buf, "%d\n", target->tl_retry_count); +} + static ssize_t show_cmd_sg_entries(struct device *dev, struct device_attribute *attr, char *buf) { @@ -2018,6 +2026,7 @@ static DEVICE_ATTR(zero_req_lim,S_IRUGO, show_zero_req_lim, NULL); static DEVICE_ATTR(local_ib_port, S_IRUGO, show_local_ib_port, NULL); static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL); static DEVICE_ATTR(comp_vector, S_IRUGO, show_comp_vector, NULL); +static DEVICE_ATTR(tl_retry_count, S_IRUGO, show_tl_retry_count, NULL); static DEVICE_ATTR(cmd_sg_entries, S_IRUGO, show_cmd_sg_entries, NULL); static DEVICE_ATTR(allow_ext_sg,S_IRUGO, show_allow_ext_sg,NULL); @@ -2033,6 +2042,7 @@ static struct device_attribute *srp_host_attrs[] = { &dev_attr_local_ib_port, &dev_attr_local_ib_device, &dev_attr_comp_vector, + &dev_attr_tl_retry_count, &dev_attr_cmd_sg_entries, &dev_attr_allow_ext_sg, NULL @@ -2158,6 +2168,7 @@ enum { SRP_OPT_ALLOW_EXT_SG= 1 << 10, SRP_OPT_SG_TABLESIZE= 1 << 11, SRP_OPT_COMP_VECTOR = 1 << 12, + SRP_OPT_TL_RETRY_COUNT = 1 << 13, SRP_OPT_ALL = 
(SRP_OPT_ID_EXT | SRP_OPT_IOC_GUID | SRP_OPT_DGID | @@ -2179,6 +2190,7 @@ static const match_table_t srp_opt_tokens = { { SRP_OPT_ALLOW_EXT_SG, "allow_ext_sg=%u" }, { SRP_OPT_SG_TABLESIZE, "sg_tablesize=%u" }, { SRP_OPT_COMP_VECTOR, "comp_vector=%u"}, + { SRP_OPT_TL_RETRY_COUNT, "tl_retry_count=%u" }, { SRP_OPT_ERR, NULL} }; @@ -2342,6 +2354,15 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target) target->comp_vector = token; break; + case SRP_OPT_TL_RETRY_COUNT: + if (match_int(args, &token) || token < 2 || token > 7) { + pr_warn("bad tl_retry_count parameter '%s' (must be a number between 2 and 7)\n", + p); + goto out; + } + target->tl_retry_count = token; + break; + default: pr_warn("unknown parameter or missing value '%s' in target creation request\n", p); @@ -2396,6 +2417,7 @@ static ssize_t
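The range check applied to tl_retry_count above can be tried out in user space. What follows is only a minimal sketch under stated assumptions, not the driver code: parse_tl_retry_count() is a hypothetical helper invented for illustration, and strtol() stands in for the kernel's match_int().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Bounds taken from the patch description: valid values are 2..7. */
#define TL_RETRY_MIN 2
#define TL_RETRY_MAX 7

/*
 * Hypothetical user-space helper (not part of ib_srp): parse an option of
 * the form "tl_retry_count=<n>" and store <n> in *out.
 * Returns 0 on success and -EINVAL for a malformed or out-of-range value.
 */
static int parse_tl_retry_count(const char *opt, int *out)
{
	static const char key[] = "tl_retry_count=";
	const char *val;
	char *end;
	long n;

	assert(opt != NULL && out != NULL);

	/* Reject any option that does not carry the expected key. */
	if (strncmp(opt, key, sizeof(key) - 1) != 0)
		return -EINVAL;
	val = opt + sizeof(key) - 1;

	/* The value must be a complete decimal number with no trailing text. */
	errno = 0;
	n = strtol(val, &end, 10);
	if (errno != 0 || end == val || *end != '\0')
		return -EINVAL;

	/* Same bounds as the "token < 2 || token > 7" test in the patch. */
	if (n < TL_RETRY_MIN || n > TL_RETRY_MAX)
		return -EINVAL;

	*out = (int)n;
	return 0;
}
```

With this sketch, "tl_retry_count=2" and "tl_retry_count=7" are accepted, while 1, 8, an empty value or trailing garbage are rejected and *out is left untouched.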
[PATCH 5/8] IB/srp: Start timers if a transport layer error occurs
Start the reconnect timer, fast_io_fail timer and dev_loss timers if a transport layer error occurs. Signed-off-by: Bart Van Assche Acked-by: David Dillow Cc: Roland Dreier Cc: Vu Pham Cc: Sebastian Riemer --- drivers/infiniband/ulp/srp/ib_srp.c | 19 +++ drivers/infiniband/ulp/srp/ib_srp.h |1 + 2 files changed, 20 insertions(+) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index a7fa7ed..2b7ef6b 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -602,6 +602,7 @@ static void srp_remove_target(struct srp_target_port *target) srp_disconnect_target(target); ib_destroy_cm_id(target->cm_id); srp_free_target_ib(target); + cancel_work_sync(&target->tl_err_work); srp_rport_put(target->rport); srp_free_req_data(target); scsi_host_put(target->scsi_host); @@ -1371,6 +1372,21 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc) PFX "Recv failed with error code %d\n", res); } +/** + * srp_tl_err_work() - handle a transport layer error + * + * Note: This function may get invoked before the rport has been created, + * hence the target->rport test. + */ +static void srp_tl_err_work(struct work_struct *work) +{ + struct srp_target_port *target; + + target = container_of(work, struct srp_target_port, tl_err_work); + if (target->rport) + srp_start_tl_fail_timers(target->rport); +} + static void srp_handle_qp_err(enum ib_wc_status wc_status, enum ib_wc_opcode wc_opcode, struct srp_target_port *target) @@ -1380,6 +1396,7 @@ static void srp_handle_qp_err(enum ib_wc_status wc_status, PFX "failed %s status %d\n", wc_opcode & IB_WC_RECV ? 
"receive" : "send", wc_status); + queue_work(system_long_wq, &target->tl_err_work); } target->qp_in_error = true; } @@ -1742,6 +1759,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) if (ib_send_cm_drep(cm_id, NULL, 0)) shost_printk(KERN_ERR, target->scsi_host, PFX "Sending CM DREP failed\n"); + queue_work(system_long_wq, &target->tl_err_work); break; case IB_CM_TIMEWAIT_EXIT: @@ -2406,6 +2424,7 @@ static ssize_t srp_create_target(struct device *dev, sizeof (struct srp_indirect_buf) + target->cmd_sg_cnt * sizeof (struct srp_direct_buf); + INIT_WORK(&target->tl_err_work, srp_tl_err_work); INIT_WORK(&target->remove_work, srp_remove_work); spin_lock_init(&target->lock); INIT_LIST_HEAD(&target->free_tx); diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index b62a943..cbc0b14 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -176,6 +176,7 @@ struct srp_target_port { struct srp_iu *rx_ring[SRP_RQ_SIZE]; struct srp_request req_ring[SRP_CMD_SQ_SIZE]; + struct work_struct tl_err_work; struct work_struct remove_work; struct list_headlist; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/8] IB/srp: Use SRP transport layer error recovery
Enable reconnect_delay, fast_io_fail_tmo and dev_loss_tmo functionality for the IB SRP initiator. Add kernel module parameters that allow to specify default values for these three parameters. Signed-off-by: Bart Van Assche Acked-by: David Dillow Cc: Roland Dreier Cc: Vu Pham Cc: Sebastian Riemer --- drivers/infiniband/ulp/srp/ib_srp.c | 129 +-- drivers/infiniband/ulp/srp/ib_srp.h |1 - 2 files changed, 94 insertions(+), 36 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 37dd3fb..a7fa7ed 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -86,6 +86,32 @@ module_param(topspin_workarounds, int, 0444); MODULE_PARM_DESC(topspin_workarounds, "Enable workarounds for Topspin/Cisco SRP target bugs if != 0"); +static struct kernel_param_ops srp_tmo_ops; + +static int srp_reconnect_delay = 10; +module_param_cb(reconnect_delay, &srp_tmo_ops, &srp_reconnect_delay, + S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(reconnect_delay, "Time between successive reconnect attempts"); + +static int srp_fast_io_fail_tmo = 15; +module_param_cb(fast_io_fail_tmo, &srp_tmo_ops, &srp_fast_io_fail_tmo, + S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(fast_io_fail_tmo, +"Number of seconds between the observation of a transport" +" layer error and failing all I/O. \"off\" means that this" +" functionality is disabled."); + +static int srp_dev_loss_tmo = 600; +module_param_cb(dev_loss_tmo, &srp_tmo_ops, &srp_dev_loss_tmo, + S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(dev_loss_tmo, +"Maximum number of seconds that the SRP transport should" +" insulate transport layer errors. After this time has been" +" exceeded the SCSI target is removed. Should be" +" between 1 and " __stringify(SCSI_DEVICE_BLOCK_MAX_TIMEOUT) +" if fast_io_fail_tmo has not been set. 
\"off\" means that" +" this functionality is disabled."); + static void srp_add_one(struct ib_device *device); static void srp_remove_one(struct ib_device *device); static void srp_recv_completion(struct ib_cq *cq, void *target_ptr); @@ -102,6 +128,48 @@ static struct ib_client srp_client = { static struct ib_sa_client srp_sa_client; +static int srp_tmo_get(char *buffer, const struct kernel_param *kp) +{ + int tmo = *(int *)kp->arg; + + if (tmo >= 0) + return sprintf(buffer, "%d", tmo); + else + return sprintf(buffer, "off"); +} + +static int srp_tmo_set(const char *val, const struct kernel_param *kp) +{ + int tmo, res; + + if (strncmp(val, "off", 3) != 0) { + res = kstrtoint(val, 0, &tmo); + if (res) + goto out; + } else { + tmo = -1; + } + if (kp->arg == &srp_reconnect_delay) + res = srp_tmo_valid(tmo, srp_fast_io_fail_tmo, + srp_dev_loss_tmo); + else if (kp->arg == &srp_fast_io_fail_tmo) + res = srp_tmo_valid(srp_reconnect_delay, tmo, srp_dev_loss_tmo); + else + res = srp_tmo_valid(srp_reconnect_delay, srp_fast_io_fail_tmo, + tmo); + if (res) + goto out; + *(int *)kp->arg = tmo; + +out: + return res; +} + +static struct kernel_param_ops srp_tmo_ops = { + .get = srp_tmo_get, + .set = srp_tmo_set, +}; + static inline struct srp_target_port *host_to_target(struct Scsi_Host *host) { return (struct srp_target_port *) host->hostdata; @@ -711,13 +779,20 @@ static void srp_terminate_io(struct srp_rport *rport) } } -static int srp_reconnect_target(struct srp_target_port *target) +/* + * It is up to the caller to ensure that srp_rport_reconnect() calls are + * serialized and that no concurrent srp_queuecommand(), srp_abort(), + * srp_reset_device() or srp_reset_host() calls will occur while this function + * is in progress. 
One way to realize that is not to call this function + * directly but to call srp_reconnect_rport() instead since that last function + * serializes calls of this function via rport->mutex and also blocks + * srp_queuecommand() calls before invoking this function. + */ +static int srp_rport_reconnect(struct srp_rport *rport) { - struct Scsi_Host *shost = target->scsi_host; + struct srp_target_port *target = rport->lld_data; int i, ret; - scsi_target_block(&shost->shost_gendev); - srp_disconnect_target(target); /* * Now get a new local CM ID so that we avoid confusing the target in @@ -747,28 +822,9 @@ static int srp_reconnect_target(struct srp_target_port *target) if (ret == 0) ret = srp_connect_target(target); - scsi_target_unblock(&shost->s
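The "off" convention used by srp_tmo_get() and srp_tmo_set() above (a negative value means the timer is disabled) can be illustrated with a small user-space sketch. The names tmo_parse() and tmo_format() are hypothetical, strtol() stands in for kstrtoint(), and the cross-parameter check performed by srp_tmo_valid() is deliberately left out because its body is not shown in this message.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the parsing half of srp_tmo_set():
 * "off" maps to -1, anything else must be an integer.
 * Returns 0 on success and -EINVAL for input it cannot parse.
 */
static int tmo_parse(const char *val, int *tmo)
{
	char *end;
	long n;

	assert(val != NULL && tmo != NULL);

	/* Same prefix test as the patch: strncmp(val, "off", 3). */
	if (strncmp(val, "off", 3) == 0) {
		*tmo = -1;
		return 0;
	}

	errno = 0;
	n = strtol(val, &end, 0);
	/* Assumption: accept one trailing newline, as sysfs writes often carry one. */
	if (end != val && *end == '\n')
		end++;
	if (errno != 0 || end == val || *end != '\0' ||
	    n < INT_MIN || n > INT_MAX)
		return -EINVAL;

	*tmo = (int)n;
	return 0;
}

/* Mirror of srp_tmo_get(): negative values are reported as "off". */
static int tmo_format(char *buf, size_t len, int tmo)
{
	assert(buf != NULL && len > 0);

	if (tmo >= 0)
		return snprintf(buf, len, "%d", tmo);
	return snprintf(buf, len, "off");
}
```

Parsing "off" and formatting the result again round-trips to "off"; a full implementation would additionally validate the new value against the other two timeouts before storing it, as srp_tmo_set() does.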
[PATCH 3/8] IB/srp: Add srp_terminate_io()
Finish all outstanding I/O requests after fast_io_fail_tmo expired, which speeds up failover in a multipath setup. This patch is a reworked version of a patch from Sebastian Riemer. Reported-by: Sebastian Riemer Signed-off-by: Bart Van Assche Acked-by: David Dillow Cc: Roland Dreier Cc: Vu Pham Cc: Sebastian Riemer --- drivers/infiniband/ulp/srp/ib_srp.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index de49088..37dd3fb 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -688,17 +688,29 @@ static void srp_free_req(struct srp_target_port *target, spin_unlock_irqrestore(&target->lock, flags); } -static void srp_reset_req(struct srp_target_port *target, struct srp_request *req) +static void srp_finish_req(struct srp_target_port *target, + struct srp_request *req, int result) { struct scsi_cmnd *scmnd = srp_claim_req(target, req, NULL); if (scmnd) { srp_free_req(target, req, scmnd, 0); - scmnd->result = DID_RESET << 16; + scmnd->result = result; scmnd->scsi_done(scmnd); } } +static void srp_terminate_io(struct srp_rport *rport) +{ + struct srp_target_port *target = rport->lld_data; + int i; + + for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) { + struct srp_request *req = &target->req_ring[i]; + srp_finish_req(target, req, DID_TRANSPORT_FAILFAST << 16); + } +} + static int srp_reconnect_target(struct srp_target_port *target) { struct Scsi_Host *shost = target->scsi_host; @@ -725,8 +737,7 @@ static int srp_reconnect_target(struct srp_target_port *target) for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) { struct srp_request *req = &target->req_ring[i]; - if (req->scmnd) - srp_reset_req(target, req); + srp_finish_req(target, req, DID_RESET << 16); } INIT_LIST_HEAD(&target->free_tx); @@ -1784,7 +1795,7 @@ static int srp_reset_device(struct scsi_cmnd *scmnd) for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) { struct srp_request *req = &target->req_ring[i]; if 
(req->scmnd && req->scmnd->device == scmnd->device) - srp_reset_req(target, req); + srp_finish_req(target, req, DID_RESET << 16); } return SUCCESS; @@ -2616,6 +2627,7 @@ static void srp_remove_one(struct ib_device *device) static struct srp_function_template ib_srp_transport_functions = { .rport_delete = srp_rport_delete, + .terminate_rport_io = srp_terminate_io, }; static int __init srp_init_module(void) -- 1.7.10.4
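The claim-then-complete pattern behind srp_finish_req() and srp_terminate_io() can be sketched outside the kernel as follows. Every name here (fake_cmd, fake_req, claim_req() and so on) is hypothetical, the ring size is arbitrary, locking is omitted, and the host-byte codes are placeholders rather than values quoted from this patch.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4                 /* illustrative; the driver uses SRP_CMD_SQ_SIZE */
#define HOST_BYTE(code) ((code) << 16)
#define FAKE_DID_RESET    0x08      /* placeholder host-byte codes for this sketch */
#define FAKE_DID_FAILFAST 0x0f

struct fake_cmd {
	int result;                 /* result reported to the upper layer */
	int done;                   /* number of times the completion ran */
};

struct fake_req {
	struct fake_cmd *cmd;       /* NULL when the slot is idle */
};

/* Take ownership of the command in a slot, or return NULL if it is idle. */
static struct fake_cmd *claim_req(struct fake_req *req)
{
	struct fake_cmd *cmd = req->cmd;

	req->cmd = NULL;
	return cmd;
}

/* Complete one request with the given result; returns 1 if it was outstanding. */
static int finish_req(struct fake_req *req, int result)
{
	struct fake_cmd *cmd = claim_req(req);

	if (!cmd)
		return 0;
	cmd->result = result;
	cmd->done++;
	return 1;
}

/* Walk the whole ring, as srp_terminate_io() does; returns the number completed. */
static int terminate_io(struct fake_req *ring, size_t n, int result)
{
	size_t i;
	int completed = 0;

	assert(ring != NULL);
	for (i = 0; i < n; ++i)
		completed += finish_req(&ring[i], result);
	return completed;
}
```

Because the claim step clears the slot, a second pass over the same ring completes nothing, which is why the patch can call srp_finish_req() on every slot without first testing req->scmnd.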
[PATCH 2/8] scsi_transport_srp: Add transport layer error handling
Add the necessary functions in the SRP transport module to allow an SRP initiator driver to implement transport layer error handling similar to the functionality already provided by the FC transport layer. This includes: - Support for implementing fast_io_fail_tmo, the time that should elapse after having detected a transport layer problem and before failing I/O. - Support for implementing dev_loss_tmo, the time that should elapse after having detected a transport layer problem and before removing a remote port. - Support for periodically trying to reconnect to an SRP target after connection to a target has been lost. Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: James Bottomley Cc: David Dillow Cc: Vu Pham Cc: Sebastian Riemer --- Documentation/ABI/stable/sysfs-transport-srp | 39 ++ drivers/scsi/scsi_transport_srp.c| 504 +- include/scsi/scsi_transport_srp.h| 65 +++- 3 files changed, 605 insertions(+), 3 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp index b36fb0d..21bd480 100644 --- a/Documentation/ABI/stable/sysfs-transport-srp +++ b/Documentation/ABI/stable/sysfs-transport-srp @@ -5,6 +5,24 @@ Contact: linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org Description: Instructs an SRP initiator to disconnect from a target and to remove all LUNs imported from that target. +What: /sys/class/srp_remote_ports/port-:/dev_loss_tmo +Date: December 1, 2013 +KernelVersion: 3.12 +Contact: linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org +Description: Number of seconds the SCSI layer will wait after a transport + layer error has been observed before removing a target port. + Zero means immediate removal. Setting this attribute to "off" + will disable the dev_loss timer. 
+ +What: /sys/class/srp_remote_ports/port-:/fast_io_fail_tmo +Date: December 1, 2013 +KernelVersion: 3.12 +Contact: linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org +Description: Number of seconds the SCSI layer will wait after a transport + layer error has been observed before failing I/O. Zero means + failing I/O immediately. Setting this attribute to "off" will + disable the fast_io_fail timer. + What: /sys/class/srp_remote_ports/port-:/port_id Date: June 27, 2007 KernelVersion: 2.6.24 @@ -12,8 +30,29 @@ Contact: linux-s...@vger.kernel.org Description: 16-byte local SRP port identifier in hexadecimal format. An example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00. +What: /sys/class/srp_remote_ports/port-:/reconnect_delay +Date: December 1, 2013 +KernelVersion: 3.12 +Contact: linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org +Description: Number of seconds the SCSI layer will wait after a reconnect + attempt failed before retrying. Setting this attribute to + "off" will disable time-based reconnecting. + What: /sys/class/srp_remote_ports/port-:/roles Date: June 27, 2007 KernelVersion: 2.6.24 Contact: linux-s...@vger.kernel.org Description: Role of the remote port. Either "SRP Initiator" or "SRP Target". + +What: /sys/class/srp_remote_ports/port-:/state +Date: December 1, 2013 +KernelVersion: 3.12 +Contact: linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org +Description: State of the transport layer used for communication with the + remote port. "running" if the transport layer is operational; + "blocked" if a transport layer error has been encountered but + the fast_io_fail_tmo timer has not yet fired; "fail-fast" + after the fast_io_fail_tmo timer has fired and before the + "dev_loss_tmo" timer has fired; "lost" after the + "dev_loss_tmo" timer has fired and before the port is finally + removed. 
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c index f7ba94a..ff1baa8 100644 --- a/drivers/scsi/scsi_transport_srp.c +++ b/drivers/scsi/scsi_transport_srp.c @@ -24,12 +24,15 @@ #include #include #include +#include #include +#include #include #include #include #include +#include "scsi_priv.h" #include "scsi_transport_srp_internal.h" struct srp_host_attrs { @@ -38,7 +41,7 @@ struct srp_host_attrs { #define to_srp_host_attrs(host)((struct srp_host_attrs *)(host)->shost_data) #define SRP_HOST_ATTRS 0 -#define SRP_RPORT_ATTRS 3 +#define SRP_RPORT_ATTRS 8 struct srp_internal { struct scsi_transport_template t; @@ -54,6 +57,36 @@ struct srp_internal { #definedev_to_rport(d) container_of(d, struct srp_rport, dev) #define transport_class_to_srp_rport(dev) dev_to_rport((dev)->paren
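The four values of the "state" attribute documented above can be read as a function of the time elapsed since a transport layer error. The sketch below is one possible reading of that description, not the transport module's implementation (which is timer driven, and whose body is cut off in this message). All names are hypothetical, a negative timeout is assumed to mean "off", and a timer is assumed to fire once the elapsed time reaches its timeout.

```c
#include <assert.h>

/* Hypothetical mirror of the documented states of the "state" attribute. */
enum rport_state {
	RPORT_RUNNING,
	RPORT_BLOCKED,
	RPORT_FAIL_FAST,
	RPORT_LOST,
};

/*
 * Derive the state from the time elapsed (in seconds) since a transport
 * layer error was observed. "failed" is 0 while the transport is operational.
 */
static enum rport_state rport_state_at(int failed, int elapsed,
				       int fast_io_fail_tmo, int dev_loss_tmo)
{
	assert(elapsed >= 0);

	if (!failed)
		return RPORT_RUNNING;
	/* dev_loss timer has fired: the port is about to be removed. */
	if (dev_loss_tmo >= 0 && elapsed >= dev_loss_tmo)
		return RPORT_LOST;
	/* fast_io_fail timer has fired: outstanding I/O is being failed. */
	if (fast_io_fail_tmo >= 0 && elapsed >= fast_io_fail_tmo)
		return RPORT_FAIL_FAST;
	/* Error seen, but no timer has fired yet (or both are off). */
	return RPORT_BLOCKED;
}

/* Strings as listed in the sysfs description. */
static const char *rport_state_name(enum rport_state s)
{
	switch (s) {
	case RPORT_RUNNING:	return "running";
	case RPORT_BLOCKED:	return "blocked";
	case RPORT_FAIL_FAST:	return "fail-fast";
	case RPORT_LOST:	return "lost";
	}
	return "unknown";
}
```

With the module defaults quoted in patch 4/8 (fast_io_fail_tmo = 15, dev_loss_tmo = 600), this reading gives "blocked" for the first 15 seconds after an error, "fail-fast" until the 600 second mark and "lost" afterwards. A successful reconnect, which returns the port to "running", is outside the scope of this sketch.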
[PATCH 1/8] IB/srp: Keep rport as long as the IB transport layer
Keep the rport data structure around after srp_remove_host() has finished until cleanup of the IB transport layer has finished completely. This is necessary because later patches use the rport pointer inside the queuecommand callback. Without this patch accessing the rport from inside a queuecommand callback is racy because srp_remove_host() must be invoked before scsi_remove_host() and because the queuecommand callback may get invoked after srp_remove_host() has finished. In other words, without this patch the queuecommand callback may get invoked after the rport has been removed. Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: James Bottomley Cc: David Dillow Cc: Vu Pham Cc: Sebastian Riemer --- drivers/infiniband/ulp/srp/ib_srp.c |3 +++ drivers/infiniband/ulp/srp/ib_srp.h |1 + drivers/scsi/scsi_transport_srp.c | 18 ++ include/scsi/scsi_transport_srp.h |2 ++ 4 files changed, 24 insertions(+) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index f93baf8..de49088 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -528,11 +528,13 @@ static void srp_remove_target(struct srp_target_port *target) WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED); srp_del_scsi_host_attr(target->scsi_host); + srp_rport_get(target->rport); srp_remove_host(target->scsi_host); scsi_remove_host(target->scsi_host); srp_disconnect_target(target); ib_destroy_cm_id(target->cm_id); srp_free_target_ib(target); + srp_rport_put(target->rport); srp_free_req_data(target); scsi_host_put(target->scsi_host); } @@ -1994,6 +1996,7 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target) } rport->lld_data = target; + target->rport = rport; spin_lock(&host->target_lock); list_add_tail(&target->list, &host->target_list); diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index e641088..02392f5 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ 
b/drivers/infiniband/ulp/srp/ib_srp.h @@ -153,6 +153,7 @@ struct srp_target_port { u16 io_class; struct srp_host *srp_host; struct Scsi_Host *scsi_host; + struct srp_rport *rport; char target_name[32]; unsigned int scsi_id; unsigned int sg_tablesize; diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c index f379c7f..f7ba94a 100644 --- a/drivers/scsi/scsi_transport_srp.c +++ b/drivers/scsi/scsi_transport_srp.c @@ -185,6 +185,24 @@ static int srp_host_match(struct attribute_container *cont, struct device *dev) } /** + * srp_rport_get() - increment rport reference count + */ +void srp_rport_get(struct srp_rport *rport) +{ + get_device(&rport->dev); +} +EXPORT_SYMBOL(srp_rport_get); + +/** + * srp_rport_put() - decrement rport reference count + */ +void srp_rport_put(struct srp_rport *rport) +{ + put_device(&rport->dev); +} +EXPORT_SYMBOL(srp_rport_put); + +/** * srp_rport_add - add a SRP remote port to the device hierarchy * @shost: scsi host the remote port is connected to. * @ids: The port id for the remote port. diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h index ff0f04a..5a2d2d1 100644 --- a/include/scsi/scsi_transport_srp.h +++ b/include/scsi/scsi_transport_srp.h @@ -38,6 +38,8 @@ extern struct scsi_transport_template * srp_attach_transport(struct srp_function_template *); extern void srp_release_transport(struct scsi_transport_template *); +extern void srp_rport_get(struct srp_rport *rport); +extern void srp_rport_put(struct srp_rport *rport); extern struct srp_rport *srp_rport_add(struct Scsi_Host *, struct srp_rport_identifiers *); extern void srp_rport_del(struct srp_rport *); -- 1.7.10.4
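The reason for bracketing the teardown with srp_rport_get() and srp_rport_put() is the usual reference-counting rule: an object stays valid as long as somebody holds a reference. A minimal user-space sketch of that rule follows. The fake_rport type and its helpers are hypothetical, and a plain int replaces the device reference count that get_device() and put_device() manage in the kernel, so the sketch is not thread safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an rport with an embedded reference count. */
struct fake_rport {
	int refcount;
	int *freed;     /* set to 1 on release so that a caller can observe it */
};

/* Allocate an object that starts out with a single reference. */
static struct fake_rport *fake_rport_alloc(int *freed)
{
	struct fake_rport *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->refcount = 1;
	r->freed = freed;
	*freed = 0;
	return r;
}

/* Take an extra reference; only legal while the object is still alive. */
static void fake_rport_get(struct fake_rport *r)
{
	assert(r->refcount > 0);
	r->refcount++;
}

/* Drop a reference and release the object when the last one goes away. */
static void fake_rport_put(struct fake_rport *r)
{
	assert(r->refcount > 0);
	if (--r->refcount == 0) {
		*r->freed = 1;
		free(r);
	}
}
```

Taking the extra reference before the original one is dropped keeps the object alive across the teardown steps in between; only the final put releases it.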
[PATCH 0/8] IB SRP initiator patches for kernel 3.12
The purpose of this InfiniBand SRP initiator patch series is as follows:
- Make the SRP initiator driver better suited for use in a H.A. setup. Add fast_io_fail_tmo, dev_loss_tmo and reconnect_delay parameters. These can be used either to speed up failover or to avoid device removal when e.g. using initiator side mirroring.
- Make the SRP initiator better suited for use on NUMA systems by making the HCA completion vector configurable.
- Improve performance by making the queue size configurable.

Changes since the previous patch series are:
- Rewrote the srp_tmo_valid() to improve readability (requested by Dave Dillow).
- The combination (reconnect_delay < 0 && fast_io_fail_tmo < 0 && dev_loss_tmo < 0) is now rejected as requested by Dave Dillow.
- Fixed a race between transport layer failure handling and device removal. This issue was reported by Vu Pham.

The previous patch series can be found here: http://thread.gmane.org/gmane.linux.drivers.rdma/16389

The individual patches in this series are:
0001-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
0002-scsi_transport_srp-Add-transport-layer-error-handlin.patch
0003-IB-srp-Add-srp_terminate_io.patch
0004-IB-srp-Use-SRP-transport-layer-error-recovery.patch
0005-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
0006-IB-srp-Make-transport-layer-retry-count-configurable.patch
0007-IB-srp-Introduce-srp_alloc_req_data.patch
0008-IB-srp-Make-queue-size-configurable.patch
Re: [ceph-users] Help needed porting Ceph to RSockets
Hi, I have added the patch and re-tested: I still encounter hangs of my application. I am not quite sure whether I hit the same error on the shutdown, because now I don't always hit the error, but only every now and then. When adding the patch to my code base (git tag v1.0.17) I notice an offset of "-34 lines". Which code base are you using? Best Regards Andreas Bluemle On Tue, 20 Aug 2013 09:21:13 +0200 Andreas Bluemle wrote: > Hi Sean, > > I will re-check until the end of the week; there is > some test scheduling issue with our test system, which > affects my access times. > > Thanks > > Andreas > > > On Mon, 19 Aug 2013 17:10:11 + > "Hefty, Sean" wrote: > > > Can you see if the patch below fixes the hang? > > > > Signed-off-by: Sean Hefty > > --- > > src/rsocket.c | 11 ++- > > 1 files changed, 10 insertions(+), 1 deletions(-) > > > > diff --git a/src/rsocket.c b/src/rsocket.c > > index d544dd0..e45b26d 100644 > > --- a/src/rsocket.c > > +++ b/src/rsocket.c > > @@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd > > *rfds, struct pollfd *fds, nfds_t nfds) > > rs = idm_lookup(&idm, fds[i].fd); > > if (rs) { > > + fastlock_acquire(&rs->cq_wait_lock); > > if (rs->type == SOCK_STREAM) > > rs_get_cq_event(rs); > > else > > ds_get_cq_event(rs); > > + fastlock_release(&rs->cq_wait_lock); > > fds[i].revents = rs_poll_rs(rs, > > fds[i].events, 1, rs_poll_all); } else { > > fds[i].revents = rfds[i].revents; > > @@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set > > *writefds, > > /* > > * For graceful disconnect, notify the remote side that we're > > - * disconnecting and wait until all outstanding sends complete. > > + * disconnecting and wait until all outstanding sends complete, > > provided > > + * that the remote side has not sent a disconnect message. 
> > */ > > int rshutdown(int socket, int how) > > { > > @@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how) > > if (rs->state & rs_connected) > > rs_process_cq(rs, 0, rs_conn_all_sends_done); > > > > + if (rs->state & rs_disconnected) { > > + /* Generate event by flushing receives to unblock > > rpoll */ > > + ibv_req_notify_cq(rs->cm_id->recv_cq, 0); > > + rdma_disconnect(rs->cm_id); > > + } > > + > > if ((rs->fd_flags & O_NONBLOCK) && (rs->state & > > rs_connected)) rs_set_nonblocking(rs, rs->fd_flags); > > > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe > > linux-rdma" in the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > > -- Andreas Bluemle mailto:andreas.blue...@itxperts.de Heinrich Boell Strasse 88 Phone: (+49) 89 4317582 D-81829 Muenchen (Germany) Mobil: (+49) 177 522 0151 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ceph-users] Help needed porting Ceph to RSockets
Hi Sean, I will re-check until the end of the week; there is some test scheduling issue with our test system, which affects my access times. Thanks Andreas On Mon, 19 Aug 2013 17:10:11 + "Hefty, Sean" wrote: > Can you see if the patch below fixes the hang? > > Signed-off-by: Sean Hefty > --- > src/rsocket.c | 11 ++- > 1 files changed, 10 insertions(+), 1 deletions(-) > > diff --git a/src/rsocket.c b/src/rsocket.c > index d544dd0..e45b26d 100644 > --- a/src/rsocket.c > +++ b/src/rsocket.c > @@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd > *rfds, struct pollfd *fds, nfds_t nfds) > rs = idm_lookup(&idm, fds[i].fd); > if (rs) { > + fastlock_acquire(&rs->cq_wait_lock); > if (rs->type == SOCK_STREAM) > rs_get_cq_event(rs); > else > ds_get_cq_event(rs); > + fastlock_release(&rs->cq_wait_lock); > fds[i].revents = rs_poll_rs(rs, > fds[i].events, 1, rs_poll_all); } else { > fds[i].revents = rfds[i].revents; > @@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set > *writefds, > /* > * For graceful disconnect, notify the remote side that we're > - * disconnecting and wait until all outstanding sends complete. > + * disconnecting and wait until all outstanding sends complete, > provided > + * that the remote side has not sent a disconnect message. 
> */ > int rshutdown(int socket, int how) > { > @@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how) > if (rs->state & rs_connected) > rs_process_cq(rs, 0, rs_conn_all_sends_done); > > + if (rs->state & rs_disconnected) { > + /* Generate event by flushing receives to unblock > rpoll */ > + ibv_req_notify_cq(rs->cm_id->recv_cq, 0); > + rdma_disconnect(rs->cm_id); > + } > + > if ((rs->fd_flags & O_NONBLOCK) && (rs->state & > rs_connected)) rs_set_nonblocking(rs, rs->fd_flags); > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" > in the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- Andreas Bluemle mailto:andreas.blue...@itxperts.de Heinrich Boell Strasse 88 Phone: (+49) 89 4317582 D-81829 Muenchen (Germany) Mobil: (+49) 177 522 0151 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html