registering physical memory region

2013-08-20 Thread Eva Mishra
The InfiniBand specification describes a "register physical memory
region" verb. The consumer can request an iova, and the iova that is
returned can be the same as or different from the one requested.

We provide a physical buffer list (a list of the starting physical
addresses of the pages) as input to the call that registers physical
memory. When any memory is allocated, its physical address is already
associated with a virtual address. Still, we can request a new virtual
address to be returned for the region.

What is the significance of the iova, given that a virtual address is
already associated with the physical addresses that we provide?
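
One way to picture the role of the iova, offered as a rough userspace
model and not as the verbs API: the region gets its own base address,
and an address inside the region is resolved through the physical
buffer list, independent of whatever CPU virtual address the pages
already have. The names demo_phys_mr and demo_iova_to_phys are
hypothetical, and the sketch assumes equally sized pages and a region
that starts at the beginning of the first page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a physical memory region, for illustration only. */
struct demo_phys_mr {
	uint64_t iova;             /* base I/O virtual address of the region */
	uint64_t page_size;        /* size of every page in the list, in bytes */
	size_t npages;             /* number of entries in page_list */
	const uint64_t *page_list; /* start physical address of each page */
};

/*
 * Translate an address inside [iova, iova + npages * page_size) to the
 * physical address it refers to. Returns 0 on success, -1 if the address
 * lies outside the region.
 */
static int demo_iova_to_phys(const struct demo_phys_mr *mr, uint64_t addr,
			     uint64_t *phys)
{
	uint64_t off;

	assert(mr != NULL && phys != NULL && mr->page_size != 0);
	if (addr < mr->iova)
		return -1;
	off = addr - mr->iova;
	if (off / mr->page_size >= mr->npages)
		return -1;
	/* Pick the page by index, then add the offset within that page. */
	*phys = mr->page_list[off / mr->page_size] + off % mr->page_size;
	return 0;
}
```

In this model the pages need not be physically contiguous, and the iova
only has to be meaningful to whoever posts requests against the region.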
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/8] IB/srp: Make queue size configurable

2013-08-20 Thread David Dillow
On Tue, 2013-08-20 at 17:55 +0200, Bart Van Assche wrote:
> On 08/20/13 17:34, Sagi Grimberg wrote:
> > Question,
> > If srp now will allow larger queues while using a single global FMR pool
> > of size 1024, isn't it more likely now that in stress environment srp
> > will run out of FMRs to handle IO commands?
> > I mean that let's say that you have x scsi hosts with can_queue size of
> > 512 (+-) and all of them are running IO stress, is it possible that all
> > FMRs will be inuse and no FMR is available to register the next IO SG-list?
> > Did you try out such a scenario?
> >
> > I guess that in such a case IB core will return EAGAIN and SRP will
> > return SCSI_MLQUEUE_HOST_BUSY.
> > I think it is a good Idea to move FMR pools to be per connection rather
> > than a global pool, what do you think?
> 
> That makes sense to me. And as long as the above has not yet been 
> implemented I'm fine with dropping patch 8/8 from this patch set.

Don't drop it; most configs won't have all that many connections and
shouldn't have an issue; even those that do will only see a potential
slowdown when running with everything at once.

We can address the FMR/BMME issues on top of this patch.
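
The exhaustion scenario Sagi describes can be sketched in userspace as
follows. This is only a model: demo_fmr_pool and its helpers are
hypothetical names, and it assumes that every queued command needs
exactly one mapping from the shared pool, which real I/O does not
guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one global pool of mappings. */
struct demo_fmr_pool {
	int free_count;
};

/* Take one mapping from the pool; -EAGAIN when none is left. */
static int demo_pool_get(struct demo_fmr_pool *pool)
{
	assert(pool != NULL);
	if (pool->free_count == 0)
		return -EAGAIN;
	pool->free_count--;
	return 0;
}

/*
 * Queue `depth` commands on each of `nhosts` hosts, one mapping per
 * command, and return how many commands could not get a mapping.
 */
static int demo_count_busy(struct demo_fmr_pool *pool, int nhosts, int depth)
{
	int busy = 0;

	for (int h = 0; h < nhosts; h++)
		for (int c = 0; c < depth; c++)
			if (demo_pool_get(pool) == -EAGAIN)
				busy++;	/* SRP would return SCSI_MLQUEUE_HOST_BUSY */
	return busy;
}
```

With a pool of 1024 and a queue depth of 512, two hosts fit exactly and
a third host finds the pool empty, which is the case a per-connection
pool would avoid.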


[PATCH opensm] osm_db_files.c : Fix resource leak guid2lid parser

2013-08-20 Thread Hal Rosenstock
From: Dan Ben Yosef 

osm_db_restore() leaks the storage that "p_accum_val" and "p_key" point to.

Signed-off-by: Dan Ben Yosef 
Reviewed-by: Vladimir Koushnir 
Signed-off-by: Hal Rosenstock 
---
diff --git a/opensm/osm_db_files.c b/opensm/osm_db_files.c
index 0d8f36c..513cf85 100644
--- a/opensm/osm_db_files.c
+++ b/opensm/osm_db_files.c
@@ -272,7 +272,7 @@ int osm_db_restore(IN osm_db_domain_t * p_domain)
boolean_t before_key;
char *p_first_word, *p_rest_of_line, *p_last;
char *p_key = NULL;
-   char *p_prev_val, *p_accum_val = NULL;
+   char *p_prev_val = NULL, *p_accum_val = NULL;
char *endptr = NULL;
unsigned int line_num;
 
@@ -371,12 +371,18 @@ int osm_db_restore(IN osm_db_domain_t * p_domain)
if (st_lookup(p_domain_imp->p_hash,
  (st_data_t) p_key,
  (void *)&p_prev_val)) {
+   /* if previously used we ignore this guid */
OSM_LOG(p_log, OSM_LOG_ERROR,
"ERR 6106: "
"Key:%s already exists in:%s with value:%s."
" Removing it\n", p_key,
p_domain_imp->file_name,
p_prev_val);
+   free(p_key);
+   p_key = NULL;
+   free(p_accum_val);
+   p_accum_val = NULL;
+   continue;
} else {
p_prev_val = NULL;
}
@@ -391,6 +397,10 @@ int osm_db_restore(IN osm_db_domain_t * p_domain)
OSM_LOG(p_log, OSM_LOG_ERROR,
"ERR 610B: "
"Key:%s is invalid\n", p_key);
+   free(p_key);
+   p_key = NULL;
+   free(p_accum_val);
+   p_accum_val = NULL;
} else {
/* store our key and value */
st_insert(p_domain_imp->p_hash,
@@ -404,6 +414,7 @@ int osm_db_restore(IN osm_db_domain_t * p_domain)
 strlen(sLine) + 1);
strcpy(p_accum_val, p_prev_val);
free(p_prev_val);
+   p_prev_val = NULL;
strcat(p_accum_val, sLine);
}
}   /* in key */
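
The ownership rule behind this fix can be sketched with a tiny table
that owns every string stored in it. This is a userspace illustration
under assumed names (demo_db, demo_db_insert, demo_dup), not the OpenSM
st hash API: when an entry is rejected, both the key and the value are
freed on the spot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX 8

/* Hypothetical tiny key/value table; it owns every pointer stored in it. */
struct demo_db {
	char *keys[DEMO_MAX];
	char *vals[DEMO_MAX];
	int count;
};

/* Copy a string to the heap; returns NULL if allocation fails. */
static char *demo_dup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Insert a heap-allocated key/value pair. On success the table takes
 * ownership and 0 is returned. If the key already exists or the table
 * is full, both strings are freed here and 1 is returned.
 */
static int demo_db_insert(struct demo_db *db, char *key, char *val)
{
	int dup = 0;

	assert(db != NULL && key != NULL);
	for (int i = 0; i < db->count; i++)
		if (strcmp(db->keys[i], key) == 0)
			dup = 1;
	if (dup || db->count == DEMO_MAX) {
		free(key);
		free(val);
		return 1;
	}
	db->keys[db->count] = key;
	db->vals[db->count] = val;
	db->count++;
	return 0;
}

/* Release everything the table owns, keys as well as values. */
static void demo_db_clear(struct demo_db *db)
{
	for (int i = 0; i < db->count; i++) {
		free(db->keys[i]);
		free(db->vals[i]);
	}
	db->count = 0;
}
```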


Re: [PATCH 8/8] IB/srp: Make queue size configurable

2013-08-20 Thread Bart Van Assche

On 08/20/13 17:34, Sagi Grimberg wrote:
> On 8/20/2013 3:50 PM, Bart Van Assche wrote:
>> Certain storage configurations, e.g. a sufficiently large array of
>> hard disks in a RAID configuration, need a queue depth above 64 to
>> achieve optimal performance. Hence make the queue depth configurable.
>> [ ... ]
>
> I noticed this patch in your github and played with it, I agree that
> this patch is needed for a long time...
>
> Question,
> If srp now will allow larger queues while using a single global FMR pool
> of size 1024, isn't it more likely now that in stress environment srp
> will run out of FMRs to handle IO commands?
> I mean that let's say that you have x scsi hosts with can_queue size of
> 512 (+-) and all of them are running IO stress, is it possible that all
> FMRs will be inuse and no FMR is available to register the next IO SG-list?
> Did you try out such a scenario?
>
> I guess that in such a case IB core will return EAGAIN and SRP will
> return SCSI_MLQUEUE_HOST_BUSY.
> I think it is a good Idea to move FMR pools to be per connection rather
> than a global pool, what do you think?

That makes sense to me. And as long as the above has not yet been
implemented I'm fine with dropping patch 8/8 from this patch set.


Bart.


[PATCH opensm] osm_db_files.c: Fix memory leak when deleting entries from osm db

2013-08-20 Thread Hal Rosenstock
From: Alex Netes 

The key also should be freed.

Signed-off-by: Alex Netes 
Signed-off-by: Hal Rosenstock 
---
 opensm/osm_db_files.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/opensm/osm_db_files.c b/opensm/osm_db_files.c
index 94dc11c..0eef1d4 100644
--- a/opensm/osm_db_files.c
+++ b/opensm/osm_db_files.c
@@ -608,6 +608,7 @@ int osm_db_delete(IN osm_db_domain_t * p_domain, IN char *p_key)
p_key, p_domain_imp->file_name, p_prev_val);
res = 1;
} else {
+   free(p_key);
free(p_prev_val);
res = 0;
}
-- 
1.7.8.2



Re: [PATCH 8/8] IB/srp: Make queue size configurable

2013-08-20 Thread Sagi Grimberg

On 8/20/2013 3:50 PM, Bart Van Assche wrote:

Certain storage configurations, e.g. a sufficiently large array of
hard disks in a RAID configuration, need a queue depth above 64 to
achieve optimal performance. Hence make the queue depth configurable.

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Vu Pham 
Cc: Sebastian Riemer 
Cc: Konrad Grzybowski 
---
  drivers/infiniband/ulp/srp/ib_srp.c |  125 ++-
  drivers/infiniband/ulp/srp/ib_srp.h |   17 +++--
  2 files changed, 103 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index ece1f2d..6de2323 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -299,16 +299,16 @@ static int srp_create_target_ib(struct srp_target_port 
*target)
return -ENOMEM;
  
  	recv_cq = ib_create_cq(target->srp_host->srp_dev->dev,

-  srp_recv_completion, NULL, target, SRP_RQ_SIZE,
-  target->comp_vector);
+  srp_recv_completion, NULL, target,
+  target->queue_size, target->comp_vector);
if (IS_ERR(recv_cq)) {
ret = PTR_ERR(recv_cq);
goto err;
}
  
  	send_cq = ib_create_cq(target->srp_host->srp_dev->dev,

-  srp_send_completion, NULL, target, SRP_SQ_SIZE,
-  target->comp_vector);
+  srp_send_completion, NULL, target,
+  target->queue_size, target->comp_vector);
if (IS_ERR(send_cq)) {
ret = PTR_ERR(send_cq);
goto err_recv_cq;
@@ -317,8 +317,8 @@ static int srp_create_target_ib(struct srp_target_port 
*target)
ib_req_notify_cq(recv_cq, IB_CQ_NEXT_COMP);
  
  	init_attr->event_handler   = srp_qp_event;

-   init_attr->cap.max_send_wr = SRP_SQ_SIZE;
-   init_attr->cap.max_recv_wr = SRP_RQ_SIZE;
+   init_attr->cap.max_send_wr = target->queue_size;
+   init_attr->cap.max_recv_wr = target->queue_size;
init_attr->cap.max_recv_sge= 1;
init_attr->cap.max_send_sge= 1;
init_attr->sq_sig_type = IB_SIGNAL_ALL_WR;
@@ -364,6 +364,10 @@ err:
return ret;
  }
  
+/*

+ * Note: this function may be called without srp_alloc_iu_bufs() having been
+ * invoked. Hence the target->[rt]x_ring checks.
+ */
  static void srp_free_target_ib(struct srp_target_port *target)
  {
int i;
@@ -375,10 +379,18 @@ static void srp_free_target_ib(struct srp_target_port 
*target)
target->qp = NULL;
target->send_cq = target->recv_cq = NULL;
  
-	for (i = 0; i < SRP_RQ_SIZE; ++i)

-   srp_free_iu(target->srp_host, target->rx_ring[i]);
-   for (i = 0; i < SRP_SQ_SIZE; ++i)
-   srp_free_iu(target->srp_host, target->tx_ring[i]);
+   if (target->rx_ring) {
+   for (i = 0; i < target->queue_size; ++i)
+   srp_free_iu(target->srp_host, target->rx_ring[i]);
+   kfree(target->rx_ring);
+   target->rx_ring = NULL;
+   }
+   if (target->tx_ring) {
+   for (i = 0; i < target->queue_size; ++i)
+   srp_free_iu(target->srp_host, target->tx_ring[i]);
+   kfree(target->tx_ring);
+   target->tx_ring = NULL;
+   }
  }
  
  static void srp_path_rec_completion(int status,

@@ -564,7 +576,11 @@ static void srp_free_req_data(struct srp_target_port 
*target)
struct srp_request *req;
int i;
  
-	for (i = 0, req = target->req_ring; i < SRP_CMD_SQ_SIZE; ++i, ++req) {

+   if (!target->req_ring)
+   return;
+
+   for (i = 0; i < target->req_ring_size; ++i) {
+   req = &target->req_ring[i];
kfree(req->fmr_list);
kfree(req->map_page);
if (req->indirect_dma_addr) {
@@ -574,6 +590,9 @@ static void srp_free_req_data(struct srp_target_port 
*target)
}
kfree(req->indirect_desc);
}
+
+   kfree(target->req_ring);
+   target->req_ring = NULL;
  }
  
  static int srp_alloc_req_data(struct srp_target_port *target)

@@ -586,7 +605,12 @@ static int srp_alloc_req_data(struct srp_target_port 
*target)
  
  	INIT_LIST_HEAD(&target->free_reqs);
  
-	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {

+   target->req_ring = kzalloc(target->req_ring_size *
+  sizeof(*target->req_ring), GFP_KERNEL);
+   if (!target->req_ring)
+   goto out;
+
+   for (i = 0; i < target->req_ring_size; ++i) {
req = &target->req_ring[i];
req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
GFP_KERNEL);
@@ -810,7 +834,7 @@ static void srp_terminate_io(struct srp_rport *rpor

[PATCH] IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards

2013-08-20 Thread Ira Weiny

Commit 36a8f01c ("IB/qib: Add congestion control agent implementation") caused
statements to leak past the header guard.

Correct with this update.

Reviewed-by: Marciniszyn, Mike 
Signed-off-by: Ira Weiny 
---
 drivers/infiniband/hw/qib/qib_mad.h |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_mad.h 
b/drivers/infiniband/hw/qib/qib_mad.h
index 57bd3fa..28874f8 100644
--- a/drivers/infiniband/hw/qib/qib_mad.h
+++ b/drivers/infiniband/hw/qib/qib_mad.h
@@ -415,7 +415,6 @@ struct cc_table_shadow {
struct ib_cc_table_entry_shadow entries[CC_TABLE_SHADOW_MAX];
 } __packed;
 
-#endif /* _QIB_MAD_H */
 /*
  * The PortSamplesControl.CounterMasks field is an array of 3 bit fields
  * which specify the N'th counter's capabilities. See ch. 16.1.3.2.
@@ -428,3 +427,5 @@ struct cc_table_shadow {
COUNTER_MASK(1, 2) | \
COUNTER_MASK(1, 3) | \
COUNTER_MASK(1, 4))
+
+#endif /* _QIB_MAD_H */
-- 
1.7.1
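
For readers less familiar with the layout being restored here, the
sketch below simulates including a header twice inside one file. The
names DEMO_MAD_H, demo_cc_entry and DEMO_COUNTER_MASK are hypothetical
and the macro body is only an illustration of a 3-bit field helper, not
a copy of the driver's definition. Everything sits before the closing
#endif, so the second copy is skipped as a whole.

```c
#include <assert.h>

/* First "inclusion" of a hypothetical demo_mad.h */
#ifndef DEMO_MAD_H
#define DEMO_MAD_H

struct demo_cc_entry {
	unsigned short shift_and_multiplier;
};

/* Illustrative 3-bit field helper, placed before the closing #endif. */
#define DEMO_COUNTER_MASK(q, n) ((q) << ((9 - (n)) * 3))

#endif /* DEMO_MAD_H */

/* Second "inclusion": skipped entirely because DEMO_MAD_H is defined. */
#ifndef DEMO_MAD_H
#define DEMO_MAD_H

struct demo_cc_entry {
	unsigned short shift_and_multiplier;
};

#define DEMO_COUNTER_MASK(q, n) ((q) << ((9 - (n)) * 3))

#endif /* DEMO_MAD_H */

/* Mask for counter n with capability q, wrapped in a plain function. */
static unsigned int demo_counter_mask(unsigned int q, unsigned int n)
{
	assert(n <= 9);
	return DEMO_COUNTER_MASK(q, n);
}
```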



[PATCH] infiniband-diags: fail configure if glib2 is not found

2013-08-20 Thread Ira Weiny

Signed-off-by: Ira Weiny 
---
 README   |1 +
 configure.ac |2 ++
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/README b/README
index d19e3e9..10a11e2 100644
--- a/README
+++ b/README
@@ -14,6 +14,7 @@ Dependencies:
2) libibumad >= 1.3.7
3) opensm-libs >= 3.3.10
4) ib_umad kernel module
+   5) glib2
 
 
 Release notes v1.6.1 => 1.6.2
diff --git a/configure.ac b/configure.ac
index 4c37259..b43818b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -167,6 +167,8 @@ PKG_CHECK_MODULES([GLIB], [glib-2.0], ac_glib=yes, ac_glib=no)
 AM_CONDITIONAL([HAVE_GLIB], test "$ac_glib" = "yes")
 if test "$ac_glib" = "yes"; then
AC_DEFINE([HAVE_GLIB], 1, [Define to 1 to indicate GLIB support])
+else
+   AC_MSG_ERROR(glib not found; glib2 is required)
 fi
 
 dnl Begin libibnetdisc stuff
-- 
1.7.1



RE: [PATCH] Add new verb: uv_query_port_max_datagram()

2013-08-20 Thread Hefty, Sean
> Where is the documentation for this?  Multiple people have referred to it, but
> I don't see any mention of it in libibverbs.git.

This is an unmerged, yet to be accepted patch set.  Extensions were added as 
part of adding support for XRC.

Yishai Hadas posted v9 of the series on 8/1 - "Add extension and XRC QP 
support" is the subject.

- Sean
 



RE: [ceph-users] Help needed porting Ceph to RSockets

2013-08-20 Thread Hefty, Sean
> I have added the patch and re-tested: I still encounter
> hangs of my application. I am not quite sure whether
> I hit the same error on the shutdown because now I don't hit
> the error always, but only every now and then.

I guess this is at least some progress... :/
 
> When adding the patch to my code base (git tag v1.0.17) I notice
> an offset of "-34 lines". Which code base are you using?

This patch was generated against the tip of the git tree. 



[PATCH v2 opensm] osm_subnet.c: Fix memory leak caused by commit dc0760cb8088fbe079e19682570a884ba01e94ff

2013-08-20 Thread Hal Rosenstock
From: Vladimir Koushnir 

double strdup for p_opt->dump_files_dir is causing memory leak

Approach from Bart Van Assche 

Signed-off-by: Vladimir Koushnir 
Signed-off-by: Hal Rosenstock 
---
Change since v1:
Eliminate cast by doing separate strdup

diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 7ab1671..d0835b9 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -1499,7 +1499,8 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
p_opt->dump_files_dir = getenv("OSM_TMP_DIR");
if (!p_opt->dump_files_dir || !(*p_opt->dump_files_dir))
p_opt->dump_files_dir = strdup(OSM_DEFAULT_TMP_DIR);
-   p_opt->dump_files_dir = strdup(p_opt->dump_files_dir);
+   else
+   p_opt->dump_files_dir = strdup(p_opt->dump_files_dir);
p_opt->log_file = strdup(OSM_DEFAULT_LOG_FILE);
p_opt->log_max_size = 0;
p_opt->partition_config_file = 
strdup(OSM_DEFAULT_PARTITION_CONFIG_FILE);


Re: [PATCH] Add new verb: uv_query_port_max_datagram()

2013-08-20 Thread Jeff Squyres (jsquyres)
On Aug 19, 2013, at 8:59 PM, "Hefty, Sean"  wrote:

>> Any suggestions on how one adds a new driver call without breaking ABI?
> 
> It could be built on the verbs extension mechanism.

Where is the documentation for this?  Multiple people have referred to it, but 
I don't see any mention of it in libibverbs.git.

> Is it necessary to call into a provider library, versus simply dropping into 
> the kernel?


I don't think I have much of an opinion here, other than: it would seem weird 
to not call the provider library, given that all other verbs do that.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [PATCH opensm] osm_subnet.c: Fix memory leak caused by commit dc0760cb8088fbe079e19682570a884ba01e94ff

2013-08-20 Thread Bart Van Assche

On 08/20/13 15:00, Hal Rosenstock wrote:

> From: Vladimir Koushnir
>
> double strdup for p_opt->dump_files_dir is causing memory leak
>
> Signed-off-by: Vladimir Koushnir
> Signed-off-by: Hal Rosenstock
> ---
> diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
> index 7ab1671..4b5ef38 100644
> --- a/opensm/osm_subnet.c
> +++ b/opensm/osm_subnet.c
> @@ -1498,7 +1498,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
> p_opt->dump_files_dir = getenv("OSM_TMP_DIR");
> if (!p_opt->dump_files_dir || !(*p_opt->dump_files_dir))
> -   p_opt->dump_files_dir = strdup(OSM_DEFAULT_TMP_DIR);
> +   p_opt->dump_files_dir = (char *) OSM_DEFAULT_TMP_DIR;
> p_opt->dump_files_dir = strdup(p_opt->dump_files_dir);
> p_opt->log_file = strdup(OSM_DEFAULT_LOG_FILE);
> p_opt->log_max_size = 0;


How about avoiding the memory leak via the construct below, which has 
the advantage that no cast is necessary ?


if (!p_opt->dump_files_dir || !(*p_opt->dump_files_dir))
p_opt->dump_files_dir = strdup(OSM_DEFAULT_TMP_DIR);
else
p_opt->dump_files_dir = strdup(p_opt->dump_files_dir);
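
As a self-contained illustration of the construct above, here is a
userspace sketch in which exactly one copy is made on either branch, so
the caller has exactly one pointer to free. The names demo_dup and
demo_dump_dir are hypothetical, and the value of DEMO_DEFAULT_TMP_DIR is
an assumption chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_DEFAULT_TMP_DIR "/var/log/"	/* assumed default, illustration only */

/* Copy a string to the heap; returns NULL if allocation fails. */
static char *demo_dup(const char *s)
{
	size_t len;
	char *copy;

	assert(s != NULL);
	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Return a heap copy of the environment variable `name`, or of the
 * default when the variable is unset or empty. Only one allocation
 * happens on either path, and no cast is needed.
 */
static char *demo_dump_dir(const char *name)
{
	const char *env = getenv(name);

	if (!env || !*env)
		return demo_dup(DEMO_DEFAULT_TMP_DIR);
	else
		return demo_dup(env);
}
```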



[PATCH opensm] osm_subnet.c: Fix memory leak caused by commit dc0760cb8088fbe079e19682570a884ba01e94ff

2013-08-20 Thread Hal Rosenstock
From: Vladimir Koushnir 

double strdup for p_opt->dump_files_dir is causing memory leak

Signed-off-by: Vladimir Koushnir 
Signed-off-by: Hal Rosenstock 
---
diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 7ab1671..4b5ef38 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -1498,7 +1498,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t *
p_opt)
p_opt->dump_files_dir = getenv("OSM_TMP_DIR");
if (!p_opt->dump_files_dir || !(*p_opt->dump_files_dir))
-   p_opt->dump_files_dir = strdup(OSM_DEFAULT_TMP_DIR);
+   p_opt->dump_files_dir = (char *) OSM_DEFAULT_TMP_DIR;
p_opt->dump_files_dir = strdup(p_opt->dump_files_dir);
p_opt->log_file = strdup(OSM_DEFAULT_LOG_FILE);
p_opt->log_max_size = 0;


Re: [PATCH] osm_port_info_rcv.c Issue a log message if we cannot read the MKey of a port

2013-08-20 Thread Hal Rosenstock
On 8/19/2013 6:46 AM, Line Holen wrote:
> On 08/16/13 15:47, Hal Rosenstock wrote:
>> On 8/14/2013 6:26 AM, Line Holen wrote:
>>> Signed-off-by: Line Holen
>>>
>>> ---
>>>
>>> diff --git a/opensm/osm_port_info_rcv.c b/opensm/osm_port_info_rcv.c
>>> index 7dcd15e..961b376 100644
>>> --- a/opensm/osm_port_info_rcv.c
>>> +++ b/opensm/osm_port_info_rcv.c
>>> @@ -85,7 +85,7 @@ static void pi_rcv_process_endport(IN osm_sm_t *
>>> sm, IN osm_physp_t * p_physp,
>>>   osm_madw_context_t context;
>>>   ib_api_status_t status;
>>>   ib_net64_t port_guid;
>>> -uint8_t rate, mtu;
>>> +uint8_t rate, mtu, mpb;
>>>   unsigned data_vls;
>>>   cl_qmap_t *p_sm_tbl;
>>>   osm_remote_sm_t *p_sm;
>>> @@ -126,6 +126,14 @@ static void pi_rcv_process_endport(IN osm_sm_t *
>>> sm, IN osm_physp_t * p_physp,
>>>   }
>>>   }
>>>
>>> +/* Check M_Key vs M_Key protect, can we control the port ? */
>>> +mpb = ib_port_info_get_mpb(p_pi);
>>> +if (mpb > 0 && p_pi->m_key == 0) {
>>> +OSM_LOG(sm->p_log, OSM_LOG_INFO,
>>> +"Port 0x%" PRIx64 " has unknown M_Key, protection level
>>> %u\n",
>>> +cl_ntoh64(port_guid), mpb);
>>> +}
>>> +
>> It looks to me like the only case here is when protect bits is 1 for
>> gets; all others fail. Is it more than that ?
> You are probably right - 

My point was that this only seems to have potential value for gets when
the protect bits are 1, since a get with protect bits of 1 and the wrong
M_Key returns port info with a zero M_Key. All other mpb cases fail.

> have to admit I haven't tried a higher
> protection level.

What protection level(s) have you tried ?

>>
>> Also, would this spam the OpenSM log ?
> It would print one additional message per heavy sweep.
> But if you have a system with unknown MKeys configured you would get
> many error
> messages as it is. With protection level 2 every MAD operation will
> generate
> an error I guess (either 3111 or 3120). And with protection level 1 set
> operations
> will fail, but this new message will let you know why it failed.

I think it would be a 3120 error (timeout) rather than bad status. I
think that is what is meant in the IBA spec by fail (fail = no
response). Have you seen 3111 or anything other than 3120 errors for
this ?

-- Hal
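
To make the cases under discussion concrete, here is a small sketch of
how a Get sent with the wrong M_Key plays out per protection level, as
I read this thread. The enum and function names are hypothetical, the
outcome for level 0 is an assumption on my part, and "no response" for
levels 2 and 3 follows the interpretation given above and is not a
quote from the spec.

```c
#include <assert.h>
#include <stdint.h>

/* Outcomes of a Get sent with the wrong M_Key, as discussed here. */
enum demo_get_result {
	DEMO_GET_OK,		/* response carries the port info as usual */
	DEMO_GET_KEY_ZEROED,	/* response arrives, M_Key field reads 0 */
	DEMO_GET_NO_RESPONSE	/* request is dropped; the SM sees a timeout */
};

static enum demo_get_result demo_get_wrong_mkey(uint8_t mpb)
{
	assert(mpb <= 3);	/* the protect field is two bits wide */
	if (mpb == 0)
		return DEMO_GET_OK;	/* assumption: level 0 does not guard gets */
	if (mpb == 1)
		return DEMO_GET_KEY_ZEROED;
	return DEMO_GET_NO_RESPONSE;
}

/* Condition the proposed patch logs: protected port, M_Key read back as 0. */
static int demo_mkey_unknown(uint8_t mpb, uint64_t m_key)
{
	return mpb > 0 && m_key == 0;
}
```

Under these assumptions the new message can only trigger from a
response that actually arrives, which is the protect bits 1 case.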

> Line
> 
>>
>> -- Hal
>>
>>>   if (port_guid != sm->p_subn->sm_port_guid) {
>>>   p_sm_tbl =&sm->p_subn->sm_guid_tbl;
>>>   if (p_pi->capability_mask&  IB_PORT_CAP_IS_SM) {
>>>
> 
> 



[PATCH 8/8] IB/srp: Make queue size configurable

2013-08-20 Thread Bart Van Assche
Certain storage configurations, e.g. a sufficiently large array of
hard disks in a RAID configuration, need a queue depth above 64 to
achieve optimal performance. Hence make the queue depth configurable.

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Vu Pham 
Cc: Sebastian Riemer 
Cc: Konrad Grzybowski 
---
 drivers/infiniband/ulp/srp/ib_srp.c |  125 ++-
 drivers/infiniband/ulp/srp/ib_srp.h |   17 +++--
 2 files changed, 103 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index ece1f2d..6de2323 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -299,16 +299,16 @@ static int srp_create_target_ib(struct srp_target_port 
*target)
return -ENOMEM;
 
recv_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-  srp_recv_completion, NULL, target, SRP_RQ_SIZE,
-  target->comp_vector);
+  srp_recv_completion, NULL, target,
+  target->queue_size, target->comp_vector);
if (IS_ERR(recv_cq)) {
ret = PTR_ERR(recv_cq);
goto err;
}
 
send_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-  srp_send_completion, NULL, target, SRP_SQ_SIZE,
-  target->comp_vector);
+  srp_send_completion, NULL, target,
+  target->queue_size, target->comp_vector);
if (IS_ERR(send_cq)) {
ret = PTR_ERR(send_cq);
goto err_recv_cq;
@@ -317,8 +317,8 @@ static int srp_create_target_ib(struct srp_target_port 
*target)
ib_req_notify_cq(recv_cq, IB_CQ_NEXT_COMP);
 
init_attr->event_handler   = srp_qp_event;
-   init_attr->cap.max_send_wr = SRP_SQ_SIZE;
-   init_attr->cap.max_recv_wr = SRP_RQ_SIZE;
+   init_attr->cap.max_send_wr = target->queue_size;
+   init_attr->cap.max_recv_wr = target->queue_size;
init_attr->cap.max_recv_sge= 1;
init_attr->cap.max_send_sge= 1;
init_attr->sq_sig_type = IB_SIGNAL_ALL_WR;
@@ -364,6 +364,10 @@ err:
return ret;
 }
 
+/*
+ * Note: this function may be called without srp_alloc_iu_bufs() having been
+ * invoked. Hence the target->[rt]x_ring checks.
+ */
 static void srp_free_target_ib(struct srp_target_port *target)
 {
int i;
@@ -375,10 +379,18 @@ static void srp_free_target_ib(struct srp_target_port 
*target)
target->qp = NULL;
target->send_cq = target->recv_cq = NULL;
 
-   for (i = 0; i < SRP_RQ_SIZE; ++i)
-   srp_free_iu(target->srp_host, target->rx_ring[i]);
-   for (i = 0; i < SRP_SQ_SIZE; ++i)
-   srp_free_iu(target->srp_host, target->tx_ring[i]);
+   if (target->rx_ring) {
+   for (i = 0; i < target->queue_size; ++i)
+   srp_free_iu(target->srp_host, target->rx_ring[i]);
+   kfree(target->rx_ring);
+   target->rx_ring = NULL;
+   }
+   if (target->tx_ring) {
+   for (i = 0; i < target->queue_size; ++i)
+   srp_free_iu(target->srp_host, target->tx_ring[i]);
+   kfree(target->tx_ring);
+   target->tx_ring = NULL;
+   }
 }
 
 static void srp_path_rec_completion(int status,
@@ -564,7 +576,11 @@ static void srp_free_req_data(struct srp_target_port 
*target)
struct srp_request *req;
int i;
 
-   for (i = 0, req = target->req_ring; i < SRP_CMD_SQ_SIZE; ++i, ++req) {
+   if (!target->req_ring)
+   return;
+
+   for (i = 0; i < target->req_ring_size; ++i) {
+   req = &target->req_ring[i];
kfree(req->fmr_list);
kfree(req->map_page);
if (req->indirect_dma_addr) {
@@ -574,6 +590,9 @@ static void srp_free_req_data(struct srp_target_port 
*target)
}
kfree(req->indirect_desc);
}
+
+   kfree(target->req_ring);
+   target->req_ring = NULL;
 }
 
 static int srp_alloc_req_data(struct srp_target_port *target)
@@ -586,7 +605,12 @@ static int srp_alloc_req_data(struct srp_target_port 
*target)
 
INIT_LIST_HEAD(&target->free_reqs);
 
-   for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+   target->req_ring = kzalloc(target->req_ring_size *
+  sizeof(*target->req_ring), GFP_KERNEL);
+   if (!target->req_ring)
+   goto out;
+
+   for (i = 0; i < target->req_ring_size; ++i) {
req = &target->req_ring[i];
req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
GFP_KERNEL);
@@ -810,7 +834,7 @@ static void srp_terminate_io(struct srp_rport *rport)
struct srp_target_port *tar

[PATCH 7/8] IB/srp: Introduce srp_alloc_req_data()

2013-08-20 Thread Bart Van Assche
This patch does not change any functionality.

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 
Cc: Vu Pham 
Cc: Sebastian Riemer 
---
 drivers/infiniband/ulp/srp/ib_srp.c |   64 ++-
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index de4c3b7..ece1f2d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -576,6 +576,42 @@ static void srp_free_req_data(struct srp_target_port 
*target)
}
 }
 
+static int srp_alloc_req_data(struct srp_target_port *target)
+{
+   struct srp_device *srp_dev = target->srp_host->srp_dev;
+   struct ib_device *ibdev = srp_dev->dev;
+   struct srp_request *req;
+   dma_addr_t dma_addr;
+   int i, ret = -ENOMEM;
+
+   INIT_LIST_HEAD(&target->free_reqs);
+
+   for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+   req = &target->req_ring[i];
+   req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
+   GFP_KERNEL);
+   req->map_page = kmalloc(SRP_FMR_SIZE * sizeof(void *),
+   GFP_KERNEL);
+   req->indirect_desc = kmalloc(target->indirect_size, GFP_KERNEL);
+   if (!req->fmr_list || !req->map_page || !req->indirect_desc)
+   goto out;
+
+   dma_addr = ib_dma_map_single(ibdev, req->indirect_desc,
+target->indirect_size,
+DMA_TO_DEVICE);
+   if (ib_dma_mapping_error(ibdev, dma_addr))
+   goto out;
+
+   req->indirect_dma_addr = dma_addr;
+   req->index = i;
+   list_add_tail(&req->list, &target->free_reqs);
+   }
+   ret = 0;
+
+out:
+   return ret;
+}
+
 /**
  * srp_del_scsi_host_attr() - Remove attributes defined in the host template.
  * @shost: SCSI host whose attributes to remove from sysfs.
@@ -2393,8 +2429,7 @@ static ssize_t srp_create_target(struct device *dev,
struct Scsi_Host *target_host;
struct srp_target_port *target;
struct ib_device *ibdev = host->srp_dev->dev;
-   dma_addr_t dma_addr;
-   int i, ret;
+   int ret;
 
target_host = scsi_host_alloc(&srp_template,
  sizeof (struct srp_target_port));
@@ -2450,28 +2485,9 @@ static ssize_t srp_create_target(struct device *dev,
INIT_WORK(&target->remove_work, srp_remove_work);
spin_lock_init(&target->lock);
INIT_LIST_HEAD(&target->free_tx);
-   INIT_LIST_HEAD(&target->free_reqs);
-   for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
-   struct srp_request *req = &target->req_ring[i];
-
-   req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof (void *),
-   GFP_KERNEL);
-   req->map_page = kmalloc(SRP_FMR_SIZE * sizeof (void *),
-   GFP_KERNEL);
-   req->indirect_desc = kmalloc(target->indirect_size, GFP_KERNEL);
-   if (!req->fmr_list || !req->map_page || !req->indirect_desc)
-   goto err_free_mem;
-
-   dma_addr = ib_dma_map_single(ibdev, req->indirect_desc,
-target->indirect_size,
-DMA_TO_DEVICE);
-   if (ib_dma_mapping_error(ibdev, dma_addr))
-   goto err_free_mem;
-
-   req->indirect_dma_addr = dma_addr;
-   req->index = i;
-   list_add_tail(&req->list, &target->free_reqs);
-   }
+   ret = srp_alloc_req_data(target);
+   if (ret)
+   goto err_free_mem;
 
ib_query_gid(ibdev, host->port, 0, &target->path.sgid);
 
-- 
1.7.10.4
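
The allocation pattern this helper isolates (start with ret = -ENOMEM,
jump to a single exit label on any failure, and leave cleanup to a free
routine that tolerates a half-built ring) can be sketched in userspace
as below. All demo_* names are hypothetical, malloc/calloc/free stand in
for the kernel allocators, and the ring is allocated dynamically the way
patch 8/8 of this series does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-request resources, standing in for struct srp_request. */
struct demo_req {
	void **fmr_list;
	void **map_page;
	int index;
};

struct demo_target {
	struct demo_req *req_ring;
	int req_ring_size;
	int nr_ready;	/* requests that were fully set up */
};

/* Safe on a missing or partially initialised ring: free(NULL) is a no-op. */
static void demo_free_req_data(struct demo_target *t)
{
	if (!t->req_ring)
		return;
	for (int i = 0; i < t->req_ring_size; i++) {
		free(t->req_ring[i].fmr_list);
		free(t->req_ring[i].map_page);
	}
	free(t->req_ring);
	t->req_ring = NULL;
	t->nr_ready = 0;
}

/* Returns 0 on success or -ENOMEM; the caller cleans up on failure. */
static int demo_alloc_req_data(struct demo_target *t, int sg_cnt)
{
	int ret = -ENOMEM;

	assert(t != NULL && t->req_ring_size > 0 && sg_cnt > 0);
	t->nr_ready = 0;
	/* Zeroed ring, so untouched entries hold NULL pointers. */
	t->req_ring = calloc(t->req_ring_size, sizeof(*t->req_ring));
	if (!t->req_ring)
		goto out;

	for (int i = 0; i < t->req_ring_size; i++) {
		struct demo_req *req = &t->req_ring[i];

		req->fmr_list = malloc(sg_cnt * sizeof(void *));
		req->map_page = malloc(sg_cnt * sizeof(void *));
		if (!req->fmr_list || !req->map_page)
			goto out;
		req->index = i;
		t->nr_ready++;
	}
	ret = 0;

out:
	return ret;
}
```

Because the free routine checks for a NULL ring and resets the pointer,
calling it twice or after a failed allocation is harmless.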



[PATCH 6/8] IB/srp: Make transport layer retry count configurable

2013-08-20 Thread Bart Van Assche
Allow the InfiniBand RC retry count to be configured by the user
as an option in the target login string. Reducing this retry count
helps with reducing path failover time.

[bvanassche: Rewrote patch description / changed default retry count]
Signed-off-by: Vu Pham 
Signed-off-by: Bart Van Assche 
Acked-by: David Dillow 
Cc: Roland Dreier 
Cc: Sebastian Riemer 
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |2 ++
 drivers/infiniband/ulp/srp/ib_srp.c  |   24 +++-
 drivers/infiniband/ulp/srp/ib_srp.h  |1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp 
b/Documentation/ABI/stable/sysfs-driver-ib_srp
index 5c53d28..18e9b27 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -61,6 +61,8 @@ Description:  Interface for making ib_srp connect to a new 
target.
  interrupt is handled by a different CPU then the comp_vector
  parameter can be used to spread the SRP completion workload
  over multiple CPU's.
+   * tl_retry_count, a number in the range 2..7 specifying the
+ IB RC retry count.
 
 What:  /sys/class/infiniband_srp/srp--/ibdev
 Date:  January 2, 2006
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 2b7ef6b..de4c3b7 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -458,7 +458,7 @@ static int srp_send_req(struct srp_target_port *target)
req->param.responder_resources= 4;
req->param.remote_cm_response_timeout = 20;
req->param.local_cm_response_timeout  = 20;
-   req->param.retry_count= 7;
+   req->param.retry_count= target->tl_retry_count;
req->param.rnr_retry_count= 7;
req->param.max_cm_retries = 15;
 
@@ -1991,6 +1991,14 @@ static ssize_t show_comp_vector(struct device *dev,
return sprintf(buf, "%d\n", target->comp_vector);
 }
 
+static ssize_t show_tl_retry_count(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+   return sprintf(buf, "%d\n", target->tl_retry_count);
+}
+
 static ssize_t show_cmd_sg_entries(struct device *dev,
   struct device_attribute *attr, char *buf)
 {
@@ -2018,6 +2026,7 @@ static DEVICE_ATTR(zero_req_lim,S_IRUGO, show_zero_req_lim,  NULL);
 static DEVICE_ATTR(local_ib_port,   S_IRUGO, show_local_ib_port,   NULL);
 static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL);
 static DEVICE_ATTR(comp_vector, S_IRUGO, show_comp_vector, NULL);
+static DEVICE_ATTR(tl_retry_count,  S_IRUGO, show_tl_retry_count,  NULL);
 static DEVICE_ATTR(cmd_sg_entries,  S_IRUGO, show_cmd_sg_entries,  NULL);
 static DEVICE_ATTR(allow_ext_sg,S_IRUGO, show_allow_ext_sg,NULL);
 
@@ -2033,6 +2042,7 @@ static struct device_attribute *srp_host_attrs[] = {
&dev_attr_local_ib_port,
&dev_attr_local_ib_device,
&dev_attr_comp_vector,
+   &dev_attr_tl_retry_count,
&dev_attr_cmd_sg_entries,
&dev_attr_allow_ext_sg,
NULL
@@ -2158,6 +2168,7 @@ enum {
SRP_OPT_ALLOW_EXT_SG= 1 << 10,
SRP_OPT_SG_TABLESIZE= 1 << 11,
SRP_OPT_COMP_VECTOR = 1 << 12,
+   SRP_OPT_TL_RETRY_COUNT  = 1 << 13,
SRP_OPT_ALL = (SRP_OPT_ID_EXT   |
   SRP_OPT_IOC_GUID |
   SRP_OPT_DGID |
@@ -2179,6 +2190,7 @@ static const match_table_t srp_opt_tokens = {
{ SRP_OPT_ALLOW_EXT_SG, "allow_ext_sg=%u"   },
{ SRP_OPT_SG_TABLESIZE, "sg_tablesize=%u"   },
{ SRP_OPT_COMP_VECTOR,  "comp_vector=%u"},
+   { SRP_OPT_TL_RETRY_COUNT,   "tl_retry_count=%u" },
{ SRP_OPT_ERR,  NULL}
 };
 
@@ -2342,6 +2354,15 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
target->comp_vector = token;
break;
 
+   case SRP_OPT_TL_RETRY_COUNT:
+   if (match_int(args, &token) || token < 2 || token > 7) {
+   pr_warn("bad tl_retry_count parameter '%s' (must be a number between 2 and 7)\n",
+   p);
+   goto out;
+   }
+   target->tl_retry_count = token;
+   break;
+
default:
pr_warn("unknown parameter or missing value '%s' in target creation request\n",
p);
@@ -2396,6 +2417,7 @@ static ssize_t

[PATCH 5/8] IB/srp: Start timers if a transport layer error occurs

2013-08-20 Thread Bart Van Assche
Start the reconnect timer, fast_io_fail timer and dev_loss timers
if a transport layer error occurs.

Signed-off-by: Bart Van Assche 
Acked-by: David Dillow 
Cc: Roland Dreier 
Cc: Vu Pham 
Cc: Sebastian Riemer 
---
 drivers/infiniband/ulp/srp/ib_srp.c |   19 +++
 drivers/infiniband/ulp/srp/ib_srp.h |1 +
 2 files changed, 20 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index a7fa7ed..2b7ef6b 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -602,6 +602,7 @@ static void srp_remove_target(struct srp_target_port *target)
srp_disconnect_target(target);
ib_destroy_cm_id(target->cm_id);
srp_free_target_ib(target);
+   cancel_work_sync(&target->tl_err_work);
srp_rport_put(target->rport);
srp_free_req_data(target);
scsi_host_put(target->scsi_host);
@@ -1371,6 +1372,21 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 PFX "Recv failed with error code %d\n", res);
 }
 
+/**
+ * srp_tl_err_work() - handle a transport layer error
+ *
+ * Note: This function may get invoked before the rport has been created,
+ * hence the target->rport test.
+ */
+static void srp_tl_err_work(struct work_struct *work)
+{
+   struct srp_target_port *target;
+
+   target = container_of(work, struct srp_target_port, tl_err_work);
+   if (target->rport)
+   srp_start_tl_fail_timers(target->rport);
+}
+
 static void srp_handle_qp_err(enum ib_wc_status wc_status,
  enum ib_wc_opcode wc_opcode,
  struct srp_target_port *target)
@@ -1380,6 +1396,7 @@ static void srp_handle_qp_err(enum ib_wc_status wc_status,
 PFX "failed %s status %d\n",
 wc_opcode & IB_WC_RECV ? "receive" : "send",
 wc_status);
+   queue_work(system_long_wq, &target->tl_err_work);
}
target->qp_in_error = true;
 }
@@ -1742,6 +1759,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
if (ib_send_cm_drep(cm_id, NULL, 0))
shost_printk(KERN_ERR, target->scsi_host,
 PFX "Sending CM DREP failed\n");
+   queue_work(system_long_wq, &target->tl_err_work);
break;
 
case IB_CM_TIMEWAIT_EXIT:
@@ -2406,6 +2424,7 @@ static ssize_t srp_create_target(struct device *dev,
 sizeof (struct srp_indirect_buf) +
 target->cmd_sg_cnt * sizeof (struct srp_direct_buf);
 
+   INIT_WORK(&target->tl_err_work, srp_tl_err_work);
INIT_WORK(&target->remove_work, srp_remove_work);
spin_lock_init(&target->lock);
INIT_LIST_HEAD(&target->free_tx);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index b62a943..cbc0b14 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -176,6 +176,7 @@ struct srp_target_port {
struct srp_iu  *rx_ring[SRP_RQ_SIZE];
struct srp_request  req_ring[SRP_CMD_SQ_SIZE];
 
+   struct work_struct  tl_err_work;
struct work_struct  remove_work;
 
struct list_headlist;
-- 
1.7.10.4



[PATCH 4/8] IB/srp: Use SRP transport layer error recovery

2013-08-20 Thread Bart Van Assche
Enable reconnect_delay, fast_io_fail_tmo and dev_loss_tmo
functionality for the IB SRP initiator. Add kernel module
parameters that allow specifying default values for these
three parameters.
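
As a rough user-space illustration of how these parameters treat the
string "off", the sketch below mirrors the parse and print logic of
srp_tmo_set() and srp_tmo_get() from the diff that follows. tmo_parse() and
tmo_format() are hypothetical names, strtol() stands in for kstrtoint(),
and the cross-parameter validation done by srp_tmo_valid() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TMO_OFF (-1)	/* internal representation of "off" */

/* Hypothetical helper: parse "off" or an integer number of seconds. */
int tmo_parse(const char *val, int *tmo)
{
	char *end;
	long v;

	assert(val != NULL && tmo != NULL);

	if (strncmp(val, "off", 3) == 0) {
		*tmo = TMO_OFF;
		return 0;
	}

	errno = 0;
	v = strtol(val, &end, 0);	/* base 0: decimal, octal or hex */
	if (errno != 0 || end == val)
		return -EINVAL;
	if (*end == '\n')		/* tolerate one trailing newline */
		end++;
	if (*end != '\0' || v < INT_MIN || v > INT_MAX)
		return -EINVAL;

	*tmo = (int)v;
	return 0;
}

/* Hypothetical helper: print a timeout the way srp_tmo_get() does. */
int tmo_format(char *buf, size_t len, int tmo)
{
	if (tmo >= 0)
		return snprintf(buf, len, "%d", tmo);
	return snprintf(buf, len, "off");
}
```

Any negative value prints as "off" here, which matches the tmo >= 0 test in
srp_tmo_get().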

Signed-off-by: Bart Van Assche 
Acked-by: David Dillow 
Cc: Roland Dreier 
Cc: Vu Pham 
Cc: Sebastian Riemer 
---
 drivers/infiniband/ulp/srp/ib_srp.c |  129 +--
 drivers/infiniband/ulp/srp/ib_srp.h |1 -
 2 files changed, 94 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 37dd3fb..a7fa7ed 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -86,6 +86,32 @@ module_param(topspin_workarounds, int, 0444);
 MODULE_PARM_DESC(topspin_workarounds,
 "Enable workarounds for Topspin/Cisco SRP target bugs if != 0");
 
+static struct kernel_param_ops srp_tmo_ops;
+
+static int srp_reconnect_delay = 10;
+module_param_cb(reconnect_delay, &srp_tmo_ops, &srp_reconnect_delay,
+   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(reconnect_delay, "Time between successive reconnect attempts");
+
+static int srp_fast_io_fail_tmo = 15;
+module_param_cb(fast_io_fail_tmo, &srp_tmo_ops, &srp_fast_io_fail_tmo,
+   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(fast_io_fail_tmo,
+"Number of seconds between the observation of a transport"
+" layer error and failing all I/O. \"off\" means that this"
+" functionality is disabled.");
+
+static int srp_dev_loss_tmo = 600;
+module_param_cb(dev_loss_tmo, &srp_tmo_ops, &srp_dev_loss_tmo,
+   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(dev_loss_tmo,
+"Maximum number of seconds that the SRP transport should"
+" insulate transport layer errors. After this time has been"
+" exceeded the SCSI target is removed. Should be"
+" between 1 and " __stringify(SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
+" if fast_io_fail_tmo has not been set. \"off\" means that"
+" this functionality is disabled.");
+
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
 static void srp_recv_completion(struct ib_cq *cq, void *target_ptr);
@@ -102,6 +128,48 @@ static struct ib_client srp_client = {
 
 static struct ib_sa_client srp_sa_client;
 
+static int srp_tmo_get(char *buffer, const struct kernel_param *kp)
+{
+   int tmo = *(int *)kp->arg;
+
+   if (tmo >= 0)
+   return sprintf(buffer, "%d", tmo);
+   else
+   return sprintf(buffer, "off");
+}
+
+static int srp_tmo_set(const char *val, const struct kernel_param *kp)
+{
+   int tmo, res;
+
+   if (strncmp(val, "off", 3) != 0) {
+   res = kstrtoint(val, 0, &tmo);
+   if (res)
+   goto out;
+   } else {
+   tmo = -1;
+   }
+   if (kp->arg == &srp_reconnect_delay)
+   res = srp_tmo_valid(tmo, srp_fast_io_fail_tmo,
+   srp_dev_loss_tmo);
+   else if (kp->arg == &srp_fast_io_fail_tmo)
+   res = srp_tmo_valid(srp_reconnect_delay, tmo, srp_dev_loss_tmo);
+   else
+   res = srp_tmo_valid(srp_reconnect_delay, srp_fast_io_fail_tmo,
+   tmo);
+   if (res)
+   goto out;
+   *(int *)kp->arg = tmo;
+
+out:
+   return res;
+}
+
+static struct kernel_param_ops srp_tmo_ops = {
+   .get = srp_tmo_get,
+   .set = srp_tmo_set,
+};
+
 static inline struct srp_target_port *host_to_target(struct Scsi_Host *host)
 {
return (struct srp_target_port *) host->hostdata;
@@ -711,13 +779,20 @@ static void srp_terminate_io(struct srp_rport *rport)
}
 }
 
-static int srp_reconnect_target(struct srp_target_port *target)
+/*
+ * It is up to the caller to ensure that srp_rport_reconnect() calls are
+ * serialized and that no concurrent srp_queuecommand(), srp_abort(),
+ * srp_reset_device() or srp_reset_host() calls will occur while this function
+ * is in progress. One way to realize that is not to call this function
+ * directly but to call srp_reconnect_rport() instead since that last function
+ * serializes calls of this function via rport->mutex and also blocks
+ * srp_queuecommand() calls before invoking this function.
+ */
+static int srp_rport_reconnect(struct srp_rport *rport)
 {
-   struct Scsi_Host *shost = target->scsi_host;
+   struct srp_target_port *target = rport->lld_data;
int i, ret;
 
-   scsi_target_block(&shost->shost_gendev);
-
srp_disconnect_target(target);
/*
 * Now get a new local CM ID so that we avoid confusing the target in
@@ -747,28 +822,9 @@ static int srp_reconnect_target(struct srp_target_port *target)
if (ret == 0)
ret = srp_connect_target(target);
 
-   scsi_target_unblock(&shost->s

[PATCH 3/8] IB/srp: Add srp_terminate_io()

2013-08-20 Thread Bart Van Assche
Finish all outstanding I/O requests after fast_io_fail_tmo has expired,
which speeds up failover in a multipath setup. This patch is a
reworked version of a patch from Sebastian Riemer.
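
The idea can be sketched in user space as follows. All names in the sketch
(struct cmd, struct request, claim_req(), finish_req(), terminate_io()) are
hypothetical stand-ins for the driver's structures, and the DID_* values are
quoted from memory of include/scsi/scsi.h, so treat them as assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4			/* stand-in for SRP_CMD_SQ_SIZE */
#define DID_RESET 0x08			/* assumed host byte value */
#define DID_TRANSPORT_FAILFAST 0x0f	/* assumed host byte value */

struct cmd {
	int result;	/* completion status, host byte in bits 16..23 */
	int done;	/* set once the command has been completed */
};

struct request {
	struct cmd *cmd;	/* NULL means the slot is idle */
};

/* Detach the command from its slot so it is completed at most once. */
struct cmd *claim_req(struct request *req)
{
	struct cmd *c = req->cmd;

	req->cmd = NULL;
	return c;
}

/* Complete one request with the given result, if it is in use. */
void finish_req(struct request *req, int result)
{
	struct cmd *c = claim_req(req);

	if (c) {
		c->result = result;
		c->done = 1;
	}
}

/* Fail every outstanding request; returns how many were finished. */
int terminate_io(struct request *ring, size_t n)
{
	size_t i;
	int finished = 0;

	assert(ring != NULL);
	for (i = 0; i < n; ++i) {
		if (ring[i].cmd)
			finished++;
		finish_req(&ring[i], DID_TRANSPORT_FAILFAST << 16);
	}
	return finished;
}
```

Because finish_req() takes the result as a parameter, the same helper can
serve both the reset path (DID_RESET) and the fail-fast path, which is the
point of replacing srp_reset_req() with srp_finish_req() in this patch.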

Reported-by: Sebastian Riemer 
Signed-off-by: Bart Van Assche 
Acked-by: David Dillow 
Cc: Roland Dreier 
Cc: Vu Pham 
Cc: Sebastian Riemer 
---
 drivers/infiniband/ulp/srp/ib_srp.c |   22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index de49088..37dd3fb 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -688,17 +688,29 @@ static void srp_free_req(struct srp_target_port *target,
spin_unlock_irqrestore(&target->lock, flags);
 }
 
-static void srp_reset_req(struct srp_target_port *target, struct srp_request *req)
+static void srp_finish_req(struct srp_target_port *target,
+  struct srp_request *req, int result)
 {
struct scsi_cmnd *scmnd = srp_claim_req(target, req, NULL);
 
if (scmnd) {
srp_free_req(target, req, scmnd, 0);
-   scmnd->result = DID_RESET << 16;
+   scmnd->result = result;
scmnd->scsi_done(scmnd);
}
 }
 
+static void srp_terminate_io(struct srp_rport *rport)
+{
+   struct srp_target_port *target = rport->lld_data;
+   int i;
+
+   for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+   struct srp_request *req = &target->req_ring[i];
+   srp_finish_req(target, req, DID_TRANSPORT_FAILFAST << 16);
+   }
+}
+
 static int srp_reconnect_target(struct srp_target_port *target)
 {
struct Scsi_Host *shost = target->scsi_host;
@@ -725,8 +737,7 @@ static int srp_reconnect_target(struct srp_target_port *target)
 
for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
struct srp_request *req = &target->req_ring[i];
-   if (req->scmnd)
-   srp_reset_req(target, req);
+   srp_finish_req(target, req, DID_RESET << 16);
}
 
INIT_LIST_HEAD(&target->free_tx);
@@ -1784,7 +1795,7 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
struct srp_request *req = &target->req_ring[i];
if (req->scmnd && req->scmnd->device == scmnd->device)
-   srp_reset_req(target, req);
+   srp_finish_req(target, req, DID_RESET << 16);
}
 
return SUCCESS;
@@ -2616,6 +2627,7 @@ static void srp_remove_one(struct ib_device *device)
 
 static struct srp_function_template ib_srp_transport_functions = {
.rport_delete= srp_rport_delete,
+   .terminate_rport_io  = srp_terminate_io,
 };
 
 static int __init srp_init_module(void)
-- 
1.7.10.4



[PATCH 2/8] scsi_transport_srp: Add transport layer error handling

2013-08-20 Thread Bart Van Assche
Add the necessary functions in the SRP transport module to allow
an SRP initiator driver to implement transport layer error handling
similar to the functionality already provided by the FC transport
layer. This includes:
- Support for implementing fast_io_fail_tmo, the time that should
  elapse after having detected a transport layer problem and
  before failing I/O.
- Support for implementing dev_loss_tmo, the time that should
  elapse after having detected a transport layer problem and
  before removing a remote port.
- Support for periodically trying to reconnect to an SRP target
  after connection to a target has been lost.
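
To make the interplay of the two timers concrete, here is a small sketch of
the port state as a function of the time since the transport layer error.
It follows the state names documented in sysfs-transport-srp further down
in this patch. rport_state_after() and the enum are hypothetical, a negative
timeout stands for "off", and reconnect handling is ignored.

```c
#include <assert.h>

enum rport_state {
	RPORT_RUNNING,
	RPORT_BLOCKED,
	RPORT_FAIL_FAST,
	RPORT_LOST
};

/*
 * elapsed: seconds since the transport layer error was observed, or a
 * negative value if no error has been observed.
 * A negative timeout means the corresponding timer is off and never fires.
 */
enum rport_state rport_state_after(int elapsed, int fast_io_fail_tmo,
				   int dev_loss_tmo)
{
	if (elapsed < 0)
		return RPORT_RUNNING;
	/* dev_loss_tmo has fired: the port is about to be removed. */
	if (dev_loss_tmo >= 0 && elapsed >= dev_loss_tmo)
		return RPORT_LOST;
	/* fast_io_fail_tmo has fired: I/O is being failed. */
	if (fast_io_fail_tmo >= 0 && elapsed >= fast_io_fail_tmo)
		return RPORT_FAIL_FAST;
	/* Error seen, no timer has fired yet. */
	return RPORT_BLOCKED;
}

/* Names as documented for the sysfs "state" attribute. */
const char *rport_state_name(enum rport_state s)
{
	switch (s) {
	case RPORT_RUNNING:	return "running";
	case RPORT_BLOCKED:	return "blocked";
	case RPORT_FAIL_FAST:	return "fail-fast";
	case RPORT_LOST:	return "lost";
	}
	return "unknown";
}
```

With fast_io_fail_tmo = 15 and dev_loss_tmo = 600, the sketch reports
"blocked" for the first 15 seconds, "fail-fast" until 600 seconds, and
"lost" after that.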

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: James Bottomley 
Cc: David Dillow 
Cc: Vu Pham 
Cc: Sebastian Riemer 
---
 Documentation/ABI/stable/sysfs-transport-srp |   39 ++
 drivers/scsi/scsi_transport_srp.c|  504 +-
 include/scsi/scsi_transport_srp.h|   65 +++-
 3 files changed, 605 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
index b36fb0d..21bd480 100644
--- a/Documentation/ABI/stable/sysfs-transport-srp
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -5,6 +5,24 @@ Contact:   linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
 Description:   Instructs an SRP initiator to disconnect from a target and to
remove all LUNs imported from that target.
 
+What:  /sys/class/srp_remote_ports/port-:/dev_loss_tmo
+Date:  December 1, 2013
+KernelVersion: 3.12
+Contact:   linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:   Number of seconds the SCSI layer will wait after a transport
+   layer error has been observed before removing a target port.
+   Zero means immediate removal. Setting this attribute to "off"
+   will disable the dev_loss timer.
+
+What:  /sys/class/srp_remote_ports/port-:/fast_io_fail_tmo
+Date:  December 1, 2013
+KernelVersion: 3.12
+Contact:   linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:   Number of seconds the SCSI layer will wait after a transport
+   layer error has been observed before failing I/O. Zero means
+   failing I/O immediately. Setting this attribute to "off" will
+   disable the fast_io_fail timer.
+
 What:  /sys/class/srp_remote_ports/port-:/port_id
 Date:  June 27, 2007
 KernelVersion: 2.6.24
@@ -12,8 +30,29 @@ Contact: linux-s...@vger.kernel.org
 Description:   16-byte local SRP port identifier in hexadecimal format. An
example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00.
 
+What:  /sys/class/srp_remote_ports/port-:/reconnect_delay
+Date:  December 1, 2013
+KernelVersion: 3.12
+Contact:   linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:   Number of seconds the SCSI layer will wait after a reconnect
+   attempt failed before retrying. Setting this attribute to
+   "off" will disable time-based reconnecting.
+
 What:  /sys/class/srp_remote_ports/port-:/roles
 Date:  June 27, 2007
 KernelVersion: 2.6.24
 Contact:   linux-s...@vger.kernel.org
 Description:   Role of the remote port. Either "SRP Initiator" or "SRP Target".
+
+What:  /sys/class/srp_remote_ports/port-:/state
+Date:  December 1, 2013
+KernelVersion: 3.12
+Contact:   linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:   State of the transport layer used for communication with the
+   remote port. "running" if the transport layer is operational;
+   "blocked" if a transport layer error has been encountered but
+   the fast_io_fail_tmo timer has not yet fired; "fail-fast"
+   after the fast_io_fail_tmo timer has fired and before the
+   "dev_loss_tmo" timer has fired; "lost" after the
+   "dev_loss_tmo" timer has fired and before the port is finally
+   removed.
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f7ba94a..ff1baa8 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -24,12 +24,15 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include "scsi_priv.h"
 #include "scsi_transport_srp_internal.h"
 
 struct srp_host_attrs {
@@ -38,7 +41,7 @@ struct srp_host_attrs {
 #define to_srp_host_attrs(host)((struct srp_host_attrs *)(host)->shost_data)
 
 #define SRP_HOST_ATTRS 0
-#define SRP_RPORT_ATTRS 3
+#define SRP_RPORT_ATTRS 8
 
 struct srp_internal {
struct scsi_transport_template t;
@@ -54,6 +57,36 @@ struct srp_internal {
 
 #definedev_to_rport(d) container_of(d, struct srp_rport, dev)
 #define transport_class_to_srp_rport(dev) dev_to_rport((dev)->paren

[PATCH 1/8] IB/srp: Keep rport as long as the IB transport layer

2013-08-20 Thread Bart Van Assche
Keep the rport data structure around after srp_remove_host() has
finished until cleanup of the IB transport layer has finished
completely. This is necessary because later patches use the rport
pointer inside the queuecommand callback. Without this patch
accessing the rport from inside a queuecommand callback is racy
because srp_remove_host() must be invoked before scsi_remove_host()
and because the queuecommand callback may get invoked after
srp_remove_host() has finished. In other words, without this patch
the queuecommand callback may get invoked after the rport has been
removed.
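
The lifetime rule can be illustrated with a plain reference count in user
space. This is only a sketch: struct rport, rport_alloc(), rport_get() and
rport_put() are hypothetical, the counter is not atomic, and the kernel
code in this patch relies on get_device() and put_device() instead.

```c
#include <assert.h>
#include <stdlib.h>

struct rport {
	int refcount;	/* number of holders; object is freed at zero */
	int *released;	/* set to 1 when the object is freed (for demos) */
};

/* Allocate an rport that starts with one reference. */
struct rport *rport_alloc(int *released_flag)
{
	struct rport *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->refcount = 1;
	r->released = released_flag;
	return r;
}

/* Take an extra reference so the object outlives other holders. */
void rport_get(struct rport *r)
{
	assert(r != NULL && r->refcount > 0);
	r->refcount++;
}

/* Drop a reference; the last put frees the object. */
void rport_put(struct rport *r)
{
	assert(r != NULL && r->refcount > 0);
	if (--r->refcount == 0) {
		*r->released = 1;
		free(r);
	}
}
```

In terms of this patch, the extra rport_get() corresponds to the
srp_rport_get() call placed before srp_remove_host(), and the final
rport_put() to the srp_rport_put() call after srp_free_target_ib().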

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: James Bottomley 
Cc: David Dillow 
Cc: Vu Pham 
Cc: Sebastian Riemer 
---
 drivers/infiniband/ulp/srp/ib_srp.c |3 +++
 drivers/infiniband/ulp/srp/ib_srp.h |1 +
 drivers/scsi/scsi_transport_srp.c   |   18 ++
 include/scsi/scsi_transport_srp.h   |2 ++
 4 files changed, 24 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index f93baf8..de49088 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -528,11 +528,13 @@ static void srp_remove_target(struct srp_target_port *target)
WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
 
srp_del_scsi_host_attr(target->scsi_host);
+   srp_rport_get(target->rport);
srp_remove_host(target->scsi_host);
scsi_remove_host(target->scsi_host);
srp_disconnect_target(target);
ib_destroy_cm_id(target->cm_id);
srp_free_target_ib(target);
+   srp_rport_put(target->rport);
srp_free_req_data(target);
scsi_host_put(target->scsi_host);
 }
@@ -1994,6 +1996,7 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
}
 
rport->lld_data = target;
+   target->rport = rport;
 
spin_lock(&host->target_lock);
list_add_tail(&target->list, &host->target_list);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index e641088..02392f5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -153,6 +153,7 @@ struct srp_target_port {
u16 io_class;
struct srp_host*srp_host;
struct Scsi_Host   *scsi_host;
+   struct srp_rport   *rport;
chartarget_name[32];
unsigned intscsi_id;
unsigned intsg_tablesize;
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f379c7f..f7ba94a 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -185,6 +185,24 @@ static int srp_host_match(struct attribute_container *cont, struct device *dev)
 }
 
 /**
+ * srp_rport_get() - increment rport reference count
+ */
+void srp_rport_get(struct srp_rport *rport)
+{
+   get_device(&rport->dev);
+}
+EXPORT_SYMBOL(srp_rport_get);
+
+/**
+ * srp_rport_put() - decrement rport reference count
+ */
+void srp_rport_put(struct srp_rport *rport)
+{
+   put_device(&rport->dev);
+}
+EXPORT_SYMBOL(srp_rport_put);
+
+/**
  * srp_rport_add - add a SRP remote port to the device hierarchy
  * @shost: scsi host the remote port is connected to.
  * @ids:   The port id for the remote port.
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index ff0f04a..5a2d2d1 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -38,6 +38,8 @@ extern struct scsi_transport_template *
 srp_attach_transport(struct srp_function_template *);
 extern void srp_release_transport(struct scsi_transport_template *);
 
+extern void srp_rport_get(struct srp_rport *rport);
+extern void srp_rport_put(struct srp_rport *rport);
 extern struct srp_rport *srp_rport_add(struct Scsi_Host *,
   struct srp_rport_identifiers *);
 extern void srp_rport_del(struct srp_rport *);
-- 
1.7.10.4



[PATCH 0/8] IB SRP initiator patches for kernel 3.12

2013-08-20 Thread Bart Van Assche

The purpose of this InfiniBand SRP initiator patch series is as follows:
- Make the SRP initiator driver better suited for use in a H.A. setup.
  Add fast_io_fail_tmo, dev_loss_tmo and reconnect_delay parameters.
  These can be used either to speed up failover or to avoid device
  removal when e.g. using initiator side mirroring.
- Make the SRP initiator better suited for use on NUMA systems by
  making the HCA completion vector configurable.
- Improve performance by making the queue size configurable.

Changes since the previous patch series are:
- Rewrote srp_tmo_valid() to improve readability (requested by Dave
  Dillow).
- The combination (reconnect_delay < 0 && fast_io_fail_tmo < 0 &&
  dev_loss_tmo < 0) is now rejected as requested by Dave Dillow.
- Fixed a race between transport layer failure handling and device
  removal. This issue was reported by Vu Pham.
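
As a partial sketch of the kind of check srp_tmo_valid() performs, the
function below implements only two rules that can be read from this
series: the all-disabled combination is rejected, and dev_loss_tmo may not
exceed SCSI_DEVICE_BLOCK_MAX_TIMEOUT while fast_io_fail_tmo is off (my
reading of the dev_loss_tmo parameter description in patch 4/8).
tmo_valid_sketch() is a hypothetical name, the value 600 for the limit is
an assumption, and the real function may enforce further constraints that
are not shown here.

```c
#include <assert.h>
#include <errno.h>

/* Assumed value of SCSI_DEVICE_BLOCK_MAX_TIMEOUT, in seconds. */
#define BLOCK_MAX_TIMEOUT 600

/*
 * Hypothetical, partial validity check. Negative values mean "off".
 * Returns 0 if the combination passes the two rules below, else -EINVAL.
 */
int tmo_valid_sketch(int reconnect_delay, int fast_io_fail_tmo,
		     int dev_loss_tmo)
{
	/* Rule 1: at least one of the three mechanisms must be enabled. */
	if (reconnect_delay < 0 && fast_io_fail_tmo < 0 && dev_loss_tmo < 0)
		return -EINVAL;
	/* Rule 2: without fast_io_fail_tmo, dev_loss_tmo is bounded. */
	if (fast_io_fail_tmo < 0 && dev_loss_tmo > BLOCK_MAX_TIMEOUT)
		return -EINVAL;
	return 0;
}
```

The module defaults from patch 4/8 (reconnect_delay = 10,
fast_io_fail_tmo = 15, dev_loss_tmo = 600) pass both rules.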

The previous patch series can be found here:
http://thread.gmane.org/gmane.linux.drivers.rdma/16389

The individual patches in this series are:
0001-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
0002-scsi_transport_srp-Add-transport-layer-error-handlin.patch
0003-IB-srp-Add-srp_terminate_io.patch
0004-IB-srp-Use-SRP-transport-layer-error-recovery.patch
0005-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
0006-IB-srp-Make-transport-layer-retry-count-configurable.patch
0007-IB-srp-Introduce-srp_alloc_req_data.patch
0008-IB-srp-Make-queue-size-configurable.patch


Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-20 Thread Andreas Bluemle
Hi,

I have added the patch and re-tested: I still encounter
hangs of my application. I am not quite sure whether
I hit the same error on shutdown, because now I don't hit
the error every time, but only every now and then.

When adding the patch to my code base (git tag v1.0.17) I notice
an offset of "-34 lines". Which code base are you using?


Best Regards

Andreas Bluemle

On Tue, 20 Aug 2013 09:21:13 +0200
Andreas Bluemle  wrote:

> Hi Sean,
> 
> I will re-check by the end of the week; there is
> a test scheduling issue with our test system, which
> affects my access times.
> 
> Thanks
> 
> Andreas
> 
> 
> On Mon, 19 Aug 2013 17:10:11 +
> "Hefty, Sean"  wrote:
> 
> > Can you see if the patch below fixes the hang?
> > 
> > Signed-off-by: Sean Hefty 
> > ---
> >  src/rsocket.c |   11 ++-
> >  1 files changed, 10 insertions(+), 1 deletions(-)
> > 
> > diff --git a/src/rsocket.c b/src/rsocket.c
> > index d544dd0..e45b26d 100644
> > --- a/src/rsocket.c
> > +++ b/src/rsocket.c
> > @@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd
> > *rfds, struct pollfd *fds, nfds_t nfds) 
> > rs = idm_lookup(&idm, fds[i].fd);
> > if (rs) {
> > +   fastlock_acquire(&rs->cq_wait_lock);
> > if (rs->type == SOCK_STREAM)
> > rs_get_cq_event(rs);
> > else
> > ds_get_cq_event(rs);
> > +   fastlock_release(&rs->cq_wait_lock);
> > fds[i].revents = rs_poll_rs(rs,
> > fds[i].events, 1, rs_poll_all); } else {
> > fds[i].revents = rfds[i].revents;
> > @@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set
> > *writefds, 
> >  /*
> >   * For graceful disconnect, notify the remote side that we're
> > - * disconnecting and wait until all outstanding sends complete.
> > + * disconnecting and wait until all outstanding sends complete,
> > provided
> > + * that the remote side has not sent a disconnect message.
> >   */
> >  int rshutdown(int socket, int how)
> >  {
> > @@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how)
> > if (rs->state & rs_connected)
> > rs_process_cq(rs, 0, rs_conn_all_sends_done);
> >  
> > +   if (rs->state & rs_disconnected) {
> > +   /* Generate event by flushing receives to unblock
> > rpoll */
> > +   ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
> > +   rdma_disconnect(rs->cm_id);
> > +   }
> > +
> > if ((rs->fd_flags & O_NONBLOCK) && (rs->state &
> > rs_connected)) rs_set_nonblocking(rs, rs->fd_flags);
> >  
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > linux-rdma" in the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> 
> 
> 



-- 
Andreas Bluemle mailto:andreas.blue...@itxperts.de
Heinrich Boell Strasse 88   Phone: (+49) 89 4317582
D-81829 Muenchen (Germany)  Mobil: (+49) 177 522 0151


Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-20 Thread Andreas Bluemle
Hi Sean,

I will re-check by the end of the week; there is
a test scheduling issue with our test system, which
affects my access times.

Thanks

Andreas


On Mon, 19 Aug 2013 17:10:11 +
"Hefty, Sean"  wrote:

> Can you see if the patch below fixes the hang?
> 
> Signed-off-by: Sean Hefty 
> ---
>  src/rsocket.c |   11 ++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/src/rsocket.c b/src/rsocket.c
> index d544dd0..e45b26d 100644
> --- a/src/rsocket.c
> +++ b/src/rsocket.c
> @@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd
> *rfds, struct pollfd *fds, nfds_t nfds) 
>   rs = idm_lookup(&idm, fds[i].fd);
>   if (rs) {
> + fastlock_acquire(&rs->cq_wait_lock);
>   if (rs->type == SOCK_STREAM)
>   rs_get_cq_event(rs);
>   else
>   ds_get_cq_event(rs);
> + fastlock_release(&rs->cq_wait_lock);
>   fds[i].revents = rs_poll_rs(rs,
> fds[i].events, 1, rs_poll_all); } else {
>   fds[i].revents = rfds[i].revents;
> @@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set
> *writefds, 
>  /*
>   * For graceful disconnect, notify the remote side that we're
> - * disconnecting and wait until all outstanding sends complete.
> + * disconnecting and wait until all outstanding sends complete,
> provided
> + * that the remote side has not sent a disconnect message.
>   */
>  int rshutdown(int socket, int how)
>  {
> @@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how)
>   if (rs->state & rs_connected)
>   rs_process_cq(rs, 0, rs_conn_all_sends_done);
>  
> + if (rs->state & rs_disconnected) {
> + /* Generate event by flushing receives to unblock
> rpoll */
> + ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
> + rdma_disconnect(rs->cm_id);
> + }
> +
>   if ((rs->fd_flags & O_NONBLOCK) && (rs->state &
> rs_connected)) rs_set_nonblocking(rs, rs->fd_flags);
>  
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> in the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 



-- 
Andreas Bluemle mailto:andreas.blue...@itxperts.de
Heinrich Boell Strasse 88   Phone: (+49) 89 4317582
D-81829 Muenchen (Germany)  Mobil: (+49) 177 522 0151