Re: PATCH: opensm enhancements

2013-07-03 Thread Hal Rosenstock
Hi Jeff,

On 6/26/2013 5:24 PM, Jeff Becker wrote:
> Hi Hal. At the OFA workshop, I mentioned that I've been working on some
> modifications to opensm that we use at NASA. Following extensive testing
> of these applied to opensm 3.3.13 (the version we run here), I have
> ported these to top of tree opensm, and have tested them on a small
> cluster.

Thanks for getting this done! For future reference, patches should be
sent as plain text as this makes it easier to comment.

> The first patch modifies the console logflush command to take on or
> off as an argument for toggling.

Thanks. Applied.

> The second (more extensive) patch
> adds a command line option to specify a file in which each line contains
> a switch GUID/port pair to be ignored by opensm. The idea is to specify
> this file when you start opensm (it can be empty), and add ports to
> ignore (one per line for each end of a connection) to the file. At the
> next heavy sweep (or HUP) the sm will reprogram the forwarding tables
> without including the ignored links. We use this for replacing cables,
> as well as for system expansion (adding new racks).

I'll comment on this one later.

-- Hal

> Please let me know if you have any questions/issues with these. Thanks.
>
> -jeff
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/13] IB SRP initiator patches for kernel 3.11

2013-07-03 Thread Bart Van Assche

The purpose of this InfiniBand SRP initiator patch series is as follows:
- Make the SRP initiator driver better suited for use in a H.A. setup.
  Add fast_io_fail_tmo and dev_loss_tmo parameters. These can be used
  either to speed up failover or to avoid device removal when e.g. using
  initiator side mirroring.
- Make the SRP initiator better suited for use on NUMA systems by
  making the HCA completion vector configurable.

Changes since the v2 of the IB SRP initiator patches for kernel 3.11 
patch series:

- Improved documentation of the newly added sysfs parameters.
- Limit fast_io_fail_tmo to SCSI_DEVICE_BLOCK_MAX_TIMEOUT.
- Simplified the code for parsing values written into sysfs attributes.
- Fixed a potential deadlock in the code added in scsi_transport_srp
  (invoking cancel_delayed_work() with the rport mutex held for work
   that needs the rport mutex itself).
- Changed the default retry count back from 2 to 7 since there is not
  yet agreement about this change.
- Dropped the patch that silences failed SCSI commands and also the
  patch that fixes a race between srp_queuecommand() and
  srp_claim_req() since there is no agreement about these patches.

Changes since the v1 of the IB SRP initiator patches for kernel 3.11 
patch series:

- scsi_transport_srp: Allowed both fast_io_fail and dev_loss timeouts
  to be disabled.
- scsi_transport_srp, srp_reconnect_rport(): switched from
  scsi_block_requests() to scsi_target_block() for blocking SCSI command
  processing temporarily.
- scsi_transport_srp, srp_start_tl_fail_timers(): only block SCSI device
  command processing if the fast_io_fail timer is enabled.
- Changed srp_abort() such that upon transport offline the value
  FAST_IO_FAIL is returned instead of SUCCESS.
- Fixed a race condition in the maintain single connection patch: a
  new login after removal had started but before removal had finished
  still could create a duplicate connection. Fixed this by deferring
  removal from the target list until removal has finished.
- Modified the error message in the same patch for reporting that a
  duplicate connection has been rejected.
- Modified patch 2/15 such that all possible race conditions with
  srp_claim_req() are addressed.
- Documented the comp_vector and tl_retry_count login string parameters.
- Updated dev_loss_tmo and fast_io_fail_tmo documentation - mentioned
  off is a valid choice.

Changes compared to v5 of the Make ib_srp better suited for H.A. 
purposes patch series:

- Left out patches that are already upstream.
- Made it possible to set dev_loss_tmo to off. This is useful in a
  setup using initiator side mirroring because it keeps new /dev/sd*
  names from being assigned after a failover or a cable pull and
  reinsert.
- Added kernel module parameters to ib_srp for configuring default
  values of the fast_io_fail_tmo and dev_loss_tmo parameters.
- Added a patch from Dotan Barak that fixes a kernel oops during rmmod
  triggered by resource allocation failure at module load time.
- Avoid duplicate connections by refusing relogins instead of dropping
  duplicate connections, as proposed by Sebastian Riemer.
- Added a patch from Sebastian Riemer for failing SCSI commands
  silently.
- Added a patch from Vu Pham to make the transport layer (IB RC) retry
  count configurable.
- Made HCA completion vector configurable.

Changes since v4:
- Added a patch for removing SCSI devices upon a port down event

Changes since v3:
- Restored the dev_loss_tmo and fast_io_fail_tmo sysfs attributes.
- Included a patch to fix an ib_srp crash that could be triggered by
  cable pulling.

Changes since v2:
- Addressed the v2 review comments.
- Dropped the patches that have already been merged.
- Dropped the patches for integration with multipathd.
- Dropped the micro-optimization of the IB completion handlers.

The individual patches in this series are as follows:
0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch
0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch
0003-IB-srp-Avoid-that-srp_reset_host-is-skipped-after-a-.patch
0004-IB-srp-Fail-I-O-fast-if-target-offline.patch
0005-IB-srp-Skip-host-settle-delay.patch
0006-IB-srp-Maintain-a-single-connection-per-I_T-nexus.patch
0007-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
0008-scsi_transport_srp-Add-transport-layer-error-handlin.patch
0009-IB-srp-Add-srp_terminate_io.patch
0010-IB-srp-Use-SRP-transport-layer-error-recovery.patch
0011-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
0012-IB-srp-Fail-SCSI-commands-silently.patch
0013-IB-srp-Make-HCA-completion-vector-configurable.patch
0014-IB-srp-Make-transport-layer-retry-count-configurable.patch
0015-IB-srp-Bump-driver-version-and-release-date.patch


[PATCH v3 01/13] IB/srp: Fix remove_one crash due to resource exhaustion

2013-07-03 Thread Bart Van Assche
From: Dotan Barak dot...@dev.mellanox.co.il

If the add_one callback fails during driver load, no resources are
allocated, so there is nothing to release. Trying to clean up these
resources anyway may lead to the following kernel panic:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [a0132331] srp_remove_one+0x31/0x240 [ib_srp]
RIP: 0010:[a0132331]  [a0132331] srp_remove_one+0x31/0x240 [ib_srp]
Process rmmod (pid: 4562, threadinfo 8800dd738000, task 8801167e60c0)
Call Trace:
 [a024500e] ib_unregister_client+0x4e/0x120 [ib_core]
 [a01361bd] srp_cleanup_module+0x15/0x71 [ib_srp]
 [810ac6a4] sys_delete_module+0x194/0x260
 [8100b0f2] system_call_fastpath+0x16/0x1b

[bvanassche: Shortened patch description]
Signed-off-by: Dotan Barak dot...@dev.mellanox.co.il
Reviewed-by: Eli Cohen e...@mellanox.co.il
Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Acked-by: David Dillow dillo...@ornl.gov
Cc: Roland Dreier rol...@purestorage.com
Cc: Vu Pham v...@mellanox.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 7ccf328..368d160 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2507,6 +2507,8 @@ static void srp_remove_one(struct ib_device *device)
 	struct srp_target_port *target;
 
 	srp_dev = ib_get_client_data(device, &srp_client);
+	if (!srp_dev)
+		return;
 
 	list_for_each_entry_safe(host, tmp_host, &srp_dev->dev_list, list) {
 		device_unregister(&host->dev);
-- 
1.7.10.4
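
The guard added by this patch follows a common pattern: a remove
callback must tolerate the case where the matching add callback never
registered any data. A minimal user-space sketch of that pattern is
shown below. The names demo_dev and demo_remove_one are hypothetical
and are not part of ib_srp; the real callback tears down SRP hosts
rather than counting them.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-device client data. */
struct demo_dev {
	int nr_hosts;
};

/*
 * Hypothetical remove callback. Returns the number of hosts that were
 * torn down, or 0 if the add callback failed and left no client data.
 */
int demo_remove_one(struct demo_dev *dev)
{
	int removed;

	/* Nothing was allocated, so there is nothing to release. */
	if (!dev)
		return 0;

	removed = dev->nr_hosts;
	dev->nr_hosts = 0;
	return removed;
}
```

Without the early return, the first access to dev->nr_hosts would
dereference a NULL pointer, which is the user-space analogue of the
oops quoted in the patch description.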



[PATCH v3 02/13] IB/srp: Avoid that srp_reset_host() is skipped after a TL error

2013-07-03 Thread Bart Van Assche
The SCSI error handler assumes that the transport layer is
operational if an eh_abort_handler() returns SUCCESS. Hence let
srp_abort() only return SUCCESS if sending the ABORT TASK task
management function succeeded. This prevents the SCSI error
handler from skipping the srp_reset_host() call after a transport
layer error.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: David Dillow dillo...@ornl.gov
Cc: Roland Dreier rol...@purestorage.com
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 368d160..0e0a5a2 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1744,18 +1744,22 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
 	struct srp_request *req = (struct srp_request *) scmnd->host_scribble;
+	int ret;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n");
 
 	if (!req || !srp_claim_req(target, req, scmnd))
 		return FAILED;
-	srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
-			  SRP_TSK_ABORT_TASK);
+	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
+			      SRP_TSK_ABORT_TASK) == 0)
+		ret = SUCCESS;
+	else
+		ret = FAILED;
 	srp_free_req(target, req, scmnd, 0);
 	scmnd->result = DID_ABORT << 16;
 	scmnd->scsi_done(scmnd);
 
-	return SUCCESS;
+	return ret;
 }
 
 static int srp_reset_device(struct scsi_cmnd *scmnd)
-- 
1.7.10.4



[PATCH v3 03/13] IB/srp: Fail I/O fast if target offline

2013-07-03 Thread Bart Van Assche
If reconnecting failed we know that no command completion will
be received anymore. Hence let the SCSI error handler fail such
commands immediately.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: David Dillow dillo...@ornl.gov
Acked-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: Roland Dreier rol...@purestorage.com
Cc: Vu Pham v...@mellanox.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 0e0a5a2..19279e5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1753,6 +1753,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
 			      SRP_TSK_ABORT_TASK) == 0)
 		ret = SUCCESS;
+	else if (target->transport_offline)
+		ret = FAST_IO_FAIL;
 	else
 		ret = FAILED;
 	srp_free_req(target, req, scmnd, 0);
-- 
1.7.10.4
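
Taken together, patches 02/13 and 03/13 make the return value of
srp_abort() depend on two inputs: whether the ABORT TASK request went
through, and whether the transport is known to be offline. The
decision can be sketched in isolation as follows. The enum values and
the function name are stand-ins chosen for this illustration; the real
SUCCESS, FAILED and FAST_IO_FAIL constants come from the kernel's SCSI
headers.

```c
#include <stdbool.h>

/* Stand-in values, not the kernel's numeric constants. */
enum demo_eh_result { DEMO_SUCCESS, DEMO_FAILED, DEMO_FAST_IO_FAIL };

/*
 * tsk_mgmt_status:   0 if sending the ABORT TASK function succeeded.
 * transport_offline: true once reconnecting to the target has failed.
 */
enum demo_eh_result demo_abort_result(int tsk_mgmt_status,
				      bool transport_offline)
{
	if (tsk_mgmt_status == 0)
		return DEMO_SUCCESS;      /* transport is operational */
	if (transport_offline)
		return DEMO_FAST_IO_FAIL; /* no completion will ever arrive */
	return DEMO_FAILED;               /* let the error handler escalate */
}
```

Only the middle branch is new in patch 03/13; the other two outcomes
are what patch 02/13 already established.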



[PATCH v3 04/13] IB/srp: Skip host settle delay

2013-07-03 Thread Bart Van Assche
The SRP initiator implements host reset by reconnecting to the SRP
target. That means that communication with the target is possible
as soon as host reset finished. Hence skip the host settle delay.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: David Dillow dillo...@ornl.gov
Cc: Roland Dreier rol...@purestorage.com
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 19279e5..2c82b90 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1952,6 +1952,7 @@ static struct scsi_host_template srp_template = {
 	.eh_abort_handler		= srp_abort,
 	.eh_device_reset_handler	= srp_reset_device,
 	.eh_host_reset_handler		= srp_reset_host,
+	.skip_settle_delay		= true,
 	.sg_tablesize			= SRP_DEF_SG_TABLESIZE,
 	.can_queue			= SRP_CMD_SQ_SIZE,
 	.this_id			= -1,
-- 
1.7.10.4



[PATCH v3 05/13] IB/srp: Maintain a single connection per I_T nexus

2013-07-03 Thread Bart Van Assche
An SRP target is required to maintain a single connection between
initiator and target. This means that if the 'add_target' attribute
is used to create a second connection to a target, the first
connection will be logged out and the SCSI error handler will
kick in. The SCSI error handler will cause the SRP initiator to
reconnect, which will cause I/O over the second connection to fail.
Avoid such ping-pong behavior by disabling relogins. Note: if
reconnecting manually is necessary, that is possible by deleting
and recreating an rport via sysfs.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Signed-off-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Acked-by: David Dillow dillo...@ornl.gov
Cc: Roland Dreier rol...@kernel.org
Cc: Vu Pham v...@mellanox.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |   44 +--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 2c82b90..f046e32 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -542,11 +542,11 @@ static void srp_remove_work(struct work_struct *work)
 
 	WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
 
+	srp_remove_target(target);
+
 	spin_lock(&target->srp_host->target_lock);
 	list_del(&target->list);
 	spin_unlock(&target->srp_host->target_lock);
-
-	srp_remove_target(target);
 }
 
 static void srp_rport_delete(struct srp_rport *rport)
@@ -2008,6 +2008,36 @@ static struct class srp_class = {
 	.dev_release = srp_release_dev
 };
 
+/**
+ * srp_conn_unique() - check whether the connection to a target is unique
+ */
+static bool srp_conn_unique(struct srp_host *host,
+			    struct srp_target_port *target)
+{
+	struct srp_target_port *t;
+	bool ret = false;
+
+	if (target->state == SRP_TARGET_REMOVED)
+		goto out;
+
+	ret = true;
+
+	spin_lock(&host->target_lock);
+	list_for_each_entry(t, &host->target_list, list) {
+		if (t != target &&
+		    target->id_ext == t->id_ext &&
+		    target->ioc_guid == t->ioc_guid &&
+		    target->initiator_ext == t->initiator_ext) {
+			ret = false;
+			break;
+		}
+	}
+	spin_unlock(&host->target_lock);
+
+out:
+	return ret;
+}
+
 /*
  * Target ports are added by writing
  *
@@ -2264,6 +2294,16 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err;
 
+	if (!srp_conn_unique(target->srp_host, target)) {
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "Already connected to target port with id_ext=%016llx;ioc_guid=%016llx;initiator_ext=%016llx\n",
+			     be64_to_cpu(target->id_ext),
+			     be64_to_cpu(target->ioc_guid),
+			     be64_to_cpu(target->initiator_ext));
+		ret = -EEXIST;
+		goto err;
+	}
+
 	if (!host->srp_dev->fmr_pool && !target->allow_ext_sg &&
 	    target->cmd_sg_cnt < target->sg_tablesize) {
 		pr_warn("No FMR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
-- 
1.7.10.4
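
The uniqueness test in srp_conn_unique() compares the triple
(id_ext, ioc_guid, initiator_ext) of the new target against every
other target on the same host. A minimal user-space sketch of that
comparison is shown below. It uses a plain array instead of the
kernel's locked linked list, and the demo_ names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the identifying fields of a target port. */
struct demo_target {
	uint64_t id_ext;
	uint64_t ioc_guid;
	uint64_t initiator_ext;
};

/*
 * Return true if no entry in list[0..n-1], other than the candidate
 * itself, has the same identifying triple as the candidate.
 */
bool demo_conn_unique(const struct demo_target *list, size_t n,
		      const struct demo_target *candidate)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct demo_target *t = &list[i];

		if (t != candidate &&
		    t->id_ext == candidate->id_ext &&
		    t->ioc_guid == candidate->ioc_guid &&
		    t->initiator_ext == candidate->initiator_ext)
			return false;
	}
	return true;
}
```

The pointer comparison matters because in the patch the new target may
already be on the list that is searched; without it, every such target
would be reported as a duplicate of itself.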



[PATCH v3 06/13] IB/srp: Keep rport as long as the IB transport layer

2013-07-03 Thread Bart Van Assche
Keep the rport data structure around after srp_remove_host() has
finished until cleanup of the IB transport layer has finished
completely. This is necessary because later patches use the rport
pointer inside the queuecommand callback. Without this patch
accessing the rport from inside a queuecommand callback is racy
because srp_remove_host() must be invoked before scsi_remove_host()
and because the queuecommand callback may get invoked after
srp_remove_host() has finished.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: James Bottomley jbottom...@parallels.com
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |3 +++
 drivers/infiniband/ulp/srp/ib_srp.h |1 +
 drivers/scsi/scsi_transport_srp.c   |   18 ++
 include/scsi/scsi_transport_srp.h   |2 ++
 4 files changed, 24 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index f046e32..f65701d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -526,11 +526,13 @@ static void srp_remove_target(struct srp_target_port *target)
 	WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
 
 	srp_del_scsi_host_attr(target->scsi_host);
+	srp_rport_get(target->rport);
 	srp_remove_host(target->scsi_host);
 	scsi_remove_host(target->scsi_host);
 	srp_disconnect_target(target);
 	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
+	srp_rport_put(target->rport);
 	srp_free_req_data(target);
 	scsi_host_put(target->scsi_host);
 }
@@ -1982,6 +1984,7 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 	}
 
 	rport->lld_data = target;
+	target->rport = rport;
 
 	spin_lock(&host->target_lock);
 	list_add_tail(&target->list, &host->target_list);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 66fbedd..1817ed5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -153,6 +153,7 @@ struct srp_target_port {
 	u16			io_class;
 	struct srp_host	       *srp_host;
 	struct Scsi_Host       *scsi_host;
+	struct srp_rport       *rport;
 	char			target_name[32];
 	unsigned int		scsi_id;
 	unsigned int		sg_tablesize;
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f379c7f..f7ba94a 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -185,6 +185,24 @@ static int srp_host_match(struct attribute_container *cont, struct device *dev)
 }
 
 /**
+ * srp_rport_get() - increment rport reference count
+ */
+void srp_rport_get(struct srp_rport *rport)
+{
+	get_device(&rport->dev);
+}
+EXPORT_SYMBOL(srp_rport_get);
+
+/**
+ * srp_rport_put() - decrement rport reference count
+ */
+void srp_rport_put(struct srp_rport *rport)
+{
+	put_device(&rport->dev);
+}
+EXPORT_SYMBOL(srp_rport_put);
+
+/**
  * srp_rport_add - add a SRP remote port to the device hierarchy
  * @shost:	scsi host the remote port is connected to.
  * @ids:	The port id for the remote port.
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index ff0f04a..5a2d2d1 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -38,6 +38,8 @@ extern struct scsi_transport_template *
 srp_attach_transport(struct srp_function_template *);
 extern void srp_release_transport(struct scsi_transport_template *);
 
+extern void srp_rport_get(struct srp_rport *rport);
+extern void srp_rport_put(struct srp_rport *rport);
 extern struct srp_rport *srp_rport_add(struct Scsi_Host *,
 				       struct srp_rport_identifiers *);
 extern void srp_rport_del(struct srp_rport *);
-- 
1.7.10.4
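
The get/put pair added here is ordinary reference counting: an extra
reference taken before srp_remove_host() keeps the rport alive until
the IB teardown has finished. The effect can be sketched in user space
as follows. The demo_rport structure and its released flag are
hypothetical; in the kernel the count lives in the embedded struct
device and the release callback frees the object.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted remote port. */
struct demo_rport {
	int refcount;
	bool released;	/* set when the last reference is dropped */
};

/* Take an additional reference. */
void demo_rport_get(struct demo_rport *rport)
{
	rport->refcount++;
}

/* Drop a reference; release the object when none remain. */
void demo_rport_put(struct demo_rport *rport)
{
	if (--rport->refcount == 0)
		rport->released = true;
}
```

With an initial count of one, a get followed by the owner's put leaves
the object alive, and only the matching second put releases it. That
is the window in which a late queuecommand callback can still safely
dereference the rport pointer.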



[PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread Bart Van Assche
Add the necessary functions in the SRP transport module to allow
an SRP initiator driver to implement transport layer error handling
similar to the functionality already provided by the FC transport
layer. This includes:
- Support for implementing fast_io_fail_tmo, the time that should
  elapse after having detected a transport layer problem and
  before failing I/O.
- Support for implementing dev_loss_tmo, the time that should
  elapse after having detected a transport layer problem and
  before removing a remote port.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: James Bottomley jbottom...@parallels.com
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 Documentation/ABI/stable/sysfs-transport-srp |   38 +++
 drivers/scsi/scsi_transport_srp.c|  468 +-
 include/scsi/scsi_transport_srp.h|   62 +++-
 3 files changed, 565 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
index b36fb0d..52babb9 100644
--- a/Documentation/ABI/stable/sysfs-transport-srp
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -5,6 +5,24 @@ Contact:	linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
 Description:	Instructs an SRP initiator to disconnect from a target and to
 		remove all LUNs imported from that target.
 
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before removing a target port.
+		Zero means immediate removal. Setting this attribute to "off"
+		will disable this behavior.
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/fast_io_fail_tmo
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before failing I/O. Zero means
+		failing I/O immediately. Setting this attribute to "off" will
+		disable this behavior.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/port_id
 Date:		June 27, 2007
 KernelVersion:	2.6.24
@@ -12,8 +30,28 @@ Contact:	linux-s...@vger.kernel.org
 Description:	16-byte local SRP port identifier in hexadecimal format. An
 		example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00.
 
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/reconnect_delay
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a reconnect
+		attempt failed before retrying.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/roles
 Date:		June 27, 2007
 KernelVersion:	2.6.24
 Contact:	linux-s...@vger.kernel.org
 Description:	Role of the remote port. Either "SRP Initiator" or "SRP Target".
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/state
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	State of the transport layer used for communication with the
+		remote port. "running" if the transport layer is operational;
+		"blocked" if a transport layer error has been encountered but
+		the fast_io_fail_tmo timer has not yet fired; "fail-fast"
+		after the fast_io_fail_tmo timer has fired and before the
+		dev_loss_tmo timer has fired; "lost" after the
+		dev_loss_tmo timer has fired and before the port is finally
+		removed.
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f7ba94a..1b9ebd5 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -24,12 +24,15 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/delay.h>
 
 #include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_transport_srp.h>
+#include "scsi_priv.h"
 #include "scsi_transport_srp_internal.h"
 
 struct srp_host_attrs {
@@ -38,7 +41,7 @@ struct srp_host_attrs {
 #define to_srp_host_attrs(host)	((struct srp_host_attrs *)(host)->shost_data)
 
 #define SRP_HOST_ATTRS 0
-#define SRP_RPORT_ATTRS 3
+#define SRP_RPORT_ATTRS 8
 
 struct srp_internal {
 	struct scsi_transport_template t;
@@ -54,6 +57,26 @@ struct srp_internal {
 
 #define	dev_to_rport(d)	container_of(d, struct srp_rport, dev)
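
[The remainder of this patch is not included in the archive excerpt.]

The state attribute documented in this patch can be read as a function
of the time elapsed since the transport layer error and of the two
timeouts. The sketch below models only that documented progression.
It is a simplification and not the patch's implementation: the names
are hypothetical, a disabled ("off") timer is modelled as a negative
value that never fires, and the real code's handling of disabled
timers and of reconnects may differ.

```c
/* Hypothetical mirror of the four documented rport states. */
enum demo_rport_state {
	DEMO_RUNNING,
	DEMO_BLOCKED,
	DEMO_FAIL_FAST,
	DEMO_LOST
};

/*
 * elapsed:          seconds since the transport layer error was
 *                   observed, or a negative value if none occurred.
 * fast_io_fail_tmo: seconds until I/O is failed, negative means off.
 * dev_loss_tmo:     seconds until the port is removed, negative means off.
 */
enum demo_rport_state demo_state_after(int elapsed, int fast_io_fail_tmo,
				       int dev_loss_tmo)
{
	if (elapsed < 0)
		return DEMO_RUNNING;
	if (dev_loss_tmo >= 0 && elapsed >= dev_loss_tmo)
		return DEMO_LOST;
	if (fast_io_fail_tmo >= 0 && elapsed >= fast_io_fail_tmo)
		return DEMO_FAIL_FAST;
	return DEMO_BLOCKED;
}
```

A timeout of zero fires immediately in this model, which matches the
documented meaning of zero for both attributes.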
 

[PATCH v3 08/13] IB/srp: Add srp_terminate_io()

2013-07-03 Thread Bart Van Assche
Finish all outstanding I/O requests after fast_io_fail_tmo expired,
which speeds up failover in a multipath setup. This patch is a
reworked version of a patch from Sebastian Riemer.

Reported-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: David Dillow dillo...@ornl.gov
Cc: Roland Dreier rol...@kernel.org
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |   22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index f65701d..8ba4e9c 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -686,17 +686,29 @@ static void srp_free_req(struct srp_target_port *target,
 	spin_unlock_irqrestore(&target->lock, flags);
 }
 
-static void srp_reset_req(struct srp_target_port *target, struct srp_request *req)
+static void srp_finish_req(struct srp_target_port *target,
+			   struct srp_request *req, int result)
 {
 	struct scsi_cmnd *scmnd = srp_claim_req(target, req, NULL);
 
 	if (scmnd) {
 		srp_free_req(target, req, scmnd, 0);
-		scmnd->result = DID_RESET << 16;
+		scmnd->result = result;
 		scmnd->scsi_done(scmnd);
 	}
 }
 
+static void srp_terminate_io(struct srp_rport *rport)
+{
+	struct srp_target_port *target = rport->lld_data;
+	int i;
+
+	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+		struct srp_request *req = &target->req_ring[i];
+		srp_finish_req(target, req, DID_TRANSPORT_FAILFAST << 16);
+	}
+}
+
 static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct Scsi_Host *shost = target->scsi_host;
@@ -723,8 +735,7 @@ static int srp_reconnect_target(struct srp_target_port *target)
 
 	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
 		struct srp_request *req = &target->req_ring[i];
-		if (req->scmnd)
-			srp_reset_req(target, req);
+		srp_finish_req(target, req, DID_RESET << 16);
 	}
 
 	INIT_LIST_HEAD(&target->free_tx);
@@ -1782,7 +1793,7 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
 		struct srp_request *req = &target->req_ring[i];
 		if (req->scmnd && req->scmnd->device == scmnd->device)
-			srp_reset_req(target, req);
+			srp_finish_req(target, req, DID_RESET << 16);
 	}
 
 	return SUCCESS;
@@ -2594,6 +2605,7 @@ static void srp_remove_one(struct ib_device *device)
 
 static struct srp_function_template ib_srp_transport_functions = {
 	.rport_delete		 = srp_rport_delete,
+	.terminate_rport_io	 = srp_terminate_io,
 };
 
 static int __init srp_init_module(void)
-- 
1.7.10.4
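
srp_terminate_io() walks the whole request ring and completes every
request that is still in flight, storing the host status in bits 16
and up of the result word. A user-space sketch of that loop follows.
The ring size, the structure and the function names are hypothetical,
and the value 0x0f used for the fail-fast host status is an assumption
made for this illustration, not a value taken from the patch.

```c
#include <stdint.h>

#define DEMO_RING_SIZE 4
/* Assumed stand-in for the kernel's DID_TRANSPORT_FAILFAST host status. */
#define DEMO_DID_TRANSPORT_FAILFAST 0x0f

/* Hypothetical stand-in for one slot of the request ring. */
struct demo_req {
	int in_flight;		/* non-zero while a command owns this slot */
	uint32_t result;	/* completion status, host status in bits 16+ */
};

/* Complete one request if it is in flight; return 1 if it was. */
static int demo_finish_req(struct demo_req *req, uint32_t result)
{
	if (!req->in_flight)
		return 0;
	req->in_flight = 0;
	req->result = result;
	return 1;
}

/* Fail every outstanding request; return how many were completed. */
int demo_terminate_io(struct demo_req ring[DEMO_RING_SIZE])
{
	int i, done = 0;

	for (i = 0; i < DEMO_RING_SIZE; ++i)
		done += demo_finish_req(&ring[i],
					DEMO_DID_TRANSPORT_FAILFAST << 16);
	return done;
}
```

As in the patch, the per-request helper decides whether a slot is
occupied, so callers can iterate unconditionally. That is why the
explicit req->scmnd test could be dropped from srp_reconnect_target().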



[PATCH v3 10/13] IB/srp: Start timers if a transport layer error occurs

2013-07-03 Thread Bart Van Assche
Start the reconnect timer, fast_io_fail timer and dev_loss timers
if a transport layer error occurs.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: David Dillow dillo...@ornl.gov
Cc: Roland Dreier rol...@kernel.org
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |   19 +++
 drivers/infiniband/ulp/srp/ib_srp.h |1 +
 2 files changed, 20 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 0f69ae1..2557b7a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -595,6 +595,7 @@ static void srp_remove_target(struct srp_target_port *target)
 	srp_disconnect_target(target);
 	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
+	cancel_work_sync(&target->tl_err_work);
 	srp_rport_put(target->rport);
 	srp_free_req_data(target);
 	scsi_host_put(target->scsi_host);
@@ -1364,6 +1365,21 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 			     PFX "Recv failed with error code %d\n", res);
 }
 
+/**
+ * srp_tl_err_work() - handle a transport layer error
+ *
+ * Note: This function may get invoked before the rport has been created,
+ * hence the target->rport test.
+ */
+static void srp_tl_err_work(struct work_struct *work)
+{
+	struct srp_target_port *target;
+
+	target = container_of(work, struct srp_target_port, tl_err_work);
+	if (target->rport)
+		srp_start_tl_fail_timers(target->rport);
+}
+
 static void srp_handle_qp_err(enum ib_wc_status wc_status,
 			      enum ib_wc_opcode wc_opcode,
 			      struct srp_target_port *target)
@@ -1373,6 +1389,7 @@ static void srp_handle_qp_err(enum ib_wc_status wc_status,
 			     PFX "failed %s status %d\n",
 			     wc_opcode & IB_WC_RECV ? "receive" : "send",
 			     wc_status);
+		queue_work(system_long_wq, &target->tl_err_work);
 	}
 	target->qp_in_error = true;
 }
@@ -1735,6 +1752,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 		if (ib_send_cm_drep(cm_id, NULL, 0))
 			shost_printk(KERN_ERR, target->scsi_host,
 				     PFX "Sending CM DREP failed\n");
+		queue_work(system_long_wq, &target->tl_err_work);
 		break;
 
 	case IB_CM_TIMEWAIT_EXIT:
@@ -2379,6 +2397,7 @@ static ssize_t srp_create_target(struct device *dev,
 			     sizeof (struct srp_indirect_buf) +
 			     target->cmd_sg_cnt * sizeof (struct srp_direct_buf);
 
+	INIT_WORK(&target->tl_err_work, srp_tl_err_work);
 	INIT_WORK(&target->remove_work, srp_remove_work);
 	spin_lock_init(&target->lock);
 	INIT_LIST_HEAD(&target->free_tx);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index fda82f7..e45d9d0 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -175,6 +175,7 @@ struct srp_target_port {
 	struct srp_iu	       *rx_ring[SRP_RQ_SIZE];
 	struct srp_request	req_ring[SRP_CMD_SQ_SIZE];
 
+	struct work_struct	tl_err_work;
 	struct work_struct	remove_work;
 
 	struct list_head	list;
-- 
1.7.10.4



[PATCH v3 09/13] IB/srp: Use SRP transport layer error recovery

2013-07-03 Thread Bart Van Assche
Enable reconnect_delay, fast_io_fail_tmo and dev_loss_tmo
functionality for the IB SRP initiator. Add kernel module
parameters for specifying default values for these three
parameters.
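
The two timeout parameters accept either an integer or the word "off",
with "off" stored internally as -1. The parsing side of that
convention can be sketched in user space as follows. The function
name is hypothetical, and strtol() with an optional trailing newline
stands in for the kernel's kstrtoint(); the range validation that the
patch performs via srp_tmo_valid() is left out.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a timeout written as "off" or as an integer.
 * On success store the value in *tmo (-1 for "off") and return 0;
 * on malformed input return -EINVAL and leave *tmo untouched.
 */
int demo_parse_tmo(const char *val, int *tmo)
{
	char *end;
	long v;

	if (strncmp(val, "off", 3) == 0) {
		*tmo = -1;
		return 0;
	}

	errno = 0;
	v = strtol(val, &end, 0);
	if (end == val || errno || v < INT_MIN || v > INT_MAX)
		return -EINVAL;
	if (*end == '\n')	/* tolerate the newline that echo appends */
		end++;
	if (*end != '\0')
		return -EINVAL;

	*tmo = (int)v;
	return 0;
}
```

Reading the parameter back is the inverse: a non-negative value is
printed as a number and a negative one as "off".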

Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: David Dillow dillo...@ornl.gov
Cc: Roland Dreier rol...@kernel.org
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |  123 +--
 drivers/infiniband/ulp/srp/ib_srp.h |1 -
 2 files changed, 88 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8ba4e9c..0f69ae1 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -86,6 +86,31 @@ module_param(topspin_workarounds, int, 0444);
 MODULE_PARM_DESC(topspin_workarounds,
 		 "Enable workarounds for Topspin/Cisco SRP target bugs if != 0");
 
+static int srp_reconnect_delay = 10;
+module_param_named(reconnect_delay, srp_reconnect_delay, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(reconnect_delay, "Time between successive reconnect attempts");
+
+static struct kernel_param_ops srp_tmo_ops;
+
+static int srp_fast_io_fail_tmo = 15;
+module_param_cb(fast_io_fail_tmo, &srp_tmo_ops, &srp_fast_io_fail_tmo,
+		S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(fast_io_fail_tmo,
+		 "Number of seconds between the observation of a transport"
+		 " layer error and failing all I/O. \"off\" means that this"
+		 " functionality is disabled.");
+
+static int srp_dev_loss_tmo = 600;
+module_param_cb(dev_loss_tmo, &srp_tmo_ops, &srp_dev_loss_tmo,
+		S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(dev_loss_tmo,
+		 "Maximum number of seconds that the SRP transport should"
+		 " insulate transport layer errors. After this time has been"
+		 " exceeded the SCSI target is removed. Should be"
+		 " between 1 and " __stringify(SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
+		 " if fast_io_fail_tmo has not been set. \"off\" means that"
+		 " this functionality is disabled.");
+
+
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
 static void srp_recv_completion(struct ib_cq *cq, void *target_ptr);
@@ -102,6 +127,44 @@ static struct ib_client srp_client = {
 
 static struct ib_sa_client srp_sa_client;
 
+static int srp_tmo_get(char *buffer, const struct kernel_param *kp)
+{
+	int tmo = *(int *)kp->arg;
+
+	if (tmo >= 0)
+		return sprintf(buffer, "%d", tmo);
+	else
+		return sprintf(buffer, "off");
+}
+
+static int srp_tmo_set(const char *val, const struct kernel_param *kp)
+{
+	int tmo, res;
+
+	if (strncmp(val, "off", 3) != 0) {
+		res = kstrtoint(val, 0, &tmo);
+		if (res)
+			goto out;
+	} else {
+		tmo = -1;
+	}
+	if (kp->arg == &srp_fast_io_fail_tmo)
+		res = srp_tmo_valid(tmo, srp_dev_loss_tmo);
+	else
+		res = srp_tmo_valid(srp_fast_io_fail_tmo, tmo);
+	if (res)
+		goto out;
+	*(int *)kp->arg = tmo;
+
+out:
+	return res;
+}
+
+static struct kernel_param_ops srp_tmo_ops = {
+	.get = srp_tmo_get,
+	.set = srp_tmo_set,
+};
+
 static inline struct srp_target_port *host_to_target(struct Scsi_Host *host)
 {
 	return (struct srp_target_port *) host->hostdata;
@@ -709,13 +772,20 @@ static void srp_terminate_io(struct srp_rport *rport)
}
 }
 
-static int srp_reconnect_target(struct srp_target_port *target)
+/*
+ * It is up to the caller to ensure that srp_rport_reconnect() calls are
+ * serialized and that no concurrent srp_queuecommand(), srp_abort(),
+ * srp_reset_device() or srp_reset_host() calls will occur while this function
+ * is in progress. One way to realize that is not to call this function
+ * directly but to call srp_reconnect_rport() instead since that last function
+ * serializes calls of this function via rport->mutex and also blocks
+ * srp_queuecommand() calls before invoking this function.
+ */
+static int srp_rport_reconnect(struct srp_rport *rport)
 {
-	struct Scsi_Host *shost = target->scsi_host;
+	struct srp_target_port *target = rport->lld_data;
 	int i, ret;
 
-	scsi_target_block(&shost->shost_gendev);
-
srp_disconnect_target(target);
/*
 * Now get a new local CM ID so that we avoid confusing the target in
@@ -745,28 +815,9 @@ static int srp_reconnect_target(struct srp_target_port *target)
 	if (ret == 0)
 		ret = srp_connect_target(target);
 
-	scsi_target_unblock(&shost->shost_gendev, ret == 0 ? SDEV_RUNNING :
-			    SDEV_TRANSPORT_OFFLINE);
-	target->transport_offline = !!ret;
-
-   if (ret)
-   goto err;
-
-   shost_printk(KERN_INFO, 
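
As a rough illustration of the "off"-or-integer convention used by srp_tmo_get() and srp_tmo_set() in the patch above, here is a user-space sketch. The names tmo_parse() and tmo_format() are hypothetical, strtol() stands in for the kernel's kstrtoint(), every parse failure is collapsed into -EINVAL, and the cross-check against srp_tmo_valid() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for srp_tmo_set(): "off" maps to -1, anything else
 * must parse as an int. Returns 0 on success or -EINVAL on failure.
 */
static int tmo_parse(const char *val, int *tmo)
{
	char *end;
	long v;

	assert(val != NULL && tmo != NULL);
	if (strncmp(val, "off", 3) == 0) {
		*tmo = -1;
		return 0;
	}
	errno = 0;
	v = strtol(val, &end, 0);
	if (end == val || errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -EINVAL;
	/* Tolerate one trailing newline, as a sysfs write would supply. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	*tmo = (int)v;
	return 0;
}

/* Hypothetical stand-in for srp_tmo_get(): negative values print as "off". */
static int tmo_format(char *buffer, size_t len, int tmo)
{
	if (tmo >= 0)
		return snprintf(buffer, len, "%d", tmo);
	return snprintf(buffer, len, "off");
}
```

With this sketch, tmo_parse("off", &t) stores -1 and tmo_format() turns any negative value back into "off", so a value written to the parameter reads back in the same form.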

[PATCH v3 11/13] IB/srp: Make HCA completion vector configurable

2013-07-03 Thread Bart Van Assche
Several InfiniBand HCAs allow the completion vector to be configured
per queue pair. This makes it possible to spread the workload created
by IB completion interrupts over multiple MSI-X vectors and hence over
multiple CPU cores. In other words, configuring the completion
vector properly makes it possible not only to reduce latency on an
initiator connected to multiple SRP targets but also to improve
throughput.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: David Dillow dillo...@ornl.gov
Cc: Roland Dreier rol...@kernel.org
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |7 +++
 drivers/infiniband/ulp/srp/ib_srp.c  |   26 --
 drivers/infiniband/ulp/srp/ib_srp.h  |1 +
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
index 481aae9..5c53d28 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -54,6 +54,13 @@ Description:	Interface for making ib_srp connect to a new target.
  ib_srp. Specifying a value that exceeds cmd_sg_entries is
  only safe with partial memory descriptor list support enabled
  (allow_ext_sg=1).
+   * comp_vector, a number in the range 0..n-1 specifying the
+ MSI-X completion vector. Some HCA's allocate multiple (n)
+ MSI-X vectors per HCA port. If the IRQ affinity masks of
+ these interrupts have been configured such that each MSI-X
+ interrupt is handled by a different CPU then the comp_vector
+ parameter can be used to spread the SRP completion workload
+ over multiple CPU's.
 
 What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev
 Date:  January 2, 2006
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 2557b7a..6c164f6 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -294,14 +294,16 @@ static int srp_create_target_ib(struct srp_target_port *target)
 		return -ENOMEM;
 
 	recv_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_recv_completion, NULL, target, SRP_RQ_SIZE, 0);
+			       srp_recv_completion, NULL, target, SRP_RQ_SIZE,
+			       target->comp_vector);
 	if (IS_ERR(recv_cq)) {
 		ret = PTR_ERR(recv_cq);
 		goto err;
 	}
 
 	send_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_send_completion, NULL, target, SRP_SQ_SIZE, 0);
+			       srp_send_completion, NULL, target, SRP_SQ_SIZE,
+			       target->comp_vector);
 	if (IS_ERR(send_cq)) {
 		ret = PTR_ERR(send_cq);
 		goto err_recv_cq;
@@ -1976,6 +1978,14 @@ static ssize_t show_local_ib_device(struct device *dev,
 	return sprintf(buf, "%s\n", target->srp_host->srp_dev->dev->name);
 }
 
+static ssize_t show_comp_vector(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+	return sprintf(buf, "%d\n", target->comp_vector);
+}
+
 static ssize_t show_cmd_sg_entries(struct device *dev,
   struct device_attribute *attr, char *buf)
 {
@@ -2002,6 +2012,7 @@ static DEVICE_ATTR(req_lim,         S_IRUGO, show_req_lim,         NULL);
 static DEVICE_ATTR(zero_req_lim,    S_IRUGO, show_zero_req_lim,    NULL);
 static DEVICE_ATTR(local_ib_port,   S_IRUGO, show_local_ib_port,   NULL);
 static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL);
+static DEVICE_ATTR(comp_vector,     S_IRUGO, show_comp_vector,     NULL);
 static DEVICE_ATTR(cmd_sg_entries,  S_IRUGO, show_cmd_sg_entries,  NULL);
 static DEVICE_ATTR(allow_ext_sg,    S_IRUGO, show_allow_ext_sg,    NULL);
 
@@ -2016,6 +2027,7 @@ static struct device_attribute *srp_host_attrs[] = {
 	&dev_attr_zero_req_lim,
 	&dev_attr_local_ib_port,
 	&dev_attr_local_ib_device,
+	&dev_attr_comp_vector,
 	&dev_attr_cmd_sg_entries,
 	&dev_attr_allow_ext_sg,
NULL
@@ -2140,6 +2152,7 @@ enum {
 	SRP_OPT_CMD_SG_ENTRIES	= 1 << 9,
 	SRP_OPT_ALLOW_EXT_SG	= 1 << 10,
 	SRP_OPT_SG_TABLESIZE	= 1 << 11,
+	SRP_OPT_COMP_VECTOR	= 1 << 12,
SRP_OPT_ALL = (SRP_OPT_ID_EXT   |
   SRP_OPT_IOC_GUID |
   SRP_OPT_DGID |
@@ -2160,6 +2173,7 @@ static const match_table_t srp_opt_tokens = {
 	{ SRP_OPT_CMD_SG_ENTRIES,	"cmd_sg_entries=%u"	},
{ SRP_OPT_ALLOW_EXT_SG,   

[PATCH v3 12/13] IB/srp: Make transport layer retry count configurable

2013-07-03 Thread Bart Van Assche
Allow the InfiniBand RC retry count to be configured by the user
as an option in the target login string. Reducing this retry count
helps with reducing path failover time.

[bvanassche: Rewrote patch description / changed default retry count from 2 
back to 7]
Signed-off-by: Vu Pham v...@mellanox.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |2 ++
 drivers/infiniband/ulp/srp/ib_srp.c  |   24 +++-
 drivers/infiniband/ulp/srp/ib_srp.h  |1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
index 5c53d28..18e9b27 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -61,6 +61,8 @@ Description:	Interface for making ib_srp connect to a new target.
  interrupt is handled by a different CPU then the comp_vector
  parameter can be used to spread the SRP completion workload
  over multiple CPU's.
+   * tl_retry_count, a number in the range 2..7 specifying the
+ IB RC retry count.
 
 What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev
 Date:  January 2, 2006
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 6c164f6..91b2d04 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -453,7 +453,7 @@ static int srp_send_req(struct srp_target_port *target)
 	req->param.responder_resources        = 4;
 	req->param.remote_cm_response_timeout = 20;
 	req->param.local_cm_response_timeout  = 20;
-	req->param.retry_count                = 7;
+	req->param.retry_count                = target->tl_retry_count;
 	req->param.rnr_retry_count            = 7;
 	req->param.max_cm_retries             = 15;
 
@@ -1986,6 +1986,14 @@ static ssize_t show_comp_vector(struct device *dev,
 	return sprintf(buf, "%d\n", target->comp_vector);
 }
 
+static ssize_t show_tl_retry_count(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+	return sprintf(buf, "%d\n", target->tl_retry_count);
+}
+
 static ssize_t show_cmd_sg_entries(struct device *dev,
   struct device_attribute *attr, char *buf)
 {
@@ -2013,6 +2021,7 @@ static DEVICE_ATTR(zero_req_lim,    S_IRUGO, show_zero_req_lim,    NULL);
 static DEVICE_ATTR(local_ib_port,   S_IRUGO, show_local_ib_port,   NULL);
 static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL);
 static DEVICE_ATTR(comp_vector,     S_IRUGO, show_comp_vector,     NULL);
+static DEVICE_ATTR(tl_retry_count,  S_IRUGO, show_tl_retry_count,  NULL);
 static DEVICE_ATTR(cmd_sg_entries,  S_IRUGO, show_cmd_sg_entries,  NULL);
 static DEVICE_ATTR(allow_ext_sg,    S_IRUGO, show_allow_ext_sg,    NULL);
 
@@ -2028,6 +2037,7 @@ static struct device_attribute *srp_host_attrs[] = {
 	&dev_attr_local_ib_port,
 	&dev_attr_local_ib_device,
 	&dev_attr_comp_vector,
+	&dev_attr_tl_retry_count,
 	&dev_attr_cmd_sg_entries,
 	&dev_attr_allow_ext_sg,
NULL
@@ -2153,6 +2163,7 @@ enum {
 	SRP_OPT_ALLOW_EXT_SG	= 1 << 10,
 	SRP_OPT_SG_TABLESIZE	= 1 << 11,
 	SRP_OPT_COMP_VECTOR	= 1 << 12,
+	SRP_OPT_TL_RETRY_COUNT	= 1 << 13,
SRP_OPT_ALL = (SRP_OPT_ID_EXT   |
   SRP_OPT_IOC_GUID |
   SRP_OPT_DGID |
@@ -2174,6 +2185,7 @@ static const match_table_t srp_opt_tokens = {
 	{ SRP_OPT_ALLOW_EXT_SG,		"allow_ext_sg=%u"	},
 	{ SRP_OPT_SG_TABLESIZE,		"sg_tablesize=%u"	},
 	{ SRP_OPT_COMP_VECTOR,		"comp_vector=%u"	},
+	{ SRP_OPT_TL_RETRY_COUNT,	"tl_retry_count=%u"	},
{ SRP_OPT_ERR,  NULL}
 };
 
@@ -2337,6 +2349,15 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 			target->comp_vector = token;
 			break;
 
+		case SRP_OPT_TL_RETRY_COUNT:
+			if (match_int(args, &token) || token < 2 || token > 7) {
+				pr_warn("bad tl_retry_count parameter '%s' (must be a number between 2 and 7)\n",
+					p);
+				goto out;
+			}
+			target->tl_retry_count = token;
+			break;
+
default:
pr_warn(unknown parameter or 
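
The range check that the SRP_OPT_TL_RETRY_COUNT case applies to the login string can be sketched in user space as follows. This is only an approximation under stated assumptions: parse_tl_retry_count() is a hypothetical name, sscanf() replaces the kernel's match_token()/match_int() helpers, and only a single "tl_retry_count=N" option is handled rather than a full comma-separated login string.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the SRP_OPT_TL_RETRY_COUNT case: accept
 * "tl_retry_count=N" only when 2 <= N <= 7, otherwise return -EINVAL
 * and leave *retry_count untouched.
 */
static int parse_tl_retry_count(const char *opt, int *retry_count)
{
	int token;
	char trailing;

	assert(opt != NULL && retry_count != NULL);
	/* Exactly one conversion must succeed; a second one means junk follows. */
	if (sscanf(opt, "tl_retry_count=%d%c", &token, &trailing) != 1)
		return -EINVAL;
	if (token < 2 || token > 7)
		return -EINVAL;
	*retry_count = token;
	return 0;
}
```

A caller would keep the default of 7 in *retry_count and overwrite it only when the function returns 0, which matches the "goto out" on a bad value in the patch.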

[PATCH v3 13/13] IB/srp: Bump driver version and release date

2013-07-03 Thread Bart Van Assche
Signed-off-by: Vu Pham v...@mellanox.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow dillo...@ornl.gov
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 91b2d04..fa38bc3 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -53,8 +53,8 @@
 
 #define DRV_NAME	"ib_srp"
 #define PFX		DRV_NAME ": "
-#define DRV_VERSION	"0.2"
-#define DRV_RELDATE	"November 1, 2005"
+#define DRV_VERSION	"1.0"
+#define DRV_RELDATE	"July 1, 2013"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator "
-- 
1.7.10.4



Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11

2013-07-03 Thread Or Gerlitz

On 03/07/2013 15:41, Bart Van Assche wrote:


[...]

Bart,


The individual patches in this series are as follows:
0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch
0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch
0003-IB-srp-Avoid-that-srp_reset_host-is-skipped-after-a-.patch
0004-IB-srp-Fail-I-O-fast-if-target-offline.patch
0005-IB-srp-Skip-host-settle-delay.patch
0006-IB-srp-Maintain-a-single-connection-per-I_T-nexus.patch
0007-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
0008-scsi_transport_srp-Add-transport-layer-error-handlin.patch
0009-IB-srp-Add-srp_terminate_io.patch
0010-IB-srp-Use-SRP-transport-layer-error-recovery.patch
0011-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
0012-IB-srp-Fail-SCSI-commands-silently.patch
0013-IB-srp-Make-HCA-completion-vector-configurable.patch
0014-IB-srp-Make-transport-layer-retry-count-configurable.patch
0015-IB-srp-Bump-driver-version-and-release-date.patch


Some of these patches were already picked up by Roland (see below). I
would suggest that you post V4 and drop the ones which were accepted.

e8ca413 IB/srp: Bump driver version and release date
4b5e5f4 IB/srp: Make HCA completion vector configurable
96fc248 IB/srp: Maintain a single connection per I_T nexus
99e1c13 IB/srp: Fail I/O fast if target offline
2742c1d IB/srp: Skip host settle delay
086f44f IB/srp: Avoid skipping srp_reset_host() after a transport error
1fe0cb8 IB/srp: Fix remove_one crash due to resource exhaustion

Also, it would help if you used the --cover-letter option of git
format-patch: the resulting cover letter (patch 0/N) has standard
content to which you can add your own text.



Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 14:55 +0200, Bart Van Assche wrote:
 Finish all outstanding I/O requests after fast_io_fail_tmo expired,
 which speeds up failover in a multipath setup. This patch is a
 reworked version of a patch from Sebastian Riemer.
 
 Reported-by: Sebastian Riemer sebastian.rie...@profitbricks.com
 Signed-off-by: Bart Van Assche bvanass...@acm.org
 Acked-by: David Dillow dillo...@ornl.gov

I don't believe I ack'd this; I don't want the callers doing the result
shift, do it in srp_finish_req().




Re: [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable

2013-07-03 Thread David Dillow
On Tue, 2013-07-02 at 13:18 -0600, Jason Gunthorpe wrote:
 On Mon, Jul 01, 2013 at 07:26:05AM -0400, David Dillow wrote:
  You assume independent failures, which is suspect -- many times these
  are data-dependent, or so I tend to think. Jason, do you have any
  insight on this (overall) topic you could share?
 
 All data transmitted on modern serial links is 'whitened'
 somehow. This is done independently on a link-by-link basis either
 with 8b/10b coding or with the 64b/66b scrambler. So the idea of a
 high level 'magic packet' that causes data-dependent errors is not
 statistically likely.

My thought was that if we hit a statistically unlikely pattern that
caused an issue, the retransmission is likely to also hit the issue
given the deterministic scrambling. But I didn't think about the fact
that the signal stream was being whitened.

 It is best to use all the information the SM provides when setting up
 the path, however I don't think there is a best practice idea yet for
 how to setup the retry count though..

Hmm, that would be a useful presentation for the workshop; I'll have to
see if I can get some people interested here.

Thanks for the information,
Dave


Re: [PATCH v3 12/13] IB/srp: Make transport layer retry count configurable

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 14:59 +0200, Bart Van Assche wrote:
 Allow the InfiniBand RC retry count to be configured by the user
 as an option in the target login string. Reducing this retry count
 helps with reducing path failover time.
 
 [bvanassche: Rewrote patch description / changed default retry count from 2 
 back to 7]

Acked-by: David Dillow dillo...@ornl.gov


Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11

2013-07-03 Thread Bart Van Assche

On 07/03/13 15:38, Or Gerlitz wrote:

Some of these patches were already picked by Roland (SB), I would
suggest that you post V4 and drop the ones which were accepted.


One of the patches that is already in Roland's tree and that was in v1 
of this series has been split into two patches in v2 and v3 of this 
series. So I'd like to hear from Roland what he prefers himself - that I 
drop the patches that are already in his tree or that Roland updates his 
tree with the most recently posted patches.


Bart.


Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()

2013-07-03 Thread Bart Van Assche

On 07/03/13 16:08, David Dillow wrote:

On Wed, 2013-07-03 at 14:55 +0200, Bart Van Assche wrote:

Finish all outstanding I/O requests after fast_io_fail_tmo expired,
which speeds up failover in a multipath setup. This patch is a
reworked version of a patch from Sebastian Riemer.

Reported-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Acked-by: David Dillow dillo...@ornl.gov


I don't believe I ack'd this; I don't want the callers doing the result
shift, do it in srp_finish_req().


My apologies. You are correct, this patch was not yet acknowledged by you.

Regarding the shift itself: is it really that important whether the 
caller or callee performs that shift ? Having it in the caller has the 
advantage that the compiler can optimize the shift operation out because 
the number that is being shifted left is a constant. And if later on it 
would be necessary to set more fields of the SCSI result in a caller of 
srp_finish_req() then that will be possible without having to modify the 
srp_finish_req() function itself.
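
To make the two options under discussion concrete, here is a small sketch of caller-side versus callee-side shifting. All names (struct ex_cmd, finish_req_result(), finish_req_host_byte()) are hypothetical and are not the driver's actual functions; EX_DID_TRANSPORT_FAILFAST is assumed to mirror the kernel's DID_TRANSPORT_FAILFAST host byte, which sits in bits 16..23 of a SCSI result.

```c
#include <assert.h>

#define HOST_BYTE_SHIFT 16              /* host byte occupies bits 16..23 */
#define EX_DID_TRANSPORT_FAILFAST 0x0f  /* assumed to mirror DID_TRANSPORT_FAILFAST */

/* Minimal stand-in for a SCSI command; only the result field matters here. */
struct ex_cmd {
	int result;
};

/* Variant A: the caller passes an already shifted result. */
static void finish_req_result(struct ex_cmd *cmd, int result)
{
	cmd->result = result;
}

/* Variant B: the callee performs the shift, callers pass the host byte. */
static void finish_req_host_byte(struct ex_cmd *cmd, int host_byte)
{
	cmd->result = host_byte << HOST_BYTE_SHIFT;
}
```

Both variants store the same value for the same host byte. Variant A leaves room for callers to set other result fields, while variant B keeps the shift in one place.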


Bart.



Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 16:45 +0200, Bart Van Assche wrote:
 Having it in the caller has the 
 advantage that the compiler can optimize the shift operation out because 
 the number that is being shifted left is a constant.

srp_finish_req() is likely to be inlined, so the compiler will be able
to make this optimization. Regardless, this is so far in the noise that
it loses to readability.

  And if later on it 
 would be necessary to set more fields of the SCSI result in a caller of 
 srp_finish_req() then that will be possible without having to modify the 
 srp_finish_req() function itself.

Other than REQ_QUIET, what do you think would need to be added? I think
we can cross that bridge when we get there, as I don't think REQ_QUIET
should be set in the LLDs.


Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 10:57 -0400, David Dillow wrote:
 On Wed, 2013-07-03 at 16:45 +0200, Bart Van Assche wrote:
  Having it in the caller has the 
  advantage that the compiler can optimize the shift operation out because 
  the number that is being shifted left is a constant.
 
 srp_finish_req() is likely to be inlined, so the compiler will be able
 to make this optimization. Regardless, this is so far in the noise that
 it loses to readability.

Eh, just leave it alone. As much as I don't like it, it does look to be
fairly common among the LLDs and other transport code.

Acked-by: David Dillow dillo...@ornl.gov


Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote:
 +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
 +{
 +	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
 +		fast_io_fail_tmo < dev_loss_tmo) &&
 +		fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT &&
 +		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
 +}
 +EXPORT_SYMBOL_GPL(srp_tmo_valid);

This would have been more readable:

int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* Fast IO fail must be off, or no greater than the max timeout */
	if (fast_io_fail_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
		return -EINVAL;

	/* Device timeout must be off, or fit into jiffies */
	if (dev_loss_tmo >= LONG_MAX / HZ)
		return -EINVAL;

	/* Fast IO must trigger before device loss, or one of the
	 * timeouts must be disabled.
	 */
	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
		return 0;
	if (fast_io_fail_tmo < dev_loss_tmo)
		return 0;

	return -EINVAL;
}

Though, now that I've unpacked it -- I don't think it is OK for
dev_loss_tmo to be off, but fast IO to be on? That drops another
conditional.

Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if
fast_io_fail_tmo is off; I agree with your reasoning about leaving it
unlimited if fast fail is on, but does that still hold if it is off?
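
For readers who want to compare the two forms of the check, the following user-space sketch places the compact expression next to the unpacked one and tests that they agree. The function names are hypothetical, BLOCK_MAX_TIMEOUT is assumed to equal the kernel's SCSI_DEVICE_BLOCK_MAX_TIMEOUT (taken to be 600 here), and ASSUMED_HZ is a placeholder because HZ depends on the kernel configuration.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define BLOCK_MAX_TIMEOUT 600	/* assumed value of SCSI_DEVICE_BLOCK_MAX_TIMEOUT */
#define ASSUMED_HZ 1000		/* placeholder; HZ is configuration dependent */

/* Compact form: one boolean expression, 0 if valid, -EINVAL otherwise. */
static int tmo_valid_compact(int fast_io_fail_tmo, int dev_loss_tmo)
{
	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
		fast_io_fail_tmo < dev_loss_tmo) &&
		fast_io_fail_tmo <= BLOCK_MAX_TIMEOUT &&
		dev_loss_tmo < LONG_MAX / ASSUMED_HZ ? 0 : -EINVAL;
}

/* Unpacked form: the same conditions as early returns. */
static int tmo_valid_unpacked(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* Fast I/O fail must be off, or no greater than the max timeout. */
	if (fast_io_fail_tmo > BLOCK_MAX_TIMEOUT)
		return -EINVAL;
	/* Device loss timeout must be off, or fit into jiffies. */
	if (dev_loss_tmo >= LONG_MAX / ASSUMED_HZ)
		return -EINVAL;
	/* Either timeout disabled: no ordering constraint applies. */
	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
		return 0;
	/* Otherwise fast I/O fail must trigger before device loss. */
	if (fast_io_fail_tmo < dev_loss_tmo)
		return 0;
	return -EINVAL;
}

/* Returns 1 if both forms agree for every pair in [-1, 700] x [-1, 700]. */
static int tmo_forms_agree(void)
{
	int f, d;

	for (f = -1; f <= 700; f++)
		for (d = -1; d <= 700; d++)
			if (tmo_valid_compact(f, d) != tmo_valid_unpacked(f, d))
				return 0;
	return 1;
}
```

Under these assumptions the pair (15, 600) is accepted, (600, 15) is rejected, and a fast_io_fail_tmo of 601 is rejected even when dev_loss_tmo is off.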





Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread Bart Van Assche

On 07/03/13 17:14, David Dillow wrote:

On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote:

+int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
+{
+	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
+		fast_io_fail_tmo < dev_loss_tmo) &&
+		fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT &&
+		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
+}
+EXPORT_SYMBOL_GPL(srp_tmo_valid);


This would have been more readable:

int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* Fast IO fail must be off, or no greater than the max timeout */
	if (fast_io_fail_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
		return -EINVAL;

	/* Device timeout must be off, or fit into jiffies */
	if (dev_loss_tmo >= LONG_MAX / HZ)
		return -EINVAL;

	/* Fast IO must trigger before device loss, or one of the
	 * timeouts must be disabled.
	 */
	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
		return 0;
	if (fast_io_fail_tmo < dev_loss_tmo)
		return 0;

	return -EINVAL;
}


Isn't it a matter of personal taste which of the above two is
clearer? It might also depend on the number of mathematics courses in
someone's educational background :-)



Though, now that I've unpacked it -- I don't think it is OK for
dev_loss_tmo to be off, but fast IO to be on? That drops another
conditional.


The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine 
in my tests. An I/O failure was detected shortly after the cable to the 
target was pulled. I/O resumed shortly after the cable to the target was 
reinserted.



Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if
fast_io_fail_tmo is off; I agree with your reasoning about leaving it
unlimited if fast fail is on, but does that still hold if it is off?


I think setting dev_loss_tmo to a large value only makes sense if the 
value of reconnect_delay is not too large. Setting both to a large value 
would result in slow recovery after a transport layer failure has been 
corrected.


Bart.



Re: PATCH: opensm enhancements

2013-07-03 Thread Jeff Becker

Hi Hal,

I have some testing info about the second patch below.

On 07/03/2013 03:23 AM, Hal Rosenstock wrote:

HI Jeff,

On 6/26/2013 5:24 PM, Jeff Becker wrote:

Hi Hal. At the OFA workshop, I mentioned that I've been working on some
modifications to opensm that we use at NASA. Following extensive testing
of these applied to opensm 3.3.13 (the version we run here), I have
ported these to top of tree opensm, and have tested them on a small
cluster.

Thanks for getting this done! For future reference, patches should be
sent as plain text as this makes it easier to comment.


OK. So I just send the output of git-format-patch directly? It appears 
to be formatted properly.



The first patch modifies the console logflush command to take on or
off as an argument for toggling.

Thanks. Applied.


The second (more extensive) patch
adds a command line option to specify a file in which each line contains
a switch GUID/port pair to be ignored by opensm. The idea is to specify
this file when you start opensm (it can be empty), and add ports to
ignore (one per line for each end of a connection) to the file. At the
next heavy sweep (or HUP) the sm will reprogram the forwarding tables
without including the ignored links. We use this for replacing cables,
as well as for system expansion (adding new racks).

I'll comment on this one later.


Dale (cc'd) did some testing with my patch on Pleiades in preparation 
for a system augmentation (new racks) happening soon. He found that the 
SM correctly produces routes that do not use links marked to be ignored, 
but when you then remove or disable the links, the SM re-routes the 
fabric anyway and comes up with different routes than before. This 
rerouting causes problems with existing connections. There also appears 
to be a bookkeeping problem such that some of these links get added to 
the SM's light sampling list and never get removed. This ties up 
outstanding MAD packet slots, causing the SM to become unresponsive for 
several seconds every time it reviews its light sampling list.


I'm working on fixing these. I'll take care of the second problem 
(incorrectly getting added to the light sampling list) first. Is it 
possible this problem is related to the re-routing on port disable 
problem? Anyhow, if you have any specific comments about these issues, 
that would be great. Thanks, and have a great Fourth of July.


-jeff


-- Hal


Please let me know if you have any questions/issues with these. Thanks.

-jeff




Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-03 Thread Or Gerlitz

On 01/07/2013 20:49, Roland Dreier wrote:

- I think the active flag for the health check timer is unnecessary.
It can just be stopped with del_timer_sync().


Hi Roland

Jack looked at this comment/code and he says that the active flag is used
to prevent re-scheduling the timer from inside the timer handling routine.

In the kernel, the comment header in the source file for del_timer_sync
explicitly states that re-scheduling the timer must be prevented,
or the sync is useless: "Callers must prevent restarting of the timer,
otherwise this function is meaningless."

So we believe that code should remain.

Or.


[PATCH opensm] Add flags to OSM_EVENT_ID_UCAST_ROUTING_DONE

2013-07-03 Thread Hal Rosenstock

to be able to discern between ucast routing done when rerouting
versus heavy sweep.

Signed-off-by: Hal Rosenstock h...@mellanox.com
---
diff --git a/include/opensm/osm_event_plugin.h b/include/opensm/osm_event_plugin.h
index 6b060e7..ca5a719 100644
--- a/include/opensm/osm_event_plugin.h
+++ b/include/opensm/osm_event_plugin.h
@@ -94,6 +94,12 @@ typedef enum {
 	LFT_CHANGED_BLOCK = (1 << 1)
 } osm_epi_lft_change_flags_t;
 
+typedef enum {
+   UCAST_ROUTING_NONE,
+   UCAST_ROUTING_HEAVY_SWEEP,
+   UCAST_ROUTING_REROUTE
+} osm_epi_ucast_routing_flags_t;
+
 typedef struct osm_epi_lft_change_event {
osm_switch_t *p_sw;
osm_epi_lft_change_flags_t flags;
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index 1b73834..0cc8162 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -1190,7 +1190,7 @@ static void do_sweep(osm_sm_t * sm)
 						"REROUTE COMPLETE");
 				osm_opensm_report_event(sm->p_subn->p_osm,
OSM_EVENT_ID_UCAST_ROUTING_DONE,
-   NULL);
+   (void *) UCAST_ROUTING_REROUTE);
return;
}
}
@@ -1387,7 +1387,8 @@ repeat_discovery:
 	OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
 			"SWITCHES CONFIGURED FOR UNICAST");
 	osm_opensm_report_event(sm->p_subn->p_osm,
-   OSM_EVENT_ID_UCAST_ROUTING_DONE, NULL);
+   OSM_EVENT_ID_UCAST_ROUTING_DONE,
+   (void *) UCAST_ROUTING_HEAVY_SWEEP);
 
if (!sm-p_subn-opt.disable_multicast) {
osm_mcast_mgr_process(sm, TRUE);
diff --git a/osmeventplugin/src/osmeventplugin.c b/osmeventplugin/src/osmeventplugin.c
index c5655fe..1eaf7ea 100644
--- a/osmeventplugin/src/osmeventplugin.c
+++ b/osmeventplugin/src/osmeventplugin.c
@@ -195,7 +195,7 @@ static void report(void *_log, osm_epi_event_id_t event_id, void *event_data)
 		fprintf(log->log_file, "Heavy sweep completed\n");
 		break;
 	case OSM_EVENT_ID_UCAST_ROUTING_DONE:
-		fprintf(log->log_file, "Unicast routing completed\n");
+		fprintf(log->log_file, "Unicast routing completed %d\n", event_data);
 		break;
 	case OSM_EVENT_ID_STATE_CHANGE:
 		fprintf(log->log_file, "SM state changed\n");
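
Since the patch passes an enum value through the void *event_data argument, a plugin has to convert the pointer back to an integer before formatting it. The sketch below shows one portable way to do that round trip. The helper names encode_flag(), decode_flag() and format_routing_done() are hypothetical and not part of opensm, and the enum is copied locally only so that the example stands alone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local copy of the enum from the patch, so the sketch is self-contained. */
typedef enum {
	UCAST_ROUTING_NONE,
	UCAST_ROUTING_HEAVY_SWEEP,
	UCAST_ROUTING_REROUTE
} ucast_routing_flags_t;

/* Producer side: carry the flag in the pointer value itself. */
static void *encode_flag(ucast_routing_flags_t flag)
{
	return (void *)(uintptr_t)flag;
}

/* Consumer side: recover the integer before handing it to a %d format. */
static int decode_flag(void *event_data)
{
	return (int)(uintptr_t)event_data;
}

/* Formats the log line a plugin might emit for the routing-done event. */
static int format_routing_done(char *buf, size_t len, void *event_data)
{
	return snprintf(buf, len, "Unicast routing completed %d\n",
			decode_flag(event_data));
}
```

Going through uintptr_t avoids handing a pointer directly to a %d conversion, and a NULL event_data from an older caller decodes to UCAST_ROUTING_NONE.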


[PATCH V2 4/9] IB/core: Add reserved values to enums for low-level drivers use

2013-07-03 Thread Or Gerlitz
From: Jack Morgenstein ja...@dev.mellanox.co.il

Continue the approach taken by commit d2b57063e4a "IB/core: Reserve bits in
enum ib_qp_create_flags for low-level driver use" and add reserved entries to
the ib_qp_type and ib_wr_opcode enums. The low-level drivers will then define
macros to use these reserved values, giving proper names to the macros for
readability. Also add a range of reserved flags to enum ib_send_flags.

The mlx5 IB driver uses the new additions.

Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
---
 include/rdma/ib_verbs.h |   35 +--
 1 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..645c3ce 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -610,7 +610,21 @@ enum ib_qp_type {
IB_QPT_RAW_PACKET = 8,
IB_QPT_XRC_INI = 9,
IB_QPT_XRC_TGT,
-   IB_QPT_MAX
+   IB_QPT_MAX,
+   /* Reserve a range for qp types internal to the low level driver.
+* These qp types will not be visible at the IB core layer, so the
+* IB_QPT_MAX usages should not be affected in the core layer
+*/
+   IB_QPT_RESERVED1 = 0x1000,
+   IB_QPT_RESERVED2,
+   IB_QPT_RESERVED3,
+   IB_QPT_RESERVED4,
+   IB_QPT_RESERVED5,
+   IB_QPT_RESERVED6,
+   IB_QPT_RESERVED7,
+   IB_QPT_RESERVED8,
+   IB_QPT_RESERVED9,
+   IB_QPT_RESERVED10,
 };
 
 enum ib_qp_create_flags {
@@ -766,6 +780,19 @@ enum ib_wr_opcode {
IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
IB_WR_BIND_MW,
+   /* reserve values for low level drivers' internal use.
+* These values will not be used at all in the ib core layer.
+*/
+   IB_WR_RESERVED1 = 0xf0,
+   IB_WR_RESERVED2,
+   IB_WR_RESERVED3,
+   IB_WR_RESERVED4,
+   IB_WR_RESERVED5,
+   IB_WR_RESERVED6,
+   IB_WR_RESERVED7,
+   IB_WR_RESERVED8,
+   IB_WR_RESERVED9,
+   IB_WR_RESERVED10,
 };
 
 enum ib_send_flags {
@@ -773,7 +800,11 @@ enum ib_send_flags {
 	IB_SEND_SIGNALED	= (1<<1),
 	IB_SEND_SOLICITED	= (1<<2),
 	IB_SEND_INLINE		= (1<<3),
-	IB_SEND_IP_CSUM		= (1<<4)
+	IB_SEND_IP_CSUM		= (1<<4),
+
+	/* reserve bits 26-31 for low level drivers' internal use */
+	IB_SEND_RESERVED_START	= (1 << 26),
+	IB_SEND_RESERVED_END	= (1 << 31),
 };
 
 struct ib_sge {
-- 
1.7.1
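
Editorial note: the idea of the patch above is that the core enums set aside a range of values, and a low-level driver gives them names with its own macros. The sketch below illustrates this for the send flags only. It is a userspace approximation, not kernel code: the `SKETCH_` constants are unsigned stand-ins for the enum entries (to avoid shifting into the sign bit), and the helper functions are hypothetical. `MLX5_IB_SEND_UMR_UNREG` is the name the mlx5 patches in this series map onto the first reserved bit.

```c
#include <assert.h>

/* Unsigned stand-ins for the enum ib_send_flags entries used here. */
#define SKETCH_SEND_IP_CSUM		(1u << 4)
#define SKETCH_SEND_RESERVED_START	(1u << 26)
#define SKETCH_SEND_RESERVED_END	(1u << 31)

/* The low-level driver names one reserved bit for its own use. */
#define MLX5_IB_SEND_UMR_UNREG		SKETCH_SEND_RESERVED_START

/* Hypothetical helper: mask of all bits from RESERVED_START to RESERVED_END. */
static unsigned int send_reserved_mask(void)
{
	unsigned int mask = 0;
	unsigned int bit;

	assert(SKETCH_SEND_RESERVED_START <= SKETCH_SEND_RESERVED_END);
	for (bit = SKETCH_SEND_RESERVED_START; ; bit <<= 1) {
		mask |= bit;
		if (bit == SKETCH_SEND_RESERVED_END)
			break;
	}
	return mask;
}

/* Hypothetical helper: the driver-private part of a flags word. */
static unsigned int send_flags_driver_part(unsigned int flags)
{
	return flags & send_reserved_mask();
}

/* Hypothetical helper: the part of a flags word the core layer defines. */
static unsigned int send_flags_core_part(unsigned int flags)
{
	return flags & ~send_reserved_mask();
}
```

Because the driver-private bits sit in a range the core never assigns, a driver can test for them without duplicating the core enum.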



[PATCH V2 5/9] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5

2013-07-03 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/hw/mlx5/ah.c   |   95 
 drivers/infiniband/hw/mlx5/cq.c   |  844 +
 drivers/infiniband/hw/mlx5/doorbell.c |  100 
 drivers/infiniband/hw/mlx5/mad.c  |  139 ++
 4 files changed, 1178 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/ah.c
 create mode 100644 drivers/infiniband/hw/mlx5/cq.c
 create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c
 create mode 100644 drivers/infiniband/hw/mlx5/mad.c

diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
new file mode 100644
index 000..ff8f1cb
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -0,0 +1,95 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "mlx5_ib.h"
+
+struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr,
+			   struct mlx5_ib_ah *ah)
+{
+	u32 sgi;
+
+	if (ah_attr->ah_flags & IB_AH_GRH) {
+		sgi = ah_attr->grh.sgid_index << 20;
+
+		memcpy(&ah->av.rgid, &ah_attr->grh.dgid, 16);
+		ah->av.grh_gid_fl = cpu_to_be32(ah_attr->grh.flow_label |
+						(1 << 30) | sgi);
+		ah->av.hop_limit = ah_attr->grh.hop_limit;
+		ah->av.tclass = ah_attr->grh.traffic_class;
+	}
+
+	ah->av.rlid = cpu_to_be16(ah_attr->dlid);
+	ah->av.fl_mlid = ah_attr->src_path_bits & 0x7f;
+	ah->av.stat_rate_sl = (ah_attr->static_rate << 4) | (ah_attr->sl & 0xf);
+
+	return &ah->ibah;
+}
+
+struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+{
+	struct mlx5_ib_ah *ah;
+
+	ah = kzalloc(sizeof(*ah), GFP_ATOMIC);
+	if (!ah)
+		return ERR_PTR(-ENOMEM);
+
+	return create_ib_ah(ah_attr, ah); /* never fails */
+}
+
+int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
+{
+	struct mlx5_ib_ah *ah = to_mah(ibah);
+	u32 tmp;
+
+	memset(ah_attr, 0, sizeof(*ah_attr));
+
+	tmp = be32_to_cpu(ah->av.grh_gid_fl);
+	if (tmp & (1 << 30)) {
+		ah_attr->ah_flags = IB_AH_GRH;
+		ah_attr->grh.sgid_index = (tmp >> 20) & 0xff;
+		ah_attr->grh.flow_label = tmp & 0xfffff;
+		memcpy(&ah_attr->grh.dgid, &ah->av.rgid, 16);
+		ah_attr->grh.hop_limit = ah->av.hop_limit;
+		ah_attr->grh.traffic_class = ah->av.tclass;
+	}
+	ah_attr->dlid = be16_to_cpu(ah->av.rlid);
+	ah_attr->static_rate = ah->av.stat_rate_sl >> 4;
+	ah_attr->sl = ah->av.stat_rate_sl & 0xf;
+
+	return 0;
+}
+
+int mlx5_ib_destroy_ah(struct ib_ah *ah)
+{
+	kfree(to_mah(ah));
+	return 0;
+}
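
Editorial note: `create_ib_ah()` and `mlx5_ib_query_ah()` above pack and unpack the same 32-bit word (flow label in the low bits, source GID index starting at bit 20, a GRH-present flag at bit 30) and the same rate/SL byte. The sketch below restates that layout in plain userspace C so the round trip can be checked. All names are hypothetical, the byte swapping done by `cpu_to_be32()` is left out, and a 20-bit flow label is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define AV_GRH_PRESENT		(UINT32_C(1) << 30)
#define AV_SGID_INDEX_SHIFT	20
#define AV_SGID_INDEX_MASK	UINT32_C(0xff)
#define AV_FLOW_LABEL_MASK	UINT32_C(0xfffff)	/* assumed 20-bit flow label */

/* Pack the GRH fields the way create_ib_ah() does (host byte order). */
static uint32_t av_pack_grh(uint8_t sgid_index, uint32_t flow_label)
{
	/* The caller is expected to pass a flow label that fits in 20 bits. */
	assert((flow_label & ~AV_FLOW_LABEL_MASK) == 0);
	return flow_label | AV_GRH_PRESENT |
	       ((uint32_t)sgid_index << AV_SGID_INDEX_SHIFT);
}

/* The three accessors mirror the unpacking in mlx5_ib_query_ah(). */
static int av_has_grh(uint32_t word)
{
	return (word & AV_GRH_PRESENT) != 0;
}

static uint8_t av_sgid_index(uint32_t word)
{
	return (uint8_t)((word >> AV_SGID_INDEX_SHIFT) & AV_SGID_INDEX_MASK);
}

static uint32_t av_flow_label(uint32_t word)
{
	return word & AV_FLOW_LABEL_MASK;
}

/* Static rate in the high nibble, service level in the low nibble. */
static uint8_t av_pack_rate_sl(uint8_t static_rate, uint8_t sl)
{
	return (uint8_t)((static_rate << 4) | (sl & 0xf));
}
```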
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
new file mode 100644
index 000..c05868e
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -0,0 +1,844 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *  

[PATCH V2 9/9] IB/mlx5: Mellanox Connect-IB, IB driver part 5/5

2013-07-03 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
---
 MAINTAINERS |   10 ++
 drivers/infiniband/Kconfig  |1 +
 drivers/infiniband/Makefile |1 +
 drivers/infiniband/hw/mlx5/Kconfig  |   10 ++
 drivers/infiniband/hw/mlx5/Makefile |3 +++
 5 files changed, 25 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/Kconfig
 create mode 100644 drivers/infiniband/hw/mlx5/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 6e82fb5..b426536 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5377,6 +5377,16 @@ S:   Supported
 F: drivers/net/ethernet/mellanox/mlx5/core/
 F: include/linux/mlx5/
 
+Mellanox MLX5 IB driver
+M:  Eli Cohen e...@mellanox.com
+L:  linux-rdma@vger.kernel.org
+W:  http://www.mellanox.com
+Q:  http://patchwork.kernel.org/project/linux-rdma/list/
+T:  git://openfabrics.org/~eli/connect-ib.git
+S:  Supported
+F:  include/linux/mlx5/
+F:  drivers/infiniband/hw/mlx5/
+
 MODULE SUPPORT
 M: Rusty Russell ru...@rustcorp.com.au
 S: Maintained
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index c85b56c..5ceda71 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -50,6 +50,7 @@ source "drivers/infiniband/hw/amso1100/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
 source "drivers/infiniband/hw/cxgb4/Kconfig"
 source "drivers/infiniband/hw/mlx4/Kconfig"
+source "drivers/infiniband/hw/mlx5/Kconfig"
 source "drivers/infiniband/hw/nes/Kconfig"
 source "drivers/infiniband/hw/ocrdma/Kconfig"
 
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index b126fef..1fe6988 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_INFINIBAND_AMSO1100)   += hw/amso1100/
 obj-$(CONFIG_INFINIBAND_CXGB3) += hw/cxgb3/
 obj-$(CONFIG_INFINIBAND_CXGB4) += hw/cxgb4/
 obj-$(CONFIG_MLX4_INFINIBAND)  += hw/mlx4/
+obj-$(CONFIG_MLX5_INFINIBAND)  += hw/mlx5/
 obj-$(CONFIG_INFINIBAND_NES)   += hw/nes/
 obj-$(CONFIG_INFINIBAND_OCRDMA)+= hw/ocrdma/
 obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/
diff --git a/drivers/infiniband/hw/mlx5/Kconfig b/drivers/infiniband/hw/mlx5/Kconfig
new file mode 100644
index 000..8e6aebf
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Kconfig
@@ -0,0 +1,10 @@
+config MLX5_INFINIBAND
+	tristate "Mellanox Connect-IB HCA support"
+	depends on NETDEVICES && ETHERNET && PCI && X86
+	select NET_VENDOR_MELLANOX
+	select MLX5_CORE
+	---help---
+	  This driver provides low-level InfiniBand support for
+	  Mellanox Connect-IB PCI Express host channel adapters (HCAs).
+	  This is required to use InfiniBand protocols such as
+	  IP-over-IB or SRP with these devices.
diff --git a/drivers/infiniband/hw/mlx5/Makefile 
b/drivers/infiniband/hw/mlx5/Makefile
new file mode 100644
index 000..4ea0135
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_MLX5_INFINIBAND)  += mlx5_ib.o
+
+mlx5_ib-y :=   main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o
-- 
1.7.1



[PATCH V2 0/9] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-03 Thread Or Gerlitz
Hi Roland, all 

Here's V2 of the driver, with Dave's and Roland's comments addressed, 
looking forward to see if we have OK from Roland to merge that into 3.11

Jack, Moshe and Or.

changes from V1:

- Addressed Dave Miller's comments:
   * Local variables in functions listed from longest to shortest
   * --i/++i changed to i--/i++ in all for-loops
   * Removed leading /* empty line from all comments
   * magic constants given names
   * endianness code moved to driver.h, and defined an endianness-dependent
     macro for use in assignment.
   * destroy_msg_cache() duplicated code removed

- Addressed Roland's comments:

   * Renamed foo_spl to foo_lock for spinlocks.
   * Eliminated magic number from mlx5_cmd_stats field declaration in struct
     mlx5_cmd.
   * Eliminated unused procedure mlx5_ib_umem_populate_pas()

   * Cleaned up mlx5_ib.h:
   * Added new patch for ib_verbs.h, adding reserved values to several enums
   * For several ib-core enums, added reserved values for use by low-level
     drivers. By defining macros at the low level (i.e., renaming the reserved
     values, in effect), the ll drivers may use these enums without needing to
     duplicate the ib-core enums while adding extra values. This fixes
     compilation problems such as:
/home/roland/Src/linux-merge.git/drivers/infiniband/hw/mlx5/qp.c:975:2:
error: case value 4671 not in enumerated type enum ib_qp_type

   * Changed ib_latency_class to mlx5_ib_latency_class, visible only in
     low-level driver
   * Eliminated the unused IB_WR_xxx_PSV enums
   * Defined macros MLX5_IB_SEND_UMR_UNREG, MLX5_IB_QPT_REG_UMR, and
     MLX5_IB_WR_UMR, taking advantage of the reserved values added to the
     ib_core enums.

   * debug-mask removed from mlx5_ib
   * Regarding mlx5_core, still have a debug mask to enable printouts of
     command data and command execution times, but all file-name-based mask
     bits removed.
   * Removed forced -Wall -Werror -DDEBUG settings in the mlx5 core/ib makefiles

changes from V0:
 - Per Dave's request, cross posting to both netdev and linux-rdma, to see 
   if there are comments from netdev on the core driver.

From: Eli Cohen e...@mellanox.com

The patches that follow constitute the driver for Mellanox's 5th generation
of HCAs named Connect-IB.

The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This
partitioning resembles what we have for mlx4 with the substantial difference
that mlx5_ib is the pci device driver and not mlx5_core.

mlx5_core provides general functionality that is intended to be used by
other Mellanox devices that will be introduced in the future. In this sense,
it can be perceived as a library. mlx5_ib has a similar role as any hardware
device under drivers/infiniband/hw.

The patches are partitioned to avoid exceeding the 100KB vger.kernel.org
limitation. They are divided such that the first three ones have the code
of the mlx5_core driver, and the last five the code of the mlx5_ib driver.

Only the last patch per driver adds the Makefiles and Kconfigs, to make
things robust for future bisections.

PPC is not yet supported but support will be included in the near future.

Eli Cohen (8):
  net/mlx5: Mellanox Connect-IB, core driver part 1/3
  net/mlx5: Mellanox Connect-IB, core driver part 2/3
  net/mlx5: Mellanox Connect-IB, core driver part 3/3
  IB/mlx5: Mellanox Connect-IB, IB driver part 1/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 2/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 3/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 4/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 5/5

Jack Morgenstein (1):
  IB/core: Add reserved values to enums for low-level drivers use

 MAINTAINERS|   22 +
 drivers/infiniband/Kconfig |1 +
 drivers/infiniband/Makefile|1 +
 drivers/infiniband/hw/mlx5/Kconfig |   10 +
 drivers/infiniband/hw/mlx5/Makefile|3 +
 drivers/infiniband/hw/mlx5/ah.c|   95 +
 drivers/infiniband/hw/mlx5/cq.c|  844 +++
 drivers/infiniband/hw/mlx5/doorbell.c  |  100 +
 drivers/infiniband/hw/mlx5/mad.c   |  139 ++
 drivers/infiniband/hw/mlx5/main.c  | 1504 
 drivers/infiniband/hw/mlx5/mem.c   |  162 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  547 +
 drivers/infiniband/hw/mlx5/mr.c| 1021 
 drivers/infiniband/hw/mlx5/qp.c| 2537 
 drivers/infiniband/hw/mlx5/srq.c   |  478 
 drivers/infiniband/hw/mlx5/user.h  |  121 +
 drivers/net/ethernet/mellanox/Kconfig  |1 +
 drivers/net/ethernet/mellanox/Makefile |1 +
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|   18 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |5 +
 

[PATCH V2 7/9] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5

2013-07-03 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  547 ++
 drivers/infiniband/hw/mlx5/mr.c  | 1021 ++
 2 files changed, 1568 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ib.h
 create mode 100644 drivers/infiniband/hw/mlx5/mr.c

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
new file mode 100644
index 000..d2067c3
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -0,0 +1,547 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef MLX5_IB_H
+#define MLX5_IB_H
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_smi.h>
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/cq.h>
+#include <linux/mlx5/qp.h>
+#include <linux/mlx5/srq.h>
+#include <linux/types.h>
+
+#define mlx5_ib_dbg(dev, format, arg...)				\
+do {									\
+	pr_debug("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name,	\
+		 __func__, __LINE__, current->pid, ##arg);		\
+} while (0)
+
+#define mlx5_ib_err(dev, format, arg...)				\
+pr_err("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,	\
+	__LINE__, current->pid, ##arg)
+
+#define mlx5_ib_warn(dev, format, arg...)				\
+pr_warn("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,	\
+	__LINE__, current->pid, ##arg)
+
+enum {
+   MLX5_IB_MMAP_CMD_SHIFT  = 8,
+   MLX5_IB_MMAP_CMD_MASK   = 0xff,
+};
+
+enum mlx5_ib_mmap_cmd {
+   MLX5_IB_MMAP_REGULAR_PAGE   = 0,
+   MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES   = 1, /* always last */
+};
+
+enum {
+	MLX5_RES_SCAT_DATA32_CQE	= 0x1,
+	MLX5_RES_SCAT_DATA64_CQE	= 0x2,
+	MLX5_REQ_SCAT_DATA32_CQE	= 0x11,
+	MLX5_REQ_SCAT_DATA64_CQE	= 0x22,
+};
+
+enum mlx5_ib_latency_class {
+	MLX5_IB_LATENCY_CLASS_LOW,
+	MLX5_IB_LATENCY_CLASS_MEDIUM,
+	MLX5_IB_LATENCY_CLASS_HIGH,
+	MLX5_IB_LATENCY_CLASS_FAST_PATH
+};
+
+enum mlx5_ib_mad_ifc_flags {
+	MLX5_MAD_IFC_IGNORE_MKEY	= 1,
+	MLX5_MAD_IFC_IGNORE_BKEY	= 2,
+	MLX5_MAD_IFC_NET_VIEW		= 4,
+};
+
+struct mlx5_ib_ucontext {
+	struct ib_ucontext	ibucontext;
+	struct list_head	db_page_list;
+
+	/* protect doorbell record alloc/free
+	 */
+	struct mutex		db_page_mutex;
+	struct mlx5_uuar_info	uuari;
+};
+
+static inline struct mlx5_ib_ucontext *to_mucontext(struct ib_ucontext *ibucontext)
+{
+	return container_of(ibucontext, struct mlx5_ib_ucontext, ibucontext);
+}
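
Editorial note: `to_mucontext()` relies on the kernel's `container_of()` to get from the embedded core structure back to the driver structure that contains it. The following userspace sketch shows the same pattern with `offsetof()`. The macro `container_of_sketch` and the structure names are hypothetical stand-ins, and the kernel macro additionally type-checks its pointer argument.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for container_of(): step back from a member
 * pointer to the start of the enclosing structure. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stands in for the core structure (struct ib_ucontext in the patch). */
struct base_ctx {
	int id;
};

/* Stands in for the driver structure that embeds the core one. */
struct driver_ctx {
	int driver_private;
	struct base_ctx base;	/* deliberately not the first member */
};

static struct driver_ctx *to_driver_ctx(struct base_ctx *base)
{
	assert(base != NULL);
	return container_of_sketch(base, struct driver_ctx, base);
}

/* Returns 1 when the conversion lands on the enclosing object. */
static int roundtrip_ok(void)
{
	struct driver_ctx ctx = { .driver_private = 42, .base = { .id = 7 } };
	struct driver_ctx *back = to_driver_ctx(&ctx.base);

	return back == &ctx && back->driver_private == 42;
}
```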
+
+struct mlx5_ib_pd {
+	struct ib_pd		ibpd;
+	u32			pdn;
+	u32			pa_lkey;
+};
+
+/* Use macros here so that don't have to duplicate
+ * enum ib_send_flags and enum ib_qp_type for low-level driver
+ */
+
+#define MLX5_IB_SEND_UMR_UNREG	IB_SEND_RESERVED_START
+#define MLX5_IB_QPT_REG_UMR	IB_QPT_RESERVED1
+#define MLX5_IB_WR_UMR		IB_WR_RESERVED1
+
+struct wr_list {
+	u16	opcode;
+	u16	next;
+};
+
+struct mlx5_ib_wq {
+	u64		       *wrid;
+	u32		       *wr_data;
+	struct wr_list	       *w_list;
+	unsigned	       *wqe_head;
+	u16			unsig_count;
+
+   /* 

Re: rtnl_lock deadlock on 3.10

2013-07-03 Thread Shawn Bohrer
On Wed, Jul 03, 2013 at 07:33:07AM +0200, Hannes Frederic Sowa wrote:
> On Wed, Jul 03, 2013 at 07:11:52AM +0200, Hannes Frederic Sowa wrote:
> > On Tue, Jul 02, 2013 at 01:38:26PM +0000, Cong Wang wrote:
> > > On Tue, 02 Jul 2013 at 08:28 GMT, Hannes Frederic Sowa
> > > han...@stressinduktion.org wrote:
> > > > On Mon, Jul 01, 2013 at 09:54:56AM -0500, Shawn Bohrer wrote:
> > > > > I've managed to hit a deadlock at boot a couple times while testing
> > > > > the 3.10 rc kernels.  It seems to always happen when my network
> > > > > devices are initializing.  This morning I updated to v3.10 and made a
> > > > > few config tweaks and so far I've hit it 4 out of 5 reboots.  It looks
> > > > > like most processes are getting stuck on rtnl_lock.  Below is a boot
> > > > > log with the soft lockup prints.  Please let know if there is any
> > > > > other information I can provide:
> > > >
> > > > Could you try a build with CONFIG_LOCKDEP enabled?
> > > >
> > >
> > > The problem is clear: ib_register_device() is called with rtnl_lock,
> > > but itself needs device_mutex, however, ib_register_client() first
> > > acquires device_mutex, then indirectly calls register_netdev() which
> > > takes rtnl_lock. Deadlock!
> > >
> > > One possible fix is always taking rtnl_lock before taking
> > > device_mutex, something like below:
> > >
> > > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > > index 18c1ece..890870b 100644
> > > --- a/drivers/infiniband/core/device.c
> > > +++ b/drivers/infiniband/core/device.c
> > > @@ -381,6 +381,7 @@ int ib_register_client(struct ib_client *client)
> > >  {
> > >  	struct ib_device *device;
> > >
> > > +	rtnl_lock();
> > >  	mutex_lock(&device_mutex);
> > >
> > >  	list_add_tail(&client->list, &client_list);
> > > @@ -389,6 +390,7 @@ int ib_register_client(struct ib_client *client)
> > >  			client->add(device);
> > >
> > >  	mutex_unlock(&device_mutex);
> > > +	rtnl_unlock();
> > >
> > >  	return 0;
> > >  }
> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > index b6e049a..5a7a048 100644
> > > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > @@ -1609,7 +1609,7 @@ static struct net_device *ipoib_add_port(const char *format,
> > >  		goto event_failed;
> > >  	}
> > >
> > > -	result = register_netdev(priv->dev);
> > > +	result = register_netdevice(priv->dev);
> > >  	if (result) {
> > >  		printk(KERN_WARNING "%s: couldn't register ipoib port %d; error %d\n",
> > >  		       hca->name, port, result);
> >
> > Looks good to me. Shawn, could you test this patch?
>
> ib_unregister_device/ib_unregister_client would need the same change,
> too. I have not checked the other ->add() and ->remove() functions. Also
> cc'ed linux-rdma@vger.kernel.org, Roland Dreier.

Cong's patch is missing the #include <linux/rtnetlink.h> but otherwise
I've had 34 successful reboots with no deadlocks which is a good sign.
It sounds like there are more paths that need to be audited and a
proper patch submitted.  I can do more testing later if needed.

Thanks,
Shawn
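
Editorial note: the deadlock discussed in this thread is a lock-order inversion. One path takes rtnl_lock and then device_mutex, the other takes device_mutex and then rtnl_lock. The sketch below is a toy, single-threaded model of the kind of check a lock-order validator performs: it records which lock was taken while another was held and flags the first acquisition that contradicts an earlier order. It is not the kernel's lockdep, and every name in it is hypothetical.

```c
#include <assert.h>
#include <string.h>

enum lock_id { LOCK_RTNL, LOCK_DEVICE_MUTEX, LOCK_COUNT };

struct lock_tracker {
	int held[LOCK_COUNT];
	/* seen_before[a][b] is set once b has been taken while a was held. */
	int seen_before[LOCK_COUNT][LOCK_COUNT];
};

static void tracker_init(struct lock_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/* Returns 0 if this acquisition agrees with every order recorded so far,
 * -1 if the opposite order was recorded earlier (an inversion). */
static int tracker_acquire(struct lock_tracker *t, enum lock_id id)
{
	int h, ret = 0;

	assert(id < LOCK_COUNT && !t->held[id]);
	for (h = 0; h < LOCK_COUNT; h++) {
		if (!t->held[h])
			continue;
		if (t->seen_before[id][h])
			ret = -1;	/* id-then-h seen before, now h-then-id */
		t->seen_before[h][id] = 1;
	}
	t->held[id] = 1;
	return ret;
}

static void tracker_release(struct lock_tracker *t, enum lock_id id)
{
	assert(t->held[id]);
	t->held[id] = 0;
}

/* Replays the two code paths from the thread and counts inversions.
 * With use_fixed_order set, the client path takes rtnl first as well,
 * which is the ordering the proposed patch aims for. */
static int replay_paths(int use_fixed_order)
{
	enum lock_id first = use_fixed_order ? LOCK_RTNL : LOCK_DEVICE_MUTEX;
	enum lock_id second = use_fixed_order ? LOCK_DEVICE_MUTEX : LOCK_RTNL;
	struct lock_tracker t;
	int inversions = 0;

	tracker_init(&t);

	/* Device registration path: rtnl_lock held, then device_mutex. */
	inversions += tracker_acquire(&t, LOCK_RTNL) != 0;
	inversions += tracker_acquire(&t, LOCK_DEVICE_MUTEX) != 0;
	tracker_release(&t, LOCK_DEVICE_MUTEX);
	tracker_release(&t, LOCK_RTNL);

	/* Client registration path. */
	inversions += tracker_acquire(&t, first) != 0;
	inversions += tracker_acquire(&t, second) != 0;
	tracker_release(&t, second);
	tracker_release(&t, first);

	return inversions;
}
```

In this model `replay_paths(0)` reports one inversion and `replay_paths(1)` reports none, which matches the reasoning in the thread: the hang goes away once both paths agree on a single order.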


Re: PATCH: opensm enhancements

2013-07-03 Thread Hal Rosenstock
Hi again Jeff,

On 7/3/2013 12:20 PM, Jeff Becker wrote:
> Hi Hal,
>
> I have some testing info about the second patch below.
>
> On 07/03/2013 03:23 AM, Hal Rosenstock wrote:
> > HI Jeff,
> >
> > On 6/26/2013 5:24 PM, Jeff Becker wrote:
> > > Hi Hal. At the OFA workshop, I mentioned that I've been working on some
> > > modifications to opensm that we use at NASA. Following extensive testing
> > > of these applied to opensm 3.3.13 (the version we run here), I have
> > > ported these to top of tree opensm, and have tested them on a small
> > > cluster.
> > Thanks for getting this done! For future reference, patches should be
> > sent as plain text as this makes it easier to comment.
>
> OK. So I just send the output of git-format-patch directly? It appears
> to be formatted properly.
>
> > > The first patch modifies the console logflush command to take on or
> > > off as an argument for toggling.
> > Thanks. Applied.
> >
> > > The second (more extensive) patch
> > > adds a command line option to specify a file in which each line contains
> > > a switch GUID/port pair to be ignored by opensm. The idea is to specify
> > > this file when you start opensm (it can be empty), and add ports to
> > > ignore (one per line for each end of a connection) to the file. At the
> > > next heavy sweep (or HUP) the sm will reprogram the forwarding tables
> > > without including the ignored links. We use this for replacing cables,
> > > as well as for system expansion (adding new racks).
> > I'll comment on this one later.
>
> Dale (cc'd) did some testing with my patch on Pleiades in preparation
> for a system augmentation (new racks) happening soon. He found that the
> SM correctly produces routes that do not use links marked to be ignored,
> but when you then remove or disable the links, the SM re-routes the
> fabric anyway and comes up with different routes than before. This
> rerouting causes problems with existing connections. There also appears
> to be a bookkeeping problem such that some of these links get added to
> the SM's light sampling list and never get removed. This ties up
> outstanding MAD packet slots, causing the SM to become unresponsive for
> several seconds every time it reviews its light sampling list.

Yes, this is one of several issues with using this approach.

I plan on detailing these later as well as posting a slightly different
approach for this but that may take a little longer...

> I'm working on fixing these. I'll take care of the second problem
> (incorrectly getting added to the light sampling list) first. Is it
> possible this problem is related to the re-routing on port disable
> problem? Anyhow, if you have any specific comments about these issues,
> that would be great.
>
> Thanks, and have a great Fourth of July.

Thanks; you too!

-- Hal

> -jeff
>
> > -- Hal
> >
> > > Please let me know if you have any questions/issues with these. Thanks.
> > >
> > > -jeff
 
 



Re: rtnl_lock deadlock on 3.10

2013-07-03 Thread Or Gerlitz

On 03/07/2013 20:22, Shawn Bohrer wrote:
> On Wed, Jul 03, 2013 at 07:33:07AM +0200, Hannes Frederic Sowa wrote:
> > On Wed, Jul 03, 2013 at 07:11:52AM +0200, Hannes Frederic Sowa wrote:
> > > On Tue, Jul 02, 2013 at 01:38:26PM +0000, Cong Wang wrote:
> > > > On Tue, 02 Jul 2013 at 08:28 GMT, Hannes Frederic Sowa
> > > > han...@stressinduktion.org wrote:
> > > > > On Mon, Jul 01, 2013 at 09:54:56AM -0500, Shawn Bohrer wrote:
> > > > > > I've managed to hit a deadlock at boot a couple times while testing
> > > > > > the 3.10 rc kernels.  It seems to always happen when my network
> > > > > > devices are initializing.  This morning I updated to v3.10 and made a
> > > > > > few config tweaks and so far I've hit it 4 out of 5 reboots.  It looks
> > > > > > like most processes are getting stuck on rtnl_lock.  Below is a boot
> > > > > > log with the soft lockup prints.  Please let know if there is any
> > > > > > other information I can provide:
> > > > >
> > > > > Could you try a build with CONFIG_LOCKDEP enabled?
> > > >
> > > > The problem is clear: ib_register_device() is called with rtnl_lock,
> > > > but itself needs device_mutex, however, ib_register_client() first
> > > > acquires device_mutex, then indirectly calls register_netdev() which
> > > > takes rtnl_lock. Deadlock!
> > > >
> > > > One possible fix is always taking rtnl_lock before taking
> > > > device_mutex, something like below:
> > > >
> > > > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > > > index 18c1ece..890870b 100644
> > > > --- a/drivers/infiniband/core/device.c
> > > > +++ b/drivers/infiniband/core/device.c
> > > > @@ -381,6 +381,7 @@ int ib_register_client(struct ib_client *client)
> > > >  {
> > > >  	struct ib_device *device;
> > > >
> > > > +	rtnl_lock();
> > > >  	mutex_lock(&device_mutex);
> > > >
> > > >  	list_add_tail(&client->list, &client_list);
> > > > @@ -389,6 +390,7 @@ int ib_register_client(struct ib_client *client)
> > > >  			client->add(device);
> > > >
> > > >  	mutex_unlock(&device_mutex);
> > > > +	rtnl_unlock();
> > > >
> > > >  	return 0;
> > > >  }
> > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > > index b6e049a..5a7a048 100644
> > > > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > > @@ -1609,7 +1609,7 @@ static struct net_device *ipoib_add_port(const char *format,
> > > >  		goto event_failed;
> > > >  	}
> > > >
> > > > -	result = register_netdev(priv->dev);
> > > > +	result = register_netdevice(priv->dev);
> > > >  	if (result) {
> > > >  		printk(KERN_WARNING "%s: couldn't register ipoib port %d; error %d\n",
> > > >  		       hca->name, port, result);
> > >
> > > Looks good to me. Shawn, could you test this patch?
> >
> > ib_unregister_device/ib_unregister_client would need the same change,
> > too. I have not checked the other ->add() and ->remove() functions. Also
> > cc'ed linux-rdma@vger.kernel.org, Roland Dreier.
>
> Cong's patch is missing the #include <linux/rtnetlink.h> but otherwise
> I've had 34 successful reboots with no deadlocks which is a good sign.
> It sounds like there are more paths that need to be audited and a
> proper patch submitted.  I can do more testing later if needed.
>
> Thanks,
> Shawn

Guys, I was a bit busy today looking into that, but I don't think we
want the IB core layer (core/device.c) to use rtnl locking which is
something that belongs to the network stack.

Or.


Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
> On 07/03/13 17:14, David Dillow wrote:
> > On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote:
> > > +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
> > > +{
> > > +	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
> > > +		fast_io_fail_tmo < dev_loss_tmo) &&
> > > +		fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT &&
> > > +		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
> > > +}
> > > +EXPORT_SYMBOL_GPL(srp_tmo_valid);
> >
> > This would have been more readable:
> >
> > int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
> > {
> > 	/* Fast IO fail must be off, or no greater than the max timeout */
> > 	if (fast_io_fail_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
> > 		return -EINVAL;
> >
> > 	/* Device timeout must be off, or fit into jiffies */
> > 	if (dev_loss_tmo >= LONG_MAX / HZ)
> > 		return -EINVAL;
> >
> > 	/* Fast IO must trigger before device loss, or one of the
> > 	 * timeouts must be disabled.
> > 	 */
> > 	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
> > 		return 0;
> > 	if (fast_io_fail_tmo < dev_loss_tmo)
> > 		return 0;
> >
> > 	return -EINVAL;
> > }
>
> Isn't that a matter of personal taste which of the above two is more
> clear ?

No, it is quite common in Linux for complicated conditionals to be
broken up into helper functions, and Vu found logic bugs in previous
iterations. After unpacking it, I still found behavior that is
questionable. All of this strongly points to that block being too dense
for its own good.

> It might also depend on the number of mathematics courses in
> someone's educational background :-)

Or the number of logic courses, or their experience with Lisp. :)

> > Though, now that I've unpacked it -- I don't think it is OK for
> > dev_loss_tmo to be off, but fast IO to be on? That drops another
> > conditional.
>
> The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine
> in my tests. An I/O failure was detected shortly after the cable to the
> target was pulled. I/O resumed shortly after the cable to the target was
> reinserted.

Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo
< 0, and fast_io_fail_tmo >= 0. The other transports do not allow this
scenario, and I'm asking if it makes sense for SRP to allow it.

But now that you mention reconnect_delay, what is the meaning of that
when it is negative? That's not in the documentation. And should it be
considered in srp_tmo_valid() -- are there values of reconnect_delay
that cause problems?

I'm starting to get a bit concerned about this patch -- can you, Vu, and
Sebastian comment on the testing you have done?

> > Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if
> > fast_io_fail_tmo is off; I agree with your reasoning about leaving it
> > unlimited if fast fail is on, but does that still hold if it is off?
>
> I think setting dev_loss_tmo to a large value only makes sense if the
> value of reconnect_delay is not too large. Setting both to a large value
> would result in slow recovery after a transport layer failure has been
> corrected.

So you agree it should be capped? I can't tell from your response.
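
Editorial note: the two forms of `srp_tmo_valid()` debated above can be compared mechanically. The sketch below is a userspace restatement of both, with a brute-force check that they return the same result over a grid of inputs. `SKETCH_MAX_TIMEOUT` and `SKETCH_HZ` are assumed stand-ins for the kernel's `SCSI_DEVICE_BLOCK_MAX_TIMEOUT` and `HZ`, and the function names are hypothetical. The sketch reproduces the behaviour as posted, including the case David questions (dev_loss_tmo off while fast_io_fail_tmo is on), and takes no position on whether that case should be allowed.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Assumed stand-ins for kernel constants; the values are illustrative. */
#define SKETCH_MAX_TIMEOUT	600
#define SKETCH_HZ		1000

/* The single-expression form from the patch under review. */
static int tmo_valid_dense(int fast_io_fail_tmo, int dev_loss_tmo)
{
	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
		fast_io_fail_tmo < dev_loss_tmo) &&
		fast_io_fail_tmo <= SKETCH_MAX_TIMEOUT &&
		dev_loss_tmo < LONG_MAX / SKETCH_HZ ? 0 : -EINVAL;
}

/* The unpacked form suggested in the review. */
static int tmo_valid_unpacked(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* Fast IO fail must be off, or no greater than the max timeout. */
	if (fast_io_fail_tmo > SKETCH_MAX_TIMEOUT)
		return -EINVAL;

	/* Device timeout must be off, or fit into jiffies. */
	if (dev_loss_tmo >= LONG_MAX / SKETCH_HZ)
		return -EINVAL;

	/* Fast IO must trigger before device loss, or one timeout is off. */
	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
		return 0;
	if (fast_io_fail_tmo < dev_loss_tmo)
		return 0;

	return -EINVAL;
}

/* Returns 1 if both forms agree for every pair of values in [lo, hi]. */
static int forms_agree(int lo, int hi)
{
	int f, d;

	assert(lo <= hi);
	for (f = lo; f <= hi; f++)
		for (d = lo; d <= hi; d++)
			if (tmo_valid_dense(f, d) != tmo_valid_unpacked(f, d))
				return 0;
	return 1;
}
```

A check such as `forms_agree(-2, 700)` covers the disabled values, the boundary at the maximum timeout, and the ordering between the two timeouts.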




[PATCH V3 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs

2013-07-03 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to
support flow steering for user space applications.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs.h  |3 +
 drivers/infiniband/core/uverbs_cmd.c  |  199 +
 drivers/infiniband/core/uverbs_main.c |   13 ++-
 include/rdma/ib_verbs.h   |1 +
 include/uapi/rdma/ib_user_verbs.h |   88 ++-
 5 files changed, 302 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 0fcd7aa..ad9d102 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -155,6 +155,7 @@ extern struct idr ib_uverbs_cq_idr;
 extern struct idr ib_uverbs_qp_idr;
 extern struct idr ib_uverbs_srq_idr;
 extern struct idr ib_uverbs_xrcd_idr;
+extern struct idr ib_uverbs_rule_idr;
 
 void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
 
@@ -215,5 +216,7 @@ IB_UVERBS_DECLARE_CMD(destroy_srq);
 IB_UVERBS_DECLARE_CMD(create_xsrq);
 IB_UVERBS_DECLARE_CMD(open_xrcd);
 IB_UVERBS_DECLARE_CMD(close_xrcd);
+IB_UVERBS_DECLARE_CMD(create_flow);
+IB_UVERBS_DECLARE_CMD(destroy_flow);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a7d00f6..bfc53f7 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -54,6 +54,7 @@ static struct uverbs_lock_class qp_lock_class = { .name = "QP-uobj" };
 static struct uverbs_lock_class ah_lock_class  = { .name = "AH-uobj" };
 static struct uverbs_lock_class srq_lock_class = { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
+static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
 
 #define INIT_UDATA(udata, ibuf, obuf, ilen, olen)  \
do {\
@@ -330,6 +331,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
INIT_LIST_HEAD(&ucontext->srq_list);
INIT_LIST_HEAD(&ucontext->ah_list);
INIT_LIST_HEAD(&ucontext->xrcd_list);
+   INIT_LIST_HEAD(&ucontext->rule_list);
ucontext->closing = 0;
 
resp.num_comp_vectors = file->device->num_comp_vectors;
@@ -2587,6 +2589,203 @@ out_put:
return ret ? ret : in_len;
 }
 
+static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec,
+   struct _ib_flow_spec *ib_spec)
+{
+   ib_spec->type = kern_spec->type;
+
+   switch (ib_spec->type) {
+   case IB_FLOW_SPEC_ETH:
+   ib_spec->eth.size = sizeof(struct ib_flow_spec_eth);
+   memcpy(&ib_spec->eth.val, &kern_spec->eth.val,
+  sizeof(struct ib_flow_eth_filter));
+   memcpy(&ib_spec->eth.mask, &kern_spec->eth.mask,
+  sizeof(struct ib_flow_eth_filter));
+   break;
+   case IB_FLOW_SPEC_IPV4:
+   ib_spec->ipv4.size = sizeof(struct ib_flow_spec_ipv4);
+   memcpy(&ib_spec->ipv4.val, &kern_spec->ipv4.val,
+  sizeof(struct ib_flow_ipv4_filter));
+   memcpy(&ib_spec->ipv4.mask, &kern_spec->ipv4.mask,
+  sizeof(struct ib_flow_ipv4_filter));
+   break;
+   case IB_FLOW_SPEC_TCP:
+   case IB_FLOW_SPEC_UDP:
+   ib_spec->tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp);
+   memcpy(&ib_spec->tcp_udp.val, &kern_spec->tcp_udp.val,
+  sizeof(struct ib_flow_tcp_udp_filter));
+   memcpy(&ib_spec->tcp_udp.mask, &kern_spec->tcp_udp.mask,
+  sizeof(struct ib_flow_tcp_udp_filter));
+   break;
+   default:
+   return -EINVAL;
+   }
+   return 0;
+}
+
+ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file,
+ const char __user *buf, int in_len,
+ int out_len)
+{
+   struct ib_uverbs_create_flow  cmd;
+   struct ib_uverbs_create_flow_resp resp;
+   struct ib_uobject *uobj;
+   struct ib_flow*flow_id;
+   struct ib_kern_flow_attr  *kern_flow_attr;
+   struct ib_flow_attr   *flow_attr;
+   struct ib_qp  *qp;
+   int err = 0;
+   void *kern_spec;
+   void *ib_spec;
+   int i;
+
+   if (out_len < sizeof(resp))
+   return -ENOSPC;
+
+   if (copy_from_user(&cmd, buf, sizeof(cmd)))
+   return -EFAULT;
+
+   if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER &&
+!capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW))
+   return -EPERM;
+
+   if (cmd.flow_attr.num_of_specs) {
+   kern_flow_attr = kmalloc(cmd.flow_attr.size, GFP_KERNEL);
+   if (!kern_flow_attr)
+ 

[PATCH V3 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs

2013-07-03 Thread Or Gerlitz
From: Igor Ivanov igor.iva...@itseez.com

Add infrastructure to support extended uverbs capabilities in a
forward/backward compatible manner. Uverbs command opcodes which are based
on the verbs extensions approach should be greater than or equal to
IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are
processed a bit differently.

Signed-off-by: Igor Ivanov igor.iva...@itseez.com
Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs_main.c |   29 -
 include/uapi/rdma/ib_user_verbs.h |   10 ++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
if (copy_from_user(&hdr, buf, sizeof hdr))
return -EFAULT;
 
-   if (hdr.in_words * 4 != count)
-   return -EINVAL;
-
if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
!uverbs_cmd_table[hdr.command])
return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
return -ENOSYS;
 
-   return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-hdr.in_words * 4, hdr.out_words * 4);
+   if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+   struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+   if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+   return -EFAULT;
+
+   if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+   return -EINVAL;
+
+   return uverbs_cmd_table[hdr.command](file,
+buf + sizeof(hdr_ex),
+(hdr_ex.in_words +
+ hdr_ex.provider_in_words) * 4,
+(hdr_ex.out_words +
+ hdr_ex.provider_out_words) * 4);
+   } else {
+   if (hdr.in_words * 4 != count)
+   return -EINVAL;
+
+   return uverbs_cmd_table[hdr.command](file,
+buf + sizeof(hdr),
+hdr.in_words * 4,
+hdr.out_words * 4);
+   }
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION  6
+#define IB_USER_VERBS_CMD_THRESHOLD    50
 
 enum {
IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+   __u32 command;
+   __u16 in_words;
+   __u16 out_words;
+   __u16 provider_in_words;
+   __u16 provider_out_words;
+   __u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
__u64 response;
__u64 driver_data[0];
-- 
1.7.1
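
[Editor's sketch] The length check that this patch adds to ib_uverbs_write()
can be illustrated in user space. The following is only a sketch under stated
assumptions: the two structs mirror the header layouts shown in the patch, the
threshold value 50 is taken from the patch, and the helper name
uverbs_payload_ok() plus the CMD_THRESHOLD macro are made up for this example.
It is not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMD_THRESHOLD 50  /* mirrors IB_USER_VERBS_CMD_THRESHOLD in the patch */

/* Legacy header layout, as in struct ib_uverbs_cmd_hdr. */
struct cmd_hdr {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
};

/* Extended header layout, as in struct ib_uverbs_cmd_hdr_ex. */
struct cmd_hdr_ex {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t cmd_hdr_reserved;
};

/*
 * Hypothetical helper: return 1 if a write of `count` bytes starting at
 * `buf` carries a header whose word counts match `count`, else 0.
 */
static int uverbs_payload_ok(const void *buf, size_t count)
{
	struct cmd_hdr hdr;

	if (count < sizeof(hdr))
		return 0;
	memcpy(&hdr, buf, sizeof(hdr));

	if (hdr.command >= CMD_THRESHOLD) {
		struct cmd_hdr_ex ex;

		if (count < sizeof(ex))
			return 0;
		memcpy(&ex, buf, sizeof(ex));
		/* Extended commands count core and provider words together. */
		return ((size_t)ex.in_words + ex.provider_in_words) * 4 == count;
	}
	/* Legacy commands keep the original in_words * 4 == count rule. */
	return (size_t)hdr.in_words * 4 == count;
}
```

The point of the split is that opcodes below the threshold keep the old
8-byte header and the old check, so existing user space is unaffected.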



[PATCH V3 for-next 0/4] Add receive Flow Steering support

2013-07-03 Thread Or Gerlitz
Hi Roland, all

V3 addresses the comments made by Sean. There are still some concerns/questions
posed by Roland on the uverbs extensions element of the series. I have posted
replies to them, but so far no further comments were made.

V3 changes:
  - Addressed comments from Sean:
  - modified the change-log of patch #1 to be clearer on the priority and domain
semantics and usage
  - re-arranged the fields of struct ib_flow_attr
  - removed check from ib_flow_destroy
  - removed the IB flow spec which wasn't in line with the L2/L3/L4 approach
done for Ethernet/IP/TCP|UDP, will use proper IB flow specs when adding
the support for IPoIB flow steering

 
V2 changes:
  - dropped struct ib_kern_flow from patch #3, this structure wasn't 
used and was left there by mistake (bug, thanks Roland)
  - removed the void *flow_context field from struct ib_flow, this was
    pointing to driver private data for that flow, but doesn't belong here,
    i.e. it need not be seen by the verbs consumer but rather hidden.
  - renamed struct mlx4_flow_handle to mlx4_ib_flow, a structure that contains
    the verbs level struct ib_flow and the mlx4 registration ID for that flow

V1 changes:

 - dropped the five pre-patches which were accepted into 3.10
 - rebased the patches against Roland's for-next / 3.10-rc4
 - in patch #3, ib_uverbs_destroy_flow was returning too quickly when the driver
   returned failure for ib_destroy_flow, need to free some uverbs resources 1st.
 - in patch #4, check index before accessing the array at
   mlx4_ib_create/destroy_flow

These patches add Flow Steering support to the kernel IB core, to uverbs and 
to the mlx4 IB (verbs) driver along with one patch to uverbs which adds 
some code to support extensions.

  IB/core: Add receive Flow Steering support
  IB/core: Infra-structure to support verbs extensions through uverbs
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

The main patch which introduces the Flow-Steering API is "IB/core: Add receive
Flow Steering support", see its change log. Looking at the "Network Adapter
Flow Steering" slides which Tzahi Oved presented at the annual OFA 2012 meeting
could be helpful:
https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/518-network-adapter-flow-steering.html

Or.

Hadar Hen Zion (3):
  IB/core: Add receive Flow Steering support
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

Igor Ivanov (1):
  IB/core: Infra-structure to support verbs extensions through uverbs

 drivers/infiniband/core/uverbs.h  |3 +
 drivers/infiniband/core/uverbs_cmd.c  |  199 
 drivers/infiniband/core/uverbs_main.c |   42 +-
 drivers/infiniband/core/verbs.c   |   27 
 drivers/infiniband/hw/mlx4/main.c |  235 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h  |   12 ++
 include/linux/mlx4/device.h   |5 -
 include/rdma/ib_verbs.h   |  122 +-
 include/uapi/rdma/ib_user_verbs.h |   98 ++-
 9 files changed, 729 insertions(+), 14 deletions(-)



[PATCH V3 for-next 4/4] IB/mlx4: Add receive Flow Steering support

2013-07-03 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

Implement ib_create_flow and ib_destroy_flow.

Translate the verbs structures provided by the user to HW structures
and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands.

On the ATTACH command completion, the firmware provides a 64-bit registration
ID which is placed into struct mlx4_ib_flow that wraps the instance of
struct ib_flow which is returned to the caller. Later, this reg ID is used
for detaching that flow from the firmware.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx4/main.c|  235 ++
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   12 ++
 include/linux/mlx4/device.h  |5 -
 3 files changed, 247 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index a188d31..5b5518f 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -54,6 +54,8 @@
 #define DRV_VERSION    "1.0"
 #define DRV_RELDATE    "April 4, 2008"
 
+#define MLX4_IB_FLOW_MAX_PRIO 0xFFF
+
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("Mellanox ConnectX HCA InfiniBand driver");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -88,6 +90,25 @@ static void init_query_mad(struct ib_smp *mad)
 
 static union ib_gid zgid;
 
+static int check_flow_steering_support(struct mlx4_dev *dev)
+{
+   int ib_num_ports = 0;
+   int i;
+
+   mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+   ib_num_ports++;
+
+   if (dev->caps.steering_mode == MLX4_STEERING_MODE_DEVICE_MANAGED) {
+   if (ib_num_ports || mlx4_is_mfunc(dev)) {
+   pr_warn("Device managed flow steering is unavailable "
+   "for IB ports or in multifunction env.\n");
+   return 0;
+   }
+   return 1;
+   }
+   return 0;
+}
+
 static int mlx4_ib_query_device(struct ib_device *ibdev,
struct ib_device_attr *props)
 {
@@ -144,6 +165,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2B;
else
props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2A;
+   if (check_flow_steering_support(dev->dev))
+   props->device_cap_flags |= IB_DEVICE_MANAGED_FLOW_STEERING;
}
 
props->vendor_id   = be32_to_cpup((__be32 *) (out_mad->data + 36)) &
@@ -798,6 +821,209 @@ struct mlx4_ib_steering {
union ib_gid gid;
 };
 
+static int parse_flow_attr(struct mlx4_dev *dev,
+  struct _ib_flow_spec *ib_spec,
+  struct _rule_hw *mlx4_spec)
+{
+   enum mlx4_net_trans_rule_id type;
+
+   switch (ib_spec->type) {
+   case IB_FLOW_SPEC_ETH:
+   type = MLX4_NET_TRANS_RULE_ID_ETH;
+   memcpy(mlx4_spec->eth.dst_mac, ib_spec->eth.val.dst_mac,
+  ETH_ALEN);
+   memcpy(mlx4_spec->eth.dst_mac_msk, ib_spec->eth.mask.dst_mac,
+  ETH_ALEN);
+   mlx4_spec->eth.vlan_tag = ib_spec->eth.val.vlan_tag;
+   mlx4_spec->eth.vlan_tag_msk = ib_spec->eth.mask.vlan_tag;
+   break;
+
+   case IB_FLOW_SPEC_IPV4:
+   type = MLX4_NET_TRANS_RULE_ID_IPV4;
+   mlx4_spec->ipv4.src_ip = ib_spec->ipv4.val.src_ip;
+   mlx4_spec->ipv4.src_ip_msk = ib_spec->ipv4.mask.src_ip;
+   mlx4_spec->ipv4.dst_ip = ib_spec->ipv4.val.dst_ip;
+   mlx4_spec->ipv4.dst_ip_msk = ib_spec->ipv4.mask.dst_ip;
+   break;
+
+   case IB_FLOW_SPEC_TCP:
+   case IB_FLOW_SPEC_UDP:
+   type = ib_spec->type == IB_FLOW_SPEC_TCP ?
+   MLX4_NET_TRANS_RULE_ID_TCP :
+   MLX4_NET_TRANS_RULE_ID_UDP;
+   mlx4_spec->tcp_udp.dst_port = ib_spec->tcp_udp.val.dst_port;
+   mlx4_spec->tcp_udp.dst_port_msk = ib_spec->tcp_udp.mask.dst_port;
+   mlx4_spec->tcp_udp.src_port = ib_spec->tcp_udp.val.src_port;
+   mlx4_spec->tcp_udp.src_port_msk = ib_spec->tcp_udp.mask.src_port;
+   break;
+
+   default:
+   return -EINVAL;
+   }
+   if (mlx4_map_sw_to_hw_steering_id(dev, type) < 0 ||
+   mlx4_hw_rule_sz(dev, type) < 0)
+   return -EINVAL;
+   mlx4_spec->id = cpu_to_be16(mlx4_map_sw_to_hw_steering_id(dev, type));
+   mlx4_spec->size = mlx4_hw_rule_sz(dev, type) >> 2;
+   return mlx4_hw_rule_sz(dev, type);
+}
+
+static int __mlx4_ib_create_flow(struct ib_qp *qp, struct ib_flow_attr *flow_attr,
+ int domain,
+ enum mlx4_net_trans_promisc_mode flow_type,
+ u64 *reg_id)
+{
+   int ret, i;
+   int size = 0;
+   void *ib_flow;
+   struct 

[PATCH V3 for-next 1/4] IB/core: Add receive Flow Steering support

2013-07-03 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

The RDMA stack allows for applications to create IB_QPT_RAW_PACKET QPs,
for which plain Ethernet packets are used, specifically packets which
don't carry any QPN to be matched by the receiving side.

Applications using these QPs must be provided with a method to
program some steering rule with the HW so packets arriving at
the local port can be routed to them.

This patch adds ib_create_flow, which allows providing a flow specification
for a QP, such that when there's a match between the specification and the
received packet, it can be forwarded to that QP, in a similar manner to how
one uses ib_attach_multicast for IB UD multicast handling.

Flow specifications are provided as instances of struct ib_flow_spec_yyy
which describe L2, L3 and L4 headers, currently specs for Ethernet, IPv4,
TCP and UDP are defined. Flow specs are made of values and masks.

The input to ib_create_flow is an instance of struct ib_flow_attr which
contains a few mandatory control elements and optional flow specs.

struct ib_flow_attr {
enum ib_flow_attr_type type;
u16  size;
u16  priority;
u32  flags;
u8   num_of_specs;
u8   port;
/* Following are the optional layers according to user request
 * struct ib_flow_spec_yyy
 * struct ib_flow_spec_zzz
 */
};

As these specs are eventually coming from user space, they are defined and
used in a way which allows adding new spec types without kernel/user ABI
change, and with a little API enhancement which defines the newly added spec.

The flow spec structures are defined in a TLV (Type-Length-Value) manner,
which allows calling ib_create_flow with a variable-length list of
optional specs.

For the actual processing of ib_flow_attr the driver uses the number of
specs and the size mandatory fields along with the TLV nature of the specs.
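
[Editor's sketch] To make the TLV idea concrete, here is a small user-space
illustration of walking a packed list of variable-length specs using only the
spec count and each spec's own size field. Everything in it is an assumption
made for the example: struct spec_hdr is a simplified stand-in for the common
prefix of the real ib_flow_spec_* structures, and walk_specs() is a
hypothetical name, not a function from this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common prefix shared by every spec in this sketch. */
struct spec_hdr {
	uint32_t type;
	uint16_t size;   /* size of the whole spec, header included */
};

/*
 * Walk `num_of_specs` variable-length specs packed back to back in
 * `buf` (`len` bytes). Calls `visit` for each one, if it is not NULL.
 * Returns 0 on success, or -1 if a spec is truncated or the sizes do
 * not add up to exactly `len`.
 */
static int walk_specs(const unsigned char *buf, size_t len,
		      unsigned int num_of_specs,
		      void (*visit)(const struct spec_hdr *hdr, void *ctx),
		      void *ctx)
{
	size_t off = 0;
	unsigned int i;

	for (i = 0; i < num_of_specs; i++) {
		struct spec_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		if (visit)
			visit(&hdr, ctx);
		/* The length field tells us where the next spec starts. */
		off += hdr.size;
	}
	return off == len ? 0 : -1;
}
```

A consumer that does not recognize a given type can still skip over it,
which is what lets new spec types be added without changing the layout.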

Steering rules processing order is according to the domain over which
the rule is set and the rule priority. All rules set by user space
applications fall into the IB_FLOW_DOMAIN_USER domain, other domains
could be used by future IPoIB RFS and ethtool flow-steering interface
implementation. A lower numerical priority value means higher priority.

The returned value from ib_create_flow is an instance of struct ib_flow
which contains a database pointer (handle) provided by the HW driver
to be used when calling ib_destroy_flow.

Applications that offload TCP/IP traffic could be written also over IB UD QPs.
As such, the ib_create_flow / ib_destroy_flow API is designed to support UD QPs
too, the HW driver sets IB_DEVICE_MANAGED_FLOW_STEERING to denote support
of flow steering.

The ib_flow_attr enum type relates to usage of flow steering for promiscuous
and sniffer purposes:

IB_FLOW_ATTR_NORMAL - regular rule, steering according to rule specification

IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
all Ethernet traffic which isn't steered to any QP

IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for
multicast

IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

ALL_DEFAULT and MC_DEFAULT rules options are valid only for Ethernet link type.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/verbs.c |   27 +
 include/rdma/ib_verbs.h |  121 ++-
 2 files changed, 146 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..87a8102 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1254,3 +1254,30 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+struct ib_flow *ib_create_flow(struct ib_qp *qp,
+  struct ib_flow_attr *flow_attr,
+  int domain)
+{
+   struct ib_flow *flow_id;
+   if (!qp->device->create_flow)
+   return ERR_PTR(-ENOSYS);
+
+   flow_id = qp->device->create_flow(qp, flow_attr, domain);
+   if (!IS_ERR(flow_id))
+   atomic_inc(&qp->usecnt);
+   return flow_id;
+}
+EXPORT_SYMBOL(ib_create_flow);
+
+int ib_destroy_flow(struct ib_flow *flow_id)
+{
+   int err;
+   struct ib_qp *qp = flow_id->qp;
+
+   err = qp->device->destroy_flow(flow_id);
+   if (!err)
+   atomic_dec(&qp->usecnt);
+   return err;
+}
+EXPORT_SYMBOL(ib_destroy_flow);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..1390a0f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,8 @@ enum ib_device_cap_flags {
IB_DEVICE_MEM_MGT_EXTENSIONS = (1<<21),
IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
IB_DEVICE_MEM_WINDOW_TYPE_2A = (1<<23),
-   IB_DEVICE_MEM_WINDOW_TYPE_2B = (1<<24)
+   

[PATCH] IB/qib: fix module level leak

2013-07-03 Thread Mike Marciniszyn
The vzalloc()'ed field physshadow is leaked on module
unload.

This patch adds vfree after the sibling page shadow
is freed.

Reported-by: Dean Luick dean.lu...@intel.com
Reviewed-by: Dean Luick dean.lu...@intel.com
Signed-off-by: Mike Marciniszyn mike.marcinis...@intel.com
---
 drivers/infiniband/hw/qib/qib_init.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index fdae429..36e048e 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -1350,7 +1350,7 @@ static void cleanup_device_data(struct qib_devdata *dd)
if (dd->pageshadow) {
struct page **tmpp = dd->pageshadow;
dma_addr_t *tmpd = dd->physshadow;
-   int i, cnt = 0;
+   int i;
 
for (ctxt = 0; ctxt < dd->cfgctxts; ctxt++) {
int ctxt_tidbase = ctxt * dd->rcvtidcnt;
@@ -1363,13 +1363,13 @@ static void cleanup_device_data(struct qib_devdata *dd)
   PAGE_SIZE, PCI_DMA_FROMDEVICE);
qib_release_user_pages(&tmpp[i], 1);
tmpp[i] = NULL;
-   cnt++;
}
}
 
-   tmpp = dd->pageshadow;
dd->pageshadow = NULL;
vfree(tmpp);
+   dd->physshadow = NULL;
+   vfree(tmpd);
}
 
/*
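
[Editor's sketch] The fix above follows a common cleanup pattern: take a local
copy of each pointer, clear the field in the owning structure, then free
through the local copy, and do so for every sibling allocation. A minimal
user-space analogue is shown below. It is only an illustration: struct
shadow_tables and both function names are hypothetical, and calloc()/free()
stand in for vzalloc()/vfree().

```c
#include <stdlib.h>

/* Hypothetical stand-in for the two parallel shadow arrays in the driver. */
struct shadow_tables {
	void **pageshadow;
	unsigned long *physshadow;
};

/* Allocate both tables; returns 0 on success, -1 on allocation failure. */
static int shadow_init(struct shadow_tables *t, size_t n)
{
	t->pageshadow = calloc(n, sizeof(*t->pageshadow));
	t->physshadow = calloc(n, sizeof(*t->physshadow));
	if (!t->pageshadow || !t->physshadow) {
		free(t->pageshadow);
		free(t->physshadow);
		t->pageshadow = NULL;
		t->physshadow = NULL;
		return -1;
	}
	return 0;
}

/* Detach each pointer from the struct first, then free via the local copy. */
static void shadow_cleanup(struct shadow_tables *t)
{
	void **tmpp = t->pageshadow;
	unsigned long *tmpd = t->physshadow;

	t->pageshadow = NULL;
	free(tmpp);
	t->physshadow = NULL;
	free(tmpd);
}
```

Because the fields are cleared before the memory is released, calling the
cleanup a second time is harmless in this sketch.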



Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread Bart Van Assche

On 07/03/13 19:27, David Dillow wrote:
 On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
  The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine
  in my tests. An I/O failure was detected shortly after the cable to the
  target was pulled. I/O resumed shortly after the cable to the target was
  reinserted.

 Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo
 < 0, and fast_io_fail_tmo >= 0. The other transports do not allow this
 scenario, and I'm asking if it makes sense for SRP to allow it.

 But now that you mention reconnect_delay, what is the meaning of that
 when it is negative? That's not in the documentation. And should it be
 considered in srp_tmo_valid() -- are there values of reconnect_delay
 that cause problems?

None of the combinations that can be configured from user space can
get the kernel into trouble. If reconnect_delay <= 0 that means that the
time-based reconnect mechanism is disabled.

 I'm starting to get a bit concerned about this patch -- can you, Vu, and
 Sebastian comment on the testing you have done?

All combinations of reconnect_delay, fast_io_fail_tmo and dev_loss_tmo
that result in different behavior have been tested.

   Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if
   fast_io_fail_tmo is off; I agree with your reasoning about leaving it
   unlimited if fast fail is on, but does that still hold if it is off?

  I think setting dev_loss_tmo to a large value only makes sense if the
  value of reconnect_delay is not too large. Setting both to a large value
  would result in slow recovery after a transport layer failure has been
  corrected.

 So you agree it should be capped? I can't tell from your response.

Not all combinations of reconnect_delay / fast_io_fail_tmo /
dev_loss_tmo result in useful behavior. It is up to the user to choose a
meaningful combination.


Bart.


Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 20:24 +0200, Bart Van Assche wrote:
 On 07/03/13 19:27, David Dillow wrote:
  On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
  The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine
  in my tests. An I/O failure was detected shortly after the cable to the
  target was pulled. I/O resumed shortly after the cable to the target was
  reinserted.
 
  Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo
  < 0, and fast_io_fail_tmo >= 0. The other transports do not allow this
  scenario, and I'm asking if it makes sense for SRP to allow it.
 
  But now that you mention reconnect_delay, what is the meaning of that
  when it is negative? That's not in the documentation. And should it be
  considered in srp_tmo_valid() -- are there values of reconnect_delay
  that cause problems?
 
 None of the combinations that can be configured from user space can 
 bring the kernel in trouble. If reconnect_delay <= 0 that means that the 
 time-based reconnect mechanism is disabled.

Then it should use the same semantics as the other attributes, and have
the user store "off" to turn it off.

And I'm getting the strong sense that the answer to my question about
fast_io_fail_tmo >= 0 when dev_loss_tmo is off is that we should not allow that
combination, even if it doesn't break the kernel. If it doesn't make
sense, there is no reason to create an opportunity for user confusion.
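
[Editor's sketch] The stricter policy argued for above can be written down as
a small predicate. This is a sketch of the position taken in this message
only, not the srp_tmo_valid() from the patch under review. The function name
tmo_combination_ok() is hypothetical, a negative value is assumed to mean
"off", and the cap of 600 seconds is an assumed stand-in for
SCSI_DEVICE_BLOCK_MAX_TIMEOUT.

```c
#include <stdbool.h>

/* Assumed cap in seconds; stands in for SCSI_DEVICE_BLOCK_MAX_TIMEOUT. */
#define BLOCK_MAX_TIMEOUT 600

/*
 * Hypothetical policy check. A negative value means "off".
 * Returns true if the pair is accepted under the stricter rules
 * discussed in this thread.
 */
static bool tmo_combination_ok(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* Fast I/O fail on while dev_loss_tmo is off: rejected. */
	if (fast_io_fail_tmo >= 0 && dev_loss_tmo < 0)
		return false;
	/* Fast I/O fail off: dev_loss_tmo must stay within the cap. */
	if (fast_io_fail_tmo < 0 && dev_loss_tmo > BLOCK_MAX_TIMEOUT)
		return false;
	return true;
}
```

With these two rules, (15, 60) and (-1, 600) are accepted, while (15, -1)
and (-1, 601) are rejected.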


Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-03 Thread Roland Dreier
On Wed, Jul 3, 2013 at 9:41 AM, Or Gerlitz ogerl...@mellanox.com wrote:
 Jack looked on this comment/code and he says that the active flag is used
 to prevent re-scheduling the timer from inside the timer handling routine.

 In the kernel, the comment header in the source file for del_timer_sync
 explicitly states that re-scheduling the timer must be prevented,
 or the sync is useless: "Callers must prevent restarting of the timer,
 otherwise this function is meaningless"

 So we believe that code should remain.

Look at the actual timer code.  del_timer_sync() won't work if
something unrelated re-adds the timer, but it will work if the timer
itself is what re-adds itself.

Documentation/DocBook/kernel-locking.tmpl says:

  Another common problem is deleting timers which restart
  themselves (by calling <function>add_timer()</function> at the end
  of their timer function).  Because this is a fairly common case
  which is prone to races, you should use
  <function>del_timer_sync()</function>
  (<filename class="headerfile">include/linux/timer.h</filename>)
  to handle this case.  It returns the number of times the timer
  had to be deleted before we finally stopped it from adding itself back
  in.

which pretty clearly says that del_timer_sync() will work in this case.

Or look at the code using it in arch/sparc/kernel/led.c for example
(just one of the first hits in my grep, there are many other
examples).

Not a big deal but I'm pretty sure the flag isn't needed.

 - R.


Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-03 Thread Or Gerlitz
On Wed, Jul 3, 2013 at 10:26 PM, Roland Dreier rol...@kernel.org wrote:
 On Wed, Jul 3, 2013 at 9:41 AM, Or Gerlitz ogerl...@mellanox.com wrote:
  Jack looked on this comment/code and he says that the active flag is used
  to prevent re-scheduling the timer from inside the timer handling routine.
 
  In the kernel, the comment header in the source file for del_timer_sync
  explicitly states that re-scheduling the timer must be prevented,
  or the sync is useless: "Callers must prevent restarting of the timer,
  otherwise this function is meaningless"
 
  So we believe that code should remain.

 Look at the actual timer code.  del_timer_sync() won't work if
 something unrelated re-adds the timer, but it will work if the timer
 itself is what re-adds itself.

[...]

OK, we will look into that again tomorrow. So how does V2 look?

Or.


Re: [PATCH V2 1/9] net/mlx5: Mellanox Connect-IB, core driver part 1/3

2013-07-03 Thread Joe Perches
On Wed, 2013-07-03 at 20:13 +0300, Or Gerlitz wrote:
 From: Eli Cohen e...@mellanox.com

trivial comments:

 diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
[]
 +static const char *deliv_status_to_str(u8 status)
 +{
 + switch (status) {
 + case MLX5_CMD_DELIVERY_STAT_OK:
 + return "no errors";
[]
 + default:
 + return "unknown status code\n";
 + }
 +}

Likely unnecessary newline for default case

 +static struct mlx5_cmd_mailbox *alloc_cmd_box(struct mlx5_core_dev *dev,
 +   gfp_t flags)
 +{
 + struct mlx5_cmd_mailbox *mailbox;
 +
 + mailbox = kmalloc(sizeof(*mailbox), flags);
 + if (!mailbox) {
 + mlx5_core_dbg(dev, "failed allocation\n");
 + return ERR_PTR(-ENOMEM);
 + }

unnecessary OOM message.

 +static void set_wqname(struct mlx5_core_dev *dev)
 +{
 + struct mlx5_cmd *cmd = &dev->cmd;
 +
 + strcpy(cmd->wq_name, "mlx5_cmd_");
 + strcat(cmd->wq_name, dev_name(&dev->pdev->dev));

snprintf would probably be better:

snprintf(cmd->wq_name, sizeof(cmd->wq_name), "mlx5_cmd_%s",
 dev_name(&dev->pdev->dev));

How big is wq_name?  Will a maximum length dev_name always fit?
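
[Editor's sketch] The "will it always fit?" question can be answered at run
time by checking the return value of snprintf(), which reports the length it
wanted to write. The user-space sketch below shows that check. The helper name
build_wq_name() is hypothetical, and the buffer sizes used with it are
arbitrary example values, not the real size of wq_name.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: build "mlx5_cmd_<devname>" into buf.
 * Returns 0 if the whole name fit, -1 if it was truncated or on error.
 */
static int build_wq_name(char *buf, size_t len, const char *devname)
{
	int n = snprintf(buf, len, "mlx5_cmd_%s", devname);

	/* snprintf returns the length it wanted to write, excluding the NUL. */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

Unlike the strcpy()/strcat() pair, a too-long device name here results in a
truncated, still NUL-terminated string and an error return instead of a
buffer overrun.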




[PATCH v3 05/25] infiniband: Change how dentry's d_lock field is accessed

2013-07-03 Thread Waiman Long
Because of the changes made in dcache.h header file, files that
use the d_lock field of the dentry structure need to be changed
accordingly. All the d_lock's spin_lock() and spin_unlock() calls
are replaced by the corresponding d_lock() and d_unlock() calls.
There is no change in logic and everything should just work.

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 drivers/infiniband/hw/ipath/ipath_fs.c |6 +++---
 drivers/infiniband/hw/qib/qib_fs.c |6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_fs.c b/drivers/infiniband/hw/ipath/ipath_fs.c
index e0c404b..1efee26 100644
--- a/drivers/infiniband/hw/ipath/ipath_fs.c
+++ b/drivers/infiniband/hw/ipath/ipath_fs.c
@@ -277,14 +277,14 @@ static int remove_file(struct dentry *parent, char *name)
goto bail;
}
 
-   spin_lock(&tmp->d_lock);
+   d_lock(tmp);
if (!(d_unhashed(tmp) && tmp->d_inode)) {
dget_dlock(tmp);
__d_drop(tmp);
-   spin_unlock(&tmp->d_lock);
+   d_unlock(tmp);
simple_unlink(parent->d_inode, tmp);
} else
-   spin_unlock(&tmp->d_lock);
+   d_unlock(tmp);
 
ret = 0;
 bail:
diff --git a/drivers/infiniband/hw/qib/qib_fs.c b/drivers/infiniband/hw/qib/qib_fs.c
index f247fc6..63713ee 100644
--- a/drivers/infiniband/hw/qib/qib_fs.c
+++ b/drivers/infiniband/hw/qib/qib_fs.c
@@ -454,14 +454,14 @@ static int remove_file(struct dentry *parent, char *name)
goto bail;
}
 
-   spin_lock(&tmp->d_lock);
+   d_lock(tmp);
if (!(d_unhashed(tmp) && tmp->d_inode)) {
dget_dlock(tmp);
__d_drop(tmp);
-   spin_unlock(&tmp->d_lock);
+   d_unlock(tmp);
simple_unlink(parent->d_inode, tmp);
} else {
-   spin_unlock(&tmp->d_lock);
+   d_unlock(tmp);
}
 
ret = 0;
-- 
1.7.1
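
[Editor's sketch] The conversion in this patch replaces direct use of the lock
field with accessor functions, so the lock's representation can change without
touching callers. A user-space analogue of that idea is sketched below. It is
purely illustrative: struct my_dentry is a hypothetical stand-in, the lock is
modelled with a C11 atomic_flag spinlock, and these d_lock()/d_unlock() bodies
are not the ones from the dcache.h changes in this series.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for struct dentry; only the lock is modelled. */
struct my_dentry {
	atomic_flag lock;
	int refcount;
};

/* Accessors: callers never touch the lock field directly. */
static inline void d_lock(struct my_dentry *d)
{
	while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
		;  /* spin until the flag was previously clear */
}

static inline void d_unlock(struct my_dentry *d)
{
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Example caller written against the accessors only. */
static int d_get_locked(struct my_dentry *d)
{
	int ref;

	d_lock(d);
	ref = ++d->refcount;
	d_unlock(d);
	return ref;
}
```

If the lock were later folded into another field, only the two accessors
would need to change, which is the motivation given in the patch description.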



Re: [PATCH V2 5/9] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5

2013-07-03 Thread Joe Perches
On Wed, 2013-07-03 at 20:13 +0300, Or Gerlitz wrote:
 From: Eli Cohen e...@mellanox.com

more trivia:

 diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
[]
 +struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr,
 +struct mlx5_ib_ah *ah)
 +{
 + u32 sgi;

sgi is used once here and looks more confusing than helpful

 +
 + if (ah_attr->ah_flags & IB_AH_GRH) {
 + sgi = ah_attr->grh.sgid_index << 20;
 +
 + memcpy(ah->av.rgid, &ah_attr->grh.dgid, 16);
 + ah->av.grh_gid_fl = cpu_to_be32(ah_attr->grh.flow_label |
 + (1 << 30) | sgi);
 + ah->av.hop_limit = ah_attr->grh.hop_limit;
 + ah->av.tclass = ah_attr->grh.traffic_class;
 + }
 +
 + ah->av.rlid = cpu_to_be16(ah_attr->dlid);
 + ah->av.fl_mlid = ah_attr->src_path_bits & 0x7f;
 + ah->av.stat_rate_sl = (ah_attr->static_rate << 4) | (ah_attr->sl & 0xf);
 +
 + return &ah->ibah;
 +}

[]

 +static void *get_sw_cqe(struct mlx5_ib_cq *cq, int n)
 +{
 + void *cqe = get_cqe(cq, n & cq->ibcq.cqe);
 + struct mlx5_cqe64 *cqe64;
 +
 + cqe64 = (cq->mcq.cqe_sz == 64) ? cqe : cqe + 64;
 + return ((cqe64->op_own & MLX5_CQE_OWNER_MASK) ^
 + !!(n & (cq->ibcq.cqe + 1))) ? NULL : cqe;

I think foo ^ !!bar is excessively tricky.
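
For illustration only (not part of the posted patch): one way to spell out the ownership test without mixing a masked value with a normalized boolean. This is a minimal userspace sketch. It assumes, as the quoted code suggests, that the ring mask is the number of entries minus one and that the owner bit is expected to match the wrap parity of the index. OWNER_MASK and cqe_is_sw_owned() are hypothetical names, not driver symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's owner-bit mask. */
#define OWNER_MASK 0x1u

/*
 * Return true when the entry's owner bit matches the wrap parity of
 * index n. ring_mask is (number of entries - 1), where the number of
 * entries is a power of two, so (ring_mask + 1) is the single bit that
 * flips each time the index wraps around the ring.
 */
static bool cqe_is_sw_owned(uint8_t op_own, unsigned int n,
			    unsigned int ring_mask)
{
	bool owner_bit = (op_own & OWNER_MASK) != 0;
	bool wrap_parity = (n & (ring_mask + 1)) != 0;

	return owner_bit == wrap_parity;
}
```

Comparing two booleans keeps working if the mask ever stops being bit 0, which the XOR form silently depends on.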

 +static enum ib_wc_opcode get_umr_comp(struct mlx5_ib_wq *wq, int idx)
 +{

 + pr_warn("unkonwn completion status\n");

unknown tyop

[]

 +static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 +   struct ib_ucontext *context, struct mlx5_ib_cq *cq,
 +   int entries, struct mlx5_create_cq_mbox_in **cqb,
 +   int *cqe_size, int *index, int *inlen)
[]
 + *inlen = sizeof **cqb + sizeof *(*cqb)->pas * ncont;

sizeof always uses parentheses
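
For illustration only: the same length computation written with parenthesized sizeof operands. This is a userspace sketch. struct mbox_in and mbox_in_len() are hypothetical stand-ins for the driver's mailbox type, and the header layout is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox layout: fixed header followed by page addresses. */
struct mbox_in {
	uint32_t hdr[4];
	uint64_t pas[];		/* flexible array member */
};

/* Bytes needed for a mailbox carrying ncont page addresses. */
static size_t mbox_in_len(size_t ncont)
{
	/* Only used as a sizeof operand, so it is never dereferenced. */
	struct mbox_in *in = NULL;

	return sizeof(*in) + sizeof(*in->pas) * ncont;
}
```

Taking sizeof of the pointed-to object rather than of a type name keeps the expression correct if the element type changes later.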

 + *cqb = vzalloc(*inlen);

Perhaps you may be using vzalloc too often.

Maybe you should have a helper allocating either
from kmalloc or vmalloc as necessary based on size.
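
For illustration only: a userspace model of such a helper. In the kernel the two branches would presumably call kzalloc() and vzalloc(), and the free path would pick kfree() or vfree(). Here calloc() and free() stand in for both so the sketch can be compiled and run anywhere. SMALL_ALLOC_MAX, struct buf, buf_zalloc() and buf_free() are hypothetical names, and the cutoff value is an assumption.

```c
#include <stdlib.h>

/* Hypothetical cutoff; a kernel version might use PAGE_SIZE instead. */
#define SMALL_ALLOC_MAX 4096

enum alloc_kind { ALLOC_SMALL, ALLOC_LARGE };

/* Remembers which allocator produced ptr so the free path can match it. */
struct buf {
	void *ptr;
	enum alloc_kind kind;
};

/* Zeroed allocation that picks the allocator from the requested size. */
static struct buf buf_zalloc(size_t size)
{
	struct buf b;

	b.kind = size <= SMALL_ALLOC_MAX ? ALLOC_SMALL : ALLOC_LARGE;
	/* calloc() stands in for kzalloc() (small) and vzalloc() (large). */
	b.ptr = calloc(1, size);
	return b;
}

/* free() stands in for kfree() (small) and vfree() (large). */
static void buf_free(struct buf *b)
{
	free(b->ptr);
	b->ptr = NULL;
}
```

Callers then see a single allocate/free pair, and small mailboxes no longer pay for a vmalloc mapping.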




Re: [PATCH V2 7/9] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5

2013-07-03 Thread Joe Perches
On Wed, 2013-07-03 at 20:13 +0300, Or Gerlitz wrote:
 From: Eli Cohen e...@mellanox.com

More trivia:

 diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
 b/drivers/infiniband/hw/mlx5/mlx5_ib.h
[]
 +#define mlx5_ib_dbg(dev, format, arg...) \
 +do { \
 + pr_debug("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name,  \
 +  __func__, __LINE__, current->pid, ##arg);  \
 +} while (0)

unnecessary do {} while (0)
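
For illustration only: a macro that expands to a single function-call expression is already safe in an unbraced if/else, so it needs no do { } while (0) wrapper. This is a userspace sketch with fprintf() standing in for pr_debug(). dev_dbg_msg() and demo() are hypothetical names, and `##__VA_ARGS__` is the GNU C extension accepted by gcc and clang.

```c
#include <stdio.h>

/* One expression, no wrapper: the caller supplies the trailing ';'. */
#define dev_dbg_msg(name, format, ...)					\
	fprintf(stderr, "%s:%s:%d: " format, (name), __func__,		\
		__LINE__, ##__VA_ARGS__)

/* The macro sits in an unbraced if/else and the else still binds. */
static int demo(int verbose)
{
	if (verbose)
		dev_dbg_msg("mlx5_0", "value=%d\n", 42);
	else
		return 0;
	return 1;
}
```

The wrapper only becomes necessary once the macro body contains more than one statement.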

 +static void clean_keys(struct mlx5_ib_dev *dev, int c)
 +{
 + struct device *ddev = dev->ib_dev.dma_device;
 + struct mlx5_mr_cache *cache = &dev->cache;
 + struct mlx5_cache_ent *ent = &cache->ent[c];
 + struct mlx5_ib_mr *mr;
 + int size;
 + int err;
 +
 + while (1) {
 + spin_lock(&ent->lock);
 + if (list_empty(&ent->head)) {
 + spin_unlock(&ent->lock);
 + return;
 + }
 + mr = list_first_entry(&ent->head, struct mlx5_ib_mr, list);
 + list_del(&mr->list);
 + ent->cur--;
 + ent->size--;
 + spin_unlock(&ent->lock);
 + err = mlx5_core_destroy_mkey(&dev->mdev, &mr->mmr);
 + if (err) {
 + mlx5_ib_warn(dev, "failed destroy mkey\n");

Are you leaking anything here by not freeing?

 + } else {
 + size = ALIGN(sizeof(u64) * (1 << mr->order), 0x40);
 + dma_unmap_single(ddev, mr->dma, size, DMA_TO_DEVICE);
 + kfree(mr->pas);
 + kfree(mr);
 + }
 + };
 +}

 +static struct mlx5_ib_mr *reg_create(struct ib_pd *pd, u64 virt_addr,
 +  u64 length, struct ib_umem *umem,
 +  int npages, int page_shift,
 +  int access_flags)
 +{
[]
 + mr = kzalloc(sizeof(*mr), GFP_KERNEL);
 + if (!mr) {
 + mlx5_ib_warn(dev, "allocation failed\n");

Another unnecessary OOM

 + mr = ERR_PTR(-ENOMEM);
 + }
 +
 + inlen = sizeof(*in) + sizeof(*in->pas) * ((npages + 1) / 2) * 2;
 + in = vzalloc(inlen);
 + if (!in) {
 + mlx5_ib_warn(dev, "alloc failed\n");

here too.

 + err = -ENOMEM;
 + goto err_1;
 + }




Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread Vu Pham

David Dillow wrote:

On Wed, 2013-07-03 at 20:24 +0200, Bart Van Assche wrote:
  

On 07/03/13 19:27, David Dillow wrote:


On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
  

The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine
in my tests. An I/O failure was detected shortly after the cable to the
target was pulled. I/O resumed shortly after the cable to the target was
reinserted.


Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo
< 0, and fast_io_fail_tmo >= 0. The other transports do not allow this
scenario, and I'm asking if it makes sense for SRP to allow it.

But now that you mention reconnect_delay, what is the meaning of that
when it is negative? That's not in the documentation. And should it be
considered in srp_tmo_valid() -- are there values of reconnect_delay
that cause problems?
  
None of the combinations that can be configured from user space can
get the kernel into trouble. If reconnect_delay <= 0 that means that the
time-based reconnect mechanism is disabled.



Then it should use the same semantics as the other attributes, and have
the user store "off" to turn it off.

And I'm getting the strong sense that the answer to my question about
fast_io_fail_tmo >= 0 when dev_loss_tmo is off is that we should not allow
that combination, even if it doesn't break the kernel. If it doesn't make
sense, there is no reason to create an opportunity for user confusion.
  

Hello Dave,

When dev_loss_tmo expires, SRP not only removes the rport but also
removes the associated scsi_host.
One may wish to set fast_io_fail_tmo >= 0 so that I/Os fail over quickly
to other paths, and set dev_loss_tmo to off to keep the scsi_host around
until the target comes back.


-vu


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-03 Thread Jeff Squyres (jsquyres)
Bump.

On Jul 2, 2013, at 8:31 AM, Jeff Squyres jsquy...@cisco.com wrote:

 (Previous patch did not include updates for the man pages)
 
 Keep IBV_MTU_* enums values as they are, but pass MTU values around as
 a struct containing a single int.  
 
 Per lengthy discussion on the linux-rdma list, this patch introduces a
 source code incompatibility.  Although legacy applications can
 continue to use the enum values, they will need to be updated to use
 the struct.  Newer applications are encouraged to use arbitrary int
 values, not the MTU enums (e.g., 1024, 1500, 9000).
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 Makefile.am                |  3 +-
 examples/devinfo.c         | 20 +++--
 examples/pingpong.c        | 12 
 examples/pingpong.h        |  1 -
 examples/rc_pingpong.c     | 10 +++
 examples/srq_pingpong.c    | 10 +++
 examples/uc_pingpong.c     | 10 +++
 examples/ud_pingpong.c     |  2 +-
 include/infiniband/verbs.h | 61 +--
 man/ibv_modify_qp.3        |  2 +-
 man/ibv_mtu_to_num.3       | 71 ++
 man/ibv_query_port.3       |  4 +--
 man/ibv_query_qp.3         |  2 +-
 src/cmd.c                  |  8 +++---
 src/marshall.c             |  2 +-
 15 files changed, 160 insertions(+), 58 deletions(-)
 create mode 100644 man/ibv_mtu_to_num.3
 
 diff --git a/Makefile.am b/Makefile.am
 index 40e83be..1159e55 100644
 --- a/Makefile.am
 +++ b/Makefile.am
 @@ -54,7 +54,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1 \
 man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \
 man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3  \
 man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3 \
 -man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3
 +man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3  \
 +man/ibv_mtu_to_num.3
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
 debian/ibverbs-utils.install debian/libibverbs1.install \
 diff --git a/examples/devinfo.c b/examples/devinfo.c
 index ff078e4..e8fb27e 100644
 --- a/examples/devinfo.c
 +++ b/examples/devinfo.c
 @@ -111,18 +111,6 @@ static const char *atomic_cap_str(enum ibv_atomic_cap atom_cap)
   }
 }
 
 -static const char *mtu_str(enum ibv_mtu max_mtu)
 -{
 - switch (max_mtu) {
 - case IBV_MTU_256:  return "256";
 - case IBV_MTU_512:  return "512";
 - case IBV_MTU_1024: return "1024";
 - case IBV_MTU_2048: return "2048";
 - case IBV_MTU_4096: return "4096";
 - default:   return "invalid MTU";
 - }
 -}
 -
 static const char *width_str(uint8_t width)
 {
   switch (width) {
 @@ -301,10 +289,10 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
   printf("\t\tport:\t%d\n", port);
   printf("\t\t\tstate:\t\t\t%s (%d)\n",
  port_state_str(port_attr.state), port_attr.state);
 - printf("\t\t\tmax_mtu:\t\t%s (%d)\n",
 -mtu_str(port_attr.max_mtu), port_attr.max_mtu);
 - printf("\t\t\tactive_mtu:\t\t%s (%d)\n",
 -mtu_str(port_attr.active_mtu), port_attr.active_mtu);
 + printf("\t\t\tmax_mtu:\t\t%d (%d)\n",
 +ibv_mtu_to_num(port_attr.max_mtu), port_attr.max_mtu.mtu);
 + printf("\t\t\tactive_mtu:\t\t%d (%d)\n",
 + ibv_mtu_to_num(port_attr.active_mtu), port_attr.active_mtu.mtu);
   printf("\t\t\tsm_lid:\t\t\t%d\n", port_attr.sm_lid);
   printf("\t\t\tport_lid:\t\t%d\n", port_attr.lid);
   printf("\t\t\tport_lmc:\t\t0x%02x\n", port_attr.lmc);
 diff --git a/examples/pingpong.c b/examples/pingpong.c
 index 90732ef..d1c22c9 100644
 --- a/examples/pingpong.c
 +++ b/examples/pingpong.c
 @@ -36,18 +36,6 @@
 #include <stdio.h>
 #include <string.h>
 
 -enum ibv_mtu pp_mtu_to_enum(int mtu)
 -{
 - switch (mtu) {
 - case 256:  return IBV_MTU_256;
 - case 512:  return IBV_MTU_512;
 - case 1024: return IBV_MTU_1024;
 - case 2048: return IBV_MTU_2048;
 - case 4096: return IBV_MTU_4096;
 - default:   return -1;
 - }
 -}
 -
 uint16_t pp_get_local_lid(struct ibv_context *context, int port)
 {
   struct ibv_port_attr attr;
 diff --git a/examples/pingpong.h b/examples/pingpong.h
 index 9cdc03e..91d217b 100644
 --- a/examples/pingpong.h
 +++ b/examples/pingpong.h
 @@ -35,7 +35,6 @@
 
 #include <infiniband/verbs.h>
 
 -enum ibv_mtu pp_mtu_to_enum(int mtu);
 uint16_t pp_get_local_lid(struct ibv_context *context, int port);
 int pp_get_port_info(struct ibv_context *context, int port,
struct ibv_port_attr *attr);
 diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
 index 15494a1..a7e1836 100644
 --- a/examples/rc_pingpong.c
 +++ b/examples/rc_pingpong.c
 @@ -78,7 +78,7 @@ struct pingpong_dest {
 };
 
 static