Re: PATCH: opensm enhancements
Hi Jeff,

On 6/26/2013 5:24 PM, Jeff Becker wrote:
> Hi Hal. At the OFA workshop, I mentioned that I've been working on some
> modifications to opensm that we use at NASA. Following extensive testing
> of these applied to opensm 3.3.13 (the version we run here), I have
> ported these to top of tree opensm, and have tested them on a small
> cluster.

Thanks for getting this done! For future reference, patches should be
sent as plain text as this makes it easier to comment.

> The first patch modifies the console logflush command to take on or off
> as an argument for toggling.

Thanks. Applied.

> The second (more extensive) patch adds a command line option to specify
> a file in which each line contains a switch GUID/port pair to be ignored
> by opensm. The idea is to specify this file when you start opensm (it
> can be empty), and add ports to ignore (one per line for each end of a
> connection) to the file. At the next heavy sweep (or HUP) the sm will
> reprogram the forwarding tables without including the ignored links. We
> use this for replacing cables, as well as for system expansion (adding
> new racks).

I'll comment on this one later.

-- Hal

> Please let me know if you have any questions/issues with these. Thanks.
>
> -jeff

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/13] IB SRP initiator patches for kernel 3.11
The purpose of this InfiniBand SRP initiator patch series is as follows:
- Make the SRP initiator driver better suited for use in a H.A. setup.
  Add fast_io_fail_tmo and dev_loss_tmo parameters. These can be used
  either to speed up failover or to avoid device removal when e.g. using
  initiator side mirroring.
- Make the SRP initiator better suited for use on NUMA systems by
  making the HCA completion vector configurable.

Changes since the v2 of the "IB SRP initiator patches for kernel 3.11"
patch series:
- Improved documentation of the newly added sysfs parameters.
- Limit fast_io_fail_tmo to SCSI_DEVICE_BLOCK_MAX_TIMEOUT.
- Simplified the code for parsing values written into sysfs attributes.
- Fixed a potential deadlock in the code added in scsi_transport_srp
  (invoking cancel_delayed_work() with the rport mutex held for work
  that needs the rport mutex itself).
- Changed the default retry count back from 2 to 7 since there is not
  yet agreement about this change.
- Dropped the patch that silences failed SCSI commands and also the
  patch that fixes a race between srp_queuecommand() and srp_claim_req()
  since there is no agreement about these patches.

Changes since the v1 of the "IB SRP initiator patches for kernel 3.11"
patch series:
- scsi_transport_srp: Allowed both fast_io_fail and dev_loss timeouts
  to be disabled.
- scsi_transport_srp, srp_reconnect_rport(): switched from
  scsi_block_requests() to scsi_target_block() for blocking SCSI command
  processing temporarily.
- scsi_transport_srp, srp_start_tl_fail_timers(): only block SCSI device
  command processing if the fast_io_fail timer is enabled.
- Changed srp_abort() such that upon transport offline the value
  FAST_IO_FAIL is returned instead of SUCCESS.
- Fixed a race condition in the "maintain single connection" patch: a
  new login after removal had started but before removal had finished
  still could create a duplicate connection. Fixed this by deferring
  removal from the target list until removal has finished.
- Modified the error message in the same patch for reporting that a
  duplicate connection has been rejected.
- Modified patch 2/15 such that all possible race conditions with
  srp_claim_req() are addressed.
- Documented the comp_vector and tl_retry_count login string parameters.
- Updated dev_loss_tmo and fast_io_fail_tmo documentation - mentioned
  "off" is a valid choice.

Changes compared to v5 of the "Make ib_srp better suited for H.A.
purposes" patch series:
- Left out patches that are already upstream.
- Made it possible to set dev_loss_tmo to "off". This is useful in a
  setup using initiator side mirroring to avoid that new /dev/sd* names
  are reassigned after a failover or cable pull and reinsert.
- Added kernel module parameters to ib_srp for configuring default
  values of the fast_io_fail_tmo and dev_loss_tmo parameters.
- Added a patch from Dotan Barak that fixes a kernel oops during rmmod
  triggered by resource allocation failure at module load time.
- Avoid duplicate connections by refusing relogins instead of dropping
  duplicate connections, as proposed by Sebastian Riemer.
- Added a patch from Sebastian Riemer for failing SCSI commands
  silently.
- Added a patch from Vu Pham to make the transport layer (IB RC) retry
  count configurable.
- Made HCA completion vector configurable.

Changes since v4:
- Added a patch for removing SCSI devices upon a port down event

Changes since v3:
- Restored the dev_loss_tmo and fast_io_fail_tmo sysfs attributes.
- Included a patch to fix an ib_srp crash that could be triggered by
  cable pulling.

Changes since v2:
- Addressed the v2 review comments.
- Dropped the patches that have already been merged.
- Dropped the patches for integration with multipathd.
- Dropped the micro-optimization of the IB completion handlers.

The individual patches in this series are as follows:
0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch
0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch
0003-IB-srp-Avoid-that-srp_reset_host-is-skipped-after-a-.patch
0004-IB-srp-Fail-I-O-fast-if-target-offline.patch
0005-IB-srp-Skip-host-settle-delay.patch
0006-IB-srp-Maintain-a-single-connection-per-I_T-nexus.patch
0007-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
0008-scsi_transport_srp-Add-transport-layer-error-handlin.patch
0009-IB-srp-Add-srp_terminate_io.patch
0010-IB-srp-Use-SRP-transport-layer-error-recovery.patch
0011-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
0012-IB-srp-Fail-SCSI-commands-silently.patch
0013-IB-srp-Make-HCA-completion-vector-configurable.patch
0014-IB-srp-Make-transport-layer-retry-count-configurable.patch
0015-IB-srp-Bump-driver-version-and-release-date.patch
[PATCH v3 01/13] IB/srp: Fix remove_one crash due to resource exhaustion
From: Dotan Barak <dot...@dev.mellanox.co.il>

If the add_one callback fails during driver load no resources are
allocated so there isn't a need to release any resources. Trying
to clean the resource may lead to the following kernel panic:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [a0132331] srp_remove_one+0x31/0x240 [ib_srp]
RIP: 0010:[a0132331]  [a0132331] srp_remove_one+0x31/0x240 [ib_srp]
Process rmmod (pid: 4562, threadinfo 8800dd738000, task 8801167e60c0)
Call Trace:
 [a024500e] ib_unregister_client+0x4e/0x120 [ib_core]
 [a01361bd] srp_cleanup_module+0x15/0x71 [ib_srp]
 [810ac6a4] sys_delete_module+0x194/0x260
 [8100b0f2] system_call_fastpath+0x16/0x1b

[bvanassche: Shortened patch description]
Signed-off-by: Dotan Barak <dot...@dev.mellanox.co.il>
Reviewed-by: Eli Cohen <e...@mellanox.co.il>
Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Acked-by: Sebastian Riemer <sebastian.rie...@profitbricks.com>
Acked-by: David Dillow <dillo...@ornl.gov>
Cc: Roland Dreier <rol...@purestorage.com>
Cc: Vu Pham <v...@mellanox.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 7ccf328..368d160 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2507,6 +2507,8 @@ static void srp_remove_one(struct ib_device *device)
 	struct srp_target_port *target;
 
 	srp_dev = ib_get_client_data(device, &srp_client);
+	if (!srp_dev)
+		return;
 
 	list_for_each_entry_safe(host, tmp_host, &srp_dev->dev_list, list) {
 		device_unregister(&host->dev);
-- 
1.7.10.4
[PATCH v3 02/13] IB/srp: Avoid that srp_reset_host() is skipped after a TL error
The SCSI error handler assumes that the transport layer is
operational if an eh_abort_handler() returns SUCCESS. Hence let
srp_abort() only return SUCCESS if sending the ABORT TASK task
management function succeeded. This patch avoids that the SCSI
error handler skips the srp_reset_host() call after a transport
layer error.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Acked-by: David Dillow <dillo...@ornl.gov>
Cc: Roland Dreier <rol...@purestorage.com>
Cc: Vu Pham <v...@mellanox.com>
Cc: Sebastian Riemer <sebastian.rie...@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 368d160..0e0a5a2 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1744,18 +1744,22 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
 	struct srp_request *req = (struct srp_request *) scmnd->host_scribble;
+	int ret;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n");
 
 	if (!req || !srp_claim_req(target, req, scmnd))
 		return FAILED;
-	srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
-			  SRP_TSK_ABORT_TASK);
+	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
+			      SRP_TSK_ABORT_TASK) == 0)
+		ret = SUCCESS;
+	else
+		ret = FAILED;
 	srp_free_req(target, req, scmnd, 0);
 	scmnd->result = DID_ABORT << 16;
 	scmnd->scsi_done(scmnd);
 
-	return SUCCESS;
+	return ret;
 }
 
 static int srp_reset_device(struct scsi_cmnd *scmnd)
-- 
1.7.10.4
[PATCH v3 03/13] IB/srp: Fail I/O fast if target offline
If reconnecting failed we know that no command completion will
be received anymore. Hence let the SCSI error handler fail such
commands immediately.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Acked-by: David Dillow <dillo...@ornl.gov>
Acked-by: Sebastian Riemer <sebastian.rie...@profitbricks.com>
Cc: Roland Dreier <rol...@purestorage.com>
Cc: Vu Pham <v...@mellanox.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 0e0a5a2..19279e5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1753,6 +1753,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
 			      SRP_TSK_ABORT_TASK) == 0)
 		ret = SUCCESS;
+	else if (target->transport_offline)
+		ret = FAST_IO_FAIL;
 	else
 		ret = FAILED;
 	srp_free_req(target, req, scmnd, 0);
-- 
1.7.10.4
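Taken together, patches 02/13 and 03/13 give srp_abort() a three-way result. The following is a minimal userspace sketch of that decision only, not the driver code: the enum values and the function name abort_result() are hypothetical stand-ins, and the real SUCCESS, FAILED and FAST_IO_FAIL constants come from the kernel's SCSI headers.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SCSI error handler return codes. */
enum eh_result { EH_SUCCESS, EH_FAILED, EH_FAST_IO_FAIL };

/*
 * tmf_status:        0 if sending the ABORT TASK task management
 *                    function succeeded, nonzero otherwise.
 * transport_offline: true if reconnecting to the target has failed.
 */
enum eh_result abort_result(int tmf_status, bool transport_offline)
{
	if (tmf_status == 0)
		return EH_SUCCESS;	/* transport layer is operational */
	if (transport_offline)
		return EH_FAST_IO_FAIL;	/* no completion will ever arrive */
	return EH_FAILED;		/* let the error handler escalate */
}
```

Only the first branch tells the error handler that the transport works, which is why a failed abort can no longer cause srp_reset_host() to be skipped.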
[PATCH v3 04/13] IB/srp: Skip host settle delay
The SRP initiator implements host reset by reconnecting to the SRP
target. That means that communication with the target is possible
as soon as host reset finished. Hence skip the host settle delay.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Acked-by: David Dillow <dillo...@ornl.gov>
Cc: Roland Dreier <rol...@purestorage.com>
Cc: Vu Pham <v...@mellanox.com>
Cc: Sebastian Riemer <sebastian.rie...@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 19279e5..2c82b90 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1952,6 +1952,7 @@ static struct scsi_host_template srp_template = {
 	.eh_abort_handler		= srp_abort,
 	.eh_device_reset_handler	= srp_reset_device,
 	.eh_host_reset_handler		= srp_reset_host,
+	.skip_settle_delay		= true,
 	.sg_tablesize			= SRP_DEF_SG_TABLESIZE,
 	.can_queue			= SRP_CMD_SQ_SIZE,
 	.this_id			= -1,
-- 
1.7.10.4
[PATCH v3 05/13] IB/srp: Maintain a single connection per I_T nexus
An SRP target is required to maintain a single connection between
initiator and target. This means that if the 'add_target' attribute
is used to create a second connection to a target that the first
connection will be logged out and that the SCSI error handler will
kick in. The SCSI error handler will cause the SRP initiator to
reconnect, which will cause I/O over the second connection to fail.
Avoid such ping-pong behavior by disabling relogins.

Note: if reconnecting manually is necessary, that is possible by
deleting and recreating an rport via sysfs.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Signed-off-by: Sebastian Riemer <sebastian.rie...@profitbricks.com>
Acked-by: David Dillow <dillo...@ornl.gov>
Cc: Roland Dreier <rol...@kernel.org>
Cc: Vu Pham <v...@mellanox.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   44 +--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 2c82b90..f046e32 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -542,11 +542,11 @@ static void srp_remove_work(struct work_struct *work)
 
 	WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
 
+	srp_remove_target(target);
+
 	spin_lock(&target->srp_host->target_lock);
 	list_del(&target->list);
 	spin_unlock(&target->srp_host->target_lock);
-
-	srp_remove_target(target);
 }
 
 static void srp_rport_delete(struct srp_rport *rport)
@@ -2008,6 +2008,36 @@ static struct class srp_class = {
 	.dev_release = srp_release_dev
 };
 
+/**
+ * srp_conn_unique() - check whether the connection to a target is unique
+ */
+static bool srp_conn_unique(struct srp_host *host,
+			    struct srp_target_port *target)
+{
+	struct srp_target_port *t;
+	bool ret = false;
+
+	if (target->state == SRP_TARGET_REMOVED)
+		goto out;
+
+	ret = true;
+
+	spin_lock(&host->target_lock);
+	list_for_each_entry(t, &host->target_list, list) {
+		if (t != target &&
+		    target->id_ext == t->id_ext &&
+		    target->ioc_guid == t->ioc_guid &&
+		    target->initiator_ext == t->initiator_ext) {
+			ret = false;
+			break;
+		}
+	}
+	spin_unlock(&host->target_lock);
+
+out:
+	return ret;
+}
+
 /*
  * Target ports are added by writing
  *
@@ -2264,6 +2294,16 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err;
 
+	if (!srp_conn_unique(target->srp_host, target)) {
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "Already connected to target port with id_ext=%016llx;ioc_guid=%016llx;initiator_ext=%016llx\n",
+			     be64_to_cpu(target->id_ext),
+			     be64_to_cpu(target->ioc_guid),
+			     be64_to_cpu(target->initiator_ext));
+		ret = -EEXIST;
+		goto err;
+	}
+
 	if (!host->srp_dev->fmr_pool && !target->allow_ext_sg &&
 	    target->cmd_sg_cnt < target->sg_tablesize) {
 		pr_warn("No FMR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
-- 
1.7.10.4
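The uniqueness test that patch 05/13 adds can be tried outside the kernel. The sketch below assumes a plain array in place of the host's target list and leaves out the target_lock spinlock; struct target and conn_unique() are hypothetical, simplified stand-ins for struct srp_target_port and srp_conn_unique().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct srp_target_port. */
struct target {
	uint64_t id_ext;
	uint64_t ioc_guid;
	uint64_t initiator_ext;
	bool removed;		/* models state == SRP_TARGET_REMOVED */
};

/*
 * Return true if no other entry in targets[] carries the same
 * (id_ext, ioc_guid, initiator_ext) triple as the candidate.
 * A candidate that is already being removed is never unique.
 */
bool conn_unique(const struct target *targets, size_t n,
		 const struct target *candidate)
{
	size_t i;

	if (candidate->removed)
		return false;

	for (i = 0; i < n; i++) {
		const struct target *t = &targets[i];

		if (t != candidate &&
		    t->id_ext == candidate->id_ext &&
		    t->ioc_guid == candidate->ioc_guid &&
		    t->initiator_ext == candidate->initiator_ext)
			return false;
	}
	return true;
}
```

The `t != candidate` comparison matters because the check may run on a target that is itself on the list; an entry must not count as its own duplicate.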
[PATCH v3 06/13] IB/srp: Keep rport as long as the IB transport layer
Keep the rport data structure around after srp_remove_host() has
finished until cleanup of the IB transport layer has finished
completely. This is necessary because later patches use the rport
pointer inside the queuecommand callback. Without this patch
accessing the rport from inside a queuecommand callback is racy
because srp_remove_host() must be invoked before scsi_remove_host()
and because the queuecommand callback may get invoked after
srp_remove_host() has finished.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Cc: Roland Dreier <rol...@purestorage.com>
Cc: James Bottomley <jbottom...@parallels.com>
Cc: David Dillow <dillo...@ornl.gov>
Cc: Vu Pham <v...@mellanox.com>
Cc: Sebastian Riemer <sebastian.rie...@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    3 +++
 drivers/infiniband/ulp/srp/ib_srp.h |    1 +
 drivers/scsi/scsi_transport_srp.c   |   18 ++
 include/scsi/scsi_transport_srp.h   |    2 ++
 4 files changed, 24 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index f046e32..f65701d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -526,11 +526,13 @@ static void srp_remove_target(struct srp_target_port *target)
 	WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
 
 	srp_del_scsi_host_attr(target->scsi_host);
+	srp_rport_get(target->rport);
 	srp_remove_host(target->scsi_host);
 	scsi_remove_host(target->scsi_host);
 	srp_disconnect_target(target);
 	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
+	srp_rport_put(target->rport);
 	srp_free_req_data(target);
 	scsi_host_put(target->scsi_host);
 }
@@ -1982,6 +1984,7 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 	}
 
 	rport->lld_data = target;
+	target->rport = rport;
 
 	spin_lock(&host->target_lock);
 	list_add_tail(&target->list, &host->target_list);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 66fbedd..1817ed5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -153,6 +153,7 @@ struct srp_target_port {
 	u16			io_class;
 	struct srp_host	       *srp_host;
 	struct Scsi_Host       *scsi_host;
+	struct srp_rport       *rport;
 	char			target_name[32];
 	unsigned int		scsi_id;
 	unsigned int		sg_tablesize;
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f379c7f..f7ba94a 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -185,6 +185,24 @@ static int srp_host_match(struct attribute_container *cont, struct device *dev)
 }
 
 /**
+ * srp_rport_get() - increment rport reference count
+ */
+void srp_rport_get(struct srp_rport *rport)
+{
+	get_device(&rport->dev);
+}
+EXPORT_SYMBOL(srp_rport_get);
+
+/**
+ * srp_rport_put() - decrement rport reference count
+ */
+void srp_rport_put(struct srp_rport *rport)
+{
+	put_device(&rport->dev);
+}
+EXPORT_SYMBOL(srp_rport_put);
+
+/**
  * srp_rport_add - add a SRP remote port to the device hierarchy
  * @shost:	scsi host the remote port is connected to.
  * @ids:	The port id for the remote port.
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index ff0f04a..5a2d2d1 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -38,6 +38,8 @@ extern struct scsi_transport_template *
 srp_attach_transport(struct srp_function_template *);
 extern void srp_release_transport(struct scsi_transport_template *);
 
+extern void srp_rport_get(struct srp_rport *rport);
+extern void srp_rport_put(struct srp_rport *rport);
 extern struct srp_rport *srp_rport_add(struct Scsi_Host *,
 				       struct srp_rport_identifiers *);
 extern void srp_rport_del(struct srp_rport *);
-- 
1.7.10.4
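The get/put pairing in patch 06/13 is the usual "hold an extra reference across teardown" idiom. Here is a single-threaded sketch of the idea, assuming a plain integer counter instead of the device reference count that get_device() and put_device() manage; struct obj and all function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical reference-counted object standing in for an rport. */
struct obj {
	int refcount;
	bool released;	/* set once the last reference is dropped */
};

void obj_get(struct obj *o)
{
	o->refcount++;
}

void obj_put(struct obj *o)
{
	if (--o->refcount == 0)
		o->released = true;	/* a real object would be freed here */
}

/*
 * Models the ordering in srp_remove_target(): take a reference, let the
 * "remove host" step drop the original one, do the remaining transport
 * cleanup while the object is still valid, then drop the extra reference.
 */
void teardown(struct obj *rport, bool *alive_during_cleanup)
{
	obj_get(rport);		/* like srp_rport_get() */
	obj_put(rport);		/* reference dropped by the remove-host step */
	*alive_during_cleanup = !rport->released;
	obj_put(rport);		/* like srp_rport_put() */
}
```

Without the initial obj_get(), the object would already be released when the cleanup step runs, which is the race the commit message describes.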
[PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
Add the necessary functions in the SRP transport module to allow
an SRP initiator driver to implement transport layer error handling
similar to the functionality already provided by the FC transport
layer. This includes:
- Support for implementing fast_io_fail_tmo, the time that should
  elapse after having detected a transport layer problem and
  before failing I/O.
- Support for implementing dev_loss_tmo, the time that should
  elapse after having detected a transport layer problem and
  before removing a remote port.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Cc: Roland Dreier <rol...@purestorage.com>
Cc: James Bottomley <jbottom...@parallels.com>
Cc: David Dillow <dillo...@ornl.gov>
Cc: Vu Pham <v...@mellanox.com>
Cc: Sebastian Riemer <sebastian.rie...@profitbricks.com>
---
 Documentation/ABI/stable/sysfs-transport-srp |   38 +++
 drivers/scsi/scsi_transport_srp.c            |  468 +-
 include/scsi/scsi_transport_srp.h            |   62 +++-
 3 files changed, 565 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
index b36fb0d..52babb9 100644
--- a/Documentation/ABI/stable/sysfs-transport-srp
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -5,6 +5,24 @@ Contact:	linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
 Description:	Instructs an SRP initiator to disconnect from a target and to
 		remove all LUNs imported from that target.
 
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before removing a target port.
+		Zero means immediate removal. Setting this attribute to "off"
+		will disable this behavior.
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/fast_io_fail_tmo
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before failing I/O. Zero means
+		failing I/O immediately. Setting this attribute to "off" will
+		disable this behavior.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/port_id
 Date:		June 27, 2007
 KernelVersion:	2.6.24
@@ -12,8 +30,28 @@ Contact:	linux-s...@vger.kernel.org
 Description:	16-byte local SRP port identifier in hexadecimal format. An
 		example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00.
 
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/reconnect_delay
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a reconnect
+		attempt failed before retrying.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/roles
 Date:		June 27, 2007
 KernelVersion:	2.6.24
 Contact:	linux-s...@vger.kernel.org
 Description:	Role of the remote port. Either "SRP Initiator" or "SRP Target".
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/state
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	State of the transport layer used for communication with the
+		remote port. "running" if the transport layer is operational;
+		"blocked" if a transport layer error has been encountered but
+		the fail_io_fast_tmo timer has not yet fired; "fail-fast"
+		after the fail_io_fast_tmo timer has fired and before the
+		"dev_loss_tmo" timer has fired; "lost" after the
+		"dev_loss_tmo" timer has fired and before the port is finally
+		removed.
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f7ba94a..1b9ebd5 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -24,12 +24,15 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/delay.h>
 
 #include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_transport_srp.h>
+#include "scsi_priv.h"
 #include "scsi_transport_srp_internal.h"
 
 struct srp_host_attrs {
@@ -38,7 +41,7 @@ struct srp_host_attrs {
 #define to_srp_host_attrs(host)	((struct srp_host_attrs *)(host)->shost_data)
 
 #define SRP_HOST_ATTRS 0
-#define SRP_RPORT_ATTRS 3
+#define SRP_RPORT_ATTRS 8
 
 struct srp_internal {
 	struct scsi_transport_template t;
@@ -54,6 +57,26 @@ struct srp_internal {
 #define	dev_to_rport(d)	container_of(d, struct srp_rport, dev)
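The state attribute documented in patch 07/13 can be read as a function of the time elapsed since a transport layer error. The sketch below is an illustration of that documentation text only, under stated assumptions: both timeouts are enabled (not "off"), fast_io_fail_tmo is smaller than dev_loss_tmo, and a timer counts as fired once the elapsed time reaches its value. The enum and function names are hypothetical and are not taken from scsi_transport_srp.c.

```c
#include <stdbool.h>

/* Hypothetical names for the four documented states. */
enum sketch_state {
	SKETCH_RUNNING,		/* "running"   */
	SKETCH_BLOCKED,		/* "blocked"   */
	SKETCH_FAIL_FAST,	/* "fail-fast" */
	SKETCH_LOST		/* "lost"      */
};

/*
 * tl_error:          a transport layer error has been observed
 * elapsed:           seconds since that error
 * fast_io_fail_tmo:  seconds before I/O is failed
 * dev_loss_tmo:      seconds before the remote port is removed
 */
enum sketch_state state_after(bool tl_error, unsigned int elapsed,
			      unsigned int fast_io_fail_tmo,
			      unsigned int dev_loss_tmo)
{
	if (!tl_error)
		return SKETCH_RUNNING;
	if (elapsed >= dev_loss_tmo)
		return SKETCH_LOST;
	if (elapsed >= fast_io_fail_tmo)
		return SKETCH_FAIL_FAST;
	return SKETCH_BLOCKED;
}
```

With the ib_srp defaults mentioned elsewhere in this series (15 and 600 seconds), a port would report "blocked" for the first 15 seconds after an error, "fail-fast" until 600 seconds, and "lost" afterwards.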
[PATCH v3 08/13] IB/srp: Add srp_terminate_io()
Finish all outstanding I/O requests after fast_io_fail_tmo expired,
which speeds up failover in a multipath setup. This patch is a
reworked version of a patch from Sebastian Riemer.

Reported-by: Sebastian Riemer <sebastian.rie...@profitbricks.com>
Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Acked-by: David Dillow <dillo...@ornl.gov>
Cc: Roland Dreier <rol...@kernel.org>
Cc: Vu Pham <v...@mellanox.com>
Cc: Sebastian Riemer <sebastian.rie...@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index f65701d..8ba4e9c 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -686,17 +686,29 @@ static void srp_free_req(struct srp_target_port *target,
 	spin_unlock_irqrestore(&target->lock, flags);
 }
 
-static void srp_reset_req(struct srp_target_port *target, struct srp_request *req)
+static void srp_finish_req(struct srp_target_port *target,
+			   struct srp_request *req, int result)
 {
 	struct scsi_cmnd *scmnd = srp_claim_req(target, req, NULL);
 
 	if (scmnd) {
 		srp_free_req(target, req, scmnd, 0);
-		scmnd->result = DID_RESET << 16;
+		scmnd->result = result;
 		scmnd->scsi_done(scmnd);
 	}
 }
 
+static void srp_terminate_io(struct srp_rport *rport)
+{
+	struct srp_target_port *target = rport->lld_data;
+	int i;
+
+	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+		struct srp_request *req = &target->req_ring[i];
+		srp_finish_req(target, req, DID_TRANSPORT_FAILFAST << 16);
+	}
+}
+
 static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct Scsi_Host *shost = target->scsi_host;
@@ -723,8 +735,7 @@ static int srp_reconnect_target(struct srp_target_port *target)
 
 	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
 		struct srp_request *req = &target->req_ring[i];
-		if (req->scmnd)
-			srp_reset_req(target, req);
+		srp_finish_req(target, req, DID_RESET << 16);
 	}
 
 	INIT_LIST_HEAD(&target->free_tx);
@@ -1782,7 +1793,7 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
 		struct srp_request *req = &target->req_ring[i];
 		if (req->scmnd && req->scmnd->device == scmnd->device)
-			srp_reset_req(target, req);
+			srp_finish_req(target, req, DID_RESET << 16);
 	}
 
 	return SUCCESS;
@@ -2594,6 +2605,7 @@ static void srp_remove_one(struct ib_device *device)
 
 static struct srp_function_template ib_srp_transport_functions = {
 	.rport_delete		 = srp_rport_delete,
+	.terminate_rport_io	 = srp_terminate_io,
 };
 
 static int __init srp_init_module(void)
-- 
1.7.10.4
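The loop in srp_terminate_io() walks the whole request ring and completes whatever is still outstanding with one caller-chosen result. A rough userspace sketch of that pattern follows; struct request, finish_all() and SKETCH_DID_FAILFAST are hypothetical, there is no locking or claiming as in srp_claim_req(), and the host-byte value is a placeholder rather than a quotation of the kernel's DID_* definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct srp_request. */
struct request {
	int in_flight;	/* nonzero while a command is outstanding */
	int result;	/* completion status; host byte in bits 16..23 */
};

/* Placeholder host-byte code used by this sketch only. */
#define SKETCH_DID_FAILFAST 0x0f

/*
 * Complete every outstanding request in the ring with the given result
 * and return how many requests were completed.
 */
size_t finish_all(struct request *ring, size_t n, int result)
{
	size_t i, done = 0;

	for (i = 0; i < n; i++) {
		if (!ring[i].in_flight)
			continue;	/* nothing to claim for this slot */
		ring[i].in_flight = 0;
		ring[i].result = result;
		done++;
	}
	return done;
}
```

Passing the result as a parameter is what lets one helper serve both the reset paths and the fail-fast path, mirroring the srp_reset_req() to srp_finish_req() change in the patch.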
[PATCH v3 10/13] IB/srp: Start timers if a transport layer error occurs
Start the reconnect timer, fast_io_fail timer and dev_loss timers
if a transport layer error occurs.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Acked-by: David Dillow <dillo...@ornl.gov>
Cc: Roland Dreier <rol...@kernel.org>
Cc: Vu Pham <v...@mellanox.com>
Cc: Sebastian Riemer <sebastian.rie...@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   19 +++
 drivers/infiniband/ulp/srp/ib_srp.h |    1 +
 2 files changed, 20 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 0f69ae1..2557b7a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -595,6 +595,7 @@ static void srp_remove_target(struct srp_target_port *target)
 	srp_disconnect_target(target);
 	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
+	cancel_work_sync(&target->tl_err_work);
 	srp_rport_put(target->rport);
 	srp_free_req_data(target);
 	scsi_host_put(target->scsi_host);
@@ -1364,6 +1365,21 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 			     PFX "Recv failed with error code %d\n", res);
 }
 
+/**
+ * srp_tl_err_work() - handle a transport layer error
+ *
+ * Note: This function may get invoked before the rport has been created,
+ * hence the target->rport test.
+ */
+static void srp_tl_err_work(struct work_struct *work)
+{
+	struct srp_target_port *target;
+
+	target = container_of(work, struct srp_target_port, tl_err_work);
+	if (target->rport)
+		srp_start_tl_fail_timers(target->rport);
+}
+
 static void srp_handle_qp_err(enum ib_wc_status wc_status,
 			      enum ib_wc_opcode wc_opcode,
 			      struct srp_target_port *target)
@@ -1373,6 +1389,7 @@ static void srp_handle_qp_err(enum ib_wc_status wc_status,
 			     PFX "failed %s status %d\n",
 			     wc_opcode & IB_WC_RECV ? "receive" : "send",
 			     wc_status);
+		queue_work(system_long_wq, &target->tl_err_work);
 	}
 	target->qp_in_error = true;
 }
@@ -1735,6 +1752,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 		if (ib_send_cm_drep(cm_id, NULL, 0))
 			shost_printk(KERN_ERR, target->scsi_host,
 				     PFX "Sending CM DREP failed\n");
+		queue_work(system_long_wq, &target->tl_err_work);
 		break;
 
 	case IB_CM_TIMEWAIT_EXIT:
@@ -2379,6 +2397,7 @@ static ssize_t srp_create_target(struct device *dev,
 			     sizeof (struct srp_indirect_buf) +
 			     target->cmd_sg_cnt * sizeof (struct srp_direct_buf);
 
+	INIT_WORK(&target->tl_err_work, srp_tl_err_work);
 	INIT_WORK(&target->remove_work, srp_remove_work);
 	spin_lock_init(&target->lock);
 	INIT_LIST_HEAD(&target->free_tx);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index fda82f7..e45d9d0 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -175,6 +175,7 @@ struct srp_target_port {
 	struct srp_iu	       *rx_ring[SRP_RQ_SIZE];
 	struct srp_request	req_ring[SRP_CMD_SQ_SIZE];
 
+	struct work_struct	tl_err_work;
 	struct work_struct	remove_work;
 
 	struct list_head	list;
-- 
1.7.10.4
[PATCH v3 09/13] IB/srp: Use SRP transport layer error recovery
Enable reconnect_delay, fast_io_fail_tmo and dev_loss_tmo
functionality for the IB SRP initiator. Add kernel module
parameters that allow to specify default values for these
three parameters.

Signed-off-by: Bart Van Assche <bvanass...@acm.org>
Acked-by: David Dillow <dillo...@ornl.gov>
Cc: Roland Dreier <rol...@kernel.org>
Cc: Vu Pham <v...@mellanox.com>
Cc: Sebastian Riemer <sebastian.rie...@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c |  123 +--
 drivers/infiniband/ulp/srp/ib_srp.h |    1 -
 2 files changed, 88 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8ba4e9c..0f69ae1 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -86,6 +86,31 @@ module_param(topspin_workarounds, int, 0444);
 MODULE_PARM_DESC(topspin_workarounds,
 		 "Enable workarounds for Topspin/Cisco SRP target bugs if != 0");
 
+static int srp_reconnect_delay = 10;
+module_param_named(reconnect_delay, srp_reconnect_delay, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(reconnect_delay, "Time between successive reconnect attempts");
+
+static struct kernel_param_ops srp_tmo_ops;
+
+static int srp_fast_io_fail_tmo = 15;
+module_param_cb(fast_io_fail_tmo, &srp_tmo_ops, &srp_fast_io_fail_tmo,
+		S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(fast_io_fail_tmo,
+		 "Number of seconds between the observation of a transport"
+		 " layer error and failing all I/O. \"off\" means that this"
+		 " functionality is disabled.");
+
+static int srp_dev_loss_tmo = 600;
+module_param_cb(dev_loss_tmo, &srp_tmo_ops, &srp_dev_loss_tmo,
+		S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(dev_loss_tmo,
+		 "Maximum number of seconds that the SRP transport should"
+		 " insulate transport layer errors. After this time has been"
+		 " exceeded the SCSI target is removed. Should be"
+		 " between 1 and " __stringify(SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
+		 " if fast_io_fail_tmo has not been set. \"off\" means that"
+		 " this functionality is disabled.");
+
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
 static void srp_recv_completion(struct ib_cq *cq, void *target_ptr);
@@ -102,6 +127,44 @@ static struct ib_client srp_client = {
 
 static struct ib_sa_client srp_sa_client;
 
+static int srp_tmo_get(char *buffer, const struct kernel_param *kp)
+{
+	int tmo = *(int *)kp->arg;
+
+	if (tmo >= 0)
+		return sprintf(buffer, "%d", tmo);
+	else
+		return sprintf(buffer, "off");
+}
+
+static int srp_tmo_set(const char *val, const struct kernel_param *kp)
+{
+	int tmo, res;
+
+	if (strncmp(val, "off", 3) != 0) {
+		res = kstrtoint(val, 0, &tmo);
+		if (res)
+			goto out;
+	} else {
+		tmo = -1;
+	}
+	if (kp->arg == &srp_fast_io_fail_tmo)
+		res = srp_tmo_valid(tmo, srp_dev_loss_tmo);
+	else
+		res = srp_tmo_valid(srp_fast_io_fail_tmo, tmo);
+	if (res)
+		goto out;
+	*(int *)kp->arg = tmo;
+
+out:
+	return res;
+}
+
+static struct kernel_param_ops srp_tmo_ops = {
+	.get = srp_tmo_get,
+	.set = srp_tmo_set,
+};
+
 static inline struct srp_target_port *host_to_target(struct Scsi_Host *host)
 {
 	return (struct srp_target_port *) host->hostdata;
@@ -709,13 +772,20 @@ static void srp_terminate_io(struct srp_rport *rport)
 	}
 }
 
-static int srp_reconnect_target(struct srp_target_port *target)
+/*
+ * It is up to the caller to ensure that srp_rport_reconnect() calls are
+ * serialized and that no concurrent srp_queuecommand(), srp_abort(),
+ * srp_reset_device() or srp_reset_host() calls will occur while this function
+ * is in progress. One way to realize that is not to call this function
+ * directly but to call srp_reconnect_rport() instead since that last function
+ * serializes calls of this function via rport->mutex and also blocks
+ * srp_queuecommand() calls before invoking this function.
+ */
+static int srp_rport_reconnect(struct srp_rport *rport)
 {
-	struct Scsi_Host *shost = target->scsi_host;
+	struct srp_target_port *target = rport->lld_data;
 	int i, ret;
 
-	scsi_target_block(&shost->shost_gendev);
-
 	srp_disconnect_target(target);
 	/*
 	 * Now get a new local CM ID so that we avoid confusing the target in
@@ -745,28 +815,9 @@ static int srp_reconnect_target(struct srp_target_port *target)
 	if (ret == 0)
 		ret = srp_connect_target(target);
 
-	scsi_target_unblock(&shost->shost_gendev, ret == 0 ? SDEV_RUNNING :
-			    SDEV_TRANSPORT_OFFLINE);
-	target->transport_offline = !!ret;
-
-	if (ret)
-		goto err;
-
-	shost_printk(KERN_INFO,
[PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
Several InfiniBand HCA's allow to configure the completion vector per queue pair. This allows to spread the workload created by IB completion interrupts over multiple MSI-X vectors and hence over multiple CPU cores. In other words, configuring the completion vector properly not only allows to reduce latency on an initiator connected to multiple SRP targets but also allows to improve throughput. Signed-off-by: Bart Van Assche bvanass...@acm.org Acked-by: David Dillow dillo...@ornl.gov Cc: Roland Dreier rol...@kernel.org Cc: Vu Pham v...@mellanox.com Cc: Sebastian Riemer sebastian.rie...@profitbricks.com --- Documentation/ABI/stable/sysfs-driver-ib_srp |7 +++ drivers/infiniband/ulp/srp/ib_srp.c | 26 -- drivers/infiniband/ulp/srp/ib_srp.h |1 + 3 files changed, 32 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp index 481aae9..5c53d28 100644 --- a/Documentation/ABI/stable/sysfs-driver-ib_srp +++ b/Documentation/ABI/stable/sysfs-driver-ib_srp @@ -54,6 +54,13 @@ Description: Interface for making ib_srp connect to a new target. ib_srp. Specifying a value that exceeds cmd_sg_entries is only safe with partial memory descriptor list support enabled (allow_ext_sg=1). + * comp_vector, a number in the range 0..n-1 specifying the + MSI-X completion vector. Some HCA's allocate multiple (n) + MSI-X vectors per HCA port. If the IRQ affinity masks of + these interrupts have been configured such that each MSI-X + interrupt is handled by a different CPU then the comp_vector + parameter can be used to spread the SRP completion workload + over multiple CPU's. 
What: /sys/class/infiniband_srp/srp-hca-port_number/ibdev Date: January 2, 2006 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 2557b7a..6c164f6 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -294,14 +294,16 @@ static int srp_create_target_ib(struct srp_target_port *target) return -ENOMEM; recv_cq = ib_create_cq(target-srp_host-srp_dev-dev, - srp_recv_completion, NULL, target, SRP_RQ_SIZE, 0); + srp_recv_completion, NULL, target, SRP_RQ_SIZE, + target-comp_vector); if (IS_ERR(recv_cq)) { ret = PTR_ERR(recv_cq); goto err; } send_cq = ib_create_cq(target-srp_host-srp_dev-dev, - srp_send_completion, NULL, target, SRP_SQ_SIZE, 0); + srp_send_completion, NULL, target, SRP_SQ_SIZE, + target-comp_vector); if (IS_ERR(send_cq)) { ret = PTR_ERR(send_cq); goto err_recv_cq; @@ -1976,6 +1978,14 @@ static ssize_t show_local_ib_device(struct device *dev, return sprintf(buf, %s\n, target-srp_host-srp_dev-dev-name); } +static ssize_t show_comp_vector(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct srp_target_port *target = host_to_target(class_to_shost(dev)); + + return sprintf(buf, %d\n, target-comp_vector); +} + static ssize_t show_cmd_sg_entries(struct device *dev, struct device_attribute *attr, char *buf) { @@ -2002,6 +2012,7 @@ static DEVICE_ATTR(req_lim, S_IRUGO, show_req_lim, NULL); static DEVICE_ATTR(zero_req_lim,S_IRUGO, show_zero_req_lim, NULL); static DEVICE_ATTR(local_ib_port, S_IRUGO, show_local_ib_port, NULL); static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL); +static DEVICE_ATTR(comp_vector, S_IRUGO, show_comp_vector, NULL); static DEVICE_ATTR(cmd_sg_entries, S_IRUGO, show_cmd_sg_entries, NULL); static DEVICE_ATTR(allow_ext_sg,S_IRUGO, show_allow_ext_sg,NULL); @@ -2016,6 +2027,7 @@ static struct device_attribute *srp_host_attrs[] = { dev_attr_zero_req_lim, dev_attr_local_ib_port, dev_attr_local_ib_device, + 
dev_attr_comp_vector, dev_attr_cmd_sg_entries, dev_attr_allow_ext_sg, NULL @@ -2140,6 +2152,7 @@ enum { SRP_OPT_CMD_SG_ENTRIES = 1 9, SRP_OPT_ALLOW_EXT_SG= 1 10, SRP_OPT_SG_TABLESIZE= 1 11, + SRP_OPT_COMP_VECTOR = 1 12, SRP_OPT_ALL = (SRP_OPT_ID_EXT | SRP_OPT_IOC_GUID | SRP_OPT_DGID | @@ -2160,6 +2173,7 @@ static const match_table_t srp_opt_tokens = { { SRP_OPT_CMD_SG_ENTRIES, cmd_sg_entries=%u }, { SRP_OPT_ALLOW_EXT_SG,
[PATCH v3 12/13] IB/srp: Make transport layer retry count configurable
Allow the InfiniBand RC retry count to be configured by the user as an option in the target login string. Reducing this retry count helps with reducing path failover time.

[bvanassche: Rewrote patch description / changed default retry count from 2 back to 7]

Signed-off-by: Vu Pham v...@mellanox.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |    2 ++
 drivers/infiniband/ulp/srp/ib_srp.c          |   24 +++++++++++++++++++++++-
 drivers/infiniband/ulp/srp/ib_srp.h          |    1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
index 5c53d28..18e9b27 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -61,6 +61,8 @@ Description:	Interface for making ib_srp connect to a new target.
 		  interrupt is handled by a different CPU then the comp_vector
 		  parameter can be used to spread the SRP completion workload
 		  over multiple CPU's.
+		* tl_retry_count, a number in the range 2..7 specifying the
+		  IB RC retry count.
 
 What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev
 Date:		January 2, 2006
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 6c164f6..91b2d04 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -453,7 +453,7 @@ static int srp_send_req(struct srp_target_port *target)
 	req->param.responder_resources	      = 4;
 	req->param.remote_cm_response_timeout = 20;
 	req->param.local_cm_response_timeout  = 20;
-	req->param.retry_count		      = 7;
+	req->param.retry_count		      = target->tl_retry_count;
 	req->param.rnr_retry_count	      = 7;
 	req->param.max_cm_retries	      = 15;
 
@@ -1986,6 +1986,14 @@ static ssize_t show_comp_vector(struct device *dev,
 	return sprintf(buf, "%d\n", target->comp_vector);
 }
 
+static ssize_t show_tl_retry_count(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+	return sprintf(buf, "%d\n", target->tl_retry_count);
+}
+
 static ssize_t show_cmd_sg_entries(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
@@ -2013,6 +2021,7 @@ static DEVICE_ATTR(zero_req_lim,    S_IRUGO, show_zero_req_lim,    NULL);
 static DEVICE_ATTR(local_ib_port,   S_IRUGO, show_local_ib_port,   NULL);
 static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL);
 static DEVICE_ATTR(comp_vector,     S_IRUGO, show_comp_vector,     NULL);
+static DEVICE_ATTR(tl_retry_count,  S_IRUGO, show_tl_retry_count,  NULL);
 static DEVICE_ATTR(cmd_sg_entries,  S_IRUGO, show_cmd_sg_entries,  NULL);
 static DEVICE_ATTR(allow_ext_sg,    S_IRUGO, show_allow_ext_sg,    NULL);
 
@@ -2028,6 +2037,7 @@ static struct device_attribute *srp_host_attrs[] = {
 	&dev_attr_local_ib_port,
 	&dev_attr_local_ib_device,
 	&dev_attr_comp_vector,
+	&dev_attr_tl_retry_count,
 	&dev_attr_cmd_sg_entries,
 	&dev_attr_allow_ext_sg,
 	NULL
@@ -2153,6 +2163,7 @@ enum {
 	SRP_OPT_ALLOW_EXT_SG	= 1 << 10,
 	SRP_OPT_SG_TABLESIZE	= 1 << 11,
 	SRP_OPT_COMP_VECTOR	= 1 << 12,
+	SRP_OPT_TL_RETRY_COUNT	= 1 << 13,
 	SRP_OPT_ALL		= (SRP_OPT_ID_EXT	|
 				   SRP_OPT_IOC_GUID	|
 				   SRP_OPT_DGID		|
@@ -2174,6 +2185,7 @@ static const match_table_t srp_opt_tokens = {
 	{ SRP_OPT_ALLOW_EXT_SG,		"allow_ext_sg=%u"	},
 	{ SRP_OPT_SG_TABLESIZE,		"sg_tablesize=%u"	},
 	{ SRP_OPT_COMP_VECTOR,		"comp_vector=%u"	},
+	{ SRP_OPT_TL_RETRY_COUNT,	"tl_retry_count=%u"	},
 	{ SRP_OPT_ERR,			NULL			}
 };
 
@@ -2337,6 +2349,15 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 			target->comp_vector = token;
 			break;
 
+		case SRP_OPT_TL_RETRY_COUNT:
+			if (match_int(args, &token) || token < 2 || token > 7) {
+				pr_warn("bad tl_retry_count parameter '%s' (must be a number between 2 and 7)\n",
+					p);
+				goto out;
+			}
+			target->tl_retry_count = token;
+			break;
+
 		default:
 			pr_warn("unknown parameter or
[PATCH v3 13/13] IB/srp: Bump driver version and release date
Signed-off-by: Vu Pham v...@mellanox.com
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow dillo...@ornl.gov
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 91b2d04..fa38bc3 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -53,8 +53,8 @@
 
 #define DRV_NAME	"ib_srp"
 #define PFX		DRV_NAME ": "
-#define DRV_VERSION	"0.2"
-#define DRV_RELDATE	"November 1, 2005"
+#define DRV_VERSION	"1.0"
+#define DRV_RELDATE	"July 1, 2013"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator "
-- 
1.7.10.4
Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11
On 03/07/2013 15:41, Bart Van Assche wrote:
[...]

Bart,

> The individual patches in this series are as follows:
> 0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch
> 0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch
> 0003-IB-srp-Avoid-that-srp_reset_host-is-skipped-after-a-.patch
> 0004-IB-srp-Fail-I-O-fast-if-target-offline.patch
> 0005-IB-srp-Skip-host-settle-delay.patch
> 0006-IB-srp-Maintain-a-single-connection-per-I_T-nexus.patch
> 0007-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
> 0008-scsi_transport_srp-Add-transport-layer-error-handlin.patch
> 0009-IB-srp-Add-srp_terminate_io.patch
> 0010-IB-srp-Use-SRP-transport-layer-error-recovery.patch
> 0011-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
> 0012-IB-srp-Fail-SCSI-commands-silently.patch
> 0013-IB-srp-Make-HCA-completion-vector-configurable.patch
> 0014-IB-srp-Make-transport-layer-retry-count-configurable.patch
> 0015-IB-srp-Bump-driver-version-and-release-date.patch

Some of these patches were already picked up by Roland (SB). I would suggest that you post V4 and drop the ones which were accepted:

e8ca413 IB/srp: Bump driver version and release date
4b5e5f4 IB/srp: Make HCA completion vector configurable
96fc248 IB/srp: Maintain a single connection per I_T nexus
99e1c13 IB/srp: Fail I/O fast if target offline
2742c1d IB/srp: Skip host settle delay
086f44f IB/srp: Avoid skipping srp_reset_host() after a transport error
1fe0cb8 IB/srp: Fix remove_one crash due to resource exhaustion

Also, it would help if you used the --cover-letter option of git format-patch and built on the resulting cover letter (patch 0/N), since it has standard content which you can enhance with your additions.
Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()
On Wed, 2013-07-03 at 14:55 +0200, Bart Van Assche wrote:
> Finish all outstanding I/O requests after fast_io_fail_tmo expired, which speeds up failover in a multipath setup. This patch is a reworked version of a patch from Sebastian Riemer.
>
> Reported-by: Sebastian Riemer sebastian.rie...@profitbricks.com
> Signed-off-by: Bart Van Assche bvanass...@acm.org
> Acked-by: David Dillow dillo...@ornl.gov

I don't believe I ack'd this; I don't want the callers doing the result shift, do it in srp_finish_req().
Re: [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable
On Tue, 2013-07-02 at 13:18 -0600, Jason Gunthorpe wrote:
> On Mon, Jul 01, 2013 at 07:26:05AM -0400, David Dillow wrote:
> > You assume independent failures, which is suspect -- many times these are data-dependent, or so I tend to think. Jason, do you have any insight on this (overall) topic you could share?
>
> All data transmitted on modern serial links is 'whitened' somehow. This is done independently on a link-by-link basis either with 8b/10b coding or with the 64b/66b scrambler. So the idea of a high level 'magic packet' that causes data-dependent errors is not statistically likely.

My thought was that if we hit a statistically unlikely pattern that caused an issue, the retransmission is likely to also hit the issue given the deterministic scrambling. But I didn't think about the fact that the signal stream was being whitened.

> It is best to use all the information the SM provides when setting up the path, however I don't think there is a best practice idea yet for how to set up the retry count though.

Hmm, that would be a useful presentation for the workshop; I'll have to see if I can get some people interested here.

Thanks for the information,
Dave
Re: [PATCH v3 12/13] IB/srp: Make transport layer retry count configurable
On Wed, 2013-07-03 at 14:59 +0200, Bart Van Assche wrote:
> Allow the InfiniBand RC retry count to be configured by the user as an option in the target login string. Reducing this retry count helps with reducing path failover time.
>
> [bvanassche: Rewrote patch description / changed default retry count from 2 back to 7]

Acked-by: David Dillow dillo...@ornl.gov
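
The range check that this patch applies to the tl_retry_count login option can be sketched in user space as follows. This is only a minimal sketch under stated assumptions, not the driver code: parse_tl_retry_count() is a hypothetical helper, and it uses strtol() where the kernel patch uses match_int() on a token from its option table.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Limits taken from the patch description: valid values are 2..7. */
#define TL_RETRY_COUNT_MIN 2
#define TL_RETRY_COUNT_MAX 7

/*
 * Hypothetical helper: parse one "tl_retry_count=N" option.
 * Returns 0 and stores N in *out on success, -EINVAL otherwise.
 * *out is left untouched when parsing fails.
 */
static int parse_tl_retry_count(const char *opt, int *out)
{
	static const char key[] = "tl_retry_count=";
	char *end;
	long val;

	assert(opt != NULL && out != NULL);

	/* The option name must match exactly. */
	if (strncmp(opt, key, sizeof(key) - 1) != 0)
		return -EINVAL;
	opt += sizeof(key) - 1;

	/* Reject empty values, trailing garbage and overflow. */
	errno = 0;
	val = strtol(opt, &end, 10);
	if (errno != 0 || end == opt || *end != '\0')
		return -EINVAL;

	/* Same bounds as the check in the patch: token < 2 || token > 7. */
	if (val < TL_RETRY_COUNT_MIN || val > TL_RETRY_COUNT_MAX)
		return -EINVAL;

	*out = (int)val;
	return 0;
}
```

With this helper, "tl_retry_count=2" and "tl_retry_count=7" are accepted, while "tl_retry_count=1", "tl_retry_count=8" and non-numeric values are rejected with -EINVAL.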
Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11
On 07/03/13 15:38, Or Gerlitz wrote:
> Some of these patches were already picked by Roland (SB), I would suggest that you post V4 and drop the ones which were accepted.

One of the patches that is already in Roland's tree and that was in v1 of this series has been split into two patches in v2 and v3 of this series. So I'd like to hear from Roland what he prefers himself - that I drop the patches that are already in his tree or that Roland updates his tree with the most recently posted patches.

Bart.
Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()
On 07/03/13 16:08, David Dillow wrote:
> On Wed, 2013-07-03 at 14:55 +0200, Bart Van Assche wrote:
> > Finish all outstanding I/O requests after fast_io_fail_tmo expired, which speeds up failover in a multipath setup. This patch is a reworked version of a patch from Sebastian Riemer.
> >
> > Reported-by: Sebastian Riemer sebastian.rie...@profitbricks.com
> > Signed-off-by: Bart Van Assche bvanass...@acm.org
> > Acked-by: David Dillow dillo...@ornl.gov
>
> I don't believe I ack'd this; I don't want the callers doing the result shift, do it in srp_finish_req().

My apologies. You are correct, this patch was not yet acknowledged by you.

Regarding the shift itself: is it really that important whether the caller or callee performs that shift? Having it in the caller has the advantage that the compiler can optimize the shift operation out because the number that is being shifted left is a constant. And if later on it would be necessary to set more fields of the SCSI result in a caller of srp_finish_req() then that will be possible without having to modify the srp_finish_req() function itself.

Bart.
Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()
On Wed, 2013-07-03 at 16:45 +0200, Bart Van Assche wrote:
> Having it in the caller has the advantage that the compiler can optimize the shift operation out because the number that is being shifted left is a constant.

srp_finish_req() is likely to be inlined, so the compiler will be able to make this optimization. Regardless, this is so far in the noise that it loses to readability.

> And if later on it would be necessary to set more fields of the SCSI result in a caller of srp_finish_req() then that will be possible without having to modify the srp_finish_req() function itself.

Other than REQ_QUIET, what do you think would need to be added? I think we can cross that bridge when we get there, as I don't think REQ_QUIET should be set in the LLDs.
Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()
On Wed, 2013-07-03 at 10:57 -0400, David Dillow wrote:
> On Wed, 2013-07-03 at 16:45 +0200, Bart Van Assche wrote:
> > Having it in the caller has the advantage that the compiler can optimize the shift operation out because the number that is being shifted left is a constant.
>
> srp_finish_req() is likely to be inlined, so the compiler will be able to make this optimization. Regardless, this is so far in the noise that it loses to readability.

Eh, just leave it alone. As much as I don't like it, it does look to be fairly common among the LLDs and other transport code.

Acked-by: David Dillow dillo...@ornl.gov
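
The two placements of the shift debated in this sub-thread can be compared with a small user-space model. This is a sketch under assumptions that the messages above do not spell out: that the shift in question moves a SCSI host byte into bits 16..23 of the result word, and that DID_TRANSPORT_FAILFAST has the value 0x0f. The struct and both finish_req_*() functions are hypothetical stand-ins, not the driver's srp_finish_req().

```c
#include <assert.h>

/* Assumed values for this sketch; the kernel headers are authoritative. */
#define DID_TRANSPORT_FAILFAST	0x0f
#define HOST_BYTE_SHIFT		16

/* Minimal stand-in for a SCSI command; only the result word is modelled. */
struct model_cmd {
	int result;
};

/* Variant A: the caller builds the complete result word. */
static void finish_req_result(struct model_cmd *cmd, int result)
{
	cmd->result = result;
}

/* Variant B: the callee receives a host byte and shifts it itself. */
static void finish_req_host_byte(struct model_cmd *cmd, int host_byte)
{
	assert(host_byte >= 0 && host_byte <= 0xff);
	cmd->result = host_byte << HOST_BYTE_SHIFT;
}

/* Extract the host byte again, for checking. */
static int result_host_byte(int result)
{
	return (result >> HOST_BYTE_SHIFT) & 0xff;
}
```

Both variants store the same value. Variant A leaves the caller free to set other fields of the result word, which is the flexibility argued for above, while variant B keeps the bit layout in one place, which is the readability argument.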
Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote:
> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
> +{
> +	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
> +		fast_io_fail_tmo < dev_loss_tmo) &&
> +		fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT &&
> +		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(srp_tmo_valid);

This would have been more readable:

int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* Fast IO fail must be off, or no greater than the max timeout */
	if (fast_io_fail_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
		return -EINVAL;

	/* Device timeout must be off, or fit into jiffies */
	if (dev_loss_tmo >= LONG_MAX / HZ)
		return -EINVAL;

	/* Fast IO must trigger before device loss, or one of the
	 * timeouts must be disabled.
	 */
	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
		return 0;
	if (fast_io_fail_tmo < dev_loss_tmo)
		return 0;

	return -EINVAL;
}

Though, now that I've unpacked it -- I don't think it is OK for dev_loss_tmo to be off, but fast IO to be on? That drops another conditional.

Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if fast_io_fail_tmo is off; I agree with your reasoning about leaving it unlimited if fast fail is on, but does that still hold if it is off?
Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
On 07/03/13 17:14, David Dillow wrote:
> On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote:
> > +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
> > +{
> > +	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
> > +		fast_io_fail_tmo < dev_loss_tmo) &&
> > +		fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT &&
> > +		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
> > +}
> > +EXPORT_SYMBOL_GPL(srp_tmo_valid);
>
> This would have been more readable:
>
> int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
> {
> 	/* Fast IO fail must be off, or no greater than the max timeout */
> 	if (fast_io_fail_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
> 		return -EINVAL;
>
> 	/* Device timeout must be off, or fit into jiffies */
> 	if (dev_loss_tmo >= LONG_MAX / HZ)
> 		return -EINVAL;
>
> 	/* Fast IO must trigger before device loss, or one of the
> 	 * timeouts must be disabled.
> 	 */
> 	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
> 		return 0;
> 	if (fast_io_fail_tmo < dev_loss_tmo)
> 		return 0;
>
> 	return -EINVAL;
> }

Isn't it a matter of personal taste which of the above two is clearer? It might also depend on the number of mathematics courses in someone's educational background :-)

> Though, now that I've unpacked it -- I don't think it is OK for dev_loss_tmo to be off, but fast IO to be on? That drops another conditional.

The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine in my tests. An I/O failure was detected shortly after the cable to the target was pulled. I/O resumed shortly after the cable to the target was reinserted.

> Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if fast_io_fail_tmo is off; I agree with your reasoning about leaving it unlimited if fast fail is on, but does that still hold if it is off?

I think setting dev_loss_tmo to a large value only makes sense if the value of reconnect_delay is not too large. Setting both to a large value would result in slow recovery after a transport layer failure has been corrected.

Bart.
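
Whether the compact and the unpacked form of srp_tmo_valid() accept the same inputs can be checked mechanically. The following is a user-space sketch, not kernel code: HZ and SCSI_DEVICE_BLOCK_MAX_TIMEOUT are given assumed values here, the function names are hypothetical, and the parameter-name typos of the unpacked version are corrected to fast_io_fail_tmo.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Assumed values for a user-space sketch; the kernel defines its own. */
#define HZ				1000
#define SCSI_DEVICE_BLOCK_MAX_TIMEOUT	600

/* Compact form, following the expression in the posted patch. */
static int tmo_valid_compact(int fast_io_fail_tmo, int dev_loss_tmo)
{
	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
		fast_io_fail_tmo < dev_loss_tmo) &&
		fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT &&
		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
}

/* Unpacked form, following the version suggested in the review. */
static int tmo_valid_unpacked(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* Fast IO fail must be off, or no greater than the max timeout. */
	if (fast_io_fail_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
		return -EINVAL;
	/* Device loss timeout must be off, or fit into jiffies. */
	if (dev_loss_tmo >= LONG_MAX / HZ)
		return -EINVAL;
	/* Either timeout disabled: nothing left to order. */
	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
		return 0;
	/* Fast IO fail must trigger before device loss. */
	if (fast_io_fail_tmo < dev_loss_tmo)
		return 0;
	return -EINVAL;
}

/* Compare both forms on a grid that straddles "off" (-1) and the cap. */
static void check_equivalence(void)
{
	int f, d;

	for (f = -1; f <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT + 100; f++)
		for (d = -1; d <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT + 100; d++)
			assert(tmo_valid_compact(f, d) ==
			       tmo_valid_unpacked(f, d));
}
```

Calling check_equivalence() from a test program exercises both forms on every pair in the grid. The grid does not reach the LONG_MAX / HZ bound, so that branch is covered by inspection only.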
Re: PATCH: opensm enhancements
Hi Hal, I have some testing info about the second patch below. On 07/03/2013 03:23 AM, Hal Rosenstock wrote: HI Jeff, On 6/26/2013 5:24 PM, Jeff Becker wrote: Hi Hal. At the OFA workshop, I mentioned that I've been working on some modifications to opensm that we use at NASA. Following extensive testing of these applied to opensm 3.3.13 (the version we run here), I have ported these to top of tree opensm, and have tested them on a small cluster. Thanks for getting this done! For future reference, patches should be sent as plain text as this makes it easier to comment. OK. So I just send the output of git-format-patch directly? It appears to be formatted properly. The first patch modifies the console logflush command to take on or off as an argument for toggling. Thanks. Applied. The second (more extensive) patch adds a command line option to specify a file in which each line contains a switch GUID/port pair to be ignored by opensm. The idea is to specify this file when you start opensm (it can be empty), and add ports to ignore (one per line for each end of a connection) to the file. At the next heavy sweep (or HUP) the sm will reprogram the forwarding tables without including the ignored links. We use this for replacing cables, as well as for system expansion (adding new racks). I'll comment on this one later. Dale (cc'd) did some testing with my patch on Pleiades in preparation for a system augmentation (new racks) happening soon. He found that the SM correctly produces routes that do not use links marked to be ignored, but when you then remove or disable the links, the SM re-routes the fabric anyway and comes up with different routes than before. This rerouting causes problems with existing connections. There also appears to be a bookkeeping problem such that some of these links get added to the SM's light sampling list and never get removed. 
This ties up outstanding MAD packet slots, causing the SM to become unresponsive for several seconds every time it reviews its light sampling list. I'm working on fixing these. I'll take care of the second problem (incorrectly getting added to the light sampling list) first. Is it possible this problem is related to the re-routing on port disable problem? Anyhow, if you have any specific comments about these issues, that would be great. Thanks, and have a great Fourth of July. -jeff -- Hal Please let me know if you have any questions/issues with these. Thanks. -jeff
Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
On 01/07/2013 20:49, Roland Dreier wrote:
> - I think the active flag for the health check timer is unnecessary. It can just be stopped with del_timer_sync().

Hi Roland,

Jack looked at this comment and the code, and he says that the active flag is used to prevent re-scheduling the timer from inside the timer handling routine. In the kernel, the comment header in the source file for del_timer_sync() explicitly states that re-scheduling the timer must be prevented, or the sync is useless: "Callers must prevent restarting of the timer, otherwise this function is meaningless". So we believe that code should remain.

Or.
[PATCH opensm] Add flags to OSM_EVENT_ID_UCAST_ROUTING_DONE
to be able to discern between ucast routing done when rerouting versus heavy sweep.

Signed-off-by: Hal Rosenstock h...@mellanox.com
---
diff --git a/include/opensm/osm_event_plugin.h b/include/opensm/osm_event_plugin.h
index 6b060e7..ca5a719 100644
--- a/include/opensm/osm_event_plugin.h
+++ b/include/opensm/osm_event_plugin.h
@@ -94,6 +94,12 @@ typedef enum {
 	LFT_CHANGED_BLOCK = (1 << 1)
 } osm_epi_lft_change_flags_t;
 
+typedef enum {
+	UCAST_ROUTING_NONE,
+	UCAST_ROUTING_HEAVY_SWEEP,
+	UCAST_ROUTING_REROUTE
+} osm_epi_ucast_routing_flags_t;
+
 typedef struct osm_epi_lft_change_event {
 	osm_switch_t *p_sw;
 	osm_epi_lft_change_flags_t flags;
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index 1b73834..0cc8162 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -1190,7 +1190,7 @@ static void do_sweep(osm_sm_t * sm)
 					"REROUTE COMPLETE");
 			osm_opensm_report_event(sm->p_subn->p_osm,
 				OSM_EVENT_ID_UCAST_ROUTING_DONE,
-				NULL);
+				(void *) UCAST_ROUTING_REROUTE);
 			return;
 		}
 	}
@@ -1387,7 +1387,8 @@ repeat_discovery:
 	OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
 			"SWITCHES CONFIGURED FOR UNICAST");
 	osm_opensm_report_event(sm->p_subn->p_osm,
-				OSM_EVENT_ID_UCAST_ROUTING_DONE, NULL);
+				OSM_EVENT_ID_UCAST_ROUTING_DONE,
+				(void *) UCAST_ROUTING_HEAVY_SWEEP);
 
 	if (!sm->p_subn->opt.disable_multicast) {
 		osm_mcast_mgr_process(sm, TRUE);
diff --git a/osmeventplugin/src/osmeventplugin.c b/osmeventplugin/src/osmeventplugin.c
index c5655fe..1eaf7ea 100644
--- a/osmeventplugin/src/osmeventplugin.c
+++ b/osmeventplugin/src/osmeventplugin.c
@@ -195,7 +195,7 @@ static void report(void *_log, osm_epi_event_id_t event_id, void *event_data)
 		fprintf(log->log_file, "Heavy sweep completed\n");
 		break;
 	case OSM_EVENT_ID_UCAST_ROUTING_DONE:
-		fprintf(log->log_file, "Unicast routing completed\n");
+		fprintf(log->log_file, "Unicast routing completed %d\n", event_data);
 		break;
 	case OSM_EVENT_ID_STATE_CHANGE:
 		fprintf(log->log_file, "SM state changed\n");
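
The patch above carries a small enum value through the plugin's void * event_data argument. One way a plugin could recover and print that value is sketched below. This is an illustration under assumptions, not opensm code: the enum is copied from the patch under a local type name, and flag_to_event_data(), event_data_to_flag(), ucast_routing_name() and format_ucast_event() are hypothetical helpers. The sketch converts the pointer back to an integer before formatting, because passing a void * directly to a %d conversion is a type mismatch in standard C.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same enumerators as in the patch; the type name is local to this sketch. */
typedef enum {
	UCAST_ROUTING_NONE,
	UCAST_ROUTING_HEAVY_SWEEP,
	UCAST_ROUTING_REROUTE
} ucast_routing_flags_t;

/* Reporting side: pack the enum into the pointer-sized payload. */
static void *flag_to_event_data(ucast_routing_flags_t flag)
{
	return (void *)(uintptr_t)flag;
}

/* Plugin side: unpack it again, rejecting values outside the enum. */
static ucast_routing_flags_t event_data_to_flag(void *event_data)
{
	uintptr_t raw = (uintptr_t)event_data;

	assert(raw <= (uintptr_t)UCAST_ROUTING_REROUTE);
	return (ucast_routing_flags_t)raw;
}

/* Human-readable name for the log line. */
static const char *ucast_routing_name(ucast_routing_flags_t flag)
{
	switch (flag) {
	case UCAST_ROUTING_HEAVY_SWEEP:
		return "heavy sweep";
	case UCAST_ROUTING_REROUTE:
		return "reroute";
	default:
		return "none";
	}
}

/* Format the log line into buf; returns what snprintf() returns. */
static int format_ucast_event(char *buf, size_t len, void *event_data)
{
	ucast_routing_flags_t flag = event_data_to_flag(event_data);

	return snprintf(buf, len, "Unicast routing completed %d (%s)\n",
			(int)flag, ucast_routing_name(flag));
}
```

A plugin's report() callback could call format_ucast_event() for OSM_EVENT_ID_UCAST_ROUTING_DONE and write the buffer to its log file. A NULL event_data, as sent before this patch, maps to UCAST_ROUTING_NONE.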
[PATCH V2 4/9] IB/core: Add reserved values to enums for low-level drivers use
From: Jack Morgenstein ja...@dev.mellanox.co.il

Continue the approach taken by commit d2b57063e4a ("IB/core: Reserve bits in enum ib_qp_create_flags for low-level driver use") and add reserved entries to the ib_qp_type and ib_wr_opcode enums. The low-level drivers will then define macros to use these reserved values, giving proper names to the macros for readability.

Also add a range of reserved flags to enum ib_send_flags.

The mlx5 IB driver uses the new additions.

Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
---
 include/rdma/ib_verbs.h |   35 +++++++++++++++++++++++++++++++++--
 1 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..645c3ce 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -610,7 +610,21 @@ enum ib_qp_type {
 	IB_QPT_RAW_PACKET = 8,
 	IB_QPT_XRC_INI = 9,
 	IB_QPT_XRC_TGT,
-	IB_QPT_MAX
+	IB_QPT_MAX,
+	/* Reserve a range for qp types internal to the low level driver.
+	 * These qp types will not be visible at the IB core layer, so the
+	 * IB_QPT_MAX usages should not be affected in the core layer
+	 */
+	IB_QPT_RESERVED1 = 0x1000,
+	IB_QPT_RESERVED2,
+	IB_QPT_RESERVED3,
+	IB_QPT_RESERVED4,
+	IB_QPT_RESERVED5,
+	IB_QPT_RESERVED6,
+	IB_QPT_RESERVED7,
+	IB_QPT_RESERVED8,
+	IB_QPT_RESERVED9,
+	IB_QPT_RESERVED10,
 };
 
 enum ib_qp_create_flags {
@@ -766,6 +780,19 @@ enum ib_wr_opcode {
 	IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
 	IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
 	IB_WR_BIND_MW,
+	/* reserve values for low level drivers' internal use.
+	 * These values will not be used at all in the ib core layer.
+	 */
+	IB_WR_RESERVED1 = 0xf0,
+	IB_WR_RESERVED2,
+	IB_WR_RESERVED3,
+	IB_WR_RESERVED4,
+	IB_WR_RESERVED5,
+	IB_WR_RESERVED6,
+	IB_WR_RESERVED7,
+	IB_WR_RESERVED8,
+	IB_WR_RESERVED9,
+	IB_WR_RESERVED10,
 };
 
 enum ib_send_flags {
@@ -773,7 +800,11 @@ enum ib_send_flags {
 	IB_SEND_SIGNALED	= (1<<1),
 	IB_SEND_SOLICITED	= (1<<2),
 	IB_SEND_INLINE		= (1<<3),
-	IB_SEND_IP_CSUM		= (1<<4)
+	IB_SEND_IP_CSUM		= (1<<4),
+
+	/* reserve bits 26-31 for low level drivers' internal use */
+	IB_SEND_RESERVED_START	= (1 << 26),
+	IB_SEND_RESERVED_END	= (1 << 31),
 };
 
 struct ib_sge {
-- 
1.7.1
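
How a low-level driver might consume such reserved values can be sketched in user space. Everything below is illustrative and assumed: the enum is a trimmed stand-in with SKETCH_ names rather than the real ib_verbs.h definitions, "foo" is a hypothetical driver, and the send-flag bits are written as unsigned macros so that bit 31 does not shift into the sign bit of an int.

```c
#include <assert.h>

/* Trimmed stand-in for the core QP type enum; only the shape matters. */
enum sketch_qp_type {
	SKETCH_QPT_RC = 2,
	SKETCH_QPT_UC,
	SKETCH_QPT_UD,
	SKETCH_QPT_MAX,
	/* Range reserved for low-level drivers, far above SKETCH_QPT_MAX. */
	SKETCH_QPT_RESERVED1 = 0x1000,
	SKETCH_QPT_RESERVED2,
	SKETCH_QPT_RESERVED10 = SKETCH_QPT_RESERVED1 + 9
};

/* Hypothetical driver "foo" gives one reserved slot a readable name. */
#define FOO_QPT_INTERNAL	SKETCH_QPT_RESERVED1

/* Core-layer code that is bounded by ..._MAX never sees reserved values. */
static int core_handles_qp_type(int type)
{
	return type >= 0 && type < SKETCH_QPT_MAX;
}

/* Driver-side check for the private range. */
static int is_reserved_qp_type(int type)
{
	return type >= SKETCH_QPT_RESERVED1 && type <= SKETCH_QPT_RESERVED10;
}

/* Send flag bits 26..31, as described in the patch comment. */
#define SKETCH_SEND_RESERVED_START	(1u << 26)
#define SKETCH_SEND_RESERVED_END	(1u << 31)

/* Mask covering every bit from START up to and including END. */
static unsigned int reserved_send_flags_mask(void)
{
	return (SKETCH_SEND_RESERVED_END - SKETCH_SEND_RESERVED_START) |
		SKETCH_SEND_RESERVED_END;
}

/* The core range and the driver-private range must not overlap. */
static void check_ranges_disjoint(void)
{
	assert(SKETCH_QPT_MAX <= SKETCH_QPT_RESERVED1);
	assert(!core_handles_qp_type(FOO_QPT_INTERNAL));
	assert(is_reserved_qp_type(FOO_QPT_INTERNAL));
}
```

The point of the large gap (0x1000 in the patch) is that code sized or bounded by the MAX enumerator keeps working unchanged, while a driver can still tell its private values apart with a simple range check.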
[PATCH V2 5/9] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/infiniband/hw/mlx5/ah.c | 95 drivers/infiniband/hw/mlx5/cq.c | 844 + drivers/infiniband/hw/mlx5/doorbell.c | 100 drivers/infiniband/hw/mlx5/mad.c | 139 ++ 4 files changed, 1178 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/ah.c create mode 100644 drivers/infiniband/hw/mlx5/cq.c create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c create mode 100644 drivers/infiniband/hw/mlx5/mad.c diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c new file mode 100644 index 000..ff8f1cb --- /dev/null +++ b/drivers/infiniband/hw/mlx5/ah.c @@ -0,0 +1,95 @@ +/* + * Copyright (c) 2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include mlx5_ib.h + +struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr, + struct mlx5_ib_ah *ah) +{ + u32 sgi; + + if (ah_attr-ah_flags IB_AH_GRH) { + sgi = ah_attr-grh.sgid_index 20; + + memcpy(ah-av.rgid, ah_attr-grh.dgid, 16); + ah-av.grh_gid_fl = cpu_to_be32(ah_attr-grh.flow_label | + (1 30) | sgi); + ah-av.hop_limit = ah_attr-grh.hop_limit; + ah-av.tclass = ah_attr-grh.traffic_class; + } + + ah-av.rlid = cpu_to_be16(ah_attr-dlid); + ah-av.fl_mlid = ah_attr-src_path_bits 0x7f; + ah-av.stat_rate_sl = (ah_attr-static_rate 4) | (ah_attr-sl 0xf); + + return ah-ibah; +} + +struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr) +{ + struct mlx5_ib_ah *ah; + + ah = kzalloc(sizeof(*ah), GFP_ATOMIC); + if (!ah) + return ERR_PTR(-ENOMEM); + + return create_ib_ah(ah_attr, ah); /* never fails */ +} + +int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr) +{ + struct mlx5_ib_ah *ah = to_mah(ibah); + u32 tmp; + + memset(ah_attr, 0, sizeof(*ah_attr)); + + tmp = be32_to_cpu(ah-av.grh_gid_fl); + if (tmp (1 30)) { + ah_attr-ah_flags = IB_AH_GRH; + ah_attr-grh.sgid_index = (tmp 20) 0xff; + ah_attr-grh.flow_label = tmp 0xf; + memcpy(ah_attr-grh.dgid, ah-av.rgid, 16); + ah_attr-grh.hop_limit = ah-av.hop_limit; + ah_attr-grh.traffic_class = ah-av.tclass; + } + ah_attr-dlid = be16_to_cpu(ah-av.rlid); + ah_attr-static_rate = ah-av.stat_rate_sl 4; + ah_attr-sl = ah-av.stat_rate_sl 0xf; + + return 0; +} + +int mlx5_ib_destroy_ah(struct ib_ah *ah) +{ + kfree(to_mah(ah)); + return 0; +} diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c new file mode 100644 index 000..c05868e --- /dev/null +++ b/drivers/infiniband/hw/mlx5/cq.c @@ -0,0 +1,844 
@@ +/* + * Copyright (c) 2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *
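A note for readers of the ah.c hunk above: the archive seems to have dropped the shift and mask operators from the code. The following is a minimal user-space sketch of how a "GRH present" flag, an SGID index and a flow label could share one 32-bit word and be recovered again. The flag at bit 30 and the index at bits 20-27 follow the constants still visible in the patch; the 20-bit flow-label mask, all `sketch_` names, and the omission of the big-endian conversion are assumptions made for illustration, not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions taken from the constants in the patch. */
#define SKETCH_GRH_PRESENT (1u << 30)   /* set when a GRH is attached */
#define SKETCH_SGID_SHIFT  20           /* SGID index starts at bit 20 */
#define SKETCH_SGID_MASK   0xffu        /* 8-bit SGID index */
#define SKETCH_FLOW_MASK   0xfffffu     /* assumed 20-bit flow label */

/* Pack flow label, GRH flag and SGID index into one host-order word. */
uint32_t sketch_pack_grh_gid_fl(uint32_t flow_label, uint8_t sgid_index)
{
	return (flow_label & SKETCH_FLOW_MASK) | SKETCH_GRH_PRESENT |
	       ((uint32_t)sgid_index << SKETCH_SGID_SHIFT);
}

/* True when the packed word says a GRH is present. */
bool sketch_has_grh(uint32_t word)
{
	return (word & SKETCH_GRH_PRESENT) != 0;
}

/* Recover the SGID index from bits 20-27. */
uint8_t sketch_sgid_index(uint32_t word)
{
	return (uint8_t)((word >> SKETCH_SGID_SHIFT) & SKETCH_SGID_MASK);
}

/* Recover the flow label from the low 20 bits. */
uint32_t sketch_flow_label(uint32_t word)
{
	return word & SKETCH_FLOW_MASK;
}
```

Unlike the posted create path, this sketch masks the flow label before packing so that an oversized label cannot spill into the index field; whether the driver should do the same is a separate question.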
[PATCH V2 9/9] IB/mlx5: Mellanox Connect-IB, IB driver part 5/5
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- MAINTAINERS | 10 ++ drivers/infiniband/Kconfig |1 + drivers/infiniband/Makefile |1 + drivers/infiniband/hw/mlx5/Kconfig | 10 ++ drivers/infiniband/hw/mlx5/Makefile |3 +++ 5 files changed, 25 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/Kconfig create mode 100644 drivers/infiniband/hw/mlx5/Makefile diff --git a/MAINTAINERS b/MAINTAINERS index 6e82fb5..b426536 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5377,6 +5377,16 @@ S: Supported F: drivers/net/ethernet/mellanox/mlx5/core/ F: include/linux/mlx5/ +Mellanox MLX5 IB driver +M: Eli Cohen e...@mellanox.com +L: linux-rdma@vger.kernel.org +W: http://www.mellanox.com +Q: http://patchwork.kernel.org/project/linux-rdma/list/ +T: git://openfabrics.org/~eli/connect-ib.git +S: Supported +F: include/linux/mlx5/ +F: drivers/infiniband/hw/mlx5/ + MODULE SUPPORT M: Rusty Russell ru...@rustcorp.com.au S: Maintained diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index c85b56c..5ceda71 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -50,6 +50,7 @@ source drivers/infiniband/hw/amso1100/Kconfig source drivers/infiniband/hw/cxgb3/Kconfig source drivers/infiniband/hw/cxgb4/Kconfig source drivers/infiniband/hw/mlx4/Kconfig +source drivers/infiniband/hw/mlx5/Kconfig source drivers/infiniband/hw/nes/Kconfig source drivers/infiniband/hw/ocrdma/Kconfig diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile index b126fef..1fe6988 100644 --- a/drivers/infiniband/Makefile +++ b/drivers/infiniband/Makefile @@ -7,6 +7,7 @@ obj-$(CONFIG_INFINIBAND_AMSO1100) += hw/amso1100/ obj-$(CONFIG_INFINIBAND_CXGB3) += hw/cxgb3/ obj-$(CONFIG_INFINIBAND_CXGB4) += hw/cxgb4/ obj-$(CONFIG_MLX4_INFINIBAND) += hw/mlx4/ +obj-$(CONFIG_MLX5_INFINIBAND) += hw/mlx5/ obj-$(CONFIG_INFINIBAND_NES) += hw/nes/ obj-$(CONFIG_INFINIBAND_OCRDMA)+= hw/ocrdma/ obj-$(CONFIG_INFINIBAND_IPOIB) += 
ulp/ipoib/ diff --git a/drivers/infiniband/hw/mlx5/Kconfig b/drivers/infiniband/hw/mlx5/Kconfig new file mode 100644 index 000..8e6aebf --- /dev/null +++ b/drivers/infiniband/hw/mlx5/Kconfig @@ -0,0 +1,10 @@ +config MLX5_INFINIBAND + tristate Mellanox Connect-IB HCA support + depends on NETDEVICES ETHERNET PCI X86 + select NET_VENDOR_MELLANOX + select MLX5_CORE + ---help--- + This driver provides low-level InfiniBand support for + Mellanox Connect-IB PCI Express host channel adapters (HCAs). + This is required to use InfiniBand protocols such as + IP-over-IB or SRP with these devices. diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile new file mode 100644 index 000..4ea0135 --- /dev/null +++ b/drivers/infiniband/hw/mlx5/Makefile @@ -0,0 +1,3 @@ +obj-$(CONFIG_MLX5_INFINIBAND) += mlx5_ib.o + +mlx5_ib-y := main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V2 0/9] Add Mellanox mlx5 driver for Connect-IB devices
Hi Roland, all

Here's V2 of the driver, with Dave's and Roland's comments addressed. We look forward to seeing whether Roland is OK with merging it into 3.11.

Jack, Moshe and Or.

changes from V1:
- Addressed Dave Miller's comments:
  * Local variables in functions listed from longest to shortest
  * --i/++i changed to i--/i++ in all for-loops
  * Removed leading /* empty line from all comments
  * magic constants given names
  * endianness code moved to driver.h, and defined an endianness-dependent macro for use in assignment.
  * destroy_msg_cache() duplicated code removed
- Addressed Roland's comments:
  * Renamed foo_spl to foo_lock for spinlocks.
  * Eliminated magic number from mlx5_cmd_stats field declaration in struct mlx5_cmd.
  * Eliminated unused procedure mlx5_ib_umem_populate_pas()
  * Cleaned up mlx5_ib.h:
    * Added new patch for ib_verbs.h, adding reserved values to several enums
    * For several ib-core enums, added reserved values for use by low-level drivers. By defining macros at the low level (i.e., renaming the reserved values, in effect), the ll drivers may use these enums without needing to duplicate the ib-core enums while adding extra values. This fixes compilation problems such as:
      /home/roland/Src/linux-merge.git/drivers/infiniband/hw/mlx5/qp.c:975:2: error: case value 4671 not in enumerated type enum ib_qp_type
    * Changed ib_latency_class to mlx5_ib_latency_class, visible only in low-level driver
    * Eliminated the unused IB_WR_xxx_PSV enums
    * Defined macros MLX5_IB_SEND_UMR_UNREG, MLX5_IB_QPT_REG_UMR, and MLX5_IB_WR_UMR, taking advantage of the reserved values added to the ib_core enums.
  * debug-mask removed from mlx5_ib
  * Regarding mlx5_core, still have a debug mask to enable printouts of command data and command execution times, but all file-name-based mask bits removed.
  * Removed forced -Wall -Werror -DDEBUG settings in the mlx5 core/ib makefiles

changes from V0:
- Per Dave's request, cross posting to both netdev and linux-rdma, to see if there are comments from netdev on the core driver.

From: Eli Cohen e...@mellanox.com

The patches that follow constitute the driver for Mellanox's 5th generation of HCAs named Connect-IB. The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This partitioning resembles what we have for mlx4 with the substantial difference that mlx5_ib is the pci device driver and not mlx5_core.

mlx5_core provides general functionality that is intended to be used by other Mellanox devices that will be introduced in the future. In this sense, it can be perceived as a library. mlx5_ib has a similar role as any hardware device under drivers/infiniband/hw.

The patches are partitioned to avoid exceeding the 100KB vger.kernel.org limitation. They are divided such that the first three have the code of the mlx5_core driver, and the last five the code of the mlx5_ib driver. Only the last patch per driver adds the Makefiles and Kconfigs, to make things robust for future bisections.

PPC is not yet supported but support will be included in the near future.
Eli Cohen (8): net/mlx5: Mellanox Connect-IB, core driver part 1/3 net/mlx5: Mellanox Connect-IB, core driver part 2/3 net/mlx5: Mellanox Connect-IB, core driver part 3/3 IB/mlx5: Mellanox Connect-IB, IB driver part 1/5 IB/mlx5: Mellanox Connect-IB, IB driver part 2/5 IB/mlx5: Mellanox Connect-IB, IB driver part 3/5 IB/mlx5: Mellanox Connect-IB, IB driver part 4/5 IB/mlx5: Mellanox Connect-IB, IB driver part 5/5 Jack Morgenstein (1): IB/core: Add reserved values to enums for low-level drivers use MAINTAINERS| 22 + drivers/infiniband/Kconfig |1 + drivers/infiniband/Makefile|1 + drivers/infiniband/hw/mlx5/Kconfig | 10 + drivers/infiniband/hw/mlx5/Makefile|3 + drivers/infiniband/hw/mlx5/ah.c| 95 + drivers/infiniband/hw/mlx5/cq.c| 844 +++ drivers/infiniband/hw/mlx5/doorbell.c | 100 + drivers/infiniband/hw/mlx5/mad.c | 139 ++ drivers/infiniband/hw/mlx5/main.c | 1504 drivers/infiniband/hw/mlx5/mem.c | 162 ++ drivers/infiniband/hw/mlx5/mlx5_ib.h | 547 + drivers/infiniband/hw/mlx5/mr.c| 1021 drivers/infiniband/hw/mlx5/qp.c| 2537 drivers/infiniband/hw/mlx5/srq.c | 478 drivers/infiniband/hw/mlx5/user.h | 121 + drivers/net/ethernet/mellanox/Kconfig |1 + drivers/net/ethernet/mellanox/Makefile |1 + drivers/net/ethernet/mellanox/mlx5/core/Kconfig| 18 + drivers/net/ethernet/mellanox/mlx5/core/Makefile |5 +
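The reserved-value scheme described in the cover letter can be illustrated with a small stand-alone sketch. It is not the ib_verbs.h or mlx5 code: the enum below is heavily abbreviated, the values of the non-reserved entries are illustrative, and `wr_opcode_name()` is a hypothetical helper. Only `IB_WR_RESERVED1 = 0xf0` and the `MLX5_IB_WR_UMR` alias are taken from the series. The point is that a `switch` over the core enum can carry a driver-private case without triggering the "case value not in enumerated type" diagnostic quoted above, because the alias names a value the enum really contains.

```c
#include <string.h>

/* Abbreviated stand-in for the core enum; only the reserved value is from the patch. */
enum ib_wr_opcode {
	IB_WR_RDMA_WRITE,
	IB_WR_SEND,
	/* reserved for low-level drivers' internal use */
	IB_WR_RESERVED1 = 0xf0,
	IB_WR_RESERVED2,
};

/* Driver side: rename a reserved value instead of inventing a new number. */
#define MLX5_IB_WR_UMR IB_WR_RESERVED1

/* Hypothetical helper showing the alias used as an ordinary case label. */
const char *wr_opcode_name(enum ib_wr_opcode op)
{
	switch (op) {
	case IB_WR_RDMA_WRITE:
		return "rdma-write";
	case IB_WR_SEND:
		return "send";
	case MLX5_IB_WR_UMR:	/* same value as IB_WR_RESERVED1 */
		return "umr (driver-private)";
	default:
		return "unknown";
	}
}
```

The trade-off is that the core header has to set aside the range up front, and two low-level drivers may give the same reserved value different meanings, so the aliases are only meaningful inside the driver that defines them.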
[PATCH V2 7/9] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/infiniband/hw/mlx5/mlx5_ib.h | 547 ++ drivers/infiniband/hw/mlx5/mr.c | 1021 ++ 2 files changed, 1568 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ib.h create mode 100644 drivers/infiniband/hw/mlx5/mr.c diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h new file mode 100644 index 000..d2067c3 --- /dev/null +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -0,0 +1,547 @@ +/* + * Copyright (c) 2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef MLX5_IB_H +#define MLX5_IB_H + +#include linux/kernel.h +#include linux/sched.h +#include rdma/ib_verbs.h +#include rdma/ib_smi.h +#include linux/mlx5/driver.h +#include linux/mlx5/cq.h +#include linux/mlx5/qp.h +#include linux/mlx5/srq.h +#include linux/types.h + +#define mlx5_ib_dbg(dev, format, arg...) \ +do { \ + pr_debug(%s:%s:%d:(pid %d): format, (dev)-ib_dev.name, \ +__func__, __LINE__, current-pid, ##arg); \ +} while (0) + +#define mlx5_ib_err(dev, format, arg...) \ +pr_err(%s:%s:%d:(pid %d): format, (dev)-ib_dev.name, __func__, \ + __LINE__, current-pid, ##arg) + +#define mlx5_ib_warn(dev, format, arg...) \ +pr_warn(%s:%s:%d:(pid %d): format, (dev)-ib_dev.name, __func__,\ + __LINE__, current-pid, ##arg) + +enum { + MLX5_IB_MMAP_CMD_SHIFT = 8, + MLX5_IB_MMAP_CMD_MASK = 0xff, +}; + +enum mlx5_ib_mmap_cmd { + MLX5_IB_MMAP_REGULAR_PAGE = 0, + MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES = 1, /* always last */ +}; + +enum { + MLX5_RES_SCAT_DATA32_CQE= 0x1, + MLX5_RES_SCAT_DATA64_CQE= 0x2, + MLX5_REQ_SCAT_DATA32_CQE= 0x11, + MLX5_REQ_SCAT_DATA64_CQE= 0x22, +}; + +enum mlx5_ib_latency_class { + MLX5_IB_LATENCY_CLASS_LOW, + MLX5_IB_LATENCY_CLASS_MEDIUM, + MLX5_IB_LATENCY_CLASS_HIGH, + MLX5_IB_LATENCY_CLASS_FAST_PATH +}; + +enum mlx5_ib_mad_ifc_flags { + MLX5_MAD_IFC_IGNORE_MKEY= 1, + MLX5_MAD_IFC_IGNORE_BKEY= 2, + MLX5_MAD_IFC_NET_VIEW = 4, +}; + +struct mlx5_ib_ucontext { + struct ib_ucontext ibucontext; + struct list_headdb_page_list; + + /* protect doorbell record alloc/free +*/ + struct mutexdb_page_mutex; + struct mlx5_uuar_info uuari; +}; + +static inline struct mlx5_ib_ucontext *to_mucontext(struct ib_ucontext *ibucontext) +{ + return container_of(ibucontext, struct mlx5_ib_ucontext, ibucontext); +} + +struct mlx5_ib_pd { + struct ib_pdibpd; + u32 pdn; + u32 pa_lkey; +}; + +/* Use macros here so that don't have to duplicate + * enum ib_send_flags and enum ib_qp_type for low-level driver + */ + +#define MLX5_IB_SEND_UMR_UNREG IB_SEND_RESERVED_START 
+#define MLX5_IB_QPT_REG_UMRIB_QPT_RESERVED1 +#define MLX5_IB_WR_UMR IB_WR_RESERVED1 + +struct wr_list { + u16 opcode; + u16 next; +}; + +struct mlx5_ib_wq { + u64*wrid; + u32*wr_data; + struct wr_list *w_list; + unsigned *wqe_head; + u16 unsig_count; + + /*
Re: rtnl_lock deadlock on 3.10
On Wed, Jul 03, 2013 at 07:33:07AM +0200, Hannes Frederic Sowa wrote: On Wed, Jul 03, 2013 at 07:11:52AM +0200, Hannes Frederic Sowa wrote: On Tue, Jul 02, 2013 at 01:38:26PM +, Cong Wang wrote: On Tue, 02 Jul 2013 at 08:28 GMT, Hannes Frederic Sowa han...@stressinduktion.org wrote: On Mon, Jul 01, 2013 at 09:54:56AM -0500, Shawn Bohrer wrote: I've managed to hit a deadlock at boot a couple times while testing the 3.10 rc kernels. It seems to always happen when my network devices are initializing. This morning I updated to v3.10 and made a few config tweaks and so far I've hit it 4 out of 5 reboots. It looks like most processes are getting stuck on rtnl_lock. Below is a boot log with the soft lockup prints. Please let know if there is any other information I can provide: Could you try a build with CONFIG_LOCKDEP enabled? The problem is clear: ib_register_device() is called with rtnl_lock, but itself needs device_mutex, however, ib_register_client() first acquires device_mutex, then indirectly calls register_netdev() which takes rtnl_lock. Deadlock! 
One possible fix is always taking rtnl_lock before taking device_mutex, something like below: diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 18c1ece..890870b 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -381,6 +381,7 @@ int ib_register_client(struct ib_client *client) { struct ib_device *device; + rtnl_lock(); mutex_lock(device_mutex); list_add_tail(client-list, client_list); @@ -389,6 +390,7 @@ int ib_register_client(struct ib_client *client) client-add(device); mutex_unlock(device_mutex); + rtnl_unlock(); return 0; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index b6e049a..5a7a048 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -1609,7 +1609,7 @@ static struct net_device *ipoib_add_port(const char *format, goto event_failed; } - result = register_netdev(priv-dev); + result = register_netdevice(priv-dev); if (result) { printk(KERN_WARNING %s: couldn't register ipoib port %d; error %d\n, hca-name, port, result); Looks good to me. Shawn, could you test this patch? ib_unregister_device/ib_unregister_client would need the same change, too. I have not checked the other -add() and -remove() functions. Also cc'ed linux-rdma@vger.kernel.org, Roland Dreier. Cong's patch is missing the #include linux/rtnetlink.h but otherwise I've had 34 successful reboots with no deadlocks which is a good sign. It sounds like there are more paths that need to be audited and a proper patch submitted. I can do more testing later if needed. Thanks, Shawn -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
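The lock-order inversion Cong describes can be replayed with a toy order checker. This is only a sketch of the idea behind a lock-order validator, not the kernel's lockdep and not the IB core code: the `checker_` functions, the two lock classes and the `replay_` helpers are hypothetical, and no real locks are taken. It records "A was held when B was acquired" and counts the cases where the reverse pair was seen earlier.

```c
#include <stdbool.h>
#include <string.h>

/* Two lock classes standing in for rtnl_lock and the IB core device_mutex. */
enum lock_class { LOCK_RTNL, LOCK_DEVICE_MUTEX, LOCK_CLASS_COUNT };

struct order_checker {
	bool held[LOCK_CLASS_COUNT];
	/* seen[a][b]: class a was held at some point when class b was acquired */
	bool seen[LOCK_CLASS_COUNT][LOCK_CLASS_COUNT];
	int inversions;
};

void checker_init(struct order_checker *c)
{
	memset(c, 0, sizeof(*c));
}

/* Record an acquisition and count it as an inversion if the reverse order was seen. */
void checker_acquire(struct order_checker *c, enum lock_class taken)
{
	for (int h = 0; h < LOCK_CLASS_COUNT; h++) {
		if (!c->held[h] || h == (int)taken)
			continue;
		if (c->seen[taken][h])
			c->inversions++;
		c->seen[h][taken] = true;
	}
	c->held[taken] = true;
}

void checker_release(struct order_checker *c, enum lock_class released)
{
	c->held[released] = false;
}

/* Path 1 in the report: device registration runs with rtnl held, then takes device_mutex. */
void replay_register_device(struct order_checker *c)
{
	checker_acquire(c, LOCK_RTNL);
	checker_acquire(c, LOCK_DEVICE_MUTEX);
	checker_release(c, LOCK_DEVICE_MUTEX);
	checker_release(c, LOCK_RTNL);
}

/* Path 2: client registration; rtnl_first models the ordering in the proposed patch. */
void replay_register_client(struct order_checker *c, bool rtnl_first)
{
	enum lock_class first = rtnl_first ? LOCK_RTNL : LOCK_DEVICE_MUTEX;
	enum lock_class second = rtnl_first ? LOCK_DEVICE_MUTEX : LOCK_RTNL;

	checker_acquire(c, first);
	checker_acquire(c, second);
	checker_release(c, second);
	checker_release(c, first);
}

/* Replay both paths and report how many order inversions were observed. */
int count_inversions(bool rtnl_first)
{
	struct order_checker c;

	checker_init(&c);
	replay_register_device(&c);
	replay_register_client(&c, rtnl_first);
	return c.inversions;
}
```

With the original ordering the replay reports one inversion; with rtnl taken first in both paths it reports none. The sketch says nothing about whether the IB core should be taking rtnl at all, which is a design question rather than an ordering one.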
Re: PATCH: opensm enhancements
Hi again Jeff, On 7/3/2013 12:20 PM, Jeff Becker wrote: Hi Hal, I have some testing info about the second patch below. On 07/03/2013 03:23 AM, Hal Rosenstock wrote: HI Jeff, On 6/26/2013 5:24 PM, Jeff Becker wrote: Hi Hal. At the OFA workshop, I mentioned that I've been working on some modifications to opensm that we use at NASA. Following extensive testing of these applied to opensm 3.3.13 (the version we run here), I have ported these to top of tree opensm, and have tested them on a small cluster. Thanks for getting this done! For future reference, patches should be sent as plain text as this makes it easier to comment. OK. So I just send the output of git-format-patch directly? It appears to be formatted properly. The first patch modifies the console logflush command to take on or off as an argument for toggling. Thanks. Applied. The second (more extensive) patch adds a command line option to specify a file in which each line contains a switch GUID/port pair to be ignored by opensm. The idea is to specify this file when you start opensm (it can be empty), and add ports to ignore (one per line for each end of a connection) to the file. At the next heavy sweep (or HUP) the sm will reprogram the forwarding tables without including the ignored links. We use this for replacing cables, as well as for system expansion (adding new racks). I'll comment on this one later. Dale (cc'd) did some testing with my patch on Pleiades in preparation for a system augmentation (new racks) happening soon. He found that the SM correctly produces routes that do not use links marked to be ignored, but when you then remove or disable the links, the SM re-routes the fabric anyway and comes up with different routes than before. This rerouting causes problems with existing connections. There also appears to be a bookkeeping problem such that some of these links get added to the SM's light sampling list and never get removed. 
This ties up outstanding MAD packet slots, causing the SM to become unresponsive for several seconds every time it reviews its light sampling list. Yes, this is one of several issues with using this approach. I plan on detailing these later as well as posting a slightly different approach for this but that may take a little longer... I'm working on fixing these. I'll take care of the second problem (incorrectly getting added to the light sampling list) first. Is it possible this problem is related to the re-routing on port disable problem? Anyhow, if you have any specific comments about these issues, that would be great. Thanks, and have a great Fourth of July. Thanks; you too! -- Hal -jeff -- Hal Please let me know if you have any questions/issues with these. Thanks. -jeff -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rtnl_lock deadlock on 3.10
On 03/07/2013 20:22, Shawn Bohrer wrote: On Wed, Jul 03, 2013 at 07:33:07AM +0200, Hannes Frederic Sowa wrote: On Wed, Jul 03, 2013 at 07:11:52AM +0200, Hannes Frederic Sowa wrote: On Tue, Jul 02, 2013 at 01:38:26PM +, Cong Wang wrote: On Tue, 02 Jul 2013 at 08:28 GMT, Hannes Frederic Sowa han...@stressinduktion.org wrote: On Mon, Jul 01, 2013 at 09:54:56AM -0500, Shawn Bohrer wrote: I've managed to hit a deadlock at boot a couple times while testing the 3.10 rc kernels. It seems to always happen when my network devices are initializing. This morning I updated to v3.10 and made a few config tweaks and so far I've hit it 4 out of 5 reboots. It looks like most processes are getting stuck on rtnl_lock. Below is a boot log with the soft lockup prints. Please let know if there is any other information I can provide: Could you try a build with CONFIG_LOCKDEP enabled? The problem is clear: ib_register_device() is called with rtnl_lock, but itself needs device_mutex, however, ib_register_client() first acquires device_mutex, then indirectly calls register_netdev() which takes rtnl_lock. Deadlock! 
One possible fix is always taking rtnl_lock before taking device_mutex, something like below: diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 18c1ece..890870b 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -381,6 +381,7 @@ int ib_register_client(struct ib_client *client) { struct ib_device *device; + rtnl_lock(); mutex_lock(device_mutex); list_add_tail(client-list, client_list); @@ -389,6 +390,7 @@ int ib_register_client(struct ib_client *client) client-add(device); mutex_unlock(device_mutex); + rtnl_unlock(); return 0; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index b6e049a..5a7a048 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -1609,7 +1609,7 @@ static struct net_device *ipoib_add_port(const char *format, goto event_failed; } - result = register_netdev(priv-dev); + result = register_netdevice(priv-dev); if (result) { printk(KERN_WARNING %s: couldn't register ipoib port %d; error %d\n, hca-name, port, result); Looks good to me. Shawn, could you test this patch? ib_unregister_device/ib_unregister_client would need the same change, too. I have not checked the other -add() and -remove() functions. Also cc'ed linux-rdma@vger.kernel.org, Roland Dreier. Cong's patch is missing the #include linux/rtnetlink.h but otherwise I've had 34 successful reboots with no deadlocks which is a good sign. It sounds like there are more paths that need to be audited and a proper patch submitted. I can do more testing later if needed. Thanks, Shawn Guys, I was a bit busy today looking into that, but I don't think we want the IB core layer (core/device.c) to use rtnl locking which is something that belongs to the network stack. Or. 
Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
On 07/03/13 17:14, David Dillow wrote:
On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote:

+int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
+{
+	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
+		fast_io_fail_tmo < dev_loss_tmo) &&
+		fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT &&
+		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
+}
+EXPORT_SYMBOL_GPL(srp_tmo_valid);

This would have been more readable:

int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* Fast IO fail must be off, or no greater than the max timeout */
	if (fast_io_fail_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
		return -EINVAL;

	/* Device timeout must be off, or fit into jiffies */
	if (dev_loss_tmo >= LONG_MAX / HZ)
		return -EINVAL;

	/* Fast IO must trigger before device loss, or one of the
	 * timeouts must be disabled.
	 */
	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
		return 0;
	if (fast_io_fail_tmo < dev_loss_tmo)
		return 0;

	return -EINVAL;
}

Isn't that a matter of personal taste which of the above two is more clear ?

No, it is quite common in Linux for complicated conditionals to be broken up into helper functions, and Vu found logic bugs in previous iterations. After unpacking it, I still found behavior that is questionable. All of this strongly points to that block being too dense for its own good.

It might also depend on the number of mathematics courses in someone's educational background :-)

Or the number of logic courses, or their experience with Lisp. :)

Though, now that I've unpacked it -- I don't think it is OK for dev_loss_tmo to be off, but fast IO to be on? That drops another conditional.

The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine in my tests. An I/O failure was detected shortly after the cable to the target was pulled. I/O resumed shortly after the cable to the target was reinserted.

Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo < 0, and fast_io_fail_tmo >= 0. The other transports do not allow this scenario, and I'm asking if it makes sense for SRP to allow it.

But now that you mention reconnect_delay, what is the meaning of that when it is negative? That's not in the documentation. And should it be considered in srp_tmo_valid() -- are there values of reconnect_delay that cause problems?

I'm starting to get a bit concerned about this patch -- can you, Vu, and Sebastian comment on the testing you have done?

Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if fast_io_fail_tmo is off; I agree with your reasoning about leaving it unlimited if fast fail is on, but does that still hold if it is off?

I think setting dev_loss_tmo to a large value only makes sense if the value of reconnect_delay is not too large. Setting both to a large value would result in slow recovery after a transport layer failure has been corrected.

So you agree it should be capped? I can't tell from your response.
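To make the disagreement above concrete, here is a small user-space sketch that places the dense conditional and the unpacked one side by side and counts the inputs on which they differ. It is a sketch under stated assumptions: the comparison operators are reconstructed from the discussion, `SKETCH_HZ` (1000) and `SKETCH_BLOCK_MAX_TIMEOUT` (600) are stand-ins for the kernel's `HZ` and `SCSI_DEVICE_BLOCK_MAX_TIMEOUT`, and the function names are hypothetical.

```c
#include <errno.h>
#include <limits.h>

/* Stand-in constants; the real values come from the kernel configuration and headers. */
#define SKETCH_HZ                1000
#define SKETCH_BLOCK_MAX_TIMEOUT 600

/* The single-expression form, with operators reconstructed from the thread. */
int tmo_valid_dense(int fast_io_fail_tmo, int dev_loss_tmo)
{
	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
		fast_io_fail_tmo < dev_loss_tmo) &&
		fast_io_fail_tmo <= SKETCH_BLOCK_MAX_TIMEOUT &&
		dev_loss_tmo < LONG_MAX / SKETCH_HZ ? 0 : -EINVAL;
}

/* The step-by-step form suggested in the review. */
int tmo_valid_unpacked(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* Fast IO fail must be off, or no greater than the max timeout. */
	if (fast_io_fail_tmo > SKETCH_BLOCK_MAX_TIMEOUT)
		return -EINVAL;
	/* Device loss timeout must be off, or fit into jiffies. */
	if (dev_loss_tmo >= LONG_MAX / SKETCH_HZ)
		return -EINVAL;
	/* Either timeout disabled, or fast IO fail fires first. */
	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
		return 0;
	if (fast_io_fail_tmo < dev_loss_tmo)
		return 0;
	return -EINVAL;
}

/* Count the (fast_io_fail_tmo, dev_loss_tmo) pairs in [lo, hi] where the two forms differ. */
long count_disagreements(int lo, int hi)
{
	long n = 0;

	for (int f = lo; f <= hi; f++)
		for (int d = lo; d <= hi; d++)
			if (tmo_valid_dense(f, d) != tmo_valid_unpacked(f, d))
				n++;
	return n;
}
```

Under these assumptions both forms accept the case being questioned in the thread, `dev_loss_tmo < 0` with `fast_io_fail_tmo >= 0`; for example `tmo_valid_dense(5, -1)` yields 0. Rejecting that combination, as the other transports reportedly do, would need an extra check in either form.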
[PATCH V3 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
From: Hadar Hen Zion had...@mellanox.com Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to support flow steering for user space applications. Signed-off-by: Hadar Hen Zion had...@mellanox.com Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/core/uverbs.h |3 + drivers/infiniband/core/uverbs_cmd.c | 199 + drivers/infiniband/core/uverbs_main.c | 13 ++- include/rdma/ib_verbs.h |1 + include/uapi/rdma/ib_user_verbs.h | 88 ++- 5 files changed, 302 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h index 0fcd7aa..ad9d102 100644 --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -155,6 +155,7 @@ extern struct idr ib_uverbs_cq_idr; extern struct idr ib_uverbs_qp_idr; extern struct idr ib_uverbs_srq_idr; extern struct idr ib_uverbs_xrcd_idr; +extern struct idr ib_uverbs_rule_idr; void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj); @@ -215,5 +216,7 @@ IB_UVERBS_DECLARE_CMD(destroy_srq); IB_UVERBS_DECLARE_CMD(create_xsrq); IB_UVERBS_DECLARE_CMD(open_xrcd); IB_UVERBS_DECLARE_CMD(close_xrcd); +IB_UVERBS_DECLARE_CMD(create_flow); +IB_UVERBS_DECLARE_CMD(destroy_flow); #endif /* UVERBS_H */ diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index a7d00f6..bfc53f7 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -54,6 +54,7 @@ static struct uverbs_lock_class qp_lock_class = { .name = QP-uobj }; static struct uverbs_lock_class ah_lock_class = { .name = AH-uobj }; static struct uverbs_lock_class srq_lock_class = { .name = SRQ-uobj }; static struct uverbs_lock_class xrcd_lock_class = { .name = XRCD-uobj }; +static struct uverbs_lock_class rule_lock_class = { .name = RULE-uobj }; #define INIT_UDATA(udata, ibuf, obuf, ilen, olen) \ do {\ @@ -330,6 +331,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file, INIT_LIST_HEAD(ucontext-srq_list); 
INIT_LIST_HEAD(ucontext-ah_list); INIT_LIST_HEAD(ucontext-xrcd_list); + INIT_LIST_HEAD(ucontext-rule_list); ucontext-closing = 0; resp.num_comp_vectors = file-device-num_comp_vectors; @@ -2587,6 +2589,203 @@ out_put: return ret ? ret : in_len; } +static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec, + struct _ib_flow_spec *ib_spec) +{ + ib_spec-type = kern_spec-type; + + switch (ib_spec-type) { + case IB_FLOW_SPEC_ETH: + ib_spec-eth.size = sizeof(struct ib_flow_spec_eth); + memcpy(ib_spec-eth.val, kern_spec-eth.val, + sizeof(struct ib_flow_eth_filter)); + memcpy(ib_spec-eth.mask, kern_spec-eth.mask, + sizeof(struct ib_flow_eth_filter)); + break; + case IB_FLOW_SPEC_IPV4: + ib_spec-ipv4.size = sizeof(struct ib_flow_spec_ipv4); + memcpy(ib_spec-ipv4.val, kern_spec-ipv4.val, + sizeof(struct ib_flow_ipv4_filter)); + memcpy(ib_spec-ipv4.mask, kern_spec-ipv4.mask, + sizeof(struct ib_flow_ipv4_filter)); + break; + case IB_FLOW_SPEC_TCP: + case IB_FLOW_SPEC_UDP: + ib_spec-tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp); + memcpy(ib_spec-tcp_udp.val, kern_spec-tcp_udp.val, + sizeof(struct ib_flow_tcp_udp_filter)); + memcpy(ib_spec-tcp_udp.mask, kern_spec-tcp_udp.mask, + sizeof(struct ib_flow_tcp_udp_filter)); + break; + default: + return -EINVAL; + } + return 0; +} + +ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file, + const char __user *buf, int in_len, + int out_len) +{ + struct ib_uverbs_create_flow cmd; + struct ib_uverbs_create_flow_resp resp; + struct ib_uobject *uobj; + struct ib_flow*flow_id; + struct ib_kern_flow_attr *kern_flow_attr; + struct ib_flow_attr *flow_attr; + struct ib_qp *qp; + int err = 0; + void *kern_spec; + void *ib_spec; + int i; + + if (out_len sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(cmd, buf, sizeof(cmd))) + return -EFAULT; + + if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER +!capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW)) + return -EPERM; + + if (cmd.flow_attr.num_of_specs) { + kern_flow_attr = 
kmalloc(cmd.flow_attr.size, GFP_KERNEL); + if (!kern_flow_attr) +
[PATCH V3 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs
From: Igor Ivanov <igor.iva...@itseez.com>

Add infrastructure to support extended uverbs capabilities in a
forward/backward compatible manner. Uverbs command opcodes which are based
on the verbs extensions approach should be greater than or equal to
IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are
processed a bit differently.

Signed-off-by: Igor Ivanov <igor.iva...@itseez.com>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
---
 drivers/infiniband/core/uverbs_main.c | 29 -
 include/uapi/rdma/ib_user_verbs.h | 10 ++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD	50
 
 enum {
 	IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
 	__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+	__u16 provider_in_words;
+	__u16 provider_out_words;
+	__u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
 	__u64 response;
 	__u64 driver_data[0];
-- 
1.7.1
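For readers who want to experiment with the dispatch rule outside the kernel, here is a minimal user-space sketch of the length check this patch performs. It is not the kernel code: `cmd_hdr`, `cmd_hdr_ex`, `CMD_THRESHOLD`, `expected_write_len` and `write_len_ok` are hypothetical stand-ins that only mirror `ib_uverbs_cmd_hdr`, `ib_uverbs_cmd_hdr_ex` and `IB_USER_VERBS_CMD_THRESHOLD` from the patch, and the extra "buffer too short for the extended header" check is an assumption added for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMD_THRESHOLD 50u	/* mirrors IB_USER_VERBS_CMD_THRESHOLD */

struct cmd_hdr {		/* legacy header, 8 bytes */
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
};

struct cmd_hdr_ex {		/* extended header, 16 bytes */
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t cmd_hdr_reserved;
};

/*
 * Return the total write size in bytes that the header in buf declares,
 * or 0 if buf is too short to hold the header format it selects.
 */
static size_t expected_write_len(const void *buf, size_t count)
{
	struct cmd_hdr hdr;
	struct cmd_hdr_ex hdr_ex;

	assert(buf != NULL);
	if (count < sizeof(hdr))
		return 0;
	memcpy(&hdr, buf, sizeof(hdr));

	if (hdr.command >= CMD_THRESHOLD) {
		/* Extended command: core and provider words are summed. */
		if (count < sizeof(hdr_ex))
			return 0;
		memcpy(&hdr_ex, buf, sizeof(hdr_ex));
		return ((size_t)hdr_ex.in_words +
			hdr_ex.provider_in_words) * 4;
	}
	/* Legacy command: in_words alone covers the whole write. */
	return (size_t)hdr.in_words * 4;
}

/* 1 if the write length matches what the header declares, else 0. */
static int write_len_ok(const void *buf, size_t count)
{
	size_t want = expected_write_len(buf, count);

	return want != 0 && want == count;
}
```

The point of the sketch is that both header formats start with the same 32-bit `command` field, so the legacy header can be read first and the opcode alone decides which layout to re-read.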
[PATCH V3 for-next 0/4] Add receive Flow Steering support
Hi Roland, all

V3 addresses the comments made by Sean. There are still some concerns/questions posed by Roland on the uverbs extensions element of the series. I have posted replies to them, but so far no further comments were made.

V3 changes:
  - Addressed comments from Sean:
    - modified the change-log of patch #1 to be clearer on the priority and domain semantics and usage
    - re-arranged the fields of struct ib_flow_attr
    - removed check from ib_flow_destroy
    - removed the IB flow spec which wasn't in line with the L2/L3/L4 approach done for Ethernet/IP/TCP|UDP; will use proper IB flow specs when adding the support for IPoIB flow steering

V2 changes:
  - dropped struct ib_kern_flow from patch #3; this structure wasn't used and was left there by mistake (bug, thanks Roland)
  - removed the void *flow_context field from struct ib_flow; this was pointing to driver private data for that flow, but doesn't belong here, i.e. it need not be seen by the verbs consumer but rather hidden
  - renamed struct mlx4_flow_handle to mlx4_ib_flow, a structure that contains the verbs level struct ib_flow and the mlx4 registration ID for that flow

V1 changes:
  - dropped the five pre-patches which were accepted into 3.10
  - rebased the patches against Roland's for-next / 3.10-rc4
  - in patch #3, ib_uverbs_destroy_flow was returning too quickly when the driver returned failure for ib_destroy_flow; it needs to free some uverbs resources first
  - in patch #4, check index before accessing the array at mlx4_ib_create/destroy_flow

These patches add Flow Steering support to the kernel IB core, to uverbs and to the mlx4 IB (verbs) driver, along with one patch to uverbs which adds some code to support extensions.

  IB/core: Add receive Flow Steering support
  IB/core: Infra-structure to support verbs extensions through uverbs
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

The main patch which introduces the Flow-Steering API is "IB/core: Add receive Flow Steering support"; see its change log. The Network Adapter Flow Steering slides that Tzahi Oved presented at the annual OFA 2012 meeting could also be helpful:
https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/518-network-adapter-flow-steering.html

Or.

Hadar Hen Zion (3):
  IB/core: Add receive Flow Steering support
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

Igor Ivanov (1):
  IB/core: Infra-structure to support verbs extensions through uverbs

 drivers/infiniband/core/uverbs.h |3 +
 drivers/infiniband/core/uverbs_cmd.c | 199
 drivers/infiniband/core/uverbs_main.c | 42 +-
 drivers/infiniband/core/verbs.c | 27
 drivers/infiniband/hw/mlx4/main.c | 235 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h | 12 ++
 include/linux/mlx4/device.h |5 -
 include/rdma/ib_verbs.h | 122 +-
 include/uapi/rdma/ib_user_verbs.h | 98 ++-
 9 files changed, 729 insertions(+), 14 deletions(-)
[PATCH V3 for-next 4/4] IB/mlx4: Add receive Flow Steering support
From: Hadar Hen Zion <had...@mellanox.com>

Implement ib_create_flow and ib_destroy_flow.

Translate the verbs structures provided by the user to HW structures and
call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands.

On the ATTACH command completion, the firmware provides a 64-bit
registration ID which is placed into struct mlx4_ib_flow that wraps the
instance of struct ib_flow which is returned to the caller. Later, this
reg ID is used for detaching that flow from the firmware.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
---
 drivers/infiniband/hw/mlx4/main.c| 235 ++
 drivers/infiniband/hw/mlx4/mlx4_ib.h | 12 ++
 include/linux/mlx4/device.h |5 -
 3 files changed, 247 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index a188d31..5b5518f 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -54,6 +54,8 @@
 #define DRV_VERSION	"1.0"
 #define DRV_RELDATE	"April 4, 2008"
 
+#define MLX4_IB_FLOW_MAX_PRIO 0xFFF
+
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("Mellanox ConnectX HCA InfiniBand driver");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -88,6 +90,25 @@ static void init_query_mad(struct ib_smp *mad)
 
 static union ib_gid zgid;
 
+static int check_flow_steering_support(struct mlx4_dev *dev)
+{
+	int ib_num_ports = 0;
+	int i;
+
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+		ib_num_ports++;
+
+	if (dev->caps.steering_mode == MLX4_STEERING_MODE_DEVICE_MANAGED) {
+		if (ib_num_ports || mlx4_is_mfunc(dev)) {
+			pr_warn("Device managed flow steering is unavailable "
+				"for IB ports or in multifunction env.\n");
+			return 0;
+		}
+		return 1;
+	}
+	return 0;
+}
+
 static int mlx4_ib_query_device(struct ib_device *ibdev,
 				struct ib_device_attr *props)
 {
@@ -144,6 +165,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2B;
 		else
 			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2A;
+	if (check_flow_steering_support(dev->dev))
+		props->device_cap_flags |= IB_DEVICE_MANAGED_FLOW_STEERING;
 	}
 
 	props->vendor_id = be32_to_cpup((__be32 *) (out_mad->data + 36)) &
@@ -798,6 +821,209 @@ struct mlx4_ib_steering {
 	union ib_gid gid;
 };
 
+static int parse_flow_attr(struct mlx4_dev *dev,
+			   struct _ib_flow_spec *ib_spec,
+			   struct _rule_hw *mlx4_spec)
+{
+	enum mlx4_net_trans_rule_id type;
+
+	switch (ib_spec->type) {
+	case IB_FLOW_SPEC_ETH:
+		type = MLX4_NET_TRANS_RULE_ID_ETH;
+		memcpy(mlx4_spec->eth.dst_mac, ib_spec->eth.val.dst_mac,
+		       ETH_ALEN);
+		memcpy(mlx4_spec->eth.dst_mac_msk, ib_spec->eth.mask.dst_mac,
+		       ETH_ALEN);
+		mlx4_spec->eth.vlan_tag = ib_spec->eth.val.vlan_tag;
+		mlx4_spec->eth.vlan_tag_msk = ib_spec->eth.mask.vlan_tag;
+		break;
+
+	case IB_FLOW_SPEC_IPV4:
+		type = MLX4_NET_TRANS_RULE_ID_IPV4;
+		mlx4_spec->ipv4.src_ip = ib_spec->ipv4.val.src_ip;
+		mlx4_spec->ipv4.src_ip_msk = ib_spec->ipv4.mask.src_ip;
+		mlx4_spec->ipv4.dst_ip = ib_spec->ipv4.val.dst_ip;
+		mlx4_spec->ipv4.dst_ip_msk = ib_spec->ipv4.mask.dst_ip;
+		break;
+
+	case IB_FLOW_SPEC_TCP:
+	case IB_FLOW_SPEC_UDP:
+		type = ib_spec->type == IB_FLOW_SPEC_TCP ?
+					MLX4_NET_TRANS_RULE_ID_TCP :
+					MLX4_NET_TRANS_RULE_ID_UDP;
+		mlx4_spec->tcp_udp.dst_port = ib_spec->tcp_udp.val.dst_port;
+		mlx4_spec->tcp_udp.dst_port_msk = ib_spec->tcp_udp.mask.dst_port;
+		mlx4_spec->tcp_udp.src_port = ib_spec->tcp_udp.val.src_port;
+		mlx4_spec->tcp_udp.src_port_msk = ib_spec->tcp_udp.mask.src_port;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	if (mlx4_map_sw_to_hw_steering_id(dev, type) < 0 ||
+	    mlx4_hw_rule_sz(dev, type) < 0)
+		return -EINVAL;
+	mlx4_spec->id = cpu_to_be16(mlx4_map_sw_to_hw_steering_id(dev, type));
+	mlx4_spec->size = mlx4_hw_rule_sz(dev, type) >> 2;
+	return mlx4_hw_rule_sz(dev, type);
+}
+
+static int __mlx4_ib_create_flow(struct ib_qp *qp, struct ib_flow_attr *flow_attr,
+				 int domain,
+				 enum mlx4_net_trans_promisc_mode flow_type,
+				 u64 *reg_id)
+{
+	int ret, i;
+	int size = 0;
+	void *ib_flow;
+	struct
[PATCH V3 for-next 1/4] IB/core: Add receive Flow Steering support
From: Hadar Hen Zion <had...@mellanox.com>

The RDMA stack allows applications to create IB_QPT_RAW_PACKET QPs, for
which plain Ethernet packets are used, specifically packets which don't
carry any QPN to be matched by the receiving side. Applications using
these QPs must be provided with a method to program some steering rule
with the HW so packets arriving at the local port can be routed to them.

This patch adds ib_create_flow, which allows providing a flow
specification for a QP, such that when there's a match between the
specification and the received packet, it can be forwarded to that QP, in
a similar manner to how one needs to use ib_attach_multicast for IB UD
multicast handling.

Flow specifications are provided as instances of struct ib_flow_spec_yyy
which describe L2, L3 and L4 headers; currently specs for Ethernet, IPv4,
TCP and UDP are defined. Flow specs are made of values and masks.

The input to ib_create_flow is an instance of struct ib_flow_attr which
contains a few mandatory control elements and optional flow specs.

struct ib_flow_attr {
	enum ib_flow_attr_type type;
	u16 size;
	u16 priority;
	u32 flags;
	u8 num_of_specs;
	u8 port;
	/* Following are the optional layers according to user request
	 * struct ib_flow_spec_yyy
	 * struct ib_flow_spec_zzz
	 */
};

As these specs are eventually coming from user space, they are defined and
used in a way which allows adding new spec types without kernel/user ABI
change, and with a little API enhancement which defines the newly added
spec.

The flow spec structures are defined in a TLV (Type-Length-Value) manner,
which allows calling ib_create_flow with a variable-length list of
optional specs.

For the actual processing of ib_flow_attr the driver uses the mandatory
number-of-specs and size fields along with the TLV nature of the specs.

Steering rules processing order is according to the domain over which the
rule is set and the rule priority. All rules set by user space
applications fall into the IB_FLOW_DOMAIN_USER domain; other domains could
be used by future IPoIB RFS and Ethtool flow-steering interface
implementations. A lower numerical priority value means higher priority.

The returned value from ib_create_flow is an instance of struct ib_flow
which contains a database pointer (handle) provided by the HW driver to be
used when calling ib_destroy_flow.

Applications that offload TCP/IP traffic could also be written over IB UD
QPs. As such, the ib_create_flow / ib_destroy_flow API is designed to
support UD QPs too. The HW driver sets IB_DEVICE_MANAGED_FLOW_STEERING to
denote support of flow steering.

The ib_flow_attr enum type relates to usage of flow steering for
promiscuous and sniffer purposes:

	IB_FLOW_ATTR_NORMAL - regular rule, steering according to rule
	specification

	IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule,
	receive all Ethernet traffic which isn't steered to any QP

	IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only
	for multicast

	IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

The ALL_DEFAULT and MC_DEFAULT rule options are valid only for the
Ethernet link type.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
---
 drivers/infiniband/core/verbs.c | 27 +
 include/rdma/ib_verbs.h | 121 ++-
 2 files changed, 146 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..87a8102 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1254,3 +1254,30 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+struct ib_flow *ib_create_flow(struct ib_qp *qp,
+			       struct ib_flow_attr *flow_attr,
+			       int domain)
+{
+	struct ib_flow *flow_id;
+	if (!qp->device->create_flow)
+		return ERR_PTR(-ENOSYS);
+
+	flow_id = qp->device->create_flow(qp, flow_attr, domain);
+	if (!IS_ERR(flow_id))
+		atomic_inc(&qp->usecnt);
+	return flow_id;
+}
+EXPORT_SYMBOL(ib_create_flow);
+
+int ib_destroy_flow(struct ib_flow *flow_id)
+{
+	int err;
+	struct ib_qp *qp = flow_id->qp;
+
+	err = qp->device->destroy_flow(flow_id);
+	if (!err)
+		atomic_dec(&qp->usecnt);
+	return err;
+}
+EXPORT_SYMBOL(ib_destroy_flow);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..1390a0f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,8 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<21),
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
-	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24)
+
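The TLV layout described in the change log of this patch can be walked without knowing every spec type in advance. The following is a minimal user-space sketch of that idea, not the kernel implementation: `spec_hdr`, its field widths and `walk_specs` are hypothetical names chosen for illustration, and only the concept (a count of specs followed by type/length-prefixed records) is taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header shared by every spec; not the kernel layout. */
struct spec_hdr {
	uint16_t type;	/* which protocol layer this spec describes */
	uint16_t size;	/* total bytes of this spec, header included */
};

/*
 * Walk num_of_specs TLV records stored back to back in buf[0..len).
 * Returns the number of bytes consumed, or -1 if a record is malformed
 * or runs past the end of the buffer. Unknown types are skipped by
 * size, which is what lets new spec types be added without changing
 * the walker.
 */
static long walk_specs(const unsigned char *buf, size_t len,
		       unsigned int num_of_specs)
{
	size_t off = 0;
	unsigned int i;

	assert(buf != NULL);
	for (i = 0; i < num_of_specs; i++) {
		struct spec_hdr hdr;

		/* off never exceeds len, so len - off cannot underflow. */
		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		off += hdr.size;
	}
	return (long)off;
}
```

A consumer would typically compare the returned byte count against the `size` field of the attribute header to reject inconsistent input before parsing individual specs.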
[PATCH] IB/qib: fix module level leak
The vzalloc()'ed field physshadow is leaked on module unload.

This patch adds vfree after the sibling page shadow is freed.

Reported-by: Dean Luick <dean.lu...@intel.com>
Reviewed-by: Dean Luick <dean.lu...@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marcinis...@intel.com>
---
 drivers/infiniband/hw/qib/qib_init.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index fdae429..36e048e 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -1350,7 +1350,7 @@ static void cleanup_device_data(struct qib_devdata *dd)
 	if (dd->pageshadow) {
 		struct page **tmpp = dd->pageshadow;
 		dma_addr_t *tmpd = dd->physshadow;
-		int i, cnt = 0;
+		int i;
 
 		for (ctxt = 0; ctxt < dd->cfgctxts; ctxt++) {
 			int ctxt_tidbase = ctxt * dd->rcvtidcnt;
@@ -1363,13 +1363,13 @@ static void cleanup_device_data(struct qib_devdata *dd)
 					       PAGE_SIZE, PCI_DMA_FROMDEVICE);
 				qib_release_user_pages(&tmpp[i], 1);
 				tmpp[i] = NULL;
-				cnt++;
 			}
 		}
 
-		tmpp = dd->pageshadow;
 		dd->pageshadow = NULL;
 		vfree(tmpp);
+		dd->physshadow = NULL;
+		vfree(tmpd);
 	}
 
 	/*
Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
On 07/03/13 19:27, David Dillow wrote: On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote: The combination of dev_loss_tmo off and reconnect_delay 0 worked fine in my tests. An I/O failure was detected shortly after the cable to the target was pulled. I/O resumed shortly after the cable to the target was reinserted. Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo 0, and fast_io_fail_tmo = 0. The other transports do not allow this scenario, and I'm asking if it makes sense for SRP to allow it. But now that you mention reconnect_delay, what is the meaning of that when it is negative? That's not in the documentation. And should it be considered in srp_tmo_valid() -- are there values of reconnect_delay that cause problems? None of the combinations that can be configured from user space can bring the kernel in trouble. If reconnect_delay = 0 that means that the time-based reconnect mechanism is disabled. I'm starting to get a bit concerned about this patch -- can you, Vu, and Sebastian comment on the testing you have done? All combinations of reconnect_delay, fast_io_fail_tmo and dev_loss_tmo that result in different behavior have been tested. Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if fail_io_fast_tmo is off; I agree with your reasoning about leaving it unlimited if fast fail is on, but does that still hold if it is off? I think setting dev_loss_tmo to a large value only makes sense if the value of reconnect_delay is not too large. Setting both to a large value would result in slow recovery after a transport layer failure has been corrected. So you agree it should be capped? I can't tell from your response. Not all combinations of reconnect_delay / fail_io_fast_tmo / dev_loss_tmo result in useful behavior. It is up to the user to choose a meaningful combination. Bart. 
Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
On Wed, 2013-07-03 at 20:24 +0200, Bart Van Assche wrote: On 07/03/13 19:27, David Dillow wrote: On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote: The combination of dev_loss_tmo off and reconnect_delay 0 worked fine in my tests. An I/O failure was detected shortly after the cable to the target was pulled. I/O resumed shortly after the cable to the target was reinserted. Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo 0, and fast_io_fail_tmo = 0. The other transports do not allow this scenario, and I'm asking if it makes sense for SRP to allow it. But now that you mention reconnect_delay, what is the meaning of that when it is negative? That's not in the documentation. And should it be considered in srp_tmo_valid() -- are there values of reconnect_delay that cause problems? None of the combinations that can be configured from user space can bring the kernel in trouble. If reconnect_delay = 0 that means that the time-based reconnect mechanism is disabled. Then it should use the same semantics as the other attributes, and have the user store off to turn it off. And I'm getting the strong sense that the answer to my question about fast_io_fail_tmo = 0 when dev_loss_tmo is that we should not allow that combination, even if it doesn't break the kernel. If it doesn't make sense, there is no reason to create an opportunity for user confusion. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
On Wed, Jul 3, 2013 at 9:41 AM, Or Gerlitz <ogerl...@mellanox.com> wrote:
> Jack looked on this comment/code and he says that the active flag is
> used to prevent re-scheduling the timer from inside the timer handling
> routine. In the kernel, the comment header in the source file for
> del_timer_sync explicitly states that re-scheduling the timer must be
> prevented, or the sync is useless:
>
>   "Callers must prevent restarting of the timer, otherwise this
>   function is meaningless"
>
> So we believe that code should remain.

Look at the actual timer code. del_timer_sync() won't work if something unrelated re-adds the timer, but it will work if the timer itself is what re-adds itself.

Documentation/DocBook/kernel-locking.tmpl says:

    Another common problem is deleting timers which restart
    themselves (by calling <function>add_timer()</function> at the end
    of their timer function). Because this is a fairly common case
    which is prone to races, you should use <function>del_timer_sync()</function>
    (<filename class="headerfile">include/linux/timer.h</filename>)
    to handle this case. It returns the number of times the timer
    had to be deleted before we finally stopped it from adding itself back
    in.

which pretty clearly says that del_timer_sync() will work in this case. Or look at the code using it in arch/sparc/kernel/led.c for example (just one of the first hits in my grep, there are many other examples).

Not a big deal but I'm pretty sure the flag isn't needed.

 - R.
Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
On Wed, Jul 3, 2013 at 10:26 PM, Roland Dreier <rol...@kernel.org> wrote:
> On Wed, Jul 3, 2013 at 9:41 AM, Or Gerlitz <ogerl...@mellanox.com> wrote:
>> Jack looked on this comment/code and he says that the active flag is
>> used to prevent re-scheduling the timer from inside the timer handling
>> routine. In the kernel, the comment header in the source file for
>> del_timer_sync explicitly states that re-scheduling the timer must be
>> prevented, or the sync is useless:
>>
>>   "Callers must prevent restarting of the timer, otherwise this
>>   function is meaningless"
>>
>> So we believe that code should remain.
>
> Look at the actual timer code. del_timer_sync() won't work if
> something unrelated re-adds the timer, but it will work if the timer
> itself is what re-adds itself. [...]

OK, we will re-look into that tomorrow. So how does V2 look?

Or.
Re: [PATCH V2 1/9] net/mlx5: Mellanox Connect-IB, core driver part 1/3
On Wed, 2013-07-03 at 20:13 +0300, Or Gerlitz wrote:
> From: Eli Cohen <e...@mellanox.com>

trivial comments:

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
[]
> +static const char *deliv_status_to_str(u8 status)
> +{
> +	switch (status) {
> +	case MLX5_CMD_DELIVERY_STAT_OK:
> +		return "no errors";
[]
> +	default:
> +		return "unknown status code\n";
> +	}
> +}

Likely unnecessary newline for default case

> +static struct mlx5_cmd_mailbox *alloc_cmd_box(struct mlx5_core_dev *dev,
> +					      gfp_t flags)
> +{
> +	struct mlx5_cmd_mailbox *mailbox;
> +
> +	mailbox = kmalloc(sizeof(*mailbox), flags);
> +	if (!mailbox) {
> +		mlx5_core_dbg(dev, "failed allocation\n");
> +		return ERR_PTR(-ENOMEM);
> +	}

unnecessary OOM message.

> +static void set_wqname(struct mlx5_core_dev *dev)
> +{
> +	struct mlx5_cmd *cmd = &dev->cmd;
> +
> +	strcpy(cmd->wq_name, "mlx5_cmd_");
> +	strcat(cmd->wq_name, dev_name(&dev->pdev->dev));

More likely snprintf might be better.

	snprintf(cmd->wq_name, sizeof(cmd->wq_name), "mlx5_cmd_%s",
		 dev_name(&dev->pdev->dev));

How big is wq_name? Will a maximum length dev_name always fit?
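The snprintf() suggestion above can be tried out in user space. This is only a sketch under stated assumptions: `WQ_NAME_LEN` and `build_wq_name` are hypothetical, since the thread does not say how large `wq_name` actually is. What it illustrates is that snprintf() never writes past the buffer and that its return value reports whether the name had to be truncated, which answers the "will it fit?" question at run time.

```c
#include <assert.h>
#include <stdio.h>

#define WQ_NAME_LEN 32	/* hypothetical; the real wq_name size is not given */

/*
 * Build "mlx5_cmd_<devname>" into out[WQ_NAME_LEN].
 * Returns 0 on success, -1 if the result was truncated (or on an
 * output error). The buffer is NUL-terminated in the truncation case.
 */
static int build_wq_name(char out[WQ_NAME_LEN], const char *devname)
{
	int n;

	assert(out != NULL && devname != NULL);
	n = snprintf(out, WQ_NAME_LEN, "mlx5_cmd_%s", devname);
	return (n < 0 || n >= WQ_NAME_LEN) ? -1 : 0;
}
```

With strcpy()/strcat() the same overlong device name would overrun the buffer silently; here the caller can decide whether a truncated workqueue name is acceptable.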
[PATCH v3 05/25] infiniband: Change how dentry's d_lock field is accessed
Because of the changes made in dcache.h header file, files that use the d_lock field of the dentry structure need to be changed accordingly. All the d_lock's spin_lock() and spin_unlock() calls are replaced by the corresponding d_lock() and d_unlock() calls. There is no change in logic and everything should just work.

Signed-off-by: Waiman Long <waiman.l...@hp.com>
---
 drivers/infiniband/hw/ipath/ipath_fs.c |6 +++---
 drivers/infiniband/hw/qib/qib_fs.c |6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_fs.c b/drivers/infiniband/hw/ipath/ipath_fs.c
index e0c404b..1efee26 100644
--- a/drivers/infiniband/hw/ipath/ipath_fs.c
+++ b/drivers/infiniband/hw/ipath/ipath_fs.c
@@ -277,14 +277,14 @@ static int remove_file(struct dentry *parent, char *name)
 		goto bail;
 	}
 
-	spin_lock(&tmp->d_lock);
+	d_lock(tmp);
 	if (!(d_unhashed(tmp) && tmp->d_inode)) {
 		dget_dlock(tmp);
 		__d_drop(tmp);
-		spin_unlock(&tmp->d_lock);
+		d_unlock(tmp);
 		simple_unlink(parent->d_inode, tmp);
 	} else
-		spin_unlock(&tmp->d_lock);
+		d_unlock(tmp);
 
 	ret = 0;
 bail:
diff --git a/drivers/infiniband/hw/qib/qib_fs.c b/drivers/infiniband/hw/qib/qib_fs.c
index f247fc6..63713ee 100644
--- a/drivers/infiniband/hw/qib/qib_fs.c
+++ b/drivers/infiniband/hw/qib/qib_fs.c
@@ -454,14 +454,14 @@ static int remove_file(struct dentry *parent, char *name)
 		goto bail;
 	}
 
-	spin_lock(&tmp->d_lock);
+	d_lock(tmp);
 	if (!(d_unhashed(tmp) && tmp->d_inode)) {
 		dget_dlock(tmp);
 		__d_drop(tmp);
-		spin_unlock(&tmp->d_lock);
+		d_unlock(tmp);
 		simple_unlink(parent->d_inode, tmp);
 	} else {
-		spin_unlock(&tmp->d_lock);
+		d_unlock(tmp);
 	}
 
 	ret = 0;
-- 
1.7.1
Re: [PATCH V2 5/9] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5
On Wed, 2013-07-03 at 20:13 +0300, Or Gerlitz wrote:
> From: Eli Cohen <e...@mellanox.com>

more trivia:

> diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
[]
> +struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr,
> +			   struct mlx5_ib_ah *ah)
> +{
> +	u32 sgi;

sgi is used once here and looks more confusing than helpful

> +
> +	if (ah_attr->ah_flags & IB_AH_GRH) {
> +		sgi = ah_attr->grh.sgid_index << 20;
> +
> +		memcpy(ah->av.rgid, &ah_attr->grh.dgid, 16);
> +		ah->av.grh_gid_fl = cpu_to_be32(ah_attr->grh.flow_label |
> +						(1 << 30) | sgi);
> +		ah->av.hop_limit = ah_attr->grh.hop_limit;
> +		ah->av.tclass = ah_attr->grh.traffic_class;
> +	}
> +
> +	ah->av.rlid = cpu_to_be16(ah_attr->dlid);
> +	ah->av.fl_mlid = ah_attr->src_path_bits & 0x7f;
> +	ah->av.stat_rate_sl = (ah_attr->static_rate << 4) | (ah_attr->sl & 0xf);
> +
> +	return &ah->ibah;
> +}
[]
> +static void *get_sw_cqe(struct mlx5_ib_cq *cq, int n)
> +{
> +	void *cqe = get_cqe(cq, n & cq->ibcq.cqe);
> +	struct mlx5_cqe64 *cqe64;
> +
> +	cqe64 = (cq->mcq.cqe_sz == 64) ? cqe : cqe + 64;
> +	return ((cqe64->op_own & MLX5_CQE_OWNER_MASK) ^
> +		!!(n & (cq->ibcq.cqe + 1))) ? NULL : cqe;

I think "foo ^ !!bar" is excessively tricky.

> +static enum ib_wc_opcode get_umr_comp(struct mlx5_ib_wq *wq, int idx)
> +{
> +	pr_warn("unkonwn completion status\n");

unknown tyop

[]
> +static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
> +			  struct ib_ucontext *context, struct mlx5_ib_cq *cq,
> +			  int entries, struct mlx5_create_cq_mbox_in **cqb,
> +			  int *cqe_size, int *index, int *inlen)
[]
> +	*inlen = sizeof **cqb + sizeof *(*cqb)->pas * ncont;

sizeof always uses parentheses

> +	*cqb = vzalloc(*inlen);

Perhaps you may be using vzalloc too often. Maybe you should have a helper allocating either from kmalloc or vmalloc as necessary based on size.
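As an illustration of the "foo ^ !!bar" remark, the ownership test in get_sw_cqe() can be spelled out with named booleans. This is a user-space sketch, not a proposed replacement: `CQE_OWNER_MASK` (assumed here to be the single bit 0x1), `cqe_is_sw_owned` and the `nent` parameter (standing for `cq->ibcq.cqe + 1`, assumed to be a power of two) are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CQE_OWNER_MASK 0x1u	/* assumption: owner flag is one bit */

/*
 * A CQE at consumer index n belongs to software when its owner bit
 * matches the parity of how many times the ring of nent entries has
 * wrapped. nent must be a power of two so that (n & nent) isolates the
 * wrap-parity bit.
 */
static bool cqe_is_sw_owned(uint8_t op_own, unsigned int n, unsigned int nent)
{
	bool owner_bit, wrap_parity;

	assert(nent != 0 && (nent & (nent - 1)) == 0);
	owner_bit = (op_own & CQE_OWNER_MASK) != 0;
	wrap_parity = (n & nent) != 0;
	return owner_bit == wrap_parity;
}
```

Under the single-bit assumption this is the same predicate as the quoted XOR expression being zero; whether the extra helper is worth it in the driver is a style decision for the patch authors.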
Re: [PATCH V2 7/9] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5
On Wed, 2013-07-03 at 20:13 +0300, Or Gerlitz wrote:
> From: Eli Cohen <e...@mellanox.com>

More trivia:

> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
[]
> +#define mlx5_ib_dbg(dev, format, arg...)				\
> +do {									\
> +	pr_debug("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name,	\
> +		 __func__, __LINE__, current->pid, ##arg);		\
> +} while (0)

unnecessary do {} while (0)

> +static void clean_keys(struct mlx5_ib_dev *dev, int c)
> +{
> +	struct device *ddev = dev->ib_dev.dma_device;
> +	struct mlx5_mr_cache *cache = &dev->cache;
> +	struct mlx5_cache_ent *ent = &cache->ent[c];
> +	struct mlx5_ib_mr *mr;
> +	int size;
> +	int err;
> +
> +	while (1) {
> +		spin_lock(&ent->lock);
> +		if (list_empty(&ent->head)) {
> +			spin_unlock(&ent->lock);
> +			return;
> +		}
> +		mr = list_first_entry(&ent->head, struct mlx5_ib_mr, list);
> +		list_del(&mr->list);
> +		ent->cur--;
> +		ent->size--;
> +		spin_unlock(&ent->lock);
> +		err = mlx5_core_destroy_mkey(&dev->mdev, &mr->mmr);
> +		if (err) {
> +			mlx5_ib_warn(dev, "failed destroy mkey\n");

Are you leaking anything here by not freeing?

> +		} else {
> +			size = ALIGN(sizeof(u64) * (1 << mr->order), 0x40);
> +			dma_unmap_single(ddev, mr->dma, size, DMA_TO_DEVICE);
> +			kfree(mr->pas);
> +			kfree(mr);
> +		}
> +	};
> +}

> +static struct mlx5_ib_mr *reg_create(struct ib_pd *pd, u64 virt_addr,
> +				     u64 length, struct ib_umem *umem,
> +				     int npages, int page_shift,
> +				     int access_flags)
> +{
[]
> +	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
> +	if (!mr) {
> +		mlx5_ib_warn(dev, "allocation failed\n");

Another unnecessary OOM

> +		mr = ERR_PTR(-ENOMEM);
> +	}
> +
> +	inlen = sizeof(*in) + sizeof(*in->pas) * ((npages + 1) / 2) * 2;
> +	in = vzalloc(inlen);
> +	if (!in) {
> +		mlx5_ib_warn(dev, "alloc failed\n");

here too.

> +		err = -ENOMEM;
> +		goto err_1;
> +	}
Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
David Dillow wrote: On Wed, 2013-07-03 at 20:24 +0200, Bart Van Assche wrote: On 07/03/13 19:27, David Dillow wrote: On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote: The combination of dev_loss_tmo off and reconnect_delay 0 worked fine in my tests. An I/O failure was detected shortly after the cable to the target was pulled. I/O resumed shortly after the cable to the target was reinserted. Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo 0, and fast_io_fail_tmo = 0. The other transports do not allow this scenario, and I'm asking if it makes sense for SRP to allow it. But now that you mention reconnect_delay, what is the meaning of that when it is negative? That's not in the documentation. And should it be considered in srp_tmo_valid() -- are there values of reconnect_delay that cause problems? None of the combinations that can be configured from user space can bring the kernel in trouble. If reconnect_delay = 0 that means that the time-based reconnect mechanism is disabled. Then it should use the same semantics as the other attributes, and have the user store off to turn it off. And I'm getting the strong sense that the answer to my question about fast_io_fail_tmo = 0 when dev_loss_tmo is that we should not allow that combination, even if it doesn't break the kernel. If it doesn't make sense, there is no reason to create an opportunity for user confusion. Hello Dave, when dev_loss_tmo expired, srp not only removes the rport but also removes the associated scsi_host. One may wish to set fast_io_fail_tmo =0 for I/Os to fail-over fast to other paths, and dev_loss_tmo off to keep the scsi_host around until the target coming back. -vu -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU
Bump.

On Jul 2, 2013, at 8:31 AM, Jeff Squyres <jsquy...@cisco.com> wrote:

> (Previous patch did not include updates for the man pages)
>
> Keep IBV_MTU_* enums values as they are, but pass MTU values around as
> a struct containing a single int.
>
> Per lengthy discussion on the linux-rdma list, this patch introduces a
> source code incompatibility. Although legacy applications can
> continue to use the enum values, they will need to be updated to use
> the struct.
>
> Newer applications are encouraged to use arbitrary int values, not the
> MTU enums (e.g., 1024, 1500, 9000).
>
> Signed-off-by: Jeff Squyres <jsquy...@cisco.com>
> ---
>  Makefile.am| 3 +-
>  examples/devinfo.c | 20 +++--
>  examples/pingpong.c| 12
>  examples/pingpong.h| 1 -
>  examples/rc_pingpong.c | 10 +++
>  examples/srq_pingpong.c| 10 +++
>  examples/uc_pingpong.c | 10 +++
>  examples/ud_pingpong.c | 2 +-
>  include/infiniband/verbs.h | 61 +--
>  man/ibv_modify_qp.3| 2 +-
>  man/ibv_mtu_to_num.3 | 71 ++
>  man/ibv_query_port.3 | 4 +--
>  man/ibv_query_qp.3 | 2 +-
>  src/cmd.c | 8 +++---
>  src/marshall.c | 2 +-
>  15 files changed, 160 insertions(+), 58 deletions(-)
>  create mode 100644 man/ibv_mtu_to_num.3
>
> diff --git a/Makefile.am b/Makefile.am
> index 40e83be..1159e55 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -54,7 +54,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1 \
>      man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \
>      man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3 \
>      man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3 \
> -    man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3
> +    man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3 \
> +    man/ibv_mtu_to_num.3
>  
>  DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
>      debian/ibverbs-utils.install debian/libibverbs1.install \
> diff --git a/examples/devinfo.c b/examples/devinfo.c
> index ff078e4..e8fb27e 100644
> --- a/examples/devinfo.c
> +++ b/examples/devinfo.c
> @@ -111,18 +111,6 @@ static const char *atomic_cap_str(enum ibv_atomic_cap atom_cap)
>  	}
>  }
>  
> -static const char *mtu_str(enum ibv_mtu max_mtu)
> -{
> -	switch (max_mtu) {
> -	case IBV_MTU_256:  return "256";
> -	case IBV_MTU_512:  return "512";
> -	case IBV_MTU_1024: return "1024";
> -	case IBV_MTU_2048: return "2048";
> -	case IBV_MTU_4096: return "4096";
> -	default:           return "invalid MTU";
> -	}
> -}
> -
>  static const char *width_str(uint8_t width)
>  {
>  	switch (width) {
> @@ -301,10 +289,10 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
>  		printf("\t\tport:\t%d\n", port);
>  		printf("\t\t\tstate:\t\t\t%s (%d)\n",
>  		       port_state_str(port_attr.state), port_attr.state);
> -		printf("\t\t\tmax_mtu:\t\t%s (%d)\n",
> -		       mtu_str(port_attr.max_mtu), port_attr.max_mtu);
> -		printf("\t\t\tactive_mtu:\t\t%s (%d)\n",
> -		       mtu_str(port_attr.active_mtu), port_attr.active_mtu);
> +		printf("\t\t\tmax_mtu:\t\t%d (%d)\n",
> +		       ibv_mtu_to_num(port_attr.max_mtu), port_attr.max_mtu.mtu);
> +		printf("\t\t\tactive_mtu:\t\t%d (%d)\n",
> +		       ibv_mtu_to_num(port_attr.active_mtu), port_attr.active_mtu.mtu);
>  		printf("\t\t\tsm_lid:\t\t\t%d\n", port_attr.sm_lid);
>  		printf("\t\t\tport_lid:\t\t%d\n", port_attr.lid);
>  		printf("\t\t\tport_lmc:\t\t0x%02x\n", port_attr.lmc);
> diff --git a/examples/pingpong.c b/examples/pingpong.c
> index 90732ef..d1c22c9 100644
> --- a/examples/pingpong.c
> +++ b/examples/pingpong.c
> @@ -36,18 +36,6 @@
>  #include <stdio.h>
>  #include <string.h>
>  
> -enum ibv_mtu pp_mtu_to_enum(int mtu)
> -{
> -	switch (mtu) {
> -	case 256:  return IBV_MTU_256;
> -	case 512:  return IBV_MTU_512;
> -	case 1024: return IBV_MTU_1024;
> -	case 2048: return IBV_MTU_2048;
> -	case 4096: return IBV_MTU_4096;
> -	default:   return -1;
> -	}
> -}
> -
>  uint16_t pp_get_local_lid(struct ibv_context *context, int port)
>  {
>  	struct ibv_port_attr attr;
> diff --git a/examples/pingpong.h b/examples/pingpong.h
> index 9cdc03e..91d217b 100644
> --- a/examples/pingpong.h
> +++ b/examples/pingpong.h
> @@ -35,7 +35,6 @@
>  
>  #include <infiniband/verbs.h>
>  
> -enum ibv_mtu pp_mtu_to_enum(int mtu);
>  uint16_t pp_get_local_lid(struct ibv_context *context, int port);
>  int pp_get_port_info(struct ibv_context *context, int port,
>  		     struct ibv_port_attr *attr);
> diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
> index 15494a1..a7e1836 100644
> --- a/examples/rc_pingpong.c
> +++ b/examples/rc_pingpong.c
> @@ -78,7 +78,7 @@ struct pingpong_dest {
>  };
>  
>  static