[patch] RDMA/cxgb3: timeout condition is never true
This is a static checker fix. count is unsigned so it's never -1. Since count is 16 bits and the addition operation is implicitly casted to int then there is no wrapping here. Signed-off-by: Dan Carpenter dan.carpen...@oracle.com diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index e5649e8..b57c0be 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -883,7 +883,8 @@ u16 iwch_rqes_posted(struct iwch_qp *qhp) { union t3_wr *wqe = qhp-wq.queue; u16 count = 0; - while ((count+1) != 0 fw_riwrh_opcode((struct fw_riwrh *)wqe) == T3_WR_RCV) { + + while (count USHRT_MAX fw_riwrh_opcode((struct fw_riwrh *)wqe) == T3_WR_RCV) { count++; wqe++; } -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: config file lost
On 17/06/2013 21:44, Hal Rosenstock wrote: I'm not 100% sure about the origin of those RPMs but I think the 3.3.15 one is RedHat packaged and the 3.3.16 appears to be PLD packaged and the processes are a little different. I suspect the 3.3.16 one is packaged with the spec file in the tree whereas RedHat uses their own spec file. FWIW it's simple to generate an up to date config file: opensm -c opensm.conf Hal, YES for your observations, that 3.3.15 was RHEL packages and the 3.3.16 was built from the upstream spec. I know that I can generate the config file using the method you suggested, however, does the upstream service scripts uses the location to which this is generated, so things are plug-and-play, or I need to hack that somehow? Or. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Log changes related to event subscription and forwarding
On 6/5/2013 7:14 AM, Line Holen wrote: Signed-off-by: Line Holen line.ho...@oracle.com Thanks. Applied with minor change to keep log level same in send_report. -- Hal -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH opensm] ftree: Allow defining only io_guids file and consider rest of nodes as CN nodes
From 2068a60399865590aff094738e3350f972c6e9b4 Mon Sep 17 00:00:00 2001 From: Vladimir Koushnir vladim...@mellanox.com Date: Sun, 9 Jun 2013 18:05:01 +0300 Subject: [PATCH] ftree: Allow defining only io_guids file and consider rest of nodes as CN nodes In the current implementation cn_guids and io_guids files should be provided in order to separate between Compute Nodes and IO Nodes for ftree routing. This approach is not optimal when there are several IO Nodes the fabric compared to thousands of CNs. Provided patch considers all nodes to be Compute nodes by default. If some guid is presenting in io_guids file and not presenting in cn_guid file the corresponding node will be considered as IO node. Signed-off-by: Vladimir Koushnir vladim...@mellanox.com --- opensm/osm_subnet.c |7 - opensm/osm_ucast_ftree.c | 53 +++-- 2 files changed, 32 insertions(+), 28 deletions(-) diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c index bc05ae0..5e06d83 100644 --- a/opensm/osm_subnet.c +++ b/opensm/osm_subnet.c @@ -2879,11 +2879,14 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t * p_opts) fprintf(out, # The file holding the fat-tree I/O node guids\n - # One guid in each line\nio_guid_file %s\n\n, + # One guid in each line.\n + # If only io_guid file is provided, the rest of nodes\n + # are considered as compute nodes.\n + io_guid_file %s\n\n, p_opts-io_guid_file ? p_opts-io_guid_file : null_str); fprintf(out, - # Number of reverse hops allowed for I/O nodes \n + # Number of reverse hops allowed for I/O nodes\n # Used for connectivity between I/O nodes connected to Top Switches\nmax_reverse_hops %d\n\n, p_opts-max_reverse_hops); diff --git a/opensm/osm_ucast_ftree.c b/opensm/osm_ucast_ftree.c index 58e51d4..a069edb 100644 --- a/opensm/osm_ucast_ftree.c +++ b/opensm/osm_ucast_ftree.c @@ -3268,8 +3268,11 @@ fabric_construct_hca_ports(IN ftree_fabric_t * p_ftree, IN ftree_hca_t * p_hca) osm_physp_t *p_remote_osm_port; uint8_t i; uint8_t remote_port_num; - boolean_t is_cn = FALSE; + boolean_t is_cn = TRUE; + boolean_t is_in_cn_file = FALSE; boolean_t is_io = FALSE; + boolean_t is_cns_file_provided = fabric_cns_provided(p_ftree); + boolean_t is_ios_file_provided = fabric_ios_provided(p_ftree); int res = 0; for (i = 0; i osm_node_get_num_physp(p_node); i++) { @@ -3325,16 +3328,27 @@ fabric_construct_hca_ports(IN ftree_fabric_t * p_ftree, IN ftree_hca_t * p_hca) /* If CN file is not supplied, then all the CAs considered as Compute Nodes. Otherwise all the CAs are not CNs, and only guids that are present in the CN file will be marked as compute nodes. */ - if (!fabric_cns_provided(p_ftree)) { - is_cn = TRUE; - } else { + if (is_cns_file_provided == TRUE) { name_map_item_t *p_elem = (name_map_item_t *) - cl_qmap_get(p_ftree-cn_guid_tbl, - cl_ntoh64(osm_physp_get_port_guid - (p_osm_port))); + cl_qmap_get(p_ftree-cn_guid_tbl, + cl_ntoh64(osm_physp_get_port_guid +(p_osm_port))); + if (p_elem == (name_map_item_t *) + cl_qmap_end(p_ftree-cn_guid_tbl)) + is_cn = FALSE; + else + is_in_cn_file = TRUE; + } + if (is_in_cn_file == FALSE is_ios_file_provided == TRUE) { + name_map_item_t *p_elem = (name_map_item_t *) + cl_qmap_get(p_ftree-io_guid_tbl, + cl_ntoh64(osm_physp_get_port_guid +(p_osm_port))); if (p_elem != (name_map_item_t *) - cl_qmap_end(p_ftree-cn_guid_tbl)) - is_cn = TRUE; + cl_qmap_end(p_ftree-io_guid_tbl)) { + is_io = TRUE; + is_cn = FALSE; + } } if (is_cn) { @@ -3343,32 +3357,19 @@ fabric_construct_hca_ports(IN ftree_fabric_t * p_ftree, IN ftree_hca_t * p_hca) OSM_LOG(p_ftree-p_osm-log, OSM_LOG_DEBUG, Marking CN port GUID 0x%016 PRIx64 \n, cl_ntoh64(osm_physp_get_port_guid(p_osm_port))); - } else { - if
Re: Node Description mismatch between saquery smpquery
On 6/17/2013 5:38 PM, Albert Chu wrote: We've recently noticed that the Node Description for a node can mis-mismatch between the output of smpquery and saquery. For example: # smpquery NodeDesc 427 Node Description:.sierra1932 qib0 # saquery NodeRecord 427 | grep NodeDesc NodeDescription.QLogic Infiniband HCA A restart of OpenSM is the current solution to resolve this. We've noticed it occurring more often on our larger clusters than our smaller clusters, leading to a speculation about why it is happening. The speculation is when a node comes up, there is a window of time in which the HCA is up, can be scanned by OpenSM, but not yet have its node descriptor set (in RHEL I appears to be set via /etc/init.d/rdma). During this window, OpenSM reads/stores the non-desired node descriptor (in the above case the non-desired Qlogic Infiniband HCA). When the node descriptor is changed, a trap should be sent to opensm indicating the change. Normally OpenSM gets the trap and reads the new node descriptor. Are you sure the trap is being issued by those devices when the NodeDescription is changed locally ? Also, if so, do these devices implement timeout/retry on sending the trap (e.g. trying to make sure that they receive trap repress before giving up on trap) ? On our large clusters all nodes are typically brought up at the same time, so there are probably a ton of node descriptor change traps happening at the exact same time. We speculate a number of these are dropped/lost, and subsequently OpenSM never realizes that the node descriptor has changed. Do you see any evidence of that traps are being dropped ? Have you correlated any VL15Dropped counters in the subnet with this ? Also, there is a module parameter in MAD kernel module that might help with any unsolicited MAD bursts. You might try increasing that on your SM node(s). I don't know if the speculation sounds reasonable or not. Regardless, we're not sure of the best fix. A trivial fix would be to just make OpenSM re-scan the node descriptor of an HCA, perhaps during a heavy sweep. But I don't know if this is optimal. It'll introduce more MADs on the wire. However if the present solution is to restart OpenSM, we figure this can't be any worse. Yes, but to add the additional queries in is O(n) there and has been resisted in the past. Just wondering what peoples thoughts are of if there's another obvious solution we're not seeing. I think this issue needs better understanding first. -- Hal Al -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: config file lost
On 6/18/2013 5:03 AM, Or Gerlitz wrote: On 17/06/2013 21:44, Hal Rosenstock wrote: I'm not 100% sure about the origin of those RPMs but I think the 3.3.15 one is RedHat packaged and the 3.3.16 appears to be PLD packaged and the processes are a little different. I suspect the 3.3.16 one is packaged with the spec file in the tree whereas RedHat uses their own spec file. FWIW it's simple to generate an up to date config file: opensm -c opensm.conf Hal, YES for your observations, that 3.3.15 was RHEL packages and the 3.3.16 was built from the upstream spec. I know that I can generate the config file using the method you suggested, however, does the upstream service scripts uses the location to which this is generated, Is /etc/rdma a standard location in Linux ? Is it used by other RDMA upstream components ? Also, opensm doesn't by default use this location for the config file. I expect that's dealt with by other scripts RedHat supplies. so things are plug-and-play, or I need to hack that somehow? You would currently need to hack that. -- Hal Or. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: config file lost
On 18/06/2013 14:19, Hal Rosenstock wrote: Is /etc/rdma a standard location in Linux ? Is it used by other RDMA upstream components ? its used by RHEL packages, not upstream Also, opensm doesn't by default use this location for the config file. I expect that's dealt with by other scripts RedHat supplies. yes, this is part of their specs I think so things are plug-and-play, or I need to hack that somehow? You would currently need to hack that. How exactly? I'd like to build rpm from upstream opensm, generate config file and have the opensm service script to read this config and apply it for successive restarts or sig HUPs I send. Maybe you can come up with some patch, it will help Or. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
Bart Van Assche wrote: On 06/14/13 19:59, Vu Pham wrote: On 06/13/13 21:43, Vu Pham wrote: +/** + * srp_tmo_valid() - check timeout combination validity + * + * If no fast I/O fail timeout has been configured then the device loss timeout + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail timeout has + * been configured then it must be below the device loss timeout. + */ +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo) +{ +return (fast_io_fail_tmo 0 1 = dev_loss_tmo +dev_loss_tmo = SCSI_DEVICE_BLOCK_MAX_TIMEOUT) +|| (0 = fast_io_fail_tmo +(dev_loss_tmo 0 || + (fast_io_fail_tmo dev_loss_tmo + dev_loss_tmo LONG_MAX / HZ))) ? 0 : -EINVAL; +} +EXPORT_SYMBOL_GPL(srp_tmo_valid); fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with negative value dev_loss_tmo is off, one cannot turn off fast_io_fail_tmo with negative value OK, will update the documentation such that it correctly refers to off instead of a negative value and I will also mention that dev_loss_tmo can now be disabled. It's not only the documentation but also the code logic, you cannot turn dev_loss_tmo off if fast_io_fail_tmo already turned off and vice versa with the return statement above. Does this mean that you think it would be useful to disable both the fast_io_fail and the dev_loss mechanisms, and hence rely on the user to remove remote ports that have disappeared and on the SCSI command timeout to detect path failures ? Yes. I'll start testing this to see whether that combination does not trigger any adverse behavior. Ok If rport's state is already SRP_RPORT_BLOCKED, I don't think we need to do extra block with scsi_block_requests() Please keep in mind that srp_reconnect_rport() can be called from two different contexts: that function can not only be called from inside the SRP transport layer but also from inside the SCSI error handler (see also the srp_reset_device() modifications in a later patch in this series). If this function is invoked from the context of the SCSI error handler the chance is high that the SCSI device will have another state than SDEV_BLOCK. Hence the scsi_block_requests() call in this function. Yes, srp_reconnect_rport() can be called from two contexts; however, it deals with same rport rport's state. I'm thinking something like this: if (rport-state != SRP_RPORT_BLOCKED) { scsi_block_requests(shost); Sorry but I'm afraid that that approach would still allow the user to unblock one or more SCSI devices via sysfs during the i-f-reconnect(rport) call, something we do not want. I don't think that user can unblock scsi device(s) via sysfs if you use scsi_block_requests(shost) in srp_start_tl_fail_timers(). -vu I think that we can use only the pair scsi_block_requests()/scsi_unblock_requests() unless the advantage of multipathd recognizing the SDEV_BLOCK is noticeable. I think the advantage of multipathd recognizing the SDEV_BLOCK state before the fast_io_fail_tmo timer has expired is important. Multipathd does not queue I/O to paths that are in the SDEV_BLOCK state so setting that state helps I/O to fail over more quickly, especially for large values of fast_io_fail_tmo. Hope this helps, Bart. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] libibverbs: Allow arbitrary int values for MTU
On Mon, Jun 17, 2013 at 02:00:07PM -0700, Jeff Squyres wrote: Keep IBV_MTU_* enums values as they are, but pass MTU values around as int's. This is an ABI-compatible change; legacy applications will use the enum values, but newer applications can use an int for values that do not currently exist in the enum set (e.g., 1500, 9000). (if people like the idea of this patch, I will send the corresponding kernel patch) Signed-off-by: Jeff Squyres jsquy...@cisco.com examples/devinfo.c | 11 +-- examples/pingpong.c| 12 examples/pingpong.h| 1 - examples/rc_pingpong.c | 8 examples/srq_pingpong.c| 8 examples/uc_pingpong.c | 8 examples/ud_pingpong.c | 2 +- include/infiniband/verbs.h | 20 +--- man/ibv_modify_qp.3| 2 +- man/ibv_query_port.3 | 4 ++-- man/ibv_query_qp.3 | 2 +- src/libibverbs.map | 3 +++ src/verbs.c| 24 13 files changed, 70 insertions(+), 35 deletions(-) diff --git a/examples/devinfo.c b/examples/devinfo.c index ff078e4..b856c82 100644 +++ b/examples/devinfo.c @@ -111,15 +111,22 @@ static const char *atomic_cap_str(enum ibv_atomic_cap atom_cap) } } -static const char *mtu_str(enum ibv_mtu max_mtu) +static const char *mtu_str(int max_mtu) This is simpler: { static char str[16]; snprintf(str, sizeof(str), %d, ibv_mtu_to_num(max_mtu)); return str; } You may want to replace 'enum ibv_mtu' with 'ibv_mtu_t' to make it more clear that it is not an integer. - enum ibv_mtu mtu = IBV_MTU_1024; + int mtu = 1024; No, you must keep using IBV_MTU_1024 for all cases requesting a 1024 byte MTU, new libraries will accept both, old libraries will only accept IBV_MTU_1024, so we must stick with IBV_MTU_1024 in applications. So: int mtu = num_to_ibv_mtu(1024); - mtu = pp_mtu_to_enum(strtol(optarg, NULL, 0)); + mtu = strtol(optarg, NULL, 0); if (mtu 0) { usage(argv[0]); return 1; Same deal, add 'mtu = num_to_ibv_mtu(mtu);' Same two comments repeated many times. +/** + * ibv_num_to_mtu - If an MTU enum value for num is available, + * return it (e.g., 1024 - IBV_MTU_1024). Otherwise, return num + * (e.g., 1500 - 1500). + */ +int num_to_ibv_mtu(int num); Probably should be ibv_num_to_mtu() to keep with the naming pattern.. + +/** + * ibv_mtu_to_num - Return the numeric value of a symbolic enum + * ibv_mtu name (e.g., IBV_MTU_1024 - 1024). If a non-symbolic enum + * value is passed, just return the same value (e.g., 1500 - 1500). + */ +int ibv_mtu_to_num(int mtu); These should be static inlines, not externs. There is a desire to not add new symbols to libibverbs. Jason -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] RDMA/cxgb3: timeout condition is never true
Acked-by: Steve Wise sw...@opengridcomputing.com -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Node Description mismatch between saquery smpquery
-Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- Subject: Re: Node Description mismatch between saquery smpquery On Tue, 2013-06-18 at 07:13 -0400, Hal Rosenstock wrote: On 6/17/2013 5:38 PM, Albert Chu wrote: We've recently noticed that the Node Description for a node can mis-mismatch between the output of smpquery and saquery. For example: # smpquery NodeDesc 427 Node Description:.sierra1932 qib0 # saquery NodeRecord 427 | grep NodeDesc NodeDescription.QLogic Infiniband HCA A restart of OpenSM is the current solution to resolve this. [snip] When the node descriptor is changed, a trap should be sent to opensm indicating the change. Normally OpenSM gets the trap and reads the new node descriptor. Are you sure the trap is being issued by those devices when the NodeDescription is changed locally ? These particular devices do support the trap and tests show they do send traps on changes (i.e. manually changing /sys/class/infiniband/qib0/node_desc). Also, if so, do these devices implement timeout/retry on sending the trap (e.g. trying to make sure that they receive trap repress before giving up on trap) ? This I don't know. I've been trying to figure out if they do and if they do how it might be configurable. Is there a way to figure this out? Looking quickly at the driver I don't think it does resend the trap. However, Mike might know better: CC'ed. Ira -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ibsim: Do not return allportselect for non-switches
ibsim only supports allportselect (i.e. port 0xff) for switches but not HCAs. However the allportselect flag would be returned for all classportinfo requests, leading to unexpected results when running some infiniband-diag tools against ibsim. Signed-off-by: Albert L. Chu ch...@llnl.gov --- ibsim/sim_mad.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/ibsim/sim_mad.c b/ibsim/sim_mad.c index 0b52c52..223d507 100644 --- a/ibsim/sim_mad.c +++ b/ibsim/sim_mad.c @@ -235,6 +235,7 @@ static int reply_MAD(void *buf, ib_rpc_t * rpc, ib_dr_path_t * path, static int do_cpi(Port * port, unsigned op, uint32_t mod, uint8_t * data) { + Node *node = port-node; int status = 0; if (op != IB_MAD_METHOD_GET) @@ -242,7 +243,10 @@ static int do_cpi(Port * port, unsigned op, uint32_t mod, uint8_t * data) memset(data, 0, IB_SMP_DATA_SIZE); mad_set_field(data, 0, IB_CPI_BASEVER_F, 1); mad_set_field(data, 0, IB_CPI_CLASSVER_F, 1); - mad_set_field(data, 0, IB_CPI_CAPMASK_F, 0x300); + if (node-type != SWITCH_NODE) + mad_set_field(data, 0, IB_CPI_CAPMASK_F, 0x200); + else + mad_set_field(data, 0, IB_CPI_CAPMASK_F, 0x300); mad_set_field(data, 0, IB_CPI_RESP_TIME_VALUE_F, 0x12); return status; } -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] IB/ehca: Fix error return code in ehca_create_slab_caches()
From: Wei Yongjun yongjun_...@trendmicro.com.cn Fix to return -ENOMEM in the kmem_cache_create() error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn --- drivers/infiniband/hw/ehca/ehca_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index f8a6291..8f43093 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -211,6 +211,7 @@ static int ehca_create_slab_caches(void) if (!ctblk_cache) { ehca_gen_err(Cannot create ctblk SLAB cache.); ehca_cleanup_small_qp_cache(); + ret = -ENOMEM; goto create_slab_caches6; } #endif -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html