Re: Reducing the number of IB interrupts caused by the SRP initiator
On Sun, Jan 17, 2010 at 9:08 PM, Bart Van Assche bart.vanass...@gmail.com wrote: Hello, Once I noticed that the SRP initiator triggers two IB interrupts per I/O (more precise, per srp_queuecommand() call) I started looking at reducing the number of IB interrupts. The patch below didn't help. Does this mean that the patch below isn't complete or is this because of the hw driver used during the test (mlx4) ? When you say that the patch did not help, do you mean that it did not reduce the rate of interrupts? I also think that you have a race here since the head and tail operations are not protected (I don't know the SRP driver I think locking should be done here). It would be better if you'd choose a value that is a fraction of the queue size to request a completion so that you don't get to a situation where your queue is full and you the sender is waiting for it to be cleared. Also, you did not touch the completion handler. Seems to me you should increment the tail for each completion by the number of completions you skipped. Failing to do so will keep cause the completion to be requested each post after the first time it skipped queue len completions. Bart. diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 54c8fe2..1f674b8 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -241,7 +241,7 @@ static int srp_create_target_ib(struct srp_target_port *target) init_attr-cap.max_recv_wr = SRP_RQ_SIZE; init_attr-cap.max_recv_sge = 1; init_attr-cap.max_send_sge = 1; - init_attr-sq_sig_type = IB_SIGNAL_ALL_WR; + init_attr-sq_sig_type = IB_SIGNAL_REQ_WR; init_attr-qp_type = IB_QPT_RC; init_attr-send_cq = target-cq; init_attr-recv_cq = target-cq; @@ -1001,7 +1001,8 @@ static int __srp_post_send(struct srp_target_port *target, wr.sg_list = list; wr.num_sge = 1; wr.opcode = IB_WR_SEND; - wr.send_flags = IB_SEND_SIGNALED; + wr.send_flags = target-tx_head - target-tx_tail == SRP_SQ_SIZE - 1 + ? IB_SEND_SIGNALED : 0; ret = ib_post_send(target-qp, wr, bad_wr); -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/11] opensm: Allow the routing engine to participate in path SL calculations.
Hi Yevgeny, On Thu, 2010-01-14 at 09:24 -0700, Yevgeny Kliteynik wrote: Jim, On 20/Nov/09 21:15, Jim Schutt wrote: LASH already does this, in a hard-coded fashion. Generalize this by adding a callback to struct osm_routing_engine that computes a path SL value, and fix up LASH to use it. This patchset causes the requested or QoS-computed SL value to be passed to the routing engine path SL computation as a hint. In the event the routing engine's use of SLs allows it to support more than one QoS level, it may be able to make use of the SL hint to do so. For now, LASH just ignores the hint. Note that before this change, if LASH was configured and a specific path SL value was requested that differed from what LASH needed to route the fabric without credit loops, the path SL lookup would fail. Now LASH's SL value is always used. Possibly the choice between failing a path SL request when it conflicts with routing, vs. always providing an SL value that gives a credit-loop- free routing, should be user-configurable? SL can come from the following places: - user requested specific SL in PathRecord query - QoS policy configuration - SL specified in partition parameters - basic QoS (no policies, only SL2VL table) - routing engine Except for QoS policy being able to override SL that is specified in the partition parameters (with an error message in the log), IMHO if there's a conflict between SLs coming from different constraints PathRecord should fail to find a satisfiable path, or at least we should see some error message in the log that the selected SL conflicts with other OSM configurations, but will be used anyway. [snip...] @@ -725,6 +707,14 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa, goto Exit; } + /* +* If the routing engine wants to have a say in path SL selection, +* send the currently computed SL value as a hint and let the routing +* engine override it. +*/ + if (p_re p_re-path_sl) + sl = p_re-path_sl(p_re-context, sl, p_src_port, p_dest_port); In addition to error message if routing engine overrides the provided hint, need to check whether the returned SL is valid - check the corresponding bit in valid_sl_mask. It might be irrelevant for torus-2QoS routing (not sure yet, need to read more patches :-) ), but it's probably needed in general case. Also, perhaps it would be better to provide the bitmask of available SLs as a hint if there are more than one suitable SL? I mean something like this (didn't try it, didn't even compile it, need corresponding change in the p_re-path_sl callback, it's just to illustrate what I mean): Your suggestion below won't accomplish what I was trying to accomplish. Torus-2QoS needs to encode global path information into the SL value in order to provide routing free of credit loops. But it only needs 3 bits of SL to do this, leaving one free. So, it uses that bit to provide two levels of quality of service. This usage of SL clashes with the QoS policy engine, which uses each SL value to provide up to 16 levels of quality of service. So to the QoS policy engine, every SL value is distinct, but to torus-2QoS, SL values 0-7 are all the same wrt. QoS level, and SL values 8-15 are also all the same wrt. a second QoS level. I wanted to use the QoS policy engine to configure QoS level in torus-2QoS, so I used this hint idea. What torus-2QoS' path_sl() does is append the high-order bit from the SL hint, as computed by the QoS policy engine, onto the 3 low-order bits that it computes are needed to avoid deadlock. Does that help explain what I'm after? -- Jim --- opensm/opensm/osm_sa_path_record.c | 47 ++- 1 files changed, 29 insertions(+), 18 deletions(-) diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c index 7120d65..6de8979 100644 --- a/opensm/opensm/osm_sa_path_record.c +++ b/opensm/opensm/osm_sa_path_record.c @@ -171,7 +171,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa, uint8_t required_mtu; uint8_t required_rate; uint8_t required_pkt_life; -uint8_t sl; +uint8_t sl = OSM_DEFAULT_SL; uint8_t in_port_num; ib_net16_t dest_lid; uint8_t i; @@ -688,33 +688,44 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa, cl_ntoh16(pkey), sl); } else sl = p_prtn-sl; -} else if (sa-p_subn-opt.qos) { +} + +/* + * If the routing engine wants to have a say in path SL selection, + * send the currently computed SL value as a hint and let the routing + * engine override it. + */ +if (p_re p_re-path_sl) +sl = p_re-path_sl(p_re-context, valid_sl_mask, p_src_port, p_dest_port); + +if (sa-p_subn-opt.qos !(valid_sl_mask (1
Re: [infiniband-diags] [UPDATED PATCH] [3/3] support --load-cache in iblinkinfo and ibqueryerrors
Hey Sasha, Here's an updated patch with the cleanup changes as we discussed. Al On Sat, 2010-01-16 at 16:28 +0200, Sasha Khapyorsky wrote: On 10:23 Fri 15 Jan , Al Chu wrote: Hi Sasha, This adds the --load-cache options to iblinkinfo and ibqueryerrors. Al -- Albert Chu ch...@llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From: Albert Chu ch...@llnl.gov Date: Thu, 10 Dec 2009 11:22:50 -0800 Subject: [PATCH] support --load-cache in iblinkinfo and ibqueryerrors Signed-off-by: Albert Chu ch...@llnl.gov --- infiniband-diags/man/iblinkinfo.8| 11 ++- infiniband-diags/man/ibqueryerrors.8 | 10 ++- infiniband-diags/src/iblinkinfo.c| 52 +--- infiniband-diags/src/ibqueryerrors.c | 53 ++--- 4 files changed, 99 insertions(+), 27 deletions(-) diff --git a/infiniband-diags/man/iblinkinfo.8 b/infiniband-diags/man/iblinkinfo.8 index 0f53b00..f184edf 100644 --- a/infiniband-diags/man/iblinkinfo.8 +++ b/infiniband-diags/man/iblinkinfo.8 @@ -6,7 +6,7 @@ iblinkinfo \- report link info for all links in the fabric .SH SYNOPSIS .B iblinkinfo [-hcdl -C ca_name -P ca_port -v lt,hoq,vlstall -S guid --D direct_route] +-D direct_route \-\-load\-cache filename] .SH DESCRIPTION .PP @@ -42,7 +42,14 @@ Print port capabilities (enabled and supported values) \fB\-P ca_port\fRuse the specified ca_port for the search. .TP \fB\-R\fR (This option is obsolete and does nothing) - +.TP +\fB\-\-load\-cache\fR filename +Load and use the cached ibnetdiscover data stored in the specified +filename. May be useful for outputting and learning about other +fabrics or a previous state of a fabric. Cannot be used if user +specifies a directo route path. See +.B ibnetdiscover +for information on caching ibnetdiscover output. .SH AUTHOR .TP diff --git a/infiniband-diags/man/ibqueryerrors.8 b/infiniband-diags/man/ibqueryerrors.8 index 83a2b5a..56a0d67 100644 --- a/infiniband-diags/man/ibqueryerrors.8 +++ b/infiniband-diags/man/ibqueryerrors.8 @@ -6,7 +6,7 @@ ibqueryerrors \- query and report non-zero IB port counters .SH SYNOPSIS .B ibqueryerrors [-s err1,err2,... -c -r -C ca_name -P ca_port -G node_guid --D direct_route -d -k -K] +-D direct_route -d -k -K \-\-load\-cache filename] .SH DESCRIPTION .PP @@ -60,6 +60,14 @@ specified the data counters will be cleared without any printed output. .TP \fB\-\-details\fR include transmit discard details .TP +\fB\-\-load\-cache\fR filename +Load and use the cached ibnetdiscover data stored in the specified +filename. May be useful for outputting and learning about other +fabrics or a previous state of a fabric. Cannot be used if user +specifies a directo route path. See +.B ibnetdiscover +for information on caching ibnetdiscover output. +.TP \fB\-R\fR (This option is obsolete and does nothing) .SH COMMON OPTIONS diff --git a/infiniband-diags/src/iblinkinfo.c b/infiniband-diags/src/iblinkinfo.c index 21b31bb..10e3ad5 100644 --- a/infiniband-diags/src/iblinkinfo.c +++ b/infiniband-diags/src/iblinkinfo.c @@ -55,6 +55,7 @@ static char *node_name_map_file = NULL; static nn_map_t *node_name_map = NULL; +static char *load_cache_file = NULL; static uint64_t guid = 0; static char *guid_str = NULL; @@ -230,6 +231,9 @@ static int process_opt(void *context, int ch, char *optarg) case 1: node_name_map_file = strdup(optarg); break; + case 2: + load_cache_file = strdup(optarg); + break; case 'S': guid_str = optarg; guid = (uint64_t) strtoull(guid_str, 0, 0); @@ -291,6 +295,7 @@ int main(int argc, char **argv) print additional switch settings (PktLifeTime, HoqLife, VLStallCount)}, {portguids, 'g', 0, NULL, print port guids instead of node guids}, + {load-cache, 2, 1, file, filename of ibnetdiscover cache to load}, {GNDN, 'R', 0, NULL, (This option is obsolete and does nothing)}, {0} @@ -317,6 +322,11 @@ int main(int argc, char **argv) mad_rpc_set_timeout(ibmad_port, ibd_timeout); node_name_map = open_node_name_map(node_name_map_file); + + if (dr_path load_cache_file) { + fprintf(stderr, Cannot specify cache and direct route path\n); + exit(1); + } Why is this limitation needed really? if (dr_path) { /* only scan part of the fabric */ @@ -334,19 +344,37 @@ int main(int argc, char **argv) guid_str); } - if (resolved = 0) - if ((fabric = ibnd_discover_fabric(ibmad_port, port_id, -