Re: Reducing the number of IB interrupts caused by the SRP initiator

2010-01-18 Thread Eli Cohen
On Sun, Jan 17, 2010 at 9:08 PM, Bart Van Assche
bart.vanass...@gmail.com wrote:
 Hello,

 Once I noticed that the SRP initiator triggers two IB interrupts per I/O (more
 precise, per srp_queuecommand() call) I started looking at reducing the number
 of IB interrupts. The patch below didn't help. Does this mean that the patch
 below isn't complete or is this because of the hw driver used during the test
 (mlx4) ?
When you say that the patch did not help, do you mean that it did not
reduce the rate of interrupts?  I also think that you have a race here
since the head and tail operations are not protected (I don't know the
SRP driver I think locking should be done here). It would be better if
you'd choose a value that is a fraction of the queue size to request a
completion so that you don't get to a situation where your queue is
full and you the sender is waiting for it to be cleared. Also, you did
not touch the completion handler. Seems to me you should increment the
tail for each completion by the number of completions you skipped.
Failing to do so will keep cause the completion to be requested each
post after the first time it skipped queue len completions.



 Bart.

 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
 b/drivers/infiniband/ulp/srp/ib_srp.c
 index 54c8fe2..1f674b8 100644
 --- a/drivers/infiniband/ulp/srp/ib_srp.c
 +++ b/drivers/infiniband/ulp/srp/ib_srp.c
 @@ -241,7 +241,7 @@ static int srp_create_target_ib(struct srp_target_port 
 *target)
        init_attr-cap.max_recv_wr     = SRP_RQ_SIZE;
        init_attr-cap.max_recv_sge    = 1;
        init_attr-cap.max_send_sge    = 1;
 -       init_attr-sq_sig_type         = IB_SIGNAL_ALL_WR;
 +       init_attr-sq_sig_type         = IB_SIGNAL_REQ_WR;
        init_attr-qp_type             = IB_QPT_RC;
        init_attr-send_cq             = target-cq;
        init_attr-recv_cq             = target-cq;
 @@ -1001,7 +1001,8 @@ static int __srp_post_send(struct srp_target_port 
 *target,
        wr.sg_list    = list;
        wr.num_sge    = 1;
        wr.opcode     = IB_WR_SEND;
 -       wr.send_flags = IB_SEND_SIGNALED;
 +       wr.send_flags = target-tx_head - target-tx_tail == SRP_SQ_SIZE - 1
 +               ? IB_SEND_SIGNALED : 0;

        ret = ib_post_send(target-qp, wr, bad_wr);

 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/11] opensm: Allow the routing engine to participate in path SL calculations.

2010-01-18 Thread Jim Schutt

Hi Yevgeny,

On Thu, 2010-01-14 at 09:24 -0700, Yevgeny Kliteynik wrote:
 Jim,
 
 On 20/Nov/09 21:15, Jim Schutt wrote:
  LASH already does this, in a hard-coded fashion.
 
  Generalize this by adding a callback to struct osm_routing_engine that
  computes a path SL value, and fix up LASH to use it.
 
  This patchset causes the requested or QoS-computed SL value to be passed
  to the routing engine path SL computation as a hint.  In the event the
  routing engine's use of SLs allows it to support more than one QoS level,
  it may be able to make use of the SL hint to do so.
 
  For now, LASH just ignores the hint.
 
  Note that before this change, if LASH was configured and a specific path
  SL value was requested that differed from what LASH needed to route the
  fabric without credit loops, the path SL lookup would fail.  Now LASH's
  SL value is always used.
 
  Possibly the choice between failing a path SL request when it conflicts
  with routing, vs. always providing an SL value that gives a credit-loop-
  free routing, should be user-configurable?
 
 SL can come from the following places:
   - user requested specific SL in PathRecord query
   - QoS policy configuration
   - SL specified in partition parameters
   - basic QoS (no policies, only SL2VL table)
   - routing engine
 
 Except for QoS policy being able to override SL that is specified in
 the partition parameters (with an error message in the log), IMHO if
 there's a conflict between SLs coming from different constraints
 PathRecord should fail to find a satisfiable path, or at least we
 should see some error message in the log that the selected SL
 conflicts with other OSM configurations, but will be used anyway.
 
 [snip...]
 
 
  @@ -725,6 +707,14 @@ static ib_api_status_t pr_rcv_get_path_parms(IN 
  osm_sa_t * sa,
  goto Exit;
  }
 
  +   /*
  +* If the routing engine wants to have a say in path SL selection,
  +* send the currently computed SL value as a hint and let the routing
  +* engine override it.
  +*/
  +   if (p_re  p_re-path_sl)
  +   sl = p_re-path_sl(p_re-context, sl, p_src_port, p_dest_port);
 
 
 In addition to error message if routing engine overrides the provided
 hint, need to check whether the returned SL is valid - check the
 corresponding bit in valid_sl_mask. It might be irrelevant for torus-2QoS
 routing (not sure yet, need to read more patches :-) ), but it's
 probably needed in general case.
 
 Also, perhaps it would be better to provide the bitmask of available
 SLs as a hint if there are more than one suitable SL?
 
 I mean something like this (didn't try it, didn't even compile it,
 need corresponding change in the p_re-path_sl callback, it's just
 to illustrate what I mean):

Your suggestion below won't accomplish what I was trying to 
accomplish.

Torus-2QoS needs to encode global path information into the
SL value in order to provide routing free of credit loops.

But it only needs 3 bits of SL to do this, leaving one free.
So, it uses that bit to provide two levels of quality of
service.

This usage of SL clashes with the QoS policy engine, which
uses each SL value to provide up to 16 levels of quality
of service.  So to the QoS policy engine, every SL value
is distinct, but to torus-2QoS, SL values 0-7 are all the
same wrt. QoS level, and SL values 8-15 are also all the
same wrt. a second QoS level.

I wanted to use the QoS policy engine to configure QoS
level in torus-2QoS, so I used this hint idea.
What torus-2QoS' path_sl() does is append the high-order
bit from the SL hint, as computed by the QoS policy engine,
onto the 3 low-order bits that it computes are needed 
to avoid deadlock.

Does that help explain what I'm after?

-- Jim

 
 ---
   opensm/opensm/osm_sa_path_record.c |   47 
 ++-
   1 files changed, 29 insertions(+), 18 deletions(-)
 
 diff --git a/opensm/opensm/osm_sa_path_record.c 
 b/opensm/opensm/osm_sa_path_record.c
 index 7120d65..6de8979 100644
 --- a/opensm/opensm/osm_sa_path_record.c
 +++ b/opensm/opensm/osm_sa_path_record.c
 @@ -171,7 +171,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN 
 osm_sa_t * sa,
   uint8_t required_mtu;
   uint8_t required_rate;
   uint8_t required_pkt_life;
 -uint8_t sl;
 +uint8_t sl = OSM_DEFAULT_SL;
   uint8_t in_port_num;
   ib_net16_t dest_lid;
   uint8_t i;
 @@ -688,33 +688,44 @@ static ib_api_status_t pr_rcv_get_path_parms(IN 
 osm_sa_t * sa,
   cl_ntoh16(pkey), sl);
   } else
   sl = p_prtn-sl;
 -} else if (sa-p_subn-opt.qos) {
 +}
 +
 +/*
 + * If the routing engine wants to have a say in path SL selection,
 + * send the currently computed SL value as a hint and let the routing
 + * engine override it.
 + */
 +if (p_re  p_re-path_sl)
 +sl = p_re-path_sl(p_re-context, valid_sl_mask, p_src_port, 
 p_dest_port);
 +
 +if (sa-p_subn-opt.qos  !(valid_sl_mask  (1  

Re: [infiniband-diags] [UPDATED PATCH] [3/3] support --load-cache in iblinkinfo and ibqueryerrors

2010-01-18 Thread Al Chu
Hey Sasha,

Here's an updated patch with the cleanup changes as we discussed.

Al

On Sat, 2010-01-16 at 16:28 +0200, Sasha Khapyorsky wrote:
 On 10:23 Fri 15 Jan , Al Chu wrote:
  Hi Sasha,
  
  This adds the --load-cache options to iblinkinfo and ibqueryerrors.
  
  Al
  
  -- 
  Albert Chu
  ch...@llnl.gov
  Computer Scientist
  High Performance Systems Division
  Lawrence Livermore National Laboratory
 
  From: Albert Chu ch...@llnl.gov
  Date: Thu, 10 Dec 2009 11:22:50 -0800
  Subject: [PATCH] support --load-cache in iblinkinfo and ibqueryerrors
  
  
  Signed-off-by: Albert Chu ch...@llnl.gov
  ---
   infiniband-diags/man/iblinkinfo.8|   11 ++-
   infiniband-diags/man/ibqueryerrors.8 |   10 ++-
   infiniband-diags/src/iblinkinfo.c|   52 
  +---
   infiniband-diags/src/ibqueryerrors.c |   53 
  ++---
   4 files changed, 99 insertions(+), 27 deletions(-)
  
  diff --git a/infiniband-diags/man/iblinkinfo.8 
  b/infiniband-diags/man/iblinkinfo.8
  index 0f53b00..f184edf 100644
  --- a/infiniband-diags/man/iblinkinfo.8
  +++ b/infiniband-diags/man/iblinkinfo.8
  @@ -6,7 +6,7 @@ iblinkinfo \- report link info for all links in the fabric
   .SH SYNOPSIS
   .B iblinkinfo
  [-hcdl -C ca_name -P ca_port -v lt,hoq,vlstall -S guid
  --D direct_route]
  +-D direct_route \-\-load\-cache filename]
   
   .SH DESCRIPTION
   .PP
  @@ -42,7 +42,14 @@ Print port capabilities (enabled and supported values)
   \fB\-P ca_port\fRuse the specified ca_port for the search.
   .TP
   \fB\-R\fR (This option is obsolete and does nothing)
  -
  +.TP
  +\fB\-\-load\-cache\fR filename
  +Load and use the cached ibnetdiscover data stored in the specified
  +filename.  May be useful for outputting and learning about other
  +fabrics or a previous state of a fabric.  Cannot be used if user
  +specifies a directo route path.  See
  +.B ibnetdiscover
  +for information on caching ibnetdiscover output.
   
   .SH AUTHOR
   .TP
  diff --git a/infiniband-diags/man/ibqueryerrors.8 
  b/infiniband-diags/man/ibqueryerrors.8
  index 83a2b5a..56a0d67 100644
  --- a/infiniband-diags/man/ibqueryerrors.8
  +++ b/infiniband-diags/man/ibqueryerrors.8
  @@ -6,7 +6,7 @@ ibqueryerrors \- query and report non-zero IB port counters
   .SH SYNOPSIS
   .B ibqueryerrors
   [-s err1,err2,... -c -r -C ca_name -P ca_port -G node_guid
  --D direct_route -d -k -K]
  +-D direct_route -d -k -K \-\-load\-cache filename]
   
   .SH DESCRIPTION
   .PP
  @@ -60,6 +60,14 @@ specified the data counters will be cleared without any 
  printed output.
   .TP
   \fB\-\-details\fR include transmit discard details
   .TP
  +\fB\-\-load\-cache\fR filename
  +Load and use the cached ibnetdiscover data stored in the specified
  +filename.  May be useful for outputting and learning about other
  +fabrics or a previous state of a fabric.  Cannot be used if user
  +specifies a directo route path.  See
  +.B ibnetdiscover
  +for information on caching ibnetdiscover output.
  +.TP
   \fB\-R\fR  (This option is obsolete and does nothing)
   
   .SH COMMON OPTIONS
  diff --git a/infiniband-diags/src/iblinkinfo.c 
  b/infiniband-diags/src/iblinkinfo.c
  index 21b31bb..10e3ad5 100644
  --- a/infiniband-diags/src/iblinkinfo.c
  +++ b/infiniband-diags/src/iblinkinfo.c
  @@ -55,6 +55,7 @@
   
   static char *node_name_map_file = NULL;
   static nn_map_t *node_name_map = NULL;
  +static char *load_cache_file = NULL;
   
   static uint64_t guid = 0;
   static char *guid_str = NULL;
  @@ -230,6 +231,9 @@ static int process_opt(void *context, int ch, char 
  *optarg)
  case 1:
  node_name_map_file = strdup(optarg);
  break;
  +   case 2:
  +   load_cache_file = strdup(optarg);
  +   break;
  case 'S':
  guid_str = optarg;
  guid = (uint64_t) strtoull(guid_str, 0, 0);
  @@ -291,6 +295,7 @@ int main(int argc, char **argv)
   print additional switch settings (PktLifeTime, HoqLife, 
  VLStallCount)},
  {portguids, 'g', 0, NULL,
   print port guids instead of node guids},
  +   {load-cache, 2, 1, file, filename of ibnetdiscover cache 
  to load},
  {GNDN, 'R', 0, NULL,
   (This option is obsolete and does nothing)},
  {0}
  @@ -317,6 +322,11 @@ int main(int argc, char **argv)
  mad_rpc_set_timeout(ibmad_port, ibd_timeout);
   
  node_name_map = open_node_name_map(node_name_map_file);
  +   
  +   if (dr_path  load_cache_file) {
  +   fprintf(stderr, Cannot specify cache and direct route path\n);
  +   exit(1);
  +   }
 
 Why is this limitation needed really?
 
   
  if (dr_path) {
  /* only scan part of the fabric */
  @@ -334,19 +344,37 @@ int main(int argc, char **argv)
 guid_str);
  }
   
  -   if (resolved = 0)
  -   if ((fabric = ibnd_discover_fabric(ibmad_port, port_id,
  -