Re: IPoIB issues

2010-03-10 Thread Eli Cohen
On Thu, Mar 11, 2010 at 09:47:31AM +0200, Or Gerlitz wrote:
> >The patch does not address these failures directly but maybe as a
> >side effect they would go away too.
> The patch seems to solve a possible "live lock" in a node which has
> both CM and datagram neighbors, e.g. where ipoib has called
> netif_stop etc. but there is now room in the QP for more postings,
> which would let the network layer continue to post if the CQ were
> polled. It's hard to see how this relates to the post_send error
> print.
Right, I meant that they could disappear because the system no longer
gets into the state in which they show up, but the patch __does not__
address that problem.

> 
> >I think printing the return value is in place so in the future we will have 
> >more information in such cases.
> I posted a patch that does this, but I think it missed the 2.6.34
> merge cycle.
> 
Can you push them to OFED-1.5.1? We'll remove the patch later when
it's in the kernel but at least we'll have the information handy
if/when we need it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IPoIB issues

2010-03-10 Thread Or Gerlitz

Eli Cohen wrote:
> The patch does not address these failures directly but maybe as a
> side effect they would go away too.
The patch seems to solve a possible "live lock" in a node which has
both CM and datagram neighbors, e.g. where ipoib has called netif_stop
etc. but there is now room in the QP for more postings, which would let
the network layer continue to post if the CQ were polled. It's hard to
see how this relates to the post_send error print.



> I think printing the return value is in place so in the future we will
> have more information in such cases.
I posted a patch that does this, but I think it missed the 2.6.34 merge 
cycle.


Or.


Re: IPoIB issues

2010-03-10 Thread Eli Cohen
On Wed, Mar 10, 2010 at 05:30:38PM +0200, Moni Shoua wrote:
> Hi Eli
> Although Josh already reported that the patch seems to fix the issue I have a 
> question though.
> 
> "post_send failed" prints were during work in datagram mode. I don't know if 
> Josh verified 
> that but I don't expect that these prints would go away, even with the patch. 
> Am I right?
The patch does not address these failures directly but maybe as a side
effect they would go away too. Maybe Josh can share with us his
experience.

> 
> BTW, what could be the reason for UD QP post_send() failures?
> 

Usually they should not fail unless the WR is malformed or the QP has
all available WR outstanding, which should not happen in IPoIB. I
think printing the return value is in place so in the future we will
have more information in such cases.



Re: kitten - mlx4: Unhandled interrupt - owner bit

2010-03-10 Thread Roland Dreier
 > To clarify I checked each of the statements separately and from what I
 > could gather
 > (eqe->owner & 0x80) was true and
 > (eq->cons_index & eq->nent) false.
 > But true! As I am not sure what each statement hides,
 > I do not know if both should be false or true for the eqe to be
 > returned. Will try to check the cons_index closer.

As the '^' (XOR) implies, they should be the same for the EQE to be
returned.
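
The ownership test can be tried in isolation. What follows is only a
minimal sketch of the same XOR check, not the driver code: the names
demo_eqe, demo_eq and demo_next_eqe_sw are hypothetical, and indexing
the ring with cons_index & (nent - 1), with nent a power of two, is an
assumption made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct demo_eqe {
	uint8_t owner;          /* bit 7 (0x80) is the ownership bit */
};

struct demo_eq {
	struct demo_eqe *ring;  /* array of nent entries */
	uint32_t nent;          /* assumed to be a power of two */
	uint32_t cons_index;    /* free-running consumer index */
};

/*
 * Return the next entry if software owns it, NULL otherwise.
 * The entry is returned only when the owner bit and the wrap bit
 * (cons_index & nent) have the same truth value.
 */
static struct demo_eqe *demo_next_eqe_sw(struct demo_eq *eq)
{
	struct demo_eqe *eqe = &eq->ring[eq->cons_index & (eq->nent - 1)];
	int owner_bit = !!(eqe->owner & 0x80);
	int wrap_bit = !!(eq->cons_index & eq->nent);

	return (owner_bit ^ wrap_bit) ? NULL : eqe;
}
```

With the owner bit set and the wrap bit clear, which is the combination
reported in this thread, the helper returns NULL.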

 > Where could I find out more about owner and cons_index / nent ?

The ConnectX programmer's reference manual is what you need.
-- 
Roland Dreier  
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: kitten - mlx4: Unhandled interrupt - owner bit

2010-03-10 Thread Fredrik Unger
Eli Cohen wrote:
> On Wed, Mar 10, 2010 at 04:03:26PM +0100, Fredrik Unger wrote:
>> When investigating the error it seems to stem from next_eqe_sw in 
>> drivers/net/mlx4/eq.c
>> called by the interrupt handler.
>> What happens is that (eqe->owner & 0x80) is true causing the routine to 
>> return
>> NULL resulting in an unhandled interrupt (eg the interrupt routine returns 0)
> 
> Please note that the condition is a bit more complicated. I quote the
> whole function:
> 
> static struct mlx4_eqe *next_eqe_sw(struct mlx4_eq *eq)
> {
> 	struct mlx4_eqe *eqe = get_eqe(eq, eq->cons_index);
> 	return !!(eqe->owner & 0x80) ^ !!(eq->cons_index & eq->nent) ? NULL : eqe;
> }

Yes you are correct,
To clarify I checked each of the statements separately and from what I
could gather
(eqe->owner & 0x80) was true and
(eq->cons_index & eq->nent) false.
But true! As I am not sure what each statement hides,
I do not know if both should be false or true for the eqe to be
returned. Will try to check the cons_index closer.

Where could I find out more about owner and cons_index / nent ?

Thank you,

Fredrik Unger




Re: [PATCH v2 00/15] opensm: torus-2QoS example input files

2010-03-10 Thread Jim Schutt

The attached files can be used to test the torus-2QoS routing
engine using ibsim.

fabric-torus-5x5x5 contains a fabric description that ibsim can read.
Once ibsim is running, run opensm like this:

  opensm --config opensm.conf --torus_config torus-2QoS-5x5x5.conf
or 
  opensm --config opensm.conf --torus_config torus-2QoS-5x5x5.conf \
 -Q --qos_policy_file qos-policy-torus-5x5x5.conf

-- Jim



fabric-torus-5x5x5.bz2
Description: application/bzip

# Limit the maximal operational VLs
max_op_vls 8

# The number of seconds between subnet sweeps (0 disables it)
sweep_interval 10

# Routing engine
# Multiple routing engines can be specified separated by
# commas so that specific ordering of routing algorithms will
# be tried if earlier routing engines fail.
# Supported engines: minhop, updn, file, ftree, lash, dor
routing_engine torus-2QoS,no_fallback

# Use unicast routing cache (use FALSE if unsure)
use_ucast_cache TRUE

# Force flush of the log file after each log message
force_log_flush TRUE

# Log file to be used
log_file /dev/tty

# console [off|local|loopback|socket]
console loopback

# Telnet port for console (default 1)
console_port 1

# QoS default options
# Note that for OFED > 1.3, this information can also be in qos-policy.conf.
# However, it may be good to have it here also for torus-2QoS, as this will
# change the defaults even if not using QoS.
qos_max_vls 8
qos_high_limit 0
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0
qos_vlarb_low 0:64,1:64,2:64,3:64,4:64,5:64,6:64,7:64,8:64
qos_sl2vl (null)

# This is a QoS configuration for the torus-2QoS routing engine.
# As it supports only 2 levels of QoS, via SL bit 3, we should configure
# only SLs 0 and 8.  Based on that torus-2QoS will pick the appropriate
# SL value to provide deadlock-free routing for both QoS levels.

port-groups
port-group
name: Service_nodes
port-name: "H_0_0_0_0/P1"   # E.g. admin
port-name: "H_0_0_1_0/P1"   # E.g. NFS server
port-name: "H_0_0_2_0/P1"   # E.g. boot server
port-name: "H_0_0_3_0/P1"   # E.g. login node
end-port-group

port-group
name: Lustre_nodes

port-name: "H_0_0_4_0/P1"   # E.g. MDS

port-name: "H_0_1_0_0/P1"   # E.g. OSS
port-name: "H_0_1_1_0/P1"   # E.g. OSS
port-name: "H_0_1_2_0/P1"   # E.g. OSS
port-name: "H_0_1_3_0/P1"   # E.g. OSS
port-name: "H_0_1_4_0/P1"   # E.g. OSS
end-port-group

port-group
name: Compute_nodes

port-name: "H_0_2_0_0/P1"
port-name: "H_0_2_1_0/P1"
port-name: "H_0_2_2_0/P1"
port-name: "H_0_2_3_0/P1"
port-name: "H_0_2_4_0/P1"

port-name: "H_0_3_0_0/P1"
port-name: "H_0_3_1_0/P1"
port-name: "H_0_3_2_0/P1"
port-name: "H_0_3_3_0/P1"
port-name: "H_0_3_4_0/P1"

port-name: "H_0_4_0_0/P1"
port-name: "H_0_4_1_0/P1"
port-name: "H_0_4_2_0/P1"
port-name: "H_0_4_3_0/P1"
port-name: "H_0_4_4_0/P1"

port-name: "H_1_0_0_0/P1"
port-name: "H_1_0_1_0/P1"
port-name: "H_1_0_2_0/P1"
port-name: "H_1_0_3_0/P1"
port-name: "H_1_0_4_0/P1"

port-name: "H_1_1_0_0/P1"
port-name: "H_1_1_1_0/P1"
port-name: "H_1_1_2_0/P1"
port-name: "H_1_1_3_0/P1"
port-name: "H_1_1_4_0/P1"

port-name: "H_1_2_0_0/P1"
port-name: "H_1_2_1_0/P1"
port-name: "H_1_2_2_0/P1"
port-name: "H_1_2_3_0/P1"
port-name: "H_1_2_4_0/P1"

port-name: "H_1_3_0_0/P1"
port-name: "H_1_3_1_0/P1"
port-name: "H_1_3_2_0/P1"
port-name: "H_1_3_3_0/P1"
port-name: "H_1_3_4_0/P1"

port-name: "H_1_4_0_0/P1"
port-name: "H_1_4_1_0/P1"
port-name: "H_1_4_2_0/P1"
port-name: "H_1_4_3_0/P1"
port-name: "H_1_4_4_0/P1"

port-name: "H_2_0_0_0/P1"
port-name: "H_2_0_1_0/P1"
port-name: "H_2_0_2_0/P1"
port-name: "H_2_0_3_0/P1"
port-name: "H_2_0_4_0/P1"

port-name: "H_2_1_0_0/P1"
port-name: "H_2_1_1_0/P1"
port-name: "H_2_1_2_0/P1"
port-name: "H_2_1_3_0/P1"
port-name: "H_2_1_4_0/P1"

port-name: "H_2_2_0_0/P1"
port-name: "H_2_2_1_0/P1"
port-name: "H_2_2_2_0/P1"
port-name: "H_2_2_3_0/P1"
port-name: "H_2_2_4_0/P1"

port-name: "H_2_3_0_0/P1"
port-name: "H_2_3_1_0/P1"
port-name: "H_2_3_2_0/P1"
port-name: "H_2_3_3_0/P1"
port-name: "H_2_3_4_0/P1"

port-name: "H_2_4_0_0/P1"
port-name: "H_2_4_1_0/P1"
port-name: "H_2_4_2_0/P1"
port-name: "H_2_4_3_0/P1"
port-name: "H_2_4_4_0/P1"

port-name: "H_3_0_0_0/P1"
port-name: "H_3_0_1_0/P1"
port-name: "H_3_0_2_0/P1"
port-name: "H_3_0_3_0/P1"
port-name: "H_3_0_4_0/P1"

port-name: "H_3_1_0_0/P1"
port-name: "H_3_1_

[PATCH v2 07/15] opensm: Add torus-2QoS routing engine.

2010-03-10 Thread Jim Schutt

Generating routes for a torus that are free of credit loops requires
the use of multiple virtual lanes, and thus SLs on IB.  For IB fabrics
it also requires that _every_ application use path record queries -
any application that uses an SL that was not obtained via a path record
query may cause credit loops.

In addition, if a fabric topology change (e.g. failed switch/link)
causes a change in the path SL values needed to prevent credit loops,
then _every_ application needs to repath for every path whose SL has
changed.  AFAIK there is no good way to do this as yet in general.

Also, the requirement for path SL queries on every connection places a
heavy load on subnet administration, and the possibility that path SL
values can change makes caching as a performance enhancement more
difficult.

Since multiple VL/SL values are required to prevent credit loops on a
torus,  supporting QoS means that QoS and routing need to share the small
pool of available SL values, and the even smaller pool of available VL
values.

The torus-2QoS engine addresses these issues for a 2D/3D torus fabric
by providing the following functionality:
- routing that is free of credit loops
- two levels of QoS, assuming switches support 8 data VLs
- ability to route around a single failed switch, and/or multiple failed
  links, without
    - introducing credit loops
    - changing path SL values
- very short run times, with good scaling properties as fabric size
  increases

The routing engine currently in opensm that is most functional for a
torus-connected fabric is LASH.  In comparison with torus-2QoS, LASH
has the following issues:
- LASH does not support QoS.
- changing inter-switch topology (add/remove a switch, or
  removing all the links between a switch) can change many
  path SL values, potentially leading to credit loops if
  running applications do not repath.
- running time to calculate routes scales poorly with increasing
  fabric size.

The basic algorithm used by torus-2QoS is DOR.  It also uses SL bits 0-2,
one SL bit per torus dimension, to encode whether a path crosses a dateline
(where the coordinate value wraps to zero) for each of the three dimensions,
in order to avoid the credit loops that otherwise result on a torus.  It
uses SL bit 3 to distinguish between two QoS levels.

It uses the SL2VL tables to map those eight SL values per QoS level into
two VL values per QoS level, based on which coordinate direction a link
points.  For two QoS levels, this consumes four data VLs, where VL bit
0 encodes whether the path crosses the dateline for the coordinate
direction in which the link points, and VL bit 2 encodes QoS level.

In the event of link failure, it routes the long way around the 1-D ring
containing the failed link.  I.e. no turns are introduced into a path in
order to route around a failed link.  Note that due to this implementation,
torus-2QoS cannot route a torus with link failures that break a 1-D ring
into two disjoint segments.
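
As a rough illustration of routing "the long way around", consider a
single ring of n switches where link i joins switch i and switch
(i + 1) % n. The sketch below is not taken from torus-2QoS; the
function demo_ring_direction, the link numbering and the tie-breaking
rule are all assumptions. It picks the shorter direction unless that
direction would use the failed link.

```c
/*
 * Choose a travel direction on a ring of n switches.
 * Returns +1 (increasing index), -1 (decreasing index), or 0 if
 * src == dst. failed_link is the index of the broken link, or -1
 * if every link is up. At most one failed link is handled.
 */
static int demo_ring_direction(int n, int src, int dst, int failed_link)
{
	int fwd = ((dst - src) % n + n) % n;   /* hops going up */
	int bwd = n - fwd;                     /* hops going down */
	int fwd_blocked;

	if (fwd == 0)
		return 0;

	/* The upward path uses links src, src + 1, ..., dst - 1 (mod n). */
	fwd_blocked = failed_link >= 0 &&
		      ((failed_link - src) % n + n) % n < fwd;

	if (fwd <= bwd)                        /* upward is the short way */
		return fwd_blocked ? -1 : +1;

	/* Downward is the short way; it is blocked exactly when the
	 * failed link is not on the upward path. */
	return (failed_link >= 0 && !fwd_blocked) ? +1 : -1;
}
```

With two failed links on the same ring neither direction may be
usable, which corresponds to the disjoint-segment case noted above.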

Under DOR routing in a torus with a failed switch, paths that would
otherwise turn at the failed switch cannot be routed without introducing
an "illegal" turn into the path.  Such turns are "illegal" in the
sense that allowing them will allow credit loops, unless something can
be done.

The routes produced by torus-2QoS will introduce such "illegal" turns when
a switch fails.  It makes use of the input/output port dependence in the
SL2VL maps to set the otherwise unused VL bit 1 for the path hop following
such an illegal turn.  This is enough to avoid credit loops in the
presence of a single failed switch.

As an example, consider the following 2D torus, and consider routes
from S to D, both when the switch at F is operational, and when it
has failed.  torus-2QoS will generate routes such that the path
S-F-D is followed if F is operational, and the path S-E-I-L-D
if F has failed:

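    |    |    |    |    |    |    |
  --+----+----+----+----+----+----+--
    |    |    |    |    |    |    |
  --+----+----+----+----+----D----+--
    |    |    |    |    |    |    |
  --+----+----+----+----I----L----+--
    |    |    |    |    |    |    |
  --+----+----S----+----E----F----+--
    |    |    |    |    |    |    |
  --+----+----+----+----+----+----+--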

The turn in S-E-I-L-D at switch I is the illegal turn introduced
into the path.  The turns at E and L are extra turns introduced
into the path that are legal in the sense that no credit loops
can be constructed using them.

The path hop after the turn at switch I has VL bit 1 set, which marks
it as a hop after an illegal turn.
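
The marking of the hop after an illegal turn can be sketched as
follows. This is an illustration under stated assumptions, not the
torus-2QoS implementation: demo_mark_illegal_turns is a hypothetical
helper, and it assumes DOR visits dimensions in the order x, y, z, so
that a turn onto a lower-numbered dimension is the "illegal" kind.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * dims[i] is the torus dimension (0 = x, 1 = y, 2 = z) travelled by
 * hop i. vl_bit1[i] is set to 1 only for the hop that immediately
 * follows a turn onto a lower-numbered dimension, and to 0 otherwise.
 */
static void demo_mark_illegal_turns(const unsigned *dims, size_t nhops,
				    uint8_t *vl_bit1)
{
	size_t i;

	for (i = 0; i < nhops; i++)
		vl_bit1[i] = (i > 0 && dims[i] < dims[i - 1]) ? 1 : 0;
}
```

For the path S-E-I-L-D in the example, the hops run x, x, y, x, y (S
to E takes two hops in x), so only the I-L hop is marked.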

I've used the latest development version of ibdmchk, because it can
use path SL values and SL2VL tables, to check for credit loops in
cases like the above routed with torus-2QoS, and it finds none.

I've also looked for credit loops in a torus with multiple failed
switches routed with torus-2QoS, and learned that if and only if
the failed switches are adjacent in the last DOR dimension, there
will be no

[PATCH v2 14/15] opensm: Avoid havoc in dump_ucast_routes() caused by torus-2QoS persistent use of osm_port_t:priv.

2010-03-10 Thread Jim Schutt
Torus-2QoS makes persistent use of osm_port_t:priv to speed calculation
of path SL values.

However, osm_switch_recommend_path() uses a non-NULL osm_port_t:priv
as a flag that osm_port_t:priv holds a tracking array used when
LMC > 0.  It turns out that 1) dump_ucast_routes() does not need
osm_switch_recommend_path() to consider alternate routes, and 2)
before the addition of torus-2QoS, osm_port_t:priv use never
persisted past the unicast routing function, so it was always
NULL on entry to dump_ucast_routes().

Fix this up by making the routing_for_lmc flag explicitly set by
the caller of osm_switch_recommend_path(), rather than inferring
it from osm_port_t:priv.  This retains existing behavior for
existing routing engines, and allows torus-2QoS to make persistent
use of osm_port_t:priv.

The alternative would be to add another member to osm_port_t,
say osm_port_t:priv2.

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_switch.h |   12 
 opensm/opensm/osm_dump.c   |2 +-
 opensm/opensm/osm_switch.c |7 ---
 opensm/opensm/osm_ucast_mgr.c  |1 +
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h
index cb6e5ac..f18d19b 100644
--- a/opensm/include/opensm/osm_switch.h
+++ b/opensm/include/opensm/osm_switch.h
@@ -888,6 +888,7 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
  IN osm_port_t * p_port, IN uint16_t lid_ho,
  IN unsigned start_from,
  IN boolean_t ignore_existing,
+ IN boolean_t routing_for_lmc,
  IN boolean_t dor);
 /*
 * PARAMETERS
@@ -910,6 +911,17 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
 *  If false, the switch will choose an existing route if one
 *  exists, otherwise will choose the optimal route.
 *
+*  routing_for_lmc
+*  [in] We support an enhanced LMC aware routing mode:
+*  In the case of LMC > 0, we can track the remote side
+*  system and node for all of the lids of the target
+*  and try and avoid routing again through the same
+*  system / node.
+*
+*  Assume if routing_for_lmc is TRUE that this procedure
+*  was provided with the tracking array and counter via
+*  p_port->priv, and we can conduct this algorithm.
+*
 *  dor
 *  [in] If TRUE, Dimension Order Routing will be done.
 *
diff --git a/opensm/opensm/osm_dump.c b/opensm/opensm/osm_dump.c
index f3f4623..030de74 100644
--- a/opensm/opensm/osm_dump.c
+++ b/opensm/opensm/osm_dump.c
@@ -221,7 +221,7 @@ static void dump_ucast_routes(cl_map_item_t * item, FILE * file, void *cxt)
/* No LMC Optimization */
best_port = osm_switch_recommend_path(p_sw, p_port,
  lid_ho, 1, TRUE,
- dor);
+ FALSE, dor);
fprintf(file, "No %u hop path possible via port %u!",
best_hops, best_port);
}
diff --git a/opensm/opensm/osm_switch.c b/opensm/opensm/osm_switch.c
index 1cd8bfc..14b0021 100644
--- a/opensm/opensm/osm_switch.c
+++ b/opensm/opensm/osm_switch.c
@@ -214,6 +214,7 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
  IN osm_port_t * p_port, IN uint16_t lid_ho,
  IN unsigned start_from,
  IN boolean_t ignore_existing,
+ IN boolean_t routing_for_lmc,
  IN boolean_t dor)
 {
/*
@@ -223,10 +224,10 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
   and try and avoid routing again through the same
   system / node.
 
-  If this procedure is provided with the tracking array
-  and counter we can conduct this algorithm.
+  Assume if routing_for_lmc is true that this procedure was
+  provided the tracking array and counter via p_port->priv,
+  and we can conduct this algorithm.
 */
-   boolean_t routing_for_lmc = (p_port->priv != NULL);
uint16_t base_lid;
uint8_t hops;
uint8_t least_hops;
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index 9a3ea25..fbc9244 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -251,6 +251,7 @@ static void ucast_mgr_process_port(IN osm_ucast_mgr_t * p_mgr,
 */
port = osm_switch_recommend_path(p_sw, p_port, lid_ho, start_from,
 p_mgr->p_subn->ignore_

[PATCH v2 12/15] opensm: Make it possible to configure no fallback routing engine.

2010-03-10 Thread Jim Schutt
For a fabric that requires routing with an engine with special properties,
say avoiding credit loops via making use of SLs in routing, it might
be preferable to not fall back to minhop if the configured routing engine
fails.

E.g. the torus-2QoS routing engine uses both SL2VL maps and path SL values
to provide routing free of credit loops, but cannot route fabrics for
some patterns of failed switches.  Should a switch fail that creates such
a pattern, it may be preferable to keep the previous routing information
loaded in the switches until a switch can be replaced that restores
torus-2QoS's ability to route the fabric.

The alternative, having some other engine route the fabric, will immediately
introduce credit loops.

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_subnet.h |1 +
 opensm/opensm/osm_opensm.c |5 +
 opensm/opensm/osm_qos.c|6 ++
 opensm/opensm/osm_ucast_mgr.c  |   23 +++
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index d2d9661..bd5b6f5 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -216,6 +216,7 @@ typedef struct osm_subn_opt {
osm_qos_options_t qos_rtr_options;
boolean_t enable_quirks;
boolean_t no_clients_rereg;
+   boolean_t no_fallback_routing_engine;
 #ifdef ENABLE_OSM_PERF_MGR
boolean_t perfmgr;
boolean_t perfmgr_redir;
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index 10d3af5..4b2b971 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -159,6 +159,11 @@ static struct osm_routing_engine *setup_routing_engine(osm_opensm_t *osm,
struct osm_routing_engine *re;
const struct routing_engine_module *m;
 
+   if (!strcmp(name, "no_fallback")) {
+   osm->subn.opt.no_fallback_routing_engine = TRUE;
+   return NULL;
+   }
+
for (m = routing_modules; m->name && *m->name; m++) {
if (!strcmp(m->name, name)) {
re = malloc(sizeof(struct osm_routing_engine));
diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index d78531b..8a26008 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -211,6 +211,12 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node,
int ret = 0;
unsigned i, j;
 
+   /*
+* Do nothing unless the most recent routing attempt was successful.
+*/
+   if (!re)
+   return ret;
+
for (i = 1; i < num_ports; i++) {
p = osm_node_get_physp_ptr(node, i);
force_update = p->need_update || sm->p_subn->need_update;
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index dd6568f..d7a4a8c 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -977,7 +977,8 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
p_routing_eng = p_routing_eng->next;
}
 
-   if (!p_osm->routing_engine_used) {
+   if (!p_osm->routing_engine_used &&
+   p_osm->subn.opt.no_fallback_routing_engine != TRUE) {
/* If configured routing algorithm failed, use default MinHop */
struct osm_routing_engine *r = p_osm->default_routing_engine;
 
@@ -987,14 +988,20 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
osm_ucast_mgr_set_fwd_tables(p_mgr);
}
 
-   OSM_LOG(p_mgr->p_log, OSM_LOG_INFO,
-   "%s tables configured on all switches\n",
-   osm_routing_engine_type_str(p_osm->
-   routing_engine_used->type));
-
-   if (p_mgr->p_subn->opt.use_ucast_cache)
-   p_mgr->cache_valid = TRUE;
+   if (p_osm->routing_engine_used) {
+   OSM_LOG(p_mgr->p_log, OSM_LOG_INFO,
+   "%s tables configured on all switches\n",
+   osm_routing_engine_type_str(p_osm->
+   routing_engine_used->type));
 
+   if (p_mgr->p_subn->opt.use_ucast_cache)
+   p_mgr->cache_valid = TRUE;
+   } else {
+   p_mgr->p_subn->subnet_initialization_error = TRUE;
+   OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR,
+   "No routing engine able to successfully configure "
+   " switch tables on current fabric\n");
+   }
 Exit:
CL_PLOCK_RELEASE(p_mgr->p_lock);
OSM_LOG_EXIT(p_mgr->p_log);
-- 
1.6.6.1
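
For readers who want to see the configuration side, here is a minimal
sketch of how a routing_engine list such as "torus-2QoS,no_fallback"
could be split so that the no_fallback token sets a flag instead of
naming an engine. It is not the opensm parser: demo_opts and
demo_parse_engines are hypothetical names, and the sketch assumes a
comma-separated list without surrounding whitespace.

```c
#include <string.h>

#define DEMO_MAX_ENGINES 8
#define DEMO_NAME_LEN 32

/* Hypothetical result of parsing a routing_engine list. */
struct demo_opts {
	int no_fallback;        /* plays the role of no_fallback_routing_engine */
	int n_engines;
	char engines[DEMO_MAX_ENGINES][DEMO_NAME_LEN];
};

static void demo_parse_engines(const char *list, struct demo_opts *o)
{
	char buf[256];
	char *tok;

	memset(o, 0, sizeof(*o));
	strncpy(buf, list, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		if (!strcmp(tok, "no_fallback")) {
			/* Not an engine name: only record the request. */
			o->no_fallback = 1;
			continue;
		}
		if (o->n_engines < DEMO_MAX_ENGINES) {
			/* Destination was zeroed, so the copy stays terminated. */
			strncpy(o->engines[o->n_engines], tok,
				DEMO_NAME_LEN - 1);
			o->n_engines++;
		}
	}
}
```

Treating no_fallback as a pseudo engine name keeps the existing
configuration syntax unchanged, which is the approach the patch takes
in setup_routing_engine().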




[PATCH v2 11/15] opensm: Do not require -Q option for torus-2QoS routing engine.

2010-03-10 Thread Jim Schutt
The torus-2QoS engine provides a deadlock-free routing for a 2D/3D torus,
but requires that switch SL2VL maps be programmed.  Before this change,
"opensm -Q" was required for that to happen.

When a routing engine sets the struct osm_routing_engine:update_sl2vl
pointer, it is signalling its intent to participate in SL2VL map programming.
So, don't return early from osm_qos_setup() in that case; instead do everything
except attempt to read QoS configuration information.

For that to work properly, need to also always set up the default QoS config
information, instead of just when QoS is requested via -Q.

With that in place, the -Q option now means the same thing to torus-2QoS that
it means to other routing engines: QoS configuration is requested.

Signed-off-by: Jim Schutt 
---
 opensm/opensm/osm_qos.c|7 +--
 opensm/opensm/osm_subnet.c |   18 +-
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index 23fd316..d78531b 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -289,7 +289,9 @@ int osm_qos_setup(osm_opensm_t * p_osm)
osm_node_t *p_node;
int ret = 0;
 
-   if (!p_osm->subn.opt.qos)
+   if (!(p_osm->subn.opt.qos ||
+ (p_osm->routing_engine_used &&
+  p_osm->routing_engine_used->update_sl2vl)))
return 0;
 
OSM_LOG_ENTER(&p_osm->log);
@@ -306,7 +308,8 @@ int osm_qos_setup(osm_opensm_t * p_osm)
cl_plock_excl_acquire(&p_osm->lock);
 
/* read QoS policy config file */
-   osm_qos_parse_policy_file(&p_osm->subn);
+   if (p_osm->subn.opt.qos)
+   osm_qos_parse_policy_file(&p_osm->subn);
 
p_tbl = &p_osm->subn.port_guid_tbl;
p_next = cl_qmap_head(p_tbl);
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 47aa529..5478eae 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -1056,6 +1056,8 @@ static void subn_verify_qos_set(osm_qos_options_t *set, const char *prefix,
 
 int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
 {
+   osm_qos_options_t dflt;
+
if (p_opts->lmc > 7) {
log_report(" Invalid Cached Option Value:lmc = %u:"
   "Using Default:%u\n", p_opts->lmc, OSM_DEFAULT_LMC);
@@ -1099,17 +1101,15 @@ int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
p_opts->console = OSM_DEFAULT_CONSOLE;
}
 
-   if (p_opts->qos) {
-   osm_qos_options_t dflt;
-
-   /* the default options in qos_options must be correct.
-* every other one need not be, b/c those will default
-* back to whatever is in qos_options.
-*/
 
-   subn_set_default_qos_options(&dflt);
+   /* the default options in qos_options must be correct.
+* every other one need not be, b/c those will default
+* back to whatever is in qos_options.
+*/
+   subn_set_default_qos_options(&dflt);
+   subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt);
 
-   subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt);
+   if (p_opts->qos) {
subn_verify_qos_set(&p_opts->qos_ca_options, "qos_ca",
&p_opts->qos_options);
subn_verify_qos_set(&p_opts->qos_sw0_options, "qos_sw0",
-- 
1.6.6.1
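
The two conditions changed in osm_qos_setup() can be restated as a pair
of predicates. The sketch below uses hypothetical stand-in types
(demo_engine, demo_osm) rather than the real opensm structures, and is
only meant to show which combinations proceed with SL2VL setup and
which also read the QoS policy file.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the opensm structures. */
struct demo_engine {
	void (*update_sl2vl)(void);     /* non-NULL: engine programs SL2VL */
};

struct demo_osm {
	int qos;                                    /* set by -Q */
	const struct demo_engine *routing_engine_used;
};

static void demo_noop(void)
{
}

/* Nonzero when QoS/SL2VL setup should run at all. */
static int demo_qos_setup_needed(const struct demo_osm *osm)
{
	return osm->qos ||
	       (osm->routing_engine_used &&
		osm->routing_engine_used->update_sl2vl);
}

/* Nonzero when the QoS policy file should be parsed. */
static int demo_parse_policy_needed(const struct demo_osm *osm)
{
	return osm->qos != 0;
}
```

An engine that sets update_sl2vl gets its SL2VL maps programmed
without -Q, while the policy file is still read only when -Q is given.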




[PATCH v2 15/15] opensm: Cause status of unicast routing attempt to propogate to callers of osm_ucast_mgr_process().

2010-03-10 Thread Jim Schutt
If unicast routing fails, there is no point to continuing with fabric bring-up.
Just restart a new heavy sweep instead.

Signed-off-by: Jim Schutt 
---
 opensm/opensm/osm_state_mgr.c |   12 +---
 opensm/opensm/osm_ucast_mgr.c |   14 +-
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 96ad348..e666034 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1140,7 +1140,11 @@ static void do_sweep(osm_sm_t * sm)
/* Re-program the switches fully */
sm->p_subn->ignore_existing_lfts = TRUE;
 
-   osm_ucast_mgr_process(&sm->ucast_mgr);
+   if (osm_ucast_mgr_process(&sm->ucast_mgr)) {
+   OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
+   "REROUTE FAILED");
+   return;
+   }
osm_qos_setup(sm->p_subn->p_osm);
 
/* Reset flag */
@@ -1299,12 +1303,14 @@ repeat_discovery:
"LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE 
CONFIG");
 
/*
-* Proceed with unicast forwarding table configuration.
+* Proceed with unicast forwarding table configuration; repeat
+* if unicast routing fails.
 */
 
if (!sm->ucast_mgr.cache_valid ||
osm_ucast_cache_process(&sm->ucast_mgr))
-   osm_ucast_mgr_process(&sm->ucast_mgr);
+   if (osm_ucast_mgr_process(&sm->ucast_mgr))
+   goto repeat_discovery;
 
osm_qos_setup(sm->p_subn->p_osm);
 
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index fbc9244..8ea2e52 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -955,6 +955,7 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
osm_opensm_t *p_osm;
struct osm_routing_engine *p_routing_eng;
cl_qmap_t *p_sw_guid_tbl;
+   int failed = 0;
 
OSM_LOG_ENTER(p_mgr->p_log);
 
@@ -973,7 +974,8 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 
p_osm->routing_engine_used = NULL;
while (p_routing_eng) {
-   if (!ucast_mgr_route(p_routing_eng, p_osm))
+   failed = ucast_mgr_route(p_routing_eng, p_osm);
+   if (!failed)
break;
p_routing_eng = p_routing_eng->next;
}
@@ -984,9 +986,11 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
struct osm_routing_engine *r = p_osm->default_routing_engine;
 
r->build_lid_matrices(r->context);
-   r->ucast_build_fwd_tables(r->context);
-   p_osm->routing_engine_used = r;
-   osm_ucast_mgr_set_fwd_tables(p_mgr);
+   failed = r->ucast_build_fwd_tables(r->context);
+   if (!failed) {
+   p_osm->routing_engine_used = r;
+   osm_ucast_mgr_set_fwd_tables(p_mgr);
+   }
}
 
if (p_osm->routing_engine_used) {
@@ -1006,7 +1010,7 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 Exit:
CL_PLOCK_RELEASE(p_mgr->p_lock);
OSM_LOG_EXIT(p_mgr->p_log);
-   return 0;
+   return failed;
 }
 
 static int ucast_build_lid_matrices(void *context)
-- 
1.6.6.1
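
The control flow this patch introduces, where the status of the last
routing attempt is returned to the caller, can be sketched on its own.
The types and names below (demo_engine, demo_ucast_process) are
hypothetical, the minhop fallback is left out, and the sketch assumes
the list holds at least one engine.

```c
#include <stddef.h>

/* Hypothetical stand-in for a configured routing engine. */
struct demo_engine {
	int (*route)(void);             /* returns 0 on success */
	struct demo_engine *next;
};

static int demo_route_ok(void)
{
	return 0;
}

static int demo_route_fail(void)
{
	return -1;
}

/*
 * Try each engine in order. On success return 0 and report the engine
 * through *used; otherwise return the last failure status and leave
 * *used NULL so the caller can restart its sweep.
 */
static int demo_ucast_process(struct demo_engine *e,
			      struct demo_engine **used)
{
	int failed = 0;

	*used = NULL;
	for (; e; e = e->next) {
		failed = e->route();
		if (!failed) {
			*used = e;
			break;
		}
	}
	return failed;
}
```

A caller can then test the return value, as do_sweep() does in the
patch, instead of continuing with QoS setup after a failed reroute.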




[PATCH v2 13/15] opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of osm_port_t:priv.

2010-03-10 Thread Jim Schutt
Torus-2QoS makes persistent use of osm_port_t:priv to speed calculation
of path SL values.

It cannot clear osm_port_t:priv members when it tears down its persistent
data for the following reason: If a port is removed from the fabric, the
opensm core will delete the corresponding osm_port_t object, leaving
torus-2QoS holding a dangling reference.  Torus-2QoS then has a use-after-free
error when tearing down its persistent data if it tries to use its dangling
osm_port_t reference to clear the priv member.

When torus-2QoS is unable to route a fabric due to missing switches and
opensm is configured to fall back to minhop, havoc will ensue because
minhop uses a non-NULL osm_port_t:priv as a proxy for LMC > 0: it
assumes if osm_port_t:priv is non-NULL it can only be because
alloc_ports_priv() has been called.

Fix this up by always calling alloc_ports_priv(), and have it set
priv = NULL if LMC == 0.

Signed-off-by: Jim Schutt 
---
 opensm/opensm/osm_ucast_mgr.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index d7a4a8c..9a3ea25 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -314,8 +314,10 @@ static void alloc_ports_priv(osm_ucast_mgr_t * mgr)
 item = cl_qmap_next(item)) {
port = (osm_port_t *) item;
lmc = ib_port_info_get_lmc(&port->p_physp->port_info);
-   if (!lmc)
+   if (!lmc) {
+   port->priv = NULL;
continue;
+   }
r = malloc(sizeof(*r) + sizeof(r->guids[0]) * (1 << lmc));
if (!r) {
OSM_LOG(mgr->p_log, OSM_LOG_ERROR, "ERR 3A09: "
@@ -362,8 +364,7 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
/* Initialize LIDs in buffer to invalid port number. */
memset(p_sw->new_lft, OSM_NO_PATH, p_sw->max_lid_ho + 1);
 
-   if (p_mgr->p_subn->opt.lmc)
-   alloc_ports_priv(p_mgr);
+   alloc_ports_priv(p_mgr);
 
/*
   Iterate through every port setting LID routes for each
@@ -380,8 +381,7 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
}
}
 
-   if (p_mgr->p_subn->opt.lmc)
-   free_ports_priv(p_mgr);
+   free_ports_priv(p_mgr);
 
OSM_LOG_EXIT(p_mgr->p_log);
 }
-- 
1.6.6.1
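
The effect of always calling alloc_ports_priv() can be shown with a
small stand-alone sketch. The types and helpers here (demo_port,
demo_alloc_ports_priv, demo_free_ports_priv) are hypothetical, and the
size of the per-LID tracking entries is an assumption; the point is
only that a stale priv pointer left by another engine is replaced by
NULL when LMC is 0.

```c
#include <stdlib.h>

/* Hypothetical stand-in for osm_port_t. */
struct demo_port {
	unsigned lmc;
	void *priv;
};

/*
 * For every port: with LMC == 0 overwrite priv with NULL (the stale
 * pointer is not freed here, it belongs to whoever set it); otherwise
 * allocate a tracking array with one slot per LID, i.e. 1 << lmc.
 */
static void demo_alloc_ports_priv(struct demo_port *ports, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!ports[i].lmc) {
			ports[i].priv = NULL;
			continue;
		}
		ports[i].priv = calloc((size_t)1 << ports[i].lmc,
				       sizeof(unsigned long long));
	}
}

/* Release whatever demo_alloc_ports_priv() allocated. */
static void demo_free_ports_priv(struct demo_port *ports, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		free(ports[i].priv);    /* free(NULL) is a no-op */
		ports[i].priv = NULL;
	}
}
```

After the allocation pass, a non-NULL priv again implies LMC > 0, which
is the invariant minhop relies on.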




[PATCH v2 03/15] opensm: Allow the routing engine to participate in path SL calculations.

2010-03-10 Thread Jim Schutt
LASH already does this, in a hard-coded fashion.

Generalize this by adding a callback to struct osm_routing_engine that
computes a path SL value, and fix up LASH to use it.

This patchset causes the requested or QoS-computed SL value to be passed
to the routing engine path SL computation as a hint.  In the event the
routing engine's use of SLs allows it to support more than one QoS level,
it may be able to make use of the SL hint to do so.

For now, LASH just ignores the hint.

Note that before this change, if LASH was configured and a specific path
SL value was requested that differed from what LASH needed to route the
fabric without credit loops, the path SL lookup would fail.  Now LASH's
SL value is always used.

Possibly the choice between failing a path SL request when it conflicts
with routing, vs. always providing an SL value that gives a credit-loop-
free routing, should be user-configurable?
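
A rough sketch of the dispatch described above, assuming a cut-down engine struct and LIDs in place of osm_port_t pointers (struct routing_engine, fixed_path_sl() and lookup_path_sl() are illustrative names only): the requested SL is passed in as a hint, an engine such as LASH may ignore it, and engines without a callback leave the requested SL untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down version of struct osm_routing_engine. */
struct routing_engine {
	void *context;
	/* Optional: returns the SL to use; may honor or ignore the hint. */
	uint8_t (*path_sl)(void *context, uint8_t path_sl_hint,
			   uint16_t src_lid, uint16_t dst_lid);
};

/* An engine like LASH: routing dictates the SL, so the hint is ignored. */
uint8_t fixed_path_sl(void *context, uint8_t path_sl_hint,
		      uint16_t src_lid, uint16_t dst_lid)
{
	(void)path_sl_hint;
	(void)src_lid;
	(void)dst_lid;
	assert(context != NULL);
	return *(const uint8_t *)context;
}

/* Use the engine's SL when it provides a callback, else the requested SL. */
uint8_t lookup_path_sl(const struct routing_engine *re, uint8_t requested_sl,
		       uint16_t src_lid, uint16_t dst_lid)
{
	if (re && re->path_sl)
		return re->path_sl(re->context, requested_sl, src_lid, dst_lid);
	return requested_sl;
}
```

A user-configurable policy, as floated above, would slot into lookup_path_sl(): compare the callback's result with the request and fail instead of overriding.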

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_opensm.h |6 +
 opensm/include/opensm/osm_ucast_lash.h |3 --
 opensm/opensm/osm_link_mgr.c   |   15 -
 opensm/opensm/osm_sa_path_record.c |   34 +++
 opensm/opensm/osm_ucast_lash.c |8 +-
 5 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index 25a6f90..734a6db 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -129,6 +129,9 @@ struct osm_routing_engine {
void (*update_sl2vl)(void *context, IN osm_physp_t *port,
 IN uint8_t in_port_num, IN uint8_t out_port_num,
 IN OUT ib_slvl_table_t *t);
+   uint8_t (*path_sl)(void *context, IN uint8_t path_sl_hint,
+  IN const osm_port_t *src_port,
+  IN const osm_port_t *dst_port);
void (*delete) (void *context);
struct osm_routing_engine *next;
 };
@@ -159,6 +162,9 @@ struct osm_routing_engine {
 *  which part of the SL2VL map to update.  For router/HCA ports,
 *  in_port_num/out_port_num should be ignored.
 *
+*  path_sl
+*  The callback for computing path SL.
+*
 *  delete
 *  The delete method, may be used for routing engine
 *  internals cleanup.
diff --git a/opensm/include/opensm/osm_ucast_lash.h b/opensm/include/opensm/osm_ucast_lash.h
index 9e15d38..dd90d5d 100644
--- a/opensm/include/opensm/osm_ucast_lash.h
+++ b/opensm/include/opensm/osm_ucast_lash.h
@@ -94,7 +94,4 @@ typedef struct _lash {
int ***virtual_location;
 } lash_t;
 
-uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, const osm_port_t * p_src_port,
-   const osm_port_t * p_dst_port);
-
 #endif
diff --git a/opensm/opensm/osm_link_mgr.c b/opensm/opensm/osm_link_mgr.c
index aaeebc7..02d6ec8 100644
--- a/opensm/opensm/osm_link_mgr.c
+++ b/opensm/opensm/osm_link_mgr.c
@@ -53,21 +53,23 @@
 #include 
 #include 
 #include 
-#include 
 
 static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 {
osm_opensm_t *p_osm = sm->p_subn->p_osm;
+   struct osm_routing_engine *re = p_osm->routing_engine_used;
const osm_port_t *p_sm_port, *p_src_port;
ib_net16_t slid, smlid;
uint8_t sl;
 
OSM_LOG_ENTER(sm->p_log);
 
-   if (!(p_osm->routing_engine_used &&
- p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH &&
+   if (!(re && re->path_sl &&
  (slid = osm_physp_get_base_lid(p_physp)))) {
-   /* Use default SL if lash routing is not used */
+   /*
+* Use default SL if routing engine does not provide a
+* path SL lookup callback.
+*/
OSM_LOG_EXIT(sm->p_log);
return sm->p_subn->opt.sm_sl;
}
@@ -81,8 +83,9 @@ static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
p_src_port =
cl_ptr_vector_get(&sm->p_subn->port_lid_tbl, cl_ntoh16(slid));
 
-   /* Call lash to find proper SL */
-   sl = osm_get_lash_sl(p_osm, p_src_port, p_sm_port);
+   /* Call into routing engine to find proper SL */
+   sl = re->path_sl(re->context, sm->p_subn->opt.sm_sl,
+p_src_port, p_sm_port);
 
OSM_LOG_EXIT(sm->p_log);
return sl;
diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
index d88832b..b55d94c 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -161,6 +161,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
const osm_physp_t *p_dest_physp;
const osm_prtn_t *p_prtn = NULL;
osm_opensm_t *p_osm;
+   struct osm_routing_engine *p_re;
const ib_port_info_t *p_pi;
ib_api_status_t status = IB_SUCCESS;
ib_net16_t 

[PATCH v2 09/15] opensm: Enable torus-2QoS routing engine.

2010-03-10 Thread Jim Schutt

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_opensm.h |1 +
 opensm/opensm/main.c   |2 +-
 opensm/opensm/osm_opensm.c |6 ++
 3 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index fddcf53..8d63111 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -105,6 +105,7 @@ typedef enum _osm_routing_engine_type {
OSM_ROUTING_ENGINE_TYPE_FTREE,
OSM_ROUTING_ENGINE_TYPE_LASH,
OSM_ROUTING_ENGINE_TYPE_DOR,
+   OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS,
OSM_ROUTING_ENGINE_TYPE_UNKNOWN
 } osm_routing_engine_type_t;
 /***/
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index f9a33af..f396de4 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -174,7 +174,7 @@ static void show_usage(void)
   "  Min Hop algorithm.  Multiple routing engines can be specified\n"
   "  separated by commas so that specific ordering of routing\n"
   "  algorithms will be tried if earlier routing engines fail.\n"
-  "  Supported engines: updn, file, ftree, lash, dor\n\n");
+  "  Supported engines: updn, file, ftree, lash, dor, torus-2QoS\n\n");
printf("--do_mesh_analysis\n"
   "  This option enables additional analysis for the lash\n"
   "  routing engine to precondition switch port assignments\n"
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index be1f153..10d3af5 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -70,6 +70,7 @@ extern int osm_ucast_file_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_ftree_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_lash_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_dor_setup(struct osm_routing_engine *, osm_opensm_t *);
+extern int osm_ucast_torus2QoS_setup(struct osm_routing_engine *, osm_opensm_t *);
 
 const static struct routing_engine_module routing_modules[] = {
{"minhop", osm_ucast_minhop_setup},
@@ -78,6 +79,7 @@ const static struct routing_engine_module routing_modules[] = {
{"ftree", osm_ucast_ftree_setup},
{"lash", osm_ucast_lash_setup},
{"dor", osm_ucast_dor_setup},
+   {"torus-2QoS", osm_ucast_torus2QoS_setup},
{NULL, NULL}
 };
 
@@ -98,6 +100,8 @@ const char *osm_routing_engine_type_str(IN osm_routing_engine_type_t type)
return "lash";
case OSM_ROUTING_ENGINE_TYPE_DOR:
return "dor";
+   case OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS:
+   return "torus-2QoS";
default:
break;
}
@@ -124,6 +128,8 @@ osm_routing_engine_type_t osm_routing_engine_type(IN const char *str)
return OSM_ROUTING_ENGINE_TYPE_LASH;
else if (!strcasecmp(str, "dor"))
return OSM_ROUTING_ENGINE_TYPE_DOR;
+   else if (!strcasecmp(str, "torus-2QoS"))
+   return OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS;
else
return OSM_ROUTING_ENGINE_TYPE_UNKNOWN;
 }
-- 
1.6.6.1




[PATCH v2 00/15] opensm: Add new torus routing engine: torus-2QoS

2010-03-10 Thread Jim Schutt
This is v2 of a patchset to add to opensm a new routing engine designed
to handle large fabrics connected with a 2D/3D topology.

Changes since initial version:

- Merged my patchsets from 11/20/2009, 12/18/2009, 2/16/2010.
- Moved information contained in the earlier patch series introduction
emails into the appropriate commit messages.
- Rebased to c183eb8c4c.
- Addressed issues found by Yevgeny Kliteynik in original patchsets.
Yevgeny's --no_default_routing option patch is not included
in the merging, but would be a good addition.
- Renamed osm_ucast_torus.c to osm_torus.c.
Since osm_torus.c contains code to implement both unicast and
multicast routing, the new name seems more appropriate.  The
multicast support depends heavily on the unicast routing code,
so it is more convenient to keep everything in one file.
- Removed redundant check for changed sl2vl map.
This functionality already exists in sl2vl_update_table().
- Set sl2vl maps on CA ports for torus-2QoS.
This was missing in the original patches.
- Do not force torus-2QoS to use SLs 8-15 when not using "opensm -Q".
This was an interim measure introduced before multicast support was
working, that allowed multicast to use SL/VL 0 and thus not deadlock
against unicast.  I forgot to take it out in the multicast patchset,
so I took it out when I merged.
- Renamed torus variables referencing "origin" to "seed".
These things refer to switches used to seed the torus topology
appropriately, so the new name should reduce confusion going forward.
This also contains a keyword change in the torus configuration file,
so I'll repost an updated example.

Jim Schutt (15):
  opensm: Prepare for routing engine input to path record SL lookup and
SL2VL map setup.
  opensm: Allow the routing engine to influence SL2VL calculations.
  opensm: Allow the routing engine to participate in path SL
calculations.
  opensm: Track the minimum value in the fabric of data VLs supported.
  opensm: Add struct osm_routing_engine callback to build spanning
trees for multicast.
  opensm: Make mcast_mgr_purge_tree() available outside
osm_mcast_mgr.c.
  opensm: Add torus-2QoS routing engine.
  opensm: Update documentation to describe torus-2QoS.
  opensm: Enable torus-2QoS routing engine.
  opensm: Add opensm option to specify file name for extra torus-2QoS
configuration information.
  opensm: Do not require -Q option for torus-2QoS routing engine.
  opensm: Make it possible to configure no fallback routing engine.
  opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of
osm_port_t:priv.
  opensm: Avoid havoc in dump_ucast_routes() caused by torus-2QoS
persistent use of osm_port_t:priv.
  opensm: Cause status of unicast routing attempt to propagate to
callers of osm_ucast_mgr_process().

 opensm/doc/current-routing.txt |  269 +-
 opensm/include/opensm/osm_base.h   |   18 +
 opensm/include/opensm/osm_multicast.h  |   33 +
 opensm/include/opensm/osm_opensm.h |   29 +-
 opensm/include/opensm/osm_subnet.h |7 +
 opensm/include/opensm/osm_switch.h |   12 +
 opensm/include/opensm/osm_ucast_lash.h |3 -
 opensm/man/opensm.8.in |9 +-
 opensm/opensm/Makefile.am  |2 +-
 opensm/opensm/main.c   |   11 +-
 opensm/opensm/osm_console.c|   10 +-
 opensm/opensm/osm_dump.c   |5 +-
 opensm/opensm/osm_link_mgr.c   |   16 +-
 opensm/opensm/osm_mcast_mgr.c  |   11 +-
 opensm/opensm/osm_opensm.c |   54 +-
 opensm/opensm/osm_port_info_rcv.c  |   13 +-
 opensm/opensm/osm_qos.c|   40 +-
 opensm/opensm/osm_sa_path_record.c |   33 +-
 opensm/opensm/osm_state_mgr.c  |   23 +-
 opensm/opensm/osm_subnet.c |   20 +-
 opensm/opensm/osm_switch.c |7 +-
 opensm/opensm/osm_torus.c  | 9114 
 opensm/opensm/osm_ucast_lash.c |   11 +-
 opensm/opensm/osm_ucast_mgr.c  |   55 +-
 24 files changed, 9696 insertions(+), 109 deletions(-)
 create mode 100644 opensm/opensm/osm_torus.c




[PATCH v2 10/15] opensm: Add opensm option to specify file name for extra torus-2QoS configuration information.

2010-03-10 Thread Jim Schutt

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_base.h   |   18 ++
 opensm/include/opensm/osm_subnet.h |5 +
 opensm/opensm/main.c   |9 +
 opensm/opensm/osm_subnet.c |1 +
 opensm/opensm/osm_torus.c  |2 +-
 5 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h
index 4e9aaa9..8720c38 100644
--- a/opensm/include/opensm/osm_base.h
+++ b/opensm/include/opensm/osm_base.h
@@ -277,6 +277,24 @@ BEGIN_C_DECLS
 #endif /* __WIN__ */
 /***/
 
+/d* OpenSM: Base/OSM_DEFAULT_TORUS_CONF_FILE
+* NAME
+*  OSM_DEFAULT_TORUS_CONF_FILE
+*
+* DESCRIPTION
+*  Specifies the default file name for extra torus-2QoS configuration
+*
+* SYNOPSIS
+*/
+#ifdef __WIN__
+#define OSM_DEFAULT_TORUS_CONF_FILE strcat(GetOsmCachePath(), "osm-torus-2QoS.conf")
+#elif defined(OPENSM_CONFIG_DIR)
+#define OSM_DEFAULT_TORUS_CONF_FILE OPENSM_CONFIG_DIR "/torus-2QoS.conf"
+#else
+#define OSM_DEFAULT_TORUS_CONF_FILE "/etc/opensm/torus-2QoS.conf"
+#endif /* __WIN__ */
+/***/
+
 /d* OpenSM: Base/OSM_DEFAULT_PREFIX_ROUTES_FILE
 * NAME
 *  OSM_DEFAULT_PREFIX_ROUTES_FILE
diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index d74a57c..d2d9661 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -201,6 +201,7 @@ typedef struct osm_subn_opt {
char *guid_routing_order_file;
char *sa_db_file;
boolean_t sa_db_dump;
+   char *torus_conf_file;
boolean_t do_mesh_analysis;
boolean_t exit_on_fatal;
boolean_t honor_guid2lid_file;
@@ -418,6 +419,10 @@ typedef struct osm_subn_opt {
 *  When TRUE causes OpenSM to dump SA DB at the end of every
 *  light sweep regardless the current verbosity level.
 *
+*  torus_conf_file
+*  Name of the file with extra configuration info for torus-2QoS
+*  routing engine.
+*
 *  exit_on_fatal
 *  If TRUE (default) - SM will exit on fatal subnet initialization
 *  issues.
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index f396de4..578ae9f 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -231,6 +231,10 @@ static void show_usage(void)
   "  Set the order port guids will be routed for the MinHop\n"
   "  and Up/Down routing algorithms to the guids provided in the\n"
   "  given file (one to a line)\n\n");
+   printf("--torus_config \n"
+  "  This option defines the file name for the extra configuration\n"
+  "  info needed for the torus-2QoS routing engine.   The default\n"
+  "  name is \'"OSM_DEFAULT_TORUS_CONF_FILE"\'\n\n");
printf("--once, -o\n"
   "  This option causes OpenSM to configure the subnet\n"
   "  once, then exit.  Ports remain in the ACTIVE state.\n\n");
@@ -610,6 +614,7 @@ int main(int argc, char *argv[])
{"sm_sl", 1, NULL, 7},
{"retries", 1, NULL, 8},
{"log_prefix", 1, NULL, 9},
+   {"torus_config", 1, NULL, 10},
{NULL, 0, NULL, 0}  /* Required at the end of the array */
};
 
@@ -992,6 +997,10 @@ int main(int argc, char *argv[])
SET_STR_OPT(opt.log_prefix, optarg);
printf("Log prefix = %s\n", opt.log_prefix);
break;
+   case 10:
+   SET_STR_OPT(opt.torus_conf_file, optarg);
+   printf("Torus-2QoS config file = %s\n", opt.torus_conf_file);
+   break;
case 'h':
case '?':
case ':':
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 55b9384..47aa529 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -758,6 +758,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
p_opt->guid_routing_order_file = NULL;
p_opt->sa_db_file = NULL;
p_opt->sa_db_dump = FALSE;
+   p_opt->torus_conf_file = strdup(OSM_DEFAULT_TORUS_CONF_FILE);
p_opt->do_mesh_analysis = FALSE;
p_opt->exit_on_fatal = TRUE;
p_opt->enable_quirks = FALSE;
diff --git a/opensm/opensm/osm_torus.c b/opensm/opensm/osm_torus.c
index 7f80034..7c3b550 100644
--- a/opensm/opensm/osm_torus.c
+++ b/opensm/opensm/osm_torus.c
@@ -9043,7 +9043,7 @@ int torus_build_lfts(void *context)
torus->osm = ctx->osm;
fabric->osm = ctx->osm;
 
-   if (!parse_config(OPENSM_CONFIG_DIR "/opensm-torus.conf",
+   if (!parse_config(ctx->osm->subn.opt.torus_conf_file,
  fabric, torus))
goto out;
 
-- 
1.6.6.1



[PATCH v2 02/15] opensm: Allow the routing engine to influence SL2VL calculations.

2010-03-10 Thread Jim Schutt
Note that the original code assumes that QoS setup is mostly static and
based only on user configuration.  As a result, there is no provision for
routing engines that want to compute contributions to the SL2VL maps.

Fix this up by adding a callback to struct osm_routing_engine that computes
a per-port SL2VL map, and call it from the appropriate place in the QoS
setup path.  Assume that if a routing engine provides an update_sl2vl()
callback, there will be input-port dependence in the SL2VL maps, and
so do not attempt to use optimized SL2VL map programming even if the
switch supports it.

Also need to move the call to osm_qos_setup() in do_sweep() to after the
call to the routing engine, so that any SL2VL map contributions from the
routing engine are based on the latest information.  Need to call
osm_qos_setup() for requested reroute for the same reason.
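
The per-port flow described above might look roughly like the following sketch. It assumes a flat, hypothetical struct slvl_table with one VL per SL (not the packed ib_slvl_table_t) and an invented example hook, parity_update_sl2vl(): the user-configured table is copied, and the engine, if it provides a hook, adjusts the copy for one (in_port, out_port) pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SL 16

/* Hypothetical flat SL2VL table: one VL per SL. */
struct slvl_table {
	uint8_t vl[NUM_SL];
};

/* Hypothetical, cut-down routing engine with only the SL2VL hook. */
struct routing_engine {
	void *context;
	void (*update_sl2vl)(void *context, uint8_t in_port, uint8_t out_port,
			     struct slvl_table *t);
};

/* Invented example hook: VL bit 0 follows the output port's parity. */
void parity_update_sl2vl(void *context, uint8_t in_port, uint8_t out_port,
			 struct slvl_table *t)
{
	int sl;

	(void)context;
	(void)in_port;
	for (sl = 0; sl < NUM_SL; sl++)
		t->vl[sl] = (uint8_t)((t->vl[sl] & ~1u) | (out_port & 1u));
}

/*
 * Build the table to program for one (in_port, out_port) pair: start from
 * the user-configured table, then let the engine adjust a private copy.
 */
struct slvl_table effective_sl2vl(const struct routing_engine *re,
				  const struct slvl_table *configured,
				  uint8_t in_port, uint8_t out_port)
{
	struct slvl_table t;

	assert(configured != NULL);
	t = *configured;
	if (re && re->update_sl2vl)
		re->update_sl2vl(re->context, in_port, out_port, &t);
	return t;
}
```

Working on a copy keeps the configured table intact, so each port pair starts from the same baseline. Once a hook exists the result can differ per port pair, which is why the optimized single-table programming is skipped.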

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_opensm.h |   12 
 opensm/opensm/osm_qos.c|   27 +++
 opensm/opensm/osm_state_mgr.c  |5 +++--
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index e97142e..25a6f90 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -126,6 +126,9 @@ struct osm_routing_engine {
int (*build_lid_matrices) (void *context);
int (*ucast_build_fwd_tables) (void *context);
void (*ucast_dump_tables) (void *context);
+   void (*update_sl2vl)(void *context, IN osm_physp_t *port,
+IN uint8_t in_port_num, IN uint8_t out_port_num,
+IN OUT ib_slvl_table_t *t);
void (*delete) (void *context);
struct osm_routing_engine *next;
 };
@@ -147,6 +150,15 @@ struct osm_routing_engine {
 *  ucast_dump_tables
 *  The callback for dumping unicast routing tables.
 *
+*  update_sl2vl(void *context, IN osm_physp_t *port,
+*   IN uint8_t in_port_num, IN uint8_t out_port_num,
+*   OUT ib_slvl_table_t *t)
+*  The callback to allow routing engine input for SL2VL maps.
+*  *port is the physical port for which the SL2VL map is to be
+*  updated. For switches, in_port_num/out_port_num identify
+*  which part of the SL2VL map to update.  For router/HCA ports,
+*  in_port_num/out_port_num should be ignored.
+*
 *  delete
 *  The delete method, may be used for routing engine
 *  internals cleanup.
diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index f814ea8..23fd316 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -207,6 +207,7 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node,
osm_physp_t *p0, *p;
unsigned force_update;
unsigned num_ports = osm_node_get_num_physp(node);
+   struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
int ret = 0;
unsigned i, j;
 
@@ -223,7 +224,7 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node,
return ret;
 
if (ib_switch_info_get_opt_sl2vlmapping(&node->sw->switch_info) &&
-   sm->p_subn->opt.use_optimized_slvl) {
+   sm->p_subn->opt.use_optimized_slvl && !re->update_sl2vl) {
p = osm_node_get_physp_ptr(node, 1);
force_update = p->need_update || sm->p_subn->need_update;
return sl2vl_update_table(sm, p, 1, 0x3, force_update,
@@ -233,10 +234,20 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node,
for (i = 1; i < num_ports; i++) {
p = osm_node_get_physp_ptr(node, i);
force_update = p->need_update || sm->p_subn->need_update;
-   for (j = 0; j < num_ports; j++)
+   for (j = 0; j < num_ports; j++) {
+   const ib_slvl_table_t *port_sl2vl = &qcfg->sl2vl;
+   ib_slvl_table_t routing_sl2vl;
+
+   if (re->update_sl2vl) {
+   routing_sl2vl = *port_sl2vl;
+   re->update_sl2vl(re->context,
+p, i, j, &routing_sl2vl);
+   port_sl2vl = &routing_sl2vl;
+   }
if (sl2vl_update_table(sm, p, i, i << 8 | j,
-  force_update, &qcfg->sl2vl))
+  force_update, port_sl2vl))
ret = -1;
+   }
}
 
return ret;
@@ -246,6 +257,9 @@ static int qos_endport_setup(osm_sm_t * sm, osm_physp_t * p,
 const struct qos_config *qcfg)
 {
unsigned force_update = p->need_update || sm->p_subn->need_update;
+   struct osm_routing_engine *re = sm->p_

[PATCH v2 08/15] opensm: Update documentation to describe torus-2QoS.

2010-03-10 Thread Jim Schutt

Signed-off-by: Jim Schutt 
---
 opensm/doc/current-routing.txt |  269 +++-
 opensm/man/opensm.8.in |9 ++-
 2 files changed, 275 insertions(+), 3 deletions(-)

diff --git a/opensm/doc/current-routing.txt b/opensm/doc/current-routing.txt
index 1302860..78a2e01 100644
--- a/opensm/doc/current-routing.txt
+++ b/opensm/doc/current-routing.txt
@@ -1,7 +1,7 @@
 Current OpenSM Routing
-7/9/07
+10/9/09
 
-OpenSM offers five routing engines:
+OpenSM offers six routing engines:
 
 1.  Min Hop Algorithm - based on the minimum hops to each node where the
 path length is optimized.
@@ -28,6 +28,13 @@ two switches.  This provides deadlock free routes for hypercubes when
 the fabric is cabled as a hypercube and for meshes when cabled as a
 mesh (see details below).
 
+6. Torus-2QoS unicast routing algorithm - a DOR-based routing algorithm
+specialized for 2D/3D torus topologies.  Torus-2QoS provides deadlock-free
+routing while supporting two quality of service (QoS) levels.  In addition
+it is able to route around multiple failed fabric links or a single failed
+fabric switch without introducing deadlocks, and without changing path SL
+values granted before the failure.
+
 OpenSM provides an optional unicast routing cache (enabled by -A or
 --ucast_cache options). When enabled, unicast routing cache prevents
 routing recalculation (which is a heavy task in a large cluster) when
@@ -388,3 +395,261 @@ ports, one port on one end of the cable, and the other port on the
 other end, continuing along the mesh dimension.
 
 Use '-R dor' option to activate the DOR algorithm.
+
+Torus-2QoS Routing Algorithm
+
+
+Torus-2QoS is a routing algorithm designed for large-scale 2D/3D torus fabrics.
+The torus-2QoS routing engine can provide the following functionality on
+a 2D/3D torus:
+- routing that is free of credit loops
+- two levels of QoS, assuming switches support 8 data VLs
+- ability to route around a single failed switch, and/or multiple failed
+  links, without
+    - introducing credit loops
+    - changing path SL values
+- very short run times, with good scaling properties as fabric size
+  increases
+
+Torus-2QoS is a DOR-based algorithm that avoids deadlocks that would otherwise
+occur in a torus using the concept of a dateline for each torus dimension.
+It encodes into a path SL which datelines the path crosses as follows:
+
+  sl = 0;
+  for (d = 0; d < torus_dimensions; d++)
+    /* path_crosses_dateline(d) returns 0 or 1 */
+    sl |= path_crosses_dateline(d) << d;
+
+For a 3D torus, that leaves one SL bit free, which torus-2QoS uses to
+implement two QoS levels.
+
+This is possible because torus-2QoS also makes use of the output port
+dependence of the switch SL2VL maps.  It computes in which torus coordinate
+direction each interswitch link "points", and writes SL2VL maps for such
+ports as follows:
+
+  for (sl = 0; sl < 16; sl++)
+    /* cdir(port) reports which torus coordinate direction a switch port
+     * "points" in, and returns 0, 1, or 2 */
+    sl2vl(iport,oport,sl) = 0x1 & (sl >> cdir(oport));
+
+Thus torus-2QoS consumes 8 SL values (SL bits 0-2) and 2 VL values (VL bit 0)
+per QoS level to provide deadlock-free routing on a 3D torus.
+
+Torus-2QoS routes around link failure by "taking the long way around" any
+1D ring interrupted by a link failure.  For example, consider the 2D 6x5
+torus below, where switches are denoted by [+a-zA-Z]:
+
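+          |    |    |    |    |    |
+   4    --+----+----+----+----+----+--
+          |    |    |    |    |    |
+   3    --+----+----+----D----+----+--
+          |    |    |    |    |    |
+   2    --+----+----I----r----+----+--
+          |    |    |    |    |    |
+   1    --m----S----n----T----o----p--
+          |    |    |    |    |    |
+ y=0    --+----+----+----+----+----+--
+          |    |    |    |    |    |
+
+        x=0    1    2    3    4    5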
+
+For a pristine fabric the path from S to D would be S-n-T-r-D.  In the
+event that either link S-n or n-T has failed, torus-2QoS would use the path
+S-m-p-o-T-r-D.  Note that it can do this without changing the path SL
+value; once the 1D ring m-S-n-T-o-p-m has been broken by failure, path
+segments using it cannot contribute to deadlock, and the x-direction
+dateline (between, say, x=5 and x=0) can be ignored for path segments on
+that ring.
+
+One result of this is that torus-2QoS can route around many simultaneous
+link failures, as long as no 1D ring is broken into disjoint regions.  For
+example, if links n-T and T-o have both failed, that ring has been broken
+into two disjoint regions, T and o-p-m-S-n.  Torus-2QoS checks for such
+issues, reports if they are found, and refuses to route such fabrics.
+
+Handling a failed switch under DOR requires introducing into a path at
+least one turn that would be otherwise "illegal", i.e. not allowed by DOR
+rules.  Torus-2QoS will introduce such a turn as close as possible to the
+failed switch in order 
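
The two pseudocode fragments in the documentation patch above (dateline encoding into the SL, and the per-output-port SL2VL rule) can be made concrete as a small sketch. The function names dateline_sl() and dateline_vl() are illustrative and not taken from osm_torus.c, and the QoS bit is left out because its VL mapping is not spelled out in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

#define TORUS_DIMENSIONS 3

/*
 * Encode which datelines a path crosses into SL bits 0-2.
 * crosses[d] must be 0 or 1 for dimension d (x, y, z).
 */
uint8_t dateline_sl(const int crosses[TORUS_DIMENSIONS])
{
	uint8_t sl = 0;
	int d;

	for (d = 0; d < TORUS_DIMENSIONS; d++) {
		assert(crosses[d] == 0 || crosses[d] == 1);
		sl |= (uint8_t)(crosses[d] << d);
	}
	return sl;
}

/*
 * VL bit 0 for an output port pointing in coordinate direction cdir
 * (0, 1 or 2): set only if the path crosses that direction's dateline.
 */
uint8_t dateline_vl(uint8_t sl, int cdir)
{
	assert(cdir >= 0 && cdir < TORUS_DIMENSIONS);
	return 0x1 & (sl >> cdir);
}
```

For a path that crosses the x and z datelines but not the y dateline, the SL is binary 101; ports pointing in x or z then select VL 1, while ports pointing in y select VL 0.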

[PATCH v2 01/15] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup.

2010-03-10 Thread Jim Schutt
In the event a routing engine needs to participate in SL assignment and
SL2VL map setup in order to avoid credit loops in a fabric, it will be
useful to make the routing engine context more widely available.

To this end, have osm_opensm_t save a pointer to the routing engine used,
rather than its type.  This will make the routing engine context easily
available in, e.g., sl2vl_update() and pr_rcv_get_path_parms().

Make the necessary adjustments to the code that used the old
routing_engine_used as an enum _osm_routing_engine_type.  In order to
keep the behavior where minhop was used if the configured routing engines
failed, the easiest solution was to add a pointer to osm_opensm_t which
pointed to the minhop struct osm_routing_engine.
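
As a rough illustration of the change in representation, assuming cut-down hypothetical types (struct opensm, record_engine_used() and engine_type_used() are not opensm names): once routing_engine_used is a pointer, callers that only want the type need a NULL check, and the fallback to minhop becomes a pointer assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down types; the real ones live in osm_opensm.h. */
enum engine_type { ENGINE_NONE, ENGINE_MINHOP, ENGINE_LASH };

struct routing_engine {
	enum engine_type type;
	void *context;
};

struct opensm {
	struct routing_engine *routing_engine_used;	/* NULL until routed */
	struct routing_engine *default_routing_engine;	/* the minhop engine */
};

/* Record which engine routed the fabric, falling back to the default. */
void record_engine_used(struct opensm *osm, struct routing_engine *configured,
			int configured_ok)
{
	assert(osm != NULL);
	osm->routing_engine_used = (configured && configured_ok) ?
		configured : osm->default_routing_engine;
}

/* Callers that only care about the type must now tolerate a NULL pointer. */
enum engine_type engine_type_used(const struct opensm *osm)
{
	assert(osm != NULL);
	return osm->routing_engine_used ?
		osm->routing_engine_used->type : ENGINE_NONE;
}
```

The gain is that osm->routing_engine_used->context is reachable from anywhere that holds the osm pointer, which an enum alone could not offer.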

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_opensm.h |4 ++-
 opensm/opensm/osm_console.c|   10 ++--
 opensm/opensm/osm_dump.c   |3 +-
 opensm/opensm/osm_link_mgr.c   |5 ++-
 opensm/opensm/osm_opensm.c |   43 +---
 opensm/opensm/osm_sa_path_record.c |3 +-
 opensm/opensm/osm_ucast_lash.c |3 +-
 opensm/opensm/osm_ucast_mgr.c  |   17 --
 8 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index c6c9bdb..e97142e 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -120,6 +120,7 @@ typedef enum _osm_routing_engine_type {
 *  added later.
 */
 struct osm_routing_engine {
+   osm_routing_engine_type_t type;
const char *name;
void *context;
int (*build_lid_matrices) (void *context);
@@ -183,7 +184,8 @@ typedef struct osm_opensm {
cl_dispatcher_t disp;
cl_plock_t lock;
struct osm_routing_engine *routing_engine_list;
-   osm_routing_engine_type_t routing_engine_used;
+   struct osm_routing_engine *routing_engine_used;
+   struct osm_routing_engine *default_routing_engine;
osm_stats_t stats;
osm_console_t console;
nn_map_t *node_name_map;
diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c
index a27bee3..31394a7 100644
--- a/opensm/opensm/osm_console.c
+++ b/opensm/opensm/osm_console.c
@@ -372,6 +372,8 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
cl_list_item_t *item;
 
if (out) {
+   const char *re_str;
+
cl_plock_acquire(&p_osm->lock);
fprintf(out, "   OpenSM Version   : %s\n", p_osm->osm_version);
fprintf(out, "   SM State : %s\n",
@@ -380,9 +382,11 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
p_osm->subn.opt.sm_priority);
fprintf(out, "   SA State : %s\n",
sa_state_str(p_osm->sa.state));
-   fprintf(out, "   Routing Engine   : %s\n",
-   osm_routing_engine_type_str(p_osm->
-   routing_engine_used));
+
+   re_str = p_osm->routing_engine_used ?
+   osm_routing_engine_type_str(p_osm->routing_engine_used->type) :
+   osm_routing_engine_type_str(OSM_ROUTING_ENGINE_TYPE_NONE);
+   fprintf(out, "   Routing Engine   : %s\n", re_str);
 
fprintf(out, "   Loaded event plugins :");
if (cl_qlist_head(&p_osm->plugin_list) ==
diff --git a/opensm/opensm/osm_dump.c b/opensm/opensm/osm_dump.c
index 86e9c00..f3f4623 100644
--- a/opensm/opensm/osm_dump.c
+++ b/opensm/opensm/osm_dump.c
@@ -135,7 +135,8 @@ static void dump_ucast_routes(cl_map_item_t * item, FILE * file, void *cxt)
"Switch 0x%016" PRIx64 "\nLID: Port : Hops : Optimal\n",
cl_ntoh64(osm_node_get_node_guid(p_node)));
 
-   dor = (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_DOR);
+   dor = (p_osm->routing_engine_used &&
+  p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_DOR);
 
for (lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++) {
fprintf(file, "0x%04X : ", lid_ho);
diff --git a/opensm/opensm/osm_link_mgr.c b/opensm/opensm/osm_link_mgr.c
index 03a585b..aaeebc7 100644
--- a/opensm/opensm/osm_link_mgr.c
+++ b/opensm/opensm/osm_link_mgr.c
@@ -64,8 +64,9 @@ static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 
OSM_LOG_ENTER(sm->p_log);
 
-   if (p_osm->routing_engine_used != OSM_ROUTING_ENGINE_TYPE_LASH
-   || !(slid = osm_physp_get_base_lid(p_physp))) {
+   if (!(p_osm->routing_engine_used &&
+ p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH &&
+ (slid = osm_physp_get_base_lid(p_physp)))) {
/* Use default SL if lash routing is not used */
OSM_LOG_EXIT(sm->p_log);
return sm->p_subn

[PATCH v2 05/15] opensm: Add struct osm_routing_engine callback to build spanning trees for multicast.

2010-03-10 Thread Jim Schutt
If a routing engine needs to compute spanning trees with special
properties, it needs a way to override the default implementation.
A routing engine callback provides that mechanism.  Routing engines
that can use the default implementation can leave the callback
pointer set to NULL.
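
A sketch of the override mechanism, with hypothetical stand-ins for osm_mgrp_box_t and the status type (process_mlid(), default_build_stree() and engine_build_stree() are illustrative names): the engine's builder runs when the pointer is set, the default builder otherwise, and either way the status is passed back to the caller.

```c
#include <assert.h>
#include <stddef.h>

enum status { STATUS_SUCCESS, STATUS_ERROR };

/* Hypothetical stand-in for osm_mgrp_box_t. */
struct mgrp_box {
	unsigned mlid;
	int built_by_engine;	/* records which builder ran, for illustration */
};

/* Hypothetical, cut-down routing engine with only the multicast hook. */
struct routing_engine {
	void *context;
	enum status (*mcast_build_stree)(void *context, struct mgrp_box *mgb);
};

/* Stand-in for the default spanning tree builder. */
enum status default_build_stree(struct mgrp_box *mgb)
{
	mgb->built_by_engine = 0;
	return STATUS_SUCCESS;
}

/* Example engine-specific builder. */
enum status engine_build_stree(void *context, struct mgrp_box *mgb)
{
	(void)context;
	mgb->built_by_engine = 1;
	return STATUS_SUCCESS;
}

/* Per-MLID processing: prefer the engine's builder when one is provided. */
enum status process_mlid(const struct routing_engine *re, struct mgrp_box *mgb)
{
	assert(mgb != NULL);
	if (re && re->mcast_build_stree)
		return re->mcast_build_stree(re->context, mgb);
	return default_build_stree(mgb);
}
```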

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_opensm.h |6 ++
 opensm/opensm/osm_mcast_mgr.c  |7 ++-
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index 734a6db..fddcf53 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -132,6 +132,8 @@ struct osm_routing_engine {
uint8_t (*path_sl)(void *context, IN uint8_t path_sl_hint,
   IN const osm_port_t *src_port,
   IN const osm_port_t *dst_port);
+   ib_api_status_t (*mcast_build_stree)(void *context,
+IN OUT osm_mgrp_box_t *mgb);
void (*delete) (void *context);
struct osm_routing_engine *next;
 };
@@ -165,6 +167,10 @@ struct osm_routing_engine {
 *  path_sl
 *  The callback for computing path SL.
 *
+*  mcast_build_stree
+*  The callback for building the spanning tree for multicast
+*  forwarding, called per MLID.
+*
 *  delete
 *  The delete method, may be used for routing engine
 *  internals cleanup.
diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c
index 322635d..bd67d4e 100644
--- a/opensm/opensm/osm_mcast_mgr.c
+++ b/opensm/opensm/osm_mcast_mgr.c
@@ -986,6 +986,7 @@ Exit:
 static ib_api_status_t mcast_mgr_process_mlid(osm_sm_t * sm, uint16_t mlid)
 {
ib_api_status_t status = IB_SUCCESS;
+   struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
osm_mgrp_box_t *mbox;
 
OSM_LOG_ENTER(sm->p_log);
@@ -1000,7 +1001,11 @@ static ib_api_status_t mcast_mgr_process_mlid(osm_sm_t * sm, uint16_t mlid)
 
mbox = osm_get_mbox_by_mlid(sm->p_subn, cl_hton16(mlid));
if (mbox) {
-   status = mcast_mgr_build_spanning_tree(sm, mbox);
+   if (re && re->mcast_build_stree)
+   status = re->mcast_build_stree(re->context, mbox);
+   else
+   status = mcast_mgr_build_spanning_tree(sm, mbox);
+
if (status != IB_SUCCESS)
OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0A17: "
"Unable to create spanning tree (%s) for mlid "
-- 
1.6.6.1




[PATCH v2 06/15] opensm: Make mcast_mgr_purge_tree() available outside osm_mcast_mgr.c.

2010-03-10 Thread Jim Schutt
A routing engine that needs to compute multicast spanning trees with
special properties will need to delete old trees.  There's already
a function that does this: mcast_mgr_purge_tree().

Make it available outside osm_mcast_mgr.c, and change the name
to follow the naming convention (osm_ prefix) for global functions.

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_multicast.h |   33 +
 opensm/opensm/osm_mcast_mgr.c |4 ++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/opensm/include/opensm/osm_multicast.h b/opensm/include/opensm/osm_multicast.h
index 1da575d..df6ac6c 100644
--- a/opensm/include/opensm/osm_multicast.h
+++ b/opensm/include/opensm/osm_multicast.h
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
@@ -193,6 +194,38 @@ osm_mgrp_t *osm_mgrp_new(IN osm_subn_t * subn, IN ib_net16_t mlid,
 *  Multicast Group, osm_mgrp_delete
 */
 
+/*
+ * Need a forward declaration to work around include loop:
+ * osm_sm.h <- osm_multicast.h
+ */
+struct osm_sm;
+
+/****f* OpenSM: Multicast Tree/osm_purge_mtree
+* NAME
+*  osm_purge_mtree
+*
+* DESCRIPTION
+*  Frees all the nodes in a multicast spanning tree
+*
+* SYNOPSIS
+*/
+void osm_purge_mtree(IN struct osm_sm * sm, IN osm_mgrp_box_t * mgb);
+/*
+* PARAMETERS
+*  sm
+*  [in] Pointer to osm_sm_t object.
+*  mgb
+*  [in] Pointer to an osm_mgrp_box_t object.
+*
+* RETURN VALUES
+*  None.
+*
+*
+* NOTES
+*
+* SEE ALSO
+*/
+
 /****f* OpenSM: Multicast Group/osm_mgrp_is_guid
 * NAME
 *  osm_mgrp_is_guid
diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c
index bd67d4e..e6db6db 100644
--- a/opensm/opensm/osm_mcast_mgr.c
+++ b/opensm/opensm/osm_mcast_mgr.c
@@ -146,7 +146,7 @@ static void mcast_mgr_purge_tree_node(IN osm_mtree_node_t * p_mtn)
free(p_mtn);
 }
 
-static void mcast_mgr_purge_tree(osm_sm_t * sm, IN osm_mgrp_box_t * mbox)
+void osm_purge_mtree(osm_sm_t * sm, IN osm_mgrp_box_t * mbox)
 {
OSM_LOG_ENTER(sm->p_log);
 
@@ -735,7 +735,7 @@ static ib_api_status_t mcast_mgr_build_spanning_tree(osm_sm_t * sm,
   on multicast forwarding table information if the user wants to
   preserve existing multicast routes.
 */
-   mcast_mgr_purge_tree(sm, mbox);
+   osm_purge_mtree(sm, mbox);
 
/* build the first "subset" containing all member ports */
if (make_port_list(&port_list, mbox)) {
-- 
1.6.6.1




[PATCH v2 04/15] opensm: Track the minimum value in the fabric of data VLs supported.

2010-03-10 Thread Jim Schutt
A routing engine that wants to make contributions to SL2VL maps in support
of routing free from credit loops may need to know the minimum number
of supported data VLs in the fabric.

This code tracks that value.

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_subnet.h |1 +
 opensm/opensm/osm_port_info_rcv.c  |   13 -
 opensm/opensm/osm_state_mgr.c  |6 ++
 opensm/opensm/osm_subnet.c |1 +
 4 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index 3970e98..d74a57c 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -520,6 +520,7 @@ typedef struct osm_subn {
uint16_t max_mcast_lid_ho;
uint8_t min_ca_mtu;
uint8_t min_ca_rate;
+   uint8_t min_data_vls;
boolean_t ignore_existing_lfts;
boolean_t subnet_initialization_error;
boolean_t force_heavy_sweep;
diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c
index 9260047..c05301e 100644
--- a/opensm/opensm/osm_port_info_rcv.c
+++ b/opensm/opensm/osm_port_info_rcv.c
@@ -83,6 +83,7 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
ib_api_status_t status;
ib_net64_t port_guid;
uint8_t rate, mtu;
+   unsigned data_vls;
cl_qmap_t *p_sm_tbl;
osm_remote_sm_t *p_sm;
 
@@ -92,7 +93,7 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
 
/* HACK extended port 0 should be handled too! */
if (osm_physp_get_port_num(p_physp) != 0) {
-   /* track the minimal endport MTU and rate */
+   /* track the minimal endport MTU, rate, and operational VLs */
mtu = ib_port_info_get_mtu_cap(p_pi);
if (mtu < sm->p_subn->min_ca_mtu) {
OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
@@ -108,6 +109,16 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
PRIx64 "\n", rate, cl_ntoh64(port_guid));
sm->p_subn->min_ca_rate = rate;
}
+
+   data_vls = 1U << (ib_port_info_get_op_vls(p_pi) - 1);
+   if (data_vls >= IB_MAX_NUM_VLS)
+   data_vls = IB_MAX_NUM_VLS - 1;
+   if ((uint8_t)data_vls < sm->p_subn->min_data_vls) {
+   OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
+   "Setting endport minimal data VLs to:%u defined by port:0x%"
+   PRIx64 "\n", data_vls, cl_ntoh64(port_guid));
+   sm->p_subn->min_data_vls = data_vls;
+   }
}
 
if (port_guid != sm->p_subn->sm_port_guid) {
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 6fcccba..96ad348 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1164,6 +1164,12 @@ repeat_discovery:
sm->p_subn->force_reroute = FALSE;
sm->p_subn->subnet_initialization_error = FALSE;
 
+   /* Reset tracking values in case limiting component got removed
+* from fabric. */
+   sm->p_subn->min_ca_mtu = IB_MAX_MTU;
+   sm->p_subn->min_ca_rate = IB_MAX_RATE;
+   sm->p_subn->min_data_vls = IB_MAX_NUM_VLS - 1;
+
/* rescan configuration updates */
if (!config_parsed && osm_subn_rescan_conf_files(sm->p_subn) < 0)
OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 331A: "
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index e4126bc..55b9384 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -525,6 +525,7 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN osm_opensm_t * p_osm,
p_subn->max_mcast_lid_ho = IB_LID_MCAST_END_HO;
p_subn->min_ca_mtu = IB_MAX_MTU;
p_subn->min_ca_rate = IB_MAX_RATE;
+   p_subn->min_data_vls = IB_MAX_NUM_VLS - 1;
p_subn->ignore_existing_lfts = TRUE;
 
/* we assume master by default - so we only need to set it true if STANDBY */
-- 
1.6.6.1
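
The conversion in pi_rcv_process_endport() relies on the PortInfo OperationalVLs encoding, where codes 1 through 5 stand for VL0, VL0-1, VL0-3, VL0-7 and VL0-14. A small stand-alone sketch of that mapping follows. The names data_vls_from_op_vls and MAX_NUM_VLS are made up for illustration, and MAX_NUM_VLS is assumed to have the same value (16) as IB_MAX_NUM_VLS. The guards for code 0 and for codes above 5 are additions of this sketch, not something the patch does.

```c
#include <stdint.h>

#define MAX_NUM_VLS 16u	/* assumed to match IB_MAX_NUM_VLS */

/*
 * Convert a PortInfo OperationalVLs code into a count of data VLs.
 * Codes 1..5 mean VL0, VL0-1, VL0-3, VL0-7, VL0-14.
 * Returns 0 for code 0, which is not a valid operational value, so the
 * shift below never sees a negative count. Codes above 5 are treated as
 * the maximum (a choice made for this sketch).
 */
static unsigned data_vls_from_op_vls(uint8_t op_vls)
{
	unsigned data_vls;

	if (op_vls == 0)
		return 0;
	if (op_vls > 5)
		return MAX_NUM_VLS - 1;

	data_vls = 1u << (op_vls - 1);
	if (data_vls >= MAX_NUM_VLS)	/* code 5 gives 16; VL15 is not a data VL */
		data_vls = MAX_NUM_VLS - 1;
	return data_vls;
}
```

With this mapping, code 5 yields 15 rather than 16, which matches the IB_MAX_NUM_VLS - 1 initial value used for min_data_vls.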




Re: kitten - mlx4: Unhandled interrupt - owner bit

2010-03-10 Thread Eli Cohen
On Wed, Mar 10, 2010 at 04:03:26PM +0100, Fredrik Unger wrote:
> 
> When investigating the error it seems to stem from next_eqe_sw in
> drivers/net/mlx4/eq.c, called by the interrupt handler.
> What happens is that (eqe->owner & 0x80) is true, causing the routine to
> return NULL, resulting in an unhandled interrupt (i.e. the interrupt
> routine returns 0)

Please note that the condition is a bit more complicated. I quote the
whole function:

static struct mlx4_eqe *next_eqe_sw(struct mlx4_eq *eq)
{
	struct mlx4_eqe *eqe = get_eqe(eq, eq->cons_index);
	return !!(eqe->owner & 0x80) ^ !!(eq->cons_index & eq->nent) ? NULL : eqe;
}
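
To make the condition easier to see, here is a stand-alone sketch of the same test with the two operands named. The function eqe_is_sw_owned is a hypothetical helper written for this illustration, not part of the driver, and it assumes nent is a power of two so that (cons_index & nent) flips on every pass through the queue.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the test in next_eqe_sw(): software may consume the entry only
 * when the owner bit (0x80) equals the wrap parity of the consumer index.
 * Assumes nent is a power of two.
 */
static bool eqe_is_sw_owned(uint8_t owner, uint32_t cons_index, uint32_t nent)
{
	bool owner_bit = (owner & 0x80) != 0;
	bool wrap_parity = (cons_index & nent) != 0;

	return owner_bit == wrap_parity;
}
```

So a set owner bit by itself does not mean NULL is returned: on passes where (cons_index & nent) is nonzero, a set owner bit is exactly what a valid entry looks like.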



[PATCH V3 2/2] mlx4/IB: Add support for enhanced atomic operations

2010-03-10 Thread Vladimir Sokolovsky
Added support for masked atomic operations:
- Masked Compare and Swap
- Masked Fetch and Add

Signed-off-by: Vladimir Sokolovsky 
---
 drivers/infiniband/hw/mlx4/cq.c   |8 
 drivers/infiniband/hw/mlx4/main.c |3 ++-
 drivers/infiniband/hw/mlx4/qp.c   |   27 +++
 include/linux/mlx4/device.h   |4 ++--
 include/linux/mlx4/qp.h   |7 +++
 5 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index de5263b..783eae7 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -660,6 +660,14 @@ repoll:
wc->opcode= IB_WC_FETCH_ADD;
wc->byte_len  = 8;
break;
+   case MLX4_OPCODE_MASKED_ATOMIC_CS:
+   wc->opcode= IB_WC_MASKED_COMP_SWAP;
+   wc->byte_len  = 8;
+   break;
+   case MLX4_OPCODE_MASKED_ATOMIC_FA:
+   wc->opcode= IB_WC_MASKED_FETCH_ADD;
+   wc->byte_len  = 8;
+   break;
case MLX4_OPCODE_BIND_MW:
wc->opcode= IB_WC_BIND_MW;
break;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index e596537..e72afd9 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -92,7 +92,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
IB_DEVICE_PORT_ACTIVE_EVENT |
IB_DEVICE_SYS_IMAGE_GUID|
IB_DEVICE_RC_RNR_NAK_GEN|
-   IB_DEVICE_BLOCK_MULTICAST_LOOPBACK;
+   IB_DEVICE_BLOCK_MULTICAST_LOOPBACK  |
+   IB_DEVICE_MASKED_ATOMIC;
if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_BAD_PKEY_CNTR)
props->device_cap_flags |= IB_DEVICE_BAD_PKEY_CNTR;
if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_BAD_QKEY_CNTR)
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 2a97c96..2ae8c81 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -84,6 +84,8 @@ static const __be32 mlx4_ib_opcode[] = {
[IB_WR_SEND_WITH_INV]   = cpu_to_be32(MLX4_OPCODE_SEND_INVAL),
[IB_WR_LOCAL_INV]   = cpu_to_be32(MLX4_OPCODE_LOCAL_INVAL),
[IB_WR_FAST_REG_MR] = cpu_to_be32(MLX4_OPCODE_FMR),
+   [IB_WR_MASKED_ATOMIC_CMP_AND_SWP]   = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_CS),
+   [IB_WR_MASKED_ATOMIC_FETCH_AND_ADD] = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_FA),
 };
 
 static struct mlx4_ib_sqp *to_msqp(struct mlx4_ib_qp *mqp)
@@ -1406,6 +1408,9 @@ static void set_atomic_seg(struct mlx4_wqe_atomic_seg *aseg, struct ib_send_wr *
if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP) {
aseg->swap_add = cpu_to_be64(wr->wr.atomic.swap);
aseg->compare  = cpu_to_be64(wr->wr.atomic.compare_add);
+   } else if (wr->opcode == IB_WR_MASKED_ATOMIC_FETCH_AND_ADD) {
+   aseg->swap_add = cpu_to_be64(wr->wr.atomic.compare_add);
+   aseg->compare  = cpu_to_be64(wr->wr.atomic.compare_add_mask);
} else {
aseg->swap_add = cpu_to_be64(wr->wr.atomic.compare_add);
aseg->compare  = 0;
@@ -1413,6 +1418,14 @@ static void set_atomic_seg(struct mlx4_wqe_atomic_seg *aseg, struct ib_send_wr *
 
 }
 
+static void set_mask_atomic_seg(struct mlx4_wqe_mask_atomic_seg *aseg, struct ib_send_wr *wr)
+{
+   aseg->swap_add = cpu_to_be64(wr->wr.atomic.swap);
+   aseg->swap_add_mask = cpu_to_be64(wr->wr.atomic.swap_mask);
+   aseg->compare  = cpu_to_be64(wr->wr.atomic.compare_add);
+   aseg->compare_mask = cpu_to_be64(wr->wr.atomic.compare_add_mask);
+}
+
 static void set_datagram_seg(struct mlx4_wqe_datagram_seg *dseg,
 struct ib_send_wr *wr)
 {
@@ -1566,6 +1579,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
switch (wr->opcode) {
case IB_WR_ATOMIC_CMP_AND_SWP:
case IB_WR_ATOMIC_FETCH_AND_ADD:
+   case IB_WR_MASKED_ATOMIC_FETCH_AND_ADD:
set_raddr_seg(wqe, wr->wr.atomic.remote_addr,
  wr->wr.atomic.rkey);
wqe  += sizeof (struct mlx4_wqe_raddr_seg);
@@ -1578,6 +1592,19 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 
break;
 
+   case IB_WR_MASKED_ATOMIC_CMP_AND_SWP:
+   set_raddr_seg(wqe, wr->wr.atomic.remote_addr,
+ wr->wr.atomic.rkey);
+   wqe  += sizeof (struct 

[PATCH V3 1/2] IB/core: Add support for enhanced atomic operations

2010-03-10 Thread Vladimir Sokolovsky
- Add new IB_WR_MASKED_ATOMIC_CMP_AND_SWP and IB_WR_MASKED_ATOMIC_FETCH_AND_ADD
send opcodes that mark a "masked atomic compare and swap" and a "masked atomic
fetch and add" work request, respectively.
- Add the IB_DEVICE_MASKED_ATOMIC capability bit.
- Add mask fields to the atomic struct of ib_send_wr.
- Add new opcodes to ib_wc_opcode.

Masked Compare and Swap (MskCmpSwap)
The MskCmpSwap atomic operation is an extension to the CmpSwap operation
defined in the IB spec. MskCmpSwap allows the user to select a portion of the
64 bit target data for the “compare” check as well as to restrict the swap to a
(possibly different) portion. The pseudo code below describes the operation:

| atomic_response = *va
| if (!((compare_add ^ *va) & compare_add_mask)) then
|     *va = (*va & ~(swap_mask)) | (swap & swap_mask)
|
| return atomic_response

The additional operands are carried in the Extended Transport Header. Atomic
response generation and packet format for MskCmpSwap are the same as for
standard IB Atomic operations.
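
For readers who prefer C to pseudo code, the MskCmpSwap semantics can be modeled as below. This is only a software model on ordinary memory, written for illustration: the function name msk_cmp_swap is made up, nothing here is atomic, and it says nothing about how an HCA implements the operation.

```c
#include <stdint.h>

/*
 * Software model of MskCmpSwap on plain memory (not atomic).
 * Only the bits selected by compare_mask take part in the comparison,
 * and only the bits selected by swap_mask are replaced on a match.
 * Returns the original value, like the atomic response.
 */
static uint64_t msk_cmp_swap(uint64_t *va,
			     uint64_t compare, uint64_t compare_mask,
			     uint64_t swap, uint64_t swap_mask)
{
	uint64_t old = *va;

	if (!((compare ^ old) & compare_mask))
		*va = (old & ~swap_mask) | (swap & swap_mask);

	return old;
}
```

With compare_mask and swap_mask both set to all ones this reduces to the ordinary 64-bit compare and swap.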

Masked Fetch and Add (MFetchAdd)
The MFetchAdd Atomic operation extends the functionality of the standard IB
FetchAdd by allowing the user to split the target into multiple fields of
selectable length. The atomic add is done independently on each of these
fields. A bit set in the field_boundary parameter (compare_add_mask in the
pseudo code) marks the most significant bit of a field: the carry out of that
bit is dropped instead of propagating into the next field. The pseudo code
below describes the operation:

| bit_adder(ci, b1, b2, *co)
| {
|   value = ci + b1 + b2
|   *co = !!(value & 2)
|
|   return value & 1
| }
|
| #define MASK_IS_SET(mask, attr)  (!!((mask)&(attr)))
| bit_position = 1
| carry = 0
| atomic_response = 0
|
| for i = 0 to 63
| {
|   if ( i != 0 )
|     bit_position = bit_position << 1
|
|   bit_add_res = bit_adder(carry, MASK_IS_SET(*va, bit_position),
|                           MASK_IS_SET(compare_add, bit_position), &new_carry)
|   if (bit_add_res)
|     atomic_response |= bit_position
|
|   carry = ((new_carry) && (!MASK_IS_SET(compare_add_mask, bit_position)))
| }
|
| return atomic_response
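
The bit-serial loop above can be written as a short C function. This is a software model for illustration only: the name masked_add and its boundary parameter are hypothetical, and the function computes the same value the pseudo code accumulates in atomic_response, on plain integers rather than remote memory.

```c
#include <stdint.h>

/*
 * Field-wise addition as in the MFetchAdd pseudo code.
 * A set bit in `boundary` marks the top bit of a field; the carry out of
 * that bit is dropped so it cannot spill into the next field.
 */
static uint64_t masked_add(uint64_t va, uint64_t add, uint64_t boundary)
{
	uint64_t result = 0;
	unsigned carry = 0;
	int i;

	for (i = 0; i < 64; i++) {
		uint64_t bit = (uint64_t)1 << i;
		unsigned sum = carry + ((va & bit) != 0) + ((add & bit) != 0);

		if (sum & 1)
			result |= bit;

		/* Propagate the carry unless this bit ends a field. */
		carry = (sum & 2) && !(boundary & bit);
	}

	return result;
}
```

For example, with a boundary at bit 31 the low 32-bit field wraps around on overflow without touching the high field, while a boundary of 0 gives an ordinary 64-bit add.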

Signed-off-by: Vladimir Sokolovsky 
---
 include/rdma/ib_verbs.h |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 09509ed..53b16a6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -105,6 +105,7 @@ enum ib_device_cap_flags {
IB_DEVICE_UD_TSO= (1<<19),
IB_DEVICE_MEM_MGT_EXTENSIONS= (1<<21),
IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
+   IB_DEVICE_MASKED_ATOMIC = (1<<23),
 };
 
 enum ib_atomic_cap {
@@ -467,6 +468,8 @@ enum ib_wc_opcode {
IB_WC_LSO,
IB_WC_LOCAL_INV,
IB_WC_FAST_REG_MR,
+   IB_WC_MASKED_COMP_SWAP,
+   IB_WC_MASKED_FETCH_ADD,
 /*
  * Set value of IB_WC_RECV so consumers can test if a completion is a
  * receive by testing (opcode & IB_WC_RECV).
@@ -689,6 +692,8 @@ enum ib_wr_opcode {
IB_WR_RDMA_READ_WITH_INV,
IB_WR_LOCAL_INV,
IB_WR_FAST_REG_MR,
+   IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
+   IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
 };
 
 enum ib_send_flags {
@@ -731,6 +736,8 @@ struct ib_send_wr {
u64 remote_addr;
u64 compare_add;
u64 swap;
+   u64 compare_add_mask;
+   u64 swap_mask;
u32 rkey;
} atomic;
struct {
-- 
1.6.6.GIT



[PATCH V3 0/2] Add support for enhanced atomic operations

2010-03-10 Thread Vladimir Sokolovsky
Hi Roland,

This patchset adds support for the following enhanced atomic
operations:
- Masked atomic compare and swap
- Masked atomic fetch and add

These operations reduce the memory needed for multiple locks, because an
atomic operation can act on just a portion of a 64-bit value.
For some applications the memory savings are very significant. One
example is fine-grained lock implementations for huge data sets. In
other cases, the benefit is the ability to update multiple fields with
a single I/O operation.
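
As an illustration of the lock use case, the sketch below packs 64 one-bit locks into a single 64-bit word and takes or releases one of them with a masked compare and swap. Everything here is an assumption made for the example: msk_cmp_swap is a plain, non-atomic software model of the operation, and try_lock/unlock are hypothetical helpers, not part of any verbs API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Non-atomic software model of masked compare and swap. */
static uint64_t msk_cmp_swap(uint64_t *va,
			     uint64_t compare, uint64_t compare_mask,
			     uint64_t swap, uint64_t swap_mask)
{
	uint64_t old = *va;

	if (!((compare ^ old) & compare_mask))
		*va = (old & ~swap_mask) | (swap & swap_mask);
	return old;
}

/* Try to take lock `idx` (0..63); the other 63 bits are left alone. */
static bool try_lock(uint64_t *word, unsigned idx)
{
	uint64_t bit = (uint64_t)1 << idx;
	uint64_t old = msk_cmp_swap(word, 0, bit, bit, bit);

	return (old & bit) == 0;
}

/* Release lock `idx`; an empty compare mask always matches. */
static void unlock(uint64_t *word, unsigned idx)
{
	uint64_t bit = (uint64_t)1 << idx;

	msk_cmp_swap(word, 0, 0, 0, bit);
}
```

A plain 64-bit compare and swap would need the caller to know the state of all 64 locks to succeed; masking lets each lock be handled independently.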

Vladimir Sokolovsky (2):
IB/core: Add support for enhanced atomic operations
mlx4/IB: Add support for enhanced atomic operations

changes from V2:
- patch #1: 
  Updated description
  Renamed:
IB_WR_ATOMIC_MASKED_CMP_AND_SWP -> IB_WR_MASKED_ATOMIC_CMP_AND_SWP
IB_WR_ATOMIC_MASKED_FETCH_AND_ADD -> IB_WR_MASKED_ATOMIC_FETCH_AND_ADD
  In the ib_send_wr struct, the new fields are now added before the rkey field

- patch #2:
  Set IB_DEVICE_MASKED_ATOMIC flag with other flags that get set for
  all devices

Regards,
Vladimir


Re: IPoIB issues

2010-03-10 Thread Moni Shoua
Eli Cohen wrote:
> I just posted a patch which might fix your problem. Please try it and
> let us know if it fixed anything.
> 
Hi Eli,
Although Josh already reported that the patch seems to fix the issue, I
still have a question.

The "post_send failed" prints appeared while working in datagram mode. I
don't know if Josh verified that, but I don't expect these prints to go
away, even with the patch. Am I right?

BTW, what could be the reason for UD QP post_send() failures?

>>
>> In datagram mode, I see errors on the boot servers of the form:
>>
>> ib0: post_send failed
>> ib0: post_send failed
>> ib0: post_send failed
>>
>>
>> When using connected mode, I hit a different error:
>>
>> NETDEV WATCHDOG: ib0: transmit timed out
>> ib0: transmit timeout: latency 1999 msecs
>> ib0: queue stopped 1, tx_head 2154042680, tx_tail 2154039464
>> NETDEV WATCHDOG: ib0: transmit timed out
>> ib0: transmit timeout: latency 2999 msecs
>> ib0: queue stopped 1, tx_head 2154042680, tx_tail 2154039464
>> ...
>> ...
>> NETDEV WATCHDOG: ib0: transmit timed out
>> ib0: transmit timeout: latency 61824999 msecs
>> ib0: queue stopped 1, tx_head 2154042680, tx_tail 2154039464
>>
>>
>> The errors seem to hit only after NFS comes into play.  Once it
>> starts, the NETDEV WATCHDOG messages continue until I run
>> 'ifconfig ib0 down up'.  I've tried tuning send_queue_size and
>> recv_queue_size on both sides, the txqueuelen of the ib0 interface, the
>> NFS rsize/wsize.  None of it seems to help greatly.  Does anyone have
>> any ideas about what I can do to try to fix these problems?
>>
>> -JE
> 



kitten - mlx4: Unhandled interrupt - owner bit

2010-03-10 Thread Fredrik Unger
Hi,

I am new to this list, so if my question is misplaced
please suggest a better forum, on or off-list.

We are using InfiniBand (core & mlx4 of OFED 1.4.1 + OFED kernel patches)
in a lightweight kernel named kitten, partially derived from Linux.
http://code.google.com/p/kitten/

We see one or two unhandled interrupts when doing RDMA_READ
data transfers with mlx4 cards (SEND and RDMA_WRITE work well).
They appear only with larger messages, 1-4 MB.
Write-combining is turned off.

Below is a pingpong test, 1000 iterations per message size:
ex.
<8>(init_task)    Size  Average   Stddev      Min   Median       Max
...
<8>(init_task)  524288   271.79     7.09   138.96   271.51    429.24
<4>irq_dispatch: Unhandled interrupt 74 (4a) [Owner]
<8>(init_task) 1048576   569.99   981.73   272.01   537.56  31581.67
<8>(init_task) 2097152  1070.57    28.95   537.88  1069.66   1779.97
<8>(init_task) 4194304  2135.99    52.86  1070.10  2134.70   3124.28

This error is random and appears in about one of three runs. Note the high max
value for the 1 MB message size; I guess that is the connection recovering.

When investigating the error it seems to stem from next_eqe_sw in
drivers/net/mlx4/eq.c, called by the interrupt handler.
What happens is that (eqe->owner & 0x80) is true, causing the routine to return
NULL, resulting in an unhandled interrupt (i.e. the interrupt routine returns 0).

My understanding is that when the interrupt gets flagged the card would
have given the eqe (event queue entry?) to the software, but it could very well
be more complex.

The same message can be seen when starting the driver, but it does not cause
any problems:
<6>mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
<4>irq_dispatch: Unhandled interrupt 74 (4a) [Owner]
  x 16

This problem could not be reproduced under Linux so far.
The kitten interrupt handler is simple and just forwards the interrupt to the
driver.

What does owner in the eqe struct mean? Does hardware or software own the entry?
Has this bug been seen in Linux, even if we were not able to reproduce it?
Can I get more debug information from the card?
Any tips on what could go wrong in this context? Are we missing some setup?


Sincerely,

Fredrik Unger


[PATCH] opensm: EPI new event for duplicated node guid

2010-03-10 Thread Slava Strebkov
Add a new event that reports a duplicated GUID to event plugins.

Signed-off-by: Slava Strebkov 
---
 opensm/include/opensm/osm_event_plugin.h |1 +
 opensm/opensm/osm_node_info_rcv.c|6 ++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/opensm/include/opensm/osm_event_plugin.h b/opensm/include/opensm/osm_event_plugin.h
index 33d1920..7c4ed93 100644
--- a/opensm/include/opensm/osm_event_plugin.h
+++ b/opensm/include/opensm/osm_event_plugin.h
@@ -72,6 +72,7 @@ typedef enum {
OSM_EVENT_ID_PORT_SELECT,
OSM_EVENT_ID_TRAP,
OSM_EVENT_ID_SUBNET_UP,
+   OSM_EVENT_ID_DUPLICATED_GUID,
OSM_EVENT_ID_MAX
 } osm_epi_event_id_t;
 
diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c
index b3e272c..db97df6 100644
--- a/opensm/opensm/osm_node_info_rcv.c
+++ b/opensm/opensm/osm_node_info_rcv.c
@@ -68,6 +68,7 @@ static void report_duplicated_guid(IN osm_sm_t * sm, osm_physp_t * p_physp,
 {
osm_physp_t *p_old, *p_new;
osm_dr_path_t path;
+   osm_epi_pe_event_t epi_pe_data;
 
p_old = p_physp->p_remote_physp;
p_new = osm_node_get_physp_ptr(p_neighbor_node, port_num);
@@ -82,6 +83,11 @@ static void report_duplicated_guid(IN osm_sm_t * sm, osm_physp_t * p_physp,
cl_ntoh64(p_old->p_node->node_info.node_guid), p_old->port_num,
cl_ntoh64(p_new->p_node->node_info.node_guid), p_new->port_num);
 
+   osm_epi_create_port_id(&epi_pe_data.port_id,
+   p_physp->p_node->node_info.node_guid, p_physp->port_num,
+   (char*)p_physp->p_node->node_desc.description);
+   osm_opensm_report_event(sm->p_subn->p_osm,
+   OSM_EVENT_ID_DUPLICATED_GUID, &epi_pe_data);
osm_dump_dr_path(sm->p_log, osm_physp_get_dr_path_ptr(p_physp),
 OSM_LOG_ERROR);
 
-- 
1.6.3.3
