Re: [ewg] [PATCH v4] IB Core: RAW ETH support

2010-06-15 Thread Roland Dreier
 > I tested it before on Roland's tree (iboe branch) and it fails,
 > because it is written in a way suitable for OFED. If we adapt the patch to
 > Roland's tree, then applying the Mellanox OFED patches will fail, because
 > it changes the same functions in the code.
 > Here is one example:
 > Look at __mlx4_ib_modify_qp in Roland's tree - there is no RAW_ETY
 > support. But in the OFED version of the same function this support is
 > present.
 > The RAW_ETH patch modifies this function and looks for the word RAW_ETY;
 > without it, the RAW_ETH Mellanox patch will fail.

Don't take this too personally -- I picked a semi-random email in this
thread to reply to; this is pretty broadly targeted.



What the hell is the thinking behind introducing IB_QPT_RAW_ETH?  You're
inserting an enum value before IB_QPT_RAW_ETY, so any old userspace
passing in IB_QPT_RAW_ETY will silently get different behavior depending
on the kernel version.  And you're creating two constants that differ in
a single letter (IB_QPT_RAW_ETY vs. IB_QPT_RAW_ETH).  How are you going
to explain that to users?  How is anyone ever going to get it right?
For that matter, what exactly does IB_QPT_RAW_ETH mean?
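
To make the enum concern concrete, here is a minimal C sketch. The enum
names and the RAW_IPV6 member are hypothetical stand-ins, not the actual
ib_verbs.h definitions; only the relative order of RAW_ETH and RAW_ETY is
taken from the paragraph above.

```c
#include <assert.h>

/* Hypothetical "old" header: what an existing userspace binary was built with. */
enum old_qp_type {
	OLD_QPT_RAW_IPV6,	/* 0 */
	OLD_QPT_RAW_ETY		/* 1 */
};

/* Hypothetical "new" header with a value inserted before RAW_ETY. */
enum new_qp_type {
	NEW_QPT_RAW_IPV6,	/* 0 */
	NEW_QPT_RAW_ETH,	/* 1: takes the number RAW_ETY used to have */
	NEW_QPT_RAW_ETY		/* 2 */
};

/* How code built against the new header interprets a raw integer. */
static const char *new_kernel_decode(int wire_value)
{
	switch (wire_value) {
	case NEW_QPT_RAW_IPV6:
		return "RAW_IPV6";
	case NEW_QPT_RAW_ETH:
		return "RAW_ETH";
	case NEW_QPT_RAW_ETY:
		return "RAW_ETY";
	default:
		return "unknown";
	}
}
```

An old binary that passes OLD_QPT_RAW_ETY sends the integer 1, which the
new decoder reads as RAW_ETH. Appending new values after the existing
ones would keep the old numbers stable.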

This all seems to be a symptom of how broken our development process
is.  Yes, unfortunately I can't spend as much time reviewing and
applying patches as I might like, and I apologize for that.  But if we
have all the RDMA developers piling up shit in their little area and
then sending it on to be merged as soon as it kind of works, without
thinking about design or maintainability and without ever doing any
review, then I'm always going to have an expanding review backlog.

And then we have OFED compounding problems -- "Oh that's a nice pile of
shit you've built there.  We better ship it to users while it's still
steaming."  How about if OFED developers take a little time to think
things through?



In other words, can someone explain the plan for this raw QP stuff to me?

 - R.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2] ofa_kernel/infiniband node description patch

2010-06-15 Thread Mike Heinz
Well, the feedback from you was about making sure the description was null 
terminated.

This *is* settable through sysfs, and still is even when the patch is applied. 
The problem is that the current model is to set the description once, at boot 
time, through an init script. This will often cause the description to be set 
incorrectly, because the host name has not been set at the time the script is 
run.

I changed the default behavior for the various HCAs because it 
seems like a smarter default than simply setting it to the 
model of the HCA.

So, basically, if you have an init script to set the node descriptions it will 
still work - but this patch makes it unlikely you will need such a script in 
the first place.

-----Original Message-----
From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com] 
Sent: Tuesday, June 15, 2010 4:10 PM
To: Mike Heinz
Cc: linux-rdma@vger.kernel.org; Or Gerlitz
Subject: Re: [PATCH v2] ofa_kernel/infiniband node description patch

On Tue, Jun 15, 2010 at 02:35:35PM -0500, Mike Heinz wrote:

> This updated patch incorporates feedback from Jason Gunthorpe and Or Gerlitz.

It does? :)

I admit this is puzzling to me, isn't this settable via sysfs?

On the other hand, it does seem quite wrong to me to use utsname in a
static way like it was before, so this is an improvement..

Jason


Re: [PATCH v2] ofa_kernel/infiniband node description patch

2010-06-15 Thread Jason Gunthorpe
On Tue, Jun 15, 2010 at 02:35:35PM -0500, Mike Heinz wrote:

> This updated patch incorporates feedback from Jason Gunthorpe and Or Gerlitz.

It does? :)

I admit this is puzzling to me, isn't this settable via sysfs?

On the other hand, it does seem quite wrong to me to use utsname in a
static way like it was before, so this is an improvement..

Jason


[PATCH v3 17/17] opensm: Cause status of unicast routing attempt to propagate to callers of osm_ucast_mgr_process().

2010-06-15 Thread Jim Schutt
If unicast routing fails, there is no point in continuing with fabric bring-up.
Just restart a new heavy sweep instead.

Signed-off-by: Jim Schutt 
---
 opensm/opensm/osm_state_mgr.c |   12 +---
 opensm/opensm/osm_ucast_mgr.c |   14 +-
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 762bb27..422f3a2 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1140,7 +1140,11 @@ static void do_sweep(osm_sm_t * sm)
/* Re-program the switches fully */
sm->p_subn->ignore_existing_lfts = TRUE;
 
-   osm_ucast_mgr_process(&sm->ucast_mgr);
+   if (osm_ucast_mgr_process(&sm->ucast_mgr)) {
+   OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
+   "REROUTE FAILED");
+   return;
+   }
osm_qos_setup(sm->p_subn->p_osm);
 
/* Reset flag */
@@ -1299,12 +1303,14 @@ repeat_discovery:
"LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG");
 
/*
-* Proceed with unicast forwarding table configuration.
+* Proceed with unicast forwarding table configuration; if it fails
+* return early to wait for a trap or the next sweep interval.
 */
 
if (!sm->ucast_mgr.cache_valid ||
osm_ucast_cache_process(&sm->ucast_mgr))
-   osm_ucast_mgr_process(&sm->ucast_mgr);
+   if (osm_ucast_mgr_process(&sm->ucast_mgr))
+   return;
 
osm_qos_setup(sm->p_subn->p_osm);
 
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index f5a715f..85495eb 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -1069,6 +1069,7 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
osm_opensm_t *p_osm;
struct osm_routing_engine *p_routing_eng;
cl_qmap_t *p_sw_guid_tbl;
+   int failed = 0;
 
OSM_LOG_ENTER(p_mgr->p_log);
 
@@ -1087,7 +1088,8 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 
p_osm->routing_engine_used = NULL;
while (p_routing_eng) {
-   if (!ucast_mgr_route(p_routing_eng, p_osm))
+   failed = ucast_mgr_route(p_routing_eng, p_osm);
+   if (!failed)
break;
p_routing_eng = p_routing_eng->next;
}
@@ -1098,9 +1100,11 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
struct osm_routing_engine *r = p_osm->default_routing_engine;
 
r->build_lid_matrices(r->context);
-   r->ucast_build_fwd_tables(r->context);
-   p_osm->routing_engine_used = r;
-   osm_ucast_mgr_set_fwd_tables(p_mgr);
+   failed = r->ucast_build_fwd_tables(r->context);
+   if (!failed) {
+   p_osm->routing_engine_used = r;
+   osm_ucast_mgr_set_fwd_tables(p_mgr);
+   }
}
 
if (p_osm->routing_engine_used) {
@@ -1120,7 +1124,7 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 Exit:
CL_PLOCK_RELEASE(p_mgr->p_lock);
OSM_LOG_EXIT(p_mgr->p_log);
-   return 0;
+   return failed;
 }
 
 static int ucast_build_lid_matrices(void *context)
-- 
1.6.2.2
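
The status propagation in the patch above can be sketched in isolation as
follows. This is a simplified illustration, not the opensm code: struct
engine, route_with_engines() and the two demo engines are hypothetical
names, and unlike the patch (which initializes failed to 0) this sketch
reports an empty engine list as a failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct osm_routing_engine. */
struct engine {
	const char *name;
	int (*route)(void *ctx);	/* returns 0 on success, nonzero on failure */
	void *ctx;
	struct engine *next;
};

/*
 * Try each engine in list order.  Returns 0 as soon as one succeeds and
 * stores it in *used; otherwise returns the status of the last attempt
 * (or -1 if the list is empty) and sets *used to NULL.
 */
static int route_with_engines(struct engine *head, struct engine **used)
{
	int failed = -1;
	struct engine *e;

	assert(used != NULL);
	*used = NULL;
	for (e = head; e != NULL; e = e->next) {
		failed = e->route(e->ctx);
		if (!failed) {
			*used = e;
			break;
		}
	}
	return failed;
}

/* Two trivial engines for demonstration. */
static int always_fails(void *ctx) { (void)ctx; return 1; }
static int always_works(void *ctx) { (void)ctx; return 0; }
```

A caller can then stop the current bring-up when the return value is
nonzero, which is what the do_sweep() changes above do with the result
of osm_ucast_mgr_process().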




[PATCH v3 16/17] opensm: Avoid havoc in dump_ucast_routes() caused by torus-2QoS persistent use of osm_port_t:priv.

2010-06-15 Thread Jim Schutt
Torus-2QoS makes persistent use of osm_port_t:priv to speed calculation
of path SL values.

However, osm_switch_recommend_path() uses a non-NULL osm_port_t:priv
as a flag that osm_port_t:priv holds a tracking array used when
LMC > 0.  It turns out that 1) dump_ucast_routes() does not need
osm_switch_recommend_path() to consider alternate routes, and 2)
before the addition of torus-2QoS, osm_port_t:priv use never
persisted past the unicast routing function, so it was always
NULL on entry to dump_ucast_routes().

Fix this up by making the routing_for_lmc flag explicitly set by
the caller of osm_switch_recommend_path(), rather than inferring
it from osm_port_t:priv.  This retains existing behavior for
existing routing engines, and allows torus-2QoS to make persistent
use of osm_port_t:priv.

The alternative would be to add another member to osm_port_t,
say osm_port_t:priv2.

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_switch.h |   12 
 opensm/opensm/osm_dump.c   |2 +-
 opensm/opensm/osm_switch.c |7 ---
 opensm/opensm/osm_ucast_mgr.c  |1 +
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h
index 51a8427..f407dd9 100644
--- a/opensm/include/opensm/osm_switch.h
+++ b/opensm/include/opensm/osm_switch.h
@@ -918,6 +918,7 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
  IN osm_port_t * p_port, IN uint16_t lid_ho,
  IN unsigned start_from,
  IN boolean_t ignore_existing,
+ IN boolean_t routing_for_lmc,
  IN boolean_t dor);
 /*
 * PARAMETERS
@@ -940,6 +941,17 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
 *  If false, the switch will choose an existing route if one
 *  exists, otherwise will choose the optimal route.
 *
+*  routing_for_lmc
+*  [in] We support an enhanced LMC aware routing mode:
+*  In the case of LMC > 0, we can track the remote side
+*  system and node for all of the lids of the target
+*  and try and avoid routing again through the same
+*  system / node.
+*
+*  Assume if routing_for_lmc is TRUE that this procedure
+*  was provided with the tracking array and counter via
+*  p_port->priv, and we can conduct this algorithm.
+*
 *  dor
 *  [in] If TRUE, Dimension Order Routing will be done.
 *
diff --git a/opensm/opensm/osm_dump.c b/opensm/opensm/osm_dump.c
index bfff1a0..535a03f 100644
--- a/opensm/opensm/osm_dump.c
+++ b/opensm/opensm/osm_dump.c
@@ -221,7 +221,7 @@ static void dump_ucast_routes(cl_map_item_t * item, FILE * file, void *cxt)
/* No LMC Optimization */
best_port = osm_switch_recommend_path(p_sw, p_port,
  lid_ho, 1, TRUE,
- dor);
+ FALSE, dor);
fprintf(file, "No %u hop path possible via port %u!",
best_hops, best_port);
}
diff --git a/opensm/opensm/osm_switch.c b/opensm/opensm/osm_switch.c
index b621852..9785a9d 100644
--- a/opensm/opensm/osm_switch.c
+++ b/opensm/opensm/osm_switch.c
@@ -216,6 +216,7 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
  IN osm_port_t * p_port, IN uint16_t lid_ho,
  IN unsigned start_from,
  IN boolean_t ignore_existing,
+ IN boolean_t routing_for_lmc,
  IN boolean_t dor)
 {
/*
@@ -225,10 +226,10 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
   and try and avoid routing again through the same
   system / node.
 
-  If this procedure is provided with the tracking array
-  and counter we can conduct this algorithm.
+  Assume if routing_for_lmc is true that this procedure was
+  provided the tracking array and counter via p_port->priv,
+  and we can conduct this algorithm.
 */
-   boolean_t routing_for_lmc = (p_port->priv != NULL);
uint16_t base_lid;
uint8_t hops;
uint8_t least_hops;
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index e6e40f0..f5a715f 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -252,6 +252,7 @@ static void ucast_mgr_process_port(IN osm_ucast_mgr_t * p_mgr,
 */
port = osm_switch_recommend_path(p_sw, p_port, lid_ho, start_from,
 p_mgr->p_subn->ignore_
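
The design choice in this patch (pass the mode explicitly instead of
inferring it from a pointer) can be sketched as below. The struct and
function names are hypothetical and much smaller than the real
osm_port_t and osm_switch_recommend_path(); the sketch only shows why the
inference goes wrong once another component keeps priv set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down port object; only the priv member matters here. */
struct port {
	void *priv;	/* may be held by a routing engine across sweeps */
};

/* Old style: guess the mode from whether priv happens to be non-NULL. */
static int wants_lmc_tracking_inferred(const struct port *p)
{
	assert(p != NULL);
	return p->priv != NULL;
}

/* New style: the caller states the mode; priv is not consulted. */
static int wants_lmc_tracking_explicit(const struct port *p, int routing_for_lmc)
{
	assert(p != NULL);
	return routing_for_lmc != 0;
}
```

With priv pointing at unrelated engine data, the inferred version reports
LMC tracking even though no tracking array exists, while the explicit
version follows whatever the caller passed.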

[PATCH v3 10/17] opensm: Update documentation to describe torus-2QoS.

2010-06-15 Thread Jim Schutt

Signed-off-by: Jim Schutt 
---
 opensm/doc/current-routing.txt |  269 +++-
 opensm/man/opensm.8.in |9 ++-
 2 files changed, 275 insertions(+), 3 deletions(-)

diff --git a/opensm/doc/current-routing.txt b/opensm/doc/current-routing.txt
index 1302860..78a2e01 100644
--- a/opensm/doc/current-routing.txt
+++ b/opensm/doc/current-routing.txt
@@ -1,7 +1,7 @@
 Current OpenSM Routing
-7/9/07
+10/9/09
 
-OpenSM offers five routing engines:
+OpenSM offers six routing engines:
 
 1.  Min Hop Algorithm - based on the minimum hops to each node where the
 path length is optimized.
@@ -28,6 +28,13 @@ two switches.  This provides deadlock free routes for hypercubes when
 the fabric is cabled as a hypercube and for meshes when cabled as a
 mesh (see details below).
 
+6. Torus-2QoS unicast routing algorithm - a DOR-based routing algorithm
+specialized for 2D/3D torus topologies.  Torus-2QoS provides deadlock-free
+routing while supporting two quality of service (QoS) levels.  In addition
+it is able to route around multiple failed fabric links or a single failed
+fabric switch without introducing deadlocks, and without changing path SL
+values granted before the failure.
+
 OpenSM provides an optional unicast routing cache (enabled by -A or
 --ucast_cache options). When enabled, unicast routing cache prevents
 routing recalculation (which is a heavy task in a large cluster) when
@@ -388,3 +395,261 @@ ports, one port on one end of the cable, and the other port on the
 other end, continuing along the mesh dimension.
 
 Use '-R dor' option to activate the DOR algorithm.
+
+Torus-2QoS Routing Algorithm
+
+
+Torus-2QoS is a routing algorithm designed for large-scale 2D/3D torus fabrics.
+The torus-2QoS routing engine can provide the following functionality on
+a 2D/3D torus:
+- routing that is free of credit loops
+- two levels of QoS, assuming switches support 8 data VLs
+- ability to route around a single failed switch, and/or multiple failed
+  links, without
+  - introducing credit loops
+  - changing path SL values
+- very short run times, with good scaling properties as fabric size
+  increases
+
+Torus-2QoS is a DOR-based algorithm that avoids deadlocks that would otherwise
+occur in a torus using the concept of a dateline for each torus dimension.
+It encodes into a path SL which datelines the path crosses as follows:
+
+  sl = 0;
+  for (d = 0; d < torus_dimensions; d++)
+    /* path_crosses_dateline(d) returns 0 or 1 */
+    sl |= path_crosses_dateline(d) << d;
+
+For a 3D torus, that leaves one SL bit free, which torus-2QoS uses to
+implement two QoS levels.
+
+This is possible because torus-2QoS also makes use of the output port
+dependence of the switch SL2VL maps.  It computes in which torus coordinate
+direction each interswitch link "points", and writes SL2VL maps for such
+ports as follows:
+
+  for (sl = 0; sl < 16; sl ++)
+    /* cdir(port) reports which torus coordinate direction a switch port
+     * "points" in, and returns 0, 1, or 2 */
+    sl2vl(iport,oport,sl) = 0x1 & (sl >> cdir(oport));
+
+Thus torus-2QoS consumes 8 SL values (SL bits 0-2) and 2 VL values (VL bit 0)
+per QoS level to provide deadlock-free routing on a 3D torus.
+
+Torus-2QoS routes around link failure by "taking the long way around" any
+1D ring interrupted by a link failure.  For example, consider the 2D 6x5
+torus below, where switches are denoted by [+a-zA-Z]:
+
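+            |    |    |    |    |    |
+       4  --+----+----+----+----+----+--
+            |    |    |    |    |    |
+       3  --+----+----+----D----+----+--
+            |    |    |    |    |    |
+       2  --+----+----I----r----+----+--
+            |    |    |    |    |    |
+       1  --m----S----n----T----o----p--
+            |    |    |    |    |    |
+     y=0  --+----+----+----+----+----+--
+            |    |    |    |    |    |
+
+          x=0    1    2    3    4    5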
+
+For a pristine fabric the path from S to D would be S-n-T-r-D.  In the
+event that either link S-n or n-T has failed, torus-2QoS would use the path
+S-m-p-o-T-r-D.  Note that it can do this without changing the path SL
+value; once the 1D ring m-S-n-T-o-p-m has been broken by failure, path
+segments using it cannot contribute to deadlock, and the x-direction
+dateline (between, say, x=5 and x=0) can be ignored for path segments on
+that ring.
+
+One result of this is that torus-2QoS can route around many simultaneous
+link failures, as long as no 1D ring is broken into disjoint regions.  For
+example, if links n-T and T-o have both failed, that ring has been broken
+into two disjoint regions, T and o-p-m-S-n.  Torus-2QoS checks for such
+issues, reports if they are found, and refuses to route such fabrics.
+
+Handling a failed switch under DOR requires introducing into a path at
+least one turn that would be otherwise "illegal", i.e. not allowed by DOR
+rules.  Torus-2QoS will introduce such a turn as close as possible to the
+failed switch in order 
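
The two pseudo-code fragments in the documentation text above (path SL
encoding and the SL2VL rule) can be written as a small runnable C sketch.
The formulas are taken from that text; TORUS_DIMS, path_sl() and
sl2vl_bit0() are names made up for this illustration, and the extra SL
bit used for the second QoS level is deliberately left out.

```c
#include <assert.h>

#define TORUS_DIMS 3	/* assumption: a 3D torus */

/*
 * Encode which datelines a path crosses into SL bits 0..TORUS_DIMS-1.
 * crosses[d] must be 0 or 1 and stands in for path_crosses_dateline(d).
 */
static unsigned path_sl(const int crosses[TORUS_DIMS])
{
	unsigned sl = 0;
	int d;

	for (d = 0; d < TORUS_DIMS; d++) {
		assert(crosses[d] == 0 || crosses[d] == 1);
		sl |= (unsigned)crosses[d] << d;
	}
	return sl;
}

/*
 * VL bit 0 for traffic leaving on a port that "points" in coordinate
 * direction cdir (0, 1 or 2): it is the SL bit recorded for that
 * dimension's dateline.
 */
static unsigned sl2vl_bit0(unsigned sl, int cdir)
{
	assert(cdir >= 0 && cdir < TORUS_DIMS);
	return 0x1u & (sl >> cdir);
}
```

For a path that crosses the datelines of dimensions 0 and 2 but not 1,
path_sl() yields SL 5, and sl2vl_bit0() selects VL bit 1 on ports
pointing in directions 0 and 2 and VL bit 0 on ports pointing in
direction 1.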

[PATCH v3 15/17] opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of osm_port_t:priv.

2010-06-15 Thread Jim Schutt
Torus-2QoS makes persistent use of osm_port_t:priv to speed calculation
of path SL values.

It cannot clear osm_port_t:priv members when it tears down its persistent
data for the following reason: If a port is removed from the fabric, the
opensm core will delete the corresponding osm_port_t object, leaving
torus-2QoS holding a dangling reference.  Torus-2QoS then has a use-after-free
error when tearing down its persistent data if it tries to use its dangling
osm_port_t reference to clear the priv member.

When torus-2QoS is unable to route a fabric due to missing switches and
opensm is configured to fall back to minhop, havoc will ensue because
minhop uses a non-NULL osm_port_t:priv as a proxy for LMC > 0: it
assumes if osm_port_t:priv is non-NULL it can only be because
alloc_ports_priv() has been called.

Fix this up by always calling alloc_ports_priv(), and have it set
priv = NULL if LMC == 0.

Signed-off-by: Jim Schutt 
---
 opensm/opensm/osm_ucast_mgr.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index d1c485f..e6e40f0 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -315,8 +315,10 @@ static void alloc_ports_priv(osm_ucast_mgr_t * mgr)
 item = cl_qmap_next(item)) {
port = (osm_port_t *) item;
lmc = ib_port_info_get_lmc(&port->p_physp->port_info);
-   if (!lmc)
+   if (!lmc) {
+   port->priv = NULL;
continue;
+   }
r = malloc(sizeof(*r) + sizeof(r->guids[0]) * (1 << lmc));
if (!r) {
OSM_LOG(mgr->p_log, OSM_LOG_ERROR, "ERR 3A09: "
@@ -363,8 +365,7 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
/* Initialize LIDs in buffer to invalid port number. */
memset(p_sw->new_lft, OSM_NO_PATH, p_sw->max_lid_ho + 1);
 
-   if (p_mgr->p_subn->opt.lmc)
-   alloc_ports_priv(p_mgr);
+   alloc_ports_priv(p_mgr);
 
/*
   Iterate through every port setting LID routes for each
@@ -381,8 +382,7 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
}
}
 
-   if (p_mgr->p_subn->opt.lmc)
-   free_ports_priv(p_mgr);
+   free_ports_priv(p_mgr);
 
OSM_LOG_EXIT(p_mgr->p_log);
 }
-- 
1.6.2.2
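
The idea behind the fix above (always run the allocation pass so that a
stale priv pointer never survives into minhop) can be sketched as
follows. This is not the opensm code: struct port, alloc_priv_all() and
free_priv_all() are hypothetical, the tracking storage is reduced to a
plain array, and the sketch clears every priv first and then allocates,
rather than clearing inside the loop as the patch does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down port object. */
struct port {
	unsigned lmc;	/* LID mask control, 0..7 */
	void *priv;	/* tracking storage, or a stale pointer from elsewhere */
};

/*
 * Clear every priv (dropping pointers this code does not own), then
 * allocate tracking storage only for ports with LMC > 0.
 * Returns 0 on success, -1 if an allocation fails.
 */
static int alloc_priv_all(struct port *ports, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		ports[i].priv = NULL;

	for (i = 0; i < n; i++) {
		if (ports[i].lmc == 0)
			continue;
		assert(ports[i].lmc <= 7);
		ports[i].priv = calloc((size_t)1 << ports[i].lmc,
				       sizeof(unsigned long long));
		if (ports[i].priv == NULL)
			return -1;
	}
	return 0;
}

/* Safe after alloc_priv_all(), even a failed one: free(NULL) is a no-op. */
static void free_priv_all(struct port *ports, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		free(ports[i].priv);
		ports[i].priv = NULL;
	}
}
```

Because the pass runs unconditionally, code that treats a non-NULL priv
as "LMC tracking is active" only ever sees pointers that this pass
allocated.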




[PATCH v3 11/17] opensm: Enable torus-2QoS routing engine.

2010-06-15 Thread Jim Schutt

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_opensm.h |1 +
 opensm/opensm/main.c   |2 +-
 opensm/opensm/osm_opensm.c |6 ++
 3 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index fddcf53..8d63111 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -105,6 +105,7 @@ typedef enum _osm_routing_engine_type {
OSM_ROUTING_ENGINE_TYPE_FTREE,
OSM_ROUTING_ENGINE_TYPE_LASH,
OSM_ROUTING_ENGINE_TYPE_DOR,
+   OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS,
OSM_ROUTING_ENGINE_TYPE_UNKNOWN
 } osm_routing_engine_type_t;
 /***/
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 0093aa7..abc3282 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -174,7 +174,7 @@ static void show_usage(void)
   "  Min Hop algorithm.  Multiple routing engines can be specified\n"
   "  separated by commas so that specific ordering of routing\n"
   "  algorithms will be tried if earlier routing engines fail.\n"
-  "  Supported engines: updn, file, ftree, lash, dor\n\n");
+  "  Supported engines: updn, file, ftree, lash, dor, torus-2QoS\n\n");
printf("--do_mesh_analysis\n"
   "  This option enables additional analysis for the lash\n"
   "  routing engine to precondition switch port assignments\n"
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index 5614240..8b03947 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -70,6 +70,7 @@ extern int osm_ucast_file_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_ftree_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_lash_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_dor_setup(struct osm_routing_engine *, osm_opensm_t *);
+extern int osm_ucast_torus2QoS_setup(struct osm_routing_engine *, osm_opensm_t *);
 
 const static struct routing_engine_module routing_modules[] = {
{"minhop", osm_ucast_minhop_setup},
@@ -78,6 +79,7 @@ const static struct routing_engine_module routing_modules[] = {
{"ftree", osm_ucast_ftree_setup},
{"lash", osm_ucast_lash_setup},
{"dor", osm_ucast_dor_setup},
+   {"torus-2QoS", osm_ucast_torus2QoS_setup},
{NULL, NULL}
 };
 
@@ -98,6 +100,8 @@ const char *osm_routing_engine_type_str(IN osm_routing_engine_type_t type)
return "lash";
case OSM_ROUTING_ENGINE_TYPE_DOR:
return "dor";
+   case OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS:
+   return "torus-2QoS";
default:
break;
}
@@ -124,6 +128,8 @@ osm_routing_engine_type_t osm_routing_engine_type(IN const char *str)
return OSM_ROUTING_ENGINE_TYPE_LASH;
else if (!strcasecmp(str, "dor"))
return OSM_ROUTING_ENGINE_TYPE_DOR;
+   else if (!strcasecmp(str, "torus-2QoS"))
+   return OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS;
else
return OSM_ROUTING_ENGINE_TYPE_UNKNOWN;
 }
-- 
1.6.2.2




[PATCH v3 12/17] opensm: Add opensm option to specify file name for extra torus-2QoS configuration information.

2010-06-15 Thread Jim Schutt

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_base.h   |   18 ++
 opensm/include/opensm/osm_subnet.h |5 +
 opensm/opensm/main.c   |9 +
 opensm/opensm/osm_subnet.c |1 +
 opensm/opensm/osm_torus.c  |2 +-
 5 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h
index e0d6c66..fa4c78d 100644
--- a/opensm/include/opensm/osm_base.h
+++ b/opensm/include/opensm/osm_base.h
@@ -271,6 +271,24 @@ BEGIN_C_DECLS
 #endif
 /***/
 
+/d* OpenSM: Base/OSM_DEFAULT_TORUS_CONF_FILE
+* NAME
+*  OSM_DEFAULT_TORUS_CONF_FILE
+*
+* DESCRIPTION
+*  Specifies the default file name for extra torus-2QoS configuration
+*
+* SYNOPSIS
+*/
+#ifdef __WIN__
+#define OSM_DEFAULT_TORUS_CONF_FILE strcat(GetOsmCachePath(), "osm-torus-2QoS.conf")
+#elif defined(OPENSM_CONFIG_DIR)
+#define OSM_DEFAULT_TORUS_CONF_FILE OPENSM_CONFIG_DIR "/torus-2QoS.conf"
+#else
+#define OSM_DEFAULT_TORUS_CONF_FILE "/etc/opensm/torus-2QoS.conf"
+#endif /* __WIN__ */
+/***/
+
 /d* OpenSM: Base/OSM_DEFAULT_PREFIX_ROUTES_FILE
 * NAME
 *  OSM_DEFAULT_PREFIX_ROUTES_FILE
diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index 4fa0161..fa3e46e 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -204,6 +204,7 @@ typedef struct osm_subn_opt {
char *guid_routing_order_file;
char *sa_db_file;
boolean_t sa_db_dump;
+   char *torus_conf_file;
boolean_t do_mesh_analysis;
boolean_t exit_on_fatal;
boolean_t honor_guid2lid_file;
@@ -431,6 +432,10 @@ typedef struct osm_subn_opt {
 *  When TRUE causes OpenSM to dump SA DB at the end of every
 *  light sweep regardless the current verbosity level.
 *
+*  torus_conf_file
+*  Name of the file with extra configuration info for torus-2QoS
+*  routing engine.
+*
 *  exit_on_fatal
 *  If TRUE (default) - SM will exit on fatal subnet initialization
 *  issues.
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index abc3282..b0bc372 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -231,6 +231,10 @@ static void show_usage(void)
   "  Set the order port guids will be routed for the MinHop\n"
   "  and Up/Down routing algorithms to the guids provided in the\n"
   "  given file (one to a line)\n\n");
+   printf("--torus_config \n"
+  "  This option defines the file name for the extra configuration\n"
+  "  info needed for the torus-2QoS routing engine.   The default\n"
+  "  name is \'"OSM_DEFAULT_TORUS_CONF_FILE"\'\n\n");
printf("--once, -o\n"
   "  This option causes OpenSM to configure the subnet\n"
   "  once, then exit.  Ports remain in the ACTIVE state.\n\n");
@@ -615,6 +619,7 @@ int main(int argc, char *argv[])
{"sm_sl", 1, NULL, 7},
{"retries", 1, NULL, 8},
{"log_prefix", 1, NULL, 9},
+   {"torus_config", 1, NULL, 10},
{NULL, 0, NULL, 0}  /* Required at the end of the array */
};
 
@@ -1003,6 +1008,10 @@ int main(int argc, char *argv[])
SET_STR_OPT(opt.log_prefix, optarg);
printf("Log prefix = %s\n", opt.log_prefix);
break;
+   case 10:
+   SET_STR_OPT(opt.torus_conf_file, optarg);
+   printf("Torus-2QoS config file = %s\n", opt.torus_conf_file);
+   break;
case 'h':
case '?':
case ':':
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 8224b5f..bc34a0f 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -753,6 +753,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
p_opt->guid_routing_order_file = NULL;
p_opt->sa_db_file = NULL;
p_opt->sa_db_dump = FALSE;
+   p_opt->torus_conf_file = strdup(OSM_DEFAULT_TORUS_CONF_FILE);
p_opt->do_mesh_analysis = FALSE;
p_opt->exit_on_fatal = TRUE;
p_opt->enable_quirks = FALSE;
diff --git a/opensm/opensm/osm_torus.c b/opensm/opensm/osm_torus.c
index fe643f2..871a3f5 100644
--- a/opensm/opensm/osm_torus.c
+++ b/opensm/opensm/osm_torus.c
@@ -9049,7 +9049,7 @@ int torus_build_lfts(void *context)
torus->osm = ctx->osm;
fabric->osm = ctx->osm;
 
-   if (!parse_config(OPENSM_CONFIG_DIR "/opensm-torus.conf",
+   if (!parse_config(ctx->osm->subn.opt.torus_conf_file,
  fabric, torus))
goto out;
 
-- 
1.6.2.2



[PATCH v3 13/17] opensm: Do not require -Q option for torus-2QoS routing engine.

2010-06-15 Thread Jim Schutt
The torus-2QoS engine provides deadlock-free routing for a 2D/3D torus,
but requires that switch SL2VL maps be programmed.  Before this change,
"opensm -Q" was required for that to happen.

When a routing engine sets the struct osm_routing_engine:update_sl2vl
pointer, it is signalling its intent to participate in SL2VL map programming.
So, don't return early from osm_qos_setup() in that case; instead do everything
except attempt to read QoS configuration information.

For that to work properly, we also need to always set up the default QoS
config information, instead of only when QoS is requested via -Q.

With that in place, the -Q option now means the same thing to torus-2QoS that
it means to other routing engines: QoS configuration is requested.

Signed-off-by: Jim Schutt 
---
 opensm/opensm/osm_qos.c|7 +--
 opensm/opensm/osm_subnet.c |   18 +-
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index dadef29..6d2af55 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -290,7 +290,9 @@ int osm_qos_setup(osm_opensm_t * p_osm)
osm_node_t *p_node;
int ret = 0;
 
-   if (!p_osm->subn.opt.qos)
+   if (!(p_osm->subn.opt.qos ||
+ (p_osm->routing_engine_used &&
+  p_osm->routing_engine_used->update_sl2vl)))
return 0;
 
OSM_LOG_ENTER(&p_osm->log);
@@ -307,7 +309,8 @@ int osm_qos_setup(osm_opensm_t * p_osm)
cl_plock_excl_acquire(&p_osm->lock);
 
/* read QoS policy config file */
-   osm_qos_parse_policy_file(&p_osm->subn);
+   if (p_osm->subn.opt.qos)
+   osm_qos_parse_policy_file(&p_osm->subn);
 
p_tbl = &p_osm->subn.port_guid_tbl;
p_next = cl_qmap_head(p_tbl);
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index bc34a0f..f714af7 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -1051,6 +1051,8 @@ static void subn_verify_qos_set(osm_qos_options_t *set, const char *prefix,
 
 int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
 {
+   osm_qos_options_t dflt;
+
if (p_opts->lmc > 7) {
log_report(" Invalid Cached Option Value:lmc = %u:"
   "Using Default:%u\n", p_opts->lmc, OSM_DEFAULT_LMC);
@@ -1101,17 +1103,15 @@ int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
p_opts->console = OSM_DEFAULT_CONSOLE;
}
 
-   if (p_opts->qos) {
-   osm_qos_options_t dflt;
-
-   /* the default options in qos_options must be correct.
-* every other one need not be, b/c those will default
-* back to whatever is in qos_options.
-*/
 
-   subn_set_default_qos_options(&dflt);
+   /* the default options in qos_options must be correct.
+* every other one need not be, b/c those will default
+* back to whatever is in qos_options.
+*/
+   subn_set_default_qos_options(&dflt);
+   subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt);
 
-   subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt);
+   if (p_opts->qos) {
subn_verify_qos_set(&p_opts->qos_ca_options, "qos_ca",
&p_opts->qos_options);
subn_verify_qos_set(&p_opts->qos_sw0_options, "qos_sw0",
-- 
1.6.2.2




[PATCH v3 14/17] opensm: Make it possible to configure no fallback routing engine.

2010-06-15 Thread Jim Schutt
For a fabric that requires routing with an engine with special properties,
say avoiding credit loops via making use of SLs in routing, it might
be preferable to not fall back to minhop if the configured routing engine
fails.

E.g. the torus-2QoS routing engine uses both SL2VL maps and path SL values
to provide routing free of credit loops, but cannot route fabrics for
some patterns of failed switches.  Should a switch fail that creates such
a pattern, it may be preferable to keep the previous routing information
loaded in the switches until a switch can be replaced that restores
torus-2QoS's ability to route the fabric.

The alternative, having some other engine route the fabric, will immediately
introduce credit loops.

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_subnet.h |1 +
 opensm/opensm/osm_opensm.c |5 +
 opensm/opensm/osm_qos.c|6 ++
 opensm/opensm/osm_ucast_mgr.c  |   23 +++
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index fa3e46e..42ae416 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -219,6 +219,7 @@ typedef struct osm_subn_opt {
osm_qos_options_t qos_rtr_options;
boolean_t enable_quirks;
boolean_t no_clients_rereg;
+   boolean_t no_fallback_routing_engine;
 #ifdef ENABLE_OSM_PERF_MGR
boolean_t perfmgr;
boolean_t perfmgr_redir;
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index 8b03947..e296812 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -159,6 +159,11 @@ static struct osm_routing_engine *setup_routing_engine(osm_opensm_t *osm,
struct osm_routing_engine *re;
const struct routing_engine_module *m;
 
+   if (!strcmp(name, "no_fallback")) {
+   osm->subn.opt.no_fallback_routing_engine = TRUE;
+   return NULL;
+   }
+
for (m = routing_modules; m->name && *m->name; m++) {
if (!strcmp(m->name, name)) {
re = malloc(sizeof(struct osm_routing_engine));
diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index 6d2af55..dc6a8ff 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -211,6 +211,12 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t 
*node,
int ret = 0;
unsigned i, j;
 
+   /*
+* Do nothing unless the most recent routing attempt was successful.
+*/
+   if (!re)
+   return ret;
+
for (i = 1; i < num_ports; i++) {
p = osm_node_get_physp_ptr(node, i);
force_update = p->need_update || sm->p_subn->need_update;
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index 10629cb..d1c485f 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -1091,7 +1091,8 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
p_routing_eng = p_routing_eng->next;
}
 
-   if (!p_osm->routing_engine_used) {
+   if (!p_osm->routing_engine_used &&
+   p_osm->subn.opt.no_fallback_routing_engine != TRUE) {
/* If configured routing algorithm failed, use default MinHop */
struct osm_routing_engine *r = p_osm->default_routing_engine;
 
@@ -1101,14 +1102,20 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
osm_ucast_mgr_set_fwd_tables(p_mgr);
}
 
-   OSM_LOG(p_mgr->p_log, OSM_LOG_INFO,
-   "%s tables configured on all switches\n",
-   osm_routing_engine_type_str(p_osm->
-   routing_engine_used->type));
-
-   if (p_mgr->p_subn->opt.use_ucast_cache)
-   p_mgr->cache_valid = TRUE;
+   if (p_osm->routing_engine_used) {
+   OSM_LOG(p_mgr->p_log, OSM_LOG_INFO,
+   "%s tables configured on all switches\n",
+   osm_routing_engine_type_str(p_osm->
+   routing_engine_used->type));
 
+   if (p_mgr->p_subn->opt.use_ucast_cache)
+   p_mgr->cache_valid = TRUE;
+   } else {
+   p_mgr->p_subn->subnet_initialization_error = TRUE;
+   OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR,
+   "No routing engine able to successfully configure "
+   " switch tables on current fabric\n");
+   }
 Exit:
CL_PLOCK_RELEASE(p_mgr->p_lock);
OSM_LOG_EXIT(p_mgr->p_log);
-- 
1.6.2.2


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 06/17] opensm: Make mcast_mgr_purge_tree() available outside osm_mcast_mgr.c.

2010-06-15 Thread Jim Schutt
A routing engine that needs to compute multicast spanning trees with
special properties will need to delete old trees.  There's already
a function that does this: mcast_mgr_purge_tree().

Make it available outside osm_mcast_mgr.c, and change the name
to follow the naming convention (osm_ prefix) for global functions.

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_multicast.h |   33 +
 opensm/opensm/osm_mcast_mgr.c |4 ++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/opensm/include/opensm/osm_multicast.h 
b/opensm/include/opensm/osm_multicast.h
index 1da575d..df6ac6c 100644
--- a/opensm/include/opensm/osm_multicast.h
+++ b/opensm/include/opensm/osm_multicast.h
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
@@ -193,6 +194,38 @@ osm_mgrp_t *osm_mgrp_new(IN osm_subn_t * subn, IN 
ib_net16_t mlid,
 *  Multicast Group, osm_mgrp_delete
 */
 
+/*
+ * Need a forward declaration to work around include loop:
+ * osm_sm.h <- osm_multicast.h
+ */
+struct osm_sm;
+
+/****f* OpenSM: Multicast Tree/osm_purge_mtree
+* NAME
+*  osm_purge_mtree
+*
+* DESCRIPTION
+*  Frees all the nodes in a multicast spanning tree
+*
+* SYNOPSIS
+*/
+void osm_purge_mtree(IN struct osm_sm * sm, IN osm_mgrp_box_t * mgb);
+/*
+* PARAMETERS
+*  sm
+*  [in] Pointer to osm_sm_t object.
+*  mgb
+*  [in] Pointer to an osm_mgrp_box_t object.
+*
+* RETURN VALUES
+*  None.
+*
+*
+* NOTES
+*
+* SEE ALSO
+*/
+
 /****f* OpenSM: Multicast Group/osm_mgrp_is_guid
 * NAME
 *  osm_mgrp_is_guid
diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c
index bd67d4e..e6db6db 100644
--- a/opensm/opensm/osm_mcast_mgr.c
+++ b/opensm/opensm/osm_mcast_mgr.c
@@ -146,7 +146,7 @@ static void mcast_mgr_purge_tree_node(IN osm_mtree_node_t * 
p_mtn)
free(p_mtn);
 }
 
-static void mcast_mgr_purge_tree(osm_sm_t * sm, IN osm_mgrp_box_t * mbox)
+void osm_purge_mtree(osm_sm_t * sm, IN osm_mgrp_box_t * mbox)
 {
OSM_LOG_ENTER(sm->p_log);
 
@@ -735,7 +735,7 @@ static ib_api_status_t 
mcast_mgr_build_spanning_tree(osm_sm_t * sm,
   on multicast forwarding table information if the user wants to
   preserve existing multicast routes.
 */
-   mcast_mgr_purge_tree(sm, mbox);
+   osm_purge_mtree(sm, mbox);
 
/* build the first "subset" containing all member ports */
if (make_port_list(&port_list, mbox)) {
-- 
1.6.2.2




[PATCH v3 05/17] opensm: Add struct osm_routing_engine callback to build spanning trees for multicast.

2010-06-15 Thread Jim Schutt
If a routing engine needs to compute spanning trees with special
properties, it needs a way to override the default implementation.
A routing engine callback provides that mechanism.  Routing engines
that can use the default implementation can leave the callback
pointer set to NULL.
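
A minimal sketch of the optional-callback pattern, assuming simplified
types: struct engine, enum status, build_stree() and
recording_build_stree() are hypothetical names for illustration; only
the "NULL means use the default" rule is taken from this patch.

```c
#include <stddef.h>

/* Hypothetical status codes, for illustration only. */
enum status { ST_OK = 0, ST_ERR = 1 };

struct engine {
	void *context;
	/* Optional override; NULL means "use the default builder". */
	enum status (*build_stree)(void *context, int mlid);
};

/* Stand-in for the default spanning tree builder. */
static enum status default_build_stree(int mlid)
{
	(void)mlid;
	return ST_OK;
}

/* Dispatch to the engine's builder if it has one, else to the default. */
static enum status build_stree(const struct engine *re, int mlid)
{
	if (re && re->build_stree)
		return re->build_stree(re->context, mlid);
	return default_build_stree(mlid);
}

/* Example override that records the last MLID it was asked to build. */
static enum status recording_build_stree(void *context, int mlid)
{
	*(int *)context = mlid;
	return ST_OK;
}
```

Checking both the engine pointer and the callback pointer matters
because the engine in use can itself be NULL when no routing attempt
has succeeded.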

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_opensm.h |6 ++
 opensm/opensm/osm_mcast_mgr.c  |7 ++-
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h 
b/opensm/include/opensm/osm_opensm.h
index 734a6db..fddcf53 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -132,6 +132,8 @@ struct osm_routing_engine {
uint8_t (*path_sl)(void *context, IN uint8_t path_sl_hint,
   IN const osm_port_t *src_port,
   IN const osm_port_t *dst_port);
+   ib_api_status_t (*mcast_build_stree)(void *context,
+IN OUT osm_mgrp_box_t *mgb);
void (*delete) (void *context);
struct osm_routing_engine *next;
 };
@@ -165,6 +167,10 @@ struct osm_routing_engine {
 *  path_sl
 *  The callback for computing path SL.
 *
+*  mcast_build_stree
+*  The callback for building the spanning tree for multicast
+*  forwarding, called per MLID.
+*
 *  delete
 *  The delete method, may be used for routing engine
 *  internals cleanup.
diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c
index 322635d..bd67d4e 100644
--- a/opensm/opensm/osm_mcast_mgr.c
+++ b/opensm/opensm/osm_mcast_mgr.c
@@ -986,6 +986,7 @@ Exit:
 static ib_api_status_t mcast_mgr_process_mlid(osm_sm_t * sm, uint16_t mlid)
 {
ib_api_status_t status = IB_SUCCESS;
+   struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
osm_mgrp_box_t *mbox;
 
OSM_LOG_ENTER(sm->p_log);
@@ -1000,7 +1001,11 @@ static ib_api_status_t mcast_mgr_process_mlid(osm_sm_t * 
sm, uint16_t mlid)
 
mbox = osm_get_mbox_by_mlid(sm->p_subn, cl_hton16(mlid));
if (mbox) {
-   status = mcast_mgr_build_spanning_tree(sm, mbox);
+   if (re && re->mcast_build_stree)
+   status = re->mcast_build_stree(re->context, mbox);
+   else
+   status = mcast_mgr_build_spanning_tree(sm, mbox);
+
if (status != IB_SUCCESS)
OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0A17: "
"Unable to create spanning tree (%s) for mlid "
-- 
1.6.2.2




[PATCH v3 03/17] opensm: Allow the routing engine to participate in path SL calculations.

2010-06-15 Thread Jim Schutt
LASH already does this, in a hard-coded fashion.

Generalize this by adding a callback to struct osm_routing_engine that
computes a path SL value, and fix up LASH to use it.

This patchset causes the requested or QoS-computed SL value to be passed
to the routing engine path SL computation as a hint.  In the event the
routing engine's use of SLs allows it to support more than one QoS level,
it may be able to make use of the SL hint to do so.

For now, LASH just ignores the hint.

Note that before this change, if LASH was configured and a specific path
SL value was requested that differed from what LASH needed to route the
fabric without credit loops, the path SL lookup would fail.  Now LASH's
SL value is always used.

Possibly the choice between failing a path SL request when it conflicts
with routing, vs. always providing an SL value that gives a credit-loop-
free routing, should be user-configurable?
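
To make the hint semantics concrete, here is a small sketch under
simplified assumptions.  struct engine, get_path_sl() and fixed_sl()
are hypothetical; LIDs stand in for the osm_port_t pointers used by the
real callback.

```c
#include <stdint.h>
#include <stddef.h>

struct engine {
	void *context;
	/* Optional; returns the SL the engine needs for this path. */
	uint8_t (*path_sl)(void *context, uint8_t sl_hint,
			   unsigned src_lid, unsigned dst_lid);
};

/* Ask the engine for the path SL, passing the requested SL as a hint. */
static uint8_t get_path_sl(const struct engine *re, uint8_t requested_sl,
			   unsigned src_lid, unsigned dst_lid)
{
	if (re && re->path_sl)
		return re->path_sl(re->context, requested_sl, src_lid, dst_lid);
	return requested_sl;	/* no engine input: keep the requested SL */
}

/* LASH-like engine: ignores the hint and returns its own SL. */
static uint8_t fixed_sl(void *context, uint8_t sl_hint,
			unsigned src_lid, unsigned dst_lid)
{
	(void)sl_hint;
	(void)src_lid;
	(void)dst_lid;
	return *(const uint8_t *)context;
}
```

An engine that supports more than one QoS level could instead use
sl_hint to choose among several credit-loop-free SLs.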

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_opensm.h |6 +
 opensm/include/opensm/osm_ucast_lash.h |3 --
 opensm/opensm/osm_link_mgr.c   |   15 -
 opensm/opensm/osm_sa_path_record.c |   34 +++
 opensm/opensm/osm_ucast_lash.c |8 +-
 5 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h 
b/opensm/include/opensm/osm_opensm.h
index 25a6f90..734a6db 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -129,6 +129,9 @@ struct osm_routing_engine {
void (*update_sl2vl)(void *context, IN osm_physp_t *port,
 IN uint8_t in_port_num, IN uint8_t out_port_num,
 IN OUT ib_slvl_table_t *t);
+   uint8_t (*path_sl)(void *context, IN uint8_t path_sl_hint,
+  IN const osm_port_t *src_port,
+  IN const osm_port_t *dst_port);
void (*delete) (void *context);
struct osm_routing_engine *next;
 };
@@ -159,6 +162,9 @@ struct osm_routing_engine {
 *  which part of the SL2VL map to update.  For router/HCA ports,
 *  in_port_num/out_port_num should be ignored.
 *
+*  path_sl
+*  The callback for computing path SL.
+*
 *  delete
 *  The delete method, may be used for routing engine
 *  internals cleanup.
diff --git a/opensm/include/opensm/osm_ucast_lash.h 
b/opensm/include/opensm/osm_ucast_lash.h
index 9e15d38..dd90d5d 100644
--- a/opensm/include/opensm/osm_ucast_lash.h
+++ b/opensm/include/opensm/osm_ucast_lash.h
@@ -94,7 +94,4 @@ typedef struct _lash {
int ***virtual_location;
 } lash_t;
 
-uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, const osm_port_t * p_src_port,
-   const osm_port_t * p_dst_port);
-
 #endif
diff --git a/opensm/opensm/osm_link_mgr.c b/opensm/opensm/osm_link_mgr.c
index c309916..e446e16 100644
--- a/opensm/opensm/osm_link_mgr.c
+++ b/opensm/opensm/osm_link_mgr.c
@@ -53,21 +53,23 @@
 #include 
 #include 
 #include 
-#include 
 
 static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 {
osm_opensm_t *p_osm = sm->p_subn->p_osm;
+   struct osm_routing_engine *re = p_osm->routing_engine_used;
const osm_port_t *p_sm_port, *p_src_port;
ib_net16_t slid;
uint8_t sl;
 
OSM_LOG_ENTER(sm->p_log);
 
-   if (!(p_osm->routing_engine_used &&
- p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH 
&&
+   if (!(re && re->path_sl &&
  (slid = osm_physp_get_base_lid(p_physp)))) {
-   /* Use default SL if lash routing is not used */
+   /*
+* Use default SL if routing engine does not provide a
+* path SL lookup callback.
+*/
OSM_LOG_EXIT(sm->p_log);
return sm->p_subn->opt.sm_sl;
}
@@ -78,8 +80,9 @@ static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN 
osm_physp_t * p_physp)
/* Find osm_port of the source = p_physp */
p_src_port = osm_get_port_by_lid(sm->p_subn, slid);
 
-   /* Call lash to find proper SL */
-   sl = osm_get_lash_sl(p_osm, p_src_port, p_sm_port);
+   /* Call into routing engine to find proper SL */
+   sl = re->path_sl(re->context, sm->p_subn->opt.sm_sl,
+p_src_port, p_sm_port);
 
OSM_LOG_EXIT(sm->p_log);
return sl;
diff --git a/opensm/opensm/osm_sa_path_record.c 
b/opensm/opensm/osm_sa_path_record.c
index 093c70d..a323671 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -164,6 +164,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * 
sa,
const osm_physp_t *p_dest_physp;
const osm_prtn_t *p_prtn = NULL;
osm_opensm_t *p_osm;
+   struct osm_routing_engine *p_re;
const ib_port_info_t *p_pi;
ib_api_status_t status = IB_SUCCESS;
ib

[PATCH v3 01/17] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup.

2010-06-15 Thread Jim Schutt
In the event a routing engine needs to participate in SL assignment and
SL2VL map setup in order to avoid credit loops in a fabric, it will be
useful to make the routing engine context more widely available.

To this end, have osm_opensm_t save a pointer to the routing engine used,
rather than its type.  This will make the routing engine context easily
available in, e.g., sl2vl_update() and pr_rcv_get_path_parms().

Make the necessary adjustments to the code that used the old
routing_engine_used as an enum _osm_routing_engine_type.  In order to
keep the behavior where minhop was used if the configured routing engines
failed, the easiest solution was to add a pointer to osm_opensm_t which
pointed to the minhop struct osm_routing_engine.
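
A reduced sketch of the data structure change, using hypothetical names
(struct sm_state, engine_used_type()) rather than the real osm_opensm_t
fields.  It shows the main consequence for callers: the pointer can be
NULL, so code that used to compare an enum must check it first.

```c
#include <stddef.h>

enum engine_type { ENGINE_NONE, ENGINE_MINHOP, ENGINE_LASH };

struct engine {
	enum engine_type type;
	void *context;
};

struct sm_state {
	struct engine *engine_used;	/* NULL until a routing attempt succeeds */
	struct engine *default_engine;	/* always points at the minhop engine */
};

/* Callers that only need the type must now tolerate a NULL pointer. */
static enum engine_type engine_used_type(const struct sm_state *s)
{
	return s->engine_used ? s->engine_used->type : ENGINE_NONE;
}
```

Holding the pointer gives callers access to the engine's context and
callbacks, which an enum value alone cannot provide.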

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_opensm.h |4 ++-
 opensm/opensm/osm_console.c|   10 ++--
 opensm/opensm/osm_dump.c   |3 +-
 opensm/opensm/osm_link_mgr.c   |5 ++-
 opensm/opensm/osm_opensm.c |   43 +---
 opensm/opensm/osm_sa_path_record.c |3 +-
 opensm/opensm/osm_ucast_lash.c |3 +-
 opensm/opensm/osm_ucast_mgr.c  |   17 --
 8 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h 
b/opensm/include/opensm/osm_opensm.h
index c6c9bdb..e97142e 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -120,6 +120,7 @@ typedef enum _osm_routing_engine_type {
 *  added later.
 */
 struct osm_routing_engine {
+   osm_routing_engine_type_t type;
const char *name;
void *context;
int (*build_lid_matrices) (void *context);
@@ -183,7 +184,8 @@ typedef struct osm_opensm {
cl_dispatcher_t disp;
cl_plock_t lock;
struct osm_routing_engine *routing_engine_list;
-   osm_routing_engine_type_t routing_engine_used;
+   struct osm_routing_engine *routing_engine_used;
+   struct osm_routing_engine *default_routing_engine;
osm_stats_t stats;
osm_console_t console;
nn_map_t *node_name_map;
diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c
index bc7bea3..b99bb84 100644
--- a/opensm/opensm/osm_console.c
+++ b/opensm/opensm/osm_console.c
@@ -382,6 +382,8 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
cl_list_item_t *item;
 
if (out) {
+   const char *re_str;
+
cl_plock_acquire(&p_osm->lock);
fprintf(out, "   OpenSM Version   : %s\n", 
p_osm->osm_version);
fprintf(out, "   SM State : %s\n",
@@ -390,9 +392,11 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
p_osm->subn.opt.sm_priority);
fprintf(out, "   SA State : %s\n",
sa_state_str(p_osm->sa.state));
-   fprintf(out, "   Routing Engine   : %s\n",
-   osm_routing_engine_type_str(p_osm->
-   routing_engine_used));
+
+   re_str = p_osm->routing_engine_used ?
+   
osm_routing_engine_type_str(p_osm->routing_engine_used->type) :
+   
osm_routing_engine_type_str(OSM_ROUTING_ENGINE_TYPE_NONE);
+   fprintf(out, "   Routing Engine   : %s\n", re_str);
 
fprintf(out, "   Loaded event plugins :");
if (cl_qlist_head(&p_osm->plugin_list) ==
diff --git a/opensm/opensm/osm_dump.c b/opensm/opensm/osm_dump.c
index fe2c3bc..bfff1a0 100644
--- a/opensm/opensm/osm_dump.c
+++ b/opensm/opensm/osm_dump.c
@@ -135,7 +135,8 @@ static void dump_ucast_routes(cl_map_item_t * item, FILE * 
file, void *cxt)
"Switch 0x%016" PRIx64 "\nLID: Port : Hops : Optimal\n",
cl_ntoh64(osm_node_get_node_guid(p_node)));
 
-   dor = (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_DOR);
+   dor = (p_osm->routing_engine_used &&
+  p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_DOR);
 
for (lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++) {
fprintf(file, "0x%04X : ", lid_ho);
diff --git a/opensm/opensm/osm_link_mgr.c b/opensm/opensm/osm_link_mgr.c
index e6c9b3b..c309916 100644
--- a/opensm/opensm/osm_link_mgr.c
+++ b/opensm/opensm/osm_link_mgr.c
@@ -64,8 +64,9 @@ static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN 
osm_physp_t * p_physp)
 
OSM_LOG_ENTER(sm->p_log);
 
-   if (p_osm->routing_engine_used != OSM_ROUTING_ENGINE_TYPE_LASH
-   || !(slid = osm_physp_get_base_lid(p_physp))) {
+   if (!(p_osm->routing_engine_used &&
+ p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH 
&&
+ (slid = osm_physp_get_base_lid(p_physp)))) {
/* Use default SL if lash routing is not used */
OSM_LOG_EXIT(sm->p_log);
return sm->p_subn

[PATCH v3 00/17] opensm: Add new torus routing engine: torus-2QoS

2010-06-15 Thread Jim Schutt
This is v3 of a patchset to add to opensm a new routing engine designed
to handle large fabrics connected with a 2D/3D torus topology.

Changes since v2:

- Rebased to a3dec3a87a.
- Divide "Add torus-2QoS routing engine" patch into three parts
   to avoid rejection by mailing list.
- Bug fix: reduce number of required seed links for a torus
   with one or more dimensions of radix four.
- Bug fix: don't let torus-2QoS be fooled into thinking it can route
   a torus with two or more blocks of switches adjacent in z missing.
- Bug fix: if osm_ucast_mgr_process() fails, no configured routing engine
   could route the fabric, so wait for a trap or sweep interval before
   next heavy sweep.
- Bug fix: cut-n-paste error in handle_case_0x731().

Changes since initial version:

- Merged my patchsets from 11/20/2009, 12/18/2009, 2/16/2010.
- Moved information contained in the earlier patch series introduction
emails into the appropriate commit messages.
- Rebased to c183eb8c4c.
- Addressed issues found by Yevgeny Kliteynik in original patchsets.
Yevgeny's --no_default_routing option patch is not included
in the merging, but would be a good addition.
- Renamed osm_ucast_torus.c to osm_torus.c.
Since osm_torus.c contains code to implement both unicast and
multicast routing, the new name seems more appropriate.  The
multicast support depends heavily on the unicast routing code,
so it is more convenient to keep everything in one file.
- Removed redundant check for changed sl2vl map.
This functionality already exists in sl2vl_update_table().
- Set sl2vl maps on CA ports for torus-2QoS.
This was missing in the original patches.
- Do not force torus-2QoS to use SLs 8-15 when not using "opensm -Q".
This was an interim measure introduced before multicast support was
working, that allowed multicast to use SL/VL 0 and thus not deadlock
against unicast.  I forgot to take it out in the multicast patchset,
so I took it out when I merged.
- Renamed torus variables referencing "origin" to "seed".
These things refer to switches used to seed the torus topology
appropriately, so the new name should reduce confusion going forward.
This also contains a keyword change in the torus configuration file,
so I'll repost an updated example.


Jim Schutt (17):
  opensm: Prepare for routing engine input to path record SL lookup and
SL2VL map setup.
  opensm: Allow the routing engine to influence SL2VL calculations.
  opensm: Allow the routing engine to participate in path SL
calculations.
  opensm: Track the minimum value in the fabric of data VLs supported.
  opensm: Add struct osm_routing_engine callback to build spanning
trees for multicast.
  opensm: Make mcast_mgr_purge_tree() available outside
osm_mcast_mgr.c.
  opensm: Add torus-2QoS routing engine, part 1.
  opensm: Add torus-2QoS routing engine, part 2.
  opensm: Add torus-2QoS routing engine, part 3.
  opensm: Update documentation to describe torus-2QoS.
  opensm: Enable torus-2QoS routing engine.
  opensm: Add opensm option to specify file name for extra torus-2QoS
configuration information.
  opensm: Do not require -Q option for torus-2QoS routing engine.
  opensm: Make it possible to configure no fallback routing engine.
  opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of
osm_port_t:priv.
  opensm: Avoid havoc in dump_ucast_routes() caused by torus-2QoS
persistent use of osm_port_t:priv.
  opensm: Cause status of unicast routing attempt to propogate to
callers of osm_ucast_mgr_process().

 opensm/doc/current-routing.txt |  269 +-
 opensm/include/opensm/osm_base.h   |   18 +
 opensm/include/opensm/osm_multicast.h  |   33 +
 opensm/include/opensm/osm_opensm.h |   29 +-
 opensm/include/opensm/osm_subnet.h |7 +
 opensm/include/opensm/osm_switch.h |   12 +
 opensm/include/opensm/osm_ucast_lash.h |3 -
 opensm/man/opensm.8.in |9 +-
 opensm/opensm/Makefile.am  |2 +-
 opensm/opensm/main.c   |   11 +-
 opensm/opensm/osm_console.c|   10 +-
 opensm/opensm/osm_dump.c   |5 +-
 opensm/opensm/osm_link_mgr.c   |   16 +-
 opensm/opensm/osm_mcast_mgr.c  |   11 +-
 opensm/opensm/osm_opensm.c |   54 +-
 opensm/opensm/osm_port_info_rcv.c  |   13 +-
 opensm/opensm/osm_qos.c|   40 +-
 opensm/opensm/osm_sa_path_record.c |   33 +-
 opensm/opensm/osm_state_mgr.c  |   23 +-
 opensm/opensm/osm_subnet.c |   20 +-
 opensm/opensm/osm_switch.c |7 +-
 opensm/opensm/osm_torus.c  | 9120 
 opensm/opensm/osm_ucast_lash.c |   11 +-
 opensm/opensm/osm_ucast_mgr.c  |   55 +-
 24 files changed, 9702 insertions(+), 109 deletions(-)
 create mode 100644 opensm/opensm/osm_torus.c



[PATCH v3 02/17] opensm: Allow the routing engine to influence SL2VL calculations.

2010-06-15 Thread Jim Schutt
Note that the original code assumes that QoS setup is mostly static and
based only on user configuration.  As a result, there is no provision for
routing engines that want to compute contributions to the SL2VL maps.

Fix this up by adding a callback to struct osm_routing_engine that computes
a per-port SL2VL map, and call it from the appropriate place in the QoS
setup path.  Assume that if a routing engine provides an update_sl2vl()
callback, there will be input-port dependence in the SL2VL maps, and
so do not attempt to use optimized SL2VL map programming even if the
switch supports it.

Also need to move the call to osm_qos_setup() in do_sweep() to after the
call to the routing engine, so that any SL2VL map contributions from the
routing engine are based on the latest information.  Need to call
osm_qos_setup() for requested reroute for the same reason.
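
The per-port flow can be sketched as follows.  This is a simplified
illustration: struct sl2vl, port_sl2vl() and bump_on_turn() are
hypothetical, and the "raise the VL when the port changes" rule is an
arbitrary example, not what any real engine does.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_SL 16

struct sl2vl {
	uint8_t vl[NUM_SL];
};

struct engine {
	void *context;
	void (*update_sl2vl)(void *context, unsigned in_port,
			     unsigned out_port, struct sl2vl *t);
};

/*
 * Return the table to program for (in_port, out_port): the configured
 * table itself, or a scratch copy adjusted by the engine.
 */
static const struct sl2vl *port_sl2vl(const struct engine *re,
				      const struct sl2vl *configured,
				      struct sl2vl *scratch,
				      unsigned in_port, unsigned out_port)
{
	if (!re || !re->update_sl2vl)
		return configured;

	*scratch = *configured;	/* never modify the user's configuration */
	re->update_sl2vl(re->context, in_port, out_port, scratch);
	return scratch;
}

/* Example engine: raise every VL by one when the packet changes ports. */
static void bump_on_turn(void *context, unsigned in_port,
			 unsigned out_port, struct sl2vl *t)
{
	(void)context;
	if (in_port != out_port)
		for (int sl = 0; sl < NUM_SL; sl++)
			t->vl[sl] = (uint8_t)(t->vl[sl] + 1);
}
```

Because the result can differ for every (in_port, out_port) pair, a
single table programmed for all ports at once is no longer sufficient,
which is why optimized SL2VL programming is skipped when the callback
is present.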

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_opensm.h |   12 
 opensm/opensm/osm_qos.c|   27 +++
 opensm/opensm/osm_state_mgr.c  |5 +++--
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h 
b/opensm/include/opensm/osm_opensm.h
index e97142e..25a6f90 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -126,6 +126,9 @@ struct osm_routing_engine {
int (*build_lid_matrices) (void *context);
int (*ucast_build_fwd_tables) (void *context);
void (*ucast_dump_tables) (void *context);
+   void (*update_sl2vl)(void *context, IN osm_physp_t *port,
+IN uint8_t in_port_num, IN uint8_t out_port_num,
+IN OUT ib_slvl_table_t *t);
void (*delete) (void *context);
struct osm_routing_engine *next;
 };
@@ -147,6 +150,15 @@ struct osm_routing_engine {
 *  ucast_dump_tables
 *  The callback for dumping unicast routing tables.
 *
+*  update_sl2vl(void *context, IN osm_physp_t *port,
+*   IN uint8_t in_port_num, IN uint8_t out_port_num,
+*   OUT ib_slvl_table_t *t)
+*  The callback to allow routing engine input for SL2VL maps.
+*  *port is the phyical port for which the SL2VL map is to be
+*  updated. For switches, in_port_num/out_port_num identify
+*  which part of the SL2VL map to update.  For router/HCA ports,
+*  in_port_num/out_port_num should be ignored.
+*
 *  delete
 *  The delete method, may be used for routing engine
 *  internals cleanup.
diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index cce59ee..dadef29 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -207,6 +207,7 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t 
*node,
osm_physp_t *p0, *p;
unsigned force_update;
unsigned num_ports = osm_node_get_num_physp(node);
+   struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
int ret = 0;
unsigned i, j;
 
@@ -223,7 +224,7 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t 
*node,
return ret;
 
if (ib_switch_info_get_opt_sl2vlmapping(&node->sw->switch_info) &&
-   sm->p_subn->opt.use_optimized_slvl) {
+   sm->p_subn->opt.use_optimized_slvl && !re->update_sl2vl) {
p = osm_node_get_physp_ptr(node, 1);
force_update = p->need_update || sm->p_subn->need_update;
return sl2vl_update_table(sm, p, 1, 0x3, force_update,
@@ -234,10 +235,20 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t 
*node,
p = osm_node_get_physp_ptr(node, i);
force_update = p->need_update || sm->p_subn->need_update;
j = ib_switch_info_is_enhanced_port0(&node->sw->switch_info) ? 
0 : 1;
-   for (; j < num_ports; j++)
+   for (; j < num_ports; j++) {
+   const ib_slvl_table_t *port_sl2vl = &qcfg->sl2vl;
+   ib_slvl_table_t routing_sl2vl;
+
+   if (re->update_sl2vl) {
+   routing_sl2vl = *port_sl2vl;
+   re->update_sl2vl(re->context,
+p, i, j, &routing_sl2vl);
+   port_sl2vl = &routing_sl2vl;
+   }
if (sl2vl_update_table(sm, p, i, i << 8 | j,
-  force_update, &qcfg->sl2vl))
+  force_update, port_sl2vl))
ret = -1;
+   }
}
 
return ret;
@@ -247,6 +258,9 @@ static int qos_endport_setup(osm_sm_t * sm, osm_physp_t * p,
 const struct qos_config *qcfg)
 {
unsigned force_update = p->need_update || sm->p_subn->need_update;
+   str

[PATCH v3 04/17] opensm: Track the minimum value in the fabric of data VLs supported.

2010-06-15 Thread Jim Schutt
A routing engine that wants to make contributions to SL2VL maps in support
of routing free from credit loops may need to know the minimum number
of supported data VLs in the fabric.

This code tracks that value.
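
As a worked example of the conversion used below: the PortInfo
OperationalVLs field is an encoding (1 = VL0, 2 = VL0-1, 3 = VL0-3,
4 = VL0-7, 5 = VL0-14), so the data VL count is 1 << (op_vls - 1),
capped at 15.  The sketch uses hypothetical helper names, and
MAX_NUM_VLS is assumed to match IB_MAX_NUM_VLS (16).

```c
#include <stdint.h>

#define MAX_NUM_VLS 16	/* assumed equal to IB_MAX_NUM_VLS; VL15 is management */

/*
 * Convert the OperationalVLs encoding to a data VL count:
 * 1 -> 1, 2 -> 2, 3 -> 4, 4 -> 8, 5 -> 15.
 * Precondition: op_vls is in the range 1..5.
 */
static unsigned data_vls_from_op_vls(unsigned op_vls)
{
	unsigned n = 1U << (op_vls - 1);

	if (n >= MAX_NUM_VLS)
		n = MAX_NUM_VLS - 1;
	return n;
}

/* Fold one port's value into the running fabric-wide minimum. */
static uint8_t track_min_data_vls(uint8_t current_min, unsigned op_vls)
{
	unsigned n = data_vls_from_op_vls(op_vls);

	return n < current_min ? (uint8_t)n : current_min;
}
```

Starting each sweep from MAX_NUM_VLS - 1 and folding in every port
lets the minimum rise again once a limiting port leaves the fabric.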

Signed-off-by: Jim Schutt 
---
 opensm/include/opensm/osm_subnet.h |1 +
 opensm/opensm/osm_port_info_rcv.c  |   13 -
 opensm/opensm/osm_state_mgr.c  |6 ++
 opensm/opensm/osm_subnet.c |1 +
 4 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h 
b/opensm/include/opensm/osm_subnet.h
index 95a635c..4fa0161 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -536,6 +536,7 @@ typedef struct osm_subn {
uint16_t max_mcast_lid_ho;
uint8_t min_ca_mtu;
uint8_t min_ca_rate;
+   uint8_t min_data_vls;
boolean_t ignore_existing_lfts;
boolean_t subnet_initialization_error;
boolean_t force_heavy_sweep;
diff --git a/opensm/opensm/osm_port_info_rcv.c 
b/opensm/opensm/osm_port_info_rcv.c
index 9260047..c05301e 100644
--- a/opensm/opensm/osm_port_info_rcv.c
+++ b/opensm/opensm/osm_port_info_rcv.c
@@ -83,6 +83,7 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN 
osm_physp_t * p_physp,
ib_api_status_t status;
ib_net64_t port_guid;
uint8_t rate, mtu;
+   unsigned data_vls;
cl_qmap_t *p_sm_tbl;
osm_remote_sm_t *p_sm;
 
@@ -92,7 +93,7 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN 
osm_physp_t * p_physp,
 
/* HACK extended port 0 should be handled too! */
if (osm_physp_get_port_num(p_physp) != 0) {
-   /* track the minimal endport MTU and rate */
+   /* track the minimal endport MTU, rate, and operational VLs */
mtu = ib_port_info_get_mtu_cap(p_pi);
if (mtu < sm->p_subn->min_ca_mtu) {
OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
@@ -108,6 +109,16 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN 
osm_physp_t * p_physp,
PRIx64 "\n", rate, cl_ntoh64(port_guid));
sm->p_subn->min_ca_rate = rate;
}
+
+   data_vls = 1U << (ib_port_info_get_op_vls(p_pi) - 1);
+   if (data_vls >= IB_MAX_NUM_VLS)
+   data_vls = IB_MAX_NUM_VLS - 1;
+   if ((uint8_t)data_vls < sm->p_subn->min_data_vls) {
+   OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
+   "Setting endport minimal data VLs to:%u defined 
by port:0x%"
+   PRIx64 "\n", data_vls, cl_ntoh64(port_guid));
+   sm->p_subn->min_data_vls = data_vls;
+   }
}
 
if (port_guid != sm->p_subn->sm_port_guid) {
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index cdd72c1..762bb27 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1164,6 +1164,12 @@ repeat_discovery:
sm->p_subn->force_reroute = FALSE;
sm->p_subn->subnet_initialization_error = FALSE;
 
+   /* Reset tracking values in case limiting component got removed
+* from fabric. */
+   sm->p_subn->min_ca_mtu = IB_MAX_MTU;
+   sm->p_subn->min_ca_rate = IB_MAX_RATE;
+   sm->p_subn->min_data_vls = IB_MAX_NUM_VLS - 1;
+
/* rescan configuration updates */
if (!config_parsed && osm_subn_rescan_conf_files(sm->p_subn) < 0)
OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 331A: "
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index d5c5ab2..8224b5f 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -529,6 +529,7 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN 
osm_opensm_t * p_osm,
p_subn->max_mcast_lid_ho = IB_LID_MCAST_END_HO;
p_subn->min_ca_mtu = IB_MAX_MTU;
p_subn->min_ca_rate = IB_MAX_RATE;
+   p_subn->min_data_vls = IB_MAX_NUM_VLS - 1;
p_subn->ignore_existing_lfts = TRUE;
 
/* we assume master by default - so we only need to set it true if 
STANDBY */
-- 
1.6.2.2




Re: Mellanox implementation for atomic operations

2010-06-15 Thread Dotan Barak
Hi.

On 10/05/2010 08:42, lihaidong wrote:
> Hi,
>I have a question about atomic operations.
>According to IB specification o10-48, all atomic operation request made to 
> the same HCA, referencing the same physical memory are serialized with 
> respect to each other.  I know this should be complied with if HCA supports 
> atomic operations, right?
>   
Right.
>According to  IB specification o10-49, all atomic operations requests that 
> referencing the same physical memory are serialized with respect to each 
> other. This means that atomic operations performed by processors should 
> serialized with atomic operations performed by HCAs, too, if they were 
> referencing the same physical memory.
>   
So far so good.
>   I want to know whether Mellanox implementation for atomic operations comply 
> with o10-49 or not.
>   if not ,to what extent it comply with the rule? 
>   I also was intrested in how this rule is complied with by others vendors?
>
>   
Let me give you a general answer:
The struct ibv_device_attr contains the atomic_cap attribute. This
attribute defines the atomicity level that the HCA supports (none,
only within the HCA, or between all HCAs (global)).

I think that your code should check this attribute
(this way your code will support all vendors' HCAs).

As far as I know, atomic operations are only supported within one HCA.
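
A rough sketch of such a check.  In a real program the value would come
from ibv_query_device(), which fills struct ibv_device_attr, and the
possible values are IBV_ATOMIC_NONE, IBV_ATOMIC_HCA and IBV_ATOMIC_GLOB.
To keep the example compilable without libibverbs, it uses a local
stand-in enum and hypothetical helper names.

```c
/*
 * Local stand-in mirroring libibverbs' enum ibv_atomic_cap, so that
 * this sketch builds without the library installed.
 */
enum atomic_cap { ATOMIC_NONE, ATOMIC_HCA, ATOMIC_GLOB };

/* Nonzero if the HCA supports atomic operations at all. */
static int atomics_supported(enum atomic_cap cap)
{
	return cap != ATOMIC_NONE;
}

/*
 * Nonzero if atomics are serialized against other agents (CPUs, other
 * HCAs) that access the same memory, not only within this HCA.
 */
static int atomics_are_global(enum atomic_cap cap)
{
	return cap == ATOMIC_GLOB;
}
```

Code that relies on CPU and HCA atomics being serialized against each
other should proceed only when the global level is reported.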

I hope that this answer helped you ..
Dotan



[PATCH v2] ofa_kernel/infiniband node description patch

2010-06-15 Thread Mike Heinz
Currently, the node description of an HCA is set to a description of the HCA 
hardware or, at boot time, to a brief string containing the hostname of the 
node the HCA is installed in.

The problem is that if the host's DHCP server is slow, the node description may 
be set before the hostname, resulting in an entire fabric of nodes called 
"localhost".

This fix adds a small parsing function to the core infiniband code and a hook 
in each of the HCA drivers so that, at the time the HCA is actually queried for 
its node description, the description is scanned for an '@' character which is 
then replaced with the utsname of the node. This ensures that even if the 
hostname is initially set incorrectly, if it later changes the HCA will report 
the updated information.

In addition, the initialization code for HCA drivers that preset the node_desc 
has been patched to include an '@' character at the beginning of the 
description. This eliminates the need for a special initialization script - 
although existing scripts are still supported.

This updated patch incorporates feedback from Jason Gunthorpe and Or Gerlitz.

Signed-Off-By: Michael Heinz 

---

Testing on Mellanox HCA, case 1 (default):

r...@bart:~# cat /sys/class/infiniband/mthca0/node_desc
@:MT25218 InfiniHostEx Mellanox Technologies

[r...@panic ~]# smpquery ND 6
Node Description: bart:MT25218 InfiniHostEx Mellanox Technologies


Testing on Mellanox HCA, case 2 - over 64 characters long:

r...@bart:~# echo "0123456789112345678921234567...@234567894123456789512345678961234567897" >/sys/class/infiniband/mthca0/node_desc
r...@bart:~# cat /sys/class/infiniband/mthca0/node_desc 
0123456789112345678921234567...@23456789412345678951234567896123

[r...@panic sbin]# smpquery ND 6
Node Description:.0123456789112345678921234567893bart2345678941234567895123456789


Testing on Mellanox HCA, case 3 - short:

r...@bart:~# echo "@" >/sys/class/infiniband/mthca0/node_desc

[r...@panic sbin]# smpquery ND 6
Node Description:...bart

--

Testing with QIB HCA:

[r...@node-b2 ~]# cat /sys/class/infiniband/qib0/node_desc
@:QLogic kernel.org driver

[r...@node-a1 ~]# smpquery ND 0x140
Node Description:.node-b2:QLogic kernel.org driver


[r...@node-b2 1]# cat /sys/class/infiniband/qib0/node_desc
@

[r...@node-a1 ~]# smpquery ND 0x140
Node Description:.node-b2


[r...@node-b2 ~]# echo "0123456789112345678921234567...@234567894123456789512345678961234567897" >/sys/class/infiniband/qib0/node_desc

[r...@node-a1 ~]# smpquery ND 0x140
Node Description:.0123456789112345678921234567893node-b22345678941234567895123456


---


diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index ef1304f..bdf1cfa 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -41,6 +41,7 @@
 #include "mad_rmpp.h"
 #include "smi.h"
 #include "agent.h"
+#include "linux/utsname.h"
 
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_DESCRIPTION("kernel IB MAD API");
@@ -932,6 +933,29 @@ int ib_get_mad_data_offset(u8 mgmt_class)
 }
 EXPORT_SYMBOL(ib_get_mad_data_offset);
 
+#define NODE_DESC_FIELD_LENGTH 64
+void ib_build_node_desc(char *dest, char *src)
+{
+   int i;
+   for (i=0; inodename;
+   for (; *name && *name != '.' && 
iattr_mod)
smp->status |= IB_SMP_INVALID_FIELD;
 
-   memcpy(smp->data, ibdev->node_desc, sizeof(smp->data));
+   ib_build_node_desc((char*)smp->data, ibdev->node_desc);
 
return reply(smp);
 }
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index dd7f26d..db8b719 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -2180,7 +2180,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd)
dev->dma_ops = &ipath_dma_mapping_ops;
 
snprintf(dev->node_desc, sizeof(dev->node_desc),
-IPATH_IDSTR " %s", init_utsname()->nodename);
+"@:" IPATH_IDSTR);
 
ret = ib_register_device(dev, NULL);
if (ret)
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index f38d5b1..d83398f 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -196,7 +196,7 @@ static void node_desc_override(struct ib_device *dev,
mad->mad_hdr.method == IB_MGMT_METHOD_GET_RESP &&
mad->mad_hdr.attr_id == IB_SMP_ATTR_NODE_DESC) {
spin_lock(&to_mdev(dev)->sm_lock);
-   memcpy(((struct ib_smp *) mad)->data, dev->node_desc, 64);
+   ib_build_node_desc((char*)((struct ib_smp *) mad)->data, dev->node_desc);
spin_unlock(&to_mdev(dev)->sm_lock);
}
 }
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 4e94e36..67e317f 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@

Re: [PATCH v2] opensm: Modify connect_roots to allow connectivity between all switches

2010-06-15 Thread Eli Dorfman (Voltaire)
Yevgeny Kliteynik wrote:
> On 15-Jun-10 10:09 AM, Eli Dorfman (Voltaire) wrote:
>> Yevgeny Kliteynik wrote:
>>> Eli,
>>>
>>> On 13-Jun-10 6:10 PM, Eli Dorfman (Voltaire) wrote:
 After a second thought and in order not to break current configuration,
 I send this modified patch that does not change connect_roots option
 but
 changes its functionality in up-down (I think that in fat-tree it is
 already implemented)

 Modify connect_roots option to allow connectivity between
 all switches in up-down routing algorithm and in this way be
 fully IBA compliant

 Signed-off-by: Eli Dorfman
 ---
opensm/man/opensm.8.in |2 +-
opensm/opensm/main.c   |2 +-
opensm/opensm/osm_ucast_updn.c |4 +---
3 files changed, 3 insertions(+), 5 deletions(-)

 diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in
 index 9053611..c67126e 100644
 --- a/opensm/man/opensm.8.in
 +++ b/opensm/man/opensm.8.in
 @@ -174,7 +174,7 @@ the host comes back online.
.TP
\fB\-z\fR, \fB\-\-connect_roots\fR
This option enforces routing engines (up/down and
 -fat-tree) to make connectivity between root switches and in
 +fat-tree) to make connectivity between all switches and in
this way to be fully IBA complaint. In many cases this can
violate "pure" deadlock free algorithm, so use it carefully.
.TP
 diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
 index 0093aa7..82ca78f 100644
 --- a/opensm/opensm/main.c
 +++ b/opensm/opensm/main.c
 @@ -187,7 +187,7 @@ static void show_usage(void)
   "  Sets the SL to use to communicate with the
 SM/SA. Defaults to 0.\n\n");
printf("--connect_roots, -z\n"
   "  This option enforces routing engines (up/down
 and \n"
 -   "  fat-tree) to make connectivity between root
 switches\n"
 +   "  fat-tree) to make connectivity between all
 switches\n"
   "  and in this way be IBA compliant. In many
 cases,\n"
   "  this can violate \"pure\" deadlock free
 algorithm, so\n"
   "  use it carefully.\n\n");
 diff --git a/opensm/opensm/osm_ucast_updn.c
 b/opensm/opensm/osm_ucast_updn.c
 index 164c6f4..f44ca24 100644
 --- a/opensm/opensm/osm_ucast_updn.c
 +++ b/opensm/opensm/osm_ucast_updn.c
 @@ -314,9 +314,7 @@ static int updn_set_min_hop_table(IN updn_t *
 p_updn)
 item = cl_qmap_next(item)) {
p_sw = (osm_switch_t *)item;
/* Clear Min Hop Table */
 -if (p_subn->opt.connect_roots)
 -updn_clear_non_root_hops(p_updn, p_sw);
 -else
 +if (!p_subn->opt.connect_roots)
osm_switch_clear_hops(p_sw);
}

>>>
>>> What kind of testing was done for this?
>>> I have a strong feeling that it will break up/down.
>>> If the connect_roots option is on, you will not clear
>>> the lid matrix at all, and it will contain also the
>>> down/up routes.
>>
>> that is the idea.
>> instead of leaving only up/down routes we want to keep routes between
>> all switches.
>> the routes between host nodes will still be up/down.
> 
> Lid matrix might contain down/up routes between many
> kinds of switches, not only between roots. This means
> that you might have down/up paths between leaf switches,
> even though you could have found up/down path for them.
> This also means that you might end up with down/up routes
> between HCAs that are connected to these leafs.

ok I'll check this again.

Thanks,
Eli

> 
> -- Yevgeny
> 
>> Eli
>>
>>>
>>> -- Yevgeny
>>
>>
> 



[PATCH] opensm/osmtest.c: fix bug in getting attr offset

2010-06-15 Thread Yevgeny Kliteynik
Fix bug that was introduced by commit 4fd4ca306f93376963725285f3bf7c87a76055b0

Signed-off-by: Yevgeny Kliteynik 
---
 opensm/osmtest/osmtest.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/opensm/osmtest/osmtest.c b/opensm/osmtest/osmtest.c
index abdfa7e..7412ce7 100644
--- a/opensm/osmtest/osmtest.c
+++ b/opensm/osmtest/osmtest.c
@@ -563,7 +563,7 @@ osmtest_get_all_recs(IN osmtest_t * const p_osmt,

p_context->p_osmt = p_osmt;
user.attr_id = attr_id;
-   user.attr_offset = ib_get_attr_offset((uint16_t) (attr_size >> 3));
+   user.attr_offset = ib_get_attr_offset((uint16_t) attr_size);

req.query_type = OSMV_QUERY_USER_DEFINED;
req.timeout_ms = p_osmt->opt.transaction_timeout;
-- 
1.5.1.4



Re: [PATCH v2] opensm: Modify connect_roots to allow connectivity between all switches

2010-06-15 Thread Yevgeny Kliteynik

On 15-Jun-10 10:09 AM, Eli Dorfman (Voltaire) wrote:

Yevgeny Kliteynik wrote:

Eli,

On 13-Jun-10 6:10 PM, Eli Dorfman (Voltaire) wrote:

After a second thought and in order not to break current configuration,
I send this modified patch that does not change connect_roots option but
changes its functionality in up-down (I think that in fat-tree it is
already implemented)

Modify connect_roots option to allow connectivity between
all switches in up-down routing algorithm and in this way be
fully IBA compliant

Signed-off-by: Eli Dorfman
---
   opensm/man/opensm.8.in |2 +-
   opensm/opensm/main.c   |2 +-
   opensm/opensm/osm_ucast_updn.c |4 +---
   3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in
index 9053611..c67126e 100644
--- a/opensm/man/opensm.8.in
+++ b/opensm/man/opensm.8.in
@@ -174,7 +174,7 @@ the host comes back online.
   .TP
   \fB\-z\fR, \fB\-\-connect_roots\fR
   This option enforces routing engines (up/down and
-fat-tree) to make connectivity between root switches and in
+fat-tree) to make connectivity between all switches and in
   this way to be fully IBA complaint. In many cases this can
   violate "pure" deadlock free algorithm, so use it carefully.
   .TP
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 0093aa7..82ca78f 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -187,7 +187,7 @@ static void show_usage(void)
  "  Sets the SL to use to communicate with the
SM/SA. Defaults to 0.\n\n");
   printf("--connect_roots, -z\n"
  "  This option enforces routing engines (up/down
and \n"
-   "  fat-tree) to make connectivity between root
switches\n"
+   "  fat-tree) to make connectivity between all
switches\n"
  "  and in this way be IBA compliant. In many
cases,\n"
  "  this can violate \"pure\" deadlock free
algorithm, so\n"
  "  use it carefully.\n\n");
diff --git a/opensm/opensm/osm_ucast_updn.c
b/opensm/opensm/osm_ucast_updn.c
index 164c6f4..f44ca24 100644
--- a/opensm/opensm/osm_ucast_updn.c
+++ b/opensm/opensm/osm_ucast_updn.c
@@ -314,9 +314,7 @@ static int updn_set_min_hop_table(IN updn_t * p_updn)
item = cl_qmap_next(item)) {
   p_sw = (osm_switch_t *)item;
   /* Clear Min Hop Table */
-if (p_subn->opt.connect_roots)
-updn_clear_non_root_hops(p_updn, p_sw);
-else
+if (!p_subn->opt.connect_roots)
   osm_switch_clear_hops(p_sw);
   }



What kind of testing was done for this?
I have a strong feeling that it will break up/down.
If the connect_roots option is on, you will not clear
the lid matrix at all, and it will contain also the
down/up routes.


that is the idea.
instead of leaving only up/down routes we want to keep routes between all
switches.
the routes between host nodes will still be up/down.


Lid matrix might contain down/up routes between many
kinds of switches, not only between roots. This means
that you might have down/up paths between leaf switches,
even though you could have found up/down path for them.
This also means that you might end up with down/up routes
between HCAs that are connected to these leafs.
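
To make the concern concrete, here is a small sketch of the property that
up/down routing is meant to guarantee: once a path has taken a "down" hop it
must never go "up" again. This is not OpenSM code. The hop_dir enum and the
is_updn_path helper are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Direction of one hop relative to the root switches (hypothetical type). */
enum hop_dir { HOP_UP, HOP_DOWN };

/*
 * Returns true if the path is a legal up/down path, i.e. it never climbs
 * again after it has started to descend. A down/up path fails this check.
 */
static bool is_updn_path(const enum hop_dir *hops, size_t n)
{
	bool went_down = false;
	size_t i;

	assert(hops != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (hops[i] == HOP_DOWN)
			went_down = true;
		else if (went_down)
			return false;	/* a "down" hop followed by an "up" hop */
	}
	return true;
}
```

If the min hop table is not cleared, a leaf-to-leaf entry may correspond to a
path that fails this check even though a legal up/down path exists.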

-- Yevgeny


Eli



-- Yevgeny




