[PATCH resend] ib/ehca: rework destroy_eq()

2009-11-20 Thread Alexander Schmidt
The ibmebus_free_irq() function, which might sleep, was called with
interrupts disabled. To work around this, make sure that no interrupts
are running by killing the interrupt tasklet. Also take the shca_list_lock to
protect against the poll_eqs_timer running concurrently.

Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com
---
Hi Roland,

It seems I used your old mail address, and I forgot to add the linux-rdma
list.

Please apply this for your next tree, thanks.

 drivers/infiniband/hw/ehca/ehca_classes.h |    1 +
 drivers/infiniband/hw/ehca/ehca_eq.c      |    9 ++++++---
 drivers/infiniband/hw/ehca/ehca_main.c    |    2 +-
 3 files changed, 8 insertions(+), 4 deletions(-)

--- linux-2.6.orig/drivers/infiniband/hw/ehca/ehca_eq.c
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_eq.c
@@ -169,12 +169,15 @@ int ehca_destroy_eq(struct ehca_shca *sh
 	unsigned long flags;
 	u64 h_ret;
 
-	spin_lock_irqsave(&eq->spinlock, flags);
 	ibmebus_free_irq(eq->ist, (void *)shca);
 
-	h_ret = hipz_h_destroy_eq(shca->ipz_hca_handle, eq);
+	spin_lock_irqsave(&shca_list_lock, flags);
+	eq->is_initialized = 0;
+	spin_unlock_irqrestore(&shca_list_lock, flags);
+
+	tasklet_kill(&eq->interrupt_task);
 
-	spin_unlock_irqrestore(&eq->spinlock, flags);
+	h_ret = hipz_h_destroy_eq(shca->ipz_hca_handle, eq);
 
 	if (h_ret != H_SUCCESS) {
 		ehca_err(&shca->ib_device, "Can't free EQ resources.");
--- linux-2.6.orig/drivers/infiniband/hw/ehca/ehca_classes.h
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_classes.h
@@ -375,6 +375,7 @@ extern rwlock_t ehca_qp_idr_lock;
 extern rwlock_t ehca_cq_idr_lock;
 extern struct idr ehca_qp_idr;
 extern struct idr ehca_cq_idr;
+extern spinlock_t shca_list_lock;

 extern int ehca_static_rate;
 extern int ehca_port_act_time;
--- linux-2.6.orig/drivers/infiniband/hw/ehca/ehca_main.c
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_main.c
@@ -123,7 +123,7 @@ DEFINE_IDR(ehca_qp_idr);
 DEFINE_IDR(ehca_cq_idr);

 static LIST_HEAD(shca_list); /* list of all registered ehcas */
-static DEFINE_SPINLOCK(shca_list_lock);
+DEFINE_SPINLOCK(shca_list_lock);

 static struct timer_list poll_eqs_timer;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


patch for non-ib network

2009-11-20 Thread javed


Hello,

Is it possible to add a patch to the OFED release for non-IB networks? We've
integrated OFED with 'paramnet-3', a 10 Gbps system area network. IPoIB is
currently working fine. More details can be provided if required.


Thanks for your time.
javed




ib_post_send in drivers

2009-11-20 Thread frank zago

Hello,

It seems ib_post_send() is implemented slightly differently in the 
various hardware drivers (as in kernel 2.6.31). Here are the differences 
I've noticed regarding the bad_wr parameter.


amso1100/c2_qp.c : c2_post_send()
* bails out without setting bad_wr if the first check fails.

cxgb3/iwch_qp.c : post_one_send()
* tests whether bad_send_wr is NULL, but it should always be set.

cxgb3/iwch_qp.c : iwch_post_send()
* bails out without setting bad_wr if either of the first two checks fails.

ehca/ehca_reqs.c : ehca_post_send()
* bails out without setting bad_wr if the first check fails.
* tests whether bad_send_wr is NULL, but it should always be set.
* always returns success if at least one post succeeded.

ehca/ehca_reqs.c : post_one_send()
* tests whether bad_send_wr is NULL, but it should always be set.

nes/nes_verbs.c : nes_post_send()
* bails out without setting bad_wr if the first check fails.

I assume most of these are bugs (especially in the ehca driver). I can post a
patch to fix them if confirmed.
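For illustration only, the contract described above can be sketched as a post_send() that assigns *bad_wr on every failure path, including the QP check made before the first WR is touched. The types `struct send_wr` and `struct qp` and the helper `post_one()` are hypothetical stand-ins, not the kernel's struct ib_send_wr or any driver's code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a send work request. */
struct send_wr {
	struct send_wr *next;
	int num_sge;
};

/* Hypothetical per-QP state. */
struct qp {
	int max_sge;
	int depth;	/* send queue slots */
	int used;	/* slots currently in use */
	int in_error;	/* QP is not in a state that accepts sends */
};

/* Queue one WR; returns 0 or a negative errno value. */
static int post_one(struct qp *qp, const struct send_wr *wr)
{
	if (wr->num_sge > qp->max_sge)
		return -EINVAL;
	if (qp->used >= qp->depth)
		return -ENOMEM;
	qp->used++;
	return 0;
}

/*
 * On any failure, including the up-front QP state check, point *bad_wr
 * at the first WR that was not posted and return the error.
 */
static int post_send(struct qp *qp, struct send_wr *wr,
		     struct send_wr **bad_wr)
{
	int ret;

	if (qp->in_error) {
		*bad_wr = wr;
		return -EINVAL;
	}

	for (; wr; wr = wr->next) {
		ret = post_one(qp, wr);
		if (ret) {
			*bad_wr = wr;
			return ret;
		}
	}
	return 0;
}
```

Note that in this sketch the WRs ahead of the failing one have already been queued when the error is returned.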


Regards,
Frank



Re: ib_post_send in drivers

2009-11-20 Thread Bart Van Assche
On Fri, Nov 20, 2009 at 5:16 PM, frank zago fz...@systemfabricworks.com wrote:

> It seems ib_post_send() is implemented slightly differently in the various
> hardware drivers (as in kernel 2.6.31). Here are the differences I've noticed
> regarding the bad_wr parameter.
>
> amso1100/c2_qp.c : c2_post_send()
> * bails out and does not set bad_wr if the 1st check is bad.
>
> cxgb3/iwch_qp.c : post_one_send()
> * test for bad_send_wr but it should always be set
>
> cxgb3/iwch_qp.c : iwch_post_send()
> * bails out and does not set bad_wr if the 1st 2 checks are bad
>
> ehca/ehca_reqs.c : ehca_post_send()
> * bails out and does not set bad_wr if the 1st check is bad.
> * test for bad_send_wr but it should always be set
> * always return success if at least one post succeeded.
>
> ehca/ehca_reqs.c : post_one_send()
> * test for bad_send_wr but it should always be set
>
> nes/nes_verbs.c : nes_post_send()
> * bails out and does not set bad_wr if the 1st check is bad.
>
> I think assume most are bugs (especially the ehca driver). I can post a patch
> to fix these if confirmed.

I would like to add the following item to the above list:

mlx4/qp.c: mlx4_ib_post_send()
* when passing a list containing more than one item to
mlx4_ib_post_send(), and sending the second or later item fails (e.g.
because of QP overflow), the preceding items are sent anyway. This
behavior makes it almost impossible to get error recovery right for
block device implementations that use ib_post_send() (e.g. the SRPT
target implementation).

If my interpretation of the section about verbs in the InfiniBand
Architecture Specification is correct, either all work requests should
be processed or none. A quote from section 11.4.1.1, Post Send Request
(page 622 in volume 1 of release 1.2.1):

If an immediate error is returned, the QP state shall not be affected.
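One way a driver could meet that requirement is a two-phase post: validate every WR and check that the whole chain fits in the send queue before consuming any slot. The sketch below is a hypothetical, single-threaded model (`struct wr`, `struct sq` and `post_send_all_or_nothing()` are made up for illustration); a real driver would run both phases under the QP lock, and hardware-level failures after validation are not modeled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified work request and send queue. */
struct wr {
	struct wr *next;
	int num_sge;
};

struct sq {
	int max_sge;
	int depth;	/* total slots */
	int used;	/* slots currently occupied */
};

static int post_send_all_or_nothing(struct sq *sq, struct wr *head,
				    struct wr **bad_wr)
{
	struct wr *wr;
	int n = 0;

	/* Phase 1: validate the whole chain without side effects. */
	for (wr = head; wr; wr = wr->next) {
		if (wr->num_sge > sq->max_sge) {
			*bad_wr = wr;
			return -EINVAL;
		}
		if (sq->used + n >= sq->depth) {
			*bad_wr = wr;
			return -ENOMEM;
		}
		n++;
	}

	/* Phase 2: commit; nothing here can fail in this model. */
	sq->used += n;
	return 0;
}
```

If any check fails, no slot has been consumed, so the queue state is exactly what it was before the call.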

Bart.


[PATCH 00/11] Add new torus routing engine: torus-2QoS

2009-11-20 Thread Jim Schutt
This patch series adds a new routing engine designed to handle large 
fabrics connected with a 2D/3D torus topology.

Patches 1-4 do some preparation to handle new SL-related features of
the routing engine, patches 5/6 add and enable the engine, and patches
7-11 have some fixups that only make sense in the presence of the new
engine.

So why a new torus routing engine?

Because I believe none of the existing routing engines can provide a
satisfactory operational experience on a large-scale torus, i.e. one
with hundreds of switches.

Generating routes for a torus that are free of credit loops requires
the use of multiple virtual lanes, and thus SLs on IB.  For IB fabrics
it also requires that _every_ application use path record queries - 
any application that uses an SL that was not obtained via a path record
query may cause credit loops.

In addition, if a fabric topology change (e.g. failed switch/link)
causes a change in the path SL values needed to prevent credit loops,
then _every_ application needs to repath for every path whose SL has
changed.  AFAIK there is no good way to do this as yet in general.

Also, the requirement for path SL queries on every connection places a
heavy load on subnet administration, and the possibility that path SL
values can change makes caching as a performance enhancement more 
difficult.

Since multiple VL/SL values are required to prevent credit loops on a
torus, supporting QoS means that QoS and routing need to share the small
pool of available SL values, and the even smaller pool of available VL
values.

This patch series, and the routing engine it introduces, addresses these
issues for a 2D/3D torus fabric.  The torus-2QoS engine can provide the
following functionality on a 2D/3D torus:
- routing that is free of credit loops
- two levels of QoS, assuming switches support 8 data VLs
- ability to route around a single failed switch, and/or multiple failed
  links, without
    - introducing credit loops
    - changing path SL values
- very short run times, with good scaling properties as fabric size
  increases

The routing engine currently in opensm that is most functional for a
torus-connected fabric is LASH.  In comparison with torus-2QoS, LASH
has the following issues:
- LASH does not support QoS.
- changing inter-switch topology (adding/removing a switch, or
  removing all the links between a pair of switches) can change many
  path SL values, potentially leading to credit loops if
  running applications do not repath.
- running time to calculate routes scales poorly with increasing
  fabric size.

The basic algorithm used by torus-2QoS is DOR (dimension-order routing).
It also uses SL bits 0-2, one SL bit per torus dimension, to encode
whether a path crosses a dateline (where the coordinate value wraps to
zero) for each of the three dimensions, in order to avoid the credit
loops that otherwise result on a torus.  It uses SL bit 3 to distinguish
between two QoS levels.
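A minimal sketch of that SL encoding, assuming a 3D torus where the per-dimension dateline crossings of a path are already known. The function name and its arguments are hypothetical, not taken from the torus-2QoS source.

```c
#include <stdint.h>

/*
 * Bit d (d = 0..2) is set when the path crosses the dateline in torus
 * dimension d; bit 3 selects one of the two QoS levels.
 */
static uint8_t torus_path_sl(const int crosses_dateline[3], int qos_level)
{
	uint8_t sl = 0;
	int d;

	for (d = 0; d < 3; d++)
		if (crosses_dateline[d])
			sl |= (uint8_t)(1u << d);
	sl |= (uint8_t)((qos_level & 1) << 3);
	return sl;
}
```

For example, a path that crosses the x and z datelines gets SL 5 at the first QoS level and SL 13 at the second.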

It uses the SL2VL tables to map those eight SL values per QoS level into
two VL values per QoS level, based on which coordinate direction a link
points.  For two QoS levels, this consumes four data VLs, where VL bit
0 encodes whether the path crosses the dateline for the coordinate
direction in which the link points, and VL bit 2 encodes QoS level.
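The corresponding SL2VL lookup for an inter-switch output port can be sketched as follows, assuming `cdir` (0, 1 or 2) is the coordinate direction the output link points in. This is a simplified model of the mapping described above, not the engine's actual table-writing code.

```c
#include <stdint.h>

static uint8_t torus_sl2vl(uint8_t sl, int cdir)
{
	/* VL bit 0: dateline bit of the SL for this link's direction. */
	uint8_t vl = (uint8_t)((sl >> cdir) & 1);

	/* VL bit 2: QoS level, carried in SL bit 3. */
	vl |= (uint8_t)(((sl >> 3) & 1) << 2);
	return vl;
}
```

Evaluated over all 16 SLs and all three directions, this yields only VLs 0, 1, 4 and 5, i.e. the four data VLs consumed for two QoS levels.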

In the event of link failure, it routes the long way around the 1-D ring
containing the failed link.  I.e. no turns are introduced into a path in
order to route around a failed link.  Note that due to this implementation, 
torus-2QoS cannot route a torus with link failures that break a 1-D ring
into two disjoint segments.
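The "long way around" rule for a single 1-D ring can be modeled in a few lines. In this hypothetical sketch, link i joins switches i and (i + 1) % n, and a ring with two or more failed links is treated as split into disjoint segments and reported as unroutable; the check is per ring, and preferring the shorter direction when nothing has failed is an assumption made for the example.

```c
#include <stdbool.h>

/* Does walking from src to dst in direction dir (+1 or -1) cross a
 * failed link?  Link i joins switch i and switch (i + 1) % n. */
static bool path_blocked(int n, const bool *link_failed,
			 int src, int dst, int dir)
{
	int s = src;

	while (s != dst) {
		int link = dir > 0 ? s : (s - 1 + n) % n;

		if (link_failed[link])
			return true;
		s = (s + dir + n) % n;
	}
	return false;
}

/* Removing one link leaves a ring connected; removing two or more
 * splits it into disjoint segments. */
static bool ring_routable(int n, const bool *link_failed)
{
	int i, failures = 0;

	for (i = 0; i < n; i++)
		failures += link_failed[i];
	return failures <= 1;
}

/* Returns +1 or -1 for the direction to take, or 0 if unroutable.
 * The short way is used unless it crosses the failed link. */
static int ring_direction(int n, const bool *link_failed, int src, int dst)
{
	int fwd = (dst - src + n) % n;	/* hop count going +1 */
	int pref = fwd <= n - fwd ? 1 : -1;

	if (!ring_routable(n, link_failed))
		return 0;
	if (!path_blocked(n, link_failed, src, dst, pref))
		return pref;
	if (!path_blocked(n, link_failed, src, dst, -pref))
		return -pref;
	return 0;
}
```

On a six-switch ring, routing from switch 1 to switch 3 normally goes +1; with link 1 failed the sketch returns -1 (1-0-5-4-3), and with links 2 and 3 both failed it returns 0.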

Under DOR routing in a torus with a failed switch, paths that would
otherwise turn at the failed switch cannot be routed without introducing
an illegal turn into the path.  Such turns are illegal in the sense that
allowing them permits credit loops, unless some additional measure
prevents them.

The routes produced by torus-2QoS will introduce such illegal turns when
a switch fails.  It makes use of the input/output port dependence in the
SL2VL maps to set the otherwise unused VL bit 1 for the path hop following 
such an illegal turn.  This is enough to avoid credit loops in the 
presence of a single failed switch.
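The illegal-turn handling can be sketched as a per-hop VL computation that depends on both the input and the output link direction, which is what the input/output port dependence of the SL2VL maps permits. This assumes DOR visits dimensions in increasing order (x=0, y=1, z=2) and treats any turn into a lower dimension as illegal; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Under DOR, dimensions are traversed in increasing order, so turning
 * from in_dim into a lower out_dim breaks the DOR rules. */
static bool dor_turn_illegal(int in_dim, int out_dim)
{
	return out_dim < in_dim;
}

/* VL for the hop leaving a switch: bits 0 and 2 as in the base map,
 * plus bit 1 when this hop immediately follows an illegal turn. */
static uint8_t torus_hop_vl(uint8_t sl, int in_dim, int out_dim)
{
	uint8_t vl = (uint8_t)(((sl >> out_dim) & 1) |
			       (((sl >> 3) & 1) << 2));

	if (dor_turn_illegal(in_dim, out_dim))
		vl |= 1 << 1;
	return vl;
}
```

In the failed-switch example that follows, the hop I-L enters I along y and leaves along x, so it gets VL bit 1 set, while the turns at E and L follow DOR order and do not.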

As an example, consider the following 2D torus, and consider routes
from S to D, both when the switch at F is operational, and when it
has failed.  torus-2QoS will generate routes such that the path
S-F-D is followed if F is operational, and the path S-E-I-L-D
if F has failed:

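      |    |    |    |    |    |    |
    --+----+----+----+----+----+----+--
      |    |    |    |    |    |    |
    --+----+----+----+----+----D----+--
      |    |    |    |    |    |    |
    --+----+----+----+----I----L----+--
      |    |    |    |    |    |    |
    --+----+----S----+----E----F----+--
      |    |    |    |    |    |    |
    --+----+----+----+----+----+----+--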

The turn in S-E-I-L-D at switch I is the illegal turn introduced
into the path.  The turns at E and L are extra turns 

[PATCH 04/11] opensm: Track the minimum value in the fabric of data VLs supported.

2009-11-20 Thread Jim Schutt
A routing engine that wants to make contributions to SL2VL maps in support
of routing free from credit loops may need to know the minimum number
of supported data VLs in the fabric.

This code tracks that value.
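A standalone sketch of the conversion the patch performs, from PortInfo:OperationalVLs (1 = VL0, 2 = VL0-1, 3 = VL0-3, 4 = VL0-7, 5 = VL0-14) to a count of data VLs, plus the fabric-wide minimum. `MAX_NUM_VLS` is assumed to equal opensm's IB_MAX_NUM_VLS (16), and treating reserved OperationalVLs values (0 or above 5) as a single data VL is an assumption of this sketch; the patch itself does not special-case them.

```c
#include <stdint.h>

#define MAX_NUM_VLS 16	/* assumed equal to opensm's IB_MAX_NUM_VLS */

static unsigned op_vls_to_data_vls(unsigned op_vls)
{
	unsigned data_vls;

	if (op_vls == 0 || op_vls > 5)	/* reserved encodings (assumption) */
		return 1;
	data_vls = 1U << (op_vls - 1);
	if (data_vls >= MAX_NUM_VLS)	/* VL0-14 is 15 data VLs, not 16 */
		data_vls = MAX_NUM_VLS - 1;
	return data_vls;
}

/* Start from the largest possible value, as the patch does at the
 * beginning of each sweep, and lower it port by port. */
static uint8_t min_data_vls(const unsigned *op_vls, int nports)
{
	uint8_t min = MAX_NUM_VLS - 1;
	int i;

	for (i = 0; i < nports; i++) {
		unsigned v = op_vls_to_data_vls(op_vls[i]);

		if (v < min)
			min = (uint8_t)v;
	}
	return min;
}
```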

Signed-off-by: Jim Schutt jasc...@sandia.gov
---
 opensm/include/opensm/osm_subnet.h |    1 +
 opensm/opensm/osm_port_info_rcv.c  |   13 ++++++++++++-
 opensm/opensm/osm_state_mgr.c      |    6 ++++++
 opensm/opensm/osm_subnet.c         |    1 +
 4 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index 0302f91..c303e86 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -509,6 +509,7 @@ typedef struct osm_subn {
uint16_t max_mcast_lid_ho;
uint8_t min_ca_mtu;
uint8_t min_ca_rate;
+   uint8_t min_data_vls;
boolean_t ignore_existing_lfts;
boolean_t subnet_initialization_error;
boolean_t force_heavy_sweep;
diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c
index 8a99064..b0d54c8 100644
--- a/opensm/opensm/osm_port_info_rcv.c
+++ b/opensm/opensm/osm_port_info_rcv.c
@@ -82,6 +82,7 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
 	ib_api_status_t status;
 	ib_net64_t port_guid;
 	uint8_t rate, mtu;
+	unsigned data_vls;
 	cl_qmap_t *p_sm_tbl;
 	osm_remote_sm_t *p_sm;
 
@@ -91,7 +92,7 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
 
 	/* HACK extended port 0 should be handled too! */
 	if (osm_physp_get_port_num(p_physp) != 0) {
-		/* track the minimal endport MTU and rate */
+		/* track the minimal endport MTU, rate, and operational VLs */
 		mtu = ib_port_info_get_mtu_cap(p_pi);
 		if (mtu < sm->p_subn->min_ca_mtu) {
 			OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
@@ -107,6 +108,16 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
 				PRIx64 "\n", rate, cl_ntoh64(port_guid));
 			sm->p_subn->min_ca_rate = rate;
 		}
+
+		data_vls = 1U << (ib_port_info_get_op_vls(p_pi) - 1);
+		if (data_vls >= IB_MAX_NUM_VLS)
+			data_vls = IB_MAX_NUM_VLS - 1;
+		if ((uint8_t)data_vls < sm->p_subn->min_data_vls) {
+			OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
+				"Setting endport minimal data VLs to:%u defined by port:0x%"
+				PRIx64 "\n", data_vls, cl_ntoh64(port_guid));
+			sm->p_subn->min_data_vls = data_vls;
+		}
 	}
 
if (port_guid != sm-p_subn-sm_port_guid) {
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index c3f49dc..b6c41a6 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1132,6 +1132,12 @@ repeat_discovery:
 	sm->p_subn->force_reroute = FALSE;
 	sm->p_subn->subnet_initialization_error = FALSE;
 
+	/* Reset tracking values in case limiting component got removed
+	 * from fabric. */
+	sm->p_subn->min_ca_mtu = IB_MAX_MTU;
+	sm->p_subn->min_ca_rate = IB_MAX_RATE;
+	sm->p_subn->min_data_vls = IB_MAX_NUM_VLS - 1;
+
 	/* rescan configuration updates */
 	if (!config_parsed && osm_subn_rescan_conf_files(sm->p_subn) < 0)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 331A: "
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 2cfcbe6..19ba730 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -526,6 +526,7 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN osm_opensm_t * p_osm,
 	p_subn->max_mcast_lid_ho = IB_LID_MCAST_END_HO;
 	p_subn->min_ca_mtu = IB_MAX_MTU;
 	p_subn->min_ca_rate = IB_MAX_RATE;
+	p_subn->min_data_vls = IB_MAX_NUM_VLS - 1;
 	p_subn->ignore_existing_lfts = TRUE;
 
 	/* we assume master by default - so we only need to set it true if STANDBY */
-- 
1.5.6.GIT




[PATCH 06/11] opensm: Enable torus-2QoS routing engine.

2009-11-20 Thread Jim Schutt

Signed-off-by: Jim Schutt jasc...@sandia.gov
---
 opensm/include/opensm/osm_opensm.h |    1 +
 opensm/opensm/osm_opensm.c         |    6 ++++++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index ef9d4e1..90c6c0f 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -105,6 +105,7 @@ typedef enum _osm_routing_engine_type {
OSM_ROUTING_ENGINE_TYPE_FTREE,
OSM_ROUTING_ENGINE_TYPE_LASH,
OSM_ROUTING_ENGINE_TYPE_DOR,
+   OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS,
OSM_ROUTING_ENGINE_TYPE_UNKNOWN
 } osm_routing_engine_type_t;
 /***/
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index 9cd254e..7052d49 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -70,6 +70,7 @@ extern int osm_ucast_file_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_ftree_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_lash_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_dor_setup(struct osm_routing_engine *, osm_opensm_t *);
+extern int osm_ucast_torus2QoS_setup(struct osm_routing_engine *, osm_opensm_t *);
 
 const static struct routing_engine_module routing_modules[] = {
 	{"minhop", osm_ucast_minhop_setup},
@@ -78,6 +79,7 @@ const static struct routing_engine_module routing_modules[] = {
 	{"ftree", osm_ucast_ftree_setup},
 	{"lash", osm_ucast_lash_setup},
 	{"dor", osm_ucast_dor_setup},
+	{"torus-2QoS", osm_ucast_torus2QoS_setup},
 	{NULL, NULL}
 };
 
@@ -98,6 +100,8 @@ const char *osm_routing_engine_type_str(IN osm_routing_engine_type_t type)
 		return "lash";
 	case OSM_ROUTING_ENGINE_TYPE_DOR:
 		return "dor";
+	case OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS:
+		return "torus-2QoS";
 	default:
 		break;
 	}
@@ -124,6 +128,8 @@ osm_routing_engine_type_t osm_routing_engine_type(IN const char *str)
 		return OSM_ROUTING_ENGINE_TYPE_LASH;
 	else if (!strcasecmp(str, "dor"))
 		return OSM_ROUTING_ENGINE_TYPE_DOR;
+	else if (!strcasecmp(str, "torus-2QoS"))
+		return OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS;
 	else
 		return OSM_ROUTING_ENGINE_TYPE_UNKNOWN;
 }
-- 
1.5.6.GIT




[PATCH 03/11] opensm: Allow the routing engine to participate in path SL calculations.

2009-11-20 Thread Jim Schutt
LASH already does this, in a hard-coded fashion.

Generalize this by adding a callback to struct osm_routing_engine that
computes a path SL value, and fix up LASH to use it.

This patchset causes the requested or QoS-computed SL value to be passed
to the routing engine path SL computation as a hint.  In the event the
routing engine's use of SLs allows it to support more than one QoS level,
it may be able to make use of the SL hint to do so.

For now, LASH just ignores the hint.
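A hypothetical sketch of the callback-with-fallback pattern introduced here: the caller passes the requested SL as a hint, an engine that implements `path_sl` may use or ignore it, and without the callback the requested SL is used unchanged. The struct is trimmed down and takes LIDs instead of osm_port_t pointers so that it compiles on its own; it is not opensm's struct osm_routing_engine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down routing engine. */
struct routing_engine {
	void *context;
	uint8_t (*path_sl)(void *context, uint8_t path_sl_hint,
			   uint16_t src_lid, uint16_t dst_lid);
};

/* A LASH-like engine: ignores the hint and returns the SL it needs
 * (here simply a value stored in its context, for illustration). */
static uint8_t fixed_path_sl(void *context, uint8_t path_sl_hint,
			     uint16_t src_lid, uint16_t dst_lid)
{
	(void)path_sl_hint;
	(void)src_lid;
	(void)dst_lid;
	return *(const uint8_t *)context;
}

/* Caller side: use the engine's callback if it has one, otherwise
 * keep the requested SL. */
static uint8_t lookup_path_sl(const struct routing_engine *re,
			      uint8_t requested_sl,
			      uint16_t src_lid, uint16_t dst_lid)
{
	if (re && re->path_sl)
		return re->path_sl(re->context, requested_sl, src_lid, dst_lid);
	return requested_sl;
}
```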

Note that before this change, if LASH was configured and a specific path
SL value was requested that differed from what LASH needed to route the
fabric without credit loops, the path SL lookup would fail.  Now LASH's
SL value is always used.

Possibly the choice between failing a path SL request when it conflicts
with routing, vs. always providing an SL value that gives a credit-loop-
free routing, should be user-configurable?

Signed-off-by: Jim Schutt jasc...@sandia.gov
---
 opensm/include/opensm/osm_opensm.h |6 +
 opensm/include/opensm/osm_ucast_lash.h |3 --
 opensm/opensm/osm_link_mgr.c   |   15 -
 opensm/opensm/osm_sa_path_record.c |   34 +++
 opensm/opensm/osm_ucast_lash.c |8 +-
 5 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index 616113b..ef9d4e1 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -129,6 +129,9 @@ struct osm_routing_engine {
void (*update_sl2vl)(void *context, IN osm_port_t *port,
 IN uint8_t in_port_num, IN uint8_t out_port_num,
 IN OUT ib_slvl_table_t *t);
+   uint8_t (*path_sl)(void *context, IN uint8_t path_sl_hint,
+  IN const osm_port_t *src_port,
+  IN const osm_port_t *dst_port);
void (*delete) (void *context);
struct osm_routing_engine *next;
 };
@@ -160,6 +163,9 @@ struct osm_routing_engine {
 *  for which the SL2VL map should be updated, and in_port_num/
 *  out_port_num should be ignored.
 *
+*  path_sl
+*  The callback for computing path SL.
+*
 *  delete
 *  The delete method, may be used for routing engine
 *  internals cleanup.
diff --git a/opensm/include/opensm/osm_ucast_lash.h b/opensm/include/opensm/osm_ucast_lash.h
index 9e15d38..dd90d5d 100644
--- a/opensm/include/opensm/osm_ucast_lash.h
+++ b/opensm/include/opensm/osm_ucast_lash.h
@@ -94,7 +94,4 @@ typedef struct _lash {
int ***virtual_location;
 } lash_t;
 
-uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, const osm_port_t * p_src_port,
-   const osm_port_t * p_dst_port);
-
 #endif
diff --git a/opensm/opensm/osm_link_mgr.c b/opensm/opensm/osm_link_mgr.c
index aaeebc7..02d6ec8 100644
--- a/opensm/opensm/osm_link_mgr.c
+++ b/opensm/opensm/osm_link_mgr.c
@@ -53,21 +53,23 @@
 #include <opensm/osm_helper.h>
 #include <opensm/osm_msgdef.h>
 #include <opensm/osm_opensm.h>
-#include <opensm/osm_ucast_lash.h>
 
 static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 {
 	osm_opensm_t *p_osm = sm->p_subn->p_osm;
+	struct osm_routing_engine *re = p_osm->routing_engine_used;
 	const osm_port_t *p_sm_port, *p_src_port;
 	ib_net16_t slid, smlid;
 	uint8_t sl;
 
 	OSM_LOG_ENTER(sm->p_log);
 
-	if (!(p_osm->routing_engine_used &&
-	      p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH &&
+	if (!(re && re->path_sl &&
 	      (slid = osm_physp_get_base_lid(p_physp)))) {
-		/* Use default SL if lash routing is not used */
+		/*
+		 * Use default SL if routing engine does not provide a
+		 * path SL lookup callback.
+		 */
 		OSM_LOG_EXIT(sm->p_log);
 		return sm->p_subn->opt.sm_sl;
 	}
@@ -81,8 +83,9 @@ static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 	p_src_port =
 	    cl_ptr_vector_get(&sm->p_subn->port_lid_tbl, cl_ntoh16(slid));
 
-	/* Call lash to find proper SL */
-	sl = osm_get_lash_sl(p_osm, p_src_port, p_sm_port);
+	/* Call into routing engine to find proper SL */
+	sl = re->path_sl(re->context, sm->p_subn->opt.sm_sl,
+			 p_src_port, p_sm_port);
 
 	OSM_LOG_EXIT(sm->p_log);
 	return sl;
diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
index 484cb5b..dcb2d4e 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -161,6 +161,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
const osm_physp_t *p_dest_physp;
const osm_prtn_t *p_prtn = NULL;
osm_opensm_t *p_osm;
+   struct osm_routing_engine *p_re;
const ib_port_info_t *p_pi;

[PATCH 11/11] opensm: Update documentation to describe torus-2QoS.

2009-11-20 Thread Jim Schutt

Signed-off-by: Jim Schutt jasc...@sandia.gov
---
 opensm/doc/current-routing.txt |  154 +++-
 opensm/man/opensm.8.in |9 ++-
 2 files changed, 160 insertions(+), 3 deletions(-)

diff --git a/opensm/doc/current-routing.txt b/opensm/doc/current-routing.txt
index 1302860..141d793 100644
--- a/opensm/doc/current-routing.txt
+++ b/opensm/doc/current-routing.txt
@@ -1,7 +1,7 @@
 Current OpenSM Routing
-7/9/07
+10/9/09
 
-OpenSM offers five routing engines:
+OpenSM offers six routing engines:
 
 1.  Min Hop Algorithm - based on the minimum hops to each node where the
 path length is optimized.
@@ -28,6 +28,13 @@ two switches.  This provides deadlock free routes for hypercubes when
 the fabric is cabled as a hypercube and for meshes when cabled as a
 mesh (see details below).
 
+6. Torus-2QoS unicast routing algorithm - a DOR-based routing algorithm
+specialized for 2D/3D torus topologies.  Torus-2QoS provides deadlock-free
+routing while supporting two quality of service (QoS) levels.  In addition
+it is able to route around multiple failed fabric links or a single failed
+fabric switch without introducing deadlocks, and without changing path SL
+values granted before the failure.
+
 OpenSM provides an optional unicast routing cache (enabled by -A or
 --ucast_cache options). When enabled, unicast routing cache prevents
 routing recalculation (which is a heavy task in a large cluster) when
@@ -388,3 +395,146 @@ ports, one port on one end of the cable, and the other port on the
 other end, continuing along the mesh dimension.
 
 Use '-R dor' option to activate the DOR algorithm.
+
+Torus-2QoS Routing Algorithm
+
+
+Torus-2QoS is a routing algorithm designed for large-scale 2D/3D torus fabrics.
+
+It is a DOR-based algorithm that avoids deadlocks that would otherwise
+occur in a torus using the concept of a dateline for each torus dimension.
+It encodes into a path SL which datelines the path crosses as follows:
+
+  sl = 0;
+  for (d = 0; d < torus_dimensions; d++)
+    /* path_crosses_dateline(d) returns 0 or 1 */
+    sl |= path_crosses_dateline(d) << d;
+
+For a 3D torus, that leaves one SL bit free, which torus-2QoS uses to
+implement two QoS levels.
+
+This is possible because torus-2QoS also makes use of the output port
+dependence of the switch SL2VL maps.  It computes in which torus coordinate
+direction each interswitch link points, and writes SL2VL maps for such
+ports as follows:
+
+  for (sl = 0; sl < 16; sl++)
+    /* cdir(port) reports which torus coordinate direction a switch port
+     * points in, and returns 0, 1, or 2 */
+    sl2vl(iport,oport,sl) = 0x1 & (sl >> cdir(oport));
+
+Thus torus-2QoS consumes 8 SL values (SL bits 0-2) and 2 VL values (VL bit 0)
+per QoS level to provide deadlock-free routing on a 3D torus.
+
+Torus-2QoS routes around link failure by taking the long way around any
+1D ring interrupted by a link failure.  For example, consider the 2D 6x5
+torus below, where switches are denoted by [+a-zA-Z]:
+
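+        |    |    |    |    |    |
+   4  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+   3  --+----+----+----D----+----+--
+        |    |    |    |    |    |
+   2  --+----+----I----r----+----+--
+        |    |    |    |    |    |
+   1  --m----S----n----T----o----p--
+        |    |    |    |    |    |
+ y=0  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+
+      x=0    1    2    3    4    5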
+
+For a pristine fabric the path from S to D would be S-n-T-r-D.  In the
+event that either link S-n or n-T has failed, torus-2QoS would use the path
+S-m-p-o-T-r-D.  Note that it can do this without changing the path SL
+value; once the 1D ring m-S-n-T-o-p-m has been broken by failure, path
+segments using it cannot contribute to deadlock, and the x-direction
+dateline (between, say, x=5 and x=0) can be ignored for path segments on
+that ring.
+
+One result of this is that torus-2QoS can route around many simultaneous
+link failures, as long as no 1D ring is broken into disjoint regions.  For
+example, if links n-T and T-o have both failed, that ring has been broken
+into two disjoint regions, T and o-p-m-S-n.  Torus-2QoS checks for such
+issues, reports if they are found, and refuses to route such fabrics.
+
+Handling a failed switch under DOR requires introducing into a path at
+least one turn that would be otherwise illegal, i.e. not allowed by DOR
+rules.  Torus-2QoS will introduce such a turn as close as possible to the
+failed switch in order to route around it.
+
+In the above example, suppose switch T has failed, and consider the path
+from S to D.  Torus-2QoS will produce the path S-n-I-r-D, rather than the
+S-n-T-r-D path for a pristine torus, by introducing an early turn at n.
+For traffic arriving at switch I from n, normal DOR rules will generate an
+illegal turn in the path from S to D at I, and a legal turn at r.
+
+Torus-2QoS will also use the input port 

[PATCH 01/11] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup.

2009-11-20 Thread Jim Schutt
In the event a routing engine needs to participate in SL assignment and
SL2VL map setup in order to avoid credit loops in a fabric, it will be
useful to make the routing engine context more widely available.

To this end, have osm_opensm_t save a pointer to the routing engine used,
rather than its type.  This will make the routing engine context easily
available in, e.g., sl2vl_update() and pr_rcv_get_path_parms().

Make the necessary adjustments to the code that used the old
routing_engine_used as an enum _osm_routing_engine_type.  In order to
keep the behavior where minhop was used if the configured routing engines
failed, the easiest solution was to add a pointer to osm_opensm_t which
pointed to the minhop struct osm_routing_engine.
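A hypothetical sketch of the selection logic this enables: keep a pointer to the engine that actually succeeded, and fall back to a default engine when every configured one fails. The struct, the enum and the two trivial build callbacks are illustrative only and do not mirror opensm's code.

```c
#include <stddef.h>

enum engine_type { ENGINE_MINHOP, ENGINE_LASH, ENGINE_DOR };

/* Hypothetical, trimmed-down engine: type tag plus a build callback. */
struct engine {
	enum engine_type type;
	int (*build)(void *context);	/* returns 0 on success */
	void *context;
	struct engine *next;
};

/* Trivial example callbacks. */
static int build_ok(void *context) { (void)context; return 0; }
static int build_fail(void *context) { (void)context; return -1; }

/*
 * Try each configured engine in order and return the first one that
 * succeeds; otherwise try the default engine.  Returning a pointer
 * rather than a type gives callers the engine's context and callbacks.
 */
static struct engine *select_engine(struct engine *list, struct engine *dflt)
{
	struct engine *re;

	for (re = list; re; re = re->next)
		if (re->build(re->context) == 0)
			return re;

	return dflt->build(dflt->context) == 0 ? dflt : NULL;
}
```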

Signed-off-by: Jim Schutt jasc...@sandia.gov
---
 opensm/include/opensm/osm_opensm.h |4 ++-
 opensm/opensm/osm_console.c|   10 ++--
 opensm/opensm/osm_dump.c   |3 +-
 opensm/opensm/osm_link_mgr.c   |5 ++-
 opensm/opensm/osm_opensm.c |   43 +---
 opensm/opensm/osm_sa_path_record.c |3 +-
 opensm/opensm/osm_ucast_lash.c |3 +-
 opensm/opensm/osm_ucast_mgr.c  |   17 --
 8 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index c6c9bdb..e97142e 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -120,6 +120,7 @@ typedef enum _osm_routing_engine_type {
 *  added later.
 */
 struct osm_routing_engine {
+   osm_routing_engine_type_t type;
const char *name;
void *context;
int (*build_lid_matrices) (void *context);
@@ -183,7 +184,8 @@ typedef struct osm_opensm {
cl_dispatcher_t disp;
cl_plock_t lock;
struct osm_routing_engine *routing_engine_list;
-   osm_routing_engine_type_t routing_engine_used;
+   struct osm_routing_engine *routing_engine_used;
+   struct osm_routing_engine *default_routing_engine;
osm_stats_t stats;
osm_console_t console;
nn_map_t *node_name_map;
diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c
index 206e7f7..f0c7aa0 100644
--- a/opensm/opensm/osm_console.c
+++ b/opensm/opensm/osm_console.c
@@ -362,6 +362,8 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
cl_list_item_t *item;
 
 	if (out) {
+		const char *re_str;
+
 		cl_plock_acquire(&p_osm->lock);
 		fprintf(out, "OpenSM Version   : %s\n", p_osm->osm_version);
 		fprintf(out, "SM State : %s\n",
@@ -370,9 +372,11 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
 			p_osm->subn.opt.sm_priority);
 		fprintf(out, "SA State : %s\n",
 			sa_state_str(p_osm->sa.state));
-		fprintf(out, "Routing Engine   : %s\n",
-			osm_routing_engine_type_str(p_osm->
-						    routing_engine_used));
+
+		re_str = p_osm->routing_engine_used ?
+			osm_routing_engine_type_str(p_osm->routing_engine_used->type) :
+			osm_routing_engine_type_str(OSM_ROUTING_ENGINE_TYPE_NONE);
+		fprintf(out, "Routing Engine   : %s\n", re_str);
 
 		fprintf(out, "Loaded event plugins :");
 		if (cl_qlist_head(&p_osm->plugin_list) ==
diff --git a/opensm/opensm/osm_dump.c b/opensm/opensm/osm_dump.c
index 86e9c00..f3f4623 100644
--- a/opensm/opensm/osm_dump.c
+++ b/opensm/opensm/osm_dump.c
@@ -135,7 +135,8 @@ static void dump_ucast_routes(cl_map_item_t * item, FILE * file, void *cxt)
 		"Switch 0x%016" PRIx64 "\nLID: Port : Hops : Optimal\n",
 		cl_ntoh64(osm_node_get_node_guid(p_node)));
 
-	dor = (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_DOR);
+	dor = (p_osm->routing_engine_used &&
+	       p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_DOR);
 
 	for (lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++) {
 		fprintf(file, "0x%04X : ", lid_ho);
diff --git a/opensm/opensm/osm_link_mgr.c b/opensm/opensm/osm_link_mgr.c
index 03a585b..aaeebc7 100644
--- a/opensm/opensm/osm_link_mgr.c
+++ b/opensm/opensm/osm_link_mgr.c
@@ -64,8 +64,9 @@ static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 
 	OSM_LOG_ENTER(sm->p_log);
 
-	if (p_osm->routing_engine_used != OSM_ROUTING_ENGINE_TYPE_LASH
-	    || !(slid = osm_physp_get_base_lid(p_physp))) {
+	if (!(p_osm->routing_engine_used &&
+	      p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH &&
+	      (slid = osm_physp_get_base_lid(p_physp)))) {
 		/* Use default SL if lash routing is not used */
 		OSM_LOG_EXIT(sm->p_log);
 		return sm->p_subn->opt.sm_sl;
diff --git 

[PATCH 08/11] opensm: Do not require -Q option for torus-2QoS routing engine.

2009-11-20 Thread Jim Schutt
The torus-2QoS engine provides deadlock-free routing for a 2D/3D torus,
but requires that switch SL2VL maps be programmed.  Before this change,
opensm -Q was required for that to happen.

When a routing engine sets the struct osm_routing_engine:update_sl2vl
pointer, it is signalling its intent to participate in SL2VL map programming.
So, don't return early from osm_qos_setup() in that case; instead do everything
except attempt to read QoS configuration information.

For that to work properly, we also need to always set up the default QoS config
information, instead of only when QoS is requested via -Q.

With that in place, the -Q option now means the same thing to torus-2QoS that
it means to other routing engines: QoS configuration is requested.

Without -Q, torus-2QoS confines its unicast traffic to SLs 8-15, leaving
SL 0 free, e.g. for multicast.  This is useful until such time as
torus-2QoS can be extended to implement a spanning tree for multicast that
will not deadlock against the routing used for unicast.

Signed-off-by: Jim Schutt jasc...@sandia.gov
---
 opensm/opensm/osm_qos.c |7 +--
 opensm/opensm/osm_subnet.c  |   18 +-
 opensm/opensm/osm_ucast_torus.c |   24 +++-
 3 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index f42c334..0f0b24f 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -288,7 +288,9 @@ int osm_qos_setup(osm_opensm_t * p_osm)
int ret = 0;
uint8_t i;
 
-   if (!p_osm->subn.opt.qos)
+   if (!(p_osm->subn.opt.qos ||
+ (p_osm->routing_engine_used &&
+  p_osm->routing_engine_used->update_sl2vl)))
return 0;
 
OSM_LOG_ENTER(&p_osm->log);
@@ -305,7 +307,8 @@ int osm_qos_setup(osm_opensm_t * p_osm)
cl_plock_excl_acquire(&p_osm->lock);
 
/* read QoS policy config file */
-   osm_qos_parse_policy_file(&p_osm->subn);
+   if (p_osm->subn.opt.qos)
+   osm_qos_parse_policy_file(&p_osm->subn);
 
p_tbl = &p_osm->subn.port_guid_tbl;
p_next = cl_qmap_head(p_tbl);
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index c9bb20c..cc81545 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -1044,6 +1044,8 @@ static void subn_verify_qos_set(osm_qos_options_t *set, const char *prefix,
 
 int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
 {
+   osm_qos_options_t dflt;
+
if (p_opts->lmc > 7) {
log_report(" Invalid Cached Option Value:lmc = %u:"
   "Using Default:%u\n", p_opts->lmc, OSM_DEFAULT_LMC);
@@ -1087,17 +1089,15 @@ int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
p_opts->console = OSM_DEFAULT_CONSOLE;
}
 
-   if (p_opts->qos) {
-   osm_qos_options_t dflt;
-
-   /* the default options in qos_options must be correct.
-* every other one need not be, b/c those will default
-* back to whatever is in qos_options.
-*/
 
-   subn_set_default_qos_options(&dflt);
+   /* the default options in qos_options must be correct.
+* every other one need not be, b/c those will default
+* back to whatever is in qos_options.
+*/
+   subn_set_default_qos_options(&dflt);
+   subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt);
 
-   subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt);
+   if (p_opts->qos) {
subn_verify_qos_set(&p_opts->qos_ca_options, "qos_ca",
&p_opts->qos_options);
subn_verify_qos_set(&p_opts->qos_sw0_options, "qos_sw0",
diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index 6fff73e..8eb2880 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -298,6 +298,7 @@ struct torus {
 #define Z_MESH (1U << 2)
 #define MSG_DEADLOCK (1U << 29)
 #define NOTIFY_CHANGES (1U << 30)
+#define QOS_ENABLED (1U << 31)
 
 #define ALL_MESH(flags) \
((flags & (X_MESH | Y_MESH | Z_MESH)) == (X_MESH | Y_MESH | Z_MESH))
@@ -8548,7 +8549,25 @@ uint8_t torus_path_sl(void *context, uint8_t path_sl_hint,
sl  = sl_set_use_loop_vl(use_vl1(ssw->i, dsw->i, t->x_sz), 0);
sl |= sl_set_use_loop_vl(use_vl1(ssw->j, dsw->j, t->y_sz), 1);
sl |= sl_set_use_loop_vl(use_vl1(ssw->k, dsw->k, t->z_sz), 2);
-   sl |= sl_set_qos(sl_get_qos(path_sl_hint));
+
+   /*
+* If QoS was not requested by user, force path SLs into 8-15 range.
+* This leaves SL 0 available for multicast, and SL2VL mappings
+* will keep multicast traffic from deadlocking with unicast traffic.
+*
+* However, multicast might still deadlock against itself if multiple
+* multicast groups each use their own spanning tree.
+*
+* FIXME: it is possible to construct a spanning tree that can

[PATCH 07/11] opensm: Add opensm option to specify file name for extra torus-2QoS configuration information.

2009-11-20 Thread Jim Schutt

Signed-off-by: Jim Schutt jasc...@sandia.gov
---
 opensm/include/opensm/osm_base.h   |   18 ++
 opensm/include/opensm/osm_subnet.h |5 +
 opensm/opensm/main.c   |8 
 opensm/opensm/osm_subnet.c |1 +
 opensm/opensm/osm_ucast_torus.c|2 +-
 5 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h
index 9d8bf98..0a90ba8 100644
--- a/opensm/include/opensm/osm_base.h
+++ b/opensm/include/opensm/osm_base.h
@@ -278,6 +278,24 @@ BEGIN_C_DECLS
 #endif /* __WIN__ */
 /***/
 
+/d* OpenSM: Base/OSM_DEFAULT_TORUS_CONF_FILE
+* NAME
+*  OSM_DEFAULT_TORUS_CONF_FILE
+*
+* DESCRIPTION
+*  Specifies the default file name for extra torus-2QoS configuration
+*
+* SYNOPSIS
+*/
+#ifdef __WIN__
+#define OSM_DEFAULT_TORUS_CONF_FILE strcat(GetOsmCachePath(), "osm-torus-2QoS.conf")
+#elif defined(OPENSM_CONFIG_DIR)
+#define OSM_DEFAULT_TORUS_CONF_FILE OPENSM_CONFIG_DIR "/torus-2QoS.conf"
+#else
+#define OSM_DEFAULT_TORUS_CONF_FILE "/etc/opensm/torus-2QoS.conf"
+#endif /* __WIN__ */
+/***/
+
 /d* OpenSM: Base/OSM_DEFAULT_PREFIX_ROUTES_FILE
 * NAME
 *  OSM_DEFAULT_PREFIX_ROUTES_FILE
diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index c303e86..6350dfb 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -200,6 +200,7 @@ typedef struct osm_subn_opt {
char *ids_guid_file;
char *guid_routing_order_file;
char *sa_db_file;
+   char *torus_conf_file;
boolean_t do_mesh_analysis;
boolean_t exit_on_fatal;
boolean_t honor_guid2lid_file;
@@ -411,6 +412,10 @@ typedef struct osm_subn_opt {
 *  sa_db_file
 *  Name of the SA database file.
 *
+*  torus_conf_file
+*  Name of the file with extra configuration info for torus-2QoS
+*  routing engine.
+*
 *  exit_on_fatal
 *  If TRUE (default) - SM will exit on fatal subnet initialization
 *  issues.
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 18efde1..488327c 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -231,6 +231,10 @@ static void show_usage(void)
 "          Set the order port guids will be routed for the MinHop\n"
 "          and Up/Down routing algorithms to the guids provided in the\n"
 "          given file (one to a line)\n\n");
+   printf("--torus_config <path to file>\n"
+          "          This option defines the file name for the extra configuration\n"
+          "          info needed for the torus-2QoS routing engine.   The default\n"
+          "          name is \'"OSM_DEFAULT_TORUS_CONF_FILE"\'\n\n");
printf("--once, -o\n"
 "          This option causes OpenSM to configure the subnet\n"
 "          once, then exit.  Ports remain in the ACTIVE state.\n\n");
@@ -607,6 +611,7 @@ int main(int argc, char *argv[])
{"lash_start_vl", 1, NULL, 6},
{"sm_sl", 1, NULL, 7},
{"retries", 1, NULL, 8},
+   {"torus_config", 1, NULL, 9},
{NULL, 0, NULL, 0}  /* Required at the end of the array */
};
 
@@ -985,6 +990,9 @@ int main(int argc, char *argv[])
printf(" Transaction retries = %u\n",
   opt.transaction_retries);
break;
+   case 9:
+   SET_STR_OPT(opt.torus_conf_file, optarg);
+   break;
case 'h':
case '?':
case ':':
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 19ba730..c9bb20c 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -747,6 +747,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
p_opt->ids_guid_file = NULL;
p_opt->guid_routing_order_file = NULL;
p_opt->sa_db_file = NULL;
+   p_opt->torus_conf_file = strdup(OSM_DEFAULT_TORUS_CONF_FILE);
p_opt->do_mesh_analysis = FALSE;
p_opt->exit_on_fatal = TRUE;
p_opt->enable_quirks = FALSE;
diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index 149189f..6fff73e 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -8573,7 +8573,7 @@ int torus_build_lfts(void *context)
torus->osm = ctx->osm;
fabric->osm = ctx->osm;
 
-   if (!parse_config(OPENSM_CONFIG_DIR "/opensm-torus.conf",
+   if (!parse_config(ctx->osm->subn.opt.torus_conf_file,
  fabric, torus))
goto out;
 
-- 
1.5.6.GIT


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 10/11] opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of osm_port_t:priv.

2009-11-20 Thread Jim Schutt
Torus-2QoS makes persistent use of osm_port_t:priv to speed calculation
of path SL values.

It cannot clear osm_port_t:priv members when it tears down its persistent
data for the following reason: If a port is removed from the fabric, the
opensm core will delete the corresponding osm_port_t object, leaving
torus-2QoS holding a dangling reference.  Torus-2QoS then has a use-after-free
error when tearing down its persistent data if it tries to use its dangling
osm_port_t reference to clear the priv member.

When torus-2QoS is unable to route a fabric due to missing switches and
opensm is configured to fall back to minhop, havoc will ensue because
minhop uses a non-NULL osm_port_t:priv as a proxy for LMC > 0: it
assumes if osm_port_t:priv is non-NULL it can only be because
alloc_ports_priv() has been called.

Fix this up by always calling alloc_ports_priv(), and have it set
priv = NULL if LMC == 0.

Signed-off-by: Jim Schutt jasc...@sandia.gov
---
 opensm/opensm/osm_ucast_mgr.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index f3cd379..1bb7a13 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -314,8 +314,10 @@ static void alloc_ports_priv(osm_ucast_mgr_t * mgr)
 item = cl_qmap_next(item)) {
port = (osm_port_t *) item;
lmc = ib_port_info_get_lmc(&port->p_physp->port_info);
-   if (!lmc)
+   if (!lmc) {
+   port->priv = NULL;
continue;
+   }
r = malloc(sizeof(*r) + sizeof(r->guids[0]) * (1 << lmc));
if (!r) {
OSM_LOG(mgr->p_log, OSM_LOG_ERROR, "ERR 3A09: "
@@ -362,8 +364,7 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
/* Initialize LIDs in buffer to invalid port number. */
memset(p_sw->new_lft, OSM_NO_PATH, p_sw->max_lid_ho + 1);
 
-   if (p_mgr->p_subn->opt.lmc)
-   alloc_ports_priv(p_mgr);
+   alloc_ports_priv(p_mgr);
 
/*
   Iterate through every port setting LID routes for each
@@ -380,8 +381,7 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
}
}
 
-   if (p_mgr->p_subn->opt.lmc)
-   free_ports_priv(p_mgr);
+   free_ports_priv(p_mgr);
 
OSM_LOG_EXIT(p_mgr->p_log);
 }
-- 
1.5.6.GIT




[PATCH 05/11] opensm: Add torus-2QoS routing engine.

2009-11-20 Thread Jim Schutt

This engine routes a 2D/3D torus without credit loops while providing two
quality-of-service levels.

Signed-off-by: Jim Schutt jasc...@sandia.gov
---

I've attached the patch as a compressed file, as otherwise
it is too large to make it through the list.

-- Jim

 opensm/opensm/Makefile.am   |2 +-
 opensm/opensm/osm_ucast_torus.c | 8643 +++
 2 files changed, 8644 insertions(+), 1 deletions(-)
 create mode 100644 opensm/opensm/osm_ucast_torus.c



0005-opensm-Add-torus-2QoS-routing-engine.patch.bz2
Description: application/bzip


torus-2QoS example input files (was Re: [PATCH 00/11] Add new torus routing engine: torus-2QoS)

2009-11-20 Thread Jim Schutt

The attached files can be used to test the torus-2QoS routing
engine using ibsim.

fabric-torus-5x5x5 contains a fabric description that ibsim can read.
Once ibsim is running, run opensm like this:

  opensm --config opensm.conf --torus_config torus-2QoS-5x5x5.conf
or 
  opensm --config opensm.conf --torus_config torus-2QoS-5x5x5.conf \
 -Q --qos_policy_file qos-policy-torus-5x5x5.conf

-- Jim



fabric-torus-5x5x5.bz2
Description: application/bzip

# Limit the maximal operational VLs
max_op_vls 8

# The number of seconds between subnet sweeps (0 disables it)
sweep_interval 10

# Routing engine
# Multiple routing engines can be specified separated by
# commas so that specific ordering of routing algorithms will
# be tried if earlier routing engines fail.
# Supported engines: minhop, updn, file, ftree, lash, dor
routing_engine torus-2QoS,no_fallback

# Use unicast routing cache (use FALSE if unsure)
use_ucast_cache TRUE

# Force flush of the log file after each log message
force_log_flush TRUE

# Log file to be used
log_file /dev/tty

# console [off|local|loopback|socket]
console loopback

# Telnet port for console (default 1)
console_port 1

# QoS default options
# Note that for OFED  1.3, this information can also be in qos-policy.conf.
# However, it may be good to have it here also for torus-2QoS, as this will
# change the defaults even if not using QoS.
qos_max_vls 8
qos_high_limit 0
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0
qos_vlarb_low 0:64,1:64,2:64,3:64,4:64,5:64,6:64,7:64,8:64
qos_sl2vl (null)

# This is a QoS configuration for the torus-2QoS routing engine.
# As it supports only 2 levels of QoS, via SL bit 3, we should configure
# only SLs 0 and 8.  Based on that torus-2QoS will pick the appropriate
# SL value to provide deadlock-free routing for both QoS levels.

port-groups
port-group
name: Service_nodes
port-name: H_0_0_0_0/P1   # E.g. admin
port-name: H_0_0_1_0/P1   # E.g. NFS server
port-name: H_0_0_2_0/P1   # E.g. boot server
port-name: H_0_0_3_0/P1   # E.g. login node
end-port-group

port-group
name: Lustre_nodes

port-name: H_0_0_4_0/P1   # E.g. MDS

port-name: H_0_1_0_0/P1   # E.g. OSS
port-name: H_0_1_1_0/P1   # E.g. OSS
port-name: H_0_1_2_0/P1   # E.g. OSS
port-name: H_0_1_3_0/P1   # E.g. OSS
port-name: H_0_1_4_0/P1   # E.g. OSS
end-port-group

port-group
name: Compute_nodes

port-name: H_0_2_0_0/P1
port-name: H_0_2_1_0/P1
port-name: H_0_2_2_0/P1
port-name: H_0_2_3_0/P1
port-name: H_0_2_4_0/P1

port-name: H_0_3_0_0/P1
port-name: H_0_3_1_0/P1
port-name: H_0_3_2_0/P1
port-name: H_0_3_3_0/P1
port-name: H_0_3_4_0/P1

port-name: H_0_4_0_0/P1
port-name: H_0_4_1_0/P1
port-name: H_0_4_2_0/P1
port-name: H_0_4_3_0/P1
port-name: H_0_4_4_0/P1

port-name: H_1_0_0_0/P1
port-name: H_1_0_1_0/P1
port-name: H_1_0_2_0/P1
port-name: H_1_0_3_0/P1
port-name: H_1_0_4_0/P1

port-name: H_1_1_0_0/P1
port-name: H_1_1_1_0/P1
port-name: H_1_1_2_0/P1
port-name: H_1_1_3_0/P1
port-name: H_1_1_4_0/P1

port-name: H_1_2_0_0/P1
port-name: H_1_2_1_0/P1
port-name: H_1_2_2_0/P1
port-name: H_1_2_3_0/P1
port-name: H_1_2_4_0/P1

port-name: H_1_3_0_0/P1
port-name: H_1_3_1_0/P1
port-name: H_1_3_2_0/P1
port-name: H_1_3_3_0/P1
port-name: H_1_3_4_0/P1

port-name: H_1_4_0_0/P1
port-name: H_1_4_1_0/P1
port-name: H_1_4_2_0/P1
port-name: H_1_4_3_0/P1
port-name: H_1_4_4_0/P1

port-name: H_2_0_0_0/P1
port-name: H_2_0_1_0/P1
port-name: H_2_0_2_0/P1
port-name: H_2_0_3_0/P1
port-name: H_2_0_4_0/P1

port-name: H_2_1_0_0/P1
port-name: H_2_1_1_0/P1
port-name: H_2_1_2_0/P1
port-name: H_2_1_3_0/P1
port-name: H_2_1_4_0/P1

port-name: H_2_2_0_0/P1
port-name: H_2_2_1_0/P1
port-name: H_2_2_2_0/P1
port-name: H_2_2_3_0/P1
port-name: H_2_2_4_0/P1

port-name: H_2_3_0_0/P1
port-name: H_2_3_1_0/P1
port-name: H_2_3_2_0/P1
port-name: H_2_3_3_0/P1
port-name: H_2_3_4_0/P1

port-name: H_2_4_0_0/P1
port-name: H_2_4_1_0/P1
port-name: H_2_4_2_0/P1
port-name: H_2_4_3_0/P1
port-name: H_2_4_4_0/P1

port-name: H_3_0_0_0/P1
port-name: H_3_0_1_0/P1
port-name: H_3_0_2_0/P1
port-name: H_3_0_3_0/P1
port-name: H_3_0_4_0/P1

port-name: H_3_1_0_0/P1
port-name: H_3_1_1_0/P1
port-name: H_3_1_2_0/P1
port-name: H_3_1_3_0/P1
port-name: H_3_1_4_0/P1

port-name: H_3_2_0_0/P1
port-name: 

RE: ib_post_send in drivers

2009-11-20 Thread Sean Hefty
> mlx4/qp.c: mlx4_ib_post_send()
> * when passing a list containing more than one item to
> mlx4_ib_post_send(), and sending the second or later item fails (e.g.
> because of QP overflow), the preceding items are sent anyway. This
> behavior makes it almost impossible to get error recovery right for
> block device implementations that use ib_post_send() (e.g. the SRPT
> target implementation).

Yes - this is the correct behavior.  The bad_wr pointer should reference the WR
that failed, with all WRs in the list past that point being returned
unprocessed.  This is the reason for having the bad_wr in the call.  Error
recovery shouldn't be any more difficult than posting one WR at a time.

> If my interpretation of the section about verbs in the InfiniBand
> Architecture Specification is correct, either all work requests should
> be processed or none. A quote from section 11.4.1.1, Post Send Request
> (page 622 in volume 1 of release 1.2.1):

The IB spec does not define an API.  For performance reasons, you don't want the
implementation to walk through the WR list multiple times - once to check it,
then a second time to actually post the requests to the hardware.

- Sean
