Hi Dale, On 18:06 Wed 03 Mar , Dale Purdy wrote: > > Provide a means to specify on a per switch basis the mapping (order) > between switch ports and dimensions for Dimension Order Routing. This > allows the DOR routing engine to be used when the cabling is not > properly aligned for DOR, either initially, or for an upgrade.
Nice stuff. Is this something useful with ! '-R dor'? > > Signed-off-by: Dale Purdy <pu...@sgi.com> The patch itself is broken somehow - it has double space at start of non-changed line (it is fixable with sed -e 's/^ / /', so don't resend patch only for this). Some more minor comments are below. > --- > opensm/include/opensm/osm_subnet.h | 1 + > opensm/include/opensm/osm_switch.h | 30 +++++++++ > opensm/man/opensm.8.in | 31 +++++++-- > opensm/opensm/main.c | 13 ++++- > opensm/opensm/osm_subnet.c | 7 ++ > opensm/opensm/osm_switch.c | 2 +- > opensm/opensm/osm_ucast_mgr.c | 120 > ++++++++++++++++++++++++++++++++++++ > 7 files changed, 195 insertions(+), 9 deletions(-) > > diff --git a/opensm/include/opensm/osm_subnet.h > b/opensm/include/opensm/osm_subnet.h > index 3970e98..e4e298e 100644 > --- a/opensm/include/opensm/osm_subnet.h > +++ b/opensm/include/opensm/osm_subnet.h > @@ -186,6 +186,7 @@ typedef struct osm_subn_opt { > uint16_t console_port; > char *port_prof_ignore_file; > char *hop_weights_file; > + char *dimn_ports_file; > boolean_t port_profile_switch_nodes; > boolean_t sweep_on_trap; > char *routing_engine_names; > diff --git a/opensm/include/opensm/osm_switch.h > b/opensm/include/opensm/osm_switch.h > index cb6e5ac..1c6807e 100644 > --- a/opensm/include/opensm/osm_switch.h > +++ b/opensm/include/opensm/osm_switch.h > @@ -100,6 +100,7 @@ typedef struct osm_switch { > uint16_t num_hops; > uint8_t **hops; > osm_port_profile_t *p_prof; > + uint8_t *dimn_ports; > uint8_t *lft; > uint8_t *new_lft; > uint16_t lft_size; > @@ -871,6 +872,35 @@ static inline uint8_t osm_switch_get_mft_max_position(IN > osm_switch_t * p_sw) > * RETURN VALUE > */ > > +/****f* OpenSM: Switch/osm_switch_get_dimn_port > +* NAME > +* osm_switch_get_dimn_port > +* > +* DESCRIPTION > +* Get the routing ordered port > +* > +* SYNOPSIS > +*/ > +static inline uint8_t osm_switch_get_dimn_port(IN const osm_switch_t * p_sw, > + IN uint8_t port_num) > +{ > + CL_ASSERT(p_sw); > + if (p_sw->dimn_ports == NULL) > + return port_num; > + return p_sw->dimn_ports[port_num]; > +} > +/* > +* PARAMETERS > +* p_sw > +* [in] Pointer to the switch object. > +* > +* port_num > +* [in] Port number in the switch > +* > +* RETURN VALUES > +* Returns the port number ordered for routing purposes. > +*/ > + > /****f* OpenSM: Switch/osm_switch_recommend_path > * NAME > * osm_switch_recommend_path > diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in > index 7aca8f9..9053611 100644 > --- a/opensm/man/opensm.8.in > +++ b/opensm/man/opensm.8.in > @@ -37,6 +37,7 @@ opensm \- InfiniBand subnet manager and administration > (SM/SA) > [\-console-port <port>] > [\-i(gnore-guids) <equalize-ignore-guids-file>] > [\-w | \-\-hop_weights_file <path to file>] > +[\-O | \-\-dimn_ports_file <path to file>] > [\-f <log file path> | \-\-log_file <log file path> ] > [\-L | \-\-log_limit <size in MB>] [\-e(rase_log_file)] > [\-P(config) <partition config file> ] > @@ -273,6 +274,16 @@ factor of 1. Lines starting with # are comments. > Weights affect only the > output route from the port, so many useful configurations will require > weights > to be specified in pairs. > .TP > +\fB\-O\fR, \fB\-\-dimn_ports_file\fR <path to file> > +This option provides a mapping between hypercube dimensions and ports > +on a per switch basis for the DOR routing engine. The file consists > +of lines containing a switch node GUID (specified as a 64 bit hex > +number, with leading 0x) followed by a list of non-zero port numbers, > +separated by spaces, one switch per line. The order for the port > +numbers is in one to one correspondence to the dimensions. Ports not > +listed on a line are assigned to the remaining dimensions, in port > +order. Anything after a # is a comment. > +.TP > \fB\-x\fR, \fB\-\-honor_guid2lid\fR > This option forces OpenSM to honor the guid2lid file, > when it comes out of Standby state, if such file exists > @@ -969,17 +980,20 @@ algorithm and so uses shortest paths. Instead of > spreading traffic > out across different paths with the same shortest distance, it chooses > among the available shortest paths based on an ordering of dimensions. > Each port must be consistently cabled to represent a hypercube > -dimension or a mesh dimension. Paths are grown from a destination > -back to a source using the lowest dimension (port) of available paths > -at each step. This provides the ordering necessary to avoid deadlock. > +dimension or a mesh dimension. Alternatively, the -O option can be > +used to assign a custom mapping between the ports on a given switch, > +and the associated dimension. Paths are grown from a destination back > +to a source using the lowest dimension (port) of available paths at > +each step. This provides the ordering necessary to avoid deadlock. > When there are multiple links between any two switches, they still > represent only one dimension and traffic is balanced across them > unless port equalization is turned off. In the case of hypercubes, > the same port must be used throughout the fabric to represent the > -hypercube dimension and match on both ends of the cable. In the case > -of meshes, the dimension should consistently use the same pair of > -ports, one port on one end of the cable, and the other port on the > -other end, continuing along the mesh dimension. > +hypercube dimension and match on both ends of the cable, or the -O > +option used to accomplish the alignment. In the case of meshes, the > +dimension should consistently use the same pair of ports, one port on > +one end of the cable, and the other port on the other end, continuing > +along the mesh dimension, or the -O option used as an override. > > Use '-R dor' option to activate the DOR algorithm. > > @@ -1111,3 +1125,6 @@ Thomas Sodring > .TP > Ira Weiny > .RI < wei...@llnl.gov > > +.TP > +Dale Purdy > +.RI < pu...@sgi.com > > diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c > index f9a33af..0093aa7 100644 > --- a/opensm/opensm/main.c > +++ b/opensm/opensm/main.c > @@ -275,6 +275,10 @@ static void show_usage(void) > " This option provides the means to define a > weighting\n" > " factor per port for customizing the least weight\n" > " hops for the routing.\n\n"); > + printf("--dimn_ports_file, -O <path to file>\n" > + " This option provides the means to define a mapping\n" > + " between ports and dimension (Order) for controlling\n" > + " Dimension Order Routing (DOR).\n\n"); > printf("--honor_guid2lid, -x\n" > " This option forces OpenSM to honor the guid2lid > file,\n" > " when it comes out of Standby state, if such file > exists\n" > @@ -543,7 +547,7 @@ int main(int argc, char *argv[]) > char *conf_template = NULL, *config_file = NULL; > uint32_t val; > const char *const short_option = > - > "F:c:i:w:f:ed:D:g:l:L:s:t:a:u:m:X:R:zM:U:S:P:Y:ANBIQvVhoryxp:n:q:k:C:G:H:"; > + > "F:c:i:w:O:f:ed:D:g:l:L:s:t:a:u:m:X:R:zM:U:S:P:Y:ANBIQvVhoryxp:n:q:k:C:G:H:"; > > /* > In the array below, the 2nd parameter specifies the number > @@ -560,6 +564,7 @@ int main(int argc, char *argv[]) > {"guid", 1, NULL, 'g'}, > {"ignore_guids", 1, NULL, 'i'}, > {"hop_weights_file", 1, NULL, 'w'}, > + {"dimn_ports_file", 1, NULL, 'O'}, > {"lmc", 1, NULL, 'l'}, > {"sweep", 1, NULL, 's'}, > {"timeout", 1, NULL, 't'}, > @@ -696,6 +701,12 @@ int main(int argc, char *argv[]) > opt.hop_weights_file); > break; > > + case 'O': > + opt.dimn_ports_file = optarg; > + printf(" Dimension Ports File = %s\n", > + opt.dimn_ports_file); > + break; > + > case 'g': > /* > Specifies port guid with which to bind. > diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c > index e4126bc..e3f118b 100644 > --- a/opensm/opensm/osm_subnet.c > +++ b/opensm/opensm/osm_subnet.c > @@ -324,6 +324,7 @@ static const opt_rec_t opt_tbl[] = { > { "force_heavy_sweep", OPT_OFFSET(force_heavy_sweep), > opts_parse_boolean, NULL, 1 }, > { "port_prof_ignore_file", OPT_OFFSET(port_prof_ignore_file), > opts_parse_charp, NULL, 0 }, > { "hop_weights_file", OPT_OFFSET(hop_weights_file), opts_parse_charp, > NULL, 0 }, > + { "dimn_ports_file", OPT_OFFSET(dimn_ports_file), opts_parse_charp, > NULL, 0 }, > { "port_profile_switch_nodes", OPT_OFFSET(port_profile_switch_nodes), > opts_parse_boolean, NULL, 1 }, > { "sweep_on_trap", OPT_OFFSET(sweep_on_trap), opts_parse_boolean, NULL, > 1 }, > { "routing_engine", OPT_OFFSET(routing_engine_names), opts_parse_charp, > NULL, 0 }, > @@ -742,6 +743,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt) > p_opt->accum_log_file = TRUE; > p_opt->port_prof_ignore_file = NULL; > p_opt->hop_weights_file = NULL; > + p_opt->dimn_ports_file = NULL; > p_opt->port_profile_switch_nodes = FALSE; > p_opt->sweep_on_trap = TRUE; > p_opt->use_ucast_cache = FALSE; > @@ -1385,6 +1387,11 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t > * p_opts) > p_opts->hop_weights_file ? p_opts->hop_weights_file : null_str); > > fprintf(out, > + "# The file holding non-default port order per switch for DOR > routing \n" > + "dimn_ports_file %s\n\n", > + p_opts->dimn_ports_file ? p_opts->dimn_ports_file : null_str); > + > + fprintf(out, > "# Routing engine\n" > "# Multiple routing engines can be specified separated by\n" > "# commas so that specific ordering of routing algorithms > will\n" > diff --git a/opensm/opensm/osm_switch.c b/opensm/opensm/osm_switch.c > index 1cd8bfc..398fd65 100644 > --- a/opensm/opensm/osm_switch.c > +++ b/opensm/opensm/osm_switch.c > @@ -341,7 +341,7 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * > p_sw, > > /* port number starts with one and num_ports is 1 + num phys ports */ > for (i = start_from; i < start_from + num_ports; i++) { > - port_num = i%num_ports; > + port_num = osm_switch_get_dimn_port(p_sw, i%num_ports); > if (!port_num || > osm_switch_get_hop_count(p_sw, base_lid, port_num) != > least_hops) > diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c > index 7ec58b5..e74775c 100644 > --- a/opensm/opensm/osm_ucast_mgr.c > +++ b/opensm/opensm/osm_ucast_mgr.c > @@ -46,6 +46,7 @@ > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > +#include <ctype.h> > #include <iba/ib_types.h> > #include <complib/cl_qmap.h> > #include <complib/cl_debug.h> > @@ -488,6 +489,112 @@ static void set_default_hop_wf(cl_map_item_t * > p_map_item, void *ctx) > } > } > > +static void free_dimn_ports(cl_map_item_t * p_map_item, void *ctx) > +{ > + osm_switch_t *sw = (osm_switch_t *) p_map_item; > + > + if (sw->dimn_ports) { > + free(sw->dimn_ports); > + sw->dimn_ports = NULL; > + } > +} > + > +static int set_dimn_ports(void *ctx, uint64_t guid, char *p) > +{ > + osm_ucast_mgr_t *m = ctx; > + osm_node_t *node = osm_get_node_by_guid(m->p_subn, cl_hton64(guid)); > + osm_switch_t *sw; > + uint8_t *dimn_ports = NULL; > + uint8_t port; > + uint *ports = NULL; 'uint' is not something standard (we had some build compatibility issues with 'uint' in infiniband-diags in the past), so what about 'unsigned int'? > + const int bpw = sizeof(*ports)*8; > + int words; > + int i = 1; /* port 0 maps to port 0 */ > + > + if (!node || !(sw = node->sw)) { > + OSM_LOG(m->p_log, OSM_LOG_DEBUG, > + "switch with guid 0x%016" PRIx64 " is not found\n", > + guid); > + return 0; > + } > + > + if (sw->dimn_ports) { > + OSM_LOG(m->p_log, OSM_LOG_DEBUG, > + "switch with guid 0x%016" PRIx64 " already listed\n", > + guid); It is GIUD double listed case, right? Wouldn't OSM_LOG_VERBOSE be more appropriate? > + return 0; > + } > + > + dimn_ports = malloc(sizeof(*dimn_ports)*sw->num_ports); > + if (!dimn_ports) { > + OSM_LOG(m->p_log, OSM_LOG_ERROR, > + "ERR 3A07: cannot allocate memory for dimn_ports\n"); > + return -1; > + } > + memset(dimn_ports, 0, sizeof(*dimn_ports)*sw->num_ports); > + > + /* the ports array is for record keeping of which ports have > + * been seen */ > + words = (sw->num_ports + bpw - 1)/bpw; > + ports = malloc(words*sizeof(*ports)); > + if (!ports) { > + OSM_LOG(m->p_log, OSM_LOG_ERROR, > + "ERR 3A08: cannot allocate memory for ports\n"); > + return -1; > + } > + memset(ports, 0, words*sizeof(*ports)); > + > + while ((*p != '\0') && (*p != '#')) { > + char *e; > + > + port = strtoul(p, &e, 0); > + if ((p == e) || (port == 0) || (port >= sw->num_ports) || > + !osm_node_get_physp_ptr(node, port)) { > + OSM_LOG(m->p_log, OSM_LOG_DEBUG, > + "bad port %d specified for guid 0x%016" PRIx64 > "\n", > + port, guid); > + free(dimn_ports); > + free(ports); Ditto. > + return 0; > + } > + > + if (ports[port/bpw] & (1u << (port%bpw))) { > + OSM_LOG(m->p_log, OSM_LOG_DEBUG, > + "port %d already specified for guid 0x%016" > PRIx64 "\n", > + port, guid); Ditto. > + free(dimn_ports); > + free(ports); > + return 0; > + } > + > + ports[port/bpw] |= (1u << (port%bpw)); > + dimn_ports[i++] = port; > + > + p = e; > + while (isspace(*p)) { > + p++; > + } > + } > + > + if (i > 1) { > + for (port = 1; port < sw->num_ports; port++) { > + /* fill out the rest of the dimn_ports array > + * in sequence using the remaining unspecified > + * ports. > + */ > + if (!(ports[port/bpw] & (1u << (port%bpw)))) { > + dimn_ports[i++] = port; > + } > + } > + sw->dimn_ports = dimn_ports; > + } else { > + free(dimn_ports); > + } > + > + free(ports); > + return 0; > +} > + > int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * p_mgr) > { > uint32_t i; > @@ -515,6 +622,19 @@ int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t > * p_mgr) > } > } > > + cl_qmap_apply_func(p_sw_guid_tbl, free_dimn_ports, NULL); > + if (p_mgr->p_subn->opt.dimn_ports_file) { > + OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG, > + "Fetching dimension ports file \'%s\'\n", > + p_mgr->p_subn->opt.dimn_ports_file); > + if (parse_node_map(p_mgr->p_subn->opt.dimn_ports_file, > + set_dimn_ports, p_mgr)) { > + OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, "ERR 3A05: " > + "cannot parse dimn_ports_file \'%s\'\n", > + p_mgr->p_subn->opt.dimn_ports_file); > + } > + } > + Hmm, if it is DOR only it can be done under 'if (is_dor)' (to save cycles of other REs). Otherwise (generic usability) ucast_mgr_setup_all_switches() seems as more appropriate place to have such setup, no? And what about adding: if (sw->dimn_ports) free(dimn_ports); in osm_switch_delete()? Sasha > /* > Set the switch matrices for each switch's own port 0 LID(s) > then set the lid matrices for the each switch's leaf nodes. > -- > 1.7.0 > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html