When reverse hops are used with nodes attached to root switches, the origin switch is explored multiple times with wrong hop counts, which creates loops in the network. Fix this by checking, before going up (even on the main path), that the remote switch either has no route to the target LID yet or has one with a longer hop count (the second test was missing).
Signed-off-by: Nicolas Morey-Chaisemartin <[email protected]>
---
 opensm/opensm/osm_ucast_ftree.c |   57 +++++++++++++++++++++++---------------
 1 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c
index 6c695de..5a9eeff 100644
--- a/opensm/opensm/osm_ucast_ftree.c
+++ b/opensm/opensm/osm_ucast_ftree.c
@@ -2396,38 +2396,49 @@ fabric_route_downgoing_by_going_up(IN ftree_fabric_t * p_ftree,
 		if (is_real_lid) {
 			/* This LID may already be in the LFT in the reverse_hop feature is used */
 			/* We update the LFT only if this LID isn't already present. */
-			if (p_remote_sw->p_osm_sw->
-			    new_lft[target_lid] == OSM_NO_PATH) {
-				p_remote_sw->p_osm_sw->
-				    new_lft[target_lid] =
+
+			/* skip if target lid has already been set on remote switch fwd tbl (unless with a bigger hop count) */
+			if ((p_remote_sw->p_osm_sw->new_lft[target_lid] ==
+			     OSM_NO_PATH)
+			    ||
+			    ((p_remote_sw->p_osm_sw->new_lft[target_lid] !=
+			      OSM_NO_PATH)
+			     &&
+			     ((target_rank - p_remote_sw->rank +
+			       2 * reverse_hops) <
+			      sw_get_least_hops(p_remote_sw, target_lid)))) {
+
+				p_remote_sw->p_osm_sw->new_lft[target_lid] =
 				    p_min_port->remote_port_num;
 				OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG,
 					"Switch %s: set path to CA LID %u through port %u\n",
 					tuple_to_str(p_remote_sw->tuple),
 					target_lid,
 					p_min_port->remote_port_num);
-			}
 
-			/* On the remote switch that is pointed by the min_group,
-			   set hops for ALL the ports in the remote group. */
 
-			set_hops_on_remote_sw(p_min_group, target_lid,
-					      target_rank - p_remote_sw->rank +
-					      2 * reverse_hops, is_target_a_sw);
-		}
+				/* On the remote switch that is pointed by the min_group,
+				   set hops for ALL the ports in the remote group. */
 
-		/* Recursion step:
-		   Assign downgoing ports by stepping up, starting on REMOTE switch. */
-		created_route |= fabric_route_downgoing_by_going_up(p_ftree, p_remote_sw, /* remote switch - used as a route-downgoing alg. next step point */
-					p_sw, /* this switch - prev. position switch for the function */
-					target_lid, /* LID that we're routing to */
-					target_rank, /* rank of the LID that we're routing to */
-					is_real_lid, /* whether this target LID is real or dummy */
-					is_main_path, /* whether this is path to HCA that should by tracked by counters */
-					is_target_a_sw, /* Wheter target lid is a switch or not */
-					reverse_hop_credit, /* Remaining reverse_hops allowed */
-					reverse_hops); /* Number of reverse_hops done up to this point */
-	}
+				set_hops_on_remote_sw(p_min_group, target_lid,
+						      target_rank -
+						      p_remote_sw->rank +
+						      2 * reverse_hops,
+						      is_target_a_sw);
+				/* Recursion step:
+				   Assign downgoing ports by stepping up, starting on REMOTE switch. */
+				created_route |= fabric_route_downgoing_by_going_up(p_ftree, p_remote_sw, /* remote switch - used as a route-downgoing alg. next step point */
+							p_sw, /* this switch - prev. position switch for the function */
+							target_lid, /* LID that we're routing to */
+							target_rank, /* rank of the LID that we're routing to */
+							is_real_lid, /* whether this target LID is real or dummy */
+							is_main_path, /* whether this is path to HCA that should be tracked by counters */
+							is_target_a_sw, /* Whether target lid is a switch or not */
+							reverse_hop_credit, /* Remaining reverse_hops allowed */
+							reverse_hops); /* Number of reverse_hops done up to this point */
+			}
+		}
+	}
 	/* we're done for the third case */
 	if (!is_real_lid)
 		return created_route;
-- 
1.6.3.1
