Re: [PATCH] OpenSM - The DnUp routing algorithm.
Hi Ken,

On 13:29 Wed 13 Apr , Alex Netes wrote:

-----Original Message-----
From: Schmidt, Kenneth P [mailto:kenneth.schm...@pnl.gov]
Sent: Monday, March 28, 2011 3:49 AM
To: Alex Netes
Cc: Sasha Khapyorsky; Carr, Jared F
Subject: Re: [PATCH] OpenSM - The DnUp routing algorithm.

Alex,

On 03/27/2011, at 09:06, Alex Netes wrote:

Ken,

On 16:04 Wed 23 Mar , Ken Schmidt wrote:

This routing algorithm operates in a very similar fashion to UpDn, but is modified to allow optimal routing on certain network structures in which uplinks and CA nodes are connected to the same switch nodes. (For example Chinook at EMSL and RoadRunner at LANL.) In these networks the optimal paths between nodes connected to a single chassis would remain within the chassis. However, because the uplinks are connected at the same level of the network as the CA nodes, UpDn will not allow these paths to be used for communication between the CA nodes.

DnUp follows the same procedure as UpDn with a few differences. Ranking is based solely on the relative distance from CA nodes: any switch node with a CA node directly attached is assigned a rank of 0, and any switch node without a CA node attached is assigned a rank one greater than the minimum rank of its neighbors. Transitions are also reversed: the initial direction is down and only one transition to up is allowed. There is also an option which relaxes this restriction to allow communication with switch nodes, similar to the functionality of connect_roots in UpDn.
---

I have a few general questions. How can you ensure that all the routes between the hosts on the same chassis will go strictly through the chassis (spines) and not through other lines?

I am not positive we can be assured that there won't ever be a time when it decides to pick a route through the external lines instead of the chassis connections. However, because there are more connections going to the rest of the subnet, osm_switch_recommend_path() should try to balance the routes to all the LIDs connected through the same chassis through the internal ports, because fewer paths will be going through those links.

I guess you can use the --hop_weights_file option. The default port weight is 1, so you can define the weight of ports going from one line to another line to be 2. That way, for hosts connected to the same chassis, the local route through the spine would always be chosen, as it would be the min_hop route.

Is it possible to assign routes between switches/switches, hosts/switches in different chassis (I guess it's more complicated than connect_roots in UPDN)?

I was working on an algorithm that would build the routes automatically to allow routes that would violate the DnUp rules, but we came up with an easier solution. I am not entirely sure it is better, but it was considerably easier to implement. Basically, instead of setting all the routes that violate the rules to OSM_NO_PATH and skipping everything beyond them in the breadth-first search, we add a configured constant weight (prune_weight) to them, which should be greater than the number of hops in the network. In our case, we set it to 32. This allows minhop to use the paths that would normally be denied, but only if it can't use one of the paths that doesn't break the rules.

Ken Schmidt
Research Scientist, Molecular Science Computing Operations
EMSL: Environmental Molecular Sciences Laboratory
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, MSIN K8-84
Richland, WA 99352 USA
Tel: 509-371-6107
Fax: 509-371-6110
kenneth.schm...@pnl.gov
www.emsl.pnl.gov

-- Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
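The ranking rule and the prune_weight idea described in this thread can be illustrated with a small sketch. This is not the OpenSM implementation: the adjacency dictionary, the switch names, and both function names are hypothetical, and the ranking is done as a plain multi-source breadth-first search, which is one way to obtain "one greater than the minimum rank of the neighbors".

```python
from collections import deque


def dnup_ranks(switch_links, switches_with_ca):
    """Rank switches by distance from CA-attached switches.

    switch_links: hypothetical dict mapping a switch name to its neighbor switches.
    switches_with_ca: switches that have at least one CA directly attached.
    """
    ranks = {sw: 0 for sw in switches_with_ca}  # CA-attached switches get rank 0
    queue = deque(switches_with_ca)
    while queue:
        sw = queue.popleft()
        for neighbor in switch_links.get(sw, ()):
            if neighbor not in ranks:
                # First visit in BFS order is via a minimum-rank neighbor.
                ranks[neighbor] = ranks[sw] + 1
                queue.append(neighbor)
    return ranks


def weighted_hops(hops, violates_rules, prune_weight=32):
    """Penalize a rule-violating route instead of discarding it outright."""
    return hops + prune_weight if violates_rules else hops


# Toy fabric (assumed): two line switches with CAs, one spine, one core switch.
links = {
    "line1": ["spine1", "line2"],
    "line2": ["spine1", "line1"],
    "spine1": ["line1", "line2", "core"],
    "core": ["spine1"],
}
ranks = dnup_ranks(links, ["line1", "line2"])

# Candidate routes as (hops, violates_rules); the legal route wins when one exists.
candidates = [(1, True), (2, False)]
best = min(candidates, key=lambda c: weighted_hops(*c))
```

Because the penalty is larger than any hop count in the fabric, a violating route is only selected when no rule-abiding route exists, which matches the behaviour Ken describes for prune_weight.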
Re: [PATCH] opensm: fixed segfault when enable qos on fabric with no switches
Alex Netes wrote:

Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_qos.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Acked-by: Jim Schutt jasc...@sandia.gov
Re: [PATCH 1/3] infiniband-diags: saquery, exit with ENODEV when queries fail to find results
On Sat, 9 Apr 2011 05:04:22 -0700
Hal Rosenstock h...@dev.mellanox.co.il wrote:

On 4/8/2011 4:45 PM, Ira Weiny wrote:

Subject: [PATCH 1/3] infiniband-diags: saquery, exit with ENODEV when queries fail to find results

0 records is a perfectly valid response to a GetTable and that's different from a real error. Why not just indicate the number of records returned in dump_results ?

Traditionally a non-zero return means there was an error. I was looking for a good example of an error being returned when there was nothing found. Grep is the only example I could find which returns 1 when it does not find a match.

I think you are correct though that not finding any results is not an error. I will reject the patch and clean up the next one (2/3). I think 3/3 should have been separate anyway.

Ira

-- Hal

Signed-off-by: Ira Weiny wei...@llnl.gov
---
 src/saquery.c | 29 +++--
 1 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/src/saquery.c b/src/saquery.c
index 03869e3..d7c8e6f 100644
--- a/src/saquery.c
+++ b/src/saquery.c
@@ -876,6 +876,9 @@ static int get_and_dump_any_records(bind_handle_t h, uint16_t attr_id,
 	if (ret)
 		return ret;
 
+	if (result.result_cnt == 0)
+		return ENODEV;
+
 	dump_results(&result, dump_func);
 
 	return 0;
@@ -896,7 +899,14 @@ static int get_and_dump_all_records(bind_handle_t h, uint16_t attr_id,
 	if (ret)
 		return ret;
 
+	if (result.result_cnt == 0) {
+		ret = ENODEV;
+		goto Exit;
+	}
+
 	dump_results(&result, dump_func);
+
+Exit:
 	return_mad();
 	return ret;
 }
@@ -915,7 +925,7 @@ static int get_lid_from_name(bind_handle_t h, const char *name, uint16_t * lid)
 	if (ret)
 		return ret;
 
-	ret = IB_NOT_FOUND;
+	ret = ENODEV;
 	for (i = 0; i < result.result_cnt; i++) {
 		node_record = get_query_rec(result.p_result_madw, i);
 		p_ni = &(node_record->node_info);
@@ -924,7 +934,7 @@ static int get_lid_from_name(bind_handle_t h, const char *name, uint16_t * lid)
 			    sizeof(node_record->node_desc.description)) ==
 		    0) {
 			*lid = cl_ntoh16(node_record->lid);
-			ret = IB_SUCCESS;
+			ret = 0;
 			break;
 		}
 	}
@@ -939,7 +949,7 @@ static uint16_t get_lid(bind_handle_t h, const char *name)
 	if (!name)
 		return 0;
 	if (isalpha(name[0])) {
-		if (get_lid_from_name(h, name, &rc_lid) != IB_SUCCESS) {
+		if (get_lid_from_name(h, name, &rc_lid) != 0) {
 			fprintf(stderr, "Failed to find lid for \"%s\"\n", name);
 			exit(EINVAL);
 		}
@@ -1040,6 +1050,8 @@ static int print_node_records(bind_handle_t h)
 	if (ret)
 		return ret;
 
+	ret = ENODEV;
+
 	if (node_print_desc == ALL_DESC) {
 		printf(" LID \"name\"\n");
 		printf("\n");
@@ -1049,14 +1061,18 @@ static int print_node_records(bind_handle_t h)
 		node_record = get_query_rec(result.p_result_madw, i);
 		if (node_print_desc == ALL_DESC) {
 			print_node_desc(node_record);
+			ret = 0;
 		} else if (node_print_desc == NAME_OF_LID) {
-			if (requested_lid == cl_ntoh16(node_record->lid))
+			if (requested_lid == cl_ntoh16(node_record->lid)) {
 				print_node_record(node_record);
+				ret = 0;
+			}
 		} else if (node_print_desc == NAME_OF_GUID) {
 			ib_node_info_t *p_ni = &(node_record->node_info);
-
-			if (requested_guid == cl_ntoh64(p_ni->port_guid))
+			if (requested_guid == cl_ntoh64(p_ni->port_guid)) {
 				print_node_record(node_record);
+				ret = 0;
+			}
 		} else {
 			if (!requested_name ||
 			    (strncmp(requested_name,
@@ -1068,6 +1084,7 @@ static int print_node_records(bind_handle_t h)
 					return_mad();
 					exit(0);
 				}
+				ret = 0;
 			}
 		}
 	}

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
changing subnet ID
Question with regard to changing the subnet prefix in OpenSM.

We want to uniquely identify individual subnets via changing the subnet prefix in OpenSM:

    /etc/ofa/opensm-ib0.conf:subnet_prefix 0xfe80

to something like

    /etc/ofa/opensm-ib0.conf:subnet_prefix 0xfe800100

Then an application can make some sense of what IB subnets it has access to, particularly on multi-homed or multi-interfaced hosts that can be on any combination of IB subnets.

The RFC on IPv6 addressing (http://tools.ietf.org/html/rfc4291#section-2.5.6) says those bits should be 0. I'm unclear if FE80/10 or FE80:0:0:0/64 is link local.

Does anyone know if this will break things, or a better way to do this, and keep the host address part private (not unique) between subnets? I suppose it could break routing ::))

host A - multi if on
    ib0 - 0xfe80
    ib1 - 0xfe81
host B - multi if on
    ib0 - 0xfe81
    ib1 - 0xfe80
host C - multi if on
    ib0 - 0xfe800100
    ib1 - 0xfe81
host D - multi if on
    ib0 - 0xfe800100
    ib1 - 0xfe81
host E - multi homed and multi if on
    ib0 - 0xfe80
    ib1 - 0xfe81
    ib2 - 0xfe800100
    ib3 - 0xfe800100

An application that spans all 5 hosts wants to know which hosts it can *expect* to connect to over what interfaces, without actually making the connection.
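The kind of lookup the application wants can be sketched as follows, assuming each subnet really does carry a distinct prefix. The table reuses the prefixes for hosts A, B and C exactly as written in the message; the `HOSTS` dictionary layout and the `expected_links` function are hypothetical and not part of OpenSM or any IB tool.

```python
from itertools import combinations

# Interface-to-prefix table copied from the message (hosts A, B, C only).
HOSTS = {
    "A": {"ib0": "0xfe80", "ib1": "0xfe81"},
    "B": {"ib0": "0xfe81", "ib1": "0xfe80"},
    "C": {"ib0": "0xfe800100", "ib1": "0xfe81"},
}


def expected_links(hosts):
    """For every host pair, list (local_if, remote_if, prefix) on a shared subnet."""
    links = {}
    for name_a, name_b in combinations(sorted(hosts), 2):
        links[(name_a, name_b)] = [
            (if_a, if_b, prefix_a)
            for if_a, prefix_a in sorted(hosts[name_a].items())
            for if_b, prefix_b in sorted(hosts[name_b].items())
            if prefix_a == prefix_b  # same prefix is taken to mean same subnet
        ]
    return links


links = expected_links(HOSTS)
for pair, shared in sorted(links.items()):
    print(pair, shared)
```

The comparison is only meaningful if no two physical subnets share a prefix, which is the point of assigning unique prefixes in the first place.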
Re: changing subnet ID
On Wed, Apr 13, 2011 at 03:18:31PM -0700, Bob Ciotti wrote:

Question with regard to changing the subnet prefix in OpenSM. We want to uniquely identify individual subnets via changing the subnet prefix in OpenSM:

I highly recommend assigning a unique GID prefix to all subnets. E.g. the ibtool program will use the GID prefix to identify the proper HCA port to use, which in and of itself makes it worth doing.

You should not use FE80::/10 as a prefix. In IB all prefixes are /64s.

I recommend generating a ULA for your site and using that. See http://en.wikipedia.org/wiki/Unique_local_address

This bit of python shows how to create the random address for your site:

    >>> import os
    >>> "fd" + os.urandom(40//8).encode("hex") + ""
    'fd0b96859d2b'

(aka fd0b:9685:9d2b:::/64)

The lower 16 bits of the ULA can be incremented and assigned to each of your site's subnets. Generate a new ULA if you run out.

This will work well with future IB routing specifications and allow a hierarchical routing scheme where all subnets in your site can be collapsed to a single /48 address for off-site access.

The RFC on ipv6 addressing (http://tools.ietf.org/html/rfc4291#section-2.5.6) says those bits should be 0. I'm unclear if FE80/10 or FE80:0:0:0/64 is link local.

In IPv6 FE80::/10 is link local; technically, in IB all prefixes are /64, so it is FE80::/64.

--
Jason Gunthorpe jguntho...@obsidianresearch.com (780)4406067x832
Chief Technology Officer, Obsidian Research Corp
Edmonton, Canada
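The one-liner in the message relies on the Python 2 `str.encode("hex")` idiom, which does not exist in Python 3. A minimal Python 3 sketch of the same idea follows: a random 40-bit Global ID behind the `fd` byte gives a site /48, and a 16-bit subnet ID selects each /64. The three function names are hypothetical, and rendering the prefix as a 16-digit `0x...` string for the `subnet_prefix` setting is an assumption based on the format shown in the original question.

```python
import ipaddress
import os


def generate_ula_site_prefix():
    """Return a random /48 ULA: the byte 0xfd followed by a 40-bit random Global ID."""
    global_id = int.from_bytes(os.urandom(40 // 8), "big")
    top48 = (0xFD << 40) | global_id
    return ipaddress.IPv6Network((top48 << 80, 48))


def subnet_prefix(site, subnet_id):
    """Return the /64 inside the site /48 selected by a 16-bit subnet ID."""
    if not 0 <= subnet_id <= 0xFFFF:
        raise ValueError("subnet_id must fit in 16 bits")
    base = int(site.network_address)
    return ipaddress.IPv6Network((base | (subnet_id << 64), 64))


def opensm_prefix_value(net):
    """Render the upper 64 bits of a /64 as a 0x... string (assumed config format)."""
    return "0x%016x" % (int(net.network_address) >> 64)


# Worked example using the Global ID printed in the message.
site = ipaddress.IPv6Network("fd0b:9685:9d2b::/48")
first = subnet_prefix(site, 1)
value = opensm_prefix_value(first)
random_site = generate_ula_site_prefix()
```

RFC 4193 describes a specific pseudo-random procedure for picking the Global ID; plain `os.urandom` is used here only because that is what the message itself does.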