Re: [PATCH] OpenSM - The DnUp routing algorithm.

2011-04-13 Thread Alex Netes
Hi Ken,

On 13:29 Wed 13 Apr , Alex Netes wrote:
 
 
 -Original Message-
 From: Schmidt, Kenneth P [mailto:kenneth.schm...@pnl.gov] 
 Sent: Monday, March 28, 2011 3:49 AM
 To: Alex Netes
 Cc: Sasha Khapyorsky; Carr, Jared F
 Subject: Re: [PATCH] OpenSM - The DnUp routing algorithm.
 
 Alex,
 
 On 03/27/2011, at 09:06, Alex Netes wrote:
 
  Ken,
  
  On 16:04 Wed 23 Mar , Ken Schmidt wrote:
  This routing algorithm operates in a very similar fashion to UpDn, 
  but is modified to allow optimal routing on certain network 
  structures in which uplinks and CA nodes are connected to the same 
  switch nodes. (For example Chinook at EMSL and RoadRunner at LANL.) 
  In these networks the optimal paths between nodes connected to a 
  single chassis would remain within the chassis.  However, because the 
  uplinks are connected at the same level of the network as the CA 
  nodes, UpDn will not allow these paths to be used for communication 
  between the CA nodes.
  
  DnUp follows the same procedure as UpDn with a few differences.  
  Ranking is based solely on the relative distance from CA nodes: any 
  switch node with a CA node directly attached is assigned a rank of 0, 
  and any switch node without a CA node attached is assigned a rank one 
  greater than the minimum rank of its neighbors. Transitions are 
  also reversed: the initial direction is down and only one transition 
  to up is allowed.  There is also an option which relaxes this 
  restriction to allow communication with switch nodes, similar to the 
  functionality of connect_roots in UpDn.
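
The ranking rule described above can be sketched as a multi-source breadth-first search. This is only an illustration in Python, not the code from the patch: the function name, the adjacency-dict input, and the switch names are hypothetical.

```python
from collections import deque


def dnup_rank(switch_links, switches_with_ca):
    """Rank switches by distance from the nearest CA-attached switch.

    switch_links: dict mapping a switch name to its neighbor switches
                  (hypothetical input format, links listed in both directions).
    switches_with_ca: switches that have at least one CA directly attached.
    """
    # Every switch with a CA attached starts at rank 0.
    rank = {sw: 0 for sw in switches_with_ca}
    queue = deque(rank)
    while queue:
        sw = queue.popleft()
        for neighbor in switch_links.get(sw, ()):
            if neighbor not in rank:
                # BFS order guarantees this is 1 + the minimum neighbor rank.
                rank[neighbor] = rank[sw] + 1
                queue.append(neighbor)
    return rank
```

With two line switches that both carry CAs and a spine between them, the lines get rank 0 and the spine gets rank 1, whether or not the lines also have external uplinks.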
  
  ---
  
  I have a few general questions.
  How can you assure that all the routes between the hosts on the same 
  chassis will go strictly through the chassis (spines) and not other lines?
 I am not positive we can be assured that there won't ever be a time when it 
 decides to pick a route through the external lines instead of the chassis 
 connections.  However, because there are more connections going to the rest 
 of the subnet, osm_switch_recommend_path() should try to balance the routes 
 to all the LIDs connected through the same chassis through the internal 
 ports because they will have fewer paths going through those links.

I guess you can use the --hop_weights_file option. The default port weight is
1, so you can set the weight of ports going from one line to another line to 2.
That way, for hosts connected to the same chassis, the local route through the
spine would always be chosen, as it would be the min-hop route.

 
  Is it possible to assign routes between switches/switches and 
  hosts/switches in different chassis (I guess it's more complicated than 
  connect_roots in UpDn)?
 I was working on an algorithm that would build the routes automatically to 
 allow routes that would violate the DnUp rules, but we came up with an easier 
 solution.  I am not entirely sure it is better, but it was considerably 
 easier to implement.  Basically, instead of setting all the routes that 
 violate the rules to OSM_NO_PATH and skipping everything beyond them in the 
 breadth-first search, we add a configured constant weight (prune_weight) to 
 them, which should be greater than the number of hops in the network.  In our 
 case, we set it to 32.  This allows minhop to use the paths that would 
 normally be denied, but only if it can't use one of the paths that doesn't 
 break the rules.
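
A small sketch of why the penalty works, in Python for brevity. It assumes the penalty is added once per rule-violating step; the function names and the (name, hops, violations) tuples are hypothetical and are not taken from the patch.

```python
PRUNE_WEIGHT = 32  # value quoted above; must exceed the longest hop count


def path_cost(hops, violations, prune_weight=PRUNE_WEIGHT):
    """Hop count plus one penalty per rule-violating step (assumed model)."""
    return hops + violations * prune_weight


def pick_path(candidates, prune_weight=PRUNE_WEIGHT):
    """Pick the cheapest (name, hops, violations) candidate."""
    return min(candidates, key=lambda c: path_cost(c[1], c[2], prune_weight))
```

Because the penalty is larger than any legal hop count, a legal path always costs less than a violating one, however short the violating path is. A violating path is selected only when no legal candidate exists.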
 
 
 Ken Schmidt
 Research Scientist, Molecular Science Computing Operations
 EMSL: Environmental Molecular Sciences Laboratory
 
 Pacific Northwest National Laboratory
 902 Battelle Boulevard
 P.O. Box 999, MSIN K8-84
 Richland, WA  99352 USA
 Tel:  509-371-6107
 Fax: 509-371-6110
 kenneth.schm...@pnl.gov
 www.emsl.pnl.gov
 

-- Alex
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] opensm: fixed segfault when enable qos on fabric with no switches

2011-04-13 Thread Jim Schutt

Alex Netes wrote:

Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_qos.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


Acked-by: Jim Schutt jasc...@sandia.gov



Re: [PATCH 1/3] infiniband-diags: saquery, exit with ENODEV when queries fail to find results

2011-04-13 Thread Ira Weiny
On Sat, 9 Apr 2011 05:04:22 -0700
Hal Rosenstock h...@dev.mellanox.co.il wrote:

 On 4/8/2011 4:45 PM, Ira Weiny wrote:
  
  Subject: [PATCH 1/3] infiniband-diags: saquery, exit with ENODEV when 
  queries fail to find results
 
 0 records is a perfectly valid response to a GetTable and that's
 different from a real error. Why not just indicate the number of records
 returned in dump_results ?

Traditionally a non-zero return means there was an error.  I was looking for a 
good example of an error being returned when there was nothing found.  Grep is 
the only example I could find which returns 1 when it does not find a match.

I think you are correct though that not finding any results is not an error.

I will reject the patch and clean up the next one (2/3).  I think 3/3 should 
have been separate anyway.

Ira

 
 -- Hal
 
  
  Signed-off-by: Ira Weiny wei...@llnl.gov
  ---
   src/saquery.c |   29 +++--
   1 files changed, 23 insertions(+), 6 deletions(-)
  
  diff --git a/src/saquery.c b/src/saquery.c
  index 03869e3..d7c8e6f 100644
  --- a/src/saquery.c
  +++ b/src/saquery.c
  @@ -876,6 +876,9 @@ static int get_and_dump_any_records(bind_handle_t h, 
  uint16_t attr_id,
  if (ret)
  return ret;
   
  +   if (result.result_cnt == 0)
  +   return ENODEV;
  +
  dump_results(result, dump_func);
   
  return 0;
  @@ -896,7 +899,14 @@ static int get_and_dump_all_records(bind_handle_t h, 
  uint16_t attr_id,
  if (ret)
  return ret;
   
  +   if (result.result_cnt == 0) {
  +   ret = ENODEV;
  +   goto Exit;
  +   }
  +
  dump_results(result, dump_func);
  +
  +Exit:
  return_mad();
  return ret;
   }
  @@ -915,7 +925,7 @@ static int get_lid_from_name(bind_handle_t h, const 
  char *name, uint16_t * lid)
  if (ret)
  return ret;
   
  -   ret = IB_NOT_FOUND;
  +   ret = ENODEV;
  for (i = 0; i < result.result_cnt; i++) {
  node_record = get_query_rec(result.p_result_madw, i);
  p_ni = &(node_record->node_info);
  @@ -924,7 +934,7 @@ static int get_lid_from_name(bind_handle_t h, const 
  char *name, uint16_t * lid)
 sizeof(node_record->node_desc.description)) ==
  0) {
  *lid = cl_ntoh16(node_record->lid);
  -   ret = IB_SUCCESS;
  +   ret = 0;
  break;
  }
  }
  @@ -939,7 +949,7 @@ static uint16_t get_lid(bind_handle_t h, const char 
  *name)
  if (!name)
  return 0;
  if (isalpha(name[0])) {
  -   if (get_lid_from_name(h, name, &rc_lid) != IB_SUCCESS) {
  +   if (get_lid_from_name(h, name, &rc_lid) != 0) {
  fprintf(stderr, "Failed to find lid for \"%s\"\n", 
  name);
  exit(EINVAL);
  }
  @@ -1040,6 +1050,8 @@ static int print_node_records(bind_handle_t h)
  if (ret)
  return ret;
   
  +   ret = ENODEV;
  +
  if (node_print_desc == ALL_DESC) {
  printf("   LID \"name\"\n");
  printf("\n");
  @@ -1049,14 +1061,18 @@ static int print_node_records(bind_handle_t h)
  node_record = get_query_rec(result.p_result_madw, i);
  if (node_print_desc == ALL_DESC) {
  print_node_desc(node_record);
  +   ret = 0;
  } else if (node_print_desc == NAME_OF_LID) {
  -   if (requested_lid == cl_ntoh16(node_record->lid))
  +   if (requested_lid == cl_ntoh16(node_record->lid)) {
  print_node_record(node_record);
  +   ret = 0;
  +   }
  } else if (node_print_desc == NAME_OF_GUID) {
  ib_node_info_t *p_ni = &(node_record->node_info);
  -
  -   if (requested_guid == cl_ntoh64(p_ni->port_guid))
  +   if (requested_guid == cl_ntoh64(p_ni->port_guid)) {
  print_node_record(node_record);
  +   ret = 0;
  +   }
  } else {
  if (!requested_name ||
  (strncmp(requested_name,
  @@ -1068,6 +1084,7 @@ static int print_node_records(bind_handle_t h)
  return_mad();
  exit(0);
  }
  +   ret = 0;
  }
  }
  }
 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


changing subnet ID

2011-04-13 Thread Bob Ciotti

Question with regard to changing the subnet prefix in OpenSM.

We want to uniquely identify individual subnets via changing the
subnet prefix in OpenSM:

 /etc/ofa/opensm-ib0.conf:subnet_prefix 0xfe80

to something like 

 /etc/ofa/opensm-ib0.conf:subnet_prefix 0xfe800100

Then an application can make some sense of what ib subnets it has access to, 
particularly on multi-homed 
or multi interfaced hosts that can be on any combination of ib subnets.

The RFC on ipv6 addressing (http://tools.ietf.org/html/rfc4291#section-2.5.6) 
says those bits should 
be 0. I'm unclear if FE80/10 or FE80:0:0:0/64 is link local.

Does anyone know if this will break things, or of a better way to do this while 
keeping the host address part
private (not unique) between subnets? I suppose it could break routing ::))


host A - multi if on
ib0 - 0xfe80
ib1 - 0xfe81

host B - multi if on
ib0 - 0xfe81
ib1 - 0xfe80

host C - multi if on
ib0 - 0xfe800100
ib1 - 0xfe81

host D - multi if on
ib0 - 0xfe800100
ib1 - 0xfe81

host E - multi homed and multi if on
ib0 - 0xfe80
ib1 - 0xfe81
ib2 - 0xfe800100
ib3 - 0xfe800100

An application that spans all 5 hosts wants to know which hosts it can *expect* 
to connect to over what 
interfaces, without actually making the connection.
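
One way to picture the lookup, as a rough Python sketch. It assumes each subnet has a unique prefix and that the application already has a table of per-interface prefixes (for example, the upper 64 bits of each port's GID). The function name and table layout are hypothetical, and the prefix values are treated as opaque labels copied from the listing above.

```python
def expected_links(host_ports, src, dst):
    """Return (src_if, dst_if, prefix) for every subnet both hosts share.

    host_ports: dict of host -> {interface name: subnet prefix label}.
    """
    return [
        (src_if, dst_if, src_prefix)
        for src_if, src_prefix in host_ports[src].items()
        for dst_if, dst_prefix in host_ports[dst].items()
        if src_prefix == dst_prefix
    ]


# Table for hosts A and B, copied from the listing above.
HOST_PORTS = {
    "A": {"ib0": "0xfe80", "ib1": "0xfe81"},
    "B": {"ib0": "0xfe81", "ib1": "0xfe80"},
}
```

Here expected_links(HOST_PORTS, "A", "B") pairs A's ib0 with B's ib1 and A's ib1 with B's ib0, which is the kind of answer wanted without opening a connection. It only works if no two distinct subnets carry the same prefix.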






Re: changing subnet ID

2011-04-13 Thread Jason Gunthorpe
On Wed, Apr 13, 2011 at 03:18:31PM -0700, Bob Ciotti wrote:

 Question with regard to changing the subnet prefix in OpenSM.
 
 We want to uniquely identify individual subnets via changing the
 subnet prefix in OpenSM:

I highly recommend assigning a unique GID prefix to all subnets. E.g.,
the ibtool program will use the GID prefix to identify the proper HCA
port to use, which in and of itself makes it worth doing.

You should not use FE80::/10 as a prefix.

In IB all prefixes are /64s. I recommend generating a ULA for your
site and using that. See

http://en.wikipedia.org/wiki/Unique_local_address

This bit of python shows how to create the random address for your site:

 import os
 fd+os.urandom(40//8).encode(hex)+
'fd0b96859d2b'
(aka fd0b:9685:9d2b:::/64)

The lower 16 bits of the ULA can be incremented and assigned to each of
your site's subnets. Generate a new ULA if you run out.
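
For reference, a Python 3 sketch of the same idea that also enumerates per-subnet values. It assumes the result is written to OpenSM as a 64-bit hex subnet_prefix value; the helper names are made up for illustration.

```python
import ipaddress
import itertools
import os


def make_ula_site_prefix(global_id=None):
    """Build a ULA /48: the 0xfd byte followed by a 40-bit global ID."""
    if global_id is None:
        global_id = os.urandom(40 // 8)  # random 40-bit global ID
    if len(global_id) != 5:
        raise ValueError("global ID must be exactly 5 bytes (40 bits)")
    packed = b"\xfd" + global_id + bytes(10)  # pad to 128 bits
    return ipaddress.IPv6Network((int.from_bytes(packed, "big"), 48))


def subnet_prefixes(site, count):
    """Yield the first `count` /64 prefixes of the site as 64-bit integers."""
    for net in itertools.islice(site.subnets(new_prefix=64), count):
        yield int(net.network_address) >> 64


def format_subnet_prefix(prefix):
    """Render a 64-bit prefix as a zero-padded hex string."""
    return "0x{:016x}".format(prefix)
```

Calling make_ula_site_prefix() once per site and then walking subnet_prefixes(site, n) gives one value per IB subnet, with only the low 16 bits of the prefix changing between subnets.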

This will work well with future IB routing specifications and allow
a hierarchical routing scheme where all subnets in your site can be
collapsed to a single /48 address for off-site access.

 The RFC on ipv6 addressing
 (http://tools.ietf.org/html/rfc4291#section-2.5.6) says those bits
 should be 0. I'm unclear if FE80/10 or FE80:0:0:0/64 is link local.

In IPv6, FE80::/10 is link local; technically, in IB all prefixes are
/64, so it is FE80::/64.

-- 
Jason Gunthorpe jguntho...@obsidianresearch.com(780)4406067x832
Chief Technology Officer, Obsidian Research Corp Edmonton, Canada