Re: opensm: file routing engine

2011-04-25 Thread Weiny, Ira K.
On Apr 22, 2011, at 1:37 PM, Paul Monday (Parallel Scientific) wrote:

 Thank you, your detail is greatly appreciated :)
 
 I have one other strange question ... is it possible to carve a single
 physical switch into two logical switches (put a cable between ports
 16/17 and modify the routing tables ... this seems like it wouldn't work
 as the Unicast LID / Switch: guid rows in the respective files below
 serve as keys so the single switch would be identified twice).
 Not that I am aware of.  When you say you have a single switch I assume you 
 mean a switch based on a single switch ASIC?  Like a 24 or 36 port pizza 
 box switch.
 Yes, a 36 port Mellanox pizza box with a single crossbar ... based on 
 how I read these files, it looks like they key off a single GUID that 
 identifies the switch ... which would probably make the subnet manager 
 unhappy if I arbitrarily tried to mock it up as two switches somehow.
 The file formats seem to be:
 
 opensm-lfts.dump (later becomes -U [file])
 - Contains all discovered ports (powered on), their function (Switch vs.
 Channel Adapter), their LID and some extra information.  This is
 essentially the physical network (if all machines are powered on) ...
 the format is:
 Unicast lids [0-x] of switch Lid LID# guid GUID ('switch description'):
 LID 0x SwitchPort ZZZ # Channel Adapter | Switch portguid
 GUID: 'Description'
 
 I assume this file grows with all of the Channel Adapters and switches.
 Given a switch-switch connection a row would look like
 0x0019 005 # Switch portguid 0x003 'MF3:switch-my:MTS3600/U1'
 Yes this file grows with more nodes in the system.  But the line above is 
 not a connection but rather a linear forwarding table entry.  In general, 
 this is saying that for the given lid 0x0019 route out port 5 of that 
 switch (the switch given by the "Unicast lids [..." line).  The information 
 after '#' is more information about the node with lid=0x0019. This is _not_ 
 the other end of the link on port 5.
 Ahhh, I see ... so this table could get quite large ... if I have 1,000 
 nodes in a subnet, each with a LID assigned, this table would become 
 quite large as each LID would be listed for each switch if I have my 
 forwarding thoughts in my head ... maybe I need to wander around and 
 steal another switch from someone ;-)

Another option would be to use ibsim:  
git://git.openfabrics.org/~alexnetes/ibsim.git

You could simulate more switches in your network.

Ira


 The topology of the physical connections is shown in opensm-subnet.lst.
 Ahhh, but the opensm-subnet.lst is not handed to the file routing 
 algorithm ... this must be derived at runtime each run I'm guessing 
 and then dumped to /var/log.  Very helpful!  Thank you for the pointer.
 You could essentially use this file to map the entire physical network,
 you would end up with a graph ... but no information for how to traverse
 it efficiently, does that sound right?
 No, this is not mapping the physical network.  It is a dump of the port 
 forwarding which was programmed into each switch by opensm.
 
 Changing this file is what allows you to change the routing and then feed it 
 back into opensm.
 
 opensm-lid-matrix.dump
 - Looks like it contains the hop information ... but it's a bit more
 cryptic since I have only one switch :(  It should contain a list of all
 switches, the LID for the switch and then hop information.  The hop
 information is what I'm a bit puzzled about here, as well as what port
 guid information is tacked on.  The format of the file is:
 Switch: guid 0xx
 LID 0x 00 ff ff hops for all ports # portguid 0x000
 That is the switch to switch hop count information. Probably not of much use 
 with only 1 switch.
 Ugh ... I need another switch or .dump files from someone ... I haven't 
 found any stray .dump files out on the network, but then, Google knows 
 all and someone must have posted a couple somewhere to play with.
 
 Thank you so much again Ira, I wasn't too far off and mostly it seems 
 I'm off in places that having only a single switch wouldn't let me see.  
 The semantic correction of opensm-lfts.dump was critical.
 
 Cheers, have a wonderful weekend.
 
 Paul Monday
 Parallel Scientific, LLC
 

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: opensm: file routing engine

2011-04-22 Thread Weiny, Ira K.

On Apr 22, 2011, at 7:41 AM, Paul Monday (Parallel Scientific) wrote:

 I've been toying with the file routing engine implementation for some 
 work I'm doing, but I'm finding very little documentation on it.  I only 
 have one switch to experiment with at the moment as well, so it is not 
 obvious how some of the information in the generated lid / lfts files 
 expands to a multiple-switch environment.  Perhaps there is a document 
 around, since I'm an RTFM type of person?
 
 At any rate, here's what I've gathered, with step 4 being the big question.
 
 1. The easiest way to get started with the file routing engine is to 
 generate the lid / lfts using a different routing engine.  I went ahead 
 and did the following:  opensm -D 0x40 -R ftree
 2. Once run, copy the /var/log/opensm-lfts.dump and 
 /var/log/opensm-lid-matrix.dump files elsewhere for use
 3. I've tried to generalize the file contents below
 4. Modify the opensm-lid-matrix.dump file to implement or tweak the 
 routing algorithm over the physical network?
 5. Run opensm -R file -M new-lid-matrix.dump -U new-lfts.dump

I think this is the general method yes.
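The five steps can be sketched as a small Python helper that only builds the opensm command lines and copies the dumps; it does not start opensm itself. The flags (-D 0x40, -R ftree, -R file, -M, -U) and the /var/log file names are the ones used in this thread; the function names and the working directory are hypothetical. Per Ira's later answer, the routing change in step 4 is made in the LFT dump (the -U file).

```python
import shutil
from pathlib import Path
from typing import List

# Where opensm wrote the dumps in the setup described in this thread.
LOG_DIR = Path("/var/log")
DUMPS = ("opensm-lfts.dump", "opensm-lid-matrix.dump")


def seed_command(engine: str = "ftree") -> List[str]:
    """Step 1: run an ordinary routing engine with the debug flag used in the thread."""
    return ["opensm", "-D", "0x40", "-R", engine]


def copy_dumps(dest: Path, src: Path = LOG_DIR) -> List[Path]:
    """Step 2: copy both dumps out of the log directory before editing them."""
    dest.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(src / name, dest / name)) for name in DUMPS]


def file_engine_command(lid_matrix: Path, lfts: Path) -> List[str]:
    """Step 5: restart opensm with the file engine reading the edited dumps."""
    return ["opensm", "-R", "file", "-M", str(lid_matrix), "-U", str(lfts)]


if __name__ == "__main__":
    work = Path("routing-work")  # hypothetical working directory
    print(" ".join(seed_command()))
    print(" ".join(file_engine_command(work / DUMPS[1], work / DUMPS[0])))
```

Keeping the commands as argument lists makes them easy to hand to subprocess.run once the edited dumps are in place.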

 
 I have one other strange question ... is it possible to carve a single 
 physical switch into two logical switches (put a cable between ports 
 16/17 and modify the routing tables ... this seems like it wouldn't work 
 as the Unicast LID / Switch: guid rows in the respective files below 
 serve as keys so the single switch would be identified twice).

Not that I am aware of.  When you say you have a single switch I assume you 
mean a switch based on a single switch ASIC?  Like a 24 or 36 port pizza box 
switch.

 
 The file formats seem to be:
 
 opensm-lfts.dump (later becomes -U [file])
 - Contains all discovered ports (powered on), their function (Switch vs. 
 Channel Adapter), their LID and some extra information.  This is 
 essentially the physical network (if all machines are powered on) ... 
 the format is:
 Unicast lids [0-x] of switch Lid LID# guid GUID ('switch description'):
 LID 0x SwitchPort ZZZ # Channel Adapter | Switch portguid 
 GUID: 'Description'
 
 I assume this file grows with all of the Channel Adapters and switches.  
 Given a switch-switch connection a row would look like
 0x0019 005 # Switch portguid 0x003 'MF3:switch-my:MTS3600/U1'

Yes this file grows with more nodes in the system.  But the line above is not a 
connection but rather a linear forwarding table entry.  In general, this is 
saying that for the given lid 0x0019 route out port 5 of that switch (the 
switch given by the "Unicast lids [..." line).  The information after '#' is 
more information about the node with lid=0x0019. This is _not_ the other end of 
the link on port 5.
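To make that reading concrete, here is a minimal Python parser that turns the dump into "switch GUID -> {destination LID -> egress port}". The regular expressions are modelled only on the line shapes quoted in this thread (a "Unicast lids ... of switch ... guid ..." header followed by "LID port # ..." rows); the real file may differ in detail, and the GUIDs in the test data are made up.

```python
import re
from typing import Dict

# Assumed line shapes, modelled on the examples quoted in this thread:
#   Unicast lids [0x0-0x19] of switch Lid 3 guid 0x0002c902004a1b2c ('...'):
#   0x0019 005 # Switch portguid 0x003 'MF3:switch-my:MTS3600/U1'
HEADER_RE = re.compile(r"^Unicast lids .*? of switch .*?guid\s+(0x[0-9a-fA-F]+)")
ENTRY_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s+(\d+)")


def parse_lfts(text: str) -> Dict[str, Dict[int, int]]:
    """Map switch GUID -> {destination LID -> egress port on that switch}.

    The text after '#' describes the node that owns the destination LID,
    not whatever is cabled to the egress port, so it is ignored here.
    """
    tables: Dict[str, Dict[int, int]] = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = tables.setdefault(header.group(1).lower(), {})
            continue
        entry = ENTRY_RE.match(line)
        if entry and current is not None:
            current[int(entry.group(1), 16)] = int(entry.group(2))
    return tables
```

For the example row above, the result for that switch would contain 0x19 -> 5: "traffic for LID 0x0019 leaves through port 5", with nothing said about what port 5 is cabled to.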

The topology of the physical connections is shown in opensm-subnet.lst.
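Because an LFT row only names an egress port, following a LID across several switches also needs the cabling. A minimal Python sketch of that walk, with both tables given as plain dictionaries; the switch names and the link map are hypothetical, and parsing opensm-subnet.lst is not shown because its layout is not discussed here.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs, keyed by made-up switch names:
#   lfts:  switch -> {destination LID -> egress port}   (what opensm-lfts.dump records)
#   links: (switch, port) -> neighbouring switch        (cabling, as opensm-subnet.lst
#          would describe it; ports leading to end nodes are simply absent)
Lfts = Dict[str, Dict[int, int]]
Links = Dict[Tuple[str, int], str]


def trace(lfts: Lfts, links: Links, start: str, dlid: int,
          max_hops: int = 16) -> List[Tuple[str, int]]:
    """Follow the forwarding entries for dlid from switch to switch.

    Returns the (switch, egress port) pairs visited. The walk stops when the
    egress port does not lead to another switch, i.e. the packet is leaving
    the switch fabric toward an end node (or port 0, the switch itself).
    """
    path: List[Tuple[str, int]] = []
    switch = start
    for _ in range(max_hops):
        port = lfts[switch][dlid]  # KeyError if this switch has no route for dlid
        path.append((switch, port))
        nxt = links.get((switch, port))
        if nxt is None:
            return path
        switch = nxt
    raise RuntimeError("hop limit reached; the LFTs may contain a loop")
```

With two switches cabled sw1 port 17 to sw2 port 16, lfts = {"sw1": {2: 17}, "sw2": {2: 3}} and links = {("sw1", 17): "sw2", ("sw2", 16): "sw1"}, trace(lfts, links, "sw1", 2) returns [("sw1", 17), ("sw2", 3)]. Neither table alone gives that path.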

 
 You could essentially use this file to map the entire physical network, 
 you would end up with a graph ... but no information for how to traverse 
 it efficiently, does that sound right?

No, this is not mapping the physical network.  It is a dump of the port 
forwarding which was programmed into each switch by opensm.

Changing this file is what allows you to change the routing and then feed it 
back into opensm.
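As an illustration of such an edit, this minimal Python sketch re-points one switch's entry for one destination LID and copies every other line through untouched. The line shapes are the same assumptions as in the examples quoted in this thread, the function name is hypothetical, and it does not check that the new port exists or is cabled sensibly.

```python
import re

# Same assumed line shapes as the dump examples quoted in this thread.
HEADER_RE = re.compile(r"^Unicast lids .*? of switch .*?guid\s+(0x[0-9a-fA-F]+)")
ENTRY_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s+(\d+)")


def set_port(text: str, switch_guid: str, dlid: int, new_port: int) -> str:
    """Return a copy of an LFT dump with one switch's entry for dlid re-pointed.

    Everything else, including the '#' comments, is copied through unchanged.
    """
    out = []
    in_target = False
    changed = False
    for line in text.splitlines(keepends=True):
        header = HEADER_RE.match(line)
        if header:
            in_target = header.group(1).lower() == switch_guid.lower()
        else:
            entry = ENTRY_RE.match(line)
            if in_target and entry and int(entry.group(1), 16) == dlid:
                # Keep the zero-padded three-digit port style seen in the dump.
                line = line[: entry.start(2)] + f"{new_port:03d}" + line[entry.end(2):]
                changed = True
        out.append(line)
    if not changed:
        raise ValueError(f"no entry for LID {dlid:#06x} on switch {switch_guid}")
    return "".join(out)
```

The edited text would then be written to the file passed to opensm with -U.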

 
 opensm-lid-matrix.dump
 - Looks like it contains the hop information ... but it's a bit more 
 cryptic since I have only one switch :(  It should contain a list of all 
 switches, the LID for the switch and then hop information.  The hop 
 information is what I'm a bit puzzled about here, as well as what port 
 guid information is tacked on.  The format of the file is:
 Switch: guid 0xx
 LID 0x 00 ff ff hops for all ports # portguid 0x000

That is the switch to switch hop count information. Probably not of much use 
with only 1 switch.
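For when a multi-switch dump is available, a minimal Python sketch of reading one row of this file. It assumes the "LID hop hop ... # portguid ..." shape described above, that the hop columns are hex and indexed from port 0, and that "ff" means no path through that port; all three are assumptions, not confirmed in the thread.

```python
from typing import List, Optional, Tuple

NO_PATH = 0xFF  # assumed meaning of "ff": no route through that port


def parse_row(line: str) -> Optional[Tuple[int, List[int]]]:
    """Parse one 'LID hop hop hop ... # portguid ...' row of opensm-lid-matrix.dump.

    Returns (lid, hops) where hops[i] is taken to be the hop count via port i,
    or None for lines that are not LID rows (e.g. 'Switch: guid ...').
    """
    fields = line.split("#", 1)[0].replace(":", " ").split()
    if not fields or not fields[0].lower().startswith("0x"):
        return None
    return int(fields[0], 16), [int(tok, 16) for tok in fields[1:]]


def best_ports(hops: List[int]) -> List[int]:
    """Ports that reach the LID in the fewest hops; empty if unreachable."""
    reachable = [h for h in hops if h != NO_PATH]
    if not reachable:
        return []
    fewest = min(reachable)
    return [port for port, h in enumerate(hops) if h == fewest]
```

On a single switch every row looks like "00 ff ff ...", which is why the file says little until there is a second switch.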

Ira

 
 I know ... it's a detailed question but I figured I would write enough 
 so someone else wouldn't have to reverse engineer using the file routing 
 engine if this is basically right.
 
 Paul Monday
 Parallel Scientific, LLC
 



Re: opensm: file routing engine

2011-04-22 Thread Paul Monday (Parallel Scientific)

Thank you, your detail is greatly appreciated :)


I have one other strange question ... is it possible to carve a single
physical switch into two logical switches (put a cable between ports
16/17 and modify the routing tables ... this seems like it wouldn't work
as the Unicast LID / Switch: guid rows in the respective files below
serve as keys so the single switch would be identified twice).

Not that I am aware of.  When you say you have a single switch I assume you mean a switch 
based on a single switch ASIC?  Like a 24 or 36 port pizza box switch.
Yes, a 36 port Mellanox pizza box with a single crossbar ... based on 
how I read these files, it looks like they key off a single GUID that 
identifies the switch ... which would probably make the subnet manager 
unhappy if I arbitrarily tried to mock it up as two switches somehow.

The file formats seem to be:

opensm-lfts.dump (later becomes -U [file])
- Contains all discovered ports (powered on), their function (Switch vs.
Channel Adapter), their LID and some extra information.  This is
essentially the physical network (if all machines are powered on) ...
the format is:
Unicast lids [0-x] of switch Lid LID# guid GUID ('switch description'):
LID 0x SwitchPort ZZZ # Channel Adapter | Switch portguid
GUID: 'Description'

I assume this file grows with all of the Channel Adapters and switches.
Given a switch-switch connection a row would look like
0x0019 005 # Switch portguid 0x003 'MF3:switch-my:MTS3600/U1'
Yes this file grows with more nodes in the system.  But the line above is not a connection but 
rather a linear forwarding table entry.  In general, this is saying that for the given lid 
0x0019 route out port 5 of that switch (the switch given by the "Unicast lids 
[..." line).  The information after '#' is more information about the node with lid=0x0019. 
This is _not_ the other end of the link on port 5.
Ahhh, I see ... so this table could get quite large ... if I have 1,000 
nodes in a subnet, each with a LID assigned, this table would become 
quite large as each LID would be listed for each switch if I have my 
forwarding thoughts in my head ... maybe I need to wander around and 
steal another switch from someone ;-)
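Since each switch's LFT is indexed by destination LID, the dump grows roughly as switches × LIDs, as guessed above. A back-of-the-envelope Python sketch; the switch count is a made-up figure, and the 64-entry block size is that of the InfiniBand LinearForwardingTable attribute used to program switch LFTs.

```python
def lft_rows(num_switches: int, num_lids: int) -> int:
    """Rows in an LFT dump if every switch lists a route for every LID."""
    return num_switches * num_lids


def lft_blocks(top_lid: int, block_size: int = 64) -> int:
    """LinearForwardingTable blocks needed to cover LIDs 0..top_lid on one switch."""
    return (top_lid + block_size) // block_size


if __name__ == "__main__":
    # 1,000 end-node LIDs plus a hypothetical 50 switches with one LID each.
    lids = 1000 + 50
    print(lft_rows(50, lids), "dump rows;", lft_blocks(lids), "blocks per switch")
```

With those figures that is 52,500 rows in the dump and 17 LFT blocks per switch: large to read by hand, but small for a script.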

The topology of the physical connections is shown in opensm-subnet.lst.
Ahhh, but the opensm-subnet.lst is not handed to the file routing 
algorithm ... this must be derived at runtime each run I'm guessing 
and then dumped to /var/log.  Very helpful!  Thank you for the pointer.

You could essentially use this file to map the entire physical network,
you would end up with a graph ... but no information for how to traverse
it efficiently, does that sound right?

No, this is not mapping the physical network.  It is a dump of the port 
forwarding which was programmed into each switch by opensm.

Changing this file is what allows you to change the routing and then feed it 
back into opensm.


opensm-lid-matrix.dump
- Looks like it contains the hop information ... but it's a bit more
cryptic since I have only one switch :(  It should contain a list of all
switches, the LID for the switch and then hop information.  The hop
information is what I'm a bit puzzled about here, as well as what port
guid information is tacked on.  The format of the file is:
Switch: guid 0xx
LID 0x 00 ff ff hops for all ports # portguid 0x000

That is the switch to switch hop count information. Probably not of much use 
with only 1 switch.
Ugh ... I need another switch or .dump files from someone ... I haven't 
found any stray .dump files out on the network, but then, Google knows 
all and someone must have posted a couple somewhere to play with.


Thank you so much again Ira, I wasn't too far off and mostly it seems 
I'm off in places that having only a single switch wouldn't let me see.  
The semantic correction of opensm-lfts.dump was critical.


Cheers, have a wonderful weekend.

Paul Monday
Parallel Scientific, LLC
