Re: [lustre-discuss] LNET Conf Advise and Rearchitecting

2019-04-08 Thread Harr, Cameron
Paul,

We still largely use static routing as we migrate from 2.5 and 2.8 to 2.10. We 
basically cross-mount all our production file systems across the various 
compute clusters and have routing clusters to route Lustre traffic from IB or 
OPA to Ethernet between buildings. Each building has its own routing cluster 
and the compute clusters have their own routing nodes. We make sure to use 
separate Lustre networks for each compute cluster and for each building's IB 
SAN. The inter-building Ethernet network has its own ID as well. The 
/etc/modprobe.d/lnet.conf file then loads the routing tables when the lnet 
module is loaded. It works well except when we need to make a change.

A simple traffic diagram looks like the following:

[compute node] <-> [compute cluster router node] <-> [Bldg A routing cluster] 
<--ETH--> [Bldg B routing cluster] <-> [Bldg B IB SAN / Lustre cluster]

For the lnet.conf files, there is a different one for each type of node. See 
the examples below, where o2ib1 is the net ID for compute cluster 1, o2ib100 is 
the SAN for Bldg A, and o2ib200 is the SAN for Bldg B. As we have many clusters, 
our real config is more complex and I've done significant pruning; hopefully it 
still makes sense.

  *   Compute node (essentially send everything to cluster router nodes):

options lnet networks="o2ib1(ib0)" \
routes="tcp0 192.168.128.[4-5]@o2ib1; \
o2ib100 192.168.128.[4-5]@o2ib1; \
o2ib200 192.168.128.[4-5]@o2ib1"

  *   Compute cluster router (send both WAN and Bldg B traffic to Bldg A router 
nodes):

options lnet forwarding="enabled" \
networks="o2ib100(san0),o2ib1(ib0)" \
routes="tcp0 172.9.2.[2-5]@o2ib100; \
o2ib200 172.9.2.[2-5]@o2ib100"


  *   Bldg A router node (send compute cluster traffic to the cluster router 
nodes and Bldg B traffic to the Bldg B routers):

options lnet forwarding="enabled" \
 networks="tcp0(lnet0),o2ib100(san0)" \
 routes="o2ib1 172.9.1.[91-92]@o2ib100; \
 o2ib200 172.9.7.[2-5]@tcp0"

  *   Bldg B router node (send compute cluster or other Bldg A traffic to Bldg 
A routers):

options lnet forwarding="enabled" \
 networks="tcp0(lnet0),o2ib200(san0)" \
 routes="o2ib1 172.9.2.[2-5]@tcp0; \
 o2ib100 172.9.2.[2-5]@tcp0"


  *   Lustre server in Bldg B (send everything to Bldg B routers):

options lnet networks="o2ib200(ib0)" \
routes="tcp0 172.9.3.[42-45]@o2ib200; \
o2ib1 172.9.3.[42-45]@o2ib200; \
o2ib100 172.9.3.[42-45]@o2ib200"
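
If it helps, here is roughly how we sanity-check a config like this after the 
lnet module is reloaded. The server NID pinged below is a hypothetical address 
on the o2ib200 SAN range, not one of our real hosts:

# On a compute node, confirm the local NID came up on the expected net:
lctl list_nids

# Ping a Bldg B NID end to end through the routers (hypothetical address):
lctl ping 172.9.3.50@o2ib200

# On 2.10 nodes, the loaded routes can also be inspected at runtime:
lnetctl route show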


Re: [lustre-discuss] LNET Conf Advise and Rearchitecting

2019-04-04 Thread Andreas Dilger
On Apr 4, 2019, at 09:49, Paul Edmon wrote:
> 
> I was hoping to get some advice on how to write a valid lnet.conf and also 
> how we should rearchitect our current LNET layout.  I tried following the 
> Lustre docs for lnet.conf but they were not helpful, and I ended up not being 
> able to successfully set up an LNET router programmatically.  You can see my 
> attempt to do so in Puppet here: 
> 
> https://github.com/fasrc/puppet-lnet
> 
> I'm pretty sure I am missing something but I don't know what.
> 
> As for our current architecture, it is as follows.  Right now we have two data 
> centers, separated by 100 km, each with its own Lustre filesystems and IB 
> island.  To complicate matters, we will have a third IB island coming online 
> soon as well, so what we set up should be extensible.  I want to code this in 
> Puppet so I can easily lay down new lnet.conf files and spin up new LNET 
> layers.  Here are the systems in each place as well as the Lustre versions.

I can't comment on the LNet configuration side of things, but the Dynamic LNet 
Config feature is not available in Lustre 2.5.x, which makes this configuration 
a lot more complex than it needs to be.  With Lustre 2.5.x you need to specify the 
static routing config as module parameters (ip2nets is easiest I believe, but I 
don't know the details).  In Lustre 2.10 you can change the routing 
configuration at runtime with YAML configuration files or interactive lnetctl 
usage.  With 2.5.x this will mean rebooting (or at least unmounting Lustre and 
removing all modules) for all of the nodes at least once to install a new LNet 
configuration.
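
Roughly, the two styles look like the following; the nets, interfaces, and 
addresses are hypothetical placeholders rather than a tested configuration.  
With 2.5.x module parameters, a single ip2nets option can be shared by every 
node, each picking the stanza that matches its own IP address:

# /etc/modprobe.d/lnet.conf, identical on all nodes (2.5.x style):
options lnet ip2nets="o2ib0(ib0) 10.10.1.*; tcp0(eth0) 10.10.2.*"

With 2.10 DLC, the equivalent can be built (and later changed) at runtime and 
then persisted:

lnetctl lnet configure                          # load and initialize LNet
lnetctl net add --net o2ib0 --if ib0            # attach ib0 as o2ib0
lnetctl route add --net tcp0 --gateway 10.10.1.1@o2ib0
lnetctl export > /etc/lnet.conf                 # save as YAML for reuse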

Note that LNet is source routed (i.e. the clients and servers determine the 
route to send requests to their peers and they determine how to reply), so the 
routing configuration needs to be installed on all of the clients and servers, 
not just on the router nodes.  The clients need routes to all of the servers 
(though not to each other), and the servers need routes to all of the clients.  
This means every separate cluster that does not have direct communication to 
another cluster (clients or servers) will need to be on a separate LNet network 
with one or more routers in between.
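
To make that symmetry concrete (again with hypothetical nets and NIDs): if 
clients on o2ib0 reach servers on o2ib1 through a router whose NIDs are 
10.10.1.1@o2ib0 and 10.10.2.1@o2ib1, both sides need a route entry:

# Clients: reach the server net via the router's client-side NID
options lnet networks="o2ib0(ib0)" routes="o2ib1 10.10.1.1@o2ib0"

# Servers: mirror route back to the client net, or replies have no path
options lnet networks="o2ib1(ib0)" routes="o2ib0 10.10.2.1@o2ib1"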

You may want to consider upgrading the clients to 2.10 and using DLC instead of 
investing a lot of time to configure the "old" routes?

Hopefully someone more familiar with the actual routing configuration will 
chime in.  You may also want to look for routing examples in some of the old 
LUG/LAD presentations.  Of course, when you figure out the 
details, improvements to the parts of the manual that are lacking detail would 
be welcome.

Cheers, Andreas


[lustre-discuss] LNET Conf Advise and Rearchitecting

2019-04-04 Thread Paul Edmon
I was hoping to get some advice on how to write a valid lnet.conf and 
also how we should rearchitect our current LNET layout.  I tried 
following the Lustre docs for lnet.conf but they were not helpful, and I 
ended up not being able to successfully set up an LNET router 
programmatically.  You can see my attempt to do so in Puppet here:


https://github.com/fasrc/puppet-lnet

I'm pretty sure I am missing something but I don't know what.

As for our current architecture, it is as follows.  Right now we have two 
data centers, separated by 100 km, each with its own Lustre filesystems 
and IB island.  To complicate matters, we will have a third IB island 
coming online soon as well, so what we set up should be extensible.  I 
want to code this in Puppet so I can easily lay down new lnet.conf files 
and spin up new LNET layers.  Here are the systems in each place as well 
as the Lustre versions.


Boston

boslfs: 5PB Lustre IEEL Filesystem, Lustre 2.5.34, IB only export, 
routed via boslnet[01-02] as o2ib1


boslnet[01,02]: Lustre 2.5.34, bridges boslfs IB to our 10 GbE Ethernet 
network


Holyoke

holylfs: 5PB Lustre IEEL Filesystem, Lustre 2.5.34, IB only export, 
routed via holylnet[01-02] as o2ib0


holylfs02: 5PB Lustre Filesystem, Lustre 2.10.4, IB only export, routed 
via holylnet[03-04] as o2ib2


holylfs03: 3PB Lustre Filesystem, Lustre 2.10.6, IB only export, routed 
via holylnet[01-02] as o2ib0


scratchlfs: 2PB, DDN Exascaler, Lustre 2.10.5, IB only export, routed 
via holylnet[01-02] as o2ib0


holylnet[01-04]: Lustre 2.5.34, bridges FDR IB to the GbE Ethernet network

panlfs2, panlfs3, kuanglfs: various Lustre 2.5.3 systems, exported via IB 
and GbE; these do not use LNET routers and are mounted via o2ib or tcp0 
(depending on whether they are IB- or GbE-connected)


All these systems are exported to the compute nodes, which live in both 
datacenters, both on and off the IB fabric.  The compute is running 
Lustre 2.10.3.  As noted, we will soon have an independent EDR/HDR IB 
fabric at Holyoke that we want to bridge with the FDR fabric using an 
LNET router as well.


What I would like to do is: bridge the Boston network from IB to GbE 
and then back to IB (right now it just does IB to GbE); make all the 
Lustre hosts at Holyoke that aren't dual-homed use the same block of 
LNET routers, which we can expand easily and programmatically; and 
finally, lay the groundwork for the LNET bridging from FDR to EDR 
fabrics.  It would also be good to use Lustre 2.10.x (or whatever the 
latest version is) for the routers so we can use lnet.conf.  I tried 
this on my own but I couldn't get a conf that worked, even though I 
thought I had one.  I wasn't sure what I was doing wrong.


If you like, I can share our current configs, but given that I'm happy 
to throw them out, I'm fine with just scrapping everything I have to 
start from scratch and do it right.  I'm also happy to send a diagram 
if that would be helpful.


Thanks for your help in advance!

-Paul Edmon-
