On Aug 26, 2020, at 16:37, Faaland, Olaf P. <faala...@llnl.gov> wrote:

Does Lustre 2.12 require that routes for every intermediate network are 
defined, on every node on a path?

For example, given this Lustre network, where:
 A-D are nodes and 1-6 are addresses
 network tcp2 has only routers, no clients and no servers

A(1) -tcp1- (2)B(3) -tcp2- (4)C(5) -tcp3- (6)D

And configured routes:

A: options lnet routes="tcp3 2@tcp1"
B: options lnet routes="tcp3 4@tcp2"
C: options lnet routes="tcp1 3@tcp2"
D: options lnet routes="tcp1 5@tcp3"

With Lustre <= 2.10 we configured only these routes.  The only nodes that need 
to know tcp2 exists are those attached to it, and so there are no routes to 
tcp2 defined anywhere.

It looks to me like Lustre 2.12 attempts to send error notifications back to 
the original sender, and so nodes A and D may end up receiving messages from 
nids on tcp2.  This then requires nodes A and D to have routes to tcp2 defined, 
so they can reply to the messages.
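
If that reading is right, the additional routes on the end nodes would 
presumably look like the sketch below. This is an illustration only: it assumes 
that A and D reach tcp2 through the same gateways they already use for the far 
network, and it does not confirm that Lustre 2.12 actually requires these 
entries. B and C are attached to tcp2 directly, so their routes are unchanged.

```
A: options lnet routes="tcp3 2@tcp1; tcp2 2@tcp1"
D: options lnet routes="tcp1 5@tcp3; tcp2 5@tcp3"
```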

Interesting.  I'm no LNet expert, but it seems strange to me that nodes other 
than B and C should care about the state of connections within @tcp2 if they 
are not endpoints themselves. A and D should never be sending messages directly 
to those nodes, and it should be enough for the LNet routers B/C to know which 
connections to peers in @tcp2 are working in order to make routing decisions 
for A and D.

If B/C nodes are themselves unable to communicate with their peers, then _that_ 
should be sent back to A/D to indicate they cannot route packets to the target 
NID, but I wouldn't think A/D should get information about @tcp2 themselves?

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud





