On Aug 26, 2020, at 16:37, Faaland, Olaf P. <faala...@llnl.gov> wrote:
> Does Lustre 2.12 require that routes for every intermediate network are
> defined, on every node on a path?
>
> For example, given this Lustre network, where:
>   A-D are nodes and 1-6 are addresses
>   network tcp2 has only routers, no clients and no servers
>
>   A(1) -tcp1- (2)B(3) -tcp2- (4)C(5) -tcp3- (6)D
>
> And configured routes:
>   A: options lnet routes="tcp3 2@tcp1"
>   B: options lnet routes="tcp3 4@tcp2"
>   C: options lnet routes="tcp1 3@tcp2"
>   D: options lnet routes="tcp1 5@tcp3"
>
> With Lustre <= 2.10 we configured only these routes. The only nodes that
> need to know tcp2 exists are attached to it, and so there are no routes to
> tcp2 defined anywhere.
>
> It looks to me like Lustre 2.12 attempts to send error notifications back
> to the original sender, and so nodes A and D may end up receiving messages
> from nids on tcp2. This then requires nodes A and D to have routes to tcp2
> defined, so they can reply to the messages.

Interesting. I'm no LNet expert, but it seems strange to me that nodes other
than B and C should care about the state of connections within @tcp2 if they
are not endpoints themselves. A and D should never be sending messages
directly to those nodes, and it should be enough for the LNet routers B/C to
know which connections to peers in @tcp2 are working in order to make routing
decisions for A and D.

If the B/C nodes are themselves unable to communicate with their peers, then
_that_ should be sent back to A/D to indicate they cannot route packets to
the target NID, but I wouldn't think A/D should get information about @tcp2
themselves?

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud
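
To make the situation in the question concrete, here is a toy sketch in Python of node A's routing table. It assumes the semicolon-separated form of the `routes` string, and the helpers `parse_routes` and `can_reach` are hypothetical names made up for this illustration. It only checks whether a node has a local attachment or a route entry for a NID's network; it does not reproduce LNet's actual routing or health logic. The extra `tcp2 2@tcp1` route is likewise an assumption for illustration, not a configuration confirmed in this thread.

```python
# Toy model of the routing tables in the question; not LNet's real logic.

def parse_routes(routes):
    """Parse a routes string such as "tcp3 2@tcp1; tcp2 2@tcp1" into
    {remote_net: gateway_nid}. Only the simple "net gateway" form is handled."""
    table = {}
    for entry in routes.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        net, gateway = entry.split()
        table[net] = gateway
    return table


def can_reach(local_nets, routes, nid):
    """Return True if a node attached to local_nets, with the given routes
    string, is directly attached to or has a route for the network of nid."""
    net = nid.split("@", 1)[1]
    return net in local_nets or net in parse_routes(routes)


# Node A from the example: attached to tcp1 only.
a_nets = {"tcp1"}
a_routes_210 = "tcp3 2@tcp1"                 # as configured for Lustre <= 2.10
a_routes_extra = "tcp3 2@tcp1; tcp2 2@tcp1"  # hypothetical: adds tcp2 via router B

print(can_reach(a_nets, a_routes_210, "6@tcp3"))    # node D on tcp3
print(can_reach(a_nets, a_routes_210, "3@tcp2"))    # router B's NID on tcp2
print(can_reach(a_nets, a_routes_extra, "3@tcp2"))  # same NID with the extra route
```

In this model A can address D with the original configuration but has no way to answer a message whose source NID is on tcp2 until a route to tcp2 is added, which is the behaviour the question describes for 2.12.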
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org