Hi Sebastien.

It is in fact an asymmetric routing problem. But the way routes are declared 
today in Lustre makes it quite difficult to avoid in this particular context.

I was considering adding a flag, a special route, or something similar to 
force LNet to return the response to the same router the request arrived from. 
Nevertheless, since I started to look at Lustre's code today for the very first 
time, it will take quite some time before I get something useful. I don't even 
know if this is actually possible. If that ever happens, I'll be glad to 
contribute it.
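
To make the idea concrete, here is a minimal toy model in Python. It is not 
LNet code: Route, pick_by_priority and pick_reply_route are hypothetical names, 
and the sketch assumes that a lower priority value means a preferred route. It 
only contrasts a purely priority-based choice with a choice that prefers the 
router the request arrived from and falls back to priority when that router is 
down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    gateway: str   # hypothetical router name, e.g. "router-A"
    priority: int  # assumption for this sketch: lower value = preferred


def pick_by_priority(routes):
    """Priority-only selection: always the most preferred route."""
    return min(routes, key=lambda r: r.priority)


def pick_reply_route(routes, arrival_gateway, alive):
    """Prefer the router the request came through, if it is still up.

    Falls back to priority-based selection among live routes otherwise.
    """
    live = [r for r in routes if r.gateway in alive]
    if not live:
        raise LookupError("no live route to the client network")
    for route in live:
        if route.gateway == arrival_gateway:
            return route
    return pick_by_priority(live)


if __name__ == "__main__":
    server_routes = [Route("router-A", 0), Route("router-B", 1)]
    up = {"router-A", "router-B"}
    # Request arrived through router-B (the client's islet-local router).
    print(pick_by_priority(server_routes).gateway)
    print(pick_reply_route(server_routes, "router-B", up).gateway)
```

With both routers up, the priority-only choice sends the reply through 
router-A even though the request arrived through router-B; the second function 
keeps the reply on router-B and only uses router-A if router-B is down.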

Cheers,
Alejandro

-----Original Message-----
From: Sebastien Buisson [mailto:sbuis...@ddn.com] 
Sent: Friday, October 13, 2017 3:42 PM
To: LOPEZ, ALEXANDRE
Cc: Lustre Discuss (lustre-discuss@lists.lustre.org)
Subject: Re: [lustre-discuss] Routers and shortest path

Hi Alejandro!

This makes me think of an asymmetric routing problem. It could be addressed by 
implementing something like reverse path filtering 
(http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html) in LNet: nodes 
would not accept requests from peers through router B when they are configured 
to talk to those peers through router A only.
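
As a rough illustration of that check (a sketch only, not LNet code; the 
function name, the network names and the router names are all hypothetical), a 
strict reverse-path test accepts a request only if it arrived through a router 
that the node itself is configured to use for the peer's network:

```python
def accept_request(configured_gateways, peer_net, arrival_gateway):
    """Strict reverse-path check.

    configured_gateways maps a remote network name to the set of routers
    this node is configured to use when talking to peers on that network.
    A request is accepted only if it arrived through one of those routers.
    """
    allowed = configured_gateways.get(peer_net, set())
    return arrival_gateway in allowed


if __name__ == "__main__":
    # Hypothetical configuration: peers on "clients-net" are reached
    # through router A only.
    gateways = {"clients-net": {"router-A"}}
    print(accept_request(gateways, "clients-net", "router-A"))
    print(accept_request(gateways, "clients-net", "router-B"))
```

A request from an unknown network is rejected as well, since no router is 
configured for it.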

That is, if there is no other ready-to-use solution and you are willing to 
contribute code :)

Cheers,
Sebastien.

> On 13 Oct 2017, at 15:20, LOPEZ, ALEXANDRE <alexandre.lo...@atos.net> wrote:
> 
> Hi everyone,
>  
> I’d like to have your opinion on a problem I’m facing. Sorry for the long 
> mail but I failed to make it shorter without removing some important 
> information.
>  
> Each islet on my cluster has a dedicated Lustre router connected to the 
> interconnect and to a dedicated network where Lustre servers are reachable. 
> Lustre servers are NOT on the main interconnect, thus the need for routers. 
> Any router is reachable thru the interconnect from any node but, when the 
> node and the router aren’t on the same islet, several switches (hops) need to 
> be crossed. The idea is to use the shortest path to the servers thru the 
> islet-local router.
>  
> I created the appropriate routes on each compute node to contact the 
> islet-local Lustre router. There is also a lower-priority route to fail over 
> to a router on another islet in case the local Lustre router fails. (This 
> could also have been done with the route’s hops, but my understanding is that 
> the final result is the same.) I also created the routes on the Lustre 
> servers for the responses to reach the clients thru the routers.
>  
> This seems to work as expected, but it actually does not.
>  
> Although the filesystem is mounted on the clients and works, there is a 
> problem when there is no failure (all routers are up). The problem stems from 
> the routes used to deliver the responses from the servers. If I assign 
> priorities to the routes on the servers, the highest-priority route will 
> always be used to send the responses. So, if a compute node sent a request 
> thru its islet’s router (the shortest path), the response will not return 
> thru the same router but thru the one designated by the highest-priority 
> route, making the return path longer. Using hops is the same thing: the route 
> with the lowest hop value is chosen, but the same set of routes applies to 
> all the nodes on all the islets, and a value that is valid for one islet is 
> not valid for the others. If I assign neither priority nor hops, round-robin 
> is used and the next route on the list is selected.
>  
> The ideal solution would be for the response to follow the reverse of the 
> path taken by the request (thru the same router) but I found no way to do it.
>  
> Is there any way to make the responses take the reverse (shortest) path?
>  
> Any other way to solve this?
>  
> I considered assigning a separate Lustre network to each islet but, although 
> this solves this problem, it adds new ones; so I ended up discarding it.
>  
> I’m currently using Lustre 2.7 but I found nothing suggesting that 2.10 will 
> solve the problem.
>  
> Thanks for your time and answers.
>  
> Alexandre Lopez
> Big Data & Security – Data Management
> Bull SAS – Atos Technologies
>  
>  
>  
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
