I think the only way to do this today is to assign the clients in each “islet” 
a unique LNet. What problems did that cause for you (besides the administrative 
headache)?

Chris Horn

On 10/13/17, 9:51 AM, "lustre-discuss on behalf of LOPEZ, ALEXANDRE" 
<lustre-discuss-boun...@lists.lustre.org on behalf of alexandre.lo...@atos.net> 
wrote:

    Hi Sebastien.
    
    It is in fact an asymmetric routing problem. But the way routes are 
declared today in Lustre makes it quite difficult to avoid in this particular 
context.
    
    I was considering the possibility of adding a flag, a special route, whatever, 
to force LNet to return the response to the same router the request arrived 
from. Nevertheless, since I started to look at Lustre's code today for the very 
first time, it will take quite some time before I get something useful. I don't 
even know if this is actually possible. If that ever happens, I'll be glad to 
contribute it.
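
To make the idea concrete, here is a minimal sketch of "reply through the router the request arrived from". It is not LNet code: the `ReturnPathCache` class, its method names, and the peer and router names are all hypothetical, and it assumes the receiving node can see which router forwarded each request.

```python
class ReturnPathCache:
    """Remember, per peer, the router that forwarded its last request."""

    def __init__(self, default_gateway):
        # Used for peers that have not sent a request yet.
        self.default_gateway = default_gateway
        self._last_gateway = {}

    def on_request(self, peer, arrived_via):
        # Record the last-hop router for this peer.
        self._last_gateway[peer] = arrived_via

    def gateway_for_reply(self, peer):
        # Reply through the same router, else fall back to the default.
        return self._last_gateway.get(peer, self.default_gateway)


# Hypothetical names: a client on islet 2 talks through its local router.
cache = ReturnPathCache(default_gateway="router-islet1")
cache.on_request("client-islet2", arrived_via="router-islet2")
print(cache.gateway_for_reply("client-islet2"))
```

A real implementation would also have to handle the remembered router going down, which this sketch ignores.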
    
    Cheers,
    Alejandro
    
    -----Original Message-----
    From: Sebastien Buisson [mailto:sbuis...@ddn.com] 
    Sent: Friday, October 13, 2017 3:42 PM
    To: LOPEZ, ALEXANDRE
    Cc: Lustre Discuss (lustre-discuss@lists.lustre.org)
    Subject: Re: [lustre-discuss] Routers and shortest path
    
    Hi Alejandro!
    
    This makes me think of an asymmetric routing problem. It could be addressed 
by implementing something like reverse path filtering 
(http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html) in LNet: nodes 
would not accept requests from peers through router B when they are configured 
to talk to those peers through router A only.
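
As a rough illustration of that check (a sketch only, not LNet code; the `accept_request` function and the node and router names are hypothetical), a node would compare the router a request came through with the routers it is itself configured to use for that peer:

```python
def accept_request(peer, arrived_via, routes_to_peer):
    """Reverse-path check: accept a request only if it arrived through
    a router this node would also use to reach `peer`."""
    return arrived_via in routes_to_peer.get(peer, set())


# Hypothetical setup: this node reaches client-17 through router-A only.
routes_to_peer = {"client-17": {"router-A"}}

print(accept_request("client-17", "router-A", routes_to_peer))  # same path
print(accept_request("client-17", "router-B", routes_to_peer))  # asymmetric
```

Note that this only rejects asymmetric traffic; the routes on both sides still have to be configured so that the accepted path is the short one.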
    
    That is, if there is no other ready-to-use solution and you are willing to 
contribute code :)
    
    Cheers,
    Sebastien.
    
    > On 13 Oct 2017, at 15:20, LOPEZ, ALEXANDRE <alexandre.lo...@atos.net> 
wrote:
    > 
    > Hi everyone,
    >  
    > I’d like to have your opinion on a problem I’m facing. Sorry for the long 
mail but I failed to make it shorter without removing some important 
information.
    >  
    > Each islet on my cluster has a dedicated Lustre router connected to the 
interconnect and to a dedicated network where Lustre servers are reachable. 
Lustre servers are NOT on the main interconnect, thus the need for routers. Any 
router is reachable thru the interconnect from any node but, when the node and 
the router aren’t on the same islet, several switches (hops) need to be 
crossed. The idea is to use the shortest path to the servers thru the 
islet-local router.
    >  
    > I created the appropriate routes on each compute node to contact the 
islet-local Lustre router. There is also a lower-priority route to fail over to a 
router on another islet in case the local Lustre router fails. (This could have 
also been done with the route’s hops, but my understanding is that the final 
result is the same.) I also created the routes on the Lustre servers for the 
responses to reach the clients thru the routers.
    >  
    > This seems to work as expected, but it actually does not.
    >  
    > Although the filesystem is mounted on the clients and works, there is a 
problem when there is no failure (all routers are up). The problem lies in the 
routes used to deliver the responses from the servers. If I assign priorities 
to the routes on the servers, the higher priority route will always be used to 
send the responses. So, if a compute node sent a request thru its islet’s 
router (the shortest path), the response will not return thru the same router 
but thru the one designated by the higher priority route, making the return 
path longer. Using hops is the same thing: the route with the lower hop value 
is chosen, but the same set of routes applies to all the nodes on all the islets 
and a valid value for an islet is not valid for all the others. If I assign 
neither priority nor hops, round-robin will be used and the next route on the 
list is selected.
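
A toy model of the selection rule described above may make the asymmetry easier to see. It is only a sketch, not LNet's code: the `Route` and `RouteTable` names, the router names, and the convention that a lower priority or hop value is preferred are assumptions of this example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Route:
    gateway: str        # hypothetical router name
    priority: int = 0   # lower value = preferred (assumption of this model)
    hops: int = 1       # lower value = preferred (assumption of this model)


class RouteTable:
    """Toy route table: one route list shared by every destination node."""

    def __init__(self, routes):
        self.routes = list(routes)
        self._turn = count()

    def pick(self):
        # The best (priority, hops) pair wins; ties go round-robin.
        best = min((r.priority, r.hops) for r in self.routes)
        tied = [r for r in self.routes if (r.priority, r.hops) == best]
        return tied[next(self._turn) % len(tied)]


# The server's table is the same whichever islet the client is on.
server = RouteTable([Route("router-islet1", priority=0),
                     Route("router-islet2", priority=1)])

# A client on islet 2 sent its request through router-islet2 ...
request_gateway = "router-islet2"
# ... but the response leaves through the preferred route.
reply_gateway = server.pick().gateway
print(request_gateway, "->", reply_gateway)
```

Because `pick()` never looks at where the request came from, no single priority or hop assignment can be right for every islet, and with equal values the replies simply alternate between routers.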
    >  
    > The ideal solution would be for the response to retrace the path taken 
by the request (thru the same router) but I found no way to do it.
    >  
    > Is there any way to make the responses go the reverse (shortest) path?
    >  
    > Any other way to solve this?
    >  
    > I considered assigning a separate Lustre network to each islet but, 
although this solves this problem, it adds new ones; so I ended up discarding 
it.
    >  
    > I’m currently using Lustre 2.7 but I found nothing suggesting that 2.10 
will solve the problem.
    >  
    > Thanks for your time and answers.
    >  
    > Alexandre Lopez
    > Big Data & Security – Data Management
    > Bull SAS – Atos Technologies
    >  
    >  
    >  
    > _______________________________________________
    > lustre-discuss mailing list
    > lustre-discuss@lists.lustre.org
    > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
    