To the original question: lnetctl on the router node shows 'enable: 1':

# lnetctl routing show
routing:
    - cpt[0]:
 …snip…
    - enable: 1

Lustre 2.10.3-1.el6
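
For anyone who wants to script this check, here is a minimal Python sketch that looks for the "enable" line in the text printed by "lnetctl routing show". It assumes the output layout matches the excerpt above; the snipped portion is not reconstructed, and the function name routing_enabled is only illustrative.

```python
import re

# Sample text copied from the excerpt above; the snipped lines are omitted.
SAMPLE_ROUTING = """\
routing:
    - cpt[0]:
    - enable: 1
"""

def routing_enabled(text):
    """Return True/False from an 'enable: N' line, or None if no such line exists."""
    match = re.search(r"^\s*-?\s*enable:\s*(\d+)\s*$", text, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)) != 0

print(routing_enabled(SAMPLE_ROUTING))
```

On a router node, the text would come from running "lnetctl routing show" and capturing its output.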

Alex.

On 4/17/18, 7:05 PM, "lustre-discuss on behalf of Faaland, Olaf P." 
<lustre-discuss-boun...@lists.lustre.org on behalf of faala...@llnl.gov> wrote:

    Update:
    
    Joe pointed out "lnetctl set routing 1".  After invoking that on the router
    node, the compute node reports the route as up:
    
    [root@ulna66:lustre-211]# lnetctl route show -v
    route:
        - net: o2ib100
          gateway: 192.168.128.4@o2ib33
          hop: -1
          priority: 0
          state: up
    
    Does this replace the lnet module parameter "forwarding"?
    
    Olaf P. Faaland
    Livermore Computing
    
    
    ________________________________________
    From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
    Faaland, Olaf P. <faala...@llnl.gov>
    Sent: Tuesday, April 17, 2018 4:34:22 PM
    To: lustre-discuss@lists.lustre.org
    Subject: [lustre-discuss] Lustre 2.11 lnet troubleshooting
    
    Hi,
    
    I've got a cluster running 2.11 with 2 routers and 68 compute nodes.  It's
    the first time I've used a post-multi-rail version of Lustre.
    
    The problem I'm trying to troubleshoot is that my sample compute node
    (ulna66) seems to think the router I configured (ulna4) is down, and so an
    attempt to ping outside the cluster results in failure and "no route to XXX"
    on the console.  I can lctl ping the router from the compute node and vice
    versa.  Forwarding is enabled on the router node via a modprobe argument.
    
    lnetctl route show reports that the route is down.  Where I'm stuck is
    figuring out what in userspace (e.g. lnetctl or lctl) can tell me why.
    
    The compute node's lnet configuration is:
    
    [root@ulna66:lustre-211]# cat /etc/lnet.conf
    ip2nets:
      - net-spec: o2ib33
        interfaces:
             0: hsi0
        ip-range:
             0: 192.168.128.*
    route:
        - net: o2ib100
          gateway: 192.168.128.4@o2ib33
    
    After I start lnet, systemctl reports success and the state is as follows:
    
    [root@ulna66:lustre-211]# lnetctl net show
    net:
        - net type: lo
          local NI(s):
            - nid: 0@lo
              status: up
        - net type: o2ib33
          local NI(s):
            - nid: 192.168.128.66@o2ib33
              status: up
              interfaces:
                  0: hsi0
    
    [root@ulna66:lustre-211]# lnetctl peer show --verbose
    peer:
        - primary nid: 192.168.128.4@o2ib33
          Multi-Rail: False
          peer ni:
            - nid: 192.168.128.4@o2ib33
              state: up
              max_ni_tx_credits: 8
              available_tx_credits: 8
              min_tx_credits: 7
              tx_q_num_of_buf: 0
              available_rtr_credits: 8
              min_rtr_credits: 8
              refcount: 4
              statistics:
                  send_count: 2
                  recv_count: 2
                  drop_count: 0
    
    [root@ulna66:lustre-211]# lnetctl route show --verbose
    route:
        - net: o2ib100
          gateway: 192.168.128.4@o2ib33
          hop: -1
          priority: 0
          state: down
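
A small sketch for spotting this condition from a script: it parses the text printed by "lnetctl route show --verbose" and lists every route whose state is not "up". It is a hand-rolled parser that assumes the flat "key: value" layout shown above; the function names parse_routes and down_routes are illustrative, not part of any Lustre tool.

```python
# Sample text copied from the route listing above.
SAMPLE_ROUTES = """\
route:
    - net: o2ib100
      gateway: 192.168.128.4@o2ib33
      hop: -1
      priority: 0
      state: down
"""

def parse_routes(text):
    """Turn the YAML-like route listing into a list of dicts of strings."""
    routes = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "route:" or ":" not in line:
            continue
        if line.startswith("- "):
            # A leading dash starts a new route entry.
            current = {}
            routes.append(current)
            line = line[2:]
        if current is None:
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return routes

def down_routes(text):
    """Return the routes whose 'state' field is anything other than 'up'."""
    return [r for r in parse_routes(text) if r.get("state") != "up"]

for route in down_routes(SAMPLE_ROUTES):
    print(route["net"], "via", route["gateway"], "is", route["state"])
```

This only reports which routes are down; it does not explain why a route is down.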
    
    I can instrument the code, but I figure there must be someplace available
    to normal users to look, that I'm unaware of.
    
    thanks,
    
    Olaf P. Faaland
    Livermore Computing
    _______________________________________________
    lustre-discuss mailing list
    lustre-discuss@lists.lustre.org
    http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
    

