This is a known issue; see https://jira.whamcloud.com/browse/LU-11840 and
https://jira.whamcloud.com/browse/LU-13548

Aurélien

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Mark
Lundie <mark.lun...@manchester.ac.uk>
Date: Tuesday, 1 December 2020 at 13:16
To: fırat yılmaz <firatyilm...@gmail.com>
Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: RE: [EXTERNAL] [lustre-discuss] lnet routing issue - 2.12.5 client with
2.10.3 server



Hi Firat,

Thanks for your reply. Apologies if I am being silly here, but there is no 
route configured for that network. We have the networks tcp (10.110.0.0/16) and 
tcp1 (10.10.0.0/16). The servers have interfaces on both, but the clients only 
have an interface on tcp1. I'm not sure why the client is trying to route to 
10.110.0.21@tcp:

client # mount /net/lustre/
mount.lustre: mount hmeta1@tcp1:hmeta2@tcp1:/lustre at /net/lustre failed: 
Input/output error
Is the MGS running?

hmeta1 resolves to 10.10.0.91, on tcp1.
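
The addressing described above can be checked mechanically. What follows is only a minimal Python sketch, assuming nothing beyond the two subnets and the addresses quoted in this thread; the helper name `lnet_for` is hypothetical and is not part of any Lustre tool.

```python
import ipaddress

# Subnets as described in this message: tcp is 10.110.0.0/16, tcp1 is 10.10.0.0/16.
LNET_SUBNETS = {
    "tcp": ipaddress.ip_network("10.110.0.0/16"),
    "tcp1": ipaddress.ip_network("10.10.0.0/16"),
}


def lnet_for(addr):
    """Return the LNet network name whose subnet contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for name, subnet in LNET_SUBNETS.items():
        if ip in subnet:
            return name
    return None


if __name__ == "__main__":
    # hmeta1 on tcp1, the server's tcp address, and the client's em1 address.
    for addr in ("10.10.0.91", "10.110.0.21", "10.10.170.47"):
        print(addr, "->", lnet_for(addr))
```

Under these assumptions the client address and hmeta1 both fall in tcp1, while 10.110.0.21 falls in tcp, where the client has no interface.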

Thanks,

Mark
________________________________
From: fırat yılmaz <firatyilm...@gmail.com>
Sent: 01 December 2020 11:55
To: Mark Lundie <mark.lun...@manchester.ac.uk>
Cc: lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] lnet routing issue - 2.12.5 client with 2.10.3 
server

Hi Mark,

[Tue Dec  1 11:07:55 2020] LNetError: 
2127:0:(lib-move.c:1999:lnet_handle_find_routed_path()) no route to 
10.110.0.21@tcp from <?>

I would suggest checking lnetctl routing show, removing the route to
10.110.0.21@tcp, and then trying to mount.
https://wiki.lustre.org/LNet_Router_Config_Guide



On Tue, Dec 1, 2020 at 2:41 PM Mark Lundie <mark.lun...@manchester.ac.uk> wrote:
Hi all,

I've just run into an issue mounting on a newly upgraded client running 2.12.5 
with 2.10.3 servers. Just to give some background, we're about to replace our 
existing Lustre storage, but will run it concurrently with the replacement for 
a couple of months. We'll be running 2.12.5 server on the new MDS and OSSs and 
I plan to update all clients to the same version. I would like to avoid 
updating the existing servers though.

The problem is this. The servers have two tcp LNET networks, tcp and tcp1, on 
separate subnets and VLANs. The clients only see tcp1 (a small number are also 
on tcp3, routed via 2 lnet routers), which has been fine until now. With the 
2.12.5 client, however, it is trying to mount from tcp. 2.10.3 to 2.12.5 is 
obviously a bit of a jump, but does anyone have any ideas on what has changed 
and what I could do here please?

meta# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.110.0.21@tcp
          status: up
          interfaces:
              0: bond0.22
    - net type: tcp1
      local NI(s):
        - nid: 10.10.0.91@tcp1
          status: up
          interfaces:
              0: bond0

meta# lnetctl route show
route:
    - net: tcp2
      gateway: 10.10.0.254@tcp1
    - net: tcp3
      gateway: 10.10.0.254@tcp1


client# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.12.170.47@o2ib
          status: up
          interfaces:
              0: ib0
    - net type: tcp1
      local NI(s):
        - nid: 10.10.170.47@tcp1
          status: up
          interfaces:
              0: em1

[Tue Dec  1 11:07:55 2020] LNetError: 
2127:0:(lib-move.c:1999:lnet_handle_find_routed_path()) no route to 
10.110.0.21@tcp from <?>
[Tue Dec  1 11:08:01 2020] LustreError: 
1792:0:(mgc_request.c:249:do_config_log_add()) MGC10.10.0.91@tcp1: failed 
processing log, type 1: rc = -5
[Tue Dec  1 11:08:08 2020] LustreError: 2169:0:(mgc_request.c:599:do_requeue()) 
failed processing log: -5
[Tue Dec  1 11:08:19 2020] LNetError: 
2127:0:(lib-move.c:1999:lnet_handle_find_routed_path()) no route to 
10.110.0.22@tcp from <?>
[Tue Dec  1 11:08:30 2020] LustreError: 15c-8: MGC10.10.0.91@tcp1: The 
configuration from log 'lustre-client' failed (-5). This may be the result of 
communication errors between this node and the MGS, a bad configuration, or 
other errors. See the syslog for more information.

client# lctl ping 10.10.0.91@tcp1
12345-0@lo
12345-10.110.0.21@tcp
12345-10.10.0.91@tcp1
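
The ping reply lists every NID the server advertises, including one on tcp, where this client has no interface. As a rough illustration only, the Python sketch below compares the advertised NIDs with the client's local networks. It parses the text quoted in this message and does not call any Lustre tool; the function names are hypothetical, and it assumes no LNet routes are configured on the client.

```python
def nid_network(nid):
    """Return the LNet network part of a NID such as '10.10.0.91@tcp1'."""
    return nid.rsplit("@", 1)[1]


def unreachable_nids(ping_output, local_networks):
    """List advertised NIDs whose network is not configured locally."""
    missing = []
    for line in ping_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like '12345-10.110.0.21@tcp'; drop the leading '12345-'.
        nid = line.split("-", 1)[1]
        if nid_network(nid) not in local_networks:
            missing.append(nid)
    return missing


# Output of 'lctl ping 10.10.0.91@tcp1' as quoted in this message.
PING_OUTPUT = """\
12345-0@lo
12345-10.110.0.21@tcp
12345-10.10.0.91@tcp1
"""

# Network types from 'client# lnetctl net show' in this message.
CLIENT_NETWORKS = {"lo", "o2ib", "tcp1"}

if __name__ == "__main__":
    print(unreachable_nids(PING_OUTPUT, CLIENT_NETWORKS))
```

With the values from this thread, the only NID flagged is 10.110.0.21@tcp, the same address named in the "no route to" message in the log.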

Any suggestions will be greatly appreciated!

Many thanks,

Mark

Dr Mark Lundie | Research IT Systems Administrator | Research IT | Directorate 
of IT Services | B39, Sackville Street Building | The University of Manchester 
| Manchester | M1 3WE | 0161 275 8403 | ri.itservices.manchester.ac.uk


Working Hours: Tues - Thurs 0730-1730; Fri 0730-1630
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org