You need to run writeconf on all targets at the same time, and mount in a 
specific order. That is documented in the Lustre Operations Manual.
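
As a rough sketch of that sequence (not a substitute for the manual), the 
script below prints the commands in order for a setup with one combined 
MGS/MDT and one OST. The OST device and mount point are the ones quoted in the 
message below; MDT_DEV, MDT_MNT, the run helper and the RUN switch are 
hypothetical names introduced here for illustration. It assumes all clients 
are already unmounted. If the MGS is a separate device, it would be mounted 
before the MDT.

```shell
#!/bin/sh
# Dry-run sketch of a writeconf pass: prints each command unless RUN=1 is set.
# Assumption: one combined MGS/MDT and one OST; all clients already unmounted.
MDT_DEV=${MDT_DEV:-/dev/mapper/lustre-mdt0}   # hypothetical, not from the thread
MDT_MNT=${MDT_MNT:-/mnt/mdt0}                 # hypothetical, not from the thread
OST_DEV=${OST_DEV:-/dev/mapper/lustre-oss0}   # as quoted in the thread
OST_MNT=${OST_MNT:-/mnt/oss0}                 # as quoted in the thread

# Echo the command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

writeconf_plan() {
    # 1. Stop every target so nothing is using the old configuration logs.
    run umount "$MDT_MNT"
    run umount "$OST_MNT"
    # 2. Mark the logs for regeneration on every target, MDT first.
    run tunefs.lustre --writeconf "$MDT_DEV"
    run tunefs.lustre --writeconf "$OST_DEV"
    # 3. Remount in order: MGS/MDT first, then the OSTs (clients come last).
    run mount -t lustre "$MDT_DEV" "$MDT_MNT"
    run mount -t lustre "$OST_DEV" "$OST_MNT"
}

writeconf_plan
```

With more than one OST, the umount, writeconf and mount lines for the OST 
would be repeated for each one. The key point is that no target is remounted 
until every target has been through writeconf.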

Cheers, Andreas

On Jan 18, 2023, at 03:49, Edmondson, Edward via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:


Hi all,

I'm struggling to get my OSS mounts online after a less-than-clean shutdown. 
I'm on Lustre 2.12.9. Plenty of googling hasn't turned up anything that 
seems specific to the problem I'm having, unfortunately.

LNet seems to be up: pings work both ways, and the nodes are clearly 
communicating, judging by the logs. I've been through the log reconfiguration 
process with --writeconf on everything, step by step as in the manual.

On the OSS node when I try to mount:
mount.lustre: mount /dev/mapper/lustre-oss0 at /mnt/oss0 failed: No such file 
or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)

In logs:
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 
31015:0:(ldlm_lib.c:494:client_obd_setup()) can't add initial connection
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 
31015:0:(lwp_dev.c:125:lwp_setup()) lustre-MDT0000-lwp-OST0000: client obd 
setup error: rc = -2
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 
31015:0:(lwp_dev.c:273:lwp_init0()) lustre-MDT0000-lwp-OST0000: setup lwp 
failed. -2
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 
31015:0:(obd_config.c:559:class_setup()) setup lustre-MDT0000-lwp-OST0000 
failed (-2)
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 
31015:0:(obd_mount.c:202:lustre_start_simple()) lustre-MDT0000-lwp-OST0000 
setup error -2
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 
31015:0:(obd_mount_server.c:671:lustre_lwp_setup()) lustre-MDT0000-lwp-OST0000: 
setup up failed: rc -2
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 15c-8: MGC10.3.255.200@o2ib: The 
configuration from log 'lustre-client' failed (-2). This may be the result of 
communication errors between this node and the MGS, a bad configuration, or 
other errors. See the syslog for more information.
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 
30961:0:(obd_mount_server.c:1414:server_start_targets()) lustre-OST0000: failed 
to start LWP: -2
Jan 18 10:27:56 nas-0-4 kernel: LustreError: 
30961:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: 
-2
Jan 18 10:27:56 nas-0-4 kernel: Lustre: Failing over lustre-OST0000
Jan 18 10:27:57 nas-0-4 kernel: LustreError: 
30961:0:(ldlm_lockd.c:3203:ldlm_cleanup()) ldlm still has namespaces; clean 
these up first.
Jan 18 10:27:57 nas-0-4 kernel: LustreError: 
30961:0:(ldlm_lockd.c:2862:ldlm_put_ref()) ldlm_cleanup failed: -16
Jan 18 10:27:57 nas-0-4 kernel: Lustre: server umount lustre-OST0000 complete
Jan 18 10:27:57 nas-0-4 kernel: LustreError: 
30961:0:(obd_mount.c:1604:lustre_fill_super()) Unable to mount (-2)

On the MGS/MDT node (which has now mounted the MGS and MDT fine):
Jan 18 10:27:56 nas-0-3 kernel: Lustre: MGS: Connection restored to 
24758df3-a11a-f5db-18a5-2e0e35f2099d (at 10.3.255.199@o2ib)
Jan 18 10:27:56 nas-0-3 kernel: Lustre: MGS: Regenerating lustre-OST0000 log by 
user request: rc = 0
Jan 18 10:27:56 nas-0-3 kernel: Lustre: Found index 0 for lustre-OST0000, 
updating log
Jan 18 10:27:56 nas-0-3 kernel: Lustre: Client log for lustre-OST0000 was not 
updated; writeconf the MDT first to regenerate it.

The MDT has absolutely been writeconfed so that last message isn't terribly 
helpful. fscks are clean, so there's not a problem there.

Any advice hugely appreciated!

--
Dr Edd Edmondson
HPC Systems Manager
Dept of Physics and Astronomy
University College London

(he/him) During remote working email is the best way to contact me. If needed I 
am available by phone on 0203 108 1399, by Microsoft Teams, or other methods by 
arrangement.
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org