Hi Robin,

Your MDT configuration file seems corrupt somehow. Those "already exists, won't add" errors remind me of a ticket we opened a while back:

https://jira.whamcloud.com/browse/LU-15000

It was also with 2.12, but only because we had used lctl llog_cancel (or the newer lctl del_ost command). A patch is required on 2.12 to fix it.
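The rc values in the kernel messages of this thread are negated Linux errno numbers, so rc = -17 is EEXIST ("File exists"), which lines up with the "already exists, won't add" message and with the "File exists" reported by mount.lustre. A minimal Python sketch for decoding them follows. The lookup table only covers the three codes that appear in this thread, and the names LINUX_ERRNO, decode_rc and rc_values are illustrative assumptions, not part of Lustre or any Lustre tool.

```python
import re

# Linux errno numbers that show up in this thread's logs.
# Assumption: standard Linux numbering; only these three codes are covered.
LINUX_ERRNO = {
    17: ("EEXIST", "File exists"),
    28: ("ENOSPC", "No space left on device"),
    114: ("EALREADY", "Operation already in progress"),
}


def decode_rc(rc):
    """Return (name, description) for a kernel-style negative return code."""
    return LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))


def rc_values(log_text):
    """Extract the integers that follow 'rc = ' in kernel log text."""
    return [int(value) for value in re.findall(r"rc = (-?\d+)", log_text)]


# Shortened versions of log lines from this thread, kept only for illustration.
sample = (
    "MGC1@tcp14: cfg command failed: rc = -17\n"
    "failed to open backup logfile ****-MDT0000T: rc = -28\n"
    "operation mds_connect to node 0@lo failed: rc = -114\n"
)

for rc in rc_values(sample):
    name, text = decode_rc(rc)
    print(f"rc = {rc}: {name} ({text})")
```

The table is hard-coded rather than taken from Python's errno module so that the result does not depend on the operating system the script runs on.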
At this point, to be able to mount, you could regenerate all config files by following the writeconf procedure detailed in the Lustre manual.

Good luck!

Stephane

On Mar 6, 2023, at 2:27 AM, Teeninga, Robin <r.teeni...@rug.nl> wrote:

Hello Stephane,

Thanks for your feedback.

Why did you run e2fsck?
I was suspecting some errors, but e2fsck didn't find anything.

Did e2fsck fix something?
No.

What version of e2fsprogs are you using?
e2fsprogs-1.46.2.wc3-0.el7.x86_64

The device had no free inodes left, so I mounted the device with

mount -t ldiskfs mdtdevice /mnt

to be able to free up some space. But afterwards we still could not mount the MDT:

Mar 6 11:23:51 mds01 kernel: LDISKFS-fs (dm-19): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Mar 6 11:23:52 mds01 kernel: LustreError: 11-0: data-MDT0001-osp-MDT0000: operation mds_connect to node 0@lo failed: rc = -114
Mar 6 11:23:52 mds01 kernel: LustreError: Skipped 9 previous similar messages
Mar 6 11:23:52 mds01 kernel: LustreError: 79765:0:(genops.c:556:class_register_device()) data-OST0000-osc-MDT0000: already exists, won't add
Mar 6 11:23:52 mds01 kernel: LustreError: 79765:0:(obd_config.c:1835:class_config_llog_handler()) MGC1@tcp14: cfg command failed: rc = -17
Mar 6 11:23:52 mds01 kernel: Lustre: cmd=cf001 0:data-OST0000-osc-MDT0000 1:osp 2:data-MDT0000-mdtlov_UUID
Mar 6 11:23:52 mds01 kernel: LustreError: 15c-8: MGC@tcp14: The configuration from log 'data-MDT0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Mar 6 11:23:52 mds01 kernel: LustreError: 79753:0:(obd_mount_server.c:1397:server_start_targets()) failed to start server data-MDT0000: -17
Mar 6 11:23:52 mds01 kernel: LustreError: 79753:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: -17
Mar 6 11:23:52 mds01 kernel: Lustre: Failing over data-MDT0000
Mar 6 11:23:52 mds01 kernel: Lustre: data-MDT0000: Not available for connect from @o2ib4 (stopping)
Mar 6 11:23:53 mds01 kernel: Lustre: server umount data-MDT0000 complete
Mar 6 11:23:53 mds01 kernel: LustreError: 79753:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-17)

Robin

On Sun, Mar 5, 2023 at 2:07 AM Stephane Thiell <sthi...@stanford.edu> wrote:

Hi Robin,

Sorry to hear about your problem. A few questions…

Why did you run e2fsck?
Did e2fsck fix something?
What version of e2fsprogs are you using?

errno 28 is ENOSPC; what does dumpe2fs say about available space?

You can check the values of "Free blocks" and "Free inodes" using this command:

dumpe2fs -h /dev/mapper/****-MDT0000

Best,

Stephane

> On Mar 2, 2023, at 2:08 AM, Teeninga, Robin via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>
> Hello,
>
> I did an e2fsck on my MDT and after that I could not mount the MDT anymore.
> It gives me this error when I try to mount the filesystem.
> Any ideas how to resolve this?
>
> We are running Lustre server 2.12.7 on CentOS 7.9.
>
> mount.lustre: mount /dev/mapper/****-MDT0000 at /lustre/****-MDT0000 failed: File exists
>
> Mar 2 10:58:35 mds01 kernel: LDISKFS-fs (dm-19): mounted filesystem with ordered mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
> Mar 2 10:58:35 mds01 kernel: LustreError: 160060:0:(llog.c:1398:llog_backup()) MGC****@tcp14: failed to open backup logfile ****-MDT0000T: rc = -28
> Mar 2 10:58:35 mds01 kernel: LustreError: 160060:0:(mgc_request.c:1879:mgc_llog_local_copy()) MGC****@tcp14: failed to copy remote log ****-MDT0000: rc = -28
> Mar 2 10:58:35 mds01 kernel: LustreError: 137-5: ****-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
> Mar 2 10:58:35 mds01 kernel: LustreError: Skipped 4 previous similar messages
> Mar 2 10:58:35 mds01 kernel: LustreError: 160127:0:(genops.c:556:class_register_device()) *****-OST0000-osc-MDT0000: already exists, won't add
> Mar 2 10:58:35 mds01 kernel: LustreError: 160127:0:(obd_config.c:1835:class_config_llog_handler()) MGC****@tcp14: cfg command failed: rc = -17
> Mar 2 10:58:36 mds01 kernel: Lustre: cmd=cf001 0:****-OST0000-osc-MDT0000 1:osp 2:****-MDT0000-mdtlov_UUID
> Mar 2 10:58:36 mds01 kernel: LustreError: 15c-8: MGC****@tcp14: The configuration from log '****-MDT0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> Mar 2 10:58:36 mds01 kernel: LustreError: 160060:0:(obd_mount_server.c:1397:server_start_targets()) failed to start server ****-MDT0000: -17
> Mar 2 10:58:36 mds01 kernel: LustreError: 160060:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: -17
> Mar 2 10:58:36 mds01 kernel: Lustre: Failing over ****-MDT0000
> Mar 2 10:58:37 mds01 kernel: Lustre: server umount ****-MDT0000 complete
> Mar 2 10:58:37 mds01 kernel: LustreError: 160060:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-17)
>
> Regards,
>
> Robin
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
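
The free-space check suggested in this thread (reading "Free blocks" and "Free inodes" from dumpe2fs -h) could be scripted roughly as below. This is only a sketch in Python: the function name parse_free_counts is hypothetical, and sample_output is a made-up excerpt in the usual "Field:  value" layout of the superblock summary, not output from the MDT discussed here.

```python
import re


def parse_free_counts(dumpe2fs_output):
    """Pull the 'Free blocks' and 'Free inodes' counters out of dumpe2fs -h text."""
    counts = {}
    for line in dumpe2fs_output.splitlines():
        match = re.match(r"^(Free blocks|Free inodes):\s+(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


# Hypothetical excerpt for illustration only; the numbers are invented.
sample_output = """\
Inode count:              1000000
Block count:              4000000
Free blocks:              250000
Free inodes:              0
"""

counts = parse_free_counts(sample_output)
for field in ("Free blocks", "Free inodes"):
    value = counts.get(field)
    if value is None:
        print(f"{field}: not found in output")
    elif value == 0:
        print(f"{field}: 0 (exhausted, expect ENOSPC)")
    else:
        print(f"{field}: {value}")
```

To run it against a real device, the text could be captured with subprocess.run(["dumpe2fs", "-h", device], capture_output=True, text=True).stdout, where device is the path to the MDT block device and the command is run as root.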