Hi Robin,

Your MDT configuration file seems to be corrupt somehow.
Those "already exists, won't add" errors remind me of a ticket we opened 
a while back: https://jira.whamcloud.com/browse/LU-15000
That was also on 2.12, but we only hit it because we had used lctl llog_cancel 
(or the newer lctl del_ost command). A patch is required on 2.12 to fix it.
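
One way to check whether the MDT configuration log really contains duplicate records is to print it on the MGS and look for devices that are attached more than once. This is a minimal sketch, not a confirmed diagnosis. It assumes the MGS is mounted and that llog_print emits one record per line in the form `- { index: N, event: attach, device: NAME, ... }`. That output format and the helper name find_duplicate_attach are assumptions and are not taken from this thread.

```shell
#!/bin/sh
# Report device names that appear in more than one "attach" record.
# Reads llog_print output on stdin.
# ASSUMPTION: each record is a single line that looks like
#   - { index: N, event: attach, device: NAME, type: TYPE, UUID: ... }
find_duplicate_attach() {
    sed -n 's/.*event: attach, device: \([^ ,]*\).*/\1/p' | sort | uniq -d
}

# Usage on the MGS node (log name taken from the errors in this thread):
#   lctl --device MGS llog_print data-MDT0000 | find_duplicate_attach
```

If the function prints nothing, no device is attached twice in the records it could parse, and the cause is probably elsewhere.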

At this point, to be able to mount, you could regenerate all config files by 
following the writeconf procedure detailed in the Lustre manual.
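
For orientation, here is a rough dry-run sketch of the order of operations in that procedure. It only prints the commands and executes nothing. The device paths and the helper name writeconf_plan are hypothetical placeholders, and the manual remains the authoritative reference. The manual also warns that writeconf discards some settings, such as parameters set with lctl conf_param and OST pool definitions, so record those first.

```shell
#!/bin/sh
# Dry run: print the writeconf steps in order without executing anything.
# $1 = MDT devices (MDT0000 first), $2 = OST devices; both space-separated.
writeconf_plan() {
    echo "# 1. Unmount all clients, then all MDTs and OSTs."
    echo "# 2. Regenerate the configuration logs, MDTs before OSTs:"
    for dev in $1 $2; do
        echo "tunefs.lustre --writeconf $dev"
    done
    echo "# 3. Remount MGS/MDT0000 first, then other MDTs, then OSTs, then clients."
}

# Hypothetical device paths; substitute the real ones.
writeconf_plan "/dev/mapper/example-MDT0000 /dev/mapper/example-MDT0001" \
               "/dev/mapper/example-OST0000"
```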

Good luck!

Stephane

On Mar 6, 2023, at 2:27 AM, Teeninga, Robin <r.teeni...@rug.nl> wrote:

Hello Stephane,

Thanks for your feedback.

> Why did you run e2fsck?
I suspected some errors, but e2fsck didn't find anything.
> Did e2fsck fix something?
No.
> What version of e2fsprogs are you using?
e2fsprogs-1.46.2.wc3-0.el7.x86_64

The device had no free inodes left, so I mounted it with 
mount -t ldiskfs mdtdevice /mnt to be able to free up some space.
But after that we still could not mount the MDT:

Mar  6 11:23:51 mds01 kernel: LDISKFS-fs (dm-19): mounted filesystem with 
ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Mar  6 11:23:52 mds01 kernel: LustreError: 11-0: data-MDT0001-osp-MDT0000: 
operation mds_connect to node 0@lo failed: rc = -114
Mar  6 11:23:52 mds01 kernel: LustreError: Skipped 9 previous similar messages
Mar  6 11:23:52 mds01 kernel: LustreError: 
79765:0:(genops.c:556:class_register_device()) data-OST0000-osc-MDT0000: 
already exists, won't add
Mar  6 11:23:52 mds01 kernel: LustreError: 
79765:0:(obd_config.c:1835:class_config_llog_handler()) MGC1@tcp14: cfg command 
failed: rc = -17
Mar  6 11:23:52 mds01 kernel: Lustre:    cmd=cf001 0:data-OST0000-osc-MDT0000  
1:osp  2:data-MDT0000-mdtlov_UUID
Mar  6 11:23:52 mds01 kernel: LustreError: 15c-8: MGC@tcp14: The configuration 
from log 'data-MDT0000' failed (-17). This may be the result of communication 
errors between this node and the MGS, a bad configuration, or other errors. See 
the syslog for more information.
Mar  6 11:23:52 mds01 kernel: LustreError: 
79753:0:(obd_mount_server.c:1397:server_start_targets()) failed to start server 
data-MDT0000: -17
Mar  6 11:23:52 mds01 kernel: LustreError: 
79753:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: 
-17
Mar  6 11:23:52 mds01 kernel: Lustre: Failing over data-MDT0000
Mar  6 11:23:52 mds01 kernel: Lustre: data-MDT0000: Not available for connect 
from @o2ib4 (stopping)
Mar  6 11:23:53 mds01 kernel: Lustre: server umount data-MDT0000 complete
Mar  6 11:23:53 mds01 kernel: LustreError: 
79753:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-17)
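
The rc values in these kernel messages are negated Linux errno numbers. As a small reading aid, the sketch below maps the three codes that appear in this thread (-17, -28, -114) to their symbolic names and annotates any "rc = -N" found in log text. The function names errno_name and annotate_rc are made up for illustration, and only these three codes are covered.

```shell
#!/bin/sh
# Map the Linux errno values seen in this thread to their symbolic names.
# Accepts the number with or without a leading minus sign.
errno_name() {
    case "${1#-}" in
        17)  echo "EEXIST (File exists)" ;;
        28)  echo "ENOSPC (No space left on device)" ;;
        114) echo "EALREADY (Operation already in progress)" ;;
        *)   echo "unknown errno ${1#-}" ;;
    esac
}

# Read kernel log lines on stdin and print each distinct "rc = -N" once,
# annotated with its name. Codes printed in other forms, e.g. "(-17)",
# are not picked up.
annotate_rc() {
    sed -n 's/.*rc = -\([0-9][0-9]*\).*/\1/p' | sort -un |
    while read -r rc; do
        echo "rc = -$rc: $(errno_name "$rc")"
    done
}

# Usage: grep LustreError /var/log/messages | annotate_rc
```

EEXIST is also what mount.lustre reports as "File exists".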

Robin

On Sun, Mar 5, 2023 at 2:07 AM Stephane Thiell <sthi...@stanford.edu> wrote:
Hi Robin,

Sorry to hear about your problem.

A few questions…

Why did you run e2fsck?
Did e2fsck fix something?
What version of e2fsprogs are you using?

errno 28 is ENOSPC. What does dumpe2fs say about available space?

You can check the values of "Free blocks" and "Free inodes" using this command:

dumpe2fs -h /dev/mapper/****-MDT0000
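
To pull just those numbers out of the header, something like the following sketch may help. It assumes the usual "Field name:   value" layout of the dumpe2fs -h header. The helper name free_summary is made up for illustration.

```shell
#!/bin/sh
# Summarise free inodes and blocks from `dumpe2fs -h` header text on stdin.
# Values are printed as strings so very large counts are not truncated.
free_summary() {
    awk -F': *' '
        $1 == "Inode count" { icount = $2 }
        $1 == "Free inodes" { ifree  = $2 }
        $1 == "Block count" { bcount = $2 }
        $1 == "Free blocks" { bfree  = $2 }
        END {
            printf "inodes: %s free of %s\n", ifree, icount
            printf "blocks: %s free of %s\n", bfree, bcount
        }'
}

# Usage (device path as redacted in this thread):
#   dumpe2fs -h /dev/mapper/****-MDT0000 2>/dev/null | free_summary
```

A "Free inodes" value of 0 would be consistent with the ENOSPC error even when free blocks remain.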


Best,
Stephane


> On Mar 2, 2023, at 2:08 AM, Teeninga, Robin via lustre-discuss 
> <lustre-discuss@lists.lustre.org> wrote:
>
> Hello,
>
> I ran e2fsck on my MDT, and after that I could not mount the MDT anymore.
> It gives me the error below when I try to mount the filesystem.
> Any ideas how to resolve this?
>
> We are running Lustre server 2.12.7 on CentOS 7.9
> mount.lustre: mount /dev/mapper/****-MDT0000 at /lustre/****-MDT0000 failed: 
> File exists
>
>
> Mar  2 10:58:35 mds01 kernel: LDISKFS-fs (dm-19): mounted filesystem with 
> ordered  mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
> Mar  2 10:58:35 mds01 kernel: LustreError: 
> 160060:0:(llog.c:1398:llog_backup()) MGC****@tcp14: failed to open backup 
> logfile ****-MDT0000T: rc = -28
> Mar  2 10:58:35 mds01 kernel: LustreError: 
> 160060:0:(mgc_request.c:1879:mgc_llog_local_copy()) MGC****@tcp14: failed to 
> copy remote log ****-MDT0000: rc = -28
> Mar  2 10:58:35 mds01 kernel: LustreError: 137-5: ****-MDT0001_UUID: not 
> available for connect from 0@lo (no target). If you are running an HA pair 
> check that the target is mounted on the other server.
> Mar  2 10:58:35 mds01 kernel: LustreError: Skipped 4 previous similar messages
> Mar  2 10:58:35 mds01 kernel: LustreError: 
> 160127:0:(genops.c:556:class_register_device()) *****-OST0000-osc-MDT0000: 
> already exists, won't add
> Mar  2 10:58:35 mds01 kernel: LustreError: 
> 160127:0:(obd_config.c:1835:class_config_llog_handler()) MGC****@tcp14: cfg 
> command failed: rc = -17
> Mar  2 10:58:36 mds01 kernel: Lustre:    cmd=cf001 0:****-OST0000-osc-MDT0000 
>  1:osp  2:****-MDT0000-mdtlov_UUID
> Mar  2 10:58:36 mds01 kernel: LustreError: 15c-8: MGC****@tcp14: The 
> configuration from log '****-MDT0000' failed (-17). This may be the result of 
> communication errors between this node and the MGS, a bad configuration, or 
> other errors. See the syslog for more information.
> Mar  2 10:58:36 mds01 kernel: LustreError: 
> 160060:0:(obd_mount_server.c:1397:server_start_targets()) failed to start 
> server ****-MDT0000: -17
> Mar  2 10:58:36 mds01 kernel: LustreError: 
> 160060:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start 
> targets: -17
> Mar  2 10:58:36 mds01 kernel: Lustre: Failing over ****-MDT0000
> Mar  2 10:58:37 mds01 kernel: Lustre: server umount ****-MDT0000 complete
> Mar  2 10:58:37 mds01 kernel: LustreError: 
> 160060:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-17)
>
>
> Regards,
>
> Robin
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

