Re: [lustre-discuss] Filesystem could not mount after e2fsck

2023-03-04 Thread Stephane Thiell via lustre-discuss
Hi Robin,

Sorry to hear about your problem.

A few questions…

Why did you run e2fsck?
Did e2fsck fix something?
What version of e2fsprogs are you using?

errno 28 is ENOSPC, what does dumpe2fs say about available space?

You can check the values of "Free blocks" and "Free inodes" using this command:

dumpe2fs -h /dev/mapper/-MDT
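For reference, a quick way to pull just the space and inode counters out of the `dumpe2fs -h` header is to grep or awk the output. The header values below are made up for illustration (they are not from this thread), and the real device path would be whatever your MDT maps to under /dev/mapper:

```shell
# Illustrative dumpe2fs -h superblock header (values are invented, not Robin's).
# On the real system you would run:  dumpe2fs -h /dev/mapper/<fsname>-MDT0000
header='Inode count:              1048576
Block count:              4194304
Free blocks:              0
Free inodes:              734003'

# Extract the "Free blocks" counter; a value of 0 would explain errno 28 (ENOSPC).
free_blocks=$(printf '%s\n' "$header" | awk -F: '/^Free blocks/ {gsub(/ /,"",$2); print $2}')
echo "free blocks: $free_blocks"
```

If either counter is at or near zero, freeing space (or inodes) on the underlying ldiskfs filesystem is the first step before retrying the mount.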


Best,
Stephane


> On Mar 2, 2023, at 2:08 AM, Teeninga, Robin via lustre-discuss 
>  wrote:
> 
> Hello,
> 
> I did an e2fsck on my MDT, and after that I could not mount the MDT anymore.
> It gives me this error when I try to mount the filesystem.
> Any ideas how to resolve this?
> 
> We are running Lustre server 2.12.7 on CentOS 7.9
> mount.lustre: mount /dev/mapper/-MDT at /lustre/-MDT failed: 
> File exists
> 
> 
> Mar  2 10:58:35 mds01 kernel: LDISKFS-fs (dm-19): mounted filesystem with 
> ordered  mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
> Mar  2 10:58:35 mds01 kernel: LustreError: 
> 160060:0:(llog.c:1398:llog_backup()) MGC@tcp14: failed to open backup 
> logfile -MDTT: rc = -28
> Mar  2 10:58:35 mds01 kernel: LustreError: 
> 160060:0:(mgc_request.c:1879:mgc_llog_local_copy()) MGC@tcp14: failed to 
> copy remote log -MDT: rc = -28
> Mar  2 10:58:35 mds01 kernel: LustreError: 137-5: -MDT0001_UUID: not 
> available for connect from 0@lo (no target). If you are running an HA pair 
> check that the target is mounted on the other server.
> Mar  2 10:58:35 mds01 kernel: LustreError: Skipped 4 previous similar messages
> Mar  2 10:58:35 mds01 kernel: LustreError: 
> 160127:0:(genops.c:556:class_register_device()) *-OST-osc-MDT: 
> already exists, won't add
> Mar  2 10:58:35 mds01 kernel: LustreError: 
> 160127:0:(obd_config.c:1835:class_config_llog_handler()) MGC@tcp14: cfg 
> command failed: rc = -17
> Mar  2 10:58:36 mds01 kernel: Lustre:cmd=cf001 0:-OST-osc-MDT 
>  1:osp  2:-MDT-mdtlov_UUID  
> Mar  2 10:58:36 mds01 kernel: LustreError: 15c-8: MGC@tcp14: The 
> configuration from log '-MDT' failed (-17). This may be the result of 
> communication errors between this node and the MGS, a bad configuration, or 
> other errors. See the syslog for more information.
> Mar  2 10:58:36 mds01 kernel: LustreError: 
> 160060:0:(obd_mount_server.c:1397:server_start_targets()) failed to start 
> server -MDT: -17
> Mar  2 10:58:36 mds01 kernel: LustreError: 
> 160060:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start 
> targets: -17
> Mar  2 10:58:36 mds01 kernel: Lustre: Failing over -MDT
> Mar  2 10:58:37 mds01 kernel: Lustre: server umount -MDT complete
> Mar  2 10:58:37 mds01 kernel: LustreError: 
> 160060:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-17)
> 
> 
> Regards,
> 
> Robin
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



[lustre-discuss] Slow Lustre traffic failover issue

2023-03-04 Thread 覃江龙 via lustre-discuss
Dear Developer,

I hope this message finds you well. I am currently working with a Lustre file
system installed on two nodes, with a client mounted and an NFS export of the
Lustre client directory. When I generated traffic into the Lustre directory and
one of the nodes failed, the MGS and OST services switched to the second node,
and it took five to six minutes for the traffic to resume. However, when I
switched to using an ext3 file system, the traffic resumed in only one to two
minutes.

I was wondering if you could shed some light on why the Lustre switch is taking 
longer, and how I could potentially address this issue. Thank you for your time 
and expertise.

Best regards,
Radiant