Re: [lustre-discuss] Help with recovery of data

2022-06-28 Thread Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss
An update.  We were able to recover our filesystem (minus the two days between 
when the ZFS swap occurred and when we detected it and shut down the 
filesystem).  Simply promoting the cloned ZFS volume (which was really our 
primary volume) and cleaning up the snapshot and clone got us back to normal.   
I did run an lfsck afterwards to clean up any problems from the two days we 
were running on the February clone of the MDT (orphaned files on the OSTs).  I 
believe the lfsck is finished – lots of details below.  I don’t think I see 
anything concerning, but the output is a bit hard to interpret, so if anyone 
sees otherwise, please let me know.  Any other follow-up advice would be 
appreciated as well.
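
For anyone who runs into the same thing, the recovery was essentially a ZFS 
promote followed by destroying the stale snapshot and clone.  A minimal sketch, 
with a hypothetical clone name (only mds1-0/meta-scratch and its @snap snapshot 
are named in this thread):

# promote the clone that actually holds the current MDT data
zfs promote mds1-0/meta-scratch-clone            # hypothetical clone name
# after the promote, the old primary becomes a clone of the promoted volume
zfs destroy mds1-0/meta-scratch                  # the demoted (February) copy
zfs destroy mds1-0/meta-scratch-clone@snap       # the stale February snapshot
# optionally put the original dataset name back
zfs rename mds1-0/meta-scratch-clone mds1-0/meta-scratch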



[root@hpfs-fsl-mds1 ~]# lctl lfsck_query -M scratch-MDT0000
layout_mdts_init: 0
layout_mdts_scanning-phase1: 0
layout_mdts_scanning-phase2: 0
layout_mdts_completed: 0
layout_mdts_failed: 0
layout_mdts_stopped: 0
layout_mdts_paused: 0
layout_mdts_crashed: 0
layout_mdts_partial: 1
layout_mdts_co-failed: 0
layout_mdts_co-stopped: 0
layout_mdts_co-paused: 0
layout_mdts_unknown: 0
layout_osts_init: 0
layout_osts_scanning-phase1: 0
layout_osts_scanning-phase2: 0
layout_osts_completed: 24
layout_osts_failed: 0
layout_osts_stopped: 0
layout_osts_paused: 0
layout_osts_crashed: 0
layout_osts_partial: 0
layout_osts_co-failed: 0
layout_osts_co-stopped: 0
layout_osts_co-paused: 0
layout_osts_unknown: 0
layout_repaired: 6350036
namespace_mdts_init: 0
namespace_mdts_scanning-phase1: 0
namespace_mdts_scanning-phase2: 0
namespace_mdts_completed: 1
namespace_mdts_failed: 0
namespace_mdts_stopped: 0
namespace_mdts_paused: 0
namespace_mdts_crashed: 0
namespace_mdts_partial: 0
namespace_mdts_co-failed: 0
namespace_mdts_co-stopped: 0
namespace_mdts_co-paused: 0
namespace_mdts_unknown: 0
namespace_osts_init: 0
namespace_osts_scanning-phase1: 0
namespace_osts_scanning-phase2: 0
namespace_osts_completed: 0
namespace_osts_failed: 0
namespace_osts_stopped: 0
namespace_osts_paused: 0
namespace_osts_crashed: 0
namespace_osts_partial: 0
namespace_osts_co-failed: 0
namespace_osts_co-stopped: 0
namespace_osts_co-paused: 0
namespace_osts_unknown: 0
namespace_repaired: 1430801
[root@hpfs-fsl-mds1 ~]#


[root@hpfs-fsl-mds1 ~]# lctl get_param -n mdd.scratch-MDT0000.lfsck_namespace
name: lfsck_namespace
magic: 0xa06249ff
version: 2
status: completed
flags:
param:
last_completed_time: 1656429356
time_since_last_completed: 26787 seconds
latest_start_time: 1656367470
time_since_latest_start: 88673 seconds
last_checkpoint_time: 1656429356
time_since_last_checkpoint: 26787 seconds
latest_start_position: 15, N/A, N/A
last_checkpoint_position: 983045597, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 242090614
checked_phase2: 260856
updated_phase1: 1430801
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
directories: 12889816
dirent_repaired: 0
linkea_repaired: 1430801
nlinks_repaired: 0
multiple_linked_checked: 1870740
multiple_linked_repaired: 0
unknown_inconsistency: 0
unmatched_pairs_repaired: 0
dangling_repaired: 0
multiple_referenced_repaired: 0
bad_file_type_repaired: 0
lost_dirent_repaired: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 0
striped_dirs_repaired: 0
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 0
striped_shards_repaired: 0
striped_shards_failed: 0
striped_shards_skipped: 0
name_hash_repaired: 0
linkea_overflow_cleared: 3
agent_entries_repaired: 0
success_count: 1
run_time_phase1: 60927 seconds
run_time_phase2: 958 seconds
average_speed_phase1: 3973 items/sec
average_speed_phase2: 272 objs/sec
average_speed_total: 3916 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
[root@hpfs-fsl-mds1 ~]#


[root@hpfs-fsl-mds1 ~]# lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
name: lfsck_layout
magic: 0xb1732fed
version: 2
status: partial
flags:
param:
last_completed_time: 1656428398
time_since_last_completed: 27798 seconds
latest_start_time: 1656367470
time_since_latest_start: 88726 seconds
last_checkpoint_time: 1656428398
time_since_last_checkpoint: 27798 seconds
latest_start_position: 15
last_checkpoint_position: 983045597
first_failure_position: 0
success_count: 1
repaired_dangling: 287730
repaired_unmatched_pair: 0
repaired_multiple_referenced: 0
repaired_orphan: 0
repaired_inconsistent_owner: 6062306
repaired_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 300484421
checked_phase2: 0
run_time_phase1: 60907 seconds
run_time_phase2: 1 seconds
average_speed_phase1: 4933 items/sec
average_speed_phase2: 0 objs/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
[root@hpfs-fsl-mds1 ~]#


[root@hpfs-fsl-mds1 ~]# lctl get_param -n osd-ldiskfs.scratch-MDT0000.oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 27850 seconds
time_since_la

Re: [lustre-discuss] Help with recovery of data

2022-06-22 Thread Andreas Dilger via lustre-discuss
First thing, if you haven't already done so, would be to make a separate "dd" 
backup of the ldiskfs MDT(s) to some external storage before you do anything 
else.  That will give you a fallback in case whatever changes you make don't 
work out well.
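
For an MDT sitting on a zvol, that can be a raw copy of the zvol block device 
taken while the MDT is unmounted.  A sketch, using the dataset name quoted 
below and an illustrative destination path:

# with the MDT unmounted, image the zvol block device to external storage
dd if=/dev/zvol/mds1-0/meta-scratch of=/external/mds1-0_meta-scratch.img bs=4M conv=sparse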

I would also suggest contacting the ZFS mailing list to ask if they can help 
restore the "new version" of the MDT at the ZFS level.  You may also want to 
consider a separate ZFS-level backup, because the core of the problem appears 
to be ZFS-related.  Unfortunately, the opportunity to recover a newer version 
of the ldiskfs MDT at the ZFS level declines the more changes are made to the 
ZFS pool.
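
A ZFS-level copy could be as simple as a send-to-file.  A sketch, with an 
illustrative snapshot name and destination:

zfs snapshot mds1-0/meta-scratch@pre-recovery
zfs send mds1-0/meta-scratch@pre-recovery | gzip > /external/meta-scratch-pre-recovery.zfs.gz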

I don't think LFSCK will repair the missing files on the MDT, since the OSTs 
don't have enough information to regenerate the namespace.  At most, LFSCK will 
create stub files on the MDT under .lustre/lost+found that connect the objects 
for the new files created after your MDT snapshot, but they won't have proper 
filenames.  They will only have UID/GID/timestamps to identify the owners/age, 
and the users would need to identify the files by content.
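
If it comes to that, the stubs can at least be listed and sorted by owner and 
mtime so users can claim them.  A sketch, using the client mount point that 
appears later in this thread:

# LFSCK-recreated stubs land under the hidden .lustre/lost+found directory
ls -ln /scratch-lustre/.lustre/lost+found/MDT0000/
# print owner, group and mtime for each stub to help users identify their files
find /scratch-lustre/.lustre/lost+found/MDT0000/ -type f -printf '%u %g %TY-%Tm-%Td %p\n' | sort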


On Jun 22, 2022, at 10:46, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, 
Inc.] via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

A quick follow up.  I thought an lfsck would only clean up (i.e. remove 
orphaned MDT and OST objects) but it appears this might have a good shot at 
repairing the file system – specifically, recreating the MDT objects with the 
--create-mdtobj option.  We have started this command:

[root@hpfs-fsl-mds1 ~]# lctl lfsck_start -M scratch-MDT0000 --dryrun on --create-mdtobj on

And after running for about an hour we are already seeing this from the query:

layout_repaired: 4645105

Can anyone confirm this will work for our situation – i.e. repair the metadata 
for the OST objects that were orphaned when our metadata got reverted?

From: "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" 
mailto:darby.vicke...@nasa.gov>>
Date: Tuesday, June 21, 2022 at 5:27 PM
To: "lustre-discuss@lists.lustre.org" 
mailto:lustre-discuss@lists.lustre.org>>
Subject: Help with recovery of data

Hi everyone,

We ran into a problem with our lustre filesystem this weekend and could use a 
sanity check and/or advice on recovery.

We are running on CentOS 7.9, ZFS 2.1.4 and Lustre 2.14.  We are using ZFS 
OSTs but an ldiskfs MDT (for better MDT performance).  For various reasons, 
the ldiskfs is built on a ZFS zvol.  Every night we (intend to) back up the 
metadata by snapshotting the zvol, mounting the MDT via ldiskfs, tarring up 
the contents, then unmounting and removing the ZFS snapshot.  On Sunday (6/19 
at about 4 pm), the metadata server crashed.  It came back up fine, but users 
started reporting many missing files and directories today (6/21) – everything 
since about February 9th is gone.  After quite a bit of investigation, it looks 
like the MDT got rolled back to a snapshot of the metadata from February.

[root@hpfs-fsl-mds1 ~]# zfs list -t snap mds1-0/meta-scratch
NAME   USED  AVAIL REFER  MOUNTPOINT
mds1-0/meta-scratch@snap  52.3G  - 1.34T  -
[root@hpfs-fsl-mds1 ~]# zfs get all mds1-0/meta-scratch@snap | grep creation
mds1-0/meta-scratch@snap  creation  Thu Feb 10  3:35 2022  -
[root@hpfs-fsl-mds1 ~]#

We discovered that our MDT backups have been stalled since February, because 
the first step is to create mds1-0/meta-scratch@snap and that snapshot already 
exists.  The script was erroring out because the existing snapshot was still in 
place.  We have rebooted this MDS several times (gracefully) since February 
with no issues but, apparently, whatever happened in the server crash on Sunday 
caused the MDT to revert to the February data.  So, in theory, the data on the 
OSTs is still there; we are just missing the metadata due to the ZFS glitch.

So the first question: is anyone familiar with this failure mode of ZFS, or 
with a way to recover from it?  I think it’s unlikely there are any direct ZFS 
recovery options, but wanted to ask.

Obviously, MDT backups would be our best recovery option but since this was all 
caused by the backup scripts stalling (and the subsequent rolling back to the 
last snapshot), our backups are the same age as the current data on the 
filesystem.

[root@hpfs-fsl-mds1 ~]# ls -lrt /internal/ldiskfs_backups/
total 629789909
-rw-r--r-- 1 root root 1657 Apr 30  2019 process.txt
-rw-r--r-- 1 root root 445317560320 Jan 25 15:36 mds1-0_meta-scratch-2022_01_25.tar
-rw-r--r-- 1 root root 446230016000 Jan 26 15:31 mds1-0_meta-scratch-2022_01_26.tar
-rw-r--r-- 1 root root 448093808640 Jan 27 15:46 mds1-0_meta-scratch-2022_01_27.tar
-rw-r--r-- 1 root root 440368783360 Jan 28 16:56 mds1-0_meta-scratch-2022_01_28.tar
-rw-r--r-- 1 root root 442342113280 Jan 29 14:45 mds1-0_meta-scratch-2022_01_29.tar
-rw-r--r-- 1 root root 442922567680 Jan 30 15:03 mds1-0_meta-scratch-20

Re: [lustre-discuss] Help with recovery of data

2022-06-22 Thread Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss
A quick follow up.  I thought an lfsck would only clean up (i.e. remove 
orphaned MDT and OST objects) but it appears this might have a good shot at 
repairing the file system – specifically, recreating the MDT objects with the 
--create-mdtobj option.  We have started this command:

[root@hpfs-fsl-mds1 ~]# lctl lfsck_start -M scratch-MDT0000 --dryrun on --create-mdtobj on

And after running for about an hour we are already seeing this from the query:

layout_repaired: 4645105
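
(For reference, that counter comes from lctl lfsck_query; a sketch of a simple 
polling loop that pulls out just the repaired counters:)

while sleep 60; do
    lctl lfsck_query -M scratch-MDT0000 | grep -E '^(layout|namespace)_repaired'
done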

Can anyone confirm this will work for our situation – i.e. repair the metadata 
for the OST objects that were orphaned when our metadata got reverted?

From: "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" 

Date: Tuesday, June 21, 2022 at 5:27 PM
To: "lustre-discuss@lists.lustre.org" 
Subject: Help with recovery of data

Hi everyone,

We ran into a problem with our lustre filesystem this weekend and could use a 
sanity check and/or advice on recovery.

We are running on CentOS 7.9, ZFS 2.1.4 and Lustre 2.14.  We are using ZFS 
OSTs but an ldiskfs MDT (for better MDT performance).  For various reasons, 
the ldiskfs is built on a ZFS zvol.  Every night we (intend to) back up the 
metadata by snapshotting the zvol, mounting the MDT via ldiskfs, tarring up 
the contents, then unmounting and removing the ZFS snapshot.  On Sunday (6/19 
at about 4 pm), the metadata server crashed.  It came back up fine, but users 
started reporting many missing files and directories today (6/21) – everything 
since about February 9th is gone.  After quite a bit of investigation, it looks 
like the MDT got rolled back to a snapshot of the metadata from February.

[root@hpfs-fsl-mds1 ~]# zfs list -t snap mds1-0/meta-scratch
NAME   USED  AVAIL REFER  MOUNTPOINT
mds1-0/meta-scratch@snap  52.3G  - 1.34T  -
[root@hpfs-fsl-mds1 ~]# zfs get all mds1-0/meta-scratch@snap | grep creation
mds1-0/meta-scratch@snap  creation  Thu Feb 10  3:35 2022  -
[root@hpfs-fsl-mds1 ~]#

We discovered that our MDT backups have been stalled since February, because 
the first step is to create mds1-0/meta-scratch@snap and that snapshot already 
exists.  The script was erroring out because the existing snapshot was still in 
place.  We have rebooted this MDS several times (gracefully) since February 
with no issues but, apparently, whatever happened in the server crash on Sunday 
caused the MDT to revert to the February data.  So, in theory, the data on the 
OSTs is still there; we are just missing the metadata due to the ZFS glitch.

So the first question: is anyone familiar with this failure mode of ZFS, or 
with a way to recover from it?  I think it’s unlikely there are any direct ZFS 
recovery options, but wanted to ask.

Obviously, MDT backups would be our best recovery option but since this was all 
caused by the backup scripts stalling (and the subsequent rolling back to the 
last snapshot), our backups are the same age as the current data on the 
filesystem.

[root@hpfs-fsl-mds1 ~]# ls -lrt /internal/ldiskfs_backups/
total 629789909
-rw-r--r-- 1 root root 1657 Apr 30  2019 process.txt
-rw-r--r-- 1 root root 445317560320 Jan 25 15:36 mds1-0_meta-scratch-2022_01_25.tar
-rw-r--r-- 1 root root 446230016000 Jan 26 15:31 mds1-0_meta-scratch-2022_01_26.tar
-rw-r--r-- 1 root root 448093808640 Jan 27 15:46 mds1-0_meta-scratch-2022_01_27.tar
-rw-r--r-- 1 root root 440368783360 Jan 28 16:56 mds1-0_meta-scratch-2022_01_28.tar
-rw-r--r-- 1 root root 442342113280 Jan 29 14:45 mds1-0_meta-scratch-2022_01_29.tar
-rw-r--r-- 1 root root 442922567680 Jan 30 15:03 mds1-0_meta-scratch-2022_01_30.tar
-rw-r--r-- 1 root root 443076515840 Jan 31 15:17 mds1-0_meta-scratch-2022_01_31.tar
-rw-r--r-- 1 root root 444589025280 Feb  1 15:11 mds1-0_meta-scratch-2022_02_01.tar
-rw-r--r-- 1 root root 443741409280 Feb  2 15:17 mds1-0_meta-scratch-2022_02_02.tar
-rw-r--r-- 1 root root 448209367040 Feb  3 15:24 mds1-0_meta-scratch-2022_02_03.tar
-rw-r--r-- 1 root root 453777090560 Feb  4 15:55 mds1-0_meta-scratch-2022_02_04.tar
-rw-r--r-- 1 root root 454211307520 Feb  5 14:37 mds1-0_meta-scratch-2022_02_05.tar
-rw-r--r-- 1 root root 454619084800 Feb  6 14:30 mds1-0_meta-scratch-2022_02_06.tar
-rw-r--r-- 1 root root 455459276800 Feb  7 15:26 mds1-0_meta-scratch-2022_02_07.tar
-rw-r--r-- 1 root root 457470945280 Feb  8 15:07 mds1-0_meta-scratch-2022_02_08.tar
-rw-r--r-- 1 root root 460592517120 Feb  9 15:21 mds1-0_meta-scratch-2022_02_09.tar
-rw-r--r-- 1 root root 332377712640 Feb 10 12:04 mds1-0_meta-scratch-2022_02_10.tar
[root@hpfs-fsl-mds1 ~]#


Yes, I know, we will put in some monitoring for this in the future...

Fortunately, we also have a robinhood system syncing with this file system.  
The sync is fairly up to date – the logs say a few days ago and I’ve used 
rbh-find to find some files that were created in the last few days.  So I think 
we have a shot at recovery.  We have this command running now to see what it 
will do:

rbh-diff --a

[lustre-discuss] Help with recovery of data

2022-06-21 Thread Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss
Hi everyone,

We ran into a problem with our lustre filesystem this weekend and could use a 
sanity check and/or advice on recovery.

We are running on CentOS 7.9, ZFS 2.1.4 and Lustre 2.14.  We are using ZFS 
OSTs but an ldiskfs MDT (for better MDT performance).  For various reasons, 
the ldiskfs is built on a ZFS zvol.  Every night we (intend to) back up the 
metadata by snapshotting the zvol, mounting the MDT via ldiskfs, tarring up 
the contents, then unmounting and removing the ZFS snapshot.  On Sunday (6/19 
at about 4 pm), the metadata server crashed.  It came back up fine, but users 
started reporting many missing files and directories today (6/21) – everything 
since about February 9th is gone.  After quite a bit of investigation, it looks 
like the MDT got rolled back to a snapshot of the metadata from February.
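
In script form, the nightly procedure is roughly the following.  This is only a 
sketch: the clone name, mount point and tar options are illustrative, and it 
assumes the snapshot is exposed through a clone so that the ldiskfs image can 
be mounted read-only.

#!/bin/bash
# nightly MDT backup sketch: snapshot the zvol, expose it via a clone,
# mount the clone as ldiskfs, tar up the metadata, then clean up
set -euo pipefail

DATASET=mds1-0/meta-scratch
SNAP=${DATASET}@snap
CLONE=${DATASET}-bkup                      # illustrative clone name
MNT=/mnt/mdt-backup                        # illustrative mount point
OUT=/internal/ldiskfs_backups/mds1-0_meta-scratch-$(date +%Y_%m_%d).tar

zfs snapshot "$SNAP"
zfs clone "$SNAP" "$CLONE"
mount -t ldiskfs -o ro /dev/zvol/"$CLONE" "$MNT"
tar --xattrs --xattrs-include='trusted.*' -cf "$OUT" -C "$MNT" .
umount "$MNT"
zfs destroy "$CLONE"
zfs destroy "$SNAP"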

[root@hpfs-fsl-mds1 ~]# zfs list -t snap mds1-0/meta-scratch
NAME   USED  AVAIL REFER  MOUNTPOINT
mds1-0/meta-scratch@snap  52.3G  - 1.34T  -
[root@hpfs-fsl-mds1 ~]# zfs get all mds1-0/meta-scratch@snap | grep creation
mds1-0/meta-scratch@snap  creation  Thu Feb 10  3:35 2022  -
[root@hpfs-fsl-mds1 ~]#

We discovered that our MDT backups have been stalled since February, because 
the first step is to create mds1-0/meta-scratch@snap and that snapshot already 
exists.  The script was erroring out because the existing snapshot was still in 
place.  We have rebooted this MDS several times (gracefully) since February 
with no issues but, apparently, whatever happened in the server crash on Sunday 
caused the MDT to revert to the February data.  So, in theory, the data on the 
OSTs is still there; we are just missing the metadata due to the ZFS glitch.

So the first question: is anyone familiar with this failure mode of ZFS, or 
with a way to recover from it?  I think it’s unlikely there are any direct ZFS 
recovery options, but wanted to ask.

Obviously, MDT backups would be our best recovery option but since this was all 
caused by the backup scripts stalling (and the subsequent rolling back to the 
last snapshot), our backups are the same age as the current data on the 
filesystem.

[root@hpfs-fsl-mds1 ~]# ls -lrt /internal/ldiskfs_backups/
total 629789909
-rw-r--r-- 1 root root 1657 Apr 30  2019 process.txt
-rw-r--r-- 1 root root 445317560320 Jan 25 15:36 mds1-0_meta-scratch-2022_01_25.tar
-rw-r--r-- 1 root root 446230016000 Jan 26 15:31 mds1-0_meta-scratch-2022_01_26.tar
-rw-r--r-- 1 root root 448093808640 Jan 27 15:46 mds1-0_meta-scratch-2022_01_27.tar
-rw-r--r-- 1 root root 440368783360 Jan 28 16:56 mds1-0_meta-scratch-2022_01_28.tar
-rw-r--r-- 1 root root 442342113280 Jan 29 14:45 mds1-0_meta-scratch-2022_01_29.tar
-rw-r--r-- 1 root root 442922567680 Jan 30 15:03 mds1-0_meta-scratch-2022_01_30.tar
-rw-r--r-- 1 root root 443076515840 Jan 31 15:17 mds1-0_meta-scratch-2022_01_31.tar
-rw-r--r-- 1 root root 444589025280 Feb  1 15:11 mds1-0_meta-scratch-2022_02_01.tar
-rw-r--r-- 1 root root 443741409280 Feb  2 15:17 mds1-0_meta-scratch-2022_02_02.tar
-rw-r--r-- 1 root root 448209367040 Feb  3 15:24 mds1-0_meta-scratch-2022_02_03.tar
-rw-r--r-- 1 root root 453777090560 Feb  4 15:55 mds1-0_meta-scratch-2022_02_04.tar
-rw-r--r-- 1 root root 454211307520 Feb  5 14:37 mds1-0_meta-scratch-2022_02_05.tar
-rw-r--r-- 1 root root 454619084800 Feb  6 14:30 mds1-0_meta-scratch-2022_02_06.tar
-rw-r--r-- 1 root root 455459276800 Feb  7 15:26 mds1-0_meta-scratch-2022_02_07.tar
-rw-r--r-- 1 root root 457470945280 Feb  8 15:07 mds1-0_meta-scratch-2022_02_08.tar
-rw-r--r-- 1 root root 460592517120 Feb  9 15:21 mds1-0_meta-scratch-2022_02_09.tar
-rw-r--r-- 1 root root 332377712640 Feb 10 12:04 mds1-0_meta-scratch-2022_02_10.tar
[root@hpfs-fsl-mds1 ~]#


Yes, I know, we will put in some monitoring for this in the future...
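
Even something as simple as refusing to run, and alerting, when the snapshot 
already exists would have caught it.  A sketch (the alert address is 
illustrative):

# at the top of the backup script: abort and alert if the previous
# snapshot is still hanging around
if zfs list -t snapshot mds1-0/meta-scratch@snap >/dev/null 2>&1; then
    echo "MDT backup aborted: mds1-0/meta-scratch@snap already exists" \
        | mail -s "MDT backup FAILED on $(hostname)" lustre-admins@example.com
    exit 1
fi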

Fortunately, we also have a robinhood system syncing with this file system.  
The sync is fairly up to date – the logs say a few days ago and I’ve used 
rbh-find to find some files that were created in the last few days.  So I think 
we have a shot at recovery.  We have this command running now to see what it 
will do:

rbh-diff --apply=fs --dry-run --scan=/scratch-lustre

But it has already been running a long time with no output.  Our file system is 
fairly large:


[root@hpfs-fsl-lmon0 ~]# lfs df -h /scratch-lustre
UUID                       bytes        Used   Available Use% Mounted on
scratch-MDT0000_UUID     1011.8G       82.6G      826.7G  10% /scratch-lustre[MDT:0]
scratch-OST0000_UUID       49.6T       16.2T       33.4T  33% /scratch-lustre[OST:0]
scratch-OST0001_UUID       49.6T       17.4T       32.3T  35% /scratch-lustre[OST:1]
scratch-OST0002_UUID       49.6T       16.8T       32.8T  34% /scratch-lustre[OST:2]
scratch-OST0003_UUID       49.6T       17.2T       32.4T  35% /scratch-lustre[OST:3]
scratch-OST0004_UUID       49.6T       16.7T       32.9T  34% /scratch-lustre[OST:4]
scratch-OST0005_UUID   49.6T