Thanks Andreas. The orphan data is spread across many OSTs, although
most of it is on one OST (30), which seems to have been hit
particularly hard by this outage:

[r...@iliadaccess04 lfsck2]# grep ERROR lfsck2.out
lfsck: ost_idx 5: pass2 ERROR: 3817 dangling inodes found (654297 files total)
lfsck: ost_idx 6: pass2 ERROR: 2 dangling inodes found (670416 files total)
lfsck: ost_idx 11: pass2 ERROR: 942 dangling inodes found (673425 files total)
lfsck: ost_idx 12: pass2 ERROR: 64 dangling inodes found (678878 files total)
[lfsck: ost_idx 13: pass3 ERROR: 24.2109MB of orphan data (6 of 776532 files total)
lfsck: ost_idx 14: pass3 ERROR: 5.34375MB of orphan data (2 of 725085 files total)
lfsck: ost_idx 15: pass2 ERROR: 1 dangling inodes found (672942 files total)
lfsck: ost_idx 15: pass3 ERROR: 58.4688MB of orphan data (6 of 739995 files total)
lfsck: ost_idx 16: pass2 ERROR: 1 dangling inodes found (671379 files total)
lfsck: ost_idx 18: pass2 ERROR: 3371 dangling inodes found (692018 files total)
lfsck: ost_idx 18: pass3 ERROR: 5499.86MB of orphan data (620 of 688965 files total)
lfsck: ost_idx 19: pass3 ERROR: 21.375MB of orphan data (8 of 775964 files total)
[20] lfsck: ost_idx 20: pass3 ERROR: 3433.61MB of orphan data (16 of 843328 files total)
[22] zero-length orphan objilfsck: ost_idx 22: pass3 ERROR: 1.21094MB of orphan data (16 of 859527 files total)
lfsck: ost_idx 23: pass2 ERROR: 1 dangling inodes found (663492 files total)
[23] zero-length orphan oblfsck: ost_idx 23: pass3 ERROR: 8571.68MB of orphan data (20 of 838490 files total)
[24] zero-length orphan objid 83735lfsck: ost_idx 24: pass3 ERROR: 4367.45MB of orphan data (16 of 837371 files total)
[25] zero-length orphan objid lfsck: ost_idx 25: pass3 ERROR: 121.996MB of orphan data (16 of 858679 files total)
lfsck: ost_idx 30: pass2 ERROR: 46700 dangling inodes found (682467 files total)
lfsck: ost_idx 30: pass3 ERROR: 45313.4MB of orphan data (7648 of 668343 files total)
[r...@iliadaccess04 lfsck2]#
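
For anyone who wants to tally this kind of output rather than read it by eye, here is a minimal Python sketch. It assumes the ERROR lines keep the exact wording shown above; the names `summarize` and `SAMPLE` are made up for illustration and are not part of lfsck. The patterns tolerate lines wrapped by a mail client and the stray interleaved fragments in front of some entries.

```python
import re

# Patterns follow the wording of the lfsck ERROR lines quoted above.
# \s+ between tokens lets a match span a line wrapped by the mail client.
PASS2 = re.compile(
    r"ost_idx\s+(\d+):\s+pass2\s+ERROR:\s+(\d+)\s+dangling\s+inodes")
PASS3 = re.compile(
    r"ost_idx\s+(\d+):\s+pass3\s+ERROR:\s+([\d.]+)MB\s+of\s+orphan\s+data")


def summarize(text):
    """Return ({ost_idx: dangling inodes}, {ost_idx: orphan MB})."""
    dangling = {int(m.group(1)): int(m.group(2))
                for m in PASS2.finditer(text)}
    orphan_mb = {int(m.group(1)): float(m.group(2))
                 for m in PASS3.finditer(text)}
    return dangling, orphan_mb


# A few entries copied from the output above, wrapping included.
SAMPLE = """\
lfsck: ost_idx 18: pass2 ERROR: 3371 dangling inodes found (692018 files
total)
lfsck: ost_idx 18: pass3 ERROR: 5499.86MB of orphan data (620 of 688965
files total)
[25] zero-length orphan objid lfsck: ost_idx 25: pass3 ERROR: 121.996MB
of orphan data (16 of 858679 files total)
lfsck: ost_idx 30: pass2 ERROR: 46700 dangling inodes found (682467
files total)
lfsck: ost_idx 30: pass3 ERROR: 45313.4MB of orphan data (7648 of 668343
files total)
"""

dangling, orphan_mb = summarize(SAMPLE)
total_mb = sum(orphan_mb.values())

# List OSTs from most to least orphan data, with each one's share.
for idx in sorted(orphan_mb, key=orphan_mb.get, reverse=True):
    share = 100.0 * orphan_mb[idx] / total_mb
    print(f"OST {idx}: {orphan_mb[idx]:.1f} MB orphan ({share:.1f}% of sample)")
```

Feeding it the whole of `lfsck2.out` instead of `SAMPLE` should give the per-OST breakdown for the full run.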

Thanks again,
Chris

On 11/12/10 3:05 AM, Andreas Dilger wrote:
> On 2010-11-11, at 19:53, Christopher Walker wrote:
>> Thanks very much for your reply. I've tried remaking the mdsdb and all
>> of the ostdb's, but I still get the same error -- it checks the first 34
>> osts without a problem, but can't find the ostdb file for the 35th
>> (which has ost_idx 42):
>>
>> with the filesystem up I can see files on this OST:
>>
>> [cwal...@iliadaccess04 P-Gadget3.3.1]$ lfs getstripe predict.c
>> OBDS:
>> 0: aegalfs-OST0000_UUID ACTIVE
>> ...
>> 33: aegalfs-OST0021_UUID ACTIVE
>> 42: aegalfs-OST002a_UUID ACTIVE
>> predict.c
>> obdidx objid objid group
>> 42 10 0xa 0
>>
>>
>> lfsck identifies several hundred GB of orphan data that we'd like to
>> recover, so we'd really like to run lfsck on this array. We're willing
>> to forgo the recovery on the 35th ost, but I want to make sure that
>> running lfsck -l with the current configuration won't make things worse.
> I'm not sure that what lfsck is reporting in this case is correct.  Is the 
> orphan data all on the same OST, or spread around separate OSTs?  My concern 
> is that if lfsck thinks the in-use objects on your estranged OST are actually 
> orphan data, it will destroy that data.
>
> If there are a small number of very large objects on other OSTs that are 
> making up the bulk of the orphan space usage, you could mount those OSTs as 
> type ldiskfs and delete the objects by hand to free up the space.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
>
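
On deleting objects by hand from an OST mounted as ldiskfs: the sketch below computes where an object would live, assuming the usual ldiskfs OST layout of `O/<group>/d<objid mod 32>/<objid>`. That layout is an assumption here, not something stated in this thread, so check it against a known-good object before removing anything. The helper name `ost_object_path` is hypothetical.

```python
def ost_object_path(objid, group=0, subdirs=32):
    """Relative path of an object on an ldiskfs-mounted OST.

    Assumes the O/<group>/d<objid % 32>/<objid> layout; verify this
    on your own filesystem before acting on the result.
    """
    return f"O/{group}/d{objid % subdirs}/{objid}"


# objid 10, group 0: the values lfs getstripe reported for predict.c.
print(ost_object_path(10))
```

The result is relative to wherever the OST is mounted; an in-use object such as the one backing predict.c is a useful sanity check for the layout, not a candidate for deletion.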
