Thanks *very* much -- I'll give this a shot later today and let you know how it goes.
Best,
Chris

On 11/12/10 3:17 AM, Wang Yibin wrote:
> This is a bug in llapi_lov_get_uuids() which assigns UUID to the wrong OST
> index when there are sparse OST(s).
> Please file a bug for this.
>
> Before this bug can be fixed, you can apply the following patch to
> e2fsprogs(version 1.41.12.2.ora1) lfsck.c as a workaround (not verified
> though).
>
> --- e2fsprogs/e2fsck/lfsck.c 2010-11-12 11:43:42.000000000 +0800
> +++ lfsck.c 2010-11-12 12:14:38.000000000 +0800
> @@ -1226,6 +1226,12 @@
>      __u64 last_id;
>      int i, rc;
>
> +    /* skip empty UUID OST */
> +    if(!strlen(lfsck_uuid[ost_idx].uuid)) {
> +        log_write("index %d UUID is empty(sparse OST index?). Skipping.\n", ost_idx);
> +        return(0);
> +    }
> +
>      sprintf(dbname, "%s.%d", MDS_OSTDB, ost_idx);
>
>      VERBOSE(2, "testing ost_idx %d\n", ost_idx);
> @@ -1279,11 +1284,20 @@
>                  ost_hdr->ost_uuid.uuid);
>
>          if (obd_uuid_equals(&lfsck_uuid[ost_idx], &ost_hdr->ost_uuid)) {
> +            /* must be sparse ost index */
>              if (ost_hdr->ost_index != ost_idx) {
>                  log_write("Requested ost_idx %u doesn't match "
>                      "index %u found in %s\n", ost_idx,
>                      ost_hdr->ost_index, ost_files[i]);
> -                continue;
> +
> +                log_write("Moving the index/uuid to the right place...\n");
> +                /* zero the original uuid entry */
> +                memset(&lfsck_uuid[ost_idx], 0, sizeof(struct obd_uuid));
> +                /* copy it to the right place */
> +                ost_idx = ost_hdr->ost_index;
> +                strcpy(lfsck_uuid[ost_hdr->ost_index].uuid,ost_hdr->ost_uuid.uuid);
> +                /* skip this round */
> +                goto out;
>              }
>
>              break;
>
>
> On 2010-11-12, at 10:53 AM, Christopher Walker wrote:
>
>> Thanks very much for your reply. I've tried remaking the mdsdb and all
>> of the ostdb's, but I still get the same error -- it checks the first 34
>> osts without a problem, but can't find the ostdb file for the 35th
>> (which has ost_idx 42):
>>
>> ...
>> lfsck: ost_idx 34: pass3 OK (676803 files total)
>> lfsck: can't find file for ost_idx 35
>> Files affected by missing ost info are : -
>> lfsck: can't find file for ost_idx 36
>> Files affected by missing ost info are : -
>> lfsck: can't find file for ost_idx 37
>> Files affected by missing ost info are : -
>> lfsck: can't find file for ost_idx 38
>> Files affected by missing ost info are : -
>> lfsck: can't find file for ost_idx 39
>> Files affected by missing ost info are : -
>> lfsck: can't find file for ost_idx 40
>> Files affected by missing ost info are : -
>> lfsck: can't find file for ost_idx 41
>> Files affected by missing ost info are : -
>> lfsck: can't find file for ost_idx 42
>> Files affected by missing ost info are : -
>> ...
>>
>> e2fsck claims to be making the ostdb without a problem:
>>
>> Pass 6: Acquiring information for lfsck
>> OST: 'aegalfs-OST002a_UUID' ost idx 42: compat 0x2 rocomp 0 incomp 0x2
>> OST: num files = 676803
>> OST: last_id = 858163
>>
>> and with the filesystem up I can see files on this OST:
>>
>> [cwal...@iliadaccess04 P-Gadget3.3.1]$ lfs getstripe predict.c
>> OBDS:
>> 0: aegalfs-OST0000_UUID ACTIVE
>> ...
>> 33: aegalfs-OST0021_UUID ACTIVE
>> 42: aegalfs-OST002a_UUID ACTIVE
>> predict.c
>>      obdidx    objid    objid    group
>>          42       10      0xa        0
>>
>>
>> lfsck identifies several hundred GB of orphan data that we'd like to
>> recover, so we'd really like to run lfsck on this array. We're willing
>> to forgo the recovery on the 35th ost, but I want to make sure that
>> running lfsck -l with the current configuration won't make things worse.
>>
>> Thanks again for your reply; any further advice is very much appreciated!
>>
>> Best,
>> Chris
>>
>> On 11/10/10 12:10 AM, Wang Yibin wrote:
>>> The error message indicates that the UUID of OST #35 does not match between
>>> the live filesystem and the ostdb file.
>>> Is this ostdb obsolete?
>>>
>>> On 2010-11-9, at 11:45 PM, Christopher Walker wrote:
>>>
>>>> For reasons that I can't recall, our OSTs are not in consecutive order
>>>> -- we have 35 OSTs, which are numbered consecutively from
>>>> 0000-0021
>>>> and then there's one last OST at
>>>> 002a
>>>>
>>>> When I try to run lfsck on this array, it works fine for the first 34
>>>> OSTs, but it can't seem to find the last OST db file:
>>>>
>>>> lfsck: ost_idx 34: pass3 OK (680045 files total)
>>>> lfsck: can't find file for ost_idx 35
>>>> Files affected by missing ost info are : -
>>>> lfsck: can't find file for ost_idx 36
>>>> Files affected by missing ost info are : -
>>>> lfsck: can't find file for ost_idx 37
>>>> Files affected by missing ost info are : -
>>>> lfsck: can't find file for ost_idx 38
>>>> Files affected by missing ost info are : -
>>>> lfsck: can't find file for ost_idx 39
>>>> Files affected by missing ost info are : -
>>>> lfsck: can't find file for ost_idx 40
>>>> Files affected by missing ost info are : -
>>>> lfsck: can't find file for ost_idx 41
>>>> Files affected by missing ost info are : -
>>>> lfsck: can't find file for ost_idx 42
>>>> Files affected by missing ost info are : -
>>>> /n/scratch/hernquist_lab/tcox/tests/SbSbhs_e_8/P-Gadget3.3.1/IdlSubfind/.svn/text-base/ReadSubhaloFromReshuffledSnapshot.pro.svn-base
>>>>
>>>> and then lists all of the files that live on OST 002a. This db file
>>>> definitely does exist -- it lives in the same directory as all of the
>>>> other db files, and e2fsck for this OST ran without problems.
>>>>
>>>> Is there some way of forcing lfsck to recognize this OST db? Or,
>>>> failing that, is it dangerous to run lfsck on the first 34 OSTs only?
>>>>
>>>> We're using e2fsck 1.41.6.sun1 (30-May-2009)
>>>>
>>>> Thanks very much!
>>>>
>>>> Chris
>>>>
>>>>
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss@lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
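
As a rough illustration of the sparse-index problem discussed in this thread, the sketch below stores each OST UUID in the table slot that matches its real index rather than its position in the list, so that OST002a lands in slot 42 and slots 34-41 stay empty and can be skipped. This is not the actual llapi_lov_get_uuids() or lfsck code: `struct ost_entry`, `fill_uuid_table()`, `slot_is_empty()`, `MAX_OSTS` and `UUID_LEN` are all hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OSTS 64   /* hypothetical table size for this sketch */
#define UUID_LEN 40   /* hypothetical maximum UUID length, including the NUL */

/* Hypothetical stand-in for one OST as reported by the filesystem. */
struct ost_entry {
    unsigned index;    /* real OST index, e.g. 42 for OST002a */
    const char *uuid;  /* e.g. "aegalfs-OST002a_UUID" */
};

/*
 * Clear the table, then copy each UUID into the slot given by its real
 * index. Slots with no OST keep an empty string. Returns the number of
 * entries stored, or -1 as soon as an index or UUID does not fit (the
 * table may be partially filled in that case).
 */
static int fill_uuid_table(char table[][UUID_LEN], size_t slots,
                           const struct ost_entry *osts, size_t count)
{
    size_t i;
    int stored = 0;

    memset(table, 0, slots * UUID_LEN);
    for (i = 0; i < count; i++) {
        if (osts[i].index >= slots || strlen(osts[i].uuid) >= UUID_LEN)
            return -1;
        strcpy(table[osts[i].index], osts[i].uuid);
        stored++;
    }
    return stored;
}

/* An empty UUID marks a gap in the index sequence that should be skipped. */
static int slot_is_empty(char table[][UUID_LEN], size_t idx)
{
    return table[idx][0] == '\0';
}
```

A caller walking the table from 0 to the highest index would test `slot_is_empty()` first and skip the gaps, which is the same idea as the "skip empty UUID OST" check in the workaround patch quoted above.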