On Jun 16, 2008 15:37 -0700, megan wrote:
> I am using Lustre 2.6.18-53.1.13.el5_lustre.1.6.4.3smp kernel on a
> CentOS 5 linux x86_64 linux box.
> We had a hardware problem that caused the underlying ext3 partition
> table to completely blow up. This is resulting in only three of five
> OSTs being mountable. The main lustre disk of this unit cannot be
> mounted because the MDS knows that two of its parts are missing.
It should be possible to mount a Lustre filesystem with OSTs that are
not available. However, access to files on the unavailable OSTs will
cause the process to wait on OST recovery.

> The underlying set-up is JBOD hw that is passed to the linux OS, via
> an LSI 8888ELP card in this case, as a simple device, ie. sde,
> sdf,... The simple devices were partitioned using parted and
> formatted ext3 then lustre was built on top of the five ext3 units.
> There was no striping done across units/JBODS. Three of the five
> units passed an e2fsck and an lfsck. Those remaining units are
> mounted as such:
> /dev/sdc   13T  6.3T  5.7T  53% /srv/lustre/OST/crew4-OST0003
> /dev/sdd   13T  6.3T  5.7T  53% /srv/lustre/OST/crew4-OST0004
> /dev/sdf   13T  6.2T  5.8T  52% /srv/lustre/OST/crew4-OST0001
>
> Being that it is unlikely that we shall be able to recover the
> underlying ext3 on the other two units, is there some method by which
> I might try to rescue the data from these last three units mounted
> currently on the OSS?
>
> Any and all suggestion genuinely appreciated.

The recoverability of your data depends heavily on the striping of the
individual files (i.e. the default striping). If your files have a
default stripe_count = 1, then you can probably recover 3/5 of the files
in the filesystem. If your default stripe_count = 2, then you can
probably only recover 1/5 of the files, and if you have a higher
stripe_count you probably can't recover any files.

What you need to do is to mount the filesystem on one of the clients and
mark the corresponding OSTs inactive with:

    lctl dl                       # get device numbers for OSC 0000 and OSC 0002
    lctl --device N deactivate    # once for each of those device numbers

Then, instead of waiting for the OSTs to recover, the client will get an
IO error when it accesses files on the failed OSTs.

To get a list of the files that are on the good OSTs, run:

    lfs find --ost crew4-OST0001_UUID --ost crew4-OST0003_UUID --ost crew4-OST0004_UUID {mountpoint}

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
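The reply estimates that 3/5, 1/5 and 0 of the files are recoverable. A short Python sketch can act as a rough check on those numbers. It counts which stripe layouts avoid the two failed OSTs, OST0000 and OST0002.

This is an illustration, not Lustre code. It assumes the MDS places each file's stripes on consecutive OST indices in round-robin order, and that files start equally often on each of the five OSTs. Lustre's allocator can also weight OSTs by free space, so real layouts may differ. For comparison, the sketch also computes the fraction you would get if each file's OSTs were a uniformly random subset. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

# Layout taken from the thread: five OSTs, of which OST0001, OST0003 and
# OST0004 survived, while OST0000 and OST0002 were lost.
NUM_OSTS = 5
GOOD_OSTS = frozenset({1, 3, 4})


def round_robin_fraction(stripe_count, num_osts=NUM_OSTS, good=GOOD_OSTS):
    """Fraction of files with every stripe on a surviving OST.

    Hypothetical model: each file starts on an OST chosen with equal
    probability, and its stripes occupy `stripe_count` consecutive OST
    indices, wrapping around after the last OST.
    """
    if not 1 <= stripe_count <= num_osts:
        raise ValueError("stripe_count must be between 1 and num_osts")
    intact = sum(
        all((start + i) % num_osts in good for i in range(stripe_count))
        for start in range(num_osts)
    )
    return Fraction(intact, num_osts)


def random_placement_fraction(stripe_count, num_osts=NUM_OSTS, good=GOOD_OSTS):
    """Same question if each file's OSTs were a uniformly random subset."""
    if not 1 <= stripe_count <= num_osts:
        raise ValueError("stripe_count must be between 1 and num_osts")
    # math.comb returns 0 when stripe_count exceeds the number of good OSTs.
    return Fraction(comb(len(good), stripe_count), comb(num_osts, stripe_count))


if __name__ == "__main__":
    for k in range(1, NUM_OSTS + 1):
        print(f"stripe_count={k}: round-robin {round_robin_fraction(k)}, "
              f"random {random_placement_fraction(k)}")
```

Under the round-robin assumption, this prints 3/5, 1/5 and 0 for stripe counts 1, 2 and 3, which matches the reply. Random placement would be somewhat more forgiving, giving 3/10 and 1/10 for stripe counts 2 and 3.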