Re: [Lustre-discuss] [HPDD-discuss] Recovering a failed OST

Bob Ball Thu, 22 May 2014 06:58:28 -0700

Thanks for the advice. Fortunately, the OST was completely drained of files before all heck broke loose. With the help of the manual, a couple of lustre list threads, and some long-lost memories of a similar situation a few years back, I was able to bring the OST alive again, albeit still read-only for the time being (2 days off for me, and now I need to IO test it before I'll trust it again).


Cheers,
bob


On 5/20/2014 10:49 AM, Martin Hecht wrote:

Hi bob,
just to make sure: You already followed: http://wiki.lustre.org/index.php/Handling_File_System_Errors, especially the steps for e2fsck linked there?
If you did *not yet* do any write operation to the damaged OST, you might want to back up the whole OST first, using dd for instance (if the underlying hardware still permits it).
If the situation described (empty O directory, lost LAST_ID entry) occurred *after* the e2fsck, and you find lots of files in lost+found when you mount the OST as ldiskfs, you can use ll_recover_lost_found_objs to put them back in the correct place (http://manpages.ubuntu.com/manpages/precise/man1/ll_recover_lost_found_objs.1.html) - it is part of the lustre distribution. Once I had to run this several times in order to restore the structure below.
best regards,
Martin

On 05/19/2014 08:24 PM, Bob Ball wrote:
Oh, better still, as I kept looking, and the low-level panic retreated, I found this on the mdt:
[root@lmd02 ~]# lctl get_param osc.*.prealloc_next_id
...
osc.umt3-OST0025-osc.prealloc_next_id=6778336
So, unless someone tells me that I am way off base, I'm going to proceed with the assumption that this is a valid starting point, and proceed to get my file system back online.
bob
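
For anyone repeating this lookup on a system with many OSTs, it can be scripted. What follows is only a sketch in Python, assuming the `name=value` output format shown in the message above; the function name `next_id_for_ost` and the sample string are illustrative and not part of Lustre.

```python
import re


def next_id_for_ost(lctl_output, ost_name):
    """Return prealloc_next_id for one OST from `lctl get_param` text.

    Assumes lines shaped like
    osc.<fsname>-<OST name>-osc.prealloc_next_id=<number>
    as in the output quoted in the thread. Returns None if no line matches.
    """
    pattern = re.compile(
        r"^osc\.[^=\s]*-" + re.escape(ost_name)
        + r"-osc[^=\s]*\.prealloc_next_id=(\d+)$"
    )
    for line in lctl_output.splitlines():
        match = pattern.match(line.strip())
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    # Sample line copied from the message above.
    sample = "osc.umt3-OST0025-osc.prealloc_next_id=6778336"
    print(next_id_for_ost(sample, "OST0025"))
```

The value is returned as an integer so it can be compared directly with whatever number ends up in the recreated OST's LAST_ID.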

On 5/19/2014 2:05 PM, Bob Ball wrote:
Google first, ask later.  I found this in the manuals:


      26.3.4 Fixing a Bad LAST_ID on an OST
The procedures there spell out pretty well what I must do, so this should be relatively straightforward. But, does this comment refer to just this OST, or to all OSTs?
*Note -* The file system must be stopped on all servers before performing this procedure.
So, is this the best approach to follow, allowing for the fact that there is nothing at all left on the OST, or is there a better shortcut to choosing an appropriate LAST_ID?
Thanks again,
bob


On 5/19/2014 1:50 PM, Bob Ball wrote:
I need to completely remake a failed OST. I have done this in the past, but this time, the disk failed in such a way that I cannot fully get recovery information from the OST before I destroy and recreate. In particular, I am unable to recover the LAST_ID file, but successfully retrieved the last_rcvd and CONFIGS/* files.
mount -t ldiskfs /dev/sde /mnt/ost
pushd /mnt/ost
cd O
cd 0
cp -p LAST_ID /root/reformat/sde
The O directory exists, but it is empty. What can I do concerning this missing LAST_ID file? I mean, I probably have something, somewhere, from some previous recovery, but that is way, way out of date.
My intent is to recreate this OST with the same index, and then put it back into production. All files were moved off the OST before reaching this state, so nothing else needs to be recovered here.
Thanks,
bob
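
If an old copy of LAST_ID does turn up, it can be inspected before deciding how far out of date it is. The following is a sketch in Python, under the assumption that the file holds a single 64-bit little-endian integer in its first 8 bytes (the value `od -Ax -td8` would print on x86 hardware); the helper names `decode_last_id` and `encode_last_id` are hypothetical.

```python
import struct


def decode_last_id(raw):
    """Decode the object ID from the raw bytes of a LAST_ID file.

    Assumption: the first 8 bytes are an unsigned 64-bit
    little-endian integer.
    """
    if len(raw) < 8:
        raise ValueError("expected at least 8 bytes, got %d" % len(raw))
    return struct.unpack("<Q", raw[:8])[0]


def encode_last_id(value):
    """Encode an object ID as 8 little-endian bytes (same assumption)."""
    return struct.pack("<Q", value)


if __name__ == "__main__":
    # Round-trip the value reported elsewhere in this thread.
    raw = encode_last_id(6778336)
    print(raw.hex(), decode_last_id(raw))
```

Comparing the decoded number with what the MDS reports for the same OST shows how stale a saved copy is.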

_______________________________________________
HPDD-discuss mailing list
hpdd-disc...@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
