Hi Patrick,
Thanks for sharing your experience. It looks like you did the bulk of the troubleshooting in the Jira ticket.

I assume I should have a clean filesystem (i.e., run fsck first) before disabling the dirdata feature?
After I disable dirdata, will I need to run fsck with the "-D" option?
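
For reference, the sequence being asked about can be written down as a small dry-run sketch in Python. It only builds the command lines and runs nothing. The -f, -n and -D options are standard e2fsck options; clearing the feature with "tune2fs -O ^dirdata" is an assumption that the Lustre-patched e2fsprogs allows it, and /dev/mdt_device is a hypothetical device name. Whether this order is actually correct is the open question.

```python
# Dry-run sketch of the sequence asked about above; it only builds command lines.
from typing import List


def plan_dirdata_disable(device: str) -> List[List[str]]:
    """Return the commands in order, without running anything."""
    if not device:
        raise ValueError("device path required")
    return [
        # Read-only forced check first, so nothing is "corrected" yet.
        ["e2fsck", "-f", "-n", device],
        # ASSUMPTION: the patched tune2fs permits clearing dirdata this way.
        ["tune2fs", "-O", "^dirdata", device],
        # Forced check with directory optimization (-D).
        ["e2fsck", "-f", "-D", device],
    ]


if __name__ == "__main__":
    # Hypothetical device name; the MDT would need to be unmounted.
    for argv in plan_dirdata_disable("/dev/mdt_device"):
        print(" ".join(argv))
```

The first step uses -n on purpose: a repairing fsck is what overwrites dentries, so a report-only pass seems the safer way to see the state of the MDT before changing anything.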

FYI, the ll_recover_lost_found_objs tool recovers files from lost+found on *OST* volumes (i.e., it moves them back into the /O/0/dXX directories) based on extended file attributes. See Section 37.5 of the HPDD manual.
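
To illustrate what "moves them back into /O/0/dXX" means, here is a toy Python sketch of the target path. It assumes the object directory is chosen as the object id modulo 32 and that the sequence is 0; the function name and constant are made up for illustration and are not taken from the tool's source.

```python
# Toy sketch: where an OST object is expected to live on the backing filesystem.
# Assumes the layout O/<seq>/d<objid % 32>/<objid>; not taken from the tool itself.
from pathlib import PurePosixPath

SUBDIR_COUNT = 32  # assumed number of dN subdirectories per sequence


def ost_object_path(objid: int, seq: int = 0) -> PurePosixPath:
    """Return the relative path an object with this id would be moved back to."""
    if objid < 0:
        raise ValueError("object id must be non-negative")
    return PurePosixPath("O") / str(seq) / f"d{objid % SUBDIR_COUNT}" / str(objid)


if __name__ == "__main__":
    print(ost_object_path(1234567))  # O/0/d7/1234567
```

The real tool gets the object id from the extended attributes stored on each object, which is why it applies to OSTs and not to MDT directory entries.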

thanks
chris hunter
chris.hun...@yale.edu

On 10/27/2015 12:06 PM, Patrick Farrell wrote:
Chris,

I had the joy of taking this one apart personally.  We mostly let lfsck do the 
repair and moved on, accepting that some of the dentries were trashed.  I 
think, for important things, our field staff did some manual recovery with the 
e2fsprogs tools, but it was not a common enough problem that we documented a 
procedure.

If you read LU-5626 carefully, there's an explanation of the exact nature of 
the damage, and having that should let you make partial recoveries by hand.  
I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it would 
prove helpful in this instance.

Note that there are two forms of this corruption.  The first occurs when you move a 
directory that was created before dirdata was enabled: the '..' entry ends up in 
the wrong place.  This does not trouble Lustre, but fsck reports it as an error 
and will 'correct' it, which has the effect of (usually) overwriting one dentry 
in the directory when it creates a new '..' dentry in the correct location.
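
A toy Python model of this first form may make the data loss easier to see. It treats a directory as a plain list of names in which '..' is expected in the second slot; this is a deliberate simplification and not the real ldiskfs on-disk format. The file names are hypothetical, and the model always overwrites a dentry, whereas in practice this only usually happens.

```python
# Toy model of the first corruption form; NOT the real ldiskfs on-disk format.
from typing import List


def fsck_fix_dotdot(entries: List[str]) -> List[str]:
    """Mimic fsck forcing '..' into slot 1, overwriting whatever is there."""
    fixed = list(entries)
    if len(fixed) > 1 and fixed[1] != "..":
        fixed[1] = ".."  # the dentry previously in slot 1 is lost
    return fixed


def lost_entries(before: List[str], after: List[str]) -> List[str]:
    """Names present before the repair but missing afterwards."""
    return [name for name in before if name not in after]


if __name__ == "__main__":
    # Hypothetical moved directory: '..' sits at the end instead of slot 1.
    damaged = [".", "results.dat", "notes.txt", ".."]
    repaired = fsck_fix_dotdot(damaged)
    print(repaired)
    print(lost_entries(damaged, repaired))  # ['results.dat']
```

What happens to the misplaced '..' entry is not modeled here. The point is only that the "repair" costs one real dentry per affected directory, which is why knowing the exact layout from LU-5626 allows partial recovery by hand.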

I don't *think* that one causes the MDT to go read only, but I could be wrong.  
I *think* what causes the MDT to go read only is the other problem:

When you have a non-htree directory (not too many items in it, all directory 
entries in a single block) that is in the bad state described above (with the 
'..' dentry in the wrong place after being moved) and that directory has enough 
files added to it that it becomes an htree directory, the resulting directory 
is corrupted more severely.  We never sorted out the precise details of this - 
I believe we chose to simply delete any directories in this state.  (I think 
lfsck did it for us, but can't recall for sure.)

I'd advise reading LU-5626 with care, and I'd also suggest you might turn off 
'dirdata' on your MDT until you have this under control.  That will at least 
prevent any more directories from ending up in either of these bad states if 
you use the filesystem without updating Lustre to a version with the LU-5626 
patch in it.

- Patrick
________________________________________
From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
Chris Hunter [chris.hun...@yale.edu]
Sent: Tuesday, October 27, 2015 10:22 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss]  recovery MDT ".." directory entries (LU-5626)

We have a Lustre 1.8 filesystem that was upgraded to Lustre 2.x, and the
"dirdata" feature was enabled. We encountered the LU-5626/LU-2638 issue with
".." directory entries. Are there established recovery steps for this
issue?

If I run fsck, the directory entries will be moved into lost+found.
I assume the next step is to run the ll_recover_lost_found_objs tool?

Can you share any advice or experience about recovery?

thanks,
chris hunter
chris.hun...@yale.edu

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

