Hi Chris and Patrick,

I was sick last week, so I didn't find this conversation until today, sorry.
On 10/27/2015 05:06 PM, Patrick Farrell wrote:
> If you read LU-5626 carefully, there's an explanation of the exact nature of
> the damage, and having that should let you make partial recoveries by hand.
> I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it
> would prove helpful in this instance.

There is no tool like ll_recover_lost_found_objs for the MDT; on the OSTs it would be the right choice.

> Note that there's two forms to this corruption. One is if you move a
> directory which was created before dirdata was enabled, then the '..' entry
> ends up in the wrong place. This does not trouble Lustre, but fsck reports
> it as an error and will 'correct' it, which has the effect of (usually)
> overwriting one dentry in the directory when it creates a new '..' dentry in
> the correct location.
>
> I don't *think* that one causes the MDT to go read only, but I could be
> wrong. I *think* what causes the MDT to go read only is the other problem:
>
> When you have a non-htree directory (not too many items in it, all directory
> entries in a single inode) that is in the bad state described above (with the
> '..' dentry in the wrong place after being moved) and that directory has
> enough files added to it that it becomes an htree directory, the resulting
> directory is corrupted more severely. We never sorted out the precise
> details of this - I believe we chose to simply delete any directories in this
> state. (I think lfsck did it for us, but can't recall for sure.)

If I recall correctly, moving (or renaming) a corrupted directory to another place caused the MDT to go read-only; as Patrick wrote, adding more files to it is probably another trigger. In our case we had captured the full output of e2fsck, which contained the original names and the inode numbers. fsck had moved some of the files and subdirectories of the corrupted directories to lost+found.
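Because the log records both the original names and the inode numbers, and e2fsck reconnects entries in lost+found under names of the form `#<inode>`, the move back can be scripted. Below is a minimal Python sketch of that idea, not the exact script I used. The regular expression `ENTRY_RE` is a hypothetical placeholder (the e2fsck messages differ by problem type and version, so adapt it to the lines in your own log), the mount point `/mnt/mdt` in the usage note is an assumption for an MDT mounted as ldiskfs, and `build_restore_script` is a made-up helper name. It only produces `mv -n` commands, so the result can be reviewed before anything is run.

```python
import re
import shlex

# HYPOTHETICAL pattern: adapt it to the e2fsck messages you actually captured.
# It assumes one line names the entry, its directory path and, later on, the
# entry's inode number, e.g. "Entry 'data' in /ROOT/proj (1234) ... inode 5678".
# Paths containing whitespace are not matched by this simple pattern.
ENTRY_RE = re.compile(
    r"Entry '(?P<name>[^']+)' in (?P<dir>/\S*) \(\d+\).*?\binode (?P<ino>\d+)"
)


def build_restore_script(fsck_log: str, mountpoint: str) -> list[str]:
    """Return 'mv' commands that move lost+found/#<ino> back to the directory
    and name recorded for that inode in the e2fsck log."""
    root = mountpoint.rstrip("/")
    commands = []
    seen = set()
    for line in fsck_log.splitlines():
        m = ENTRY_RE.search(line)
        if not m or m.group("name") in (".", ".."):
            continue  # not a matching line, or a directory self/parent entry
        ino = m.group("ino")
        if ino in seen:
            continue  # the first recorded location wins for each inode
        seen.add(ino)
        # e2fsck names reconnected inodes "#<inode>" inside lost+found.
        src = f"{root}/lost+found/#{ino}"
        dst = f"{root}{m.group('dir').rstrip('/')}/{m.group('name')}"
        # -n (GNU mv): never overwrite something already at the destination.
        commands.append(f"mv -n {shlex.quote(src)} {shlex.quote(dst)}")
    return commands
```

For example, `build_restore_script(open("e2fsck.log").read(), "/mnt/mdt")` returns one command per recovered inode, which can be written to a file, checked by hand, and then run with the MDT mounted as ldiskfs.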
With the information contained in the e2fsck output, we could move them back from lost+found to their original places at the ldiskfs level (I parsed the e2fsck output for a pattern matching the inode numbers and generated a script from it). We had to repeat this a couple of times, because some of the subdirectories moved to lost+found were either in bad shape themselves or were damaged further when their owners later added files to them or moved them around.

So, if you have captured all of your e2fsck output and haven't yet cleaned up lost+found, you can still recover the data. lfsck would probably throw away the objects on the OSTs, because it thinks they are orphaned objects left over from deleted files.

best regards,
Martin
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org