Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
On 2015/11/04, 02:42, "lustre-discuss on behalf of Martin Hecht" wrote:
>On 11/04/2015 03:23 AM, Patrick Farrell wrote:
>> PAF: Remember, the specific conditions are pretty tight. Created under
>>1.8, not empty (if it's empty, the .. dentry is not misplaced when
>>moved) but also non-htree, then moved with dirdata enabled, and then
>>grown to this larger size. How many existing (small) directories do you
>>move and then add a bunch of files to? It's a pretty rare operation.
>>We only hit it at Martin's site because of an automated tool they have
>>to re-arrange user/job directories.
>Well, not only because of the tool. Especially, because when the
>directories have been moved by the tool, no files are added anymore.
>However, our mechanism gives a reason to the users to move their data
>from time to time (that's not the intention of the mechanism, but that's
>how some users react).
>
>But I'm not quite sure anymore if moving the directories is really a
>precondition to run into LU-5626.
>We have run the background lfsck which adds the FID to the existing
>dentries. This might be an important detail, because in our case a
>second '..' entry containing the FID was presumably created by lfsck (in
>the wrong place), and not by moving the directory. To my current
>understanding the user then only has to add some files to trigger the
>LBUG.
>A subsequent e2fsck will not only find this particular directory but all
>other small directories with a '..' entry in the wrong place. When
>e2fsck tries to fix these directories, some entries are overwritten by
>the FID and these files are then moved to lost+found.

Note that newer versions of LFSCK namespace checking (2.6 or 2.7, don't
recall offhand) will be able to return such entries from lost+found back
into the proper parent directory in the namespace, assuming they were
created under 2.x. Lustre stores an extra "link" xattr on each inode with
the filename and parent directory FID for each link to the file (up to
the available xattr space for each inode), so in case of directory
corruption it would be possible to rebuild the directory structure just
from the "link" xattrs on each file.

In the meantime, I attached a script to LU-5626 that could be used to
re-link files from lost+found into the right directory and filename based
on the output from e2fsck. It is a bit rough (needs manual editing of
pathnames), but may be useful if someone has hit this problem.

Cheers, Andreas

>If one of these first entries happens to be a small subdirectory, I
>believe there is a chance to run into the same issue again, when you
>move everything back to the original location after the e2fsck and
>someone starts adding files in these subdirectories.
>
>However, the preconditions are still quite narrow: small directories,
>not empty, created without fid, then converted by lfsck (or
>alternatively moved to a different place which would also create the
>second '..' entry). To trigger the LBUG files need to be added to one of
>these directories and for a second occurrence of the LBUG the same
>conditions must hold for another subdirectory which must have been at
>the very beginning of the directory.
>
>Martin

Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
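Andreas's remark about the "link" xattr can be illustrated with a small model. This is only a sketch: it does not read real xattrs or decode Lustre's on-disk format. It assumes the (FID, parent FID, name) triples have already been extracted by some other means, and the names `LinkRecord` and `rebuild_paths` as well as the FID strings are made up for illustration. It shows why those triples alone are enough to rebuild pathnames.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for one entry of a file's "link" xattr:
# the object's own FID, the parent directory's FID, and the name under it.
LinkRecord = namedtuple("LinkRecord", ["fid", "parent_fid", "name"])


def rebuild_paths(records, root_fid):
    """Return {fid: [paths]} by following parent FIDs up to root_fid."""
    by_fid = {}
    for rec in records:
        by_fid.setdefault(rec.fid, []).append(rec)

    def paths_of(fid, seen):
        if fid == root_fid:
            return [""]
        if fid in seen:  # guard against cycles in damaged data
            return []
        result = []
        for rec in by_fid.get(fid, []):
            for parent_path in paths_of(rec.parent_fid, seen | {fid}):
                result.append(parent_path + "/" + rec.name)
        return result

    # An object whose parent chain never reaches the root gets an empty list.
    return {fid: sorted(paths_of(fid, frozenset())) for fid in by_fid}
```

A file with several hard links carries one record per link, so it comes back with several paths. Objects whose parent chain cannot be followed to the root get an empty list, which is roughly the set that would still need manual attention.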
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
On 11/02/2015 12:30 PM, Martin Hecht wrote:
> Hi Chris and Patrick,
>
> I was sick last week, so I only found this conversation today, sorry.
>
> On 10/27/2015 05:06 PM, Patrick Farrell wrote:
>> If you read LU-5626 carefully, there's an explanation of the exact
>> nature of the damage, and having that should let you make partial
>> recoveries by hand. I'm not familiar with the
>> ll_recover_lost_found_objs tool, but I doubt it would prove helpful
>> in this instance.
>
> There is no tool like ll_recover_lost_found_objs for the MDT. On OSTs
> this would be the right choice.
>
>> Note that there are two forms of this corruption. One is if you move
>> a directory which was created before dirdata was enabled, then the
>> '..' entry ends up in the wrong place. This does not trouble Lustre,
>> but fsck reports it as an error and will 'correct' it, which has the
>> effect of (usually) overwriting one dentry in the directory when it
>> creates a new '..' dentry in the correct location. I don't *think*
>> that one causes the MDT to go read only, but I could be wrong.
>>
>> I *think* what causes the MDT to go read only is the other problem:
>> When you have a non-htree directory (not too many items in it, all
>> directory entries in a single inode) that is in the bad state
>> described above (with the '..' dentry in the wrong place after being
>> moved) and that directory has enough files added to it that it
>> becomes an htree directory, the resulting directory is corrupted more
>> severely. We never sorted out the precise details of this - I believe
>> we chose to simply delete any directories in this state. (I think
>> lfsck did it for us, but can't recall for sure.)
>
> If I recall correctly, moving (or renaming) the corrupted directory to
> another place caused the MDT to go read only; adding more files, as
> Patrick wrote before, is probably another trigger.
>
> In our case we captured the full output of e2fsck, which contained the
> original names and the inodes. fsck moved some of the files and
> subdirectories of the corrupted directories to lost+found. With the
> information contained in the e2fsck output we could move them back
> from lost+found to their original place on the ldiskfs level (I parsed
> the e2fsck output for a pattern matching the inode numbers and created
> a script out of it). We had to repeat this a couple of times, because
> some of the subdirectories moved to lost+found were either in bad
> shape themselves or were damaged further when their owners later added
> files to them or moved them around.
>
> So, if you have captured all your e2fsck output and you haven't yet
> cleaned up lost+found, you can still recover the data. lfsck would
> probably throw away the objects on the OSTs because it thinks they are
> orphan objects left over after deleting the files.
>
> best regards,
> Martin

Yes, I believe you want to (manually) recover the directories from
lost+found back to ROOT on the MDT before lfsck/oi_scrub runs. I don't
think lfsck on the MDT will impact orphan objects on the OSTs.

regards,
chris hunter
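Martin's recovery approach (match inode numbers from the captured e2fsck output, then move entries back from lost+found on the ldiskfs level) can be sketched roughly as follows. This is not the script attached to LU-5626 and not Martin's script. The exact e2fsck message wording is not quoted in this thread, so the sketch assumes the output has already been reduced by hand to lines of the form `<inode><TAB><original path>`. It also assumes that entries in lost+found are named `#<inode>`, as e2fsck normally does. The mount point `/mnt/mdt` and both function names are hypothetical. The code only generates `mv` commands for review and does not execute anything.

```python
import shlex


def parse_inode_map(lines):
    """Parse '<inode>\\t<original path>' records; skip blank or malformed lines."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        inode, sep, path = line.partition("\t")
        if not sep or not inode.isdigit() or not path:
            continue
        mapping[int(inode)] = path
    return mapping


def restore_commands(mapping, mdt_mount="/mnt/mdt"):
    """Build shell commands that move lost+found/#<inode> back to its old path."""
    cmds = []
    for inode, path in sorted(mapping.items()):
        src = f"{mdt_mount}/lost+found/#{inode}"
        dst = f"{mdt_mount}/{path.lstrip('/')}"
        # -n: never overwrite an existing target; --: stop option parsing
        cmds.append(f"mv -n -- {shlex.quote(src)} {shlex.quote(dst)}")
    return cmds
```

Printing the commands instead of running them leaves room for the manual pathname editing that Andreas mentions for his own script, and for repeating the pass if subdirectories turn out to be damaged as well.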
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Martin,

Our observation at the time was that lfsck did not add the fid to the ..
dentry unless there was already space in the appropriate location. I
don't remember digging into the details, but that is what we saw.
(Since it meant lfsck namespace was behaving, in a sense, correctly, we
were initially puzzled but decided it was all right. I seem to remember
reading a comment somewhere that the developers decided rearranging the
dentries was too hard, so they'd only add fids where space was already
present.)

It's possible we didn't get that quite right, though it would have to be
partial somehow - misplaced .. dentries with fids were definitely not
universal after running the namespace lfsck. (Drawing on experience from
other sites here as well.)

In any case, directories with bad .. dentries can be identified with
fsck anyway.

- Patrick

From: Martin Hecht [he...@hlrs.de]
Sent: Wednesday, November 04, 2015 3:42 AM
To: Patrick Farrell; Mohr Jr, Richard Frank (Rick Mohr)
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

On 11/04/2015 03:23 AM, Patrick Farrell wrote:
> PAF: Remember, the specific conditions are pretty tight. Created under 1.8,
> not empty (if it's empty, the .. dentry is not misplaced when moved) but also
> non-htree, then moved with dirdata enabled, and then grown to this larger
> size. How many existing (small) directories do you move and then add a bunch
> of files to? It's a pretty rare operation. We only hit it at Martin's site
> because of an automated tool they have to re-arrange user/job directories.

Well, not only because of the tool. Especially, because when the
directories have been moved by the tool, no files are added anymore.
However, our mechanism gives a reason to the users to move their data
from time to time (that's not the intention of the mechanism, but that's
how some users react).

But I'm not quite sure anymore if moving the directories is really a
precondition to run into LU-5626.
We have run the background lfsck which adds the FID to the existing
dentries. This might be an important detail, because in our case a
second '..' entry containing the FID was presumably created by lfsck (in
the wrong place), and not by moving the directory. To my current
understanding the user then only has to add some files to trigger the
LBUG.
A subsequent e2fsck will not only find this particular directory but all
other small directories with a '..' entry in the wrong place. When
e2fsck tries to fix these directories, some entries are overwritten by
the FID and these files are then moved to lost+found.

If one of these first entries happens to be a small subdirectory, I
believe there is a chance to run into the same issue again, when you
move everything back to the original location after the e2fsck and
someone starts adding files in these subdirectories.

However, the preconditions are still quite narrow: small directories,
not empty, created without fid, then converted by lfsck (or
alternatively moved to a different place which would also create the
second '..' entry). To trigger the LBUG files need to be added to one of
these directories and for a second occurrence of the LBUG the same
conditions must hold for another subdirectory which must have been at
the very beginning of the directory.

Martin
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
On 11/04/2015 03:23 AM, Patrick Farrell wrote:
> PAF: Remember, the specific conditions are pretty tight. Created under 1.8,
> not empty (if it's empty, the .. dentry is not misplaced when moved) but also
> non-htree, then moved with dirdata enabled, and then grown to this larger
> size. How many existing (small) directories do you move and then add a bunch
> of files to? It's a pretty rare operation. We only hit it at Martin's site
> because of an automated tool they have to re-arrange user/job directories.

Well, not only because of the tool. Especially, because when the
directories have been moved by the tool, no files are added anymore.
However, our mechanism gives a reason to the users to move their data
from time to time (that's not the intention of the mechanism, but that's
how some users react).

But I'm not quite sure anymore if moving the directories is really a
precondition to run into LU-5626.
We have run the background lfsck which adds the FID to the existing
dentries. This might be an important detail, because in our case a
second '..' entry containing the FID was presumably created by lfsck (in
the wrong place), and not by moving the directory. To my current
understanding the user then only has to add some files to trigger the
LBUG.
A subsequent e2fsck will not only find this particular directory but all
other small directories with a '..' entry in the wrong place. When
e2fsck tries to fix these directories, some entries are overwritten by
the FID and these files are then moved to lost+found.

If one of these first entries happens to be a small subdirectory, I
believe there is a chance to run into the same issue again, when you
move everything back to the original location after the e2fsck and
someone starts adding files in these subdirectories.

However, the preconditions are still quite narrow: small directories,
not empty, created without fid, then converted by lfsck (or
alternatively moved to a different place which would also create the
second '..' entry). To trigger the LBUG files need to be added to one of
these directories and for a second occurrence of the LBUG the same
conditions must hold for another subdirectory which must have been at
the very beginning of the directory.

Martin
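The preconditions Martin lists at the end of his message can be written down as a simple checklist. The sketch below is only a restatement of his description in code. The `DirState` fields and both function names are invented for illustration and do not correspond to anything in Lustre or e2fsprogs. Whether lfsck really creates the misplaced '..' entry is Martin's hypothesis, which Patrick's reply in this thread questions.

```python
from dataclasses import dataclass


@dataclass
class DirState:
    """Hypothetical summary of one directory's history (illustration only)."""
    created_without_fid: bool   # e.g. created under 1.8, before dirdata
    non_empty: bool             # had entries when moved or converted
    htree: bool                 # already an htree (large) directory at that time
    moved_with_dirdata: bool    # moved after dirdata was enabled
    converted_by_lfsck: bool    # FID added by lfsck (Martin's hypothesis)
    grown_afterwards: bool      # files were added afterwards


def has_misplaced_dotdot(d: DirState) -> bool:
    """Small, non-empty, pre-FID directory that was moved or converted."""
    return (
        d.created_without_fid
        and d.non_empty
        and not d.htree
        and (d.moved_with_dirdata or d.converted_by_lfsck)
    )


def at_risk_of_lbug(d: DirState) -> bool:
    """Per the thread, the LBUG needs files added on top of the bad '..' entry."""
    return has_misplaced_dotdot(d) and d.grown_afterwards
```

Every condition has to hold at once, which is why both Patrick and Martin describe the window as narrow.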