Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Hi, comments inline... On 11/04/2015 01:34 PM, Patrick Farrell wrote: > Our observation at the time was that lfsck did not add the fid to the .. > dentry unless there was already space in the appropriate location. Ok, I might have been wrong in this point and some manual mv by the users was involved. On 11/04/2015 04:24 PM, Chris Hunter wrote: > Yes I believe you want to (manually) recover the directories from > lost+found back to ROOT on the MDT before lfsck/oi_scrub runs. I don't > think lfsck on the MDT will impact orphan objects on the OSTs. With lfsck phase 2 introduced in lustre 2.6 the MDT-OST consistency is checked and repaired. Chris, you wrote that you have upgraded to "lustre 2.x", so I don't know if you have lfsck II already. And I'm not sure if MDT entries in lost+found are ignored by lfsck. I just wanted to point out that you might have to be careful here, but looking at the lustre manual it turns out that you are right. The consistency checks are run when lfsck type is set to "layout", which is a different thing than the "namespace" check used to update the FIDs. On 11/05/2015 01:29 AM, Dilger, Andreas wrote: > Note that newer versions of LFSCK namespace checking (2.6 or 2.7, don't > recall offhand) will be able to return such entries from lost+found back > into the proper parent directory in the namespace, assuming they were > created under 2.x. Lustre stores an extra "link" xattr on each inode with > the filename and parent directory FID for each link to the file (up to the > available xattr space for each inode), so in case of directory corruption > it would be possible to rebuild the directory structure just from the > "link" xattrs on each file. that's good to know. However, the files in this case were created with 1.8, so even if the current version after the upgrade has this "link" xattr, it doesn't help to recover from LU-5626. 
But your script is useful (it's pretty much the same as what I did back then, but I couldn't find my quick hack anymore...) > In the meantime, I attached a script to LU-5626 that could be used to > re-link files from lost+found into the right directory and filename based > on the output from e2fsck. It is a bit rough (needs manual editing of > pathnames), but may be useful if someone has hit this problem. best regards, Martin ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
On 2015/11/04, 02:42, "lustre-discuss on behalf of Martin Hecht"wrote: >On 11/04/2015 03:23 AM, Patrick Farrell wrote: >> PAF: Remember, the specific conditions are pretty tight. Created under >>1.8, not empty (if it's empty, the .. dentry is not misplaced when >>moved) but also non-htree, then moved with dirdata enabled, and then >>grown to this larger size. How many existing (small) directories do you >>move and then add a bunch of files to? It's a pretty rare operation. >>We only hit it at Martin's site because of an automated tool they have >>to re-arrange user/job directories. >Well, not only because of the tool. Especially, because when the >directories have been moved by the tool, no files are added anymore. >However, our mechanism gives a reason to the users to move their data >from time to time (that's not the intention of the mechanism, but that's >how some users react). > >But I'm not quite sure anymore if moving the directories is really a >precondition to run into LU-5626. >We have run the background lfsck which adds the FID to the existing >dentries. This might be an important detail, because in our case a >second '..' entry containing the FID was presumably created by lfsck (in >the wrong place), and not by moving the directory. To my current >understanding the user then only has to add some files to trigger the >LBUG. >A subsequent e2fsck will not only find this particular directory but all >other small directories with a '..' entry in the wrong place. When >e2fsck tries to fix these directories, some entries are overwritten by >the FID and these files are then moved to lost+found. Note that newer versions of LFSCK namespace checking (2.6 or 2.7, don't recall offhand) will be able to return such entries from lost+found back into the proper parent directory in the namespace, assuming they were created under 2.x. 
Lustre stores an extra "link" xattr on each inode with the filename and parent directory FID for each link to the file (up to the available xattr space for each inode), so in case of directory corruption it would be possible to rebuild the directory structure just from the "link" xattrs on each file. In the meantime, I attached a script to LU-5626 that could be used to re-link files from lost+found into the right directory and filename based on the output from e2fsck. It is a bit rough (needs manual editing of pathnames), but may be useful if someone has hit this problem. Cheers, Andreas >If one of these first entries happens to be a small subdirectory, I >believe there is a chance to run into the same issue again, when you >move everything back to the original location after the e2fsck and >someone starts adding files in these subdirectories. > >However, the preconditions are still quite narrow: small directories, >not empty, created without fid, then converted by lfsck (or >alternatively moved to a different place which would also create the >second '..' entry). To trigger the LBUG files need to be added to one of >these directories and for a second occurrence of the LBUG the same >conditions must hold for another subdirectory which must have been at >the very beginning of the directory. > >Martin > > > Cheers, Andreas -- Andreas Dilger Lustre Software Architect Intel High Performance Data Division ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
On 11/02/2015 12:30 PM, Martin Hecht wrote:
> Hi Chris and Patrick,
> I was sick last week, so I only found this conversation today, sorry.
>
> On 10/27/2015 05:06 PM, Patrick Farrell wrote:
>> If you read LU-5626 carefully, there's an explanation of the exact nature of the damage, and having that should let you make partial recoveries by hand. I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it would prove helpful in this instance.
>
> There is no tool like ll_recover_lost_found_objs for the MDT. On OSTs this would be the right choice.
>
>> Note that there's two forms to this corruption. One is if you move a directory which was created before dirdata was enabled, then the '..' entry ends up in the wrong place. This does not trouble Lustre, but fsck reports it as an error and will 'correct' it, which has the effect of (usually) overwriting one dentry in the directory when it creates a new '..' dentry in the correct location. I don't *think* that one causes the MDT to go read only, but I could be wrong.
>>
>> I *think* what causes the MDT to go read only is the other problem: When you have a non-htree directory (not too many items in it, all directory entries in a single inode) that is in the bad state described above (with the '..' dentry in the wrong place after being moved) and that directory has enough files added to it that it becomes an htree directory, the resulting directory is corrupted more severely. We never sorted out the precise details of this - I believe we chose to simply delete any directories in this state. (I think lfsck did it for us, but can't recall for sure.)
>
> If I recall correctly, moving (or renaming) the corrupted directory to another place caused the MDT to go read-only; adding more files, as Patrick wrote before, is probably another trigger.
>
> In our case we captured the full output of e2fsck, which contained the original names and the inodes. fsck moved some of the files and subdirectories of the corrupted directories to lost+found. With the information contained in the e2fsck output we could move them back from lost+found to their original place on the ldiskfs level (I parsed the e2fsck output for a pattern matching the inode numbers and created a script out of it). We had to repeat this a couple of times, because some of the subdirectories moved to lost+found were either in bad shape themselves or were damaged further when the owners later added files to them or moved them around.
>
> So, if you have captured all your e2fsck output and you haven't yet cleaned up lost+found, you can still recover the data. lfsck would probably throw away the objects on the OSTs because it thinks they are orphan objects left over after deleting the files.
>
> best regards, Martin

Yes I believe you want to (manually) recover the directories from lost+found back to ROOT on the MDT before lfsck/oi_scrub runs. I don't think lfsck on the MDT will impact orphan objects on the OSTs.

regards, chris hunter
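The recovery step Martin describes (moving entries back from lost+found on the ldiskfs level) could be sketched roughly as below. This is only an illustration, not the script attached to LU-5626: it assumes the inode numbers and original paths have already been extracted by hand from the saved e2fsck output into (inode, path) pairs, and the function name, the mount point /mnt/mdt and the sample pairs are all hypothetical. The only conventions relied on are that e2fsck names reconnected entries in lost+found as '#' followed by the inode number, and that the namespace lives under ROOT on the MDT. It only prints mv commands for review and does not touch the file system.

```python
import shlex
from typing import Iterable, List, Tuple


def build_relink_commands(
    pairs: Iterable[Tuple[int, str]],
    mdt_mount: str = "/mnt/mdt",  # hypothetical ldiskfs mount point of the MDT
) -> List[str]:
    """Return one 'mv' command per (inode, original path) pair.

    The original path is taken relative to ROOT/ on the ldiskfs-mounted MDT.
    """
    commands = []
    for inode, original_path in pairs:
        # e2fsck names reconnected entries '#<inode number>' in lost+found.
        source = f"{mdt_mount}/lost+found/#{inode}"
        target = f"{mdt_mount}/ROOT/{original_path.lstrip('/')}"
        # -n: never overwrite an existing target; '--' ends option parsing.
        commands.append(f"mv -n -- {shlex.quote(source)} {shlex.quote(target)}")
    return commands


if __name__ == "__main__":
    # Hypothetical sample data; real pairs come from your own e2fsck log.
    sample = [(123456, "home/alice/run 01"), (123457, "/home/bob/results")]
    for line in build_relink_commands(sample):
        print(line)
```

Printing the commands instead of executing them leaves room for the manual review of path names that both Martin and Andreas say was needed.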
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Martin, Our observation at the time was that lfsck did not add the fid to the .. dentry unless there was already space in the appropriate location. I don't remember digging into the details, but that was our observation at the time. (Since it meant lfsck namespace was behaving, in a sense, correctly, we were initially puzzled but decided it was all right. I seem to remember reading a comment somewhere that the developers decided rearranging the dentries was too hard, so they'd only add fids where space was already present.) It's possible we didn't get that quite right, though it would have to be partial somehow - misplaced .. dentries with fids were definitely not universal after running the namespace lfsck. (Drawing on experience from other sites here as well.) In any case, directories with bad .. dentries can be identified with fsck anyway. - Patrick From: Martin Hecht [he...@hlrs.de] Sent: Wednesday, November 04, 2015 3:42 AM To: Patrick Farrell; Mohr Jr, Richard Frank (Rick Mohr) Cc: lustre-discuss@lists.lustre.org Subject: Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626) On 11/04/2015 03:23 AM, Patrick Farrell wrote: > PAF: Remember, the specific conditions are pretty tight. Created under 1.8, > not empty (if it's empty, the .. dentry is not misplaced when moved) but also > non-htree, then moved with dirdata enabled, and then grown to this larger > size. How many existing (small) directories do you move and then add a bunch > of files to? It's a pretty rare operation. We only hit it at Martin's site > because of an automated tool they have to re-arrange user/job directories. Well, not only because of the tool. Especially, because when the directories have been moved by the tool, no files are added anymore. However, our mechanism gives a reason to the users to move their data from time to time (that's not the intention of the mechanism, but that's how some users react).
But I'm not quite sure anymore if moving the directories is really a precondition to run into LU-5626. We have run the background lfsck which adds the FID to the existing dentries. This might be an important detail, because in our case a second '..' entry containing the FID was presumably created by lfsck (in the wrong place), and not by moving the directory. To my current understanding the user then only has to add some files to trigger the LBUG. A subsequent e2fsck will not only find this particular directory but all other small directories with a '..' entry in the wrong place. When e2fsck tries to fix these directories, some entries are overwritten by the FID and these files are then moved to lost+found. If one of these first entries happens to be a small subdirectory, I believe there is a chance to run into the same issue again, when you move everything back to the original location after the e2fsck and someone starts adding files in these subdirectories. However, the preconditions are still quite narrow: small directories, not empty, created without fid, then converted by lfsck (or alternatively moved to a different place which would also create the second '..' entry). To trigger the LBUG files need to be added to one of these directories and for a second occurrence of the LBUG the same conditions must hold for another subdirectory which must have been at the very beginning of the directory. Martin ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
On 11/04/2015 03:23 AM, Patrick Farrell wrote: > PAF: Remember, the specific conditions are pretty tight. Created under 1.8, > not empty (if it's empty, the .. dentry is not misplaced when moved) but also > non-htree, then moved with dirdata enabled, and then grown to this larger > size. How many existing (small) directories do you move and then add a bunch > of files to? It's a pretty rare operation. We only hit it at Martin's site > because of an automated tool they have to re-arrange user/job directories. Well, not only because of the tool. Especially, because when the directories have been moved by the tool, no files are added anymore. However, our mechanism gives a reason to the users to move their data from time to time (that's not the intention of the mechanism, but that's how some users react). But I'm not quite sure anymore if moving the directories is really a precondition to run into LU-5626. We have run the background lfsck which adds the FID to the existing dentries. This might be an important detail, because in our case a second '..' entry containing the FID was presumably created by lfsck (in the wrong place), and not by moving the directory. To my current understanding the user then only has to add some files to trigger the LBUG. A subsequent e2fsck will not only find this particular directory but all other small directories with a '..' entry in the wrong place. When e2fsck tries to fix these directories, some entries are overwritten by the FID and these files are then moved to lost+found. If one of these first entries happens to be a small subdirectory, I believe there is a chance to run into the same issue again, when you move everything back to the original location after the e2fsck and someone starts adding files in these subdirectories. However, the preconditions are still quite narrow: small directories, not empty, created without fid, then converted by lfsck (or alternatively moved to a different place which would also create the second '..' 
entry). To trigger the LBUG files need to be added to one of these directories and for a second occurrence of the LBUG the same conditions must hold for another subdirectory which must have been at the very beginning of the directory. Martin
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Mmm, unfortunately, still not quite right - Disabling dirdata will not save you in the conversion to HTree case either. It will just prevent *more* directories from getting a misplaced ".." dentry to begin with. As to size... I figured it out once - But it depends on file name length in the directory, since the dentry includes the file name. Once the total size of dentries in a directory exceeds 4096 bytes (one inode), then it will be converted to an HTree, I believe. So, at something like 32 bytes a dentry, which is like a 10-16 or so character file name (exact dentry length here requires more checking than I've got time for, but it's close), then you've got 32=2^5, 4096 = 2^12, so 2^12/2^5 = 2^7 or 128 dentries. But of course, longer file names --> bigger dentries --> fewer dentries before conversion to HTree. As far as "easy way to scan", well, fsck set to not make changes will find all the directories with misplaced ".." dentries, and also any already damaged-by-conversion-to-HTree directories. - Patrick On 11/03/2015 01:12 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote: Patrick, Thanks for the clarification. I think I understand now. Disabling dirdata would not help any directories which have already had their “..” entry relocated. The next time fsck runs, those directories will potentially get corrupted. The bigger reason to disable dirdata is to prevent more serious corruption if a non-HTree directory with an incorrectly placed “..” gets converted to a HTree directory. How large does a directory need to be before the conversion to HTree happens? I don’t suppose there is an easy way to scan the file system to look for directories that might be subject to corruption… —Rick On Nov 3, 2015, at 12:30 PM, Patrick Farrellwrote: Hm. That's almost, but not quite, right. Disabling dirdata during the fsck run has no positive effect - fsck will still get upset about the incorrectly placed entry. (And whether or not dirdata is enabled, fsck will do the same thing. 
It doesn't know or care about the dirdata setting as such.) Steps #1 and #2 will not cause any problems until you run fsck, but there's no way around the issue once you do run fsck. The .. dentry must go back to the correct location to make fsck happy. If I remember right, fsck creates the .. dentry and doesn't include the fid (regardless of dirdata setting). This can overwrite another dentry if one has been placed in the location normally reserved for the .. dentry (which can happen if the dentry which was after the .. dentry is deleted, thereby making a space large enough for a dentry+FID). Furthermore, if you have a non-Htree directory where the .. dentry is incorrectly placed (your steps 1 & 2), then you add files until it shifts to become an HTree directory, THAT directory becomes corrupted in a more severe manner that will cause your MDT to remount read only and/or LBUG. (LU-2638 only fixes the .. dentry bug for HTree directories themselves. It does not help with a corrupted directory that then becomes an HTree directory.) - Patrick On 11/03/2015 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote: On Oct 27, 2015, at 1:46 PM, Patrick Farrell wrote: That's something of a time bomb - If one of those directories fsck wishes it could correct is small and grows in number of files, you'll get the MDT going read only (and a few odd LBUGs if you try to put it back). I was looking back over the incident where I thought I had hit this bug, but based on the lack of side effects that you mentioned, I am now starting to think that I was mistaken. Nevertheless, I am trying to understand the bug a little better in case I am still susceptible to it. I tried to summarize my understanding below, and maybe you can tell me if I am correct. For HTree directories, the problem is described in LU-2638. But since I am running Lustre >2.4, I should not be affected by this bug. For non-Tree directories, the problem is described in LU-5626. 
In order to trigger the bug, the following steps must happen: 1) A non-HTree directory created under Lustre 1.8 (which does not have a FID for its “..” entry) gets moved to a different parent directory. 2) Lustre tries to update the “..” entry in the directory, and if there is not enough space in the existing entry, it creates a new “..” entry and adds the FID. 3) Something happens to the MDT, and fsck needs to be run. When it runs, it notices that “..” is no longer the second entry in the directory. 4) fsck tries to “fix” the problem by moving the “..” entry back to its original position. With the FID in place, there is not enough space in the original position, but fsck moves it anyway which causes the “..” entry to overwrite part of the third entry in the directory. If that is correct, then steps #1 and #2 can happen without causing any problems. It is only at steps #3 and #4 that the corruption occurs,
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Patrick, Thanks for the clarification. I think I understand now. Disabling dirdata would not help any directories which have already had their “..” entry relocated. The next time fsck runs, those directories will potentially get corrupted. The bigger reason to disable dirdata is to prevent more serious corruption if a non-HTree directory with an incorrectly placed “..” gets converted to a HTree directory. How large does a directory need to be before the conversion to HTree happens? I don’t suppose there is an easy way to scan the file system to look for directories that might be subject to corruption… —Rick > On Nov 3, 2015, at 12:30 PM, Patrick Farrellwrote: > > Hm. That's almost, but not quite, right. Disabling dirdata during the fsck > run has no positive effect - fsck will still get upset about the incorrectly > placed entry. (And whether or not dirdata is enabled, fsck will do the same > thing. It doesn't know or care about the dirdata setting as such.) > > Steps #1 and #2 will not cause any problems until you run fsck, but there's > no way around the issue once you do run fsck. The .. dentry must go back to > the correct location to make fsck happy. If I remember right, fsck creates > the .. dentry and doesn't include the fid (regardless of dirdata setting). > This can overwrite another dentry if one has been placed in the location > normally reserved for the .. dentry (which can happen if the dentry which was > after the .. dentry is deleted, thereby making a space large enough for a > dentry+FID). > > Furthermore, if you have a non-Htree directory where the .. dentry is > incorrectly placed (your steps 1 & 2), then you add files until it shifts to > become an HTree directory, THAT directory becomes corrupted in a more severe > manner that will cause your MDT to remount read only and/or LBUG. (LU-2638 > only fixes the .. dentry bug for HTree directories themselves. It does not > help with a corrupted directory that then becomes an HTree directory.) 
> > - Patrick > > On 11/03/2015 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote: >>> On Oct 27, 2015, at 1:46 PM, Patrick Farrell wrote: >>> >>> That's something of a time bomb - If one of those directories fsck wishes >>> it could correct is small and grows in number of files, you'll get the MDT >>> going read only (and a few odd LBUGs if you try to put it back). >> I was looking back over the incident where I thought I had hit this bug, but >> based on the lack of side effects that you mentioned, I am now starting to >> think that I was mistaken. Nevertheless, I am trying to understand the bug >> a little better in case I am still susceptible to it. I tried to summarize >> my understanding below, and maybe you can tell me if I am correct. >> >> For HTree directories, the problem is described in LU-2638. But since I am >> running Lustre >2.4, I should not be affected by this bug. >> >> For non-Tree directories, the problem is described in LU-5626. In order to >> trigger the bug, the following steps must happen: >> >> 1) A non-HTree directory created under Lustre 1.8 (which does not have a FID >> for its “..” entry) gets moved to a different parent directory. >> >> 2) Lustre tries to update the “..” entry in the directory, and if there is >> not enough space in the existing entry, it creates a new “..” entry and adds >> the FID. >> >> 3) Something happens to the MDT, and fsck needs to be run. When it runs, it >> notices that “..” is no longer the second entry in the directory. >> >> 4) fsck tries to “fix” the problem by moving the “..” entry back to its >> original position. With the FID in place, there is not enough space in the >> original position, but fsck moves it anyway which causes the “..” entry to >> overwrite part of the third entry in the directory. >> >> If that is correct, then steps #1 and #2 can happen without causing any >> problems. 
It is only at steps #3 and #4 that the corruption occurs, and as >> long as dirdata is disabled before fsck is run, then there should not be any >> problems. >> >> Is that explanation accurate? >> >> -- >> Rick Mohr >> Senior HPC System Administrator >> National Institute for Computational Sciences >> http://www.nics.tennessee.edu
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
> On Oct 27, 2015, at 1:46 PM, Patrick Farrell wrote: > > That's something of a time bomb - If one of those directories fsck wishes it > could correct is small and grows in number of files, you'll get the MDT going > read only (and a few odd LBUGs if you try to put it back). I was looking back over the incident where I thought I had hit this bug, but based on the lack of side effects that you mentioned, I am now starting to think that I was mistaken. Nevertheless, I am trying to understand the bug a little better in case I am still susceptible to it. I tried to summarize my understanding below, and maybe you can tell me if I am correct. For HTree directories, the problem is described in LU-2638. But since I am running Lustre >2.4, I should not be affected by this bug. For non-HTree directories, the problem is described in LU-5626. In order to trigger the bug, the following steps must happen: 1) A non-HTree directory created under Lustre 1.8 (which does not have a FID for its “..” entry) gets moved to a different parent directory. 2) Lustre tries to update the “..” entry in the directory, and if there is not enough space in the existing entry, it creates a new “..” entry and adds the FID. 3) Something happens to the MDT, and fsck needs to be run. When it runs, it notices that “..” is no longer the second entry in the directory. 4) fsck tries to “fix” the problem by moving the “..” entry back to its original position. With the FID in place, there is not enough space in the original position, but fsck moves it anyway which causes the “..” entry to overwrite part of the third entry in the directory. If that is correct, then steps #1 and #2 can happen without causing any problems. It is only at steps #3 and #4 that the corruption occurs, and as long as dirdata is disabled before fsck is run, then there should not be any problems. Is that explanation accurate?
-- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Hm. That's almost, but not quite, right. Disabling dirdata during the fsck run has no positive effect - fsck will still get upset about the incorrectly placed entry. (And whether or not dirdata is enabled, fsck will do the same thing. It doesn't know or care about the dirdata setting as such.) Steps #1 and #2 will not cause any problems until you run fsck, but there's no way around the issue once you do run fsck. The .. dentry must go back to the correct location to make fsck happy. If I remember right, fsck creates the .. dentry and doesn't include the fid (regardless of dirdata setting). This can overwrite another dentry if one has been placed in the location normally reserved for the .. dentry (which can happen if the dentry which was after the .. dentry is deleted, thereby making a space large enough for a dentry+FID). Furthermore, if you have a non-Htree directory where the .. dentry is incorrectly placed (your steps 1 & 2), then you add files until it shifts to become an HTree directory, THAT directory becomes corrupted in a more severe manner that will cause your MDT to remount read only and/or LBUG. (LU-2638 only fixes the .. dentry bug for HTree directories themselves. It does not help with a corrupted directory that then becomes an HTree directory.) - Patrick On 11/03/2015 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote: On Oct 27, 2015, at 1:46 PM, Patrick Farrellwrote: That's something of a time bomb - If one of those directories fsck wishes it could correct is small and grows in number of files, you'll get the MDT going read only (and a few odd LBUGs if you try to put it back). I was looking back over the incident where I thought I had hit this bug, but based on the lack of side effects that you mentioned, I am now starting to think that I was mistaken. Nevertheless, I am trying to understand the bug a little better in case I am still susceptible to it. I tried to summarize my understanding below, and maybe you can tell me if I am correct. 
For HTree directories, the problem is described in LU-2638. But since I am running Lustre >2.4, I should not be affected by this bug. For non-Tree directories, the problem is described in LU-5626. In order to trigger the bug, the following steps must happen: 1) A non-HTree directory created under Lustre 1.8 (which does not have a FID for its “..” entry) gets moved to a different parent directory. 2) Lustre tries to update the “..” entry in the directory, and if there is not enough space in the existing entry, it creates a new “..” entry and adds the FID. 3) Something happens to the MDT, and fsck needs to be run. When it runs, it notices that “..” is no longer the second entry in the directory. 4) fsck tries to “fix” the problem by moving the “..” entry back to its original position. With the FID in place, there is not enough space in the original position, but fsck moves it anyway which causes the “..” entry to overwrite part of the third entry in the directory. If that is correct, then steps #1 and #2 can happen without causing any problems. It is only at steps #3 and #4 that the corruption occurs, and as long as dirdata is disabled before fsck is run, then there should not be any problems. Is that explanation accurate? -- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
> On Nov 3, 2015, at 2:20 PM, Patrick Farrell wrote: > > Mmm, unfortunately, still not quite right - Disabling dirdata will not save > you in the conversion to HTree case either. It will just prevent *more* > directories from getting a misplaced ".." dentry to begin with. Sorry. Yes, that is what I meant to say. (My sentence was supposed to read “…correctly placed…”. I was thinking of a non-corrupted non-HTree directory that was moved and then converted. Poor wording on my part.) > As to size... I figured it out once - But it depends on file name length in > the directory, since the dentry includes the file name. Once the total size > of dentries in a directory exceeds 4096 bytes (one inode), then it will be > converted to an HTree, I believe. > > So, at something like 32 bytes a dentry, which is like a 10-16 or so > character file name (exact dentry length here requires more checking than > I've got time for, but it's close), then you've got 32=2^5, 4096 = 2^12, so > 2^12/2^5 = 2^7 or 128 dentries. > > But of course, longer file names --> bigger dentries --> fewer dentries > before conversion to HTree. So it doesn’t seem like it takes many entries at all. Interesting. We have many directories much larger than that and no sign of any corruption. I’ll have to spend some more time looking into this. -- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu
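Patrick's back-of-the-envelope estimate can be reproduced with a small calculation. This is a rough sketch under stated assumptions: it uses the standard ext4 record layout (8 bytes of fixed header plus the name, rounded up to a multiple of 4), treats the 4096-byte limit he mentions as a single directory block, and ignores the "." and ".." entries. The `extra_bytes` parameter is a hypothetical stand-in for per-entry data such as a FID stored with dirdata; its real size is not taken from the ldiskfs source.

```python
def dirent_size(name_len: int, extra_bytes: int = 0) -> int:
    """On-disk size of one ext4-style directory entry.

    8 bytes of fixed header (inode, rec_len, name_len, file_type) plus the
    name, rounded up to a multiple of 4 bytes. extra_bytes is an assumed
    allowance for additional per-entry data (for example a FID).
    """
    return (8 + name_len + extra_bytes + 3) & ~3


def entries_per_block(name_len: int, extra_bytes: int = 0,
                      block_size: int = 4096) -> int:
    """Rough number of equal-length names that fit in one directory block."""
    return block_size // dirent_size(name_len, extra_bytes)


if __name__ == "__main__":
    # Longer names mean bigger entries and fewer of them per block.
    for name_len in (10, 16, 24, 48):
        print(name_len, dirent_size(name_len), entries_per_block(name_len))
```

With a 32-byte entry this gives the 128 entries quoted above; any extra per-entry data lowers the count further, which fits Rick's remark that it does not take many entries to reach the conversion point.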
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Comment inline...

From: Mohr Jr, Richard Frank (Rick Mohr) [rm...@utk.edu]
Sent: Tuesday, November 03, 2015 4:47 PM
To: Patrick Farrell
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

> On Nov 3, 2015, at 2:20 PM, Patrick Farrell <p...@cray.com> wrote:
>
> Mmm, unfortunately, still not quite right - Disabling dirdata will not save
> you in the conversion to HTree case either. It will just prevent *more*
> directories from getting a misplaced ".." dentry to begin with. Sorry.

Yes, that is what I meant to say. (My sentence was supposed to read “…correctly placed…”. I was thinking of a non-corrupted non-HTree directory that was moved and then converted. Poor wording on my part.)

> As to size... I figured it out once - But it depends on file name length in
> the directory, since the dentry includes the file name. Once the total size
> of dentries in a directory exceeds 4096 bytes (one inode), then it will be
> converted to an HTree, I believe.
>
> So, at something like 32 bytes a dentry, which is like a 10-16 or so
> character file name (exact dentry length here requires more checking than
> I've got time for, but it's close), then you've got 32=2^5, 4096 = 2^12, so
> 2^12/2^5 = 2^7 or 128 dentries.
>
> But of course, longer file names --> bigger dentries --> fewer dentries
> before conversion to HTree.

So it doesn’t seem like it takes many entries at all. Interesting. We have many directories much larger than that and no sign of any corruption. I’ll have to spend some more time looking into this.

PAF: Remember, the specific conditions are pretty tight. Created under 1.8, not empty (if it's empty, the .. dentry is not misplaced when moved) but also non-htree, then moved with dirdata enabled, and then grown to this larger size. How many existing (small) directories do you move and then add a bunch of files to? It's a pretty rare operation. We only hit it at Martin's site because of an automated tool they have to re-arrange user/job directories.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Hi Chris and Patrick,

I was sick last week, so I only found this conversation today, sorry.

On 10/27/2015 05:06 PM, Patrick Farrell wrote:
> If you read LU-5626 carefully, there's an explanation of the exact nature of
> the damage, and having that should let you make partial recoveries by hand.
> I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it
> would prove helpful in this instance.

There is no tool like ll_recover_lost_found_objs for the MDT. On OSTs this would be the right choice.

> Note that there's two forms to this corruption. One is if you move a
> directory which was created before dirdata was enabled, then the '..' entry
> ends up in the wrong place. This does not trouble Lustre, but fsck reports
> it as an error and will 'correct' it, which has the effect of (usually)
> overwriting one dentry in the directory when it creates a new '..' dentry in
> the correct location.
>
> I don't *think* that one causes the MDT to go read only, but I could be
> wrong. I *think* what causes the MDT to go read only is the other problem:
>
> When you have a non-htree directory (not too many items in it, all directory
> entries in a single inode) that is in the bad state described above (with the
> '..' dentry in the wrong place after being moved) and that directory has
> enough files added to it that it becomes an htree directory, the resulting
> directory is corrupted more severely. We never sorted out the precise
> details of this - I believe we chose to simply delete any directories in this
> state. (I think lfsck did it for us, but can't recall for sure.)

If I recall correctly, moving (or renaming) the corrupted directory to another place caused the MDT to go read-only; adding more files, as Patrick wrote above, is probably another trigger.

In our case we captured the full output of e2fsck, which contained the original names and the inodes. fsck moved some of the files and subdirectories of the corrupted directories to lost+found. With the information contained in the e2fsck output we could move them back from lost+found to their original place on the ldiskfs level (I parsed the e2fsck output for a pattern matching the inode numbers and created a script out of it). We had to repeat this a couple of times, because some of the subdirectories moved to lost+found were either in bad shape themselves or were further damaged later when the owners added files to them or moved them around.

So, if you have captured all your e2fsck output and you haven't yet cleaned up lost+found, you can still recover the data. lfsck would probably throw away the objects on the OSTs because it thinks they are orphan objects left over after deleting the files.

best regards,
Martin

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
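The second half of what Martin describes (turning inode numbers and original names into a re-link script) might look roughly like the sketch below. It is not Martin's script and not the one attached to LU-5626. It assumes you have already extracted an inode-to-original-path mapping from your captured e2fsck output (that parsing is left out because the message wording depends on the e2fsprogs version), that the MDT is mounted as ldiskfs at a hypothetical mount point, and that e2fsck named the reconnected objects `#<inode>` in lost+found. The function name and the example paths are made up.

```python
import shlex
from typing import Dict, List


def build_relink_commands(inode_to_path: Dict[int, str], mdt_mount: str) -> List[str]:
    """Return one 'mv' command per lost+found object, for review before running.

    inode_to_path: inode number -> original path relative to ROOT (from e2fsck output).
    mdt_mount:     ASSUMPTION: where the MDT is mounted as ldiskfs.
    """
    commands = []
    for inode, original in sorted(inode_to_path.items()):
        src = f"{mdt_mount}/lost+found/#{inode}"
        dst = f"{mdt_mount}/ROOT/{original.lstrip('/')}"
        # -n refuses to overwrite anything that already exists at the destination.
        commands.append(f"mv -n -- {shlex.quote(src)} {shlex.quote(dst)}")
    return commands


# Hypothetical mapping, for illustration only.
example = {
    67890: "home/bob/job 42/output",
    12345: "home/alice/run1",
}

for line in build_relink_commands(example, "/mnt/mdt"):
    print(line)
```

The commands are only printed, so the list can be checked by hand (and the parent directories verified) before anything is moved on the MDT.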
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Chris, I had the joy of taking this one apart personally. We mostly let lfsck do the repair and moved on, accepting that some of the dentries were trashed. I think, for important things, our field staff did some manual recovery with the e2fsprogs tools, but it was not a common enough problem that we documented a procedure. If you read LU-5626 carefully, there's an explanation of the exact nature of the damage, and having that should let you make partial recoveries by hand. I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it would prove helpful in this instance. Note that there's two forms to this corruption. One is if you move a directory which was created before dirdata was enabled, then the '..' entry ends up in the wrong place. This does not trouble Lustre, but fsck reports it as an error and will 'correct' it, which has the effect of (usually) overwriting one dentry in the directory when it creates a new '..' dentry in the correct location. I don't *think* that one causes the MDT to go read only, but I could be wrong. I *think* what causes the MDT to go read only is the other problem: When you have a non-htree directory (not too many items in it, all directory entries in a single inode) that is in the bad state described above (with the '..' dentry in the wrong place after being moved) and that directory has enough files added to it that it becomes an htree directory, the resulting directory is corrupted more severely. We never sorted out the precise details of this - I believe we chose to simply delete any directories in this state. (I think lfsck did it for us, but can't recall for sure.) I'd advise reading LU-5626 with care, and I'd also suggest you might turn off 'dirdata' on your MDT until you have this under control. That will at least prevent any more directories from ending up in either of these bad states if you use the filesystem without updating Lustre to a version with the LU-5626 patch in it. 
- Patrick From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of Chris Hunter [chris.hun...@yale.edu] Sent: Tuesday, October 27, 2015 10:22 AM To: lustre-discuss@lists.lustre.org Subject: [lustre-discuss] recovery MDT ".." directory entries (LU-5626) We have a lustre 1.8 filesystem that was upgraded to lustre 2.x and "dirdata" feature was enabled. We encountered LU-5626/LU-2638 issue with ".." directory entries. Are there established recovery steps for this issue ? If I run fsck, the directory entries will be moved into lost+found. I assume the next step is to run the ll_recover_lost_found_objs tool ? Can you share any advice/experience about recovery ? thanks, chris hunter chris.hun...@yale.edu ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] recovery MDT ".." directory entries (LU-5626)
We have a lustre 1.8 filesystem that was upgraded to lustre 2.x and "dirdata" feature was enabled. We encountered LU-5626/LU-2638 issue with ".." directory entries. Are there established recovery steps for this issue ? If I run fsck, the directory entries will be moved into lost+found. I assume the next step is to run the ll_recover_lost_found_objs tool ? Can you share any advice/experience about recovery ? thanks, chris hunter chris.hun...@yale.edu ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Excuse me, I said 'lfsck' below, but I meant 'fsck'. From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of Patrick Farrell [p...@cray.com] Sent: Tuesday, October 27, 2015 11:06 AM To: Chris Hunter; lustre-discuss@lists.lustre.org Subject: Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626) Chris, I had the joy of taking this one apart personally. We mostly let lfsck do the repair and moved on, accepting that some of the dentries were trashed. I think, for important things, our field staff did some manual recovery with the e2fsprogs tools, but it was not a common enough problem that we documented a procedure. If you read LU-5626 carefully, there's an explanation of the exact nature of the damage, and having that should let you make partial recoveries by hand. I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it would prove helpful in this instance. Note that there's two forms to this corruption. One is if you move a directory which was created before dirdata was enabled, then the '..' entry ends up in the wrong place. This does not trouble Lustre, but fsck reports it as an error and will 'correct' it, which has the effect of (usually) overwriting one dentry in the directory when it creates a new '..' dentry in the correct location. I don't *think* that one causes the MDT to go read only, but I could be wrong. I *think* what causes the MDT to go read only is the other problem: When you have a non-htree directory (not too many items in it, all directory entries in a single inode) that is in the bad state described above (with the '..' dentry in the wrong place after being moved) and that directory has enough files added to it that it becomes an htree directory, the resulting directory is corrupted more severely. We never sorted out the precise details of this - I believe we chose to simply delete any directories in this state. (I think lfsck did it for us, but can't recall for sure.) 
I'd advise reading LU-5626 with care, and I'd also suggest you might turn off 'dirdata' on your MDT until you have this under control. That will at least prevent any more directories from ending up in either of these bad states if you use the filesystem without updating Lustre to a version with the LU-5626 patch in it. - Patrick From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of Chris Hunter [chris.hun...@yale.edu] Sent: Tuesday, October 27, 2015 10:22 AM To: lustre-discuss@lists.lustre.org Subject: [lustre-discuss] recovery MDT ".." directory entries (LU-5626) We have a lustre 1.8 filesystem that was upgraded to lustre 2.x and "dirdata" feature was enabled. We encountered LU-5626/LU-2638 issue with ".." directory entries. Are there established recovery steps for this issue ? If I run fsck, the directory entries will be moved into lost+found. I assume the next step is to run the ll_recover_lost_found_objs tool ? Can you share any advice/experience about recovery ? thanks, chris hunter chris.hun...@yale.edu ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Chris, That's probably best, to be safe. By the way, this is one where (if I remember right) sometimes you run fsck, let it correct things, then you must run it again - As it will find new things to object about in the modified filesystem. So if you weren't already, running fsck repeatedly until it doesn't complain is best. (That's also a best practice anyway..) I can't find a -d or -D option in my copy of fsck. Not sure what it means? Best of luck, - Patrick On 10/27/2015 12:52 PM, Chris Hunter wrote: Hi Patrick, Thanks for sharing your experience, looks like you did the bulk of troubleshooting in the Jira ticket. I assume I should have a clean filesystem (ie. run fsck first) before disabling the dirdata feature ? After I disable dirdata, I will need to run fsck with the "-D" option ? FYI, ll_recover_lost_found_objs tool will recover files from lost+found on *OST* volumes (ie. moves them back into /O/0/dXX directory) based on extended file attributes. Section 37.5 of the HPDD manual. thanks chris hunter chris.hun...@yale.edu On 10/27/2015 12:06 PM, Patrick Farrell wrote: Chris, I had the joy of taking this one apart personally. We mostly let lfsck do the repair and moved on, accepting that some of the dentries were trashed. I think, for important things, our field staff did some manual recovery with the e2fsprogs tools, but it was not a common enough problem that we documented a procedure. If you read LU-5626 carefully, there's an explanation of the exact nature of the damage, and having that should let you make partial recoveries by hand. I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it would prove helpful in this instance. Note that there's two forms to this corruption. One is if you move a directory which was created before dirdata was enabled, then the '..' entry ends up in the wrong place. 
This does not trouble Lustre, but fsck reports it as an error and will 'correct' it, which has the effect of (usually) overwriting one dentry in the directory when it creates a new '..' dentry in the correct location. I don't *think* that one causes the MDT to go read only, but I could be wrong. I *think* what causes the MDT to go read only is the other problem: When you have a non-htree directory (not too many items in it, all directory entries in a single inode) that is in the bad state described above (with the '..' dentry in the wrong place after being moved) and that directory has enough files added to it that it becomes an htree directory, the resulting directory is corrupted more severely. We never sorted out the precise details of this - I believe we chose to simply delete any directories in this state. (I think lfsck did it for us, but can't recall for sure.) I'd advise reading LU-5626 with care, and I'd also suggest you might turn off 'dirdata' on your MDT until you have this under control. That will at least prevent any more directories from ending up in either of these bad states if you use the filesystem without updating Lustre to a version with the LU-5626 patch in it. - Patrick From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of Chris Hunter [chris.hun...@yale.edu] Sent: Tuesday, October 27, 2015 10:22 AM To: lustre-discuss@lists.lustre.org Subject: [lustre-discuss] recovery MDT ".." directory entries (LU-5626) We have a lustre 1.8 filesystem that was upgraded to lustre 2.x and "dirdata" feature was enabled. We encountered LU-5626/LU-2638 issue with ".." directory entries. Are there established recovery steps for this issue ? If I run fsck, the directory entries will be moved into lost+found. I assume the next step is to run the ll_recover_lost_found_objs tool ? Can you share any advice/experience about recovery ? 
thanks, chris hunter chris.hun...@yale.edu ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
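Patrick's advice in this message, to rerun fsck until it stops finding things to fix, can be scripted along these lines. This is a sketch under assumptions, not a documented procedure: the log file name is made up, and the `-f -y` flags mean e2fsck applies its fixes without asking, which for LU-5626 is exactly the step that can overwrite dentries, so every pass appends its full output to a log for later manual recovery. The loop relies on e2fsck's documented exit status, where 0 means no errors, 1 and 2 mean errors were corrected, and higher bits signal uncorrected errors or operational failures.

```python
import subprocess
from typing import Callable


def run_e2fsck(device: str, log_path: str = "e2fsck.log") -> int:
    """Run one forced e2fsck pass and append its full output to a log file."""
    result = subprocess.run(
        ["e2fsck", "-f", "-y", device],
        capture_output=True,
        text=True,
    )
    with open(log_path, "a") as log:
        log.write(result.stdout)
        log.write(result.stderr)
    return result.returncode


def fsck_until_clean(
    device: str,
    run_pass: Callable[[str], int] = run_e2fsck,
    max_passes: int = 5,
) -> int:
    """Repeat fsck passes until one reports no errors; return the number of passes."""
    for attempt in range(1, max_passes + 1):
        code = run_pass(device)
        if code == 0:
            return attempt
        if code & ~0x3:
            # Anything beyond "errors corrected" (bits 1 and 2) needs a human.
            raise RuntimeError(f"e2fsck exit status {code} on pass {attempt}")
    raise RuntimeError(f"{device} still not clean after {max_passes} passes")
```

Calling `fsck_until_clean("/dev/mdt_device")` with a hypothetical device path on an unmounted MDT would run the passes; `run_pass` is a parameter so the loop can be exercised with a stand-in before touching a real device.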
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Hi Patrick, Thanks for sharing your experience, looks like you did the bulk of troubleshooting in the Jira ticket. I assume I should have a clean filesystem (ie. run fsck first) before disabling the dirdata feature ? After I disable dirdata, I will need to run fsck with the "-D" option ? FYI, ll_recover_lost_found_objs tool will recover files from lost+found on *OST* volumes (ie. moves them back into /O/0/dXX directory) based on extended file attributes. Section 37.5 of the HPDD manual. thanks chris hunter chris.hun...@yale.edu On 10/27/2015 12:06 PM, Patrick Farrell wrote: Chris, I had the joy of taking this one apart personally. We mostly let lfsck do the repair and moved on, accepting that some of the dentries were trashed. I think, for important things, our field staff did some manual recovery with the e2fsprogs tools, but it was not a common enough problem that we documented a procedure. If you read LU-5626 carefully, there's an explanation of the exact nature of the damage, and having that should let you make partial recoveries by hand. I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it would prove helpful in this instance. Note that there's two forms to this corruption. One is if you move a directory which was created before dirdata was enabled, then the '..' entry ends up in the wrong place. This does not trouble Lustre, but fsck reports it as an error and will 'correct' it, which has the effect of (usually) overwriting one dentry in the directory when it creates a new '..' dentry in the correct location. I don't *think* that one causes the MDT to go read only, but I could be wrong. I *think* what causes the MDT to go read only is the other problem: When you have a non-htree directory (not too many items in it, all directory entries in a single inode) that is in the bad state described above (with the '..' 
dentry in the wrong place after being moved) and that directory has enough files added to it that it becomes an htree directory, the resulting directory is corrupted more severely. We never sorted out the precise details of this - I believe we chose to simply delete any directories in this state. (I think lfsck did it for us, but can't recall for sure.) I'd advise reading LU-5626 with care, and I'd also suggest you might turn off 'dirdata' on your MDT until you have this under control. That will at least prevent any more directories from ending up in either of these bad states if you use the filesystem without updating Lustre to a version with the LU-5626 patch in it. - Patrick From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of Chris Hunter [chris.hun...@yale.edu] Sent: Tuesday, October 27, 2015 10:22 AM To: lustre-discuss@lists.lustre.org Subject: [lustre-discuss] recovery MDT ".." directory entries (LU-5626) We have a lustre 1.8 filesystem that was upgraded to lustre 2.x and "dirdata" feature was enabled. We encountered LU-5626/LU-2638 issue with ".." directory entries. Are there established recovery steps for this issue ? If I run fsck, the directory entries will be moved into lost+found. I assume the next step is to run the ll_recover_lost_found_objs tool ? Can you share any advice/experience about recovery ? thanks, chris hunter chris.hun...@yale.edu ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
> On Oct 27, 2015, at 11:22 AM, Chris Hunter wrote:
>
> We have a lustre 1.8 filesystem that was upgraded to lustre 2.x and "dirdata"
> feature was enabled. We encountered LU-5626/LU-2638 issue with ".." directory
> entries. Are there established recovery steps for this issue ?
>
> If I run fsck, the directory entries will be moved into lost+found.
> I assume the next step is to run the ll_recover_lost_found_objs tool ?
>
> Can you share any advice/experience about recovery ?

I only recall seeing the bug once on my file system (about a year after we upgraded), so it really hasn’t been a problem. It has been a while, so I don’t remember the details. But I think I just handled it by not letting fsck make any “corrections”.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Rick,

That's something of a time bomb - If one of those directories fsck wishes it could correct is small and grows in number of files, you'll get the MDT going read only (and a few odd LBUGs if you try to put it back).

- Patrick

On 10/27/2015 12:18 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
> On Oct 27, 2015, at 11:22 AM, Chris Hunter wrote:
> > We have a lustre 1.8 filesystem that was upgraded to lustre 2.x and "dirdata"
> > feature was enabled. We encountered LU-5626/LU-2638 issue with ".." directory
> > entries. Are there established recovery steps for this issue ?
> >
> > If I run fsck, the directory entries will be moved into lost+found.
> > I assume the next step is to run the ll_recover_lost_found_objs tool ?
> >
> > Can you share any advice/experience about recovery ?
>
> I only recall seeing the bug once on my file system (about a year after we
> upgraded), so it really hasn’t been a problem. It has been a while, so I
> don’t remember the details. But I think I just handled it by not letting
> fsck make any “corrections”.
>
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org