Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

2015-11-04 Thread Dilger, Andreas
On 2015/11/04, 02:42, "lustre-discuss on behalf of Martin Hecht" wrote:

>On 11/04/2015 03:23 AM, Patrick Farrell wrote:
>> PAF: Remember, the specific conditions are pretty tight.  Created under
>>1.8, not empty (if it's empty, the .. dentry is not misplaced when
>>moved) but also non-htree, then moved with dirdata enabled, and then
>>grown to this larger size.  How many existing (small) directories do you
>>move and then add a bunch of files to?  It's a pretty rare operation.
>>We only hit it at Martin's site because of an automated tool they have
>>to re-arrange user/job directories.
>Well, not only because of the tool. In fact, once the tool has moved the
>directories, no files are added to them anymore. However, our mechanism
>gives users a reason to move their data from time to time (that's not
>the intention of the mechanism, but that's how some users react).
>
>But I'm not quite sure anymore if moving the directories is really a
>precondition to run into LU-5626.
>We have run the background lfsck which adds the FID to the existing
>dentries. This might be an important detail, because in our case a
>second '..' entry containing the FID was presumably created by lfsck (in
>the wrong place), and not by moving the directory. To my current
>understanding the user then only has to add some files to trigger the
>LBUG.
>A subsequent e2fsck will not only find this particular directory but all
>other small directories with a '..' entry in the wrong place. When
>e2fsck tries to fix these directories, some entries are overwritten by
>the FID and these files are then moved to lost+found.

Note that newer versions of LFSCK namespace checking (2.6 or 2.7, don't
recall offhand) will be able to return such entries from lost+found back
into the proper parent directory in the namespace, assuming they were
created under 2.x.  Lustre stores an extra "link" xattr on each inode with
the filename and parent directory FID for each link to the file (up to the
available xattr space for each inode), so in case of directory corruption
it would be possible to rebuild the directory structure just from the
"link" xattrs on each file.

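As a rough illustration of that idea (a sketch only, not how LFSCK is
actually implemented): if each inode's "link" xattr is modelled as a list
of (parent FID, name) pairs, the namespace can be rebuilt by inverting
those records into a parent-to-children map. The FID strings, the ROOT
label and the record layout below are simplified assumptions, not the
on-disk format.

```python
from collections import defaultdict

ROOT_FID = "ROOT"  # hypothetical label standing in for the root directory FID


def rebuild_namespace(link_records):
    """Invert per-inode link records into a parent -> {name: child} map.

    link_records maps a child FID to a list of (parent_fid, name) pairs,
    one pair per hard link, mimicking what the "link" xattr records.
    """
    children = defaultdict(dict)
    for child_fid, links in link_records.items():
        for parent_fid, name in links:
            children[parent_fid][name] = child_fid
    return children


def list_paths(children, fid=ROOT_FID, prefix=""):
    """Walk the rebuilt map from the root and return every path found.

    Assumes the records contain no cycles; a real tool would have to
    guard against that.
    """
    paths = []
    for name, child in sorted(children.get(fid, {}).items()):
        path = f"{prefix}/{name}"
        paths.append(path)
        paths.extend(list_paths(children, child, path))
    return paths


if __name__ == "__main__":
    records = {
        "f1": [("ROOT", "home")],
        "f2": [("f1", "user")],
        "f3": [("f2", "a.dat"), ("f1", "hardlink")],  # two links to one file
    }
    for p in list_paths(rebuild_namespace(records)):
        print(p)
```

Because every link carries its own parent and name, a file with several
hard links shows up under each of its parents, which is why the directory
contents themselves are not needed for the rebuild.
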
In the meantime, I attached a script to LU-5626 that could be used to
re-link files from lost+found into the right directory and filename based
on the output from e2fsck.  It is a bit rough (needs manual editing of
pathnames), but may be useful if someone has hit this problem.

Cheers, Andreas

>If one of these first entries happens to be a small subdirectory, I
>believe there is a chance to run into the same issue again, when you
>move everything back to the original location after the e2fsck and
>someone starts adding files in these subdirectories.
>
>However, the preconditions are still quite narrow: small directories,
>not empty, created without fid, then converted by lfsck (or
>alternatively moved to a different place which would also create the
>second '..' entry). To trigger the LBUG, files need to be added to one of
>these directories, and for a second occurrence of the LBUG the same
>conditions must hold for another subdirectory that was at the very
>beginning of the directory.
>
>Martin
>
>
>


Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

2015-11-04 Thread Chris Hunter



On 11/02/2015 12:30 PM, Martin Hecht wrote:

Hi Chris and Patrick,

I was sick last week, so I did not see this conversation until today,
sorry.

On 10/27/2015 05:06 PM, Patrick Farrell wrote:

If you read LU-5626 carefully, there's an explanation of the exact nature of 
the damage, and having that should let you make partial recoveries by hand.  
I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it would 
prove helpful in this instance.

there is no tool like ll_recover_lost_found_objs for the MDT. On OSTs
this would be the right choice.


Note that there are two forms of this corruption.  In the first, you move a 
directory which was created before dirdata was enabled, and the '..' entry ends 
up in the wrong place.  This does not trouble Lustre, but fsck reports it as an 
error and will 'correct' it, which (usually) overwrites one dentry in the 
directory when it creates a new '..' dentry in the correct location.

I don't *think* that one causes the MDT to go read only, but I could be wrong.  
I *think* what causes the MDT to go read only is the other problem:

When you have a non-htree directory (not too many items in it, all directory 
entries in a single block) that is in the bad state described above (with the 
'..' dentry in the wrong place after being moved) and that directory has enough 
files added to it that it becomes an htree directory, the resulting directory 
is corrupted more severely.  We never sorted out the precise details of this - 
I believe we chose to simply delete any directories in this state.  (I think 
lfsck did it for us, but can't recall for sure.)

If I recall correctly, moving (or renaming) the corrupted directory to
another place caused the MDT to go read-only; adding more files, as
Patrick wrote, is probably another trigger.

In our case we captured the full output of e2fsck, which contained the
original names and the inodes. fsck moved some of the files and
subdirectories of the corrupted directories to lost+found. With the
information contained in the e2fsck output we could move them back from
lost+found to their original place on the ldiskfs level (I parsed the
e2fsck output for a pattern matching the inode numbers and generated a
script from it). We had to repeat this a couple of times, because some
of the subdirectories moved to lost+found were either in bad shape
themselves or were damaged further when their owners later added files
to them or moved them around.
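
A minimal sketch of that kind of log-to-script step might look like the
following. It is not the script that was actually used. The line format
matched by ENTRY_RE is a made-up placeholder, not real e2fsck message
text, and has to be adapted to what your captured log contains; the mount
point /mnt/mdt is likewise hypothetical. The sketch also assumes that
e2fsck names reconnected inodes "#<inode number>" in lost+found.

```python
import re
import shlex

# Placeholder pattern for a log line that names the lost entry, its
# original directory and its inode number. This is an assumed format,
# NOT the actual e2fsck wording; replace it to match your own log.
ENTRY_RE = re.compile(
    r"entry '(?P<name>[^']+)' in (?P<dir>\S+) .*inode (?P<ino>\d+)"
)


def relink_commands(log_text, mdt_mount="/mnt/mdt", pattern=ENTRY_RE):
    """Return 'mv' commands that move lost+found entries back in place.

    mdt_mount is the (hypothetical) path where the MDT is mounted as
    ldiskfs. Nothing is executed; the commands are only generated so
    that they can be reviewed by hand first.
    """
    cmds = []
    for line in log_text.splitlines():
        m = pattern.search(line)
        if not m:
            continue  # skip lines that do not describe a lost entry
        src = f"{mdt_mount}/lost+found/#{m['ino']}"
        dst = f"{mdt_mount}{m['dir']}/{m['name']}"
        cmds.append(f"mv {shlex.quote(src)} {shlex.quote(dst)}")
    return cmds


if __name__ == "__main__":
    sample = (
        "entry 'data.txt' in /ROOT/home/alice (4711) overwritten, inode 9001\n"
        "some unrelated line\n"
    )
    print("\n".join(relink_commands(sample)))
```

Generating the commands as text rather than running them fits the
situation described above: the result can be checked and edited before
anything is touched on the MDT, and the step can be repeated after a
later e2fsck run.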

So, if you have captured all your e2fsck output and you haven't yet
cleaned up lost+found, you can still recover the data. lfsck would
probably throw away the objects on the OSTs because it thinks they are
orphan objects left over after deleting the files.

best regards,
Martin


Yes, I believe you want to (manually) recover the directories from 
lost+found back to ROOT on the MDT before lfsck/oi_scrub runs. I don't 
think lfsck on the MDT will impact orphan objects on the OSTs.


regards,
chris hunter



Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

2015-11-04 Thread Patrick Farrell
Martin,

Our observation at the time was that lfsck did not add the fid to the .. dentry 
unless there was already space in the appropriate location.  I don't remember 
digging in to the details.  (Since it meant lfsck namespace was behaving, in a 
sense, correctly, we were initially puzzled but decided it was all right.  I 
seem to remember reading a comment somewhere that the developers decided 
rearranging the dentries was too hard, so they'd only add fids where space was 
already present.)

It's possible we didn't get that quite right, though it would have to be 
partial somehow - misplaced .. dentries with fids were definitely not universal 
after running the namespace lfsck. (Drawing on experience from other sites here 
as well.)

In any case, directories with bad .. dentries can be identified with fsck 
anyway.

- Patrick




Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

2015-11-04 Thread Martin Hecht
On 11/04/2015 03:23 AM, Patrick Farrell wrote:
> PAF: Remember, the specific conditions are pretty tight.  Created under 1.8, 
> not empty (if it's empty, the .. dentry is not misplaced when moved) but also 
> non-htree, then moved with dirdata enabled, and then grown to this larger 
> size.  How many existing (small) directories do you move and then add a bunch 
> of files to?  It's a pretty rare operation.  We only hit it at Martin's site 
> because of an automated tool they have to re-arrange user/job directories.
Well, not only because of the tool. In fact, once the tool has moved the
directories, no files are added to them anymore. However, our mechanism
gives users a reason to move their data from time to time (that's not
the intention of the mechanism, but that's how some users react).

But I'm not quite sure anymore if moving the directories is really a
precondition to run into LU-5626.
We have run the background lfsck which adds the FID to the existing
dentries. This might be an important detail, because in our case a
second '..' entry containing the FID was presumably created by lfsck (in
the wrong place), and not by moving the directory. To my current
understanding the user then only has to add some files to trigger the LBUG.
A subsequent e2fsck will not only find this particular directory but all
other small directories with a '..' entry in the wrong place. When
e2fsck tries to fix these directories, some entries are overwritten by
the FID and these files are then moved to lost+found.
If one of these first entries happens to be a small subdirectory, I
believe there is a chance to run into the same issue again, when you
move everything back to the original location after the e2fsck and
someone starts adding files in these subdirectories.

However, the preconditions are still quite narrow: small directories,
not empty, created without fid, then converted by lfsck (or
alternatively moved to a different place which would also create the
second '..' entry). To trigger the LBUG, files need to be added to one of
these directories, and for a second occurrence of the LBUG the same
conditions must hold for another subdirectory that was at the very
beginning of the directory.

Martin



