> On Oct 27, 2015, at 1:46 PM, Patrick Farrell <p...@cray.com> wrote:
> 
> That's something of a time bomb - If one of those directories fsck wishes it 
> could correct is small and grows in number of files, you'll get the MDT going 
> read only (and a few odd LBUGs if you try to put it back).

I was looking back over the incident where I thought I had hit this bug, but 
based on the lack of side effects that you mentioned, I am now starting to 
think that I was mistaken.  Nevertheless, I am trying to understand the bug a 
little better in case I am still susceptible to it.  I tried to summarize my 
understanding below, and maybe you can tell me if I am correct.

For HTree directories, the problem is described in LU-2638.  But since I am 
running Lustre >2.4, I should not be affected by this bug.

For non-HTree directories, the problem is described in LU-5626.  In order to 
trigger the bug, the following steps must happen:

1) A non-HTree directory created under Lustre 1.8 (which does not have a FID 
for its “..” entry) gets moved to a different parent directory.

2) Lustre tries to update the “..” entry in the directory, and if there is not 
enough space in the existing entry, it creates a new “..” entry and adds the 
FID.

3) Something happens to the MDT, and fsck needs to be run.  When it runs, it 
notices that “..” is no longer the second entry in the directory.

4) fsck tries to “fix” the problem by moving the “..” entry back to its 
original position.  With the FID in place, there is not enough space in the 
original position, but fsck moves it anyway, which causes the “..” entry to 
overwrite part of the third entry in the directory.
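
To put rough numbers on step 4, here is a small sketch of the space
arithmetic.  The sizes are my assumptions rather than anything taken from the
tickets: an 8-byte ext4-style entry header, 4-byte alignment, and a dirdata
payload of one NUL byte, one length byte, and a 16-byte FID stored after the
name.  The names rec_len, FID_EXTRA, and bytes_overwritten are made up for
illustration.

```python
HEADER = 8  # inode (4) + rec_len (2) + name_len (1) + file_type (1)

# Assumed dirdata payload after the name: NUL + length byte + 16-byte FID.
FID_EXTRA = 1 + 1 + 16


def rec_len(name_len, extra=0):
    """Minimum size of a directory entry, rounded up to a multiple of 4."""
    return (HEADER + name_len + extra + 3) & ~3


def bytes_overwritten(slot_len):
    """Bytes of the following entry clobbered if a FID-bearing ".." entry
    is forced into a slot of slot_len bytes (0 if it fits)."""
    return max(0, rec_len(2, FID_EXTRA) - slot_len)


plain_dotdot = rec_len(2)            # 12 bytes without a FID
fid_dotdot = rec_len(2, FID_EXTRA)   # 28 bytes with a FID
print(plain_dotdot, fid_dotdot, bytes_overwritten(plain_dotdot))
# prints: 12 28 16
```

Under those assumptions, a “..” entry that originally occupied a 12-byte slot
needs 28 bytes once the FID is attached, so forcing it back into place would
run 16 bytes into the third entry.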

If that is correct, then steps #1 and #2 can happen without causing any 
problems.  It is only at steps #3 and #4 that the corruption occurs, and as 
long as dirdata is disabled before fsck is run, then there should not be any 
problems.

Is that explanation accurate?

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
