Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

2015-09-11 Thread Martin Hecht
a few more comments in-line

On 09/10/2015 09:11 PM, Lewis Hyatt wrote:
> Thanks a lot for the info, a little more optimistic :-).
>
> -Lewis
>
> On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>> Lewis,
>>
>> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for
>> the most part things went pretty good.  I’ll chime in on a couple of
>> Martin’s points and mention a few other things.
>>
>>> On Sep 10, 2015, at 9:30 AM, Martin Hecht  wrote:
>>>
>>> In any case the file systems should be clean before starting the
>>> upgrade, so I would recommend to run e2fsck on all targets and repair
>>> them before starting the upgrade. We did so, but unfortunately our
>>> e2fsprogs were not really up to date and after our lustre upgrade a lot
>>> of fixes for e2fsprogs were committed to Whamcloud's e2fsprogs git. So,
>>> probably some errors on the file systems were still present, but
>>> unnoticed when we did the upgrade.
>>
>> This is a very important point.  While I didn’t run e2fsck before the
>> upgrade (but maybe I should have), I made sure to install the latest
>> e2fsprogs.
Well, a version of e2fsprogs with some important fixes was released
shortly after we did the upgrade. Maybe this was just because we ran
into these bugs and the vendor escalated our tickets to Whamcloud/Intel.
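
To make the "check every target first" step concrete, here is a minimal
sketch that only builds the e2fsck command lines, one per target. The
device paths are made up for illustration, and whether you want the
read-only pass (-n) or the automatic-repair pass (-p) is your call; check
`e2fsck -V` first so you know which e2fsprogs you are actually running.

```python
import shlex


def e2fsck_commands(targets, read_only=True):
    """Build one e2fsck command line per ldiskfs target device."""
    # -f forces a check even if the file system is marked clean.
    # -n opens the device read-only and answers "no" to every question.
    # -p instead repairs whatever can be fixed safely without prompting.
    mode = "-n" if read_only else "-p"
    return [["e2fsck", "-f", mode, dev] for dev in targets]


if __name__ == "__main__":
    # Hypothetical device paths; substitute your own MDT/OST devices.
    targets = ["/dev/mapper/mdt0", "/dev/mapper/ost0", "/dev/mapper/ost1"]
    for cmd in e2fsck_commands(targets):
        print(shlex.join(cmd))
```

The commands are only printed, not executed, so the list can be reviewed
before anything touches the (unmounted) targets.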

>>
>>> Lustre 2 introduces the FID (which is something like an inode number,
>>> where lustre 1.8 used the inode number of the underlying ldiskfs, but
>>> with the possibility to have several MDTs in one file system a
>>> replacement was needed). The FID is stored in the inode, but it can also
>>> be activated that the FIDs are stored in the directory node, which makes
>>> lookups faster, especially when there are many files in a directory.
>>> However, there were bugs in the code that takes care about adding the
>>> FID to the directory entry when the file system is converted from 1.8 to
>>> 2.x. So, I would recommend to use a version in which these bugs are
>>> solved. We went to 2.4.1 that time. By default this fid_in_dirent
>>> feature is not automatically enabled, however, this is the only point
>>> where a performance boost may be expected... so we took the risk to
>>> enable this... and ran into some bugs.
>>
>> Enabling fid_in_dirent prevents you from backing out of the upgrade. 
>> In theory, if you upgraded to Lustre 2.x without enabling
>> fid_in_dirent, you could always revert back to Lustre 1.8.  We tried
>> this on a test system, and the downgrade seemed to work.  However,
>> this was a small scale test and I have never tried it on a production
>> file system.  But if you want to minimize possible complications, you
>> could always leave this disabled for a while after the upgrade, and
>> then if things are going well, enable it later on.
Actually, the FID is added to new content only, and you have to run
oi_scrub once to convert the rest of the file system. That might be
important to know when you decide to use this feature. On the other hand,
if you don't enable fid_in_dirent, you can theoretically go back, but I
think the FID is still added to regular files (not to the directory
entry), and you can't read files created with lustre 2 after the
downgrade. However, running lustre 2 without fid_in_dirent is possible,
at least in the earlier 2.x versions; from about 2.5 onwards you would
have to double check. This is sometimes called "Compatibility Mode IGIF".
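
As a rough illustration of what IGIF means: a FID is a triple of
sequence, object id and version, and an IGIF FID is one derived from the
old inode number and generation instead of a newly allocated sequence.
The sketch below assumes the usual bracketed notation
[0xSEQ:0xOID:0xVER] and an IGIF sequence range of 12 to 0xffffffff; both
are my assumptions and not something stated in this thread, so verify
them against the headers of your Lustre release. The example FIDs are
invented.

```python
from typing import NamedTuple

# Assumed bounds of the IGIF sequence range (inode and generation in FID).
FID_SEQ_IGIF = 12
FID_SEQ_IGIF_MAX = 0xFFFFFFFF


class Fid(NamedTuple):
    seq: int  # sequence number (the inode number for an IGIF FID)
    oid: int  # object id (the inode generation for an IGIF FID)
    ver: int  # version


def parse_fid(text):
    """Parse a FID written as [0xSEQ:0xOID:0xVER]."""
    seq, oid, ver = (int(part, 16) for part in text.strip("[]").split(":"))
    return Fid(seq, oid, ver)


def is_igif(fid):
    """True if the sequence falls into the assumed IGIF range."""
    return FID_SEQ_IGIF <= fid.seq <= FID_SEQ_IGIF_MAX


if __name__ == "__main__":
    # Invented examples: one 1.8-style object, one created under 2.x.
    for text in ("[0x1234:0x5678:0x0]", "[0x200000400:0x1:0x0]"):
        fid = parse_fid(text)
        print(text, "IGIF" if is_igif(fid) else "normal FID")
```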

Anyhow, to avoid running into the problem with the directory entries, I
would also recommend either not enabling fid_in_dirent or making sure to
choose a version which has all the fixes for this problem. There are
different types of directories, large and small ones, which have a
different structure. The issue was already fixed for some cases, but we
hit another case which was not handled correctly until we ran into that
bug with our upgrade.

>>
>> My only other advice is to test as much as possible prior to the
>> upgrade.  If you have a little test hardware, install the same Lustre
>> 1.8 version you are currently running in production and then try
>> upgrading that to the new Lustre version.  I think preparation is the
>> key.  I think I spent about 2 months reading about upgrade
>> procedures, talking with others who have upgraded, reading JIRA bug
>> reports, and running tests on hardware.
well, our vendor was preparing the upgrade for about a year and did
intensive testing on several file systems and they changed the targeted
lustre version several times. The problem is that some bugs are only hit
on the real production system. For instance the fid_in_dirent issue: It
depends on the number of files in the directory, and you only notice the
bug when you have upgraded the file system and try to move some files
from such a directory to another place. I'm not sure if it has to be a
directory created after the upgrade, maybe the destination just has to
be a different directory. But to be honest you wouldn't test this

Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-11 Thread Martin Hecht
On 09/11/2015 05:23 AM, Dilger, Andreas wrote:
> On 2015/09/10, 6:54 PM, "Chris Hunter"  wrote:
>
>> We experienced file corruption on several OSTs. We proceeded through
>> recovery using e2fsck & ll_recover_lost_found_obj tools.
>> Following these steps, e2fsck came out clean.
>>
>> The file corruption did not impact the MDT. The files were still
>> referenced by the MDT. Accessing the file on a lustre client (ie. ls -l)
>> would report error "Cannot allocate memory"
>>
>> Following OST recovery steps, we started removing the corrupt files via
>> "unlink" command on lustre client (rm command would not remove file).
>>
>> Now dry-run e2fsck of the OST is reporting errors:
>> "deleted/unused inodes" in Pass 2 (checking directory structure),
>> "Unattached inodes" in Pass 4 (checking reference counts)
>> "free block count wrong" in Pass 5 (checking group summary information).
>>
>> Are e2fsck errors expected when unlinking files?
> No, the "unlink" command is just avoiding the -ENOENT error that "rm" gets
> by calling "stat()" on the file before trying to unlink it.  This
> shouldn't cause any errors on the OSTs, unless there is ongoing corruption
> from the back-end storage.
Chris, by "live filesystem" do you mean that you ran a read-only e2fsck
on a lustre file system while it was mounted and clients were working on
the file system? Then it is expected that e2fsck reports some errors,
because the file system contents change while e2fsck is running and the
in-memory directory structure no longer matches the on-disk data.
However, as Andreas points out, it might as well be a sign of ongoing
corruption on the storage, but only an offline e2fsck (i.e. while the
OST is unmounted, and the journal is played back) can clarify this.
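
A small guard along these lines can keep you from checking a mounted
target by accident. It is only a sketch: the device name and the sample
mount table are made up, and it compares the device string literally, so
a device-mapper alias or symlink would slip through. Also note that
e2fsck -n never replays a journal, so the read-only command it returns
is only meaningful for a target that was unmounted cleanly.

```python
# Made-up excerpt in /proc/mounts format, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/mapper/ost0 /mnt/ost0 lustre ro 0 0
"""


def is_mounted(device, mounts_text):
    """True if `device` is the source of any mount in /proc/mounts text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


def offline_check_command(device, mounts_text):
    """Return a read-only e2fsck command, refusing mounted devices."""
    if is_mounted(device, mounts_text):
        raise RuntimeError(f"{device} is mounted; unmount the OST first")
    return ["e2fsck", "-f", "-n", device]


if __name__ == "__main__":
    # ost1 does not appear in the sample table, so a command is returned.
    print(offline_check_command("/dev/mapper/ost1", SAMPLE_MOUNTS))
    try:
        offline_check_command("/dev/mapper/ost0", SAMPLE_MOUNTS)
    except RuntimeError as err:
        print("refused:", err)
```

On a real server you would pass in the contents of /proc/mounts instead
of the sample string.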

regards,
Martin



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

2015-09-11 Thread Patrick Farrell
Having an MDT backup might perhaps have allowed recovery and trying an improved 
upgrade process and/or upgrading to a version with the fixes in it.  It's not a 
bad idea if practical.  (And yes, the changes are MDT specific.)

By the way, the fid-in-dirent bug that Martin described is fixed in the most 
recent 2.5 from Intel, but I don't think it's fixed in 2.4?  Unsure.
But I'd recommend targeting 2.5 as the destination version for an upgrade.

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
Chris Hunter [chris.hun...@yale.edu]
Sent: Friday, September 11, 2015 8:02 AM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

Hi
I believe FID & dirdata feature changes would only affect the MDT during
a lustre upgrade. In hindsight/retrospective do you think a file-level
backup/restore of the MDT would have avoided some of these issues ?

thanks
chris hunter

> On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>> Lewis,
>>
>> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for the most 
>> part things went pretty good.  I'll chime in on a couple of Martin's points 
>> and mention a few other things.
>>
>>> On Sep 10, 2015, at 9:30 AM, Martin Hecht  wrote:
>>>
>>> In any case the file systems should be clean before starting the
>>> upgrade, so I would recommend to run e2fsck on all targets and repair
>>> them before starting the upgrade. We did so, but unfortunately our
>>> e2fsprogs were not really up to date and after our lustre upgrade a lot
>>> of fixes for e2fsprogs were committed to Whamcloud's e2fsprogs git. So,
>>> probably some errors on the file systems were still present, but
>>> unnoticed when we did the upgrade.
>>
>> This is a very important point.  While I didn't run e2fsck before the 
>> upgrade (but maybe I should have), I made sure to install the latest 
>> e2fsprogs.
>>
>>> Lustre 2 introduces the FID (which is something like an inode number,
>>> where lustre 1.8 used the inode number of the underlying ldiskfs, but
>>> with the possibility to have several MDTs in one file system a
>>> replacement was needed). The FID is stored in the inode, but it can also
>>> be activated that the FIDs are stored in the directory node, which makes
>>> lookups faster, especially when there are many files in a directory.
>>> However, there were bugs in the code that takes care about adding the
>>> FID to the directory entry when the file system is converted from 1.8 to
>>> 2.x. So, I would recommend to use a version in which these bugs are
>>> solved. We went to 2.4.1 that time. By default this fid_in_dirent
>>> feature is not automatically enabled, however, this is the only point
>>> where a performance boost may be expected... so we took the risk to
>>> enable this... and ran into some bugs.
>>
>> Enabling fid_in_dirent prevents you from backing out of the upgrade.  In 
>> theory, if you upgraded to Lustre 2.x without enabling fid_in_dirent, you 
>> could always revert back to Lustre 1.8.  We tried this on a test system, and 
>> the downgrade seemed to work.  However, this was a small scale test and I 
>> have never tried it on a production file system.  But if you want to 
>> minimize possible complications, you could always leave this disabled for a 
>> while after the upgrade, and then if things are going well, enable it later 
>> on.
>>
>>> LU-4504 quota out of sync: turn off quota, run e2fsck, turn it on again
>>> - I believe that's something which must be done anyhow quite often,
>>> because there is no quotacheck anymore. It's run in the background when
>>> enabling quotas, but file systems have to be unmounted for this.
>>
>> We didn't exactly hit this bug, but I will mention that we have had a couple 
>> of instances where e2fsck complained about problems on an OST, and it turned 
>> out that we had to disable and re-enable quotas on the OST to correct the 
>> issue.
>>
>>> LU-4743: We had to remove the CATALOGS file on another file system
>>> (otherwise the MDT wouldn't mount)
>>
>> We hit this problem.
>>
>> Someone I know had to do a Lustre upgrade, and they suggested that I apply a 
>> patch for LU-4708 (which I did).  But if you upgrade to Lustre 2.5.2 or 
>> later, that patch should already be included.
>>
>> My only other advice is to test as much as possible prior to the upgrade.  
>> If you have a little test hardware, install the same Lustre 1.8 version you 
>> are currently running in production and then try upgrading that to the new 
>> Lustre version.  I think preparation is the key.  I think I spent about 2 
>> months reading about upgrade procedures, talking with others who have 
>> upgraded, reading JIRA bug reports, and running tests on hardware.

Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-11 Thread Chris Hunter



On 09/11/2015 03:41 AM, Martin Hecht wrote:
> On 09/11/2015 05:23 AM, Dilger, Andreas wrote:
>> On 2015/09/10, 6:54 PM, "Chris Hunter"  wrote:
>>
>>> We experienced file corruption on several OSTs. We proceeded through
>>> recovery using e2fsck & ll_recover_lost_found_obj tools.
>>> Following these steps, e2fsck came out clean.
>>>
>>> The file corruption did not impact the MDT. The files were still
>>> referenced by the MDT. Accessing the file on a lustre client (ie. ls -l)
>>> would report error "Cannot allocate memory"
>>>
>>> Following OST recovery steps, we started removing the corrupt files via
>>> "unlink" command on lustre client (rm command would not remove file).
>>>
>>> Now dry-run e2fsck of the OST is reporting errors:
>>> "deleted/unused inodes" in Pass 2 (checking directory structure),
>>> "Unattached inodes" in Pass 4 (checking reference counts)
>>> "free block count wrong" in Pass 5 (checking group summary information).
>>>
>>> Are e2fsck errors expected when unlinking files?
>>
>> No, the "unlink" command is just avoiding the -ENOENT error that "rm" gets
>> by calling "stat()" on the file before trying to unlink it.  This
>> shouldn't cause any errors on the OSTs, unless there is ongoing corruption
>> from the back-end storage.
>
> Chris, by "live filesystem" do you mean that you ran a read-only e2fsck
> on a lustre file system while it was mounted and clients were working on
> the file system? Then it is expected that e2fsck reports some errors,
> because the file system contents change while e2fsck is running and the
> in-memory directory structure no longer matches the on-disk data.
> However, as Andreas points out, it might as well be a sign of ongoing
> corruption on the storage, but only an offline e2fsck (i.e. while the
> OST is unmounted, and the journal is played back) can clarify this.

Hi Martin, good point. The filesystem is active (3 clients) so e2fsck
errors could be due to uncommitted journal transactions.
It would be nice to rule out underlying hardware issues before we do a
full e2fsck.

thanks,
chris hunter


Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

2015-09-11 Thread Chris Hunter

Hi
I believe FID & dirdata feature changes would only affect the MDT during 
a lustre upgrade. In hindsight/retrospective do you think a file-level 
backup/restore of the MDT would have avoided some of these issues ?


thanks
chris hunter


> On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>> Lewis,
>>
>> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for the most
>> part things went pretty good.  I'll chime in on a couple of Martin's points
>> and mention a few other things.
>>
>>> On Sep 10, 2015, at 9:30 AM, Martin Hecht  wrote:
>>>
>>> In any case the file systems should be clean before starting the
>>> upgrade, so I would recommend to run e2fsck on all targets and repair
>>> them before starting the upgrade. We did so, but unfortunately our
>>> e2fsprogs were not really up to date and after our lustre upgrade a lot
>>> of fixes for e2fsprogs were committed to Whamcloud's e2fsprogs git. So,
>>> probably some errors on the file systems were still present, but
>>> unnoticed when we did the upgrade.
>>
>> This is a very important point.  While I didn't run e2fsck before the
>> upgrade (but maybe I should have), I made sure to install the latest
>> e2fsprogs.
>>
>>> Lustre 2 introduces the FID (which is something like an inode number,
>>> where lustre 1.8 used the inode number of the underlying ldiskfs, but
>>> with the possibility to have several MDTs in one file system a
>>> replacement was needed). The FID is stored in the inode, but it can also
>>> be activated that the FIDs are stored in the directory node, which makes
>>> lookups faster, especially when there are many files in a directory.
>>> However, there were bugs in the code that takes care about adding the
>>> FID to the directory entry when the file system is converted from 1.8 to
>>> 2.x. So, I would recommend to use a version in which these bugs are
>>> solved. We went to 2.4.1 that time. By default this fid_in_dirent
>>> feature is not automatically enabled, however, this is the only point
>>> where a performance boost may be expected... so we took the risk to
>>> enable this... and ran into some bugs.
>>
>> Enabling fid_in_dirent prevents you from backing out of the upgrade.  In
>> theory, if you upgraded to Lustre 2.x without enabling fid_in_dirent, you
>> could always revert back to Lustre 1.8.  We tried this on a test system, and
>> the downgrade seemed to work.  However, this was a small scale test and I
>> have never tried it on a production file system.  But if you want to
>> minimize possible complications, you could always leave this disabled for a
>> while after the upgrade, and then if things are going well, enable it later
>> on.
>>
>>> LU-4504 quota out of sync: turn off quota, run e2fsck, turn it on again
>>> - I believe that's something which must be done anyhow quite often,
>>> because there is no quotacheck anymore. It's run in the background when
>>> enabling quotas, but file systems have to be unmounted for this.
>>
>> We didn't exactly hit this bug, but I will mention that we have had a couple
>> of instances where e2fsck complained about problems on an OST, and it turned
>> out that we had to disable and re-enable quotas on the OST to correct the
>> issue.
>>
>>> LU-4743: We had to remove the CATALOGS file on another file system
>>> (otherwise the MDT wouldn't mount)
>>
>> We hit this problem.
>>
>> Someone I know had to do a Lustre upgrade, and they suggested that I apply a
>> patch for LU-4708 (which I did).  But if you upgrade to Lustre 2.5.2 or
>> later, that patch should already be included.
>>
>> My only other advice is to test as much as possible prior to the upgrade.
>> If you have a little test hardware, install the same Lustre 1.8 version you
>> are currently running in production and then try upgrading that to the new
>> Lustre version.  I think preparation is the key.  I think I spent about 2
>> months reading about upgrade procedures, talking with others who have
>> upgraded, reading JIRA bug reports, and running tests on hardware.
