Re: [lustre-discuss] 1.8 client on 3.13.0 kernel
a few more comments in-line

On 09/10/2015 09:11 PM, Lewis Hyatt wrote:
> Thanks a lot for the info, a little more optimistic :-).
>
> -Lewis
>
> On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>> Lewis,
>>
>> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for the most part things went pretty good. I’ll chime in on a couple of Martin’s points and mention a few other things.
>>
>>> On Sep 10, 2015, at 9:30 AM, Martin Hecht wrote:
>>>
>>> In any case the file systems should be clean before starting the upgrade, so I would recommend running e2fsck on all targets and repairing them before starting the upgrade. We did so, but unfortunately our e2fsprogs were not really up to date, and after our lustre upgrade a lot of fixes for e2fsprogs were committed to Whamcloud's e2fsprogs git. So some errors on the file systems were probably still present, but unnoticed, when we did the upgrade.
>>
>> This is a very important point. While I didn’t run e2fsck before the upgrade (but maybe I should have), I made sure to install the latest e2fsprogs.

well, a version of the e2fsprogs with some important fixes was released shortly after we did the upgrade. Maybe this was just because we ran into these bugs, and the vendor escalated our tickets to Whamcloud/Intel.

>>> Lustre 2 introduces the FID (which is something like an inode number, where lustre 1.8 used the inode number of the underlying ldiskfs, but with the possibility to have several MDTs in one file system a replacement was needed). The FID is stored in the inode, but the FIDs can also be stored in the directory node if that is activated, which makes lookups faster, especially when there are many files in a directory. However, there were bugs in the code that takes care of adding the FID to the directory entry when the file system is converted from 1.8 to 2.x. So, I would recommend using a version in which these bugs are solved. We went to 2.4.1 at that time. By default this fid_in_dirent feature is not automatically enabled; however, this is the only point where a performance boost may be expected... so we took the risk of enabling it... and ran into some bugs.
>>
>> Enabling fid_in_dirent prevents you from backing out of the upgrade. In theory, if you upgraded to Lustre 2.x without enabling fid_in_dirent, you could always revert back to Lustre 1.8. We tried this on a test system, and the downgrade seemed to work. However, this was a small scale test and I have never tried it on a production file system. But if you want to minimize possible complications, you could always leave this disabled for a while after the upgrade, and then if things are going well, enable it later on.

actually, the FID is added to new contents, and you have to run the oi_scrub once to convert the file system. That might be important to know when you decide to use this feature. On the other hand, if you don't enable fid_in_dirent, you can theoretically go back, but I think the FID is still added to regular files (not to the directory entry), and you can't read these files created with lustre 2 after the downgrade. However, running lustre 2 without fid_in_dirent is possible, at least in the earlier 2.x versions; from about 2.5 onwards you would have to double check. This is sometimes called "Compatibility Mode IGIF".

Anyhow, to avoid running into the problem with the directory entries, I would also recommend either not enabling fid_in_dirent or making sure to choose a version which has all the fixes for this problem. There are different types of directories, large and small ones, which have different structures. The issue was already fixed for some cases, but we hit another case which was not correctly handled until we ran into that bug with our upgrade.

>> My only other advice is to test as much as possible prior to the upgrade. If you have a little test hardware, install the same Lustre 1.8 version you are currently running in production and then try upgrading that to the new Lustre version. I think preparation is the key. I think I spent about 2 months reading about upgrade procedures, talking with others who have upgraded, reading JIRA bug reports, and running tests on hardware.

well, our vendor was preparing the upgrade for about a year and did intensive testing on several file systems, and they changed the targeted lustre version several times. The problem is that some bugs are only hit on the real production system. For instance the fid_in_dirent issue: it depends on the number of files in the directory, and you only notice the bug when you have upgraded the file system and try to move some files from such a directory to another place. I'm not sure if it has to be a directory created after the upgrade; maybe the destination just has to be a different directory. But to be honest you wouldn't test this
Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?
On 09/11/2015 05:23 AM, Dilger, Andreas wrote:
> On 2015/09/10, 6:54 PM, "Chris Hunter" wrote:
>
>> We experienced file corruption on several OSTs. We proceeded through recovery using e2fsck & ll_recover_lost_found_obj tools. Following these steps, e2fsck came out clean.
>>
>> The file corruption did not impact the MDT. The files were still referenced by the MDT. Accessing the file on a lustre client (i.e. ls -l) would report error "Cannot allocate memory"
>>
>> Following OST recovery steps, we started removing the corrupt files via "unlink" command on lustre client (rm command would not remove file).
>>
>> Now dry-run e2fsck of the OST is reporting errors:
>> "deleted/unused inodes" in Pass 2 (checking directory structure),
>> "Unattached inodes" in Pass 4 (checking reference counts)
>> "free block count wrong" in Pass 5 (checking group summary information).
>>
>> Are e2fsck errors expected when unlinking files?
>
> No, the "unlink" command is just avoiding the -ENOENT error that "rm" gets by calling "stat()" on the file before trying to unlink it. This shouldn't cause any errors on the OSTs, unless there is ongoing corruption from the back-end storage.

Chris, with "live filesystem" you mean that you ran a read-only e2fsck on a lustre file system while it was mounted and clients were working on the file system? Then it is expected that e2fsck reports some errors, because the file system contents change while the e2fsck is running and the in-memory directory structure no longer matches the on-disk data. However, as Andreas points out, it might as well be a sign of ongoing corruption on the storage, but only an offline e2fsck (i.e. while the OST is unmounted, and the journal is played back) can clarify this.

regards,
Martin

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] 1.8 client on 3.13.0 kernel
Having an MDT backup might perhaps have allowed recovery, followed by another attempt with an improved upgrade process and/or an upgrade to a version with the fixes in it. It's not a bad idea if practical. (And yes, the changes are MDT specific.)

By the way, the fid-in-dirent bug that Martin described is fixed in the most recent 2.5 from Intel, but I don't think it's fixed in 2.4? Unsure. But I'd recommend targeting 2.5 as the destination version for an upgrade.

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of Chris Hunter [chris.hun...@yale.edu]
Sent: Friday, September 11, 2015 8:02 AM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

Hi
I believe FID & dirdata feature changes would only affect the MDT during a lustre upgrade. In hindsight/retrospective do you think a file-level backup/restore of the MDT would have avoided some of these issues?

thanks
chris hunter

> On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>> Lewis,
>>
>> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for the most part things went pretty good. I’ll chime in on a couple of Martin’s points and mention a few other things.
>>
>>> On Sep 10, 2015, at 9:30 AM, Martin Hecht wrote:
>>>
>>> In any case the file systems should be clean before starting the upgrade, so I would recommend running e2fsck on all targets and repairing them before starting the upgrade. We did so, but unfortunately our e2fsprogs were not really up to date, and after our lustre upgrade a lot of fixes for e2fsprogs were committed to Whamcloud's e2fsprogs git. So some errors on the file systems were probably still present, but unnoticed, when we did the upgrade.
>>
>> This is a very important point. While I didn’t run e2fsck before the upgrade (but maybe I should have), I made sure to install the latest e2fsprogs.
>>
>>> Lustre 2 introduces the FID (which is something like an inode number, where lustre 1.8 used the inode number of the underlying ldiskfs, but with the possibility to have several MDTs in one file system a replacement was needed). The FID is stored in the inode, but the FIDs can also be stored in the directory node if that is activated, which makes lookups faster, especially when there are many files in a directory. However, there were bugs in the code that takes care of adding the FID to the directory entry when the file system is converted from 1.8 to 2.x. So, I would recommend using a version in which these bugs are solved. We went to 2.4.1 at that time. By default this fid_in_dirent feature is not automatically enabled; however, this is the only point where a performance boost may be expected... so we took the risk of enabling it... and ran into some bugs.
>>
>> Enabling fid_in_dirent prevents you from backing out of the upgrade. In theory, if you upgraded to Lustre 2.x without enabling fid_in_dirent, you could always revert back to Lustre 1.8. We tried this on a test system, and the downgrade seemed to work. However, this was a small scale test and I have never tried it on a production file system. But if you want to minimize possible complications, you could always leave this disabled for a while after the upgrade, and then if things are going well, enable it later on.
>>
>>> LU-4504 quota out of sync: turn off quota, run e2fsck, turn it on again - I believe that's something which must be done anyhow quite often, because there is no quotacheck anymore. It's run in the background when enabling quotas, but file systems have to be unmounted for this.
>>
>> We didn’t exactly hit this bug, but I will mention that we have had a couple of instances where e2fsck complained about problems on an OST, and it turned out that we had to disable and re-enable quotas on the OST to correct the issue.
>>
>>> LU-4743: We had to remove the CATALOGS file on another file system (otherwise the MDT wouldn't mount)
>>
>> We hit this problem.
>>
>> Someone I know had to do a Lustre upgrade, and they suggested that I apply a patch for LU-4708 (which I did). But if you upgrade to Lustre 2.5.2 or later, that patch should already be included.
>>
>> My only other advice is to test as much as possible prior to the upgrade. If you have a little test hardware, install the same Lustre 1.8 version you are currently running in production and then try upgrading that to the new Lustre version. I think preparation is the key. I think I spent about 2 months reading about upgrade procedures, talking with others who have upgraded, reading JIRA bug reports, and running tests on hardware.
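Rick's remark that the LU-4708 patch should already be included in Lustre 2.5.2 or later can be written down as a small check when planning a target version. This is only a sketch: the function names are made up for illustration, and the 2.5.2 threshold is simply the one quoted in this thread, not something verified against release notes.

```python
# Threshold quoted in this thread for the LU-4708 patch (an assumption, not verified).
LU4708_INCLUDED_FROM = (2, 5, 2)


def parse_version(text):
    """Turn a dotted version string such as '2.4.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_lu4708_patch(target_version):
    """Return True if the target release is older than the quoted threshold."""
    return parse_version(target_version) < LU4708_INCLUDED_FROM


if __name__ == "__main__":
    for version in ("2.4.3", "2.5.2"):
        print(version, "needs separate LU-4708 patch:", needs_lu4708_patch(version))
```

Comparing tuples of integers rather than strings avoids the usual trap where "2.10.0" sorts before "2.5.2".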
Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?
On 09/11/2015 03:41 AM, Martin Hecht wrote:
> On 09/11/2015 05:23 AM, Dilger, Andreas wrote:
>> On 2015/09/10, 6:54 PM, "Chris Hunter" wrote:
>>
>>> We experienced file corruption on several OSTs. We proceeded through recovery using e2fsck & ll_recover_lost_found_obj tools. Following these steps, e2fsck came out clean.
>>>
>>> The file corruption did not impact the MDT. The files were still referenced by the MDT. Accessing the file on a lustre client (i.e. ls -l) would report error "Cannot allocate memory"
>>>
>>> Following OST recovery steps, we started removing the corrupt files via "unlink" command on lustre client (rm command would not remove file).
>>>
>>> Now dry-run e2fsck of the OST is reporting errors:
>>> "deleted/unused inodes" in Pass 2 (checking directory structure),
>>> "Unattached inodes" in Pass 4 (checking reference counts)
>>> "free block count wrong" in Pass 5 (checking group summary information).
>>>
>>> Are e2fsck errors expected when unlinking files?
>>
>> No, the "unlink" command is just avoiding the -ENOENT error that "rm" gets by calling "stat()" on the file before trying to unlink it. This shouldn't cause any errors on the OSTs, unless there is ongoing corruption from the back-end storage.
>
> Chris, with "live filesystem" you mean that you ran a read-only e2fsck on a lustre file system while it was mounted and clients were working on the file system? Then it is expected that e2fsck reports some errors, because the file system contents change while the e2fsck is running and the in-memory directory structure no longer matches the on-disk data. However, as Andreas points out, it might as well be a sign of ongoing corruption on the storage, but only an offline e2fsck (i.e. while the OST is unmounted, and the journal is played back) can clarify this.

Hi Martin,
good point. The filesystem is active (3 clients), so the e2fsck errors could be due to uncommitted journal transactions. It would be nice to rule out underlying hardware issues before we do a full e2fsck.

thanks,
chris hunter
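Since the thread's conclusion is that a read-only e2fsck on a mounted OST can report errors that only reflect in-flight changes, one way to guard against that is to refuse to build the check command when the device shows up as mounted. The following is a minimal sketch under stated assumptions: the function names and the device paths are hypothetical, and the mount test is a plain string match on the first column of `/proc/mounts`-style text, so it will miss a device mounted under a different path name.

```python
def mounted_devices(mounts_text):
    """Return the device names found in the first column of /proc/mounts-style text."""
    devices = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            devices.add(fields[0])
    return devices


def readonly_fsck_command(device, mounts_text):
    """Build a read-only e2fsck command line, refusing if the device looks mounted."""
    if device in mounted_devices(mounts_text):
        raise RuntimeError(
            f"{device} is mounted; a read-only check may report spurious errors"
        )
    # -f forces a check even if the file system is marked clean;
    # -n opens it read-only and answers "no" to every question.
    return ["e2fsck", "-f", "-n", device]


if __name__ == "__main__":
    # Hypothetical mount table; on a real server read /proc/mounts instead.
    sample = "/dev/sda1 / ext4 rw 0 0\n/dev/sdb /mnt/ost0 lustre rw 0 0\n"
    print(readonly_fsck_command("/dev/sdc", sample))
```

The function only returns the argument list and never runs anything, so it can be tried safely before pointing it at a real target.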
Re: [lustre-discuss] 1.8 client on 3.13.0 kernel
Hi
I believe FID & dirdata feature changes would only affect the MDT during a lustre upgrade. In hindsight/retrospective do you think a file-level backup/restore of the MDT would have avoided some of these issues?

thanks
chris hunter

> On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>> Lewis,
>>
>> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for the most part things went pretty good. I’ll chime in on a couple of Martin’s points and mention a few other things.
>>
>>> On Sep 10, 2015, at 9:30 AM, Martin Hecht wrote:
>>>
>>> In any case the file systems should be clean before starting the upgrade, so I would recommend running e2fsck on all targets and repairing them before starting the upgrade. We did so, but unfortunately our e2fsprogs were not really up to date, and after our lustre upgrade a lot of fixes for e2fsprogs were committed to Whamcloud's e2fsprogs git. So some errors on the file systems were probably still present, but unnoticed, when we did the upgrade.
>>
>> This is a very important point. While I didn’t run e2fsck before the upgrade (but maybe I should have), I made sure to install the latest e2fsprogs.
>>
>>> Lustre 2 introduces the FID (which is something like an inode number, where lustre 1.8 used the inode number of the underlying ldiskfs, but with the possibility to have several MDTs in one file system a replacement was needed). The FID is stored in the inode, but the FIDs can also be stored in the directory node if that is activated, which makes lookups faster, especially when there are many files in a directory. However, there were bugs in the code that takes care of adding the FID to the directory entry when the file system is converted from 1.8 to 2.x. So, I would recommend using a version in which these bugs are solved. We went to 2.4.1 at that time. By default this fid_in_dirent feature is not automatically enabled; however, this is the only point where a performance boost may be expected... so we took the risk of enabling it... and ran into some bugs.
>>
>> Enabling fid_in_dirent prevents you from backing out of the upgrade. In theory, if you upgraded to Lustre 2.x without enabling fid_in_dirent, you could always revert back to Lustre 1.8. We tried this on a test system, and the downgrade seemed to work. However, this was a small scale test and I have never tried it on a production file system. But if you want to minimize possible complications, you could always leave this disabled for a while after the upgrade, and then if things are going well, enable it later on.
>>
>>> LU-4504 quota out of sync: turn off quota, run e2fsck, turn it on again - I believe that's something which must be done anyhow quite often, because there is no quotacheck anymore. It's run in the background when enabling quotas, but file systems have to be unmounted for this.
>>
>> We didn’t exactly hit this bug, but I will mention that we have had a couple of instances where e2fsck complained about problems on an OST, and it turned out that we had to disable and re-enable quotas on the OST to correct the issue.
>>
>>> LU-4743: We had to remove the CATALOGS file on another file system (otherwise the MDT wouldn't mount)
>>
>> We hit this problem.
>>
>> Someone I know had to do a Lustre upgrade, and they suggested that I apply a patch for LU-4708 (which I did). But if you upgrade to Lustre 2.5.2 or later, that patch should already be included.
>>
>> My only other advice is to test as much as possible prior to the upgrade. If you have a little test hardware, install the same Lustre 1.8 version you are currently running in production and then try upgrading that to the new Lustre version. I think preparation is the key. I think I spent about 2 months reading about upgrade procedures, talking with others who have upgraded, reading JIRA bug reports, and running tests on hardware.
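Because backing out of the upgrade depends on whether fid_in_dirent was ever enabled, it can help to record the state of the MDT before and after the upgrade. A minimal sketch, assuming that the feature shows up as the `dirdata` flag on the "Filesystem features:" line printed by `dumpe2fs -h` for the MDT device; the function names and the sample header below are illustrative only.

```python
def filesystem_features(dumpe2fs_header):
    """Extract the feature names from the text printed by `dumpe2fs -h`."""
    for line in dumpe2fs_header.splitlines():
        if line.startswith("Filesystem features:"):
            # Everything after the first colon is a whitespace-separated list.
            return set(line.split(":", 1)[1].split())
    return set()


def has_dirdata(dumpe2fs_header):
    """Return True if the dirdata feature flag is listed."""
    return "dirdata" in filesystem_features(dumpe2fs_header)


if __name__ == "__main__":
    # Made-up header for illustration; real input would be captured from
    # `dumpe2fs -h` run against the (unmounted) MDT device.
    sample = (
        "Filesystem volume name:   testfs-MDT0000\n"
        "Filesystem features:      has_journal ext_attr dir_index dirdata\n"
    )
    print("dirdata enabled:", has_dirdata(sample))
```

Parsing saved output rather than calling the tool directly keeps the check usable on a copy of the header taken before the upgrade.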