Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?
On 09/11/2015 03:41 AM, Martin Hecht wrote:
> On 09/11/2015 05:23 AM, Dilger, Andreas wrote:
>> On 2015/09/10, 6:54 PM, "Chris Hunter" wrote:
>>
>>> We experienced file corruption on several OSTs. We proceeded through
>>> recovery using the e2fsck & ll_recover_lost_found_objs tools.
>>> Following these steps, e2fsck came out clean.
>>>
>>> The file corruption did not impact the MDT. The files were still
>>> referenced by the MDT. Accessing the file on a lustre client (i.e. ls -l)
>>> would report the error "Cannot allocate memory".
>>>
>>> Following the OST recovery steps, we started removing the corrupt files
>>> via the "unlink" command on a lustre client (the rm command would not
>>> remove the files).
>>>
>>> Now a dry-run e2fsck of the OST is reporting errors:
>>> "deleted/unused inodes" in Pass 2 (checking directory structure),
>>> "Unattached inodes" in Pass 4 (checking reference counts),
>>> "free block count wrong" in Pass 5 (checking group summary information).
>>>
>>> Are e2fsck errors expected when unlinking files?
>> No, the "unlink" command is just avoiding the -ENOENT error that "rm" gets
>> by calling "stat()" on the file before trying to unlink it. This
>> shouldn't cause any errors on the OSTs, unless there is ongoing corruption
>> from the back-end storage.
> Chris, by "live filesystem" do you mean that you ran a read-only e2fsck on
> a lustre file system while it was mounted and clients were working on it?
> Then it is expected that e2fsck reports some errors, because the file
> system contents change while e2fsck is running and the in-memory directory
> structure no longer matches the on-disk data. However, as Andreas points
> out, it might as well be a sign of ongoing corruption on the storage, but
> only an offline e2fsck (i.e. while the OST is unmounted and the journal
> has been replayed) can clarify this.

Hi Martin,
good point. The filesystem is active (3 clients), so the e2fsck errors could
be due to uncommitted journal transactions. It would be nice to rule out
underlying hardware issues before we do a full e2fsck.

thanks,
chris hunter
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?
On 09/11/2015 05:23 AM, Dilger, Andreas wrote:
> On 2015/09/10, 6:54 PM, "Chris Hunter" wrote:
>
>> We experienced file corruption on several OSTs. We proceeded through
>> recovery using the e2fsck & ll_recover_lost_found_objs tools.
>> Following these steps, e2fsck came out clean.
>>
>> The file corruption did not impact the MDT. The files were still
>> referenced by the MDT. Accessing the file on a lustre client (i.e. ls -l)
>> would report the error "Cannot allocate memory".
>>
>> Following the OST recovery steps, we started removing the corrupt files
>> via the "unlink" command on a lustre client (the rm command would not
>> remove the files).
>>
>> Now a dry-run e2fsck of the OST is reporting errors:
>> "deleted/unused inodes" in Pass 2 (checking directory structure),
>> "Unattached inodes" in Pass 4 (checking reference counts),
>> "free block count wrong" in Pass 5 (checking group summary information).
>>
>> Are e2fsck errors expected when unlinking files?
> No, the "unlink" command is just avoiding the -ENOENT error that "rm" gets
> by calling "stat()" on the file before trying to unlink it. This
> shouldn't cause any errors on the OSTs, unless there is ongoing corruption
> from the back-end storage.

Chris, by "live filesystem" do you mean that you ran a read-only e2fsck on
a lustre file system while it was mounted and clients were working on it?
Then it is expected that e2fsck reports some errors, because the file
system contents change while e2fsck is running and the in-memory directory
structure no longer matches the on-disk data. However, as Andreas points
out, it might as well be a sign of ongoing corruption on the storage, but
only an offline e2fsck (i.e. while the OST is unmounted and the journal
has been replayed) can clarify this.

regards,
Martin
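
The difference Andreas describes between "rm" and "unlink" can be shown with a small sketch. It is not Lustre-specific and uses only the Python standard library; the two function names are made up for this illustration. The first helper goes straight to unlink(), the second issues a stat() first, so any error from stat() stops the removal before unlink() is ever attempted.

```python
import os


def remove_without_stat(path):
    """Remove a directory entry the way the "unlink" command is described.

    No stat() is issued first, so an error that stat() would return for a
    file with a damaged or missing OST object cannot abort the removal.
    (Hypothetical helper, named for this example only.)
    """
    os.unlink(path)


def remove_with_stat(path):
    """Mimic the behaviour described for "rm": stat() first, then unlink().

    If stat() fails, the OSError propagates and the entry stays in place.
    (Hypothetical helper, named for this example only.)
    """
    os.stat(path)
    os.unlink(path)
```

On a healthy file both helpers behave the same; they only differ when stat() itself returns an error, which is the situation described for the corrupt files in this thread.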
Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?
Hi Chris,

On 09/02/2015 07:18 AM, Chris Hunter wrote:
> Hi Andreas
>
> On 09/01/2015 07:22 PM, Dilger, Andreas wrote:
>> On 2015/09/01, 7:59 AM, "lustre-discuss on behalf of Chris Hunter"
>> <chris.hun...@yale.edu> wrote:
>>
>>> Hi Andreas,
>>> Thanks for your help.
>>>
>>> If you have a striped lustre file with "holes" (i.e. one chunk is gone
>>> due to hardware failure, etc.), are the remaining file chunks considered
>>> orphan objects?
> So when a lustre striped file has a hole (e.g. a missing chunk due to
> hardware failure), the remaining file chunks stay indefinitely on the
> OSTs.
> Is there a way to reclaim the space occupied by these pieces (after
> recovery of any usable data, etc.)?

These remaining chunks still belong to the file (i.e. you have the
metadata entry on the MDT and you see the file when lustre is mounted).
By removing the file you free up the space.

In general there are two types of inconsistencies which may occur:

Orphan objects are objects which are NOT assigned to an entry on the MDT,
i.e. chunks which do not belong to any file. These can be either
pre-allocated chunks or chunks left over after a corruption of the
metadata on the MDT.

The other type of corruption is a file with chunks missing in between.
This can happen when an OST gets corrupted. As long as the MDT is OK, you
should be able to remove such a file. If the MDT is corrupted as well, you
should first fix the MDT, and you might then only be able to unlink the
file (which again might leave some orphan objects on the OSTs). lfsck
should be able to remove them, depending on the lustre version you are
running...

Another point: when OSTs have been corrupted and then repaired with
e2fsck, you can mount them as ldiskfs, check whether there are chunks in
lost+found, and use the tool ll_recover_lost_found_objs to restore them
to their original place. I believe these objects which e2fsck puts in
lost+found are another kind of thing, usually not called "orphan
objects". As I said, they can usually be recovered easily.

Martin
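
The two kinds of inconsistency described above can be pictured as two set differences. The following is only a toy model, not how lfsck works internally: the function name, the paths and the "ostN:id" object labels are all invented for this illustration, and real Lustre objects are identified differently.

```python
def classify_objects(mdt_layouts, ost_objects):
    """Split inconsistencies into orphan objects and files with holes.

    mdt_layouts: dict mapping a file path to the list of OST object labels
                 its MDT entry refers to (hypothetical representation).
    ost_objects: set of object labels actually present on the OSTs.

    Returns (orphans, damaged_files):
      orphans       - objects on the OSTs that no MDT entry refers to
      damaged_files - path -> list of objects the MDT refers to but
                      which are missing on the OSTs
    """
    referenced = set()
    for object_ids in mdt_layouts.values():
        referenced.update(object_ids)

    # Objects without any owner on the MDT: "orphan objects".
    orphans = ost_objects - referenced

    # Files whose layout points at objects that no longer exist.
    damaged_files = {}
    for path, object_ids in mdt_layouts.items():
        missing = [obj for obj in object_ids if obj not in ost_objects]
        if missing:
            damaged_files[path] = missing

    return orphans, damaged_files


if __name__ == "__main__":
    layouts = {
        "/lustre/a": ["ost0:101", "ost1:201"],
        "/lustre/b": ["ost0:102", "ost1:202"],
    }
    present = {"ost0:101", "ost1:201", "ost0:102", "ost0:103"}
    print(classify_objects(layouts, present))
```

In this made-up data set, "ost0:103" belongs to no file and would be an orphan, while "/lustre/b" is a file with a hole whose surviving chunk "ost0:102" still belongs to it until the file is removed.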
Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?
Hi Andreas

On 09/01/2015 07:22 PM, Dilger, Andreas wrote:
> On 2015/09/01, 7:59 AM, "lustre-discuss on behalf of Chris Hunter" wrote:
>
>> Hi Andreas,
>> Thanks for your help.
>>
>> If you have a striped lustre file with "holes" (i.e. one chunk is gone
>> due to hardware failure, etc.), are the remaining file chunks considered
>> orphan objects?

So when a lustre striped file has a hole (e.g. a missing chunk due to
hardware failure), the remaining file chunks stay indefinitely on the
OSTs.
Is there a way to reclaim the space occupied by these pieces (after
recovery of any usable data, etc.)?

>> AFAIK the online lfsck tool will scrub orphan objects. When mounting an
>> OST on our oss server, I see syslog messages such as:
>>> Aug 31 23:20:45 oss1 kernel: Lustre: test-OST0002: deleting orphan objects from 0x0:228989008 to 0x0:228989127
>>
>> Which leads me to believe these OST objects are subject to removal.
>> However, I don't know what exactly orphan objects are.
>
> These "orphan objects" are just precreated OST objects that were never
> allocated to MDS files before the MDS or OSS crashed (or were allocated
> before the MDS crashed but the client didn't complete recovery). They are
> unrelated to the problem you describe.
>
> Cheers, Andreas
>
>> On 09/01/2015 12:58 AM, Dilger, Andreas wrote:
>>> On 2015/08/31, 3:46 PM, "HPDD-discuss on behalf of Chris Hunter" wrote:
>>>
>>>> I am recovering from a lustre OST failure and subsequent file corruption.
>>>> We have striped files, each with 1 missing chunk. I would like to dump
>>>> the remaining file chunks from the OST. We have some tools (e.g. debugfs)
>>>> to grab the good chunks.
>>>>
>>>> My question: if we put the filesystem into production (i.e. users start
>>>> writing new files), what will happen to these good chunks?
>>>>
>>>> Does lustre consider these "orphan" inodes (and lfsck deletes them)?
>>>
>>> Since it was the OST that failed and not the MDT, the remaining OST
>>> objects would not be removed.
>>>
>>> You can read the good chunks of such a file using:
>>>
>>>     dd if= of=.new bs=1M conv=sync,noerror count=
>>>     truncate --size= .new
>>>
>>> The "conv=sync,noerror" allows reading from the file without failing
>>> for the read errors returned from the missing stripe. However, this
>>> also prevents the dd from stopping when it hits the end of file, so
>>> the number of chunks to be read needs to be specified.
>>>
>>> Cheers, Andreas
>
> Cheers, Andreas

Thanks,
chris hunter
Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?
On 2015/09/01, 7:59 AM, "lustre-discuss on behalf of Chris Hunter" wrote:

>Hi Andreas,
>Thanks for your help.
>
>If you have a striped lustre file with "holes" (i.e. one chunk is gone
>due to hardware failure, etc.), are the remaining file chunks considered
>orphan objects?
>
>AFAIK the online lfsck tool will scrub orphan objects. When mounting an
>OST on our oss server, I see syslog messages such as:
>> Aug 31 23:20:45 oss1 kernel: Lustre: test-OST0002: deleting orphan objects from 0x0:228989008 to 0x0:228989127
>
>Which leads me to believe these OST objects are subject to removal.
>However, I don't know what exactly orphan objects are.

These "orphan objects" are just precreated OST objects that were never
allocated to MDS files before the MDS or OSS crashed (or were allocated
before the MDS crashed but the client didn't complete recovery). They are
unrelated to the problem you describe.

Cheers, Andreas

>On 09/01/2015 12:58 AM, Dilger, Andreas wrote:
>> On 2015/08/31, 3:46 PM, "HPDD-discuss on behalf of Chris Hunter" wrote:
>>
>>> I am recovering from a lustre OST failure and subsequent file corruption.
>>> We have striped files, each with 1 missing chunk. I would like to dump
>>> the remaining file chunks from the OST. We have some tools (e.g. debugfs)
>>> to grab the good chunks.
>>>
>>> My question: if we put the filesystem into production (i.e. users start
>>> writing new files), what will happen to these good chunks?
>>>
>>> Does lustre consider these "orphan" inodes (and lfsck deletes them)?
>>
>> Since it was the OST that failed and not the MDT, the remaining OST
>> objects would not be removed.
>>
>> You can read the good chunks of such a file using:
>>
>>     dd if= of=.new bs=1M conv=sync,noerror count=
>>     truncate --size= .new
>>
>> The "conv=sync,noerror" allows reading from the file without failing
>> for the read errors returned from the missing stripe. However, this
>> also prevents the dd from stopping when it hits the end of file, so
>> the number of chunks to be read needs to be specified.
>>
>> Cheers, Andreas

Cheers, Andreas
-- 
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
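
For anyone wanting to see how many precreated objects such a cleanup message covers, here is a rough sketch that parses the syslog line quoted above. The regular expression is written against that single example only, the helper name is invented, and reading the two numbers after the colon as decimal object IDs is an assumption.

```python
import re

# Pattern matched against the one syslog example quoted in this thread;
# other Lustre versions may word the message differently.
ORPHAN_RE = re.compile(
    r"(?P<target>\S+): deleting orphan objects from "
    r"(?P<seq_first>0x[0-9a-fA-F]+):(?P<first>\d+) to "
    r"(?P<seq_last>0x[0-9a-fA-F]+):(?P<last>\d+)"
)


def parse_orphan_cleanup(line):
    """Return target name, first/last object ID and their difference.

    Returns None if the line does not look like the quoted message.
    (Hypothetical helper, named for this example only.)
    """
    match = ORPHAN_RE.search(line)
    if match is None:
        return None
    first = int(match.group("first"))
    last = int(match.group("last"))
    return {
        "target": match.group("target"),
        "first": first,
        "last": last,
        "span": last - first,
    }


if __name__ == "__main__":
    example = (
        "Aug 31 23:20:45 oss1 kernel: Lustre: test-OST0002: deleting orphan "
        "objects from 0x0:228989008 to 0x0:228989127"
    )
    print(parse_orphan_cleanup(example))
```

For the quoted line the difference between the two IDs is 119, i.e. roughly 120 precreated objects if both ends of the range are included, which fits the description of a small batch of never-allocated objects rather than user data.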
Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?
Hi Andreas,
Thanks for your help.

If you have a striped lustre file with "holes" (i.e. one chunk is gone
due to hardware failure, etc.), are the remaining file chunks considered
orphan objects?

AFAIK the online lfsck tool will scrub orphan objects. When mounting an
OST on our oss server, I see syslog messages such as:
> Aug 31 23:20:45 oss1 kernel: Lustre: test-OST0002: deleting orphan objects from 0x0:228989008 to 0x0:228989127

Which leads me to believe these OST objects are subject to removal.
However, I don't know what exactly orphan objects are.

thanks,
chris hunter
chris.hun...@yale.edu

On 09/01/2015 12:58 AM, Dilger, Andreas wrote:
> On 2015/08/31, 3:46 PM, "HPDD-discuss on behalf of Chris Hunter" wrote:
>
>> I am recovering from a lustre OST failure and subsequent file corruption.
>> We have striped files, each with 1 missing chunk. I would like to dump
>> the remaining file chunks from the OST. We have some tools (e.g. debugfs)
>> to grab the good chunks.
>>
>> My question: if we put the filesystem into production (i.e. users start
>> writing new files), what will happen to these good chunks?
>>
>> Does lustre consider these "orphan" inodes (and lfsck deletes them)?
>
> Since it was the OST that failed and not the MDT, the remaining OST
> objects would not be removed.
>
> You can read the good chunks of such a file using:
>
>     dd if= of=.new bs=1M conv=sync,noerror count=
>     truncate --size= .new
>
> The "conv=sync,noerror" allows reading from the file without failing
> for the read errors returned from the missing stripe. However, this
> also prevents the dd from stopping when it hits the end of file, so
> the number of chunks to be read needs to be specified.
>
> Cheers, Andreas
Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?
On 2015/08/31, 3:46 PM, "HPDD-discuss on behalf of Chris Hunter" wrote:

>I am recovering from a lustre OST failure and subsequent file corruption.
>We have striped files, each with 1 missing chunk. I would like to dump
>the remaining file chunks from the OST. We have some tools (e.g. debugfs)
>to grab the good chunks.
>
>My question: if we put the filesystem into production (i.e. users start
>writing new files), what will happen to these good chunks?
>
>Does lustre consider these "orphan" inodes (and lfsck deletes them)?

Since it was the OST that failed and not the MDT, the remaining OST
objects would not be removed.

You can read the good chunks of such a file using:

    dd if= of=.new bs=1M conv=sync,noerror count=
    truncate --size= .new

The "conv=sync,noerror" allows reading from the file without failing
for the read errors returned from the missing stripe. However, this
also prevents the dd from stopping when it hits the end of file, so
the number of chunks to be read needs to be specified.

Cheers, Andreas
-- 
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
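
For readers who prefer to see the dd/truncate recipe spelled out step by step, below is a minimal Python sketch of the same idea. It is not the dd implementation: the function name salvage_file and its parameters are invented here, the final file size is assumed to be known in advance, and the optional read_block hook exists only so that read errors can be simulated. Each block is read separately, a block that fails to read is replaced by zeros (the role of conv=sync,noerror), the number of blocks is derived from the size (the role of count=), and the padding is cut off at the end (the role of truncate --size=).

```python
import os


def salvage_file(src_path, dst_path, size, block_size=1024 * 1024,
                 read_block=None):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    size       - final size in bytes of the recovered copy (assumed known)
    block_size - bytes per read, 1 MiB like bs=1M in the dd example
    read_block - optional callable (fd, length, offset) -> bytes used in
                 place of os.pread, e.g. to simulate I/O errors
    Returns the list of block indexes that could not be read.
    (Hypothetical helper, named for this example only.)
    """
    if read_block is None:
        read_block = os.pread

    # Number of blocks to copy; a fixed count is needed because read
    # errors are tolerated and therefore cannot signal "stop".
    count = (size + block_size - 1) // block_size
    bad_blocks = []

    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as dst:
            for index in range(count):
                offset = index * block_size
                try:
                    data = read_block(src_fd, block_size, offset)
                except OSError:
                    # Unreadable block (e.g. the missing stripe): keep going.
                    data = b""
                    bad_blocks.append(index)
                # Pad short or failed reads with zeros to a full block.
                dst.write(data.ljust(block_size, b"\0"))
            # Remove the padding added to the last block.
            dst.truncate(size)
    finally:
        os.close(src_fd)

    return bad_blocks
```

The returned list of failed block indexes shows which parts of the recovered copy are zero-filled placeholders for the lost stripe rather than original data.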