Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-11 Thread Chris Hunter



On 09/11/2015 03:41 AM, Martin Hecht wrote:
> On 09/11/2015 05:23 AM, Dilger, Andreas wrote:
>> On 2015/09/10, 6:54 PM, "Chris Hunter" wrote:
>>
>>> We experienced file corruption on several OSTs. We proceeded through
>>> recovery using e2fsck & ll_recover_lost_found_obj tools.
>>> Following these steps, e2fsck came out clean.
>>>
>>> The file corruption did not impact the MDT. The files were still
>>> referenced by the MDT. Accessing the file on a lustre client (ie. ls -l)
>>> would report error "Cannot allocate memory"
>>>
>>> Following OST recovery steps, we started removing the corrupt files via
>>> "unlink" command on lustre client (rm command would not remove file).
>>>
>>> Now dry-run e2fsck of the OST is reporting errors:
>>> "deleted/unused inodes" in Pass 2 (checking directory structure),
>>> "Unattached inodes" in Pass 4 (checking reference counts)
>>> "free block count wrong" in Pass 5 (checking group summary information).
>>>
>>> Is e2fsck errors expected when unlinking files ?
>> No, the "unlink" command is just avoiding the -ENOENT error that "rm" gets
>> by calling "stat()" on the file before trying to unlink it.  This
>> shouldn't cause any errors on the OSTs, unless there is ongoing corruption
>> from the back-end storage.
> Chris, with "live filesystem" you mean that you ran a readonly e2fsck on
> a lustre file system while it was mounted and clients working on the
> file system? Then, it is expected that e2fsck reports some error,
> because the file system contents changes while the e2fsck is running and
> the in-memory directory structure does not fit to the on-disk data
> anymore. However, as Andreas points out, it might as well be a sign of
> ongoing corruption on the storage, but only an offline e2fsck (i.e.
> while the OST is unmounted, and the journal is played back) can clarify
> this.

Hi Martin, good point. The filesystem is active (3 clients) so e2fsck
errors could be due to uncommitted journal transactions.
It would be nice to rule out underlying hardware issues before we do a
full e2fsck.

thanks,
chris hunter
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-11 Thread Martin Hecht
On 09/11/2015 05:23 AM, Dilger, Andreas wrote:
> On 2015/09/10, 6:54 PM, "Chris Hunter"  wrote:
>
>> We experienced file corruption on several OSTs. We proceeded through
>> recovery using e2fsck & ll_recover_lost_found_obj tools.
>> Following these steps, e2fsck came out clean.
>>
>> The file corruption did not impact the MDT. The files were still
>> referenced by the MDT. Accessing the file on a lustre client (ie. ls -l)
>> would report error "Cannot allocate memory"
>>
>> Following OST recovery steps, we started removing the corrupt files via
>> "unlink" command on lustre client (rm command would not remove file).
>>
>> Now dry-run e2fsck of the OST is reporting errors:
>> "deleted/unused inodes" in Pass 2 (checking directory structure),
>> "Unattached inodes" in Pass 4 (checking reference counts)
>> "free block count wrong" in Pass 5 (checking group summary information).
>>
>> Is e2fsck errors expected when unlinking files ?
> No, the "unlink" command is just avoiding the -ENOENT error that "rm" gets
> by calling "stat()" on the file before trying to unlink it.  This
> shouldn't cause any errors on the OSTs, unless there is ongoing corruption
> from the back-end storage.
Chris, by "live filesystem" do you mean that you ran a read-only e2fsck on
a Lustre file system while it was mounted and clients were working on it?
If so, e2fsck is expected to report some errors, because the file system
contents change while e2fsck is running and the in-memory directory
structure no longer matches the on-disk data. However, as Andreas points
out, it might just as well be a sign of ongoing corruption on the storage.
Only an offline e2fsck (i.e. while the OST is unmounted and the journal
has been replayed) can clarify this.
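
When comparing an online dry run with a later offline check, it can help
to decode e2fsck's exit status, which the e2fsck man page documents as a
bitmask. The following is only a minimal sketch: the function name
describe_e2fsck_status is made up for illustration, and the device path in
the usage comment is a placeholder, not a real device.

```shell
#!/bin/sh
# Decode an e2fsck exit status into readable flags.
# Bit values are the ones listed in the e2fsck man page.
# describe_e2fsck_status is a hypothetical helper name.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in 1:errors-corrected 2:reboot-recommended \
                4:errors-left-uncorrected 8:operational-error \
                16:usage-error 32:cancelled-by-user \
                128:shared-library-error; do
        bit=${pair%%:*}      # numeric bit value before the colon
        name=${pair#*:}      # label after the colon
        if [ $((status & bit)) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"          # drop the leading space
}

# Typical use after a read-only check (placeholder device, not run here):
#   e2fsck -fn /dev/OST_DEVICE; describe_e2fsck_status $?
describe_e2fsck_status 4
```

A read-only run that finds problems would normally end with the
"errors-left-uncorrected" bit set, since nothing is repaired. On a mounted
OST that result alone does not distinguish real corruption from changes
made while the check was running.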

regards,
Martin





Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-03 Thread Martin Hecht
Hi Chris,

On 09/02/2015 07:18 AM, Chris Hunter wrote:
> Hi Andreas
>
> On 09/01/2015 07:22 PM, Dilger, Andreas wrote:
>> On 2015/09/01, 7:59 AM, "lustre-discuss on behalf of Chris Hunter"
>> > chris.hun...@yale.edu> wrote:
>>
>>> Hi Andreas,
>>> Thanks for your help.
>>>
>>> If you have a striped lustre file with "holes" (ie. one chunk is gone
>>> due hardware failure, etc.) are the remaining file chunks considered
>>> orphan objects ?
> So when a lustre striped file has a hole (eg. missing chunk due to
> hardware failure), the remaining file chunks stay indefinitely on the
> OSTs.
> Is there a way to reclaim the space occupied by these pieces (after
> recovery of any usuable data, etc.)?
These remaining chunks still belong to the file (i.e. you have the
metadata entry on the MDT and you see the file when Lustre is mounted).
By removing the file you free up the space.

In general there are two types of inconsistencies which may occur:
Orphan objects are objects which are NOT assigned to an entry on the
MDT, i.e. chunks which do not belong to any file. These can be either
pre-allocated chunks or chunks left over after a corruption of the
metadata on the MDT.

The other type of corruption is a file with chunks missing in between.
This can happen when an OST gets corrupted. As long as the MDT is OK, you
should be able to remove such a file. If the MDT is also corrupted, you
should fix the MDT first, and you might then only be able to unlink the
file (which again might leave some orphan objects on the OSTs). lfsck
should be able to remove them, depending on the Lustre version you are
running...
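
The two kinds of inconsistency can be pictured as two set differences
between "object IDs referenced by the MDT" and "object IDs present on the
OST". The sketch below is purely illustrative: the ID lists are invented,
the function names are hypothetical, and it does not show how to obtain
such lists from a real Lustre file system.

```shell
#!/bin/sh
# Illustration only: classify objects by comparing two sorted ID lists.
# list_orphans / list_missing are hypothetical helper names.

# IDs present on the OST but not referenced by the MDT ("orphan objects").
list_orphans() { comm -13 "$1" "$2"; }

# IDs referenced by the MDT but absent on the OST (files with missing chunks).
list_missing() { comm -23 "$1" "$2"; }

workdir=$(mktemp -d)
# Made-up sample data, one object ID per line; comm needs sorted input.
printf '%s\n' 101 102 103 104 | sort > "$workdir/mdt_refs"
printf '%s\n' 101 103 104 105 106 | sort > "$workdir/ost_objs"

echo "orphan objects:" $(list_orphans "$workdir/mdt_refs" "$workdir/ost_objs")
echo "missing chunks:" $(list_missing "$workdir/mdt_refs" "$workdir/ost_objs")
rm -rf "$workdir"
```

With the sample data, 105 and 106 come out as orphans and 102 as a missing
chunk.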

Another point: after corrupted OSTs have been repaired with e2fsck, you
can mount them as ldiskfs, check whether there are chunks in lost+found,
and use the tool ll_recover_lost_found_objs to restore them to their
original place. I believe the objects which e2fsck puts in lost+found are
another kind of thing, usually not called "orphan objects". As I said,
they can usually be recovered easily.
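
As a rough sketch, the lost+found steps might look like the sequence
printed below. The script only prints the commands and runs nothing. The
device and mount point are placeholders, print_recovery_steps is a
hypothetical name, and the exact procedure may differ between Lustre
versions.

```shell
#!/bin/sh
# Print (do not run) a possible lost+found recovery sequence for one OST.
# print_recovery_steps is a hypothetical helper; arguments are placeholders.
print_recovery_steps() {
    ost_dev=$1   # block device of the already repaired OST
    mnt=$2       # temporary mount point
    echo "mount -t ldiskfs $ost_dev $mnt"
    echo "ls $mnt/lost+found"
    echo "ll_recover_lost_found_objs -d $mnt/lost+found"
    echo "umount $mnt"
}

print_recovery_steps /dev/OST_DEVICE /mnt/ost
```

Printing the steps first makes it easy to review them before touching a
production OST.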

Martin






Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-01 Thread Chris Hunter

Hi Andreas

On 09/01/2015 07:22 PM, Dilger, Andreas wrote:
> On 2015/09/01, 7:59 AM, "lustre-discuss on behalf of Chris Hunter"
> wrote:
>
>> Hi Andreas,
>> Thanks for your help.
>>
>> If you have a striped lustre file with "holes" (ie. one chunk is gone
>> due hardware failure, etc.) are the remaining file chunks considered
>> orphan objects ?

So when a lustre striped file has a hole (e.g. missing chunk due to
hardware failure), the remaining file chunks stay indefinitely on the OSTs.
Is there a way to reclaim the space occupied by these pieces (after
recovery of any usable data, etc.)?




>> AFAIK the online lfsck tool will scrub orphan objects. When mounting a
>> OST on our oss server, I see syslog messages such as:
>>> Aug 31 23:20:45 oss1 kernel: Lustre: test-OST0002: deleting orphan
>>> objects from 0x0:228989008 to 0x0:228989127
>>
>> Which leads me to believe these OST objects are subject to removal.
>> However I don't know what exactly are orphan objects.
>
> These "orphan objects" are just precreated OST objects that were never
> allocated to MDS files before the MDS or OSS crashed (or were allocated
> before the MDS crashed but the client didn't complete recovery).  They are
> unrelated to the problem you describe.
>
> Cheers, Andreas
>
>> On 09/01/2015 12:58 AM, Dilger, Andreas wrote:
>>> On 2015/08/31, 3:46 PM, "HPDD-discuss on behalf of Chris Hunter"
>>> wrote:
>>>
>>>> I am recovering from lustre OST failure and subsequent file corruption.
>>>> We have striped files each with 1 missing chunk. I would like to dump
>>>> the remaining file chunks from the OST. We have some tools (eg. debugfs)
>>>> to grab the good chunks.
>>>>
>>>> My question, if we put the filesystem into production (ie. users start
>>>> writing new files). What will happen to these good chunks ?
>>>>
>>>> Does lustre consider these "orphan" inodes (and lfsck deletes them) ?
>>>
>>> Since it was the OST that failed and not the MDT, then the remaining OST
>>> objects would not be removed.
>>>
>>> You can read the good chunks of such a file using:
>>>
>>>    dd if= of=.new bs=1M conv=sync,noerror count=
>>>    truncate --size= .new
>>>
>>> The "conv=sync,noerror" allows reading from the file without failing
>>> for the read errors returned from the missing stripe.  However, this
>>> also prevents the dd from stopping when it hits the end of file, so
>>> the number of chunks to be read needs to be specified.
>>>
>>> Cheers, Andreas


Thanks,
chris hunter



Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-01 Thread Dilger, Andreas
On 2015/09/01, 7:59 AM, "lustre-discuss on behalf of Chris Hunter"
 wrote:

>Hi Andreas,
>Thanks for your help.
>
>If you have a striped lustre file with "holes" (ie. one chunk is gone
>due hardware failure, etc.) are the remaining file chunks considered
>orphan objects ?
>
>AFAIK the online lfsck tool will scrub orphan objects. When mounting a
>OST on our oss server, I see syslog messages such as:
>> Aug 31 23:20:45 oss1 kernel: Lustre: test-OST0002: deleting orphan
>>objects from 0x0:228989008 to 0x0:228989127
>
>Which leads me to believe these OST objects are subject to removal.
>However I don't know what exactly are orphan objects.

These "orphan objects" are just precreated OST objects that were never
allocated to MDS files before the MDS or OSS crashed (or were allocated
before the MDS crashed but the client didn't complete recovery).  They are
unrelated to the problem you describe.
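
To get a feel for how many precreated objects such a syslog message
covers, the two object IDs can be pulled out of the line and subtracted.
This is a small sketch under the assumption that the number after the
colon is a decimal object ID; orphan_range_span is a made-up helper name.

```shell
#!/bin/sh
# Compute the distance between the first and last object ID in a
# "deleting orphan objects from A to B" message.
# Assumption: the part after the colon is a decimal object ID.
orphan_range_span() {
    line=$1
    first=$(printf '%s\n' "$line" |
        sed -n 's/.*from 0x[0-9a-fA-F]*:\([0-9]*\) to .*/\1/p')
    last=$(printf '%s\n' "$line" |
        sed -n 's/.* to 0x[0-9a-fA-F]*:\([0-9]*\).*/\1/p')
    echo $((last - first))
}

log='Aug 31 23:20:45 oss1 kernel: Lustre: test-OST0002: deleting orphan objects from 0x0:228989008 to 0x0:228989127'
orphan_range_span "$log"    # prints 119
```

For the message quoted above the IDs are 119 apart, so the range covers on
the order of a hundred precreated objects.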

Cheers, Andreas

>On 09/01/2015 12:58 AM, Dilger, Andreas wrote:
>> On 2015/08/31, 3:46 PM, "HPDD-discuss on behalf of Chris Hunter"
>> 
>> wrote:
>>
>>> I am recovering from lustre OST failure and subsequent file corruption.
>>> We have striped files each with 1 missing chunk. I would like to dump
>>> the remaining file chunks from the OST. We have some tools (eg.
>>>debugfs)
>>> to grab the good chunks.
>>>
>>> My question, if we put the filesystem into production (ie. users start
>>> writing new files). What will happen to these good chunks ?
>>>
>>> Does lustre consider these "orphan" inodes (and lfsck deletes them) ?
>>
>> Since it was the OST that failed and not the MDT, then the remaining OST
>> objects would not be removed.
>>
>> You can read the good chunks of such a file using:
>>
>>dd if= of=.new bs=1M conv=sync,noerror count=
>>truncate --size= .new
>>
>> The "conv=sync,noerror" allows reading from the file without failing
>> for the read errors returned from the missing stripe.  However, this
>> also prevents the dd from stopping when it hits the end of file, so
>> the number of chunks to be read needs to be specified.
>>
>> Cheers, Andreas
>>


Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division




Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-01 Thread Chris Hunter

Hi Andreas,
Thanks for your help.

If you have a striped lustre file with "holes" (i.e. one chunk is gone
due to hardware failure, etc.), are the remaining file chunks considered
orphan objects?


AFAIK the online lfsck tool will scrub orphan objects. When mounting an
OST on our OSS server, I see syslog messages such as:

Aug 31 23:20:45 oss1 kernel: Lustre: test-OST0002: deleting orphan objects from
0x0:228989008 to 0x0:228989127


This leads me to believe these OST objects are subject to removal.
However, I don't know what exactly orphan objects are.


thanks,
chris hunter
chris.hun...@yale.edu

On 09/01/2015 12:58 AM, Dilger, Andreas wrote:
> On 2015/08/31, 3:46 PM, "HPDD-discuss on behalf of Chris Hunter"
> wrote:
>
>> I am recovering from lustre OST failure and subsequent file corruption.
>> We have striped files each with 1 missing chunk. I would like to dump
>> the remaining file chunks from the OST. We have some tools (eg. debugfs)
>> to grab the good chunks.
>>
>> My question, if we put the filesystem into production (ie. users start
>> writing new files). What will happen to these good chunks ?
>>
>> Does lustre consider these "orphan" inodes (and lfsck deletes them) ?
>
> Since it was the OST that failed and not the MDT, then the remaining OST
> objects would not be removed.
>
> You can read the good chunks of such a file using:
>
>    dd if= of=.new bs=1M conv=sync,noerror count=
>    truncate --size= .new
>
> The "conv=sync,noerror" allows reading from the file without failing
> for the read errors returned from the missing stripe.  However, this
> also prevents the dd from stopping when it hits the end of file, so
> the number of chunks to be read needs to be specified.
>
> Cheers, Andreas




Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-08-31 Thread Dilger, Andreas
On 2015/08/31, 3:46 PM, "HPDD-discuss on behalf of Chris Hunter"

wrote:

>I am recovering from lustre OST failure and subsequent file corruption.
>We have striped files each with 1 missing chunk. I would like to dump
>the remaining file chunks from the OST. We have some tools (eg. debugfs)
>to grab the good chunks.
>
>My question, if we put the filesystem into production (ie. users start
>writing new files). What will happen to these good chunks ?
>
>Does lustre consider these "orphan" inodes (and lfsck deletes them) ?

Since it was the OST that failed and not the MDT, then the remaining OST
objects would not be removed.

You can read the good chunks of such a file using:

  dd if= of=.new bs=1M conv=sync,noerror count=
  truncate --size= .new

The "conv=sync,noerror" allows reading from the file without failing
for the read errors returned from the missing stripe.  However, this
also prevents the dd from stopping when it hits the end of file, so
the number of chunks to be read needs to be specified.
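
Spelled out as a script, the two commands might be wrapped as in the
sketch below. It is an illustration rather than a tested recovery
procedure: salvage_file and its arguments are made-up names, it relies on
GNU stat, dd and truncate, and it assumes the file size can either be read
with stat or is passed in by hand (on a damaged file stat itself may
fail). The demo at the end runs on an ordinary temporary file, not on
Lustre.

```shell
#!/bin/sh
# salvage_file SRC DST [SIZE]
# Copy SRC to DST in 1 MiB blocks, padding unreadable or short blocks with
# zeros (conv=sync,noerror), then trim DST back to the original size.
# SIZE (bytes) is optional; without it the size is taken from stat.
salvage_file() {
    src=$1
    dst=$2
    size=${3:-$(stat -c %s "$src")}
    # Number of 1 MiB blocks needed to cover the file, rounded up.
    blocks=$(( (size + 1048575) / 1048576 ))
    dd if="$src" of="$dst" bs=1M conv=sync,noerror count="$blocks" 2>/dev/null
    # conv=sync pads the last block to a full 1 MiB; cut the padding off.
    truncate --size="$size" "$dst"
}

# Demo on a healthy temporary file: the copy should match the original.
tmp=$(mktemp -d)
head -c 1500000 /dev/urandom > "$tmp/orig"
salvage_file "$tmp/orig" "$tmp/orig.new"
cmp "$tmp/orig" "$tmp/orig.new" && echo "copy matches"
rm -rf "$tmp"
```

On a file with a missing stripe, the blocks that cannot be read would be
expected to come out as zeros in the copy, so the result still needs
checking by whoever owns the data.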

Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division

