Re: rfc: fuzz testing by direct writes to device

2012-09-05 Thread David Sterba
On Sun, Sep 02, 2012 at 04:43:48AM -0700, Shentino wrote:
 I assume the same results are expected during a scrub as during a normal read?

yes

  I've tested this on a 2-disk data/raid1, metadata/raid1 with a running
  dd over one of the devices continually and using the filesystem. It was
  slower.

 Would I be correct to assume that a core dump on fsck is an automatic
 bug or is using the old 3.3.8 kernel a taint that would invalidate a
 report?  Note that this is for the btrfs-progs containing the fsck,
 not the actual kernel side code.

You're talking about both fsck and the kernel; I'm not quite sure which
one you are referring to with 'bug'.

fsck can crash when it finds unexpected data in the tree structures,
but this would mean that it passed through the checksum verification
earlier. This would be a bug, and if it is reproducible on newer kernels
a report on 3.3.x does not disqualify it right away.

The same holds for the kernel: a data structure inconsistency will most
probably lead to a BUG and a subsequent crash.

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rfc: fuzz testing by direct writes to device

2012-09-05 Thread Shentino
On Wed, Sep 5, 2012 at 8:04 AM, David Sterba d...@jikos.cz wrote:
 On Sun, Sep 02, 2012 at 04:43:48AM -0700, Shentino wrote:
 I assume the same results are expected during a scrub as during a normal read?

 yes

  I've tested this on a 2-disk data/raid1, metadata/raid1 with a running
  dd over one of the devices continually and using the filesystem. It was
  slower.

 Would I be correct to assume that a core dump on fsck is an automatic
 bug or is using the old 3.3.8 kernel a taint that would invalidate a
 report?  Note that this is for the btrfs-progs containing the fsck,
 not the actual kernel side code.

 You're talking about both fsck and the kernel; I'm not quite sure which
 one you are referring to with 'bug'.

 fsck can crash when it finds unexpected data in the tree structures,
 but this would mean that it passed through the checksum verification
 earlier. This would be a bug, and if it is reproducible on newer kernels
 a report on 3.3.x does not disqualify it right away.

 The same holds for the kernel: a data structure inconsistency will most
 probably lead to a BUG and a subsequent crash.

 david

By this I mean running fsck on a btrfs that was trashed by a 3.3.8 kernel bug.


Re: rfc: fuzz testing by direct writes to device

2012-09-04 Thread Goffredo Baroncelli

Hi,

On 09/02/2012 03:03 AM, Shentino wrote:

This whole subject was also about using sed to corrupt-o-magic a
file's data on disk.

Is this an acceptable method for testing?



I am not sure that doing sed /dev/sdX /dev/sdX ... is the right
thing to do, because it rewrites the full disk. This means that:

- it takes a lot of time
- you don't have any control over which part of the disk you change:
what happens if sed writes a block that is being updated in parallel by BTRFS?


Anyway, I suggest having a look at the following video [1], which
explains the automatic repair. It also shows [2] how to corrupt a block
with the btrfs-corrupt-block command.


Hoping that this helps you.

BR
G.Baroncelli

[1] http://www.youtube.com/watch?v=hxWuaozpe2I
[2] See minute 17:52 of the video above



On Sat, Sep 1, 2012 at 4:49 PM, Michael m...@draftx.net wrote:

It should not. It is always preferred that you dd your drive onto
another disk just in case though.

On Sat, Sep 1, 2012 at 5:31 PM, Shentino shent...@gmail.com wrote:

On Sat, Sep 1, 2012 at 1:59 PM, cwillu cwi...@cwillu.com wrote:

You still haven't said which kernel you were running; the thing to do
is try the very latest rc (if not btrfs-next).


Sorry about that!

I thought I included it.

3.3.8

Hmm...seems it's been EOL'ed.  I need to yell at my distro.

In the meantime, will mounting a btrfs filesystem with a new kernel
render it unmountable by older kernels?



Re: rfc: fuzz testing by direct writes to device

2012-09-04 Thread Shentino
On Tue, Sep 4, 2012 at 11:15 AM, Goffredo Baroncelli kreij...@libero.it wrote:
 Hi,


 On 09/02/2012 03:03 AM, Shentino wrote:

 This whole subject was also about using sed to corrupt-o-magic a
 file's data on disk.

 Is this an acceptable method for testing?

 I am not sure that doing sed /dev/sdX /dev/sdX ... is the right thing to
 do, because it rewrites the full disk. This means that:
 - it takes a lot of time
 - you don't have any control over which part of the disk you change: what
 happens if sed writes a block that is being updated in parallel by BTRFS?

Which is one reason I used a sha1 hash of a random read as the search key :P
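
As an illustration of the search-key idea, here is a minimal Python sketch. It assumes the test file was written with a unique marker in it, and that the device contents are available as bytes (for example from a small image file). The helper names make_marker and find_all are hypothetical and not part of any btrfs tool. Note that a unique key only limits which block gets touched; it does not by itself prevent a concurrent update of that block.

```python
import hashlib
import os


def make_marker():
    """Return 40 ASCII hex characters: the SHA-1 of 64 random bytes."""
    return hashlib.sha1(os.urandom(64)).hexdigest().encode("ascii")


def find_all(haystack, needle):
    """Return every offset at which `needle` occurs in `haystack`."""
    offsets = []
    pos = haystack.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = haystack.find(needle, pos + 1)
    return offsets


marker = make_marker()
# Stand-in for device contents: zero padding around the marker.
image = b"\0" * 8192 + marker + b"\0" * 100
print(find_all(image, marker))
```

If the marker shows up at more than one offset (for instance because of raid1 copies), each offset is a separate candidate for corruption.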

 Anyway, I suggest having a look at the following video [1], which explains
 the automatic repair. It also shows [2] how to corrupt a block with the
 btrfs-corrupt-block command.

That does sound more convenient.

 Hoping that this helps you.

 BR
 G.Baroncelli

 [1] http://www.youtube.com/watch?v=hxWuaozpe2I
 [2] See minute 17:52 of the video above


 On Sat, Sep 1, 2012 at 4:49 PM, Michael m...@draftx.net wrote:

 It should not. It is always preferred that you dd your drive onto
 another disk just in case though.

 On Sat, Sep 1, 2012 at 5:31 PM, Shentino shent...@gmail.com wrote:

 On Sat, Sep 1, 2012 at 1:59 PM, cwillu cwi...@cwillu.com wrote:

 You still haven't said which kernel you were running; the thing to do
 is try the very latest rc (if not btrfs-next).


 Sorry about that!

 I thought I included it.

 3.3.8

 Hmm...seems it's been EOL'ed.  I need to yell at my distro.

 In the meantime, will mounting a btrfs filesystem with a new kernel
 render it unmountable by older kernels?



Re: rfc: fuzz testing by direct writes to device

2012-09-04 Thread Goffredo Baroncelli

On 09/05/2012 03:59 AM, Shentino wrote:

  I am not sure that doing sed /dev/sdX /dev/sdX ... is the right thing to
  do, because it rewrites the full disk. This means that:
  - it takes a lot of time
  - you don't have any control over which part of the disk you change: what
  happens if sed writes a block that is being updated in parallel by BTRFS?

Which is one reason I used a sha1 hash of a random read as the search key :P


This doesn't change anything. The race would be the following:

1- the kernel reads a sector from the disk
2- sed reads the same sector from the disk
3- sed writes the sector to the disk (whether the data is the same or
updated doesn't matter)
4- the kernel writes an updated sector to the disk

If 3 and 4 write different data, the results are unpredictable. Yes, it is
a very unlikely case, but it could happen.
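
The four steps above can be replayed deterministically on an in-memory buffer. This is only a sketch of the lost-update pattern: the 512-byte sector size, the replay function and the byte values are assumptions made for illustration, and nothing here touches a real device or models what btrfs actually does.

```python
SECTOR = 512  # assumed sector size, for illustration only


def replay(write_order):
    """Replay the race on an in-memory 'disk' and return the final sector."""
    disk = bytearray(b"A" * SECTOR)
    kernel_view = bytes(disk)   # step 1: the kernel reads the sector
    sed_view = bytes(disk)      # step 2: sed reads the same sector
    pending = {
        "sed": b"X" + sed_view[1:],          # sed changes the first byte
        "kernel": kernel_view[:-1] + b"K",   # the kernel changes the last byte
    }
    for writer in write_order:  # steps 3 and 4, in the given order
        disk[:] = pending[writer]
    return bytes(disk)


sed_then_kernel = replay(["sed", "kernel"])
kernel_then_sed = replay(["kernel", "sed"])
print(sed_then_kernel[:1], sed_then_kernel[-1:])  # b'A' b'K': sed's change is lost
print(kernel_then_sed[:1], kernel_then_sed[-1:])  # b'X' b'A': the kernel's change is lost
```

Whichever write lands last wins, and neither outcome contains both changes, because each writer started from the same stale copy of the sector.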





Re: rfc: fuzz testing by direct writes to device

2012-09-02 Thread Shentino
On Sat, Sep 1, 2012 at 10:44 PM, David Sterba d...@jikos.cz wrote:
 On Sat, Sep 01, 2012 at 06:03:32PM -0700, Shentino wrote:
 This whole subject was also about using sed to corrupt-o-magic a
 file's data on disk.

 Is this an acceptable method for testing?

 Starting with kernel 3.4, the error handling has been improved,
 namely for EIO, so it shouldn't take your box down when you hit one.
 Newer kernels got fixes to the 'transaction abort' cleanup, so it should
 be possible to umount and mount the filesystem without problems.

 The filesystem should survive shooting at blocks; the checksums catch
 any change (with respect to their strength, i.e. generating a hash
 collision will lead to a crash/abort later).

 Expected result for reading blocks after random writes is:
 * EIO for the corrupted block (both data or metadata) provided that
   there's no other copy
 * transparent and automatic repair from other copies

I assume the same results are expected during a scrub as during a normal read?

 I've tested this on a 2-disk data/raid1, metadata/raid1 with a running
 dd over one of the devices continually and using the filesystem. It was
 slower.


 david

Would I be correct to assume that a core dump on fsck is an automatic
bug or is using the old 3.3.8 kernel a taint that would invalidate a
report?  Note that this is for the btrfs-progs containing the fsck,
not the actual kernel side code.


rfc: fuzz testing by direct writes to device

2012-09-01 Thread Shentino
How effective would it be to write directly to the underlying device
and then run tests to see if the corruption is properly detected?

I just ran a fuzz test by syncing, and then manually corrupting a file
with the help of a surgical sed (yes, the before and after patterns
had fixed equal lengths).  First I got an I/O error (expected), then I
ran scrub and got more problems (not ok), the system froze (not good),
a reboot failed to mount the system again (worse), and then the fsck
program dumped core.
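
For readers who want to try the equal-length replacement safely, here is a minimal Python sketch that works on an ordinary image file instead of a live device. The function name corrupt_in_place, the patterns and the temporary file are all assumptions made for illustration; this is not the sed command used in the test described above.

```python
import os
import tempfile


def corrupt_in_place(image_path, before, after):
    """Overwrite the first occurrence of `before` with `after`.

    Returns the byte offset that was overwritten, or -1 if not found.
    """
    if len(before) != len(after):
        raise ValueError("patterns must have equal length so no data shifts")
    with open(image_path, "r+b") as img:
        data = img.read()   # acceptable for a small test image, not a whole disk
        offset = data.find(before)
        if offset < 0:
            return -1
        img.seek(offset)
        img.write(after)
        img.flush()
        os.fsync(img.fileno())
    return offset


def demo():
    """Corrupt a throwaway file and return (offset, bytes now at that offset)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 4096 + b"HELLO-BTRFS" + b"\0" * 4096)
        path = tmp.name
    try:
        offset = corrupt_in_place(path, b"HELLO-BTRFS", b"JELLO-BTRFS")
        with open(path, "rb") as img:
            img.seek(offset)
            return offset, img.read(11)
    finally:
        os.unlink(path)


print(demo())
```

Writing only at the matched offset, rather than streaming the whole device back out, keeps the change confined to the bytes that matched.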


Re: rfc: fuzz testing by direct writes to device

2012-09-01 Thread Shentino
Also, since the problem prevented me from syncing my other filesystems,
I couldn't capture the debug info.

It vanished during the cold boot while still sitting in the dirty page cache.

On Fri, Aug 31, 2012 at 11:44 PM, Shentino shent...@gmail.com wrote:
 How effective would it be to write directly to the underlying device
 and then run tests to see if the corruption is properly detected?

 I just ran a fuzz test by syncing, and then manually corrupting a file
 with the help of a surgical sed (yes, the before and after patterns
 had fixed equal lengths).  First I got an I/O error (expected), then I
 ran scrub and got more problems (not ok), the system froze (not good),
 a reboot failed to mount the system again (worse), and then the fsck
 program dumped core.


Re: rfc: fuzz testing by direct writes to device

2012-09-01 Thread Michael
Please make sure you are running a very recent kernel. Btrfs is VERY
active and fixes for things like this are going in all the time. Any
related crash errors, kernel oopses, and exact methodology so we can
reproduce would be useful.
dmesg and uname -a would help us triage this and see what we need to fix.
Mike
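
A small sketch of how that information could be gathered into one report before the machine goes down. The helper names collect_report and btrfs_lines are hypothetical; the sketch assumes a dmesg binary is on the PATH, and on systems that restrict access to the kernel log dmesg may need root.

```python
import platform
import subprocess


def btrfs_lines(log_text):
    """Keep only the log lines that mention btrfs (case-insensitive)."""
    return [line for line in log_text.splitlines() if "btrfs" in line.lower()]


def collect_report():
    """Return kernel identification plus btrfs-related kernel log lines."""
    report = ["uname: " + " ".join(platform.uname())]
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True,
            errors="replace", check=True, timeout=30,
        )
        report.extend(btrfs_lines(result.stdout))
    except (OSError, subprocess.SubprocessError) as exc:
        # dmesg is missing, not permitted, or timed out; record that instead
        report.append("dmesg unavailable: %s" % exc)
    return "\n".join(report)


print(collect_report())
```

Saving the output to a filesystem other than the one under test makes it more likely to survive a freeze.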

On Sat, Sep 1, 2012 at 3:10 AM, Shentino shent...@gmail.com wrote:

 Also, since the problem prevented me from syncing my other filesystems,
 I couldn't capture the debug info.

 It vanished during the cold boot while still sitting in the dirty page cache.

 On Fri, Aug 31, 2012 at 11:44 PM, Shentino shent...@gmail.com wrote:
  How effective would it be to write directly to the underlying device
  and then run tests to see if the corruption is properly detected?
 
  I just ran a fuzz test by syncing, and then manually corrupting a file
  with the help of a surgical sed (yes, the before and after patterns
  had fixed equal lengths).  First I got an I/O error (expected), then I
  ran scrub and got more problems (not ok), the system froze (not good),
  a reboot failed to mount the system again (worse), and then the fsck
  program dumped core.


Re: rfc: fuzz testing by direct writes to device

2012-09-01 Thread Shentino
On Sat, Sep 1, 2012 at 1:59 PM, cwillu cwi...@cwillu.com wrote:
 You still haven't said which kernel you were running; the thing to do
 is try the very latest rc (if not btrfs-next).

Sorry about that!

I thought I included it.

3.3.8

Hmm...seems it's been EOL'ed.  I need to yell at my distro.

In the meantime, will mounting a btrfs filesystem with a new kernel
render it unmountable by older kernels?


Re: rfc: fuzz testing by direct writes to device

2012-09-01 Thread Michael
It should not. It is always preferred that you dd your drive onto
another disk just in case though.

On Sat, Sep 1, 2012 at 5:31 PM, Shentino shent...@gmail.com wrote:
 On Sat, Sep 1, 2012 at 1:59 PM, cwillu cwi...@cwillu.com wrote:
 You still haven't said which kernel you were running; the thing to do
 is try the very latest rc (if not btrfs-next).

 Sorry about that!

 I thought I included it.

 3.3.8

 Hmm...seems it's been EOL'ed.  I need to yell at my distro.

 In the meantime, will mounting a btrfs filesystem with a new kernel
 render it unmountable by older kernels?


Re: rfc: fuzz testing by direct writes to device

2012-09-01 Thread Shentino
This whole subject was also about using sed to corrupt-o-magic a
file's data on disk.

Is this an acceptable method for testing?

On Sat, Sep 1, 2012 at 4:49 PM, Michael m...@draftx.net wrote:
 It should not. It is always preferred that you dd your drive onto
 another disk just in case though.

 On Sat, Sep 1, 2012 at 5:31 PM, Shentino shent...@gmail.com wrote:
 On Sat, Sep 1, 2012 at 1:59 PM, cwillu cwi...@cwillu.com wrote:
 You still haven't said which kernel you were running; the thing to do
 is try the very latest rc (if not btrfs-next).

 Sorry about that!

 I thought I included it.

 3.3.8

 Hmm...seems it's been EOL'ed.  I need to yell at my distro.

 In the meantime, will mounting a btrfs filesystem with a new kernel
 render it unmountable by older kernels?


Re: rfc: fuzz testing by direct writes to device

2012-09-01 Thread David Sterba
On Sat, Sep 01, 2012 at 06:03:32PM -0700, Shentino wrote:
 This whole subject was also about using sed to corrupt-o-magic a
 file's data on disk.
 
 Is this an acceptable method for testing?

Starting with kernel 3.4, the error handling has been improved,
namely for EIO, so it shouldn't take your box down when you hit one.
Newer kernels got fixes to the 'transaction abort' cleanup, so it should
be possible to umount and mount the filesystem without problems.

The filesystem should survive shooting at blocks; the checksums catch
any change (with respect to their strength, i.e. generating a hash
collision will lead to a crash/abort later).

Expected result for reading blocks after random writes is:
* EIO for the corrupted block (both data or metadata) provided that
  there's no other copy
* transparent and automatic repair from other copies

I've tested this on a 2-disk data/raid1, metadata/raid1 with a running
dd over one of the devices continually and using the filesystem. It was
slower.
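
The expected results listed above can be checked from userspace with a sketch like the following. The function name classify_read and the 4096-byte block size are assumptions made for illustration. Reads may be served from the page cache, so the filesystem probably needs to be remounted (or the caches dropped) after the corruption for the read to reach the disk at all.

```python
import errno


def classify_read(path, block_size=4096):
    """Read `path` block by block.

    Returns ("ok", bytes_read) if every block was readable, or
    ("eio", offset) with the offset at which the read failed with EIO.
    """
    offset = 0
    try:
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    return ("ok", offset)
                offset += len(chunk)
    except OSError as exc:
        if exc.errno == errno.EIO:
            return ("eio", offset)
        raise  # any other error is not part of the expected results
```

An "ok" result on a corrupted raid1 filesystem would suggest that the read was repaired from the other copy; comparing the file contents against a known checksum confirms that the data returned is the original.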


david