Re: rfc: fuzz testing by direct writes to device
On Sun, Sep 02, 2012 at 04:43:48AM -0700, Shentino wrote:
> I assume the same results are expected during a scrub as during a normal read?

yes

> > I've tested this on a 2-disk data/raid1, metadata/raid1 with a running dd over one of the devices continually and using the filesystem. It was slower.
>
> Would I be correct to assume that a core dump on fsck is an automatic bug or is using the old 3.3.8 kernel a taint that would invalidate a report? Note that this is for the btrfs-progs containing the fsck, not the actual kernel side code.

You're talking about fsck and kernel, I'm not quite sure which one you refer to with 'bug'.

fsck can crash when it finds unexpected data in the tree structures, but this would mean that it passed through the checksum verification earlier. This would be a bug, and if it is reproducible on newer kernels a report on 3.3.x does not disqualify it right away.

Same holds for kernel, a data structure inconsistency will most probably lead to a BUG and subsequent crash.

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rfc: fuzz testing by direct writes to device
On Wed, Sep 5, 2012 at 8:04 AM, David Sterba d...@jikos.cz wrote:
> On Sun, Sep 02, 2012 at 04:43:48AM -0700, Shentino wrote:
> > I assume the same results are expected during a scrub as during a normal read?
>
> yes
>
> > > I've tested this on a 2-disk data/raid1, metadata/raid1 with a running dd over one of the devices continually and using the filesystem. It was slower.
> >
> > Would I be correct to assume that a core dump on fsck is an automatic bug or is using the old 3.3.8 kernel a taint that would invalidate a report? Note that this is for the btrfs-progs containing the fsck, not the actual kernel side code.
>
> You're talking about fsck and kernel, I'm not quite sure which one you refer to with 'bug'.
>
> fsck can crash when it finds unexpected data in the tree structures, but this would mean that it passed through the checksum verification earlier. This would be a bug, and if it is reproducible on newer kernels a report on 3.3.x does not disqualify it right away.
>
> Same holds for kernel, a data structure inconsistency will most probably lead to a BUG and subsequent crash.
>
> david

By this I mean running fsck on a btrfs that was trashed by a 3.3.8 kernel bug.
Re: rfc: fuzz testing by direct writes to device
Hi,

On 09/02/2012 03:03 AM, Shentino wrote:
> This whole subject was also about using sed to corrupt-o-magic a file's data on disk.
>
> Is this an acceptable method for testing?

I am not sure that doing

  sed /dev/sdX /dev/sdX ...

is the right thing to do, because it rewrites the full disk. This means that:
- it takes a lot of time
- you don't have any control over which part of the disk you change: what happens if sed writes a block which is updated in parallel by BTRFS?

Anyway, I suggest having a look at the following video [1], which explains the automatic repair. Moreover it shows [2] how to corrupt a block with the btrfs-corrupt-block command.

Hoping that this helps you.

BR
G.Baroncelli

[1] http://www.youtube.com/watch?v=hxWuaozpe2I
[2] See minute 17:52 of the video above

> On Sat, Sep 1, 2012 at 4:49 PM, Michael m...@draftx.net wrote:
> > It should not. It is always preferred that you dd your drive onto another disk just in case though.
> >
> > On Sat, Sep 1, 2012 at 5:31 PM, Shentino shent...@gmail.com wrote:
> > > On Sat, Sep 1, 2012 at 1:59 PM, cwillu cwi...@cwillu.com wrote:
> > > > You still haven't said which kernel you were running; the thing to do is try the very latest rc (if not btrfs-next).
> > >
> > > Sorry about that! I thought I included it.
> > >
> > > 3.3.8
> > >
> > > Hmm...seems it's been EOL'ed. I need to yell at my distro.
> > >
> > > In the meantime, will mounting a btrfs filesystem with a new kernel render it unmountable by older kernels?
Re: rfc: fuzz testing by direct writes to device
On Tue, Sep 4, 2012 at 11:15 AM, Goffredo Baroncelli kreij...@libero.it wrote:
> Hi,
>
> On 09/02/2012 03:03 AM, Shentino wrote:
> > This whole subject was also about using sed to corrupt-o-magic a file's data on disk.
> >
> > Is this an acceptable method for testing?
>
> I am not sure that doing sed /dev/sdX /dev/sdX ... is the right thing to do, because it rewrites the full disk. This means that:
> - it takes a lot of time
> - you don't have any control over which part of the disk you change: what happens if sed writes a block which is updated in parallel by BTRFS?

Which is one reason I used a sha1 hash of a random read as the search key :P

> Anyway, I suggest having a look at the following video [1], which explains the automatic repair. Moreover it shows [2] how to corrupt a block with the btrfs-corrupt-block command.

That does sound more convenient.

> Hoping that this helps you.
>
> BR
> G.Baroncelli
>
> [1] http://www.youtube.com/watch?v=hxWuaozpe2I
> [2] See minute 17:52 of the video above
>
> > On Sat, Sep 1, 2012 at 4:49 PM, Michael m...@draftx.net wrote:
> > > It should not. It is always preferred that you dd your drive onto another disk just in case though.
> > >
> > > On Sat, Sep 1, 2012 at 5:31 PM, Shentino shent...@gmail.com wrote:
> > > > On Sat, Sep 1, 2012 at 1:59 PM, cwillu cwi...@cwillu.com wrote:
> > > > > You still haven't said which kernel you were running; the thing to do is try the very latest rc (if not btrfs-next).
> > > >
> > > > Sorry about that! I thought I included it.
> > > >
> > > > 3.3.8
> > > >
> > > > Hmm...seems it's been EOL'ed. I need to yell at my distro.
> > > >
> > > > In the meantime, will mounting a btrfs filesystem with a new kernel render it unmountable by older kernels?
Re: rfc: fuzz testing by direct writes to device
On 09/05/2012 03:59 AM, Shentino wrote:
> > I am not sure that doing sed /dev/sdX /dev/sdX ... is the right thing to do, because it rewrites the full disk. This means that:
> > - it takes a lot of time
> > - you don't have any control over which part of the disk you change: what happens if sed writes a block which is updated in parallel by BTRFS?
>
> Which is one reason I used a sha1 hash of a random read as the search key :P

This doesn't change. The race would be the following:
1- kernel reads a sector from the disk
2- sed reads a sector from the disk
3- sed writes a sector to the disk (whether it is the same data or updated data doesn't matter)
4- kernel writes an updated sector to the disk

If the data written in steps 3 and 4 differ, the results are unpredictable. Yes, it is a very unlikely case, but it could happen.
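The four-step interleaving described in this message can be modeled with a toy, single-sector "disk" held in memory. This is only a sketch under stated assumptions: the step names, the `run_interleaving` helper, and the 8-byte sector are hypothetical, and it ignores the page cache and everything else a real kernel does. It shows that when the two writers hold different data, the final on-disk content depends purely on which write lands last.

```python
def run_interleaving(steps):
    """Apply the named steps in order to a one-sector toy disk; return the final sector."""
    disk = bytearray(b"old-data")      # the sector as it sits on the device
    sed_buf = None                     # what sed read and will later write back
    kernel_update = b"new-data"        # what the filesystem wants to store
    for step in steps:
        if step == "kernel_read":
            _ = bytes(disk)            # kernel reads the sector (value unused in this model)
        elif step == "sed_read":
            sed_buf = bytes(disk)
        elif step == "sed_write":
            disk[:] = sed_buf          # sed writes back whatever it read earlier
        elif step == "kernel_write":
            disk[:] = kernel_update
        else:
            raise ValueError(f"unknown step: {step}")
    return bytes(disk)


# Order as listed in the message: the kernel's write lands last.
LISTED = ["kernel_read", "sed_read", "sed_write", "kernel_write"]
# Same steps with the two writes swapped: sed's stale copy lands last.
SWAPPED = ["kernel_read", "sed_read", "kernel_write", "sed_write"]

print(run_interleaving(LISTED))   # the kernel's update survives
print(run_interleaving(SWAPPED))  # the kernel's update is silently lost
```

In the swapped order sed did not even change the data it read, yet the filesystem's update is gone, which is why choosing a careful search key does not remove the race.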
Re: rfc: fuzz testing by direct writes to device
On Sat, Sep 1, 2012 at 10:44 PM, David Sterba d...@jikos.cz wrote:
> On Sat, Sep 01, 2012 at 06:03:32PM -0700, Shentino wrote:
> > This whole subject was also about using sed to corrupt-o-magic a file's data on disk.
> >
> > Is this an acceptable method for testing?
>
> Starting with kernels 3.4 the error handling has been improved, namely for the EIO, so it shouldn't take your box down when you hit one. Newer kernels got fixes to the 'transaction abort' cleanup, so it should be possible to umount and mount the filesystem without problems.
>
> The filesystem should survive shooting at blocks, the checksums catch any change (with respect to its strength, i.e. generating a hash collision will lead to crash/abort later).
>
> Expected result for reading blocks after random writes is:
> * EIO for the corrupted block (both data or metadata) provided that there's no other copy
> * transparent and automatic repair from other copies

I assume the same results are expected during a scrub as during a normal read?

> I've tested this on a 2-disk data/raid1, metadata/raid1 with a running dd over one of the devices continually and using the filesystem. It was slower.
>
> david

Would I be correct to assume that a core dump on fsck is an automatic bug or is using the old 3.3.8 kernel a taint that would invalidate a report? Note that this is for the btrfs-progs containing the fsck, not the actual kernel side code.
rfc: fuzz testing by direct writes to device
How effective would it be to directly write to the underlying device and then run tests to see if the corruption is properly detected?

I just ran a fuzz test by syncing, and then manually corrupting a file with the help of a surgical sed (yes, the before and after patterns had fixed equal lengths).

First I got an I/O error (expected), then I ran scrub and got more problems (not ok), the system froze (not good), a reboot failed to mount the system again (worse), and then the fsck program dumped core.
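The equal-length, in-place substitution described here can be sketched in Python against a scratch image file rather than a live device. Everything in this example is an assumption for illustration: the `corrupt_in_place` and `demo` helpers, the marker strings, and the temporary file standing in for the disk. Unlike piping a whole device through sed, it writes only the matched bytes, so the rest of the image is never rewritten.

```python
import os
import tempfile


def corrupt_in_place(path, before, after):
    """Overwrite the first occurrence of `before` with `after`; return its offset, or -1."""
    if len(before) != len(after):
        raise ValueError("patterns must have equal length so no data shifts")
    with open(path, "r+b") as f:
        data = f.read()                # acceptable for a small test image only
        offset = data.find(before)
        if offset < 0:
            return -1
        f.seek(offset)
        f.write(after)                 # touches only the matched bytes
        f.flush()
        os.fsync(f.fileno())           # push the change out of the page cache
    return offset


def demo():
    """Build a tiny scratch image, corrupt one marker, and report what changed."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 4096 + b"MARKER-0123456789" + b"\0" * 4096)
        image = tmp.name
    try:
        size_before = os.path.getsize(image)
        offset = corrupt_in_place(image, b"MARKER-0123456789", b"BROKEN-0123456789")
        with open(image, "rb") as f:
            f.seek(offset)
            patched = f.read(17)
        size_after = os.path.getsize(image)
        return offset, size_before, size_after, patched
    finally:
        os.unlink(image)


if __name__ == "__main__":
    print(demo())
```

Run against an unmounted image or a copy of the device; writing underneath a mounted filesystem reintroduces the concurrency concerns raised later in the thread.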
Re: rfc: fuzz testing by direct writes to device
Also, since the problem prevented me from syncing my other filesystems I couldn't capture the debug info. It vanished during the cold boot still sitting in dirty page cache.

On Fri, Aug 31, 2012 at 11:44 PM, Shentino shent...@gmail.com wrote:
> How effective would it be to directly write to the underlying device and then run tests to see if the corruption is properly detected?
>
> I just ran a fuzz test by syncing, and then manually corrupting a file with the help of a surgical sed (yes, the before and after patterns had fixed equal lengths).
>
> First I got an I/O error (expected), then I ran scrub and got more problems (not ok), the system froze (not good), a reboot failed to mount the system again (worse), and then the fsck program dumped core.
Re: rfc: fuzz testing by direct writes to device
Please make sure you are running a very recent kernel. Btrfs is VERY active and fixes for things like this are going in all the time.

Any related crash errors, kernel oopses, and exact methodology so we can reproduce would be useful. dmesg and uname -a would help us triage this and see what we need to fix.

Mike

On Sat, Sep 1, 2012 at 3:10 AM, Shentino shent...@gmail.com wrote:
> Also, since the problem prevented me from syncing my other filesystems I couldn't capture the debug info. It vanished during the cold boot still sitting in dirty page cache.
>
> On Fri, Aug 31, 2012 at 11:44 PM, Shentino shent...@gmail.com wrote:
> > How effective would it be to directly write to the underlying device and then run tests to see if the corruption is properly detected?
> >
> > I just ran a fuzz test by syncing, and then manually corrupting a file with the help of a surgical sed (yes, the before and after patterns had fixed equal lengths).
> >
> > First I got an I/O error (expected), then I ran scrub and got more problems (not ok), the system froze (not good), a reboot failed to mount the system again (worse), and then the fsck program dumped core.
Re: rfc: fuzz testing by direct writes to device
On Sat, Sep 1, 2012 at 1:59 PM, cwillu cwi...@cwillu.com wrote:
> You still haven't said which kernel you were running; the thing to do is try the very latest rc (if not btrfs-next).

Sorry about that! I thought I included it.

3.3.8

Hmm...seems it's been EOL'ed. I need to yell at my distro.

In the meantime, will mounting a btrfs filesystem with a new kernel render it unmountable by older kernels?
Re: rfc: fuzz testing by direct writes to device
It should not. It is always preferred that you dd your drive onto another disk just in case though.

On Sat, Sep 1, 2012 at 5:31 PM, Shentino shent...@gmail.com wrote:
> On Sat, Sep 1, 2012 at 1:59 PM, cwillu cwi...@cwillu.com wrote:
> > You still haven't said which kernel you were running; the thing to do is try the very latest rc (if not btrfs-next).
>
> Sorry about that! I thought I included it.
>
> 3.3.8
>
> Hmm...seems it's been EOL'ed. I need to yell at my distro.
>
> In the meantime, will mounting a btrfs filesystem with a new kernel render it unmountable by older kernels?
Re: rfc: fuzz testing by direct writes to device
This whole subject was also about using sed to corrupt-o-magic a file's data on disk.

Is this an acceptable method for testing?

On Sat, Sep 1, 2012 at 4:49 PM, Michael m...@draftx.net wrote:
> It should not. It is always preferred that you dd your drive onto another disk just in case though.
>
> On Sat, Sep 1, 2012 at 5:31 PM, Shentino shent...@gmail.com wrote:
> > On Sat, Sep 1, 2012 at 1:59 PM, cwillu cwi...@cwillu.com wrote:
> > > You still haven't said which kernel you were running; the thing to do is try the very latest rc (if not btrfs-next).
> >
> > Sorry about that! I thought I included it.
> >
> > 3.3.8
> >
> > Hmm...seems it's been EOL'ed. I need to yell at my distro.
> >
> > In the meantime, will mounting a btrfs filesystem with a new kernel render it unmountable by older kernels?
Re: rfc: fuzz testing by direct writes to device
On Sat, Sep 01, 2012 at 06:03:32PM -0700, Shentino wrote:
> This whole subject was also about using sed to corrupt-o-magic a file's data on disk.
>
> Is this an acceptable method for testing?

Starting with kernels 3.4 the error handling has been improved, namely for the EIO, so it shouldn't take your box down when you hit one. Newer kernels got fixes to the 'transaction abort' cleanup, so it should be possible to umount and mount the filesystem without problems.

The filesystem should survive shooting at blocks, the checksums catch any change (with respect to its strength, i.e. generating a hash collision will lead to crash/abort later).

Expected result for reading blocks after random writes is:
* EIO for the corrupted block (both data or metadata) provided that there's no other copy
* transparent and automatic repair from other copies

I've tested this on a 2-disk data/raid1, metadata/raid1 with a running dd over one of the devices continually and using the filesystem. It was slower.

david
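The two expected outcomes listed in this message (EIO when no good copy exists, transparent repair when one does) can be illustrated with a toy mirrored block store. This is a minimal sketch, not btrfs code: the `MirroredStore` class and its methods are hypothetical, and `zlib.crc32` is used as a stand-in checksum (btrfs's default checksum is crc32c, which uses a different polynomial).

```python
import errno
import zlib


class MirroredStore:
    """Toy block store: each block is kept in N copies plus one stored checksum."""

    def __init__(self, copies=2):
        self.copies = [dict() for _ in range(copies)]  # one dict per mirror
        self.csums = {}                                # block number -> checksum

    def write(self, blockno, data):
        self.csums[blockno] = zlib.crc32(data)
        for copy in self.copies:
            copy[blockno] = data

    def read(self, blockno):
        want = self.csums[blockno]
        good = None
        for copy in self.copies:
            if zlib.crc32(copy[blockno]) == want:
                good = copy[blockno]
                break
        if good is None:
            # No copy matches its checksum: report an I/O error to the reader.
            raise OSError(errno.EIO, "checksum mismatch on every copy")
        # Rewrite any mirror that disagrees with the verified copy.
        for copy in self.copies:
            if copy[blockno] != good:
                copy[blockno] = good
        return good


store = MirroredStore()
store.write(0, b"hello")
store.copies[0][0] = b"hellp"   # simulate a stray write to one mirror
print(store.read(0))            # served from the intact mirror, bad copy repaired
```

Corrupting both mirrors of the same block makes `read` raise `OSError` with `errno.EIO`, matching the first bullet; with a single-copy profile that is the only possible outcome.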