Re: [PATCH] fstests: generic: Test SHARED flag about fiemap ioctl before and after sync
On Fri, May 13, 2016 at 09:52:52AM +0800, Qu Wenruo wrote:
> The test case will check SHARED flag returned by fiemap ioctl on
> reflinked files before and after sync.
>
> Normally SHARED flag won't change just due to a normal sync operation.
>
> But btrfs doesn't handle SHARED flag well, and this time it won't check
> any delayed extent tree (reverse extent searching tree) modification, but
> only metadata already committed to disk.
>
> So btrfs will not return correct SHARED flag on reflinked files if there
> is no sync to commit all metadata.
>
> This testcase will just check it.
>
> Signed-off-by: Qu Wenruo
> --
> And of course, xfs handles it quite well. Nice work Darrick.
> Also the test case needs the new infrastructure introduced in previous
> generic/352 test case.
> ---
>  tests/generic/353     | 86 +++
>  tests/generic/353.out |  9 ++
>  tests/generic/group   |  1 +
>  3 files changed, 96 insertions(+)
>  create mode 100755 tests/generic/353
>  create mode 100644 tests/generic/353.out
>
> diff --git a/tests/generic/353 b/tests/generic/353
> new file mode 100755
> index 000..1e9117e
> --- /dev/null
> +++ b/tests/generic/353
> @@ -0,0 +1,86 @@
> +#! /bin/bash
> +# FS QA Test 353
> +#
> +# Check if fiemap ioctl returns correct SHARED flag on reflinked file
> +# before and after sync the fs
> +#
> +# Btrfs has a bug in checking shared extent, which can only handle metadata
> +# already committed to disk, but not delayed extent tree modification.
> +# This caused SHARED flag only occurs after sync.

I noticed this a while ago, but figured it was just btrfs being btrfs.
Ho hum.  Thanks for writing a test and getting the problem fixed.

> +#
> +#---
> +# Copyright (c) 2016 Fujitsu. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*

tmp isn't used for anything in this testcase.

> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/reflink
> +. ./common/punch
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch_reflink
> +_require_fiemap
> +
> +_scratch_mkfs > /dev/null 2>&1
> +_scratch_mount
> +
> +blocksize=64k
> +file1="$SCRATCH_MNT/file1"
> +file2="$SCRATCH_MNT/file2"
> +
> +# write the initial file
> +_pwrite_byte 0xcdcdcdcd 0 $blocksize $file1 | _filter_xfs_io
> +
> +# reflink initial file
> +_reflink_range $file1 0 $file2 0 $blocksize | _filter_xfs_io
> +
> +# check their fiemap to make sure it's correct
> +$XFS_IO_PROG -c "fiemap -v" $file1 | _filter_fiemap_flags
> +$XFS_IO_PROG -c "fiemap -v" $file2 | _filter_fiemap_flags
> +
> +# sync and recheck, to make sure the fiemap doesn't change just
> +# due to sync
> +sync
> +$XFS_IO_PROG -c "fiemap -v" $file1 | _filter_fiemap_flags
> +$XFS_IO_PROG -c "fiemap -v" $file2 | _filter_fiemap_flags

Nowadays, when I write a test that prints similar output one after the
other I will also write a comment to the output to distinguish the two
cases, e.g.:

+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
fiemap before sync
+0: [0..127]: 0x2001
+0: [0..127]: 0x2001
fiemap after sync
+0: [0..127]: 0x2001
+0: [0..127]: 0x2001

This way when a bunch of tests regress some months later it's easier for
me to relearn what's going on.  (I wasn't always good at doing that.)

Otherwise, everything looks ok.

--D

> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/353.out b/tests/generic/353.out
> new file mode 100644
> index 000..0cd8981
> --- /dev/null
> +++ b/tests/generic/353.out
> @@ -0,0 +1,9 @@
> +QA output created by 353
> +wrote 65536/65536 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +linked 65536/65536 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX
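The labelling suggestion in the review above can be demonstrated with a small self-contained sketch. `run_fiemap` here is a hypothetical stand-in for the real `$XFS_IO_PROG -c "fiemap -v" | _filter_fiemap_flags` pipeline; the point is only that echoing a label before each repeated section makes the golden output self-describing.

```shell
#!/bin/sh
# Sketch of the review suggestion: when a test prints the same filtered
# output more than once, echo a short label before each section so the
# golden .out file explains itself when a regression shows up later.
# run_fiemap is an illustrative stand-in for the xfs_io fiemap pipeline.
run_fiemap() {
    echo "0: [0..127]: 0x2001"
}

golden=$(
    echo "fiemap before sync"
    run_fiemap
    run_fiemap
    echo "fiemap after sync"
    run_fiemap
    run_fiemap
)
printf '%s\n' "$golden"
```

A diff against this golden output immediately shows whether the before-sync or the after-sync section regressed, without re-reading the test.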
Re: BTRFS Data at Rest File Corruption
Chris,

See notes inline.

On Thu, 2016-05-12 at 19:41 -0600, Chris Murphy wrote:
> On Thu, May 12, 2016 at 11:49 AM, Richard A. Lochner wrote:
> > I suspected, and I still suspect that the error occurred upon a
> > metadata update that corrupted the checksum for the file, probably due
> > to silent memory corruption. If the checksum was silently corrupted,
> > it would be simply written to both drives causing this type of error.
>
> Metadata is checksummed independently of data. So if the data isn't
> updated, its checksum doesn't change, only metadata checksum is
> changed.
>
> > btrfs dmesg(s):
> >
> > [16510.334020] BTRFS warning (device sdb1): checksum error at logical
> > 3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
> > 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
> > [16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd
> > 0, flush 0, corrupt 5, gen 0
> > [16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
> > error at logical 3037444042752 on dev /dev/sdb1
> >
> > [17606.978439] BTRFS warning (device sdb1): checksum error at logical
> > 3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
> > 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
> > [17606.978460] BTRFS error (device sdb1): bdev /dev/sdc1 errs: wr 0, rd
> > 13, flush 0, corrupt 4, gen 0
> > [17606.989497] BTRFS error (device sdb1): unable to fixup (regular)
> > error at logical 3037444042752 on dev /dev/sdc1
>
> This is confusing. Are these the same boot? The later time has a lower
> corrupt count. Can you just 'dd if=sda4.img of=/dev/null' and report
> all (new) messages in dmesg? It seems to me there should be pretty
> much all the same monotonic-time for the problem with both devices.

My apologies, they were from different boots.
After the dd, I get these:

[109479.550836] BTRFS warning (device sdb1): csum failed ino 1437377 off 75754369024 csum 1689728329 expected csum 2165338402
[109479.596626] BTRFS warning (device sdb1): csum failed ino 1437377 off 75754369024 csum 1689728329 expected csum 2165338402
[109479.601969] BTRFS warning (device sdb1): csum failed ino 1437377 off 75754369024 csum 1689728329 expected csum 2165338402
[109479.602189] BTRFS warning (device sdb1): csum failed ino 1437377 off 75754369024 csum 1689728329 expected csum 2165338402
[109479.602323] BTRFS warning (device sdb1): csum failed ino 1437377 off 75754369024 csum 1689728329 expected csum 2165338402

> Also what do you get for these for each device:
>
> smartctl scterc -l /dev/sdX
> cat /sys/block/sdX/device/timeout

# smartctl -l scterc /dev/sdb
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.8-300.fc23.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

# smartctl -l scterc /dev/sdc
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.8-300.fc23.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

# cat /sys/block/sdb/device/timeout
30
# cat /sys/block/sdc/device/timeout
30

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/3] btrfs-progs: autogen: Don't show success message on fail
When autogen.sh fails, the success message is still printed:

# ./autogen.sh
...
configure.ac:131: error: possibly undefined macro: PKG_CHECK_VAR
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.

    Now type './configure' and 'make' to compile.
#

Fix this by checking the return value of autoconf.

After the patch:

# ./autogen.sh
...
configure.ac:132: error: possibly undefined macro: PKG_CHECK_VAR
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
#

Signed-off-by: Zhao Lei
---
 autogen.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/autogen.sh b/autogen.sh
index 8b9a9cb..a5f9af2 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -64,9 +64,10 @@ echo "    automake: $(automake --version | head -1)"
 chmod +x version.sh
 rm -rf autom4te.cache
 
-aclocal $AL_OPTS
-autoconf $AC_OPTS
-autoheader $AH_OPTS
+aclocal $AL_OPTS &&
+autoconf $AC_OPTS &&
+autoheader $AH_OPTS ||
+exit 1
 
 # it's better to use helper files from automake installation than
 # maintain copies in git tree
-- 
1.8.5.1
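The control-flow pattern the patch relies on can be shown with a tiny self-contained sketch. The `step_ok`/`step_fail` functions are illustrative stand-ins for aclocal/autoconf/autoheader: chaining with `&&` means a failing step skips the rest, and the trailing `|| exit 1` propagates the failure instead of falling through to the success message.

```shell
#!/bin/sh
# Sketch of the patch's "cmd && cmd && cmd || exit 1" pattern.
# step_ok/step_fail stand in for the autotools invocations.
step_ok()   { echo "ran $1"; return 0; }
step_fail() { echo "failing $1" >&2; return 1; }

# The middle step fails, so the last step is skipped and the
# subshell exits 1 instead of reaching any success message.
( step_ok aclocal &&
  step_fail autoconf &&
  step_ok autoheader ||
  exit 1 )
echo "subshell exit code: $?"
```

An alternative with the same effect is running the script under `set -e`, but the explicit chain keeps the failure handling local to these three commands.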
[PATCH v2 2/3] btrfs-progs: autogen: Make build success in CentOS 6 and 7
The btrfs-progs build fails on CentOS 6 and 7:

# ./autogen.sh
...
configure.ac:131: error: possibly undefined macro: PKG_CHECK_VAR
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
...

It seems PKG_CHECK_VAR is new in pkgconfig 0.28 (24-Jan-2013):
http://redmine.audacious-media-player.org/boards/1/topics/736

And the newest version available for CentOS 7 in the yum repo and on
rpmfind.net is:
pkgconfig-0.27.1-4.el7
http://rpmfind.net/linux/rpm2html/search.php?query=pkgconfig=Search+...=centos=

I updated my pkgconfig to 0.30, but it still failed with the above error.
(Maybe it is a problem with my setup.)

To let users on CentOS 6 and 7 build btrfs-progs without further changes,
we can avoid using PKG_CHECK_VAR in the following way, found in:
https://github.com/audacious-media-player/audacious-plugins/commit/f95ab6f939ecf0d9232b3165f9241d2ea9676b9e

Changelog v1->v2:
1: Add AC_SUBST(UDEVDIR)
   Suggested by: Jeff Mahoney

Signed-off-by: Zhao Lei
---
 configure.ac | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 4688bc7..c79472c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -128,7 +128,8 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid])
 PKG_CHECK_MODULES(ZLIB, [zlib])
 PKG_STATIC(ZLIB_LIBS_STATIC, [zlib])
 
-PKG_CHECK_VAR([UDEVDIR], [udev], [udevdir])
+UDEVDIR="$(pkg-config udev --variable=udevdir)"
+AC_SUBST(UDEVDIR)
 
 dnl lzo library does not provide pkg-config, let use classic way
 AC_CHECK_LIB([lzo2], [lzo_version], [
-- 
1.8.5.1
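The replacement the patch uses can be tried directly from a shell. This is a hedged sketch, not btrfs-progs code: the `/lib/udev` fallback value is illustrative (the real configure.ac above has no fallback), and it shows why the direct `pkg-config --variable` query works even with pkg-config versions older than 0.28, which lack the PKG_CHECK_VAR macro.

```shell
#!/bin/sh
# Query the udev rules directory the same way the patched configure.ac
# does, instead of relying on the PKG_CHECK_VAR macro (pkg-config >= 0.28).
# The /lib/udev default is an illustrative fallback for systems where
# pkg-config or the udev.pc file is missing.
UDEVDIR="$(pkg-config udev --variable=udevdir 2>/dev/null)"
[ -n "$UDEVDIR" ] || UDEVDIR=/lib/udev   # fallback, assumption for this sketch
echo "udev rules dir: $UDEVDIR"
```

Because the query is plain shell, it behaves identically on CentOS 6/7 and on newer distributions; `AC_SUBST(UDEVDIR)` then exports the result to the Makefiles just as PKG_CHECK_VAR would have.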
[PATCH v2 0/3] btrfs-progs: autogen: Some compatibility fixes
Changelog v1->v2:
1: btrfs-progs: autogen: Make build success in CentOS 6 and 7
   Add AC_SUBST(UDEVDIR), suggested by: Jeff Mahoney

Zhao Lei (3):
  btrfs-progs: autogen: Avoid chdir fail on dirname with blank
  btrfs-progs: autogen: Make build success in CentOS 6 and 7
  btrfs-progs: autogen: Don't show success message on fail

 autogen.sh   | 9 +++++----
 configure.ac | 3 ++-
 2 files changed, 7 insertions(+), 5 deletions(-)

-- 
1.8.5.1
[PATCH v2 1/3] btrfs-progs: autogen: Avoid chdir fail on dirname with blank
If the source is put in a directory with blanks, such as:
/var/lib/jenkins/workspace/btrfs progs

autogen will fail:
./autogen.sh: line 95: cd: /var/lib/jenkins/workspace/btrfs: No such file or directory

This can be fixed by adding quotes to the cd command.

Signed-off-by: Zhao Lei
---
 autogen.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/autogen.sh b/autogen.sh
index 9669850..8b9a9cb 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -92,7 +92,7 @@ find_autofile config.guess
 find_autofile config.sub
 find_autofile install-sh
 
-cd $THEDIR
+cd "$THEDIR"
 
 echo
 echo "Now type '$srcdir/configure' and 'make' to compile."
-- 
1.8.5.1
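The bug class this one-character patch fixes is easy to reproduce in isolation. A minimal sketch (the directory name below is made up for illustration): an unquoted `$THEDIR` is word-split by the shell, so `cd` only sees the first component of a path containing a blank, exactly as in the Jenkins error above.

```shell
#!/bin/sh
# Demonstrates why "cd $THEDIR" fails on paths with blanks while
# cd "$THEDIR" works. The directory name is illustrative.
THEDIR="${TMPDIR:-/tmp}/btrfs progs demo.$$"
mkdir -p "$THEDIR"

unquoted_cd() { cd $1 2>/dev/null; }   # broken: $1 is split at the space
quoted_cd()   { cd "$1"; }             # fixed: path stays one word

( unquoted_cd "$THEDIR" ) || echo "unquoted cd failed, as in the bug report"
( quoted_cd "$THEDIR" )   && echo "quoted cd succeeded"

rmdir "$THEDIR"
```

The same reasoning applies to every unquoted variable expansion that can hold a user-controlled path, which is why shellcheck-style linters flag them wholesale.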
Re: About in-band dedupe for v4.7
Wang Shilong wrote on 2016/05/13 11:13 +0800:
> Hello Guys,
>
> I am commenting not because of Qu's patch; of course, Qu and Mark Fasheh
> do a really good thing for Btrfs contributions. Just my two cents!
>
> 1) I think currently we should really focus on Btrfs stability; there
> are still many bugs hidden inside Btrfs, please see Filipe's fixes
> flying around here and there. Unfortunately, I have not seen Btrfs's bug
> count decreasing over these two years, even though we have frozen new
> features. We were adopting Btrfs internally some time ago, but it was
> really unstable, unfortunately. So why not sit down and make it stable
> first?

Makes sense.

Although maybe beyond your expectation, in-band dedupe did expose some
existing bugs we had never found before. And they are all reproducible
without in-band dedupe.

Some examples:

1) fiemap bugs
Not only one, but at least two. See the recently submitted generic/352
and 353. And the fix is already under testing and may come out soon.

2) inode->outstanding_extents problems
Currently we use a SZ_128M hard-coded max extent length for
non-compressed file extents. But if we change the limit to a smaller
one, for example 128K, we will have outstanding-extents leaks and tons
of warnings. Although it won't affect the current code, it's still
better to fix it. And we're already testing the fix.

3) Slow backref walk
Already noted in a comment in backref.c from ancient days, but we didn't
pay much attention to it until the in-band dedupe/heavy reflink
workloads.

Even if not that obvious, we are doing our best to stabilize btrfs
during the push of in-band dedupe. (While a nitpicking jerk will never
see this)

But you are still quite right on this case, we may be in a rush to push
it.

> 2) I am not against new features, but for a new feature I think we
> should be really careful now, especially if it affects the normal
> write/read path. I think the following things can be done to make
> things better:

Although the effect on the normal routines is kept to a minimum, you're
still right: we lack the overall documentation to explain the design,
which tries to reduce the impact on the existing write routine.

> -> Give your design and documents (maybe put them on the wiki or
> somewhere else) so that other guys can really review your design
> instead of looking at a bunch of code first. And we really need to
> understand the pros and cons of the design; also, if there are TODOs,
> please do clarify how we can solve the problem (or whether it is
> possible at all).

Right, already planned before, but always busy with other fixes. The
situation will change dramatically in the coming two months; as the
modification to the patchset is already minimal, we have time to
create/improve the documentation now. Thanks for the reminder.

> -> We need really a lot of testing and benchmarking if it affects the
> normal write/read path.
>
> -> I think Chris is nice to merge patches, but I really argue next
> time, if we want to merge new features, please make sure at least two
> other guys review the patches.
>
> Thank you!
> Shilong

Thanks for your suggestions, really helps a lot!

Thanks,
Qu
Re: About in-band dedupe for v4.7
Hello Guys,

I am commenting not because of Qu's patch; of course, Qu and Mark Fasheh
do a really good thing for Btrfs contributions. Just my two cents!

1) I think currently we should really focus on Btrfs stability; there
are still many bugs hidden inside Btrfs, please see Filipe's fixes
flying around here and there. Unfortunately, I have not seen Btrfs's bug
count decreasing over these two years, even though we have frozen new
features. We were adopting Btrfs internally some time ago, but it was
really unstable, unfortunately. So why not sit down and make it stable
first?

2) I am not against new features, but for a new feature I think we
should be really careful now, especially if it affects the normal
write/read path. I think the following things can be done to make things
better:

-> Give your design and documents (maybe put them on the wiki or
somewhere else) so that other guys can really review your design instead
of looking at a bunch of code first. And we really need to understand
the pros and cons of the design; also, if there are TODOs, please do
clarify how we can solve the problem (or whether it is possible at all).

-> We need really a lot of testing and benchmarking if it affects the
normal write/read path.

-> I think Chris is nice to merge patches, but I really argue next time,
if we want to merge new features, please make sure at least two other
guys review the patches.

Thank you!
Shilong

On Wed, May 11, 2016 at 6:11 AM, Mark Fasheh wrote:
> On Tue, May 10, 2016 at 03:19:52PM +0800, Qu Wenruo wrote:
>> Hi, Chris, Josef and David,
>>
>> As merge window for v4.7 is coming, it would be good to hear your
>> ideas about the inband dedupe.
>>
>> We are addressing the ENOSPC problem which Josef pointed out, and we
>> believe the final fix patch would come out at the beginning of the
>> merge window. (Next week)
>
> How about the fiemap performance problem you referenced before?
> My guess is that it happens because you don't coalesce writes into
> anything larger than a page so you're stuck deduping at some silly size
> like 4k. This in turn fragments the files so much that fiemap has a
> hard time walking backrefs.
>
> I have to check the patches to be sure but perhaps you can tell me
> whether my hunch is correct or not.
>
> In fact, I actually asked privately for time to review your dedupe
> patches, but I've been literally so busy cleaning up after the mess you
> left in your last qgroups rewrite I haven't had time.
>
> You literally broke qgroups in almost every spot that matters. In some
> cases (drop_snapshot) you tore out working code and left in a /* TODO */
> comment for someone else to complete. Snapshot create was so trivially
> and completely broken by your changes that weeks later, I'm still
> hunting a solution which doesn't involve adding an extra _commit_ to our
> commit. This is a MASSIVE regression from where we were before.
>
> IMHO, you should not be trusted with large features or rewrites until
> you can demonstrate:
>
> - A willingness to *completely* solve the problem you are trying to
>   'fix', not do half the job which someone else will have to complete
>   for you.
>
> - Actual testing. The snapshot bug I reference above exists purely
>   because nobody created a snapshot inside of one and checked the
>   qgroup numbers!
>
> Sorry to be so harsh.
>    --Mark
>
> --
> Mark Fasheh
[PATCH] fstests: generic: Test SHARED flag about fiemap ioctl before and after sync
The test case will check the SHARED flag returned by the fiemap ioctl on
reflinked files before and after sync.

Normally the SHARED flag won't change just due to a normal sync
operation.

But btrfs doesn't handle the SHARED flag well: it only checks metadata
already committed to disk, not modifications still in the delayed extent
tree (the reverse extent search tree).

So btrfs will not return a correct SHARED flag on reflinked files unless
a sync commits all metadata.

This test case checks exactly that.

Signed-off-by: Qu Wenruo
--
And of course, xfs handles it quite well. Nice work Darrick.

Also the test case needs the new infrastructure introduced in the
previous generic/352 test case.
---
 tests/generic/353     | 86 +++
 tests/generic/353.out |  9 ++
 tests/generic/group   |  1 +
 3 files changed, 96 insertions(+)
 create mode 100755 tests/generic/353
 create mode 100644 tests/generic/353.out

diff --git a/tests/generic/353 b/tests/generic/353
new file mode 100755
index 000..1e9117e
--- /dev/null
+++ b/tests/generic/353
@@ -0,0 +1,86 @@
+#! /bin/bash
+# FS QA Test 353
+#
+# Check if fiemap ioctl returns correct SHARED flag on reflinked file
+# before and after syncing the fs
+#
+# Btrfs has a bug in checking shared extents: it can only handle metadata
+# already committed to disk, but not delayed extent tree modifications.
+# This causes the SHARED flag to appear only after sync.
+#
+#---
+# Copyright (c) 2016 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+. ./common/punch
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch_reflink
+_require_fiemap
+
+_scratch_mkfs > /dev/null 2>&1
+_scratch_mount
+
+blocksize=64k
+file1="$SCRATCH_MNT/file1"
+file2="$SCRATCH_MNT/file2"
+
+# write the initial file
+_pwrite_byte 0xcdcdcdcd 0 $blocksize $file1 | _filter_xfs_io
+
+# reflink initial file
+_reflink_range $file1 0 $file2 0 $blocksize | _filter_xfs_io
+
+# check their fiemap to make sure it's correct
+$XFS_IO_PROG -c "fiemap -v" $file1 | _filter_fiemap_flags
+$XFS_IO_PROG -c "fiemap -v" $file2 | _filter_fiemap_flags
+
+# sync and recheck, to make sure the fiemap doesn't change just
+# due to sync
+sync
+$XFS_IO_PROG -c "fiemap -v" $file1 | _filter_fiemap_flags
+$XFS_IO_PROG -c "fiemap -v" $file2 | _filter_fiemap_flags
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/353.out b/tests/generic/353.out
new file mode 100644
index 000..0cd8981
--- /dev/null
+++ b/tests/generic/353.out
@@ -0,0 +1,9 @@
+QA output created by 353
+wrote 65536/65536 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+linked 65536/65536 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+0: [0..127]: 0x2001
+0: [0..127]: 0x2001
+0: [0..127]: 0x2001
+0: [0..127]: 0x2001
diff --git a/tests/generic/group b/tests/generic/group
index 3f00386..0392d4d 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -355,3 +355,4 @@
 350 blockdev quick rw
 351 blockdev quick rw
 352 auto clone
+353 auto quick clone
-- 
2.5.5
Re: BTRFS Data at Rest File Corruption
On Thu, May 12, 2016 at 11:49 AM, Richard A. Lochner wrote:
> I suspected, and I still suspect that the error occurred upon a
> metadata update that corrupted the checksum for the file, probably due
> to silent memory corruption. If the checksum was silently corrupted,
> it would be simply written to both drives causing this type of error.

Metadata is checksummed independently of data. So if the data isn't
updated, its checksum doesn't change, only metadata checksum is changed.

> btrfs dmesg(s):
>
> [16510.334020] BTRFS warning (device sdb1): checksum error at logical
> 3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
> 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
> [16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd
> 0, flush 0, corrupt 5, gen 0
> [16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
> error at logical 3037444042752 on dev /dev/sdb1
>
> [17606.978439] BTRFS warning (device sdb1): checksum error at logical
> 3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
> 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
> [17606.978460] BTRFS error (device sdb1): bdev /dev/sdc1 errs: wr 0, rd
> 13, flush 0, corrupt 4, gen 0
> [17606.989497] BTRFS error (device sdb1): unable to fixup (regular)
> error at logical 3037444042752 on dev /dev/sdc1

This is confusing. Are these the same boot? The later time has a lower
corrupt count. Can you just 'dd if=sda4.img of=/dev/null' and report all
(new) messages in dmesg? It seems to me there should be pretty much all
the same monotonic-time for the problem with both devices.

Also what do you get for these for each device:

smartctl scterc -l /dev/sdX
cat /sys/block/sdX/device/timeout

--
Chris Murphy
Re: BTRFS Data at Rest File Corruption
Austin, Ah, the idea of rewriting the "bad" data block is very interesting. I had not thought of that. Interestingly, the corrupted file is a raw backup image of a btrfs file system partition. I can mount it as a loop device. I suppose I could rewrite that data block, mount it and run a scrub on that mounted loop device to find out if it is truly fixed. I should also mention that this data is not critical to me. I only brought this issue up because I thought it might be of interest. I can think of ways to protect against most manifestations of this type of error (since metadata is checksummed in btrfs), but I cannot argue that it would be worth the development effort, increased code complexity or the additional cpu cycles required to implement such a "defensive" algorithm for an "edge case" like this. Even with a defensive algorithm, these errors could still occur, but I believe you could shrink the time window in which they could occur enough to significantly reduce their probability. That said, I happen to have experienced this particular error twice (over a period of about 7 months) with btrfs on this system. I do believe that both were due to memory errors and I plan to upgrade soon to a Haswell system with ECC memory because of this. However, I wonder if my "commodity hardware" is that unique? In any event, thank you very much for your time and insight. Rick Lochner On Thu, 2016-05-12 at 14:29 -0400, Austin S. Hemmelgarn wrote: > On 2016-05-12 13:49, Richard A. Lochner wrote: > > > > Austin, > > > > I rebooted the computer and reran the scrub to no avail. The error > > is > > consistent. > > > > The reason I brought this question to the mailing list is because > > it > > seemed like a situation that might be of interest to the > > developers. > > Perhaps, there might be a way to "defend" against this type of > > corruption. 
> >
> > I suspected, and I still suspect that the error occurred upon a
> > metadata update that corrupted the checksum for the file, probably
> > due to silent memory corruption. If the checksum was silently
> > corrupted, it would be simply written to both drives causing this
> > type of error.
>
> That does seem to be the most likely cause, and sadly, is not something
> any filesystem can reliably protect against on any commodity hardware.
>
> >
> > With that in mind, I proved (see below) that the data blocks match on
> > both mirrors. This I expected, since the data blocks should not have
> > been touched, as the file has not been written.
> >
> > This is the sequence of events as I see them that I think might be of
> > interest to the developers.
> >
> > 1. A block containing a checksum for the file was read into memory.
> > The block read would have been checksummed, so the checksum for the
> > file must have been good at that moment.
>
> It's worth noting that BTRFS doesn't verify all the checksums in a
> metadata block when it loads that metadata block; only the ones for the
> reads that triggered the metadata block being loaded get verified.
>
> >
> > 2. The checksum block was then altered in memory (perhaps to add or
> > change a value).
> >
> > 3. A new checksum would then have been calculated for the checksum
> > block.
> >
> > 4. The checksum block would have been written to both mirrors.
> >
> > Presumably, in the case that I am experiencing, an undetected memory
> > error must have occurred after 1 and before step 3 was completed.
> >
> > I wonder if there is a way to correct or detect that situation.
> The closest we could get is to provide an option to handle this in
> scrub, preferably with a big scary warning on it, as this same
> situation can easily be caused by someone modifying the disks
> themselves (we can't reasonably protect against that, but we shouldn't
> make it trivial for people to inject arbitrary data that way either).
>
> >
> > As I stated previously, the machine on which this occurred does not
> > have ECC memory; however, I would not think that the majority of
> > users running btrfs do either. If it has happened to me, it likely
> > has happened to others.
> >
> > Rick Lochner
> >
> > btrfs dmesg(s):
> >
> > [16510.334020] BTRFS warning (device sdb1): checksum error at logical
> > 3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
> > 1437377, offset 75754369024, length 4096, links 1 (path:
> > Rick/sda4.img)
> > [16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0,
> > rd 0, flush 0, corrupt 5, gen 0
> > [16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
> > error at logical 3037444042752 on dev /dev/sdb1
> >
> > [17606.978439] BTRFS warning (device sdb1): checksum error at logical
> > 3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
> > 1437377, offset 75754369024, length 4096, links 1 (path:
> > Rick/sda4.img)
> > [17606.978460] BTRFS error (device sdb1): bdev
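The four-step scenario in the email above (checksum block read, silently altered in memory, re-checksummed over the corrupted contents, written to both mirrors) can be modelled with a toy shell sketch. This is purely illustrative, not btrfs code: it uses `cksum` in place of btrfs's CRC32C and a single integer in place of a checksum block, but it shows why scrub finds a self-consistent metadata block on both mirrors whose data checksum no longer matches the data, and therefore has nothing to "fix up" from.

```shell
#!/bin/sh
# Toy model of the window described above. Assumption: a one-value
# "checksum block" and cksum stand in for real btrfs metadata and CRC32C.
data="hello world"
good_csum=$(printf '%s' "$data" | cksum | cut -d' ' -f1)

# step 2: a memory error flips a bit in the in-memory csum value
bad_csum=$((good_csum ^ 1))

# steps 3-4: the block is re-checksummed over the corrupted contents and
# the now self-consistent block goes to both mirrors identically
mirror1=$bad_csum
mirror2=$bad_csum

actual=$(printf '%s' "$data" | cksum | cut -d' ' -f1)
[ "$mirror1" = "$mirror2" ]      && echo "mirrors agree (nothing to repair from)"
[ "$actual" != "$mirror1" ]      && echo "stored csum mismatches the data on both copies"
```

Since both copies carry the same wrong value, the "unable to fixup" messages in the dmesg output above are exactly what this model predicts.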
Re: BTRFS Data at Rest File Corruption
On 2016-05-12 20:29, Austin S. Hemmelgarn wrote:
>> I wonder if there is a way to correct or detect that situation.
> The closest we could get is to provide an option to handle this in
> scrub, preferably with a big scary warning on it, as this same
> situation can easily be caused by someone modifying the disks
> themselves (we can't reasonably protect against that, but we shouldn't
> make it trivial for people to inject arbitrary data that way either).

"btrfs check" has the option "--init-csum-tree"...

Anyway, there should exist an option to recalculate the checksum for a
single file. BTRFS is good at highlighting that a file is corrupted, but
it should also offer the possibility to read it anyway: in some cases it
is better to have a corrupted file (knowing that it is corrupted) than
to lose the whole file.

BR
G.Baroncelli

--
gpg @keyserver.linux.it: Goffredo Baroncelli
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: [PATCH 0/1] btrfs-progs: Typo review of strings and comments
On 12 May 2016 at 07:29, David Sterba wrote:
> On Wed, May 11, 2016 at 07:50:35PM -0400, Nicholas D Steeves wrote:
>> There were a couple of instances where I wasn't sure what to do; I've
>> annotated them, and they are long lines now. To find them, search or
>> grep the diff for 'Steeves'.
>
> Found and updated, most of them real typos; 'strtoull' is the name of a C
> library function.

Hi David,

Thank you for reviewing this patch. Sorry for missing the context of the strtoull comment; I should have been able to infer that, and am embarrassed that I failed to. Also embarrassed because I think I've used it in some C++ code! I learned how to use git rebase and git reset today, and can submit a v2 patch diffed against master at your earliest convenience. My only remaining question is this:

mkfs.c: printf("Incompatible features: %s", features_buf)
* Should this be left as "Imcompat features"?

Regards,
Nicholas
Re: About in-band dedupe for v4.7
On Wed, May 11, 2016 at 07:36:59PM +0200, David Sterba wrote:
> On Tue, May 10, 2016 at 07:52:11PM -0700, Mark Fasheh wrote:
> > Taking your history with qgroups out of this btw, my opinion does not
> > change.
> >
> > With respect to in-memory only dedupe, it is my honest opinion that such
> > a limited feature is not worth the extra maintenance work. In particular
> > there's about 800 lines of code in the userspace patches which I'm sure
> > you'd want merged, because how could we test this then?
>
> I like the in-memory dedup backend. It's lightweight, only a heuristic,
> does not need any IO or persistent storage. OTOH I consider it a subpart
> of the in-band deduplication that does all the persistency etc. So I
> treat the ioctl interface from a broader aspect.

Those are all nice qualities, but what do they all get us?

For example, my 'large' duperemove test involves about 750 gigabytes of general purpose data - quite literally /home off my workstation. After the run I'm usually seeing between 65-75 gigabytes saved for a total of only 10% duplicated data. I would expect this to be fairly 'average' - /home on my machine has the usual stuff - documents, source code, media, etc. So if you were writing your whole fs out you could expect about the same from inline dedupe - 10%-ish. Let's be generous and go with that number though as a general 'this is how much dedupe we get'.

What the memory backend is doing then is providing a cache of sha256/block calculations. This cache is very expensive to fill, and every written block must go through it. On top of that, the cache does not persist between mounts, and has items regularly removed from it when we run low on memory. All of this will drive down the amount of duplicated data we can find.

So our best case savings is probably way below 10% - let's be _really_ nice and say 5%. Now ask yourself the question - would you accept a write cache which is expensive to fill and would only have a hit rate of less than 5%?
Oh, and there's 800 lines of userspace we'd merge to manage this cache too, kernel ioctls which would have to be finalized, etc.

> A usecase I find interesting is to keep the in-memory dedup cache and
> then flush it to disk on demand, compared to automatically synced dedup
> (eg. at commit time).

What's the benefit here? We're still going to be hashing blocks on the way in, and if we're not deduping them at write time then we'll just have to remove the extents and dedupe them later.

> > A couple of example sore points in my review so far:
> >
> > - Internally you're using a mutex (instead of a spinlock) to lock out
> >   queries to the in-memory hash, which I can see becoming a performance
> >   problem in the write path.
> >
> > - Also, we're doing SHA256 in the write path which I expect will
> >   slow it down even more dramatically. Given that all the work done gets
> >   thrown out every time we fill the hash (or remount), I just don't see
> >   much benefit to the user with this.
>
> I had some ideas to use faster hashes and do sha256 when it's going to
> be stored on disk, but there were some concerns. The objection against
> speed and performance hit at write time is valid. But we'll need to
> verify that in real performance tests, which haven't happened yet to
> my knowledge.

This is the type of thing that IMHO absolutely must be provided with each code drop of the feature. Dedupe is nice but _nobody_ will use it if it's slow. I know this from experience.

I personally feel that btrfs has had enough of 'cute' and 'almost working' features. If we want inline dedupe we should do it correctly and with the right metrics from the beginning.

This is slightly unrelated to our discussion but my other unsolicited opinion: as a kernel developer and maintainer of a file system for well over a decade, I will say that balancing the number of out-of-tree patches is necessary, but we should never accept large features just because 'they've been out for a long time'.
Again, I mention this because other parts of the discussion felt like they were going in that direction.

Thanks,
--Mark

--
Mark Fasheh
[PATCH] btrfs-progs: added quiet-option for scripts
From: M G Berberich

-q,--quiet to prevent status-messages on stderr
--verbose as alternative for -v
moved 'Mode NO_FILE_DATA enabled' message to stderr
changed default for g_verbose to 1
---
 cmds-send.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/cmds-send.c b/cmds-send.c
index 4063475..81b086e 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -44,7 +44,9 @@
 #include "send.h"
 #include "send-utils.h"
 
-static int g_verbose = 0;
+/* default is 1 for historical reasons
+   changing may break scripts */
+static int g_verbose = 1;
 
 struct btrfs_send {
 	int send_fd;
@@ -301,10 +303,10 @@ static int do_send(struct btrfs_send *send, u64 parent_root_id,
 			"Try upgrading your kernel or don't use -e.\n");
 		goto out;
 	}
-	if (g_verbose > 0)
+	if (g_verbose > 1)
 		fprintf(stderr, "BTRFS_IOC_SEND returned %d\n", ret);
 
-	if (g_verbose > 0)
+	if (g_verbose > 1)
 		fprintf(stderr, "joining genl thread\n");
 
 	close(pipefd[1]);
@@ -429,9 +431,11 @@ int cmd_send(int argc, char **argv)
 	while (1) {
 		enum { GETOPT_VAL_SEND_NO_DATA = 256 };
 		static const struct option long_options[] = {
+			{ "verbose", no_argument, NULL, 'v' },
+			{ "quiet", no_argument, NULL, 'q' },
 			{ "no-data", no_argument, NULL, GETOPT_VAL_SEND_NO_DATA }
 		};
-		int c = getopt_long(argc, argv, "vec:f:i:p:", long_options, NULL);
+		int c = getopt_long(argc, argv, "vqec:f:i:p:", long_options, NULL);
 
 		if (c < 0)
 			break;
@@ -440,6 +444,9 @@ int cmd_send(int argc, char **argv)
 		case 'v':
 			g_verbose++;
 			break;
+		case 'q':
+			g_verbose--;
+			break;
 		case 'e':
 			new_end_cmd_semantic = 1;
 			break;
@@ -622,8 +629,8 @@ int cmd_send(int argc, char **argv)
 		}
 	}
 
-	if (send_flags & BTRFS_SEND_FLAG_NO_FILE_DATA)
-		printf("Mode NO_FILE_DATA enabled\n");
+	if ((send_flags & BTRFS_SEND_FLAG_NO_FILE_DATA) && g_verbose > 1)
+		fprintf(stderr, "Mode NO_FILE_DATA enabled\n");
 
 	for (i = optind; i < argc; i++) {
 		int is_first_subvol;
@@ -632,7 +639,8 @@ int cmd_send(int argc, char **argv)
 		free(subvol);
 		subvol = argv[i];
 
-		fprintf(stderr, "At subvol %s\n", subvol);
+		if (g_verbose > 0)
+			fprintf(stderr, "At subvol %s\n", subvol);
 
 		subvol = realpath(subvol, NULL);
 		if (!subvol) {
@@ -713,8 +721,9 @@ const char * const cmd_send_usage[] = {
 	"which case 'btrfs send' will determine a suitable parent among the",
 	"clone sources itself.",
 	"\n",
-	"-v               Enable verbose debug output. Each occurrence of",
+	"-v, --verbose    Enable verbose debug output. Each occurrence of",
 	"                 this option increases the verbose level more.",
+	"-q, --quiet      suppress messages to stderr.",
 	"-e               If sending multiple subvols at once, use the new",
 	"                 format and omit the end-cmd between the subvols.",
 	"-p <parent>      Send an incremental stream from <parent> to",
@@ -728,5 +737,6 @@ const char * const cmd_send_usage[] = {
 	"                 does not contain any file data and thus cannot be used",
 	"                 to transfer changes. This mode is faster and useful to",
 	"                 show the differences in metadata.",
+	"--help           display this help and exit",
 	NULL
 };
-- 
2.8.1
Re: Input/output error on newly created file
On 2016-05-12 19:41, Nikolaus Rath wrote:
> On May 12 2016, Diego Calleja wrote:
>> On Thursday, 12 May 2016 at 8:46:00 (CEST), Nikolaus Rath wrote:
>>> *ping*
>>>
>>> Anyone any idea?
>>
>> All I can say is that I've had the same problem in the past. In my case,
>> the problematic files were active torrents. The interesting thing is
>> that I was able to read them correctly up to a point, then I would get
>> the same error as you. No messages in dmesg. The amount of data I was
>> able to read from them was not random, it was some multiple of 4K.
>> After a reboot the problems went away and I wasn't able to reproduce
>> them.
>>
>> There have been reports of similar issues in the past:
>> http://www.spinics.net/lists/linux-btrfs/msg52371.html
>
> Thanks for the pointer. So according to these reports, the IO errors
> happen with:
>
> 1. dm-crypt, LVM and mount options rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,inode_cache
> 2. dm-crypt, no LVM, and mount options noatime,compress,nossd
>
> I can add the following data point:
>
> 3. dm-crypt on LVM, and mount options relatime,compress=lzo
>
> (Just to preserve this for the future)

I don't want to repeat my past e-mails, but I think I have a similar issue. In short: a Xen virtual machine. The files that occasionally become unreadable (I/O error, but no error in dmesg or anywhere else) are mysql MyISAM database files, and they are always small - 16 KiB, for example, or less. Sometimes dropping the fs cache fixes the problem, sometimes not. Umount/mount always fixes it. Scrub says the filesystem is OK (while the file is unreadable).

I have experienced a similar problem with log files (smtp or apache logs), but it is rare - it happens once or twice a month on a heavily loaded mail/web server. (The log-file I/O errors are not a real problem for me; mysql is.) The problem has been present from 3.18 to 4.5. I have started migrating the mysql table files to InnoDB, but I think this is only a workaround.
László Szalma

> Best,
> -Nikolaus
Re: BTRFS Data at Rest File Corruption
On 2016-05-12 13:49, Richard A. Lochner wrote:
> Austin,
>
> I rebooted the computer and reran the scrub to no avail. The error is
> consistent.
>
> The reason I brought this question to the mailing list is because it
> seemed like a situation that might be of interest to the developers.
> Perhaps there might be a way to "defend" against this type of
> corruption.
>
> I suspected, and I still suspect, that the error occurred upon a
> metadata update that corrupted the checksum for the file, probably due
> to silent memory corruption. If the checksum was silently corrupted, it
> would simply be written to both drives, causing this type of error.

That does seem to be the most likely cause, and sadly, is not something any filesystem can reliably protect against on commodity hardware.

> With that in mind, I proved (see below) that the data blocks match on
> both mirrors. This I expected, since the data blocks should not have
> been touched as the file has not been written.
>
> This is the sequence of events as I see them that I think might be of
> interest to the developers.
>
> 1. A block containing a checksum for the file was read into memory.
>    The block read would have been checksummed, so the checksum for the
>    file must have been good at that moment.

It's worth noting that BTRFS doesn't verify all the checksums in a metadata block when it loads that metadata block; only the ones for the reads that triggered the metadata block being loaded get verified.

> 2. The checksum block was then altered in memory (perhaps to add or
>    change a value).
> 3. A new checksum would then have been calculated for the checksum
>    block.
> 4. The checksum block would have been written to both mirrors.
>
> Presumably, in the case that I am experiencing, an undetected memory
> error must have occurred after 1 and before step 3 was completed. I
> wonder if there is a way to correct or detect that situation.
The closest we could get is to provide an option to handle this in scrub, preferably with a big scary warning on it, as this same situation can easily be caused by someone modifying the disks themselves (we can't reasonably protect against that, but we shouldn't make it trivial for people to inject arbitrary data that way either).

> As I stated previously, the machine on which this occurred does not
> have ECC memory; however, I would not think that the majority of users
> running btrfs do either. If it has happened to me, it likely has
> happened to others.
>
> Rick Lochner
>
> btrfs dmesg(s):
>
> [16510.334020] BTRFS warning (device sdb1): checksum error at logical
> 3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
> 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
> [16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd
> 0, flush 0, corrupt 5, gen 0
> [16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
> error at logical 3037444042752 on dev /dev/sdb1
>
> [17606.978439] BTRFS warning (device sdb1): checksum error at logical
> 3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
> 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
> [17606.978460] BTRFS error (device sdb1): bdev /dev/sdc1 errs: wr 0, rd
> 13, flush 0, corrupt 4, gen 0
> [17606.989497] BTRFS error (device sdb1): unable to fixup (regular)
> error at logical 3037444042752 on dev /dev/sdc1
>
> How I compared the data blocks:
>
> #btrfs-map-logical -l 3037444042752 /dev/sdc1
> mirror 1 logical 3037444042752 physical 2554240299008 device /dev/sdc1
> mirror 1 logical 3037444046848 physical 2554240303104 device /dev/sdc1
> mirror 2 logical 3037444042752 physical 2554260221952 device /dev/sdb1
> mirror 2 logical 3037444046848 physical 2554260226048 device /dev/sdb1
>
> #dd if=/dev/sdc1 bs=1 skip=2554240299008 count=4096 of=c1
> 4096+0 records in
> 4096+0 records out
> 4096 bytes (4.1 kB) copied, 0.0292201 s, 140 kB/s
> #dd if=/dev/sdc1 bs=1 skip=2554240303104 count=4096 of=c2
> 4096+0 records in
> 4096+0 records out
> 4096 bytes (4.1 kB) copied, 0.0142381 s, 288 kB/s
> #dd if=/dev/sdb1 bs=1 skip=2554260221952 count=4096 of=b1
> 4096+0 records in
> 4096+0 records out
> 4096 bytes (4.1 kB) copied, 0.0293211 s, 140 kB/s
> #dd if=/dev/sdb1 bs=1 skip=2554260226048 count=4096 of=b2
> 4096+0 records in
> 4096+0 records out
> 4096 bytes (4.1 kB) copied, 0.0151947 s, 270 kB/s
> #diff b1 c1
> #diff b2 c2

Excellent thinking here. Now, if you can find some external method to verify that that block is in fact correct, you can just write it back into the file itself at the correct offset and fix the issue.
Re: BTRFS Data at Rest File Corruption
Andrew,

I agree with your supposition about the metadata and corrupted RAM. I verified that the data blocks on both devices are equal (see my reply to Austin for the commands I used; I believe they correctly prove that the blocks are, in fact, equal).

I am not sure I have the skills to "walk the checksum tree manually" as you described. I would also like to verify that the checksum blocks agree, as I expect they do, but I may have to "bone up" on my tree-walking skills first.

Thanks for your help.

Rick Lochner

On Wed, 2016-05-11 at 21:16 -0400, Andrew Wade wrote:
> I would expect the "data at rest" to be good too. But perhaps
> something happened to the metadata (checksum). If the checksum was
> corrupted in RAM it could be written back to the disks due to updates
> elsewhere in the metadata node.
>
> If this is what happened I would expect the metadata node containing
> the checksum to have a recent generation number.
>
> I'm not actually a BTRFS developer myself, but you might be able to
> find the generation by using btrfs-debug-tree from btrfs-tools.
> btrfs-debug-tree -r /dev/sdc1 will give you the block number of the
> checksum tree root, which you can then feed into btrfs-debug-tree -b
> /dev/sdc1 and walk the tree manually. You're looking for the
> largest key before 3037444042752.
>
> For dumping the data and metadata blocks I think btrfs-map-logical is
> what you need, though to be honest I've never used this tool myself.
>
> Even if the file data is still good I don't know of a simple way to
> tell BTRFS to ignore the checksums for a file. It is possible to
> regenerate the checksum tree for the entire filesystem, but I
> personally wouldn't do that unless you really need the file.
>
> regards,
> Andrew
Re: fsck: to repair or not to repair
On 05/12/2016 10:35 AM, Nikolaus Rath wrote:
> On May 12 2016, Henk Slager wrote:
>> On Wed, May 11, 2016 at 11:10 PM, Nikolaus Rath wrote:
>>> Hello,
>>>
>>> I recently ran btrfsck on one of my file systems, and got the
>>> following messages:
>>>
>>> checking extents
>>> checking free space cache
>>> checking fs roots
>>> root 5 inode 3149867 errors 400, nbytes wrong
>>> root 5 inode 3150237 errors 400, nbytes wrong
>>> root 5 inode 3150238 errors 400, nbytes wrong
>>> root 5 inode 3150242 errors 400, nbytes wrong
>>> root 5 inode 3150260 errors 400, nbytes wrong
>>> [ lots of similar messages with different inode numbers ]
>>> root 5 inode 15595011 errors 400, nbytes wrong
>>> root 5 inode 15595016 errors 400, nbytes wrong
>>> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
>>> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
>>> found 263648960636 bytes used err is 1
>>> total csum bytes: 395314372
>>> total tree bytes: 908644352
>>> total fs tree bytes: 352735232
>>> total extent tree bytes: 95039488
>>> btree space waste bytes: 156301160
>>> file data blocks allocated: 675209801728
>>>  referenced 410351722496
>>> Btrfs v3.17
>>>
>>> Can someone explain to me the risk that I run by attempting a repair,
>>> and (conversely) what I put at stake when continuing to use this file
>>> system as-is?
>>
>> It has once been mentioned on this mailing list that if 'errors 400,
>> nbytes wrong' is the only error on an fs, btrfs check --repair can fix
>> it (around the time of tools release 4.4, by Qu AFAIK).
>> I had/(have?) about 7 of those errors in small files on an fs that is
>> 2.5 years old and has quite some older ro snapshots. I once tried to
>> fix them with 4.5.0 tools plus some patches, but they did not actually
>> get fixed. At least with 4.5.2 or 4.5.3 tools it should be possible to
>> fix them in your case. Maybe you first want to test it on an overlay
>> of the device or copy the whole fs with dd. It depends on how much
>> time you can allow the fs to be offline etc.; it is up to you.
>>
>> In my case, I recreated the files in the working subvol, but as long
>> [...]
>
> How did you determine which files were affected?
> Is there a way to map inodes to paths?

btrfs inspect-internal inode-resolve <inode> <path>. This resolves the <inode> in the subvol to its fs paths.

Thanks,
Ashish

> Thanks!
> -Nikolaus
Re: BTRFS Data at Rest File Corruption
Austin,

I rebooted the computer and reran the scrub to no avail. The error is consistent.

The reason I brought this question to the mailing list is because it seemed like a situation that might be of interest to the developers. Perhaps there might be a way to "defend" against this type of corruption.

I suspected, and I still suspect, that the error occurred upon a metadata update that corrupted the checksum for the file, probably due to silent memory corruption. If the checksum was silently corrupted, it would simply be written to both drives, causing this type of error.

With that in mind, I proved (see below) that the data blocks match on both mirrors. This I expected, since the data blocks should not have been touched as the file has not been written.

This is the sequence of events as I see them that I think might be of interest to the developers.

1. A block containing a checksum for the file was read into memory. The block read would have been checksummed, so the checksum for the file must have been good at that moment.
2. The checksum block was then altered in memory (perhaps to add or change a value).
3. A new checksum would then have been calculated for the checksum block.
4. The checksum block would have been written to both mirrors.

Presumably, in the case that I am experiencing, an undetected memory error must have occurred after 1 and before step 3 was completed. I wonder if there is a way to correct or detect that situation.

As I stated previously, the machine on which this occurred does not have ECC memory; however, I would not think that the majority of users running btrfs do either. If it has happened to me, it likely has happened to others.
Rick Lochner btrfs dmesg(s): [16510.334020] BTRFS warning (device sdb1): checksum error at logical 3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img) [16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0 [16510.345662] BTRFS error (device sdb1): unable to fixup (regular) error at logical 3037444042752 on dev /dev/sdb1 [17606.978439] BTRFS warning (device sdb1): checksum error at logical 3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img) [17606.978460] BTRFS error (device sdb1): bdev /dev/sdc1 errs: wr 0, rd 13, flush 0, corrupt 4, gen 0 [17606.989497] BTRFS error (device sdb1): unable to fixup (regular) error at logical 3037444042752 on dev /dev/sdc1 How I compared the data blocks: #btrfs-map-logical -l 3037444042752 /dev/sdc1 mirror 1 logical 3037444042752 physical 2554240299008 device /dev/sdc1 mirror 1 logical 3037444046848 physical 2554240303104 device /dev/sdc1 mirror 2 logical 3037444042752 physical 2554260221952 device /dev/sdb1 mirror 2 logical 3037444046848 physical 2554260226048 device /dev/sdb1 #dd if=/dev/sdc1 bs=1 skip=2554240299008 count=4096 of=c1 4096+0 records in 4096+0 records out 4096 bytes (4.1 kB) copied, 0.0292201 s, 140 kB/s #dd if=/dev/sdc1 bs=1 skip=2554240303104 count=4096 of=c2 4096+0 records in 4096+0 records out 4096 bytes (4.1 kB) copied, 0.0142381 s, 288 kB/s #dd if=/dev/sdb1 bs=1 skip=2554260221952 count=4096 of=b1 4096+0 records in 4096+0 records out 4096 bytes (4.1 kB) copied, 0.0293211 s, 140 kB/s #dd if=/dev/sdb1 bs=1 skip=2554260226048 count=4096 of=b2 4096+0 records in 4096+0 records out 4096 bytes (4.1 kB) copied, 0.0151947 s, 270 kB/s #diff b1 c1 #diff b2 c2 > On Wed, 2016-05-11 at 15:26 -0400, Austin S. 
Hemmelgarn wrote: On 2016-05-11 14:36, Richard Lochner wrote: > > > > Hello, > > > > I have encountered a data corruption error with BTRFS which may or > > may > > not be of interest to your developers. > > > > The problem is that an unmodified file on a RAID-1 volume that had > > been scrubbed successfully is now corrupt. The details follow. > > > > The volume was formatted as btrfs with raid1 data and raid1 > > metadata > > on two new 4T hard drives (WD Data Center Re WD4000FYYZ) . > > > > A large binary file was copied to the volume (~76 GB) on December > > 27, > > 2015. Soon after copying the file, a btrfs scrub was run. There > > were > > no errors. Multiple scrubs have also been run over the past > > several > > months. > > > > Recently, a scrub returned an unrecoverable error on that file. > > Again, the file has not been modified since it was originally > > copied > > and has the time stamp from December. Furthermore, SMART tests > > (long) > > for both drives do not indicate any errors (Current_Pending_Sector > > or > > otherwise). > > > > I should note that the system does not have ECC memory. > > > > It would be interesting to me to know if: > > > > a) The primary and secondary data blocks match (I suspect they do), > > and > > b) The primary and secondary checksums for the block match (I > > suspect > > they do as well) > Do you mean if
Re: Input/output error on newly created file
On May 12 2016, Diego Calleja wrote:
> On Thursday, 12 May 2016 at 8:46:00 (CEST), Nikolaus Rath wrote:
>> *ping*
>>
>> Anyone any idea?
>
> All I can say is that I've had the same problem in the past. In my
> case, the problematic files were active torrents. The interesting
> thing is that I was able to read them correctly up to a point, then
> I would get the same error as you. No messages in dmesg. The amount
> of data I was able to read from them was not random, it was
> some multiple of 4K. After a reboot the problems went away and
> I wasn't able to reproduce them.
>
> There have been reports of similar issues in the past:
> http://www.spinics.net/lists/linux-btrfs/msg52371.html

Thanks for the pointer. So according to these reports, the IO errors happen with:

1. dm-crypt, LVM and mount options rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,inode_cache
2. dm-crypt, no LVM, and mount options noatime,compress,nossd

I can add the following data point:

3. dm-crypt on LVM, and mount options relatime,compress=lzo

(Just to preserve this for the future)

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
Re: [PATCH 1/2] Btrfs: fix race between fsync and direct IO writes for prealloc extents
On 05/09/2016 07:01 AM, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> When we do a direct IO write against a preallocated extent (fallocate)
> that does not go beyond the i_size of the inode, we do the write
> operation without holding the inode's i_mutex (an optimization that
> landed in commit 38851cc19adb ("Btrfs: implement unlocked dio write")).
> This allows for a very tiny time window where a race can happen with a
> concurrent fsync using the fast code path, as the direct IO write path
> creates first a new extent map (no longer flagged as a prealloc extent)
> and then it creates the ordered extent, while the fast fsync path first
> collects ordered extents and then it collects extent maps.
>
> This allows for the possibility of the fast fsync path to collect the
> new extent map without collecting the new ordered extent, and therefore
> logging an extent item based on the extent map without waiting for the
> ordered extent to be created and complete. This can result in a
> situation where after a log replay we end up with an extent not marked
> anymore as prealloc but it was only partially written (or not written
> at all), exposing random, stale or garbage data corresponding to the
> unwritten pages and without any checksums in the csum tree covering the
> extent's range.
>
> This is an extension of what was done in commit de0ee0edb21f ("Btrfs:
> fix race between fsync and lockless direct IO writes").
>
> So fix this by creating first the ordered extent and then the extent
> map, so that this way if the fast fsync path collects the new extent
> map it also collects the corresponding ordered extent.
>
> Signed-off-by: Filipe Manana

Reviewed-by: Josef Bacik

Thanks,

Josef
Re: [PATCH] Btrfs: add semaphore to synchronize direct IO writes with fsync
On 05/12/2016 10:26 AM, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> Due to the optimization of lockless direct IO writes (the inode's
> i_mutex is not held) introduced in commit 38851cc19adb ("Btrfs:
> implement unlocked dio write"), we started having races between such
> writes with concurrent fsync operations that use the fast fsync path.
> These races were addressed in the patches titled "Btrfs: fix race
> between fsync and lockless direct IO writes" and "Btrfs: fix race
> between fsync and direct IO writes for prealloc extents".
>
> The races happened because the direct IO path, like every other write
> path, does create extent maps followed by the corresponding ordered
> extents while the fast fsync path collected first ordered extents and
> then it collected extent maps. This made it possible to log file extent
> items (based on the collected extent maps) without waiting for the
> corresponding ordered extents to complete (get their IO done).
>
> The two fixes mentioned before added a solution that consists of making
> the direct IO path create first the ordered extents and then the extent
> maps, while the fsync path attempts to collect any new ordered extents
> once it collects the extent maps. This was simple and did not require
> adding any synchronization primitive to any data structure (struct
> btrfs_inode for example), but it makes things more fragile for future
> development endeavours and adds an exceptional approach compared to the
> other write paths.
>
> This change adds a read-write semaphore to the btrfs inode structure
> and makes the direct IO path create the extent maps and the ordered
> extents while holding read access on that semaphore, while the fast
> fsync path collects extent maps and ordered extents while holding write
> access on that semaphore. The logic for the direct IO write path is
> encapsulated in a new helper function that is used both for cow and
> nocow direct IO writes.
>
> Signed-off-by: Filipe Manana

Looks good, thanks Filipe.

Reviewed-by: Josef Bacik

Thanks,

Josef
Re: fsck: to repair or not to repair
On May 12 2016, Henk Slagerwrote: > On Wed, May 11, 2016 at 11:10 PM, Nikolaus Rath wrote: >> Hello, >> >> I recently ran btrfsck on one of my file systems, and got the following >> messages: >> >> checking extents >> checking free space cache >> checking fs roots >> root 5 inode 3149867 errors 400, nbytes wrong >> root 5 inode 3150237 errors 400, nbytes wrong >> root 5 inode 3150238 errors 400, nbytes wrong >> root 5 inode 3150242 errors 400, nbytes wrong >> root 5 inode 3150260 errors 400, nbytes wrong >> [ lots of similar message with different inode numbers ] >> root 5 inode 15595011 errors 400, nbytes wrong >> root 5 inode 15595016 errors 400, nbytes wrong >> Checking filesystem on /dev/mapper/vg0-nikratio_crypt >> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38 >> found 263648960636 bytes used err is 1 >> total csum bytes: 395314372 >> total tree bytes: 908644352 >> total fs tree bytes: 352735232 >> total extent tree bytes: 95039488 >> btree space waste bytes: 156301160 >> file data blocks allocated: 675209801728 >> referenced 410351722496 >> Btrfs v3.17 >> >> >> >> Can someone explain to me the risk that I run by attempting a repair, >> and (conversely) what I put at stake when continuing to use this file >> system as-is? > > It has once been mentioned in this mail-list, that if the 'errors 400, > nbytes wrong' is the only error on an fs, btrfs check --repair can fix > them ( was around time of tools release 4.4 , by Qu AFAIK). > I had /(have?) about 7 of those errors in small files on an fs that is > 2.5 years old and has quite some older ro snapshots. I once tried to > fix them with 4.5.0 + some patches tools, but actually they did not > get fixed. At least with 4.5.2 or 4.5.3 tools it should be possible to > fix them in your case. Maybe you first want to test it on an overlay > of the device or copy the whole fs with dd. It depends on how much > time you can allow the fs to be offline etc, it is up to you. 
> > In my case, I recreated the files in the working subvol, but as long [...]

How did you determine which files were affected? Is there a way to map inodes to paths?

Thanks!
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
[PATCH] Btrfs: add semaphore to synchronize direct IO writes with fsync
From: Filipe Manana

Due to the optimization of lockless direct IO writes (the inode's i_mutex is not held) introduced in commit 38851cc19adb ("Btrfs: implement unlocked dio write"), we started having races between such writes with concurrent fsync operations that use the fast fsync path. These races were addressed in the patches titled "Btrfs: fix race between fsync and lockless direct IO writes" and "Btrfs: fix race between fsync and direct IO writes for prealloc extents".

The races happened because the direct IO path, like every other write path, does create extent maps followed by the corresponding ordered extents while the fast fsync path collected first ordered extents and then it collected extent maps. This made it possible to log file extent items (based on the collected extent maps) without waiting for the corresponding ordered extents to complete (get their IO done).

The two fixes mentioned before added a solution that consists of making the direct IO path create first the ordered extents and then the extent maps, while the fsync path attempts to collect any new ordered extents once it collects the extent maps. This was simple and did not require adding any synchronization primitive to any data structure (struct btrfs_inode for example), but it makes things more fragile for future development endeavours and adds an exceptional approach compared to the other write paths.

This change adds a read-write semaphore to the btrfs inode structure and makes the direct IO path create the extent maps and the ordered extents while holding read access on that semaphore, while the fast fsync path collects extent maps and ordered extents while holding write access on that semaphore. The logic for the direct IO write path is encapsulated in a new helper function that is used both for cow and nocow direct IO writes.
Signed-off-by: Filipe Manana
---
 fs/btrfs/btrfs_inode.h |  10 
 fs/btrfs/inode.c       | 134 +++--
 fs/btrfs/tree-log.c    |  51 ++-
 3 files changed, 77 insertions(+), 118 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 61205e3..1da5753 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -196,6 +196,16 @@ struct btrfs_inode {
 	struct list_head delayed_iput;
 	long delayed_iput_count;
 
+	/*
+	 * To avoid races between lockless (i_mutex not held) direct IO writes
+	 * and concurrent fsync requests. Direct IO writes must acquire read
+	 * access on this semaphore for creating an extent map and its
+	 * corresponding ordered extent. The fast fsync path must acquire write
+	 * access on this semaphore before it collects ordered extents and
+	 * extent maps.
+	 */
+	struct rw_semaphore dio_sem;
+
 	struct inode vfs_inode;
 };

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5065ac2..c483bd21 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7145,6 +7145,43 @@ out:
 	return em;
 }
 
+static struct extent_map *btrfs_create_dio_extent(struct inode *inode,
+						  const u64 start,
+						  const u64 len,
+						  const u64 orig_start,
+						  const u64 block_start,
+						  const u64 block_len,
+						  const u64 orig_block_len,
+						  const u64 ram_bytes,
+						  const int type)
+{
+	struct extent_map *em = NULL;
+	int ret;
+
+	down_read(&BTRFS_I(inode)->dio_sem);
+	if (type != BTRFS_ORDERED_NOCOW) {
+		em = create_pinned_em(inode, start, len, orig_start,
+				      block_start, block_len, orig_block_len,
+				      ram_bytes, type);
+		if (IS_ERR(em))
+			goto out;
+	}
+	ret = btrfs_add_ordered_extent_dio(inode, start, block_start,
+					   len, block_len, type);
+	if (ret) {
+		if (em) {
+			free_extent_map(em);
+			btrfs_drop_extent_cache(inode, start,
+						start + len - 1, 0);
+		}
+		em = ERR_PTR(ret);
+	}
+out:
+	up_read(&BTRFS_I(inode)->dio_sem);
+
+	return em;
+}
+
 static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
 						  u64 start, u64 len)
 {
@@ -7160,43 +7197,13 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
 	if (ret)
 		return ERR_PTR(ret);
 
-	/*
-	 * Create the ordered extent before the extent map. This is
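As an aside for readers following the locking scheme, it is easy to model in userspace. The sketch below is my own analogy (it assumes util-linux flock(1) is available and is not code from the patch): each simulated direct IO write holds a file lock shared while it records an extent map and then its matching ordered extent, and a simulated fast fsync takes the lock exclusive, so it can never observe one without the other.

```shell
# Userspace sketch only: flock(1) shared/exclusive locks stand in for
# down_read()/down_write() on the patch's dio_sem; names are illustrative.
lock=$(mktemp)
log=$(mktemp)

# A simulated direct IO write: under the shared lock it records an extent
# map and then the matching ordered extent. Shared holders may overlap.
dio_write() {
    flock -s "$lock" sh -c 'echo extent_map >>"$0"; echo ordered_extent >>"$0"' "$log"
}

dio_write & dio_write & dio_write &
wait

# The simulated fast fsync takes the lock exclusive, so no write can be
# mid-update: every extent map already has its ordered extent.
flock -x "$lock" sh -c '
    maps=$(grep -cx extent_map "$0")
    ordered=$(grep -cx ordered_extent "$0")
    [ "$maps" -eq 3 ] && [ "$ordered" -eq 3 ] && echo consistent
' "$log"

rm -f "$lock" "$log"
```

The exclusive acquisition is exactly why the fsync path no longer needs the "collect ordered extents again after the extent maps" workaround: taking write access drains all in-flight read holders first.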
Re: fsck: to repair or not to repair
On Wed, May 11, 2016 at 11:10 PM, Nikolaus Rath wrote:
> Hello,
>
> I recently ran btrfsck on one of my file systems, and got the following
> messages:
>
> checking extents
> checking free space cache
> checking fs roots
> root 5 inode 3149867 errors 400, nbytes wrong
> root 5 inode 3150237 errors 400, nbytes wrong
> root 5 inode 3150238 errors 400, nbytes wrong
> root 5 inode 3150242 errors 400, nbytes wrong
> root 5 inode 3150260 errors 400, nbytes wrong
> [ lots of similar message with different inode numbers ]
> root 5 inode 15595011 errors 400, nbytes wrong
> root 5 inode 15595016 errors 400, nbytes wrong
> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
> found 263648960636 bytes used err is 1
> total csum bytes: 395314372
> total tree bytes: 908644352
> total fs tree bytes: 352735232
> total extent tree bytes: 95039488
> btree space waste bytes: 156301160
> file data blocks allocated: 675209801728
> referenced 410351722496
> Btrfs v3.17
>
> Can someone explain to me the risk that I run by attempting a repair,
> and (conversely) what I put at stake when continuing to use this file
> system as-is?

It has once been mentioned in this mail-list, that if the 'errors 400, nbytes wrong' is the only error on an fs, btrfs check --repair can fix them (was around time of tools release 4.4, by Qu AFAIK). I had /(have?) about 7 of those errors in small files on an fs that is 2.5 years old and has quite some older ro snapshots. I once tried to fix them with 4.5.0 + some patches tools, but actually they did not get fixed. At least with 4.5.2 or 4.5.3 tools it should be possible to fix them in your case. Maybe you first want to test it on an overlay of the device or copy the whole fs with dd. It depends on how much time you can allow the fs to be offline etc, it is up to you.
In my case, I recreated the files in the working subvol, but as long as I don't remove the older snapshots, the errors 400 will still be there I assume. At least I don't experience any negative impact of it, so I keep it like it is until at some point in time the older snapshots get removed or I am somehow forced to clone back the data into a fresh fs. I am running mostly latest stable or sometimes mainline kernel.
Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
On Thu, May 12, 2016 at 04:35:24PM +0200, Niccolò Belli wrote: > When doing the btrfs check I also always do a btrfs scrub and it never found > any error. Once it didn't manage to finish the scrub because of: > BTRFS critical (device dm-0): corrupt leaf, slot offset bad: > block=670597120,root=1, slot=6 > and btrfs scrub status reported "was aborted after 00:00:10". > > Talking about scrub I created a systemd timer to run scrub hourly and I > noticed 2 *uncorrectable* errors suddenly appeared on my system. So I > immediately re-run the scrub just to confirm it and then I rebooted into the > Arch live usb and runned btrfs check: the metadata were perfect. So I runned > btrfs scrub from the live usb and there were no errors at all! I rebooted > into my system and runned scrub once again and the uncorrectable errors > where really gone! It happened two times in the past few days. That's what a RAM corruption problem looks like when you run btrfs scrub. Maybe the RAM itself is OK, but *something* is scribbling on it. Does the Arch live usb use the same kernel as your normal system? > Almost no patches get applied by the Arch kernel team: > https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux > At the moment the only one is an harmless > "change-default-console-loglevel.patch". Did you try an older (or newer) kernel? I've been running 4.5.x on a few canary systems, but so far none of them have survived more than a day. Contrast with 4.1.x and 4.4.x, which runs for months between reboots for me. Maybe there's a regression in 4.5.x, maybe I did something wrong in my config or build, or maybe I just have too few data points to draw any conclusions, but my data so far is telling me to stay on 4.4.x until something changes (i.e. wait for a 4.5.x stable update or skip directly to 4.6.x). :-/ It's always worth trying this if only to eliminate regression as a possible root cause early. 
In practice, every mainline kernel release has a regression that affects at least one combination of config options and hardware. btrfs is stable enough now that you can be running one or two releases behind to avoid a problem elsewhere in the kernel.

> Another option will be crashing it with my car's wheels hoping that because
> of my comprehensive insurance policy Dell will give me the next model (the
> Skylake one) as a replacement (hoping that it will not suffer from the same
> issue of the Broadwell one).

The first rule of Insurance Fraud Club: don't talk about Insurance Fraud Club. ;)

It's possible there's a problem that affects only very specific chipsets. You seem to have eliminated RAM in isolation, but there could be a problem in the kernel that affects only your chipset.
Re: [PATCH] fstests: test creating a symlink and then fsync its parent directory
On 04/24/2016 09:26 PM, fdman...@kernel.org wrote:

From: Filipe Manana

Test creating a symlink, fsync its parent directory, power fail and mount again the filesystem. After these steps the symlink should exist and its content must match what we specified when we created it (must not be empty or point to something else).

This is motivated by an issue in btrfs where after the log replay happens we get empty symlinks, which not only does not make much sense from a user's point of view, it's also not valid to have empty links in linux (which is explicitly forbidden by the symlink(2) system call). The issue in btrfs is fixed by the following patch for the linux kernel: "Btrfs: fix empty symlink after creating symlink and fsync parent dir".

Tested against ext3, ext4, xfs, f2fs, reiserfs and nilfs2.

Signed-off-by: Filipe Manana

Reviewed-by: Josef Bacik

Thanks,

Josef
Re: Input/output error on newly created file
On Thursday, 12 May 2016 at 8:46:00 (CEST), Nikolaus Rath wrote:
> *ping*
>
> Anyone any idea?

All I can say is that I've had the same problem in the past. In my case, the problematic files were active torrents. The interesting thing is that I was able to read them correctly up to a point, then I would get the same error as you. No messages in dmesg. The amount of data I was able to read from them was not random, it was something multiple of 4K. After reboot the problems went away and I wasn't able to reproduce it.

There have been reports of similar issues in the past:
http://www.spinics.net/lists/linux-btrfs/msg52371.html
Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo
Thanks for the detailed explanation, hopefully in the future someone will be able to make defrag snapshot/reflink aware in a scalable manner. I will not use defrag anymore, but what do you suggest I do to reclaim the lost space? Get rid of my current snapshots or maybe simply running bedup?

Niccolò
Re: Input/output error on newly created file
*ping*

Anyone any idea?

Best,
-Nikolaus

On May 09 2016, Nikolaus Rath wrote:
> On May 09 2016, Filipe Manana wrote:
>> On Sun, May 8, 2016 at 11:18 PM, Nikolaus Rath wrote:
>>> Hello,
>>>
>>> I just created an innocent 10 MB on a btrfs file system, yet my attempt
>>> to read it a few seconds later (and ever since), just gives:
>>>
>>> $ ls -l in-progress/mysterious-io-error
>>> -rw-rw-r-- 1 nikratio nikratio 10485760 May 8 14:41 in-progress/mysterious-io-error
>>> $ cat in-progress/mysterious-io-error
>>> cat: in-progress/mysterious-io-error: Input/output error
>>
>> If you unmount and mount again the filesystem, does it happen again?
>
> After rebooting, the previously unaccessible file can now be read. But I
> cannot tell if it contains the right data.
>
> However, I just encountered the same problem with another, freshly
> created file.
>
>> How did you create the file?
>
> In Python 3. The equivalent code is more or less:
>
>     with open('file.dat', 'wb+') as fh:
>         for buf in generate_data():
>             fh.write(buf)  # bufsize is about 128 kB
>
> However, I should note that there is a lot of activity in this
> file system (it contains my home directory), so the above alone will
> probably not reproduce the problem.
>
> That said, so far both the problematic files were created by the same
> application (S3QL, of which luckily I am also the maintainer).
>
>> Does fsck reports any issues?
>
> Do you mean btrfsck? It actually has a lot to say:
>
> checking extents
> checking free space cache
> checking fs roots
> root 5 inode 3149867 errors 400, nbytes wrong
> root 5 inode 3150237 errors 400, nbytes wrong
> root 5 inode 3150238 errors 400, nbytes wrong
> root 5 inode 3150242 errors 400, nbytes wrong
> root 5 inode 3150260 errors 400, nbytes wrong
> [...]
> root 5 inode 15595011 errors 400, nbytes wrong
> root 5 inode 15595016 errors 400, nbytes wrong
> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
> found 263648960636 bytes used err is 1
> total csum bytes: 395314372
> total tree bytes: 908644352
> total fs tree bytes: 352735232
> total extent tree bytes: 95039488
> btree space waste bytes: 156301160
> file data blocks allocated: 675209801728
> referenced 410351722496
> Btrfs v3.17
>
> However, the inode of the problematic file (16186241) is not
> mentioned. But I guess this is not surprising, because also for this
> file, I can read the contents after remounting.
>
> Best,
> -Nikolaus
Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
On 2016-05-12 10:35, Niccolò Belli wrote: On lunedì 9 maggio 2016 18:29:41 CEST, Zygo Blaxell wrote: Did you also check the data matches the backup? btrfs check will only look at the metadata, which is 0.1% of what you've copied. From what you've written, there should be a lot of errors in the data too. If you have incorrect data but btrfs scrub finds no incorrect checksums, then your storage layer is probably fine and we have to look at CPU, host RAM, and software as possible culprits. The logs you've posted so far indicate that bad metadata (e.g. negative item lengths, nonsense transids in metadata references but sane transids in the referred pages) is getting into otherwise valid and well-formed btrfs metadata pages. Since these pages are protected by checksums, the corruption can't be originating in the storage layer--if it was, the pages should be rejected as they are read from disk, before btrfs even looks at them, and the insane transid should be the "found" one not the "expected" one. That suggests there is either RAM corruption happening _after_ the data is read from disk (i.e. while the pages are cached in RAM), or a severe software bug in the kernel you're running. When doing the btrfs check I also always do a btrfs scrub and it never found any error. Once it didn't manage to finish the scrub because of: BTRFS critical (device dm-0): corrupt leaf, slot offset bad: block=670597120,root=1, slot=6 and btrfs scrub status reported "was aborted after 00:00:10". Talking about scrub I created a systemd timer to run scrub hourly and I noticed 2 *uncorrectable* errors suddenly appeared on my system. So I immediately re-run the scrub just to confirm it and then I rebooted into the Arch live usb and runned btrfs check: the metadata were perfect. So I runned btrfs scrub from the live usb and there were no errors at all! I rebooted into my system and runned scrub once again and the uncorrectable errors where really gone! It happened two times in the past few days. 
This would indicate to me that you've either got bad RAM (most likely), or some other hardware component is not working correctly. It's not unusual for hardware issues to be intermittent. Try different kernel versions (e.g. 4.4.9 or 4.1.23) in case whoever maintains your kernel had a bad day and merged a patch they should not have. Almost no patches get applied by the Arch kernel team: https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux At the moment the only one is an harmless "change-default-console-loglevel.patch". Try a minimal configuration with as few drivers as possible loaded, especially GPU drivers and anything from the staging subdirectory--when these drivers have bugs, they ruin everything. Arch kernel team is quite conservative regarding staging/experimental features, I remember they rejected some config patches I submitted because of this. Anyway I will try to blacklist as many kernel modules as I can. Maybe blacklisting GPU is too much because if I can't actually use my laptop it will be much more difficult to reproduce the issue. Disable the GPU driver, but make sure you have the VGA_CONSOLE config enabled, and you should be fine (you'll just get a 80x25 text-mode console instead of a high-resolution one). Try memtest86+ which has a few more/different tests than memtest86. I have encountered RAM modules that pass memtest86 but fail memtest86+ and vice versa. Try memtester, a memory tester that runs as a Linux process, so it can detect corruption caused when device drivers spray data randomly into RAM, or when the CPU thermal controls are influenced by Linux (an overheating CPU-to-RAM bridge can really ruin your day, and some of the dumber laptop designs rely on the OS for thermal management). Try running more than one memory testing process, in case there is a bug in your hardware that affects interactions between multiple cores (memtest is single-threaded). You can run memtest86 inside a kvm (e.g. 
kvm -m 3072 -kernel /boot/memtest86.bin) to detect these kinds of issues. Kernel compiles are a bad way to test RAM. I've successfully built kernels on hosts with known RAM failures. The kernels don't always work properly, but it's quite rare to see a build fail outright.

I didn't use memtest86+ because of the lack of EFI support, but I just tried the shiny new memtest86 7.0 beta with improved tests for 12+ hours without issues. Also I ran "memtester 4G" and "systester-cli -gausslg 64M -threads 4 -turns 10" together for 12 hours without any issue so I think both my ram and cpu are ok.

That's probably a good indication of the CPU and the MB being OK, but not necessarily the RAM. There are two other possible options for testing the RAM that haven't been mentioned yet though (which I hadn't thought of myself until now): 1. If you have access to Windows, try the Windows Memory Diagnostic. This runs yet another slightly different set of tests from memtest86 and memtest86+,
Re: [PATCH 2/3] btrfs-progs: autogen: Make build success in CentOS 6 and 7
On 5/12/16 6:42 AM, Zhao Lei wrote:
> btrfs-progs build failed in CentOS 6 and 7:
> #./autogen.sh
> ...
> configure.ac:131: error: possibly undefined macro: PKG_CHECK_VAR
> If this token and others are legitimate, please use m4_pattern_allow.
> See the Autoconf documentation.
> ...
>
> Seems PKG_CHECK_VAR is new in pkgconfig 0.28 (24-Jan-2013):
> http://redmine.audacious-media-player.org/boards/1/topics/736
>
> And the max available version for CentOS 7 in yum-repo and
> rpmfind.net is: pkgconfig-0.27.1-4.el7
> http://rpmfind.net/linux/rpm2html/search.php?query=pkgconfig=Search+...=centos=
>
> I updated my pkgconfig to 0.30, but still failed at above error.
> (Maybe it is my setting problem)
>
> To make user in centos 6 and 7 building btrfs-progs without
> more changes, we can avoid using PKG_CHECK_VAR in following
> way found in:
> https://github.com/audacious-media-player/audacious-plugins/commit/f95ab6f939ecf0d9232b3165f9241d2ea9676b9e
>
> Signed-off-by: Zhao Lei
> ---
>  configure.ac | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure.ac b/configure.ac
> index 4688bc7..a754990 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -128,7 +128,7 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid])
>  PKG_CHECK_MODULES(ZLIB, [zlib])
>  PKG_STATIC(ZLIB_LIBS_STATIC, [zlib])
>
> -PKG_CHECK_VAR([UDEVDIR], [udev], [udevdir])
> +UDEVDIR="$(pkg-config udev --variable=udevdir)"

You need, minimally, AC_SUBST(UDEVDIR) here as well.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
On lunedì 9 maggio 2016 18:29:41 CEST, Zygo Blaxell wrote:
> Did you also check the data matches the backup? btrfs check will only
> look at the metadata, which is 0.1% of what you've copied. From what
> you've written, there should be a lot of errors in the data too. If you
> have incorrect data but btrfs scrub finds no incorrect checksums, then
> your storage layer is probably fine and we have to look at CPU, host RAM,
> and software as possible culprits.
>
> The logs you've posted so far indicate that bad metadata (e.g. negative
> item lengths, nonsense transids in metadata references but sane transids
> in the referred pages) is getting into otherwise valid and well-formed
> btrfs metadata pages. Since these pages are protected by checksums, the
> corruption can't be originating in the storage layer--if it was, the
> pages should be rejected as they are read from disk, before btrfs even
> looks at them, and the insane transid should be the "found" one not the
> "expected" one. That suggests there is either RAM corruption happening
> _after_ the data is read from disk (i.e. while the pages are cached in
> RAM), or a severe software bug in the kernel you're running.

When doing the btrfs check I also always do a btrfs scrub and it never found any error. Once it didn't manage to finish the scrub because of:
BTRFS critical (device dm-0): corrupt leaf, slot offset bad: block=670597120,root=1, slot=6
and btrfs scrub status reported "was aborted after 00:00:10".

Talking about scrub I created a systemd timer to run scrub hourly and I noticed 2 *uncorrectable* errors suddenly appeared on my system. So I immediately re-ran the scrub just to confirm it and then I rebooted into the Arch live usb and ran btrfs check: the metadata were perfect. So I ran btrfs scrub from the live usb and there were no errors at all! I rebooted into my system and ran scrub once again and the uncorrectable errors were really gone! It happened two times in the past few days.

> Try different kernel versions (e.g. 4.4.9 or 4.1.23) in case whoever
> maintains your kernel had a bad day and merged a patch they should
> not have.

Almost no patches get applied by the Arch kernel team:
https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux
At the moment the only one is a harmless "change-default-console-loglevel.patch".

> Try a minimal configuration with as few drivers as possible loaded,
> especially GPU drivers and anything from the staging subdirectory--when
> these drivers have bugs, they ruin everything.

Arch kernel team is quite conservative regarding staging/experimental features, I remember they rejected some config patches I submitted because of this. Anyway I will try to blacklist as many kernel modules as I can. Maybe blacklisting GPU is too much because if I can't actually use my laptop it will be much more difficult to reproduce the issue.

> Try memtest86+ which has a few more/different tests than memtest86.
> I have encountered RAM modules that pass memtest86 but fail memtest86+
> and vice versa.
>
> Try memtester, a memory tester that runs as a Linux process, so it can
> detect corruption caused when device drivers spray data randomly into
> RAM, or when the CPU thermal controls are influenced by Linux (an
> overheating CPU-to-RAM bridge can really ruin your day, and some of the
> dumber laptop designs rely on the OS for thermal management).
>
> Try running more than one memory testing process, in case there is a bug
> in your hardware that affects interactions between multiple cores
> (memtest is single-threaded). You can run memtest86 inside a kvm (e.g.
> kvm -m 3072 -kernel /boot/memtest86.bin) to detect these kinds of issues.
>
> Kernel compiles are a bad way to test RAM. I've successfully built
> kernels on hosts with known RAM failures. The kernels don't always work
> properly, but it's quite rare to see a build fail outright.

I didn't use memtest86+ because of the lack of EFI support, but I just tried the shiny new memtest86 7.0 beta with improved tests for 12+ hours without issues. Also I ran "memtester 4G" and "systester-cli -gausslg 64M -threads 4 -turns 10" together for 12 hours without any issue so I think both my ram and cpu are ok.

I can think only about two possible culprits now (correct me if I'm wrong):
1) A btrfs bug
2) Another module screwing things around

I can do nothing about btrfs bugs so I will try to hunt the second option. This is the list of modules I'm running:

lsmod | awk '$4 == ""' | awk '{print $1}' | sort
8250_dw ac acpi_als acpi_pad aesni_intel ahci algif_skcipher ansi_cprng arc4 atkbd battery bnep btrfs btusb cdc_ether cmac coretemp crc32c_intel crc32_pclmul crct10dif_pclmul dell_laptop dell_wmi dm_crypt drbg ecb elan_i2c evdev ext4 fan fjes ghash_clmulni_intel gpio_lynxpoint hid_generic hid_multitouch hmac i2c_designware_platform i2c_hid i2c_i801 i915 input_leds int3400_thermal int3402_thermal int3403_thermal intel_hid intel_pch_thermal intel_powerclamp intel_rapl ip_tables
Undelete deleted subvolume?
I accidentally deleted the wrong snapshot using SUSE snapper. Is it possible to undelete a subvolume? I know that it is possible to extract files from the old tree (although SLES12 does not seem to offer btrfs-find-root), but is it possible to "reconnect" the subvolume back?
Re: [PATCH] add option to suppress "At subvol …" message in btrfs send
On Sat, May 07, 2016 at 06:29:58PM +0200, M G Berberich wrote:
> btrfs send puts a "At subvol …" on stderr, which is very annoying in
> scripts, esp. cron-jobs. Piping stderr to /dev/null does suppress this
> message, but also error-messages which one would probably want to
> see. I added an option to not change the behavior of btrfs send
> and possibly break existing scripts, but moving this message to
> verbose would be O.K. for me too.

We should use the current verbosity option. For compatibility reasons, I'd keep the 'At subvol' printed as default, matching verbosity level 1. All existing messages verbosity should then become 2, and the proposed quiet option 0.
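The proposed scheme is easy to picture with a small sketch (illustrative shell only, not btrfs-progs code; the message strings are examples): level 0 is the quiet option, level 1 is the compatible default that keeps "At subvol", and level 2 is what -v prints today.

```shell
# Sketch of the proposed verbosity levels; not the actual implementation.
verbose=1                                   # compatible default

msg() {
    # msg LEVEL TEXT...: print TEXT only if the current verbosity
    # is at least LEVEL.
    level=$1; shift
    if [ "$verbose" -ge "$level" ]; then
        echo "$@"
    fi
}

msg 1 "At subvol /mnt/snap"     # printed at the default, as today
msg 2 "detailed send progress"  # would require -v (level 2)
verbose=0                       # the proposed quiet option
msg 1 "At subvol /mnt/snap2"    # suppressed
```

Run as-is this prints only "At subvol /mnt/snap", which matches the compatibility goal: existing scripts that parse the default output keep working, while cron jobs can opt into silence.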
Re: [PATCH 0/1] btrfs-progs: Typo review of strings and comments
On Wed, May 11, 2016 at 07:50:35PM -0400, Nicholas D Steeves wrote:
> Thank you David Sterba for the btrfs-typos.txt which gave me a head
> start. Unfortunately I wasn't able finish before btrfs-progs-4.5.3
> was released, because I decided to use emacs'
> ispell-comments-and-strings to do a full review. I had to rebase to
> kdave's 4.5.2 branch on github, and that is what this patch will
> cleanly apply to.

Yes, patch applied cleanly.

> There were a couple of instances where I wasn't sure what to do; I've
> annotated them, and they are long lines now. To find them, search or
> grep the diff for 'Steeves'.

Found and updated, most of them real typos; 'strtoull' is the name of a C library function.

Thanks.
[PATCH 2/3] btrfs-progs: autogen: Make build success in CentOS 6 and 7
btrfs-progs build failed in CentOS 6 and 7:

  #./autogen.sh
  ...
  configure.ac:131: error: possibly undefined macro: PKG_CHECK_VAR
  If this token and others are legitimate, please use m4_pattern_allow.
  See the Autoconf documentation.
  ...

Seems PKG_CHECK_VAR is new in pkgconfig 0.28 (24-Jan-2013):
http://redmine.audacious-media-player.org/boards/1/topics/736

And the max available version for CentOS 7 in yum-repo and rpmfind.net is: pkgconfig-0.27.1-4.el7
http://rpmfind.net/linux/rpm2html/search.php?query=pkgconfig=Search+...=centos=

I updated my pkgconfig to 0.30, but still failed at above error.
(Maybe it is my setting problem)

To let users on CentOS 6 and 7 build btrfs-progs without more changes, we can avoid using PKG_CHECK_VAR in the following way, found in:
https://github.com/audacious-media-player/audacious-plugins/commit/f95ab6f939ecf0d9232b3165f9241d2ea9676b9e

Signed-off-by: Zhao Lei
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 4688bc7..a754990 100644
--- a/configure.ac
+++ b/configure.ac
@@ -128,7 +128,7 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid])
 PKG_CHECK_MODULES(ZLIB, [zlib])
 PKG_STATIC(ZLIB_LIBS_STATIC, [zlib])
 
-PKG_CHECK_VAR([UDEVDIR], [udev], [udevdir])
+UDEVDIR="$(pkg-config udev --variable=udevdir)"
 
 dnl lzo library does not provide pkg-config, let use classic way
 AC_CHECK_LIB([lzo2], [lzo_version], [
-- 
1.8.5.1
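The replacement line can be exercised standalone with plain pkg-config. The sketch below adds a fallback default for systems where udev.pc is absent; the fallback path is my own example value, not something from the patch.

```shell
# Roughly what the patch substitutes for
# PKG_CHECK_VAR([UDEVDIR], [udev], [udevdir]): a plain pkg-config
# variable query, available on any pkg-config version.
UDEVDIR=$(pkg-config udev --variable=udevdir 2>/dev/null)

# Hypothetical fallback for machines without udev.pc (example path only).
UDEVDIR=${UDEVDIR:-/usr/lib/udev}

echo "UDEVDIR=$UDEVDIR"
```

Note that this only sets a shell variable inside configure; as pointed out in review, an AC_SUBST is still needed before the value can be used in Makefile.in substitutions.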
[PATCH 1/3] btrfs-progs: autogen: Avoid chdir fail on dirname with blank
If the source is put in a dir with blanks, as:

  /var/lib/jenkins/workspace/btrfs progs

autogen will fail:

  ./autogen.sh: line 95: cd: /var/lib/jenkins/workspace/btrfs: No such file or directory

Can be fixed by adding quotes into the cd command.

Signed-off-by: Zhao Lei
---
 autogen.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/autogen.sh b/autogen.sh
index 9669850..8b9a9cb 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -92,7 +92,7 @@ find_autofile config.guess
 find_autofile config.sub
 find_autofile install-sh
 
-cd $THEDIR
+cd "$THEDIR"
 
 echo
 echo "Now type '$srcdir/configure' and 'make' to compile."
-- 
1.8.5.1
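The failure mode is easy to reproduce in isolation (a sketch using scratch directories, not autogen.sh itself): the unquoted expansion is word-split at the blank, so cd receives a wrong or extra argument, while the quoted form works.

```shell
# Demonstrates the word-splitting bug the patch fixes.
base=$(mktemp -d)
THEDIR="$base/btrfs progs"      # a path containing a blank
mkdir "$THEDIR"

# Unquoted: split into "$base/btrfs" and "progs", so cd fails.
cd $THEDIR 2>/dev/null && echo "unquoted: ok" || echo "unquoted: failed"

# Quoted: the path stays one word, so cd succeeds.
cd "$THEDIR" && echo "quoted: ok"

cd /
rm -r "$base"
```

The same rule applies to every unquoted `$var` in a script that may hold a user-supplied path, which is why CI workspaces named "btrfs progs" trip over it.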
[PATCH 3/3] btrfs-progs: autogen: Don't show success message on fail
When autogen.sh fails, the success message still appears in the output:

  # ./autogen.sh
  ...
  configure.ac:131: error: possibly undefined macro: PKG_CHECK_VAR
        If this token and others are legitimate, please use m4_pattern_allow.
        See the Autoconf documentation.

  Now type './configure' and 'make' to compile.
  #

Fixed by checking the return values of the autotools commands.

After the patch:

  # ./autogen.sh
  ...
  configure.ac:132: error: possibly undefined macro: PKG_CHECK_VAR
        If this token and others are legitimate, please use m4_pattern_allow.
        See the Autoconf documentation.
  #

Signed-off-by: Zhao Lei
---
 autogen.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/autogen.sh b/autogen.sh
index 8b9a9cb..a5f9af2 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -64,9 +64,10 @@ echo "   automake: $(automake --version | head -1)"
 chmod +x version.sh
 rm -rf autom4te.cache
 
-aclocal $AL_OPTS
-autoconf $AC_OPTS
-autoheader $AH_OPTS
+aclocal $AL_OPTS &&
+autoconf $AC_OPTS &&
+autoheader $AH_OPTS ||
+exit 1
 
 # it's better to use helper files from automake installation than
 # maintain copies in git tree
-- 
1.8.5.1
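The fix relies on `&&` short-circuiting: once autoconf fails, autoheader is skipped and the trailing `|| exit 1` fires before the script can print the success message. A stand-alone sketch of the same control flow, with invented stub functions standing in for the real autotools:

```shell
# Stubs standing in for the real tools; "autoconf" fails like the report.
# These function names and messages are illustrative, not the real tools.
aclocal()    { echo "aclocal ok"; }
autoconf()   { echo "configure.ac:131: error: possibly undefined macro" >&2; return 1; }
autoheader() { echo "autoheader ok"; }

# Same structure as the patched autogen.sh, run in a subshell so the
# 'exit 1' only terminates the subshell in this demonstration.
status=0
msg=$(
	aclocal &&
	autoconf &&
	autoheader ||
	exit 1

	echo "Now type './configure' and 'make' to compile."
) 2>/dev/null || status=$?

echo "status=$status"
echo "$msg"
```

Because autoconf returns nonzero, the subshell exits with status 1 and the "Now type ..." line is never reached, which is exactly the behavior the patch wants.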
Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo
Niccolò Belli posted on Wed, 11 May 2016 21:50:43 +0200 as excerpted:

> Hi,
> Before doing the daily backup I did a btrfs check and btrfs scrub as
> usual.
> After that this time I also decided to run btrfs filesystem defragment
> -r -v -clzo on all subvolumes (from a live distro) and just to be sure I
> runned check and scrub once again.
>
> Before defragment: total bytes scrubbed: 15.90GiB with 0 errors
> After defragment: total bytes scrubbed: 26.66GiB with 0 errors
>
> What did happen? This is something like a night and day difference:
> almost double the data! As stated in the subject all the subolumes have
> always been mounted with compress=lzo in /etc/fstab, even when I
> installed the distro a couple of days ago I manually mounted the
> subvolumes with -o compress=lzo. Instead I never used autodefrag.

I'd place money on your use of either snapshots or dedup.

As CAM says (perhaps too) briefly, defrag isn't snapshot (technically,
reflink) aware, and will break reflinks from other snapshots/dedups as it
defrags whatever file it's currently working on.

If there are few to no reflinks, as there won't be if you're not using
snapshots, btrfs dedup, etc, no problem. But where there are existing
reflinks, the mechanism both snapshots and the various btrfs dedup tools
use, defrag will rewrite only the copy of the data it's working on,
leaving the others as they are. That effectively doubles (for the
snapshots-plus-first-defrag case) the data usage: the old, possibly
multiply snapshot-reflinked copy, plus the new defragged copy that no
longer shares extents with the snapshots and other previously reflinked
copies.

And unlike a normal defrag, when you use the compress option, it forces a
rewrite of every file in order to (possibly re)compress it.
So while a normal defrag would have rewritten only some files, expanding
data usage only to the extent it actually did rewrites, the compress
option forced it to recompress every file it came across, breaking all
those reflinks and, where existing snapshots etc. still referenced the
old copies, duplicating the data in the process, thereby effectively
doubling your data usage.

The fact that it didn't /quite/ double usage may be down to the normal
compress mount option only doing a quick compression test and skipping
compression if the file doesn't seem particularly compressible based on
that quick test, while defrag with compress likely actually checks every
(128 KiB compression) block, getting somewhat better compression in the
process. So the defrag/compress run didn't quite double usage, as it
compressed some stuff that the runtime compression didn't. (FWIW, you
can get the more thorough runtime compression behavior with the
compress-force mount option, which always tries compression, instead of
doing a quick test and skipping compression on the entire file if the
bit it tried didn't compress so well.)

FWIW, around 3.9, btrfs defrag was actually snapshot/reflink aware for a
few releases, but it turned out that dealing with all those reflinks
simply didn't scale well with the then-existing code, and people were
reporting defrag runs taking days or weeks, to (would-be) months, with
enough snapshots and with quotas (which didn't scale well either) turned
on. Obviously that was simply unworkable, so defrag's snapshot awareness
was reverted until it could be made to scale better; a working but
snapshot-unaware defrag was clearly more practical than one that
couldn't be run because it'd take months. That snapshot awareness has
yet to be reactivated.

So now the bottom line is: don't defrag what you don't want un-reflinked.
FWIW, autodefrag has the same problem in theory, but the effect in
practice is far more limited. In part that's because it only does its
defrag thing when some part of the file is being rewritten (and thus
COWed elsewhere, already a limited de-reflink for the actually written
blocks), and while autodefrag will magnify that a bit by COWing somewhat
larger extents, for files of any size (MiB scale and larger) it's not
going to rewrite and thus duplicate the entire file, as defrag could do.
And it's definitely not going to rewrite all files in large sections of
the filesystem, as a recursive defrag with the compression option will.

Additionally, autodefrag will tend to defrag the file shortly after it
has been changed, likely before any snapshots have been taken if they're
only taken daily or so. That way you'll only have effectively two copies
of the portion of the file that was changed: the old version as still
locked in place by previous snapshots, and the new version. Not the
three copies you're likely to have if you wait until snapshots have been
taken before doing the defrag (the old version as in previous snapshots,
the new version as initially written and locked in place by post-change
pre-defrag snapshots, and the new version as defragged).

-- 
Duncan - List replies
Re: Btrfs progs release 4.5.3
On Thu, May 12, 2016 at 09:14:44AM +, Duncan wrote:
> David Sterba posted on Wed, 11 May 2016 16:47:39 +0200 as excerpted:
>
> > btrfs-progs 4.5.2 have been released. A bugfix release.
>
> So 4.5.3 as stated in the subject line, or 4.5.2 as stated in the message
> body?
>
> Given that I'm on 4.5.2, it must be 4.5.3 that's just released. =:^)

The subject line is correct, I'm copying the previous release message
text and forgot to update the version everywhere.
Re: Idea on compatibility for old distributions
On Thu, May 12, 2016 at 04:51:22PM +0800, Qu Wenruo wrote:
> David Sterba wrote on 2016/05/12 10:37 +0200:
> > On Thu, May 12, 2016 at 08:57:47AM +0800, Qu Wenruo wrote:
> >> Thanks for the info.
> >>
> >> I also read the phoronix news yesterday.
> >> So for RHEL6 that's meaningless.
> >>
> >> But I'm not sure whether it's still meaningless for OpenSUSE, maybe
> >> David has some plan on it?
> >
> > No plans.
>
> So the compatible layer patch is just to allow btrfs-progs to be
> compiled on old distros?

Yes, but convert compatibility is with e2fsprogs, while
FIEMAP_EXTENT_SHARED is a kernel interface.
Re: Btrfs progs release 4.5.3
David Sterba posted on Wed, 11 May 2016 16:47:39 +0200 as excerpted:

> btrfs-progs 4.5.2 have been released. A bugfix release.

So 4.5.3 as stated in the subject line, or 4.5.2 as stated in the message
body?

Given that I'm on 4.5.2, it must be 4.5.3 that's just released. =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On Wed, May 11, 2016 at 01:30:21PM -0700, Josef Bacik wrote:
> > Signed-off-by: Qu Wenruo
> > Signed-off-by: Mark Fasheh
>
> Reviewed-by: Josef Bacik

Applied to for-next with the following fixup to make it bisectable:

---
btrfs: build fixup for qgroup_account_snapshot

The macro btrfs_std_error got renamed to btrfs_handle_fs_error in an
independent branch for the same merge target (4.7). To make the code
compilable for bisectability reasons, add a temporary stub.

Signed-off-by: David Sterba
---
 fs/btrfs/transaction.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d7172d7ced5f..530081388d77 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1311,6 +1311,11 @@ int btrfs_defrag_root(struct btrfs_root *root)
 	return ret;
 }
 
+/* Bisectability fixup, remove in 4.8 */
+#ifndef btrfs_std_error
+#define btrfs_std_error btrfs_handle_fs_error
+#endif
+
 /*
  * Do all special snapshot related qgroup dirty hack.
  *
---
Re: Idea on compatibility for old distributions
David Sterba wrote on 2016/05/12 10:37 +0200:
> On Thu, May 12, 2016 at 08:57:47AM +0800, Qu Wenruo wrote:
> > Thanks for the info.
> >
> > I also read the phoronix news yesterday.
> > So for RHEL6 that's meaningless.
> >
> > But I'm not sure whether it's still meaningless for OpenSUSE, maybe
> > David has some plan on it?
>
> No plans.

So the compatible layer patch is just to allow btrfs-progs to be
compiled on old distros?

Thanks,
Qu
Re: Idea on compatibility for old distributions
On Thu, May 12, 2016 at 08:57:47AM +0800, Qu Wenruo wrote:
> Thanks for the info.
>
> I also read the phoronix news yesterday.
> So for RHEL6 that's meaningless.
>
> But I'm not sure whether it's still meaningless for OpenSUSE, maybe
> David has some plan on it?

No plans.
[PATCH v2] fstests: generic: Test reserved extent map search routine on deduped file
For a fully deduped file, meaning all of its file extents point to the
same bytenr, btrfs can hit a soft lockup when the fiemap ioctl is called
on that file, like the following output:

--
CPU: 1 PID: 7500 Comm: xfs_io Not tainted 4.5.0-rc6+ #2
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: 880027681b40 ti: 8800276e task.ti: 8800276e
RIP: 0010:[] [] __merge_refs+0x34/0x120 [btrfs]
RSP: 0018:8800276e3c08 EFLAGS: 0202
RAX: 8800269cc330 RBX: 8800269cdb18 RCX: 0007
RDX: 61b0 RSI: 8800269cc4c8 RDI: 8800276e3c88
RBP: 8800276e3c20 R08: R09: 0001
R10: R11: R12: 880026ea3cb0
R13: 8800276e3c88 R14: 880027132a50 R15: 88002743
FS: 7f10201df700() GS:88003fa0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f10201ec000 CR3: 27603000 CR4: 000406e0
Stack:
 8800276e3ce8 a0259f38 0005 8800274c6870
 8800274c7d88 00c1 0001 27431190
Call Trace:
 [] find_parent_nodes+0x448/0x740 [btrfs]
 [] btrfs_check_shared+0x102/0x1b0 [btrfs]
 [] ? __might_fault+0x4d/0xa0
 [] extent_fiemap+0x2ac/0x550 [btrfs]
 [] ? __filemap_fdatawait_range+0x96/0x160
 [] ? btrfs_get_extent+0xb30/0xb30 [btrfs]
 [] btrfs_fiemap+0x45/0x50 [btrfs]
 [] do_vfs_ioctl+0x498/0x670
 [] SyS_ioctl+0x79/0x90
 [] entry_SYSCALL_64_fastpath+0x12/0x6f
Code: 41 55 41 54 53 4c 8b 27 4c 39 e7 0f 84 e9 00 00 00 49 89 fd 49 8b
34 24 49 39 f5 48 8b 1e 75 17 e9 d5 00 00 00 49 39 dd 48 8b 03 <48> 89
de 0f 84 b9 00 00 00 48 89 c3 8b 46 2c 41 39 44 24 2c 75
--

Btrfs also returns the wrong flag for all these extents: they should
have the SHARED (0x2000) flag, while btrfs still considers them
exclusive extents.

On the other hand, with the unmerged xfs reflink patches, xfs can handle
it without problem, and patched btrfs can also handle it.

This test case will create a large, fully deduped file to check whether
the fs can handle the fiemap ioctl and return the correct SHARED flag,
for any fs which supports reflink.
Reported-by: Tsutomu Itoh
Signed-off-by: Qu Wenruo
---
v2:
  Use more xfs_io wrappers instead of calling $XFS_IO_PROGS
  Add fiemap requirement
  Refactor output to match golden output if LOAD_FACTOR is not 1
---
 common/punch          | 17 +
 tests/generic/352     | 98 +++
 tests/generic/352.out |  5 +++
 tests/generic/group   |  1 +
 4 files changed, 121 insertions(+)
 create mode 100755 tests/generic/352
 create mode 100644 tests/generic/352.out

diff --git a/common/punch b/common/punch
index 43f04c2..44c6e1c 100644
--- a/common/punch
+++ b/common/punch
@@ -218,6 +218,23 @@ _filter_fiemap()
 	_coalesce_extents
 }
 
+_filter_fiemap_flags()
+{
+	$AWK_PROG '
+		$3 ~ /hole/ {
+			print $1, $2, $3;
+			next;
+		}
+		$5 ~ /0x[[:xdigit:]]*8[[:xdigit:]][[:xdigit:]]/ {
+			print $1, $2, "unwritten";
+			next;
+		}
+		$5 ~ /0x[[:xdigit:]]+/ {
+			print $1, $2, $5;
+		}' |
+	_coalesce_extents
+}
+
 # Filters fiemap output to only print the
 # file offset column and whether or not
 # it is an extent or a hole
diff --git a/tests/generic/352 b/tests/generic/352
new file mode 100755
index 000..3537074
--- /dev/null
+++ b/tests/generic/352
@@ -0,0 +1,98 @@
+#! /bin/bash
+# FS QA Test 352
+#
+# Test fiemap ioctl on heavily deduped file
+#
+# This test case will check if reserved extent map searching go
+# without problem and return correct SHARED flag.
+# Which btrfs will soft lock up and return wrong shared flag.
+#
+#---
+# Copyright (c) 2016 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
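The heart of the new `_filter_fiemap_flags` helper is its awk program: holes pass through unchanged, any flag word containing the unwritten bit (0x800) is rewritten to the literal `unwritten`, and other flag values are printed as-is. A stand-alone sketch with fabricated fiemap-style input lines (the `_coalesce_extents` postprocessing step is omitted here):

```shell
# Mirror of the _filter_fiemap_flags awk logic, minus _coalesce_extents.
filter_flags() {
	awk '
		$3 ~ /hole/ {
			print $1, $2, $3;
			next;
		}
		$5 ~ /0x[[:xdigit:]]*8[[:xdigit:]][[:xdigit:]]/ {
			print $1, $2, "unwritten";
			next;
		}
		$5 ~ /0x[[:xdigit:]]+/ {
			print $1, $2, $5;
		}'
}

# Fabricated "xfs_io fiemap -v" style lines: a shared extent (0x2000),
# a hole, and an unwritten extent (0x800).
out=$(printf '%s\n' \
	'0: [0..127]: 26624..26751 128 0x2000' \
	'1: [128..255]: hole' \
	'2: [256..383]: 26880..27007 128 0x800' | filter_flags)

echo "$out"
# 0: [0..127]: 0x2000
# 1: [128..255]: hole
# 2: [256..383]: unwritten
```

Note the unwritten regex only requires an `8` followed by two hex digits somewhere in the flag word, which is why 0x800 matches but 0x2000 does not.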
Re: BTRFS Data at Rest File Corruption
What are the mount options for this filesystem?

Maybe filter the journalctl output or /var/log/messages with grep -i
btrfs, and also look for libata/scsi messages. Anything from previous
days might reveal some clue.

I've got multiple Btrfs raid1's, and several times I've had
*correctable* errors. So your expectation is proper.

Chris Murphy
Re: 4.4.0 - no space left with >1.7 TB free space left
On 2016-05-12 15:03, Tomasz Chmielewski wrote:

FYI, I'm still getting this with 4.5.3, which probably means the fix was
not yet included ("No space left" at snapshot time):

/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG: could not close temporary statistics file "pg_stat_tmp/db_0.tmp": No space left on device
/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG: could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device
/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG: could not close temporary statistics file "pg_stat_tmp/db_0.tmp": No space left on device
/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG: could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device

I've tried mounting with space_cache=v2, but it didn't help.

On the good side, I see it's in 4.6-rc7.

Tomasz Chmielewski
http://wpkg.org
Re: 4.4.0 - no space left with >1.7 TB free space left
On 2016-04-08 20:53, Roman Mamedov wrote:
> Do you snapshot the parent subvolume which holds the databases? Can you
> correlate that perhaps ENOSPC occurs at the time of snapshotting? If
> yes, then you should try the patch
> https://patchwork.kernel.org/patch/7967161/
>
> (Too bad this was not included into 4.4.1.)

By the way - was it included in any later kernel? I'm running 4.4.5 on
that server, but still hitting the same issue.

It's not in 4.4.6 either. I don't know why it doesn't get included, or
what we need to do. Last time I asked, it was queued:
http://www.spinics.net/lists/linux-btrfs/msg52478.html
But maybe that meant 4.5 or 4.6 only? While the bug is affecting people
on 4.4.x today.

FYI, I'm still getting this with 4.5.3, which probably means the fix was
not yet included ("No space left" at snapshot time):

/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG: could not close temporary statistics file "pg_stat_tmp/db_0.tmp": No space left on device
/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG: could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device
/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG: could not close temporary statistics file "pg_stat_tmp/db_0.tmp": No space left on device
/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG: could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device

I've tried mounting with space_cache=v2, but it didn't help.

Tomasz Chmielewski
http://wpkg.org