RE: [GIT PULL] Fix for btrfs/070 checksum error
Hi, Chris

> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org
> [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Tuesday, July 28, 2015 3:11 PM
> To: Chris Mason; btrfs
> Subject: Re: [GIT PULL] Fix for btrfs/070 checksum error
>
> Chris Mason wrote on 2015/07/23 21:57 -0400:
> > On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote:
> >
> > [ deadlock with the 070 patches ]
> >
> >> Thanks Chris
> >>
> >> We will investigate it with highest priority.
> >>
> >> Thanks,
> >> Qu
> >>
> >
> > Thanks! I'm doing a few more runs to make sure the lockup is new with
> > these patches.
> >
> > -chris
> >
> Hi Chris,
>
> I'm very sorry that we are unable to fix the lockup in a short time, so it
> may not fit in the v4.2 merge window.
>
> Please ignore this patchset for now.
>

Sorry for taking quite a long time to investigate; the lockup happens
randomly.

We found the reason for the process blocking:
1: In some cases, this patch causes
   __btrfs_cow_block()->btrfs_reloc_cow_block() to fail during a
   btrfs_balance operation. (needs more investigation)

2: The error handling code in __btrfs_cow_block() does not unlock/free the
   newly allocated tree block before returning the error.

3: do_relocation(), which is the caller of __btrfs_cow_block(), has error
   handling code, but it can't work in this case either, because the newly
   allocated eb is not returned.

4: Subsequent code in do_relocation() tries to lock the above eb again,
   which causes the deadlock.

In short:
do_relocation()
 -> __btrfs_cow_block() fails without unlocking the eb   *1
 ...
 -> btrfs_search_slot() tries to lock the above eb again
 ...

*1: this failure is caused by scrub

Because the eb locking code is not a normal lock, we can't get any
information from lockdep in this case.

Things to do:
1: Fix this patch to avoid making __btrfs_cow_block() fail.
2: Fix __btrfs_cow_block() to do enough cleanup in its error handling code.
3: Enhance eb locking to report some information that helps with similar
   errors.

Thanks
Zhaolei

> Thanks,
> Qu
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the
> body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
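The failure mode in points 2 to 4 can be illustrated outside the kernel. What follows is only a minimal user-space sketch under stated assumptions, not btrfs code: `struct toy_eb`, `toy_try_lock()`, `toy_unlock()` and `toy_cow_block()` are hypothetical names, a plain flag stands in for the extent buffer lock, and `-1` stands in for whatever error the real code returns. It shows why an error path that returns without unlocking the newly allocated block leaves the next lock attempt stuck.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent buffer with a non-recursive lock. */
struct toy_eb {
	bool locked;
};

/* Returns true if the lock was taken, false if it is already held.
 * A real blocking lock would wait forever in the second case. */
bool toy_try_lock(struct toy_eb *eb)
{
	if (eb->locked)
		return false;
	eb->locked = true;
	return true;
}

void toy_unlock(struct toy_eb *eb)
{
	eb->locked = false;
}

/* Toy model of a COW step: the new block is locked right after allocation.
 * If a later step fails, the block is only unlocked when cleanup_on_error
 * is set, which models the cleanup proposed in to-do item 2. */
int toy_cow_block(struct toy_eb *new_eb, struct toy_eb **cow_ret,
		  bool reloc_fails, bool cleanup_on_error)
{
	toy_try_lock(new_eb);
	if (reloc_fails) {
		if (cleanup_on_error)
			toy_unlock(new_eb);
		return -1;	/* *cow_ret is never set, caller cannot unlock */
	}
	*cow_ret = new_eb;	/* success: caller owns the locked block */
	return 0;
}
```

With `cleanup_on_error` set to false, a failed call leaves the block locked and a later `toy_try_lock()` on it returns false, which corresponds to the reported hang. With it set to true, the later lock attempt succeeds.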
Re: [GIT PULL] Fix for btrfs/070 checksum error
On Wed, Jul 29, 2015 at 04:21:33PM +0800, Zhao Lei wrote:
> Hi, Chris
>
> > -----Original Message-----
> > From: linux-btrfs-ow...@vger.kernel.org
> > [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Qu Wenruo
> > Sent: Tuesday, July 28, 2015 3:11 PM
> > To: Chris Mason; btrfs
> > Subject: Re: [GIT PULL] Fix for btrfs/070 checksum error
> >
> > Chris Mason wrote on 2015/07/23 21:57 -0400:
> > > On Fri, Jul 24, 2015 at 08:29:05AM +0800, Qu Wenruo wrote:
> > >
> > > [ deadlock with the 070 patches ]
> > >
> > >> Thanks Chris
> > >>
> > >> We will investigate it with highest priority.
> > >>
> > >> Thanks,
> > >> Qu
> > >>
> > >
> > > Thanks! I'm doing a few more runs to make sure the lockup is new with
> > > these patches.
> > >
> > > -chris
> > >
> > Hi Chris,
> >
> > I'm very sorry that we are unable to fix the lockup in a short time, so it
> > may not fit in the v4.2 merge window.
> >
> > Please ignore this patchset for now.
> >
>
> Sorry for taking quite a long time for investigate because it is
> randomly happened.
>
> We got reason of process blocking:
> 1: In some case, this patch caused
>    __btrfs_cow_block()->btrfs_reloc_cow_block()
>    failed from btrfs_balance operation.(need more investigation)
>
> 2: __btrfs_cow_block()'s error handle code hadn't unlock/free
>    new_allocated tree block before return error.
>
> 3: do_relocation(), which is caller of __btrfs_cow_block(), have error handle
>    code, but also can't work in this case, because new_allocated eb is not
>    returned.
>
> 4: subsequent code in do_relocation() try to lock above eb again,
>    and caused dead lock.

Excellent, thanks for tracking this down. I agree investigating #1 is the
top priority, since it's possible the patches are just making it happen
more often.

-chris
[PATCH] Btrfs: teach backref walking about backrefs with underflowed offset values
From: Filipe Manana

When cloning/deduplicating file extents (through the clone and extent_same
ioctls) we can get data back references with offset values that are a
result of an unsigned integer arithmetic underflow, that is, values that
are much larger than they could be otherwise.

This is not a problem when decrementing or dropping the back references
(happens when we overwrite the extents or punch a hole for example, through
__btrfs_drop_extents()), since we compute the same too large offset value,
but it is a problem for the backref walking code, used by an incremental
send and the ioctls that are used by the btrfs tool "inspect-internal"
commands, as it makes it miss the corresponding file extent items because
the search key is set for an extent item that starts at an offset matching
the exceptionally large offset value of the data back reference. For an
incremental send this causes the send ioctl to fail with -EIO.

So teach the backref walking code to deal with these cases by setting the
search key's offset to 0 if the backref's offset value is larger than
LLONG_MAX (the largest possible file offset). This makes sure the backref
walking code finds the corresponding file extent items at the expense of
scanning more items and leafs in the btree.

Fixing the clone/dedup ioctls to not produce such underflowed results would
require major changes breaking backward compatibility, updating user space
tools, etc.

Simple reproducer case for fstests:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"

  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      rm -fr $send_files_dir
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # real QA test starts here
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_cloner
  _need_to_be_root

  send_files_dir=$TEST_DIR/btrfs-test-$seq

  rm -f $seqres.full
  rm -fr $send_files_dir
  mkdir $send_files_dir

  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount

  # Create our test file with a single extent of 64K starting at file
  # offset 128K.
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 128K 64K" $SCRATCH_MNT/foo \
      | _filter_xfs_io

  _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
      $SCRATCH_MNT/mysnap1

  # Now clone parts of the original extent into lower offsets of the file.
  #
  # The first clone operation adds a file extent item to file offset 0
  # that points to our initial extent with a data offset of 16K. The
  # corresponding data back reference in the extent tree has an offset of
  # 18446744073709535232, which is the result of file_offset - data_offset
  # = 0 - 16K.
  #
  # The second clone operation adds a file extent item to file offset 16K
  # that points to our initial extent with a data offset of 48K. The
  # corresponding data back reference in the extent tree has an offset of
  # 18446744073709518848, which is the result of file_offset - data_offset
  # = 16K - 48K.
  #
  # Those large back reference offsets (result of unsigned arithmetic
  # underflow) confused the back reference walking code (used by an
  # incremental send and the multiple inspect-internal ioctls) and made it
  # miss the back references, which for the case of an incremental send it
  # made it fail with -EIO and print a message like the following to
  # dmesg:
  #
  # "BTRFS error (device sdc): did not find backref in send_root. \
  #  inode=257, offset=0, disk_byte=12845056 found extent=12845056"
  #
  $CLONER_PROG -s $(((128 + 16) * 1024)) -d 0 -l $((16 * 1024)) \
      $SCRATCH_MNT/foo $SCRATCH_MNT/foo
  $CLONER_PROG -s $(((128 + 48) * 1024)) -d $((16 * 1024)) \
      -l $((16 * 1024)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo

  _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
      $SCRATCH_MNT/mysnap2

  _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
  _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
      -f $send_files_dir/2.snap

  echo "File digest in the original filesystem:"
  md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch

  # Now recreate the filesystem by receiving both send streams and verify
  # we get the same file contents that the original filesystem had.
  _scratch_unmount
  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount

  _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap
  _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/2.snap

  echo "File digest in the new filesystem:"
  md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch

  status=0
  exit

The test's expected golden output is:

  wrote 65536/65536 bytes at offset 131072
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  File digest in the original filesystem:
  6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2/foo
  File digest in the new filesystem:
  6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2
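The workaround described in the commit message (search from offset 0 when the backref offset exceeds LLONG_MAX) can be sketched as a small standalone function. This is only an illustration under stated assumptions: `search_key_offset()` is a hypothetical helper name, not the function the patch touches, and the real change lives in the kernel's backref walking code.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: choose the file offset at which a backref walk
 * starts looking for file extent items. */
uint64_t search_key_offset(uint64_t backref_offset)
{
	/* A value above LLONG_MAX cannot be a real file offset, so it must
	 * come from an underflowed subtraction. Fall back to offset 0 and
	 * scan forward, at the cost of visiting more items. */
	if (backref_offset > (uint64_t)LLONG_MAX)
		return 0;
	return backref_offset;
}
```

For the two underflowed values from the reproducer, 18446744073709535232 and 18446744073709518848, the function returns 0. An ordinary offset such as 131072 is returned unchanged.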
Re: Strange data backref offset?
On Fri, Jul 17, 2015 at 3:38 AM, Qu Wenruo wrote:
> Hi all,
>
> While I'm developing a new btrfs inband dedup mechanism, I found btrfsck and
> kernel doing strange behavior for clone.
>
> [Reproducer]
> # mount /dev/sdc -t btrfs /mnt/test
> # dd if=/dev/zero of=/mnt/test/file1 bs=4K count=4
> # sync
> # ~/xfstests/src/cloner -s 4096 -l 4096 /mnt/test/file1 /mnt/test/file2
> # sync
>
> Then btrfs-debug-tree gives quite strange result on the data backref:
> --
> 
>         item 4 key (12845056 EXTENT_ITEM 16384) itemoff 16047 itemsize 111
>                 extent refs 3 gen 6 flags DATA
>                 extent data backref root 5 objectid 257 offset 0 count 1
>                 extent data backref root 5 objectid 258 offset 18446744073709547520 count 1
>
> 
>         item 8 key (257 EXTENT_DATA 0) itemoff 15743 itemsize 53
>                 extent data disk byte 12845056 nr 16384
>                 extent data offset 0 nr 16384 ram 16384
>                 extent compression 0
>         item 9 key (257 EXTENT_DATA 16384) itemoff 15690 itemsize 53
>                 extent data disk byte 12845056 nr 16384
>                 extent data offset 4096 nr 4096 ram 16384
>                 extent compression 0
> --
>
> The offset is file extent's key.offset - file extent's offset,
> Which is 0 - 4096, causing the overflow result.
>
> Kernel and fsck all uses that behavior, so fsck can pass the strange thing.
>
> But shouldn't the offset in data backref matches with the key.offset of the
> file extent?
>
> And I'm quite sure the change of behavior can hugely break the fsck and
> kernel, but I'm wondering is this a known BUG or feature, and will it be
> handled?

Obviously a bug. I was recently investigating incremental send failures
after cloning/deduping extents and that led me to this as well.

It's a bug, but it's not too bad as it affects only backref walking, which
can have a simple workaround (I just sent a patch for it). For the
purposes of incrementing/decrementing the data backref's count we do the
same calculation everywhere, always leading to the same large and
unexpected value, so we don't get bogus backrefs added/left around.

>
> Thanks,
> Qu

--
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
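The arithmetic behind the odd value in the btrfs-debug-tree output can be checked with a few lines of C. This is a minimal sketch, assuming only what the thread states (backref offset = file extent key offset minus the extent's data offset, in unsigned 64-bit arithmetic); `data_backref_offset()` is a hypothetical name used for illustration.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the calculation described in the thread.
 * Unsigned subtraction wraps modulo 2^64 instead of going negative. */
uint64_t data_backref_offset(uint64_t file_offset, uint64_t data_offset)
{
	return file_offset - data_offset;
}
```

Calling `data_backref_offset(0, 4096)` yields 18446744073709547520, the value shown for objectid 258, because 2^64 - 4096 = 18446744073709547520. When the file offset is not smaller than the data offset nothing wraps: `data_backref_offset(16384, 4096)` is simply 12288.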
[PATCH] fstests: test for btrfs incremental send after file extent cloning
From: Filipe Manana

Test that an incremental send works after a file gets one of its extents
cloned/deduplicated into lower file offsets.

This is a regression test for the problem fixed by the linux kernel patch
titled:

  "Btrfs: teach backref walking about backrefs with underflowed offset values"

Signed-off-by: Filipe Manana
---
 tests/btrfs/097     | 113
 tests/btrfs/097.out |   7
 tests/btrfs/group   |   1 +
 3 files changed, 121 insertions(+)
 create mode 100755 tests/btrfs/097
 create mode 100644 tests/btrfs/097.out

diff --git a/tests/btrfs/097 b/tests/btrfs/097
new file mode 100755
index 000..d9138ea
--- /dev/null
+++ b/tests/btrfs/097
@@ -0,0 +1,113 @@
+#! /bin/bash
+# FS QA Test No. btrfs/097
+#
+# Test that an incremental send works after a file gets one of its extents
+# cloned/deduplicated into lower file offsets.
+#
+#---
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	rm -fr $send_files_dir
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_cloner
+_need_to_be_root
+
+send_files_dir=$TEST_DIR/btrfs-test-$seq
+
+rm -f $seqres.full
+rm -fr $send_files_dir
+mkdir $send_files_dir
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# Create our test file with a single extent of 64K starting at file offset 128K.
+$XFS_IO_PROG -f -c "pwrite -S 0xaa 128K 64K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
+
+# Now clone parts of the original extent into lower offsets of the file.
+#
+# The first clone operation adds a file extent item to file offset 0 that points
+# to our initial extent with a data offset of 16K. The corresponding data back
+# reference in the extent tree has an offset of 18446744073709535232, which is
+# the result of file_offset - data_offset = 0 - 16K.
+#
+# The second clone operation adds a file extent item to file offset 16K that
+# points to our initial extent with a data offset of 48K. The corresponding data
+# back reference in the extent tree has an offset of 18446744073709518848, which
+# is the result of file_offset - data_offset = 16K - 48K.
+#
+# Those large back reference offsets (result of unsigned arithmetic underflow)
+# confused the back reference walking code (used by an incremental send and
+# the multiple inspect-internal ioctls) and made it miss the back references,
+# which for the case of an incremental send it made it fail with -EIO and print
+# a message like the following to dmesg:
+#
+#   "BTRFS error (device sdc): did not find backref in send_root. inode=257, \
+#    offset=0, disk_byte=12845056 found extent=12845056"
+#
+$CLONER_PROG -s $(((128 + 16) * 1024)) -d 0 -l $((16 * 1024)) \
+	$SCRATCH_MNT/foo $SCRATCH_MNT/foo
+$CLONER_PROG -s $(((128 + 48) * 1024)) -d $((16 * 1024)) -l $((16 * 1024)) \
+	$SCRATCH_MNT/foo $SCRATCH_MNT/foo
+
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
+
+_run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
+_run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
+	-f $send_files_dir/2.snap
+
+echo "File digest in the original filesystem:"
+md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
+
+# Now recreate the filesystem by receiving both send streams and verify we get
+# the same file contents that the original filesystem had.
+_scratch_unmount
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+_run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap
+_run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/2.snap
+
+echo "File digest in the new filesystem:"
+md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
+
+status=0
+exit
diff --git a/tests/btrfs/097.out b/tests/btrfs/097.out
new file mode 100644
index 0
fs got readonly after "btrfs_run_delayed_refs:2783: errno=-5 IO failure"
Hi

On my home machine I use btrfs with the latest Linux kernel (Arch Linux).
A few days ago I started a rebalance, but unfortunately the machine got
rebooted. It looks like the rebalance operation is not interrupt-tolerant,
and now my filesystem is corrupted.

I see a lot of checksum errors, but as I use RAID most of these errors got
fixed. I started a scrub operation to find/fix all the problems, but the
scrub got cancelled at the very beginning. I see the following error in the
kernel logs: "(device sdb): run_one_delayed_ref returned -5" and after that
"(device sdb): forced readonly". What is that supposed to mean? I expected
scrub either to fix the filesystem inconsistency problems or to tell me
which files are not recoverable, so I can delete them or restore the data
from backup. But now I have a read-only filesystem and scrub refuses to
recover it.

I see the same issue with current HEAD (v4.2-rc3). I enabled btrfs
debugging to get more info about what is going on here:

[ 609.802479] BTRFS: read error corrected: ino 1 off 25324783960064 (dev /dev/sdc sector 2530931648)
[ 609.814791] BTRFS (device sdb): parent transid verify failed on 25324789198848 wanted 443932 found 413701
[ 609.835181] BTRFS: read error corrected: ino 1 off 25324789198848 (dev /dev/sdc sector 2530941880)
[ 609.846056] BTRFS (device sdb): parent transid verify failed on 25324789448704 wanted 443932 found 413701
[ 609.858280] BTRFS: read error corrected: ino 1 off 25324789448704 (dev /dev/sdc sector 2530942368)
[ 609.859835] BTRFS (device sdb): parent transid verify failed on 25324789387264 wanted 443932 found 413701
[ 609.870867] BTRFS: read error corrected: ino 1 off 25324789387264 (dev /dev/sdc sector 2530942248)
[ 609.872679] BTRFS (device sdb): parent transid verify failed on 25324822609920 wanted 443938 found 441825
[ 609.909616] BTRFS: read error corrected: ino 1 off 25324822609920 (dev /dev/sdc sector 2531007136)
[ 609.967041] BTRFS (device sdb): parent transid verify failed on 25324678742016 wanted 443932 found 441820
[ 609.970855] BTRFS: read error corrected: ino 1 off 25324678742016 (dev /dev/sdc sector 2530726144)
[ 610.008460] BTRFS (device sdb): parent transid verify failed on 25324908392448 wanted 443938 found 415080
[ 610.041669] BTRFS: read error corrected: ino 1 off 25324908392448 (dev /dev/sdc sector 2531174680)
[ 610.116968] BTRFS (device sdb): parent transid verify failed on 25325058904064 wanted 443941 found 441828
[ 610.123595] BTRFS: read error corrected: ino 1 off 25325058904064 (dev /dev/sdc sector 4024575336)
[ 610.128482] BTRFS: read error corrected: ino 1 off 25324674007040 (dev /dev/sdc sector 2530716896)
[ 640.028885] verify_parent_transid: 19 callbacks suppressed
[ 640.030377] BTRFS (device sdb): parent transid verify failed on 25324845932544 wanted 443938 found 441825
[ 640.062917] repair_io_failure: 18 callbacks suppressed
[ 640.064486] BTRFS: read error corrected: ino 1 off 25324845932544 (dev /dev/sdc sector 2531052688)
[ 640.119903] BTRFS (device sdb): parent transid verify failed on 25324845969408 wanted 443938 found 441827
[ 640.125322] BTRFS: read error corrected: ino 1 off 25324845969408 (dev /dev/sdc sector 2531052760)
[ 640.142157] BTRFS (device sdb): parent transid verify failed on 25325043716096 wanted 443940 found 441827
[ 640.174974] BTRFS: read error corrected: ino 1 off 25325043716096 (dev /dev/sdc sector 4024545672)
[ 640.185464] BTRFS (device sdb): parent transid verify failed on 25325503774720 wanted 443950 found 441837
[ 640.238762] BTRFS: read error corrected: ino 1 off 25325503774720 (dev /dev/sdc sector 4025444224)
[ 641.718129] BTRFS (device sdb): parent transid verify failed on 25325006667776 wanted 443940 found 441827
[ 641.721734] BTRFS: read error corrected: ino 1 off 25325006667776 (dev /dev/sdc sector 4024473312)
[ 641.723841] BTRFS (device sdb): parent transid verify failed on 25325006692352 wanted 443940 found 441827
[ 641.725775] BTRFS: read error corrected: ino 1 off 25325006692352 (dev /dev/sdc sector 4024473360)
[ 641.742454] BTRFS (device sdb): parent transid verify failed on 25325006716928 wanted 443940 found 441827
[ 641.744649] BTRFS: read error corrected: ino 1 off 25325006716928 (dev /dev/sdc sector 4024473408)
[ 641.778807] BTRFS (device sdb): parent transid verify failed on 25324804997120 wanted 443937 found 413700
[ 641.819483] BTRFS: read error corrected: ino 1 off 25324804997120 (dev /dev/sdc sector 2530972736)
[ 641.821201] BTRFS (device sdb): parent transid verify failed on 25324782997504 wanted 443937 found 441823
[ 641.834794] BTRFS: read error corrected: ino 1 off 25324782997504 (dev /dev/sdc sector 2530929768)
[ 641.836415] BTRFS (device sdb): parent transid verify failed on 25324805001216 wanted 443937 found 413700
[ 641.838488] BTRFS: read error corrected: ino 1 off 25324805001216 (dev /dev/sdc sector 2530972744)
[ 644.531005] BTRFS error (device sdb): run_one_delayed_ref returned -5
[ 644.534562] [ cut here ]---
[PATCH 1/1] btrfs-progs: compilation errors when using musl libc
- limits.h must be included to pick up PATH_MAX.
- remove double declaration of BTRFS_DISABLE_BACKTRACE

kerncompat.h assumed that if __GLIBC__ was not defined, it could safely
define BTRFS_DISABLE_BACKTRACE, however this can be defined by the
configure script. Added a check to ensure it is not defined first.

Signed-off-by: Brendan Heading
---
 cmds-inspect.c | 1 +
 cmds-receive.c | 1 +
 cmds-scrub.c   | 1 +
 cmds-send.c    | 1 +
 kerncompat.h   | 2 ++
 5 files changed, 6 insertions(+)

diff --git a/cmds-inspect.c b/cmds-inspect.c
index 71451fe..9712581 100644
--- a/cmds-inspect.c
+++ b/cmds-inspect.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include <limits.h>

 #include "kerncompat.h"
 #include "ioctl.h"
diff --git a/cmds-receive.c b/cmds-receive.c
index 071bea9..d4b3103 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include <limits.h>

 #include
 #include
diff --git a/cmds-scrub.c b/cmds-scrub.c
index b7aa809..5a85dc4 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include <limits.h>

 #include "ctree.h"
 #include "ioctl.h"
diff --git a/cmds-send.c b/cmds-send.c
index 20bba18..a0b7f95 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include <limits.h>

 #include "ctree.h"
 #include "ioctl.h"
diff --git a/kerncompat.h b/kerncompat.h
index 5d92856..7c627ba 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -33,7 +33,9 @@
 #include

 #ifndef __GLIBC__
+#ifndef BTRFS_DISABLE_BACKTRACE
 #define BTRFS_DISABLE_BACKTRACE
+#endif
 #define __always_inline __inline __attribute__ ((__always_inline__))
 #endif

--
2.4.3
Re: fs got readonly after "btrfs_run_delayed_refs:2783: errno=-5 IO failure"
Anatol Pomozov posted on Wed, 29 Jul 2015 09:26:00 -0700 as excerpted:

> At my home machine I use btrfs from the latest Linux kernel (Linux
> Arch).

Similar here, but on gentoo. And to be clear, just a list regular and
btrfs user as yourself, not a dev. As such, this reply isn't intended to
directly help you fix the issue at hand, but it does address a possible
misconception I saw, below, and provide some more general information that
could be helpful.

> A few days ago I started rebalance but unfortunately the machine got
> rebooted. It looks like rebalance operation is not interrupt-tolerant
> and now my filesystem got corrupted.

In _theory_ btrfs operations are atomic and thus even
unplug-the-running-machine tolerant, let alone reboot tolerant. However,
in _both_ theory and practice, btrfs is still not fully stable and mature
yet, and bugs negatively affect the operation of the theory above...

In theory rebalance simply moves big chunks of data/metadata around, and
if interrupted, all addresses will either point to the new location for
previously balanced chunks, or the old location, for those not yet
balanced and for the one that was being processed at the time of the
reboot. And a balance definitely can and normally does pick up where it
left off after a reboot.

But...

> I see a lot of checksum errors, but as I use RAID most of these error
> got fixed, I started scrub operation to find/fix all the problems but
> the scrub operation got cancelled at the very beginning. I see following
> error in kernel logs, it says "(device sdb): run_one_delayed_ref
> returned -5" and after that "(device sdb): forced readonly". What does
> it suppose to mean? I expect that scrub either fix filesystem
> inconsistency problems. Or tell me what file are not recoverable so I
> can delete/restore the data from backup. But now I have a readonly
> filesystem and scrub refuses to recover it.

Scrub detects, and fixes in the dup/raid1/5/6/10 case where there's either
a redundant copy or parity information from which it can rebuild, one kind
of error, the checksum errors you mentioned. It does _not_, however, and
this is the possible misconception I mentioned above, fix other types of
filesystem inconsistency problems, unless they're a direct result of the
checksum validated data integrity errors it does detect and fix if
possible.

For other errors, the kernel itself catches and fixes many problems
on-mount, with others recoverable with the recovery mount option, and
still others fixable using btrfs check, tho AFAIK, the recommendation
remains not to use btrfs check in --repair mode (without --repair it'll
only report any problems it finds, not attempt to fix them) unless you
have to, because with problems it doesn't understand it might make the
problem worse instead of better.

Of course with btrfs' immaturity, the rule about having backups if you
care about the data, and if you don't have backups, by definition you
don't care about the data, applies double, but you already mentioned the
possibility of restoring from backups, so you have that one covered. =:^)

As for the read-only, the kernel btrfs code forces a filesystem read-only
when it detects a filesystem inconsistency that could result in further
damage were it to continue to write to the filesystem. Since at that
point it's read-only, you can't damage it further by rebooting, and it's
possible btrfs' self-healing properties will fix the problem on reboot.
However, since it's also possible the damage is bad enough it might not
mount at all on reboot, you might wish to take advantage of the current
read-only state to freshen your backups while you can still access the
filesystem.

(If you do get caught with an unmountable filesystem and stale backups,
btrfs restore can be used to restore still readable files from the
unmounted filesystem. And because restore doesn't actually change the
filesystem it's restoring from but writes restored files elsewhere, if it
comes to that, restore is recommended before more risky interventions,
like btrfs check in --repair mode. I've done that a couple times when my
backups were stale, and was quite happy with the results. Of course that
does mean you need space on a mounted filesystem to restore to...)

As for the problem at hand itself, I'll let those with more expertise
address that.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: fs got readonly after "btrfs_run_delayed_refs:2783: errno=-5 IO failure"
Hi,

> I see a lot of checksum errors, but as I use RAID most of these error
> got fixed, I started scrub operation to find/fix all the problems but
> the scrub operation got cancelled at the very beginning. I see following
> error in kernel logs, it says "(device sdb): run_one_delayed_ref
> returned -5" and after that "(device sdb): forced readonly".

Are you using the mount -o degraded option? If not, could you please try
it?

Thanks, Anand
Re: [PATCH][RESEND] btrfs: fix search key advancing condition
Hello, list.

Could anyone take a look at this? I believe this is an issue that slows
down ioctl(BTRFS_IOC_TREE_SEARCH) when the target key is missing.

On Tue, Jun 30, 2015 at 11:25 AM, Naohiro Aota wrote:
> The search key advancing condition used in copy_to_sk() is loose. It can
> advance the key even if it reaches sk->max_*: e.g. when the max key = (512,
> 1024, -1) and the current key = (512, 1025, 10), it increments the
> offset by 1, continues hopeless search from (512, 1025, 11). This issue
> makes ioctl() take an unexpectedly long time scanning all the leaf blocks
> one by one.
>
> This commit fixes the problem using the standard way of key comparison:
> btrfs_comp_cpu_keys()
>
> Signed-off-by: Naohiro Aota
> ---
>  fs/btrfs/ioctl.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 1c22c65..07dc01d 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1932,6 +1932,7 @@ static noinline int copy_to_sk(struct btrfs_root *root,
>  	u64 found_transid;
>  	struct extent_buffer *leaf;
>  	struct btrfs_ioctl_search_header sh;
> +	struct btrfs_key test;
>  	unsigned long item_off;
>  	unsigned long item_len;
>  	int nritems;
> @@ -2015,12 +2016,17 @@ static noinline int copy_to_sk(struct btrfs_root *root,
>  	}
>  advance_key:
>  	ret = 0;
> -	if (key->offset < (u64)-1 && key->offset < sk->max_offset)
> +	test.objectid = sk->max_objectid;
> +	test.type = sk->max_type;
> +	test.offset = sk->max_offset;
> +	if (btrfs_comp_cpu_keys(key, &test) >= 0)
> +		ret = 1;
> +	else if (key->offset < (u64)-1)
>  		key->offset++;
> -	else if (key->type < (u8)-1 && key->type < sk->max_type) {
> +	else if (key->type < (u8)-1) {
>  		key->offset = 0;
>  		key->type++;
> -	} else if (key->objectid < (u64)-1 && key->objectid < sk->max_objectid) {
> +	} else if (key->objectid < (u64)-1) {
>  		key->offset = 0;
>  		key->type = 0;
>  		key->objectid++;
> --
> 2.4.4
>
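The difference between the old and the patched advancing condition can be reproduced in a small user-space model. This is a sketch under stated assumptions, not the kernel code: `struct toy_key`, `toy_comp_keys()`, `toy_advance_old()` and `toy_advance_new()` are hypothetical names, the final "search is over" branch of the old code is assumed because the quoted diff does not show it, and the test values are a scaled-down version of the example in the commit message so that the type fits in 8 bits.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a btrfs key. */
struct toy_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Compare keys field by field: objectid first, then type, then offset. */
int toy_comp_keys(const struct toy_key *a, const struct toy_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Old condition: each field is checked only against its own maximum.
 * Returns 1 when the search is over, 0 when the key was advanced. */
int toy_advance_old(struct toy_key *key, const struct toy_key *max)
{
	if (key->offset < UINT64_MAX && key->offset < max->offset) {
		key->offset++;
	} else if (key->type < UINT8_MAX && key->type < max->type) {
		key->offset = 0;
		key->type++;
	} else if (key->objectid < UINT64_MAX && key->objectid < max->objectid) {
		key->offset = 0;
		key->type = 0;
		key->objectid++;
	} else {
		return 1;
	}
	return 0;
}

/* Patched condition: stop as soon as the whole key reaches or passes max. */
int toy_advance_new(struct toy_key *key, const struct toy_key *max)
{
	if (toy_comp_keys(key, max) >= 0)
		return 1;
	if (key->offset < UINT64_MAX) {
		key->offset++;
	} else if (key->type < UINT8_MAX) {
		key->offset = 0;
		key->type++;
	} else if (key->objectid < UINT64_MAX) {
		key->offset = 0;
		key->type = 0;
		key->objectid++;
	} else {
		return 1;
	}
	return 0;
}
```

With max = (512, 100, UINT64_MAX) and a current key of (512, 101, 10), the old function bumps the offset to 11 and reports that the search should continue, although the key is already past the maximum. The patched function compares the whole key first and ends the search immediately.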