Re: fsck lowmem mode only: ERROR: errors found in fs roots
On Sat, 2018-11-03 at 09:34 +0800, Su Yue wrote:
> Sorry for the late reply cause I'm busy at other things.

No worries :-)

> I just looked through related codes and found the bug.
> The patches can fix it. So no need to do more tests.
> Thanks to your tests and patience. :)

Thanks for fixing :-)

Best wishes,
Chris.
Re: fsck lowmem mode only: ERROR: errors found in fs roots
On 2018/11/2 10:10 PM, Christoph Anton Mitterer wrote:
> Hey Su.

Sorry for the late reply cause I'm busy at other things.

> Anything further I need to do in this matter or can I consider it
> "solved" and you won't need further testing by my side, but just PR
> the patches of that branch? :-)

I just looked through related codes and found the bug.
The patches can fix it. So no need to do more tests.
Thanks to your tests and patience. :)

In the previous output of the debug version, we can see the @ret code is
524296, which is (DIR_ITEM_MISMATCH (1 << 3) | DIR_INDEX_MISMATCH (1 << 19)).

In btrfs-progs v4.17, the function check_inode_extref() passes the u64 @mode
as the last parameter of find_dir_item(). However, find_dir_item() is
defined as:

static int find_dir_item(struct btrfs_root *root, struct btrfs_key *key,
			 struct btrfs_key *location_key, char *name,
			 u32 namelen, u8 file_type);

The type of the last argument is u8, not u64. So while checking files with
inode extrefs, if (imode != file_type), find_dir_item() thinks it found
DIR_ITEM_MISMATCH or DIR_INDEX_MISMATCH.

Thanks,
Su

> Thanks,
> Chris.
>
> On Sat, 2018-10-27 at 14:15 +0200, Christoph Anton Mitterer wrote:
>> Hey.
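The truncation described above can be illustrated with a quick shell sketch
(the mode value and BTRFS_FT_REG_FILE constant are ordinary btrfs values; the
script itself is only an illustration, not btrfs-progs code):

```shell
# Illustration only: check_inode_extref() passed a u64 inode mode (e.g.
# octal 0100644 for a regular file) where find_dir_item() declares
# "u8 file_type", so the value is truncated to its low byte at the call site.
imode=$(( 0100644 ))            # u64 mode as stored in the inode item
file_type=$(( imode & 0xff ))   # what a u8 parameter actually receives
BTRFS_FT_REG_FILE=1             # the file type a dir item really stores

echo "imode=$imode file_type(u8)=$file_type dir_item=$BTRFS_FT_REG_FILE"
if [ "$file_type" -ne "$BTRFS_FT_REG_FILE" ]; then
	# Neither the truncated mode nor the full mode ever equals a
	# BTRFS_FT_* value, so the comparison always reports a mismatch.
	echo "spurious DIR_ITEM_MISMATCH / DIR_INDEX_MISMATCH"
fi
```

Either way the comparison can never succeed, which is why every file with an
inode extref tripped the check.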
Without the last patches on 4.17:

checking extents
checking free space cache
checking fs roots
ERROR: errors found in fs roots
Checking filesystem on /dev/mapper/system
UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
found 619543498752 bytes used, error(s) found
total csum bytes: 602382204
total tree bytes: 2534309888
total fs tree bytes: 1652097024
total extent tree bytes: 160432128
btree space waste bytes: 459291608
file data blocks allocated: 7334036647936
 referenced 730839187456

With the last patches, on 4.17:

checking extents
checking free space cache
checking fs roots
checking only csum items (without verifying data)
checking root refs
Checking filesystem on /dev/mapper/system
UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
found 619543498752 bytes used, no error found
total csum bytes: 602382204
total tree bytes: 2534309888
total fs tree bytes: 1652097024
total extent tree bytes: 160432128
btree space waste bytes: 459291608
file data blocks allocated: 7334036647936
 referenced 730839187456

Cheers,
Chris.
Re: Salvage files from broken btrfs
On 2018/11/3 1:18 AM, M. Klingmann wrote:
> On 02.11.2018 at 15:45 Qu Wenruo wrote:
>> On 2018/11/2 10:30 PM, M. Klingmann wrote:
>>> On 31.10.2018 at 01:03 Qu Wenruo wrote:
>>>> My plan for such recovery is:
>>>>
>>>> 1) btrfs ins dump-super to make sure system chunk array is valid
>>>> 2) btrfs-find-root to find any valid chunk tree blocks
>>>> 3) pass that chunk tree bytenr to btrfs-restore
>>>>    Unfortunately, btrfs-restore doesn't support specifying chunk root
>>>>    yet. But it's pretty easy to add such support.
>>>>
>>>> So, please provide the "btrfs ins dump-super -Ffa" output to start with.
>>> Following your plan, I did 1) and 2).
>>> As 2) failed (see below), is there anything I can do to find the tree
>>> bytenr to supply btrfs-restore with it?
>>>
>>> 1) Here's the output given by "btrfs-show-super -Ffa":
>>>
>>> superblock: bytenr=65536, device=sdcard.iso
>>> -
>>> csum 0xb8e15dd7 [match]
> [snip]
>>> 2) "btrfs-find-root" yields "Couldn't read chunk root; Open ctree failed".
>> It's not plain "btrfs-find-root" but "btrfs-find-root -o 5".
>>
>> And you should use btrfs-progs v4.17.1, not the old v4.4.
>> The ability to continue search even if chunk tree get corrupted is added
>> in v4.5, and I strongly recommend to use latest (v4.17.1) for a lot of
>> fixes and extra debug output.
>>
>> If you can't find any handy way to update btrfs-progs, you could use
>> Archlinux iso as a rescue OS to use the latest btrfs-progs.
>
> Using Archlinux in fact is the easiest way to use version 4.17.1
> (Archlinux for 2018-11-01).
>
> Here's the output from "btrfs-find-root sdcard.iso":
>
> WARNING: cannot read chunk root, continue anyway
> Superblock thinks the generation is 1757933
> Superblock thinks the level is 0
>
> Here's the output using "btrfs-find-root -o 5 sdcard.iso":
>
> WARNING: cannot read chunk root, continue anyway
> Superblock doesn't contain generation info for root 5
> Superblock doesn't contain the level info for root 5

No other output at all?
That means the whole 8M range of the system chunk got corrupted.
Thus there is really no way to get any meaningful data out of the
filesystem, unfortunately.

Thanks,
Qu

>
>> For 3), I could easily add such feature btrfs-restore, or just use
>> manually patching your superblock to continue.
>> So as soon as your "btrfs-find-root -o 5" gets some valid output, I
>> could continue the work.
>>
> Thank you.
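For reference, Qu's three-step plan earlier in the thread can be sketched as
a shell helper. This is a sketch only: the image name comes from the thread,
the `btrfs inspect-internal dump-super` spelling is the modern equivalent of
`btrfs ins dump-super`, and step 3 depends on chunk-root support that
btrfs restore did not have yet at the time of this thread:

```shell
# Sketch of the recovery plan discussed above; not run against a real image
# here. Requires btrfs-progs v4.5+ (v4.17.1 recommended in the thread).
recover_from_broken_chunk_tree() {
	local img="$1"	# e.g. sdcard.iso, the image from this thread

	# 1) Make sure the system chunk array in the superblocks is valid.
	btrfs inspect-internal dump-super -Ffa "$img"

	# 2) Search for valid tree blocks; "-o 5" as recommended by Qu above.
	btrfs-find-root -o 5 "$img"

	# 3) Feed a valid bytenr from step 2 to btrfs restore. Specifying the
	#    chunk root was not supported at the time of this thread, so this
	#    step needed either a patched btrfs-restore or a patched
	#    superblock, as Qu offers below.
}
```

In this case step 2 produced no candidates at all, which is why the recovery
stops here.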
[PATCH 5/7] btrfs: test swap files on multiple devices
From: Omar Sandoval

Swap files currently need to exist on exactly one device in exactly one
place.

Signed-off-by: Omar Sandoval
---
 tests/btrfs/175     | 73 +
 tests/btrfs/175.out |  8 +
 tests/btrfs/group   |  1 +
 3 files changed, 82 insertions(+)
 create mode 100755 tests/btrfs/175
 create mode 100644 tests/btrfs/175.out

diff --git a/tests/btrfs/175 b/tests/btrfs/175
new file mode 100755
index ..64afc4f0
--- /dev/null
+++ b/tests/btrfs/175
@@ -0,0 +1,73 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Facebook. All Rights Reserved.
+#
+# FS QA Test 175
+#
+# Test swap file activation on multiple devices.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+. ./common/rc
+. ./common/filter
+
+rm -f $seqres.full
+
+_supported_fs generic
+_supported_os Linux
+_require_scratch_dev_pool 2
+_require_scratch_swapfile
+
+cycle_swapfile() {
+	local sz=${1:-$(($(get_page_size) * 10))}
+	_format_swapfile "$SCRATCH_MNT/swap" "$sz"
+	swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+	swapoff "$SCRATCH_MNT/swap" > /dev/null 2>&1
+}
+
+echo "RAID 1"
+_scratch_pool_mkfs -d raid1 -m raid1 >> $seqres.full 2>&1
+_scratch_mount
+cycle_swapfile
+_scratch_unmount
+
+echo "DUP"
+_scratch_pool_mkfs -d dup -m dup >> $seqres.full 2>&1
+_scratch_mount
+cycle_swapfile
+_scratch_unmount
+
+echo "Single on multiple devices"
+_scratch_pool_mkfs -d single -m raid1 -b $((1024 * 1024 * 1024)) >> $seqres.full 2>&1
+_scratch_mount
+# Each device is only 1 GB, so 1.5 GB must be split across multiple devices.
+cycle_swapfile $((3 * 1024 * 1024 * 1024 / 2))
+_scratch_unmount
+
+echo "Single on one device"
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+# Create the swap file, then add the device. That way we know it's all on one
+# device.
+_format_swapfile "$SCRATCH_MNT/swap" $(($(get_page_size) * 10))
+scratch_dev2="$(echo "${SCRATCH_DEV_POOL}" | awk '{ print $2 }')"
+$BTRFS_UTIL_PROG device add -f "$scratch_dev2" "$SCRATCH_MNT"
+swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+swapoff "$SCRATCH_MNT/swap" > /dev/null 2>&1
+_scratch_unmount
+
+status=0
+exit
diff --git a/tests/btrfs/175.out b/tests/btrfs/175.out
new file mode 100644
index ..ce2e5992
--- /dev/null
+++ b/tests/btrfs/175.out
@@ -0,0 +1,8 @@
+QA output created by 175
+RAID 1
+swapon: SCRATCH_MNT/swap: swapon failed: Invalid argument
+DUP
+swapon: SCRATCH_MNT/swap: swapon failed: Invalid argument
+Single on multiple devices
+swapon: SCRATCH_MNT/swap: swapon failed: Invalid argument
+Single on one device
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 2e10f7df..b6160b72 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -177,3 +177,4 @@
 172 auto quick punch
 173 auto quick swap
 174 auto quick swap
+175 auto quick swap
-- 
2.19.1
[PATCH 4/7] btrfs: test invalid operations on a swap file
From: Omar Sandoval

Btrfs forbids some operations which should not be done on a swap file.

Signed-off-by: Omar Sandoval
---
 tests/btrfs/174     | 66 +
 tests/btrfs/174.out | 10 +++
 tests/btrfs/group   |  1 +
 3 files changed, 77 insertions(+)
 create mode 100755 tests/btrfs/174
 create mode 100644 tests/btrfs/174.out

diff --git a/tests/btrfs/174 b/tests/btrfs/174
new file mode 100755
index ..a26e6669
--- /dev/null
+++ b/tests/btrfs/174
@@ -0,0 +1,66 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Facebook. All Rights Reserved.
+#
+# FS QA Test 174
+#
+# Test restrictions on operations that can be done on an active swap file
+# specific to Btrfs.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+. ./common/rc
+. ./common/filter
+
+rm -f $seqres.full
+
+_supported_fs generic
+_supported_os Linux
+_require_scratch_swapfile
+
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+
+$BTRFS_UTIL_PROG subvol create "$SCRATCH_MNT/swapvol" >> $seqres.full
+swapfile="$SCRATCH_MNT/swapvol/swap"
+_format_swapfile "$swapfile" $(($(get_page_size) * 10))
+swapon "$swapfile"
+
+# Turning off nocow doesn't do anything because the file is not empty, not
+# because the file is a swap file, but make sure this works anyways.
+echo "Disable nocow"
+$CHATTR_PROG -C "$swapfile"
+lsattr -l "$swapfile" | _filter_scratch | _filter_spaces
+
+# Compression we reject outright.
+echo "Enable compression"
+$CHATTR_PROG +c "$swapfile" 2>&1 | grep -o "Text file busy"
+lsattr -l "$swapfile" | _filter_scratch | _filter_spaces
+
+echo "Snapshot"
+$BTRFS_UTIL_PROG subvol snap "$SCRATCH_MNT/swapvol" \
+	"$SCRATCH_MNT/swapsnap" 2>&1 | grep -o "Text file busy"
+
+echo "Defrag"
+# We pass the -c (compress) flag to force defrag even if the file isn't
+# fragmented.
+$BTRFS_UTIL_PROG filesystem defrag -c "$swapfile" 2>&1 | grep -o "Text file busy"
+
+swapoff "$swapfile"
+_scratch_unmount
+
+status=0
+exit
diff --git a/tests/btrfs/174.out b/tests/btrfs/174.out
new file mode 100644
index ..bc24f1fb
--- /dev/null
+++ b/tests/btrfs/174.out
@@ -0,0 +1,10 @@
+QA output created by 174
+Disable nocow
+SCRATCH_MNT/swapvol/swap No_COW
+Enable compression
+Text file busy
+SCRATCH_MNT/swapvol/swap No_COW
+Snapshot
+Text file busy
+Defrag
+Text file busy
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 3525014f..2e10f7df 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -176,3 +176,4 @@
 171 auto quick qgroup
 172 auto quick punch
 173 auto quick swap
+174 auto quick swap
-- 
2.19.1
[PATCH 6/7] btrfs: test device add/remove/replace with an active swap file
From: Omar Sandoval

Make sure that we don't remove or replace a device with an active swap file
but can add, remove, and replace other devices.

Signed-off-by: Omar Sandoval
---
 tests/btrfs/176     | 82 +
 tests/btrfs/176.out |  5 +++
 tests/btrfs/group   |  1 +
 3 files changed, 88 insertions(+)
 create mode 100755 tests/btrfs/176
 create mode 100644 tests/btrfs/176.out

diff --git a/tests/btrfs/176 b/tests/btrfs/176
new file mode 100755
index ..1e576149
--- /dev/null
+++ b/tests/btrfs/176
@@ -0,0 +1,82 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Facebook. All Rights Reserved.
+#
+# FS QA Test 176
+#
+# Test device remove/replace with an active swap file.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch_dev_pool 3
+_require_scratch_swapfile
+
+# We check the filesystem manually because we move devices around.
+rm -f "${RESULT_DIR}/require_scratch"
+
+scratch_dev1="$(echo "${SCRATCH_DEV_POOL}" | awk '{ print $1 }')"
+scratch_dev2="$(echo "${SCRATCH_DEV_POOL}" | awk '{ print $2 }')"
+scratch_dev3="$(echo "${SCRATCH_DEV_POOL}" | awk '{ print $3 }')"
+
+echo "Remove device"
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+_format_swapfile "$SCRATCH_MNT/swap" $(($(get_page_size) * 10))
+$BTRFS_UTIL_PROG device add -f "$scratch_dev2" "$SCRATCH_MNT"
+swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+# We know the swap file is on device 1 because we added device 2 after it was
+# already created.
+$BTRFS_UTIL_PROG device delete "$scratch_dev1" "$SCRATCH_MNT" 2>&1 | grep -o "Text file busy"
+# Deleting/readding device 2 should still work.
+$BTRFS_UTIL_PROG device delete "$scratch_dev2" "$SCRATCH_MNT"
+$BTRFS_UTIL_PROG device add -f "$scratch_dev2" "$SCRATCH_MNT"
+swapoff "$SCRATCH_MNT/swap" > /dev/null 2>&1
+# Deleting device 1 should work again after swapoff.
+$BTRFS_UTIL_PROG device delete "$scratch_dev1" "$SCRATCH_MNT"
+_scratch_unmount
+_check_scratch_fs "$scratch_dev2"
+
+echo "Replace device"
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+_format_swapfile "$SCRATCH_MNT/swap" $(($(get_page_size) * 10))
+$BTRFS_UTIL_PROG device add -f "$scratch_dev2" "$SCRATCH_MNT"
+swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+# Again, we know the swap file is on device 1.
+$BTRFS_UTIL_PROG replace start -fB "$scratch_dev1" "$scratch_dev3" "$SCRATCH_MNT" 2>&1 | grep -o "Text file busy"
+# Replacing device 2 should still work.
+$BTRFS_UTIL_PROG replace start -fB "$scratch_dev2" "$scratch_dev3" "$SCRATCH_MNT"
+swapoff "$SCRATCH_MNT/swap" > /dev/null 2>&1
+# Replacing device 1 should work again after swapoff.
+$BTRFS_UTIL_PROG replace start -fB "$scratch_dev1" "$scratch_dev2" "$SCRATCH_MNT"
+_scratch_unmount
+_check_scratch_fs "$scratch_dev2"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/176.out b/tests/btrfs/176.out
new file mode 100644
index ..5c99e0fd
--- /dev/null
+++ b/tests/btrfs/176.out
@@ -0,0 +1,5 @@
+QA output created by 176
+Remove device
+Text file busy
+Replace device
+Text file busy
diff --git a/tests/btrfs/group b/tests/btrfs/group
index b6160b72..3562420b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -178,3 +178,4 @@
 173 auto quick swap
 174 auto quick swap
 175 auto quick swap
+176 auto quick swap
-- 
2.19.1
[PATCH 0/7] fstests: test Btrfs swapfile support
From: Omar Sandoval

This series fixes a couple of generic swapfile tests and adds some
Btrfs-specific swapfile tests. Btrfs swapfile support is scheduled for
4.21 [1].

1: https://www.spinics.net/lists/linux-btrfs/msg83454.html

Thanks!

Omar Sandoval (7):
  generic/{472,496,497}: fix $seeqres typo
  generic/{472,496}: fix swap file creation on Btrfs
  btrfs: test swap file activation restrictions
  btrfs: test invalid operations on a swap file
  btrfs: test swap files on multiple devices
  btrfs: test device add/remove/replace with an active swap file
  btrfs: test balance and resize with an active swap file

 tests/btrfs/173     | 55 ++
 tests/btrfs/173.out |  5 +++
 tests/btrfs/174     | 66
 tests/btrfs/174.out | 10 ++
 tests/btrfs/175     | 73
 tests/btrfs/175.out |  8 +
 tests/btrfs/176     | 82 +
 tests/btrfs/176.out |  5 +++
 tests/btrfs/177     | 64 +++
 tests/btrfs/177.out |  6
 tests/btrfs/group   |  5 +++
 tests/generic/472   | 16 -
 tests/generic/496   |  8 ++---
 tests/generic/497   |  2 +-
 14 files changed, 391 insertions(+), 14 deletions(-)
 create mode 100755 tests/btrfs/173
 create mode 100644 tests/btrfs/173.out
 create mode 100755 tests/btrfs/174
 create mode 100644 tests/btrfs/174.out
 create mode 100755 tests/btrfs/175
 create mode 100644 tests/btrfs/175.out
 create mode 100755 tests/btrfs/176
 create mode 100644 tests/btrfs/176.out
 create mode 100755 tests/btrfs/177
 create mode 100644 tests/btrfs/177.out
-- 
2.19.1
[PATCH 7/7] btrfs: test balance and resize with an active swap file
From: Omar Sandoval

Make sure we don't shrink the device past an active swap file, but allow
shrinking otherwise, as well as growing and balance.

Signed-off-by: Omar Sandoval
---
 tests/btrfs/177     | 64 +
 tests/btrfs/177.out |  6 +
 tests/btrfs/group   |  1 +
 3 files changed, 71 insertions(+)
 create mode 100755 tests/btrfs/177
 create mode 100644 tests/btrfs/177.out

diff --git a/tests/btrfs/177 b/tests/btrfs/177
new file mode 100755
index ..12dad8fc
--- /dev/null
+++ b/tests/btrfs/177
@@ -0,0 +1,64 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Facebook. All Rights Reserved.
+#
+# FS QA Test 177
+#
+# Test relocation (balance and resize) with an active swap file.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+. ./common/rc
+. ./common/filter
+. ./common/btrfs
+
+rm -f $seqres.full
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch_swapfile
+
+swapfile="$SCRATCH_MNT/swap"
+
+# First, create a 1GB filesystem and fill it up.
+_scratch_mkfs_sized $((1024 * 1024 * 1024)) >> $seqres.full 2>&1
+_scratch_mount
+dd if=/dev/zero of="$SCRATCH_MNT/fill" bs=1024k >> $seqres.full 2>&1
+# Now add more space and create a swap file. We know that the first 1GB of the
+# filesystem was used, so the swap file must be in the new part of the
+# filesystem.
+$BTRFS_UTIL_PROG filesystem resize 2G "$SCRATCH_MNT" | _filter_scratch
+_format_swapfile "$swapfile" $((32 * 1024 * 1024))
+swapon "$swapfile"
+# Add even more space which we know is unused.
+$BTRFS_UTIL_PROG filesystem resize 3G "$SCRATCH_MNT" | _filter_scratch
+# Free up the first 1GB of the filesystem.
+rm -f "$SCRATCH_MNT/fill"
+# Get rid of empty block groups and also make sure that balance skips block
+# groups containing active swap files.
+_run_btrfs_balance_start "$SCRATCH_MNT"
+# Shrink away the unused space.
+$BTRFS_UTIL_PROG filesystem resize 2G "$SCRATCH_MNT" | _filter_scratch
+# Try to shrink away the area occupied by the swap file, which should fail.
+$BTRFS_UTIL_PROG filesystem resize 1G "$SCRATCH_MNT" 2>&1 | grep -o "Text file busy"
+swapoff "$swapfile"
+# It should work again after swapoff.
+$BTRFS_UTIL_PROG filesystem resize 1G "$SCRATCH_MNT" | _filter_scratch
+_scratch_unmount
+
+status=0
+exit
diff --git a/tests/btrfs/177.out b/tests/btrfs/177.out
new file mode 100644
index ..6ced01da
--- /dev/null
+++ b/tests/btrfs/177.out
@@ -0,0 +1,6 @@
+QA output created by 177
+Resize 'SCRATCH_MNT' of '2G'
+Resize 'SCRATCH_MNT' of '3G'
+Resize 'SCRATCH_MNT' of '2G'
+Text file busy
+Resize 'SCRATCH_MNT' of '1G'
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 3562420b..0b62e58a 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -179,3 +179,4 @@
 174 auto quick swap
 175 auto quick swap
 176 auto quick swap
+177 auto quick swap
-- 
2.19.1
[PATCH 1/7] generic/{472,496,497}: fix $seeqres typo
From: Omar Sandoval

Signed-off-by: Omar Sandoval
---
 tests/generic/472 | 2 +-
 tests/generic/496 | 2 +-
 tests/generic/497 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/generic/472 b/tests/generic/472
index c74d6c70..04ed3e73 100755
--- a/tests/generic/472
+++ b/tests/generic/472
@@ -51,7 +51,7 @@ swapfile_cycle() {
 	$CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
 	"$here/src/mkswap" $swapfile >> $seqres.full
 	"$here/src/swapon" $swapfile 2>&1 | _filter_scratch
-	swapoff $swapfile 2>> $seeqres.full
+	swapoff $swapfile 2>> $seqres.full
 	rm -f $swapfile
 }
 
diff --git a/tests/generic/496 b/tests/generic/496
index 1c9651ad..968b8012 100755
--- a/tests/generic/496
+++ b/tests/generic/496
@@ -53,7 +53,7 @@ swapfile_cycle() {
 	$CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
 	"$here/src/mkswap" $swapfile >> $seqres.full
 	"$here/src/swapon" $swapfile 2>&1 | _filter_scratch
-	swapoff $swapfile 2>> $seeqres.full
+	swapoff $swapfile 2>> $seqres.full
 	rm -f $swapfile
 }
 
diff --git a/tests/generic/497 b/tests/generic/497
index 584af58a..3d5502ef 100755
--- a/tests/generic/497
+++ b/tests/generic/497
@@ -53,7 +53,7 @@ swapfile_cycle() {
 	$CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
 	"$here/src/mkswap" $swapfile >> $seqres.full
 	"$here/src/swapon" $swapfile 2>&1 | _filter_scratch
-	swapoff $swapfile 2>> $seeqres.full
+	swapoff $swapfile 2>> $seqres.full
 	rm -f $swapfile
 }
 
-- 
2.19.1
[PATCH 2/7] generic/{472,496}: fix swap file creation on Btrfs
From: Omar Sandoval

The swap file must be set nocow before it is written to, otherwise it is
ignored and Btrfs refuses to activate it as swap.

Fixes: 25ce9740065e ("generic: test swapfile creation, activation, and deactivation")
Signed-off-by: Omar Sandoval
---
 tests/generic/472 | 14 ++
 tests/generic/496 |  6 +++---
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/tests/generic/472 b/tests/generic/472
index 04ed3e73..aba4a007 100755
--- a/tests/generic/472
+++ b/tests/generic/472
@@ -42,13 +42,15 @@ _scratch_mount >>$seqres.full 2>&1
 
 swapfile=$SCRATCH_MNT/swap
 len=$((2 * 1048576))
-page_size=$(get_page_size)
 
 swapfile_cycle() {
 	local swapfile="$1"
+	local len="$2"
+
 	touch $swapfile
 	# Swap files must be nocow on Btrfs.
 	$CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
+	_pwrite_byte 0x58 0 $len $swapfile >> $seqres.full
 	"$here/src/mkswap" $swapfile >> $seqres.full
 	"$here/src/swapon" $swapfile 2>&1 | _filter_scratch
 	swapoff $swapfile 2>> $seqres.full
@@ -57,20 +59,16 @@ swapfile_cycle() {
 
 # Create a regular swap file
 echo "regular swap" | tee -a $seqres.full
-_pwrite_byte 0x58 0 $len $swapfile >> $seqres.full
-swapfile_cycle $swapfile
+swapfile_cycle $swapfile $len
 
 # Create a swap file with a little too much junk on the end
 echo "too long swap" | tee -a $seqres.full
-_pwrite_byte 0x58 0 $((len + 3)) $swapfile >> $seqres.full
-swapfile_cycle $swapfile
+swapfile_cycle $swapfile $((len + 3))
 
 # Create a ridiculously small swap file. Each swap file must have at least
 # two pages after the header page.
 echo "tiny swap" | tee -a $seqres.full
-tiny_len=$((page_size * 3))
-_pwrite_byte 0x58 0 $tiny_len $swapfile >> $seqres.full
-swapfile_cycle $swapfile
+swapfile_cycle $swapfile $(($(get_page_size) * 3))
 
 status=0
 exit
diff --git a/tests/generic/496 b/tests/generic/496
index 968b8012..3083eef0 100755
--- a/tests/generic/496
+++ b/tests/generic/496
@@ -49,8 +49,6 @@ page_size=$(get_page_size)
 
 swapfile_cycle() {
 	local swapfile="$1"
 
-	# Swap files must be nocow on Btrfs.
-	$CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
 	"$here/src/mkswap" $swapfile >> $seqres.full
 	"$here/src/swapon" $swapfile 2>&1 | _filter_scratch
 	swapoff $swapfile 2>> $seqres.full
@@ -59,8 +57,10 @@ swapfile_cycle() {
 
 # Create a fallocated swap file
 echo "fallocate swap" | tee -a $seqres.full
-$XFS_IO_PROG -f -c "falloc 0 $len" $swapfile >> $seqres.full
+touch $swapfile
+# Swap files must be nocow on Btrfs.
 $CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
+$XFS_IO_PROG -f -c "falloc 0 $len" $swapfile >> $seqres.full
 "$here/src/mkswap" $swapfile
 "$here/src/swapon" $swapfile >> $seqres.full 2>&1 || \
 	_notrun "fallocated swap not supported here"
-- 
2.19.1
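The ordering requirement this patch enforces, that nocow must be set while the
file is still empty, can be sketched outside the test harness like this (the
path is an example, and `chattr +C` only has an effect on Btrfs, so its error
is ignored elsewhere):

```shell
# Minimal sketch of the fixed creation order for a Btrfs swap file:
# create empty -> chattr +C -> only then write data.
swapfile=./swapfile.demo	# example path, not from the patch

rm -f "$swapfile"
touch "$swapfile"                           # file must exist and be empty...
chattr +C "$swapfile" 2>/dev/null || true   # ...when nocow is set (Btrfs only)
dd if=/dev/zero of="$swapfile" bs=4096 count=10 2>/dev/null
chmod 0600 "$swapfile"
# mkswap/swapon would follow here; omitted so the sketch has no side effects.
wc -c < "$swapfile"
```

Writing first and setting `+C` afterwards silently leaves the file CoW, which
is exactly the bug in the original generic/472 and generic/496.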
[PATCH 3/7] btrfs: test swap file activation restrictions
From: Omar Sandoval

Swap files on Btrfs have some restrictions not applicable to other
filesystems.

Signed-off-by: Omar Sandoval
---
 tests/btrfs/173     | 55 +
 tests/btrfs/173.out |  5 +
 tests/btrfs/group   |  1 +
 3 files changed, 61 insertions(+)
 create mode 100755 tests/btrfs/173
 create mode 100644 tests/btrfs/173.out

diff --git a/tests/btrfs/173 b/tests/btrfs/173
new file mode 100755
index ..665bec39
--- /dev/null
+++ b/tests/btrfs/173
@@ -0,0 +1,55 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Facebook. All Rights Reserved.
+#
+# FS QA Test 173
+#
+# Test swap file activation restrictions specific to Btrfs.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+. ./common/rc
+. ./common/filter
+
+rm -f $seqres.full
+
+_supported_fs generic
+_supported_os Linux
+_require_scratch_swapfile
+
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+
+echo "COW file"
+# We can't use _format_swapfile because we don't want chattr +C, and we can't
+# unset it after the swap file has been created.
+rm -f "$SCRATCH_MNT/swap"
+touch "$SCRATCH_MNT/swap"
+chmod 0600 "$SCRATCH_MNT/swap"
+_pwrite_byte 0x61 0 $(($(get_page_size) * 10)) "$SCRATCH_MNT/swap" >> $seqres.full
+mkswap "$SCRATCH_MNT/swap" >> $seqres.full
+swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+swapoff "$SCRATCH_MNT/swap" >/dev/null 2>&1
+
+echo "Compressed file"
+rm -f "$SCRATCH_MNT/swap"
+_format_swapfile "$SCRATCH_MNT/swap" $(($(get_page_size) * 10))
+$CHATTR_PROG +c "$SCRATCH_MNT/swap"
+swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+swapoff "$SCRATCH_MNT/swap" >/dev/null 2>&1
+
+status=0
+exit
diff --git a/tests/btrfs/173.out b/tests/btrfs/173.out
new file mode 100644
index ..6d7856bf
--- /dev/null
+++ b/tests/btrfs/173.out
@@ -0,0 +1,5 @@
+QA output created by 173
+COW file
+swapon: SCRATCH_MNT/swap: swapon failed: Invalid argument
+Compressed file
+swapon: SCRATCH_MNT/swap: swapon failed: Invalid argument
diff --git a/tests/btrfs/group b/tests/btrfs/group
index a490d7eb..3525014f 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -175,3 +175,4 @@
 170 auto quick snapshot
 171 auto quick qgroup
 172 auto quick punch
+173 auto quick swap
-- 
2.19.1
BTRFS did its job nicely (thanks!)
Hi, my main computer runs on a 7x SSD BTRFS as rootfs, with data:RAID1 and
metadata:RAID10. One SSD is probably about to fail, and it seems that BTRFS
fixed it nicely (thanks everyone!)

I decided to just post the ugly details in case someone wants to have a look.
Note that I tend to read the "btrfs de st /" output as if the error was NOT
fixed, even though it clearly was, so I think that output is a bit
misleading... just saying...

-- below are the details for those curious (just for fun) ---

scrub status for [YOINK!]
	scrub started at Fri Nov 2 17:49:45 2018 and finished after 00:29:26
	total bytes scrubbed: 1.15TiB with 1 errors
	error details: csum=1
	corrected errors: 1, uncorrectable errors: 0, unverified errors: 0

btrfs fi us -T /
Overall:
    Device size:           1.18TiB
    Device allocated:      1.17TiB
    Device unallocated:    9.69GiB
    Device missing:          0.00B
    Used:                  1.17TiB
    Free (estimated):      6.30GiB  (min: 6.30GiB)
    Data ratio:               2.00
    Metadata ratio:           2.00
    Global reserve:      512.00MiB  (used: 0.00B)

   Data      Metadata  System
Id Path      RAID1     RAID10    RAID10    Unallocated
-- --------- --------- --------- --------- -----------
 6 /dev/sda1 236.28GiB 704.00MiB  32.00MiB   485.00MiB
 7 /dev/sdb1 233.72GiB   1.03GiB  32.00MiB     2.69GiB
 2 /dev/sdc1 110.56GiB 352.00MiB         -   904.00MiB
 8 /dev/sdd1 234.96GiB   1.03GiB  32.00MiB     1.45GiB
 1 /dev/sde1 164.90GiB   1.03GiB  32.00MiB     1.72GiB
 9 /dev/sdf1 109.00GiB   1.03GiB  32.00MiB   744.00MiB
10 /dev/sdg1 107.98GiB   1.03GiB  32.00MiB     1.74GiB
-- --------- --------- --------- --------- -----------
   Total     598.70GiB   3.09GiB  96.00MiB     9.69GiB
   Used      597.25GiB   1.57GiB 128.00KiB

uname -a
Linux main 4.18.0-2-amd64 #1 SMP Debian 4.18.10-2 (2018-10-07) x86_64 GNU/Linux

btrfs --version
btrfs-progs v4.17

dmesg | grep -i btrfs
[    7.801817] Btrfs loaded, crc32c=crc32c-generic
[    8.163288] BTRFS: device label btrfsroot devid 10 transid 669961 /dev/sdg1
[    8.163433] BTRFS: device label btrfsroot devid 9 transid 669961 /dev/sdf1
[    8.163591] BTRFS: device label btrfsroot devid 1 transid 669961 /dev/sde1
[    8.163734] BTRFS: device label btrfsroot devid 8 transid 669961 /dev/sdd1
[    8.163974] BTRFS: device label btrfsroot devid 2 transid 669961 /dev/sdc1
[    8.164117] BTRFS: device label btrfsroot devid 7 transid 669961 /dev/sdb1
[    8.164262] BTRFS: device label btrfsroot devid 6 transid 669961 /dev/sda1
[    8.206174] BTRFS info (device sde1): disk space caching is enabled
[    8.206236] BTRFS info (device sde1): has skinny extents
[    8.348610] BTRFS info (device sde1): enabling ssd optimizations
[    8.854412] BTRFS info (device sde1): enabling free space tree
[    8.854471] BTRFS info (device sde1): using free space tree
[   68.170580] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.185973] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.185991] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186003] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186015] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186028] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186041] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186052] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186063] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186075] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.199237] BTRFS info (device sde1): read error corrected: ino 3247424 off 36700160 (dev /dev/sda1 sector 244987192)
[   68.202602] BTRFS info (device sde1): read error corrected: ino 3247424 off 36704256 (dev /dev/sda1 sector 244987192)
[   68.203176] BTRFS info (device sde1): read error corrected: ino 3247424 off 36712448 (dev /dev/sda1 sector 244987192)
[   68.206762] BTRFS info (device sde1): read error corrected: ino 3247424 off 36708352 (dev /dev/sda1 sector 244987192)
[   68.212071] BTRFS info
Big amount of snapshots and disk full
Hi, I am doing backups with rsync and do versioning with btrfs snapshots.
Last week the file system reported full. I googled a bit and noticed that
there might be a bug in space_cache, so I cleared the cache and mounted with
nospace_cache, and the system worked for about a week (I managed to delete
and create tens of snapshots during that time; before the first issue the
system had worked without problems for over a year). Now I hit the same
issue again. Any ideas what I should try next? The disk has never been even
half full.

The error occurred when the script was creating a new snapshot; the rsync
before that was fine. (And another rsync to a different destination was
running at the same time.) Is there some issue with a big number of
snapshots? I have nearly 50 machines on backup and most of them have fewer
than ten versions, so my subvolume hierarchy is like this:

data (btrfs root)
	machine1 (subvol)
		last (subvol)
		[date] (readonly snapshot of last)
		[date2] (readonly snapshot of last)
		...
	machine2 (subvol)
		last (subvol)
		[date] (readonly snapshot of last)
		...

root@backuprasia:~# btrfs sub list /data | wc -l
492

root@backuprasia:~# btrfs fi usage /data
Overall:
    Device size:          30.00TiB
    Device allocated:     13.62TiB
    Device unallocated:   16.38TiB
    Device missing:          0.00B
    Used:                 13.59TiB
    Free (estimated):     16.40TiB  (min: 16.40TiB)
    Data ratio:               1.00
    Metadata ratio:           1.00
    Global reserve:      512.00MiB  (used: 499.00MiB)

Data,single: Size:13.45TiB, Used:13.43TiB
   /dev/mapper/datavg-backup1   13.45TiB

Metadata,single: Size:172.01GiB, Used:169.20GiB
   /dev/mapper/datavg-backup1  172.01GiB

System,single: Size:4.00MiB, Used:1.62MiB
   /dev/mapper/datavg-backup1    4.00MiB

Unallocated:
   /dev/mapper/datavg-backup1   16.38TiB

root@backuprasia:~# uname -a
Linux backuprasia 4.15.0-38-generic #41~16.04.1-Ubuntu SMP Wed Oct 10 20:16:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Nov 02 01:58:31 backuprasia kernel: [ cut here ]
Nov 02 01:58:31 backuprasia kernel: BTRFS: Transaction aborted (error -28)
Nov 02 01:58:31 backuprasia kernel: WARNING: CPU: 8 PID: 14381 at /build/linux-hwe-4LSUYr/linux-hwe-4.15.0/fs/btrfs/extent-tree.c:3784 btrfs_start_dirty_block_groups+0x260/0x430 [btrfs]
Nov 02 01:58:31 backuprasia kernel: [205245.441842] Modules linked in: ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_multiport iptable_filter ip_tables x_tables intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate 8021q garp mrp mei_me ipmi_ssif stp intel_rapl_perf llc joydev input_leds lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs zstd_compress ses enclosure raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear ast i2c_algo_bit ixgbe ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea
Nov 02 01:58:31 backuprasia kernel: [205245.441911] pcbc sysfillrect aesni_intel sysimgblt fb_sys_fops aes_x86_64 raid1 hid_generic crypto_simd usbhid dca glue_helper mpt3sas ahci ptp raid_class drm r8169 hid mxm_wmi cryptd libahci pps_core scsi_transport_sas mii mdio wmi
Nov 02 01:58:31 backuprasia kernel: pcbc sysfillrect aesni_intel sysimgblt fb_sys_fops aes_x86_64 raid1 hid_generic crypto_simd usbhid dca glue_helper mpt3sas ahci ptp rai
Nov 02 01:58:31 backuprasia kernel: CPU: 8 PID: 14381 Comm: btrfs-transacti Not tainted 4.15.0-38-generic #41~16.04.1-Ubuntu
Nov 02 01:58:31 backuprasia kernel: Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 2.1 09/13/2016
Nov 02 01:58:31 backuprasia kernel: RIP: 0010:btrfs_start_dirty_block_groups+0x260/0x430 [btrfs]
Nov 02 01:58:31 backuprasia kernel: RSP: 0018:b7b60efbbdc0 EFLAGS: 00010282
Nov 02 01:58:31 backuprasia kernel: RAX: RBX: 8bd5d4a1bc30 RCX: 0006
Nov 02 01:58:31 backuprasia kernel: RDX: 0007 RSI: 0092 RDI: 8bd77f416490
Nov 02 01:58:31 backuprasia kernel: RBP: b7b60efbbe30 R08: 0001 R09: 0539
Nov 02 01:58:31 backuprasia kernel: R10: R11: 0539 R12: 8bd60128e800
Nov 02 01:58:31 backuprasia kernel: R13: 0001 R14: 8bd1d1ce32a0 R15: 8bd60128e948
Nov 02 01:58:31 backuprasia kernel: FS: () GS:8bd77f40() knlGS:
Nov 02 01:58:31 backuprasia kernel: CS: 0010 DS: ES: CR0: 80050033
Nov 02 01:58:31 backuprasia kernel: CR2: 7fc65cfc6868 CR3: 000f8e80a003 CR4: 003606e0
Nov 02 01:58:31 backuprasia kernel: DR0:
Re: Salvage files from broken btrfs
On 02.11.2018 at 15:45 Qu Wenruo wrote:
> On 2018/11/2 10:30 PM, M. Klingmann wrote:
>> On 31.10.2018 at 01:03 Qu Wenruo wrote:
>>> My plan for such recovery is:
>>>
>>> 1) btrfs ins dump-super to make sure the system chunk array is valid
>>> 2) btrfs-find-root to find any valid chunk tree blocks
>>> 3) pass that chunk tree bytenr to btrfs-restore
>>>    Unfortunately, btrfs-restore doesn't support specifying the chunk root
>>>    yet. But it's pretty easy to add such support.
>>>
>>> So, please provide the "btrfs ins dump-super -Ffa" output to start with.
>> Following your plan, I did 1) and 2).
>> As 2) failed (see below), is there anything I can do to find the tree
>> bytenr to supply btrfs-restore with it?
>>
>> 1) Here's the output given by "btrfs-show-super -Ffa":
>>
>> superblock: bytenr=65536, device=sdcard.iso
>> -
>> csum 0xb8e15dd7 [match]
[snip]
>> 2) "btrfs-find-root" yields "Couldn't read chunk root; Open ctree failed".
> It's not plain "btrfs-find-root" but "btrfs-find-root -o 5".
>
> And you should use btrfs-progs v4.17.1, not the old v4.4.
> The ability to continue the search even if the chunk tree gets corrupted
> was added in v4.5, and I strongly recommend using the latest (v4.17.1)
> for its many fixes and extra debug output.
>
> If you can't find any handy way to update btrfs-progs, you could use the
> Archlinux iso as a rescue OS to get the latest btrfs-progs.

Using Archlinux is in fact the easiest way to get version 4.17.1 (Archlinux for 2018-11-01).

Here's the output from "btrfs-find-root sdcard.iso":

WARNING: cannot read chunk root, continue anyway
Superblock thinks the generation is 1757933
Superblock thinks the level is 0

Here's the output using "btrfs-find-root -o 5 sdcard.iso":

WARNING: cannot read chunk root, continue anyway
Superblock doesn't contain generation info for root 5
Superblock doesn't contain the level info for root 5

> For 3), I could easily add such a feature to btrfs-restore, or just
> manually patch your superblock to continue.
> So as soon as your "btrfs-find-root -o 5" gets some valid output, I > could continue the work. > Thank you. -- Mirko signature.asc Description: OpenPGP digital signature
Re: Salvage files from broken btrfs
On 2018/11/2 下午10:30, M. Klingmann wrote: > > On 31.10.2018 at 01:03 Qu Wenruo wrote: >> My plan for such recovery is: >> >> 1) btrfs ins dump-super to make sure system chunk array is valid >> 2) btrfs-find-root to find any valid chunk tree blocks >> 3) pass that chunk tree bytenr to btrfs-restore >>Unfortunately, btrfs-restore doesn't support specifying chunk root >>yet. But it's pretty easy to add such support. >> >> So, please provide the "btrfs ins dump-super -Ffa" output to start with. > Following your plan, I did 1) and 2). > As 2) failed (see below), is there anything I can do to find the tree > bytenr to supply btrfs-restore with it? > > 1) Here's the output given by "btrfs-show-super -Ffa": > > superblock: bytenr=65536, device=sdcard.iso > - > csum 0xb8e15dd7 [match] > bytenr 65536 > flags 0x1 > ( WRITTEN ) > magic _BHRfS_M [match] > fsid 4235aa4f-7721-4e73-97f0-7fe0e9a3ce9c > label > generation 1757933 > root 889143296 > sys_array_size 226 > chunk_root_generation 932006 > root_level 0 > chunk_root 20987904 > chunk_root_level 0 > log_root 890109952 > log_root_transid 0 > log_root_level 0 > total_bytes 3053696 > bytes_used 16937803776 > sectorsize 4096 > nodesize 16384 > leafsize 16384 > stripesize 4096 > root_dir 6 > num_devices 1 > compat_flags 0x0 > compat_ro_flags 0x0 > incompat_flags 0x61 > ( MIXED_BACKREF | > BIG_METADATA | > EXTENDED_IREF ) > csum_type 0 > csum_size 4 > cache_generation 1757933 > uuid_tree_generation 149 > dev_item.uuid 90185cf6-b937-49bb-b191-91d08677ee22 > dev_item.fsid 4235aa4f-7721-4e73-97f0-7fe0e9a3ce9c [match] > dev_item.type 0 > dev_item.total_bytes 3053696 > dev_item.bytes_used 3053696 > dev_item.io_align 4096 > dev_item.io_width 4096 > dev_item.sector_size 4096 > dev_item.devid 1 > dev_item.dev_group 0 > dev_item.seek_speed 0 > dev_item.bandwidth 0 > dev_item.generation 0 > sys_chunk_array[2048]: > item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 0) > chunk length 4194304 owner 2 stripe_len 65536 > type SYSTEM num_stripes 1 > stripe 
0 devid 1 offset 0 > dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22 > item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) > chunk length 8388608 owner 2 stripe_len 65536 > type SYSTEM|DUP num_stripes 2 This chunk looks pretty OK. And it's DUP, so it improves the possibility to recover. > stripe 0 devid 1 offset 20971520 > dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22 > stripe 1 devid 1 offset 29360128 > dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22 > backup_roots[4]: > backup 0: > backup_tree_root: 889143296 gen: 1757933 level: 0 > backup_chunk_root: 20987904 gen: 932006 level: 0 > backup_extent_root: 81152 gen: 1757933 level: 2 > backup_fs_root: 889716736 gen: 1757934 level: 2 > backup_dev_root: 307560448 gen: 1673227 level: 0 > backup_csum_root: 887898112 gen: 1757934 level: 2 > backup_total_bytes: 3053696 > backup_bytes_used: 16937803776 > backup_num_devices: 1 > > backup 1: > backup_tree_root: 882311168 gen: 1757930 level: 0 > backup_chunk_root: 20987904 gen: 932006 level: 0 > backup_extent_root: 879738880 gen: 1757931 level: 2 > backup_fs_root: 883097600 gen: 1757931 level: 2 > backup_dev_root: 307560448 gen: 1673227 level: 0 > backup_csum_root: 883212288 gen: 1757931 level: 2 > backup_total_bytes: 3053696 > backup_bytes_used: 16943640576 > backup_num_devices: 1 > > backup 2: > backup_tree_root: 881082368 gen: 1757931 level: 0 > backup_chunk_root: 20987904 gen: 932006 level: 0 > backup_extent_root: 879738880 gen: 1757931 level: 2 > backup_fs_root: 883654656 gen: 1757932 level: 2 > backup_dev_root: 307560448 gen: 1673227 level: 0 > backup_csum_root: 883703808 gen: 1757932 level: 2 > backup_total_bytes: 3053696 > backup_bytes_used: 16943722496 > backup_num_devices: 1 > > backup 3: > backup_tree_root: 887865344 gen: 1757932 level: 0 > backup_chunk_root: 20987904 gen: 932006 level: 0 > backup_extent_root: 81152 gen: 1757933 level: 2 > backup_fs_root: 888750080 gen: 1757933 level: 2 > backup_dev_root:
Re: Salvage files from broken btrfs
On 31.10.2018 at 01:03 Qu Wenruo wrote: > My plan for such recovery is: > > 1) btrfs ins dump-super to make sure system chunk array is valid > 2) btrfs-find-root to find any valid chunk tree blocks > 3) pass that chunk tree bytenr to btrfs-restore >Unfortunately, btrfs-restore doesn't support specifying chunk root >yet. But it's pretty easy to add such support. > > So, please provide the "btrfs ins dump-super -Ffa" output to start with. Following your plan, I did 1) and 2). As 2) failed (see below), is there anything I can do to find the tree bytenr to supply btrfs-restore with it? 1) Here's the output given by "btrfs-show-super -Ffa": superblock: bytenr=65536, device=sdcard.iso - csum 0xb8e15dd7 [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 4235aa4f-7721-4e73-97f0-7fe0e9a3ce9c label generation 1757933 root 889143296 sys_array_size 226 chunk_root_generation 932006 root_level 0 chunk_root 20987904 chunk_root_level 0 log_root 890109952 log_root_transid 0 log_root_level 0 total_bytes 3053696 bytes_used 16937803776 sectorsize 4096 nodesize 16384 leafsize 16384 stripesize 4096 root_dir 6 num_devices 1 compat_flags 0x0 compat_ro_flags 0x0 incompat_flags 0x61 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF ) csum_type 0 csum_size 4 cache_generation 1757933 uuid_tree_generation 149 dev_item.uuid 90185cf6-b937-49bb-b191-91d08677ee22 dev_item.fsid 4235aa4f-7721-4e73-97f0-7fe0e9a3ce9c [match] dev_item.type 0 dev_item.total_bytes 3053696 dev_item.bytes_used 3053696 dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size 4096 dev_item.devid 1 dev_item.dev_group 0 dev_item.seek_speed 0 dev_item.bandwidth 0 dev_item.generation 0 sys_chunk_array[2048]: item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 0) chunk length 4194304 owner 2 stripe_len 65536 type SYSTEM num_stripes 1 stripe 0 devid 1 offset 0 dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22 item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) chunk length 8388608 owner 2 stripe_len 65536 type SYSTEM|DUP 
num_stripes 2 stripe 0 devid 1 offset 20971520 dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22 stripe 1 devid 1 offset 29360128 dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22 backup_roots[4]: backup 0: backup_tree_root: 889143296 gen: 1757933 level: 0 backup_chunk_root: 20987904 gen: 932006 level: 0 backup_extent_root: 81152 gen: 1757933 level: 2 backup_fs_root: 889716736 gen: 1757934 level: 2 backup_dev_root: 307560448 gen: 1673227 level: 0 backup_csum_root: 887898112 gen: 1757934 level: 2 backup_total_bytes: 3053696 backup_bytes_used: 16937803776 backup_num_devices: 1 backup 1: backup_tree_root: 882311168 gen: 1757930 level: 0 backup_chunk_root: 20987904 gen: 932006 level: 0 backup_extent_root: 879738880 gen: 1757931 level: 2 backup_fs_root: 883097600 gen: 1757931 level: 2 backup_dev_root: 307560448 gen: 1673227 level: 0 backup_csum_root: 883212288 gen: 1757931 level: 2 backup_total_bytes: 3053696 backup_bytes_used: 16943640576 backup_num_devices: 1 backup 2: backup_tree_root: 881082368 gen: 1757931 level: 0 backup_chunk_root: 20987904 gen: 932006 level: 0 backup_extent_root: 879738880 gen: 1757931 level: 2 backup_fs_root: 883654656 gen: 1757932 level: 2 backup_dev_root: 307560448 gen: 1673227 level: 0 backup_csum_root: 883703808 gen: 1757932 level: 2 backup_total_bytes: 3053696 backup_bytes_used: 16943722496 backup_num_devices: 1 backup 3: backup_tree_root: 887865344 gen: 1757932 level: 0 backup_chunk_root: 20987904 gen: 932006 level: 0 backup_extent_root: 81152 gen: 1757933 level: 2 backup_fs_root: 888750080 gen: 1757933 level: 2 backup_dev_root: 307560448 gen: 1673227 level: 0 backup_csum_root: 32000 gen: 1757933 level: 2 backup_total_bytes: 3053696 backup_bytes_used: 16937803776 backup_num_devices: 1 superblock: bytenr=67108864, device=sdcard.iso - csum 0x
Re: Salvage files from broken btrfs
On 31.10.2018 at 05:56 Chris Murphy wrote:
> On Tue, Oct 30, 2018 at 4:11 PM, Mirko Klingmann wrote:
>> Hi all,
>>
>> my btrfs root file system on a SD card broke down and did not mount anymore.
> It might mount with -o ro,nologreplay
>
> Typically an SD card will break in a way that it can't write, and
> mount will just hang (with mmcblk errors). Mounting with both ro and
> nologreplay will ensure no writes are needed, allowing the mount to
> succeed. Of course any changes that are in the log tree will be
> missing, so recent transactions may be unrecoverable, but so far I've
> had good luck recovering from broken SD cards this way.
>
No luck with these options. The error still persists with the same output in "dmesg".

Thanks for your effort...

--
Mirko
Re: fsck lowmem mode only: ERROR: errors found in fs roots
Hey Su. Anything further I need to do in this matter or can I consider it "solved" and you won't need further testing by my side, but just PR the patches of that branch? :-) Thanks, Chris. On Sat, 2018-10-27 at 14:15 +0200, Christoph Anton Mitterer wrote: > Hey. > > > Without the last patches on 4.17: > > checking extents > checking free space cache > checking fs roots > ERROR: errors found in fs roots > Checking filesystem on /dev/mapper/system > UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c > found 619543498752 bytes used, error(s) found > total csum bytes: 602382204 > total tree bytes: 2534309888 > total fs tree bytes: 1652097024 > total extent tree bytes: 160432128 > btree space waste bytes: 459291608 > file data blocks allocated: 7334036647936 > referenced 730839187456 > > > With the last patches, on 4.17: > > checking extents > checking free space cache > checking fs roots > checking only csum items (without verifying data) > checking root refs > Checking filesystem on /dev/mapper/system > UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c > found 619543498752 bytes used, no error found > total csum bytes: 602382204 > total tree bytes: 2534309888 > total fs tree bytes: 1652097024 > total extent tree bytes: 160432128 > btree space waste bytes: 459291608 > file data blocks allocated: 7334036647936 > referenced 730839187456 > > > Cheers, > Chris. >
Re: [PATCH v2] btrfs: use tagged writepage to mitigate livelock of snapshot
On 2.11.18 г. 11:06 ч., Ethan Lien wrote:
> Snapshot is expected to be fast. But if there are writers steadily
> creating dirty pages in our subvolume, the snapshot may take a very long
> time to complete. To fix the problem, we use tagged writepage for the
> snapshot flusher as we do in generic write_cache_pages(): we quickly
> tag all dirty pages with a TOWRITE tag, then do the hard work of
> writepage only on those pages with the TOWRITE tag, so we omit pages
> dirtied after the snapshot command.
>
> We do a simple snapshot speed test on an Intel D-1531 box:
>
> fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G
> --direct=0 --thread=1 --numjobs=1 --time_based --runtime=120
> --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
> time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
>
> original: 1m58sec
> patched: 6.54sec
>
> This is the best case for this patch since, for a sequential write case,
> we omit nearly all pages dirtied after the snapshot command.
>
> For a multi-writer, random write test:
>
> fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G
> --direct=0 --thread=1 --numjobs=4 --time_based --runtime=120
> --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
> time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
>
> original: 15.83sec
> patched: 10.35sec
>
> The improvement is less compared with the sequential write case, since
> we omit only half of the pages dirtied after the snapshot command.
>
> Signed-off-by: Ethan Lien

Codewise it looks good, though I'd also document the reason you are using
wbc->nr_to_write == LONG_MAX (i.e. the fact that other flushers might race
and inadvertently disable the snapshot flag), but this is something which
David can fix on the way in. So:

Reviewed-by: Nikolay Borisov

> ---
>
> V2:
> Add more details in the commit message.
> Rename BTRFS_INODE_TAGGED_FLUSH to BTRFS_INODE_SNAPSHOT_FLUSH.
> Remove unnecessary sync_mode check.
> start_delalloc_inodes use boolean argument. > > fs/btrfs/btrfs_inode.h | 1 + > fs/btrfs/ctree.h | 2 +- > fs/btrfs/extent_io.c | 14 -- > fs/btrfs/inode.c | 10 ++ > fs/btrfs/ioctl.c | 2 +- > 5 files changed, 21 insertions(+), 8 deletions(-) > > diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h > index 1343ac57b438..ffc9a1c77375 100644 > --- a/fs/btrfs/btrfs_inode.h > +++ b/fs/btrfs/btrfs_inode.h > @@ -29,6 +29,7 @@ enum { > BTRFS_INODE_IN_DELALLOC_LIST, > BTRFS_INODE_READDIO_NEED_LOCK, > BTRFS_INODE_HAS_PROPS, > + BTRFS_INODE_SNAPSHOT_FLUSH, > }; > > /* in memory btrfs inode */ > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 2cddfe7806a4..82682da5a40d 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -3155,7 +3155,7 @@ int btrfs_truncate_inode_items(struct > btrfs_trans_handle *trans, > struct inode *inode, u64 new_size, > u32 min_type); > > -int btrfs_start_delalloc_inodes(struct btrfs_root *root); > +int btrfs_start_delalloc_snapshot(struct btrfs_root *root); > int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr); > int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, > unsigned int extra_bits, > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 4dd6faab02bb..93f2e413535d 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -3928,12 +3928,22 @@ static int extent_write_cache_pages(struct > address_space *mapping, > range_whole = 1; > scanned = 1; > } > - if (wbc->sync_mode == WB_SYNC_ALL) > + > + /* > + * We do the tagged writepage as long as the snapshot flush bit is set > + * and we are the first one who do the filemap_flush() on this inode. 
> +	 */
> +	if (range_whole && wbc->nr_to_write == LONG_MAX &&
> +	    test_and_clear_bit(BTRFS_INODE_SNAPSHOT_FLUSH,
> +			       &BTRFS_I(inode)->runtime_flags))
> +		wbc->tagged_writepages = 1;
> +
> +	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
>  		tag = PAGECACHE_TAG_TOWRITE;
>  	else
>  		tag = PAGECACHE_TAG_DIRTY;
>  retry:
> -	if (wbc->sync_mode == WB_SYNC_ALL)
> +	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
>  		tag_pages_for_writeback(mapping, index, end);
>  	done_index = index;
>  	while (!done && !nr_to_write_done && (index <= end) &&
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3ea5339603cf..593445d122ed 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -9975,7 +9975,7 @@ static struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode
>   * some fairly slow code that needs optimization. This walks the list
>   * of all the inodes with pending delalloc and forces them to disk.
>
[PATCH v2] btrfs: use tagged writepage to mitigate livelock of snapshot
Snapshot is expected to be fast. But if there are writers steadily creating dirty pages in our subvolume, the snapshot may take a very long time to complete. To fix the problem, we use tagged writepage for the snapshot flusher as we do in generic write_cache_pages(): we quickly tag all dirty pages with a TOWRITE tag, then do the hard work of writepage only on those pages with the TOWRITE tag, so we omit pages dirtied after the snapshot command.

We do a simple snapshot speed test on an Intel D-1531 box:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G
--direct=0 --thread=1 --numjobs=1 --time_based --runtime=120
--filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 1m58sec
patched: 6.54sec

This is the best case for this patch since, for a sequential write case, we omit nearly all pages dirtied after the snapshot command.

For a multi-writer, random write test:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G
--direct=0 --thread=1 --numjobs=4 --time_based --runtime=120
--filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 15.83sec
patched: 10.35sec

The improvement is less compared with the sequential write case, since we omit only half of the pages dirtied after the snapshot command.

Signed-off-by: Ethan Lien

---

V2:
Add more details in the commit message.
Rename BTRFS_INODE_TAGGED_FLUSH to BTRFS_INODE_SNAPSHOT_FLUSH.
Remove unnecessary sync_mode check.
fs/btrfs/btrfs_inode.h | 1 + fs/btrfs/ctree.h | 2 +- fs/btrfs/extent_io.c | 14 -- fs/btrfs/inode.c | 10 ++ fs/btrfs/ioctl.c | 2 +- 5 files changed, 21 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 1343ac57b438..ffc9a1c77375 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -29,6 +29,7 @@ enum { BTRFS_INODE_IN_DELALLOC_LIST, BTRFS_INODE_READDIO_NEED_LOCK, BTRFS_INODE_HAS_PROPS, + BTRFS_INODE_SNAPSHOT_FLUSH, }; /* in memory btrfs inode */ diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2cddfe7806a4..82682da5a40d 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3155,7 +3155,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, struct inode *inode, u64 new_size, u32 min_type); -int btrfs_start_delalloc_inodes(struct btrfs_root *root); +int btrfs_start_delalloc_snapshot(struct btrfs_root *root); int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr); int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, unsigned int extra_bits, diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4dd6faab02bb..93f2e413535d 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3928,12 +3928,22 @@ static int extent_write_cache_pages(struct address_space *mapping, range_whole = 1; scanned = 1; } - if (wbc->sync_mode == WB_SYNC_ALL) + + /* +* We do the tagged writepage as long as the snapshot flush bit is set +* and we are the first one who do the filemap_flush() on this inode. 
+	 */
+	if (range_whole && wbc->nr_to_write == LONG_MAX &&
+	    test_and_clear_bit(BTRFS_INODE_SNAPSHOT_FLUSH,
+			       &BTRFS_I(inode)->runtime_flags))
+		wbc->tagged_writepages = 1;
+
+	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
 		tag = PAGECACHE_TAG_TOWRITE;
 	else
 		tag = PAGECACHE_TAG_DIRTY;
 retry:
-	if (wbc->sync_mode == WB_SYNC_ALL)
+	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
 		tag_pages_for_writeback(mapping, index, end);
 	done_index = index;
 	while (!done && !nr_to_write_done && (index <= end) &&
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3ea5339603cf..593445d122ed 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9975,7 +9975,7 @@ static struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode
  * some fairly slow code that needs optimization. This walks the list
  * of all the inodes with pending delalloc and forces them to disk.
  */
-static int start_delalloc_inodes(struct btrfs_root *root, int nr)
+static int start_delalloc_inodes(struct btrfs_root *root, int nr, bool snapshot)
 {
 	struct btrfs_inode *binode;
 	struct inode *inode;
@@ -10003,6 +10003,8 @@ static int start_delalloc_inodes(struct btrfs_root *root, int nr)
 	}
 	spin_unlock(&root->delalloc_lock);

+	if (snapshot)
+		set_bit(BTRFS_INODE_SNAPSHOT_FLUSH, &binode->runtime_flags);
Re: [PATCH] btrfs: use tagged writepage to mitigate livelock of snapshot
On 2.11.18 г. 9:13 ч., ethanlien wrote:
> David Sterba wrote on 2018-11-02 02:02:
>> On Thu, Nov 01, 2018 at 02:49:03PM +0800, Ethan Lien wrote:
>>> Snapshot is expected to be fast. But if there are writers steadily
>>> creating dirty pages in our subvolume, the snapshot may take a very long
>>> time to complete. To fix the problem, we use tagged writepage for the
>>> snapshot flusher as we do in the generic write_cache_pages(), so we can
>>> omit pages dirtied after the snapshot command.
>>>
>>> We do a simple snapshot speed test on an Intel D-1531 box:
>>>
>>> fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G
>>> --direct=0 --thread=1 --numjobs=1 --time_based --runtime=120
>>> --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
>>> time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
>>>
>>> original: 1m58sec
>>> patched: 6.54sec
>>>
>>> This is the best case for this patch since, for a sequential write case,
>>> we omit nearly all pages dirtied after the snapshot command.
>>>
>>> For a multi-writer, random write test:
>>>
>>> fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G
>>> --direct=0 --thread=1 --numjobs=4 --time_based --runtime=120
>>> --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
>>> time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
>>>
>>> original: 15.83sec
>>> patched: 10.35sec
>>>
>>> The improvement is less compared with the sequential write case, since
>>> we omit only half of the pages dirtied after the snapshot command.
>>>
>>> Signed-off-by: Ethan Lien
>>
>> This looks nice, thanks. I agree with the suggestions from Nikolay,
>> please update and resend.
>>
>> I was a bit curious about the 'livelock'; what you describe does not seem
>> to be one. A system under heavy IO can make the snapshot dead slow but can
>> recover from that once the IO stops.
>
> I'm not sure if this is indeed a case of 'livelock'.
I learned the term
> from commit:
> f446daaea9d4a420d, "mm: implement writeback livelock avoidance using
> page tagging".
> If this is not the case, I can use another term.

It's really bad to use terms you don't fully comprehend, because that way you can, inadvertently, mislead people who don't necessarily have the same context as you do. A brief description of what a livelock is can be found here:

https://stackoverflow.com/questions/6155951/whats-the-difference-between-deadlock-and-livelock

In the context of writeback, a livelock could occur since you have 2 processes - 1 producer (dirtying pages) and the other one a consumer (writeback). So a livelock will be when both are doing useful work yet the system doesn't make forward progress (in this case there is a risk of the snapshot never being created, since we will never finish doing the writeback for constantly dirtied pages).

David, indeed what Ethan fixes and what Jan fixed in f446daaea9d4a420d are the same class of issues. And livelock in this context really depends on one's definition of how long a no-forward-progress-but-useful-work-being-done situation persists.

>
>> Regarding the sync semantics, there's AFAIK no change to the current
>> state where the sync is done before the snapshot but without further other
>> guarantees. From that point I think it's safe to select only a subset of
>> pages and make things faster.
>>
>> As the requested changes are not functional I'll add the patch to
>> for-next for testing.
>
>
Re: [PATCH] btrfs: use tagged writepage to mitigate livelock of snapshot
David Sterba wrote on 2018-11-02 02:02:

On Thu, Nov 01, 2018 at 02:49:03PM +0800, Ethan Lien wrote:

Snapshot is expected to be fast. But if there are writers steadily creating dirty pages in our subvolume, the snapshot may take a very long time to complete. To fix the problem, we use tagged writepage for the snapshot flusher as we do in the generic write_cache_pages(), so we can omit pages dirtied after the snapshot command.

We do a simple snapshot speed test on an Intel D-1531 box:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G --direct=0 --thread=1 --numjobs=1 --time_based --runtime=120 --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5; time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 1m58sec
patched: 6.54sec

This is the best case for this patch since, for a sequential write case, we omit nearly all pages dirtied after the snapshot command.

For a multi-writer, random write test:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G --direct=0 --thread=1 --numjobs=4 --time_based --runtime=120 --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5; time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 15.83sec
patched: 10.35sec

The improvement is less compared with the sequential write case, since we omit only half of the pages dirtied after the snapshot command.

Signed-off-by: Ethan Lien

This looks nice, thanks. I agree with the suggestions from Nikolay, please update and resend.

I was a bit curious about the 'livelock'; what you describe does not seem to be one. A system under heavy IO can make the snapshot dead slow but can recover from that once the IO stops.

I'm not sure if this is indeed a case of 'livelock'. I learned the term from commit: f446daaea9d4a420d, "mm: implement writeback livelock avoidance using page tagging". If this is not the case, I can use another term.
Regarding the sync semantics, there's AFAIK no change to the current state where the sync is done before snapshot but without further other guarantees. From that point I think it's safe to select only subset of pages and make things faster. As the requested changes are not functional I'll add the patch to for-next for testing.
Re: [PATCH] btrfs: use tagged writepage to mitigate livelock of snapshot
Nikolay Borisov wrote on 2018-11-01 19:57:

On 1.11.18 г. 8:49 ч., Ethan Lien wrote:

Snapshot is expected to be fast. But if there are writers steadily creating dirty pages in our subvolume, the snapshot may take a very long time to complete. To fix the problem, we use tagged writepage for the snapshot flusher as we do in the generic write_cache_pages(), so we can omit pages dirtied after the snapshot command.

We do a simple snapshot speed test on an Intel D-1531 box:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G --direct=0 --thread=1 --numjobs=1 --time_based --runtime=120 --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5; time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 1m58sec
patched: 6.54sec

This is the best case for this patch since, for a sequential write case, we omit nearly all pages dirtied after the snapshot command.

For a multi-writer, random write test:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G --direct=0 --thread=1 --numjobs=4 --time_based --runtime=120 --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5; time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 15.83sec
patched: 10.35sec

The improvement is less compared with the sequential write case, since we omit only half of the pages dirtied after the snapshot command.
Signed-off-by: Ethan Lien --- fs/btrfs/btrfs_inode.h | 1 + fs/btrfs/ctree.h | 2 +- fs/btrfs/extent_io.c | 16 ++-- fs/btrfs/inode.c | 10 ++ fs/btrfs/ioctl.c | 2 +- 5 files changed, 23 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 1343ac57b438..4182bfbb56be 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -29,6 +29,7 @@ enum { BTRFS_INODE_IN_DELALLOC_LIST, BTRFS_INODE_READDIO_NEED_LOCK, BTRFS_INODE_HAS_PROPS, + BTRFS_INODE_TAGGED_FLUSH, }; /* in memory btrfs inode */ diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2cddfe7806a4..82682da5a40d 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3155,7 +3155,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, struct inode *inode, u64 new_size, u32 min_type); -int btrfs_start_delalloc_inodes(struct btrfs_root *root); +int btrfs_start_delalloc_snapshot(struct btrfs_root *root); int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr); int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, unsigned int extra_bits, diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4dd6faab02bb..c21d8a0e010a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3928,12 +3928,24 @@ static int extent_write_cache_pages(struct address_space *mapping, range_whole = 1; scanned = 1; } - if (wbc->sync_mode == WB_SYNC_ALL) + + /* + * We don't care if we are the one who set BTRFS_INODE_TAGGED_FLUSH in + * start_delalloc_inodes(). We do the tagged writepage as long as we are +* the first one who do the filemap_flush() on this inode. +*/ + if (range_whole && wbc->nr_to_write == LONG_MAX && + wbc->sync_mode == WB_SYNC_NONE && + test_and_clear_bit(BTRFS_INODE_TAGGED_FLUSH, + _I(inode)->runtime_flags)) Actually this check can be simplified to: range_whole && test_and_clear_bit. filemap_flush triggers range_whole = 1 and then you care about TAGGED_FLUSH (or w/e it's going to be named) to be set. 
The nr_to_write && syncmode just make it a tad more difficult to reason about the code. Yes, the sync_mode check is not necessary. For nr_to_write, since pagevec contain only limited amount of pages in one loop, nr_to_write essentially control whether we scan all of the dirty pages or not. Since there is a window between start_delalloc_inodes() and extent_write_cache_pages(), another flusher with range_whole==1 but nr_to_write + wbc->tagged_writepages = 1; + + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) tag = PAGECACHE_TAG_TOWRITE; else tag = PAGECACHE_TAG_DIRTY; retry: - if (wbc->sync_mode == WB_SYNC_ALL) + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) tag_pages_for_writeback(mapping, index, end); done_index = index; while (!done && !nr_to_write_done && (index <= end) && diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 3ea5339603cf..3df3cbbe91c5 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9975,7 +9975,7 @@ static struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode * some fairly slow code that needs optimization. This walks the list * of all the inodes with pending delalloc and forces
Re: [PATCH 3/8] btrfs: Remove extent_io_ops::writepage_end_io_hook
On 1.11.18 г. 21:03 ч., Josef Bacik wrote:
> On Thu, Nov 01, 2018 at 02:09:48PM +0200, Nikolay Borisov wrote:
>> This callback is only ever called for data page writeout, so
>> there is no need to actually abstract it via extent_io_ops. Let's just
>> export it, remove the definition of the callback and call it directly
>> in the functions that invoke the callback. Also rename the function to
>> btrfs_writepage_endio_finish_ordered since what it really does is
>> account finished io in the ordered extent data structures.
>> No functional changes.
>>
>> Signed-off-by: Nikolay Borisov
>
> Could send another cleanup patch to remove the struct extent_state *state from
> the arg list as well.

Indeed, once this series lands I will send a follow-up cleanup, since I have some other ideas around writepage_delalloc code cleanups as well.

Thanks

>
> Reviewed-by: Josef Bacik
>
> Thanks,
>
> Josef
>