[PATCH] fstests: Fix generic/102 fail for btrfs
From: Zhao Lei

generic/102 sometimes fails with the newest btrfs toolchain, because it uses non-mixed mode by default, which requests more space for metadata and leaves no space for data writes.

This patch forces mixed mode for btrfs in generic/102.

Signed-off-by: Zhao Lei
---
 tests/generic/102 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/generic/102 b/tests/generic/102
index abc3994..8c01fb5 100755
--- a/tests/generic/102
+++ b/tests/generic/102
@@ -48,6 +48,8 @@ _require_scratch
 rm -f $seqres.full
+[[ "$FSTYP" = "btrfs" ]] && MKFS_OPTIONS+=" --mixed"
+
 dev_size=$((512 * 1024 * 1024)) # 512MB filesystem
 _scratch_mkfs_sized $dev_size >>$seqres.full 2>&1
 _scratch_mount
--
1.8.5.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
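The fix above appends an option only when the filesystem under test is btrfs. A minimal sketch of that pattern in portable shell follows; `add_mixed_for_btrfs` is a hypothetical helper made up for illustration and is not part of fstests.

```shell
# Hypothetical helper (not in fstests): print the mkfs options to use for
# a given filesystem type, appending --mixed only for btrfs.
# $1 = filesystem type, $2 = options collected so far
add_mixed_for_btrfs()
{
	if [ "$1" = "btrfs" ]; then
		printf '%s --mixed\n' "$2"
	else
		printf '%s\n' "$2"
	fi
}
```

Called as `add_mixed_for_btrfs "$FSTYP" "$MKFS_OPTIONS"`, it leaves the options for every other filesystem untouched, which is the same effect as the `[[ ... ]] && MKFS_OPTIONS+=...` line in the patch.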
[PATCH] fstests: speedup generic/027 for new version of btrfs
From: Zhao Lei

New versions of btrfs create non-mixed block groups in all cases. For generic/027, the filesystem under test therefore changes from mixed block groups to non-mixed block groups, and the test time on my node changed from 400s to 2700s.

To test btrfs with all mount options, this test item needs about 7.5 hours (actually, some mount options such as compress need more time).

This patch reduces the test loop count, to make the test time about equal to that of the old version.

Signed-off-by: Zhao Lei
---
 tests/generic/027 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/generic/027 b/tests/generic/027
index d2e59d6..42f0685 100755
--- a/tests/generic/027
+++ b/tests/generic/027
@@ -78,7 +78,7 @@ rm -f $SCRATCH_MNT/testfile
 loop=100
 # btrfs takes much longer time, reduce the loop count
 if [ "$FSTYP" == "btrfs" ]; then
-	loop=10
+	loop=2
 fi
 dir=$SCRATCH_MNT/testdir
--
1.8.5.1
[PATCH] btrfs-progs: Increase running state's priority in stat output
From: Zhao Lei

Anthony Plack reported an output bug on the mailing list:

title: btrfs-progs SCRUB reporting aborted but still running - minor

btrfs scrub status reports that the scrub was aborted, but it still runs to completion.

# btrfs scrub status /mnt/data
scrub status for f591ac13-1a69-476d-bd30-346f87a491da
	scrub started at Mon Apr 27 06:48:44 2015 and was aborted after 1089 seconds
	total bytes scrubbed: 1.02TiB with 0 errors
#
# btrfs scrub status /mnt/data
scrub status for f591ac13-1a69-476d-bd30-346f87a491da
	scrub started at Mon Apr 27 06:48:44 2015 and was aborted after 1664 seconds
	total bytes scrubbed: 1.53TiB with 0 errors
#
...

Reason:
When scrubbing multiple devices simultaneously, if some devices are canceled while others are still running, the canceled state has the higher priority in the global report. So we see "scrub aborted" in the status line while the running time keeps increasing.

Fix:
Increase the running state's priority in the output: if any device is still being scrubbed, output the running state instead of the canceled state.

Reported-by: Anthony Plack
Signed-off-by: Zhao Lei
---
 cmds-scrub.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/cmds-scrub.c b/cmds-scrub.c
index b7aa809..7c9318e 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -252,17 +252,15 @@ static void _print_scrub_ss(struct scrub_stats *ss)
 	hours = ss->duration / (60 * 60);
 	gmtime_r(&seconds, &tm);
 	strftime(t, sizeof(t), "%M:%S", &tm);
-	if (ss->finished && !ss->canceled) {
-		printf(" and finished after %02u:%s\n", hours, t);
-	} else if (ss->canceled) {
+	if (ss->in_progress)
+		printf(", running for %02u:%s\n", hours, t);
+	else if (ss->canceled)
 		printf(" and was aborted after %02u:%s\n", hours, t);
-	} else {
-		if (ss->in_progress)
-			printf(", running for %02u:%s\n", hours, t);
-		else
-			printf(", interrupted after %02u:%s, not running\n",
-				hours, t);
-	}
+	else if (ss->finished)
+		printf(" and finished after %02u:%s\n", hours, t);
+	else
+		printf(", interrupted after %02u:%s, not running\n",
+			hours, t);
 }
 
 static void print_scrub_dev(struct btrfs_ioctl_dev_info_args *di,
--
1.8.5.1
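The new ordering of the checks can be tried outside of btrfs-progs. The sketch below restates the patched if/else chain as a shell function; `scrub_state` is a hypothetical name, and its three arguments only stand in for the aggregated `ss->in_progress`, `ss->canceled` and `ss->finished` flags (1 means set).

```shell
# Sketch of the status priority after the patch (not btrfs-progs code).
# $1 = in_progress, $2 = canceled, $3 = finished; 1 means the flag is set.
scrub_state()
{
	if [ "$1" -eq 1 ]; then
		echo "running"
	elif [ "$2" -eq 1 ]; then
		echo "aborted"
	elif [ "$3" -eq 1 ]; then
		echo "finished"
	else
		echo "interrupted"
	fi
}
```

With one device canceled and another still being scrubbed, both the in_progress and canceled flags are set in the summary, so `scrub_state 1 1 0` takes the first branch and reports the running state, which is the case from the bug report.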
[PATCH] btrfs-progs-tests: Avoid outputting useless warning in ./fsck-tests.sh
From: Zhao Lei

002-bad-transid outputs a 'transid verify failed' message on the screen, which is just a warning from btrfs-image under the normal conditions of this test.

This patch moves that warning into $RESULTS, to:
1: Avoid troubling screen output
2: Let the user know the details if another error happens in btrfs-image

Before patch:
# ./fsck-tests.sh
    [TEST]   001-bad-file-extent-bytenr
    [TEST]   002-bad-transid
parent transid verify failed on 29360128 wanted 9 found 755944791
parent transid verify failed on 29360128 wanted 9 found 755944791
Ignoring transid failure
    [TEST]   003-shift-offsets
    [TEST]   004-no-dir-index
...

After patch:
# ./fsck-tests.sh
    [TEST]   001-bad-file-extent-bytenr
    [TEST]   002-bad-transid
    [TEST]   003-shift-offsets
    [TEST]   004-no-dir-index
...

Signed-off-by: Zhao Lei
---
 tests/common | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/common b/tests/common
index 381ff96..4d03ed8 100644
--- a/tests/common
+++ b/tests/common
@@ -87,8 +87,9 @@ check_all_images()
 		if ! [ -f $image.restored ]; then
 			echo "restoring image $(basename $image)" >> $RESULTS
-			$TOP/btrfs-image -r $image $image.restored || \
-				_fail "failed to restore image $image"
+			$TOP/btrfs-image -r $image $image.restored \
+				&>> $RESULTS \
+				|| _fail "failed to restore image $image"
 		fi
 		check_image $image.restored
--
1.8.5.1
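The patch relies on bash's `&>>` to append both stdout and stderr to the results file while the `||` still sees the command's exit status. A small sketch of the same idea in portable shell follows; `run_logged` is a hypothetical helper, not a function from tests/common.

```shell
# Hypothetical helper: run a command with stdout and stderr appended to a
# log file, so warnings stay off the screen but the exit status is kept.
# ">> file 2>&1" is the portable spelling of bash's "&>> file".
# $1 = log file, remaining arguments = command to run
run_logged()
{
	log=$1
	shift
	"$@" >> "$log" 2>&1
}
```

A call such as `run_logged "$RESULTS" some_command || _fail "..."` keeps the screen quiet when the command succeeds with warnings and still fails the test when the command returns non-zero.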
[PATCH 2/2] btrfs-progs-tests: Fix mount fail of 013-extent-tree-rebuild
From: Zhao Lei

When using a loop device for the test, fsck-tests/013-extent-tree-rebuild failed with the following error message:

# ./fsck-tests.sh
...
    [TEST]   013-extent-tree-rebuild
failed: mount /data/btrfsprogs/tests/test.img /data/btrfsprogs/tests/mnt
test failed for case 013-extent-tree-rebuild
#

Considering that $TEST_DEV can be a block device or a file used as a loop device, we need to choose the mount options conditionally to handle both cases.

This patch puts that logic into a common function, to solve the current problem in 013-extent-tree-rebuild and to support similar needs in the future.

Signed-off-by: Zhao Lei
---
 tests/common | 24 ++++++++++++++++++++++++
 tests/fsck-tests/013-extent-tree-rebuild/test.sh | 4 ++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/tests/common b/tests/common
index ba0b78a..381ff96 100644
--- a/tests/common
+++ b/tests/common
@@ -165,3 +165,27 @@ init_env()
 	mkdir -p "$TEST_MNT" || { echo "Failed mkdir -p $TEST_MNT"; exit 1; }
 }
 init_env
+
+mount_test_dev()
+{
+	local loop_opt
+	if [[ -b "$TEST_DEV" ]]; then
+		loop_opt=()
+	elif [[ -f "$TEST_DEV" ]]; then
+		loop_opt=(-o loop)
+	else
+		_fail "Invalid \$TEST_DEV: $TEST_DEV"
+	fi
+
+	[[ -d "$TEST_MNT" ]] || {
+		_fail "Invalid \$TEST_MNT: $TEST_MNT"
+	}
+
+	mount "${loop_opt[@]}" "$TEST_DEV" "$TEST_MNT" || _fail "mount $TEST_DEV to $TEST_MNT failed"
+}
+
+umount_test_dev()
+{
+	umount "$TEST_DEV"
+}
+
diff --git a/tests/fsck-tests/013-extent-tree-rebuild/test.sh b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
index b7909d2..ff3c922 100755
--- a/tests/fsck-tests/013-extent-tree-rebuild/test.sh
+++ b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
@@ -12,14 +12,14 @@ test_extent_tree_rebuild()
 {
 	run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $TEST_DEV
-	run_check $SUDO_HELPER mount $TEST_DEV $TEST_MNT
+	run_check $SUDO_HELPER mount_test_dev
 	run_check $SUDO_HELPER cp -aR /lib/modules/`uname -r`/ $TEST_MNT
 	for i in `seq 1 100`;do
 		run_check $SUDO_HELPER $TOP/btrfs sub snapshot $TEST_MNT \
 			$TEST_MNT/snapaaa_$i
 	done
-	run_check $SUDO_HELPER umount $TEST_DEV
+	run_check $SUDO_HELPER umount_test_dev
 	# get extent root bytenr
 	extent_root_bytenr=`$SUDO_HELPER $TOP/btrfs-debug-tree -r $TEST_DEV | \
--
1.8.5.1
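The decision inside mount_test_dev() can be looked at separately from the mount call itself. The following sketch only prints the options that would be used; `loop_opt_for` is a hypothetical name, and the code avoids bash arrays so it also runs under a plain POSIX shell.

```shell
# Hypothetical helper: print the mount options needed for a test device.
# A block device needs none; a regular file (an image) needs "-o loop".
# Anything else is rejected with a non-zero return code.
loop_opt_for()
{
	if [ -b "$1" ]; then
		:	# block device: no extra options
	elif [ -f "$1" ]; then
		echo "-o loop"
	else
		echo "Invalid test device: $1" >&2
		return 1
	fi
}
```

A caller with root privileges could then run `mount $(loop_opt_for "$TEST_DEV") "$TEST_DEV" "$TEST_MNT"`, leaving the command substitution unquoted on purpose so that "-o loop" splits into two arguments.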
[PATCH 1/2] btrfs-progs-tests: Introduce init_env() to initialize common env variant
From: Zhao Lei

For example, $TEST_MNT is commonly used in several tests, which have duplicated code to initialize it. This duplicated code not only benefits hard disk vendors, but also has inconsistent details, such as:
  convert-tests.sh: lack of mkdir
  fsck-tests/012-leaf-corruption/test.sh: unnecessary mkdir
  fsck-tests/013-extent-tree-rebuild/test.sh: unnecessary init
  misc-tests/XXX
  ...
And several different error messages:
  _fail "unable to create mount point on $TEST_MNT"
  _fail "failed to create mount point"
  ...

This patch moves the initialization of $TEST_MNT into a common init_env(), to avoid the above problems; init_env() can also be used to add more things in the future.

Signed-off-by: Zhao Lei
---
 tests/common | 7 +++++++
 tests/convert-tests.sh | 1 -
 tests/fsck-tests.sh | 3 ---
 tests/fsck-tests/012-leaf-corruption/test.sh | 1 -
 tests/fsck-tests/013-extent-tree-rebuild/test.sh | 5 -----
 tests/misc-tests.sh | 3 ---
 tests/misc-tests/001-btrfstune-features/test.sh | 5 -----
 tests/misc-tests/002-uuid-rewrite/test.sh | 5 -----
 tests/misc-tests/003-zero-log/test.sh | 5 -----
 9 files changed, 7 insertions(+), 28 deletions(-)

diff --git a/tests/common b/tests/common
index 2d337b0..ba0b78a 100644
--- a/tests/common
+++ b/tests/common
@@ -158,3 +158,10 @@ prepare_test_dev()
 	truncate -s "$size" "$TEST_DEV" || _not_run "create file for loop device failed"
 }
 
+init_env()
+{
+	TEST_MNT="${TEST_MNT:-$TOP/tests/mnt}"
+	export TEST_MNT
+	mkdir -p "$TEST_MNT" || { echo "Failed mkdir -p $TEST_MNT"; exit 1; }
+}
+init_env
diff --git a/tests/convert-tests.sh b/tests/convert-tests.sh
index efed90b..4e8496a 100755
--- a/tests/convert-tests.sh
+++ b/tests/convert-tests.sh
@@ -9,7 +9,6 @@ unset LANG
 LANG=C
 SCRIPT_DIR=$(dirname $(readlink -f $0))
 TOP=$(readlink -f $SCRIPT_DIR/../)
-TEST_MNT=${TEST_MNT:-$TOP/tests/mnt}
 RESULTS="$TOP/tests/convert-tests-results.txt"
 IMAGE="$TOP/tests/test.img"
diff --git a/tests/fsck-tests.sh b/tests/fsck-tests.sh
index b0ded6a..46dd72d 100755
--- a/tests/fsck-tests.sh
+++ b/tests/fsck-tests.sh
@@ -11,7 +11,6 @@ LANG=C
 SCRIPT_DIR=$(dirname $(readlink -f $0))
 TOP=$(readlink -f $SCRIPT_DIR/../)
 TEST_DEV=${TEST_DEV:-}
-TEST_MNT=${TEST_MNT:-$TOP/tests/mnt}
 RESULTS="$TOP/tests/fsck-tests-results.txt"
 source $TOP/tests/common
@@ -20,11 +19,9 @@ source $TOP/tests/common
 export TOP
 export RESULTS
 # For custom script needs to verfiy recovery
-export TEST_MNT
 export LANG
 rm -f $RESULTS
-mkdir -p $TEST_MNT || _fail "unable to create mount point on $TEST_MNT"
 # test rely on corrupting blocks tool
 check_prereq btrfs-corrupt-block
diff --git a/tests/fsck-tests/012-leaf-corruption/test.sh b/tests/fsck-tests/012-leaf-corruption/test.sh
index f8701ad..a37ceda 100755
--- a/tests/fsck-tests/012-leaf-corruption/test.sh
+++ b/tests/fsck-tests/012-leaf-corruption/test.sh
@@ -85,7 +85,6 @@ check_inode()
 check_leaf_corrupt_no_data_ext()
 {
 	image=$1
-	mkdir -p $TEST_MNT || _fail "failed to create mount point"
 	$SUDO_HELPER mount -o loop $image -o ro $TEST_MNT
 	i=0
diff --git a/tests/fsck-tests/013-extent-tree-rebuild/test.sh b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
index 88a66cc..b7909d2 100755
--- a/tests/fsck-tests/013-extent-tree-rebuild/test.sh
+++ b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
@@ -7,11 +7,6 @@ check_prereq mkfs.btrfs
 setup_root_helper
 prepare_test_dev 1G
 
-if [ -z $TEST_MNT ];then
-	echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
-	exit 0
-fi
-
 # test whether fsck can rebuild a corrupted extent tree
 test_extent_tree_rebuild()
 {
diff --git a/tests/misc-tests.sh b/tests/misc-tests.sh
index 5bbe914..cabe9c3 100755
--- a/tests/misc-tests.sh
+++ b/tests/misc-tests.sh
@@ -8,7 +8,6 @@ LANG=C
 SCRIPT_DIR=$(dirname $(readlink -f $0))
 TOP=$(readlink -f $SCRIPT_DIR/../)
 TEST_DEV=${TEST_DEV:-}
-TEST_MNT=${TEST_MNT:-$TOP/tests/mnt}
 RESULTS="$TOP/tests/misc-tests-results.txt"
 IMAGE="$TOP/tests/test.img"
@@ -18,11 +17,9 @@ source $TOP/tests/common
 export TOP
 export RESULTS
 # For custom script needs to verfiy recovery
-export TEST_MNT
 export LANG
 rm -f $RESULTS
-mkdir -p $TEST_MNT || _fail "unable to create mount point on $TEST_MNT"
 # test rely on corrupting blocks tool
 check_prereq btrfs-corrupt-block
diff --git a/tests/misc-tests/001-btrfstune-features/test.sh b/tests/misc-tests/001-btrfstune-features/test.sh
index ea33954..836e8d3 100755
--- a/tests/misc-tests/001-btrfstune-features/test.sh
+++ b/tests/misc-tests/001-btrfstune-features/test.sh
@@ -9,11 +9,6 @@ check_prereq mkfs.btrfs
 setup_root_helper
 prepare_test_dev
 
-if [ -z $TEST_MNT ];then
-	echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
-	exit 0
-fi
-
 # test whether fsck can rebuild a corrupted extent tree
 # parameters:
 # - option for mkfs.btrfs
[PATCH] btrfs-progs-tests: Add -o loop to fsck-tests/012-leaf-corruption for loop device
From: Zhao Lei

To avoid the following mount error in the test:

mount: /root/btrfs/progs/tests/fsck-tests/012-leaf-corruption/test.img is not a block device (maybe try `-o loop'?)

Signed-off-by: Zhao Lei
---
 tests/fsck-tests/012-leaf-corruption/test.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/fsck-tests/012-leaf-corruption/test.sh b/tests/fsck-tests/012-leaf-corruption/test.sh
index 98e3185..f8701ad 100755
--- a/tests/fsck-tests/012-leaf-corruption/test.sh
+++ b/tests/fsck-tests/012-leaf-corruption/test.sh
@@ -86,7 +86,7 @@ check_leaf_corrupt_no_data_ext()
 {
 	image=$1
 	mkdir -p $TEST_MNT || _fail "failed to create mount point"
-	$SUDO_HELPER mount $image -o ro $TEST_MNT
+	$SUDO_HELPER mount -o loop $image -o ro $TEST_MNT
 	i=0
 	while [ $i -lt ${#leaf_no_data_ext_list[@]} ]; do
--
1.8.5.1
[PATCH v3 2/7] btrfs-progs: Move close timer handle code to line after sub process exit
From: Zhao Lei

The timer handle may still be in use by the sub thread, so it is better to close it after the sub thread exits.

Signed-off-by: Zhao Lei
---
 task-utils.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/task-utils.c b/task-utils.c
index 0390a69..17fd573 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -61,14 +61,14 @@ void task_stop(struct task_info *info)
 	if (!info)
 		return;
 
-	if (info->periodic.timer_fd)
-		close(info->periodic.timer_fd);
-
 	if (info->id > 0) {
 		pthread_cancel(info->id);
 		pthread_join(info->id, NULL);
 	}
 
+	if (info->periodic.timer_fd)
+		close(info->periodic.timer_fd);
+
 	if (info->postfn)
 		info->postfn(info->private_data);
 }
--
1.8.5.1
[PATCH v3 6/7] btrfs-progs: Move code to create loop device to common
From: Zhao Lei

This code block is commonly used; moving it into a common function makes the code cleaner.

Signed-off-by: Zhao Lei
---
 tests/common | 16 ++++++++++++++++
 tests/fsck-tests/013-extent-tree-rebuild/test.sh | 11 +----------
 tests/misc-tests/001-btrfstune-features/test.sh | 11 +----------
 tests/misc-tests/002-uuid-rewrite/test.sh | 11 +----------
 tests/misc-tests/003-zero-log/test.sh | 11 +----------
 5 files changed, 20 insertions(+), 40 deletions(-)

diff --git a/tests/common b/tests/common
index 899ec7b..2d337b0 100644
--- a/tests/common
+++ b/tests/common
@@ -142,3 +142,19 @@ setup_root_helper()
 	fi
 	SUDO_HELPER=root_helper
 }
+
+prepare_test_dev()
+{
+	# num[K/M/G/T...]
+	local size="$1"
+
+	[[ "$TEST_DEV" ]] && return
+	[[ "$size" ]] || size='1G'
+
+	echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
+		$RESULTS
+	TEST_DEV="$TOP/tests/test.img"
+
+	truncate -s "$size" "$TEST_DEV" || _not_run "create file for loop device failed"
+}
+
diff --git a/tests/fsck-tests/013-extent-tree-rebuild/test.sh b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
index 3290cd7..88a66cc 100755
--- a/tests/fsck-tests/013-extent-tree-rebuild/test.sh
+++ b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
@@ -5,16 +5,7 @@ source $TOP/tests/common
 check_prereq btrfs-debug-tree
 check_prereq mkfs.btrfs
 setup_root_helper
-
-if [ -z $TEST_DEV ]; then
-	echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-		$RESULTS
-	TEST_DEV="$TOP/tests/test.img"
-
-	# Need at least 1G to avoid mixed block group, which extent tree
-	# rebuild doesn't support.
-	run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev 1G
 
 if [ -z $TEST_MNT ];then
 	echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
diff --git a/tests/misc-tests/001-btrfstune-features/test.sh b/tests/misc-tests/001-btrfstune-features/test.sh
index c9981e6..ea33954 100755
--- a/tests/misc-tests/001-btrfstune-features/test.sh
+++ b/tests/misc-tests/001-btrfstune-features/test.sh
@@ -7,16 +7,7 @@ check_prereq btrfs-debug-tree
 check_prereq btrfs-show-super
 check_prereq mkfs.btrfs
 setup_root_helper
-
-if [ -z $TEST_DEV ]; then
-	echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-		$RESULTS
-	TEST_DEV="$TOP/tests/test.img"
-
-	# Need at least 1G to avoid mixed block group, which extent tree
-	# rebuild doesn't support.
-	run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev
 
 if [ -z $TEST_MNT ];then
 	echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
diff --git a/tests/misc-tests/002-uuid-rewrite/test.sh b/tests/misc-tests/002-uuid-rewrite/test.sh
index 6c2a393..bffa9b8 100755
--- a/tests/misc-tests/002-uuid-rewrite/test.sh
+++ b/tests/misc-tests/002-uuid-rewrite/test.sh
@@ -7,16 +7,7 @@ check_prereq btrfs-debug-tree
 check_prereq btrfs-show-super
 check_prereq mkfs.btrfs
 check_prereq btrfstune
-
-if [ -z $TEST_DEV ]; then
-	echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-		$RESULTS
-	TEST_DEV="$TOP/tests/test.img"
-
-	# Need at least 1G to avoid mixed block group, which extent tree
-	# rebuild doesn't support.
-	run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev
 
 if [ -z $TEST_MNT ];then
 	echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
diff --git a/tests/misc-tests/003-zero-log/test.sh b/tests/misc-tests/003-zero-log/test.sh
index da5b351..edab5db 100755
--- a/tests/misc-tests/003-zero-log/test.sh
+++ b/tests/misc-tests/003-zero-log/test.sh
@@ -6,16 +6,7 @@ source $TOP/tests/common
 check_prereq btrfs-show-super
 check_prereq mkfs.btrfs
 check_prereq btrfs
-
-if [ -z $TEST_DEV ]; then
-	echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-		$RESULTS
-	TEST_DEV="$TOP/tests/test.img"
-
-	# Need at least 1G to avoid mixed block group, which extent tree
-	# rebuild doesn't support.
-	run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev
 
 if [ -z $TEST_MNT ];then
 	echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
--
1.8.5.1
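The core of prepare_test_dev() is a truncate call with a default size. A reduced sketch is shown here; `make_test_image` is a hypothetical helper for illustration only, and it assumes the truncate(1) utility from GNU coreutils is installed.

```shell
# Hypothetical helper: create (or resize) an image file that can later be
# used as a loop device. truncate only sets the file length, so on most
# filesystems the file is sparse and takes no real space until written.
# $1 = path, $2 = size in truncate(1) syntax (defaults to 1G)
make_test_image()
{
	truncate -s "${2:-1G}" "$1"
}
```

For example, `make_test_image "$TOP/tests/test.img"` would create a 1G image, and `make_test_image "$TOP/tests/test.img" 4M` a 4 MiB one.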
[PATCH v3 5/7] btrfs-progs: Set info->periodic.timer_fd to 0 in init-fail
From: Zhao Lei

In the current code, (info->periodic.timer_fd == 0) means the handle is not valid, as in:

	if (info->periodic.timer_fd) {
		close(info->periodic.timer_fd);
		...

We need to set its value from -1 to 0 in the create-fail case, to make code like the above work.

Signed-off-by: Zhao Lei
---
 task-utils.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 768be94..58f5195 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -94,8 +94,10 @@ int task_period_start(struct task_info *info, unsigned int period_ms)
 		return -1;
 	info->periodic.timer_fd = timerfd_create(CLOCK_MONOTONIC, 0);
-	if (info->periodic.timer_fd == -1)
+	if (info->periodic.timer_fd == -1) {
+		info->periodic.timer_fd = 0;
 		return info->periodic.timer_fd;
+	}
 	info->periodic.wakeups_missed = 0;
--
1.8.5.1
[PATCH v3 3/7] btrfs-progs: Remove cleanup-timer code for btrfs-convert
From: Zhao Lei

There is no need to close the timer handle again in the sub thread's close callback; it is closed by task_stop() automatically.

This patch also adds an fflush() so that the log is output on time.

Signed-off-by: Zhao Lei
---
 btrfs-convert.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index b89c685..7dfab7d 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -71,10 +71,8 @@ static void *print_copied_inodes(void *p)
 static int after_copied_inodes(void *p)
 {
-	struct task_ctx *priv = p;
-
 	printf("\n");
-	task_period_stop(priv->info);
+	fflush(stdout);
 	return 0;
 }
--
1.8.5.1
[PATCH v3 7/7] btrfs-progs: Introduce a misc test for thread conflict in btrfs-convert
From: Zhao Lei

The current code of btrfs-convert has a thread conflict bug, which causes invalid memory access between threads and makes the program crash.

This patch adds a test item for the above bug:

# ./misc-tests.sh
    [TEST]   001-btrfstune-features
    [TEST]   002-uuid-rewrite
    [TEST]   003-zero-log
    [TEST]   004-convert-thread-conflict
failed: btrfs-convert /root/btrfsprogs/tests/test.img
test failed for case 004-convert-thread-conflict
#
# cat misc-tests-results.txt
...
### btrfs-convert /root/btrfsprogs/tests/test.img
trans 7 running 5
ctree.c:363: btrfs_cow_block: Assertion `1` failed.
btrfs-convert(btrfs_cow_block+0x92)[0x40acaf]
btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f]
btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a]
btrfs-convert[0x43422d]
btrfs-convert[0x4342cd]
btrfs-convert[0x4345ca]
btrfs-convert[0x434767]
btrfs-convert[0x435770]
btrfs-convert[0x439748]
btrfs-convert(main+0x13f8)[0x43b09d]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
btrfs-convert[0x407649]
create btrfs filesystem:
	blocksize: 4096
	nodesize: 16384
	features: extref, skinny-metadata (default)
creating btrfs metadata.
creating ext2fs image file.
failed: btrfs-convert /root/btrfsprogs/tests/test.img
test failed for case 004-convert-thread-conflict
#

Note that this bug does not happen every time, especially on slow devices such as loop (a slow CPU with a fast block device is likely to trigger it). I set the loop count to 20 to make the bug happen in 90% of test runs.

Suggested-by: David Sterba
Signed-off-by: Zhao Lei
---
 tests/misc-tests/004-convert-thread-conflict/test.sh | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
 create mode 100755 tests/misc-tests/004-convert-thread-conflict/test.sh

diff --git a/tests/misc-tests/004-convert-thread-conflict/test.sh b/tests/misc-tests/004-convert-thread-conflict/test.sh
new file mode 100755
index 000..09ac8a3
--- /dev/null
+++ b/tests/misc-tests/004-convert-thread-conflict/test.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+# test convert-thread-conflict
+
+source $TOP/tests/common
+
+check_prereq btrfs
+mkfs.ext4 -V &>/dev/null || _not_run "mkfs.ext4 not found"
+prepare_test_dev 1G
+
+for ((i = 0; i < 20; i++)); do
+	echo "loop $i" >>$RESULTS
+	mkfs.ext4 -F "$TEST_DEV" &>>$RESULTS || _not_run "mkfs.ext4 failed"
+	run_check $TOP/btrfs-convert "$TEST_DEV"
+done
--
1.8.5.1
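The test catches an intermittent bug simply by repeating the same command. If the runs were independent, a 90% chance of at least one failure in 20 loops would correspond to roughly an 11% failure rate per run, since 1 - 0.9^(1/20) is about 0.109; that figure is only a back-of-the-envelope estimate, not a measurement. The repeat-until-first-failure pattern can be sketched as follows, where `repeat_check` is a hypothetical helper and not part of tests/common.

```shell
# Hypothetical helper: run a command up to N times and stop at the first
# failure, which is how a loop makes an intermittent bug likely to show.
# $1 = loop count, remaining arguments = command to run
repeat_check()
{
	count=$1
	shift
	i=0
	while [ "$i" -lt "$count" ]; do
		"$@" || { echo "failed in loop $i: $*" >&2; return 1; }
		i=$((i + 1))
	done
}
```

In the test above this would read `repeat_check 20 some_convert_step`, with the mkfs.ext4 and btrfs-convert calls wrapped in one function so that each loop starts from a fresh ext4 filesystem.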
[PATCH v3 4/7] btrfs-progs: resst info->periodic.timer_fd's value after free
From: Zhao Lei

task_period_stop() is used to close a timer explicitly. To avoid the timer handle being closed again by task_stop(), we should reset its value after closing it.

Also reset the value of info->id, for safety.

Signed-off-by: Zhao Lei
---
 task-utils.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 17fd573..768be94 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -64,10 +64,13 @@ void task_stop(struct task_info *info)
 	if (info->id > 0) {
 		pthread_cancel(info->id);
 		pthread_join(info->id, NULL);
+		info->id = -1;
 	}
 
-	if (info->periodic.timer_fd)
+	if (info->periodic.timer_fd) {
 		close(info->periodic.timer_fd);
+		info->periodic.timer_fd = 0;
+	}
 
 	if (info->postfn)
 		info->postfn(info->private_data);
@@ -130,5 +133,6 @@ void task_period_stop(struct task_info *info)
 	if (info->periodic.timer_fd) {
 		timerfd_settime(info->periodic.timer_fd, 0, NULL, NULL);
 		close(info->periodic.timer_fd);
+		info->periodic.timer_fd = -1;
 	}
 }
--
1.8.5.1
[PATCH v3 0/7] btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert
From: Zhao Lei

This is v3 of the patchset: Fix wrong address accessing by subthread in btrfs-convert.

Changelog v2->v3:
1: Add the $TOP path prefix to the btrfs-convert command in:
   [PATCH v2 7/7] btrfs-progs: Introduce a misc test for thread conflict in btrfs-convert
   to check the program in the btrfs source dir.

Changelog v1->v2:
1: Move the bug test script from the patch description into a single script in test/misc-tests, suggested by David Sterba
2: Move code for creating loop device to common [PATCH 6/7]

Zhao Lei (7):
  btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert
  btrfs-progs: Move close timer handle code to line after sub process exit
  btrfs-progs: Remove cleanup-timer code for btrfs-convert
  btrfs-progs: resst info->periodic.timer_fd's value after free
  btrfs-progs: Set info->periodic.timer_fd to 0 in init-fail
  btrfs-progs: Move code to create loop device to common
  btrfs-progs: Introduce a misc test for thread conflict in btrfs-convert

 btrfs-convert.c | 4 +---
 task-utils.c | 22 ++
 tests/common | 16
 tests/fsck-tests/013-extent-tree-rebuild/test.sh | 11 +--
 tests/misc-tests/001-btrfstune-features/test.sh | 11 +--
 tests/misc-tests/002-uuid-rewrite/test.sh | 11 +--
 tests/misc-tests/003-zero-log/test.sh | 11 +--
 .../misc-tests/004-convert-thread-conflict/test.sh | 14 ++
 8 files changed, 49 insertions(+), 51 deletions(-)
 create mode 100755 tests/misc-tests/004-convert-thread-conflict/test.sh

--
1.8.5.1
[PATCH v3 1/7] btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert
From: Zhao Lei

btrfs-convert sometimes shows 'Assertion failed' when converting a nearly blank file system:

create btrfs filesystem:
	blocksize: 4096
	nodesize: 16384
	features: extref, skinny-metadata (default)
creating btrfs metadata.
creating ext2fs image file.
trans 7 running 5
ctree.c:363: btrfs_cow_block: Assertion `1` failed.
btrfs-convert(btrfs_cow_block+0x92)[0x40acaf]
btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f]
btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a]
btrfs-convert[0x43422d]
btrfs-convert[0x4342cd]
btrfs-convert[0x4345ca]
btrfs-convert[0x434767]
btrfs-convert[0x435770]
btrfs-convert[0x439748]
btrfs-convert(main+0x13f8)[0x43b09d]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
btrfs-convert[0x407649]

The reason is complex:
1: The main thread allocates a block of memory, shared with the sub thread
2: The main thread kills the sub thread and frees the above memory
3: The main thread mallocs a new block (at the same address) and uses it
4: The sub thread (which has not really quit) writes into this address and causes this bug.

By adding some debug lines to the code, we can see the following output:

create btrfs filesystem:
	blocksize: 4096
	nodesize: 16384
	features: extref, skinny-metadata (default)
creating btrfs metadata.
1: ctx(0x7ffe1abde230)->info=0xc65b80
2: task_period_start: will create periodic.timer_fd
3: task_stop: info->periodic.timer_fd = NULL
4: task_stop: begin pthread_cancel info->id=-1746053376
5: task_stop: done pthread_cancel ret=0
6: task_stop: begin info->postfn
7: task_period_stop: periodic.timer_fd NULL
8: task_stop: done info->postfn
9: task_stop: done all
10: creating ext2fs image file.
trans 7 running 5
11: task_period_start: create periodic.timer_fd done info->periodic.timer_fd(0xc65b80)=7
12: btrfs_cow_block: root->fs_info->generation(0xc63568) = 5 trans->transid(0xc65b80)=7
13: ctree.c:368: btrfs_cow_block: Assertion `1` failed.
./btrfs-convert(btrfs_cow_block+0xda)[0x40ad37]
./btrfs-convert(btrfs_search_slot+0x1cb)[0x40c5b4]
./btrfs-convert(btrfs_insert_empty_items+0xac)[0x40d9f6]
./btrfs-convert(btrfs_record_file_extent+0xc0)[0x4183fe]
./btrfs-convert[0x435796]
./btrfs-convert[0x439b0c]
./btrfs-convert(main+0x13f8)[0x43b45d]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
./btrfs-convert[0x407689]

Conclusion:
a: The sub thread should have exited before step 5, but it is still running in step 11.
b: task_stop() did not close periodic.timer_fd in step 3, because periodic.timer_fd was not initialized yet.
c: The address 0xc65b80 is overwritten by the sub thread in step 11, but this address was freed and re-malloced by the main thread before step 10, and is used for trans->transid.
d: trans->transid, overwritten by the sub thread, causes the error in step 13.

Fix:
pthread_cancel() only sends a cancellation request to the thread; by default the thread quits at its next cancellation point. To make the sub thread quit in time, this patch adds a pthread_join() after the pthread_cancel() call. And to make pthread_join() work, pthread_detach() is removed.

Signed-off-by: Zhao Lei
---
 task-utils.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/task-utils.c b/task-utils.c
index 10e3f0f..0390a69 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -50,9 +50,7 @@ int task_start(struct task_info *info)
 	ret = pthread_create(&info->id, NULL, info->threadfn, info->private_data);
-	if (ret == 0)
-		pthread_detach(info->id);
-	else
+	if (ret)
 		info->id = -1;
 	return ret;
@@ -66,8 +64,10 @@ void task_stop(struct task_info *info)
 	if (info->periodic.timer_fd)
 		close(info->periodic.timer_fd);
 
-	if (info->id > 0)
+	if (info->id > 0) {
 		pthread_cancel(info->id);
+		pthread_join(info->id, NULL);
+	}
 
 	if (info->postfn)
 		info->postfn(info->private_data);
--
1.8.5.1
[PATCH] btrfs-progs: Show detail error message when write sb failed in write_dev_supers()
From: Zhao Lei

fsck-tests.sh failed with the following message on my node:

# ./fsck-tests.sh
[TEST] 001-bad-file-extent-bytenr
disk-io.c:1444: write_dev_supers: Assertion `ret != BTRFS_SUPER_INFO_SIZE` failed.
/root/btrfsprogs/btrfs-image(write_all_supers+0x2d2)[0x41031c]
/root/btrfsprogs/btrfs-image(write_ctree_super+0xc5)[0x41042e]
/root/btrfsprogs/btrfs-image(btrfs_commit_transaction+0x208)[0x410976]
/root/btrfsprogs/btrfs-image[0x438780]
/root/btrfsprogs/btrfs-image(main+0x3d5)[0x438c5c]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
/root/btrfsprogs/btrfs-image[0x4074e9]
failed to restore image /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr/default_case.img
#
# cat fsck-tests-results.txt
=== Entering /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr
restoring image default_case.img
failed to restore image /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr/default_case.img
#

Reason:
I ran the above test on an NFS mount point. It does not have enough space to write all superblocks to the image file, and it does not support sparse files. So write_dev_supers() failed while writing the sb and printed the above message.

It took me quite some time to find out what had happened. We can save that time by printing exact information in the write-sb-fail case.

After patch:
# ./fsck-tests.sh
[TEST] 001-bad-file-extent-bytenr
WARNING: Write sb failed: File too large
disk-io.c:1492: write_all_supers: Assertion `ret` failed.
...
#

Signed-off-by: Zhao Lei
---
 disk-io.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index fdcfd6d..607d4a1 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1419,7 +1419,8 @@ static int write_dev_supers(struct btrfs_root *root,
 		ret = pwrite64(device->fd, root->fs_info->super_copy,
 			       BTRFS_SUPER_INFO_SIZE,
 			       root->fs_info->super_bytenr);
-		BUG_ON(ret != BTRFS_SUPER_INFO_SIZE);
+		if (ret != BTRFS_SUPER_INFO_SIZE)
+			goto write_err;
 		return 0;
 	}
 
@@ -1441,10 +1442,19 @@ static int write_dev_supers(struct btrfs_root *root,
 		 */
 		ret = pwrite64(device->fd, root->fs_info->super_copy,
 			       BTRFS_SUPER_INFO_SIZE, bytenr);
-		BUG_ON(ret != BTRFS_SUPER_INFO_SIZE);
+		if (ret != BTRFS_SUPER_INFO_SIZE)
+			goto write_err;
 	}
 
 	return 0;
+
+write_err:
+	if (ret > 0)
+		fprintf(stderr, "WARNING: Failed to write all sb data\n");
+	else
+		fprintf(stderr, "WARNING: Write sb failed: %s\n",
+			strerror(errno));
+	return ret;
 }
 
 int write_all_supers(struct btrfs_root *root)
--
1.8.5.1
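The distinction the patch draws between a short write and a failed write can be sketched in isolation as follows. This is only an illustration under assumptions: write_block_at() is a hypothetical helper that does not exist in btrfs-progs, and it uses plain pwrite() rather than pwrite64(). It relies on the documented behaviour that pwrite() returns -1 and sets errno on failure, while a short write returns a positive count.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of btrfs-progs): write len bytes at
 * offset and report why the write was incomplete.
 * Returns 0 on success, -errno on failure, -EIO on a short write.
 */
int write_block_at(int fd, const void *buf, size_t len, off_t offset)
{
	ssize_t ret;

	assert(buf != NULL);
	ret = pwrite(fd, buf, len, offset);

	if (ret == (ssize_t)len)
		return 0;

	if (ret < 0) {
		int err = errno;	/* save before stdio can change it */

		fprintf(stderr, "WARNING: write failed: %s\n", strerror(err));
		return -err;
	}

	/* 0 <= ret < len: short write, errno carries no information */
	fprintf(stderr, "WARNING: short write: %zd of %zu bytes\n", ret, len);
	return -EIO;
}
```

A caller could propagate the negative value instead of asserting, for example `if (write_block_at(fd, sb, sb_size, bytenr) < 0) return -1;`, where fd, sb, sb_size and bytenr are placeholders.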
[PATCH] btrfs-progs-tests: Add '-o loop' to mount command line in convert-tests.sh
From: Zhao Lei

This fixes the following bug:

# ./convert-tests.sh
[TEST] ext2 4k nodesize, btrfs defaults
failed: mount /root/btrfsprogs/tests/test.img /root/btrfsprogs/tests/mnt
# tail convert-tests-results.txt
...
### mount /root/btrfsprogs/tests/test.img /root/btrfsprogs/tests/mnt
mount: /root/btrfsprogs/tests/test.img is not a block device (maybe try `-o loop'?)
failed: mount /root/btrfsprogs/tests/test.img /root/btrfsprogs/tests/mnt
#

Signed-off-by: Zhao Lei
---
 tests/convert-tests.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/convert-tests.sh b/tests/convert-tests.sh
index 14dde1f..efed90b 100755
--- a/tests/convert-tests.sh
+++ b/tests/convert-tests.sh
@@ -43,7 +43,7 @@ convert_test() {
 
 	# create a file to check btrfs-convert can convert regular file
 	# correct
-	run_check $SUDO_HELPER mount $IMAGE $TEST_MNT
+	run_check $SUDO_HELPER mount -o loop $IMAGE $TEST_MNT
 	run_check $SUDO_HELPER dd if=/dev/zero of=$TEST_MNT/test bs=$nodesize \
 		count=1 1>/dev/null 2>&1
 	run_check $SUDO_HELPER umount $TEST_MNT
--
1.8.5.1
[PATCH v2 2/7] btrfs-progs: Move close timer handle code to line after sub process exit
From: Zhao Lei

The timer handle may still be in use by the sub thread, so it is better to close it after the sub thread has exited.

Signed-off-by: Zhao Lei
---
 task-utils.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/task-utils.c b/task-utils.c
index 0390a69..17fd573 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -61,14 +61,14 @@ void task_stop(struct task_info *info)
 	if (!info)
 		return;
 
-	if (info->periodic.timer_fd)
-		close(info->periodic.timer_fd);
-
 	if (info->id > 0) {
 		pthread_cancel(info->id);
 		pthread_join(info->id, NULL);
 	}
 
+	if (info->periodic.timer_fd)
+		close(info->periodic.timer_fd);
+
 	if (info->postfn)
 		info->postfn(info->private_data);
 }
--
1.8.5.1
[PATCH v2 6/7] btrfs-progs: Move code to create loop device to common
From: Zhao Lei This code block is common used, move it to common function will make code clean. Signed-off-by: Zhao Lei --- tests/common | 16 tests/fsck-tests/013-extent-tree-rebuild/test.sh | 11 +-- tests/misc-tests/001-btrfstune-features/test.sh | 11 +-- tests/misc-tests/002-uuid-rewrite/test.sh| 11 +-- tests/misc-tests/003-zero-log/test.sh| 11 +-- 5 files changed, 20 insertions(+), 40 deletions(-) diff --git a/tests/common b/tests/common index 899ec7b..2d337b0 100644 --- a/tests/common +++ b/tests/common @@ -142,3 +142,19 @@ setup_root_helper() fi SUDO_HELPER=root_helper } + +prepare_test_dev() +{ + # num[K/M/G/T...] + local size="$1" + + [[ "$TEST_DEV" ]] && return + [[ "$size" ]] || size='1G' + + echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \ + $RESULTS + TEST_DEV="$TOP/tests/test.img" + + truncate -s "$size" "$TEST_DEV" || _not_run "create file for loop device failed" +} + diff --git a/tests/fsck-tests/013-extent-tree-rebuild/test.sh b/tests/fsck-tests/013-extent-tree-rebuild/test.sh index 3290cd7..88a66cc 100755 --- a/tests/fsck-tests/013-extent-tree-rebuild/test.sh +++ b/tests/fsck-tests/013-extent-tree-rebuild/test.sh @@ -5,16 +5,7 @@ source $TOP/tests/common check_prereq btrfs-debug-tree check_prereq mkfs.btrfs setup_root_helper - -if [ -z $TEST_DEV ]; then - echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \ - $RESULTS - TEST_DEV="$TOP/tests/test.img" - - # Need at least 1G to avoid mixed block group, which extent tree - # rebuild doesn't support. 
- run_check truncate -s 1G $TEST_DEV -fi +prepare_test_dev 1G if [ -z $TEST_MNT ];then echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant" diff --git a/tests/misc-tests/001-btrfstune-features/test.sh b/tests/misc-tests/001-btrfstune-features/test.sh index c9981e6..ea33954 100755 --- a/tests/misc-tests/001-btrfstune-features/test.sh +++ b/tests/misc-tests/001-btrfstune-features/test.sh @@ -7,16 +7,7 @@ check_prereq btrfs-debug-tree check_prereq btrfs-show-super check_prereq mkfs.btrfs setup_root_helper - -if [ -z $TEST_DEV ]; then - echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \ - $RESULTS - TEST_DEV="$TOP/tests/test.img" - - # Need at least 1G to avoid mixed block group, which extent tree - # rebuild doesn't support. - run_check truncate -s 1G $TEST_DEV -fi +prepare_test_dev if [ -z $TEST_MNT ];then echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant" diff --git a/tests/misc-tests/002-uuid-rewrite/test.sh b/tests/misc-tests/002-uuid-rewrite/test.sh index 6c2a393..bffa9b8 100755 --- a/tests/misc-tests/002-uuid-rewrite/test.sh +++ b/tests/misc-tests/002-uuid-rewrite/test.sh @@ -7,16 +7,7 @@ check_prereq btrfs-debug-tree check_prereq btrfs-show-super check_prereq mkfs.btrfs check_prereq btrfstune - -if [ -z $TEST_DEV ]; then - echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \ - $RESULTS - TEST_DEV="$TOP/tests/test.img" - - # Need at least 1G to avoid mixed block group, which extent tree - # rebuild doesn't support. 
- run_check truncate -s 1G $TEST_DEV -fi +prepare_test_dev if [ -z $TEST_MNT ];then echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant" diff --git a/tests/misc-tests/003-zero-log/test.sh b/tests/misc-tests/003-zero-log/test.sh index da5b351..edab5db 100755 --- a/tests/misc-tests/003-zero-log/test.sh +++ b/tests/misc-tests/003-zero-log/test.sh @@ -6,16 +6,7 @@ source $TOP/tests/common check_prereq btrfs-show-super check_prereq mkfs.btrfs check_prereq btrfs - -if [ -z $TEST_DEV ]; then - echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \ - $RESULTS - TEST_DEV="$TOP/tests/test.img" - - # Need at least 1G to avoid mixed block group, which extent tree - # rebuild doesn't support. - run_check truncate -s 1G $TEST_DEV -fi +prepare_test_dev if [ -z $TEST_MNT ];then echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant" -- 1.8.5.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 7/7] btrfs-progs: Introduce a misc test for thread conflict in btrfs-convert
From: Zhao Lei Current code of btrfs-convert have a bug of thread conflict, which caused invalid memory accessing between threads, and make program panic. This patch add a test item for above bug, as: # ./misc-tests.sh [TEST] 001-btrfstune-features [TEST] 002-uuid-rewrite [TEST] 003-zero-log [TEST] 004-convert-thread-conflict failed: btrfs-convert /root/btrfsprogs/tests/test.img test failed for case 004-convert-thread-conflict # # cat misc-tests-results.txt ... ### btrfs-convert /root/btrfsprogs/tests/test.img trans 7 running 5 ctree.c:363: btrfs_cow_block: Assertion `1` failed. btrfs-convert(btrfs_cow_block+0x92)[0x40acaf] btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f] btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a] btrfs-convert[0x43422d] btrfs-convert[0x4342cd] btrfs-convert[0x4345ca] btrfs-convert[0x434767] btrfs-convert[0x435770] btrfs-convert[0x439748] btrfs-convert(main+0x13f8)[0x43b09d] /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd] btrfs-convert[0x407649] create btrfs filesystem: blocksize: 4096 nodesize: 16384 features: extref, skinny-metadata (default) creating btrfs metadata. creating ext2fs image file. failed: btrfs-convert /root/btrfsprogs/tests/test.img test failed for case 004-convert-thread-conflict # Note that this bug is not happened every time, especilly in slow device as loop(slow cpu with fast block device is likely to trigger). I set loop count to 20 to make bug happened in 90% tests. 
Suggested-by: David Sterba Signed-off-by: Zhao Lei --- tests/misc-tests/004-convert-thread-conflict/test.sh | 14 ++ 1 file changed, 14 insertions(+) create mode 100755 tests/misc-tests/004-convert-thread-conflict/test.sh diff --git a/tests/misc-tests/004-convert-thread-conflict/test.sh b/tests/misc-tests/004-convert-thread-conflict/test.sh new file mode 100755 index 000..d02e766 --- /dev/null +++ b/tests/misc-tests/004-convert-thread-conflict/test.sh @@ -0,0 +1,14 @@ +#!/bin/bash +# test convert-thread-conflict + +source $TOP/tests/common + +check_prereq btrfs +mkfs.ext4 -V &>/dev/null || _not_run "mkfs.ext4 not found" +prepare_test_dev 1G + +for ((i = 0; i < 20; i++)); do + echo "loop $i" >>$RESULTS + mkfs.ext4 -F "$TEST_DEV" &>>$RESULTS || _not_run "mkfs.ext4 failed" + run_check btrfs-convert "$TEST_DEV" +done -- 1.8.5.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 5/7] btrfs-progs: Set info->periodic.timer_fd to 0 in init-fail
From: Zhao Lei

In the current code, (info->periodic.timer_fd == 0) means the handle is not valid, as in:

	if (info->periodic.timer_fd) {
		close(info->periodic.timer_fd);
		...

In the create-fail case we need to change its value from -1 to 0, so that code like the above works.

Signed-off-by: Zhao Lei
---
 task-utils.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 768be94..58f5195 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -94,8 +94,10 @@ int task_period_start(struct task_info *info, unsigned int period_ms)
 		return -1;
 
 	info->periodic.timer_fd = timerfd_create(CLOCK_MONOTONIC, 0);
-	if (info->periodic.timer_fd == -1)
+	if (info->periodic.timer_fd == -1) {
+		info->periodic.timer_fd = 0;
 		return info->periodic.timer_fd;
+	}
 
 	info->periodic.wakeups_missed = 0;
 
--
1.8.5.1
[PATCH v2 1/7] btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert
From: Zhao Lei

btrfs-convert sometimes shows 'Assertion failed' when converting a nearly blank file system, as:

create btrfs filesystem:
	blocksize: 4096
	nodesize: 16384
	features: extref, skinny-metadata (default)
creating btrfs metadata.
creating ext2fs image file.
trans 7 running 5
ctree.c:363: btrfs_cow_block: Assertion `1` failed.
btrfs-convert(btrfs_cow_block+0x92)[0x40acaf]
btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f]
btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a]
btrfs-convert[0x43422d]
btrfs-convert[0x4342cd]
btrfs-convert[0x4345ca]
btrfs-convert[0x434767]
btrfs-convert[0x435770]
btrfs-convert[0x439748]
btrfs-convert(main+0x13f8)[0x43b09d]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
btrfs-convert[0x407649]

The reason is complex:
1: The main thread allocates a block of memory that is shared with the sub thread.
2: The main thread kills the sub thread and frees the above memory.
3: The main thread mallocs a new block (at the same address) and uses it.
4: The sub thread (which has not really quit) writes to this address and causes this bug.

By adding some debug lines to the code, we can see the following output:

create btrfs filesystem:
	blocksize: 4096
	nodesize: 16384
	features: extref, skinny-metadata (default)
creating btrfs metadata.
1: ctx(0x7ffe1abde230)->info=0xc65b80
2: task_period_start: will create periodic.timer_fd
3: task_stop: info->periodic.timer_fd = NULL
4: task_stop: begin pthread_cancel info->id=-1746053376
5: task_stop: done pthread_cancel ret=0
6: task_stop: begin info->postfn
7: task_period_stop: periodic.timer_fd NULL
8: task_stop: done info->postfn
9: task_stop: done all
10: creating ext2fs image file.
trans 7 running 5
11: task_period_start: create periodic.timer_fd done info->periodic.timer_fd(0xc65b80)=7
12: btrfs_cow_block: root->fs_info->generation(0xc63568) = 5 trans->transid(0xc65b80)=7
13: ctree.c:368: btrfs_cow_block: Assertion `1` failed.
./btrfs-convert(btrfs_cow_block+0xda)[0x40ad37]
./btrfs-convert(btrfs_search_slot+0x1cb)[0x40c5b4]
./btrfs-convert(btrfs_insert_empty_items+0xac)[0x40d9f6]
./btrfs-convert(btrfs_record_file_extent+0xc0)[0x4183fe]
./btrfs-convert[0x435796]
./btrfs-convert[0x439b0c]
./btrfs-convert(main+0x13f8)[0x43b45d]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
./btrfs-convert[0x407689]

Conclusion:
a: The sub thread should have exited before step 5, but it is still running in step 11.
b: task_stop() did not close periodic.timer_fd in step 3, because periodic.timer_fd was not initialized yet.
c: Address 0xc65b80 is overwritten by the sub thread in step 11, but this address was freed and re-allocated by the main thread before step 10, and is used for trans->transid.
d: trans->transid, overwritten by the sub thread, caused the error in step 13.

Fix:
pthread_cancel() only sends a cancellation request to the thread; by default the thread quits at its next cancellation point. To make sure the sub thread has quit in time, this patch adds pthread_join() after the pthread_cancel() call. And to make pthread_join() work, pthread_detach() is removed.

Signed-off-by: Zhao Lei
---
 task-utils.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/task-utils.c b/task-utils.c
index 10e3f0f..0390a69 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -50,9 +50,7 @@ int task_start(struct task_info *info)
 	ret = pthread_create(&info->id, NULL, info->threadfn,
 			     info->private_data);
 
-	if (ret == 0)
-		pthread_detach(info->id);
-	else
+	if (ret)
 		info->id = -1;
 
 	return ret;
@@ -66,8 +64,10 @@ void task_stop(struct task_info *info)
 	if (info->periodic.timer_fd)
 		close(info->periodic.timer_fd);
 
-	if (info->id > 0)
+	if (info->id > 0) {
 		pthread_cancel(info->id);
+		pthread_join(info->id, NULL);
+	}
 
 	if (info->postfn)
 		info->postfn(info->private_data);
--
1.8.5.1
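The cancel-then-join ordering described in the fix can be sketched on its own as follows. This is a minimal illustration, not the btrfs-progs code: struct progress, progress_worker() and start_and_stop_worker() are hypothetical names, and the worker uses nanosleep() only because it is a standard cancellation point. The memory shared with the worker is freed only after pthread_join() has returned, which is what rules out the late write shown in step 11.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical state shared between the main thread and the worker. */
struct progress {
	volatile unsigned long ticks;
};

static void *progress_worker(void *arg)
{
	struct progress *p = arg;
	struct timespec pause = { 0, 1000000 };	/* 1 ms */

	assert(p != NULL);
	for (;;) {
		p->ticks++;			/* writes into shared memory */
		nanosleep(&pause, NULL);	/* cancellation point */
	}
	return NULL;
}

/* Returns 0 if the worker was cancelled and reaped before the free. */
int start_and_stop_worker(void)
{
	struct progress *p = calloc(1, sizeof(*p));
	pthread_t tid;
	void *res = NULL;

	if (!p)
		return -1;
	if (pthread_create(&tid, NULL, progress_worker, p) != 0) {
		free(p);
		return -1;
	}

	/*
	 * pthread_cancel() only queues the request. pthread_join() waits
	 * until the worker has really exited. On error, p is leaked on
	 * purpose because the worker might still touch it.
	 */
	if (pthread_cancel(tid) != 0 || pthread_join(tid, &res) != 0)
		return -1;

	free(p);	/* safe: no thread can write to p any more */
	return res == PTHREAD_CANCELED ? 0 : -1;
}
```

The thread has to stay joinable for this to work, which is why the patch drops the pthread_detach() call.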
[PATCH v2 3/7] btrfs-progs: Remove cleanup-timer code for btrfs-convert
From: Zhao Lei

There is no need to close the timer handle again in the subthread-close callback; it is closed by task_stop() automatically.

This patch also adds an fflush() so that the log output appears in time.

Signed-off-by: Zhao Lei
---
 btrfs-convert.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index b89c685..7dfab7d 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -71,10 +71,8 @@ static void *print_copied_inodes(void *p)
 
 static int after_copied_inodes(void *p)
 {
-	struct task_ctx *priv = p;
-
 	printf("\n");
-	task_period_stop(priv->info);
+	fflush(stdout);
 
 	return 0;
 }
--
1.8.5.1
[PATCH v2 4/7] btrfs-progs: reset info->periodic.timer_fd's value after free
From: Zhao Lei

task_period_stop() is used to close a timer explicitly. To avoid the timer handle being closed again by task_stop(), we should reset its value after the close.

Also reset info->id after the thread has been joined, for safety.

Signed-off-by: Zhao Lei
---
 task-utils.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 17fd573..768be94 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -64,10 +64,13 @@ void task_stop(struct task_info *info)
 	if (info->id > 0) {
 		pthread_cancel(info->id);
 		pthread_join(info->id, NULL);
+		info->id = -1;
 	}
 
-	if (info->periodic.timer_fd)
+	if (info->periodic.timer_fd) {
 		close(info->periodic.timer_fd);
+		info->periodic.timer_fd = 0;
+	}
 
 	if (info->postfn)
 		info->postfn(info->private_data);
@@ -130,5 +133,6 @@ void task_period_stop(struct task_info *info)
 	if (info->periodic.timer_fd) {
 		timerfd_settime(info->periodic.timer_fd, 0, NULL, NULL);
 		close(info->periodic.timer_fd);
+		info->periodic.timer_fd = -1;
 	}
 }
--
1.8.5.1
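The "reset after close" idea from patches 4/7 and 5/7 can be sketched as a pair of small helpers. This is an illustration only: struct periodic_timer, periodic_timer_open() and periodic_timer_close() are hypothetical names, not btrfs-progs functions. The sketch follows the series in treating 0 as "no timer", but it uses that single sentinel on every path and returns an explicit -1 on creation failure; whether the real code should do the same is left open here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the periodic part of struct task_info. */
struct periodic_timer {
	int timer_fd;	/* 0 means "no timer", as in the series */
};

/* Create the timer; on failure leave the field at the 0 sentinel. */
int periodic_timer_open(struct periodic_timer *t)
{
	int fd;

	assert(t != NULL);
	fd = timerfd_create(CLOCK_MONOTONIC, 0);
	if (fd == -1) {
		t->timer_fd = 0;
		return -1;
	}
	t->timer_fd = fd;
	return 0;
}

/* Close at most once; returns 1 if a descriptor was closed, else 0. */
int periodic_timer_close(struct periodic_timer *t)
{
	assert(t != NULL);
	if (!t->timer_fd)
		return 0;
	close(t->timer_fd);
	t->timer_fd = 0;	/* a second call becomes a no-op */
	return 1;
}
```

With helpers like these, calling the close routine from both a stop callback and a final cleanup path closes the descriptor only once.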
[PATCH v2 0/7] btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert
From: Zhao Lei

This is v2 of the patchset "Fix wrong address accessing by subthread in btrfs-convert".

Changelog:
1: Move the bug test script from the patch description into a separate script in tests/misc-tests, suggested by David Sterba
2: Move the code for creating the loop device to common [PATCH 6/7]

Zhao Lei (7):
  btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert
  btrfs-progs: Move close timer handle code to line after sub process exit
  btrfs-progs: Remove cleanup-timer code for btrfs-convert
  btrfs-progs: reset info->periodic.timer_fd's value after free
  btrfs-progs: Set info->periodic.timer_fd to 0 in init-fail
  btrfs-progs: Move code to create loop device to common
  btrfs-progs: Introduce a misc test for thread conflict in btrfs-convert

 btrfs-convert.c                                      |  4 +---
 task-utils.c                                         | 22 ++++++++++++++--------
 tests/common                                         | 16 ++++++++++++++++
 tests/fsck-tests/013-extent-tree-rebuild/test.sh     | 11 +----------
 tests/misc-tests/001-btrfstune-features/test.sh      | 11 +----------
 tests/misc-tests/002-uuid-rewrite/test.sh            | 11 +----------
 tests/misc-tests/003-zero-log/test.sh                | 11 +----------
 .../misc-tests/004-convert-thread-conflict/test.sh   | 14 ++++++++++++++
 8 files changed, 49 insertions(+), 51 deletions(-)
 create mode 100755 tests/misc-tests/004-convert-thread-conflict/test.sh

--
1.8.5.1
[PATCH 2/5] btrfs-progs: Move close timer handle code to line after sub process exit
From: Zhao Lei

The timer handle may still be in use by the sub thread, so it is better to close it after the sub thread has exited.

Signed-off-by: Zhao Lei
---
 task-utils.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/task-utils.c b/task-utils.c
index 0390a69..17fd573 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -61,14 +61,14 @@ void task_stop(struct task_info *info)
 	if (!info)
 		return;
 
-	if (info->periodic.timer_fd)
-		close(info->periodic.timer_fd);
-
 	if (info->id > 0) {
 		pthread_cancel(info->id);
 		pthread_join(info->id, NULL);
 	}
 
+	if (info->periodic.timer_fd)
+		close(info->periodic.timer_fd);
+
 	if (info->postfn)
 		info->postfn(info->private_data);
 }
--
1.8.5.1
[PATCH 4/5] btrfs-progs: reset info->periodic.timer_fd's value after free
From: Zhao Lei

task_period_stop() is used to close a timer explicitly. To avoid the timer handle being closed again by task_stop(), we should reset its value after the close.

Also reset info->id after the thread has been joined, for safety.

Signed-off-by: Zhao Lei
---
 task-utils.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 17fd573..768be94 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -64,10 +64,13 @@ void task_stop(struct task_info *info)
 	if (info->id > 0) {
 		pthread_cancel(info->id);
 		pthread_join(info->id, NULL);
+		info->id = -1;
 	}
 
-	if (info->periodic.timer_fd)
+	if (info->periodic.timer_fd) {
 		close(info->periodic.timer_fd);
+		info->periodic.timer_fd = 0;
+	}
 
 	if (info->postfn)
 		info->postfn(info->private_data);
@@ -130,5 +133,6 @@ void task_period_stop(struct task_info *info)
 	if (info->periodic.timer_fd) {
 		timerfd_settime(info->periodic.timer_fd, 0, NULL, NULL);
 		close(info->periodic.timer_fd);
+		info->periodic.timer_fd = -1;
 	}
 }
--
1.8.5.1
[PATCH 3/5] btrfs-progs: Remove cleanup-timer code for btrfs-convert
From: Zhao Lei

There is no need to close the timer handle again in the subthread-close callback; it is closed by task_stop() automatically.

This patch also adds an fflush() so that the log output appears in time.

Signed-off-by: Zhao Lei
---
 btrfs-convert.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index b89c685..7dfab7d 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -71,10 +71,8 @@ static void *print_copied_inodes(void *p)
 
 static int after_copied_inodes(void *p)
 {
-	struct task_ctx *priv = p;
-
 	printf("\n");
-	task_period_stop(priv->info);
+	fflush(stdout);
 
 	return 0;
 }
--
1.8.5.1
[PATCH 5/5] btrfs-progs: Set info->periodic.timer_fd to 0 in init-fail
From: Zhao Lei

In the current code, (info->periodic.timer_fd == 0) means the handle is not valid, as in:

	if (info->periodic.timer_fd) {
		close(info->periodic.timer_fd);
		...

In the create-fail case we need to change its value from -1 to 0, so that code like the above works.

Signed-off-by: Zhao Lei
---
 task-utils.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 768be94..58f5195 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -94,8 +94,10 @@ int task_period_start(struct task_info *info, unsigned int period_ms)
 		return -1;
 
 	info->periodic.timer_fd = timerfd_create(CLOCK_MONOTONIC, 0);
-	if (info->periodic.timer_fd == -1)
+	if (info->periodic.timer_fd == -1) {
+		info->periodic.timer_fd = 0;
 		return info->periodic.timer_fd;
+	}
 
 	info->periodic.wakeups_missed = 0;
 
--
1.8.5.1
[PATCH 1/5] btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert
From: Zhao Lei We can reproduce this bug by a simple script: DEV=/dev/vdh for ((i = 0; i < 100; i++)); do echo "loop $i" mkfs.ext4 "$DEV" &>/dev/null || { echo mkfs fail; break; } btrfs-convert "$DEV" || break done Result is like: loop 0 ... loop 1 ... loop 3 create btrfs filesystem: blocksize: 4096 nodesize: 16384 features: extref, skinny-metadata (default) creating btrfs metadata. creating ext2fs image file. trans 7 running 5 ctree.c:363: btrfs_cow_block: Assertion `1` failed. btrfs-convert(btrfs_cow_block+0x92)[0x40acaf] btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f] btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a] btrfs-convert[0x43422d] btrfs-convert[0x4342cd] btrfs-convert[0x4345ca] btrfs-convert[0x434767] btrfs-convert[0x435770] btrfs-convert[0x439748] btrfs-convert(main+0x13f8)[0x43b09d] /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd] btrfs-convert[0x407649] Reason is complex: 1: main thread allocated a block of memory, shared with sub thread 2: main thread killed sub thread, and free above memory 3: main thread malloc a new one(in same address), and use it 4: sub thread(which is not really quit), write into this address, and caused this bug. By adding some debug lines into code, we can see following output: create btrfs filesystem: blocksize: 4096 nodesize: 16384 features: extref, skinny-metadata (default) creating btrfs metadata. 1: ctx(0x7ffe1abde230)->info=0xc65b80 2: task_period_start: will create periodic.timer_fd 3: task_stop: info->periodic.timer_fd = NULL 4: task_stop: begin pthread_cancel info->id=-1746053376 5: task_stop: done pthread_cancel ret=0 6: task_stop: begin info->postfn 7: task_period_stop: periodic.timer_fd NULL 8: task_stop: done info->postfn 9: task_stop: done all 10: creating ext2fs image file. 
trans 7 running 5 11: task_period_start: create periodic.timer_fd done info->periodic.timer_fd(0xc65b80)=7 12: btrfs_cow_block: root->fs_info->generation(0xc63568) = 5 trans->transid(0xc65b80)=7 13: ctree.c:368: btrfs_cow_block: Assertion `1` failed. ./btrfs-convert(btrfs_cow_block+0xda)[0x40ad37] ./btrfs-convert(btrfs_search_slot+0x1cb)[0x40c5b4] ./btrfs-convert(btrfs_insert_empty_items+0xac)[0x40d9f6] ./btrfs-convert(btrfs_record_file_extent+0xc0)[0x4183fe] ./btrfs-convert[0x435796] ./btrfs-convert[0x439b0c] ./btrfs-convert(main+0x13f8)[0x43b45d] /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd] ./btrfs-convert[0x407689] Conclusion: a: subthread should exit before step 5, but it is still running in step 11 b: task_stop() hadn't close periodic.timer_fd in step3, because periodic.timer_fd is not initialized yet. c. address of 0xc65b80 is overwrited by subthread in step 11, but this address is freed and re-malloc by main thread before step 10, and used for trans->transid. d: trans->transid which is overwrite by subthread caused error in step 13. Fix: pthread_cancel() only send a cancellation request to the thread, thread will quit in next cancellation point by default. To make sub thread quit in time, this patch add pthread_join() after pthread_cancel() call. And to make pthread_join() works, pthread_detach() is removed. 
Test result: Passed 400 times of above script Signed-off-by: Zhao Lei --- task-utils.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/task-utils.c b/task-utils.c index 10e3f0f..0390a69 100644 --- a/task-utils.c +++ b/task-utils.c @@ -50,9 +50,7 @@ int task_start(struct task_info *info) ret = pthread_create(&info->id, NULL, info->threadfn, info->private_data); - if (ret == 0) - pthread_detach(info->id); - else + if (ret) info->id = -1; return ret; @@ -66,8 +64,10 @@ void task_stop(struct task_info *info) if (info->periodic.timer_fd) close(info->periodic.timer_fd); - if (info->id > 0) + if (info->id > 0) { pthread_cancel(info->id); + pthread_join(info->id, NULL); + } if (info->postfn) info->postfn(info->private_data); -- 1.8.5.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: Fix scrub panic when leaf across stripes
From: Zhao Lei

Scrub panics in the following operation:
 mkfs.ext4 /dev/vdh
 btrfs-convert /dev/vdh
 mount /dev/vdh /mnt/tmp1
 btrfs scrub start -B /dev/vdh
 (panic)

Reason:
 1: In some cases, a leaf created by btrfs-convert is split across 2 stripes.
 2: Scrub bypasses part of the above wrong leaf data, but the remaining data causes a panic in scrub_checksum_tree_block().

For reason 1:
 We can get the following information after some simple operations:
 a. mkfs.ext4 /dev/vdh
    btrfs-convert /dev/vdh
 b. btrfs-debug-tree /dev/vdh
 We can see the following item in the extent tree:
  item 25 key (27054080 METADATA_ITEM 0) itemoff 15083 itemsize 33
 Its logical address range is [27054080, 27070464), which crosses 2 stripes:
  [27000832, 27066368)
  [27066368, 27131904)
 This will be fixed in btrfs-progs (btrfs-convert, btrfsck, ...).

For reason 2:
 Scrub tries to do a "bypass" in this case, but the result is a panic, because the current code lacks some conditions in the bypass and lets some wrong leaf data escape.

This patch fixes the above scrub code.

Before patch:
 # btrfs scrub start -B /dev/vdh
 (panic)

After patch:
 # btrfs scrub start -B /dev/vdh
 scrub done for 353cec8f-da31-4a94-aa35-be72d997b06e
 ...
 # dmesg
 ...
 [ 59.088697] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27000832
 [ 59.089929] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27066368
 #

Reported-by: Chris Murphy
Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 94db0fa..35d49b2 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2881,11 +2881,12 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 		flags = btrfs_extent_flags(l, extent);
 		generation = btrfs_extent_generation(l, extent);
 
-		if (key.objectid < logic_start &&
-		    (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)) {
-			btrfs_err(fs_info,
-				  "scrub: tree block %llu spanning stripes, ignored. logical=%llu",
-				   key.objectid, logic_start);
+		if ((flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) &&
+		    (key.objectid < logic_start ||
+		     key.objectid + bytes >
+		     logic_start + map->stripe_len)) {
+			btrfs_err(fs_info, "scrub: tree block %llu spanning stripes, ignored. logical=%llu",
+				  key.objectid, logic_start);
 			goto next;
 		}
 again:
@@ -3212,8 +3213,10 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		flags = btrfs_extent_flags(l, extent);
 		generation = btrfs_extent_generation(l, extent);
 
-		if (key.objectid < logical &&
-		    (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)) {
+		if ((flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) &&
+		    (key.objectid < logical ||
+		     key.objectid + bytes >
+		     logical + map->stripe_len)) {
 			btrfs_err(fs_info,
 				  "scrub: tree block %llu spanning "
 				  "stripes, ignored. logical=%llu",
--
1.8.5.1
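The extended condition can be checked against the numbers quoted in the message with a small standalone sketch. tree_block_spans_stripe() is a hypothetical helper written for this illustration, not a kernel function. It assumes half-open ranges, a 16384-byte tree block (the nodesize shown in the btrfs-convert output) and a 65536-byte stripe, which is the width of the two stripe ranges listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the tree block [block_start,
 * block_start + block_len) is not fully contained in the stripe
 * [stripe_start, stripe_start + stripe_len).
 */
bool tree_block_spans_stripe(uint64_t block_start, uint64_t block_len,
			     uint64_t stripe_start, uint64_t stripe_len)
{
	assert(stripe_len > 0);

	/* starts before the stripe: the only case the old code caught */
	if (block_start < stripe_start)
		return true;

	/* ends after the stripe: the case added by the patch */
	return block_start + block_len > stripe_start + stripe_len;
}
```

For the leaf at 27054080, the check is true for both stripes: it ends past 27066368 in the first stripe and starts before 27066368 in the second. This matches the two "spanning stripes, ignored" lines in the dmesg output.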
[PATCH] btrfs-progs: Accurate errormsg for resize operation on no-enough-free-space case
From: Zhao Lei

btrfs-progs prints the following error message when a resize is attempted without enough free space:
 # btrfs filesystem resize +10g /mnt/btrfs_5gb
 Resize '/mnt/btrfs_5gb' of '+10g'
 ERROR: unable to resize '/mnt/btrfs_5gb' - File too large
 #

This is not a good description for users, so this patch changes it to:
 # ./btrfs filesystem resize +10G /mnt/tmp1
 Resize '/mnt/tmp1' of '+10G'
 ERROR: unable to resize '/mnt/tmp1' - no enouth free space
 #

Reported-by: Taeha Kim
Signed-off-by: Zhao Lei
---
 cmds-filesystem.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 800aa4d..c393ce7 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -1327,8 +1327,16 @@ static int cmd_resize(int argc, char **argv)
 	e = errno;
 	close_file_or_dir(fd, dirstream);
 	if( res < 0 ){
-		fprintf(stderr, "ERROR: unable to resize '%s' - %s\n",
-			path, strerror(e));
+		switch (e) {
+		case EFBIG:
+			fprintf(stderr, "ERROR: unable to resize '%s' - no enouth free space\n",
+				path);
+			break;
+		default:
+			fprintf(stderr, "ERROR: unable to resize '%s' - %s\n",
+				path, strerror(e));
+			break;
+		}
 		return 1;
 	} else if (res > 0) {
 		const char *err_str = btrfs_err_str(res);
--
1.8.5.1
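The errno-to-message mapping in this patch can also be written as a small lookup function, which keeps a single fprintf() call at the call site. The sketch below is an illustration only: resize_errno_text() is a hypothetical name, not a btrfs-progs function, and the wording of the message is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: return a user-oriented description for an errno
 * value from a failed resize, falling back to strerror().
 */
const char *resize_errno_text(int err)
{
	assert(err >= 0);

	switch (err) {
	case EFBIG:
		/* the kernel reports EFBIG when the new size does not fit */
		return "not enough free space";
	default:
		return strerror(err);
	}
}
```

A call site could then read `fprintf(stderr, "ERROR: unable to resize '%s' - %s\n", path, resize_errno_text(e));`, with path and e as in the diff above.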
[PATCH 2/2] btrfs: Bypass unrelated items before accessing its contents in scrub
From: Zhao Lei

When we walk extent_root in scrub_stripe() and scrub_raid56_parity(), we should skip unrelated tree items first, before using an item's contents in other conditions.

This is not a bug fix; it only puts the checks in a logical order.

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index d72e8e1..3e8b3fc 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2856,6 +2856,10 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 		}
 		btrfs_item_key_to_cpu(l, &key, slot);
 
+		if (key.type != BTRFS_EXTENT_ITEM_KEY &&
+		    key.type != BTRFS_METADATA_ITEM_KEY)
+			goto next;
+
 		if (key.type == BTRFS_METADATA_ITEM_KEY)
 			bytes = root->nodesize;
 		else
@@ -2864,10 +2868,6 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 		if (key.objectid + bytes <= logic_start)
 			goto next;
 
-		if (key.type != BTRFS_EXTENT_ITEM_KEY &&
-		    key.type != BTRFS_METADATA_ITEM_KEY)
-			goto next;
-
 		if (key.objectid >= logic_end) {
 			stop_loop = 1;
 			break;
@@ -3192,6 +3192,10 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		}
 		btrfs_item_key_to_cpu(l, &key, slot);
 
+		if (key.type != BTRFS_EXTENT_ITEM_KEY &&
+		    key.type != BTRFS_METADATA_ITEM_KEY)
+			goto next;
+
 		if (key.type == BTRFS_METADATA_ITEM_KEY)
 			bytes = root->nodesize;
 		else
@@ -3200,10 +3204,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		if (key.objectid + bytes <= logical)
 			goto next;
 
-		if (key.type != BTRFS_EXTENT_ITEM_KEY &&
-		    key.type != BTRFS_METADATA_ITEM_KEY)
-			goto next;
-
 		if (key.objectid >= logical + map->stripe_len) {
 			/* out of this device extent */
 			if (key.objectid >= logic_end)
--
1.8.5.1
[PATCH 1/2] btrfs: Load only necessary csums into list in scrub
From: Zhao Lei

We do not need to load the csums of the whole stripe in scrub, because
the stripe is trimmed before use. In other words, the only data whose
csums we need is the range
[extent_logical, extent_logical + extent_len).

This patch uses that range for btrfs_lookup_csums_range() in
scrub_stripe().

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 7f56603..d72e8e1 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3251,9 +3251,11 @@ again:
 						   &extent_dev,
 						   &extent_mirror_num);
 
-			ret = btrfs_lookup_csums_range(csum_root, logical,
-						logical + map->stripe_len - 1,
-						&sctx->csum_list, 1);
+			ret = btrfs_lookup_csums_range(csum_root,
+						       extent_logical,
+						       extent_logical +
+						       extent_len - 1,
+						       &sctx->csum_list, 1);
 			if (ret)
 				goto out;
 
-- 
1.8.5.1
[PATCH] btrfs: Fix calculate typo caused by ambiguous meaning of logic_end
From: Zhao Lei

For example, scrub_raid56_parity() uses the following lines to judge
whether all data has been processed:

  place1: if (key.objectid > logic_end) ...
  place2: if (logic_start >= logic_end) ...
  ...

(place2 is a typo; it should be ">". It was copied from another place
where logic_end has a different meaning. Long story...)

We could fix the typo directly, but the root cause is the ambiguous
meaning of logic_end in the raid56 parity scrub.

Elsewhere, XXX_end points to the first byte that is not included, and
we process the segment [XXX_start, XXX_end).

But in the raid56 parity scrub, logic_end points to the last byte that
needs processing, which introduces many "+ 1" and "- 1" in the code,
as below:

  length = sparity->logic_end - sparity->logic_start + 1
  logic_end - logic_start + 1
  stripe_logical + increment - 1

This patch changes logic_end to the usual meaning in the raid56 parity
functions and data structure, along with the above bugfix.

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 24720f6..7f56603 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2702,7 +2702,7 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
 			   sparity->nsectors))
 		goto out;
 
-	length = sparity->logic_end - sparity->logic_start + 1;
+	length = sparity->logic_end - sparity->logic_start;
 	ret = btrfs_map_sblock(sctx->dev_root->fs_info, WRITE,
 			       sparity->logic_start,
 			       &length, &bbio, 0, 1);
@@ -2868,7 +2868,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 			    key.type != BTRFS_METADATA_ITEM_KEY)
 				goto next;
 
-			if (key.objectid > logic_end) {
+			if (key.objectid >= logic_end) {
 				stop_loop = 1;
 				break;
 			}
@@ -2957,7 +2957,7 @@ next:
 out:
 	if (ret < 0)
 		scrub_parity_mark_sectors_error(sparity, logic_start,
-						logic_end - logic_start + 1);
+						logic_end - logic_start);
 	scrub_parity_put(sparity);
 	scrub_submit(sctx);
 	mutex_lock(&sctx->wr_ctx.wr_lock);
@@ -3138,7 +3138,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 			logical += base;
 			if (ret) {
 				stripe_logical += base;
-				stripe_end = stripe_logical + increment - 1;
+				stripe_end = stripe_logical + increment;
 				ret = scrub_raid56_parity(sctx, map, scrub_dev,
 						ppath, stripe_logical,
 						stripe_end);
@@ -3284,7 +3284,7 @@ loop:
 				if (ret && physical < physical_end) {
 					stripe_logical += base;
 					stripe_end = stripe_logical +
-							increment - 1;
+							increment;
 					ret = scrub_raid56_parity(sctx, map,
 							scrub_dev, ppath,
 							stripe_logical,
-- 
1.8.5.1
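The half-open convention [XXX_start, XXX_end) that the logic_end patch above switches to can be illustrated with a small standalone sketch. The struct and helper names below are hypothetical and are not taken from fs/btrfs/scrub.c; they only show why the "+ 1" and "- 1" adjustments disappear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open byte range [start, end); 'end' itself is excluded. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/* The length needs no "+ 1" because 'end' is one past the last byte. */
static uint64_t byte_range_len(struct byte_range r)
{
	assert(r.end >= r.start);
	return r.end - r.start;
}

/* A position at or beyond 'end' is outside the range, hence ">=". */
static int byte_range_is_past(struct byte_range r, uint64_t pos)
{
	return pos >= r.end;
}

/* The next range starts exactly where this one ends, with no "- 1". */
static struct byte_range byte_range_next(struct byte_range r, uint64_t increment)
{
	struct byte_range next = { r.end, r.end + increment };

	return next;
}
```

With an inclusive end, each of these three helpers would need its own off-by-one correction, which is the kind of mismatch the commit message describes for place1 and place2.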
[PATCH 1/2] btrfs: Check cancel and pause in interval of scrub operation
From: Zhao Lei

The old code checks for cancel and pause requests in the middle of the
per-stripe scrub loop, like:

  loop() {
    if (parity) {
      scrub_parity_stripe();
      continue;
    }

    check_cancel_and_pause()

    scrub_normal_stripe();
  }

The reason is that when raid56 stripe scrub was introduced, the new
code was simply inserted at the front of the loop.

Better:

  loop() {
    check_cancel_and_pause()

    if (parity)
      scrub_parity_stripe();
    else
      scrub_normal_stripe();
  }

This patch moves the code to realize the above sequence.

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 94db0fa..f8551b9 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3104,22 +3104,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 	 */
 	ret = 0;
 	while (physical < physical_end) {
-		/* for raid56, we skip parity stripe */
-		if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-			ret = get_raid56_logic_offset(physical, num,
-					map, &logical, &stripe_logical);
-			logical += base;
-			if (ret) {
-				stripe_logical += base;
-				stripe_end = stripe_logical + increment - 1;
-				ret = scrub_raid56_parity(sctx, map, scrub_dev,
-						ppath, stripe_logical,
-						stripe_end);
-				if (ret)
-					goto out;
-				goto skip;
-			}
-		}
 		/*
 		 * canceled?
 		 */
@@ -3144,6 +3128,24 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 			scrub_blocked_if_needed(fs_info);
 		}
 
+		/* for raid56, we skip parity stripe */
+		if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+			ret = get_raid56_logic_offset(physical, num, map,
+						      &logical,
+						      &stripe_logical);
+			logical += base;
+			if (ret) {
+				stripe_logical += base;
+				stripe_end = stripe_logical + increment - 1;
+				ret = scrub_raid56_parity(sctx, map, scrub_dev,
+							  ppath, stripe_logical,
+							  stripe_end);
+				if (ret)
+					goto out;
+				goto skip;
+			}
+		}
+
 		if (btrfs_fs_incompat(fs_info, SKINNY_METADATA))
 			key.type = BTRFS_METADATA_ITEM_KEY;
 		else
-- 
1.8.5.1
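The reordered loop from the cancel/pause patch above can be modelled with a toy sketch. Everything here is hypothetical (the enum, the struct and scrub_sim_run() are not kernel code); it only shows that, with the check at the top of the loop, a cancel request is honoured even when every stripe is a parity stripe.

```c
/* Toy model of a scrub loop; not kernel code. */
enum stripe_kind { STRIPE_DATA, STRIPE_PARITY };

struct scrub_sim {
	int cancel_after;	/* cancel is requested once this many stripes are done; -1 = never */
	int data_done;		/* number of data stripes scrubbed */
	int parity_done;	/* number of parity stripes scrubbed */
};

/* Returns 0 when all stripes were scrubbed, -1 when the run was cancelled. */
static int scrub_sim_run(struct scrub_sim *s, const enum stripe_kind *stripes, int n)
{
	for (int i = 0; i < n; i++) {
		/* Check first, so parity and data stripes are treated alike. */
		if (s->cancel_after >= 0 && i >= s->cancel_after)
			return -1;

		if (stripes[i] == STRIPE_PARITY)
			s->parity_done++;
		else
			s->data_done++;
	}
	return 0;
}
```

In the old ordering, the parity branch would jump to the next iteration before the check, so a run of parity stripes could not be interrupted.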
[PATCH 2/2] btrfs: Free checksum list on scrub_extent() fail
From: Zhao Lei

When scrub_extent() fails, we need to free the previously created
checksum list.

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f8551b9..24720f6 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2923,10 +2923,12 @@ again:
 						      extent_dev, flags,
 						      generation,
 						      extent_mirror_num);
+
+			scrub_free_csums(sctx);
+
 			if (ret)
 				goto out;
 
-			scrub_free_csums(sctx);
 			if (extent_logical + extent_len <
 			    key.objectid + bytes) {
 				logic_start += map->stripe_len;
@@ -3259,10 +3261,12 @@ again:
 					   extent_physical, extent_dev, flags,
 					   generation, extent_mirror_num,
 					   extent_logical - logical + physical);
+
+			scrub_free_csums(sctx);
+
 			if (ret)
 				goto out;
 
-			scrub_free_csums(sctx);
 			if (extent_logical + extent_len <
 			    key.objectid + bytes) {
 				if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-- 
1.8.5.1
[PATCH] btrfs: Show detail information when mount failed on missing devices
From: Zhao Lei

When a mount fails because of missing devices, we see the following
dmesg:

  [ 1060.267743] BTRFS: too many missing devices, writeable mount is not allowed
  [ 1060.273158] BTRFS: open_ctree failed

This patch adds missing_device_number and
tolerated_missing_device_number to the above output, to let the user
know what really happened and to help with bug reports and debugging.

dmesg after patch:

  [ 127.050367] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
  [ 127.056099] BTRFS: open_ctree failed

Changelog v1->v2:
1: Changed to a clearer description, suggested by Anand Jain

Suggested-by: Anand Jain
Signed-off-by: Zhao Lei
---
 fs/btrfs/disk-io.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2eda03b..5b44e02 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2950,8 +2950,9 @@ retry_root_backup:
 	if (fs_info->fs_devices->missing_devices >
 	     fs_info->num_tolerated_disk_barrier_failures &&
 	    !(sb->s_flags & MS_RDONLY)) {
-		printk(KERN_WARNING "BTRFS: "
-			"too many missing devices, writeable mount is not allowed\n");
+		pr_warn("BTRFS: missing devices(%llu) exceeds the limit(%d), writeable mount is not allowed\n",
+			fs_info->fs_devices->missing_devices,
+			fs_info->num_tolerated_disk_barrier_failures);
 		goto fail_sysfs;
 	}
 
-- 
1.8.5.1
[PATCH v2] btrfs: Add raid56 support for updating num_tolerated_disk_barrier_failures in btrfs_balance()
From: Zhao Lei

The code that updates fs_info->num_tolerated_disk_barrier_failures in
btrfs_balance() lacks raid56 support.

Reason:
That code was written on 2012-08-01, together with the first version of
btrfs_calc_num_tolerated_disk_barrier_failures().

Later, btrfs_calc_num_tolerated_disk_barrier_failures() was updated to
support raid56, but the code in btrfs_balance() was not updated with
it.

Fix:
Merge the similar code into a common function,
btrfs_get_num_tolerated_disk_barrier_failures(), and make it support
both cases.

This fixes the bug with a bonus of cleanup, and keeps the two code
paths from getting out of sync again.

Changelog v1->v2:
1: Use a common function instead of adding an extra argument to
   btrfs_calc_num_tolerated_disk_barrier_failures(), which is used by
   many other functions. Suggested-by: Anand Jain

Suggested-by: Anand Jain
Signed-off-by: Zhao Lei
---
 fs/btrfs/disk-io.c | 47 +--
 fs/btrfs/disk-io.h | 1 +
 fs/btrfs/volumes.c | 21 -
 3 files changed, 30 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b6600c7..1e5dcfd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3440,6 +3440,26 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 	return 0;
 }
 
+int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
+{
+	if ((flags & (BTRFS_BLOCK_GROUP_DUP |
+		      BTRFS_BLOCK_GROUP_RAID0 |
+		      BTRFS_AVAIL_ALLOC_BIT_SINGLE)) ||
+	    ((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0))
+		return 0;
+
+	if (flags & (BTRFS_BLOCK_GROUP_RAID1 |
+		     BTRFS_BLOCK_GROUP_RAID5 |
+		     BTRFS_BLOCK_GROUP_RAID10))
+		return 1;
+
+	if (flags & BTRFS_BLOCK_GROUP_RAID6)
+		return 2;
+
+	pr_warn("BTRFS: unknown raid type: %llu\n", flags);
+	return 0;
+}
+
 int btrfs_calc_num_tolerated_disk_barrier_failures(
 	struct btrfs_fs_info *fs_info)
 {
@@ -3482,28 +3502,11 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
 			if (space.total_bytes == 0 || space.used_bytes == 0)
 				continue;
 			flags = space.flags;
-			/*
-			 * return
-			 * 0: if dup, single or RAID0 is configured for
-			 *    any of metadata, system or data, else
-			 * 1: if RAID5 is configured, or if RAID1 or
-			 *    RAID10 is configured and only two mirrors
-			 *    are used, else
-			 * 2: if RAID6 is configured
-			 */
-			if (num_tolerated_disk_barrier_failures > 0 &&
-			    ((flags & (BTRFS_BLOCK_GROUP_DUP |
-				       BTRFS_BLOCK_GROUP_RAID0)) ||
-			     ((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0)))
-				num_tolerated_disk_barrier_failures = 0;
-			else if (num_tolerated_disk_barrier_failures > 1 &&
-				 (flags & (BTRFS_BLOCK_GROUP_RAID1 |
-					   BTRFS_BLOCK_GROUP_RAID5 |
-					   BTRFS_BLOCK_GROUP_RAID10)))
-				num_tolerated_disk_barrier_failures = 1;
-			else if (num_tolerated_disk_barrier_failures > 2 &&
-				 (flags & BTRFS_BLOCK_GROUP_RAID6))
-				num_tolerated_disk_barrier_failures = 2;
+
+			num_tolerated_disk_barrier_failures = min(
+				num_tolerated_disk_barrier_failures,
+				btrfs_get_num_tolerated_disk_barrier_failures(
+					flags));
 		}
 		up_read(&sinfo->groups_sem);
 	}
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index d4cbfee..bdfb479 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -139,6 +139,7 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 				     u64 objectid);
 int btree_lock_page_hook(struct page *page, void *data,
 				void (*flush_fn)(void *));
+int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
 int btrfs_calc_num_tolerated_disk_barrier_failures(
 	struct btrfs_fs_info *fs_info);
 int __init btrfs_end_io_wq_init(void);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fbe7c10..a4392ad 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3573,23 +3573,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	} while (read_seqretry(&fs_info->profiles_lock, seq));
 
 	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-
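The "per-profile value, then minimum over all profiles in use" shape of the v2 patch above can be sketched outside the kernel as follows. The PROFILE_* bit values are illustrative stand-ins rather than the kernel's definitions, the single profile is modelled simply as "no profile bit set", and both function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative stand-ins for block group profile bits; values are assumptions. */
#define PROFILE_RAID0	(1ULL << 0)
#define PROFILE_RAID1	(1ULL << 1)
#define PROFILE_DUP	(1ULL << 2)
#define PROFILE_RAID10	(1ULL << 3)
#define PROFILE_RAID5	(1ULL << 4)
#define PROFILE_RAID6	(1ULL << 5)
#define PROFILE_MASK	(PROFILE_RAID0 | PROFILE_RAID1 | PROFILE_DUP | \
			 PROFILE_RAID10 | PROFILE_RAID5 | PROFILE_RAID6)

/* How many missing devices one profile tolerates (same order of checks as the patch). */
static int tolerated_failures_for(uint64_t flags)
{
	if ((flags & (PROFILE_DUP | PROFILE_RAID0)) || (flags & PROFILE_MASK) == 0)
		return 0;
	if (flags & (PROFILE_RAID1 | PROFILE_RAID5 | PROFILE_RAID10))
		return 1;
	if (flags & PROFILE_RAID6)
		return 2;
	return 0;
}

/* The filesystem as a whole tolerates only as much as its weakest profile. */
static int tolerated_failures_min(const uint64_t *profiles, int count, int num_devices)
{
	int result = num_devices;

	for (int i = 0; i < count; i++) {
		int t = tolerated_failures_for(profiles[i]);

		if (t < result)
			result = t;
	}
	return result;
}
```

Having one function answer the per-profile question is what lets both the mount-time calculation and the balance-time check share the same table, which is the point of the v2 change.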
[PATCH] [RFC] btrfs: Avoid using single-type chunk on degree mode
From: Zhao Lei

We can get a mount failure with the following operations:

  # mkfs a raid1 filesystem
  mkfs.btrfs -f -d raid1 -m raid1 /dev/vdd /dev/vde

  # destroy a disk
  dd if=/dev/zero of=/dev/vde bs=1M count=1

  # do some fs operations in degraded mode
  mount -o degraded /dev/vdd /mnt/test
  touch /mnt/test/123
  rm -f /mnt/test/123
  sync
  umount /mnt/test

  # mount fs again
  mount -o degraded /dev/vdd /mnt/test

The last mount prints the following error message:

  mount: wrong fs type, bad option, bad superblock on /dev/vdd,
         missing codepage or helper program, or other error
         In some cases useful info is found in syslog - try
         dmesg | tail or so

with the following dmesg:

  [ 127.912406] BTRFS: too many missing devices(1 > 0), writeable mount is not allowed
  [ 127.918128] BTRFS: open_ctree failed

Reason:
When we do fs operations on a degraded fs, btrfs_reduce_alloc_profile()
may clear all existing raid mode flags because there are not enough
disks, and return an all-zero raid flag. That flag is then used for
find_free_extent(), and data is written into a single-type chunk.

With the current version of mkfs, we have 3 single-type chunks at init,
and data is written to those chunks first.

With mkfs after Qu Wenruo's patch to avoid creating those 3 single-type
init chunks, find_free_extent() will create these chunks.

And because the filesystem has data in single-mode chunks,
btrfs_calc_num_tolerated_disk_barrier_failures() returns 0. That is, a
filesystem that has lost one disk is not allowed to mount, which causes
the above mount failure.

Fix:
This problem has multiple causes, but the main one may be: we should
not write data into a single-mode chunk in degraded mode, unless the
filesystem was created with single.

This patch adds a condition before find_free_extent(): if the
filesystem was not created with single mode (it has another raid mode),
we forbid writing new data to single chunks.

Fix result:
This patch fixes the above bug, but we cannot write any data into the
filesystem in the above degraded mount. (Before the patch, data was
written to a single-mode chunk.)
This differs from the old behavior; which is better (allowing or not
allowing writes into single-mode chunks)?
Or is there a better way to fix this bug?

Signed-off-by: Zhao Lei
---
 fs/btrfs/ctree.h | 3 ++-
 fs/btrfs/extent-tree.c | 60 +-
 fs/btrfs/inode.c | 3 ++-
 fs/btrfs/super.c | 2 +-
 fs/btrfs/volumes.c | 4 ++--
 5 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3b69324..11a5c4a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3439,7 +3439,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
 void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans,
 				       struct btrfs_root *root);
-u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data);
+u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data,
+			    int no_device_reduce);
 void btrfs_clear_space_info_full(struct btrfs_fs_info *info);
 
 enum btrfs_reserve_flush_enum {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1c2bd17..3cdbb1c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3737,7 +3737,8 @@ static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags)
  * progress (either running or paused) picks the target profile (if it's
  * already available), otherwise falls back to plain reducing.
  */
-static u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
+static u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags,
+				      int no_device_reduce)
 {
 	u64 num_devices = root->fs_info->fs_devices->rw_devices;
 	u64 target;
@@ -3759,13 +3760,16 @@ static u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 	spin_unlock(&root->fs_info->balance_lock);
 
 	/* First, mask out the RAID levels which aren't possible */
-	if (num_devices == 1)
-		flags &= ~(BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID0 |
-			   BTRFS_BLOCK_GROUP_RAID5);
-	if (num_devices < 3)
-		flags &= ~BTRFS_BLOCK_GROUP_RAID6;
-	if (num_devices < 4)
-		flags &= ~BTRFS_BLOCK_GROUP_RAID10;
+	if (!no_device_reduce) {
+		if (num_devices == 1)
+			flags &= ~(BTRFS_BLOCK_GROUP_RAID1 |
+				   BTRFS_BLOCK_GROUP_RAID0 |
+				   BTRFS_BLOCK_GROUP_RAID5);
+		if (num_devices < 3)
+			flags &= ~BTRFS_BLOCK_GROUP_RAID6;
+		if (num_devices < 4)
+			flags &= ~BTRFS_BLOCK_GROUP_RAID10;
+	}
 
 	tmp = flags & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RA
[PATCH] btrfs: Add raid56 support for updating num_tolerated_disk_barrier_failures in btrfs_balance()
From: Zhao Lei

The code that updates fs_info->num_tolerated_disk_barrier_failures in
btrfs_balance() lacks raid56 support.

Reason:
That code was written on 2012-08-01, together with the first version of
btrfs_calc_num_tolerated_disk_barrier_failures().

Later, btrfs_calc_num_tolerated_disk_barrier_failures() was updated to
support raid56, but the code in btrfs_balance() was not updated with
it.

Fix:
Merge the similar code by adding an argument to
btrfs_calc_num_tolerated_disk_barrier_failures() so that it supports
both cases.

This fixes the bug with a bonus of cleanup, and keeps the two code
paths from getting out of sync again.

Signed-off-by: Zhao Lei
---
 fs/btrfs/disk-io.c | 9 +
 fs/btrfs/disk-io.h | 2 +-
 fs/btrfs/volumes.c | 28 +---
 3 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b6600c7..ac26111 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2946,7 +2946,7 @@ retry_root_backup:
 		goto fail_sysfs;
 	}
 	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
+		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info, 0);
 	if (fs_info->fs_devices->missing_devices >
 	     fs_info->num_tolerated_disk_barrier_failures &&
 	    !(sb->s_flags & MS_RDONLY)) {
@@ -3441,7 +3441,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 }
 
 int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info)
+	struct btrfs_fs_info *fs_info, u64 extra_flags)
 {
 	struct btrfs_ioctl_space_info space;
 	struct btrfs_space_info *sinfo;
@@ -3481,7 +3481,7 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
 						   &space);
 			if (space.total_bytes == 0 || space.used_bytes == 0)
 				continue;
-			flags = space.flags;
+			flags = space.flags | extra_flags;
 			/*
 			 * return
 			 * 0: if dup, single or RAID0 is configured for
@@ -3493,7 +3493,8 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
 			 */
 			if (num_tolerated_disk_barrier_failures > 0 &&
 			    ((flags & (BTRFS_BLOCK_GROUP_DUP |
-				       BTRFS_BLOCK_GROUP_RAID0)) ||
+				       BTRFS_BLOCK_GROUP_RAID0 |
+				       BTRFS_AVAIL_ALLOC_BIT_SINGLE)) ||
 			     ((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0)))
 				num_tolerated_disk_barrier_failures = 0;
 			else if (num_tolerated_disk_barrier_failures > 1 &&
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index d4cbfee..aceaa8d 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -140,7 +140,7 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 int btree_lock_page_hook(struct page *page, void *data,
 				void (*flush_fn)(void *));
 int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info);
+	struct btrfs_fs_info *fs_info, u64 extra_flags);
 int __init btrfs_end_io_wq_init(void);
 void btrfs_end_io_wq_exit(void);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fbe7c10..d739915 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1812,7 +1812,8 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 	}
 
 	root->fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
+		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info,
+							       0);
 
 	/*
 	 * at this point, the device is zero sized. We want to
@@ -2342,7 +2343,8 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 	}
 
 	root->fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
+		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info,
+							       0);
 	ret = btrfs_commit_transaction(trans, root);
 
 	if (seeding_dev) {
@@ -3573,23 +3575,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	} while (read_seqretry(&fs_info->profiles_lock, seq));
 
 	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		int num_tolerated_disk_barrier_failures;
-		u64 target = bctl->sys.target;
-
-		num_tolerated_disk_barrier_failures =
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-		i
[PATCH] btrfs: Cleanup for btrfs_calc_num_tolerated_disk_barrier_failures()
From: Zhao Lei

1: Use ARRAY_SIZE(types) to replace a static-value variable:
   int num_types = 4;

2: Use 'continue' on the condition to remove one level of indentation:
   if (!XXX) {
       code;
       ...
   }
   ->
   if (XXX)
       continue;
   code;
   ...

3: Move setting 'num_tolerated_disk_barrier_failures = 2' under a
   (num_tolerated_disk_barrier_failures > 2) condition to make the
   logic neat:
   if (num_tolerated_disk_barrier_failures > 0 && XXX)
       num_tolerated_disk_barrier_failures = 0;
   else if (num_tolerated_disk_barrier_failures > 1) {
       if (XXX)
           num_tolerated_disk_barrier_failures = 1;
       else if (XXX)
           num_tolerated_disk_barrier_failures = 2;
   ->
   if (num_tolerated_disk_barrier_failures > 0 && XXX)
       num_tolerated_disk_barrier_failures = 0;
   if (num_tolerated_disk_barrier_failures > 1 && XXX)
       num_tolerated_disk_barrier_failures = 1;
   if (num_tolerated_disk_barrier_failures > 2 && XXX)
       num_tolerated_disk_barrier_failures = 2;

4: Remove the comment:
   num_mirrors - 1: if RAID1 or RAID10 is configured and more than
   2 mirrors are used.
   which does not match the code.

Signed-off-by: Zhao Lei
---
 fs/btrfs/disk-io.c | 73 --
 1 file changed, 33 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f556c37..2eda03b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3448,13 +3448,12 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
 		       BTRFS_BLOCK_GROUP_SYSTEM,
 		       BTRFS_BLOCK_GROUP_METADATA,
 		       BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
-	int num_types = 4;
 	int i;
 	int c;
 	int num_tolerated_disk_barrier_failures =
 		(int)fs_info->fs_devices->num_devices;
 
-	for (i = 0; i < num_types; i++) {
+	for (i = 0; i < ARRAY_SIZE(types); i++) {
 		struct btrfs_space_info *tmp;
 
 		sinfo = NULL;
@@ -3472,44 +3471,38 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
 
 		down_read(&sinfo->groups_sem);
 		for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
-			if (!list_empty(&sinfo->block_groups[c])) {
-				u64 flags;
-
-				btrfs_get_block_group_info(
-					&sinfo->block_groups[c], &space);
-				if (space.total_bytes == 0 ||
-				    space.used_bytes == 0)
-					continue;
-				flags = space.flags;
-				/*
-				 * return
-				 * 0: if dup, single or RAID0 is configured for
-				 *    any of metadata, system or data, else
-				 * 1: if RAID5 is configured, or if RAID1 or
-				 *    RAID10 is configured and only two mirrors
-				 *    are used, else
-				 * 2: if RAID6 is configured, else
-				 * num_mirrors - 1: if RAID1 or RAID10 is
-				 *                  configured and more than
-				 *                  2 mirrors are used.
-				 */
-				if (num_tolerated_disk_barrier_failures > 0 &&
-				    ((flags & (BTRFS_BLOCK_GROUP_DUP |
-					       BTRFS_BLOCK_GROUP_RAID0)) ||
-				     ((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK)
-				      == 0)))
-					num_tolerated_disk_barrier_failures = 0;
-				else if (num_tolerated_disk_barrier_failures > 1) {
-					if (flags & (BTRFS_BLOCK_GROUP_RAID1 |
-					    BTRFS_BLOCK_GROUP_RAID5 |
-					    BTRFS_BLOCK_GROUP_RAID10)) {
-						num_tolerated_disk_barrier_failures = 1;
-					} else if (flags &
-						   BTRFS_BLOCK_GROUP_RAID6) {
-						num_tolerated_disk_barrier_failures = 2;
-					}
-				}
-			}
+			u64 flags;
+
+			if (list_empty(&sinfo->block_groups[c]))
+				continue;
+
+			btrfs_get_block_group_info(&sinfo->block_groups[c],
+						   &space);
+			if (space.total_bytes == 0 || space.used_bytes == 0)
+
[PATCH] btrfs: Show detail information when mount failed on missing devices
From: Zhao Lei

When a mount fails because of missing devices, we see the following
dmesg:

  [ 1060.267743] BTRFS: too many missing devices, writeable mount is not allowed
  [ 1060.273158] BTRFS: open_ctree failed

This patch adds missing_device_number and
tolerated_missing_device_number to the above output, to let the user
know what really happened and to help with bug reports and debugging.

dmesg after patch:

  [ 127.912406] BTRFS: too many missing devices(1 > 0), writeable mount is not allowed
  [ 127.918128] BTRFS: open_ctree failed

Signed-off-by: Zhao Lei
---
 fs/btrfs/disk-io.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2eda03b..b6600c7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2950,8 +2950,9 @@ retry_root_backup:
 	if (fs_info->fs_devices->missing_devices >
 	     fs_info->num_tolerated_disk_barrier_failures &&
 	    !(sb->s_flags & MS_RDONLY)) {
-		printk(KERN_WARNING "BTRFS: "
-			"too many missing devices, writeable mount is not allowed\n");
+		pr_warn("BTRFS: too many missing devices(%llu > %d), writeable mount is not allowed\n",
+			fs_info->fs_devices->missing_devices,
+			fs_info->num_tolerated_disk_barrier_failures);
 		goto fail_sysfs;
 	}
 
-- 
1.8.5.1
[PATCH] btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail
From: Zhao Lei

When read_tree_block() fails, we can see the following dmesg:

  [ 134.371389] BUG: unable to handle kernel NULL pointer dereference at 0063
  [ 134.372236] IP: [] free_extent_buffer+0x21/0x90
  [ 134.372236] PGD 0
  [ 134.372236] Oops: [#1] SMP
  [ 134.372236] Modules linked in:
  [ 134.372236] CPU: 0 PID: 2289 Comm: mount Not tainted 4.2.0-rc1_HEAD_c65b99f046843d2455aa231747b5a07a999a9f3d_+ #115
  [ 134.372236] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
  [ 134.372236] task: 88003b6e1a00 ti: 880011e6 task.ti: 880011e6
  [ 134.372236] RIP: 0010:[] [] free_extent_buffer+0x21/0x90
  ...
  [ 134.372236] Call Trace:
  [ 134.372236] [] free_root_extent_buffers+0x91/0xb0
  [ 134.372236] [] free_root_pointers+0x17d/0x190
  [ 134.372236] [] open_ctree+0x1ca0/0x25b0
  [ 134.372236] [] ? disk_name+0x97/0xb0
  [ 134.372236] [] btrfs_mount+0x8fa/0xab0
  ...

Reason:
read_tree_block() was changed to return an error number on failure,
and this value (not NULL) is stored in tree_root->node. Subsequent
code then runs:

  free_root_pointers()
  ->free_root_extent_buffers()
  ->free_extent_buffer()
  ->atomic_read((extent_buffer *)(-E_XXX)->refs);

and triggers the above error.

Fix:
Set tree_root->node to NULL on failure to make the error-handling code
happy.

Signed-off-by: Zhao Lei
---
 fs/btrfs/disk-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a9aadb2..f556c37 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2842,6 +2842,7 @@ int open_ctree(struct super_block *sb,
 	    !extent_buffer_uptodate(chunk_root->node)) {
 		printk(KERN_ERR "BTRFS: failed to read chunk root on %s\n",
 		       sb->s_id);
+		chunk_root->node = NULL;
 		goto fail_tree_roots;
 	}
 	btrfs_set_root_node(&chunk_root->root_item, chunk_root->node);
@@ -2879,7 +2880,7 @@ retry_root_backup:
 	    !extent_buffer_uptodate(tree_root->node)) {
 		printk(KERN_WARNING "BTRFS: failed to read tree root on %s\n",
 		       sb->s_id);
-
+		tree_root->node = NULL;
 		goto recovery_tree_root;
 	}
 
-- 
1.8.5.1
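The failure mode described in the read_tree_block() patch above, an error value stored in a pointer that later passes a plain NULL check, can be reproduced in a userspace sketch. The err_ptr()/is_err() helpers below only imitate the kernel's ERR_PTR()/IS_ERR() idea, and every name in the block (struct buf, struct root, read_block(), load_root(), free_root()) is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitation of the kernel's error-pointer encoding. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct buf { int refs; };
struct root { struct buf *node; };

/* Hypothetical reader: returns an error pointer, never NULL, on failure. */
static struct buf *read_block(int fail)
{
	static struct buf block = { 1 };

	return fail ? err_ptr(-EIO) : &block;
}

/* Store NULL instead of the error pointer so that cleanup stays safe. */
static int load_root(struct root *r, int fail)
{
	r->node = read_block(fail);
	if (is_err(r->node)) {
		int err = (int)(intptr_t)r->node;

		r->node = NULL;
		return err;
	}
	return 0;
}

/* Cleanup only guards against NULL, like the error path in the trace. */
static void free_root(struct root *r)
{
	if (r->node)
		r->node->refs--;
	r->node = NULL;
}
```

Without the "r->node = NULL" line, free_root() would dereference the encoded -EIO value, which mirrors the oops in free_extent_buffer() shown in the commit message.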
[PATCH] btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()
From: Zhao Lei

Liu Bo reported a lockdep warning about delayed_iput_sem in xfstests
generic/241:

  [ 2061.345955] =
  [ 2061.346027] [ INFO: possible recursive locking detected ]
  [ 2061.346027] 4.1.0+ #268 Tainted: GW
  [ 2061.346027] -
  [ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock:
  [ 2061.346027] (&fs_info->delayed_iput_sem){..}, at: [] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] but task is already holding lock:
  [ 2061.346027] (&fs_info->delayed_iput_sem){..}, at: [] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] other info that might help us debug this:
  [ 2061.346027] Possible unsafe locking scenario:
  [ 2061.346027]CPU0
  [ 2061.346027]
  [ 2061.346027] lock(&fs_info->delayed_iput_sem);
  [ 2061.346027] lock(&fs_info->delayed_iput_sem);
  [ 2061.346027] *** DEADLOCK ***

It rarely happens, about 1 in 400 runs in my test environment.

The reason is recursion of btrfs_run_delayed_iputs():

  cleaner_kthread
  -> btrfs_run_delayed_iputs() *1
  -> get delayed_iput_sem lock *2
  -> iput()
  -> ...
  -> btrfs_commit_transaction()
  -> btrfs_run_delayed_iputs() *1
  -> get delayed_iput_sem lock (dead lock) *2

  *1: recursion of btrfs_run_delayed_iputs()
  *2: warning of lockdep about delayed_iput_sem

When the fs is under heavy stress, new iputs may be added to the
fs_info->delayed_iputs list while btrfs_run_delayed_iputs() is running,
which causes the second btrfs_run_delayed_iputs() to run into
down_read(&fs_info->delayed_iput_sem) again and trigger the above
lockdep warning.

Actually, it does not cause a real problem because both locks are read
locks, but we can add a fix to avoid the lockdep warning.

Fix:
Don't call btrfs_run_delayed_iputs() in btrfs_commit_transaction() for
the cleaner_kthread thread, to break the above recursion path.
cleaner_kthread already calls btrfs_run_delayed_iputs() explicitly in
its code and does not need to call it again in
btrfs_commit_transaction(). It also gives us the bonus of avoiding
stack overflow.

Test:
No such lockdep warning after the patch in 1200 generic/241 runs.

Reported-by: Liu Bo
Signed-off-by: Zhao Lei
---
 fs/btrfs/transaction.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c0f18e7..31248ad 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2152,7 +2152,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
 
-	if (current != root->fs_info->transaction_kthread)
+	if (current != root->fs_info->transaction_kthread &&
+	    current != root->fs_info->cleaner_kthread)
 		btrfs_run_delayed_iputs(root);
 
 	return ret;
-- 
1.8.5.1
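The recursion path in the delayed-iput patch above can be modelled with a toy sketch that counts nesting depth instead of taking a real lock. All names are hypothetical and nothing here is kernel code; the depth counter merely stands in for down_read(&fs_info->delayed_iput_sem), and the assumption that every iput ends in a commit is a simplification made for the model.

```c
/* Toy model of the call chain; not kernel code. */
enum caller {
	CALLER_OTHER,
	CALLER_TRANSACTION_KTHREAD,
	CALLER_CLEANER_KTHREAD
};

struct iput_sim {
	int pending;	/* delayed iputs still queued */
	int depth;	/* current nesting of run_delayed_iputs() */
	int max_depth;	/* deepest nesting seen so far */
};

static void run_delayed_iputs(struct iput_sim *s, enum caller who);

/* Mirrors the patched check: the two kthreads skip the extra call. */
static void commit_transaction(struct iput_sim *s, enum caller who)
{
	if (who != CALLER_TRANSACTION_KTHREAD && who != CALLER_CLEANER_KTHREAD)
		run_delayed_iputs(s, who);
}

static void run_delayed_iputs(struct iput_sim *s, enum caller who)
{
	s->depth++;	/* stands in for taking the semaphore for read */
	if (s->depth > s->max_depth)
		s->max_depth = s->depth;

	while (s->pending > 0) {
		s->pending--;
		/* Simplification: every iput ends up committing a transaction. */
		commit_transaction(s, who);
	}

	s->depth--;	/* stands in for releasing the semaphore */
}
```

In this model the cleaner caller never nests deeper than one level, while a caller that is not exempted can still re-enter run_delayed_iputs() while it is already running.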
[PATCH] btrfs: Remove noused chunk_tree and chunk_objectid from scrub_enumerate_chunks() and scrub_chunk()
From: Zhao Lei

These variables have been unused since the version that introduced
them; remove them.

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index eb35176..f552937 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3321,7 +3321,6 @@ out:
 
 static noinline_for_stack int scrub_chunk(struct scrub_ctx *sctx,
 					  struct btrfs_device *scrub_dev,
-					  u64 chunk_tree, u64 chunk_objectid,
 					  u64 chunk_offset, u64 length,
 					  u64 dev_offset, int is_dev_replace)
 {
@@ -3372,8 +3371,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 	struct btrfs_root *root = sctx->dev_root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 length;
-	u64 chunk_tree;
-	u64 chunk_objectid;
 	u64 chunk_offset;
 	int ret;
 	int slot;
@@ -3431,8 +3428,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		if (found_key.offset + length <= start)
 			goto skip;
 
-		chunk_tree = btrfs_dev_extent_chunk_tree(l, dev_extent);
-		chunk_objectid = btrfs_dev_extent_chunk_objectid(l, dev_extent);
 		chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);
 
 		/*
@@ -3449,8 +3444,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		dev_replace->cursor_right = found_key.offset + length;
 		dev_replace->cursor_left = found_key.offset;
 		dev_replace->item_needs_writeback = 1;
-		ret = scrub_chunk(sctx, scrub_dev, chunk_tree, chunk_objectid,
-				  chunk_offset, length, found_key.offset,
+		ret = scrub_chunk(sctx, scrub_dev, chunk_offset, length,
+				  found_key.offset,
 				  is_dev_replace);
 
 		/*
-- 
1.8.5.1
[PATCH v3 3/4] btrfs: use scrub_pause_on/off() to reduce code in scrub_enumerate_chunks()
From: Zhao Lei

Using the newly introduced scrub_pause_on/off() makes this code block cleaner and more readable.

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index cbfb8c7..a882a34 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3492,8 +3492,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
 		wait_event(sctx->list_wait,
 			   atomic_read(&sctx->bios_in_flight) == 0);
-		atomic_inc(&fs_info->scrubs_paused);
-		wake_up(&fs_info->scrub_pause_wait);
+
+		scrub_pause_on(fs_info);
 
 		/*
 		 * must be called before we decrease @scrub_paused.
@@ -3504,11 +3504,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 			   atomic_read(&sctx->workers_pending) == 0);
 		atomic_set(&sctx->wr_ctx.flush_all_writes, 0);
 
-		mutex_lock(&fs_info->scrub_lock);
-		__scrub_blocked_if_needed(fs_info);
-		atomic_dec(&fs_info->scrubs_paused);
-		mutex_unlock(&fs_info->scrub_lock);
-		wake_up(&fs_info->scrub_pause_wait);
+		scrub_pause_off(fs_info);
 
 		btrfs_put_block_group(cache);
 		if (ret)
-- 
1.8.5.1
[PATCH v3 4/4] btrfs: Fix data checksum error cause by replace with io-load.
From: Zhao Lei

xfstests btrfs/070 fails intermittently. On my test machine the failure rate is about 30%; in another VM (VMware) it is about 50%.

Reason:
btrfs/070 runs replace and defrag concurrently with fsstress, and afterwards scrub finds checksum errors. The defrag operation is actually unrelated: replace together with fsstress is enough to trigger the bug.

Debugging showed that new data written to the target device can be overwritten by old data that the replace code copies from the source device. To avoid this, we set the target block group read-only for the duration of the replace, so that new data requested by other operations is not written to the same place the replace code is writing.

Before patch (4.1-rc3): 30% failed in 100 xfstests runs.
After patch: 0% failed in 300 xfstests runs.

It also happened in btrfs/071.

Changelog v2->v3:
1: Fix a typo (introduced during rebase) that made xfstests btrfs/073 and btrfs/066 fail.
2: Rebase on top of integration-4.2.
3: Run the full xfstests (generic and btrfs groups with 10 mount options).

Changelog v1->v2:
1: Update the subject to reflect the problem being fixed.
2: Update the description to explain why setting read-only fixes the problem.
3: Use a helper function to avoid duplicating the code block that sets the chunk read-only.
All of the above were suggested by: David Sterba

Reported-by: Qu Wenruo
Suggested-by: Qu Wenruo
Signed-off-by: Qu Wenruo
Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a882a34..3a49a43 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3467,6 +3467,18 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		if (!cache)
 			goto skip;
 
+		/*
+		 * we need call btrfs_inc_block_group_ro() with scrubs_paused,
+		 * to avoid deadlock caused by:
+		 * btrfs_inc_block_group_ro()
+		 * -> btrfs_wait_for_commit()
+		 * -> btrfs_commit_transaction()
+		 * -> btrfs_scrub_pause()
+		 */
+		scrub_pause_on(fs_info);
+		btrfs_inc_block_group_ro(root, cache);
+		scrub_pause_off(fs_info);
+
 		dev_replace->cursor_right = found_key.offset + length;
 		dev_replace->cursor_left = found_key.offset;
 		dev_replace->item_needs_writeback = 1;
@@ -3506,6 +3518,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
 		scrub_pause_off(fs_info);
 
+		btrfs_dec_block_group_ro(root, cache);
+
 		btrfs_put_block_group(cache);
 		if (ret)
 			break;
-- 
1.8.5.1
[PATCH v3 2/4] btrfs: Separate scrub_blocked_if_needed() to scrub_pause_on/off()
From: Zhao Lei

This reduces existing duplicated code that is similar to scrub_blocked_if_needed() but cannot call it because of small differences. It is also used by my next patch, which has the same need.

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 94db0fa..cbfb8c7 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -332,11 +332,14 @@ static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
 	}
 }
 
-static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
+static void scrub_pause_on(struct btrfs_fs_info *fs_info)
 {
 	atomic_inc(&fs_info->scrubs_paused);
 	wake_up(&fs_info->scrub_pause_wait);
+}
 
+static void scrub_pause_off(struct btrfs_fs_info *fs_info)
+{
 	mutex_lock(&fs_info->scrub_lock);
 	__scrub_blocked_if_needed(fs_info);
 	atomic_dec(&fs_info->scrubs_paused);
@@ -345,6 +348,12 @@ static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
 	wake_up(&fs_info->scrub_pause_wait);
 }
 
+static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
+{
+	scrub_pause_on(fs_info);
+	scrub_pause_off(fs_info);
+}
+
 /*
  * used for workers that require transaction commits (i.e., for the
  * NOCOW case)
-- 
1.8.5.1
[PATCH v3 1/4] btrfs: Use ref_cnt for set_block_group_ro()
From: Zhao Lei More than one code call set_block_group_ro() and restore rw in fail. Old code use bool bit to save blockgroup's ro state, it can not support parallel case(it is confirmd exist in my debug log). This patch use ref count to store ro state, and rename set_block_group_ro/set_block_group_rw to inc_block_group_ro/dec_block_group_ro. Signed-off-by: Zhao Lei --- fs/btrfs/ctree.h | 6 +++--- fs/btrfs/extent-tree.c | 42 +- fs/btrfs/relocation.c | 14 ++ 3 files changed, 30 insertions(+), 32 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index aac314e..f57e6ca 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1300,7 +1300,7 @@ struct btrfs_block_group_cache { /* for raid56, this is a full stripe, without parity */ unsigned long full_stripe_len; - unsigned int ro:1; + unsigned int ro; unsigned int iref:1; unsigned int has_caching_ctl:1; unsigned int removed:1; @@ -3495,9 +3495,9 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info, void btrfs_block_rsv_release(struct btrfs_root *root, struct btrfs_block_rsv *block_rsv, u64 num_bytes); -int btrfs_set_block_group_ro(struct btrfs_root *root, +int btrfs_inc_block_group_ro(struct btrfs_root *root, struct btrfs_block_group_cache *cache); -void btrfs_set_block_group_rw(struct btrfs_root *root, +void btrfs_dec_block_group_ro(struct btrfs_root *root, struct btrfs_block_group_cache *cache); void btrfs_put_block_group_cache(struct btrfs_fs_info *info); u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 1c2bd17..a436bd5 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -8692,14 +8692,13 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags) return flags; } -static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force) +static int inc_block_group_ro(struct btrfs_block_group_cache *cache, int force) { struct btrfs_space_info *sinfo = 
cache->space_info; u64 num_bytes; u64 min_allocable_bytes; int ret = -ENOSPC; - /* * We need some metadata space and system metadata space for * allocating chunks in some corner cases until we force to set @@ -8716,6 +8715,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force) spin_lock(&cache->lock); if (cache->ro) { + cache->ro++; ret = 0; goto out; } @@ -8727,7 +8727,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force) sinfo->bytes_may_use + sinfo->bytes_readonly + num_bytes + min_allocable_bytes <= sinfo->total_bytes) { sinfo->bytes_readonly += num_bytes; - cache->ro = 1; + cache->ro++; list_add_tail(&cache->ro_list, &sinfo->ro_bgs); ret = 0; } @@ -8737,7 +8737,7 @@ out: return ret; } -int btrfs_set_block_group_ro(struct btrfs_root *root, +int btrfs_inc_block_group_ro(struct btrfs_root *root, struct btrfs_block_group_cache *cache) { @@ -8745,8 +8745,6 @@ int btrfs_set_block_group_ro(struct btrfs_root *root, u64 alloc_flags; int ret; - BUG_ON(cache->ro); - again: trans = btrfs_join_transaction(root); if (IS_ERR(trans)) @@ -8789,7 +8787,7 @@ again: goto out; } - ret = set_block_group_ro(cache, 0); + ret = inc_block_group_ro(cache, 0); if (!ret) goto out; alloc_flags = get_alloc_profile(root, cache->space_info->flags); @@ -8797,7 +8795,7 @@ again: CHUNK_ALLOC_FORCE); if (ret < 0) goto out; - ret = set_block_group_ro(cache, 0); + ret = inc_block_group_ro(cache, 0); out: if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM) { alloc_flags = update_block_group_flags(root, cache->flags); @@ -8860,7 +8858,7 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo) return free_bytes; } -void btrfs_set_block_group_rw(struct btrfs_root *root, +void btrfs_dec_block_group_ro(struct btrfs_root *root, struct btrfs_block_group_cache *cache) { struct btrfs_space_info *sinfo = cache->space_info; @@ -8870,11 +8868,13 @@ void btrfs_set_block_group_rw(struct btrfs_root *root, spin_lock(&sinfo->lock); 
spin_lock(&cache->lock); - num_bytes = cache->key.offset - cache->reserved - cache->pinned - - cache->bytes_super - btrfs_block_group_used(&cache->item); - sinfo->bytes_read
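The idea behind this patch, replacing a one-bit flag with a counter so that several callers can hold a block group read-only at the same time, can be illustrated outside the kernel. The following is only a minimal user-space sketch, not the btrfs code: struct demo_block_group, demo_inc_ro() and demo_dec_ro() are hypothetical names, locking is omitted (the real functions take spinlocks), and the behaviour of the decrement side is an assumption, since the diff above is cut off before that hunk.

```c
#include <assert.h>

/* Hypothetical stand-in for a block group; only the counter matters here. */
struct demo_block_group {
	unsigned int ro;                   /* number of active read-only holders */
	unsigned long long bytes_readonly; /* stand-in for space_info accounting */
	unsigned long long free_bytes;     /* bytes that become read-only */
};

/* Take one read-only reference; account the bytes only on the 0 -> 1 step. */
static int demo_inc_ro(struct demo_block_group *bg)
{
	if (bg->ro++ == 0)
		bg->bytes_readonly += bg->free_bytes;
	return 0;
}

/*
 * Drop one reference. Assumption: the group becomes writable again only
 * when the last holder drops its reference (the 1 -> 0 step).
 */
static void demo_dec_ro(struct demo_block_group *bg)
{
	assert(bg->ro > 0);
	if (--bg->ro == 0)
		bg->bytes_readonly -= bg->free_bytes;
}
```

With a single bit, the second caller's "restore rw" would make the group writable while the first caller still relies on it being read-only; with the counter, the state only changes on the first increment and the last decrement.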
[PATCH v3 0/4] btrfs: Fix data checksum error cause by replace with io-load
From: Zhao Lei

This patchset fixes a data checksum error caused by running replace under I/O load, which makes xfstests btrfs/070 and btrfs/071 fail intermittently. See the description in [PATCH 4/4] for details.

Changelog v2->v3:
1: Fix a typo (introduced during rebase) that made xfstests btrfs/073 and btrfs/066 fail.
2: Rebase on top of integration-4.2.
3: Run the full xfstests (generic and btrfs groups with 10 mount options).

Changelog v1->v2:
1: Update the subject to reflect the problem being fixed.
2: Update the description to explain why setting read-only fixes the problem.
3: Use a helper function to avoid duplicating the code block that sets the chunk read-only.
All of the above were suggested by: David Sterba

Zhao Lei (4):
  btrfs: Use ref_cnt for set_block_group_ro()
  btrfs: Separate scrub_blocked_if_needed() to scrub_pause_on/off()
  btrfs: use scrub_pause_on/off() to reduce code in scrub_enumerate_chunks()
  btrfs: Fix data checksum error cause by replace with io-load.

 fs/btrfs/ctree.h       |  6 +++---
 fs/btrfs/extent-tree.c | 42 +-
 fs/btrfs/relocation.c  | 14 ++
 fs/btrfs/scrub.c       | 35 +++
 4 files changed, 57 insertions(+), 40 deletions(-)

-- 
1.8.5.1
[PATCH] btrfs: add error handling for scrub_workers_get()
From: Zhao Lei

Although this is a rare case, we should free the previously allocated memory on error.

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ab58115..eb35176 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3559,7 +3559,6 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 						int is_dev_replace)
 {
-	int ret = 0;
 	unsigned int flags = WQ_FREEZABLE | WQ_UNBOUND;
 	int max_active = fs_info->thread_pool_size;
 
@@ -3572,27 +3571,29 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 		fs_info->scrub_workers =
 			btrfs_alloc_workqueue("btrfs-scrub", flags,
 					      max_active, 4);
-		if (!fs_info->scrub_workers) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		if (!fs_info->scrub_workers)
+			goto fail_scrub_workers;
+
 		fs_info->scrub_wr_completion_workers =
 			btrfs_alloc_workqueue("btrfs-scrubwrc", flags,
 					      max_active, 2);
-		if (!fs_info->scrub_wr_completion_workers) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		if (!fs_info->scrub_wr_completion_workers)
+			goto fail_scrub_wr_completion_workers;
+
 		fs_info->scrub_nocow_workers =
 			btrfs_alloc_workqueue("btrfs-scrubnc", flags, 1, 0);
-		if (!fs_info->scrub_nocow_workers) {
-			ret = -ENOMEM;
-			goto out;
-		}
+		if (!fs_info->scrub_nocow_workers)
+			goto fail_scrub_nocow_workers;
 	}
 	++fs_info->scrub_workers_refcnt;
-out:
-	return ret;
+	return 0;
+
+fail_scrub_nocow_workers:
+	btrfs_destroy_workqueue(fs_info->scrub_wr_completion_workers);
+fail_scrub_wr_completion_workers:
+	btrfs_destroy_workqueue(fs_info->scrub_workers);
+fail_scrub_workers:
+	return -ENOMEM;
 }
 
 static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info *fs_info)
-- 
1.8.5.1
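The error path in this patch follows the usual "unwind in reverse order" goto pattern: each label releases what was allocated before the failing step and then falls through to the next label. As a rough user-space sketch of the same shape (not the btrfs code: struct demo_workers, demo_alloc() and the demo_fail_at test hook are hypothetical, and resetting the pointers to NULL is an extra the patch itself does not do):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the three scrub workqueues. */
struct demo_workers {
	void *scrub;
	void *wr_completion;
	void *nocow;
};

/* Hypothetical test hook: the Nth allocation fails; 0 means never fail. */
static int demo_fail_at;
static int demo_alloc_calls;

static void *demo_alloc(void)
{
	if (++demo_alloc_calls == demo_fail_at)
		return NULL;
	return malloc(1);
}

/* Allocate all three resources, or none of them. */
static int demo_workers_get(struct demo_workers *w)
{
	w->scrub = demo_alloc();
	if (!w->scrub)
		goto fail_scrub;

	w->wr_completion = demo_alloc();
	if (!w->wr_completion)
		goto fail_wr_completion;

	w->nocow = demo_alloc();
	if (!w->nocow)
		goto fail_nocow;

	return 0;

	/* Labels are ordered so that each one falls through to the next. */
fail_nocow:
	free(w->wr_completion);
	w->wr_completion = NULL;
fail_wr_completion:
	free(w->scrub);
	w->scrub = NULL;
fail_scrub:
	return -ENOMEM;
}
```

Setting demo_fail_at to 3 makes the third allocation fail, and the first two are released before -ENOMEM is returned, which is the leak the patch closes.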
[PATCH] btrfs: [RFC] Don't use workqueue for raid56 parity scrub
From: Zhao Lei

The code run from the workqueue only does a fast initialization and submits a bio at the end. In addition, it is never called in interrupt context, so there is no need to queue a work item for it. Calling it directly makes the code simpler and easier to debug, and reduces potential problems.

Signed-off-by: Zhao Lei
---
 fs/btrfs/raid56.c | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index fa72068..eea86d1 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2557,7 +2557,7 @@ static void raid56_parity_scrub_end_io(struct bio *bio, int err)
 	validate_rbio_for_parity_scrub(rbio);
 }
 
-static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
+static void async_scrub_parity(struct btrfs_raid_bio *rbio)
 {
 	int bios_to_read = 0;
 	struct bio_list bio_list;
@@ -2646,23 +2646,6 @@ finish:
 	validate_rbio_for_parity_scrub(rbio);
 }
 
-static void scrub_parity_work(struct btrfs_work *work)
-{
-	struct btrfs_raid_bio *rbio;
-
-	rbio = container_of(work, struct btrfs_raid_bio, work);
-	raid56_parity_scrub_stripe(rbio);
-}
-
-static void async_scrub_parity(struct btrfs_raid_bio *rbio)
-{
-	btrfs_init_work(&rbio->work, btrfs_rmw_helper,
-			scrub_parity_work, NULL, NULL);
-
-	btrfs_queue_work(rbio->fs_info->rmw_workers,
-			 &rbio->work);
-}
-
 void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio)
 {
 	if (!lock_stripe_add(rbio))
-- 
1.8.5.1
[PATCH] btrfs: Update out-of-date "skip parity stripe" comment
From: Zhao Lei

btrfs now supports scrubbing raid56 parity stripes, so the "skip parity stripe" comment is out of date.

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a13f91a..5ee5630 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3202,12 +3202,12 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 	 */
 	ret = 0;
 	while (physical < physical_end) {
-		/* for raid56, we skip parity stripe */
 		if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
 			ret = get_raid56_logic_offset(physical, num,
 					map, &logical, &stripe_logical);
 			logical += base;
 			if (ret) {
+				/* it is parity strip */
 				stripe_logical += base;
 				stripe_end = stripe_logical + increment - 1;
 				ret = scrub_raid56_parity(sctx, map, scrub_dev,
-- 
1.8.5.1
[PATCH] btrfs: cleanup noused initialization of dev in btrfs_end_bio()
From: Zhao Lei

It was introduced by:
  c404e0dc2c843b154f9a36c3aec10d0a715d88eb
  Btrfs: fix use-after-free in the finishing procedure of the device replace

However, it seems to be unrelated to that bug, so this patch reverts that code block as a cleanup.

Signed-off-by: Zhao Lei
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 174f5e1..1702adc 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5596,7 +5596,6 @@ static inline void btrfs_end_bbio(struct btrfs_bio *bbio, struct bio *bio, int e
 static void btrfs_end_bio(struct bio *bio, int err)
 {
 	struct btrfs_bio *bbio = bio->bi_private;
-	struct btrfs_device *dev = bbio->stripes[0].dev;
 	int is_orig_bio = 0;
 
 	if (err) {
@@ -5604,6 +5603,7 @@ static void btrfs_end_bio(struct bio *bio, int err)
 		if (err == -EIO || err == -EREMOTEIO) {
 			unsigned int stripe_index =
 				btrfs_io_bio(bio)->stripe_index;
+			struct btrfs_device *dev;
 
 			BUG_ON(stripe_index >= bbio->num_stripes);
 			dev = bbio->stripes[stripe_index].dev;
-- 
1.8.5.1
[PATCH v2] btrfs: Fix lockdep warning of wr_ctx->wr_lock in scrub_free_wr_ctx()
From: Zhao Lei

lockdep reports the following warning during testing:

[25176.843958] =================================
[25176.844519] [ INFO: inconsistent lock state ]
[25176.845047] 4.1.0-rc3 #22 Tainted: GW
[25176.845591] ---------------------------------
[25176.846153] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[25176.846713] fsstress/26661 [HC0[0]:SC1[1]:HE1:SE0] takes:
[25176.847246]  (&wr_ctx->wr_lock){+.?...}, at: [] scrub_free_ctx+0x2d/0xf0 [btrfs]
[25176.847838] {SOFTIRQ-ON-W} state was registered at:
[25176.848396]   [] __lock_acquire+0x6a0/0xe10
[25176.848955]   [] lock_acquire+0xce/0x2c0
[25176.849491]   [] mutex_lock_nested+0x7f/0x410
[25176.850029]   [] scrub_stripe+0x4df/0x1080 [btrfs]
[25176.850575]   [] scrub_chunk.isra.19+0x111/0x130 [btrfs]
[25176.851110]   [] scrub_enumerate_chunks+0x27c/0x510 [btrfs]
[25176.851660]   [] btrfs_scrub_dev+0x1c7/0x6c0 [btrfs]
[25176.852189]   [] btrfs_dev_replace_start+0x36e/0x450 [btrfs]
[25176.852771]   [] btrfs_ioctl+0x1e10/0x2d20 [btrfs]
[25176.853315]   [] do_vfs_ioctl+0x318/0x570
[25176.853868]   [] SyS_ioctl+0x41/0x80
[25176.854406]   [] system_call_fastpath+0x12/0x6f
[25176.854935] irq event stamp: 51506
[25176.855511] hardirqs last  enabled at (51506): [] vprintk_emit+0x225/0x5e0
[25176.856059] hardirqs last disabled at (51505): [] vprintk_emit+0xb7/0x5e0
[25176.856642] softirqs last  enabled at (50886): [] __do_softirq+0x363/0x640
[25176.857184] softirqs last disabled at (50949): [] irq_exit+0x10d/0x120
[25176.857746] other info that might help us debug this:
[25176.858845]  Possible unsafe locking scenario:
[25176.859981]        CPU0
[25176.860537]        ----
[25176.861059]   lock(&wr_ctx->wr_lock);
[25176.861705]   <Interrupt>
[25176.862272]     lock(&wr_ctx->wr_lock);
[25176.862881]  *** DEADLOCK ***

Reason:
The above warning is caused by:
  Interrupt
  -> bio_endio()
  -> ...
  -> scrub_put_ctx()
  -> scrub_free_ctx() *1
  -> ...
  -> mutex_lock(&wr_ctx->wr_lock);

scrub_put_ctx() is allowed to be called from the end_bio interrupt, but by design it should never call scrub_free_ctx(sctx) in interrupt context (*1 above), because btrfs_scrub_dev() holds one additional reference on sctx->refs, which means scrub_free_ctx() should only be called from within btrfs_scrub_dev().

The code does not behave as intended, because the release sequence in scrub_pending_bio_dec() has a gap.

Current code:
-----------------------------------+-----------------------------------
scrub_pending_bio_dec()            |  btrfs_scrub_dev
-----------------------------------+-----------------------------------
atomic_dec(&sctx->bios_in_flight); |
wake_up(&sctx->list_wait);         |
                                   | scrub_put_ctx()
                                   | -> atomic_dec_and_test(&sctx->refs)
scrub_put_ctx(sctx);               |
-> atomic_dec_and_test(&sctx->refs)|
-> scrub_free_ctx()                |
-----------------------------------+-----------------------------------

We expected:
-----------------------------------+-----------------------------------
scrub_pending_bio_dec()            |  btrfs_scrub_dev
-----------------------------------+-----------------------------------
atomic_dec(&sctx->bios_in_flight); |
wake_up(&sctx->list_wait);         |
scrub_put_ctx(sctx);               |
-> atomic_dec_and_test(&sctx->refs)|
                                   | scrub_put_ctx()
                                   | -> atomic_dec_and_test(&sctx->refs)
                                   | -> scrub_free_ctx()
-----------------------------------+-----------------------------------

Fix:
Move scrub_pending_bio_dec() to a workqueue so that this function does not run in interrupt context.
Tested by checking the trace log while debugging.

Changelog v1->v2:
Use a workqueue instead of adjusting the function call sequence as in v1, because v1 would introduce a bug pointed out by: Filipe David Manana

Reported-by: Qu Wenruo
Signed-off-by: Zhao Lei
---
 fs/btrfs/async-thread.c |  1 +
 fs/btrfs/async-thread.h |  2 ++
 fs/btrfs/ctree.h        |  1 +
 fs/btrfs/scrub.c        | 26 +++++++++++++++++++++++---
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index df9932b..1ce06c84 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -85,6 +85,7 @@ BTRFS_WORK_HELPER(extent_refs_helper);
 BTRFS_WORK_HELPER(scrub_helper);
 BTRFS_WORK_HELPER(scrubwrc_helper);
 BTRFS_WORK_HELPER(scrubnc_helper);
+BTRFS_WORK_HELPER(scrubparity_helper);
 
 static struct __btrfs_workqueue *
 __btrfs_alloc_workqueue(const char *name, unsigned int flags, int max_active,
diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
index ec2ee47..b0b093b 100644
--- a/fs/btrfs/async-thread.h
+++ b/fs/btrfs/async-thread.h
@@ -64,6 +64,8 @@ BTRFS_WORK_HELPER_PROTO(extent_refs_helper);
 BTRFS_WORK_HELPER_PROTO(scrub_helper);
 BTRFS_WORK_HELPER_PROTO(scrubwrc_helper);
 BTRFS_WORK_HELPER_PROTO(scrubnc_helper);
+BTRFS_WO
[PATCH] btrfs: Fix lockdep warning of wr_ctx->wr_lock in scrub_free_wr_ctx()
From: Zhao Lei

lockdep reports the following warning during testing:

[25176.843958] =================================
[25176.844519] [ INFO: inconsistent lock state ]
[25176.845047] 4.1.0-rc3 #22 Tainted: GW
[25176.845591] ---------------------------------
[25176.846153] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[25176.846713] fsstress/26661 [HC0[0]:SC1[1]:HE1:SE0] takes:
[25176.847246]  (&wr_ctx->wr_lock){+.?...}, at: [] scrub_free_ctx+0x2d/0xf0 [btrfs]
[25176.847838] {SOFTIRQ-ON-W} state was registered at:
[25176.848396]   [] __lock_acquire+0x6a0/0xe10
[25176.848955]   [] lock_acquire+0xce/0x2c0
[25176.849491]   [] mutex_lock_nested+0x7f/0x410
[25176.850029]   [] scrub_stripe+0x4df/0x1080 [btrfs]
[25176.850575]   [] scrub_chunk.isra.19+0x111/0x130 [btrfs]
[25176.851110]   [] scrub_enumerate_chunks+0x27c/0x510 [btrfs]
[25176.851660]   [] btrfs_scrub_dev+0x1c7/0x6c0 [btrfs]
[25176.852189]   [] btrfs_dev_replace_start+0x36e/0x450 [btrfs]
[25176.852771]   [] btrfs_ioctl+0x1e10/0x2d20 [btrfs]
[25176.853315]   [] do_vfs_ioctl+0x318/0x570
[25176.853868]   [] SyS_ioctl+0x41/0x80
[25176.854406]   [] system_call_fastpath+0x12/0x6f
[25176.854935] irq event stamp: 51506
[25176.855511] hardirqs last  enabled at (51506): [] vprintk_emit+0x225/0x5e0
[25176.856059] hardirqs last disabled at (51505): [] vprintk_emit+0xb7/0x5e0
[25176.856642] softirqs last  enabled at (50886): [] __do_softirq+0x363/0x640
[25176.857184] softirqs last disabled at (50949): [] irq_exit+0x10d/0x120
[25176.857746] other info that might help us debug this:
[25176.858845]  Possible unsafe locking scenario:
[25176.859981]        CPU0
[25176.860537]        ----
[25176.861059]   lock(&wr_ctx->wr_lock);
[25176.861705]   <Interrupt>
[25176.862272]     lock(&wr_ctx->wr_lock);
[25176.862881]  *** DEADLOCK ***

Reason:
The above warning is caused by:
  Interrupt
  -> bio_endio()
  -> ...
  -> scrub_put_ctx()
  -> scrub_free_ctx() *1
  -> ...
  -> mutex_lock(&wr_ctx->wr_lock);

scrub_put_ctx() is allowed to be called from the end_bio interrupt, but by design it should never call scrub_free_ctx(sctx) in interrupt context (*1 above), because btrfs_scrub_dev() holds one additional reference on sctx->refs, which means scrub_free_ctx() should only be called from within btrfs_scrub_dev().

The code does not behave as intended, because the release sequence in scrub_pending_bio_dec() has a gap.

Current code:
-----------------------------------+-----------------------------------
scrub_pending_bio_dec()            |  btrfs_scrub_dev
-----------------------------------+-----------------------------------
atomic_dec(&sctx->bios_in_flight); |
wake_up(&sctx->list_wait);         |
                                   | scrub_put_ctx()
                                   | -> atomic_dec_and_test(&sctx->refs)
scrub_put_ctx(sctx);               |
-> atomic_dec_and_test(&sctx->refs)|
-> scrub_free_ctx()                |
-----------------------------------+-----------------------------------

We expected:
-----------------------------------+-----------------------------------
scrub_pending_bio_dec()            |  btrfs_scrub_dev
-----------------------------------+-----------------------------------
atomic_dec(&sctx->bios_in_flight); |
wake_up(&sctx->list_wait);         |
scrub_put_ctx(sctx);               |
-> atomic_dec_and_test(&sctx->refs)|
                                   | scrub_put_ctx()
                                   | -> atomic_dec_and_test(&sctx->refs)
                                   | -> scrub_free_ctx()
-----------------------------------+-----------------------------------

Fix:
To fix the above problem, we can move scrub_put_ctx() to the line before atomic_dec(&sctx->bios_in_flight) in scrub_pending_bio_dec(), which forces scrub_put_ctx() in btrfs_scrub_dev() to run after scrub_put_ctx() in scrub_pending_bio_dec().

Reported-by: Qu Wenruo
Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ab58115..1b4b27c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -317,9 +317,9 @@ static void scrub_pending_bio_inc(struct scrub_ctx *sctx)
 
 static void scrub_pending_bio_dec(struct scrub_ctx *sctx)
 {
+	scrub_put_ctx(sctx);
 	atomic_dec(&sctx->bios_in_flight);
 	wake_up(&sctx->list_wait);
-	scrub_put_ctx(sctx);
 }
 
 static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
-- 
1.8.5.1
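The two orderings described in this message can be replayed in a small single-threaded model. This is only an illustrative sketch under simplifying assumptions, not the scrub code: struct demo_ctx, demo_put_ctx() and demo_run() are hypothetical names, the "interrupt context" is just a boolean passed by the caller, and real concurrency is replaced by calling the two puts in a fixed order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical context: a reference count plus a record of who freed it. */
struct demo_ctx {
	int refs;
	bool freed;
	bool freed_in_irq; /* true if the final put ran in "interrupt" context */
};

/* Drop one reference; whoever drops the last one performs the free. */
static void demo_put_ctx(struct demo_ctx *ctx, bool in_irq)
{
	assert(ctx->refs > 0);
	if (--ctx->refs == 0) {
		ctx->freed = true;
		ctx->freed_in_irq = in_irq;
	}
}

/*
 * Replay one ordering of the two puts and report whether the free
 * ended up in interrupt context.
 */
static bool demo_run(bool owner_puts_first)
{
	struct demo_ctx ctx = { .refs = 2 };

	if (owner_puts_first) {
		demo_put_ctx(&ctx, false); /* btrfs_scrub_dev() side */
		demo_put_ctx(&ctx, true);  /* end_io (interrupt) side */
	} else {
		demo_put_ctx(&ctx, true);  /* end_io (interrupt) side */
		demo_put_ctx(&ctx, false); /* btrfs_scrub_dev() side */
	}
	return ctx.freed_in_irq;
}
```

In this model demo_run(true) corresponds to the "Current code" table, where the free lands on the interrupt side, and demo_run(false) corresponds to the "We expected" table.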
[PATCH 1/4] btrfs: Use ref_cnt for set_block_group_ro()
From: Zhao Lei More than one code call set_block_group_ro() and restore rw in fail. Old code use bool bit to save blockgroup's ro state, it can not support parallel case(it is confirmd exist in my debug log). This patch use ref count to store ro state, and rename set_block_group_ro/set_block_group_rw to inc_block_group_ro/dec_block_group_ro. Signed-off-by: Zhao Lei --- fs/btrfs/ctree.h | 6 +++--- fs/btrfs/extent-tree.c | 42 +- fs/btrfs/relocation.c | 14 ++ 3 files changed, 30 insertions(+), 32 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 6f364e1..74ce6fc 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1300,7 +1300,7 @@ struct btrfs_block_group_cache { /* for raid56, this is a full stripe, without parity */ unsigned long full_stripe_len; - unsigned int ro:1; + unsigned int ro; unsigned int iref:1; unsigned int has_caching_ctl:1; unsigned int removed:1; @@ -3495,9 +3495,9 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info, void btrfs_block_rsv_release(struct btrfs_root *root, struct btrfs_block_rsv *block_rsv, u64 num_bytes); -int btrfs_set_block_group_ro(struct btrfs_root *root, +int btrfs_inc_block_group_ro(struct btrfs_root *root, struct btrfs_block_group_cache *cache); -void btrfs_set_block_group_rw(struct btrfs_root *root, +void btrfs_dec_block_group_ro(struct btrfs_root *root, struct btrfs_block_group_cache *cache); void btrfs_put_block_group_cache(struct btrfs_fs_info *info); u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 7effed6..6a82ba0 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -8751,14 +8751,13 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags) return flags; } -static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force) +static int inc_block_group_ro(struct btrfs_block_group_cache *cache, int force) { struct btrfs_space_info *sinfo = 
cache->space_info; u64 num_bytes; u64 min_allocable_bytes; int ret = -ENOSPC; - /* * We need some metadata space and system metadata space for * allocating chunks in some corner cases until we force to set @@ -8775,6 +8774,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force) spin_lock(&cache->lock); if (cache->ro) { + cache->ro++; ret = 0; goto out; } @@ -8786,7 +8786,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force) sinfo->bytes_may_use + sinfo->bytes_readonly + num_bytes + min_allocable_bytes <= sinfo->total_bytes) { sinfo->bytes_readonly += num_bytes; - cache->ro = 1; + cache->ro++; list_add_tail(&cache->ro_list, &sinfo->ro_bgs); ret = 0; } @@ -8796,7 +8796,7 @@ out: return ret; } -int btrfs_set_block_group_ro(struct btrfs_root *root, +int btrfs_inc_block_group_ro(struct btrfs_root *root, struct btrfs_block_group_cache *cache) { @@ -8804,8 +8804,6 @@ int btrfs_set_block_group_ro(struct btrfs_root *root, u64 alloc_flags; int ret; - BUG_ON(cache->ro); - again: trans = btrfs_join_transaction(root); if (IS_ERR(trans)) @@ -8830,7 +8828,7 @@ again: } - ret = set_block_group_ro(cache, 0); + ret = inc_block_group_ro(cache, 0); if (!ret) goto out; alloc_flags = get_alloc_profile(root, cache->space_info->flags); @@ -8838,7 +8836,7 @@ again: CHUNK_ALLOC_FORCE); if (ret < 0) goto out; - ret = set_block_group_ro(cache, 0); + ret = inc_block_group_ro(cache, 0); out: if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM) { alloc_flags = update_block_group_flags(root, cache->flags); @@ -8899,7 +8897,7 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo) return free_bytes; } -void btrfs_set_block_group_rw(struct btrfs_root *root, +void btrfs_dec_block_group_ro(struct btrfs_root *root, struct btrfs_block_group_cache *cache) { struct btrfs_space_info *sinfo = cache->space_info; @@ -8909,11 +8907,13 @@ void btrfs_set_block_group_rw(struct btrfs_root *root, spin_lock(&sinfo->lock); 
spin_lock(&cache->lock); - num_bytes = cache->key.offset - cache->reserved - cache->pinned - - cache->bytes_super - btrfs_block_group_used(&cache->item); - sinfo->bytes_readonly -= num_bytes; - cache
[PATCH 3/4] btrfs: use scrub_pause_on/off() to reduce code in scrub_enumerate_chunks()
From: Zhao Lei

Using the newly introduced scrub_pause_on/off() makes this code block cleaner and more readable.

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a3d1546..8da3459 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3480,8 +3480,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
 		wait_event(sctx->list_wait,
 			   atomic_read(&sctx->bios_in_flight) == 0);
-		atomic_inc(&fs_info->scrubs_paused);
-		wake_up(&fs_info->scrub_pause_wait);
+
+		scrub_pause_on(fs_info);
 
 		/*
 		 * must be called before we decrease @scrub_paused.
@@ -3492,11 +3492,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 			   atomic_read(&sctx->workers_pending) == 0);
 		atomic_set(&sctx->wr_ctx.flush_all_writes, 0);
 
-		mutex_lock(&fs_info->scrub_lock);
-		__scrub_blocked_if_needed(fs_info);
-		atomic_dec(&fs_info->scrubs_paused);
-		mutex_unlock(&fs_info->scrub_lock);
-		wake_up(&fs_info->scrub_pause_wait);
+		scrub_pause_off(fs_info);
 
 		btrfs_put_block_group(cache);
 		if (ret)
-- 
1.8.5.1
[PATCH 2/4] btrfs: Separate scrub_blocked_if_needed() to scrub_pause_on/off()
From: Zhao Lei

This reduces existing duplicated code that is similar to scrub_blocked_if_needed() but cannot call it because of small differences. It is also used by my next patch, which has the same need.

Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ab58115..a3d1546 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -332,11 +332,14 @@ static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
 	}
 }
 
-static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
+static void scrub_pause_on(struct btrfs_fs_info *fs_info)
 {
 	atomic_inc(&fs_info->scrubs_paused);
 	wake_up(&fs_info->scrub_pause_wait);
+}
 
+static void scrub_pause_off(struct btrfs_fs_info *fs_info)
+{
 	mutex_lock(&fs_info->scrub_lock);
 	__scrub_blocked_if_needed(fs_info);
 	atomic_dec(&fs_info->scrubs_paused);
@@ -345,6 +348,12 @@ static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
 	wake_up(&fs_info->scrub_pause_wait);
 }
 
+static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
+{
+	scrub_pause_on(fs_info);
+	scrub_pause_off(fs_info);
+}
+
 /*
  * used for workers that require transaction commits (i.e., for the
  * NOCOW case)
-- 
1.8.5.1
[PATCH 4/4] btrfs: Fix data checksum error cause by replace with io-load.
From: Zhao Lei

xfstests btrfs/070 sometimes fails.
On my test machine the failure rate is about 30%.
In another VM (VMware) the failure rate is about 50%.

Reason:
 btrfs/070 runs replace and defrag together with fsstress; afterwards,
 scrub finds checksum errors.
 Actually, this has no relationship with the defrag operation: replace
 with fsstress alone can trigger the bug.

 Debugging showed that new data written to the target device can be
 overwritten with old data from the source device by the replace code.
 To avoid this, we set the target block group read-only for the
 duration of the replace, so new data requested by other operations is
 not written to the same place the replace code is copying.

Before patch (4.1-rc3): 30% failed in 100 xfstests runs.
After patch: 0% failed in 300 xfstests runs.

Changelog v1->v2:
1: Update subject to reflect the problem being fixed.
2: Update description to explain why setting read-only fixes the problem.
3: Use a helper function to avoid a duplicated code block for setting
   the chunk ro.
All of the above were suggested by: David Sterba

Reported-by: Qu Wenruo
Suggested-by: Qu Wenruo
Signed-off-by: Qu Wenruo
Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 12
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 8da3459..e1ebf43 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3455,6 +3455,18 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		if (!cache)
 			goto skip;
 
+		/*
+		 * we need call btrfs_inc_block_group_ro() with scrubs_paused,
+		 * to avoid deadlock caused by:
+		 * btrfs_inc_block_group_ro()
+		 * -> btrfs_wait_for_commit()
+		 * -> btrfs_commit_transaction()
+		 * -> btrfs_scrub_pause()
+		 */
+		scrub_pause_on(fs_info);
+		btrfs_inc_block_group_ro(root, cache);
+		scrub_pause_off(fs_info);
+
 		dev_replace->cursor_right = found_key.offset + length;
 		dev_replace->cursor_left = found_key.offset;
 		dev_replace->item_needs_writeback = 1;
--
1.8.5.1
[PATCH 1/2] btrfs: Use ref_cnt for set_block_group_ro()
From: Zhao Lei

More than one code path calls set_block_group_ro() and restores rw on
failure.

The old code uses a single bit to store a block group's ro state, which
cannot support parallel callers (my debug log confirms that this case
exists).

This patch uses a ref count to store the ro state, and renames
set_block_group_ro/set_block_group_rw to
inc_block_group_ro/dec_block_group_ro.

Signed-off-by: Zhao Lei
---
 fs/btrfs/ctree.h | 6 +++---
 fs/btrfs/extent-tree.c | 42 +-
 fs/btrfs/relocation.c | 14 ++
 3 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6f364e1..74ce6fc 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1300,7 +1300,7 @@ struct btrfs_block_group_cache {
 	/* for raid56, this is a full stripe, without parity */
 	unsigned long full_stripe_len;
 
-	unsigned int ro:1;
+	unsigned int ro;
 	unsigned int iref:1;
 	unsigned int has_caching_ctl:1;
 	unsigned int removed:1;
@@ -3495,9 +3495,9 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
 void btrfs_block_rsv_release(struct btrfs_root *root,
 			     struct btrfs_block_rsv *block_rsv,
 			     u64 num_bytes);
-int btrfs_set_block_group_ro(struct btrfs_root *root,
+int btrfs_inc_block_group_ro(struct btrfs_root *root,
 			     struct btrfs_block_group_cache *cache);
-void btrfs_set_block_group_rw(struct btrfs_root *root,
+void btrfs_dec_block_group_ro(struct btrfs_root *root,
 			      struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7effed6..6a82ba0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8751,14 +8751,13 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
 	return flags;
 }
 
-static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
+static int inc_block_group_ro(struct btrfs_block_group_cache *cache, int force)
 {
 	struct btrfs_space_info *sinfo = cache->space_info;
 	u64 num_bytes;
 	u64 min_allocable_bytes;
 	int ret = -ENOSPC;
-
 
 	/*
 	 * We need some metadata space and system metadata space for
 	 * allocating chunks in some corner cases until we force to set
@@ -8775,6 +8774,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
 	spin_lock(&cache->lock);
 
 	if (cache->ro) {
+		cache->ro++;
 		ret = 0;
 		goto out;
 	}
@@ -8786,7 +8786,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
 	    sinfo->bytes_may_use + sinfo->bytes_readonly + num_bytes +
 	    min_allocable_bytes <= sinfo->total_bytes) {
 		sinfo->bytes_readonly += num_bytes;
-		cache->ro = 1;
+		cache->ro++;
 		list_add_tail(&cache->ro_list, &sinfo->ro_bgs);
 		ret = 0;
 	}
@@ -8796,7 +8796,7 @@ out:
 	return ret;
 }
 
-int btrfs_set_block_group_ro(struct btrfs_root *root,
+int btrfs_inc_block_group_ro(struct btrfs_root *root,
 			     struct btrfs_block_group_cache *cache)
 
 {
@@ -8804,8 +8804,6 @@ int btrfs_set_block_group_ro(struct btrfs_root *root,
 	u64 alloc_flags;
 	int ret;
 
-	BUG_ON(cache->ro);
-
 again:
 	trans = btrfs_join_transaction(root);
 	if (IS_ERR(trans))
@@ -8830,7 +8828,7 @@ again:
 	}
 
-	ret = set_block_group_ro(cache, 0);
+	ret = inc_block_group_ro(cache, 0);
 	if (!ret)
 		goto out;
 	alloc_flags = get_alloc_profile(root, cache->space_info->flags);
@@ -8838,7 +8836,7 @@ again:
 			     CHUNK_ALLOC_FORCE);
 	if (ret < 0)
 		goto out;
-	ret = set_block_group_ro(cache, 0);
+	ret = inc_block_group_ro(cache, 0);
 out:
 	if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM) {
 		alloc_flags = update_block_group_flags(root, cache->flags);
@@ -8899,7 +8897,7 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
 	return free_bytes;
 }
 
-void btrfs_set_block_group_rw(struct btrfs_root *root,
+void btrfs_dec_block_group_ro(struct btrfs_root *root,
 			      struct btrfs_block_group_cache *cache)
 {
 	struct btrfs_space_info *sinfo = cache->space_info;
@@ -8909,11 +8907,13 @@ void btrfs_set_block_group_rw(struct btrfs_root *root,
 
 	spin_lock(&sinfo->lock);
 	spin_lock(&cache->lock);
-	num_bytes = cache->key.offset - cache->reserved - cache->pinned -
-		    cache->bytes_super - btrfs_block_group_used(&cache->item);
-	sinfo->bytes_readonly -= num_bytes;
-	cache
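Why a counter is needed instead of a single bit can be shown with a minimal
user-space sketch. The struct and function names below are hypothetical and
locking is left out. The decrement side is an assumption, because the
dec_block_group_ro() hunk is cut off in this copy of the patch; it is
modelled here as the mirror image of the increment.

```c
#include <assert.h>

/* Hypothetical, simplified block group: only the read-only state. */
struct block_group {
	unsigned int ro;		/* holders that need it read-only */
	unsigned long long size;	/* bytes accounted as read-only */
};

/* Hypothetical, simplified space_info. */
struct space_info {
	unsigned long long bytes_readonly;
};

/* Take a read-only reference; only the first holder moves the bytes. */
void inc_ro(struct space_info *si, struct block_group *bg)
{
	if (bg->ro++ == 0)
		si->bytes_readonly += bg->size;
}

/* Drop a reference; only the last holder makes the group writable. */
void dec_ro(struct space_info *si, struct block_group *bg)
{
	assert(bg->ro > 0);
	if (--bg->ro == 0)
		si->bytes_readonly -= bg->size;
}
```

With a one-bit flag, the first of two parallel users to finish would clear
the flag and make the block group writable while the second user still
depends on it being read-only. With the counter, the group stays read-only
until the last user drops its reference.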
[PATCH 2/2] btrfs: Fix xfstests btrfs/070
From: Zhao Lei

xfstests btrfs/070 sometimes fails.
On my test machine the failure rate is about 30%.
In another VM (VMware) the failure rate is about 50%.

Reason:
 btrfs/070 runs replace and defrag together with fsstress; afterwards,
 scrub finds checksum errors.
 Actually, this has no relationship with the defrag operation: replace
 with fsstress alone can trigger the bug.

 Debugging showed that new data written to the target device can be
 overwritten with old data from the source device by the replace code.
 This can be fixed by setting the chunk ro during the replace operation.

Before patch (4.1-rc3): 30% failed in 100 xfstests runs.
After patch: 0% failed in 300 xfstests runs.

Signed-off-by: Qu Wenruo
Signed-off-by: Zhao Lei
---
 fs/btrfs/scrub.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ab58115..469c8a5 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3446,6 +3446,23 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		if (!cache)
 			goto skip;
 
+		/*
+		 * we need call btrfs_inc_block_group_ro() with scrubs_paused,
+		 * to avoid deadlock caused by:
+		 * btrfs_inc_block_group_ro()
+		 * -> btrfs_wait_for_commit()
+		 * -> btrfs_commit_transaction()
+		 * -> btrfs_scrub_pause()
+		 */
+		atomic_inc(&fs_info->scrubs_paused);
+		wake_up(&fs_info->scrub_pause_wait);
+		btrfs_inc_block_group_ro(root, cache);
+		mutex_lock(&fs_info->scrub_lock);
+		__scrub_blocked_if_needed(fs_info);
+		atomic_dec(&fs_info->scrubs_paused);
+		mutex_unlock(&fs_info->scrub_lock);
+		wake_up(&fs_info->scrub_pause_wait);
+
 		dev_replace->cursor_right = found_key.offset + length;
 		dev_replace->cursor_left = found_key.offset;
 		dev_replace->item_needs_writeback = 1;
@@ -3489,6 +3506,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		mutex_unlock(&fs_info->scrub_lock);
 		wake_up(&fs_info->scrub_pause_wait);
 
+		btrfs_dec_block_group_ro(root, cache);
+
 		btrfs_put_block_group(cache);
 		if (ret)
 			break;
--
1.8.5.1
[PATCH 9/9] btrfs: cleanup unused alloc_chunk varible
From: Zhao Lei

Remove the int alloc_chunk variable in btrfs_check_data_free_space();
it is never changed after initialization, so it is unnecessary.

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d5ec383..b009987 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3641,7 +3641,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 used;
-	int ret = 0, need_commit = 2, have_pinned_space, alloc_chunk = 1;
+	int ret = 0, need_commit = 2, have_pinned_space;
 
 	/* make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, root->sectorsize);
@@ -3669,7 +3669,7 @@ again:
 		 * if we don't have enough free bytes in this space then we need
 		 * to alloc a new chunk.
 		 */
-		if (!data_sinfo->full && alloc_chunk) {
+		if (!data_sinfo->full) {
 			u64 alloc_target;
 
 			data_sinfo->force_alloc = CHUNK_ALLOC_FORCE;
--
1.8.5.1
[PATCH 5/9] btrfs: add WARN_ON() to check is space_info op current
From: Zhao Lei

The calculation of space_info's values is somewhat complex and
bug-prone; add WARN_ON()s to help debugging.

Changelog v1->v2:
Put WARN_ON()s under the ENOSPC_DEBUG mount option.
Suggested by: David Sterba

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e04ea1f..203ac63 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9464,9 +9464,19 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 	spin_lock(&block_group->space_info->lock);
 	list_del_init(&block_group->ro_list);
+
+	if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
+		WARN_ON(block_group->space_info->total_bytes
+			< block_group->key.offset);
+		WARN_ON(block_group->space_info->bytes_readonly
+			< block_group->key.offset);
+		WARN_ON(block_group->space_info->disk_total
+			< block_group->key.offset * factor);
+	}
 	block_group->space_info->total_bytes -= block_group->key.offset;
 	block_group->space_info->bytes_readonly -= block_group->key.offset;
 	block_group->space_info->disk_total -= block_group->key.offset * factor;
+
 	spin_unlock(&block_group->space_info->lock);
 
 	memcpy(&key, &block_group->key, sizeof(key));
--
1.8.5.1
[PATCH 6/9] btrfs: Fix NO_SPACE bug caused by delayed-iput
From: Zhao Lei

Steps to reproduce:
 while true; do
   dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
   rm /btrfs_dir/file
   sync
 done

 And we'll see dd fail because btrfs returns NO_SPACE.

Reason:
 Normally, btrfs_commit_transaction() calls btrfs_run_delayed_iputs()
 at its end to free fs space for the next write, but sometimes that
 work is not done in time, because the btrfs-cleaner thread took the
 delayed iputs off the list earlier, but runs iput() only after the
 next write.

 This is the log:
 [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin

 [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
 [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
 [ 2569.087554] comm=sync func=btrfs_commit_transaction() end

 [ 2569.191081] comm=dd begin
 [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28

 [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
 [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
 ...
 [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
 [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end

Fix:
 Make btrfs_commit_transaction() wait at its end until the delayed
 iputs currently being run by btrfs-cleaner are done.

Test:
 Using a script similar to the above (more complex),
 before patch:
   7 failed in 100 * 20 loops.
 after patch:
   0 failed in 100 * 20 loops.

Signed-off-by: Zhao Lei
---
 fs/btrfs/ctree.h | 1 +
 fs/btrfs/disk-io.c | 3 ++-
 fs/btrfs/extent-tree.c | 6 ++
 fs/btrfs/inode.c | 4
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f9c89ca..54d4d78 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1513,6 +1513,7 @@ struct btrfs_fs_info {
 
 	spinlock_t delayed_iput_lock;
 	struct list_head delayed_iputs;
+	struct rw_semaphore delayed_iput_sem;
 
 	/* this protects tree_mod_seq_list */
 	spinlock_t tree_mod_seq_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 639f266..6867471 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2241,11 +2241,12 @@ int open_ctree(struct super_block *sb,
 	spin_lock_init(&fs_info->qgroup_op_lock);
 	spin_lock_init(&fs_info->buffer_lock);
 	spin_lock_init(&fs_info->unused_bgs_lock);
-	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	rwlock_init(&fs_info->tree_mod_log_lock);
+	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
 	seqlock_init(&fs_info->profiles_lock);
+	init_rwsem(&fs_info->delayed_iput_sem);
 
 	init_completion(&fs_info->kobj_unregister);
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 203ac63..6fd7dca 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3732,6 +3732,12 @@ commit_trans:
 			ret = btrfs_commit_transaction(trans, root);
 			if (ret)
 				return ret;
+			/*
+			 * make sure that all running delayed iput are
+			 * done
+			 */
+			down_write(&root->fs_info->delayed_iput_sem);
+			up_write(&root->fs_info->delayed_iput_sem);
 			goto again;
 		} else {
 			btrfs_end_transaction(trans, root);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d2e732d..34d10be 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3110,6 +3110,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
 	if (empty)
 		return;
 
+	down_read(&fs_info->delayed_iput_sem);
+
 	spin_lock(&fs_info->delayed_iput_lock);
 	list_splice_init(&fs_info->delayed_iputs, &list);
 	spin_unlock(&fs_info->delayed_iput_lock);
@@ -3120,6 +3122,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
 		iput(delayed->inode);
 		kfree(delayed);
 	}
+
+	up_read(&root->fs_info->delayed_iput_sem);
 }
 
 /*
--
1.8.5.1
[PATCH 2/9] btrfs: Fix tail space processing in find_free_dev_extent()
From: Zhao Lei

This is another cause of the NO_SPACE case.

When the loop has already found enough free space and saved it in
max_hole_start/size, and the tail space contains a pending extent,
the original, still valid max_hole_start/size are reset on retry.

As a result, find_free_dev_extent() returns less space than is actually
available, and causes NO_SPACE in user programs.

Reviewed-by: Liu Bo
Signed-off-by: Zhao Lei
---
 fs/btrfs/volumes.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8222f6f..586824a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1136,11 +1136,11 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
-again:
+
 	max_hole_start = search_start;
 	max_hole_size = 0;
-	hole_size = 0;
 
+again:
 	if (search_start >= search_end || device->is_tgtdev_for_dev_replace) {
 		ret = -ENOSPC;
 		goto out;
@@ -1233,21 +1233,23 @@ next:
 	 * allocated dev extents, and when shrinking the device,
 	 * search_end may be smaller than search_start.
 	 */
-	if (search_end > search_start)
+	if (search_end > search_start) {
 		hole_size = search_end - search_start;
 
-	if (hole_size > max_hole_size) {
-		max_hole_start = search_start;
-		max_hole_size = hole_size;
-	}
+		if (contains_pending_extent(trans, device, &search_start,
+					    hole_size)) {
+			btrfs_release_path(path);
+			goto again;
+		}
 
-	if (contains_pending_extent(trans, device, &search_start, hole_size)) {
-		btrfs_release_path(path);
-		goto again;
+		if (hole_size > max_hole_size) {
+			max_hole_start = search_start;
+			max_hole_size = hole_size;
+		}
 	}
 
 	/* See above. */
-	if (hole_size < num_bytes)
+	if (max_hole_size < num_bytes)
 		ret = -ENOSPC;
 	else
 		ret = 0;
--
1.8.5.1
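The effect of resetting max_hole_start/size on retry can be illustrated
with a small user-space model. This is a simplified sketch, not the kernel
algorithm: struct extent, note_hole() and find_hole() are hypothetical
names, the allocated extents come from a sorted array instead of the device
tree, at most one pending extent is considered, and a retry simply skips to
the end of that pending extent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocated range [start, start + len) on a device. */
struct extent {
	unsigned long long start;
	unsigned long long len;
};

/* Remember [start, start + len) if it beats the largest hole so far. */
void note_hole(unsigned long long start, unsigned long long len,
	       unsigned long long *max_start, unsigned long long *max_len)
{
	if (len > *max_len) {
		*max_start = start;
		*max_len = len;
	}
}

/*
 * Return the size of the largest hole in [search_start, search_end) and
 * store its offset in *hole_start. "used" must be sorted by start and
 * non-overlapping; "pending" is at most one not-yet-committed extent
 * (NULL if none) that may overlap the tail space.
 */
unsigned long long find_hole(const struct extent *used, size_t n,
			     const struct extent *pending,
			     unsigned long long search_start,
			     unsigned long long search_end,
			     unsigned long long *hole_start)
{
	/* Initialised once, before any retry: this is the point of the fix. */
	unsigned long long max_start = search_start;
	unsigned long long max_len = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long long end = used[i].start + used[i].len;

		if (used[i].start > search_start)
			note_hole(search_start, used[i].start - search_start,
				  &max_start, &max_len);
		if (end > search_start)
			search_start = end;
	}

	/* Tail space after the last allocated extent. */
	while (search_end > search_start) {
		if (pending && pending->start < search_end &&
		    pending->start + pending->len > search_start) {
			/* Retry past the pending extent, keeping max_*. */
			search_start = pending->start + pending->len;
			pending = NULL;
			continue;
		}
		note_hole(search_start, search_end - search_start,
			  &max_start, &max_len);
		break;
	}

	*hole_start = max_start;
	return max_len;
}
```

In this model, a device [0, 1000) with extents at [0, 100) and [300, 400)
and a pending extent at [400, 950) has a 200-byte hole at offset 100 and a
50-byte tail. Because the maximum is kept across the retry, the 200-byte
hole is reported; resetting it on retry would report only the tail.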
[PATCH 8/9] btrfs: wait for delayed iputs on no space
From: Zhao Lei

btrfs reports no_space when we run the following write-and-delete loop:
 # FILE_SIZE_M=[ 75% of fs space ]
 # DEV=[ some dev ]
 # MNT=[ some dir ]
 #
 # mkfs.btrfs -f "$DEV"
 # mount -o nodatacow "$DEV" "$MNT"
 # for ((i = 0; i < 100; i++)); do dd if=/dev/zero of="$MNT"/file0 bs=1M count="$FILE_SIZE_M"; rm -f "$MNT"/file0; done
 #

Reason:
 iput() and evict() run after the pages have been written to the block
 device. If that writeback is not finished before the next write, the
 space of the "rm"ed file is not yet freed, which causes the above bug.

Fix:
 We could add the "-o flushoncommit" mount option to avoid the above
 bug, but that has a performance cost. Actually, we only need to wait
 for in-flight writes when no-space happens, which is what this patch
 does.

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0572f14..d5ec383 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3725,6 +3725,9 @@ commit_trans:
 	    !atomic_read(&root->fs_info->open_ioctl_trans)) {
 		need_commit--;
 
+		if (need_commit > 0)
+			btrfs_wait_ordered_roots(fs_info, -1);
+
 		trans = btrfs_join_transaction(root);
 		if (IS_ERR(trans))
 			return PTR_ERR(trans);
--
1.8.5.1
[PATCH 3/9] btrfs: Adjust commit-transaction condition to avoid NO_SPACE more
From: Zhao Lei

If there is any chance of making the write succeed, we should not give
up.

This patch adjusts the commit-transaction condition from:
  pinned >= wanted
to
  left + pinned >= wanted

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ebeedb4..644468b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3713,7 +3713,8 @@ alloc:
 		 * don't bother committing the transaction.
 		 */
 		if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
-					   bytes) < 0)
+					   used + bytes -
+					   data_sinfo->total_bytes) < 0)
 			have_pinned_space = 0;
 		spin_unlock(&data_sinfo->lock);
 
--
1.8.5.1
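The difference between the two conditions can be checked with a small
worked example. The function names below are hypothetical, and signed
64-bit integers are used so that "used + bytes - total" may go negative,
which is an assumption of this sketch rather than a statement about the
kernel types. With left = total - used, the new rule
"left + pinned >= wanted" is the same as "pinned >= used + bytes - total".

```c
#include <assert.h>
#include <stdint.h>

/* Old rule: commit only if the pinned space alone covers the request. */
int worth_commit_old(int64_t pinned, int64_t bytes)
{
	return pinned >= bytes;
}

/*
 * New rule: commit if the space still left plus the pinned space covers
 * the request, i.e. pinned >= used + bytes - total.
 */
int worth_commit_new(int64_t total, int64_t used, int64_t pinned,
		     int64_t bytes)
{
	return pinned >= used + bytes - total;
}
```

For example, with total = 1000, used = 900, pinned = 60 and a request of
150 bytes, the old rule gives up because 60 < 150, while the new rule
commits because the 100 bytes left plus the 60 pinned bytes cover the
request.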
[PATCH 7/9] btrfs: Support busy loop of write and delete
From: Zhao Lei

Reproduce:
 while true; do
   dd if=/dev/zero of=/mnt/btrfs/file count=[75% fs_size]
   rm /mnt/btrfs/file
 done
 Then we can see the above loop fail with NO_SPACE.

This is a long-standing problem that has existed since the very
beginning, because the delayed iputs queued by rm are not run.

We already have commit_transaction() in the alloc_space code, but it is
not triggered in the above case.
This patch triggers commit_transaction() to run the delayed iputs and
refresh the pinned space so that the write succeeds.

It is based on the previous fix of delayed-iput in commit_transaction()
and needs to be applied on top of:
 btrfs: Fix NO_SPACE bug caused by delayed-iput

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6fd7dca..0572f14 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3641,13 +3641,13 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 used;
-	int ret = 0, committed = 0, have_pinned_space = 1, alloc_chunk = 1;
+	int ret = 0, need_commit = 2, have_pinned_space, alloc_chunk = 1;
 
 	/* make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, root->sectorsize);
 
 	if (btrfs_is_free_space_inode(inode)) {
-		committed = 1;
+		need_commit = 0;
 		ASSERT(current->journal_info);
 	}
 
@@ -3697,8 +3697,10 @@ alloc:
 			if (ret < 0) {
 				if (ret != -ENOSPC)
 					return ret;
-				else
+				else {
+					have_pinned_space = 1;
 					goto commit_trans;
+				}
 			}
 
 			if (!data_sinfo)
@@ -3712,23 +3714,23 @@ alloc:
 		 * allocation, and no removed chunk in current transaction,
 		 * don't bother committing the transaction.
 		 */
-		if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
-					   used + bytes -
-					   data_sinfo->total_bytes) < 0)
-			have_pinned_space = 0;
+		have_pinned_space = percpu_counter_compare(
+					&data_sinfo->total_bytes_pinned,
+					used + bytes - data_sinfo->total_bytes);
 		spin_unlock(&data_sinfo->lock);
 
 		/* commit the current transaction and try again */
 commit_trans:
-		if (!committed &&
+		if (need_commit &&
 		    !atomic_read(&root->fs_info->open_ioctl_trans)) {
-			committed = 1;
+			need_commit--;
 
 			trans = btrfs_join_transaction(root);
 			if (IS_ERR(trans))
 				return PTR_ERR(trans);
-			if (have_pinned_space ||
-			    trans->transaction->have_free_bgs) {
+			if (have_pinned_space >= 0 ||
+			    trans->transaction->have_free_bgs ||
+			    need_commit > 0) {
 				ret = btrfs_commit_transaction(trans, root);
 				if (ret)
 					return ret;
--
1.8.5.1
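The change from a one-shot "committed" flag to a "need_commit = 2" budget
can be modelled with a short retry loop. Everything below is hypothetical:
struct fake_fs stands in for a filesystem whose deleted space only becomes
allocatable after a number of commits, on the assumption (taken from the
description above) that one commit runs the delayed iputs and a later one
refreshes the pinned space. It is a sketch of the retry budget only, not
of btrfs_check_data_free_space().

```c
#include <assert.h>

/* Hypothetical filesystem: space is usable once the counter hits zero. */
struct fake_fs {
	int commits_until_free;
};

/* Try to allocate: 0 on success, -1 when there is no space yet. */
int fake_try_alloc(void *ctx)
{
	struct fake_fs *fs = ctx;

	return fs->commits_until_free == 0 ? 0 : -1;
}

/* Commit: brings the deleted space one step closer to being usable. */
void fake_commit(void *ctx)
{
	struct fake_fs *fs = ctx;

	if (fs->commits_until_free > 0)
		fs->commits_until_free--;
}

/*
 * Retry the allocation, committing between attempts, until it succeeds
 * or the commit budget is used up.
 */
int reserve_with_retries(void *ctx, int (*try_alloc)(void *),
			 void (*commit)(void *), int need_commit)
{
	for (;;) {
		if (try_alloc(ctx) == 0)
			return 0;
		if (need_commit == 0)
			return -1;
		need_commit--;
		commit(ctx);
	}
}
```

With a budget of one commit the model still fails when two commits are
needed to make the space usable; with a budget of two it succeeds.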
[PATCH 4/9] btrfs: Set relative data on clear btrfs_block_group_cache->pinned
From: Zhao Lei

Bug1:
 space_info->bytes_readonly was set to a very large (negative) value in
 btrfs_remove_block_group().

 Reason:
 The current code sets block_group_cache->pinned = 0 in
 btrfs_delete_unused_bgs(), but that space was not added to
 space_info->bytes_readonly.

 Then in btrfs_remove_block_group():
 block_group->space_info->bytes_readonly -= block_group->key.offset;

 We can see the following values in the trace:
 btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728

Bug2:
 space_info->total_bytes_pinned grows to a value larger than the fs
 size. On a 1.2G fs, we get the following trace log:
 at first:
 ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496
 after some op:
 ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648
 after some op:
 ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064
 ...

 Reason:
 Similar to bug1, we also need to adjust space_info->total_bytes_pinned
 in the above code block.

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 644468b..e04ea1f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9654,8 +9654,18 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		mutex_unlock(&fs_info->unused_bg_unpin_mutex);
 
 		/* Reset pinned so btrfs_put_block_group doesn't complain */
+		spin_lock(&space_info->lock);
+		spin_lock(&block_group->lock);
+
+		space_info->bytes_pinned -= block_group->pinned;
+		space_info->bytes_readonly += block_group->pinned;
+		percpu_counter_add(&space_info->total_bytes_pinned,
+				   -block_group->pinned);
 		block_group->pinned = 0;
 
+		spin_unlock(&block_group->lock);
+		spin_unlock(&space_info->lock);
+
 		/*
 		 * Btrfs_remove_chunk will abort the transaction if things go
 		 * horribly wrong.
--
1.8.5.1
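Bug1 is an unsigned underflow, which a small user-space model makes
visible. The struct and function names are hypothetical, locking and the
percpu total_bytes_pinned counter are left out, and the pinned amount used
in the example (121634816, the difference between the two traced values)
is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified space_info counters. */
struct space_info {
	uint64_t bytes_pinned;
	uint64_t bytes_readonly;
};

/* Hypothetical, simplified block group. */
struct block_group {
	uint64_t size;		/* key.offset in the patch */
	uint64_t pinned;
};

/*
 * What the patch adds before clearing block_group->pinned: move the
 * pinned bytes over to the read-only counter.
 */
void clear_pinned(struct space_info *si, struct block_group *bg)
{
	si->bytes_pinned -= bg->pinned;
	si->bytes_readonly += bg->pinned;
	bg->pinned = 0;
}

/* Removal subtracts the whole block group from bytes_readonly. */
void remove_block_group(struct space_info *si, const struct block_group *bg)
{
	si->bytes_readonly -= bg->size;
}
```

If pinned is simply zeroed without the transfer, bytes_readonly is
12582912 when 134217728 is subtracted, and the unsigned counter wraps to a
huge value. With the transfer, bytes_readonly first grows to the full block
group size and the subtraction ends at zero.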
[PATCH 1/9] btrfs: fix condition of commit transaction
From: Zhao Lei

The old code skips committing the transaction when we don't have enough
pinned space. But there is another case: if block groups were freed in
the current transaction, a commit may make alloc_chunk succeed.

This patch changes the condition to:
 if (have_free_bg || have_pinned_space)
     commit_transaction()

Confirmed the above behavior by printk before and after the patch.

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8b353ad..ebeedb4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3641,7 +3641,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 used;
-	int ret = 0, committed = 0, alloc_chunk = 1;
+	int ret = 0, committed = 0, have_pinned_space = 1, alloc_chunk = 1;
 
 	/* make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, root->sectorsize);
@@ -3709,11 +3709,12 @@ alloc:
 
 		/*
 		 * If we don't have enough pinned space to deal with this
-		 * allocation don't bother committing the transaction.
+		 * allocation, and no removed chunk in current transaction,
+		 * don't bother committing the transaction.
 		 */
 		if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
 					   bytes) < 0)
-			committed = 1;
+			have_pinned_space = 0;
 		spin_unlock(&data_sinfo->lock);
 
 		/* commit the current transaction and try again */
@@ -3725,10 +3726,15 @@ commit_trans:
 		trans = btrfs_join_transaction(root);
 		if (IS_ERR(trans))
 			return PTR_ERR(trans);
-		ret = btrfs_commit_transaction(trans, root);
-		if (ret)
-			return ret;
-		goto again;
+		if (have_pinned_space ||
+		    trans->transaction->have_free_bgs) {
+			ret = btrfs_commit_transaction(trans, root);
+			if (ret)
+				return ret;
+			goto again;
+		} else {
+			btrfs_end_transaction(trans, root);
+		}
 	}
 
 	trace_btrfs_space_reservation(root->fs_info,
--
1.8.5.1
[PATCH v2 0/9] btrfs: Fix no_space on dd and rm loop
From: Zhao Lei

This is v2 of resend-fix-no-space.

Most of these patches were previously sent as single patches; I am
resending them as a patchset to make them easier to access.

Note that
"Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole"
from Forrest Liu in:
 https://patchwork.kernel.org/patch/5800231/
is also needed to fix all known no_space bugs.

Changelog v1->v2:
1: Rebased on top of v4.0-rc7
2: Fixed a lock problem reported by: 'Tsutomu Itoh'
3: Add Reviewed-by: Liu Bo to
   [PATCH 2/9] btrfs:

Tested by running a busy dd and rm loop script 2000 times.
I'll add an xfstest for this case later.

This is available in the fix_no_space branch of my tree:
 git://github.com/zhaoleidd/btrfs.git
It is also included in the integration-for-chris branch of the same
tree.

Thanks
Zhaolei

Zhao Lei (9):
  btrfs: fix condition of commit transaction
  btrfs: Fix tail space processing in find_free_dev_extent()
  btrfs: Adjust commit-transaction condition to avoid NO_SPACE more
  btrfs: Set relative data on clear btrfs_block_group_cache->pinned
  btrfs: add WARN_ON() to check is space_info op current
  btrfs: Fix NO_SPACE bug caused by delayed-iput
  btrfs: Support busy loop of write and delete
  btrfs: wait for delayed iputs on no space
  btrfs: cleanup unused alloc_chunk varible

 fs/btrfs/ctree.h | 1 +
 fs/btrfs/disk-io.c | 3 ++-
 fs/btrfs/extent-tree.c | 66 +++---
 fs/btrfs/inode.c | 4 +++
 fs/btrfs/volumes.c | 24 +-
 5 files changed, 72 insertions(+), 26 deletions(-)

--
1.8.5.1
[PATCH 7/9] btrfs: Support busy loop of write and delete
From: Zhao Lei

Reproduce:
 while true; do
   dd if=/dev/zero of=/mnt/btrfs/file count=[75% fs_size]
   rm /mnt/btrfs/file
 done
 Then we can see the above loop fail with NO_SPACE.

This has been a long-standing problem since the very beginning, because
the delayed iputs after rm are not run.

We already have commit_transaction() in the alloc_space code, but it is
not triggered in the above case.
This patch triggers commit_transaction() to run the delayed iputs and
refresh the pinned space so that the write succeeds.

It is based on the previous fix of delayed-iput in commit_transaction(),
and needs to be applied on top of:
 btrfs: Fix NO_SPACE bug caused by delayed-iput

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 203ac63..5683736 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3641,13 +3641,13 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 used;
-	int ret = 0, committed = 0, have_pinned_space = 1, alloc_chunk = 1;
+	int ret = 0, need_commit = 2, have_pinned_space, alloc_chunk = 1;

 	/* make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, root->sectorsize);

 	if (btrfs_is_free_space_inode(inode)) {
-		committed = 1;
+		need_commit = 0;
 		ASSERT(current->journal_info);
 	}

@@ -3697,8 +3697,10 @@ alloc:
 			if (ret < 0) {
 				if (ret != -ENOSPC)
 					return ret;
-				else
+				else {
+					have_pinned_space = 1;
 					goto commit_trans;
+				}
 			}

 			if (!data_sinfo)
@@ -3712,23 +3714,23 @@ alloc:
 		 * allocation, and no removed chunk in current transaction,
 		 * don't bother committing the transaction.
 		 */
-		if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
-					   used + bytes -
-					   data_sinfo->total_bytes) < 0)
-			have_pinned_space = 0;
+		have_pinned_space = percpu_counter_compare(
+					&data_sinfo->total_bytes_pinned,
+					used + bytes - data_sinfo->total_bytes);
 		spin_unlock(&data_sinfo->lock);

 		/* commit the current transaction and try again */
 commit_trans:
-		if (!committed &&
+		if (need_commit &&
 		    !atomic_read(&root->fs_info->open_ioctl_trans)) {
-			committed = 1;
+			need_commit--;

 			trans = btrfs_join_transaction(root);
 			if (IS_ERR(trans))
 				return PTR_ERR(trans);
-			if (have_pinned_space ||
-			    trans->transaction->have_free_bgs) {
+			if (have_pinned_space >= 0 ||
+			    trans->transaction->have_free_bgs ||
+			    need_commit > 0) {
 				ret = btrfs_commit_transaction(trans, root);
 				if (ret)
 					return ret;
--
1.8.5.1
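The retry logic in this patch replaces the one-shot "committed" flag with a small countdown: the first failure always commits, the second commits only if pinned space or freed block groups make it worthwhile. A simplified user-space model of that decision, under the assumption that have_pinned_space carries the sign returned by the counter comparison (negative means "not enough"); struct retry_state and try_commit() are hypothetical names, and the open_ioctl_trans check is left out:

```c
#include <stdbool.h>

/* Countdown of commit attempts that are still allowed (2 in the patch). */
struct retry_state {
	int need_commit;
};

/*
 * Returns true if the caller should commit the transaction and retry the
 * allocation, false if it should give up with ENOSPC.
 */
bool try_commit(struct retry_state *st, int have_pinned_space,
		bool have_free_bgs)
{
	if (st->need_commit == 0)
		return false;
	st->need_commit--;
	/* First attempt (need_commit still > 0) commits unconditionally. */
	return have_pinned_space >= 0 || have_free_bgs || st->need_commit > 0;
}
```

Starting from need_commit = 2 with no pinned space, the model commits once and then gives up; if the first commit releases enough pinned space, the second attempt commits again.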
[PATCH 4/9] btrfs: Set relative data on clear btrfs_block_group_cache->pinned
From: Zhao Lei

Bug1:
 space_info->bytes_readonly was set to a very large (negative) value in
 btrfs_remove_block_group().

 Reason:
 The current code sets block_group_cache->pinned = 0 in
 btrfs_delete_unused_bgs(), but that space was not added to
 space_info->bytes_readonly.
 Then in btrfs_remove_block_group():
 block_group->space_info->bytes_readonly -= block_group->key.offset;

 We can see the following value in the trace:
 btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728

Bug2:
 space_info->total_bytes_pinned grows to a value larger than the fs size.
 In a 1.2G fs, we can get the following trace log:
 at first:
 ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496
 after some op:
 ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648
 after some op:
 ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064
 ...

 Reason:
 Similar to bug1, we also need to adjust space_info->total_bytes_pinned
 in the above code block.

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 644468b..e04ea1f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9654,8 +9654,18 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		mutex_unlock(&fs_info->unused_bg_unpin_mutex);

 		/* Reset pinned so btrfs_put_block_group doesn't complain */
+		spin_lock(&space_info->lock);
+		spin_lock(&block_group->lock);
+
+		space_info->bytes_pinned -= block_group->pinned;
+		space_info->bytes_readonly += block_group->pinned;
+		percpu_counter_add(&space_info->total_bytes_pinned,
+				   -block_group->pinned);
 		block_group->pinned = 0;

+		spin_unlock(&block_group->lock);
+		spin_unlock(&space_info->lock);
+
 		/*
 		 * Btrfs_remove_chunk will abort the transaction if things go
 		 * horribly wrong.
--
1.8.5.1
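The accounting fix in this patch moves the block group's pinned bytes from the "pinned" counters to the "readonly" counter before zeroing block_group->pinned, so that the later subtraction of key.offset in btrfs_remove_block_group() balances. A lock-free user-space sketch of that transfer; the struct and function names are hypothetical models, and the two spinlocks taken in the real hunk are deliberately left out:

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct space_info_model {
	uint64_t bytes_pinned;
	uint64_t bytes_readonly;
	int64_t total_bytes_pinned;	/* models the percpu counter */
};

struct block_group_model {
	uint64_t pinned;
};

/* Move the group's pinned bytes to readonly, then reset them. */
void reset_block_group_pinned(struct space_info_model *si,
			      struct block_group_model *bg)
{
	si->bytes_pinned -= bg->pinned;
	si->bytes_readonly += bg->pinned;
	si->total_bytes_pinned -= (int64_t)bg->pinned;
	bg->pinned = 0;
}
```

Without the transfer, only the last assignment happens, which matches the symptoms described under Bug1 and Bug2.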
[PATCH 9/9] btrfs: cleanup unused alloc_chunk varible
From: Zhao Lei

Remove the int alloc_chunk variable in btrfs_check_data_free_space();
it is no longer necessary.

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 05747d2..b83060f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3641,7 +3641,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 used;
-	int ret = 0, need_commit = 2, have_pinned_space, alloc_chunk = 1;
+	int ret = 0, need_commit = 2, have_pinned_space;

 	/* make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, root->sectorsize);
@@ -3669,7 +3669,7 @@ again:
 		 * if we don't have enough free bytes in this space then we need
 		 * to alloc a new chunk.
 		 */
-		if (!data_sinfo->full && alloc_chunk) {
+		if (!data_sinfo->full) {
 			u64 alloc_target;

 			data_sinfo->force_alloc = CHUNK_ALLOC_FORCE;
--
1.8.5.1
[PATCH 8/9] btrfs: wait for delayed iputs on no space
From: Zhao Lei

btrfs will report no_space when we run the following write-and-delete
file loop:
 # FILE_SIZE_M=[ 75% of fs space ]
 # DEV=[ some dev ]
 # MNT=[ some dir ]
 #
 # mkfs.btrfs -f "$DEV"
 # mount -o nodatacow "$DEV" "$MNT"
 # for ((i = 0; i < 100; i++)); do dd if=/dev/zero of="$MNT"/file0 bs=1M count="$FILE_SIZE_M"; rm -f "$MNT"/file0; done
 #

Reason:
 iput() and evict() run after the pages are written to the block device.
 If that writeback has not finished before the next write, the space of
 the "rm"ed file is not yet freed, which causes the above bug.

Fix:
 We can add the "-o flushoncommit" mount option to avoid the above bug,
 but it has a performance cost. Actually, we only need to wait for
 in-flight writes when no-space happens, which is what this patch does.

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5683736..05747d2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3725,6 +3725,9 @@ commit_trans:
 		    !atomic_read(&root->fs_info->open_ioctl_trans)) {
 			need_commit--;

+			if (need_commit > 0)
+				btrfs_wait_ordered_roots(fs_info, -1);
+
 			trans = btrfs_join_transaction(root);
 			if (IS_ERR(trans))
 				return PTR_ERR(trans);
--
1.8.5.1
[PATCH 2/9] btrfs: Fix tail space processing in find_free_dev_extent()
From: Zhao Lei

This is another cause of the NO_SPACE case.

When we have already found enough free space in the loop and saved it
to max_hole_start/size, and the tail space contains a pending extent,
the original, innocent max_hole_start/size are reset on retry.

As a result, find_free_dev_extent() returns less space than is
available, and causes NO_SPACE in the user program.

Signed-off-by: Zhao Lei
---
 fs/btrfs/volumes.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8222f6f..586824a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1136,11 +1136,11 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
-again:
+
 	max_hole_start = search_start;
 	max_hole_size = 0;
-	hole_size = 0;

+again:
 	if (search_start >= search_end || device->is_tgtdev_for_dev_replace) {
 		ret = -ENOSPC;
 		goto out;
@@ -1233,21 +1233,23 @@ next:
 	 * allocated dev extents, and when shrinking the device,
 	 * search_end may be smaller than search_start.
 	 */
-	if (search_end > search_start)
+	if (search_end > search_start) {
 		hole_size = search_end - search_start;

-	if (hole_size > max_hole_size) {
-		max_hole_start = search_start;
-		max_hole_size = hole_size;
-	}
+		if (contains_pending_extent(trans, device, &search_start,
+					    hole_size)) {
+			btrfs_release_path(path);
+			goto again;
+		}

-	if (contains_pending_extent(trans, device, &search_start, hole_size)) {
-		btrfs_release_path(path);
-		goto again;
+		if (hole_size > max_hole_size) {
+			max_hole_start = search_start;
+			max_hole_size = hole_size;
+		}
 	}

 	/* See above. */
-	if (hole_size < num_bytes)
+	if (max_hole_size < num_bytes)
 		ret = -ENOSPC;
 	else
 		ret = 0;
--
1.8.5.1
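The key point of this fix is that the running maximum is initialised once, before the retry label, so a retry triggered by a pending extent in the tail no longer discards a good hole found earlier. A much-simplified user-space model of that control flow; struct hole, pick_hole() and the single pending_end parameter are hypothetical, and the real function walks the device tree rather than an array:

```c
#include <stddef.h>
#include <stdint.h>

struct hole {
	uint64_t start;
	uint64_t size;
};

/*
 * holes[0..n) are holes already seen before the tail; the tail hole spans
 * [tail_start, tail_end). A pending extent is assumed to cover the tail up
 * to pending_end. Returns 0 and fills *best if a hole of at least num_bytes
 * exists, -1 otherwise.
 */
int pick_hole(const struct hole *holes, size_t n, uint64_t tail_start,
	      uint64_t tail_end, uint64_t pending_end, uint64_t num_bytes,
	      struct hole *best)
{
	uint64_t search_start = tail_start;
	size_t i;

	/* Initialised once, before the retry label. */
	best->start = 0;
	best->size = 0;
	for (i = 0; i < n; i++)
		if (holes[i].size > best->size)
			*best = holes[i];
again:
	if (tail_end > search_start) {
		uint64_t hole_size = tail_end - search_start;

		if (pending_end > search_start) {
			/* Tail overlaps a pending extent: skip past it. */
			search_start = pending_end;
			goto again;	/* best is kept across the retry */
		}
		if (hole_size > best->size) {
			best->start = search_start;
			best->size = hole_size;
		}
	}
	return best->size < num_bytes ? -1 : 0;
}
```

With one earlier hole of 100 bytes and a 60-byte tail that is mostly pending, a request for 80 bytes still succeeds from the earlier hole.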
[PATCH 6/9] btrfs: Fix NO_SPACE bug caused by delayed-iput
From: Zhao Lei

Steps to reproduce:
 while true; do
   dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
   rm /btrfs_dir/file
   sync
 done

 And we'll see dd fail because btrfs returns NO_SPACE.

Reason:
 Normally, btrfs_commit_transaction() calls btrfs_run_delayed_iputs()
 at the end to free fs space for the next write, but sometimes that work
 is not done in time, because the btrfs-cleaner thread took the delayed
 iputs off the list earlier but does the iput() only after the next
 write.

 This is the log:
 [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin
 [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
 [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
 [ 2569.087554] comm=sync func=btrfs_commit_transaction() end
 [ 2569.191081] comm=dd begin
 [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28
 [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
 [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
 ...
 [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
 [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end

Fix:
 Make btrfs_commit_transaction() wait at its end until the delayed
 iputs currently being run by btrfs-cleaner are done.

Test:
 Using a script similar to the above (more complex):
 before patch: 7 failed in 100 * 20 loop.
 after patch: 0 failed in 100 * 20 loop.

Signed-off-by: Zhao Lei
---
 fs/btrfs/ctree.h | 1 +
 fs/btrfs/disk-io.c | 5 -
 fs/btrfs/transaction.c | 6 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f9c89ca..54d4d78 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1513,6 +1513,7 @@ struct btrfs_fs_info {

 	spinlock_t delayed_iput_lock;
 	struct list_head delayed_iputs;
+	struct rw_semaphore delayed_iput_sem;

 	/* this protects tree_mod_seq_list */
 	spinlock_t tree_mod_seq_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 639f266..df40f60 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1778,7 +1778,9 @@ static int cleaner_kthread(void *arg)
 			goto sleep;
 		}

+		down_read(&root->fs_info->delayed_iput_sem);
 		btrfs_run_delayed_iputs(root);
+		up_read(&root->fs_info->delayed_iput_sem);
 		btrfs_delete_unused_bgs(root->fs_info);
 		again = btrfs_clean_one_deleted_snapshot(root);
 		mutex_unlock(&root->fs_info->cleaner_mutex);
@@ -2241,11 +2243,12 @@ int open_ctree(struct super_block *sb,
 	spin_lock_init(&fs_info->qgroup_op_lock);
 	spin_lock_init(&fs_info->buffer_lock);
 	spin_lock_init(&fs_info->unused_bgs_lock);
-	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	rwlock_init(&fs_info->tree_mod_log_lock);
+	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
 	seqlock_init(&fs_info->profiles_lock);
+	init_rwsem(&fs_info->delayed_iput_sem);

 	init_completion(&fs_info->kobj_unregister);
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8be4278..d18991f 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2076,8 +2076,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,

 	kmem_cache_free(btrfs_trans_handle_cachep, trans);

-	if (current != root->fs_info->transaction_kthread)
+	if (current != root->fs_info->transaction_kthread) {
 		btrfs_run_delayed_iputs(root);
+		/* make sure that all running delayed iput are done */
+		down_write(&root->fs_info->delayed_iput_sem);
+		up_write(&root->fs_info->delayed_iput_sem);
+	}

 	return ret;

--
1.8.5.1
[PATCH 0/9] btrfs: Fix no_space on dd and rm loop
From: Zhao Lei

I am resending this patch set with some changes:
1: Move a cleanup patch for btrfs_check_data_free_space() into
2: Rebased on top of v4.0-rc5
3: Fixed a lock problem reported by:
   'Tsutomu Itoh'

Tested with a busy dd and rm loop script, 2000 iterations.
Confirmed that the problem exists in v4.0-rc5 and does not occur on top
of this patchset.

I'll add xfstests for this case later.

This is available at the fix_no_space branch on my tree:
 git://github.com/zhaoleidd/btrfs.git
It is also included in the integration-for-chris branch in the above tree.

Thanks
Zhaolei

Zhao Lei (9):
  btrfs: fix condition of commit transaction
  btrfs: Fix tail space processing in find_free_dev_extent()
  btrfs: Adjust commit-transaction condition to avoid NO_SPACE more
  btrfs: Set relative data on clear btrfs_block_group_cache->pinned
  btrfs: add WARN_ON() to check is space_info op current
  btrfs: Fix NO_SPACE bug caused by delayed-iput
  btrfs: Support busy loop of write and delete
  btrfs: wait for delayed iputs on no space
  btrfs: cleanup unused alloc_chunk varible

 fs/btrfs/ctree.h | 1 +
 fs/btrfs/disk-io.c | 5 -
 fs/btrfs/extent-tree.c | 60 ++
 fs/btrfs/transaction.c | 6 -
 fs/btrfs/volumes.c | 24 +++-
 5 files changed, 69 insertions(+), 27 deletions(-)

--
1.8.5.1
[PATCH] btrfs: wait for delayed iputs on no space
From: Zhao Lei

btrfs will report no_space when we run the following write-and-delete
file loop:
 # FILE_SIZE_M=[ 75% of fs space ]
 # DEV=[ some dev ]
 # MNT=[ some dir ]
 #
 # mkfs.btrfs -f "$DEV"
 # mount -o nodatacow "$DEV" "$MNT"
 # for ((i = 0; i < 100; i++)); do dd if=/dev/zero of="$MNT"/file0 bs=1M count="$FILE_SIZE_M"; rm -f "$MNT"/file0; done
 #

Reason:
 iput() and evict() run after the pages are written to the block device.
 If that writeback has not finished before the next write, the space of
 the "rm"ed file is not yet freed, which causes the above bug.

Fix:
 We can add the "-o flushoncommit" mount option to avoid the above bug,
 but it has a performance cost. Actually, we only need to wait for
 in-flight writes when no-space happens, which is what this patch does.

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6c1e211..94fb15f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3683,6 +3683,9 @@ commit_trans:
 		    !atomic_read(&root->fs_info->open_ioctl_trans)) {
 			need_commit--;

+			if (need_commit > 0)
+				btrfs_wait_ordered_roots(fs_info, -1);
+
 			trans = btrfs_join_transaction(root);
 			if (IS_ERR(trans))
 				return PTR_ERR(trans);
--
1.8.5.1
[PATCH] btrfs: wait for delayed iputs on no space
From: Zhao Lei

This is another fix for the no_space case.

All patches fixing the no_space bug are available at the fix_no_space
branch on:
 git://github.com/zhaoleidd/btrfs

Any suggestions are welcome.

Zhao Lei (1):
  btrfs: wait for delayed iputs on no space

 fs/btrfs/extent-tree.c | 3 +++
 1 file changed, 3 insertions(+)

--
1.8.5.1
[PATCH 5/6] btrfs: Use raid_write_end_io for scrub
From: Zhao Lei

There is no need for an additional end_io function for scrub; it can
use the existing raid_write_end_io() instead.

This patch also fixes some wrong comments.

Signed-off-by: Zhao Lei
---
 fs/btrfs/raid56.c | 36
 1 file changed, 8 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0a40d07..2285e78 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -897,6 +897,7 @@ static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, int err, int uptodate)
 static void raid_write_end_io(struct bio *bio, int err)
 {
 	struct btrfs_raid_bio *rbio = bio->bi_private;
+	int max_errors;

 	if (err)
 		fail_bio_stripe(rbio, bio);
@@ -906,10 +907,13 @@ static void raid_write_end_io(struct bio *bio, int err)
 	if (!atomic_dec_and_test(&rbio->stripes_pending))
 		return;

-	err = 0;
+	/* OK, we have wrote all the stripes we need to. */
+	if (rbio->operation == BTRFS_RBIO_PARITY_SCRUB)
+		max_errors = 0;
+	else
+		max_errors = rbio->bbio->max_errors;

-	/* OK, we have read all the stripes we need to. */
-	if (atomic_read(&rbio->error) > rbio->bbio->max_errors)
+	if (atomic_read(&rbio->error) > max_errors)
 		err = -EIO;

 	rbio_orig_end_io(rbio, err, 0);
@@ -2276,30 +2280,6 @@ static int alloc_rbio_essential_pages(struct btrfs_raid_bio *rbio)
 	return 0;
 }

-/*
- * end io function used by finish_rmw. When we finally
- * get here, we've written a full stripe
- */
-static void raid_write_parity_end_io(struct bio *bio, int err)
-{
-	struct btrfs_raid_bio *rbio = bio->bi_private;
-
-	if (err)
-		fail_bio_stripe(rbio, bio);
-
-	bio_put(bio);
-
-	if (!atomic_dec_and_test(&rbio->stripes_pending))
-		return;
-
-	err = 0;
-
-	if (atomic_read(&rbio->error))
-		err = -EIO;
-
-	rbio_orig_end_io(rbio, err, 0);
-}
-
 static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 					 int need_check)
 {
@@ -2452,7 +2432,7 @@ submit_write:
 			break;

 		bio->bi_private = rbio;
-		bio->bi_end_io = raid_write_parity_end_io;
+		bio->bi_end_io = raid_write_end_io;
 		BUG_ON(!test_bit(BIO_UPTODATE, &bio->bi_flags));
 		submit_bio(WRITE, bio);
 	}
--
1.8.5.1
[PATCH 2/6] btrfs: Use unified stripe_page's index calculation
From: Zhao Lei

The current code uses different index calculation methods for
stripe_page:
1: (rbio->stripe_len / PAGE_CACHE_SIZE) * stripe_index + page_index
2: DIV_ROUND_UP(rbio->stripe_len, PAGE_CACHE_SIZE) * stripe_index + page_index
3: DIV_ROUND_UP(rbio->stripe_len * stripe_index, PAGE_CACHE_SIZE) + page_index
...

They give the same result when stripe_len is aligned to
PAGE_CACHE_SIZE, which is why the current code works.
But anyway, we need to unify them to make the code better.

Signed-off-by: Zhao Lei
---
 fs/btrfs/raid56.c | 43 +--
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0cfbfcf..645bc37 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -612,13 +612,28 @@ static int rbio_can_merge(struct btrfs_raid_bio *last,
 	return 1;
 }

+static int rbio_stripe_page_index(struct btrfs_raid_bio *rbio, int stripe,
+				  int index)
+{
+	return stripe * rbio->stripe_npages + index;
+}
+
+/*
+ * these are just the pages from the rbio array, not from anything
+ * the FS sent down to us
+ */
+static struct page *rbio_stripe_page(struct btrfs_raid_bio *rbio, int stripe,
+				     int index)
+{
+	return rbio->stripe_pages[rbio_stripe_page_index(rbio, stripe, index)];
+}
+
 /*
  * helper to index into the pstripe
  */
 static struct page *rbio_pstripe_page(struct btrfs_raid_bio *rbio, int index)
 {
-	index += (rbio->nr_data * rbio->stripe_len) >> PAGE_CACHE_SHIFT;
-	return rbio->stripe_pages[index];
+	return rbio_stripe_page(rbio, rbio->nr_data, index);
 }

 /*
@@ -629,10 +644,7 @@ static struct page *rbio_qstripe_page(struct btrfs_raid_bio *rbio, int index)
 {
 	if (rbio->nr_data + 1 == rbio->real_stripes)
 		return NULL;
-
-	index += ((rbio->nr_data + 1) * rbio->stripe_len) >>
-		PAGE_CACHE_SHIFT;
-	return rbio->stripe_pages[index];
+	return rbio_stripe_page(rbio, rbio->nr_data + 1, index);
 }

 /*
@@ -944,8 +956,7 @@ static struct page *page_in_rbio(struct btrfs_raid_bio *rbio,
  */
 static unsigned long rbio_nr_pages(unsigned long stripe_len, int nr_stripes)
 {
-	unsigned long nr = stripe_len * nr_stripes;
-	return DIV_ROUND_UP(nr, PAGE_CACHE_SIZE);
+	return DIV_ROUND_UP(stripe_len, PAGE_CACHE_SIZE) * nr_stripes;
 }

 /*
@@ -1023,13 +1034,13 @@ static int alloc_rbio_pages(struct btrfs_raid_bio *rbio)
 	return 0;
 }

-/* allocate pages for just the p/q stripes */
+/* only allocate pages for p/q stripes */
 static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
 {
 	int i;
 	struct page *page;

-	i = (rbio->nr_data * rbio->stripe_len) >> PAGE_CACHE_SHIFT;
+	i = rbio_stripe_page_index(rbio, rbio->nr_data, 0);

 	for (; i < rbio->nr_pages; i++) {
 		if (rbio->stripe_pages[i])
@@ -1119,18 +1130,6 @@ static void validate_rbio_for_rmw(struct btrfs_raid_bio *rbio)
 }

 /*
- * these are just the pages from the rbio array, not from anything
- * the FS sent down to us
- */
-static struct page *rbio_stripe_page(struct btrfs_raid_bio *rbio, int stripe, int page)
-{
-	int index;
-	index = stripe * (rbio->stripe_len >> PAGE_CACHE_SHIFT);
-	index += page;
-	return rbio->stripe_pages[index];
-}
-
-/*
  * helper function to walk our bio list and populate the bio_pages array with
  * the result. This seems expensive, but it is faster than constantly
  * searching through the bio list as we setup the IO in finish_rmw or stripe
--
1.8.5.1
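The three index formulas listed in this commit message agree only when stripe_len is a multiple of the page size. A small user-space sketch that makes the difference visible, assuming a 4096-byte page; MODEL_PAGE_SIZE and the index_v1/v2/v3 helpers are hypothetical names for the three variants, not kernel code:

```c
#define MODEL_PAGE_SIZE 4096UL
#define MODEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Variant 1: round the per-stripe page count down. */
unsigned long index_v1(unsigned long stripe_len, unsigned long stripe,
		       unsigned long page)
{
	return (stripe_len / MODEL_PAGE_SIZE) * stripe + page;
}

/* Variant 2: round the per-stripe page count up. */
unsigned long index_v2(unsigned long stripe_len, unsigned long stripe,
		       unsigned long page)
{
	return MODEL_DIV_ROUND_UP(stripe_len, MODEL_PAGE_SIZE) * stripe + page;
}

/* Variant 3: round up only after multiplying by the stripe number. */
unsigned long index_v3(unsigned long stripe_len, unsigned long stripe,
		       unsigned long page)
{
	return MODEL_DIV_ROUND_UP(stripe_len * stripe, MODEL_PAGE_SIZE) + page;
}
```

For a 64 KiB stripe all three variants return 35 for stripe 2, page 3. For a hypothetical unaligned stripe_len of 6000 they return 5, 7 and 6, which is the inconsistency the single rbio_stripe_page_index() helper removes.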
[PATCH 3/6] btrfs: use rbio->nr_pages to reduce calculation
From: Zhao Lei

We can use rbio->stripe_npages to avoid unnecessary calculation in many
places in the code.

Signed-off-by: Zhao Lei
---
 fs/btrfs/raid56.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 645bc37..0d902ac 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1172,7 +1172,6 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 {
 	struct btrfs_bio *bbio = rbio->bbio;
 	void *pointers[rbio->real_stripes];
-	int stripe_len = rbio->stripe_len;
 	int nr_data = rbio->nr_data;
 	int stripe;
 	int pagenr;
@@ -1180,7 +1179,6 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	int q_stripe = -1;
 	struct bio_list bio_list;
 	struct bio *bio;
-	int pages_per_stripe = stripe_len >> PAGE_CACHE_SHIFT;
 	int ret;

 	bio_list_init(&bio_list);
@@ -1223,7 +1221,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	else
 		clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);

-	for (pagenr = 0; pagenr < pages_per_stripe; pagenr++) {
+	for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
 		struct page *p;
 		/* first collect one page from each data stripe */
 		for (stripe = 0; stripe < nr_data; stripe++) {
@@ -1265,7 +1263,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	 * everything else.
 	 */
 	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
-		for (pagenr = 0; pagenr < pages_per_stripe; pagenr++) {
+		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
 			struct page *page;
 			if (stripe < rbio->nr_data) {
 				page = page_in_rbio(rbio, stripe, pagenr, 1);
@@ -1289,7 +1287,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 		if (!bbio->tgtdev_map[stripe])
 			continue;

-		for (pagenr = 0; pagenr < pages_per_stripe; pagenr++) {
+		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
 			struct page *page;
 			if (stripe < rbio->nr_data) {
 				page = page_in_rbio(rbio, stripe, pagenr, 1);
@@ -1505,7 +1503,6 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 	int bios_to_read = 0;
 	struct bio_list bio_list;
 	int ret;
-	int nr_pages = DIV_ROUND_UP(rbio->stripe_len, PAGE_CACHE_SIZE);
 	int pagenr;
 	int stripe;
 	struct bio *bio;
@@ -1524,7 +1521,7 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 	 * stripe
 	 */
 	for (stripe = 0; stripe < rbio->nr_data; stripe++) {
-		for (pagenr = 0; pagenr < nr_pages; pagenr++) {
+		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
 			struct page *page;
 			/*
 			 * we want to find all the pages missing from
@@ -1801,7 +1798,6 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 	int pagenr, stripe;
 	void **pointers;
 	int faila = -1, failb = -1;
-	int nr_pages = DIV_ROUND_UP(rbio->stripe_len, PAGE_CACHE_SIZE);
 	struct page *page;
 	int err;
 	int i;
@@ -1824,7 +1820,7 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)

 	index_rbio_pages(rbio);

-	for (pagenr = 0; pagenr < nr_pages; pagenr++) {
+	for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
 		/*
 		 * Now we just use bitmap to mark the horizontal stripes in
 		 * which we have data when doing parity scrub.
@@ -1934,7 +1930,7 @@ pstripe:
 	 * other endio functions will fiddle the uptodate bits
 	 */
 	if (rbio->operation == BTRFS_RBIO_WRITE) {
-		for (i = 0; i < nr_pages; i++) {
+		for (i = 0; i < rbio->stripe_npages; i++) {
 			if (faila != -1) {
 				page = rbio_stripe_page(rbio, faila, i);
 				SetPageUptodate(page);
@@ -2027,7 +2023,6 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 	int bios_to_read = 0;
 	struct bio_list bio_list;
 	int ret;
-	int nr_pages = DIV_ROUND_UP(rbio->stripe_len, PAGE_CACHE_SIZE);
 	int pagenr;
 	int stripe;
 	struct bio *bio;
@@ -2051,7 +2046,7 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 			continue;
 		}

-		for (pagenr = 0; pagenr < nr_pages; pagenr++) {
+		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
 			struct page *p;
[PATCH 1/6] btrfs: Fix calculation of rbio->dbitmap's size calculation
From: Zhao Lei

The current code tries to size rbio->dbitmap so that it is aligned to
sizeof(long), but the implementation does not achieve this; the size is
aligned to sizeof(char) instead.

This patch fixes the calculation, and uses sizeof(long) instead of the
fixed "8" to increase compatibility.

Signed-off-by: Zhao Lei
---
 fs/btrfs/raid56.c | 4 ++--
 fs/btrfs/scrub.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 5264858..0cfbfcf 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -963,8 +963,8 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_root *root,
 	void *p;

 	rbio = kzalloc(sizeof(*rbio) + num_pages * sizeof(struct page *) * 2 +
-		       DIV_ROUND_UP(stripe_npages, BITS_PER_LONG / 8),
-		       GFP_NOFS);
+		       DIV_ROUND_UP(stripe_npages, BITS_PER_LONG) *
+		       sizeof(long), GFP_NOFS);
 	if (!rbio)
 		return ERR_PTR(-ENOMEM);

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ec57687..1221a56 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2737,7 +2737,7 @@ out:

 static inline int scrub_calc_parity_bitmap_len(int nsectors)
 {
-	return DIV_ROUND_UP(nsectors, BITS_PER_LONG) * (BITS_PER_LONG / 8);
+	return DIV_ROUND_UP(nsectors, BITS_PER_LONG) * sizeof(long);
 }

 static void scrub_parity_get(struct scrub_parity *sparity)
--
1.8.5.1
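The size calculation in this patch can be checked in user space: a bitmap that will be accessed one long at a time needs a byte length that is a whole number of longs. A sketch of the old and new expressions from the raid56.c hunk, with MODEL_BITS_PER_LONG standing in for the kernel's BITS_PER_LONG; the helper names are hypothetical:

```c
#include <limits.h>
#include <stddef.h>

#define MODEL_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Old raid56.c expression: DIV_ROUND_UP(nbits, BITS_PER_LONG / 8). */
size_t bitmap_len_old(size_t nbits)
{
	size_t d = MODEL_BITS_PER_LONG / 8;

	return (nbits + d - 1) / d;
}

/* New expression: DIV_ROUND_UP(nbits, BITS_PER_LONG) * sizeof(long). */
size_t bitmap_len_fixed(size_t nbits)
{
	return (nbits + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG *
	       sizeof(long);
}
```

For a single bit the old expression yields 1 byte, which is smaller than one long, while the fixed expression yields sizeof(long) and always returns a multiple of it.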
[PATCH 4/6] btrfs: Clear PageUptodate bit in alloc_rbio_parity_pages()
From: Zhao Lei

Signed-off-by: Zhao Lei
---
 fs/btrfs/raid56.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0d902ac..0a40d07 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1049,6 +1049,7 @@ static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
 		if (!page)
 			return -ENOMEM;
 		rbio->stripe_pages[i] = page;
+		ClearPageUptodate(page);
 	}
 	return 0;
 }
--
1.8.5.1
[PATCH 6/6] btrfs: Remove unused err = 0 line for raid_rmw_end_io()
From: Zhao Lei

Signed-off-by: Zhao Lei
---
 fs/btrfs/raid56.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 2285e78..c087870 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1464,7 +1464,6 @@ static void raid_rmw_end_io(struct bio *bio, int err)
 	if (!atomic_dec_and_test(&rbio->stripes_pending))
 		return;

-	err = 0;
 	if (atomic_read(&rbio->error) > rbio->bbio->max_errors)
 		goto cleanup;

--
1.8.5.1
[PATCH] btrfs: Support busy loop of write and delete
From: Zhao Lei

Reproduce:
 while true; do
   dd if=/dev/zero of=/mnt/btrfs/file count=[75% fs_size]
   rm /mnt/btrfs/file
 done
 Then we can see the above loop fail with NO_SPACE.

This has been a long-standing problem since the very beginning, because
the delayed iputs after rm are not run.

We already have commit_transaction() in the alloc_space code, but it is
not triggered in the above case.
This patch triggers commit_transaction() to run the delayed iputs and
refresh the pinned space so that the write succeeds.

It is based on the previous fix of delayed-iput in commit_transaction(),
and needs to be applied on top of:
 btrfs: Fix NO_SPACE bug caused by delayed-iput

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6c19033..6c1e211 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3599,13 +3599,13 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 used;
-	int ret = 0, committed = 0, have_pinned_space = 1, alloc_chunk = 1;
+	int ret = 0, need_commit = 2, have_pinned_space, alloc_chunk = 1;

 	/* make sure bytes are sectorsize aligned */
 	bytes = ALIGN(bytes, root->sectorsize);

 	if (btrfs_is_free_space_inode(inode)) {
-		committed = 1;
+		need_commit = 0;
 		ASSERT(current->journal_info);
 	}

@@ -3655,8 +3655,10 @@ alloc:
 			if (ret < 0) {
 				if (ret != -ENOSPC)
 					return ret;
-				else
+				else {
+					have_pinned_space = 1;
 					goto commit_trans;
+				}
 			}

 			if (!data_sinfo)
@@ -3670,23 +3672,23 @@ alloc:
 		 * allocation, and no removed chunk in current transaction,
 		 * don't bother committing the transaction.
 		 */
-		if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
-					   used + bytes -
-					   data_sinfo->total_bytes) < 0)
-			have_pinned_space = 0;
+		have_pinned_space = percpu_counter_compare(
+					&data_sinfo->total_bytes_pinned,
+					used + bytes - data_sinfo->total_bytes);
 		spin_unlock(&data_sinfo->lock);

 		/* commit the current transaction and try again */
 commit_trans:
-		if (!committed &&
+		if (need_commit &&
 		    !atomic_read(&root->fs_info->open_ioctl_trans)) {
-			committed = 1;
+			need_commit--;

 			trans = btrfs_join_transaction(root);
 			if (IS_ERR(trans))
 				return PTR_ERR(trans);
-			if (have_pinned_space ||
-			    trans->transaction->have_free_bgs) {
+			if (have_pinned_space >= 0 ||
+			    trans->transaction->have_free_bgs ||
+			    need_commit > 0) {
 				ret = btrfs_commit_transaction(trans, root);
 				if (ret)
 					return ret;
--
1.8.5.1
[PATCH v3 2/2] btrfs: add WARN_ON() to check is space_info op current
From: Zhao Lei

The calculation of space_info's values is somewhat complex and
error-prone; add WARN_ON()s to help debugging.

Changelog v2->v3:
 Put WARN_ON()s under the ENOSPC_DEBUG mount option.
 Suggested by: David Sterba

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index bfb9105..6c19033 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9415,9 +9415,19 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 	spin_lock(&block_group->space_info->lock);
 	list_del_init(&block_group->ro_list);
+
+	if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
+		WARN_ON(block_group->space_info->total_bytes
+			< block_group->key.offset);
+		WARN_ON(block_group->space_info->bytes_readonly
+			< block_group->key.offset);
+		WARN_ON(block_group->space_info->disk_total
+			< block_group->key.offset * factor);
+	}
 	block_group->space_info->total_bytes -= block_group->key.offset;
 	block_group->space_info->bytes_readonly -= block_group->key.offset;
 	block_group->space_info->disk_total -= block_group->key.offset * factor;
+
 	spin_unlock(&block_group->space_info->lock);
 
 	memcpy(&key, &block_group->key, sizeof(key));
--
1.8.5.1
[PATCH v3 0/2] btrfs: Set relative data on clear btrfs_block_group_cache->pinned
From: Zhao Lei

Changelog v2->v3:
 [PATCH 2/2] btrfs: add WARN_ON() to check is space_info op current
 Put WARN_ON()s under the ENOSPC_DEBUG mount option.
 Suggested by: David Sterba

Changelog v1->v2:
 Drop the patch
   Remove BUG_ON() when failed searching block_group_cache in unpin_extent_range()
 because Filipe Manana already fixed it in a better way.

Zhao Lei (2):
  btrfs: Set relative data on clear btrfs_block_group_cache->pinned
  btrfs: add WARN_ON() to check is space_info op current

 fs/btrfs/extent-tree.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

--
1.8.5.1
[PATCH v3 1/2] btrfs: Set relative data on clear btrfs_block_group_cache->pinned
From: Zhao Lei

Bug1:
 space_info->bytes_readonly was set to a very large (negative) value in
 btrfs_remove_block_group().

 Reason:
 The current code sets block_group_cache->pinned = 0 in
 btrfs_delete_unused_bgs(), but that space is not added to
 space_info->bytes_readonly.
 Then, in btrfs_remove_block_group():
   block_group->space_info->bytes_readonly -= block_group->key.offset;
 subtracts more than was accounted.

 We can see the following values in the trace:
   btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728

Bug2:
 space_info->total_bytes_pinned grows to a value larger than the fs size.
 On a 1.2G fs, we get the following trace log:
 at first:
   ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496
 after some operations:
   ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648
 after some more operations:
   ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064
 ...

 Reason:
 Similar to bug1, we also need to adjust space_info->total_bytes_pinned
 in the code block above.

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dc25e13..bfb9105 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9605,8 +9605,18 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		mutex_unlock(&fs_info->unused_bg_unpin_mutex);
 
 		/* Reset pinned so btrfs_put_block_group doesn't complain */
+		spin_lock(&space_info->lock);
+		spin_lock(&block_group->lock);
+
+		space_info->bytes_pinned -= block_group->pinned;
+		space_info->bytes_readonly += block_group->pinned;
+		percpu_counter_add(&space_info->total_bytes_pinned,
+				   -block_group->pinned);
 		block_group->pinned = 0;
 
+		spin_unlock(&block_group->lock);
+		spin_unlock(&space_info->lock);
+
 		/*
 		 * Btrfs_remove_chunk will abort the transaction if things go
 		 * horribly wrong.
--
1.8.5.1
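The underflow described as Bug1 can be reproduced with plain unsigned arithmetic. The sketch below is a user-space illustration only: the `toy_*` structures and both helper functions are hypothetical, and it assumes that the part of the block group not yet counted as readonly (134217728 - 12582912 = 121634816 bytes) was exactly the pinned amount, which the trace does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the kernel structures; only the fields used here. */
struct toy_space_info {
	uint64_t bytes_pinned;
	uint64_t bytes_readonly;
};

struct toy_block_group {
	uint64_t size;		/* plays the role of key.offset */
	uint64_t pinned;
};

/*
 * Clear bg->pinned. With account == false this models the old code,
 * which dropped the pinned bytes without telling space_info; with
 * account == true it models the patched code, which moves them from
 * bytes_pinned to bytes_readonly first.
 */
static void clear_pinned(struct toy_space_info *si,
			 struct toy_block_group *bg, bool account)
{
	if (account) {
		assert(si->bytes_pinned >= bg->pinned);
		si->bytes_pinned -= bg->pinned;
		si->bytes_readonly += bg->pinned;
	}
	bg->pinned = 0;
}

/* Return bytes_readonly after clearing pinned and removing the group. */
static uint64_t readonly_after_removal(uint64_t readonly, uint64_t pinned,
				       uint64_t size, bool account)
{
	struct toy_space_info si = {
		.bytes_pinned = pinned,
		.bytes_readonly = readonly,
	};
	struct toy_block_group bg = { .size = size, .pinned = pinned };

	clear_pinned(&si, &bg, account);
	si.bytes_readonly -= bg.size;	/* as in btrfs_remove_block_group() */
	return si.bytes_readonly;
}
```

Called with the values from the trace and `account == false`, the subtraction wraps around to a value close to the top of the 64-bit range, which is the "very large (negative)" number from the report; with `account == true` the result is 0.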
[PATCH 0/1] btrfs: Fix NO_SPACE bug caused by delayed-iput
From: Zhao Lei

This is the last patch needed to fix the following write failure:
  while true; do
    write a file to 75% fs size
    delete above file
    sync or sleep
  done

The issue has several causes, each fixed by one of the following
patches:
  Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole
    from Forrest Liu
  btrfs: Fix out-of-space bug
    merged into v4.0-rc1
  btrfs: fix condition of commit transaction
  btrfs: Fix tail space processing in find_free_dev_extent()
  btrfs: Adjust commit-transaction condition to avoid NO_SPACE more
  btrfs: Fix NO_SPACE bug caused by delayed-iput
    this patch

These patches reduced the failure rate step by step, from 50 failures
in 20 * 200 loops to 0 failures now.

We can now add a test case to xfstests for the scenario above.

Zhao Lei (1):
  btrfs: Fix NO_SPACE bug caused by delayed-iput

 fs/btrfs/ctree.h       | 1 +
 fs/btrfs/disk-io.c     | 3 ++-
 fs/btrfs/inode.c       | 4 ++++
 fs/btrfs/transaction.c | 6 +++++-
 4 files changed, 12 insertions(+), 2 deletions(-)

--
1.8.5.1
[PATCH 1/1] btrfs: Fix NO_SPACE bug caused by delayed-iput
From: Zhao Lei

Steps to reproduce:
  while true; do
    dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
    rm /btrfs_dir/file
    sync
  done

dd eventually fails because btrfs returns NO_SPACE.

Reason:
  Normally, btrfs_commit_transaction() calls btrfs_run_delayed_iputs()
  at its end to free space for the next write. Sometimes that work is
  not finished in time, because the btrfs-cleaner thread has already
  taken the delayed iputs off the list but only performs the iput()
  after the next write has started.

  This is the log:
  [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin

  [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
  [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
  [ 2569.087554] comm=sync func=btrfs_commit_transaction() end

  [ 2569.191081] comm=dd begin
  [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28

  [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
  [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
  ...
  [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
  [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end

Fix:
  Make btrfs_commit_transaction() wait at its end until any delayed
  iputs that are currently running have finished.

Test:
  Using a script similar to the one above (more complex):
  before patch: 7 failures in 100 * 20 loops.
  after patch:  0 failures in 100 * 20 loops.

Signed-off-by: Zhao Lei
---
 fs/btrfs/ctree.h       | 1 +
 fs/btrfs/disk-io.c     | 3 ++-
 fs/btrfs/inode.c       | 4 ++++
 fs/btrfs/transaction.c | 6 +++++-
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 84c3b00..ec2dac0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1513,6 +1513,7 @@ struct btrfs_fs_info {
 
 	spinlock_t delayed_iput_lock;
 	struct list_head delayed_iputs;
+	struct rw_semaphore delayed_iput_sem;
 
 	/* this protects tree_mod_seq_list */
 	spinlock_t tree_mod_seq_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f79f385..1c0d8ec 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2241,11 +2241,12 @@ int open_ctree(struct super_block *sb,
 	spin_lock_init(&fs_info->qgroup_op_lock);
 	spin_lock_init(&fs_info->buffer_lock);
 	spin_lock_init(&fs_info->unused_bgs_lock);
-	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	rwlock_init(&fs_info->tree_mod_log_lock);
+	mutex_init(&fs_info->unused_bg_unpin_mutex);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
 	seqlock_init(&fs_info->profiles_lock);
+	init_rwsem(&fs_info->delayed_iput_sem);
 
 	init_completion(&fs_info->kobj_unregister);
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a85c23d..a396bb9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3087,6 +3087,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
 	if (empty)
 		return;
 
+	down_read(&fs_info->delayed_iput_sem);
+
 	spin_lock(&fs_info->delayed_iput_lock);
 	list_splice_init(&fs_info->delayed_iputs, &list);
 	spin_unlock(&fs_info->delayed_iput_lock);
@@ -3097,6 +3099,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
 		iput(delayed->inode);
 		kfree(delayed);
 	}
+
+	up_read(&fs_info->delayed_iput_sem);
 }
 
 /*
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 7e80f32..175cdef 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2068,8 +2068,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
 
-	if (current != root->fs_info->transaction_kthread)
+	if (current != root->fs_info->transaction_kthread) {
 		btrfs_run_delayed_iputs(root);
+		/* make sure that all running delayed iput are done */
+		down_write(&root->fs_info->delayed_iput_sem);
+		up_write(&root->fs_info->delayed_iput_sem);
+	}
 
 	return ret;
 
--
1.8.5.1
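The empty down_write()/up_write() pair in the patch above acts as a barrier: it cannot be passed while any thread still holds the read side. The following is a rough user-space analogue using POSIX rwlocks, under the assumption that pthread_rwlock_t is a fair stand-in for the kernel rw_semaphore here; `slow_iput_runner()` and `commit_tail_sees_iputs_done()` are hypothetical names, and the 20 ms sleep only simulates a slow iput().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

static pthread_rwlock_t iput_sem = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int reader_inside;
static atomic_int iputs_done;

/* Plays the thread that is in the middle of running delayed iputs. */
static void *slow_iput_runner(void *arg)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 20 * 1000 * 1000 };

	(void)arg;
	pthread_rwlock_rdlock(&iput_sem);	/* down_read() */
	atomic_store(&reader_inside, 1);
	nanosleep(&delay, NULL);		/* pretend iput() is slow */
	atomic_store(&iputs_done, 1);
	pthread_rwlock_unlock(&iput_sem);	/* up_read() */
	return NULL;
}

/*
 * Plays the tail of the commit path. Returns 1 if the iputs were
 * finished by the time the barrier was passed, 0 if not, -1 on error.
 */
static int commit_tail_sees_iputs_done(void)
{
	pthread_t runner;
	int seen;

	atomic_store(&reader_inside, 0);
	atomic_store(&iputs_done, 0);
	if (pthread_create(&runner, NULL, slow_iput_runner, NULL) != 0)
		return -1;

	/* Make sure the runner already holds the read side. */
	while (!atomic_load(&reader_inside))
		sched_yield();

	pthread_rwlock_wrlock(&iput_sem);	/* down_write(): waits for readers */
	pthread_rwlock_unlock(&iput_sem);	/* up_write(): nothing to protect */
	seen = atomic_load(&iputs_done);

	pthread_join(runner, NULL);
	return seen;
}
```

Build with `-pthread`. The write lock is released immediately because nothing needs protecting; acquiring it is only a way to wait for every reader that was already inside, which is the same reason the patch pairs down_write() directly with up_write().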
[PATCH v2 0/2] btrfs: Set relative data on clear btrfs_block_group_cache->pinned
From: Zhao Lei

Changelog v1->v2:
 Drop the patch
   Remove BUG_ON() when failed searching block_group_cache in unpin_extent_range()
 because Filipe Manana already fixed it in a better way.

Zhao Lei (2):
  btrfs: Set relative data on clear btrfs_block_group_cache->pinned
  btrfs: add WARN_ON() to check is space_info op current

 fs/btrfs/extent-tree.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

--
1.8.5.1
[PATCH 2/2] btrfs: add WARN_ON() to check is space_info op current
From: Zhao Lei

The calculation of space_info's values is somewhat complex and
error-prone; add WARN_ON()s to help debugging.

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3e7a4af..8b51eb5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9471,9 +9471,17 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 	spin_lock(&block_group->space_info->lock);
 	list_del_init(&block_group->ro_list);
+
+	WARN_ON(block_group->space_info->total_bytes
+		< block_group->key.offset);
+	WARN_ON(block_group->space_info->bytes_readonly
+		< block_group->key.offset);
+	WARN_ON(block_group->space_info->disk_total
+		< block_group->key.offset * factor);
 	block_group->space_info->total_bytes -= block_group->key.offset;
 	block_group->space_info->bytes_readonly -= block_group->key.offset;
 	block_group->space_info->disk_total -= block_group->key.offset * factor;
+
 	spin_unlock(&block_group->space_info->lock);
 
 	memcpy(&key, &block_group->key, sizeof(key));
--
1.8.5.1
[PATCH 1/2] btrfs: Set relative data on clear btrfs_block_group_cache->pinned
From: Zhao Lei

Bug1:
 space_info->bytes_readonly was set to a very large (negative) value in
 btrfs_remove_block_group().

 Reason:
 The current code sets block_group_cache->pinned = 0 in
 btrfs_delete_unused_bgs(), but that space is not added to
 space_info->bytes_readonly.
 Then, in btrfs_remove_block_group():
   block_group->space_info->bytes_readonly -= block_group->key.offset;
 subtracts more than was accounted.

 We can see the following values in the trace:
   btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728

Bug2:
 space_info->total_bytes_pinned grows to a value larger than the fs size.
 On a 1.2G fs, we get the following trace log:
 at first:
   ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496
 after some operations:
   ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648
 after some more operations:
   ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064
 ...

 Reason:
 Similar to bug1, we also need to adjust space_info->total_bytes_pinned
 in the code block above.

Signed-off-by: Zhao Lei
---
 fs/btrfs/extent-tree.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4ffce64..3e7a4af 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9645,8 +9645,18 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		}
 
 		/* Reset pinned so btrfs_put_block_group doesn't complain */
+		spin_lock(&space_info->lock);
+		spin_lock(&block_group->lock);
+
+		space_info->bytes_pinned -= block_group->pinned;
+		space_info->bytes_readonly += block_group->pinned;
+		percpu_counter_add(&space_info->total_bytes_pinned,
+				   -block_group->pinned);
 		block_group->pinned = 0;
 
+		spin_unlock(&block_group->lock);
+		spin_unlock(&space_info->lock);
+
 		/*
 		 * Btrfs_remove_chunk will abort the transaction if things go
 		 * horribly wrong.
--
1.8.5.1