from:"Zhaolei"

[PATCH] fstests: Fix generic/102 fail for btrfs

2015-12-03 Thread Zhaolei

From: Zhao Lei 

generic/102 sometimes fails with the newest btrfs toolchain,
because mkfs.btrfs now uses non-mixed mode by default, which requests
more space for metadata and leaves no space for data writes.

This patch forces mixed mode for btrfs in generic/102.

Signed-off-by: Zhao Lei 
---
 tests/generic/102 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/generic/102 b/tests/generic/102
index abc3994..8c01fb5 100755
--- a/tests/generic/102
+++ b/tests/generic/102
@@ -48,6 +48,8 @@ _require_scratch
 
 rm -f $seqres.full
 
+[[ "$FSTYP" = "btrfs" ]] && MKFS_OPTIONS+=" --mixed"
+
 dev_size=$((512 * 1024 * 1024)) # 512MB filesystem
 _scratch_mkfs_sized $dev_size >>$seqres.full 2>&1
 _scratch_mount
-- 
1.8.5.1
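
The conditional added by this patch can be tried in isolation. The
following is a minimal sketch, assuming a hypothetical helper named
mkfs_options_for (it is not part of fstests, which sets MKFS_OPTIONS
directly as in the diff); it only shows that --mixed is appended for
btrfs and for no other filesystem type.

```shell
#!/bin/sh
# Hypothetical helper (not part of fstests): print the mkfs options to
# use for a filesystem type.
# $1: filesystem type (e.g. btrfs, ext4); $2: options already configured.
mkfs_options_for() {
	fstyp=$1
	opts=$2
	if [ "$fstyp" = "btrfs" ]; then
		# Mixed block groups keep data and metadata in the same
		# chunks, which is what the patch requests for this test.
		opts="$opts --mixed"
	fi
	printf '%s\n' "$opts"
}

mkfs_options_for btrfs "-f"
mkfs_options_for ext4 "-F"
```

Other filesystem types keep their options unchanged, so the test
behaves as before for them.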



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] fstests: speedup generic/027 for new version of btrfs

2015-11-24 Thread Zhaolei

From: Zhao Lei 

Newer versions of btrfs create non-mixed block groups in all cases.

For generic/027, the filesystem under test therefore changes from
mixed block groups to non-mixed block groups,
and the test time increases from 400s to 2700s on my node.

Testing btrfs with all mount options would take about 7.5 hours for
this test item (and some mount options, such as compress, need more time).

This patch reduces the test loop count so that the test time is about
equal to that of the old version.

Signed-off-by: Zhao Lei 
---
 tests/generic/027 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/generic/027 b/tests/generic/027
index d2e59d6..42f0685 100755
--- a/tests/generic/027
+++ b/tests/generic/027
@@ -78,7 +78,7 @@ rm -f $SCRATCH_MNT/testfile
 loop=100
 # btrfs takes much longer time, reduce the loop count
 if [ "$FSTYP" == "btrfs" ]; then
-   loop=10
+   loop=2
 fi
 
 dir=$SCRATCH_MNT/testdir
-- 
1.8.5.1



[PATCH] btrfs-progs: Increase running state's priority in stat output

2015-07-28 Thread Zhaolei

From: Zhao Lei 

Anthony Plack reported an output bug on the mailing list:
  title: btrfs-progs SCRUB reporting aborted but still running - minor

btrfs scrub status reports that the scrub was aborted, but it still runs
to completion.
  # btrfs scrub status /mnt/data
  scrub status for f591ac13-1a69-476d-bd30-346f87a491da
    scrub started at Mon Apr 27 06:48:44 2015 and was aborted after 1089 seconds
    total bytes scrubbed: 1.02TiB with 0 errors
  #
  # btrfs scrub status /mnt/data
  scrub status for f591ac13-1a69-476d-bd30-346f87a491da
    scrub started at Mon Apr 27 06:48:44 2015 and was aborted after 1664 seconds
    total bytes scrubbed: 1.53TiB with 0 errors
  #
  ...

Reason:
  When several devices are scrubbed simultaneously and the scrub is
  canceled on some devices while it is still running on others, the
  canceled state has higher priority in the global report.
  So we see "scrub aborted" in the status line while the running time
  keeps increasing.

Fix:
  Increase the priority of the running state in the output: if any
  device is still being scrubbed, we print the running state instead of
  the canceled state.

Reported-by: Anthony Plack 
Signed-off-by: Zhao Lei 
---
 cmds-scrub.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/cmds-scrub.c b/cmds-scrub.c
index b7aa809..7c9318e 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -252,17 +252,15 @@ static void _print_scrub_ss(struct scrub_stats *ss)
hours = ss->duration / (60 * 60);
gmtime_r(&seconds, &tm);
strftime(t, sizeof(t), "%M:%S", &tm);
-   if (ss->finished && !ss->canceled) {
-   printf(" and finished after %02u:%s\n", hours, t);
-   } else if (ss->canceled) {
+   if (ss->in_progress)
+   printf(", running for %02u:%s\n", hours, t);
+   else if (ss->canceled)
printf(" and was aborted after %02u:%s\n", hours, t);
-   } else {
-   if (ss->in_progress)
-   printf(", running for %02u:%s\n", hours, t);
-   else
-   printf(", interrupted after %02u:%s, not running\n",
-   hours, t);
-   }
+   else if (ss->finished)
+   printf(" and finished after %02u:%s\n", hours, t);
+   else
+   printf(", interrupted after %02u:%s, not running\n",
+  hours, t);
 }
 
 static void print_scrub_dev(struct btrfs_ioctl_dev_info_args *di,
-- 
1.8.5.1
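
The new order of checks can be illustrated outside of C. The following
is a shell transliteration of the if/else chain in the diff, not code
from btrfs-progs; the function name scrub_state_label and the 0/1 flag
arguments are assumptions made for the illustration.

```shell
#!/bin/sh
# Shell transliteration of the new if/else order in _print_scrub_ss().
# Arguments are 0/1 flags: in_progress, canceled, finished.
scrub_state_label() {
	in_progress=$1
	canceled=$2
	finished=$3
	if [ "$in_progress" -eq 1 ]; then
		echo "running"		# checked first, so it wins over canceled
	elif [ "$canceled" -eq 1 ]; then
		echo "aborted"
	elif [ "$finished" -eq 1 ]; then
		echo "finished"
	else
		echo "interrupted, not running"
	fi
}

# One device was canceled while another one is still being scrubbed:
scrub_state_label 1 1 0
```

With the old order, the same flags would have produced the "aborted"
line, which is the behavior described in the report.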


[PATCH] btrfs-progs-tests: Avoid outputting useless warning in ./fsck-tests.sh

2015-07-27 Thread Zhaolei

From: Zhao Lei 

002-bad-transid prints 'transid verify failed' messages to the screen.
These are only warnings from btrfs-image and are expected under the
normal conditions of this test.

This patch moves these warnings into $RESULTS, in order to:
1: Avoid cluttering the screen output
2: Let the user see the details if some other error happens in btrfs-image

Before patch:
  # ./fsck-tests.sh
  [TEST]   001-bad-file-extent-bytenr
  [TEST]   002-bad-transid
  parent transid verify failed on 29360128 wanted 9 found 755944791
  parent transid verify failed on 29360128 wanted 9 found 755944791
  Ignoring transid failure
  [TEST]   003-shift-offsets
  [TEST]   004-no-dir-index
  ...

After patch:
  # ./fsck-tests.sh
  [TEST]   001-bad-file-extent-bytenr
  [TEST]   002-bad-transid
  [TEST]   003-shift-offsets
  [TEST]   004-no-dir-index
  ...

Signed-off-by: Zhao Lei 
---
 tests/common | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/common b/tests/common
index 381ff96..4d03ed8 100644
--- a/tests/common
+++ b/tests/common
@@ -87,8 +87,9 @@ check_all_images()
 
if ! [ -f $image.restored ]; then
echo "restoring image $(basename $image)" >> $RESULTS
-   $TOP/btrfs-image -r $image $image.restored || \
-   _fail "failed to restore image $image"
+   $TOP/btrfs-image -r $image $image.restored \
+   &>> $RESULTS \
+   || _fail "failed to restore image $image"
fi
 
check_image $image.restored
-- 
1.8.5.1
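
The redirection used in the diff, "&>> $RESULTS", is the bash shorthand
for ">> $RESULTS 2>&1". The following is a minimal sketch of the same
idea in portable sh; the helper name run_logged is hypothetical, and
RESULTS points to a temporary file here only so that the example can run
anywhere.

```shell
#!/bin/sh
# Hypothetical helper: run a command with stdout and stderr appended to
# the file named by $RESULTS, and return the command's exit status.
run_logged() {
	"$@" >>"$RESULTS" 2>&1
}

RESULTS=$(mktemp)
# The warning goes to the log instead of the terminal ...
run_logged sh -c 'echo "restored"; echo "transid warning" >&2'
# ... and a failing command is still visible to the caller.
run_logged false || echo "command failed, details in $RESULTS"
```

Because the exit status is passed through, a caller can still append
"|| _fail ..." as the patch does.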


[PATCH 2/2] btrfs-progs-tests: Fix mount fail of 013-extent-tree-rebuild

2015-07-27 Thread Zhaolei

From: Zhao Lei 

When a loop device is used for the test, fsck-tests/013-extent-tree-rebuild
fails with the following error message:
  # ./fsck-tests.sh
  ...
  [TEST]   013-extent-tree-rebuild
  failed: mount /data/btrfsprogs/tests/test.img /data/btrfsprogs/tests/mnt
  test failed for case 013-extent-tree-rebuild
  #

Since $TEST_DEV can be either a block device or an image file that must be
mounted through a loop device, the mount options have to be chosen to suit
both cases.

This patch implements that in a common function, which solves the current
problem in 013-extent-tree-rebuild and supports similar needs in the future.

Signed-off-by: Zhao Lei 
---
 tests/common | 24 
 tests/fsck-tests/013-extent-tree-rebuild/test.sh |  4 ++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/tests/common b/tests/common
index ba0b78a..381ff96 100644
--- a/tests/common
+++ b/tests/common
@@ -165,3 +165,27 @@ init_env()
mkdir -p "$TEST_MNT" || { echo "Failed mkdir -p $TEST_MNT"; exit 1; }
 }
 init_env
+
+mount_test_dev()
+{
+   local loop_opt
+   if [[ -b "$TEST_DEV" ]]; then
+   loop_opt=()
+   elif [[ -f "$TEST_DEV" ]]; then
+   loop_opt=(-o loop)
+   else
+   _fail "Invalid \$TEST_DEV: $TEST_DEV"
+   fi
+
+   [[ -d "$TEST_MNT" ]] || {
+   _fail "Invalid \$TEST_MNT: $TEST_MNT"
+   }
+
+   mount "${loop_opt[@]}" "$TEST_DEV" "$TEST_MNT" || _fail "mount $TEST_DEV to $TEST_MNT failed"
+}
+
+umount_test_dev()
+{
+   umount "$TEST_DEV"
+}
+
diff --git a/tests/fsck-tests/013-extent-tree-rebuild/test.sh b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
index b7909d2..ff3c922 100755
--- a/tests/fsck-tests/013-extent-tree-rebuild/test.sh
+++ b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
@@ -12,14 +12,14 @@ test_extent_tree_rebuild()
 {
run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $TEST_DEV
 
-   run_check $SUDO_HELPER mount $TEST_DEV $TEST_MNT
+   run_check $SUDO_HELPER mount_test_dev
run_check $SUDO_HELPER cp -aR /lib/modules/`uname -r`/ $TEST_MNT
 
for i in `seq 1 100`;do
run_check $SUDO_HELPER $TOP/btrfs sub snapshot $TEST_MNT \
$TEST_MNT/snapaaa_$i
done
-   run_check $SUDO_HELPER umount $TEST_DEV
+   run_check $SUDO_HELPER umount_test_dev
 
# get extent root bytenr
extent_root_bytenr=`$SUDO_HELPER $TOP/btrfs-debug-tree -r $TEST_DEV | \
-- 
1.8.5.1
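
The decision made by mount_test_dev() can be checked without root
privileges by printing the option instead of mounting. The following is
a minimal sketch in portable sh; the helper name loop_opt_for and the
/mnt path are hypothetical, and the patch itself keeps the option in a
bash array.

```shell
#!/bin/sh
# Hypothetical helper: print the extra mount(8) option needed for a test
# device, or fail if the path is neither a block device nor a regular file.
loop_opt_for() {
	dev=$1
	if [ -b "$dev" ]; then
		:			# real block device: no extra option
	elif [ -f "$dev" ]; then
		echo "-o loop"		# image file: mount through a loop device
	else
		echo "Invalid test device: $dev" >&2
		return 1
	fi
}

img=$(mktemp)
echo "mount $(loop_opt_for "$img") $img /mnt"	# printed, not executed
rm -f "$img"
```

For a regular file the printed command contains "-o loop"; for a block
device the option is simply left out.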


[PATCH 1/2] btrfs-progs-tests: Introduce init_env() to initialize common env variant

2015-07-27 Thread Zhaolei

From: Zhao Lei 

For example, $TEST_MNT is commonly used in several tests, each of which has
duplicated code to initialize it.

This duplicated code not only benefits hard disk vendors, but also has
inconsistent details, such as:
  convert-tests.sh: lack of mkdir
  fsck-tests/012-leaf-corruption/test.sh: unnecessary mkdir
  fsck-tests/013-extent-tree-rebuild/test.sh: unnecessary init
  misc-tests/XXX ...
And several different error messages:
  _fail "unable to create mount point on $TEST_MNT"
  _fail "failed to create mount point"
  ...

This patch moves the initialization of $TEST_MNT into a common init_env()
to avoid the above problems; init_env() can also be used to add more
initialization in the future.

Signed-off-by: Zhao Lei 
---
 tests/common | 7 +++
 tests/convert-tests.sh   | 1 -
 tests/fsck-tests.sh  | 3 ---
 tests/fsck-tests/012-leaf-corruption/test.sh | 1 -
 tests/fsck-tests/013-extent-tree-rebuild/test.sh | 5 -
 tests/misc-tests.sh  | 3 ---
 tests/misc-tests/001-btrfstune-features/test.sh  | 5 -
 tests/misc-tests/002-uuid-rewrite/test.sh| 5 -
 tests/misc-tests/003-zero-log/test.sh| 5 -
 9 files changed, 7 insertions(+), 28 deletions(-)

diff --git a/tests/common b/tests/common
index 2d337b0..ba0b78a 100644
--- a/tests/common
+++ b/tests/common
@@ -158,3 +158,10 @@ prepare_test_dev()
truncate -s "$size" "$TEST_DEV" || _not_run "create file for loop device failed"
 }
 
+init_env()
+{
+   TEST_MNT="${TEST_MNT:-$TOP/tests/mnt}"
+   export TEST_MNT
+   mkdir -p "$TEST_MNT" || { echo "Failed mkdir -p $TEST_MNT"; exit 1; }
+}
+init_env
diff --git a/tests/convert-tests.sh b/tests/convert-tests.sh
index efed90b..4e8496a 100755
--- a/tests/convert-tests.sh
+++ b/tests/convert-tests.sh
@@ -9,7 +9,6 @@ unset LANG
 LANG=C
 SCRIPT_DIR=$(dirname $(readlink -f $0))
 TOP=$(readlink -f $SCRIPT_DIR/../)
-TEST_MNT=${TEST_MNT:-$TOP/tests/mnt}
 RESULTS="$TOP/tests/convert-tests-results.txt"
 IMAGE="$TOP/tests/test.img"
 
diff --git a/tests/fsck-tests.sh b/tests/fsck-tests.sh
index b0ded6a..46dd72d 100755
--- a/tests/fsck-tests.sh
+++ b/tests/fsck-tests.sh
@@ -11,7 +11,6 @@ LANG=C
 SCRIPT_DIR=$(dirname $(readlink -f $0))
 TOP=$(readlink -f $SCRIPT_DIR/../)
 TEST_DEV=${TEST_DEV:-}
-TEST_MNT=${TEST_MNT:-$TOP/tests/mnt}
 RESULTS="$TOP/tests/fsck-tests-results.txt"
 
 source $TOP/tests/common
@@ -20,11 +19,9 @@ source $TOP/tests/common
 export TOP
 export RESULTS
 # For custom script needs to verfiy recovery
-export TEST_MNT
 export LANG
 
 rm -f $RESULTS
-mkdir -p $TEST_MNT || _fail "unable to create mount point on $TEST_MNT"
 
 # test rely on corrupting blocks tool
 check_prereq btrfs-corrupt-block
diff --git a/tests/fsck-tests/012-leaf-corruption/test.sh b/tests/fsck-tests/012-leaf-corruption/test.sh
index f8701ad..a37ceda 100755
--- a/tests/fsck-tests/012-leaf-corruption/test.sh
+++ b/tests/fsck-tests/012-leaf-corruption/test.sh
@@ -85,7 +85,6 @@ check_inode()
 check_leaf_corrupt_no_data_ext()
 {
image=$1
-   mkdir -p $TEST_MNT || _fail "failed to create mount point"
$SUDO_HELPER mount -o loop $image -o ro $TEST_MNT
 
i=0
diff --git a/tests/fsck-tests/013-extent-tree-rebuild/test.sh b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
index 88a66cc..b7909d2 100755
--- a/tests/fsck-tests/013-extent-tree-rebuild/test.sh
+++ b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
@@ -7,11 +7,6 @@ check_prereq mkfs.btrfs
 setup_root_helper
 prepare_test_dev 1G
 
-if [ -z $TEST_MNT ];then
-   echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
-   exit 0
-fi
-
 # test whether fsck can rebuild a corrupted extent tree
 test_extent_tree_rebuild()
 {
diff --git a/tests/misc-tests.sh b/tests/misc-tests.sh
index 5bbe914..cabe9c3 100755
--- a/tests/misc-tests.sh
+++ b/tests/misc-tests.sh
@@ -8,7 +8,6 @@ LANG=C
 SCRIPT_DIR=$(dirname $(readlink -f $0))
 TOP=$(readlink -f $SCRIPT_DIR/../)
 TEST_DEV=${TEST_DEV:-}
-TEST_MNT=${TEST_MNT:-$TOP/tests/mnt}
 RESULTS="$TOP/tests/misc-tests-results.txt"
 IMAGE="$TOP/tests/test.img"
 
@@ -18,11 +17,9 @@ source $TOP/tests/common
 export TOP
 export RESULTS
 # For custom script needs to verfiy recovery
-export TEST_MNT
 export LANG
 
 rm -f $RESULTS
-mkdir -p $TEST_MNT || _fail "unable to create mount point on $TEST_MNT"
 
 # test rely on corrupting blocks tool
 check_prereq btrfs-corrupt-block
diff --git a/tests/misc-tests/001-btrfstune-features/test.sh b/tests/misc-tests/001-btrfstune-features/test.sh
index ea33954..836e8d3 100755
--- a/tests/misc-tests/001-btrfstune-features/test.sh
+++ b/tests/misc-tests/001-btrfstune-features/test.sh
@@ -9,11 +9,6 @@ check_prereq mkfs.btrfs
 setup_root_helper
 prepare_test_dev
 
-if [ -z $TEST_MNT ];then
-   echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
-   exit 0
-fi
-
 # test whether fsck can rebuild a corrupted extent tree
 # parameters:
 # - option for mkfs.btrfs
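
The init_env() function introduced in tests/common relies on the
"${VAR:-default}" expansion, so a value supplied by the caller is kept
and the default is used otherwise. The following is a minimal sketch,
assuming a function named init_env_sketch and a TOP that points to a
temporary directory; both are chosen only so that the example can run
outside the btrfs-progs tree.

```shell
#!/bin/sh
# Sketch of the init_env() idea: one place that applies the default and
# creates the mount point. TOP is a temporary directory in this example.
TOP=$(mktemp -d)
unset TEST_MNT

init_env_sketch() {
	# Keep a caller-supplied TEST_MNT, otherwise fall back to the default.
	TEST_MNT="${TEST_MNT:-$TOP/tests/mnt}"
	export TEST_MNT
	mkdir -p "$TEST_MNT" || { echo "Failed mkdir -p $TEST_MNT"; return 1; }
}

init_env_sketch
echo "mount point: $TEST_MNT"
```

Calling the function again with TEST_MNT already set leaves the value
untouched and only makes sure that the directory exists.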

[PATCH] btrfs-progs-tests: Add -o loop to fsck-tests/012-leaf-corruption for loop device

2015-07-27 Thread Zhaolei

From: Zhao Lei 

To avoid the following mount error in the test:
  mount: /root/btrfs/progs/tests/fsck-tests/012-leaf-corruption/test.img
  is not a block device (maybe try `-o loop'?)

Signed-off-by: Zhao Lei 
---
 tests/fsck-tests/012-leaf-corruption/test.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/fsck-tests/012-leaf-corruption/test.sh b/tests/fsck-tests/012-leaf-corruption/test.sh
index 98e3185..f8701ad 100755
--- a/tests/fsck-tests/012-leaf-corruption/test.sh
+++ b/tests/fsck-tests/012-leaf-corruption/test.sh
@@ -86,7 +86,7 @@ check_leaf_corrupt_no_data_ext()
 {
image=$1
mkdir -p $TEST_MNT || _fail "failed to create mount point"
-   $SUDO_HELPER mount $image -o ro $TEST_MNT
+   $SUDO_HELPER mount -o loop $image -o ro $TEST_MNT
 
i=0
while [ $i -lt ${#leaf_no_data_ext_list[@]} ]; do
-- 
1.8.5.1


[PATCH v3 2/7] btrfs-progs: Move close timer handle code to line after sub process exit

2015-07-27 Thread Zhaolei

From: Zhao Lei 

The timer handle may still be in use by the sub-thread, so it is
better to close it after the sub-thread has exited.

Signed-off-by: Zhao Lei 
---
 task-utils.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/task-utils.c b/task-utils.c
index 0390a69..17fd573 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -61,14 +61,14 @@ void task_stop(struct task_info *info)
if (!info)
return;
 
-   if (info->periodic.timer_fd)
-   close(info->periodic.timer_fd);
-
if (info->id > 0) {
pthread_cancel(info->id);
pthread_join(info->id, NULL);
}
 
+   if (info->periodic.timer_fd)
+   close(info->periodic.timer_fd);
+
if (info->postfn)
info->postfn(info->private_data);
 }
-- 
1.8.5.1


[PATCH v3 6/7] btrfs-progs: Move code to create loop device to common

2015-07-27 Thread Zhaolei

From: Zhao Lei 

This code block is used in several tests; moving it into a common
function makes the code cleaner.

Signed-off-by: Zhao Lei 
---
 tests/common | 16 
 tests/fsck-tests/013-extent-tree-rebuild/test.sh | 11 +--
 tests/misc-tests/001-btrfstune-features/test.sh  | 11 +--
 tests/misc-tests/002-uuid-rewrite/test.sh| 11 +--
 tests/misc-tests/003-zero-log/test.sh| 11 +--
 5 files changed, 20 insertions(+), 40 deletions(-)

diff --git a/tests/common b/tests/common
index 899ec7b..2d337b0 100644
--- a/tests/common
+++ b/tests/common
@@ -142,3 +142,19 @@ setup_root_helper()
fi
SUDO_HELPER=root_helper
 }
+
+prepare_test_dev()
+{
+   # num[K/M/G/T...]
+   local size="$1"
+
+   [[ "$TEST_DEV" ]] && return
+   [[ "$size" ]] || size='1G'
+
+   echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
+   $RESULTS
+   TEST_DEV="$TOP/tests/test.img"
+
+   truncate -s "$size" "$TEST_DEV" || _not_run "create file for loop device failed"
+}
+
diff --git a/tests/fsck-tests/013-extent-tree-rebuild/test.sh b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
index 3290cd7..88a66cc 100755
--- a/tests/fsck-tests/013-extent-tree-rebuild/test.sh
+++ b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
@@ -5,16 +5,7 @@ source $TOP/tests/common
 check_prereq btrfs-debug-tree
 check_prereq mkfs.btrfs
 setup_root_helper
-
-if [ -z $TEST_DEV ]; then
-   echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-   $RESULTS
-   TEST_DEV="$TOP/tests/test.img"
-
-   # Need at least 1G to avoid mixed block group, which extent tree
-   # rebuild doesn't support.
-   run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev 1G
 
 if [ -z $TEST_MNT ];then
echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
diff --git a/tests/misc-tests/001-btrfstune-features/test.sh b/tests/misc-tests/001-btrfstune-features/test.sh
index c9981e6..ea33954 100755
--- a/tests/misc-tests/001-btrfstune-features/test.sh
+++ b/tests/misc-tests/001-btrfstune-features/test.sh
@@ -7,16 +7,7 @@ check_prereq btrfs-debug-tree
 check_prereq btrfs-show-super
 check_prereq mkfs.btrfs
 setup_root_helper
-
-if [ -z $TEST_DEV ]; then
-   echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-   $RESULTS
-   TEST_DEV="$TOP/tests/test.img"
-
-   # Need at least 1G to avoid mixed block group, which extent tree
-   # rebuild doesn't support.
-   run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev
 
 if [ -z $TEST_MNT ];then
echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
diff --git a/tests/misc-tests/002-uuid-rewrite/test.sh b/tests/misc-tests/002-uuid-rewrite/test.sh
index 6c2a393..bffa9b8 100755
--- a/tests/misc-tests/002-uuid-rewrite/test.sh
+++ b/tests/misc-tests/002-uuid-rewrite/test.sh
@@ -7,16 +7,7 @@ check_prereq btrfs-debug-tree
 check_prereq btrfs-show-super
 check_prereq mkfs.btrfs
 check_prereq btrfstune
-
-if [ -z $TEST_DEV ]; then
-   echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-   $RESULTS
-   TEST_DEV="$TOP/tests/test.img"
-
-   # Need at least 1G to avoid mixed block group, which extent tree
-   # rebuild doesn't support.
-   run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev
 
 if [ -z $TEST_MNT ];then
echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
diff --git a/tests/misc-tests/003-zero-log/test.sh b/tests/misc-tests/003-zero-log/test.sh
index da5b351..edab5db 100755
--- a/tests/misc-tests/003-zero-log/test.sh
+++ b/tests/misc-tests/003-zero-log/test.sh
@@ -6,16 +6,7 @@ source $TOP/tests/common
 check_prereq btrfs-show-super
 check_prereq mkfs.btrfs
 check_prereq btrfs
-
-if [ -z $TEST_DEV ]; then
-   echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-   $RESULTS
-   TEST_DEV="$TOP/tests/test.img"
-
-   # Need at least 1G to avoid mixed block group, which extent tree
-   # rebuild doesn't support.
-   run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev
 
 if [ -z $TEST_MNT ];then
echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
-- 
1.8.5.1
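
The image-creation step of prepare_test_dev() can be tried on its own.
The following is a minimal sketch, assuming a hypothetical helper named
make_test_image and the truncate utility from GNU coreutils; it leaves
out the "$TEST_DEV already given" shortcut and the logging to $RESULTS.

```shell
#!/bin/sh
# Hypothetical helper: create (or resize) an image file of the given
# size; the size defaults to 1G, as in prepare_test_dev().
make_test_image() {
	img=$1
	size=${2:-1G}
	truncate -s "$size" "$img" || { echo "cannot create $img" >&2; return 1; }
}

img=$(mktemp)
make_test_image "$img" 16M
wc -c <"$img"		# apparent size of the image in bytes
rm -f "$img"
```

truncate only sets the file length, so on filesystems that support
sparse files the image takes almost no disk space until it is written.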


[PATCH v3 5/7] btrfs-progs: Set info->periodic.timer_fd to 0 in init-fail

2015-07-27 Thread Zhaolei

From: Zhao Lei 

In the current code, (info->periodic.timer_fd == 0) means that the
timer handle is not valid, as in:
  if (info->periodic.timer_fd) {
      close(info->periodic.timer_fd);
      ...

We need to change its value from -1 to 0 in the create-fail case so
that code like the above works.

Signed-off-by: Zhao Lei 
---
 task-utils.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 768be94..58f5195 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -94,8 +94,10 @@ int task_period_start(struct task_info *info, unsigned int period_ms)
return -1;
 
info->periodic.timer_fd = timerfd_create(CLOCK_MONOTONIC, 0);
-   if (info->periodic.timer_fd == -1)
+   if (info->periodic.timer_fd == -1) {
+   info->periodic.timer_fd = 0;
return info->periodic.timer_fd;
+   }
 
info->periodic.wakeups_missed = 0;
 
-- 
1.8.5.1


[PATCH v3 3/7] btrfs-progs: Remove cleanup-timer code for btrfs-convert

2015-07-27 Thread Zhaolei

From: Zhao Lei 

There is no need to close the timer handle again in the sub-thread's
close callback; it is closed by task_stop() automatically.

This patch also adds an fflush() so that the log output appears promptly.

Signed-off-by: Zhao Lei 
---
 btrfs-convert.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index b89c685..7dfab7d 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -71,10 +71,8 @@ static void *print_copied_inodes(void *p)
 
 static int after_copied_inodes(void *p)
 {
-   struct task_ctx *priv = p;
-
printf("\n");
-   task_period_stop(priv->info);
+   fflush(stdout);
 
return 0;
 }
-- 
1.8.5.1


[PATCH v3 7/7] btrfs-progs: Introduce a misc test for thread conflict in btrfs-convert

2015-07-27 Thread Zhaolei

From: Zhao Lei 

The current code of btrfs-convert has a thread-conflict bug, which causes
invalid memory accesses between threads and makes the program abort.

This patch adds a test item for the above bug:
 # ./misc-tests.sh
[TEST]   001-btrfstune-features
[TEST]   002-uuid-rewrite
[TEST]   003-zero-log
[TEST]   004-convert-thread-conflict
 failed: btrfs-convert /root/btrfsprogs/tests/test.img
 test failed for case 004-convert-thread-conflict
 #
 # cat misc-tests-results.txt
 ...
 ### btrfs-convert /root/btrfsprogs/tests/test.img
 trans 7 running 5
 ctree.c:363: btrfs_cow_block: Assertion `1` failed.
 btrfs-convert(btrfs_cow_block+0x92)[0x40acaf]
 btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f]
 btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a]
 btrfs-convert[0x43422d]
 btrfs-convert[0x4342cd]
 btrfs-convert[0x4345ca]
 btrfs-convert[0x434767]
 btrfs-convert[0x435770]
 btrfs-convert[0x439748]
 btrfs-convert(main+0x13f8)[0x43b09d]
 /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
 btrfs-convert[0x407649]
 create btrfs filesystem:
 blocksize: 4096
 nodesize:  16384
 features:  extref, skinny-metadata (default)
 creating btrfs metadata.

 creating ext2fs image file.
 failed: btrfs-convert /root/btrfsprogs/tests/test.img
 test failed for case 004-convert-thread-conflict
 #

Note that this bug does not happen every time, especially on a slow
device such as a loop device (a slow CPU with a fast block device is more
likely to trigger it).
I set the loop count to 20 so that the bug is triggered in about 90% of
test runs.

Suggested-by: David Sterba 
Signed-off-by: Zhao Lei 
---
 tests/misc-tests/004-convert-thread-conflict/test.sh | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100755 tests/misc-tests/004-convert-thread-conflict/test.sh

diff --git a/tests/misc-tests/004-convert-thread-conflict/test.sh b/tests/misc-tests/004-convert-thread-conflict/test.sh
new file mode 100755
index 000..09ac8a3
--- /dev/null
+++ b/tests/misc-tests/004-convert-thread-conflict/test.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+# test convert-thread-conflict
+
+source $TOP/tests/common
+
+check_prereq btrfs
+mkfs.ext4 -V &>/dev/null || _not_run "mkfs.ext4 not found"
+prepare_test_dev 1G
+
+for ((i = 0; i < 20; i++)); do
+   echo "loop $i" >>$RESULTS
+   mkfs.ext4 -F "$TEST_DEV" &>>$RESULTS || _not_run "mkfs.ext4 failed"
+   run_check $TOP/btrfs-convert "$TEST_DEV"
+done
-- 
1.8.5.1
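
The loop in this test repeats the reproduction because a single run
often passes. If the runs are independent, a 90% chance of failure over
20 runs corresponds to roughly an 11% chance per run. The following is a
minimal sketch of the repeat-until-first-failure pattern; the helper
name repeat_until_fail is hypothetical, and "true" stands in for the
real mkfs.ext4 and btrfs-convert commands.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times and stop at the first
# failure, so that an intermittent bug gets several chances to show up.
repeat_until_fail() {
	count=$1
	shift
	i=0
	while [ "$i" -lt "$count" ]; do
		"$@" || { echo "failed in loop $i" >&2; return 1; }
		i=$((i + 1))
	done
}

repeat_until_fail 20 true && echo "20 runs passed"
```

A larger count raises the chance of catching the bug at the cost of a
longer test run.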


[PATCH v3 4/7] btrfs-progs: reset info->periodic.timer_fd's value after free

2015-07-27 Thread Zhaolei

From: Zhao Lei 

task_period_stop() is used to close a timer explicitly. To prevent
task_stop() from closing the timer handle a second time, we should reset
its value after closing it.

Also reset the value of info->id, for safety.

Signed-off-by: Zhao Lei 
---
 task-utils.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 17fd573..768be94 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -64,10 +64,13 @@ void task_stop(struct task_info *info)
if (info->id > 0) {
pthread_cancel(info->id);
pthread_join(info->id, NULL);
+   info->id = -1;
}
 
-   if (info->periodic.timer_fd)
+   if (info->periodic.timer_fd) {
close(info->periodic.timer_fd);
+   info->periodic.timer_fd = 0;
+   }
 
if (info->postfn)
info->postfn(info->private_data);
@@ -130,5 +133,6 @@ void task_period_stop(struct task_info *info)
if (info->periodic.timer_fd) {
timerfd_settime(info->periodic.timer_fd, 0, NULL, NULL);
close(info->periodic.timer_fd);
+   info->periodic.timer_fd = -1;
}
 }
-- 
1.8.5.1


[PATCH v3 0/7] btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert

2015-07-27 Thread Zhaolei

From: Zhao Lei 

This is v3 of the patchset:
 Fix wrong address accessing by subthread in btrfs-convert.

Changelog v2->v3:
 1: Add the $TOP path prefix to the btrfs-convert command in:
    [PATCH v2 7/7] btrfs-progs: Introduce a misc test for thread conflict in
    btrfs-convert
    so that the program in the btrfs source dir is tested.

Changelog v1->v2:
 1: Move the bug test script from the patch description into a separate
    script in tests/misc-tests, suggested by David Sterba
 2: Move the code for creating the loop device to common [PATCH 6/7]

Zhao Lei (7):
  btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert
  btrfs-progs: Move close timer handle code to line after sub process
exit
  btrfs-progs: Remove cleanup-timer code for btrfs-convert
  btrfs-progs: reset info->periodic.timer_fd's value after free
  btrfs-progs: Set info->periodic.timer_fd to 0 in init-fail
  btrfs-progs: Move code to create loop device to common
  btrfs-progs: Introduce a misc test for thread conflict in
btrfs-convert

 btrfs-convert.c|  4 +---
 task-utils.c   | 22 ++
 tests/common   | 16 
 tests/fsck-tests/013-extent-tree-rebuild/test.sh   | 11 +--
 tests/misc-tests/001-btrfstune-features/test.sh| 11 +--
 tests/misc-tests/002-uuid-rewrite/test.sh  | 11 +--
 tests/misc-tests/003-zero-log/test.sh  | 11 +--
 .../misc-tests/004-convert-thread-conflict/test.sh | 14 ++
 8 files changed, 49 insertions(+), 51 deletions(-)
 create mode 100755 tests/misc-tests/004-convert-thread-conflict/test.sh

-- 
1.8.5.1


[PATCH v3 1/7] btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert

2015-07-27 Thread Zhaolei

From: Zhao Lei 

btrfs-convert sometimes shows 'Assertion failed' when converting a nearly
empty file system:
  create btrfs filesystem:
  blocksize: 4096
  nodesize:  16384
  features:  extref, skinny-metadata (default)
  creating btrfs metadata.

  creating ext2fs image file.
  trans 7 running 5
  ctree.c:363: btrfs_cow_block: Assertion `1` failed.
  btrfs-convert(btrfs_cow_block+0x92)[0x40acaf]
  btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f]
  btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a]
  btrfs-convert[0x43422d]
  btrfs-convert[0x4342cd]
  btrfs-convert[0x4345ca]
  btrfs-convert[0x434767]
  btrfs-convert[0x435770]
  btrfs-convert[0x439748]
  btrfs-convert(main+0x13f8)[0x43b09d]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
  btrfs-convert[0x407649]

The reason is complex:
1: The main thread allocates a block of memory that is
   shared with the sub-thread
2: The main thread cancels the sub-thread and frees that memory
3: The main thread mallocs a new block (at the same address)
   and uses it
4: The sub-thread (which has not really exited yet) writes to
   this address and causes this bug.

After adding some debug lines to the code, we see the following output:
  create btrfs filesystem:
  blocksize: 4096
  nodesize:  16384
  features:  extref, skinny-metadata (default)
  creating btrfs metadata.
  1:  ctx(0x7ffe1abde230)->info=0xc65b80
  2:  task_period_start: will create periodic.timer_fd
  3:  task_stop: info->periodic.timer_fd = NULL
  4:  task_stop: begin pthread_cancel info->id=-1746053376
  5:  task_stop: done pthread_cancel ret=0
  6:  task_stop: begin info->postfn
  7:  task_period_stop: periodic.timer_fd NULL
  8:  task_stop: done info->postfn
  9:  task_stop: done all
  10: creating ext2fs image file.
  trans 7 running 5
  11: task_period_start: create periodic.timer_fd done info->periodic.timer_fd(0xc65b80)=7
  12: btrfs_cow_block: root->fs_info->generation(0xc63568) = 5 trans->transid(0xc65b80)=7
  13: ctree.c:368: btrfs_cow_block: Assertion `1` failed.
  ./btrfs-convert(btrfs_cow_block+0xda)[0x40ad37]
  ./btrfs-convert(btrfs_search_slot+0x1cb)[0x40c5b4]
  ./btrfs-convert(btrfs_insert_empty_items+0xac)[0x40d9f6]
  ./btrfs-convert(btrfs_record_file_extent+0xc0)[0x4183fe]
  ./btrfs-convert[0x435796]
  ./btrfs-convert[0x439b0c]
  ./btrfs-convert(main+0x13f8)[0x43b45d]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
  ./btrfs-convert[0x407689]
  Conclusion:
  a: The sub-thread should have exited before step 5, but it is still
     running in step 11.
  b: task_stop() did not close periodic.timer_fd in step 3,
     because periodic.timer_fd was not initialized yet.
  c: Address 0xc65b80 is overwritten by the sub-thread in step 11,
     but this address was freed and re-allocated by the main thread
     before step 10, and is used for trans->transid.
  d: trans->transid, overwritten by the sub-thread, causes the error
     in step 13.

Fix:
  pthread_cancel() only sends a cancellation request to the thread;
  by default, the thread exits at its next cancellation point.
  To make sure the sub-thread has exited in time, this patch adds a
  pthread_join() call after pthread_cancel().
  For pthread_join() to work, the pthread_detach() call is removed.

Signed-off-by: Zhao Lei 
---
 task-utils.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/task-utils.c b/task-utils.c
index 10e3f0f..0390a69 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -50,9 +50,7 @@ int task_start(struct task_info *info)
ret = pthread_create(&info->id, NULL, info->threadfn,
 info->private_data);
 
-   if (ret == 0)
-   pthread_detach(info->id);
-   else
+   if (ret)
info->id = -1;
 
return ret;
@@ -66,8 +64,10 @@ void task_stop(struct task_info *info)
if (info->periodic.timer_fd)
close(info->periodic.timer_fd);
 
-   if (info->id > 0)
+   if (info->id > 0) {
pthread_cancel(info->id);
+   pthread_join(info->id, NULL);
+   }
 
if (info->postfn)
info->postfn(info->private_data);
-- 
1.8.5.1


[PATCH] btrfs-progs: Show detail error message when write sb failed in write_dev_supers()

2015-07-27 Thread Zhaolei

From: Zhao Lei 

fsck-tests.sh failed with the following message on my node:
  # ./fsck-tests.sh
 [TEST]   001-bad-file-extent-bytenr
  disk-io.c:1444: write_dev_supers: Assertion `ret != BTRFS_SUPER_INFO_SIZE` failed.
  /root/btrfsprogs/btrfs-image(write_all_supers+0x2d2)[0x41031c]
  /root/btrfsprogs/btrfs-image(write_ctree_super+0xc5)[0x41042e]
  /root/btrfsprogs/btrfs-image(btrfs_commit_transaction+0x208)[0x410976]
  /root/btrfsprogs/btrfs-image[0x438780]
  /root/btrfsprogs/btrfs-image(main+0x3d5)[0x438c5c]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
  /root/btrfsprogs/btrfs-image[0x4074e9]
  failed to restore image /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr/default_case.img
  #

  # cat fsck-tests-results.txt
  === Entering /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr
  restoring image default_case.img
  failed to restore image /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr/default_case.img
  #

Reason:
  I ran the above test on an NFS mount point that did not have enough
  space to write all superblocks to the image file and does not support
  sparse files.
  So write_dev_supers() failed to write the sb and printed the above message.

It took me quite some time to find out what happened; we can save that
time by printing the exact error in the write-sb-fail case.
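
For illustration only (this is not the code of the patch), the idea of
reporting why a positioned write failed can be sketched as below.
write_fully_or_report() is a hypothetical helper name, not a btrfs-progs
function; the sketch assumes a short write and a failed write should be
reported differently, since errno is only meaningful when pwrite()
returns -1.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at offset off and say why it
 * failed if it did. Returns 0 on a complete write, -1 otherwise.
 */
int write_fully_or_report(int fd, const void *buf, size_t len, off_t off)
{
	ssize_t ret = pwrite(fd, buf, len, off);

	if (ret == (ssize_t)len)
		return 0;

	if (ret >= 0)
		/* Short write: no errno to report, only the byte counts. */
		fprintf(stderr, "WARNING: short write: %zd of %zu bytes\n",
			ret, len);
	else
		/* Real failure: errno tells us the cause (ENOSPC, EFBIG...). */
		fprintf(stderr, "WARNING: write failed: %s\n",
			strerror(errno));
	return -1;
}
```

A caller can then check the return value and fail with a readable
message instead of hitting a bare assertion.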

After patch:
  # ./fsck-tests.sh
[TEST]   001-bad-file-extent-bytenr
  WARNING: Write sb failed: File too large
  disk-io.c:1492: write_all_supers: Assertion `ret` failed.
  ...
  #

Signed-off-by: Zhao Lei 
---
 disk-io.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index fdcfd6d..607d4a1 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1419,7 +1419,8 @@ static int write_dev_supers(struct btrfs_root *root,
ret = pwrite64(device->fd, root->fs_info->super_copy,
BTRFS_SUPER_INFO_SIZE,
root->fs_info->super_bytenr);
-   BUG_ON(ret != BTRFS_SUPER_INFO_SIZE);
+   if (ret != BTRFS_SUPER_INFO_SIZE)
+   goto write_err;
return 0;
}
 
@@ -1441,10 +1442,19 @@ static int write_dev_supers(struct btrfs_root *root,
 */
ret = pwrite64(device->fd, root->fs_info->super_copy,
BTRFS_SUPER_INFO_SIZE, bytenr);
-   BUG_ON(ret != BTRFS_SUPER_INFO_SIZE);
+   if (ret != BTRFS_SUPER_INFO_SIZE)
+   goto write_err;
}
 
return 0;
+
+write_err:
+   if (ret > 0)
+   fprintf(stderr, "WARNING: Failed to write all sb data\n");
+   else
+   fprintf(stderr, "WARNING: Write sb failed: %s\n",
+   strerror(errno));
+   return ret;
 }
 
 int write_all_supers(struct btrfs_root *root)
-- 
1.8.5.1


[PATCH] btrfs-progs-tests: Add '-o loop' to mount command line in convert-tests.sh

2015-07-26 Thread Zhaolei

From: Zhao Lei 

This fixes the following bug:
 # ./convert-tests.sh
 [TEST]   ext2 4k nodesize, btrfs defaults
 failed: mount /root/btrfsprogs/tests/test.img /root/btrfsprogs/tests/mnt
 # tail convert-tests-results.txt
 ...
 ### mount /root/btrfsprogs/tests/test.img /root/btrfsprogs/tests/mnt
 mount: /root/btrfsprogs/tests/test.img is not a block device (maybe try `-o loop'?)
 failed: mount /root/btrfsprogs/tests/test.img /root/btrfsprogs/tests/mnt
 #

Signed-off-by: Zhao Lei 
---
 tests/convert-tests.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/convert-tests.sh b/tests/convert-tests.sh
index 14dde1f..efed90b 100755
--- a/tests/convert-tests.sh
+++ b/tests/convert-tests.sh
@@ -43,7 +43,7 @@ convert_test() {
 
# create a file to check btrfs-convert can convert regular file
# correct
-   run_check $SUDO_HELPER mount $IMAGE $TEST_MNT
+   run_check $SUDO_HELPER mount -o loop $IMAGE $TEST_MNT
run_check $SUDO_HELPER dd if=/dev/zero of=$TEST_MNT/test bs=$nodesize \
count=1 1>/dev/null 2>&1
run_check $SUDO_HELPER umount $TEST_MNT
-- 
1.8.5.1


[PATCH v2 2/7] btrfs-progs: Move close timer handle code to line after sub process exit

2015-07-26 Thread Zhaolei

From: Zhao Lei 

The timer handle may still be in use by the sub thread, so it is
better to close it after the sub thread has exited.

Signed-off-by: Zhao Lei 
---
 task-utils.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/task-utils.c b/task-utils.c
index 0390a69..17fd573 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -61,14 +61,14 @@ void task_stop(struct task_info *info)
if (!info)
return;
 
-   if (info->periodic.timer_fd)
-   close(info->periodic.timer_fd);
-
if (info->id > 0) {
pthread_cancel(info->id);
pthread_join(info->id, NULL);
}
 
+   if (info->periodic.timer_fd)
+   close(info->periodic.timer_fd);
+
if (info->postfn)
info->postfn(info->private_data);
 }
-- 
1.8.5.1


[PATCH v2 6/7] btrfs-progs: Move code to create loop device to common

2015-07-26 Thread Zhaolei

From: Zhao Lei 

This code block is used in several tests; moving it into a common
function makes the code cleaner.

Signed-off-by: Zhao Lei 
---
 tests/common | 16 
 tests/fsck-tests/013-extent-tree-rebuild/test.sh | 11 +--
 tests/misc-tests/001-btrfstune-features/test.sh  | 11 +--
 tests/misc-tests/002-uuid-rewrite/test.sh| 11 +--
 tests/misc-tests/003-zero-log/test.sh| 11 +--
 5 files changed, 20 insertions(+), 40 deletions(-)

diff --git a/tests/common b/tests/common
index 899ec7b..2d337b0 100644
--- a/tests/common
+++ b/tests/common
@@ -142,3 +142,19 @@ setup_root_helper()
fi
SUDO_HELPER=root_helper
 }
+
+prepare_test_dev()
+{
+   # num[K/M/G/T...]
+   local size="$1"
+
+   [[ "$TEST_DEV" ]] && return
+   [[ "$size" ]] || size='1G'
+
+   echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
+   $RESULTS
+   TEST_DEV="$TOP/tests/test.img"
+
+   truncate -s "$size" "$TEST_DEV" || _not_run "create file for loop 
device failed"
+}
+
diff --git a/tests/fsck-tests/013-extent-tree-rebuild/test.sh 
b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
index 3290cd7..88a66cc 100755
--- a/tests/fsck-tests/013-extent-tree-rebuild/test.sh
+++ b/tests/fsck-tests/013-extent-tree-rebuild/test.sh
@@ -5,16 +5,7 @@ source $TOP/tests/common
 check_prereq btrfs-debug-tree
 check_prereq mkfs.btrfs
 setup_root_helper
-
-if [ -z $TEST_DEV ]; then
-   echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-   $RESULTS
-   TEST_DEV="$TOP/tests/test.img"
-
-   # Need at least 1G to avoid mixed block group, which extent tree
-   # rebuild doesn't support.
-   run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev 1G
 
 if [ -z $TEST_MNT ];then
echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
diff --git a/tests/misc-tests/001-btrfstune-features/test.sh 
b/tests/misc-tests/001-btrfstune-features/test.sh
index c9981e6..ea33954 100755
--- a/tests/misc-tests/001-btrfstune-features/test.sh
+++ b/tests/misc-tests/001-btrfstune-features/test.sh
@@ -7,16 +7,7 @@ check_prereq btrfs-debug-tree
 check_prereq btrfs-show-super
 check_prereq mkfs.btrfs
 setup_root_helper
-
-if [ -z $TEST_DEV ]; then
-   echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-   $RESULTS
-   TEST_DEV="$TOP/tests/test.img"
-
-   # Need at least 1G to avoid mixed block group, which extent tree
-   # rebuild doesn't support.
-   run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev
 
 if [ -z $TEST_MNT ];then
echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
diff --git a/tests/misc-tests/002-uuid-rewrite/test.sh 
b/tests/misc-tests/002-uuid-rewrite/test.sh
index 6c2a393..bffa9b8 100755
--- a/tests/misc-tests/002-uuid-rewrite/test.sh
+++ b/tests/misc-tests/002-uuid-rewrite/test.sh
@@ -7,16 +7,7 @@ check_prereq btrfs-debug-tree
 check_prereq btrfs-show-super
 check_prereq mkfs.btrfs
 check_prereq btrfstune
-
-if [ -z $TEST_DEV ]; then
-   echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-   $RESULTS
-   TEST_DEV="$TOP/tests/test.img"
-
-   # Need at least 1G to avoid mixed block group, which extent tree
-   # rebuild doesn't support.
-   run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev
 
 if [ -z $TEST_MNT ];then
echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
diff --git a/tests/misc-tests/003-zero-log/test.sh 
b/tests/misc-tests/003-zero-log/test.sh
index da5b351..edab5db 100755
--- a/tests/misc-tests/003-zero-log/test.sh
+++ b/tests/misc-tests/003-zero-log/test.sh
@@ -6,16 +6,7 @@ source $TOP/tests/common
 check_prereq btrfs-show-super
 check_prereq mkfs.btrfs
 check_prereq btrfs
-
-if [ -z $TEST_DEV ]; then
-   echo "\$TEST_DEV not given, use $TOP/test/test.img as fallback" >> \
-   $RESULTS
-   TEST_DEV="$TOP/tests/test.img"
-
-   # Need at least 1G to avoid mixed block group, which extent tree
-   # rebuild doesn't support.
-   run_check truncate -s 1G $TEST_DEV
-fi
+prepare_test_dev
 
 if [ -z $TEST_MNT ];then
echo "[NOTRUN] extent tree rebuild, need TEST_MNT variant"
-- 
1.8.5.1


[PATCH v2 7/7] btrfs-progs: Introduce a misc test for thread conflict in btrfs-convert

2015-07-26 Thread Zhaolei

From: Zhao Lei 

The current btrfs-convert code has a thread conflict bug, which causes
invalid memory access between threads and makes the program crash.

This patch adds a test item for the above bug, as:
 # ./misc-tests.sh
[TEST]   001-btrfstune-features
[TEST]   002-uuid-rewrite
[TEST]   003-zero-log
[TEST]   004-convert-thread-conflict
 failed: btrfs-convert /root/btrfsprogs/tests/test.img
 test failed for case 004-convert-thread-conflict
 #
 # cat misc-tests-results.txt
 ...
 ### btrfs-convert /root/btrfsprogs/tests/test.img
 trans 7 running 5
 ctree.c:363: btrfs_cow_block: Assertion `1` failed.
 btrfs-convert(btrfs_cow_block+0x92)[0x40acaf]
 btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f]
 btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a]
 btrfs-convert[0x43422d]
 btrfs-convert[0x4342cd]
 btrfs-convert[0x4345ca]
 btrfs-convert[0x434767]
 btrfs-convert[0x435770]
 btrfs-convert[0x439748]
 btrfs-convert(main+0x13f8)[0x43b09d]
 /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
 btrfs-convert[0x407649]
 create btrfs filesystem:
 blocksize: 4096
 nodesize:  16384
 features:  extref, skinny-metadata (default)
 creating btrfs metadata.

 creating ext2fs image file.
 failed: btrfs-convert /root/btrfsprogs/tests/test.img
 test failed for case 004-convert-thread-conflict
 #

Note that this bug does not happen every time, especially on slow
devices such as loop (a slow CPU with a fast block device is likely to
trigger it).
I set the loop count to 20 so that the bug happens in about 90% of test runs.

Suggested-by: David Sterba 
Signed-off-by: Zhao Lei 
---
 tests/misc-tests/004-convert-thread-conflict/test.sh | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100755 tests/misc-tests/004-convert-thread-conflict/test.sh

diff --git a/tests/misc-tests/004-convert-thread-conflict/test.sh 
b/tests/misc-tests/004-convert-thread-conflict/test.sh
new file mode 100755
index 000..d02e766
--- /dev/null
+++ b/tests/misc-tests/004-convert-thread-conflict/test.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+# test convert-thread-conflict
+
+source $TOP/tests/common
+
+check_prereq btrfs
+mkfs.ext4 -V &>/dev/null || _not_run "mkfs.ext4 not found"
+prepare_test_dev 1G
+
+for ((i = 0; i < 20; i++)); do
+   echo "loop $i" >>$RESULTS
+   mkfs.ext4 -F "$TEST_DEV" &>>$RESULTS || _not_run "mkfs.ext4 failed"
+   run_check btrfs-convert "$TEST_DEV"
+done
-- 
1.8.5.1


[PATCH v2 5/7] btrfs-progs: Set info->periodic.timer_fd to 0 in init-fail

2015-07-26 Thread Zhaolei

From: Zhao Lei 

In the current code, (info->periodic.timer_fd == 0) means the timer
handle is not valid, as in:
  if (info->periodic.timer_fd) {
      close(info->periodic.timer_fd);
  ...

We need to change its value from -1 to 0 in the create-fail case, so
that code like the above works.

Signed-off-by: Zhao Lei 
---
 task-utils.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 768be94..58f5195 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -94,8 +94,10 @@ int task_period_start(struct task_info *info, unsigned int 
period_ms)
return -1;
 
info->periodic.timer_fd = timerfd_create(CLOCK_MONOTONIC, 0);
-   if (info->periodic.timer_fd == -1)
+   if (info->periodic.timer_fd == -1) {
+   info->periodic.timer_fd = 0;
return info->periodic.timer_fd;
+   }
 
info->periodic.wakeups_missed = 0;
 
-- 
1.8.5.1


[PATCH v2 1/7] btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert

2015-07-26 Thread Zhaolei

From: Zhao Lei 

btrfs-convert sometimes shows 'Assertion failed' when converting a nearly
blank file system, as:
  create btrfs filesystem:
  blocksize: 4096
  nodesize:  16384
  features:  extref, skinny-metadata (default)
  creating btrfs metadata.

  creating ext2fs image file.
  trans 7 running 5
  ctree.c:363: btrfs_cow_block: Assertion `1` failed.
  btrfs-convert(btrfs_cow_block+0x92)[0x40acaf]
  btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f]
  btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a]
  btrfs-convert[0x43422d]
  btrfs-convert[0x4342cd]
  btrfs-convert[0x4345ca]
  btrfs-convert[0x434767]
  btrfs-convert[0x435770]
  btrfs-convert[0x439748]
  btrfs-convert(main+0x13f8)[0x43b09d]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
  btrfs-convert[0x407649]

The reason is complex:
1: The main thread allocates a block of memory that is
   shared with the sub thread.
2: The main thread cancels the sub thread and frees that memory.
3: The main thread mallocs a new block (at the same address)
   and uses it.
4: The sub thread (which has not actually exited yet) writes to
   this address and causes this bug.

After adding some debug lines to the code, we can see the following output:
  create btrfs filesystem:
  blocksize: 4096
  nodesize:  16384
  features:  extref, skinny-metadata (default)
  creating btrfs metadata.
  1:  ctx(0x7ffe1abde230)->info=0xc65b80
  2:  task_period_start: will create periodic.timer_fd
  3:  task_stop: info->periodic.timer_fd = NULL
  4:  task_stop: begin pthread_cancel info->id=-1746053376
  5:  task_stop: done pthread_cancel ret=0
  6:  task_stop: begin info->postfn
  7:  task_period_stop: periodic.timer_fd NULL
  8:  task_stop: done info->postfn
  9:  task_stop: done all
  10: creating ext2fs image file.
  trans 7 running 5
  11: task_period_start: create periodic.timer_fd done info->periodic.timer_fd(0xc65b80)=7
  12: btrfs_cow_block: root->fs_info->generation(0xc63568) = 5 trans->transid(0xc65b80)=7
  13: ctree.c:368: btrfs_cow_block: Assertion `1` failed.
  ./btrfs-convert(btrfs_cow_block+0xda)[0x40ad37]
  ./btrfs-convert(btrfs_search_slot+0x1cb)[0x40c5b4]
  ./btrfs-convert(btrfs_insert_empty_items+0xac)[0x40d9f6]
  ./btrfs-convert(btrfs_record_file_extent+0xc0)[0x4183fe]
  ./btrfs-convert[0x435796]
  ./btrfs-convert[0x439b0c]
  ./btrfs-convert(main+0x13f8)[0x43b45d]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
  ./btrfs-convert[0x407689]
  Conclusion:
  a: The subthread should have exited before step 5, but it is still
     running in step 11.
  b: task_stop() did not close periodic.timer_fd in step 3, because
     periodic.timer_fd was not initialized yet.
  c: Address 0xc65b80 is overwritten by the subthread in step 11, but
     the main thread had already freed and re-allocated this address
     before step 10 and was using it for trans->transid.
  d: The trans->transid value overwritten by the subthread caused the
     assertion failure in step 13.

Fix:
  pthread_cancel() only sends a cancellation request to the thread;
  by default the thread exits at its next cancellation point.
  To make sure the sub thread has exited in time, this patch adds a
  pthread_join() call after pthread_cancel().
  For pthread_join() to work, the pthread_detach() call is removed.
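
For illustration only (this is not the task-utils.c code), the
cancel-then-join ordering can be sketched as follows. struct shared_state,
worker() and stop_worker_safely() are hypothetical names; the sketch
assumes the thread is created joinable (no pthread_detach()) and that the
shared memory is freed only after pthread_join() returns.

```c
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the memory shared between main and sub thread. */
struct shared_state {
	unsigned long ticks;
};

/* Sub thread: keeps writing into the shared memory until it is cancelled. */
static void *worker(void *arg)
{
	struct shared_state *st = arg;
	struct timespec ts = { 0, 1000000 };	/* 1 ms */

	for (;;) {
		st->ticks++;
		nanosleep(&ts, NULL);	/* nanosleep() is a cancellation point */
	}
	return NULL;
}

/* Returns 0 if the worker was cancelled and joined before the free(). */
int stop_worker_safely(void)
{
	struct shared_state *st = calloc(1, sizeof(*st));
	pthread_t tid;
	void *res = NULL;

	if (!st)
		return -1;
	if (pthread_create(&tid, NULL, worker, st) != 0) {
		free(st);
		return -1;
	}

	/* This only queues a request; the worker may still be running. */
	pthread_cancel(tid);

	/* Wait until the worker has really exited (thread must be joinable). */
	if (pthread_join(tid, &res) != 0)
		return -1;	/* do not free: the worker may still use st */

	free(st);	/* safe now: no other thread can touch st */
	return res == PTHREAD_CANCELED ? 0 : -1;
}
```

Without the pthread_join() call, the free() could run while the worker
is still between two cancellation points, which is the kind of window
described above.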

Signed-off-by: Zhao Lei 
---
 task-utils.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/task-utils.c b/task-utils.c
index 10e3f0f..0390a69 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -50,9 +50,7 @@ int task_start(struct task_info *info)
ret = pthread_create(&info->id, NULL, info->threadfn,
 info->private_data);
 
-   if (ret == 0)
-   pthread_detach(info->id);
-   else
+   if (ret)
info->id = -1;
 
return ret;
@@ -66,8 +64,10 @@ void task_stop(struct task_info *info)
if (info->periodic.timer_fd)
close(info->periodic.timer_fd);
 
-   if (info->id > 0)
+   if (info->id > 0) {
pthread_cancel(info->id);
+   pthread_join(info->id, NULL);
+   }
 
if (info->postfn)
info->postfn(info->private_data);
-- 
1.8.5.1


[PATCH v2 3/7] btrfs-progs: Remove cleanup-timer code for btrfs-convert

2015-07-26 Thread Zhaolei

From: Zhao Lei 

There is no need to close the timer handle again in the
subthread-close callback; task_stop() closes it automatically.

This patch also adds an fflush() so that the log output appears in time.

Signed-off-by: Zhao Lei 
---
 btrfs-convert.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index b89c685..7dfab7d 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -71,10 +71,8 @@ static void *print_copied_inodes(void *p)
 
 static int after_copied_inodes(void *p)
 {
-   struct task_ctx *priv = p;
-
printf("\n");
-   task_period_stop(priv->info);
+   fflush(stdout);
 
return 0;
 }
-- 
1.8.5.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 4/7] btrfs-progs: reset info->periodic.timer_fd's value after free

2015-07-26 Thread Zhaolei

From: Zhao Lei 

task_period_stop() is used to close a timer explicitly. To avoid the
timer handle being closed again by task_stop(), we should reset its
value after closing it.

Also reset info->id after the thread is joined, to be safe.

Signed-off-by: Zhao Lei 
---
 task-utils.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 17fd573..768be94 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -64,10 +64,13 @@ void task_stop(struct task_info *info)
if (info->id > 0) {
pthread_cancel(info->id);
pthread_join(info->id, NULL);
+   info->id = -1;
}
 
-   if (info->periodic.timer_fd)
+   if (info->periodic.timer_fd) {
close(info->periodic.timer_fd);
+   info->periodic.timer_fd = 0;
+   }
 
if (info->postfn)
info->postfn(info->private_data);
@@ -130,5 +133,6 @@ void task_period_stop(struct task_info *info)
if (info->periodic.timer_fd) {
timerfd_settime(info->periodic.timer_fd, 0, NULL, NULL);
close(info->periodic.timer_fd);
+   info->periodic.timer_fd = -1;
}
 }
-- 
1.8.5.1


[PATCH v2 0/7] btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert

2015-07-26 Thread Zhaolei

From: Zhao Lei 

This is v2 of the patchset:
 Fix wrong address accessing by subthread in btrfs-convert.

Changelog:
 1: Move bug test script from patch description into a single script in
test/misc-tests, suggested-by: David Sterba 
 2: Move code for creating loop device to common [PATCH 6/7]

Zhao Lei (7):
  btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert
  btrfs-progs: Move close timer handle code to line after sub process
exit
  btrfs-progs: Remove cleanup-timer code for btrfs-convert
  btrfs-progs: reset info->periodic.timer_fd's value after free
  btrfs-progs: Set info->periodic.timer_fd to 0 in init-fail
  btrfs-progs: Move code to create loop device to common
  btrfs-progs: Introduce a misc test for thread conflict in
btrfs-convert

 btrfs-convert.c|  4 +---
 task-utils.c   | 22 ++
 tests/common   | 16 
 tests/fsck-tests/013-extent-tree-rebuild/test.sh   | 11 +--
 tests/misc-tests/001-btrfstune-features/test.sh| 11 +--
 tests/misc-tests/002-uuid-rewrite/test.sh  | 11 +--
 tests/misc-tests/003-zero-log/test.sh  | 11 +--
 .../misc-tests/004-convert-thread-conflict/test.sh | 14 ++
 8 files changed, 49 insertions(+), 51 deletions(-)
 create mode 100755 tests/misc-tests/004-convert-thread-conflict/test.sh

-- 
1.8.5.1


[PATCH 2/5] btrfs-progs: Move close timer handle code to line after sub process exit

2015-07-24 Thread Zhaolei

From: Zhao Lei 

The timer handle may still be in use by the sub thread, so it is
better to close it after the sub thread has exited.

Signed-off-by: Zhao Lei 
---
 task-utils.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/task-utils.c b/task-utils.c
index 0390a69..17fd573 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -61,14 +61,14 @@ void task_stop(struct task_info *info)
if (!info)
return;
 
-   if (info->periodic.timer_fd)
-   close(info->periodic.timer_fd);
-
if (info->id > 0) {
pthread_cancel(info->id);
pthread_join(info->id, NULL);
}
 
+   if (info->periodic.timer_fd)
+   close(info->periodic.timer_fd);
+
if (info->postfn)
info->postfn(info->private_data);
 }
-- 
1.8.5.1


[PATCH 4/5] btrfs-progs: reset info->periodic.timer_fd's value after free

2015-07-24 Thread Zhaolei

From: Zhao Lei 

task_period_stop() is used to close a timer explicitly. To avoid the
timer handle being closed again by task_stop(), we should reset its
value after closing it.

Also reset info->id after the thread is joined, to be safe.

Signed-off-by: Zhao Lei 
---
 task-utils.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 17fd573..768be94 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -64,10 +64,13 @@ void task_stop(struct task_info *info)
if (info->id > 0) {
pthread_cancel(info->id);
pthread_join(info->id, NULL);
+   info->id = -1;
}
 
-   if (info->periodic.timer_fd)
+   if (info->periodic.timer_fd) {
close(info->periodic.timer_fd);
+   info->periodic.timer_fd = 0;
+   }
 
if (info->postfn)
info->postfn(info->private_data);
@@ -130,5 +133,6 @@ void task_period_stop(struct task_info *info)
if (info->periodic.timer_fd) {
timerfd_settime(info->periodic.timer_fd, 0, NULL, NULL);
close(info->periodic.timer_fd);
+   info->periodic.timer_fd = -1;
}
 }
-- 
1.8.5.1


[PATCH 3/5] btrfs-progs: Remove cleanup-timer code for btrfs-convert

2015-07-24 Thread Zhaolei

From: Zhao Lei 

There is no need to close the timer handle again in the
subthread-close callback; task_stop() closes it automatically.

This patch also adds an fflush() so that the log output appears in time.

Signed-off-by: Zhao Lei 
---
 btrfs-convert.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index b89c685..7dfab7d 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -71,10 +71,8 @@ static void *print_copied_inodes(void *p)
 
 static int after_copied_inodes(void *p)
 {
-   struct task_ctx *priv = p;
-
printf("\n");
-   task_period_stop(priv->info);
+   fflush(stdout);
 
return 0;
 }
-- 
1.8.5.1


[PATCH 5/5] btrfs-progs: Set info->periodic.timer_fd to 0 in init-fail

2015-07-24 Thread Zhaolei

From: Zhao Lei 

In the current code, (info->periodic.timer_fd == 0) means the timer
handle is not valid, as in:
  if (info->periodic.timer_fd) {
      close(info->periodic.timer_fd);
  ...

We need to change its value from -1 to 0 in the create-fail case, so
that code like the above works.

Signed-off-by: Zhao Lei 
---
 task-utils.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/task-utils.c b/task-utils.c
index 768be94..58f5195 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -94,8 +94,10 @@ int task_period_start(struct task_info *info, unsigned int 
period_ms)
return -1;
 
info->periodic.timer_fd = timerfd_create(CLOCK_MONOTONIC, 0);
-   if (info->periodic.timer_fd == -1)
+   if (info->periodic.timer_fd == -1) {
+   info->periodic.timer_fd = 0;
return info->periodic.timer_fd;
+   }
 
info->periodic.wakeups_missed = 0;
 
-- 
1.8.5.1


[PATCH 1/5] btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert

2015-07-24 Thread Zhaolei

From: Zhao Lei 

We can reproduce this bug with a simple script:
  DEV=/dev/vdh
  for ((i = 0; i < 100; i++)); do
  echo "loop $i"
  mkfs.ext4 "$DEV" &>/dev/null || { echo mkfs fail; break; }
  btrfs-convert "$DEV" || break
  done

The result looks like:
  loop 0
  ...
  loop 1
  ...
  loop 3
  create btrfs filesystem:
  blocksize: 4096
  nodesize:  16384
  features:  extref, skinny-metadata (default)
  creating btrfs metadata.

  creating ext2fs image file.
  trans 7 running 5
  ctree.c:363: btrfs_cow_block: Assertion `1` failed.
  btrfs-convert(btrfs_cow_block+0x92)[0x40acaf]
  btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f]
  btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a]
  btrfs-convert[0x43422d]
  btrfs-convert[0x4342cd]
  btrfs-convert[0x4345ca]
  btrfs-convert[0x434767]
  btrfs-convert[0x435770]
  btrfs-convert[0x439748]
  btrfs-convert(main+0x13f8)[0x43b09d]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
  btrfs-convert[0x407649]

The reason is complex:
1: The main thread allocates a block of memory that is
   shared with the sub thread.
2: The main thread cancels the sub thread and frees that memory.
3: The main thread mallocs a new block (at the same address)
   and uses it.
4: The sub thread (which has not actually exited yet) writes to
   this address and causes this bug.

After adding some debug lines to the code, we can see the following output:
  create btrfs filesystem:
  blocksize: 4096
  nodesize:  16384
  features:  extref, skinny-metadata (default)
  creating btrfs metadata.
  1:  ctx(0x7ffe1abde230)->info=0xc65b80
  2:  task_period_start: will create periodic.timer_fd
  3:  task_stop: info->periodic.timer_fd = NULL
  4:  task_stop: begin pthread_cancel info->id=-1746053376
  5:  task_stop: done pthread_cancel ret=0
  6:  task_stop: begin info->postfn
  7:  task_period_stop: periodic.timer_fd NULL
  8:  task_stop: done info->postfn
  9:  task_stop: done all
  10: creating ext2fs image file.
  trans 7 running 5
  11: task_period_start: create periodic.timer_fd done info->periodic.timer_fd(0xc65b80)=7
  12: btrfs_cow_block: root->fs_info->generation(0xc63568) = 5 trans->transid(0xc65b80)=7
  13: ctree.c:368: btrfs_cow_block: Assertion `1` failed.
  ./btrfs-convert(btrfs_cow_block+0xda)[0x40ad37]
  ./btrfs-convert(btrfs_search_slot+0x1cb)[0x40c5b4]
  ./btrfs-convert(btrfs_insert_empty_items+0xac)[0x40d9f6]
  ./btrfs-convert(btrfs_record_file_extent+0xc0)[0x4183fe]
  ./btrfs-convert[0x435796]
  ./btrfs-convert[0x439b0c]
  ./btrfs-convert(main+0x13f8)[0x43b45d]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
  ./btrfs-convert[0x407689]
  Conclusion:
  a: The subthread should have exited before step 5, but it is still
     running in step 11.
  b: task_stop() did not close periodic.timer_fd in step 3, because
     periodic.timer_fd was not initialized yet.
  c: Address 0xc65b80 is overwritten by the subthread in step 11, but
     the main thread had already freed and re-allocated this address
     before step 10 and was using it for trans->transid.
  d: The trans->transid value overwritten by the subthread caused the
     assertion failure in step 13.

Fix:
  pthread_cancel() only sends a cancellation request to the thread;
  by default the thread exits at its next cancellation point.
  To make sure the sub thread has exited in time, this patch adds a
  pthread_join() call after pthread_cancel().
  For pthread_join() to work, the pthread_detach() call is removed.

Test result:
  The above script passed 400 runs.

Signed-off-by: Zhao Lei 
---
 task-utils.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/task-utils.c b/task-utils.c
index 10e3f0f..0390a69 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -50,9 +50,7 @@ int task_start(struct task_info *info)
ret = pthread_create(&info->id, NULL, info->threadfn,
 info->private_data);
 
-   if (ret == 0)
-   pthread_detach(info->id);
-   else
+   if (ret)
info->id = -1;
 
return ret;
@@ -66,8 +64,10 @@ void task_stop(struct task_info *info)
if (info->periodic.timer_fd)
close(info->periodic.timer_fd);
 
-   if (info->id > 0)
+   if (info->id > 0) {
pthread_cancel(info->id);
+   pthread_join(info->id, NULL);
+   }
 
if (info->postfn)
info->postfn(info->private_data);
-- 
1.8.5.1


[PATCH] btrfs: Fix scrub panic when leaf across stripes

2015-07-22 Thread Zhaolei

From: Zhao Lei 

Scrub panics after the following operations:
  mkfs.ext4 /dev/vdh
  btrfs-convert /dev/vdh
  mount /dev/vdh /mnt/tmp1
  btrfs scrub start -B /dev/vdh
  (panic)

Reason:
  1: In some cases, a leaf created by btrfs-convert is split across 2
     stripes.
  2: Scrub bypasses part of this wrong leaf data, but the remaining data
     causes a panic in scrub_checksum_tree_block().

For reason 1:
  We can get the following information after some simple operations.
  a. mkfs.ext4 /dev/vdh
     btrfs-convert /dev/vdh
  b. btrfs-debug-tree /dev/vdh
     We can see the following item in the extent tree:
     item 25 key (27054080 METADATA_ITEM 0) itemoff 15083 itemsize 33
     Its logical address range is [27054080, 27070464),
     which crosses 2 stripes:
     [27000832, 27066368)
     [27066368, 27131904)
  This will be fixed in btrfs-progs (btrfs-convert, btrfsck, ...).

For reason 2:
  Scrub tries to bypass such a block, but the result is a panic,
  because the current bypass code is missing a condition and lets
  some of the wrong leaf data through.
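
As a sketch only (not the scrub.c code), the complete bypass condition
can be written as a small range check. tree_block_spans_stripe() is a
hypothetical helper name; the numbers in the comments are the ones from
the btrfs-debug-tree output in this message, with a 16384-byte tree
block and 65536-byte stripes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when the tree block [start, start + len)
 * is not fully contained in the stripe
 * [stripe_start, stripe_start + stripe_len).
 */
bool tree_block_spans_stripe(uint64_t start, uint64_t len,
			     uint64_t stripe_start, uint64_t stripe_len)
{
	/* Block begins before the stripe (the case the old check caught). */
	if (start < stripe_start)
		return true;

	/* Block ends after the stripe (the case the old check missed). */
	return start + len > stripe_start + stripe_len;
}

/*
 * Block [27054080, 27070464):
 *   stripe [27000832, 27066368): start is inside, end is outside -> spans
 *   stripe [27066368, 27131904): start is before the stripe      -> spans
 */
```

With only the first test, the block is not bypassed while scrubbing the
stripe starting at 27000832, because 27054080 is not below 27000832.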

This patch fixes the above scrub code.

Before patch:
  # btrfs scrub start -B /dev/vdh
  (panic)

After patch:
  # btrfs scrub start -B /dev/vdh
  scrub done for 353cec8f-da31-4a94-aa35-be72d997b06e
  ...
  # dmesg
  ...
  [   59.088697] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27000832
  [   59.089929] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27066368
  #

Reported-by: Chris Murphy 
Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 94db0fa..35d49b2 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2881,11 +2881,12 @@ static noinline_for_stack int 
scrub_raid56_parity(struct scrub_ctx *sctx,
flags = btrfs_extent_flags(l, extent);
generation = btrfs_extent_generation(l, extent);
 
-   if (key.objectid < logic_start &&
-   (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)) {
-   btrfs_err(fs_info,
- "scrub: tree block %llu spanning 
stripes, ignored. logical=%llu",
-  key.objectid, logic_start);
+   if ((flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) &&
+   (key.objectid < logic_start ||
+key.objectid + bytes >
+logic_start + map->stripe_len)) {
+   btrfs_err(fs_info, "scrub: tree block %llu 
spanning stripes, ignored. logical=%llu",
+ key.objectid, logic_start);
goto next;
}
 again:
@@ -3212,8 +3213,10 @@ static noinline_for_stack int scrub_stripe(struct 
scrub_ctx *sctx,
flags = btrfs_extent_flags(l, extent);
generation = btrfs_extent_generation(l, extent);
 
-   if (key.objectid < logical &&
-   (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)) {
+   if ((flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) &&
+   (key.objectid < logical ||
+key.objectid + bytes >
+logical + map->stripe_len)) {
btrfs_err(fs_info,
   "scrub: tree block %llu spanning "
   "stripes, ignored. logical=%llu",
-- 
1.8.5.1
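
Note: a minimal standalone sketch of the containment check this patch
adds, using the numbers from the report above. The helper name is
hypothetical and this is not the kernel code; the 16384-byte block size
is simply 27070464 - 27054080, and the 65536-byte stripe length is
27066368 - 27000832. The old check only tested the first condition
(block starts before the stripe), so the first stripe in the report
slipped through.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: returns true when the
 * tree block [block_start, block_start + block_len) is not fully
 * contained in the stripe [stripe_start, stripe_start + stripe_len).
 */
bool tree_block_spans_stripe(uint64_t block_start, uint64_t block_len,
			     uint64_t stripe_start, uint64_t stripe_len)
{
	return block_start < stripe_start ||
	       block_start + block_len > stripe_start + stripe_len;
}
```

With the reported values, the block is flagged for both stripes it
touches, while a block that sits entirely inside one stripe is not.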

[PATCH] btrfs-progs: Accurate error message for resize operation in not-enough-free-space case

2015-07-22 Thread Zhaolei

From: Zhao Lei 

btrfs-progs prints the following error message when a resize fails
because there is not enough free space:
 # btrfs filesystem resize +10g /mnt/btrfs_5gb
 Resize '/mnt/btrfs_5gb' of '+10g'
 ERROR: unable to resize '/mnt/btrfs_5gb' - File too large
 #

This description is not helpful for users, so this patch changes it to:
 # ./btrfs filesystem resize +10G /mnt/tmp1
 Resize '/mnt/tmp1' of '+10G'
 ERROR: unable to resize '/mnt/tmp1' - no enouth free space
 #

Reported-by: Taeha Kim 
Signed-off-by: Zhao Lei 
---
 cmds-filesystem.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 800aa4d..c393ce7 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -1327,8 +1327,16 @@ static int cmd_resize(int argc, char **argv)
e = errno;
close_file_or_dir(fd, dirstream);
if( res < 0 ){
-   fprintf(stderr, "ERROR: unable to resize '%s' - %s\n", 
-   path, strerror(e));
+   switch (e) {
+   case EFBIG:
+   fprintf(stderr, "ERROR: unable to resize '%s' - no 
enouth free space\n",
+   path);
+   break;
+   default:
+   fprintf(stderr, "ERROR: unable to resize '%s' - %s\n",
+   path, strerror(e));
+   break;
+   }
return 1;
} else if (res > 0) {
const char *err_str = btrfs_err_str(res);
-- 
1.8.5.1
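
Note: a small userspace sketch of the errno-to-message mapping the patch
introduces. The function name and the exact wording of the returned
string are assumptions made for illustration; the patch itself prints
the text shown in its diff.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: pick the text printed after
 * "unable to resize '<path>' - ".
 */
const char *resize_failure_reason(int err)
{
	switch (err) {
	case EFBIG:
		/* Per the report above, EFBIG is what a too-large resize returns. */
		return "not enough free space";
	default:
		/* Any other error keeps the generic libc description. */
		return strerror(err);
	}
}
```

Keeping the default branch on strerror() means only the one misleading
case changes; all other failures print as before.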

[PATCH 2/2] btrfs: Bypass unrelated items before accessing its contents in scrub

2015-07-21 Thread Zhaolei

From: Zhao Lei 

When we walk extent_root in scrub_stripe() and
scrub_raid56_parity(), we should bypass unrelated tree items first,
before using their contents in other conditions.

This is not a bug fix; it only puts the checks in a logical order.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index d72e8e1..3e8b3fc 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2856,6 +2856,10 @@ static noinline_for_stack int scrub_raid56_parity(struct 
scrub_ctx *sctx,
}
btrfs_item_key_to_cpu(l, &key, slot);
 
+   if (key.type != BTRFS_EXTENT_ITEM_KEY &&
+   key.type != BTRFS_METADATA_ITEM_KEY)
+   goto next;
+
if (key.type == BTRFS_METADATA_ITEM_KEY)
bytes = root->nodesize;
else
@@ -2864,10 +2868,6 @@ static noinline_for_stack int scrub_raid56_parity(struct 
scrub_ctx *sctx,
if (key.objectid + bytes <= logic_start)
goto next;
 
-   if (key.type != BTRFS_EXTENT_ITEM_KEY &&
-   key.type != BTRFS_METADATA_ITEM_KEY)
-   goto next;
-
if (key.objectid >= logic_end) {
stop_loop = 1;
break;
@@ -3192,6 +3192,10 @@ static noinline_for_stack int scrub_stripe(struct 
scrub_ctx *sctx,
}
btrfs_item_key_to_cpu(l, &key, slot);
 
+   if (key.type != BTRFS_EXTENT_ITEM_KEY &&
+   key.type != BTRFS_METADATA_ITEM_KEY)
+   goto next;
+
if (key.type == BTRFS_METADATA_ITEM_KEY)
bytes = root->nodesize;
else
@@ -3200,10 +3204,6 @@ static noinline_for_stack int scrub_stripe(struct 
scrub_ctx *sctx,
if (key.objectid + bytes <= logical)
goto next;
 
-   if (key.type != BTRFS_EXTENT_ITEM_KEY &&
-   key.type != BTRFS_METADATA_ITEM_KEY)
-   goto next;
-
if (key.objectid >= logical + map->stripe_len) {
/* out of this device extent */
if (key.objectid >= logic_end)
-- 
1.8.5.1
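
Note: a standalone sketch of the ordering this patch establishes: reject
unrelated item types before interpreting any other field of the key. All
type and function names are hypothetical. Taking the extent length from
the key offset for extent items is an assumption, since that line is cut
off in the diff context above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the btrfs key types. */
enum item_type { ITEM_EXTENT, ITEM_METADATA, ITEM_OTHER };

struct item_key {
	uint64_t objectid;
	enum item_type type;
	uint64_t offset;
};

/* Returns true when the scrub loop should skip this item. */
bool skip_item(const struct item_key *key, uint64_t nodesize,
	       uint64_t logical)
{
	uint64_t bytes;

	/* Type check first: other items carry unrelated offsets. */
	if (key->type != ITEM_EXTENT && key->type != ITEM_METADATA)
		return true;

	/* Assumption: extent items keep their length in key->offset. */
	bytes = key->type == ITEM_METADATA ? nodesize : key->offset;

	/* Skip items that end at or before the start of the range. */
	return key->objectid + bytes <= logical;
}
```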

[PATCH 1/2] btrfs: Load only necessary csums into list in scrub

2015-07-21 Thread Zhaolei

From: Zhao Lei 

We need not load the csums of the whole stripe in scrub, because the
stripe is trimmed before use. That is to say, the only data we really
need csums for lies in [extent_logical, extent_logical + extent_len).

This patch uses that range for btrfs_lookup_csums_range()
in scrub_stripe().

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 7f56603..d72e8e1 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3251,9 +3251,11 @@ again:
   &extent_dev,
   &extent_mirror_num);
 
-   ret = btrfs_lookup_csums_range(csum_root, logical,
-   logical + map->stripe_len - 1,
-   &sctx->csum_list, 1);
+   ret = btrfs_lookup_csums_range(csum_root,
+  extent_logical,
+  extent_logical +
+  extent_len - 1,
+  &sctx->csum_list, 1);
if (ret)
goto out;
 
-- 
1.8.5.1
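
Note: a sketch of how the narrower lookup range can be derived, assuming
the extent is first trimmed to the stripe and that the two overlap. The
struct and function names are hypothetical; only the inclusive
"start + len - 1" end comes from the diff above.

```c
#include <stdint.h>

/* Hypothetical inclusive byte range handed to a checksum lookup. */
struct csum_range {
	uint64_t start;
	uint64_t end;	/* inclusive, matching the "- 1" in the patch */
};

/* Trim the extent to the stripe, then derive the lookup range. */
struct csum_range trimmed_csum_range(uint64_t extent_start,
				     uint64_t extent_len,
				     uint64_t stripe_start,
				     uint64_t stripe_len)
{
	uint64_t extent_end = extent_start + extent_len;
	uint64_t stripe_end = stripe_start + stripe_len;
	uint64_t start = extent_start > stripe_start ?
			 extent_start : stripe_start;
	uint64_t end = extent_end < stripe_end ? extent_end : stripe_end;
	struct csum_range r = { start, end - 1 };

	return r;
}
```

For an 8192-byte extent at 4096 inside a 65536-byte stripe starting at
0, this looks up bytes 4096 through 12287 instead of the whole stripe.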

[PATCH] btrfs: Fix calculation typo caused by ambiguous meaning of logic_end

2015-07-21 Thread Zhaolei

From: Zhao Lei 

For example, in scrub_raid56_parity(), the following lines are used
to judge whether all data has been processed:
 place1: if (key.objectid > logic_end) ...
 place2: if (logic_start >= logic_end) ...
 ...
 (place2 is a typo, it should be ">"; it was copied from another
  place, where logic_end's meaning is different, long story...)

We could fix the typo directly, but the root cause is the ambiguous
meaning of logic_end in scrub raid56 parity.

Elsewhere, XXX_end points to the first byte that is not included,
and we process the segment [XXX_start, XXX_end).

But for scrub raid56 parity, logic_end points to the last byte that
needs processing, which introduces many "+ 1" and "- 1" in the code,
as below:
 length = sparity->logic_end - sparity->logic_start + 1
 logic_end - logic_start + 1
 stripe_logical + increment - 1

This patch changes logic_end to the usual exclusive meaning in the
raid56 parity functions and data struct, along with the above bugfix.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 24720f6..7f56603 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2702,7 +2702,7 @@ static void scrub_parity_check_and_repair(struct 
scrub_parity *sparity)
   sparity->nsectors))
goto out;
 
-   length = sparity->logic_end - sparity->logic_start + 1;
+   length = sparity->logic_end - sparity->logic_start;
ret = btrfs_map_sblock(sctx->dev_root->fs_info, WRITE,
   sparity->logic_start,
   &length, &bbio, 0, 1);
@@ -2868,7 +2868,7 @@ static noinline_for_stack int scrub_raid56_parity(struct 
scrub_ctx *sctx,
key.type != BTRFS_METADATA_ITEM_KEY)
goto next;
 
-   if (key.objectid > logic_end) {
+   if (key.objectid >= logic_end) {
stop_loop = 1;
break;
}
@@ -2957,7 +2957,7 @@ next:
 out:
if (ret < 0)
scrub_parity_mark_sectors_error(sparity, logic_start,
-   logic_end - logic_start + 1);
+   logic_end - logic_start);
scrub_parity_put(sparity);
scrub_submit(sctx);
mutex_lock(&sctx->wr_ctx.wr_lock);
@@ -3138,7 +3138,7 @@ static noinline_for_stack int scrub_stripe(struct 
scrub_ctx *sctx,
logical += base;
if (ret) {
stripe_logical += base;
-   stripe_end = stripe_logical + increment - 1;
+   stripe_end = stripe_logical + increment;
ret = scrub_raid56_parity(sctx, map, scrub_dev,
  ppath, stripe_logical,
  stripe_end);
@@ -3284,7 +3284,7 @@ loop:
if (ret && physical < physical_end) {
stripe_logical += base;
stripe_end = stripe_logical +
-   increment - 1;
+   increment;
ret = scrub_raid56_parity(sctx,
map, scrub_dev, ppath,
stripe_logical,
-- 
1.8.5.1
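
Note: a tiny sketch of the half-open convention the patch moves to. The
struct and helper names are hypothetical. With an exclusive end, the
length needs no "+ 1" and the termination test is a plain ">=".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range type; end is exclusive, i.e. [start, end). */
struct logic_range {
	uint64_t start;
	uint64_t end;
};

uint64_t range_length(struct logic_range r)
{
	return r.end - r.start;	/* no "+ 1" needed */
}

bool range_done(struct logic_range r, uint64_t pos)
{
	return pos >= r.end;	/* end itself is already outside */
}
```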

[PATCH 1/2] btrfs: Check cancel and pause in interval of scrub operation

2015-07-20 Thread Zhaolei

From: Zhao Lei 

The old code checks for cancel and pause requests in the middle of
the scrub stripe loop, like:
  loop() {
    if (parity) {
      scrub_parity_stripe();
      continue;
    }

    check_cancel_and_pause()

    scrub_normal_stripe();
  }

The reason is that when raid56 stripe scrub was introduced, the new
code was simply inserted at the front of the loop.

It is better to do:
  loop() {
    check_cancel_and_pause()

    if (parity)
      scrub_parity_stripe();
    else
      scrub_normal_stripe();
  }

This patch moves the code to realize the above sequence.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 94db0fa..f8551b9 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3104,22 +3104,6 @@ static noinline_for_stack int scrub_stripe(struct 
scrub_ctx *sctx,
 */
ret = 0;
while (physical < physical_end) {
-   /* for raid56, we skip parity stripe */
-   if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-   ret = get_raid56_logic_offset(physical, num,
-   map, &logical, &stripe_logical);
-   logical += base;
-   if (ret) {
-   stripe_logical += base;
-   stripe_end = stripe_logical + increment - 1;
-   ret = scrub_raid56_parity(sctx, map, scrub_dev,
-   ppath, stripe_logical,
-   stripe_end);
-   if (ret)
-   goto out;
-   goto skip;
-   }
-   }
/*
 * canceled?
 */
@@ -3144,6 +3128,24 @@ static noinline_for_stack int scrub_stripe(struct 
scrub_ctx *sctx,
scrub_blocked_if_needed(fs_info);
}
 
+   /* for raid56, we skip parity stripe */
+   if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+   ret = get_raid56_logic_offset(physical, num, map,
+ &logical,
+ &stripe_logical);
+   logical += base;
+   if (ret) {
+   stripe_logical += base;
+   stripe_end = stripe_logical + increment - 1;
+   ret = scrub_raid56_parity(sctx, map, scrub_dev,
+ ppath, stripe_logical,
+ stripe_end);
+   if (ret)
+   goto out;
+   goto skip;
+   }
+   }
+
if (btrfs_fs_incompat(fs_info, SKINNY_METADATA))
key.type = BTRFS_METADATA_ITEM_KEY;
else
-- 
1.8.5.1
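
Note: a userspace simulation of the reordered loop, to show why the
position of the check matters. Every name here is hypothetical and the
"work" is just a counter; the point is that a cancel request is now seen
even when every stripe is a parity stripe.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the scrub state. */
struct scrub_sim {
	int cancel_at;		/* iteration at which cancel arrives, -1 for never */
	int parity_done;
	int normal_done;
};

/* Returns the number of stripes handled before the loop stopped. */
int scrub_loop(struct scrub_sim *s, const bool *is_parity, int nr_stripes)
{
	int i;

	for (i = 0; i < nr_stripes; i++) {
		/* Check for cancel first, so parity stripes cannot skip it. */
		if (s->cancel_at == i)
			break;

		if (is_parity[i])
			s->parity_done++;
		else
			s->normal_done++;
	}
	return i;
}
```

With the old ordering, the parity branch would "continue" past the
check, so a run of parity stripes could not be interrupted.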

[PATCH 2/2] btrfs: Free checksum list on scrub_extent() fail

2015-07-20 Thread Zhaolei

From: Zhao Lei 

When scrub_extent() fails, we need to free the previously created
checksum list.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f8551b9..24720f6 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2923,10 +2923,12 @@ again:
  extent_dev, flags,
  generation,
  extent_mirror_num);
+
+   scrub_free_csums(sctx);
+
if (ret)
goto out;
 
-   scrub_free_csums(sctx);
if (extent_logical + extent_len <
key.objectid + bytes) {
logic_start += map->stripe_len;
@@ -3259,10 +3261,12 @@ again:
   extent_physical, extent_dev, flags,
   generation, extent_mirror_num,
   extent_logical - logical + physical);
+
+   scrub_free_csums(sctx);
+
if (ret)
goto out;
 
-   scrub_free_csums(sctx);
if (extent_logical + extent_len <
key.objectid + bytes) {
if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-- 
1.8.5.1
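
Note: a userspace sketch of the cleanup ordering in this patch: release
the list before testing the return value, so the error path cannot leak
it. The list type and all function names are hypothetical stand-ins for
sctx->csum_list and scrub_free_csums().

```c
#include <stdlib.h>

/* Hypothetical singly linked checksum list. */
struct csum_node {
	struct csum_node *next;
};

struct csum_list {
	struct csum_node *head;
};

int csum_list_add(struct csum_list *l)
{
	struct csum_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->next = l->head;
	l->head = n;
	return 0;
}

void csum_list_free(struct csum_list *l)
{
	while (l->head) {
		struct csum_node *n = l->head;

		l->head = n->next;
		free(n);
	}
}

/* extent_ret stands in for the result of scrub_extent(). */
int process_extent(struct csum_list *l, int extent_ret)
{
	/* Free unconditionally, before looking at the result. */
	csum_list_free(l);

	if (extent_ret)
		return extent_ret;
	return 0;
}
```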

[PATCH] btrfs: Show detailed information when mount fails on missing devices

2015-07-20 Thread Zhaolei

From: Zhao Lei 

When a mount fails because of missing devices, we see the following
dmesg:
 [ 1060.267743] BTRFS: too many missing devices, writeable mount is not allowed
 [ 1060.273158] BTRFS: open_ctree failed

This patch adds the number of missing devices and the number of
tolerated missing devices to the above output, to let the user know
what really happened and to help with bug reports and debugging.

dmesg after patch:
 [  127.050367] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
 [  127.056099] BTRFS: open_ctree failed

Changelog v1->v2:
1: Changed to a clearer description, Suggested-by:
   Anand Jain 

Suggested-by: Anand Jain 
Signed-off-by: Zhao Lei 
---
 fs/btrfs/disk-io.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2eda03b..5b44e02 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2950,8 +2950,9 @@ retry_root_backup:
if (fs_info->fs_devices->missing_devices >
 fs_info->num_tolerated_disk_barrier_failures &&
!(sb->s_flags & MS_RDONLY)) {
-   printk(KERN_WARNING "BTRFS: "
-   "too many missing devices, writeable mount is not 
allowed\n");
+   pr_warn("BTRFS: missing devices(%llu) exceeds the limit(%d), 
writeable mount is not allowed\n",
+   fs_info->fs_devices->missing_devices,
+   fs_info->num_tolerated_disk_barrier_failures);
goto fail_sysfs;
}
 
-- 
1.8.5.1
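
Note: a userspace sketch that reproduces the v2 message text with
snprintf(), using the same format string as the pr_warn() call in the
diff (minus the trailing newline). The function name is hypothetical.

```c
#include <stdio.h>

/* Formats the v2 warning into buf; returns what snprintf() returns. */
int format_missing_devices_msg(char *buf, size_t len,
			       unsigned long long missing, int tolerated)
{
	return snprintf(buf, len,
			"BTRFS: missing devices(%llu) exceeds the limit(%d), writeable mount is not allowed",
			missing, tolerated);
}
```

Called with missing = 1 and tolerated = 0, this yields the text of the
"dmesg after patch" line above.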

[PATCH v2] btrfs: Add raid56 support for updating num_tolerated_disk_barrier_failures in btrfs_balance()

2015-07-20 Thread Zhaolei

From: Zhao Lei 

The code that updates fs_info->num_tolerated_disk_barrier_failures in
btrfs_balance() lacks raid56 support.

Reason:
 The above code was written on 2012-08-01, together with the first
 version of btrfs_calc_num_tolerated_disk_barrier_failures().

 Later, btrfs_calc_num_tolerated_disk_barrier_failures() was updated
 to support raid56, but the code in btrfs_balance() was not updated
 with it.

Fix:
 Merge the similar code into a common function:
 btrfs_get_num_tolerated_disk_barrier_failures()
 and make it support both cases.

 This fixes the bug with a bonus of cleanup, and keeps the two code
 paths from getting out of sync again.

Changelog v1->v2:
 1: Use a common function instead of adding an extra argument to
    btrfs_calc_num_tolerated_disk_barrier_failures(), which is
    called from many other functions. Suggested-by:
    Anand Jain 

Suggested-by: Anand Jain 
Signed-off-by: Zhao Lei 
---
 fs/btrfs/disk-io.c | 47 +--
 fs/btrfs/disk-io.h |  1 +
 fs/btrfs/volumes.c | 21 -
 3 files changed, 30 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b6600c7..1e5dcfd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3440,6 +3440,26 @@ static int barrier_all_devices(struct btrfs_fs_info 
*info)
return 0;
 }
 
+int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
+{
+   if ((flags & (BTRFS_BLOCK_GROUP_DUP |
+ BTRFS_BLOCK_GROUP_RAID0 |
+ BTRFS_AVAIL_ALLOC_BIT_SINGLE)) ||
+   ((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0))
+   return 0;
+
+   if (flags & (BTRFS_BLOCK_GROUP_RAID1 |
+BTRFS_BLOCK_GROUP_RAID5 |
+BTRFS_BLOCK_GROUP_RAID10))
+   return 1;
+
+   if (flags & BTRFS_BLOCK_GROUP_RAID6)
+   return 2;
+
+   pr_warn("BTRFS: unknown raid type: %llu\n", flags);
+   return 0;
+}
+
 int btrfs_calc_num_tolerated_disk_barrier_failures(
struct btrfs_fs_info *fs_info)
 {
@@ -3482,28 +3502,11 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
if (space.total_bytes == 0 || space.used_bytes == 0)
continue;
flags = space.flags;
-   /*
-* return
-* 0: if dup, single or RAID0 is configured for
-*any of metadata, system or data, else
-* 1: if RAID5 is configured, or if RAID1 or
-*RAID10 is configured and only two mirrors
-*are used, else
-* 2: if RAID6 is configured
-*/
-   if (num_tolerated_disk_barrier_failures > 0 &&
-   ((flags & (BTRFS_BLOCK_GROUP_DUP |
-  BTRFS_BLOCK_GROUP_RAID0)) ||
-((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0)))
-   num_tolerated_disk_barrier_failures = 0;
-   else if (num_tolerated_disk_barrier_failures > 1 &&
-  (flags & (BTRFS_BLOCK_GROUP_RAID1 |
-BTRFS_BLOCK_GROUP_RAID5 |
-BTRFS_BLOCK_GROUP_RAID10)))
-   num_tolerated_disk_barrier_failures = 1;
-   else if (num_tolerated_disk_barrier_failures > 2 &&
-  (flags & BTRFS_BLOCK_GROUP_RAID6))
-   num_tolerated_disk_barrier_failures = 2;
+
+   num_tolerated_disk_barrier_failures = min(
+   num_tolerated_disk_barrier_failures,
+   btrfs_get_num_tolerated_disk_barrier_failures(
+   flags));
}
up_read(&sinfo->groups_sem);
}
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index d4cbfee..bdfb479 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -139,6 +139,7 @@ struct btrfs_root *btrfs_create_tree(struct 
btrfs_trans_handle *trans,
 u64 objectid);
 int btree_lock_page_hook(struct page *page, void *data,
void (*flush_fn)(void *));
+int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
 int btrfs_calc_num_tolerated_disk_barrier_failures(
struct btrfs_fs_info *fs_info);
 int __init btrfs_end_io_wq_init(void);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fbe7c10..a4392ad 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3573,23 +3573,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
} while (read_seqretry(&fs_info->profiles_lock, seq));
 
if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-
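
Note: a standalone sketch of the per-profile mapping and the "weakest
profile wins" reduction that the common function enables. The PROFILE_*
bits are placeholder values chosen for this example, not the
BTRFS_BLOCK_GROUP_* values, and both function names are hypothetical.

```c
#include <stdint.h>

/* Placeholder profile bits for illustration only. */
#define PROFILE_RAID0	(1ULL << 0)
#define PROFILE_RAID1	(1ULL << 1)
#define PROFILE_DUP	(1ULL << 2)
#define PROFILE_RAID10	(1ULL << 3)
#define PROFILE_RAID5	(1ULL << 4)
#define PROFILE_RAID6	(1ULL << 5)
#define PROFILE_MASK	((1ULL << 6) - 1)

/* How many missing disks a single profile tolerates. */
int tolerated_failures(uint64_t flags)
{
	/* No profile bit at all means "single". */
	if ((flags & (PROFILE_DUP | PROFILE_RAID0)) ||
	    (flags & PROFILE_MASK) == 0)
		return 0;
	if (flags & (PROFILE_RAID1 | PROFILE_RAID5 | PROFILE_RAID10))
		return 1;
	if (flags & PROFILE_RAID6)
		return 2;
	return 0;
}

/* The filesystem tolerates only what its weakest profile tolerates. */
int min_tolerated_failures(const uint64_t *profiles, int nr,
			   int num_devices)
{
	int min = num_devices;
	int i;

	for (i = 0; i < nr; i++) {
		int t = tolerated_failures(profiles[i]);

		if (t < min)
			min = t;
	}
	return min;
}
```

Because the mapping takes only a flags value, the same helper can serve
both the in-use profiles and a balance conversion target.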

[PATCH] [RFC] btrfs: Avoid using single-type chunk in degraded mode

2015-07-17 Thread Zhaolei

From: Zhao Lei 

Mount can fail after the following operations:
  # mkfs a raid1 filesystem
  mkfs.btrfs -f -d raid1 -m raid1 /dev/vdd /dev/vde

  # destroy a disk
  dd if=/dev/zero of=/dev/vde bs=1M count=1

  # do some fs operation on degraded mode
  mount -o degraded /dev/vdd /mnt/test
  touch /mnt/test/123
  rm -f /mnt/test/123
  sync
  umount /mnt/test

  # mount fs again
  mount -o degraded /dev/vdd  /mnt/test

The last mount prints the following error message:
  mount: wrong fs type, bad option, bad superblock on /dev/vdd,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so
With the following dmesg:
  [  127.912406] BTRFS: too many missing devices(1 > 0), writeable mount is not allowed
  [  127.918128] BTRFS: open_ctree failed

Reason:
  When we do fs operations on a degraded fs, btrfs_reduce_alloc_profile()
  may clear all existing raid mode flags because there are not enough
  disks, and return an all-zero raid flag. This flag is then used for
  find_free_extent(), and data is written into a single-type chunk.

  The current version of mkfs creates 3 single-type chunks at init
  time, and data is written to those chunks first.
  With Qu Wenruo's mkfs patch, which avoids creating those 3 initial
  single-type chunks, find_free_extent() will create them instead.

  Then, because the filesystem has data in single-mode chunks,
  btrfs_calc_num_tolerated_disk_barrier_failures() returns 0.
  That is to say, a filesystem that has lost one disk is not allowed
  to mount, which causes the above mount failure.

Fix:
  This problem has several causes, but the main one may be: we
  should not write data into single-mode chunks in degraded mode,
  unless the filesystem was created with the single profile.

  This patch adds a condition before find_free_extent(): if the
  filesystem was not created with single mode (it has another raid
  mode), we forbid writing new data to single chunks.

Fix result:
  This patch fixes the above bug, but we can no longer write any data
  into the filesystem in the above degraded mount.
  (Before the patch, data was written to single-mode chunks.)

  This differs from the old behavior; which is better?
  (Allow or disallow writes into single-mode chunks?)

  Or is there a better way to fix this bug?

Signed-off-by: Zhao Lei 
---
 fs/btrfs/ctree.h   |  3 ++-
 fs/btrfs/extent-tree.c | 60 +-
 fs/btrfs/inode.c   |  3 ++-
 fs/btrfs/super.c   |  2 +-
 fs/btrfs/volumes.c |  4 ++--
 5 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3b69324..11a5c4a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3439,7 +3439,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle 
*trans,
 void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
 void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans,
   struct btrfs_root *root);
-u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data);
+u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data,
+   int no_device_reduce);
 void btrfs_clear_space_info_full(struct btrfs_fs_info *info);
 
 enum btrfs_reserve_flush_enum {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1c2bd17..3cdbb1c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3737,7 +3737,8 @@ static u64 get_restripe_target(struct btrfs_fs_info 
*fs_info, u64 flags)
  * progress (either running or paused) picks the target profile (if it's
  * already available), otherwise falls back to plain reducing.
  */
-static u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
+static u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags,
+ int no_device_reduce)
 {
u64 num_devices = root->fs_info->fs_devices->rw_devices;
u64 target;
@@ -3759,13 +3760,16 @@ static u64 btrfs_reduce_alloc_profile(struct btrfs_root 
*root, u64 flags)
spin_unlock(&root->fs_info->balance_lock);
 
/* First, mask out the RAID levels which aren't possible */
-   if (num_devices == 1)
-   flags &= ~(BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID0 |
-  BTRFS_BLOCK_GROUP_RAID5);
-   if (num_devices < 3)
-   flags &= ~BTRFS_BLOCK_GROUP_RAID6;
-   if (num_devices < 4)
-   flags &= ~BTRFS_BLOCK_GROUP_RAID10;
+   if (!no_device_reduce) {
+   if (num_devices == 1)
+   flags &= ~(BTRFS_BLOCK_GROUP_RAID1 |
+  BTRFS_BLOCK_GROUP_RAID0 |
+  BTRFS_BLOCK_GROUP_RAID5);
+   if (num_devices < 3)
+   flags &= ~BTRFS_BLOCK_GROUP_RAID6;
+   if (num_devices < 4)
+   flags &= ~BTRFS_BLOCK_GROUP_RAID10;
+   }
 
tmp = flags & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RA
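
Note: a standalone sketch of the masking step discussed in the "Reason"
section, with the new no_device_reduce switch from the diff above. The
P_* bits are placeholder values and reduce_profile() is a hypothetical
name; only the three device-count conditions are taken from the patch.

```c
#include <stdint.h>

/* Placeholder profile bits for illustration only. */
#define P_RAID0		(1ULL << 0)
#define P_RAID1		(1ULL << 1)
#define P_RAID10	(1ULL << 2)
#define P_RAID5		(1ULL << 3)
#define P_RAID6		(1ULL << 4)

/* Mask out profiles that the current rw device count cannot serve. */
uint64_t reduce_profile(uint64_t flags, uint64_t rw_devices,
			int no_device_reduce)
{
	if (!no_device_reduce) {
		if (rw_devices == 1)
			flags &= ~(P_RAID1 | P_RAID0 | P_RAID5);
		if (rw_devices < 3)
			flags &= ~P_RAID6;
		if (rw_devices < 4)
			flags &= ~P_RAID10;
	}
	/* A result of 0 means no raid bit is left, i.e. "single". */
	return flags;
}
```

On a two-disk raid1 filesystem with one disk missing, the unpatched
path reduces the raid1 flag to 0, which is how writes end up in
single-type chunks; with no_device_reduce set the raid1 flag survives.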

[PATCH] btrfs: Add raid56 support for updating num_tolerated_disk_barrier_failures in btrfs_balance()

2015-07-16 Thread Zhaolei

From: Zhao Lei 

The code that updates fs_info->num_tolerated_disk_barrier_failures in
btrfs_balance() lacks raid56 support.

Reason:
 The above code was written on 2012-08-01, together with the first
 version of btrfs_calc_num_tolerated_disk_barrier_failures().

 Later, btrfs_calc_num_tolerated_disk_barrier_failures() was updated
 to support raid56, but the code in btrfs_balance() was not updated
 with it.

Fix:
 Merge the similar code by adding an argument to
 btrfs_calc_num_tolerated_disk_barrier_failures() so that it
 supports both cases.

 This fixes the bug with a bonus of cleanup, and keeps the two code
 paths from getting out of sync again.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/disk-io.c |  9 +
 fs/btrfs/disk-io.h |  2 +-
 fs/btrfs/volumes.c | 28 +---
 3 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b6600c7..ac26111 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2946,7 +2946,7 @@ retry_root_backup:
goto fail_sysfs;
}
fs_info->num_tolerated_disk_barrier_failures =
-   btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
+   btrfs_calc_num_tolerated_disk_barrier_failures(fs_info, 0);
if (fs_info->fs_devices->missing_devices >
 fs_info->num_tolerated_disk_barrier_failures &&
!(sb->s_flags & MS_RDONLY)) {
@@ -3441,7 +3441,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 }
 
 int btrfs_calc_num_tolerated_disk_barrier_failures(
-   struct btrfs_fs_info *fs_info)
+   struct btrfs_fs_info *fs_info, u64 extra_flags)
 {
struct btrfs_ioctl_space_info space;
struct btrfs_space_info *sinfo;
@@ -3481,7 +3481,7 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
   &space);
if (space.total_bytes == 0 || space.used_bytes == 0)
continue;
-   flags = space.flags;
+   flags = space.flags | extra_flags;
/*
 * return
 * 0: if dup, single or RAID0 is configured for
@@ -3493,7 +3493,8 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
 */
if (num_tolerated_disk_barrier_failures > 0 &&
((flags & (BTRFS_BLOCK_GROUP_DUP |
-  BTRFS_BLOCK_GROUP_RAID0)) ||
+  BTRFS_BLOCK_GROUP_RAID0 |
+  BTRFS_AVAIL_ALLOC_BIT_SINGLE)) ||
 ((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0)))
num_tolerated_disk_barrier_failures = 0;
else if (num_tolerated_disk_barrier_failures > 1 &&
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index d4cbfee..aceaa8d 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -140,7 +140,7 @@ struct btrfs_root *btrfs_create_tree(struct 
btrfs_trans_handle *trans,
 int btree_lock_page_hook(struct page *page, void *data,
void (*flush_fn)(void *));
 int btrfs_calc_num_tolerated_disk_barrier_failures(
-   struct btrfs_fs_info *fs_info);
+   struct btrfs_fs_info *fs_info, u64 extra_flags);
 int __init btrfs_end_io_wq_init(void);
 void btrfs_end_io_wq_exit(void);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fbe7c10..d739915 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1812,7 +1812,8 @@ int btrfs_rm_device(struct btrfs_root *root, char 
*device_path)
}
 
root->fs_info->num_tolerated_disk_barrier_failures =
-   btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
+   btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info,
+  0);
 
/*
 * at this point, the device is zero sized.  We want to
@@ -2342,7 +2343,8 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
*device_path)
}
 
root->fs_info->num_tolerated_disk_barrier_failures =
-   btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
+   btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info,
+  0);
ret = btrfs_commit_transaction(trans, root);
 
if (seeding_dev) {
@@ -3573,23 +3575,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
} while (read_seqretry(&fs_info->profiles_lock, seq));
 
if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-   int num_tolerated_disk_barrier_failures;
-   u64 target = bctl->sys.target;
-
-   num_tolerated_disk_barrier_failures =
-   btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-   i

[PATCH] btrfs: Cleanup for btrfs_calc_num_tolerated_disk_barrier_failures()

2015-07-16 Thread Zhaolei

From: Zhao Lei 

1: Use ARRAY_SIZE(types) to replace a hard-coded variable:
   int num_types = 4;

2: Use 'continue' on condition to reduce one level tab
   if (!XXX) {
   code;
   ...
   }
   ->
   if (XXX)
   continue;
   code;
   ...

3: Move the 'num_tolerated_disk_barrier_failures = 2' assignment under
   a (num_tolerated_disk_barrier_failures > 2) condition to make the
   logic neat.
   if (num_tolerated_disk_barrier_failures > 0 && XXX)
   num_tolerated_disk_barrier_failures = 0;
   else if (num_tolerated_disk_barrier_failures > 1) {
   if (XXX)
   num_tolerated_disk_barrier_failures = 1;
   else if (XXX)
   num_tolerated_disk_barrier_failures = 2;
   ->
   if (num_tolerated_disk_barrier_failures > 0 && XXX)
   num_tolerated_disk_barrier_failures = 0;
   if (num_tolerated_disk_barrier_failures > 1 && XXX)
   num_tolerated_disk_barrier_failures = 1;
   if (num_tolerated_disk_barrier_failures > 2 && XXX)
   num_tolerated_disk_barrier_failures = 2;

4: Remove the comment:
   num_mirrors - 1: if RAID1 or RAID10 is configured and more
   than 2 mirrors are used.
   which does not match the code.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/disk-io.c | 73 --
 1 file changed, 33 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f556c37..2eda03b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3448,13 +3448,12 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
   BTRFS_BLOCK_GROUP_SYSTEM,
   BTRFS_BLOCK_GROUP_METADATA,
   BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
-   int num_types = 4;
int i;
int c;
int num_tolerated_disk_barrier_failures =
(int)fs_info->fs_devices->num_devices;
 
-   for (i = 0; i < num_types; i++) {
+   for (i = 0; i < ARRAY_SIZE(types); i++) {
struct btrfs_space_info *tmp;
 
sinfo = NULL;
@@ -3472,44 +3471,38 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
 
down_read(&sinfo->groups_sem);
for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
-   if (!list_empty(&sinfo->block_groups[c])) {
-   u64 flags;
-
-   btrfs_get_block_group_info(
-   &sinfo->block_groups[c], &space);
-   if (space.total_bytes == 0 ||
-   space.used_bytes == 0)
-   continue;
-   flags = space.flags;
-   /*
-* return
-* 0: if dup, single or RAID0 is configured for
-*any of metadata, system or data, else
-* 1: if RAID5 is configured, or if RAID1 or
-*RAID10 is configured and only two mirrors
-*are used, else
-* 2: if RAID6 is configured, else
-* num_mirrors - 1: if RAID1 or RAID10 is
-*  configured and more than
-*  2 mirrors are used.
-*/
-   if (num_tolerated_disk_barrier_failures > 0 &&
-   ((flags & (BTRFS_BLOCK_GROUP_DUP |
-  BTRFS_BLOCK_GROUP_RAID0)) ||
-((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK)
- == 0)))
-   num_tolerated_disk_barrier_failures = 0;
-   else if (num_tolerated_disk_barrier_failures > 
1) {
-   if (flags & (BTRFS_BLOCK_GROUP_RAID1 |
-   BTRFS_BLOCK_GROUP_RAID5 |
-   BTRFS_BLOCK_GROUP_RAID10)) {
-   
num_tolerated_disk_barrier_failures = 1;
-   } else if (flags &
-  BTRFS_BLOCK_GROUP_RAID6) {
-   
num_tolerated_disk_barrier_failures = 2;
-   }
-   }
-   }
+   u64 flags;
+
+   if (list_empty(&sinfo->block_groups[c]))
+   continue;
+
+   btrfs_get_block_group_info(&sinfo->block_groups[c],
+  &space);
+   if (space.total_bytes == 0 || space.used_bytes == 0)
+

[PATCH] btrfs: Show detailed information when mount fails on missing devices

2015-07-16 Thread Zhaolei

From: Zhao Lei 

When a mount fails because of missing devices, we see the following
dmesg:
 [ 1060.267743] BTRFS: too many missing devices, writeable mount is not allowed
 [ 1060.273158] BTRFS: open_ctree failed

This patch adds the number of missing devices and the number of
tolerated missing devices to the above output, to let the user know
what really happened and to help with bug reports and debugging.

dmesg after patch:
 [  127.912406] BTRFS: too many missing devices(1 > 0), writeable mount is not allowed
 [  127.918128] BTRFS: open_ctree failed

Signed-off-by: Zhao Lei 
---
 fs/btrfs/disk-io.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2eda03b..b6600c7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2950,8 +2950,9 @@ retry_root_backup:
if (fs_info->fs_devices->missing_devices >
 fs_info->num_tolerated_disk_barrier_failures &&
!(sb->s_flags & MS_RDONLY)) {
-   printk(KERN_WARNING "BTRFS: "
-   "too many missing devices, writeable mount is not 
allowed\n");
+   pr_warn("BTRFS: too many missing devices(%llu > %d), writeable 
mount is not allowed\n",
+   fs_info->fs_devices->missing_devices,
+   fs_info->num_tolerated_disk_barrier_failures);
goto fail_sysfs;
}
 
-- 
1.8.5.1

[PATCH] btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail

2015-07-15 Thread Zhaolei

From: Zhao Lei 

When read_tree_block() fails, we see the following dmesg:
 [  134.371389] BUG: unable to handle kernel NULL pointer dereference at 0063
 [  134.372236] IP: [] free_extent_buffer+0x21/0x90
 [  134.372236] PGD 0
 [  134.372236] Oops:  [#1] SMP
 [  134.372236] Modules linked in:
 [  134.372236] CPU: 0 PID: 2289 Comm: mount Not tainted 4.2.0-rc1_HEAD_c65b99f046843d2455aa231747b5a07a999a9f3d_+ #115
 [  134.372236] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
 [  134.372236] task: 88003b6e1a00 ti: 880011e6 task.ti: 880011e6
 [  134.372236] RIP: 0010:[]  [] free_extent_buffer+0x21/0x90
 ...
 [  134.372236] Call Trace:
 [  134.372236]  [] free_root_extent_buffers+0x91/0xb0
 [  134.372236]  [] free_root_pointers+0x17d/0x190
 [  134.372236]  [] open_ctree+0x1ca0/0x25b0
 [  134.372236]  [] ? disk_name+0x97/0xb0
 [  134.372236]  [] btrfs_mount+0x8fa/0xab0
 ...

Reason:
 read_tree_block() was changed to return an error number on failure.
 This value (not NULL) is stored in tree_root->node, so the subsequent
 error handling runs:
  free_root_pointers()
  ->free_root_extent_buffers()
  ->free_extent_buffer()
  ->atomic_read((extent_buffer *)(-E_XXX)->refs);
 and triggers the error above.

Fix:
 Set tree_root->node to NULL on failure so that the error-handling
 code skips it.
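
The failure mode above can be reproduced in user space. The following is
only a sketch under stated assumptions: ERR_PTR() and IS_ERR() are
simplified stand-ins for the kernel helpers of the same name, and
fake_read_tree_block(), fake_load_root(), fake_free_root() and the fake_*
structs are hypothetical names that do not exist in btrfs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space stand-ins for the kernel helpers (assumption). */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_buffer { int refs; };
struct fake_root { struct fake_buffer *node; };

/* Hypothetical reader: returns an encoded errno, not NULL, on failure. */
struct fake_buffer *fake_read_tree_block(int fail)
{
	static struct fake_buffer buf = { .refs = 1 };

	return fail ? ERR_PTR(-EIO) : &buf;
}

/* Cleanup that only guards against NULL, like the error path above. */
int fake_free_root(struct fake_root *root)
{
	if (!root->node)
		return 0;               /* nothing to release */
	assert(!IS_ERR(root->node));    /* an error pointer here would oops */
	root->node->refs--;
	root->node = NULL;
	return 1;
}

/* Load a root; on failure reset node to NULL so cleanup stays safe. */
int fake_load_root(struct fake_root *root, int fail)
{
	root->node = fake_read_tree_block(fail);
	if (IS_ERR(root->node)) {
		root->node = NULL;      /* the one-line idea of the fix */
		return -EIO;
	}
	return 0;
}
```

Without the reset in fake_load_root(), the NULL check in fake_free_root()
would pass for the error pointer and the cleanup would dereference it.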

Signed-off-by: Zhao Lei 
---
 fs/btrfs/disk-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a9aadb2..f556c37 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2842,6 +2842,7 @@ int open_ctree(struct super_block *sb,
!extent_buffer_uptodate(chunk_root->node)) {
printk(KERN_ERR "BTRFS: failed to read chunk root on %s\n",
   sb->s_id);
+   chunk_root->node = NULL;
goto fail_tree_roots;
}
btrfs_set_root_node(&chunk_root->root_item, chunk_root->node);
@@ -2879,7 +2880,7 @@ retry_root_backup:
!extent_buffer_uptodate(tree_root->node)) {
printk(KERN_WARNING "BTRFS: failed to read tree root on %s\n",
   sb->s_id);
-
+   tree_root->node = NULL;
goto recovery_tree_root;
}
 
-- 
1.8.5.1


[PATCH] btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()

2015-07-14 Thread Zhaolei

From: Zhao Lei 

Liu Bo reported a lockdep warning about
delayed_iput_sem in xfstests generic/241:
  [ 2061.345955] =
  [ 2061.346027] [ INFO: possible recursive locking detected ]
  [ 2061.346027] 4.1.0+ #268 Tainted: GW
  [ 2061.346027] -
  [ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock:
  [ 2061.346027]  (&fs_info->delayed_iput_sem){..}, at: [] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] but task is already holding lock:
  [ 2061.346027]  (&fs_info->delayed_iput_sem){..}, at: [] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] other info that might help us debug this:
  [ 2061.346027]  Possible unsafe locking scenario:

  [ 2061.346027]CPU0
  [ 2061.346027]
  [ 2061.346027]   lock(&fs_info->delayed_iput_sem);
  [ 2061.346027]   lock(&fs_info->delayed_iput_sem);
  [ 2061.346027]
   *** DEADLOCK ***
It happens rarely, about 1 in 400 runs in my test environment.

The reason is recursion of btrfs_run_delayed_iputs():
  cleaner_kthread
  -> btrfs_run_delayed_iputs() *1
  -> get delayed_iput_sem lock *2
  -> iput()
  -> ...
  -> btrfs_commit_transaction()
  -> btrfs_run_delayed_iputs() *1
  -> get delayed_iput_sem lock (dead lock) *2
  *1: recursion of btrfs_run_delayed_iputs()
  *2: warning of lockdep about delayed_iput_sem

When the fs is under heavy stress, new iputs may be added to the
fs_info->delayed_iputs list while btrfs_run_delayed_iputs() is running.
This makes the nested btrfs_run_delayed_iputs() call
down_read(&fs_info->delayed_iput_sem) again and triggers the lockdep
warning above.

Actually, it will not cause a real problem because both locks are read
locks, but we can fix the code to avoid the lockdep warning.

Fix:
  Don't call btrfs_run_delayed_iputs() in btrfs_commit_transaction() when
  running in the cleaner_kthread thread, to break the recursion path above.
  cleaner_kthread already calls btrfs_run_delayed_iputs() explicitly in its
  own code and doesn't need to call it again in btrfs_commit_transaction().
  As a bonus, this also avoids a possible stack overflow.
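
The recursion path and the effect of the extra check can be modelled in
user space. This is a hedged sketch, not the kernel code: all fake_* names
are hypothetical, caller identity is an enum instead of a task pointer,
and a nesting counter stands in for taking delayed_iput_sem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical caller identities; the kernel compares task pointers. */
enum fake_caller {
	FAKE_USER,
	FAKE_TRANSACTION_KTHREAD,
	FAKE_CLEANER_KTHREAD
};

struct fake_fs {
	int pending;        /* delayed iputs still queued */
	int depth;          /* current nesting of fake_run_delayed_iputs() */
	int max_depth;      /* deepest nesting seen so far */
	bool skip_cleaner;  /* true models the patched check */
};

void fake_run_delayed_iputs(struct fake_fs *fs, enum fake_caller who);

/* Models only the tail of a transaction commit. */
void fake_commit_transaction(struct fake_fs *fs, enum fake_caller who)
{
	if (who == FAKE_TRANSACTION_KTHREAD)
		return;
	if (fs->skip_cleaner && who == FAKE_CLEANER_KTHREAD)
		return;
	fake_run_delayed_iputs(fs, who);
}

void fake_run_delayed_iputs(struct fake_fs *fs, enum fake_caller who)
{
	assert(fs != NULL);
	fs->depth++;                    /* stands in for down_read() */
	if (fs->depth > fs->max_depth)
		fs->max_depth = fs->depth;
	while (fs->pending > 0) {
		fs->pending--;
		/* In this model every iput ends in a commit. */
		fake_commit_transaction(fs, who);
	}
	fs->depth--;                    /* stands in for up_read() */
}
```

With skip_cleaner set to false, a cleaner caller nests once per queued
iput; with it set to true the nesting depth stays at one.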

Test:
  The lockdep warning above no longer appears in 1200 runs of generic/241.

Reported-by: Liu Bo 
Signed-off-by: Zhao Lei 
---
 fs/btrfs/transaction.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c0f18e7..31248ad 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2152,7 +2152,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
kmem_cache_free(btrfs_trans_handle_cachep, trans);
 
-   if (current != root->fs_info->transaction_kthread)
+   if (current != root->fs_info->transaction_kthread &&
+   current != root->fs_info->cleaner_kthread)
btrfs_run_delayed_iputs(root);
 
return ret;
-- 
1.8.5.1


[PATCH] btrfs: Remove noused chunk_tree and chunk_objectid from scrub_enumerate_chunks() and scrub_chunk()

2015-07-09 Thread Zhaolei

From: Zhao Lei 

These variables have not been used since they were introduced; remove them.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index eb35176..f552937 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3321,7 +3321,6 @@ out:
 
 static noinline_for_stack int scrub_chunk(struct scrub_ctx *sctx,
  struct btrfs_device *scrub_dev,
- u64 chunk_tree, u64 chunk_objectid,
  u64 chunk_offset, u64 length,
  u64 dev_offset, int is_dev_replace)
 {
@@ -3372,8 +3371,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
struct btrfs_root *root = sctx->dev_root;
struct btrfs_fs_info *fs_info = root->fs_info;
u64 length;
-   u64 chunk_tree;
-   u64 chunk_objectid;
u64 chunk_offset;
int ret;
int slot;
@@ -3431,8 +3428,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
if (found_key.offset + length <= start)
goto skip;
 
-   chunk_tree = btrfs_dev_extent_chunk_tree(l, dev_extent);
-   chunk_objectid = btrfs_dev_extent_chunk_objectid(l, dev_extent);
chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);
 
/*
@@ -3449,8 +3444,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
dev_replace->cursor_right = found_key.offset + length;
dev_replace->cursor_left = found_key.offset;
dev_replace->item_needs_writeback = 1;
-   ret = scrub_chunk(sctx, scrub_dev, chunk_tree, chunk_objectid,
- chunk_offset, length, found_key.offset,
+   ret = scrub_chunk(sctx, scrub_dev, chunk_offset, length,
+ found_key.offset,
  is_dev_replace);
 
/*
-- 
1.8.5.1


[PATCH v3 3/4] btrfs: use scrub_pause_on/off() to reduce code in scrub_enumerate_chunks()

2015-07-07 Thread Zhaolei

From: Zhao Lei 

Using the newly introduced scrub_pause_on/off() makes this code block
cleaner and more readable.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index cbfb8c7..a882a34 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3492,8 +3492,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
wait_event(sctx->list_wait,
   atomic_read(&sctx->bios_in_flight) == 0);
-   atomic_inc(&fs_info->scrubs_paused);
-   wake_up(&fs_info->scrub_pause_wait);
+
+   scrub_pause_on(fs_info);
 
/*
 * must be called before we decrease @scrub_paused.
@@ -3504,11 +3504,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
   atomic_read(&sctx->workers_pending) == 0);
atomic_set(&sctx->wr_ctx.flush_all_writes, 0);
 
-   mutex_lock(&fs_info->scrub_lock);
-   __scrub_blocked_if_needed(fs_info);
-   atomic_dec(&fs_info->scrubs_paused);
-   mutex_unlock(&fs_info->scrub_lock);
-   wake_up(&fs_info->scrub_pause_wait);
+   scrub_pause_off(fs_info);
 
btrfs_put_block_group(cache);
if (ret)
-- 
1.8.5.1


[PATCH v3 4/4] btrfs: Fix data checksum error cause by replace with io-load.

2015-07-07 Thread Zhaolei

From: Zhao Lei 

xfstests btrfs/070 sometimes fails.
On my test machine, its failure rate is about 30%.
On another VM (VMware), its failure rate is about 50%.

Reason:
  btrfs/070 runs replace and defrag simultaneously with fsstress;
  after these operations, scrub finds checksum errors.

  Actually, the defrag operation is unrelated: replace with fsstress
  alone can trigger this bug.

  New data written to the target device can be overwritten by
  old data copied from the source device by the replace code (observed
  while debugging). To avoid this, we set the target block group
  read-only for the duration of the replace, so that new data requested
  by other operations is not written to the same place as the replace
  code.

  Before patch (4.1-rc3):
    30% failed in 100 xfstests runs.
  After patch:
    0% failed in 300 xfstests runs.

  The problem also occurs in btrfs/071.

Changelog v2->v3:
1: Fix a typo (introduced during rebase) which made xfstests btrfs/073
   and btrfs/066 fail.
2: Rebase on top of integration-4.2.
3: Run full xfstests (generic and btrfs groups with 10 mount options).

Changelog v1->v2:
1: Update subject to reflect the problem being fixed.
2: Update description to explain why setting read-only fixes the
   problem.
3: Use a helper function to avoid a duplicated code block for setting
   the chunk ro.
All of the above were suggested by: David Sterba 

Reported-by: Qu Wenruo 
Suggested-by: Qu Wenruo 
Signed-off-by: Qu Wenruo 
Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a882a34..3a49a43 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3467,6 +3467,18 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
if (!cache)
goto skip;
 
+   /*
+* we need call btrfs_inc_block_group_ro() with scrubs_paused,
+* to avoid deadlock caused by:
+* btrfs_inc_block_group_ro()
+* -> btrfs_wait_for_commit()
+* -> btrfs_commit_transaction()
+* -> btrfs_scrub_pause()
+*/
+   scrub_pause_on(fs_info);
+   btrfs_inc_block_group_ro(root, cache);
+   scrub_pause_off(fs_info);
+
dev_replace->cursor_right = found_key.offset + length;
dev_replace->cursor_left = found_key.offset;
dev_replace->item_needs_writeback = 1;
@@ -3506,6 +3518,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
scrub_pause_off(fs_info);
 
+   btrfs_dec_block_group_ro(root, cache);
+
btrfs_put_block_group(cache);
if (ret)
break;
-- 
1.8.5.1


[PATCH v3 2/4] btrfs: Separate scrub_blocked_if_needed() to scrub_pause_on/off()

2015-07-07 Thread Zhaolei

From: Zhao Lei 

This reduces existing duplicated code that is similar to
scrub_blocked_if_needed() but cannot call it because it differs
slightly.
It is also used by my next patch, which has the same need.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 94db0fa..cbfb8c7 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -332,11 +332,14 @@ static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
}
 }
 
-static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
+static void scrub_pause_on(struct btrfs_fs_info *fs_info)
 {
atomic_inc(&fs_info->scrubs_paused);
wake_up(&fs_info->scrub_pause_wait);
+}
 
+static void scrub_pause_off(struct btrfs_fs_info *fs_info)
+{
mutex_lock(&fs_info->scrub_lock);
__scrub_blocked_if_needed(fs_info);
atomic_dec(&fs_info->scrubs_paused);
@@ -345,6 +348,12 @@ static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
wake_up(&fs_info->scrub_pause_wait);
 }
 
+static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
+{
+   scrub_pause_on(fs_info);
+   scrub_pause_off(fs_info);
+}
+
 /*
  * used for workers that require transaction commits (i.e., for the
  * NOCOW case)
-- 
1.8.5.1


[PATCH v3 1/4] btrfs: Use ref_cnt for set_block_group_ro()

2015-07-07 Thread Zhaolei

From: Zhao Lei 

More than one code path calls set_block_group_ro() and restores rw on
failure.

The old code uses a single bit to store the block group's ro state, which
cannot support concurrent callers (confirmed to occur in my debug log).

This patch uses a ref count to store the ro state, and renames
set_block_group_ro/set_block_group_rw
to
inc_block_group_ro/dec_block_group_ro.
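
Why a counter is needed instead of a single bit can be sketched in user
space. This is only an illustration under assumptions: struct
fake_block_group, fake_inc_ro() and fake_dec_ro() are hypothetical names,
and the locking and space accounting done by the real functions are
omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block group; only the ro counter matters. */
struct fake_block_group {
	unsigned int ro;    /* number of outstanding read-only requests */
};

/* Take one read-only reference; true if this call switched the group to ro. */
bool fake_inc_ro(struct fake_block_group *bg)
{
	return bg->ro++ == 0;
}

/* Drop one reference; true if the group became writable again. */
bool fake_dec_ro(struct fake_block_group *bg)
{
	assert(bg->ro > 0);     /* unbalanced dec would be a caller bug */
	return --bg->ro == 0;
}
```

With a one-bit flag, the first caller to restore rw would make the group
writable while a second caller still relies on it being read-only; with
the counter, the group only becomes writable after the last holder drops
its reference.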

Signed-off-by: Zhao Lei 
---
 fs/btrfs/ctree.h   |  6 +++---
 fs/btrfs/extent-tree.c | 42 +-
 fs/btrfs/relocation.c  | 14 ++
 3 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aac314e..f57e6ca 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1300,7 +1300,7 @@ struct btrfs_block_group_cache {
/* for raid56, this is a full stripe, without parity */
unsigned long full_stripe_len;
 
-   unsigned int ro:1;
+   unsigned int ro;
unsigned int iref:1;
unsigned int has_caching_ctl:1;
unsigned int removed:1;
@@ -3495,9 +3495,9 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
 void btrfs_block_rsv_release(struct btrfs_root *root,
 struct btrfs_block_rsv *block_rsv,
 u64 num_bytes);
-int btrfs_set_block_group_ro(struct btrfs_root *root,
+int btrfs_inc_block_group_ro(struct btrfs_root *root,
 struct btrfs_block_group_cache *cache);
-void btrfs_set_block_group_rw(struct btrfs_root *root,
+void btrfs_dec_block_group_ro(struct btrfs_root *root,
  struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1c2bd17..a436bd5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8692,14 +8692,13 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
return flags;
 }
 
-static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
+static int inc_block_group_ro(struct btrfs_block_group_cache *cache, int force)
 {
struct btrfs_space_info *sinfo = cache->space_info;
u64 num_bytes;
u64 min_allocable_bytes;
int ret = -ENOSPC;
 
-
/*
 * We need some metadata space and system metadata space for
 * allocating chunks in some corner cases until we force to set
@@ -8716,6 +8715,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
spin_lock(&cache->lock);
 
if (cache->ro) {
+   cache->ro++;
ret = 0;
goto out;
}
@@ -8727,7 +8727,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
sinfo->bytes_may_use + sinfo->bytes_readonly + num_bytes +
min_allocable_bytes <= sinfo->total_bytes) {
sinfo->bytes_readonly += num_bytes;
-   cache->ro = 1;
+   cache->ro++;
list_add_tail(&cache->ro_list, &sinfo->ro_bgs);
ret = 0;
}
@@ -8737,7 +8737,7 @@ out:
return ret;
 }
 
-int btrfs_set_block_group_ro(struct btrfs_root *root,
+int btrfs_inc_block_group_ro(struct btrfs_root *root,
 struct btrfs_block_group_cache *cache)
 
 {
@@ -8745,8 +8745,6 @@ int btrfs_set_block_group_ro(struct btrfs_root *root,
u64 alloc_flags;
int ret;
 
-   BUG_ON(cache->ro);
-
 again:
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
@@ -8789,7 +8787,7 @@ again:
goto out;
}
 
-   ret = set_block_group_ro(cache, 0);
+   ret = inc_block_group_ro(cache, 0);
if (!ret)
goto out;
alloc_flags = get_alloc_profile(root, cache->space_info->flags);
@@ -8797,7 +8795,7 @@ again:
 CHUNK_ALLOC_FORCE);
if (ret < 0)
goto out;
-   ret = set_block_group_ro(cache, 0);
+   ret = inc_block_group_ro(cache, 0);
 out:
if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM) {
alloc_flags = update_block_group_flags(root, cache->flags);
@@ -8860,7 +8858,7 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
return free_bytes;
 }
 
-void btrfs_set_block_group_rw(struct btrfs_root *root,
+void btrfs_dec_block_group_ro(struct btrfs_root *root,
  struct btrfs_block_group_cache *cache)
 {
struct btrfs_space_info *sinfo = cache->space_info;
@@ -8870,11 +8868,13 @@ void btrfs_set_block_group_rw(struct btrfs_root *root,
 
spin_lock(&sinfo->lock);
spin_lock(&cache->lock);
-   num_bytes = cache->key.offset - cache->reserved - cache->pinned -
-   cache->bytes_super - btrfs_block_group_used(&cache->item);
-   sinfo->bytes_read

[PATCH v3 0/4] btrfs: Fix data checksum error cause by replace with io-load

2015-07-07 Thread Zhaolei

From: Zhao Lei 

This patchset fixes a data checksum error caused by replace with io-load,
which makes xfstests btrfs/070 and btrfs/071 fail sometimes.

See the description in [PATCH 4/4] for details.

Changelog v2->v3:
1: Fix a typo (introduced during rebase) which made xfstests btrfs/073
   and btrfs/066 fail.
2: Rebase on top of integration-4.2.
3: Run full xfstests (generic and btrfs groups with 10 mount options).

Changelog v1->v2:
1: Update subject to reflect the problem being fixed.
2: Update description to explain why setting read-only fixes the
   problem.
3: Use a helper function to avoid a duplicated code block for setting
   the chunk ro.
All of the above were suggested by: David Sterba 



Zhao Lei (4):
  btrfs: Use ref_cnt for set_block_group_ro()
  btrfs: Separate scrub_blocked_if_needed() to scrub_pause_on/off()
  btrfs: use scrub_pause_on/off() to reduce code in
scrub_enumerate_chunks()
  btrfs: Fix data checksum error cause by replace with io-load.

 fs/btrfs/ctree.h   |  6 +++---
 fs/btrfs/extent-tree.c | 42 +-
 fs/btrfs/relocation.c  | 14 ++
 fs/btrfs/scrub.c   | 35 +++
 4 files changed, 57 insertions(+), 40 deletions(-)

-- 
1.8.5.1


[PATCH] btrfs: add error handling for scrub_workers_get()

2015-06-12 Thread Zhaolei

From: Zhao Lei 

Although it is a rare case, we'd better free previously allocated
memory on error.
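
The unwinding pattern used by this patch can be sketched in user space as
follows. It is only an illustration: struct fake_pools, fake_alloc(),
fake_pools_get() and fake_pools_put() are hypothetical names, malloc()
stands in for btrfs_alloc_workqueue(), and the fail_at parameter exists
only so that a failure can be forced.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fake_pools { void *a; void *b; void *c; };

/* Hypothetical allocator: the allocation numbered fail_at is forced to fail. */
static void *fake_alloc(int step, int fail_at)
{
	return step == fail_at ? NULL : malloc(16);
}

/* Allocate three resources; on failure release the ones already taken. */
int fake_pools_get(struct fake_pools *p, int fail_at)
{
	assert(p != NULL);

	p->a = fake_alloc(1, fail_at);
	if (!p->a)
		goto fail_a;

	p->b = fake_alloc(2, fail_at);
	if (!p->b)
		goto fail_b;

	p->c = fake_alloc(3, fail_at);
	if (!p->c)
		goto fail_c;

	return 0;

	/* Labels run in reverse allocation order and fall through. */
fail_c:
	free(p->b);
	p->b = NULL;
fail_b:
	free(p->a);
	p->a = NULL;
fail_a:
	return -ENOMEM;
}

/* Release everything obtained by a successful fake_pools_get(). */
void fake_pools_put(struct fake_pools *p)
{
	free(p->c);
	free(p->b);
	free(p->a);
	p->a = p->b = p->c = NULL;
}
```

Each label undoes exactly the allocations that succeeded before the jump,
so no resource is leaked and none is freed twice.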

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ab58115..eb35176 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3559,7 +3559,6 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
int is_dev_replace)
 {
-   int ret = 0;
unsigned int flags = WQ_FREEZABLE | WQ_UNBOUND;
int max_active = fs_info->thread_pool_size;
 
@@ -3572,27 +3571,29 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
fs_info->scrub_workers =
btrfs_alloc_workqueue("btrfs-scrub", flags,
  max_active, 4);
-   if (!fs_info->scrub_workers) {
-   ret = -ENOMEM;
-   goto out;
-   }
+   if (!fs_info->scrub_workers)
+   goto fail_scrub_workers;
+
fs_info->scrub_wr_completion_workers =
btrfs_alloc_workqueue("btrfs-scrubwrc", flags,
  max_active, 2);
-   if (!fs_info->scrub_wr_completion_workers) {
-   ret = -ENOMEM;
-   goto out;
-   }
+   if (!fs_info->scrub_wr_completion_workers)
+   goto fail_scrub_wr_completion_workers;
+
fs_info->scrub_nocow_workers =
btrfs_alloc_workqueue("btrfs-scrubnc", flags, 1, 0);
-   if (!fs_info->scrub_nocow_workers) {
-   ret = -ENOMEM;
-   goto out;
-   }
+   if (!fs_info->scrub_nocow_workers)
+   goto fail_scrub_nocow_workers;
}
++fs_info->scrub_workers_refcnt;
-out:
-   return ret;
+   return 0;
+
+fail_scrub_nocow_workers:
+   btrfs_destroy_workqueue(fs_info->scrub_wr_completion_workers);
+fail_scrub_wr_completion_workers:
+   btrfs_destroy_workqueue(fs_info->scrub_workers);
+fail_scrub_workers:
+   return -ENOMEM;
 }
 
 static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info *fs_info)
-- 
1.8.5.1


[PATCH] btrfs: [RFC] Don't use workqueue for raid56 parity scrub

2015-06-12 Thread Zhaolei

From: Zhao Lei 

The work function only does fast initialization and submits a bio at
the end. It is also never called in interrupt context, so there is no
need to queue a work item for it.

Calling it directly makes the code simpler and easier to debug, and
reduces potential problems.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/raid56.c | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index fa72068..eea86d1 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2557,7 +2557,7 @@ static void raid56_parity_scrub_end_io(struct bio *bio, int err)
validate_rbio_for_parity_scrub(rbio);
 }
 
-static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
+static void async_scrub_parity(struct btrfs_raid_bio *rbio)
 {
int bios_to_read = 0;
struct bio_list bio_list;
@@ -2646,23 +2646,6 @@ finish:
validate_rbio_for_parity_scrub(rbio);
 }
 
-static void scrub_parity_work(struct btrfs_work *work)
-{
-   struct btrfs_raid_bio *rbio;
-
-   rbio = container_of(work, struct btrfs_raid_bio, work);
-   raid56_parity_scrub_stripe(rbio);
-}
-
-static void async_scrub_parity(struct btrfs_raid_bio *rbio)
-{
-   btrfs_init_work(&rbio->work, btrfs_rmw_helper,
-   scrub_parity_work, NULL, NULL);
-
-   btrfs_queue_work(rbio->fs_info->rmw_workers,
-&rbio->work);
-}
-
 void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio)
 {
if (!lock_stripe_add(rbio))
-- 
1.8.5.1


[PATCH] btrfs: Update out-of-date "skip parity stripe" comment

2015-06-10 Thread Zhaolei

From: Zhao Lei 

The comment is out of date because btrfs now supports scrubbing raid56
parity stripes.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a13f91a..5ee5630 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3202,12 +3202,12 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 */
ret = 0;
while (physical < physical_end) {
-   /* for raid56, we skip parity stripe */
if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
ret = get_raid56_logic_offset(physical, num,
map, &logical, &stripe_logical);
logical += base;
if (ret) {
+   /* it is parity strip */
stripe_logical += base;
stripe_end = stripe_logical + increment - 1;
ret = scrub_raid56_parity(sctx, map, scrub_dev,
-- 
1.8.5.1


[PATCH] btrfs: cleanup noused initialization of dev in btrfs_end_bio()

2015-06-08 Thread Zhaolei

From: Zhao Lei 

The initialization was introduced by:
 c404e0dc2c843b154f9a36c3aec10d0a715d88eb
 Btrfs: fix use-after-free in the finishing procedure of the device replace

But it seems unrelated to that bug, so this patch reverts that code
block as a cleanup.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 174f5e1..1702adc 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5596,7 +5596,6 @@ static inline void btrfs_end_bbio(struct btrfs_bio *bbio, struct bio *bio, int e
 static void btrfs_end_bio(struct bio *bio, int err)
 {
struct btrfs_bio *bbio = bio->bi_private;
-   struct btrfs_device *dev = bbio->stripes[0].dev;
int is_orig_bio = 0;
 
if (err) {
@@ -5604,6 +5603,7 @@ static void btrfs_end_bio(struct bio *bio, int err)
if (err == -EIO || err == -EREMOTEIO) {
unsigned int stripe_index =
btrfs_io_bio(bio)->stripe_index;
+   struct btrfs_device *dev;
 
BUG_ON(stripe_index >= bbio->num_stripes);
dev = bbio->stripes[stripe_index].dev;
-- 
1.8.5.1


[PATCH v2] btrfs: Fix lockdep warning of wr_ctx->wr_lock in scrub_free_wr_ctx()

2015-06-04 Thread Zhaolei

From: Zhao Lei 

lockdep reports the following warning during testing:
 [25176.843958] =
 [25176.844519] [ INFO: inconsistent lock state ]
 [25176.845047] 4.1.0-rc3 #22 Tainted: GW
 [25176.845591] -
 [25176.846153] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
 [25176.846713] fsstress/26661 [HC0[0]:SC1[1]:HE1:SE0] takes:
 [25176.847246]  (&wr_ctx->wr_lock){+.?...}, at: [] scrub_free_ctx+0x2d/0xf0 [btrfs]
 [25176.847838] {SOFTIRQ-ON-W} state was registered at:
 [25176.848396]   [] __lock_acquire+0x6a0/0xe10
 [25176.848955]   [] lock_acquire+0xce/0x2c0
 [25176.849491]   [] mutex_lock_nested+0x7f/0x410
 [25176.850029]   [] scrub_stripe+0x4df/0x1080 [btrfs]
 [25176.850575]   [] scrub_chunk.isra.19+0x111/0x130 [btrfs]
 [25176.851110]   [] scrub_enumerate_chunks+0x27c/0x510 [btrfs]
 [25176.851660]   [] btrfs_scrub_dev+0x1c7/0x6c0 [btrfs]
 [25176.852189]   [] btrfs_dev_replace_start+0x36e/0x450 [btrfs]
 [25176.852771]   [] btrfs_ioctl+0x1e10/0x2d20 [btrfs]
 [25176.853315]   [] do_vfs_ioctl+0x318/0x570
 [25176.853868]   [] SyS_ioctl+0x41/0x80
 [25176.854406]   [] system_call_fastpath+0x12/0x6f
 [25176.854935] irq event stamp: 51506
 [25176.855511] hardirqs last  enabled at (51506): [] vprintk_emit+0x225/0x5e0
 [25176.856059] hardirqs last disabled at (51505): [] vprintk_emit+0xb7/0x5e0
 [25176.856642] softirqs last  enabled at (50886): [] __do_softirq+0x363/0x640
 [25176.857184] softirqs last disabled at (50949): [] irq_exit+0x10d/0x120
 [25176.857746]
 other info that might help us debug this:
 [25176.858845]  Possible unsafe locking scenario:
 [25176.859981]CPU0
 [25176.860537]
 [25176.861059]   lock(&wr_ctx->wr_lock);
 [25176.861705]   
 [25176.862272] lock(&wr_ctx->wr_lock);
 [25176.862881]
  *** DEADLOCK ***

Reason:
 Above warning is caused by:
 Interrupt
 -> bio_endio()
 -> ...
 -> scrub_put_ctx()
 -> scrub_free_ctx() *1
 -> ...
 -> mutex_lock(&wr_ctx->wr_lock);

 scrub_put_ctx() is allowed to be called in the end_bio interrupt, but
 by design it should never call scrub_free_ctx(sctx) in interrupt
 context (*1 above), because btrfs_scrub_dev() holds one additional
 reference on sctx->refs, so that scrub_free_ctx() is only called
 within btrfs_scrub_dev().

 However, the code does not behave as intended, because the release
 sequence in scrub_pending_bio_dec() has a gap.

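 Current code:
 -----------------------------------+-----------------------------------
 scrub_pending_bio_dec()            |  btrfs_scrub_dev
 -----------------------------------+-----------------------------------
 atomic_dec(&sctx->bios_in_flight); |
 wake_up(&sctx->list_wait);         |
                                    | scrub_put_ctx()
                                    | -> atomic_dec_and_test(&sctx->refs)
 scrub_put_ctx(sctx);               |
 -> atomic_dec_and_test(&sctx->refs)|
 -> scrub_free_ctx()                |
 -----------------------------------+-----------------------------------

 We expected:
 -----------------------------------+-----------------------------------
 scrub_pending_bio_dec()            |  btrfs_scrub_dev
 -----------------------------------+-----------------------------------
 atomic_dec(&sctx->bios_in_flight); |
 wake_up(&sctx->list_wait);         |
 scrub_put_ctx(sctx);               |
 -> atomic_dec_and_test(&sctx->refs)|
                                    | scrub_put_ctx()
                                    | -> atomic_dec_and_test(&sctx->refs)
                                    | -> scrub_free_ctx()
 -----------------------------------+-----------------------------------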

Fix:
 Move scrub_pending_bio_dec() to a workqueue, so that this function does
 not run in interrupt context.
 Tested by checking the trace log while debugging.

Changelog v1->v2:
 Use a workqueue instead of adjusting the function call sequence as in
 v1, because v1 would introduce a bug pointed out by:
 Filipe David Manana 

Reported-by: Qu Wenruo 
Signed-off-by: Zhao Lei 
---
 fs/btrfs/async-thread.c |  1 +
 fs/btrfs/async-thread.h |  2 ++
 fs/btrfs/ctree.h|  1 +
 fs/btrfs/scrub.c| 26 +++---
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index df9932b..1ce06c84 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -85,6 +85,7 @@ BTRFS_WORK_HELPER(extent_refs_helper);
 BTRFS_WORK_HELPER(scrub_helper);
 BTRFS_WORK_HELPER(scrubwrc_helper);
 BTRFS_WORK_HELPER(scrubnc_helper);
+BTRFS_WORK_HELPER(scrubparity_helper);
 
 static struct __btrfs_workqueue *
 __btrfs_alloc_workqueue(const char *name, unsigned int flags, int max_active,
diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
index ec2ee47..b0b093b 100644
--- a/fs/btrfs/async-thread.h
+++ b/fs/btrfs/async-thread.h
@@ -64,6 +64,8 @@ BTRFS_WORK_HELPER_PROTO(extent_refs_helper);
 BTRFS_WORK_HELPER_PROTO(scrub_helper);
 BTRFS_WORK_HELPER_PROTO(scrubwrc_helper);
 BTRFS_WORK_HELPER_PROTO(scrubnc_helper);
+BTRFS_WO

[PATCH] btrfs: Fix lockdep warning of wr_ctx->wr_lock in scrub_free_wr_ctx()

2015-06-02 Thread Zhaolei

From: Zhao Lei 

lockdep reports the following warning during testing:
 [25176.843958] =
 [25176.844519] [ INFO: inconsistent lock state ]
 [25176.845047] 4.1.0-rc3 #22 Tainted: GW
 [25176.845591] -
 [25176.846153] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
 [25176.846713] fsstress/26661 [HC0[0]:SC1[1]:HE1:SE0] takes:
 [25176.847246]  (&wr_ctx->wr_lock){+.?...}, at: [] scrub_free_ctx+0x2d/0xf0 [btrfs]
 [25176.847838] {SOFTIRQ-ON-W} state was registered at:
 [25176.848396]   [] __lock_acquire+0x6a0/0xe10
 [25176.848955]   [] lock_acquire+0xce/0x2c0
 [25176.849491]   [] mutex_lock_nested+0x7f/0x410
 [25176.850029]   [] scrub_stripe+0x4df/0x1080 [btrfs]
 [25176.850575]   [] scrub_chunk.isra.19+0x111/0x130 [btrfs]
 [25176.851110]   [] scrub_enumerate_chunks+0x27c/0x510 [btrfs]
 [25176.851660]   [] btrfs_scrub_dev+0x1c7/0x6c0 [btrfs]
 [25176.852189]   [] btrfs_dev_replace_start+0x36e/0x450 [btrfs]
 [25176.852771]   [] btrfs_ioctl+0x1e10/0x2d20 [btrfs]
 [25176.853315]   [] do_vfs_ioctl+0x318/0x570
 [25176.853868]   [] SyS_ioctl+0x41/0x80
 [25176.854406]   [] system_call_fastpath+0x12/0x6f
 [25176.854935] irq event stamp: 51506
 [25176.855511] hardirqs last  enabled at (51506): [] vprintk_emit+0x225/0x5e0
 [25176.856059] hardirqs last disabled at (51505): [] vprintk_emit+0xb7/0x5e0
 [25176.856642] softirqs last  enabled at (50886): [] __do_softirq+0x363/0x640
 [25176.857184] softirqs last disabled at (50949): [] irq_exit+0x10d/0x120
 [25176.857746]
 other info that might help us debug this:
 [25176.858845]  Possible unsafe locking scenario:
 [25176.859981]CPU0
 [25176.860537]
 [25176.861059]   lock(&wr_ctx->wr_lock);
 [25176.861705]   
 [25176.862272] lock(&wr_ctx->wr_lock);
 [25176.862881]
  *** DEADLOCK ***

Reason:
 Above warning is caused by:
 Interrupt
 -> bio_endio()
 -> ...
 -> scrub_put_ctx()
 -> scrub_free_ctx() *1
 -> ...
 -> mutex_lock(&wr_ctx->wr_lock);

 scrub_put_ctx() is allowed to be called in the end_bio interrupt, but
 by design it should never call scrub_free_ctx(sctx) in interrupt
 context (*1 above), because btrfs_scrub_dev() holds one additional
 reference on sctx->refs, so that scrub_free_ctx() is only called
 within btrfs_scrub_dev().

 However, the code does not behave as intended, because the release
 sequence in scrub_pending_bio_dec() has a gap.

 Current code:
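 -----------------------------------+-----------------------------------
 scrub_pending_bio_dec()            |  btrfs_scrub_dev
 -----------------------------------+-----------------------------------
 atomic_dec(&sctx->bios_in_flight); |
 wake_up(&sctx->list_wait);         |
                                    | scrub_put_ctx()
                                    | -> atomic_dec_and_test(&sctx->refs)
 scrub_put_ctx(sctx);               |
 -> atomic_dec_and_test(&sctx->refs)|
 -> scrub_free_ctx()                |
 -----------------------------------+-----------------------------------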

 We expected:
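 -----------------------------------+-----------------------------------
 scrub_pending_bio_dec()            |  btrfs_scrub_dev
 -----------------------------------+-----------------------------------
 atomic_dec(&sctx->bios_in_flight); |
 wake_up(&sctx->list_wait);         |
 scrub_put_ctx(sctx);               |
 -> atomic_dec_and_test(&sctx->refs)|
                                    | scrub_put_ctx()
                                    | -> atomic_dec_and_test(&sctx->refs)
                                    | -> scrub_free_ctx()
 -----------------------------------+-----------------------------------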

Fix:
 To fix the above problem, move scrub_put_ctx() to the line before
 atomic_dec(&sctx->bios_in_flight) in scrub_pending_bio_dec(), which
 forces the scrub_put_ctx() in btrfs_scrub_dev() to run after the
 scrub_put_ctx() in scrub_pending_bio_dec().
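
The ordering argument can be illustrated with a small single-threaded
model. This is only a sketch and not kernel code: model_ctx, model_put()
and the other model_* names are hypothetical, and C11 atomics stand in
for the kernel's atomic_t. It shows the invariant the reordering aims
for: the bio path drops its reference before it lets bios_in_flight
reach zero, so the last reference is the one held by the owner.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct scrub_ctx; only the counters matter. */
struct model_ctx {
    atomic_int refs;            /* models sctx->refs */
    atomic_int bios_in_flight;  /* models sctx->bios_in_flight */
    bool freed_in_bio_path;     /* set if the bio path dropped the last ref */
};

/* The owner (btrfs_scrub_dev() in the real code) starts with one ref. */
static void model_init(struct model_ctx *c)
{
    atomic_init(&c->refs, 1);
    atomic_init(&c->bios_in_flight, 0);
    c->freed_in_bio_path = false;
}

/* Models scrub_put_ctx(): true if the caller dropped the last reference. */
static bool model_put(struct model_ctx *c)
{
    return atomic_fetch_sub(&c->refs, 1) == 1;
}

/* Models scrub_pending_bio_inc(): the bio takes its own reference. */
static void model_pending_bio_inc(struct model_ctx *c)
{
    atomic_fetch_add(&c->refs, 1);
    atomic_fetch_add(&c->bios_in_flight, 1);
}

/* Models the patched scrub_pending_bio_dec(): put first, then signal. */
static void model_pending_bio_dec(struct model_ctx *c)
{
    if (model_put(c))
        c->freed_in_bio_path = true;
    atomic_fetch_sub(&c->bios_in_flight, 1);
}

/*
 * Models the owner: it only puts once no bio is in flight. Returns true
 * if this call dropped the last reference; the real code would sleep on
 * list_wait instead of returning false.
 */
static bool model_owner_put(struct model_ctx *c)
{
    if (atomic_load(&c->bios_in_flight) != 0)
        return false;
    return model_put(c);
}
```

In this model the owner cannot observe bios_in_flight == 0 until the bio
path has already dropped its reference, so freed_in_bio_path stays false.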

Reported-by: Qu Wenruo 
Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ab58115..1b4b27c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -317,9 +317,9 @@ static void scrub_pending_bio_inc(struct scrub_ctx *sctx)
 
 static void scrub_pending_bio_dec(struct scrub_ctx *sctx)
 {
+   scrub_put_ctx(sctx);
atomic_dec(&sctx->bios_in_flight);
wake_up(&sctx->list_wait);
-   scrub_put_ctx(sctx);
 }
 
 static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
-- 
1.8.5.1


[PATCH 1/4] btrfs: Use ref_cnt for set_block_group_ro()

2015-05-29 Thread Zhaolei

From: Zhao Lei 

More than one code path calls set_block_group_ro() and restores rw on
failure.

The old code uses a single bool bit to store a block group's ro state,
so it cannot support concurrent callers (my debug log confirms that
this case exists).

This patch uses a ref count to store the ro state, and renames
set_block_group_ro/set_block_group_rw
to
inc_block_group_ro/dec_block_group_ro.
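
As a rough illustration of why a counter helps where a single bit does
not, here is a minimal user-space sketch. The model_* names are
hypothetical, and the dec side is an assumption (the quoted diff is cut
off before it): the accounting is undone only when the last holder
drops its ro reference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a block group. */
struct model_bg {
    unsigned int ro;      /* number of holders that want it read-only */
    uint64_t free_bytes;  /* unused bytes that become read-only */
};

/* Hypothetical stand-in for the space_info accounting. */
struct model_space {
    uint64_t bytes_readonly;
};

/* First holder does the accounting; later holders only bump the count. */
static void model_inc_ro(struct model_space *s, struct model_bg *bg)
{
    if (bg->ro++ == 0)
        s->bytes_readonly += bg->free_bytes;
}

/* Only the last holder makes the block group writable again. */
static void model_dec_ro(struct model_space *s, struct model_bg *bg)
{
    assert(bg->ro > 0);
    if (--bg->ro == 0)
        s->bytes_readonly -= bg->free_bytes;
}
```

With a one-bit flag, the first caller to restore rw would make the block
group writable while the second caller still relies on it being ro.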

Signed-off-by: Zhao Lei 
---
 fs/btrfs/ctree.h   |  6 +++---
 fs/btrfs/extent-tree.c | 42 +-
 fs/btrfs/relocation.c  | 14 ++
 3 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6f364e1..74ce6fc 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1300,7 +1300,7 @@ struct btrfs_block_group_cache {
/* for raid56, this is a full stripe, without parity */
unsigned long full_stripe_len;
 
-   unsigned int ro:1;
+   unsigned int ro;
unsigned int iref:1;
unsigned int has_caching_ctl:1;
unsigned int removed:1;
@@ -3495,9 +3495,9 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
 void btrfs_block_rsv_release(struct btrfs_root *root,
 struct btrfs_block_rsv *block_rsv,
 u64 num_bytes);
-int btrfs_set_block_group_ro(struct btrfs_root *root,
+int btrfs_inc_block_group_ro(struct btrfs_root *root,
 struct btrfs_block_group_cache *cache);
-void btrfs_set_block_group_rw(struct btrfs_root *root,
+void btrfs_dec_block_group_ro(struct btrfs_root *root,
  struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7effed6..6a82ba0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8751,14 +8751,13 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
return flags;
 }
 
-static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
+static int inc_block_group_ro(struct btrfs_block_group_cache *cache, int force)
 {
struct btrfs_space_info *sinfo = cache->space_info;
u64 num_bytes;
u64 min_allocable_bytes;
int ret = -ENOSPC;
 
-
/*
 * We need some metadata space and system metadata space for
 * allocating chunks in some corner cases until we force to set
@@ -8775,6 +8774,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
spin_lock(&cache->lock);
 
if (cache->ro) {
+   cache->ro++;
ret = 0;
goto out;
}
@@ -8786,7 +8786,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
sinfo->bytes_may_use + sinfo->bytes_readonly + num_bytes +
min_allocable_bytes <= sinfo->total_bytes) {
sinfo->bytes_readonly += num_bytes;
-   cache->ro = 1;
+   cache->ro++;
list_add_tail(&cache->ro_list, &sinfo->ro_bgs);
ret = 0;
}
@@ -8796,7 +8796,7 @@ out:
return ret;
 }
 
-int btrfs_set_block_group_ro(struct btrfs_root *root,
+int btrfs_inc_block_group_ro(struct btrfs_root *root,
 struct btrfs_block_group_cache *cache)
 
 {
@@ -8804,8 +8804,6 @@ int btrfs_set_block_group_ro(struct btrfs_root *root,
u64 alloc_flags;
int ret;
 
-   BUG_ON(cache->ro);
-
 again:
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
@@ -8830,7 +8828,7 @@ again:
}
 
 
-   ret = set_block_group_ro(cache, 0);
+   ret = inc_block_group_ro(cache, 0);
if (!ret)
goto out;
alloc_flags = get_alloc_profile(root, cache->space_info->flags);
@@ -8838,7 +8836,7 @@ again:
 CHUNK_ALLOC_FORCE);
if (ret < 0)
goto out;
-   ret = set_block_group_ro(cache, 0);
+   ret = inc_block_group_ro(cache, 0);
 out:
if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM) {
alloc_flags = update_block_group_flags(root, cache->flags);
@@ -8899,7 +8897,7 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
return free_bytes;
 }
 
-void btrfs_set_block_group_rw(struct btrfs_root *root,
+void btrfs_dec_block_group_ro(struct btrfs_root *root,
  struct btrfs_block_group_cache *cache)
 {
struct btrfs_space_info *sinfo = cache->space_info;
@@ -8909,11 +8907,13 @@ void btrfs_set_block_group_rw(struct btrfs_root *root,
 
spin_lock(&sinfo->lock);
spin_lock(&cache->lock);
-   num_bytes = cache->key.offset - cache->reserved - cache->pinned -
-   cache->bytes_super - btrfs_block_group_used(&cache->item);
-   sinfo->bytes_readonly -= num_bytes;
-   cache

[PATCH 3/4] btrfs: use scrub_pause_on/off() to reduce code in scrub_enumerate_chunks()

2015-05-29 Thread Zhaolei

From: Zhao Lei 

Using the newly introduced scrub_pause_on/off() makes this code block
cleaner and more readable.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a3d1546..8da3459 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3480,8 +3480,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
wait_event(sctx->list_wait,
   atomic_read(&sctx->bios_in_flight) == 0);
-   atomic_inc(&fs_info->scrubs_paused);
-   wake_up(&fs_info->scrub_pause_wait);
+
+   scrub_pause_on(fs_info);
 
/*
 * must be called before we decrease @scrub_paused.
@@ -3492,11 +3492,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
   atomic_read(&sctx->workers_pending) == 0);
atomic_set(&sctx->wr_ctx.flush_all_writes, 0);
 
-   mutex_lock(&fs_info->scrub_lock);
-   __scrub_blocked_if_needed(fs_info);
-   atomic_dec(&fs_info->scrubs_paused);
-   mutex_unlock(&fs_info->scrub_lock);
-   wake_up(&fs_info->scrub_pause_wait);
+   scrub_pause_off(fs_info);
 
btrfs_put_block_group(cache);
if (ret)
-- 
1.8.5.1


[PATCH 2/4] btrfs: Separate scrub_blocked_if_needed() to scrub_pause_on/off()

2015-05-29 Thread Zhaolei

From: Zhao Lei 

This reduces the existing duplicated code that is similar to
scrub_blocked_if_needed() but cannot call it because of small
differences.
It is also used by my next patch, which is the same case.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ab58115..a3d1546 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -332,11 +332,14 @@ static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
}
 }
 
-static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
+static void scrub_pause_on(struct btrfs_fs_info *fs_info)
 {
atomic_inc(&fs_info->scrubs_paused);
wake_up(&fs_info->scrub_pause_wait);
+}
 
+static void scrub_pause_off(struct btrfs_fs_info *fs_info)
+{
mutex_lock(&fs_info->scrub_lock);
__scrub_blocked_if_needed(fs_info);
atomic_dec(&fs_info->scrubs_paused);
@@ -345,6 +348,12 @@ static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
wake_up(&fs_info->scrub_pause_wait);
 }
 
+static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
+{
+   scrub_pause_on(fs_info);
+   scrub_pause_off(fs_info);
+}
+
 /*
  * used for workers that require transaction commits (i.e., for the
  * NOCOW case)
-- 
1.8.5.1


[PATCH 4/4] btrfs: Fix data checksum error cause by replace with io-load.

2015-05-29 Thread Zhaolei

From: Zhao Lei 

xfstests btrfs/070 sometimes fails.
On my test machine, its failure rate is about 30%.
In another VM (VMware), its failure rate is about 50%.

Reason:
  btrfs/070 runs replace and defrag concurrently with fsstress;
  after these operations, scrub finds checksum errors.

  Actually, this is unrelated to the defrag operation: replace
  together with fsstress is enough to trigger the bug.

  Debugging shows that new data written to the target device can be
  overwritten by old data that the replace code copies from the
  source device. To avoid this, we can set the target block group
  read-only during the replace, so that new data requested by other
  operations is not written to the same place as the replace code.
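
  The effect of the read-only state on new writes can be pictured with
  a toy allocator. This is a sketch under simplifying assumptions, not
  the btrfs allocator: toy_bg and toy_pick_bg() are hypothetical names,
  and real block group selection is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block group as seen by the allocator. */
struct toy_bg {
    unsigned int ro;      /* non-zero while someone holds it read-only */
    uint64_t free_bytes;  /* space still available in this group */
};

/* Returns the index of the first writable group that fits, or -1. */
static int toy_pick_bg(const struct toy_bg *bgs, size_t n, uint64_t bytes)
{
    for (size_t i = 0; i < n; i++) {
        if (bgs[i].ro)
            continue;  /* being replaced: new data must go elsewhere */
        if (bgs[i].free_bytes >= bytes)
            return (int)i;
    }
    return -1;
}
```

  While the replace code holds group 0 read-only, new writes land in
  another group, so the copy from the source device cannot overwrite
  them.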

  Before patch(4.1-rc3):
30% failed in 100 xfstests.
  After patch:
0% failed in 300 xfstests.

Changelog v1->v2:
1: Update subject to reflect the problem being fixed.
2: Update description to explain why setting read-only fixes the
   problem.
3: Use a helper function to avoid a duplicated code block for setting
   the chunk ro.
All of the above were suggested by: David Sterba 

Reported-by: Qu Wenruo 
Suggested-by: Qu Wenruo 
Signed-off-by: Qu Wenruo 
Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 8da3459..e1ebf43 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3455,6 +3455,18 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
if (!cache)
goto skip;
 
+   /*
+* we need call btrfs_inc_block_group_ro() with scrubs_paused,
+* to avoid deadlock caused by:
+* btrfs_inc_block_group_ro()
+* -> btrfs_wait_for_commit()
+* -> btrfs_commit_transaction()
+* -> btrfs_scrub_pause()
+*/
+   scrub_pause_on(fs_info);
+   btrfs_inc_block_group_ro(root, cache);
+   scrub_pause_off(fs_info);
+
dev_replace->cursor_right = found_key.offset + length;
dev_replace->cursor_left = found_key.offset;
dev_replace->item_needs_writeback = 1;
-- 
1.8.5.1


[PATCH 1/2] btrfs: Use ref_cnt for set_block_group_ro()

2015-05-21 Thread Zhaolei

From: Zhao Lei 

More than one code path calls set_block_group_ro() and restores rw on
failure.

The old code uses a single bool bit to store a block group's ro state,
so it cannot support concurrent callers (my debug log confirms that
this case exists).

This patch uses a ref count to store the ro state, and renames
set_block_group_ro/set_block_group_rw
to
inc_block_group_ro/dec_block_group_ro.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/ctree.h   |  6 +++---
 fs/btrfs/extent-tree.c | 42 +-
 fs/btrfs/relocation.c  | 14 ++
 3 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6f364e1..74ce6fc 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1300,7 +1300,7 @@ struct btrfs_block_group_cache {
/* for raid56, this is a full stripe, without parity */
unsigned long full_stripe_len;
 
-   unsigned int ro:1;
+   unsigned int ro;
unsigned int iref:1;
unsigned int has_caching_ctl:1;
unsigned int removed:1;
@@ -3495,9 +3495,9 @@ int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
 void btrfs_block_rsv_release(struct btrfs_root *root,
 struct btrfs_block_rsv *block_rsv,
 u64 num_bytes);
-int btrfs_set_block_group_ro(struct btrfs_root *root,
+int btrfs_inc_block_group_ro(struct btrfs_root *root,
 struct btrfs_block_group_cache *cache);
-void btrfs_set_block_group_rw(struct btrfs_root *root,
+void btrfs_dec_block_group_ro(struct btrfs_root *root,
  struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
 u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7effed6..6a82ba0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8751,14 +8751,13 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
return flags;
 }
 
-static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
+static int inc_block_group_ro(struct btrfs_block_group_cache *cache, int force)
 {
struct btrfs_space_info *sinfo = cache->space_info;
u64 num_bytes;
u64 min_allocable_bytes;
int ret = -ENOSPC;
 
-
/*
 * We need some metadata space and system metadata space for
 * allocating chunks in some corner cases until we force to set
@@ -8775,6 +8774,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
spin_lock(&cache->lock);
 
if (cache->ro) {
+   cache->ro++;
ret = 0;
goto out;
}
@@ -8786,7 +8786,7 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache, int force)
sinfo->bytes_may_use + sinfo->bytes_readonly + num_bytes +
min_allocable_bytes <= sinfo->total_bytes) {
sinfo->bytes_readonly += num_bytes;
-   cache->ro = 1;
+   cache->ro++;
list_add_tail(&cache->ro_list, &sinfo->ro_bgs);
ret = 0;
}
@@ -8796,7 +8796,7 @@ out:
return ret;
 }
 
-int btrfs_set_block_group_ro(struct btrfs_root *root,
+int btrfs_inc_block_group_ro(struct btrfs_root *root,
 struct btrfs_block_group_cache *cache)
 
 {
@@ -8804,8 +8804,6 @@ int btrfs_set_block_group_ro(struct btrfs_root *root,
u64 alloc_flags;
int ret;
 
-   BUG_ON(cache->ro);
-
 again:
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
@@ -8830,7 +8828,7 @@ again:
}
 
 
-   ret = set_block_group_ro(cache, 0);
+   ret = inc_block_group_ro(cache, 0);
if (!ret)
goto out;
alloc_flags = get_alloc_profile(root, cache->space_info->flags);
@@ -8838,7 +8836,7 @@ again:
 CHUNK_ALLOC_FORCE);
if (ret < 0)
goto out;
-   ret = set_block_group_ro(cache, 0);
+   ret = inc_block_group_ro(cache, 0);
 out:
if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM) {
alloc_flags = update_block_group_flags(root, cache->flags);
@@ -8899,7 +8897,7 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
return free_bytes;
 }
 
-void btrfs_set_block_group_rw(struct btrfs_root *root,
+void btrfs_dec_block_group_ro(struct btrfs_root *root,
  struct btrfs_block_group_cache *cache)
 {
struct btrfs_space_info *sinfo = cache->space_info;
@@ -8909,11 +8907,13 @@ void btrfs_set_block_group_rw(struct btrfs_root *root,
 
spin_lock(&sinfo->lock);
spin_lock(&cache->lock);
-   num_bytes = cache->key.offset - cache->reserved - cache->pinned -
-   cache->bytes_super - btrfs_block_group_used(&cache->item);
-   sinfo->bytes_readonly -= num_bytes;
-   cache

[PATCH 2/2] btrfs: Fix xfstests btrfs/070

2015-05-21 Thread Zhaolei

From: Zhao Lei 

xfstests btrfs/070 sometimes fails.
On my test machine, its failure rate is about 30%.
In another VM (VMware), its failure rate is about 50%.

Reason:
btrfs/070 runs replace and defrag concurrently with fsstress;
after these operations, scrub finds checksum errors.
Actually, this is unrelated to the defrag operation: replace
together with fsstress is enough to trigger the bug.
Debugging shows that new data written to the target device can be
overwritten by old data that the replace code copies from the source
device; this can be fixed by setting the chunk ro during the replace
operation.

Before patch(4.1-rc3):
  30% failed in 100 xfstests.
After patch:
  0% failed in 300 xfstests.

Signed-off-by: Qu Wenruo 
Signed-off-by: Zhao Lei 
---
 fs/btrfs/scrub.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ab58115..469c8a5 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3446,6 +3446,23 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
if (!cache)
goto skip;
 
+   /*
+* we need call btrfs_inc_block_group_ro() with scrubs_paused,
+* to avoid deadlock caused by:
+* btrfs_inc_block_group_ro()
+* -> btrfs_wait_for_commit()
+* -> btrfs_commit_transaction()
+* -> btrfs_scrub_pause()
+*/
+   atomic_inc(&fs_info->scrubs_paused);
+   wake_up(&fs_info->scrub_pause_wait);
+   btrfs_inc_block_group_ro(root, cache);
+   mutex_lock(&fs_info->scrub_lock);
+   __scrub_blocked_if_needed(fs_info);
+   atomic_dec(&fs_info->scrubs_paused);
+   mutex_unlock(&fs_info->scrub_lock);
+   wake_up(&fs_info->scrub_pause_wait);
+
dev_replace->cursor_right = found_key.offset + length;
dev_replace->cursor_left = found_key.offset;
dev_replace->item_needs_writeback = 1;
@@ -3489,6 +3506,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
mutex_unlock(&fs_info->scrub_lock);
wake_up(&fs_info->scrub_pause_wait);
 
+   btrfs_dec_block_group_ro(root, cache);
+
btrfs_put_block_group(cache);
if (ret)
break;
-- 
1.8.5.1


[PATCH 9/9] btrfs: cleanup unused alloc_chunk varible

2015-04-08 Thread Zhaolei

From: Zhao Lei 

Remove the int alloc_chunk variable in btrfs_check_data_free_space(),
because it is unnecessary.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d5ec383..b009987 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3641,7 +3641,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
u64 used;
-   int ret = 0, need_commit = 2, have_pinned_space, alloc_chunk = 1;
+   int ret = 0, need_commit = 2, have_pinned_space;
 
/* make sure bytes are sectorsize aligned */
bytes = ALIGN(bytes, root->sectorsize);
@@ -3669,7 +3669,7 @@ again:
 * if we don't have enough free bytes in this space then we need
 * to alloc a new chunk.
 */
-   if (!data_sinfo->full && alloc_chunk) {
+   if (!data_sinfo->full) {
u64 alloc_target;
 
data_sinfo->force_alloc = CHUNK_ALLOC_FORCE;
-- 
1.8.5.1


[PATCH 5/9] btrfs: add WARN_ON() to check is space_info op current

2015-04-08 Thread Zhaolei

From: Zhao Lei 

The calculation of space_info's values is somewhat complex and
error-prone; add WARN_ON()s to help debugging.

Changelog v1->v2:
 Put WARN_ON()s under the ENOSPC_DEBUG mount option.
 Suggested by: David Sterba 

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e04ea1f..203ac63 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9464,9 +9464,19 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
spin_lock(&block_group->space_info->lock);
list_del_init(&block_group->ro_list);
+
+   if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
+   WARN_ON(block_group->space_info->total_bytes
+   < block_group->key.offset);
+   WARN_ON(block_group->space_info->bytes_readonly
+   < block_group->key.offset);
+   WARN_ON(block_group->space_info->disk_total
+   < block_group->key.offset * factor);
+   }
block_group->space_info->total_bytes -= block_group->key.offset;
block_group->space_info->bytes_readonly -= block_group->key.offset;
block_group->space_info->disk_total -= block_group->key.offset * factor;
+
spin_unlock(&block_group->space_info->lock);
 
memcpy(&key, &block_group->key, sizeof(key));
-- 
1.8.5.1


[PATCH 6/9] btrfs: Fix NO_SPACE bug caused by delayed-iput

2015-04-08 Thread Zhaolei

From: Zhao Lei 

Steps to reproduce:
  while true; do
dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
rm /btrfs_dir/file
sync
  done

  And we'll see dd fail because btrfs returns NO_SPACE.

Reason:
  Normally, btrfs_commit_transaction() calls btrfs_run_delayed_iputs()
  at its end to free fs space for the next write, but sometimes this
  work is not done in time: the btrfs-cleaner thread has already taken
  the delayed iputs off the list, but only runs iput() after the next
  write.

  This is the log:
  [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin

  [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
  [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
  [ 2569.087554] comm=sync func=btrfs_commit_transaction() end

  [ 2569.191081] comm=dd begin
  [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28

  [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
  [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
  ...
  [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
  [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end

Fix:
  Make btrfs_commit_transaction() wait, at its end, until the delayed
  iputs currently being run by btrfs-cleaner are done.

Test:
  Using a script similar to the above (but more complex),
  before patch:
    7 failed in 100 * 20 loops.
  after patch:
    0 failed in 100 * 20 loops.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/ctree.h   | 1 +
 fs/btrfs/disk-io.c | 3 ++-
 fs/btrfs/extent-tree.c | 6 ++
 fs/btrfs/inode.c   | 4 
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f9c89ca..54d4d78 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1513,6 +1513,7 @@ struct btrfs_fs_info {
 
spinlock_t delayed_iput_lock;
struct list_head delayed_iputs;
+   struct rw_semaphore delayed_iput_sem;
 
/* this protects tree_mod_seq_list */
spinlock_t tree_mod_seq_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 639f266..6867471 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2241,11 +2241,12 @@ int open_ctree(struct super_block *sb,
spin_lock_init(&fs_info->qgroup_op_lock);
spin_lock_init(&fs_info->buffer_lock);
spin_lock_init(&fs_info->unused_bgs_lock);
-   mutex_init(&fs_info->unused_bg_unpin_mutex);
rwlock_init(&fs_info->tree_mod_log_lock);
+   mutex_init(&fs_info->unused_bg_unpin_mutex);
mutex_init(&fs_info->reloc_mutex);
mutex_init(&fs_info->delalloc_root_mutex);
seqlock_init(&fs_info->profiles_lock);
+   init_rwsem(&fs_info->delayed_iput_sem);
 
init_completion(&fs_info->kobj_unregister);
INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 203ac63..6fd7dca 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3732,6 +3732,12 @@ commit_trans:
ret = btrfs_commit_transaction(trans, root);
if (ret)
return ret;
+   /*
+* make sure that all running delayed iput are
+* done
+*/
+   down_write(&root->fs_info->delayed_iput_sem);
+   up_write(&root->fs_info->delayed_iput_sem);
goto again;
} else {
btrfs_end_transaction(trans, root);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d2e732d..34d10be 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3110,6 +3110,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
if (empty)
return;
 
+   down_read(&fs_info->delayed_iput_sem);
+
spin_lock(&fs_info->delayed_iput_lock);
list_splice_init(&fs_info->delayed_iputs, &list);
spin_unlock(&fs_info->delayed_iput_lock);
@@ -3120,6 +3122,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
iput(delayed->inode);
kfree(delayed);
}
+
+   up_read(&root->fs_info->delayed_iput_sem);
 }
 
 /*
-- 
1.8.5.1


[PATCH 2/9] btrfs: Fix tail space processing in find_free_dev_extent()

2015-04-08 Thread Zhaolei

From: Zhao Lei 

This is another cause of the NO_SPACE case.

When we have already found enough free space in the loop and saved it
in max_hole_start/size, and the tail space contains a pending extent,
the original, innocent max_hole_start/size are reset on retry.

As a result, find_free_dev_extent() returns less space than is
actually available, and causes NO_SPACE in the user program.
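
The idea of keeping the best hole seen so far, and deciding on it
rather than on the last hole measured, can be sketched as follows. This
is a simplified user-space model, not the kernel function: the model_*
names are hypothetical, pending extents are left out, and the extents
are assumed to be sorted, non-overlapping and inside the search range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an allocated dev extent. */
struct model_extent {
    uint64_t start;
    uint64_t len;
};

struct model_hole {
    uint64_t start;
    uint64_t size;
};

/* Scan the gaps between extents, then the tail, keeping the largest. */
static struct model_hole model_largest_hole(const struct model_extent *used,
                                            size_t n, uint64_t search_start,
                                            uint64_t search_end)
{
    struct model_hole max = { search_start, 0 };
    uint64_t pos = search_start;

    for (size_t i = 0; i < n; i++) {
        if (used[i].start > pos && used[i].start - pos > max.size) {
            max.start = pos;
            max.size = used[i].start - pos;
        }
        if (used[i].start + used[i].len > pos)
            pos = used[i].start + used[i].len;
    }
    /* Tail space only replaces the saved hole if it is larger. */
    if (search_end > pos && search_end - pos > max.size) {
        max.start = pos;
        max.size = search_end - pos;
    }
    return max;
}

/* Returns 0 if a hole of num_bytes exists, -1 (think -ENOSPC) if not. */
static int model_find_free(const struct model_extent *used, size_t n,
                           uint64_t search_start, uint64_t search_end,
                           uint64_t num_bytes, uint64_t *start)
{
    struct model_hole max = model_largest_hole(used, n, search_start,
                                               search_end);

    *start = max.start;
    return max.size >= num_bytes ? 0 : -1;
}
```

For extents [0,10) and [50,60) on a device of size 65, the tail hole is
only 5 bytes, but the saved hole at offset 10 is 40 bytes, so a request
for 20 bytes succeeds when the decision uses the maximum.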

Reviewed-by: Liu Bo 
Signed-off-by: Zhao Lei 
---
 fs/btrfs/volumes.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8222f6f..586824a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1136,11 +1136,11 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-again:
+
max_hole_start = search_start;
max_hole_size = 0;
-   hole_size = 0;
 
+again:
if (search_start >= search_end || device->is_tgtdev_for_dev_replace) {
ret = -ENOSPC;
goto out;
@@ -1233,21 +1233,23 @@ next:
 * allocated dev extents, and when shrinking the device,
 * search_end may be smaller than search_start.
 */
-   if (search_end > search_start)
+   if (search_end > search_start) {
hole_size = search_end - search_start;
 
-   if (hole_size > max_hole_size) {
-   max_hole_start = search_start;
-   max_hole_size = hole_size;
-   }
+   if (contains_pending_extent(trans, device, &search_start,
+   hole_size)) {
+   btrfs_release_path(path);
+   goto again;
+   }
 
-   if (contains_pending_extent(trans, device, &search_start, hole_size)) {
-   btrfs_release_path(path);
-   goto again;
+   if (hole_size > max_hole_size) {
+   max_hole_start = search_start;
+   max_hole_size = hole_size;
+   }
}
 
/* See above. */
-   if (hole_size < num_bytes)
+   if (max_hole_size < num_bytes)
ret = -ENOSPC;
else
ret = 0;
-- 
1.8.5.1


[PATCH 8/9] btrfs: wait for delayed iputs on no space

2015-04-08 Thread Zhaolei

From: Zhao Lei 

btrfs will report no_space when we run the following write-and-delete
file loop:
 # FILE_SIZE_M=[ 75% of fs space ]
 # DEV=[ some dev ]
 # MNT=[ some dir ]
 #
 # mkfs.btrfs -f "$DEV"
 # mount -o nodatacow "$DEV" "$MNT"
 # for ((i = 0; i < 100; i++)); do dd if=/dev/zero of="$MNT"/file0 bs=1M count="$FILE_SIZE_M"; rm -f "$MNT"/file0; done
 #

Reason:
 iput() and evict() run after the pages are written to the block
 device; if the page writes are not finished before the next write,
 the space of the "rm"ed file is not yet freed, which causes the
 above bug.

Fix:
 We can add the "-o flushoncommit" mount option to avoid the above
 bug, but it has a performance cost. Actually, we only need to wait
 for in-flight writes when no-space happens, which is what this patch
 does.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0572f14..d5ec383 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3725,6 +3725,9 @@ commit_trans:
!atomic_read(&root->fs_info->open_ioctl_trans)) {
need_commit--;
 
+   if (need_commit > 0)
+   btrfs_wait_ordered_roots(fs_info, -1);
+
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
return PTR_ERR(trans);
-- 
1.8.5.1


[PATCH 3/9] btrfs: Adjust commit-transaction condition to avoid NO_SPACE more

2015-04-08 Thread Zhaolei

From: Zhao Lei 

If we have any chance to make a successful write, we should not give up.

This patch adjusts the commit-transaction condition from:
  pinned >= wanted
to
  left + pinned >= wanted
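
A worked example may make the difference between the two conditions
concrete. The helpers below are a hypothetical sketch, not the kernel
code: "left" is taken to be total minus used, which is an assumption
based on the diff comparing the pinned counter against
used + bytes - total_bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old rule: committing only helps if pinned space alone covers the write. */
static bool old_commit_worthwhile(uint64_t pinned, uint64_t wanted)
{
    return pinned >= wanted;
}

/* New rule: space that is still free counts together with pinned space. */
static bool new_commit_worthwhile(uint64_t total, uint64_t used,
                                  uint64_t pinned, uint64_t wanted)
{
    uint64_t left = total > used ? total - used : 0;

    return left + pinned >= wanted;
}
```

With total = 100, used = 90, pinned = 8 and wanted = 15, the old rule
gives up (8 < 15) while the new rule commits (10 + 8 >= 15).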

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ebeedb4..644468b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3713,7 +3713,8 @@ alloc:
 * don't bother committing the transaction.
 */
if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
-  bytes) < 0)
+  used + bytes -
+  data_sinfo->total_bytes) < 0)
have_pinned_space = 0;
spin_unlock(&data_sinfo->lock);
 
-- 
1.8.5.1


[PATCH 7/9] btrfs: Support busy loop of write and delete

2015-04-08 Thread Zhaolei

From: Zhao Lei 

Reproduce:
 while true; do
   dd if=/dev/zero of=/mnt/btrfs/file count=[75% fs_size]
   rm /mnt/btrfs/file
 done
 Then we can see the above loop fail with NO_SPACE.

This has been a long-standing problem since the very beginning, because
the delayed iputs after rm are not run.

We already have commit_transaction() in the alloc_space code, but it is
not triggered in the above case.
This patch triggers commit_transaction() to run the delayed iputs and
refresh pinned space so that the write succeeds.

It is based on the previous fix of delayed-iput in commit_transaction(),
and needs to be applied on top of:
btrfs: Fix NO_SPACE bug caused by delayed-iput
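
The retry logic with the need_commit counter can be summarized in a
small model. This is a sketch only: model_should_commit() is a
hypothetical helper that mirrors the condition in the diff below, and
it leaves out the open_ioctl_trans check.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether the commit_trans step should commit the transaction.
 * need_commit starts at 2 and is decremented on every attempt;
 * have_pinned_space is the three-way compare result (-1, 0 or 1).
 */
static bool model_should_commit(int *need_commit, int have_pinned_space,
                                bool have_free_bgs)
{
    if (*need_commit == 0)
        return false;  /* retries exhausted: report ENOSPC */
    (*need_commit)--;
    return have_pinned_space >= 0 || have_free_bgs || *need_commit > 0;
}
```

The first attempt always commits, because need_commit is still above
zero after the decrement. The second attempt commits only if pinned
space or freed block groups make success possible, and after that the
caller gives up.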

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6fd7dca..0572f14 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3641,13 +3641,13 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
u64 used;
-   int ret = 0, committed = 0, have_pinned_space = 1, alloc_chunk = 1;
+   int ret = 0, need_commit = 2, have_pinned_space, alloc_chunk = 1;
 
/* make sure bytes are sectorsize aligned */
bytes = ALIGN(bytes, root->sectorsize);
 
if (btrfs_is_free_space_inode(inode)) {
-   committed = 1;
+   need_commit = 0;
ASSERT(current->journal_info);
}
 
@@ -3697,8 +3697,10 @@ alloc:
if (ret < 0) {
if (ret != -ENOSPC)
return ret;
-   else
+   else {
+   have_pinned_space = 1;
goto commit_trans;
+   }
}
 
if (!data_sinfo)
@@ -3712,23 +3714,23 @@ alloc:
 * allocation, and no removed chunk in current transaction,
 * don't bother committing the transaction.
 */
-   if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
-  used + bytes -
-  data_sinfo->total_bytes) < 0)
-   have_pinned_space = 0;
+   have_pinned_space = percpu_counter_compare(
+   &data_sinfo->total_bytes_pinned,
+   used + bytes - data_sinfo->total_bytes);
spin_unlock(&data_sinfo->lock);
 
/* commit the current transaction and try again */
 commit_trans:
-   if (!committed &&
+   if (need_commit &&
!atomic_read(&root->fs_info->open_ioctl_trans)) {
-   committed = 1;
+   need_commit--;
 
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
return PTR_ERR(trans);
-   if (have_pinned_space ||
-   trans->transaction->have_free_bgs) {
+   if (have_pinned_space >= 0 ||
+   trans->transaction->have_free_bgs ||
+   need_commit > 0) {
ret = btrfs_commit_transaction(trans, root);
if (ret)
return ret;
-- 
1.8.5.1


[PATCH 4/9] btrfs: Set relative data on clear btrfs_block_group_cache->pinned

2015-04-08 Thread Zhaolei

From: Zhao Lei 

Bug1:
  space_info->bytes_readonly was set to a very large (negative) value in
  btrfs_remove_block_group().

Reason:
  The current code sets block_group_cache->pinned = 0 in
  btrfs_delete_unused_bgs(), but that space is not added to
  space_info->bytes_readonly.

  Then, in btrfs_remove_block_group():
    block_group->space_info->bytes_readonly -= block_group->key.offset;
  We can see the following values in the trace:
    btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728

Bug2:
  space_info->total_bytes_pinned grows to a value larger than the fs size.
  In a 1.2G fs, we can get the following trace log:
  at first:
    ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496
  after some op:
    ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648
  after some op:
    ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064
  ...

Reason:
  Similar to bug1, we also need to adjust space_info->total_bytes_pinned
  in the above code block.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 644468b..e04ea1f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9654,8 +9654,18 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
mutex_unlock(&fs_info->unused_bg_unpin_mutex);
 
/* Reset pinned so btrfs_put_block_group doesn't complain */
+   spin_lock(&space_info->lock);
+   spin_lock(&block_group->lock);
+
+   space_info->bytes_pinned -= block_group->pinned;
+   space_info->bytes_readonly += block_group->pinned;
+   percpu_counter_add(&space_info->total_bytes_pinned,
+  -block_group->pinned);
block_group->pinned = 0;
 
+   spin_unlock(&block_group->lock);
+   spin_unlock(&space_info->lock);
+
/*
 * Btrfs_remove_chunk will abort the transaction if things go
 * horribly wrong.
-- 
1.8.5.1
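
The accounting fixed by the patch above can be illustrated with a small user-space model. This is only a sketch: the struct and function names are hypothetical stand-ins for the kernel structures, locking is omitted, and the pinned amount used in the usage note is chosen for illustration (the trace does not report it).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct space_info_model {
    uint64_t bytes_pinned;
    uint64_t bytes_readonly;
    int64_t total_bytes_pinned;   /* a percpu counter in the kernel */
};

struct block_group_model {
    uint64_t size;                /* plays the role of key.offset */
    uint64_t pinned;
};

/* Mirrors the patched step: move pinned bytes to readonly, then zero. */
static void clear_pinned(struct space_info_model *si,
                         struct block_group_model *bg)
{
    assert(si->bytes_pinned >= bg->pinned);
    si->bytes_pinned -= bg->pinned;
    si->bytes_readonly += bg->pinned;
    si->total_bytes_pinned -= (int64_t)bg->pinned;
    bg->pinned = 0;
}

/* Mirrors the later subtraction in btrfs_remove_block_group(). */
static void remove_block_group(struct space_info_model *si,
                               const struct block_group_model *bg)
{
    si->bytes_readonly -= bg->size;
}
```

With bytes_readonly = 12582912 and a 134217728-byte block group whose remaining 121634816 bytes are pinned (an assumed figure), calling clear_pinned() before remove_block_group() leaves bytes_readonly at 0 instead of wrapping around.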

[PATCH 1/9] btrfs: fix condition of commit transaction

2015-04-08 Thread Zhaolei

From: Zhao Lei 

The old code skips the transaction commit when we don't have enough
pinned space. But there is another case: if block groups were freed in
the current transaction, a commit may allow alloc_chunk to succeed.

This patch changes the condition to:
if (have_free_bg || have_pinned_space) commit_transaction()

Confirmed the above behavior with printk before and after the patch.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8b353ad..ebeedb4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3641,7 +3641,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
u64 used;
-   int ret = 0, committed = 0, alloc_chunk = 1;
+   int ret = 0, committed = 0, have_pinned_space = 1, alloc_chunk = 1;
 
/* make sure bytes are sectorsize aligned */
bytes = ALIGN(bytes, root->sectorsize);
@@ -3709,11 +3709,12 @@ alloc:
 
/*
 * If we don't have enough pinned space to deal with this
-* allocation don't bother committing the transaction.
+* allocation, and no removed chunk in current transaction,
+* don't bother committing the transaction.
 */
if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
   bytes) < 0)
-   committed = 1;
+   have_pinned_space = 0;
spin_unlock(&data_sinfo->lock);
 
/* commit the current transaction and try again */
@@ -3725,10 +3726,15 @@ commit_trans:
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
return PTR_ERR(trans);
-   ret = btrfs_commit_transaction(trans, root);
-   if (ret)
-   return ret;
-   goto again;
+   if (have_pinned_space ||
+   trans->transaction->have_free_bgs) {
+   ret = btrfs_commit_transaction(trans, root);
+   if (ret)
+   return ret;
+   goto again;
+   } else {
+   btrfs_end_transaction(trans, root);
+   }
}
 
trace_btrfs_space_reservation(root->fs_info,
-- 
1.8.5.1

[PATCH v2 0/9] btrfs: Fix no_space on dd and rm loop

2015-04-08 Thread Zhaolei

From: Zhao Lei 

This is v2 of resend-fix-no-space.

Most of these were sent as single patches; I am resending them as a
patchset to make them easier to access.

Note that "Btrfs: fix find_free_dev_extent() malfunction in case
device tree has hole" from Forrest Liu in:
  https://patchwork.kernel.org/patch/5800231/
is also needed to fix all known no_space bugs.

Changelog v1->v2:
1: Rebased on top of v4.0-rc7
2: Fixed a lock problem reported by:
   'Tsutomu Itoh' 
3: Add Reviewed-by: Liu Bo 
   to [PATCH 2/9] btrfs:

Tested by running a busy dd and rm loop script 2000 times.
I'll add xfstests for this case later.

This is available in the fix_no_space branch of my tree:
  git://github.com/zhaoleidd/btrfs.git
It is also included in the integration-for-chris branch of the above tree.

Thanks
Zhaolei

Zhao Lei (9):
  btrfs: fix condition of commit transaction
  btrfs: Fix tail space processing in find_free_dev_extent()
  btrfs: Adjust commit-transaction condition to avoid NO_SPACE more
  btrfs: Set relative data on clear btrfs_block_group_cache->pinned
  btrfs: add WARN_ON() to check is space_info op current
  btrfs: Fix NO_SPACE bug caused by delayed-iput
  btrfs: Support busy loop of write and delete
  btrfs: wait for delayed iputs on no space
  btrfs: cleanup unused alloc_chunk varible

 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/disk-io.c |  3 ++-
 fs/btrfs/extent-tree.c | 66 +++---
 fs/btrfs/inode.c   |  4 +++
 fs/btrfs/volumes.c | 24 +-
 5 files changed, 72 insertions(+), 26 deletions(-)

-- 
1.8.5.1

[PATCH 1/9] btrfs: fix condition of commit transaction

2015-04-03 Thread Zhaolei

From: Zhao Lei 

The old code skips the transaction commit when we don't have enough
pinned space. But there is another case: if block groups were freed in
the current transaction, a commit may allow alloc_chunk to succeed.

This patch changes the condition to:
if (have_free_bg || have_pinned_space) commit_transaction()

Confirmed the above behavior with printk before and after the patch.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8b353ad..ebeedb4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3641,7 +3641,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
u64 used;
-   int ret = 0, committed = 0, alloc_chunk = 1;
+   int ret = 0, committed = 0, have_pinned_space = 1, alloc_chunk = 1;
 
/* make sure bytes are sectorsize aligned */
bytes = ALIGN(bytes, root->sectorsize);
@@ -3709,11 +3709,12 @@ alloc:
 
/*
 * If we don't have enough pinned space to deal with this
-* allocation don't bother committing the transaction.
+* allocation, and no removed chunk in current transaction,
+* don't bother committing the transaction.
 */
if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
   bytes) < 0)
-   committed = 1;
+   have_pinned_space = 0;
spin_unlock(&data_sinfo->lock);
 
/* commit the current transaction and try again */
@@ -3725,10 +3726,15 @@ commit_trans:
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
return PTR_ERR(trans);
-   ret = btrfs_commit_transaction(trans, root);
-   if (ret)
-   return ret;
-   goto again;
+   if (have_pinned_space ||
+   trans->transaction->have_free_bgs) {
+   ret = btrfs_commit_transaction(trans, root);
+   if (ret)
+   return ret;
+   goto again;
+   } else {
+   btrfs_end_transaction(trans, root);
+   }
}
 
trace_btrfs_space_reservation(root->fs_info,
-- 
1.8.5.1

[PATCH 3/9] btrfs: Adjust commit-transaction condition to avoid NO_SPACE more

2015-04-03 Thread Zhaolei

From: Zhao Lei 

If we have any chance to make a successful write, we should not give up.

This patch adjusts the commit-transaction condition from:
  pinned >= wanted
to
  left + pinned >= wanted

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ebeedb4..644468b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3713,7 +3713,8 @@ alloc:
 * don't bother committing the transaction.
 */
if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
-  bytes) < 0)
+  used + bytes -
+  data_sinfo->total_bytes) < 0)
have_pinned_space = 0;
spin_unlock(&data_sinfo->lock);
 
-- 
1.8.5.1
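
The condition change in the patch above can be restated as plain arithmetic. The sketch below is a user-space model with hypothetical function names. It assumes that "left" means total_bytes minus used, which is how the kernel comparison of pinned against (used + bytes - total_bytes) rearranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old rule: commit only if pinned space alone covers the request. */
static bool worth_commit_old(uint64_t pinned, uint64_t bytes)
{
    return pinned >= bytes;
}

/*
 * New rule: commit if the space still left plus the pinned space covers
 * the request. Equivalent to pinned >= used + bytes - total.
 */
static bool worth_commit_new(uint64_t total, uint64_t used,
                             uint64_t pinned, uint64_t bytes)
{
    assert(used <= total);        /* assumption of this model */
    uint64_t left = total - used;
    return left + pinned >= bytes;
}
```

For example, with total = 1000, used = 950, pinned = 60 and a request of 100, the old rule gives up (60 < 100) while the new rule still tries a commit (50 + 60 >= 100).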

[PATCH 5/9] btrfs: add WARN_ON() to check is space_info op current

2015-04-03 Thread Zhaolei

From: Zhao Lei 

The calculation of space_info's values is somewhat complex and
bug-prone; add WARN_ON()s to help debugging.

Changelog v1->v2:
 Put WARN_ON()s under the ENOSPC_DEBUG mount option.
 Suggested by: David Sterba 

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e04ea1f..203ac63 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9464,9 +9464,19 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
spin_lock(&block_group->space_info->lock);
list_del_init(&block_group->ro_list);
+
+   if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
+   WARN_ON(block_group->space_info->total_bytes
+   < block_group->key.offset);
+   WARN_ON(block_group->space_info->bytes_readonly
+   < block_group->key.offset);
+   WARN_ON(block_group->space_info->disk_total
+   < block_group->key.offset * factor);
+   }
block_group->space_info->total_bytes -= block_group->key.offset;
block_group->space_info->bytes_readonly -= block_group->key.offset;
block_group->space_info->disk_total -= block_group->key.offset * factor;
+
spin_unlock(&block_group->space_info->lock);
 
memcpy(&key, &block_group->key, sizeof(key));
-- 
1.8.5.1
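
The idea of the checks in the patch above, warning before an unsigned counter would underflow, can be sketched in user space as follows. The helper name is hypothetical and fprintf() merely stands in for WARN_ON(); as in the patch, the subtraction is still performed after the warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Subtract amount from *counter, reporting an underflow first.
 * Returns true if a warning was issued.
 */
static bool sub_with_warn(uint64_t *counter, uint64_t amount,
                          const char *name)
{
    assert(counter != NULL);
    bool underflow = *counter < amount;

    if (underflow)
        fprintf(stderr, "WARNING: %s=%llu < %llu\n", name,
                (unsigned long long)*counter,
                (unsigned long long)amount);
    *counter -= amount;   /* wraps around if underflow is true */
    return underflow;
}
```

Fed with the values from the trace in patch 4/9 (bytes_readonly=12582912, key.offset=134217728), the helper reports the underflow that previously went unnoticed.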

[PATCH 7/9] btrfs: Support busy loop of write and delete

2015-04-03 Thread Zhaolei

From: Zhao Lei 

Reproduce:
 while true; do
   dd if=/dev/zero of=/mnt/btrfs/file count=[75% fs_size]
   rm /mnt/btrfs/file
 done
 Then we can see the above loop fail with NO_SPACE.

This has been a long-standing problem since the very beginning, because
the delayed iputs after rm are not run.

We already have commit_transaction() in the alloc_space code, but it is
not triggered in the above case.
This patch triggers commit_transaction() to run the delayed iputs and
refresh the pinned space so that the write succeeds.

It is based on the previous fix of delayed-iput in commit_transaction(),
and needs to be applied on top of:
btrfs: Fix NO_SPACE bug caused by delayed-iput

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 203ac63..5683736 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3641,13 +3641,13 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
u64 used;
-   int ret = 0, committed = 0, have_pinned_space = 1, alloc_chunk = 1;
+   int ret = 0, need_commit = 2, have_pinned_space, alloc_chunk = 1;
 
/* make sure bytes are sectorsize aligned */
bytes = ALIGN(bytes, root->sectorsize);
 
if (btrfs_is_free_space_inode(inode)) {
-   committed = 1;
+   need_commit = 0;
ASSERT(current->journal_info);
}
 
@@ -3697,8 +3697,10 @@ alloc:
if (ret < 0) {
if (ret != -ENOSPC)
return ret;
-   else
+   else {
+   have_pinned_space = 1;
goto commit_trans;
+   }
}
 
if (!data_sinfo)
@@ -3712,23 +3714,23 @@ alloc:
 * allocation, and no removed chunk in current transaction,
 * don't bother committing the transaction.
 */
-   if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
-  used + bytes -
-  data_sinfo->total_bytes) < 0)
-   have_pinned_space = 0;
+   have_pinned_space = percpu_counter_compare(
+   &data_sinfo->total_bytes_pinned,
+   used + bytes - data_sinfo->total_bytes);
spin_unlock(&data_sinfo->lock);
 
/* commit the current transaction and try again */
 commit_trans:
-   if (!committed &&
+   if (need_commit &&
!atomic_read(&root->fs_info->open_ioctl_trans)) {
-   committed = 1;
+   need_commit--;
 
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
return PTR_ERR(trans);
-   if (have_pinned_space ||
-   trans->transaction->have_free_bgs) {
+   if (have_pinned_space >= 0 ||
+   trans->transaction->have_free_bgs ||
+   need_commit > 0) {
ret = btrfs_commit_transaction(trans, root);
if (ret)
return ret;
-- 
1.8.5.1

[PATCH 4/9] btrfs: Set relative data on clear btrfs_block_group_cache->pinned

2015-04-03 Thread Zhaolei

From: Zhao Lei 

Bug1:
  space_info->bytes_readonly was set to a very large (negative) value in
  btrfs_remove_block_group().

Reason:
  The current code sets block_group_cache->pinned = 0 in
  btrfs_delete_unused_bgs(), but that space is not added to
  space_info->bytes_readonly.

  Then, in btrfs_remove_block_group():
    block_group->space_info->bytes_readonly -= block_group->key.offset;
  We can see the following values in the trace:
    btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728

Bug2:
  space_info->total_bytes_pinned grows to a value larger than the fs size.
  In a 1.2G fs, we can get the following trace log:
  at first:
    ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496
  after some op:
    ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648
  after some op:
    ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064
  ...

Reason:
  Similar to bug1, we also need to adjust space_info->total_bytes_pinned
  in the above code block.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 644468b..e04ea1f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9654,8 +9654,18 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
mutex_unlock(&fs_info->unused_bg_unpin_mutex);
 
/* Reset pinned so btrfs_put_block_group doesn't complain */
+   spin_lock(&space_info->lock);
+   spin_lock(&block_group->lock);
+
+   space_info->bytes_pinned -= block_group->pinned;
+   space_info->bytes_readonly += block_group->pinned;
+   percpu_counter_add(&space_info->total_bytes_pinned,
+  -block_group->pinned);
block_group->pinned = 0;
 
+   spin_unlock(&block_group->lock);
+   spin_unlock(&space_info->lock);
+
/*
 * Btrfs_remove_chunk will abort the transaction if things go
 * horribly wrong.
-- 
1.8.5.1

[PATCH 9/9] btrfs: cleanup unused alloc_chunk varible

2015-04-03 Thread Zhaolei

From: Zhao Lei 

Remove the int alloc_chunk variable in btrfs_check_data_free_space()
because it is no longer necessary.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 05747d2..b83060f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3641,7 +3641,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
u64 used;
-   int ret = 0, need_commit = 2, have_pinned_space, alloc_chunk = 1;
+   int ret = 0, need_commit = 2, have_pinned_space;
 
/* make sure bytes are sectorsize aligned */
bytes = ALIGN(bytes, root->sectorsize);
@@ -3669,7 +3669,7 @@ again:
 * if we don't have enough free bytes in this space then we need
 * to alloc a new chunk.
 */
-   if (!data_sinfo->full && alloc_chunk) {
+   if (!data_sinfo->full) {
u64 alloc_target;
 
data_sinfo->force_alloc = CHUNK_ALLOC_FORCE;
-- 
1.8.5.1

[PATCH 8/9] btrfs: wait for delayed iputs on no space

2015-04-03 Thread Zhaolei

From: Zhao Lei 

btrfs will report no_space when we run the following write-and-delete
file loop:
 # FILE_SIZE_M=[ 75% of fs space ]
 # DEV=[ some dev ]
 # MNT=[ some dir ]
 #
 # mkfs.btrfs -f "$DEV"
 # mount -o nodatacow "$DEV" "$MNT"
 # for ((i = 0; i < 100; i++)); do dd if=/dev/zero of="$MNT"/file0 bs=1M count="$FILE_SIZE_M"; rm -f "$MNT"/file0; done
 #

Reason:
 iput() and evict() run after the pages are written to the block
 device. If that writeback has not finished before the next write, the
 space of the "rm"ed file is not yet freed, which causes the above bug.

Fix:
 We can add the "-o flushoncommit" mount option to avoid the above bug,
 but it has a performance cost. Actually, we only need to wait for
 in-flight writes when no-space happens, which is what this patch does.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5683736..05747d2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3725,6 +3725,9 @@ commit_trans:
!atomic_read(&root->fs_info->open_ioctl_trans)) {
need_commit--;
 
+   if (need_commit > 0)
+   btrfs_wait_ordered_roots(fs_info, -1);
+
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
return PTR_ERR(trans);
-- 
1.8.5.1

[PATCH 2/9] btrfs: Fix tail space processing in find_free_dev_extent()

2015-04-03 Thread Zhaolei

From: Zhao Lei 

This is another cause of the NO_SPACE case.

When we have already found enough free space in the loop and saved it
to max_hole_start/size, and the tail space contains a pending extent,
the original, valid max_hole_start/size are reset in the retry.

As a result, find_free_dev_extent() returns less space than is actually
available, and causes NO_SPACE in the user program.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/volumes.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8222f6f..586824a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1136,11 +1136,11 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-again:
+
max_hole_start = search_start;
max_hole_size = 0;
-   hole_size = 0;
 
+again:
if (search_start >= search_end || device->is_tgtdev_for_dev_replace) {
ret = -ENOSPC;
goto out;
@@ -1233,21 +1233,23 @@ next:
 * allocated dev extents, and when shrinking the device,
 * search_end may be smaller than search_start.
 */
-   if (search_end > search_start)
+   if (search_end > search_start) {
hole_size = search_end - search_start;
 
-   if (hole_size > max_hole_size) {
-   max_hole_start = search_start;
-   max_hole_size = hole_size;
-   }
+   if (contains_pending_extent(trans, device, &search_start,
+   hole_size)) {
+   btrfs_release_path(path);
+   goto again;
+   }
 
-   if (contains_pending_extent(trans, device, &search_start, hole_size)) {
-   btrfs_release_path(path);
-   goto again;
+   if (hole_size > max_hole_size) {
+   max_hole_start = search_start;
+   max_hole_size = hole_size;
+   }
}
 
/* See above. */
-   if (hole_size < num_bytes)
+   if (max_hole_size < num_bytes)
ret = -ENOSPC;
else
ret = 0;
-- 
1.8.5.1
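
The intent of the patch above, keeping the best hole found so far and judging the result by that hole rather than by the last one examined, can be shown with a simplified user-space scan. This is not the kernel algorithm: the names are hypothetical, pending extents are treated as just more busy ranges (so no retry is needed), and the busy extents are assumed to be sorted, non-overlapping and inside the search range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent { uint64_t start, len; };   /* hypothetical busy range */

/*
 * Scan busy extents inside [search_start, search_end) and report the
 * largest hole. Returns 0 if that hole fits num_bytes, -1 otherwise
 * (standing in for -ENOSPC).
 */
static int find_largest_hole(const struct extent *busy, size_t n,
                             uint64_t search_start, uint64_t search_end,
                             uint64_t num_bytes,
                             uint64_t *hole_start, uint64_t *hole_len)
{
    uint64_t max_start = search_start, max_size = 0;
    uint64_t pos = search_start;

    assert(search_start <= search_end);
    for (size_t i = 0; i < n; i++) {
        if (busy[i].start > pos && busy[i].start - pos > max_size) {
            max_start = pos;
            max_size = busy[i].start - pos;
        }
        if (busy[i].start + busy[i].len > pos)
            pos = busy[i].start + busy[i].len;
    }
    /* Tail space only wins if it beats the best hole seen so far. */
    if (search_end > pos && search_end - pos > max_size) {
        max_start = pos;
        max_size = search_end - pos;
    }
    *hole_start = max_start;
    *hole_len = max_size;
    /* Judge by the best hole, not by the last hole examined. */
    return max_size >= num_bytes ? 0 : -1;
}
```

With busy ranges [100, 150) and [400, 990) in a device of size 1000, a request for 200 bytes succeeds from the hole at 150 even though the tail hole is only 10 bytes long.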

[PATCH 6/9] btrfs: Fix NO_SPACE bug caused by delayed-iput

2015-04-03 Thread Zhaolei

From: Zhao Lei 

Steps to reproduce:
  while true; do
    dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
    rm /btrfs_dir/file
    sync
  done

  And we'll see dd fail because btrfs returns NO_SPACE.

Reason:
  Normally, btrfs_commit_transaction() calls btrfs_run_delayed_iputs()
  at the end to free fs space for the next write, but sometimes this is
  not done in time: the btrfs-cleaner thread has already taken the
  delayed iputs off the list, but only does the iput() after the next
  write.

  This is the log:
  [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin

  [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
  [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
  [ 2569.087554] comm=sync func=btrfs_commit_transaction() end

  [ 2569.191081] comm=dd begin
  [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28

  [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
  [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
  ...
  [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
  [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end

Fix:
  Make btrfs_commit_transaction() wait, at its end, until the delayed
  iputs currently being run by btrfs-cleaner are done.

Test:
  Use a script similar to the above (more complex),
  before patch:
    7 failed in 100 * 20 loop.
  after patch:
    0 failed in 100 * 20 loop.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/ctree.h   | 1 +
 fs/btrfs/disk-io.c | 5 -
 fs/btrfs/transaction.c | 6 +-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f9c89ca..54d4d78 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1513,6 +1513,7 @@ struct btrfs_fs_info {
 
spinlock_t delayed_iput_lock;
struct list_head delayed_iputs;
+   struct rw_semaphore delayed_iput_sem;
 
/* this protects tree_mod_seq_list */
spinlock_t tree_mod_seq_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 639f266..df40f60 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1778,7 +1778,9 @@ static int cleaner_kthread(void *arg)
goto sleep;
}
 
+   down_read(&root->fs_info->delayed_iput_sem);
btrfs_run_delayed_iputs(root);
+   up_read(&root->fs_info->delayed_iput_sem);
btrfs_delete_unused_bgs(root->fs_info);
again = btrfs_clean_one_deleted_snapshot(root);
mutex_unlock(&root->fs_info->cleaner_mutex);
@@ -2241,11 +2243,12 @@ int open_ctree(struct super_block *sb,
spin_lock_init(&fs_info->qgroup_op_lock);
spin_lock_init(&fs_info->buffer_lock);
spin_lock_init(&fs_info->unused_bgs_lock);
-   mutex_init(&fs_info->unused_bg_unpin_mutex);
rwlock_init(&fs_info->tree_mod_log_lock);
+   mutex_init(&fs_info->unused_bg_unpin_mutex);
mutex_init(&fs_info->reloc_mutex);
mutex_init(&fs_info->delalloc_root_mutex);
seqlock_init(&fs_info->profiles_lock);
+   init_rwsem(&fs_info->delayed_iput_sem);
 
init_completion(&fs_info->kobj_unregister);
INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8be4278..d18991f 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2076,8 +2076,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
kmem_cache_free(btrfs_trans_handle_cachep, trans);
 
-   if (current != root->fs_info->transaction_kthread)
+   if (current != root->fs_info->transaction_kthread) {
btrfs_run_delayed_iputs(root);
+   /* make sure that all running delayed iput are done */
+   down_write(&root->fs_info->delayed_iput_sem);
+   up_write(&root->fs_info->delayed_iput_sem);
+   }
 
return ret;
 
-- 
1.8.5.1

[PATCH 0/9] btrfs: Fix no_space on dd and rm loop

2015-04-03 Thread Zhaolei

From: Zhao Lei 

I am resending this patch set with some changes:
1: Move a cleanup patch for btrfs_check_data_free_space() into
2: Rebased on top of v4.0-rc5
3: Fixed a lock problem reported by:
   'Tsutomu Itoh' 

Tested by running a busy dd and rm loop script 2000 times.
Confirmed that the problem occurs in v4.0-rc5 and no longer occurs with
this patchset applied.

I'll add xfstests for this case later.

This is available in the fix_no_space branch of my tree:
  git://github.com/zhaoleidd/btrfs.git
It is also included in the integration-for-chris branch of the above tree.

Thanks
Zhaolei


Zhao Lei (9):
  btrfs: fix condition of commit transaction
  btrfs: Fix tail space processing in find_free_dev_extent()
  btrfs: Adjust commit-transaction condition to avoid NO_SPACE more
  btrfs: Set relative data on clear btrfs_block_group_cache->pinned
  btrfs: add WARN_ON() to check is space_info op current
  btrfs: Fix NO_SPACE bug caused by delayed-iput
  btrfs: Support busy loop of write and delete
  btrfs: wait for delayed iputs on no space
  btrfs: cleanup unused alloc_chunk varible

 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/disk-io.c |  5 -
 fs/btrfs/extent-tree.c | 60 ++
 fs/btrfs/transaction.c |  6 -
 fs/btrfs/volumes.c | 24 +++-
 5 files changed, 69 insertions(+), 27 deletions(-)

-- 
1.8.5.1

[PATCH] btrfs: wait for delayed iputs on no space

2015-03-27 Thread Zhaolei

From: Zhao Lei 

btrfs will report no_space when we run the following write-and-delete
file loop:
 # FILE_SIZE_M=[ 75% of fs space ]
 # DEV=[ some dev ]
 # MNT=[ some dir ]
 #
 # mkfs.btrfs -f "$DEV"
 # mount -o nodatacow "$DEV" "$MNT"
 # for ((i = 0; i < 100; i++)); do dd if=/dev/zero of="$MNT"/file0 bs=1M count="$FILE_SIZE_M"; rm -f "$MNT"/file0; done
 #

Reason:
 iput() and evict() run after the pages are written to the block
 device. If that writeback has not finished before the next write, the
 space of the "rm"ed file is not yet freed, which causes the above bug.

Fix:
 We can add the "-o flushoncommit" mount option to avoid the above bug,
 but it has a performance cost. Actually, we only need to wait for
 in-flight writes when no-space happens, which is what this patch does.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6c1e211..94fb15f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3683,6 +3683,9 @@ commit_trans:
!atomic_read(&root->fs_info->open_ioctl_trans)) {
need_commit--;
 
+   if (need_commit > 0)
+   btrfs_wait_ordered_roots(fs_info, -1);
+
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
return PTR_ERR(trans);
-- 
1.8.5.1

[PATCH] btrfs: wait for delayed iputs on no space

2015-03-27 Thread Zhaolei

From: Zhao Lei 

This is another fix for the no_space case.

All patches for fixing the no_space bug are available in the
fix_no_space branch of:
  git://github.com/zhaoleidd/btrfs

Any suggestions are welcome.

Zhao Lei (1):
  btrfs: wait for delayed iputs on no space

 fs/btrfs/extent-tree.c | 3 +++
 1 file changed, 3 insertions(+)

-- 
1.8.5.1

[PATCH 5/6] btrfs: Use raid_write_end_io for scrub

2015-03-04 Thread Zhaolei

From: Zhao Lei 

There is no need for an additional end_io function for scrub; it can
use the existing raid_write_end_io() instead.

This patch also fixes some wrong comments.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/raid56.c | 36 
 1 file changed, 8 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0a40d07..2285e78 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -897,6 +897,7 @@ static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, int err, int uptodate)
 static void raid_write_end_io(struct bio *bio, int err)
 {
struct btrfs_raid_bio *rbio = bio->bi_private;
+   int max_errors;
 
if (err)
fail_bio_stripe(rbio, bio);
@@ -906,10 +907,13 @@ static void raid_write_end_io(struct bio *bio, int err)
if (!atomic_dec_and_test(&rbio->stripes_pending))
return;
 
-   err = 0;
+   /* OK, we have wrote all the stripes we need to. */
+   if (rbio->operation == BTRFS_RBIO_PARITY_SCRUB)
+   max_errors = 0;
+   else
+   max_errors = rbio->bbio->max_errors;
 
-   /* OK, we have read all the stripes we need to. */
-   if (atomic_read(&rbio->error) > rbio->bbio->max_errors)
+   if (atomic_read(&rbio->error) > max_errors)
err = -EIO;
 
rbio_orig_end_io(rbio, err, 0);
@@ -2276,30 +2280,6 @@ static int alloc_rbio_essential_pages(struct btrfs_raid_bio *rbio)
return 0;
 }
 
-/*
- * end io function used by finish_rmw.  When we finally
- * get here, we've written a full stripe
- */
-static void raid_write_parity_end_io(struct bio *bio, int err)
-{
-   struct btrfs_raid_bio *rbio = bio->bi_private;
-
-   if (err)
-   fail_bio_stripe(rbio, bio);
-
-   bio_put(bio);
-
-   if (!atomic_dec_and_test(&rbio->stripes_pending))
-   return;
-
-   err = 0;
-
-   if (atomic_read(&rbio->error))
-   err = -EIO;
-
-   rbio_orig_end_io(rbio, err, 0);
-}
-
 static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 int need_check)
 {
@@ -2452,7 +2432,7 @@ submit_write:
break;
 
bio->bi_private = rbio;
-   bio->bi_end_io = raid_write_parity_end_io;
+   bio->bi_end_io = raid_write_end_io;
BUG_ON(!test_bit(BIO_UPTODATE, &bio->bi_flags));
submit_bio(WRITE, bio);
}
-- 
1.8.5.1
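
The merged end_io logic in the patch above comes down to choosing an error threshold by operation type. The sketch below models only that choice: the enum and function names are hypothetical, and the err value passed in by the last bio is not modelled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the rbio operation types. */
enum rbio_op { OP_WRITE, OP_PARITY_SCRUB };

/*
 * Status once all stripe writes have completed: a parity scrub
 * tolerates no failed stripes, other writes tolerate up to
 * bbio_max_errors.
 */
static int write_end_status(enum rbio_op op, int errors,
                            int bbio_max_errors)
{
    assert(errors >= 0);
    int max_errors = (op == OP_PARITY_SCRUB) ? 0 : bbio_max_errors;

    return errors > max_errors ? -EIO : 0;
}
```

For instance, one failed stripe with bbio_max_errors = 1 is accepted for a normal write but yields -EIO for a parity scrub, which matches what the removed raid_write_parity_end_io() did.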

[PATCH 2/6] btrfs: Use unified stripe_page's index calculation

2015-03-04 Thread Zhaolei

From: Zhao Lei 

The current code uses several different index calculations for
stripe_page:
1: (rbio->stripe_len / PAGE_CACHE_SIZE) * stripe_index + page_index
2: DIV_ROUND_UP(rbio->stripe_len, PAGE_CACHE_SIZE) * stripe_index + page_index
3: DIV_ROUND_UP(rbio->stripe_len * stripe_index, PAGE_CACHE_SIZE) + page_index
...

They give the same result when stripe_len is aligned to PAGE_CACHE_SIZE,
which is why the current code works.
But we should still unify them to make the code cleaner.
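
The three calculations listed above can be compared directly in user space. This is a sketch: the function names are hypothetical, a 4096-byte page size is assumed, and stripe_npages is presumed to be the per-stripe page count rounded up.

```c
#include <assert.h>

#define PAGE_SZ 4096UL   /* assumed page size for this illustration */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Variant 1: pages per stripe rounded down. */
static unsigned long index_v1(unsigned long stripe_len,
                              unsigned long stripe, unsigned long page)
{
    return (stripe_len / PAGE_SZ) * stripe + page;
}

/* Variant 2: pages per stripe rounded up. */
static unsigned long index_v2(unsigned long stripe_len,
                              unsigned long stripe, unsigned long page)
{
    return DIV_ROUND_UP(stripe_len, PAGE_SZ) * stripe + page;
}

/* Variant 3: byte offset of the stripe rounded up to pages. */
static unsigned long index_v3(unsigned long stripe_len,
                              unsigned long stripe, unsigned long page)
{
    return DIV_ROUND_UP(stripe_len * stripe, PAGE_SZ) + page;
}

/* Unified form, as in rbio_stripe_page_index() from the diff. */
static unsigned long index_unified(unsigned long stripe_npages,
                                   unsigned long stripe,
                                   unsigned long page)
{
    assert(stripe_npages > 0);
    return stripe * stripe_npages + page;
}
```

With a page-aligned stripe_len of 65536, all variants agree (stripe 3, page 2 gives 50). With an unaligned stripe_len such as 6144, stripe 3 and page 0 give 3, 6 and 5 respectively, so the variants are only interchangeable in the aligned case.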

Signed-off-by: Zhao Lei 
---
 fs/btrfs/raid56.c | 43 +--
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0cfbfcf..645bc37 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -612,13 +612,28 @@ static int rbio_can_merge(struct btrfs_raid_bio *last,
return 1;
 }
 
+static int rbio_stripe_page_index(struct btrfs_raid_bio *rbio, int stripe,
+ int index)
+{
+   return stripe * rbio->stripe_npages + index;
+}
+
+/*
+ * these are just the pages from the rbio array, not from anything
+ * the FS sent down to us
+ */
+static struct page *rbio_stripe_page(struct btrfs_raid_bio *rbio, int stripe,
+int index)
+{
+   return rbio->stripe_pages[rbio_stripe_page_index(rbio, stripe, index)];
+}
+
 /*
  * helper to index into the pstripe
  */
 static struct page *rbio_pstripe_page(struct btrfs_raid_bio *rbio, int index)
 {
-   index += (rbio->nr_data * rbio->stripe_len) >> PAGE_CACHE_SHIFT;
-   return rbio->stripe_pages[index];
+   return rbio_stripe_page(rbio, rbio->nr_data, index);
 }
 
 /*
@@ -629,10 +644,7 @@ static struct page *rbio_qstripe_page(struct 
btrfs_raid_bio *rbio, int index)
 {
if (rbio->nr_data + 1 == rbio->real_stripes)
return NULL;
-
-   index += ((rbio->nr_data + 1) * rbio->stripe_len) >>
-   PAGE_CACHE_SHIFT;
-   return rbio->stripe_pages[index];
+   return rbio_stripe_page(rbio, rbio->nr_data + 1, index);
 }
 
 /*
@@ -944,8 +956,7 @@ static struct page *page_in_rbio(struct btrfs_raid_bio 
*rbio,
  */
 static unsigned long rbio_nr_pages(unsigned long stripe_len, int nr_stripes)
 {
-   unsigned long nr = stripe_len * nr_stripes;
-   return DIV_ROUND_UP(nr, PAGE_CACHE_SIZE);
+   return DIV_ROUND_UP(stripe_len, PAGE_CACHE_SIZE) * nr_stripes;
 }
 
 /*
@@ -1023,13 +1034,13 @@ static int alloc_rbio_pages(struct btrfs_raid_bio *rbio)
return 0;
 }
 
-/* allocate pages for just the p/q stripes */
+/* only allocate pages for p/q stripes */
 static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
 {
int i;
struct page *page;
 
-   i = (rbio->nr_data * rbio->stripe_len) >> PAGE_CACHE_SHIFT;
+   i = rbio_stripe_page_index(rbio, rbio->nr_data, 0);
 
for (; i < rbio->nr_pages; i++) {
if (rbio->stripe_pages[i])
@@ -1119,18 +1130,6 @@ static void validate_rbio_for_rmw(struct btrfs_raid_bio 
*rbio)
 }
 
 /*
- * these are just the pages from the rbio array, not from anything
- * the FS sent down to us
- */
-static struct page *rbio_stripe_page(struct btrfs_raid_bio *rbio, int stripe, 
int page)
-{
-   int index;
-   index = stripe * (rbio->stripe_len >> PAGE_CACHE_SHIFT);
-   index += page;
-   return rbio->stripe_pages[index];
-}
-
-/*
  * helper function to walk our bio list and populate the bio_pages array with
  * the result.  This seems expensive, but it is faster than constantly
  * searching through the bio list as we setup the IO in finish_rmw or stripe
-- 
1.8.5.1


[PATCH 3/6] btrfs: use rbio->nr_pages to reduce calculation

2015-03-04 Thread Zhaolei

From: Zhao Lei 

We can use rbio->stripe_npages to avoid unnecessary calculation in
many places in the code.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/raid56.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 645bc37..0d902ac 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1172,7 +1172,6 @@ static noinline void finish_rmw(struct btrfs_raid_bio 
*rbio)
 {
struct btrfs_bio *bbio = rbio->bbio;
void *pointers[rbio->real_stripes];
-   int stripe_len = rbio->stripe_len;
int nr_data = rbio->nr_data;
int stripe;
int pagenr;
@@ -1180,7 +1179,6 @@ static noinline void finish_rmw(struct btrfs_raid_bio 
*rbio)
int q_stripe = -1;
struct bio_list bio_list;
struct bio *bio;
-   int pages_per_stripe = stripe_len >> PAGE_CACHE_SHIFT;
int ret;
 
bio_list_init(&bio_list);
@@ -1223,7 +1221,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio 
*rbio)
else
clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
 
-   for (pagenr = 0; pagenr < pages_per_stripe; pagenr++) {
+   for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
struct page *p;
/* first collect one page from each data stripe */
for (stripe = 0; stripe < nr_data; stripe++) {
@@ -1265,7 +1263,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio 
*rbio)
 * everything else.
 */
for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
-   for (pagenr = 0; pagenr < pages_per_stripe; pagenr++) {
+   for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
struct page *page;
if (stripe < rbio->nr_data) {
page = page_in_rbio(rbio, stripe, pagenr, 1);
@@ -1289,7 +1287,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio 
*rbio)
if (!bbio->tgtdev_map[stripe])
continue;
 
-   for (pagenr = 0; pagenr < pages_per_stripe; pagenr++) {
+   for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
struct page *page;
if (stripe < rbio->nr_data) {
page = page_in_rbio(rbio, stripe, pagenr, 1);
@@ -1505,7 +1503,6 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
int bios_to_read = 0;
struct bio_list bio_list;
int ret;
-   int nr_pages = DIV_ROUND_UP(rbio->stripe_len, PAGE_CACHE_SIZE);
int pagenr;
int stripe;
struct bio *bio;
@@ -1524,7 +1521,7 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 * stripe
 */
for (stripe = 0; stripe < rbio->nr_data; stripe++) {
-   for (pagenr = 0; pagenr < nr_pages; pagenr++) {
+   for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
struct page *page;
/*
 * we want to find all the pages missing from
@@ -1801,7 +1798,6 @@ static void __raid_recover_end_io(struct btrfs_raid_bio 
*rbio)
int pagenr, stripe;
void **pointers;
int faila = -1, failb = -1;
-   int nr_pages = DIV_ROUND_UP(rbio->stripe_len, PAGE_CACHE_SIZE);
struct page *page;
int err;
int i;
@@ -1824,7 +1820,7 @@ static void __raid_recover_end_io(struct btrfs_raid_bio 
*rbio)
 
index_rbio_pages(rbio);
 
-   for (pagenr = 0; pagenr < nr_pages; pagenr++) {
+   for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
/*
 * Now we just use bitmap to mark the horizontal stripes in
 * which we have data when doing parity scrub.
@@ -1934,7 +1930,7 @@ pstripe:
 * other endio functions will fiddle the uptodate bits
 */
if (rbio->operation == BTRFS_RBIO_WRITE) {
-   for (i = 0;  i < nr_pages; i++) {
+   for (i = 0;  i < rbio->stripe_npages; i++) {
if (faila != -1) {
page = rbio_stripe_page(rbio, faila, i);
SetPageUptodate(page);
@@ -2027,7 +2023,6 @@ static int __raid56_parity_recover(struct btrfs_raid_bio 
*rbio)
int bios_to_read = 0;
struct bio_list bio_list;
int ret;
-   int nr_pages = DIV_ROUND_UP(rbio->stripe_len, PAGE_CACHE_SIZE);
int pagenr;
int stripe;
struct bio *bio;
@@ -2051,7 +2046,7 @@ static int __raid56_parity_recover(struct btrfs_raid_bio 
*rbio)
continue;
}
 
-   for (pagenr = 0; pagenr < nr_pages; pagenr++) {
+   for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
struct page *p;

[PATCH 1/6] btrfs: Fix calculation of rbio->dbitmap's size calculation

2015-03-04 Thread Zhaolei

From: Zhao Lei 

The current code tries to make the size of rbio->dbitmap aligned to
sizeof(long), but the implementation does not achieve this goal:
the size is aligned to sizeof(char) instead.
This patch fixes the calculation, and uses sizeof(long) instead of the
hard-coded "8" to increase compatibility.
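
The difference between the two size calculations can be sketched in
userspace as follows. This is only an illustration: the function names
are made up for this example, and BITS_PER_LONG is derived here from
sizeof(long) rather than taken from the kernel headers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Old calculation: divides the bit count by BITS_PER_LONG / 8, so the
 * result is a byte count that is not rounded up to a multiple of
 * sizeof(long).
 */
static size_t dbitmap_bytes_old(size_t nbits)
{
	assert(nbits > 0);
	return DIV_ROUND_UP(nbits, BITS_PER_LONG / 8);
}

/*
 * New calculation: number of longs needed to hold nbits, times the
 * size of a long, so the result is always a multiple of sizeof(long).
 */
static size_t dbitmap_bytes_new(size_t nbits)
{
	assert(nbits > 0);
	return DIV_ROUND_UP(nbits, BITS_PER_LONG) * sizeof(long);
}
```

For a single bit, the old formula yields 1 byte while the new one
yields one full long, which is what code accessing the bitmap as an
array of unsigned long expects.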

Signed-off-by: Zhao Lei 
---
 fs/btrfs/raid56.c | 4 ++--
 fs/btrfs/scrub.c  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 5264858..0cfbfcf 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -963,8 +963,8 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_root 
*root,
void *p;
 
rbio = kzalloc(sizeof(*rbio) + num_pages * sizeof(struct page *) * 2 +
-  DIV_ROUND_UP(stripe_npages, BITS_PER_LONG / 8),
-   GFP_NOFS);
+  DIV_ROUND_UP(stripe_npages, BITS_PER_LONG) *
+  sizeof(long), GFP_NOFS);
if (!rbio)
return ERR_PTR(-ENOMEM);
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ec57687..1221a56 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2737,7 +2737,7 @@ out:
 
 static inline int scrub_calc_parity_bitmap_len(int nsectors)
 {
-   return DIV_ROUND_UP(nsectors, BITS_PER_LONG) * (BITS_PER_LONG / 8);
+   return DIV_ROUND_UP(nsectors, BITS_PER_LONG) * sizeof(long);
 }
 
 static void scrub_parity_get(struct scrub_parity *sparity)
-- 
1.8.5.1


[PATCH 4/6] btrfs: Clear PageUptodate bit in alloc_rbio_parity_pages()

2015-03-04 Thread Zhaolei

From: Zhao Lei 

Signed-off-by: Zhao Lei 
---
 fs/btrfs/raid56.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0d902ac..0a40d07 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1049,6 +1049,7 @@ static int alloc_rbio_parity_pages(struct btrfs_raid_bio 
*rbio)
if (!page)
return -ENOMEM;
rbio->stripe_pages[i] = page;
+   ClearPageUptodate(page);
}
return 0;
 }
-- 
1.8.5.1


[PATCH 6/6] btrfs: Remove unused err = 0 line for raid_rmw_end_io()

2015-03-04 Thread Zhaolei

From: Zhao Lei 

Signed-off-by: Zhao Lei 
---
 fs/btrfs/raid56.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 2285e78..c087870 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1464,7 +1464,6 @@ static void raid_rmw_end_io(struct bio *bio, int err)
if (!atomic_dec_and_test(&rbio->stripes_pending))
return;
 
-   err = 0;
if (atomic_read(&rbio->error) > rbio->bbio->max_errors)
goto cleanup;
 
-- 
1.8.5.1


[PATCH] btrfs: Support busy loop of write and delete

2015-03-02 Thread Zhaolei

From: Zhao Lei 

Reproduce:
 while true; do
   dd if=/dev/zero of=/mnt/btrfs/file count=[75% fs_size]
   rm /mnt/btrfs/file
 done
 The loop above then fails with NO_SPACE.

This is a long-standing problem, present since the very beginning: the
delayed iputs queued by rm are not run.

We already have commit_transaction() in the alloc_space code, but it is
not triggered in the case above.
This patch triggers commit_transaction() to run the delayed iputs and
refresh the pinned space so that the write succeeds.
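
The commit decision introduced by the diff below can be modelled in
isolation as a rough sketch. struct commit_state and should_commit()
are hypothetical names used only here; the model covers just the
"commit or not" decision, not the retry of the allocation itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the retry state; need_commit starts at 2. */
struct commit_state {
	int need_commit;	/* remaining commit attempts */
};

/*
 * Decide whether the transaction should be committed on this attempt.
 * have_pinned_space mirrors the result of percpu_counter_compare():
 * a value >= 0 means the pinned bytes could satisfy the request.
 */
static bool should_commit(struct commit_state *st, int have_pinned_space,
			  bool have_free_bgs)
{
	assert(st->need_commit >= 0);

	if (st->need_commit == 0)
		return false;	/* no attempts left */

	st->need_commit--;

	/* The first attempt (need_commit still > 0) commits unconditionally. */
	return have_pinned_space >= 0 || have_free_bgs ||
	       st->need_commit > 0;
}
```

In this model the first attempt always commits, which gives the delayed
iputs a chance to run; the second attempt commits only if pinned space
or freed block groups make it worthwhile.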

It is based on the previous delayed-iput fix in commit_transaction(),
and needs to be applied on top of:
btrfs: Fix NO_SPACE bug caused by delayed-iput

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6c19033..6c1e211 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3599,13 +3599,13 @@ int btrfs_check_data_free_space(struct inode *inode, 
u64 bytes)
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
u64 used;
-   int ret = 0, committed = 0, have_pinned_space = 1, alloc_chunk = 1;
+   int ret = 0, need_commit = 2, have_pinned_space, alloc_chunk = 1;
 
/* make sure bytes are sectorsize aligned */
bytes = ALIGN(bytes, root->sectorsize);
 
if (btrfs_is_free_space_inode(inode)) {
-   committed = 1;
+   need_commit = 0;
ASSERT(current->journal_info);
}
 
@@ -3655,8 +3655,10 @@ alloc:
if (ret < 0) {
if (ret != -ENOSPC)
return ret;
-   else
+   else {
+   have_pinned_space = 1;
goto commit_trans;
+   }
}
 
if (!data_sinfo)
@@ -3670,23 +3672,23 @@ alloc:
 * allocation, and no removed chunk in current transaction,
 * don't bother committing the transaction.
 */
-   if (percpu_counter_compare(&data_sinfo->total_bytes_pinned,
-  used + bytes -
-  data_sinfo->total_bytes) < 0)
-   have_pinned_space = 0;
+   have_pinned_space = percpu_counter_compare(
+   &data_sinfo->total_bytes_pinned,
+   used + bytes - data_sinfo->total_bytes);
spin_unlock(&data_sinfo->lock);
 
/* commit the current transaction and try again */
 commit_trans:
-   if (!committed &&
+   if (need_commit &&
!atomic_read(&root->fs_info->open_ioctl_trans)) {
-   committed = 1;
+   need_commit--;
 
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
return PTR_ERR(trans);
-   if (have_pinned_space ||
-   trans->transaction->have_free_bgs) {
+   if (have_pinned_space >= 0 ||
+   trans->transaction->have_free_bgs ||
+   need_commit > 0) {
ret = btrfs_commit_transaction(trans, root);
if (ret)
return ret;
-- 
1.8.5.1


[PATCH v3 2/2] btrfs: add WARN_ON() to check is space_info op current

2015-02-25 Thread Zhaolei

From: Zhao Lei 

The calculation of space_info's values is somewhat complex and
bug-prone; add WARN_ON()s to help debugging.

Changelog v2->v3:
 Put WARN_ON()s under the ENOSPC_DEBUG mount option.
 Suggested by: David Sterba 

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index bfb9105..6c19033 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9415,9 +9415,19 @@ int btrfs_remove_block_group(struct btrfs_trans_handle 
*trans,
 
spin_lock(&block_group->space_info->lock);
list_del_init(&block_group->ro_list);
+
+   if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
+   WARN_ON(block_group->space_info->total_bytes
+   < block_group->key.offset);
+   WARN_ON(block_group->space_info->bytes_readonly
+   < block_group->key.offset);
+   WARN_ON(block_group->space_info->disk_total
+   < block_group->key.offset * factor);
+   }
block_group->space_info->total_bytes -= block_group->key.offset;
block_group->space_info->bytes_readonly -= block_group->key.offset;
block_group->space_info->disk_total -= block_group->key.offset * factor;
+
spin_unlock(&block_group->space_info->lock);
 
memcpy(&key, &block_group->key, sizeof(key));
-- 
1.8.5.1


[PATCH v3 0/2] btrfs: Set relative data on clear btrfs_block_group_cache->pinned

2015-02-25 Thread Zhaolei

From: Zhao Lei 

Changelog v2->v3:
 [PATCH 2/2] btrfs: add WARN_ON() to check is space_info op current
   Put WARN_ON()s under the ENOSPC_DEBUG mount option.
   Suggested by: David Sterba 

Changelog v1->v2:
 Drop the patch:
  Remove BUG_ON() when failed searching block_group_cache in unpin_extent_range()
 because Filipe Manana already fixed it in a better way.

Zhao Lei (2):
  btrfs: Set relative data on clear btrfs_block_group_cache->pinned
  btrfs: add WARN_ON() to check is space_info op current

 fs/btrfs/extent-tree.c | 20 
 1 file changed, 20 insertions(+)

-- 
1.8.5.1


[PATCH v3 1/2] btrfs: Set relative data on clear btrfs_block_group_cache->pinned

2015-02-25 Thread Zhaolei

From: Zhao Lei 

Bug1:
  space_info->bytes_readonly is set to a very large (negative) value in
  btrfs_remove_block_group().

Reason:
  The current code sets block_group_cache->pinned = 0 in
  btrfs_delete_unused_bgs(), but the pinned space is not added to
  space_info->bytes_readonly.

  Then, in btrfs_remove_block_group():
    block_group->space_info->bytes_readonly -= block_group->key.offset;
  we can see the following values in the trace:
    btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728

Bug2:
  space_info->total_bytes_pinned grows to a value larger than the fs size.
  In a 1.2G fs, we can get the following trace log:
  at first:
    ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496
  after some operations:
    ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648
  after some more operations:
    ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064
  ...

Reason:
  Similar to Bug1: space_info->total_bytes_pinned also needs to be
  adjusted in the code block above.
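
The accounting problem behind Bug1 can be reproduced with plain
unsigned arithmetic. The sketch below is a simplified userspace model,
not the kernel structures: the struct and function names are invented
for this example, and the pinned amount used in the usage note is an
assumed value chosen so that pinned + readonly equals the block group
size seen in the trace.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the space_info and block group counters. */
struct space_info_model {
	uint64_t bytes_pinned;
	uint64_t bytes_readonly;
	int64_t total_bytes_pinned;	/* models the percpu counter */
};

struct block_group_model {
	uint64_t size;		/* models block_group->key.offset */
	uint64_t pinned;
};

/* Old behaviour: pinned is simply zeroed, space_info is untouched. */
static void reset_pinned_old(struct block_group_model *bg)
{
	bg->pinned = 0;
}

/* Patched behaviour: move the pinned bytes over to readonly first. */
static void reset_pinned_fixed(struct space_info_model *si,
			       struct block_group_model *bg)
{
	assert(si->bytes_pinned >= bg->pinned);
	si->bytes_pinned -= bg->pinned;
	si->bytes_readonly += bg->pinned;
	si->total_bytes_pinned -= (int64_t)bg->pinned;
	bg->pinned = 0;
}

/* Models the subtraction done in btrfs_remove_block_group(). */
static void remove_block_group(struct space_info_model *si,
			       const struct block_group_model *bg)
{
	si->bytes_readonly -= bg->size;	/* wraps if readonly < size */
}
```

Starting from bytes_readonly=12582912 and a block group of 134217728
bytes, the old path wraps bytes_readonly around to a value close to
2^64, while the patched path ends with bytes_readonly == 0.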

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dc25e13..bfb9105 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9605,8 +9605,18 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info 
*fs_info)
mutex_unlock(&fs_info->unused_bg_unpin_mutex);
 
/* Reset pinned so btrfs_put_block_group doesn't complain */
+   spin_lock(&space_info->lock);
+   spin_lock(&block_group->lock);
+
+   space_info->bytes_pinned -= block_group->pinned;
+   space_info->bytes_readonly += block_group->pinned;
+   percpu_counter_add(&space_info->total_bytes_pinned,
+  -block_group->pinned);
block_group->pinned = 0;
 
+   spin_unlock(&block_group->lock);
+   spin_unlock(&space_info->lock);
+
/*
 * Btrfs_remove_chunk will abort the transaction if things go
 * horribly wrong.
-- 
1.8.5.1


[PATCH 0/1] btrfs: Fix NO_SPACE bug caused by delayed-iput

2015-02-25 Thread Zhaolei

From: Zhao Lei 

This is the last patch to fix the following write-failure case:
while true; do
  write a file to 75% fs size
  delete above file
  sync or sleep
done

The issue above has several causes, each fixed by one of the following
patches:
 Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole
  from Forrest Liu 
 btrfs: Fix out-of-space bug
  merged into v4.0-rc1
 btrfs: fix condition of commit transaction
 btrfs: Fix tail space processing in find_free_dev_extent()
 btrfs: Adjust commit-transaction condition to avoid NO_SPACE more
 btrfs: Fix NO_SPACE bug caused by delayed-iput
  this patch

These patches reduced the failure rate step by step, from 50 failures in
20 * 200 loops to 0 failures now.
We can now add a test case to xfstests for the scenario above.

Zhao Lei (1):
  btrfs: Fix NO_SPACE bug caused by delayed-iput

 fs/btrfs/ctree.h   | 1 +
 fs/btrfs/disk-io.c | 3 ++-
 fs/btrfs/inode.c   | 4 
 fs/btrfs/transaction.c | 6 +-
 4 files changed, 12 insertions(+), 2 deletions(-)

-- 
1.8.5.1


[PATCH 1/1] btrfs: Fix NO_SPACE bug caused by delayed-iput

2015-02-25 Thread Zhaolei

From: Zhao Lei 

Steps to reproduce:
  while true; do
dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
rm /btrfs_dir/file
sync
  done

  dd then fails because btrfs returns NO_SPACE.

Reason:
  Normally, btrfs_commit_transaction() calls btrfs_run_delayed_iputs()
  at its end to free fs space for the next write. Sometimes this does
  not happen in time, because the btrfs-cleaner thread has already taken
  the delayed iputs off the list but calls iput() only after the next
  write.

  The log shows this:
  [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin

  [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
  [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
  [ 2569.087554] comm=sync func=btrfs_commit_transaction() end

  [ 2569.191081] comm=dd begin
  [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28

  [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
  [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
  ...
  [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
  [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end

Fix:
  Make btrfs_commit_transaction() wait at its end until the currently
  running delayed iputs are done.

Test:
  Using a script similar to the one above (but more complex):
  before patch:
    7 failures in 100 * 20 loops.
  after patch:
    0 failures in 100 * 20 loops.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/ctree.h   | 1 +
 fs/btrfs/disk-io.c | 3 ++-
 fs/btrfs/inode.c   | 4 
 fs/btrfs/transaction.c | 6 +-
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 84c3b00..ec2dac0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1513,6 +1513,7 @@ struct btrfs_fs_info {
 
spinlock_t delayed_iput_lock;
struct list_head delayed_iputs;
+   struct rw_semaphore delayed_iput_sem;
 
/* this protects tree_mod_seq_list */
spinlock_t tree_mod_seq_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f79f385..1c0d8ec 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2241,11 +2241,12 @@ int open_ctree(struct super_block *sb,
spin_lock_init(&fs_info->qgroup_op_lock);
spin_lock_init(&fs_info->buffer_lock);
spin_lock_init(&fs_info->unused_bgs_lock);
-   mutex_init(&fs_info->unused_bg_unpin_mutex);
rwlock_init(&fs_info->tree_mod_log_lock);
+   mutex_init(&fs_info->unused_bg_unpin_mutex);
mutex_init(&fs_info->reloc_mutex);
mutex_init(&fs_info->delalloc_root_mutex);
seqlock_init(&fs_info->profiles_lock);
+   init_rwsem(&fs_info->delayed_iput_sem);
 
init_completion(&fs_info->kobj_unregister);
INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a85c23d..a396bb9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3087,6 +3087,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
if (empty)
return;
 
+   down_read(&fs_info->delayed_iput_sem);
+
spin_lock(&fs_info->delayed_iput_lock);
list_splice_init(&fs_info->delayed_iputs, &list);
spin_unlock(&fs_info->delayed_iput_lock);
@@ -3097,6 +3099,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
iput(delayed->inode);
kfree(delayed);
}
+
+   up_read(&fs_info->delayed_iput_sem);
 }
 
 /*
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 7e80f32..175cdef 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2068,8 +2068,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans,
 
kmem_cache_free(btrfs_trans_handle_cachep, trans);
 
-   if (current != root->fs_info->transaction_kthread)
+   if (current != root->fs_info->transaction_kthread) {
btrfs_run_delayed_iputs(root);
+   /* make sure that all running delayed iput are done */
+   down_write(&root->fs_info->delayed_iput_sem);
+   up_write(&root->fs_info->delayed_iput_sem);
+   }
 
return ret;
 
-- 
1.8.5.1


[PATCH v2 0/2] btrfs: Set relative data on clear btrfs_block_group_cache->pinned

2015-02-24 Thread Zhaolei

From: Zhao Lei 

Changelog v1->v2:
 Drop the patch:
  Remove BUG_ON() when failed searching block_group_cache in unpin_extent_range()
 because Filipe Manana already fixed it in a better way.

Zhao Lei (2):
  btrfs: Set relative data on clear btrfs_block_group_cache->pinned
  btrfs: add WARN_ON() to check is space_info op current

 fs/btrfs/extent-tree.c | 18 ++
 1 file changed, 18 insertions(+)

-- 
1.8.5.1


[PATCH 2/2] btrfs: add WARN_ON() to check is space_info op current

2015-02-24 Thread Zhaolei

From: Zhao Lei 

The calculation of space_info's values is somewhat complex and
bug-prone; add WARN_ON()s to help debugging.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3e7a4af..8b51eb5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9471,9 +9471,17 @@ int btrfs_remove_block_group(struct btrfs_trans_handle 
*trans,
 
spin_lock(&block_group->space_info->lock);
list_del_init(&block_group->ro_list);
+
+   WARN_ON(block_group->space_info->total_bytes
+   < block_group->key.offset);
+   WARN_ON(block_group->space_info->bytes_readonly
+   < block_group->key.offset);
+   WARN_ON(block_group->space_info->disk_total
+   < block_group->key.offset * factor);
block_group->space_info->total_bytes -= block_group->key.offset;
block_group->space_info->bytes_readonly -= block_group->key.offset;
block_group->space_info->disk_total -= block_group->key.offset * factor;
+
spin_unlock(&block_group->space_info->lock);
 
memcpy(&key, &block_group->key, sizeof(key));
-- 
1.8.5.1


[PATCH 1/2] btrfs: Set relative data on clear btrfs_block_group_cache->pinned

2015-02-24 Thread Zhaolei

From: Zhao Lei 

Bug1:
  space_info->bytes_readonly is set to a very large (negative) value in
  btrfs_remove_block_group().

Reason:
  The current code sets block_group_cache->pinned = 0 in
  btrfs_delete_unused_bgs(), but the pinned space is not added to
  space_info->bytes_readonly.

  Then, in btrfs_remove_block_group():
    block_group->space_info->bytes_readonly -= block_group->key.offset;
  we can see the following values in the trace:
    btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728

Bug2:
  space_info->total_bytes_pinned grows to a value larger than the fs size.
  In a 1.2G fs, we can get the following trace log:
  at first:
    ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496
  after some operations:
    ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648
  after some more operations:
    ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064
  ...

Reason:
  Similar to Bug1: space_info->total_bytes_pinned also needs to be
  adjusted in the code block above.

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4ffce64..3e7a4af 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9645,8 +9645,18 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info 
*fs_info)
}
 
/* Reset pinned so btrfs_put_block_group doesn't complain */
+   spin_lock(&space_info->lock);
+   spin_lock(&block_group->lock);
+
+   space_info->bytes_pinned -= block_group->pinned;
+   space_info->bytes_readonly += block_group->pinned;
+   percpu_counter_add(&space_info->total_bytes_pinned,
+  -block_group->pinned);
block_group->pinned = 0;
 
+   spin_unlock(&block_group->lock);
+   spin_unlock(&space_info->lock);
+
/*
 * Btrfs_remove_chunk will abort the transaction if things go
 * horribly wrong.
-- 
1.8.5.1

