Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-11-02 Thread Christoph Anton Mitterer
On Sat, 2018-11-03 at 09:34 +0800, Su Yue wrote:
> Sorry for the late reply because I'm busy with other things.
No worries :-)


> I just looked through the related code and found the bug.
> The patches can fix it, so there is no need to do more tests.
> Thanks for your tests and patience. :)
Thanks for fixing :-)


Best wishes,
Chris.



Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-11-02 Thread Su Yue




On 2018/11/2 10:10 PM, Christoph Anton Mitterer wrote:

> Hey Su.



Sorry for the late reply because I'm busy with other things.


> Anything further I need to do in this matter, or can I consider it
> "solved" and you won't need further testing on my side, but just PR the
> patches of that branch? :-)



I just looked through the related code and found the bug.
The patches can fix it, so there is no need to do more tests.
Thanks for your tests and patience. :)


In the previous output of the debug version, we can see that the @ret code
is 524296, which is DIR_ITEM_MISMATCH (1 << 3) | DIR_INDEX_MISMATCH (1 << 19).

In btrfs-progs v4.17, the function check_inode_extref() passes a u64
@mode as the last parameter of find_dir_item().
However, find_dir_item() is defined as:
static int find_dir_item(struct btrfs_root *root, struct btrfs_key *key,
 struct btrfs_key *location_key, char *name,
 u32 namelen, u8 file_type);

The type of the last argument is u8, not u64.

So while checking files with inode extrefs, the passed mode is truncated
to u8; (imode != file_type) then holds, and find_dir_item() thinks it
found DIR_ITEM_MISMATCH or DIR_INDEX_MISMATCH.
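
Below is a minimal standalone sketch of that truncation (hypothetical
values and function name, not the actual btrfs-progs code):

#include <stdio.h>
#include <stdint.h>

/* Stand-in for find_dir_item(): the last parameter is u8, as above. */
static void check_type(uint8_t file_type)
{
        printf("file_type seen by callee: 0x%02x\n", file_type);
}

int main(void)
{
        uint64_t imode = 0100644;       /* S_IFREG | 0644, a regular file */

        /*
         * The u64 is silently truncated to its low 8 bits (0xa4 here),
         * so the comparison against the stored type cannot match and
         * the mismatch bits end up in @ret:
         * (1 << 3) | (1 << 19) == 8 + 524288 == 524296.
         */
        check_type(imode);
        return 0;
}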

Thanks,
Su


> Thanks,
> Chris.

> On Sat, 2018-10-27 at 14:15 +0200, Christoph Anton Mitterer wrote:
>> Hey.
>>
>>
>> Without the last patches on 4.17:
>>
>> checking extents
>> checking free space cache
>> checking fs roots
>> ERROR: errors found in fs roots
>> Checking filesystem on /dev/mapper/system
>> UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
>> found 619543498752 bytes used, error(s) found
>> total csum bytes: 602382204
>> total tree bytes: 2534309888
>> total fs tree bytes: 1652097024
>> total extent tree bytes: 160432128
>> btree space waste bytes: 459291608
>> file data blocks allocated: 7334036647936
>>   referenced 730839187456
>>
>>
>> With the last patches, on 4.17:
>>
>> checking extents
>> checking free space cache
>> checking fs roots
>> checking only csum items (without verifying data)
>> checking root refs
>> Checking filesystem on /dev/mapper/system
>> UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
>> found 619543498752 bytes used, no error found
>> total csum bytes: 602382204
>> total tree bytes: 2534309888
>> total fs tree bytes: 1652097024
>> total extent tree bytes: 160432128
>> btree space waste bytes: 459291608
>> file data blocks allocated: 7334036647936
>>   referenced 730839187456
>>
>>
>> Cheers,
>> Chris.





Re: Salvage files from broken btrfs

2018-11-02 Thread Qu Wenruo


On 2018/11/3 1:18 AM, M. Klingmann wrote:
> On 02.11.2018 at 15:45 Qu Wenruo wrote:
>> On 2018/11/2 10:30 PM, M. Klingmann wrote:
>>> On 31.10.2018 at 01:03 Qu Wenruo wrote:
>>>> My plan for such recovery is:
>>>>
>>>> 1) btrfs ins dump-super to make sure system chunk array is valid
>>>> 2) btrfs-find-root to find any valid chunk tree blocks
>>>> 3) pass that chunk tree bytenr to btrfs-restore
>>>>    Unfortunately, btrfs-restore doesn't support specifying chunk root
>>>>    yet. But it's pretty easy to add such support.
>>>>
>>>> So, please provide the "btrfs ins dump-super -Ffa" output to start with.
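
A minimal sketch of those steps as commands, assuming the sdcard.iso
image used later in this thread (step 3 is the part btrfs-restore does
not support yet):

# 1) verify the system chunk array in the superblock
btrfs inspect-internal dump-super -Ffa sdcard.iso
# 2) search for valid tree blocks even with a corrupted chunk tree;
#    -o filters by tree objectid
btrfs-find-root -o 5 sdcard.iso
# 3) pass the chunk tree bytenr found above to btrfs-restore -- this
#    needs the support mentioned above to be added first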
>>> Following your plan, I did 1) and 2).
>>> As 2) failed (see below), is there anything I can do to find the tree
>>> bytenr to supply btrfs-restore with it?
>>>
>>> 1) Here's the output given by "btrfs-show-super -Ffa":
>>>
>>> superblock: bytenr=65536, device=sdcard.iso
>>> -
>>> csum            0xb8e15dd7 [match]
> [snip]
>>> 2) "btrfs-find-root" yields "Couldn't read chunk root; Open ctree failed".
>> It's not plain "btrfs-find-root" but "btrfs-find-root -o 5".
>>
>> And you should use btrfs-progs v4.17.1, not the old v4.4.
>> The ability to continue the search even if the chunk tree is corrupted
>> was added in v4.5, and I strongly recommend using the latest (v4.17.1)
>> for its many fixes and extra debug output.
>>
>> If you can't find any handy way to update btrfs-progs, you could boot
>> the Archlinux ISO as a rescue OS to get the latest btrfs-progs.
> 
> Using Archlinux is in fact the easiest way to use version 4.17.1
> (Archlinux ISO of 2018-11-01).
> 
> Here's the output from "btrfs-find-root sdcard.iso":
> 
> WARNING: cannot read chunk root, continue anyway
> Superblock thinks the generation is 1757933
> Superblock thinks the level is 0
> 
> Here's the output using "btrfs-find-root -o 5 sdcard.iso":
> 
> WARNING: cannot read chunk root, continue anyway
> Superblock doesn't contain generation info for root 5
> Superblock doesn't contain the level info for root 5

No other output at all?

That means the whole 8M range of the system chunk got corrupted.
Thus there is really no way to get any meaningful data out of the
filesystem, unfortunately.

Thanks,
Qu

> 
>> For 3), I could easily add such a feature to btrfs-restore, or just
>> manually patch your superblock to continue.
>> So as soon as your "btrfs-find-root -o 5" gets some valid output, I
>> could continue the work.
>>
> Thank you.
> 



signature.asc
Description: OpenPGP digital signature


[PATCH 5/7] btrfs: test swap files on multiple devices

2018-11-02 Thread Omar Sandoval
From: Omar Sandoval 

Swap files currently need to exist on exactly one device in exactly one
place.

Signed-off-by: Omar Sandoval 
---
 tests/btrfs/175 | 73 +
 tests/btrfs/175.out |  8 +
 tests/btrfs/group   |  1 +
 3 files changed, 82 insertions(+)
 create mode 100755 tests/btrfs/175
 create mode 100644 tests/btrfs/175.out

diff --git a/tests/btrfs/175 b/tests/btrfs/175
new file mode 100755
index ..64afc4f0
--- /dev/null
+++ b/tests/btrfs/175
@@ -0,0 +1,73 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Facebook.  All Rights Reserved.
+#
+# FS QA Test 175
+#
+# Test swap file activation on multiple devices.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+. ./common/rc
+. ./common/filter
+
+rm -f $seqres.full
+
+_supported_fs generic
+_supported_os Linux
+_require_scratch_dev_pool 2
+_require_scratch_swapfile
+
+cycle_swapfile() {
+   local sz=${1:-$(($(get_page_size) * 10))}
+   _format_swapfile "$SCRATCH_MNT/swap" "$sz"
+   swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+   swapoff "$SCRATCH_MNT/swap" > /dev/null 2>&1
+}
+
+echo "RAID 1"
+_scratch_pool_mkfs -d raid1 -m raid1 >> $seqres.full 2>&1
+_scratch_mount
+cycle_swapfile
+_scratch_unmount
+
+echo "DUP"
+_scratch_pool_mkfs -d dup -m dup >> $seqres.full 2>&1
+_scratch_mount
+cycle_swapfile
+_scratch_unmount
+
+echo "Single on multiple devices"
+_scratch_pool_mkfs -d single -m raid1 -b $((1024 * 1024 * 1024)) >> $seqres.full 2>&1
+_scratch_mount
+# Each device is only 1 GB, so 1.5 GB must be split across multiple devices.
+cycle_swapfile $((3 * 1024 * 1024 * 1024 / 2))
+_scratch_unmount
+
+echo "Single on one device"
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+# Create the swap file, then add the device. That way we know it's all on one
+# device.
+_format_swapfile "$SCRATCH_MNT/swap" $(($(get_page_size) * 10))
+scratch_dev2="$(echo "${SCRATCH_DEV_POOL}" | awk '{ print $2 }')"
+$BTRFS_UTIL_PROG device add -f "$scratch_dev2" "$SCRATCH_MNT"
+swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+swapoff "$SCRATCH_MNT/swap" > /dev/null 2>&1
+_scratch_unmount
+
+status=0
+exit
diff --git a/tests/btrfs/175.out b/tests/btrfs/175.out
new file mode 100644
index ..ce2e5992
--- /dev/null
+++ b/tests/btrfs/175.out
@@ -0,0 +1,8 @@
+QA output created by 175
+RAID 1
+swapon: SCRATCH_MNT/swap: swapon failed: Invalid argument
+DUP
+swapon: SCRATCH_MNT/swap: swapon failed: Invalid argument
+Single on multiple devices
+swapon: SCRATCH_MNT/swap: swapon failed: Invalid argument
+Single on one device
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 2e10f7df..b6160b72 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -177,3 +177,4 @@
 172 auto quick punch
 173 auto quick swap
 174 auto quick swap
+175 auto quick swap
-- 
2.19.1



[PATCH 4/7] btrfs: test invalid operations on a swap file

2018-11-02 Thread Omar Sandoval
From: Omar Sandoval 

Btrfs forbids some operations which should not be done on a swap file.

Signed-off-by: Omar Sandoval 
---
 tests/btrfs/174 | 66 +
 tests/btrfs/174.out | 10 +++
 tests/btrfs/group   |  1 +
 3 files changed, 77 insertions(+)
 create mode 100755 tests/btrfs/174
 create mode 100644 tests/btrfs/174.out

diff --git a/tests/btrfs/174 b/tests/btrfs/174
new file mode 100755
index ..a26e6669
--- /dev/null
+++ b/tests/btrfs/174
@@ -0,0 +1,66 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Facebook.  All Rights Reserved.
+#
+# FS QA Test 174
+#
+# Test restrictions on operations that can be done on an active swap file
+# specific to Btrfs.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+. ./common/rc
+. ./common/filter
+
+rm -f $seqres.full
+
+_supported_fs generic
+_supported_os Linux
+_require_scratch_swapfile
+
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+
+$BTRFS_UTIL_PROG subvol create "$SCRATCH_MNT/swapvol" >> $seqres.full
+swapfile="$SCRATCH_MNT/swapvol/swap"
+_format_swapfile "$swapfile" $(($(get_page_size) * 10))
+swapon "$swapfile"
+
+# Turning off nocow doesn't do anything because the file is not empty, not
+# because the file is a swap file, but make sure this works anyway.
+echo "Disable nocow"
+$CHATTR_PROG -C "$swapfile"
+lsattr -l "$swapfile" | _filter_scratch | _filter_spaces
+
+# Compression we reject outright.
+echo "Enable compression"
+$CHATTR_PROG +c "$swapfile" 2>&1 | grep -o "Text file busy"
+lsattr -l "$swapfile" | _filter_scratch | _filter_spaces
+
+echo "Snapshot"
+$BTRFS_UTIL_PROG subvol snap "$SCRATCH_MNT/swapvol" \
+   "$SCRATCH_MNT/swapsnap" 2>&1 | grep -o "Text file busy"
+
+echo "Defrag"
+# We pass the -c (compress) flag to force defrag even if the file isn't
+# fragmented.
+$BTRFS_UTIL_PROG filesystem defrag -c "$swapfile" 2>&1 | grep -o "Text file busy"
+
+swapoff "$swapfile"
+_scratch_unmount
+
+status=0
+exit
diff --git a/tests/btrfs/174.out b/tests/btrfs/174.out
new file mode 100644
index ..bc24f1fb
--- /dev/null
+++ b/tests/btrfs/174.out
@@ -0,0 +1,10 @@
+QA output created by 174
+Disable nocow
+SCRATCH_MNT/swapvol/swap No_COW
+Enable compression
+Text file busy
+SCRATCH_MNT/swapvol/swap No_COW
+Snapshot
+Text file busy
+Defrag
+Text file busy
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 3525014f..2e10f7df 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -176,3 +176,4 @@
 171 auto quick qgroup
 172 auto quick punch
 173 auto quick swap
+174 auto quick swap
-- 
2.19.1



[PATCH 6/7] btrfs: test device add/remove/replace with an active swap file

2018-11-02 Thread Omar Sandoval
From: Omar Sandoval 

Make sure that we don't remove or replace a device with an active swap
file but can add, remove, and replace other devices.

Signed-off-by: Omar Sandoval 
---
 tests/btrfs/176 | 82 +
 tests/btrfs/176.out |  5 +++
 tests/btrfs/group   |  1 +
 3 files changed, 88 insertions(+)
 create mode 100755 tests/btrfs/176
 create mode 100644 tests/btrfs/176.out

diff --git a/tests/btrfs/176 b/tests/btrfs/176
new file mode 100755
index ..1e576149
--- /dev/null
+++ b/tests/btrfs/176
@@ -0,0 +1,82 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Facebook.  All Rights Reserved.
+#
+# FS QA Test 176
+#
+# Test device remove/replace with an active swap file.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch_dev_pool 3
+_require_scratch_swapfile
+
+# We check the filesystem manually because we move devices around.
+rm -f "${RESULT_DIR}/require_scratch"
+
+scratch_dev1="$(echo "${SCRATCH_DEV_POOL}" | awk '{ print $1 }')"
+scratch_dev2="$(echo "${SCRATCH_DEV_POOL}" | awk '{ print $2 }')"
+scratch_dev3="$(echo "${SCRATCH_DEV_POOL}" | awk '{ print $3 }')"
+
+echo "Remove device"
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+_format_swapfile "$SCRATCH_MNT/swap" $(($(get_page_size) * 10))
+$BTRFS_UTIL_PROG device add -f "$scratch_dev2" "$SCRATCH_MNT"
+swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+# We know the swap file is on device 1 because we added device 2 after it was
+# already created.
+$BTRFS_UTIL_PROG device delete "$scratch_dev1" "$SCRATCH_MNT" 2>&1 | grep -o "Text file busy"
+# Deleting/readding device 2 should still work.
+$BTRFS_UTIL_PROG device delete "$scratch_dev2" "$SCRATCH_MNT"
+$BTRFS_UTIL_PROG device add -f "$scratch_dev2" "$SCRATCH_MNT"
+swapoff "$SCRATCH_MNT/swap" > /dev/null 2>&1
+# Deleting device 1 should work again after swapoff.
+$BTRFS_UTIL_PROG device delete "$scratch_dev1" "$SCRATCH_MNT"
+_scratch_unmount
+_check_scratch_fs "$scratch_dev2"
+
+echo "Replace device"
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+_format_swapfile "$SCRATCH_MNT/swap" $(($(get_page_size) * 10))
+$BTRFS_UTIL_PROG device add -f "$scratch_dev2" "$SCRATCH_MNT"
+swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+# Again, we know the swap file is on device 1.
+$BTRFS_UTIL_PROG replace start -fB "$scratch_dev1" "$scratch_dev3" 
"$SCRATCH_MNT" 2>&1 | grep -o "Text file busy"
+# Replacing device 2 should still work.
+$BTRFS_UTIL_PROG replace start -fB "$scratch_dev2" "$scratch_dev3" 
"$SCRATCH_MNT"
+swapoff "$SCRATCH_MNT/swap" > /dev/null 2>&1
+# Replacing device 1 should work again after swapoff.
+$BTRFS_UTIL_PROG replace start -fB "$scratch_dev1" "$scratch_dev2" 
"$SCRATCH_MNT"
+_scratch_unmount
+_check_scratch_fs "$scratch_dev2"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/176.out b/tests/btrfs/176.out
new file mode 100644
index ..5c99e0fd
--- /dev/null
+++ b/tests/btrfs/176.out
@@ -0,0 +1,5 @@
+QA output created by 176
+Remove device
+Text file busy
+Replace device
+Text file busy
diff --git a/tests/btrfs/group b/tests/btrfs/group
index b6160b72..3562420b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -178,3 +178,4 @@
 173 auto quick swap
 174 auto quick swap
 175 auto quick swap
+176 auto quick swap
-- 
2.19.1



[PATCH 0/7] fstests: test Btrfs swapfile support

2018-11-02 Thread Omar Sandoval
From: Omar Sandoval 

This series fixes a couple of generic swapfile tests and adds some
Btrfs-specific swapfile tests. Btrfs swapfile support is scheduled for
4.21 [1].

1: https://www.spinics.net/lists/linux-btrfs/msg83454.html

Thanks!

Omar Sandoval (7):
  generic/{472,496,497}: fix $seeqres typo
  generic/{472,496}: fix swap file creation on Btrfs
  btrfs: test swap file activation restrictions
  btrfs: test invalid operations on a swap file
  btrfs: test swap files on multiple devices
  btrfs: test device add/remove/replace with an active swap file
  btrfs: test balance and resize with an active swap file

 tests/btrfs/173 | 55 ++
 tests/btrfs/173.out |  5 +++
 tests/btrfs/174 | 66 
 tests/btrfs/174.out | 10 ++
 tests/btrfs/175 | 73 
 tests/btrfs/175.out |  8 +
 tests/btrfs/176 | 82 +
 tests/btrfs/176.out |  5 +++
 tests/btrfs/177 | 64 +++
 tests/btrfs/177.out |  6 
 tests/btrfs/group   |  5 +++
 tests/generic/472   | 16 -
 tests/generic/496   |  8 ++---
 tests/generic/497   |  2 +-
 14 files changed, 391 insertions(+), 14 deletions(-)
 create mode 100755 tests/btrfs/173
 create mode 100644 tests/btrfs/173.out
 create mode 100755 tests/btrfs/174
 create mode 100644 tests/btrfs/174.out
 create mode 100755 tests/btrfs/175
 create mode 100644 tests/btrfs/175.out
 create mode 100755 tests/btrfs/176
 create mode 100644 tests/btrfs/176.out
 create mode 100755 tests/btrfs/177
 create mode 100644 tests/btrfs/177.out

-- 
2.19.1



[PATCH 7/7] btrfs: test balance and resize with an active swap file

2018-11-02 Thread Omar Sandoval
From: Omar Sandoval 

Make sure we don't shrink the device past an active swap file, but allow
shrinking otherwise, as well as growing and balance.

Signed-off-by: Omar Sandoval 
---
 tests/btrfs/177 | 64 +
 tests/btrfs/177.out |  6 +
 tests/btrfs/group   |  1 +
 3 files changed, 71 insertions(+)
 create mode 100755 tests/btrfs/177
 create mode 100644 tests/btrfs/177.out

diff --git a/tests/btrfs/177 b/tests/btrfs/177
new file mode 100755
index ..12dad8fc
--- /dev/null
+++ b/tests/btrfs/177
@@ -0,0 +1,64 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Facebook.  All Rights Reserved.
+#
+# FS QA Test 177
+#
+# Test relocation (balance and resize) with an active swap file.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+. ./common/rc
+. ./common/filter
+. ./common/btrfs
+
+rm -f $seqres.full
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch_swapfile
+
+swapfile="$SCRATCH_MNT/swap"
+
+# First, create a 1GB filesystem and fill it up.
+_scratch_mkfs_sized $((1024 * 1024 * 1024)) >> $seqres.full 2>&1
+_scratch_mount
+dd if=/dev/zero of="$SCRATCH_MNT/fill" bs=1024k >> $seqres.full 2>&1
+# Now add more space and create a swap file. We know that the first 1GB of the
+# filesystem was used, so the swap file must be in the new part of the
+# filesystem.
+$BTRFS_UTIL_PROG filesystem resize 2G "$SCRATCH_MNT" | _filter_scratch
+_format_swapfile "$swapfile" $((32 * 1024 * 1024))
+swapon "$swapfile"
+# Add even more space which we know is unused.
+$BTRFS_UTIL_PROG filesystem resize 3G "$SCRATCH_MNT" | _filter_scratch
+# Free up the first 1GB of the filesystem.
+rm -f "$SCRATCH_MNT/fill"
+# Get rid of empty block groups and also make sure that balance skips block
+# groups containing active swap files.
+_run_btrfs_balance_start "$SCRATCH_MNT"
+# Shrink away the unused space.
+$BTRFS_UTIL_PROG filesystem resize 2G "$SCRATCH_MNT" | _filter_scratch
+# Try to shrink away the area occupied by the swap file, which should fail.
+$BTRFS_UTIL_PROG filesystem resize 1G "$SCRATCH_MNT" 2>&1 | grep -o "Text file busy"
+swapoff "$swapfile"
+# It should work again after swapoff.
+$BTRFS_UTIL_PROG filesystem resize 1G "$SCRATCH_MNT" | _filter_scratch
+_scratch_unmount
+
+status=0
+exit
diff --git a/tests/btrfs/177.out b/tests/btrfs/177.out
new file mode 100644
index ..6ced01da
--- /dev/null
+++ b/tests/btrfs/177.out
@@ -0,0 +1,6 @@
+QA output created by 177
+Resize 'SCRATCH_MNT' of '2G'
+Resize 'SCRATCH_MNT' of '3G'
+Resize 'SCRATCH_MNT' of '2G'
+Text file busy
+Resize 'SCRATCH_MNT' of '1G'
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 3562420b..0b62e58a 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -179,3 +179,4 @@
 174 auto quick swap
 175 auto quick swap
 176 auto quick swap
+177 auto quick swap
-- 
2.19.1



[PATCH 1/7] generic/{472,496,497}: fix $seeqres typo

2018-11-02 Thread Omar Sandoval
From: Omar Sandoval 

Signed-off-by: Omar Sandoval 
---
 tests/generic/472 | 2 +-
 tests/generic/496 | 2 +-
 tests/generic/497 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/generic/472 b/tests/generic/472
index c74d6c70..04ed3e73 100755
--- a/tests/generic/472
+++ b/tests/generic/472
@@ -51,7 +51,7 @@ swapfile_cycle() {
$CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
"$here/src/mkswap" $swapfile >> $seqres.full
"$here/src/swapon" $swapfile 2>&1 | _filter_scratch
-   swapoff $swapfile 2>> $seeqres.full
+   swapoff $swapfile 2>> $seqres.full
rm -f $swapfile
 }
 
diff --git a/tests/generic/496 b/tests/generic/496
index 1c9651ad..968b8012 100755
--- a/tests/generic/496
+++ b/tests/generic/496
@@ -53,7 +53,7 @@ swapfile_cycle() {
$CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
"$here/src/mkswap" $swapfile >> $seqres.full
"$here/src/swapon" $swapfile 2>&1 | _filter_scratch
-   swapoff $swapfile 2>> $seeqres.full
+   swapoff $swapfile 2>> $seqres.full
rm -f $swapfile
 }
 
diff --git a/tests/generic/497 b/tests/generic/497
index 584af58a..3d5502ef 100755
--- a/tests/generic/497
+++ b/tests/generic/497
@@ -53,7 +53,7 @@ swapfile_cycle() {
$CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
"$here/src/mkswap" $swapfile >> $seqres.full
"$here/src/swapon" $swapfile 2>&1 | _filter_scratch
-   swapoff $swapfile 2>> $seeqres.full
+   swapoff $swapfile 2>> $seqres.full
rm -f $swapfile
 }
 
-- 
2.19.1



[PATCH 2/7] generic/{472,496}: fix swap file creation on Btrfs

2018-11-02 Thread Omar Sandoval
From: Omar Sandoval 

The swap file must be set nocow before it is written to; otherwise, the
attribute is ignored and Btrfs refuses to activate the file as swap.
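
For illustration, the required ordering looks roughly like this
(hypothetical paths, not the fstests helper itself):

touch /mnt/swapfile                 # must exist and still be empty
chmod 0600 /mnt/swapfile
chattr +C /mnt/swapfile             # nocow only takes effect on an empty file
dd if=/dev/zero of=/mnt/swapfile bs=1M count=128
mkswap /mnt/swapfile
swapon /mnt/swapfile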

Fixes: 25ce9740065e ("generic: test swapfile creation, activation, and deactivation")
Signed-off-by: Omar Sandoval 
---
 tests/generic/472 | 14 ++
 tests/generic/496 |  6 +++---
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/tests/generic/472 b/tests/generic/472
index 04ed3e73..aba4a007 100755
--- a/tests/generic/472
+++ b/tests/generic/472
@@ -42,13 +42,15 @@ _scratch_mount >>$seqres.full 2>&1
 
 swapfile=$SCRATCH_MNT/swap
 len=$((2 * 1048576))
-page_size=$(get_page_size)
 
 swapfile_cycle() {
local swapfile="$1"
+   local len="$2"
 
+   touch $swapfile
# Swap files must be nocow on Btrfs.
$CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
+   _pwrite_byte 0x58 0 $len $swapfile >> $seqres.full
"$here/src/mkswap" $swapfile >> $seqres.full
"$here/src/swapon" $swapfile 2>&1 | _filter_scratch
swapoff $swapfile 2>> $seqres.full
@@ -57,20 +59,16 @@ swapfile_cycle() {
 
 # Create a regular swap file
 echo "regular swap" | tee -a $seqres.full
-_pwrite_byte 0x58 0 $len $swapfile >> $seqres.full
-swapfile_cycle $swapfile
+swapfile_cycle $swapfile $len
 
 # Create a swap file with a little too much junk on the end
 echo "too long swap" | tee -a $seqres.full
-_pwrite_byte 0x58 0 $((len + 3)) $swapfile >> $seqres.full
-swapfile_cycle $swapfile
+swapfile_cycle $swapfile $((len + 3))
 
 # Create a ridiculously small swap file.  Each swap file must have at least
 # two pages after the header page.
 echo "tiny swap" | tee -a $seqres.full
-tiny_len=$((page_size * 3))
-_pwrite_byte 0x58 0 $tiny_len $swapfile >> $seqres.full
-swapfile_cycle $swapfile
+swapfile_cycle $swapfile $(($(get_page_size) * 3))
 
 status=0
 exit
diff --git a/tests/generic/496 b/tests/generic/496
index 968b8012..3083eef0 100755
--- a/tests/generic/496
+++ b/tests/generic/496
@@ -49,8 +49,6 @@ page_size=$(get_page_size)
 swapfile_cycle() {
local swapfile="$1"
 
-   # Swap files must be nocow on Btrfs.
-   $CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
"$here/src/mkswap" $swapfile >> $seqres.full
"$here/src/swapon" $swapfile 2>&1 | _filter_scratch
swapoff $swapfile 2>> $seqres.full
@@ -59,8 +57,10 @@ swapfile_cycle() {
 
 # Create a fallocated swap file
 echo "fallocate swap" | tee -a $seqres.full
-$XFS_IO_PROG -f -c "falloc 0 $len" $swapfile >> $seqres.full
+touch $swapfile
+# Swap files must be nocow on Btrfs.
 $CHATTR_PROG +C $swapfile >> $seqres.full 2>&1
+$XFS_IO_PROG -f -c "falloc 0 $len" $swapfile >> $seqres.full
 "$here/src/mkswap" $swapfile
 "$here/src/swapon" $swapfile >> $seqres.full 2>&1 || \
_notrun "fallocated swap not supported here"
-- 
2.19.1



[PATCH 3/7] btrfs: test swap file activation restrictions

2018-11-02 Thread Omar Sandoval
From: Omar Sandoval 

Swap files on Btrfs have some restrictions not applicable to other
filesystems.

Signed-off-by: Omar Sandoval 
---
 tests/btrfs/173 | 55 +
 tests/btrfs/173.out |  5 +
 tests/btrfs/group   |  1 +
 3 files changed, 61 insertions(+)
 create mode 100755 tests/btrfs/173
 create mode 100644 tests/btrfs/173.out

diff --git a/tests/btrfs/173 b/tests/btrfs/173
new file mode 100755
index ..665bec39
--- /dev/null
+++ b/tests/btrfs/173
@@ -0,0 +1,55 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Facebook.  All Rights Reserved.
+#
+# FS QA Test 173
+#
+# Test swap file activation restrictions specific to Btrfs.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+. ./common/rc
+. ./common/filter
+
+rm -f $seqres.full
+
+_supported_fs generic
+_supported_os Linux
+_require_scratch_swapfile
+
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+
+echo "COW file"
+# We can't use _format_swapfile because we don't want chattr +C, and we can't
+# unset it after the swap file has been created.
+rm -f "$SCRATCH_MNT/swap"
+touch "$SCRATCH_MNT/swap"
+chmod 0600 "$SCRATCH_MNT/swap"
+_pwrite_byte 0x61 0 $(($(get_page_size) * 10)) "$SCRATCH_MNT/swap" >> $seqres.full
+mkswap "$SCRATCH_MNT/swap" >> $seqres.full
+swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+swapoff "$SCRATCH_MNT/swap" >/dev/null 2>&1
+
+echo "Compressed file"
+rm -f "$SCRATCH_MNT/swap"
+_format_swapfile "$SCRATCH_MNT/swap" $(($(get_page_size) * 10))
+$CHATTR_PROG +c "$SCRATCH_MNT/swap"
+swapon "$SCRATCH_MNT/swap" 2>&1 | _filter_scratch
+swapoff "$SCRATCH_MNT/swap" >/dev/null 2>&1
+
+status=0
+exit
diff --git a/tests/btrfs/173.out b/tests/btrfs/173.out
new file mode 100644
index ..6d7856bf
--- /dev/null
+++ b/tests/btrfs/173.out
@@ -0,0 +1,5 @@
+QA output created by 173
+COW file
+swapon: SCRATCH_MNT/swap: swapon failed: Invalid argument
+Compressed file
+swapon: SCRATCH_MNT/swap: swapon failed: Invalid argument
diff --git a/tests/btrfs/group b/tests/btrfs/group
index a490d7eb..3525014f 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -175,3 +175,4 @@
 170 auto quick snapshot
 171 auto quick qgroup
 172 auto quick punch
+173 auto quick swap
-- 
2.19.1



BTRFS did its job nicely (thanks!)

2018-11-02 Thread waxhead

Hi,

my main computer runs on a 7x SSD BTRFS as rootfs with
data:RAID1 and metadata:RAID10.

One SSD is probably about to fail, and it seems that BTRFS fixed it
nicely (thanks everyone!)


I decided to just post the ugly details in case someone wants to have a
look. Note that I would read the btrfs de st / output as saying the
error was NOT fixed, even though it clearly was, so I think the output
is a bit misleading... just saying...




-- below are the details for those curious (just for fun) ---

scrub status for [YOINK!]
scrub started at Fri Nov  2 17:49:45 2018 and finished after 
00:29:26

total bytes scrubbed: 1.15TiB with 1 errors
error details: csum=1
corrected errors: 1, uncorrectable errors: 0, unverified errors: 0

 btrfs fi us -T /
Overall:
Device size:   1.18TiB
Device allocated:  1.17TiB
Device unallocated:9.69GiB
Device missing:  0.00B
Used:  1.17TiB
Free (estimated):  6.30GiB  (min: 6.30GiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)

 Data  Metadata  System
Id Path  RAID1 RAID10RAID10Unallocated
-- - - - - ---
 6 /dev/sda1 236.28GiB 704.00MiB  32.00MiB   485.00MiB
 7 /dev/sdb1 233.72GiB   1.03GiB  32.00MiB 2.69GiB
 2 /dev/sdc1 110.56GiB 352.00MiB -   904.00MiB
 8 /dev/sdd1 234.96GiB   1.03GiB  32.00MiB 1.45GiB
 1 /dev/sde1 164.90GiB   1.03GiB  32.00MiB 1.72GiB
 9 /dev/sdf1 109.00GiB   1.03GiB  32.00MiB   744.00MiB
10 /dev/sdg1 107.98GiB   1.03GiB  32.00MiB 1.74GiB
-- - - - - ---
   Total 598.70GiB   3.09GiB  96.00MiB 9.69GiB
   Used  597.25GiB   1.57GiB 128.00KiB



uname -a
Linux main 4.18.0-2-amd64 #1 SMP Debian 4.18.10-2 (2018-10-07) x86_64 
GNU/Linux


btrfs --version
btrfs-progs v4.17


dmesg | grep -i btrfs
[7.801817] Btrfs loaded, crc32c=crc32c-generic
[8.163288] BTRFS: device label btrfsroot devid 10 transid 669961 
/dev/sdg1
[8.163433] BTRFS: device label btrfsroot devid 9 transid 669961 
/dev/sdf1
[8.163591] BTRFS: device label btrfsroot devid 1 transid 669961 
/dev/sde1
[8.163734] BTRFS: device label btrfsroot devid 8 transid 669961 
/dev/sdd1
[8.163974] BTRFS: device label btrfsroot devid 2 transid 669961 
/dev/sdc1
[8.164117] BTRFS: device label btrfsroot devid 7 transid 669961 
/dev/sdb1
[8.164262] BTRFS: device label btrfsroot devid 6 transid 669961 
/dev/sda1

[8.206174] BTRFS info (device sde1): disk space caching is enabled
[8.206236] BTRFS info (device sde1): has skinny extents
[8.348610] BTRFS info (device sde1): enabling ssd optimizations
[8.854412] BTRFS info (device sde1): enabling free space tree
[8.854471] BTRFS info (device sde1): using free space tree
[   68.170580] BTRFS warning (device sde1): csum failed root 3760 ino 
3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.185973] BTRFS warning (device sde1): csum failed root 3760 ino 
3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.185991] BTRFS warning (device sde1): csum failed root 3760 ino 
3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186003] BTRFS warning (device sde1): csum failed root 3760 ino 
3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186015] BTRFS warning (device sde1): csum failed root 3760 ino 
3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186028] BTRFS warning (device sde1): csum failed root 3760 ino 
3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186041] BTRFS warning (device sde1): csum failed root 3760 ino 
3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186052] BTRFS warning (device sde1): csum failed root 3760 ino 
3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186063] BTRFS warning (device sde1): csum failed root 3760 ino 
3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186075] BTRFS warning (device sde1): csum failed root 3760 ino 
3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.199237] BTRFS info (device sde1): read error corrected: ino 
3247424 off 36700160 (dev /dev/sda1 sector 244987192)
[   68.202602] BTRFS info (device sde1): read error corrected: ino 
3247424 off 36704256 (dev /dev/sda1 sector 244987192)
[   68.203176] BTRFS info (device sde1): read error corrected: ino 
3247424 off 36712448 (dev /dev/sda1 sector 244987192)
[   68.206762] BTRFS info (device sde1): read error corrected: ino 
3247424 off 36708352 (dev /dev/sda1 sector 244987192)
[   68.212071] BTRFS info 

Big amount of snapshots and disk full

2018-11-02 Thread linux-btrfs



Hi,

I am doing backups with rsync and versioning with btrfs snapshots.
Last week the file system reported full. I googled a bit and noticed
that there might be a bug in space_cache, so I cleared the cache and
mounted with nospace_cache, and the system worked for about a week (I
managed to delete & create tens of snapshots during that time; before
the first issue the system had worked without problems for over a
year). Now I have the same issue again. Any ideas what I should try
next? The disk has never been even half full.


An error occurred when the script was creating a new snapshot; the
rsync before that was fine. (And there was another rsync to another
destination running at the same time.)


Is there some issue with a large number of snapshots?

I have nearly 50 machines on backup and most of them have fewer than
ten versions, so my subvol hierarchy is like this (example commands
after the listing):


data (btrfsroot)
 machine1 (subvol)
  last (subvol)
  [date] (readonly snapshot of last)
  [date2] (readonly snapshot of last)
  ...
 machine2 (subvol)
  last (subvol)
  [date] (readonly snapshot of last)
..
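
For illustration, the commands producing one machine's slice of that
layout look roughly like this (hypothetical host name and date):

btrfs subvolume create /data/machine1
btrfs subvolume create /data/machine1/last
rsync -a machine1:/ /data/machine1/last/
btrfs subvolume snapshot -r /data/machine1/last /data/machine1/2018-11-02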


root@backuprasia:~# btrfs sub list /data |wc -l
492


root@backuprasia:~# btrfs fi usage /data
Overall:
Device size:  30.00TiB
Device allocated: 13.62TiB
Device unallocated:   16.38TiB
Device missing:  0.00B
Used: 13.59TiB
Free (estimated): 16.40TiB  (min: 16.40TiB)
Data ratio:   1.00
Metadata ratio:   1.00
Global reserve:  512.00MiB  (used: 499.00MiB)

Data,single: Size:13.45TiB, Used:13.43TiB
   /dev/mapper/datavg-backup1 13.45TiB

Metadata,single: Size:172.01GiB, Used:169.20GiB
   /dev/mapper/datavg-backup1172.01GiB

System,single: Size:4.00MiB, Used:1.62MiB
   /dev/mapper/datavg-backup1  4.00MiB

Unallocated:
   /dev/mapper/datavg-backup1 16.38TiB


root@backuprasia:~# uname -a
Linux backuprasia 4.15.0-38-generic #41~16.04.1-Ubuntu SMP Wed Oct 10 20:16:04 
UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


Nov 02 01:58:31 backuprasia kernel: [ cut here ]
Nov 02 01:58:31 backuprasia kernel: BTRFS: Transaction aborted (error -28)
Nov 02 01:58:31 backuprasia kernel: WARNING: CPU: 8 PID: 14381 at 
/build/linux-hwe-4LSUYr/linux-hwe-4.15.0/fs/btrfs/extent-tree.c:3784 
btrfs_start_dirty_block_groups+0x260/0x430 [btrfs]
Nov 02 01:58:31 backuprasia kernel: [205245.441842] Modules linked in: 
ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common 
xt_LOG xt_limit xt_tcpudp xt_multiport iptable_filter ip_tables x_tables 
intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
irqbypass intel_cstate 8021q garp mrp mei_me ipmi_ssif stp intel_rapl_perf llc 
joydev input_leds lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf 
ipmi_msghandler acpi_pad acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm 
ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs 
zstd_compress ses enclosure raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear ast 
i2c_algo_bit ixgbe ttm crct10dif_pclmul crc32_pclmul drm_kms_helper 
ghash_clmulni_intel syscopyarea
Nov 02 01:58:31 backuprasia kernel: [205245.441911]  pcbc sysfillrect 
aesni_intel sysimgblt fb_sys_fops aes_x86_64 raid1 hid_generic crypto_simd 
usbhid dca glue_helper mpt3sas ahci ptp raid_class drm r8169 hid mxm_wmi cryptd 
libahci pps_core scsi_transport_sas mii mdio wmi
Nov 02 01:58:31 backuprasia kernel:  pcbc sysfillrect aesni_intel sysimgblt 
fb_sys_fops aes_x86_64 raid1 hid_generic crypto_simd usbhid dca glue_helper 
mpt3sas ahci ptp rai
Nov 02 01:58:31 backuprasia kernel: CPU: 8 PID: 14381 Comm: btrfs-transacti Not 
tainted 4.15.0-38-generic #41~16.04.1-Ubuntu
Nov 02 01:58:31 backuprasia kernel: Hardware name: Supermicro X10DRi/X10DRI-T, 
BIOS 2.1 09/13/2016
Nov 02 01:58:31 backuprasia kernel: RIP: 
0010:btrfs_start_dirty_block_groups+0x260/0x430 [btrfs]
Nov 02 01:58:31 backuprasia kernel: RSP: 0018:b7b60efbbdc0 EFLAGS: 00010282
Nov 02 01:58:31 backuprasia kernel: RAX:  RBX: 8bd5d4a1bc30 
RCX: 0006
Nov 02 01:58:31 backuprasia kernel: RDX: 0007 RSI: 0092 
RDI: 8bd77f416490
Nov 02 01:58:31 backuprasia kernel: RBP: b7b60efbbe30 R08: 0001 
R09: 0539
Nov 02 01:58:31 backuprasia kernel: R10:  R11: 0539 
R12: 8bd60128e800
Nov 02 01:58:31 backuprasia kernel: R13: 0001 R14: 8bd1d1ce32a0 
R15: 8bd60128e948
Nov 02 01:58:31 backuprasia kernel: FS:  () 
GS:8bd77f40() knlGS:
Nov 02 01:58:31 backuprasia kernel: CS:  0010 DS:  ES:  CR0: 
80050033
Nov 02 01:58:31 backuprasia kernel: CR2: 7fc65cfc6868 CR3: 000f8e80a003 
CR4: 003606e0
Nov 02 01:58:31 backuprasia kernel: DR0:  

Re: Salvage files from broken btrfs

2018-11-02 Thread M. Klingmann
On 02.11.2018 at 15:45 Qu Wenruo wrote:
> On 2018/11/2 10:30 PM, M. Klingmann wrote:
>> On 31.10.2018 at 01:03 Qu Wenruo wrote:
>>> My plan for such recovery is:
>>>
>>> 1) btrfs ins dump-super to make sure system chunk array is valid
>>> 2) btrfs-find-root to find any valid chunk tree blocks
>>> 3) pass that chunk tree bytenr to btrfs-restore
>>>Unfortunately, btrfs-restore doesn't support specifying chunk root
>>>yet. But it's pretty easy to add such support.
>>>
>>> So, please provide the "btrfs ins dump-super -Ffa" output to start with.
>> Following your plan, I did 1) and 2).
>> As 2) failed (see below), is there anything I can do to find the tree
>> bytenr to supply btrfs-restore with it?
>>
>> 1) Here's the output given by "btrfs-show-super -Ffa":
>>
>> superblock: bytenr=65536, device=sdcard.iso
>> -
>> csum            0xb8e15dd7 [match]
[snip]
>> 2) "btrfs-find-root" yields "Couldn't read chunk root; Open ctree failed".
> It's not plain "btrfs-find-root" but "btrfs-find-root -o 5".
>
> And you should use btrfs-progs v4.17.1, not the old v4.4.
> The ability to continue the search even if the chunk tree is corrupted
> was added in v4.5, and I strongly recommend using the latest (v4.17.1)
> for its many fixes and extra debug output.
>
> If you can't find any handy way to update btrfs-progs, you could boot
> the Archlinux ISO as a rescue OS to get the latest btrfs-progs.

Using Archlinux is in fact the easiest way to use version 4.17.1
(Archlinux ISO of 2018-11-01).

Here's the output from "btrfs-find-root sdcard.iso":

WARNING: cannot read chunk root, continue anyway
Superblock thinks the generation is 1757933
Superblock thinks the level is 0

Here's the output using "btrfs-find-root -o 5 sdcard.iso":

WARNING: cannot read chunk root, continue anyway
Superblock doesn't contain generation info for root 5
Superblock doesn't contain the level info for root 5

> For 3), I could easily add such a feature to btrfs-restore, or just
> manually patch your superblock to continue.
> So as soon as your "btrfs-find-root -o 5" gets some valid output, I
> could continue the work.
>
Thank you.

-- 
Mirko




signature.asc
Description: OpenPGP digital signature


Re: Salvage files from broken btrfs

2018-11-02 Thread Qu Wenruo


On 2018/11/2 10:30 PM, M. Klingmann wrote:
> 
> On 31.10.2018 at 01:03 Qu Wenruo wrote:
>> My plan for such recovery is:
>>
>> 1) btrfs ins dump-super to make sure system chunk array is valid
>> 2) btrfs-find-root to find any valid chunk tree blocks
>> 3) pass that chunk tree bytenr to btrfs-restore
>>Unfortunately, btrfs-restore doesn't support specifying chunk root
>>yet. But it's pretty easy to add such support.
>>
>> So, please provide the "btrfs ins dump-super -Ffa" output to start with.
> Following your plan, I did 1) and 2).
> As 2) failed (see below), is there anything I can do to find the tree
> bytenr to supply btrfs-restore with it?
> 
> 1) Here's the output given by "btrfs-show-super -Ffa":
> 
> superblock: bytenr=65536, device=sdcard.iso
> -
> csum            0xb8e15dd7 [match]
> bytenr            65536
> flags            0x1
>             ( WRITTEN )
> magic            _BHRfS_M [match]
> fsid            4235aa4f-7721-4e73-97f0-7fe0e9a3ce9c
> label           
> generation        1757933
> root            889143296
> sys_array_size        226
> chunk_root_generation    932006
> root_level        0
> chunk_root        20987904
> chunk_root_level    0
> log_root        890109952
> log_root_transid    0
> log_root_level        0
> total_bytes        3053696
> bytes_used        16937803776
> sectorsize        4096
> nodesize        16384
> leafsize        16384
> stripesize        4096
> root_dir        6
> num_devices        1
> compat_flags        0x0
> compat_ro_flags        0x0
> incompat_flags        0x61
>             ( MIXED_BACKREF |
>               BIG_METADATA |
>               EXTENDED_IREF )
> csum_type        0
> csum_size        4
> cache_generation    1757933
> uuid_tree_generation    149
> dev_item.uuid        90185cf6-b937-49bb-b191-91d08677ee22
> dev_item.fsid        4235aa4f-7721-4e73-97f0-7fe0e9a3ce9c [match]
> dev_item.type        0
> dev_item.total_bytes    3053696
> dev_item.bytes_used    3053696
> dev_item.io_align    4096
> dev_item.io_width    4096
> dev_item.sector_size    4096
> dev_item.devid        1
> dev_item.dev_group    0
> dev_item.seek_speed    0
> dev_item.bandwidth    0
> dev_item.generation    0
> sys_chunk_array[2048]:
>     item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 0)
>         chunk length 4194304 owner 2 stripe_len 65536
>         type SYSTEM num_stripes 1
>             stripe 0 devid 1 offset 0
>             dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22
>     item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520)
>         chunk length 8388608 owner 2 stripe_len 65536
>         type SYSTEM|DUP num_stripes 2

This chunk looks pretty OK.
And it's DUP, which improves the chances of recovery.

>             stripe 0 devid 1 offset 20971520
>             dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22
>             stripe 1 devid 1 offset 29360128
>             dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22
> backup_roots[4]:
>     backup 0:
>         backup_tree_root:    889143296    gen: 1757933    level: 0
>         backup_chunk_root:    20987904    gen: 932006    level: 0
>         backup_extent_root:    81152    gen: 1757933    level: 2
>         backup_fs_root:        889716736    gen: 1757934    level: 2
>         backup_dev_root:    307560448    gen: 1673227    level: 0
>         backup_csum_root:    887898112    gen: 1757934    level: 2
>         backup_total_bytes:    3053696
>         backup_bytes_used:    16937803776
>         backup_num_devices:    1
> 
>     backup 1:
>         backup_tree_root:    882311168    gen: 1757930    level: 0
>         backup_chunk_root:    20987904    gen: 932006    level: 0
>         backup_extent_root:    879738880    gen: 1757931    level: 2
>         backup_fs_root:        883097600    gen: 1757931    level: 2
>         backup_dev_root:    307560448    gen: 1673227    level: 0
>         backup_csum_root:    883212288    gen: 1757931    level: 2
>         backup_total_bytes:    3053696
>         backup_bytes_used:    16943640576
>         backup_num_devices:    1
> 
>     backup 2:
>         backup_tree_root:    881082368    gen: 1757931    level: 0
>         backup_chunk_root:    20987904    gen: 932006    level: 0
>         backup_extent_root:    879738880    gen: 1757931    level: 2
>         backup_fs_root:        883654656    gen: 1757932    level: 2
>         backup_dev_root:    307560448    gen: 1673227    level: 0
>         backup_csum_root:    883703808    gen: 1757932    level: 2
>         backup_total_bytes:    3053696
>         backup_bytes_used:    16943722496
>         backup_num_devices:    1
> 
>     backup 3:
>         backup_tree_root:    887865344    gen: 1757932    level: 0
>         backup_chunk_root:    20987904    gen: 932006    level: 0
>         backup_extent_root:    81152    gen: 1757933    level: 2
>         backup_fs_root:        888750080    gen: 1757933    level: 2
>         backup_dev_root:   

Re: Salvage files from broken btrfs

2018-11-02 Thread M. Klingmann

On 31.10.2018 at 01:03 Qu Wenruo wrote:
> My plan for such recovery is:
>
> 1) btrfs ins dump-super to make sure system chunk array is valid
> 2) btrfs-find-root to find any valid chunk tree blocks
> 3) pass that chunk tree bytenr to btrfs-restore
>Unfortunately, btrfs-restore doesn't support specifying chunk root
>yet. But it's pretty easy to add such support.
>
> So, please provide the "btrfs ins dump-super -Ffa" output to start with.
Following your plan, I did 1) and 2).
As 2) failed (see below), is there anything I can do to find the tree
bytenr to supply btrfs-restore with it?

1) Here's the output given by "btrfs-show-super -Ffa":

superblock: bytenr=65536, device=sdcard.iso
-
csum            0xb8e15dd7 [match]
bytenr            65536
flags            0x1
            ( WRITTEN )
magic            _BHRfS_M [match]
fsid            4235aa4f-7721-4e73-97f0-7fe0e9a3ce9c
label           
generation        1757933
root            889143296
sys_array_size        226
chunk_root_generation    932006
root_level        0
chunk_root        20987904
chunk_root_level    0
log_root        890109952
log_root_transid    0
log_root_level        0
total_bytes        3053696
bytes_used        16937803776
sectorsize        4096
nodesize        16384
leafsize        16384
stripesize        4096
root_dir        6
num_devices        1
compat_flags        0x0
compat_ro_flags        0x0
incompat_flags        0x61
            ( MIXED_BACKREF |
              BIG_METADATA |
              EXTENDED_IREF )
csum_type        0
csum_size        4
cache_generation    1757933
uuid_tree_generation    149
dev_item.uuid        90185cf6-b937-49bb-b191-91d08677ee22
dev_item.fsid        4235aa4f-7721-4e73-97f0-7fe0e9a3ce9c [match]
dev_item.type        0
dev_item.total_bytes    3053696
dev_item.bytes_used    3053696
dev_item.io_align    4096
dev_item.io_width    4096
dev_item.sector_size    4096
dev_item.devid        1
dev_item.dev_group    0
dev_item.seek_speed    0
dev_item.bandwidth    0
dev_item.generation    0
sys_chunk_array[2048]:
    item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 0)
        chunk length 4194304 owner 2 stripe_len 65536
        type SYSTEM num_stripes 1
            stripe 0 devid 1 offset 0
            dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22
    item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520)
        chunk length 8388608 owner 2 stripe_len 65536
        type SYSTEM|DUP num_stripes 2
            stripe 0 devid 1 offset 20971520
            dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22
            stripe 1 devid 1 offset 29360128
            dev uuid: 90185cf6-b937-49bb-b191-91d08677ee22
backup_roots[4]:
    backup 0:
        backup_tree_root:    889143296    gen: 1757933    level: 0
        backup_chunk_root:    20987904    gen: 932006    level: 0
        backup_extent_root:    81152    gen: 1757933    level: 2
        backup_fs_root:        889716736    gen: 1757934    level: 2
        backup_dev_root:    307560448    gen: 1673227    level: 0
        backup_csum_root:    887898112    gen: 1757934    level: 2
        backup_total_bytes:    3053696
        backup_bytes_used:    16937803776
        backup_num_devices:    1

    backup 1:
        backup_tree_root:    882311168    gen: 1757930    level: 0
        backup_chunk_root:    20987904    gen: 932006    level: 0
        backup_extent_root:    879738880    gen: 1757931    level: 2
        backup_fs_root:        883097600    gen: 1757931    level: 2
        backup_dev_root:    307560448    gen: 1673227    level: 0
        backup_csum_root:    883212288    gen: 1757931    level: 2
        backup_total_bytes:    3053696
        backup_bytes_used:    16943640576
        backup_num_devices:    1

    backup 2:
        backup_tree_root:    881082368    gen: 1757931    level: 0
        backup_chunk_root:    20987904    gen: 932006    level: 0
        backup_extent_root:    879738880    gen: 1757931    level: 2
        backup_fs_root:        883654656    gen: 1757932    level: 2
        backup_dev_root:    307560448    gen: 1673227    level: 0
        backup_csum_root:    883703808    gen: 1757932    level: 2
        backup_total_bytes:    3053696
        backup_bytes_used:    16943722496
        backup_num_devices:    1

    backup 3:
        backup_tree_root:    887865344    gen: 1757932    level: 0
        backup_chunk_root:    20987904    gen: 932006    level: 0
        backup_extent_root:    81152    gen: 1757933    level: 2
        backup_fs_root:        888750080    gen: 1757933    level: 2
        backup_dev_root:    307560448    gen: 1673227    level: 0
        backup_csum_root:    32000    gen: 1757933    level: 2
        backup_total_bytes:    3053696
        backup_bytes_used:    16937803776
        backup_num_devices:    1


superblock: bytenr=67108864, device=sdcard.iso
-
csum            0x 

Re: Salvage files from broken btrfs

2018-11-02 Thread M. Klingmann
On 31.10.2018 at 05:56 Chris Murphy wrote:
> On Tue, Oct 30, 2018 at 4:11 PM, Mirko Klingmann  wrote:
>> Hi all,
>>
>> my btrfs root file system on a SD card broke down and did not mount anymore.
> It might mount with -o ro,nologreplay
>
> Typically an SD card will break in a way that it can't write, and
> mount will just hang (with mmcblk errors). Mounting with both ro and
> nologreplay will ensure no writes are needed, allowing the mount to
> succeed. Of course any changes that are in the log tree will be
> missing, so recent transactions may be unrecoverable, but so far I've
> had good luck recovering from broken SD cards this way.
>
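
A sketch of that mount attempt (the device path is an assumption):

mount -t btrfs -o ro,nologreplay /dev/mmcblk0p1 /mnt/rescue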
No luck with these options. The error still persists with the same
output in "dmesg".

Thanks for your effort...

-- 

Mirko



Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-11-02 Thread Christoph Anton Mitterer
Hey Su.

Anything further I need to do in this matter, or can I consider it
"solved" and you won't need further testing on my side, but just PR the
patches of that branch? :-)

Thanks,
Chris.

On Sat, 2018-10-27 at 14:15 +0200, Christoph Anton Mitterer wrote:
> Hey.
> 
> 
> Without the last patches on 4.17:
> 
> checking extents
> checking free space cache
> checking fs roots
> ERROR: errors found in fs roots
> Checking filesystem on /dev/mapper/system
> UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
> found 619543498752 bytes used, error(s) found
> total csum bytes: 602382204
> total tree bytes: 2534309888
> total fs tree bytes: 1652097024
> total extent tree bytes: 160432128
> btree space waste bytes: 459291608
> file data blocks allocated: 7334036647936
>  referenced 730839187456
> 
> 
> With the last patches, on 4.17:
> 
> checking extents
> checking free space cache
> checking fs roots
> checking only csum items (without verifying data)
> checking root refs
> Checking filesystem on /dev/mapper/system
> UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
> found 619543498752 bytes used, no error found
> total csum bytes: 602382204
> total tree bytes: 2534309888
> total fs tree bytes: 1652097024
> total extent tree bytes: 160432128
> btree space waste bytes: 459291608
> file data blocks allocated: 7334036647936
>  referenced 730839187456
> 
> 
> Cheers,
> Chris.
> 



Re: [PATCH v2] btrfs: use tagged writepage to mitigate livelock of snapshot

2018-11-02 Thread Nikolay Borisov



On 2.11.18 11:06, Ethan Lien wrote:
> Snapshot is expected to be fast. But if there are writers steadily
> creating dirty pages in our subvolume, the snapshot may take a very long
> time to complete. To fix the problem, we use tagged writepage for the
> snapshot flusher as we do in generic write_cache_pages(): we quickly
> tag all dirty pages with a TOWRITE tag, then do the hard work of
> writepage only on those pages with the TOWRITE tag, so we omit pages
> dirtied after the snapshot command.
> 
> We do a simple snapshot speed test on an Intel D-1531 box:
> 
> fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G
> --direct=0 --thread=1 --numjobs=1 --time_based --runtime=120
> --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
> time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
> 
> original: 1m58sec
> patched:  6.54sec
> 
> This is the best case for this patch since for a sequential write case,
> we omit nearly all pages dirtied after the snapshot command.
> 
> For a multi-writer, random write test:
> 
> fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G
> --direct=0 --thread=1 --numjobs=4 --time_based --runtime=120
> --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
> time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
> 
> original: 15.83sec
> patched:  10.35sec
> 
> The improvement is smaller than in the sequential write case, since
> we omit only half of the pages dirtied after the snapshot command.
> 
> Signed-off-by: Ethan Lien 

Codewise it looks good, though I'd also document the reason you are
using wbc->nr_to_write == LONG_MAX (i.e. the fact that other flushers
might race and inadvertently disable the snapshot flag), but this is
something which David can fix on the way in. So:
Reviewed-by: Nikolay Borisov 

> ---
> 
> V2:
>   Add more details in commit message.
>   rename BTRFS_INODE_TAGGED_FLUSH to BTRFS_INODE_SNAPSHOT_FLUSH.
>   remove unnecessary sync_mode check.
>   start_delalloc_inodes use boolean argument.
> 
>  fs/btrfs/btrfs_inode.h |  1 +
>  fs/btrfs/ctree.h   |  2 +-
>  fs/btrfs/extent_io.c   | 14 --
>  fs/btrfs/inode.c   | 10 ++
>  fs/btrfs/ioctl.c   |  2 +-
>  5 files changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 1343ac57b438..ffc9a1c77375 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -29,6 +29,7 @@ enum {
>   BTRFS_INODE_IN_DELALLOC_LIST,
>   BTRFS_INODE_READDIO_NEED_LOCK,
>   BTRFS_INODE_HAS_PROPS,
> + BTRFS_INODE_SNAPSHOT_FLUSH,
>  };
>  
>  /* in memory btrfs inode */
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2cddfe7806a4..82682da5a40d 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3155,7 +3155,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>  struct inode *inode, u64 new_size,
>  u32 min_type);
>  
> -int btrfs_start_delalloc_inodes(struct btrfs_root *root);
> +int btrfs_start_delalloc_snapshot(struct btrfs_root *root);
>  int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr);
>  int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
> unsigned int extra_bits,
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 4dd6faab02bb..93f2e413535d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3928,12 +3928,22 @@ static int extent_write_cache_pages(struct address_space *mapping,
>   range_whole = 1;
>   scanned = 1;
>   }
> - if (wbc->sync_mode == WB_SYNC_ALL)
> +
> + /*
> +  * We do the tagged writepage as long as the snapshot flush bit is set
> +  * and we are the first one who do the filemap_flush() on this inode.
> +  */
> + if (range_whole && wbc->nr_to_write == LONG_MAX &&
> + test_and_clear_bit(BTRFS_INODE_SNAPSHOT_FLUSH,
> + &BTRFS_I(inode)->runtime_flags))
> + wbc->tagged_writepages = 1;
> +
> + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
>   tag = PAGECACHE_TAG_TOWRITE;
>   else
>   tag = PAGECACHE_TAG_DIRTY;
>  retry:
> - if (wbc->sync_mode == WB_SYNC_ALL)
> + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
>   tag_pages_for_writeback(mapping, index, end);
>   done_index = index;
>   while (!done && !nr_to_write_done && (index <= end) &&
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3ea5339603cf..593445d122ed 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -9975,7 +9975,7 @@ static struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode
>   * some fairly slow code that needs optimization. This walks the list
>   * of all the inodes with pending delalloc and forces them to disk.
>   

[PATCH v2] btrfs: use tagged writepage to mitigate livelock of snapshot

2018-11-02 Thread Ethan Lien
Snapshot is expected to be fast. But if there are writers steadily
creating dirty pages in our subvolume, the snapshot may take a very long
time to complete. To fix the problem, we use tagged writepage for the
snapshot flusher as we do in generic write_cache_pages(): we quickly
tag all dirty pages with a TOWRITE tag, then do the hard work of
writepage only on those pages with the TOWRITE tag, so we omit pages
dirtied after the snapshot command.

We do a simple snapshot speed test on an Intel D-1531 box:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G
--direct=0 --thread=1 --numjobs=1 --time_based --runtime=120
--filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 1m58sec
patched:  6.54sec

This is the best case for this patch since for a sequential write case,
we omit nearly all pages dirtied after the snapshot command.

For a multi-writer, random write test:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G
--direct=0 --thread=1 --numjobs=4 --time_based --runtime=120
--filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 15.83sec
patched:  10.35sec

The improvement is smaller than in the sequential write case, since
we omit only half of the pages dirtied after the snapshot command.

Signed-off-by: Ethan Lien 
---

V2:
Add more details in commit message.
rename BTRFS_INODE_TAGGED_FLUSH to BTRFS_INODE_SNAPSHOT_FLUSH.
remove unnecessary sync_mode check.
start_delalloc_inodes use boolean argument.

 fs/btrfs/btrfs_inode.h |  1 +
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/extent_io.c   | 14 --
 fs/btrfs/inode.c   | 10 ++
 fs/btrfs/ioctl.c   |  2 +-
 5 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 1343ac57b438..ffc9a1c77375 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -29,6 +29,7 @@ enum {
BTRFS_INODE_IN_DELALLOC_LIST,
BTRFS_INODE_READDIO_NEED_LOCK,
BTRFS_INODE_HAS_PROPS,
+   BTRFS_INODE_SNAPSHOT_FLUSH,
 };
 
 /* in memory btrfs inode */
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2cddfe7806a4..82682da5a40d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3155,7 +3155,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
				   struct inode *inode, u64 new_size,
				   u32 min_type);
 
-int btrfs_start_delalloc_inodes(struct btrfs_root *root);
+int btrfs_start_delalloc_snapshot(struct btrfs_root *root);
 int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr);
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
  unsigned int extra_bits,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4dd6faab02bb..93f2e413535d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3928,12 +3928,22 @@ static int extent_write_cache_pages(struct address_space *mapping,
		range_whole = 1;
		scanned = 1;
	}
-   if (wbc->sync_mode == WB_SYNC_ALL)
+
+	/*
+	 * We do the tagged writepage as long as the snapshot flush bit is set
+	 * and we are the first one who does the filemap_flush() on this inode.
+	 */
+	if (range_whole && wbc->nr_to_write == LONG_MAX &&
+	    test_and_clear_bit(BTRFS_INODE_SNAPSHOT_FLUSH,
+			       &BTRFS_I(inode)->runtime_flags))
+		wbc->tagged_writepages = 1;
+
+   if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
tag = PAGECACHE_TAG_TOWRITE;
else
tag = PAGECACHE_TAG_DIRTY;
 retry:
-   if (wbc->sync_mode == WB_SYNC_ALL)
+   if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
tag_pages_for_writeback(mapping, index, end);
done_index = index;
while (!done && !nr_to_write_done && (index <= end) &&
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3ea5339603cf..593445d122ed 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9975,7 +9975,7 @@ static struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode
  * some fairly slow code that needs optimization. This walks the list
  * of all the inodes with pending delalloc and forces them to disk.
  */
-static int start_delalloc_inodes(struct btrfs_root *root, int nr)
+static int start_delalloc_inodes(struct btrfs_root *root, int nr, bool snapshot)
 {
struct btrfs_inode *binode;
struct inode *inode;
@@ -10003,6 +10003,8 @@ static int start_delalloc_inodes(struct btrfs_root *root, int nr)
}
	spin_unlock(&root->delalloc_lock);
 
+	if (snapshot)
+		set_bit(BTRFS_INODE_SNAPSHOT_FLUSH,
+			&binode->runtime_flags);
   
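(The rest of the diff is truncated in the archive. Read together with the
header change in ctree.h and the one-line ioctl.c change in the diffstat,
the expected wiring is roughly the sketch below -- inferred for context,
not quoted from the patch:)

/* Presumed shape of the truncated hunks: the renamed helper flushes
 * everything (nr == -1) and asks start_delalloc_inodes() to tag each
 * inode with BTRFS_INODE_SNAPSHOT_FLUSH before its filemap_flush(). */
int btrfs_start_delalloc_snapshot(struct btrfs_root *root)
{
	return start_delalloc_inodes(root, -1, true);
}

/* The snapshot ioctl path would then call
 * btrfs_start_delalloc_snapshot() where it previously called
 * btrfs_start_delalloc_inodes(), so only snapshot-driven flushes get
 * the tagged (TOWRITE) writeback treatment. */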

Re: [PATCH] btrfs: use tagged writepage to mitigate livelock of snapshot

2018-11-02 Thread Nikolay Borisov



On 2.11.18, 9:13, ethanlien wrote:
> David Sterba wrote on 2018-11-02 02:02:
>> On Thu, Nov 01, 2018 at 02:49:03PM +0800, Ethan Lien wrote:
>>> Snapshot is expected to be fast. But if there are writers steadily
>>> creating dirty pages in our subvolume, the snapshot may take a very long
>>> time to complete. To fix the problem, we use tagged writepage for the
>>> snapshot flusher as we do in the generic write_cache_pages(), so we can
>>> omit pages dirtied after the snapshot command.
>>>
>>> We do a simple snapshot speed test on an Intel D-1531 box:
>>>
>>> fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G
>>> --direct=0 --thread=1 --numjobs=1 --time_based --runtime=120
>>> --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
>>> time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
>>>
>>> original: 1m58sec
>>> patched:  6.54sec
>>>
>>> This is the best case for this patch since for a sequential write case,
>>> we omit nearly all pages dirtied after the snapshot command.
>>>
>>> For a multi-writer, random write test:
>>>
>>> fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G
>>> --direct=0 --thread=1 --numjobs=4 --time_based --runtime=120
>>> --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
>>> time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
>>>
>>> original: 15.83sec
>>> patched:  10.35sec
>>>
>>> The improvement is smaller than in the sequential write case, since
>>> we omit only half of the pages dirtied after the snapshot command.
>>>
>>> Signed-off-by: Ethan Lien 
>>
>> This looks nice, thanks. I agree with the suggestions from Nikolay,
>> please update and resend.
>>
>> I was a bit curious about the 'livelock'; what you describe does not seem
>> to be one. A system under heavy IO can make the snapshot dead slow but can
>> recover from that once the IO stops.
> I'm not sure if this is indeed a case of 'livelock'. I learned the term
> from commit f446daaea9d4a420d, "mm: implement writeback livelock
> avoidance using page tagging".
> If this is not the case, I can use another term.

It's really bad to use terms you don't fully comprehend, because that way
you can, inadvertently, mislead people who don't necessarily have the
same context as you do. A brief description of what a livelock is can be
found here:

https://stackoverflow.com/questions/6155951/whats-the-difference-between-deadlock-and-livelock

In the context of writeback, a livelock could occur since you have 2
processes - 1 producer (dirtying pages) and 1 consumer (writeback). So a
livelock is when both are doing useful work yet the system doesn't
make forward progress (in this case there is a risk of the snapshot
never being created, since we will never finish doing the writeback for
constantly dirtied pages).
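
Schematically, the livelock-prone form is a "flush until clean" loop over
the DIRTY tag (a sketch of the failure mode, not code from the kernel):

	do {
		nr_found = 0;
		while ((nr = pagevec_lookup_range_tag(&pvec, mapping, &index,
						      end, PAGECACHE_TAG_DIRTY))) {
			nr_found += nr;
			for (i = 0; i < nr; i++)
				mapping->a_ops->writepage(pvec.pages[i], wbc);
			pagevec_release(&pvec);
		}
		index = 0;	/* rescan: pages may have been re-dirtied */
	} while (nr_found);	/* a steady writer keeps this true forever */

Both tasks keep doing useful work (pages are written and re-dirtied), yet
the outer loop never exits; tagging the working set up front is what
makes it bounded.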

David, indeed what Ethan fixes and what Jan fixed in f446daaea9d4a420d
are the same class of issue. And 'livelock' in this context really
depends on one's definition of how long a
no-forward-progress-but-useful-work-being-done situation persists.

> 
>> Regarding the sync semantics, there's AFAIK no change to the current
>> state where the sync is done before the snapshot but without further
>> guarantees. From that point I think it's safe to select only a subset of
>> pages and make things faster.
>>
>> As the requested changes are not functional I'll add the patch to
>> for-next for testing.
> 
> 


Re: [PATCH] btrfs: use tagged writepage to mitigate livelock of snapshot

2018-11-02 Thread ethanlien

David Sterba wrote on 2018-11-02 02:02:

On Thu, Nov 01, 2018 at 02:49:03PM +0800, Ethan Lien wrote:

Snapshot is expected to be fast. But if there are writers steadily
creating dirty pages in our subvolume, the snapshot may take a very long
time to complete. To fix the problem, we use tagged writepage for the
snapshot flusher as we do in the generic write_cache_pages(), so we can
omit pages dirtied after the snapshot command.

We do a simple snapshot speed test on an Intel D-1531 box:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G
--direct=0 --thread=1 --numjobs=1 --time_based --runtime=120
--filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 1m58sec
patched:  6.54sec

This is the best case for this patch since for a sequential write case,
we omit nearly all pages dirtied after the snapshot command.

For a multi-writer, random write test:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G
--direct=0 --thread=1 --numjobs=4 --time_based --runtime=120
--filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 15.83sec
patched:  10.35sec

The improvement is smaller than in the sequential write case, since
we omit only half of the pages dirtied after the snapshot command.

Signed-off-by: Ethan Lien 


This looks nice, thanks. I agree with the suggestions from Nikolay,
please update and resend.

I was a bit curious about the 'livelock'; what you describe does not seem
to be one. A system under heavy IO can make the snapshot dead slow but can
recover from that once the IO stops.


I'm not sure if this is indeed a case of 'livelock'. I learned the term
from commit f446daaea9d4a420d, "mm: implement writeback livelock
avoidance using page tagging".
If this is not the case, I can use another term.


Regarding the sync semantics, there's AFAIK no change to the current
state where the sync is done before the snapshot but without further
guarantees. From that point I think it's safe to select only a subset of
pages and make things faster.

As the requested changes are not functional I'll add the patch to
for-next for testing.




Re: [PATCH] btrfs: use tagged writepage to mitigate livelock of snapshot

2018-11-02 Thread ethanlien

Nikolay Borisov wrote on 2018-11-01 19:57:

On 1.11.18, 8:49, Ethan Lien wrote:

Snapshot is expected to be fast. But if there are writers steadily
creating dirty pages in our subvolume, the snapshot may take a very long
time to complete. To fix the problem, we use tagged writepage for the
snapshot flusher as we do in the generic write_cache_pages(), so we can
omit pages dirtied after the snapshot command.

We do a simple snapshot speed test on an Intel D-1531 box:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G
--direct=0 --thread=1 --numjobs=1 --time_based --runtime=120
--filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 1m58sec
patched:  6.54sec

This is the best case for this patch since for a sequential write case,
we omit nearly all pages dirtied after the snapshot command.

For a multi-writer, random write test:

fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G
--direct=0 --thread=1 --numjobs=4 --time_based --runtime=120
--filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

original: 15.83sec
patched:  10.35sec

The improvement is smaller than in the sequential write case, since
we omit only half of the pages dirtied after the snapshot command.

Signed-off-by: Ethan Lien 
---
 fs/btrfs/btrfs_inode.h |  1 +
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/extent_io.c   | 16 ++++++++++++++--
 fs/btrfs/inode.c   | 10 ++++++----
 fs/btrfs/ioctl.c   |  2 +-
 5 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 1343ac57b438..4182bfbb56be 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -29,6 +29,7 @@ enum {
BTRFS_INODE_IN_DELALLOC_LIST,
BTRFS_INODE_READDIO_NEED_LOCK,
BTRFS_INODE_HAS_PROPS,
+   BTRFS_INODE_TAGGED_FLUSH,
 };

 /* in memory btrfs inode */
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2cddfe7806a4..82682da5a40d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3155,7 +3155,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
				   struct inode *inode, u64 new_size,
				   u32 min_type);

-int btrfs_start_delalloc_inodes(struct btrfs_root *root);
+int btrfs_start_delalloc_snapshot(struct btrfs_root *root);
 int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr);
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
			      unsigned int extra_bits,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4dd6faab02bb..c21d8a0e010a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3928,12 +3928,24 @@ static int extent_write_cache_pages(struct address_space *mapping,
		range_whole = 1;
		scanned = 1;
	}
-	if (wbc->sync_mode == WB_SYNC_ALL)
+
+	/*
+	 * We don't care if we are the one who set BTRFS_INODE_TAGGED_FLUSH in
+	 * start_delalloc_inodes(). We do the tagged writepage as long as we are
+	 * the first one who does the filemap_flush() on this inode.
+	 */
+	if (range_whole && wbc->nr_to_write == LONG_MAX &&
+	    wbc->sync_mode == WB_SYNC_NONE &&
+	    test_and_clear_bit(BTRFS_INODE_TAGGED_FLUSH,
+			       &BTRFS_I(inode)->runtime_flags))

Actually this check can be simplified to:

range_whole && test_and_clear_bit. filemap_flush() triggers range_whole =
1, and then you care about TAGGED_FLUSH (or whatever it's going to be
named) being set. The nr_to_write && sync_mode checks just make it a tad
more difficult to reason about the code.




Yes, the sync_mode check is not necessary. For nr_to_write: since a
pagevec contains only a limited amount of pages in one loop, nr_to_write
essentially controls whether we scan all of the dirty pages or not.
Since there is a window between start_delalloc_inodes() and
extent_write_cache_pages(), another flusher with range_whole==1 but
nr_to_write < LONG_MAX could consume the flag first, so I keep the
nr_to_write check.
+   wbc->tagged_writepages = 1;
+
+   if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
tag = PAGECACHE_TAG_TOWRITE;
else
tag = PAGECACHE_TAG_DIRTY;
 retry:
-   if (wbc->sync_mode == WB_SYNC_ALL)
+   if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
tag_pages_for_writeback(mapping, index, end);
done_index = index;
while (!done && !nr_to_write_done && (index <= end) &&
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3ea5339603cf..3df3cbbe91c5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9975,7 +9975,7 @@ static struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode

  * some fairly slow code that needs optimization. This walks the list
  * of all the inodes with pending delalloc and forces 
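
(Schematically, the window being discussed -- the interleaving is an
assumed illustration, not a trace:)

/*
 * snapshot flusher                      concurrent flusher
 * ----------------                      ------------------
 * start_delalloc_inodes():
 *   set_bit(BTRFS_INODE_TAGGED_FLUSH)
 *   queue filemap_flush()
 *                                       extent_write_cache_pages() with
 *                                       range_whole == 1 but
 *                                       nr_to_write < LONG_MAX: without
 *                                       the nr_to_write check it would
 *                                       test_and_clear the bit, then
 *                                       stop after a partial writeback
 * extent_write_cache_pages():
 *   bit already cleared -> falls back
 *   to plain DIRTY-tag writeback and
 *   the snapshot can livelock again
 */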

Re: [PATCH 3/8] btrfs: Remove extent_io_ops::writepage_end_io_hook

2018-11-02 Thread Nikolay Borisov



On 1.11.18, 21:03, Josef Bacik wrote:
> On Thu, Nov 01, 2018 at 02:09:48PM +0200, Nikolay Borisov wrote:
>> This callback is only ever called for data page writeout, so
>> there is no need to actually abstract it via extent_io_ops. Let's just
>> export it, remove the definition of the callback and call it directly
>> in the functions that invoke the callback. Also rename the function to
>> btrfs_writepage_endio_finish_ordered since what it really does is
>> account finished io in the ordered extent data structures.
>> No functional changes.
>>
>> Signed-off-by: Nikolay Borisov 
> 
> Could you send another cleanup patch to remove the struct extent_state
> *state from the arg list as well?

Indeed, once this series lands I will send a follow-up cleanup since I
have some other ideas around writepage_delalloc code cleanups as well.
Thanks

> 
> Reviewed-by: Josef Bacik 
> 
> Thanks,
> 
> Josef
>
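
(The shape of the cleanup, sketched from the description above; the
exact call sites and arguments are in the patch itself:)

/* Before: a one-implementation hook dispatched through extent_io_ops. */
tree->ops->writepage_end_io_hook(page, start, end, NULL, uptodate);

/* After: a direct call to the exported, renamed function. The unused
 * extent_state argument is what Josef suggests dropping next. */
btrfs_writepage_endio_finish_ordered(page, start, end, NULL, uptodate);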