[PATCH] verify hardening agaist duplicate fsid

2018-10-01 Thread Anand Jain
Its not that impossible to imagine that a device OR a btrfs image is
been copied just by using the dd or the cp command. Which in case both
the copies of the btrfs will have the same fsid. If on the system with
automount enabled, the copied FS gets scanned.

We have a known bug in btrfs, that we let the device path be changed
after the device has been mounted. So using this loop hole the new
copied device would appears as if its mounted immediately after its
been copied.

For example:

Initially.. /dev/mmcblk0p4 is mounted as /

lsblk
NAMEMAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
mmcblk0 179:00 29.2G  0 disk
|-mmcblk0p4 179:404G  0 part /
|-mmcblk0p2 179:20  500M  0 part /boot
|-mmcblk0p3 179:30  256M  0 part [SWAP]
`-mmcblk0p1 179:10  256M  0 part /boot/efi

btrfs fi show
Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
Total devices 1 FS bytes used 1.40GiB
devid1 size 4.00GiB used 3.00GiB path /dev/mmcblk0p4

Copy mmcblk0 to sda
dd if=/dev/mmcblk0 of=/dev/sda

And immediately after the copy completes the change in the device
superblock is notified which the automount scans using
btrfs device scan and the new device sda becomes the mounted root
device.

lsblk
NAMEMAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda   8:01 14.9G  0 disk
|-sda48:414G  0 part /
|-sda28:21  500M  0 part
|-sda38:31  256M  0 part
`-sda18:11  256M  0 part
mmcblk0 179:00 29.2G  0 disk
|-mmcblk0p4 179:404G  0 part
|-mmcblk0p2 179:20  500M  0 part /boot
|-mmcblk0p3 179:30  256M  0 part [SWAP]
`-mmcblk0p1 179:10  256M  0 part /boot/efi
btrfs fi show /
Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
Total devices 1 FS bytes used 1.40GiB
devid1 size 4.00GiB used 3.00GiB path /dev/sda4

The bug is quite nasty that you can't either unmount /dev/sda4 or
/dev/mmcblk0p4. And the problem does not get solved until you take
the sda out of the system on to another system to change its fsid using
the 'btrfstune -u' command.

Signed-off-by: Anand Jain 
---
 tests/btrfs/173 | 70 +
 tests/btrfs/173.out |  5 
 tests/btrfs/group   |  1 +
 3 files changed, 76 insertions(+)
 create mode 100755 tests/btrfs/173
 create mode 100644 tests/btrfs/173.out

diff --git a/tests/btrfs/173 b/tests/btrfs/173
new file mode 100755
index ..c644b07d6f5b
--- /dev/null
+++ b/tests/btrfs/173
@@ -0,0 +1,70 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Oracle. All Rights Reserved.
+#
+# FS QA Test 173
+#
+# Fuzzy test for FS image duplication.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch_dev_pool 2
+_scratch_dev_pool_get 2
+
+dev_foo=$(echo $SCRATCH_DEV_POOL | awk '{print $1}' | rev | cut -d"/" -f1 | 
rev)
+dev_bar=$(echo $SCRATCH_DEV_POOL | awk '{print $2}' | rev | cut -d"/" -f1 | 
rev)
+
+_mkfs_dev /dev/$dev_foo
+_mount /dev/$dev_foo $SCRATCH_MNT
+
+echo mount before btrfs image clone | tee -a $seqres.full
+findmnt /dev/$dev_foo | grep -v TARGET | awk '{print $1" "$2}' | \
+   sed -e "s/$dev_foo/dev_foo/g" | _filter_scratch | tee -a $seqres.full
+findmnt /dev/$dev_bar | grep -v TARGET | awk '{print $1" "$2}' | \
+   sed -e "s/$dev_bar/dev_bar/g" | _filter_scratch | tee -a $seqres.full
+
+for sb_bytenr in 65536 67108864
+do
+   echo -n "dd status=none if=/dev/$dev_foo of=/dev/$dev_bar bs=1 "\
+   "seek=$sb_bytenr skip=$sb_bytenr count=4096" >> $seqres.full
+   dd status=none if=/dev/$dev_foo of=/dev/$dev_bar bs=1 seek=$sb_bytenr \
+   skip=$sb_bytenr count=4096 >> $seqres.full 2>&1
+   echo ..:$? >> $seqres.full
+done
+
+echo mount after btrfs image clone | tee -a $seqres.full
+findmnt /dev/$dev_foo | grep -v TARGET | awk '{print $1" "$2}' | \
+   sed -e "s/$dev_foo/dev_foo/g" | _filter_scratch | tee -a $seqres.full
+findmnt /dev/$dev_bar | grep -v TARGET | awk '{print $1" "$2}' | \
+   sed -e "s/$dev_bar/dev_bar/g" | _filter_scratch | tee -a $seqres.full
+
+_scratch_dev_pool_put
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/173.out b/tests/btrfs/173.out
new file mode 100644
index ..a2ef7a26f4b9
--- /dev/null
+++ b/tests/btrfs/173.out
@@ -0,0 +1,5 @@
+QA output created by 173
+mount before btrfs image clone
+SCRATCH_MNT /dev/dev_foo
+mount after btrfs image clone
+SCRATCH_MNT /dev/dev_foo
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 45782565c3b7..b2f1

[PATCH] fstests: btrfs verify hardening agaist duplicate fsid

2018-10-01 Thread Anand Jain
We have a known bug in btrfs, that we let the device path be changed
after the device has been mounted. So using this loop hole the new
copied device would appears as if its mounted immediately after its
been copied. So this test case reproduces this issue.

For example:

Initially.. /dev/mmcblk0p4 is mounted as /

lsblk
NAMEMAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
mmcblk0 179:00 29.2G  0 disk
|-mmcblk0p4 179:404G  0 part /
|-mmcblk0p2 179:20  500M  0 part /boot
|-mmcblk0p3 179:30  256M  0 part [SWAP]
`-mmcblk0p1 179:10  256M  0 part /boot/efi

btrfs fi show
Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
Total devices 1 FS bytes used 1.40GiB
devid1 size 4.00GiB used 3.00GiB path /dev/mmcblk0p4

Copy mmcblk0 to sda
dd if=/dev/mmcblk0 of=/dev/sda

And immediately after the copy completes the change in the device
superblock is notified which the automount scans using
btrfs device scan and the new device sda becomes the mounted root
device.

lsblk
NAMEMAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda   8:01 14.9G  0 disk
|-sda48:414G  0 part /
|-sda28:21  500M  0 part
|-sda38:31  256M  0 part
`-sda18:11  256M  0 part
mmcblk0 179:00 29.2G  0 disk
|-mmcblk0p4 179:404G  0 part
|-mmcblk0p2 179:20  500M  0 part /boot
|-mmcblk0p3 179:30  256M  0 part [SWAP]
`-mmcblk0p1 179:10  256M  0 part /boot/efi
btrfs fi show /
Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
Total devices 1 FS bytes used 1.40GiB
devid1 size 4.00GiB used 3.00GiB path /dev/sda4

The bug is quite nasty that you can't either unmount /dev/sda4 or
/dev/mmcblk0p4. And the problem does not get solved until you take
the sda out of the system on to another system to change its fsid using
the 'btrfstune -u' command.

Signed-off-by: Anand Jain 
---
 tests/btrfs/173 | 72 +
 tests/btrfs/173.out |  5 
 tests/btrfs/group   |  1 +
 3 files changed, 78 insertions(+)
 create mode 100755 tests/btrfs/173
 create mode 100644 tests/btrfs/173.out

diff --git a/tests/btrfs/173 b/tests/btrfs/173
new file mode 100755
index ..f59a62e206c3
--- /dev/null
+++ b/tests/btrfs/173
@@ -0,0 +1,72 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Oracle. All Rights Reserved.
+#
+# FS QA Test 173
+#
+# Fuzzy test for FS image duplication.
+#  Could be fixed by
+#[patch] btrfs: harden agaist duplicate fsid
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch_dev_pool 2
+_scratch_dev_pool_get 2
+
+dev_foo=$(echo $SCRATCH_DEV_POOL | awk '{print $1}' | rev | cut -d"/" -f1 | 
rev)
+dev_bar=$(echo $SCRATCH_DEV_POOL | awk '{print $2}' | rev | cut -d"/" -f1 | 
rev)
+
+_mkfs_dev /dev/$dev_foo
+_mount /dev/$dev_foo $SCRATCH_MNT
+
+echo mount before btrfs image clone | tee -a $seqres.full
+findmnt /dev/$dev_foo | grep -v TARGET | awk '{print $1" "$2}' | \
+   sed -e "s/$dev_foo/dev_foo/g" | _filter_scratch | tee -a $seqres.full
+findmnt /dev/$dev_bar | grep -v TARGET | awk '{print $1" "$2}' | \
+   sed -e "s/$dev_bar/dev_bar/g" | _filter_scratch | tee -a $seqres.full
+
+for sb_bytenr in 65536 67108864
+do
+   echo -n "dd status=none if=/dev/$dev_foo of=/dev/$dev_bar bs=1 "\
+   "seek=$sb_bytenr skip=$sb_bytenr count=4096" >> $seqres.full
+   dd status=none if=/dev/$dev_foo of=/dev/$dev_bar bs=1 seek=$sb_bytenr \
+   skip=$sb_bytenr count=4096 >> $seqres.full 2>&1
+   echo ..:$? >> $seqres.full
+done
+
+echo mount after btrfs image clone | tee -a $seqres.full
+findmnt /dev/$dev_foo | grep -v TARGET | awk '{print $1" "$2}' | \
+   sed -e "s/$dev_foo/dev_foo/g" | _filter_scratch | tee -a $seqres.full
+findmnt /dev/$dev_bar | grep -v TARGET | awk '{print $1" "$2}' | \
+   sed -e "s/$dev_bar/dev_bar/g" | _filter_scratch | tee -a $seqres.full
+
+_scratch_dev_pool_put
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/173.out b/tests/btrfs/173.out
new file mode 100644
index ..a2ef7a26f4b9
--- /dev/null
+++ b/tests/btrfs/173.out
@@ -0,0 +1,5 @@
+QA output created by 173
+mount before btrfs image clone
+SCRATCH_MNT /dev/dev_foo
+mount after btrfs image clone
+SCRATCH_MNT /dev/dev_foo
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 45782565c3b7..b2f1393f3e97 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -175,3 +175,4 @@
 170 auto quick snapshot
 171 auto quick qgroup
 172 auto quic

[PATCH RFC] btrfs: harden agaist duplicate fsid

2018-10-01 Thread Anand Jain
Its not that impossible to imagine that a device OR a btrfs image is
been copied just by using the dd or the cp command. Which in case both
the copies of the btrfs will have the same fsid. If on the system with
automount enabled, the copied FS gets scanned.

We have a known bug in btrfs, that we let the device path be changed
after the device has been mounted. So using this loop hole the new
copied device would appears as if its mounted immediately after its
been copied.

For example:

Initially.. /dev/mmcblk0p4 is mounted as /

lsblk
NAMEMAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
mmcblk0 179:00 29.2G  0 disk
|-mmcblk0p4 179:404G  0 part /
|-mmcblk0p2 179:20  500M  0 part /boot
|-mmcblk0p3 179:30  256M  0 part [SWAP]
`-mmcblk0p1 179:10  256M  0 part /boot/efi

btrfs fi show
   Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
   Total devices 1 FS bytes used 1.40GiB
   devid1 size 4.00GiB used 3.00GiB path /dev/mmcblk0p4

Copy mmcblk0 to sda
   dd if=/dev/mmcblk0 of=/dev/sda

And immediately after the copy completes the change in the device
superblock is notified which the automount scans using
btrfs device scan and the new device sda becomes the mounted root
device.

lsblk
NAMEMAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda   8:01 14.9G  0 disk
|-sda48:414G  0 part /
|-sda28:21  500M  0 part
|-sda38:31  256M  0 part
`-sda18:11  256M  0 part
mmcblk0 179:00 29.2G  0 disk
|-mmcblk0p4 179:404G  0 part
|-mmcblk0p2 179:20  500M  0 part /boot
|-mmcblk0p3 179:30  256M  0 part [SWAP]
`-mmcblk0p1 179:10  256M  0 part /boot/efi

btrfs fi show /
 Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
 Total devices 1 FS bytes used 1.40GiB
 devid1 size 4.00GiB used 3.00GiB path /dev/sda4

The bug is quite nasty that you can't either unmount /dev/sda4 or
/dev/mmcblk0p4. And the problem does not get solved until you take
sda out of the system on to another system to change its fsid
using the 'btrfstune -u' command.

Signed-off-by: Anand Jain 
---

Hi,

There was previous attempt to fix this bug ref: 
   www.spinics.net/lists/linux-btrfs/msg37466.html

which broke the Ubuntu subvol mount at boot. The reason
for that is, Ubuntu changes the device path in the boot
process, and the earlier fix checked for the device-path
instead of block_device as in here and so we failed the
subvol mount request and thus the bootup process.

I have tested this with Oracle Linux with btrfs as boot device
with a subvol to be mounted at boot. And also have verified
with new test case btrfs/173.

It will be good if someone run this through Ubuntu boot test case.

 fs/btrfs/volumes.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4405e430da6..62173a3abcc4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -850,6 +850,29 @@ static noinline struct btrfs_device *device_list_add(const 
char *path,
return ERR_PTR(-EEXIST);
}
 
+   /*
+* we are going to replace the device path, make sure its the
+* same device if the device mounted
+*/
+   if (device->bdev) {
+   struct block_device *path_bdev;
+
+   path_bdev = lookup_bdev(path);
+   if (IS_ERR(path_bdev)) {
+   mutex_unlock(&fs_devices->device_list_mutex);
+   return ERR_CAST(path_bdev);
+   }
+
+   if (device->bdev != path_bdev) {
+   bdput(path_bdev);
+   mutex_unlock(&fs_devices->device_list_mutex);
+   return ERR_PTR(-EEXIST);
+   }
+   bdput(path_bdev);
+   pr_info("BTRFS: device fsid:devid %pU:%llu old path:%s 
new path:%s\n",
+   disk_super->fsid, devid, 
rcu_str_deref(device->name), path);
+   }
+
name = rcu_string_strdup(path, GFP_NOFS);
if (!name) {
mutex_unlock(&fs_devices->device_list_mutex);
-- 
1.8.3.1



Re: [PLEASE Ignore] [PATCH] verify hardening agaist duplicate fsid

2018-10-01 Thread Anand Jain



Please ignore this patch. Instead help review/integrate [1] which is
the same patch with subject and change log updated.

[1]
[PATCH] fstests: btrfs verify hardening agaist duplicate fsid

Thanks, Anand


Re: [PATCH RFC] btrfs: harden agaist duplicate fsid

2018-10-01 Thread Austin S. Hemmelgarn

On 2018-10-01 04:56, Anand Jain wrote:

Its not that impossible to imagine that a device OR a btrfs image is
been copied just by using the dd or the cp command. Which in case both
the copies of the btrfs will have the same fsid. If on the system with
automount enabled, the copied FS gets scanned.

We have a known bug in btrfs, that we let the device path be changed
after the device has been mounted. So using this loop hole the new
copied device would appears as if its mounted immediately after its
been copied.

For example:

Initially.. /dev/mmcblk0p4 is mounted as /

lsblk
NAMEMAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
mmcblk0 179:00 29.2G  0 disk
|-mmcblk0p4 179:404G  0 part /
|-mmcblk0p2 179:20  500M  0 part /boot
|-mmcblk0p3 179:30  256M  0 part [SWAP]
`-mmcblk0p1 179:10  256M  0 part /boot/efi

btrfs fi show
Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
Total devices 1 FS bytes used 1.40GiB
devid1 size 4.00GiB used 3.00GiB path /dev/mmcblk0p4

Copy mmcblk0 to sda
dd if=/dev/mmcblk0 of=/dev/sda

And immediately after the copy completes the change in the device
superblock is notified which the automount scans using
btrfs device scan and the new device sda becomes the mounted root
device.

lsblk
NAMEMAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda   8:01 14.9G  0 disk
|-sda48:414G  0 part /
|-sda28:21  500M  0 part
|-sda38:31  256M  0 part
`-sda18:11  256M  0 part
mmcblk0 179:00 29.2G  0 disk
|-mmcblk0p4 179:404G  0 part
|-mmcblk0p2 179:20  500M  0 part /boot
|-mmcblk0p3 179:30  256M  0 part [SWAP]
`-mmcblk0p1 179:10  256M  0 part /boot/efi

btrfs fi show /
  Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
  Total devices 1 FS bytes used 1.40GiB
  devid1 size 4.00GiB used 3.00GiB path /dev/sda4

The bug is quite nasty that you can't either unmount /dev/sda4 or
/dev/mmcblk0p4. And the problem does not get solved until you take
sda out of the system on to another system to change its fsid
using the 'btrfstune -u' command.

Signed-off-by: Anand Jain 
---

Hi,

There was previous attempt to fix this bug ref:
www.spinics.net/lists/linux-btrfs/msg37466.html

which broke the Ubuntu subvol mount at boot. The reason
for that is, Ubuntu changes the device path in the boot
process, and the earlier fix checked for the device-path
instead of block_device as in here and so we failed the
subvol mount request and thus the bootup process.

I have tested this with Oracle Linux with btrfs as boot device
with a subvol to be mounted at boot. And also have verified
with new test case btrfs/173.

It will be good if someone run this through Ubuntu boot test case.

  fs/btrfs/volumes.c | 23 +++
  1 file changed, 23 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4405e430da6..62173a3abcc4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -850,6 +850,29 @@ static noinline struct btrfs_device *device_list_add(const 
char *path,
return ERR_PTR(-EEXIST);
}
  
+		/*

+* we are going to replace the device path, make sure its the
+* same device if the device mounted
+*/
+   if (device->bdev) {
+   struct block_device *path_bdev;
+
+   path_bdev = lookup_bdev(path);
+   if (IS_ERR(path_bdev)) {
+   mutex_unlock(&fs_devices->device_list_mutex);
+   return ERR_CAST(path_bdev);
+   }
+
+   if (device->bdev != path_bdev) {
+   bdput(path_bdev);
+   mutex_unlock(&fs_devices->device_list_mutex);
+   return ERR_PTR(-EEXIST);
It would be _really_ nice to have an informative error message printed 
here.  Aside from the possibility of an admin accidentally making a 
block-level copy of the volume, this code triggering could represent an 
attempted attack against the system, so it's arguably something that 
should be reported as happening.  Personally, I think a WARN_ON_ONCE for 
this would make sense, ideally per-volume if possible.

+   }
+   bdput(path_bdev);
+   pr_info("BTRFS: device fsid:devid %pU:%llu old path:%s new 
path:%s\n",
+   disk_super->fsid, devid, 
rcu_str_deref(device->name), path);
+   }
+
name = rcu_string_strdup(path, GFP_NOFS);
if (!name) {
mutex_unlock(&fs_devices->device_list_mutex);





Re: [PATCH RFC] btrfs: harden agaist duplicate fsid

2018-10-01 Thread Anand Jain




On 10/01/2018 07:17 PM, Austin S. Hemmelgarn wrote:

On 2018-10-01 04:56, Anand Jain wrote:

Its not that impossible to imagine that a device OR a btrfs image is
been copied just by using the dd or the cp command. Which in case both
the copies of the btrfs will have the same fsid. If on the system with
automount enabled, the copied FS gets scanned.

We have a known bug in btrfs, that we let the device path be changed
after the device has been mounted. So using this loop hole the new
copied device would appears as if its mounted immediately after its
been copied.

For example:

Initially.. /dev/mmcblk0p4 is mounted as /

lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
mmcblk0 179:0    0 29.2G  0 disk
|-mmcblk0p4 179:4    0    4G  0 part /
|-mmcblk0p2 179:2    0  500M  0 part /boot
|-mmcblk0p3 179:3    0  256M  0 part [SWAP]
`-mmcblk0p1 179:1    0  256M  0 part /boot/efi

btrfs fi show
    Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
    Total devices 1 FS bytes used 1.40GiB
    devid    1 size 4.00GiB used 3.00GiB path /dev/mmcblk0p4

Copy mmcblk0 to sda
    dd if=/dev/mmcblk0 of=/dev/sda

And immediately after the copy completes the change in the device
superblock is notified which the automount scans using
btrfs device scan and the new device sda becomes the mounted root
device.

lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda   8:0    1 14.9G  0 disk
|-sda4    8:4    1    4G  0 part /
|-sda2    8:2    1  500M  0 part
|-sda3    8:3    1  256M  0 part
`-sda1    8:1    1  256M  0 part
mmcblk0 179:0    0 29.2G  0 disk
|-mmcblk0p4 179:4    0    4G  0 part
|-mmcblk0p2 179:2    0  500M  0 part /boot
|-mmcblk0p3 179:3    0  256M  0 part [SWAP]
`-mmcblk0p1 179:1    0  256M  0 part /boot/efi

btrfs fi show /
  Label: none  uuid: 07892354-ddaa-4443-90ea-f76a06accaba
  Total devices 1 FS bytes used 1.40GiB
  devid    1 size 4.00GiB used 3.00GiB path /dev/sda4

The bug is quite nasty that you can't either unmount /dev/sda4 or
/dev/mmcblk0p4. And the problem does not get solved until you take
sda out of the system on to another system to change its fsid
using the 'btrfstune -u' command.

Signed-off-by: Anand Jain 
---

Hi,

There was previous attempt to fix this bug ref:
    www.spinics.net/lists/linux-btrfs/msg37466.html

which broke the Ubuntu subvol mount at boot. The reason
for that is, Ubuntu changes the device path in the boot
process, and the earlier fix checked for the device-path
instead of block_device as in here and so we failed the
subvol mount request and thus the bootup process.

I have tested this with Oracle Linux with btrfs as boot device
with a subvol to be mounted at boot. And also have verified
with new test case btrfs/173.

It will be good if someone run this through Ubuntu boot test case.

  fs/btrfs/volumes.c | 23 +++
  1 file changed, 23 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4405e430da6..62173a3abcc4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -850,6 +850,29 @@ static noinline struct btrfs_device 
*device_list_add(const char *path,

  return ERR_PTR(-EEXIST);
  }
+    /*
+ * we are going to replace the device path, make sure its the
+ * same device if the device mounted
+ */
+    if (device->bdev) {
+    struct block_device *path_bdev;
+
+    path_bdev = lookup_bdev(path);
+    if (IS_ERR(path_bdev)) {
+    mutex_unlock(&fs_devices->device_list_mutex);
+    return ERR_CAST(path_bdev);
+    }
+
+    if (device->bdev != path_bdev) {
+    bdput(path_bdev);
+    mutex_unlock(&fs_devices->device_list_mutex);
+    return ERR_PTR(-EEXIST);
It would be _really_ nice to have an informative error message printed 
here.  Aside from the possibility of an admin accidentally making a 
block-level copy of the volume, 


this code triggering could represent an 
attempted attack against the system, so it's arguably something that 
should be reported as happening.


  Personally, I think a WARN_ON_ONCE for 
this would make sense, ideally per-volume if possible.


 Ah. Will add an warn. Thanks, Anand



+    }
+    bdput(path_bdev);
+    pr_info("BTRFS: device fsid:devid %pU:%llu old path:%s 
new path:%s\n",
+    disk_super->fsid, devid, rcu_str_deref(device->name), 
path);

+    }
+
  name = rcu_string_strdup(path, GFP_NOFS);
  if (!name) {
  mutex_unlock(&fs_devices->device_list_mutex);





cross-fs copy support

2018-10-01 Thread Joshi
I was wondering about the cross-fs copy through copy_file_range.
It seems current implement has below check, that disables such copy.

1577 /* this could be relaxed once a method supports cross-fs copies */
1578 if (inode_in->i_sb != inode_out->i_sb)
1579 return -EXDEV;

May I know what are the thoughts behind disabling cross-fs copy?
Code has the comment "once a method supports", but that leaves me
wondering exactly what 'method' is expected, and from whom.

I disabled the check, and copy across volumes seemed to work fine. At
least for a single file (1G size), with no data mismatch, and faster
speed than regular copy.

-- 
Joshi


[PATCH 01/10] btrfs-progs: Add support for freespace tree in btrfs_read_fs_root

2018-10-01 Thread Nikolay Borisov
For completeness sake add code to btrfs_read_fs_root so that it can
handle the freespace tree.

Signed-off-by: Nikolay Borisov 
---
 disk-io.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/disk-io.c b/disk-io.c
index 2e6d56a36af9..14f0fd5c2f0c 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -668,6 +668,9 @@ struct btrfs_root *btrfs_read_fs_root(struct btrfs_fs_info 
*fs_info,
if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID)
return fs_info->quota_enabled ? fs_info->quota_root :
ERR_PTR(-ENOENT);
+   if (location->objectid == BTRFS_FREE_SPACE_TREE_OBJECTID)
+return fs_info->free_space_root ? fs_info->free_space_root :
+  ERR_PTR(-ENOENT);
 
BUG_ON(location->objectid == BTRFS_TREE_RELOC_OBJECTID ||
   location->offset != (u64)-1);
-- 
2.7.4



[PATCH 02/10] btrfs-progs: Add extent buffer bitmap manipulation infrastructure

2018-10-01 Thread Nikolay Borisov
Those functions are in preparation for adding the freespace tree
repair code since it needs to be able to deal with bitmap based fsts.
This patch adds extent_buffer_bitmap_set and extent_buffer_bitmap_clear
functions. Since in userspace we don't have to deal with page mappings
their implementation is vastly simplified by simply setting each bit in
the passed range.

Signed-off-by: Nikolay Borisov 
---
 extent_io.c | 56 
 extent_io.h |  4 
 2 files changed, 60 insertions(+)

diff --git a/extent_io.c b/extent_io.c
index 198492699438..de47c2c59ae9 100644
--- a/extent_io.c
+++ b/extent_io.c
@@ -204,6 +204,62 @@ static int clear_state_bit(struct extent_io_tree *tree,
return ret;
 }
 
+/**
+ * extent_buffer_bitmap_set - set an area of a bitmap
+ * @eb: the extent buffer
+ * @start: offset of the bitmap item in the extent buffer
+ * @pos: bit number of the first bit
+ * @len: number of bits to set
+ */
+void extent_buffer_bitmap_set(struct extent_buffer *eb, unsigned long start,
+  unsigned long pos, unsigned long len)
+{
+   u8 *p = (u8 *)eb->data + start + BIT_BYTE(pos);
+   const unsigned int size = pos + len;
+   int bits_to_set = BITS_PER_BYTE - (pos % BITS_PER_BYTE);
+   u8 mask_to_set = BITMAP_FIRST_BYTE_MASK(pos);
+
+   while (len >= bits_to_set) {
+   *p |= mask_to_set;
+   len -= bits_to_set;
+   bits_to_set = BITS_PER_BYTE;
+   mask_to_set = ~0;
+   p++;
+   }
+   if (len) {
+   mask_to_set &= BITMAP_LAST_BYTE_MASK(size);
+   *p |= mask_to_set;
+   }
+}
+
+
+/**
+ * extent_buffer_bitmap_clear - clear an area of a bitmap
+ * @eb: the extent buffer
+ * @start: offset of the bitmap item in the extent buffer
+ * @pos: bit number of the first bit
+ * @len: number of bits to clear
+ */
+void extent_buffer_bitmap_clear(struct extent_buffer *eb, unsigned long start,
+unsigned long pos, unsigned long len)
+{
+   u8 *p = (u8 *)eb->data + start + BIT_BYTE(pos);
+   const unsigned int size = pos + len;
+   int bits_to_clear = BITS_PER_BYTE - (pos % BITS_PER_BYTE);
+   u8 mask_to_clear = BITMAP_FIRST_BYTE_MASK(pos);
+
+   while (len >= bits_to_clear) {
+   *p &= ~mask_to_clear;
+   len -= bits_to_clear;
+   bits_to_clear = BITS_PER_BYTE;
+   mask_to_clear = ~0;
+   p++;
+   }
+   if (len) {
+   mask_to_clear &= BITMAP_LAST_BYTE_MASK(size);
+   *p &= ~mask_to_clear;
+   }
+}
 /*
  * clear some bits on a range in the tree.
  */
diff --git a/extent_io.h b/extent_io.h
index d407d93d617e..b67c6fc40e89 100644
--- a/extent_io.h
+++ b/extent_io.h
@@ -175,4 +175,8 @@ int read_data_from_disk(struct btrfs_fs_info *info, void 
*buf, u64 offset,
u64 bytes, int mirror);
 int write_data_to_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
   u64 bytes, int mirror);
+void extent_buffer_bitmap_clear(struct extent_buffer *eb, unsigned long start,
+unsigned long pos, unsigned long len);
+void extent_buffer_bitmap_set(struct extent_buffer *eb, unsigned long start,
+  unsigned long pos, unsigned long len);
 #endif
-- 
2.7.4



[PATCH 03/10] btrfs-progs: Replace homegrown bitops related functions with kernel counterparts

2018-10-01 Thread Nikolay Borisov
Replace existing find_*_bit functions with kernel equivalent. This
reduces duplication, simplifies the code (we really have one worker
function _find_next_bit) and is quite likely faster. No functional
changes.

Signed-off-by: Nikolay Borisov 
---
 kernel-lib/bitops.h | 142 +---
 1 file changed, 46 insertions(+), 96 deletions(-)

diff --git a/kernel-lib/bitops.h b/kernel-lib/bitops.h
index 5b35f9fc5213..78256adf55be 100644
--- a/kernel-lib/bitops.h
+++ b/kernel-lib/bitops.h
@@ -2,6 +2,7 @@
 #define _PERF_LINUX_BITOPS_H_
 
 #include 
+#include "internal.h"
 
 #ifndef DIV_ROUND_UP
 #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
@@ -109,116 +110,65 @@ static __always_inline unsigned long __ffs(unsigned long 
word)
 
 #define ffz(x) __ffs(~(x))
 
+#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 
1))) 
+#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
+
 /*
- * Find the first set bit in a memory region.
+ * This is a common helper function for find_next_bit, find_next_zero_bit, and
+ * find_next_and_bit. The differences are:
+ *  - The "invert" argument, which is XORed with each fetched word before
+ *searching it for one bits.
+ *  - The optional "addr2", which is anded with "addr1" if present.
  */
-static inline unsigned long
-find_first_bit(const unsigned long *addr, unsigned long size)
+static inline unsigned long _find_next_bit(const unsigned long *addr1,
+   const unsigned long *addr2, unsigned long nbits,
+   unsigned long start, unsigned long invert)
 {
-   const unsigned long *p = addr;
-   unsigned long result = 0;
unsigned long tmp;
 
-   while (size & ~(BITS_PER_LONG-1)) {
-   if ((tmp = *(p++)))
-   goto found;
-   result += BITS_PER_LONG;
-   size -= BITS_PER_LONG;
+   if (start >= nbits)
+   return nbits;
+
+   tmp = addr1[start / BITS_PER_LONG];
+   if (addr2)
+   tmp &= addr2[start / BITS_PER_LONG];
+   tmp ^= invert;
+
+   /* Handle 1st word. */
+   tmp &= BITMAP_FIRST_WORD_MASK(start);
+   start = round_down(start, BITS_PER_LONG);
+
+   while (!tmp) {
+   start += BITS_PER_LONG;
+   if (start >= nbits)
+   return nbits;
+
+   tmp = addr1[start / BITS_PER_LONG];
+   if (addr2)
+   tmp &= addr2[start / BITS_PER_LONG];
+   tmp ^= invert;
}
-   if (!size)
-   return result;
-
-   tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
-   if (tmp == 0UL) /* Are any bits set? */
-   return result + size;   /* Nope. */
-found:
-   return result + __ffs(tmp);
+
+   return min(start + __ffs(tmp), nbits);
 }
 
 /*
  * Find the next set bit in a memory region.
  */
-static inline unsigned long
-find_next_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset)
+static inline unsigned long find_next_bit(const unsigned long *addr,
+ unsigned long size,
+ unsigned long offset)
 {
-   const unsigned long *p = addr + BITOP_WORD(offset);
-   unsigned long result = offset & ~(BITS_PER_LONG-1);
-   unsigned long tmp;
-
-   if (offset >= size)
-   return size;
-   size -= result;
-   offset %= BITS_PER_LONG;
-   if (offset) {
-   tmp = *(p++);
-   tmp &= (~0UL << offset);
-   if (size < BITS_PER_LONG)
-   goto found_first;
-   if (tmp)
-   goto found_middle;
-   size -= BITS_PER_LONG;
-   result += BITS_PER_LONG;
-   }
-   while (size & ~(BITS_PER_LONG-1)) {
-   if ((tmp = *(p++)))
-   goto found_middle;
-   result += BITS_PER_LONG;
-   size -= BITS_PER_LONG;
-   }
-   if (!size)
-   return result;
-   tmp = *p;
-
-found_first:
-   tmp &= (~0UL >> (BITS_PER_LONG - size));
-   if (tmp == 0UL) /* Are any bits set? */
-   return result + size;   /* Nope. */
-found_middle:
-   return result + __ffs(tmp);
+   return _find_next_bit(addr, NULL, size, offset, 0UL);
 }
 
-/*
- * This implementation of find_{first,next}_zero_bit was stolen from
- * Linus' asm-alpha/bitops.h.
- */
-static inline unsigned long
-find_next_zero_bit(const unsigned long *addr, unsigned long size,
-  unsigned long offset)
+static inline unsigned long find_next_zero_bit(const unsigned long *addr,
+  unsigned long size,
+  unsigned long offset)
 {
-   const unsigned long *p = addr + BITOP_WORD(offset);
-   unsigned long result = offset & ~(BITS

[PATCH 04/10] btrfs-progs: Implement find_*_bit_le operations

2018-10-01 Thread Nikolay Borisov
This commit introduces explicit little endian bit operations. The only
difference with the existing bitops implementation is that bswap(32|64)
is called when the _le versions are invoked on a big-endian machine.
This is in preparation for adding free space tree conversion support.

Signed-off-by: Nikolay Borisov 
---
 kernel-lib/bitops.h | 82 +
 1 file changed, 82 insertions(+)

diff --git a/kernel-lib/bitops.h b/kernel-lib/bitops.h
index 78256adf55be..5030bfa2815e 100644
--- a/kernel-lib/bitops.h
+++ b/kernel-lib/bitops.h
@@ -2,6 +2,7 @@
 #define _PERF_LINUX_BITOPS_H_
 
 #include 
+#include 
 #include "internal.h"
 
 #ifndef DIV_ROUND_UP
@@ -170,5 +171,86 @@ static inline unsigned long find_next_zero_bit(const 
unsigned long *addr,
 }
 
 #define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
+#define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0)
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+
+static inline unsigned long ext2_swab(const unsigned long y)
+{
+#if BITS_PER_LONG == 64
+   return (unsigned long) bswap64((u64) y);
+#elif BITS_PER_LONG == 32
+   return (unsigned long) bswap32((u32) y);
+#else
+#error BITS_PER_LONG not defined
+#endif
+}
+
+static inline unsigned long _find_next_bit_le(const unsigned long *addr1,
+   const unsigned long *addr2, unsigned long nbits,
+   unsigned long start, unsigned long invert)
+{
+   unsigned long tmp;
+
+   if (start >= nbits)
+   return nbits;
+
+   tmp = addr1[start / BITS_PER_LONG];
+   if (addr2)
+   tmp &= addr2[start / BITS_PER_LONG];
+   tmp ^= invert;
+
+   /* Handle 1st word. */
+   tmp &= ext2_swab(BITMAP_FIRST_WORD_MASK(start));
+   start = round_down(start, BITS_PER_LONG);
+
+   while (!tmp) {
+   start += BITS_PER_LONG;
+   if (start >= nbits)
+   return nbits;
+
+   tmp = addr1[start / BITS_PER_LONG];
+   if (addr2)
+   tmp &= addr2[start / BITS_PER_LONG];
+   tmp ^= invert;
+   }
+
+   return min(start + __ffs(ext2_swab(tmp)), nbits);
+}
+
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+   long size, unsigned long offset)
+{
+   return _find_next_bit_le(addr, NULL, size, offset, ~0UL);
+}
+
+
+unsigned long find_next_bit_le(const void *addr, unsigned
+   long size, unsigned long offset)
+{
+   return _find_next_bit_le(addr, NULL, size, offset, 0UL);
+}
+
+#else
+
+static inline unsigned long find_next_zero_bit_le(const void *addr,
+unsigned long size, unsigned long offset)
+{
+return find_next_zero_bit(addr, size, offset);
+}
+
+static inline unsigned long find_next_bit_le(const void *addr,
+unsigned long size, unsigned long offset)
+{
+return find_next_bit(addr, size, offset);
+}
+
+static inline unsigned long find_first_zero_bit_le(const void *addr,
+unsigned long size)
+{
+return find_first_zero_bit(addr, size);
+}
+
+#endif
 
 #endif
-- 
2.7.4



[PATCH 07/10] btrfs-progs: Add freespace tree as compat_ro supported feature

2018-10-01 Thread Nikolay Borisov
The RO_FREE_SPACE_TREE(_VALID) flags are required in order to be able
to open an FST filesystem in repair mode. Add them to
BTRFS_FEATURE_COMPAT_RO_SUPP.

Signed-off-by: Nikolay Borisov 
---
 ctree.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ctree.h b/ctree.h
index a6d6c3decd87..3c396e7d293d 100644
--- a/ctree.h
+++ b/ctree.h
@@ -497,7 +497,9 @@ struct btrfs_super_block {
  * added here until read-write support for the free space tree is implemented 
in
  * btrfs-progs.
  */
-#define BTRFS_FEATURE_COMPAT_RO_SUPP   0ULL
+#define BTRFS_FEATURE_COMPAT_RO_SUPP   \
+   (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |  \
+BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID)
 
 #define BTRFS_FEATURE_INCOMPAT_SUPP\
(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF | \
-- 
2.7.4



[PATCH 08/10] btrfs-progs: check: Add support for freespace tree fixing

2018-10-01 Thread Nikolay Borisov
Now that all the prerequisite code for proper support of free space
tree repair is in, it's time to wire it in. This is achieved by first
hooking the freespace tree to the __free_extent/alloc_reserved_tree_block
functions. And then introducing a wrapper function to contains the
existing check_space_cache and the newly introduced repair code.
Finally, it's important to note that FST repair code first clears the
existing FST in case of any problem found and rebuilds it from scratch.

Signed-off-by: Nikolay Borisov 
---
 check/main.c | 47 ++-
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/check/main.c b/check/main.c
index b361cd7e26a0..4daf85aad82c 100644
--- a/check/main.c
+++ b/check/main.c
@@ -5392,14 +5392,6 @@ static int check_space_cache(struct btrfs_root *root)
int ret;
int error = 0;
 
-   if (btrfs_super_cache_generation(root->fs_info->super_copy) != -1ULL &&
-   btrfs_super_generation(root->fs_info->super_copy) !=
-   btrfs_super_cache_generation(root->fs_info->super_copy)) {
-   printf("cache and super generation don't match, space cache "
-  "will be invalidated\n");
-   return 0;
-   }
-
while (1) {
ctx.item_count++;
cache = btrfs_lookup_first_block_group(root->fs_info, start);
@@ -9417,7 +9409,6 @@ static int do_clear_free_space_cache(struct btrfs_fs_info 
*fs_info,
ret = 1;
goto close_out;
}
-   printf("Clearing free space cache\n");
ret = clear_free_space_cache(fs_info);
if (ret) {
error("failed to clear free space cache");
@@ -9444,6 +9435,35 @@ static int do_clear_free_space_cache(struct 
btrfs_fs_info *fs_info,
return ret;
 }
 
+static int validate_free_space_cache(struct btrfs_root *root)
+{
+
+   int ret;
+
+   if (btrfs_super_cache_generation(root->fs_info->super_copy) != -1ULL &&
+   btrfs_super_generation(root->fs_info->super_copy) !=
+   btrfs_super_cache_generation(root->fs_info->super_copy)) {
+   printf("cache and super generation don't match, space cache "
+  "will be invalidated\n");
+   return 0;
+   }
+
+   ret = check_space_cache(root);
+   if (ret && btrfs_fs_compat_ro(global_info, FREE_SPACE_TREE)
+   && repair) {
+   ret = do_clear_free_space_cache(global_info, 2);
+   if (ret)
+   goto out;
+
+   ret = btrfs_create_free_space_tree(global_info);
+   if (ret)
+   error("couldn't repair freespace tree");
+   }
+
+out:
+   return ret ? -EINVAL : 0;
+}
+
 const char * const cmd_check_usage[] = {
"btrfs check [options] ",
"Check structural integrity of a filesystem (unmounted).",
@@ -9877,16 +9897,9 @@ int cmd_check(int argc, char **argv)
task_start(ctx.info, &ctx.start_time, &ctx.item_count);
}
 
-   ret = check_space_cache(root);
+   ret = validate_free_space_cache(root);
task_stop(ctx.info);
err |= !!ret;
-   if (ret) {
-   if (is_free_space_tree)
-   error("errors found in free space tree");
-   else
-   error("errors found in free space cache");
-   goto out;
-   }
 
/*
 * We used to have to have these hole extents in between our real
-- 
2.7.4



[PATCH 10/10] btrfs-progs: check: Fix wrong error message in case of corrupted bitmap

2018-10-01 Thread Nikolay Borisov
Similarly to the fix in e444c7bfa65f ("btrfs-progs: check: Fix wrong
error message in case of corrupted extent") this commits addresses the
same problem but for corrupted bitmap objects.

Signed-off-by: Nikolay Borisov 
---
 free-space-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/free-space-tree.c b/free-space-tree.c
index 3b7e8a3fe4f5..690da44a739d 100644
--- a/free-space-tree.c
+++ b/free-space-tree.c
@@ -1302,7 +1302,7 @@ static int load_free_space_bitmaps(struct btrfs_fs_info 
*fs_info,
if (key.objectid + key.offset > end) {
fprintf(stderr,
"free space bitmap ends at %llu, beyond end of block group %llu-%llu\n",
-   key.objectid, start, end);
+   key.objectid + key.offset, start, end);
(*errors)++;
break;
}
-- 
2.7.4



[PATCH 09/10] btrfs-progs: tests: Test for FST corruption detection/repair

2018-10-01 Thread Nikolay Borisov
Simple test case which preps a filesystem, then corrupts the FST and
finally repairs it. Tests both extent based and bitmap based FSTs.

Signed-off-by: Nikolay Borisov 
---
 tests/fsck-tests/036-freespacetree-repair/test.sh | 76 +++
 1 file changed, 76 insertions(+)
 create mode 100755 tests/fsck-tests/036-freespacetree-repair/test.sh

diff --git a/tests/fsck-tests/036-freespacetree-repair/test.sh 
b/tests/fsck-tests/036-freespacetree-repair/test.sh
new file mode 100755
index ..8c91ce9dedbc
--- /dev/null
+++ b/tests/fsck-tests/036-freespacetree-repair/test.sh
@@ -0,0 +1,76 @@
+#!/bin/bash
+# Corrupt a filesystem that is using freespace tree and then ensure that
+# btrfs check is able to repair it. This tests correct detection/repair of
+# both a FREE_SPACE_EXTENT based FST and a FREE_SPACE_BITMAP based FST.
+
+source "$TEST_TOP/common"
+
+# wrapper for btrfs-corrupt-item
+# $1: Type of item we want to corrupt - extent or bitmap
+corrupt_fst_item()
+{
+   local type
+   local objectid
+   local offset
+   type="$1"
+
+   if [[ $type == "bitmap" ]]; then
+   type=200
+   objectid=$("$TOP/btrfs" inspect-internal dump-tree -t 10 
"$TEST_DEV" | \
+   grep -o "[[:digit:]]* FREE_SPACE_BITMAP [[:digit:]]*" | 
\
+   cut -d' ' -f1 | tail -2 | head -1)
+   offset=$("$TOP/btrfs" inspect-internal dump-tree -t 10 
"$TEST_DEV" | \
+   grep -o "[[:digit:]]* FREE_SPACE_BITMAP [[:digit:]]*" | 
\
+   cut -d' ' -f3 |tail -2 | head -1)
+   echo "Corrupting $objectid,FREE_SPACE_BITMAP,$offset" >> 
"$RESULTS"
+   elif [[ $type == "extent" ]]; then
+   type=199
+   objectid=$("$TOP/btrfs" inspect-internal dump-tree -t 10 
"$TEST_DEV" | \
+   grep -o "[[:digit:]]* FREE_SPACE_EXTENT [[:digit:]]*" | 
\
+   cut -d' ' -f1 | tail -2 | head -1)
+   offset=$("$TOP/btrfs" inspect-internal dump-tree -t 10 
"$TEST_DEV" | \
+   grep -o "[[:digit:]]* FREE_SPACE_EXTENT [[:digit:]]*" | 
\
+   cut -d' ' -f3 | tail -2 | head -1)
+   echo "Corrupting $objectid,FREE_SPACE_EXTENT,$offset" >> 
"$RESULTS"
+   else
+   _fail "Unknown item type for corruption"
+   fi
+
+   run_check "$TOP/btrfs-corrupt-block" -r 10 -K "$objectid,$type,$offset" 
\
+   -f offset "$TEST_DEV"
+}
+
+check_prereq btrfs
+check_prereq mkfs.btrfs
+check_global_prereq grep
+check_global_prereq tail
+check_global_prereq head
+check_global_prereq cut
+
+setup_root_helper
+prepare_test_dev 256M
+
+run_check "$TOP/mkfs.btrfs" -n 4k -f "$TEST_DEV"
+run_check_mount_test_dev -oclear_cache,space_cache=v2
+
+#create files which will populate the FST
+for i in {1..3000}; do
+   fallocate -l 4k "$TEST_MNT/file.$i"
+done
+
+run_check_umount_test_dev
+
+#now corrupt one of the bitmap items
+corrupt_fst_item "bitmap"
+check_image "$TEST_DEV"
+
+# change the freespace such that we now have at least one free_space_extent
+# object
+run_check_mount_test_dev
+rm -rf "$TEST_MNT/file.*"
+fallocate -l 50m "$TEST_MNT/file"
+run_check_umount_test_dev
+
+#now corrupt an extent
+corrupt_fst_item "extent"
+check_image "$TEST_DEV"
-- 
2.7.4



[PATCH 05/10] btrfs-progs: Pull free space tree related code from kernel

2018-10-01 Thread Nikolay Borisov
To help implement free space tree checker in user space some kernel
function are necessary, namely iterating/deleting/adding freespace
items, some internal search functions. Functions to populate a block
group based on the extent tree. The code is largely copy/paste from
the kernel with locking eliminated (i.e free_space_lock). It supports
reading/writing of both bitmap and extent based FST trees.

Signed-off-by: Nikolay Borisov 
---
 ctree.c   |   77 
 ctree.h   |   15 +
 free-space-tree.c | 1253 -
 free-space-tree.h |   13 +-
 kerncompat.h  |6 +
 5 files changed, 1357 insertions(+), 7 deletions(-)

diff --git a/ctree.c b/ctree.c
index d8a6883aa85f..aa1568620205 100644
--- a/ctree.c
+++ b/ctree.c
@@ -1226,6 +1226,83 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, 
struct btrfs_root
 }
 
 /*
+ * helper to use instead of search slot if no exact match is needed but
+ * instead the next or previous item should be returned.
+ * When find_higher is true, the next higher item is returned, the next lower
+ * otherwise.
+ * When return_any and find_higher are both true, and no higher item is found,
+ * return the next lower instead.
+ * When return_any is true and find_higher is false, and no lower item is 
found,
+ * return the next higher instead.
+ * It returns 0 if any item is found, 1 if none is found (tree empty), and
+ * < 0 on error
+ */
+int btrfs_search_slot_for_read(struct btrfs_root *root,
+   const struct btrfs_key *key,
+   struct btrfs_path *p, int find_higher,
+   int return_any)
+{
+int ret;
+struct extent_buffer *leaf;
+
+again:
+ret = btrfs_search_slot(NULL, root, key, p, 0, 0);
+if (ret <= 0)
+return ret;
+/*
+ * a return value of 1 means the path is at the position where the
+ * item should be inserted. Normally this is the next bigger item,
+ * but in case the previous item is the last in a leaf, path points
+ * to the first free slot in the previous leaf, i.e. at an invalid
+ * item.
+ */
+leaf = p->nodes[0];
+
+if (find_higher) {
+if (p->slots[0] >= btrfs_header_nritems(leaf)) {
+ret = btrfs_next_leaf(root, p);
+if (ret <= 0)
+return ret;
+if (!return_any)
+return 1;
+/*
+ * no higher item found, return the next
+ * lower instead
+ */
+return_any = 0;
+find_higher = 0;
+btrfs_release_path(p);
+goto again;
+}
+} else {
+if (p->slots[0] == 0) {
+ret = btrfs_prev_leaf(root, p);
+if (ret < 0)
+return ret;
+if (!ret) {
+leaf = p->nodes[0];
+if (p->slots[0] == btrfs_header_nritems(leaf))
+p->slots[0]--;
+return 0;
+}
+if (!return_any)
+return 1;
+/*
+ * no lower item found, return the next
+ * higher instead
+ */
+return_any = 0;
+find_higher = 1;
+btrfs_release_path(p);
+goto again;
+} else {
+--p->slots[0];
+}
+}
+return 0;
+}
+
+/*
  * adjust the pointers going up the tree, starting at level
  * making sure the right key of each node is points to 'key'.
  * This is used after shifting pointers to the left, so it stops
diff --git a/ctree.h b/ctree.h
index 49f0f5181512..a6d6c3decd87 100644
--- a/ctree.h
+++ b/ctree.h
@@ -1071,6 +1071,17 @@ struct btrfs_block_group_cache {
u64 flags;
int cached;
int ro;
+   /*
+ * If the free space extent count exceeds this number, convert the 
block
+ * group to bitmaps.
+ */
+u32 bitmap_high_thresh;
+/*
+ * If the free space extent count drops below this number, convert the
+ * block group back to extents.
+ */
+u32 bitmap_low_thresh;
+
 };
 
 struct btrfs_device;
@@ -2596,6 +2607,10 @@ int btrfs_split_item(struct btrfs_trans_handle *trans,
 int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root
  *root, struct btrfs_key *key, struct btrfs_path *p, int
  ins_len, int

[PATCH 06/10] btrfs-progs: Hook FST code in extent (de)alloc

2018-10-01 Thread Nikolay Borisov
For now this doesn't change the functionality since FST code is not
yet enabled via the compat bits. But this will be needed when it's
enabled so that the FST is correctly modified during repair operations
that allocate/deallocate  extents.

Signed-off-by: Nikolay Borisov 
---
 extent-tree.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/extent-tree.c b/extent-tree.c
index b9a30644720b..7adda557f8e6 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -29,6 +29,7 @@
 #include "crc32c.h"
 #include "volumes.h"
 #include "free-space-cache.h"
+#include "free-space-tree.h"
 #include "utils.h"
 
 #define PENDING_EXTENT_INSERT 0
@@ -2276,6 +2277,11 @@ static int __free_extent(struct btrfs_trans_handle 
*trans,
BUG_ON(ret);
}
 
+   ret = add_to_free_space_tree(trans, bytenr, num_bytes);
+   if (ret) {
+   goto fail;
+   }
+
update_block_group(trans->fs_info, bytenr, num_bytes, 0,
   mark_free);
}
@@ -2595,6 +2601,11 @@ static int alloc_reserved_tree_block(struct 
btrfs_trans_handle *trans,
btrfs_mark_buffer_dirty(leaf);
btrfs_free_path(path);
 
+   ret = remove_from_free_space_tree(trans, ins.objectid,
+ fs_info->nodesize);
+   if (ret)
+   return ret;
+
ret = update_block_group(fs_info, ins.objectid, fs_info->nodesize, 1,
 0);
 
-- 
2.7.4



[PATCH 00/10] Freespace tree repair support v2

2018-10-01 Thread Nikolay Borisov
Hello, 

Here is the v2 of the freespace tree repair support patches. Version 1 can be 
found at [0]. For background on the series please refer to the initial cover 
letter. This time round a number of changes have been incorporated based on 
feedback from Omar. Namely: 

1. Added 2 new patches, patches 3 and 4,  which refactor the userspace
impelementation of various bit manipulation functions. This was required to
implement the bitmap conversion code from kernel space. 

2. Extended patch 5 to include the bitmap conversion code, now userspace 
handles the case of fragmented fst. 

3. Re-ordered patches 6 and 7 so that we first hook into the extent 
alloc/dealloc sequence (patch 6) and finally enable support for fst in patch 7.

4. Updated the FST test (patch 9) to work with the new bitmap conversion code 
since it changed the layout of the expected freespace tree. Now the test 
passes again. Also removed an unused function. 

5. Added patch 10 which fixes a similar bug to e444c7bfa65f ("btrfs-progs:
check: Fix wrong error message in case of corrupted extent")

Following the updates, I rerun all misc/fsck-tests and no failures were 
observed. 

More feedback and suggestions are always welcomed.

[0] 
https://lore.kernel.org/linux-btrfs/1529060762-4372-1-git-send-email-nbori...@suse.com/


Nikolay Borisov (10):
  btrfs-progs: Add support for freespace tree in btrfs_read_fs_root
  btrfs-progs: Add extent buffer bitmap manipulation infrastructure
  btrfs-progs: Replace homegrown bitops related functions with kernel
counterparts
  btrfs-progs: Implement find_*_bit_le operations
  btrfs-progs: Pull free space tree related code from kernel
  btrfs-progs: Hook FST code in extent (de)alloc
  btrfs-progs: Add freespace tree as compat_ro supported feature
  btrfs-progs: check: Add support for freespace tree fixing
  btrfs-progs: tests: Test for FST corruption detection/repair
  btrfs-progs: check: Fix wrong error message in case of corrupted
bitmap

 check/main.c  |   47 +-
 ctree.c   |   77 ++
 ctree.h   |   19 +-
 disk-io.c |3 +
 extent-tree.c |   11 +
 extent_io.c   |   56 +
 extent_io.h   |4 +
 free-space-tree.c | 1255 -
 free-space-tree.h |   13 +-
 kerncompat.h  |6 +
 kernel-lib/bitops.h   |  210 ++--
 tests/fsck-tests/036-freespacetree-repair/test.sh |   76 ++
 12 files changed, 1662 insertions(+), 115 deletions(-)
 create mode 100755 tests/fsck-tests/036-freespacetree-repair/test.sh

-- 
2.7.4



Re: cross-fs copy support

2018-10-01 Thread Qu Wenruo


On 2018/10/1 下午10:32, Joshi wrote:
> I was wondering about the cross-fs copy through copy_file_range.

The term "cross-fs" looks pretty confusing.

If you mean "cross-subvolume", then it should work without problem in btrfs.

If you mean reflink across two different file systems (not matter the
same fs type or not).
Then it's impossible to work.

Reflink (clone_file_range) works by inserting data pointers into the
filesystem other than really copying the data.
Thus if the source is outside of the fs, it's really impossible to work,
as the source pointer/data is completely out of control of the dest fs.

> It seems current implement has below check, that disables such copy.
> 
> 1577 /* this could be relaxed once a method supports cross-fs copies 
> */
> 1578 if (inode_in->i_sb != inode_out->i_sb)
> 1579 return -EXDEV;
> 
> May I know what are the thoughts behind disabling cross-fs copy?
> Code has the comment "once a method supports", but that leaves me
> wondering exactly what 'method' is expected, and from whom.
> 
> I disabled the check, and copy across volumes seemed to work fine. At
> least for a single file (1G size), with no data mismatch, and faster
> speed than regular copy.

Please provide the steps or script about how you did the reflink, in
case I misunderstand your "cross-fs" definition.

And just in case you're really doing cross filesystem reflink, please
also run "btrfs check" on both fs.

Thanks,
Qu




signature.asc
Description: OpenPGP digital signature


Re: cross-fs copy support

2018-10-01 Thread Joshi
Sorry if my post was not clear enough. I picked the term "cross-fs"
from the implementation of vfs_copy_file_range.
Below snippet, inside vfs_copy_file_range -
1578 /* this could be relaxed once a method supports cross-fs copies */
1579 if (inode_in->i_sb != inode_out->i_sb)
1580 return -EXDEV;

And I was not trying reflink for copy, rather I just used example code
for copy_file_range syscall -
http://man7.org/linux/man-pages/man2/copy_file_range.2.html
This code creates a new file first, and later issues "copy_file_range"
for copying. I was trying this sort of copy among files residing on
btrfs, ext4, and xfs.
Although vfs_copy_file_range internally uses clone_file_range (which
would be true for btrfs and xfs).
On Mon, Oct 1, 2018 at 8:18 PM Qu Wenruo  wrote:
>
>
>
> On 2018/10/1 下午10:32, Joshi wrote:
> > I was wondering about the cross-fs copy through copy_file_range.
>
> The term "cross-fs" looks pretty confusing.
>
> If you mean "cross-subvolume", then it should work without problem in btrfs.
>
> If you mean reflink across two different file systems (not matter the
> same fs type or not).
> Then it's impossible to work.
>
> Reflink (clone_file_range) works by inserting data pointers into the
> filesystem other than really copying the data.
> Thus if the source is outside of the fs, it's really impossible to work,
> as the source pointer/data is completely out of control of the dest fs.
>
> > It seems current implement has below check, that disables such copy.
> >
> > 1577 /* this could be relaxed once a method supports cross-fs 
> > copies */
> > 1578 if (inode_in->i_sb != inode_out->i_sb)
> > 1579 return -EXDEV;
> >
> > May I know what are the thoughts behind disabling cross-fs copy?
> > Code has the comment "once a method supports", but that leaves me
> > wondering exactly what 'method' is expected, and from whom.
> >
> > I disabled the check, and copy across volumes seemed to work fine. At
> > least for a single file (1G size), with no data mismatch, and faster
> > speed than regular copy.
>
> Please provide the steps or script about how you did the reflink, in
> case I misunderstand your "cross-fs" definition.
>
> And just in case you're really doing cross filesystem reflink, please
> also run "btrfs check" on both fs.
>
> Thanks,
> Qu
>
>


-- 
Joshi


Re: cross-fs copy support

2018-10-01 Thread Eric Sandeen
On 10/1/18 9:48 AM, Qu Wenruo wrote:
> 
> 
> On 2018/10/1 下午10:32, Joshi wrote:
>> I was wondering about the cross-fs copy through copy_file_range.
> 
> The term "cross-fs" looks pretty confusing.
> 
> If you mean "cross-subvolume", then it should work without problem in btrfs.
> 
> If you mean reflink across two different file systems (not matter the
> same fs type or not).
> Then it's impossible to work.

I believe Joshi is talking about vfs_copy_file_range() not
vfs_clone_file range(), although _copy_ does call _clone_ if it can.

> Reflink (clone_file_range) works by inserting data pointers into the
> filesystem other than really copying the data.
> Thus if the source is outside of the fs, it's really impossible to work,
> as the source pointer/data is completely out of control of the dest fs.

Yes, I would expect there to be problems with his modified kernel
for a filesystem that supports clone_file_range, because
vfs_copy_file_range() will clone if possible, and this should fail across
filesystems.

In general, though, I don't know for sure why we don't fall back to
do_splice_direct() across filesystems, although the filesystems that
implement their own ->copy_file_range ops may have their own,
further restrictions within their implementations.

This call /is/ documented in the manpage as only being valid for
files on the same filesystem, though:
http://man7.org/linux/man-pages/man2/copy_file_range.2.html

-Eric



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 08/42] btrfs: dump block_rsv whe dumping space info

2018-10-01 Thread David Sterba
On Fri, Sep 28, 2018 at 07:17:47AM -0400, Josef Bacik wrote:
> For enospc_debug having the block rsvs is super helpful to see if we've
> done something wrong.
> 
> Signed-off-by: Josef Bacik 
> Reviewed-by: Omar Sandoval 

Reviewed-by: David Sterba 


Re: [PATCH 10/42] btrfs: protect space cache inode alloc with nofs

2018-10-01 Thread David Sterba
On Fri, Sep 28, 2018 at 07:17:49AM -0400, Josef Bacik wrote:
> If we're allocating a new space cache inode it's likely going to be
> under a transaction handle, so we need to use memalloc_nofs_save() in
> order to avoid deadlocks, and more importantly lockdep messages that
> make xfstests fail.
> 
> Reviewed-by: Omar Sandoval 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 


Re: [PATCH 36/42] btrfs: wait on caching when putting the bg cache

2018-10-01 Thread David Sterba
On Fri, Sep 28, 2018 at 07:18:15AM -0400, Josef Bacik wrote:
> While testing my backport I noticed there was a panic if I ran
> generic/416 generic/417 generic/418 all in a row.  This just happened to
> uncover a race where we had outstanding IO after we destroy all of our
> workqueues, and then we'd go to queue the endio work on those free'd
> workqueues.  This is because we aren't waiting for the caching threads
> to be done before freeing everything up, so to fix this make sure we
> wait on any outstanding caching that's being done before we free up the
> block group, so we're sure to be done with all IO by the time we get to
> btrfs_stop_all_workers().  This fixes the panic I was seeing
> consistently in testing.

Can you please attach the stacktrace?

I was running the tests 416/417/418 in a loop for a few days and did not
reproduce the crash (in a VM) so this depends on the setup.


Re: cross-fs copy support

2018-10-01 Thread Andreas Dilger
On Oct 1, 2018, at 9:49 AM, Eric Sandeen  wrote:
> 
> 
> On 10/1/18 9:48 AM, Qu Wenruo wrote:
>> 
>> 
>> On 2018/10/1 下午10:32, Joshi wrote:
>>> I was wondering about the cross-fs copy through copy_file_range.
>> 
>> The term "cross-fs" looks pretty confusing.
>> 
>> If you mean "cross-subvolume", then it should work without problem in btrfs.
>> 
>> If you mean reflink across two different file systems (not matter the
>> same fs type or not).
>> Then it's impossible to work.
> 
> I believe Joshi is talking about vfs_copy_file_range() not
> vfs_clone_file range(), although _copy_ does call _clone_ if it can.
> 
>> Reflink (clone_file_range) works by inserting data pointers into the
>> filesystem other than really copying the data.
>> Thus if the source is outside of the fs, it's really impossible to work,
>> as the source pointer/data is completely out of control of the dest fs.
> 
> Yes, I would expect there to be problems with his modified kernel
> for a filesystem that supports clone_file_range, because
> vfs_copy_file_range() will clone if possible, and this should fail across
> filesystems.
> 
> In general, though, I don't know for sure why we don't fall back to
> do_splice_direct() across filesystems, although the filesystems that
> implement their own ->copy_file_range ops may have their own,
> further restrictions within their implementations.
> 
> This call /is/ documented in the manpage as only being valid for
> files on the same filesystem, though:
> http://man7.org/linux/man-pages/man2/copy_file_range.2.html

There was a patch to allow cross-mount copy for NFS, but it hasn't landed
yet.

Cheers, Andreas







signature.asc
Description: Message signed with OpenPGP


Re: [patch] file dedupe (and maybe clone) data corruption (was Re: [PATCH] generic: test for deduplication between different files)

2018-10-01 Thread Andreas Dilger
On Sep 20, 2018, at 10:40 PM, Zygo Blaxell  
wrote:
> 
> On Fri, Sep 21, 2018 at 12:59:31PM +1000, Dave Chinner wrote:
>> On Wed, Sep 19, 2018 at 12:12:03AM -0400, Zygo Blaxell wrote:
> [...]
>> With no DMAPI in the future, people with custom HSM-like interfaces
>> based on dmapi are starting to turn to fanotify and friends to
>> provide them with the change notifications they require
> 
> I had a fanotify-based scanner once, before I noticed btrfs effectively
> had timestamps all over its metadata.
> 
> fanotify won't tell me which parts of a file were modified (unless it
> got that feature in the last few years?).  fanotify was pretty useless
> when the only file on the system that was being modified was a 13TB
> VM image.  Or even a little 16GB one.  Has to scan the whole file to
> find the one new byte.  Even on desktops the poor thing spends most of
> its time looping over /var/log/messages.  It was sad.
> 
> If fanotify gave me (inode, offset, length) tuples of dirty pages in
> cache, I could look them up and use a dedupe_file_range call to replace
> the dirty pages with a reference to an existing disk block.  If my
> listener can do that fast enough, it's in-band dedupe; if it doesn't,
> the data gets flushed to disk as normal, and I fall back to a scan of
> the filesystem to clean it up later.
> 
 e.g. a soft requirement is that we need to scan the entire fs at
 least once a month.
>>> 
>>> I have to scan and dedupe multiple times per hour.  OK, the first-ever
>>> scan of a non-empty filesystem is allowed to take much longer, but after
>>> that, if you have enough spare iops for continuous autodefrag you should
>>> also have spare iops for continuous dedupe.
>> 
>> Yup, but using notifications avoids the for even these scans - you'd
>> know exactly what data has changed, when it changed, and know
>> exactly that you needed to read to calculate the new hashes.
> 
> ...if the scanner can keep up with the notifications; otherwise, the
> notification receiver has to log them somewhere for the scanner to
> catch up.  If there are missed or dropped notifications--or 23 hours a
> day we're not listening for notifications because we only have an hour
> a day maintenance window--some kind of filesystem scan has to be done
> after the fact anyway.

It is worthwhile to mention that Lustre has a persistent Changelog record
that is generated atomically with the filesystem transaction that the event
happened in.

Once there is a Changelog consumer that registers itself with the filesystem,
along with a mask of the event types that it is interested in, the Changelog
begins recording all such events to disk (e.g. create, mkdir, setattr, etc.).
The Changelog consumer periodically notifies the filesystem when it has
processed events up to X, so that it can purge old events from the log.  It
is possible to have multiple consumers registered, and the log is only purged
up to the slowest consumer.

If a consumer hasn't processed logs in some (relatively long) time (e.g. many
days or weeks), or if the filesystem is otherwise going to run out of space,
then the consumer is deregistered and the old log records are cleaned up.  This
also notifies the consumer that it is is no longer active, and it has to do a
full scan to update its state for the events that it missed.

Having a persistent changelog is useful for all kinds of event processing,
and avoids the need to do real-time processing.  If the userspace daemon fails,
or the system is restarted, etc. then there is no need to rescan the whole
filesystem, which is important when there are many billions of files therein.


Cheers, Andreas







signature.asc
Description: Message signed with OpenPGP


Re: btrfs send receive ERROR: chown failed: No such file or directory

2018-10-01 Thread Leonard Lausen
Hello,

does anyone have an idea about below issue? It is a severe issue as it
renders btrfs send / receive dysfunctional and it is not clear if there
may be a data corruption issue hiding in the current send / receive
code.

Thank you.

Best regards
Leonard

Leonard Lausen  writes:
> Hello!
>
> I observe the following issue with btrfs send | btrfs receive in a setup
> with 2 machines and 3 btrfs file-systems. All machines run Linux 4.18.9.
> Machine 1 runs btrfs-progs 4.17.1, machine 2 runs btrfs-progs 4.17 (via
> https://packages.debian.org/stretch-backports/btrfs-progs).
>
> 1) Machine 1 takes regular snapshots and sends them to machine 2. btrfs 
>btrfs send ... | ssh user@machine2 "btrfs receive /path1"
> 2) Machine 2 backups all subvolumes stored at /path1 to a second
>independent btrfs filesystem. Let /path1/rootsnapshot be the first
>snapshot stored at /path1 (ie. it has no Parent UUID). Let
>/path1/incrementalsnapshot be a snapshot that has /path1/rootsnapshot
>as a parent. Then
>btrfs send -v /path1/rootsnapshot | btrfs receive /path2
>works without issues, but
>btrfs send -v -p /path1/rootsnapshot /path1/incrementalsnapshot | btrfs 
> receive /path2
>fails as follows:
>ERROR: chown o257-4639416-0 failed: No such file or directory
>
> No error is shown in dmesg. /path1 and /path2 denote two independent
> btrfs filesystems.
>
> Note that there was no issue with transferring incrementalsnapshot from
> machine 1 to machine 2. No error is shown in dmesg.
>
> Best regards
> Leonard


Re: [PATCH] btrfs: list usage cleanup

2018-10-01 Thread David Sterba
On Thu, Sep 27, 2018 at 11:47:04AM -0700, Omar Sandoval wrote:
> On Wed, Sep 26, 2018 at 04:35:45PM +0800, zhong jiang wrote:
> > Trival cleanup, list_move_tail will implement the same function that
> > list_del() + list_add_tail() will do. hence just replace them.
> > 
> > Signed-off-by: zhong jiang 
> > ---
> >  fs/btrfs/send.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > index 094cc144..d87f416 100644
> > --- a/fs/btrfs/send.c
> > +++ b/fs/btrfs/send.c
> > @@ -2075,8 +2075,7 @@ static struct name_cache_entry 
> > *name_cache_search(struct send_ctx *sctx,
> >   */
> >  static void name_cache_used(struct send_ctx *sctx, struct name_cache_entry 
> > *nce)
> >  {
> > -   list_del(&nce->list);
> > -   list_add_tail(&nce->list, &sctx->name_cache_list);
> > +   list_move_tail(&nce->list, &sctx->name_cache_list);
> >  }
> 
> At that point do we even need such a trivial helper, considering that
> this is only called in one place?

Fair point and trivial one-line helpers are on the cleanup todo list.
The exception is when the actual helper implementation is obscuring the
semantics and the helper is used in many places so it's not practical to
add a comment everywhere. But it's not the case here.

Zhong Jiang, please update the patch and resend, thanks.