Re: Recovery from full metadata with all device space consumed?

2018-04-19 Thread Hugo Mills
On Thu, Apr 19, 2018 at 04:12:39PM -0700, Drew Bloechl wrote:
> On Thu, Apr 19, 2018 at 10:43:57PM +, Hugo Mills wrote:
> >Given that both data and metadata levels here require paired
> > chunks, try adding _two_ temporary devices so that it can allocate a
> > new block group.
> 
> Thank you very much, that seems to have done the trick:
> 
> # fallocate -l 4GiB /var/tmp/btrfs-temp-1
> # fallocate -l 4GiB /var/tmp/btrfs-temp-2
> # losetup -f /var/tmp/btrfs-temp-1
> # losetup -f /var/tmp/btrfs-temp-2
> # btrfs device add /dev/loop0 /broken
> Performing full device TRIM (4.00GiB) ...
> # btrfs device add /dev/loop1 /broken
> Performing full device TRIM (4.00GiB) ...
> # btrfs balance start -v -dusage=1 /broken
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=1

   Excellent. Don't forget to "btrfs dev delete" the devices after
you're finished the balance. You could damage the FS (possibly
irreparably) if you destroy the devices without doing so.

> I'm guessing that'll take a while to complete, but meanwhile, in another
> terminal:
> 
> # btrfs fi show /broken
> Label: 'mon_data'  uuid: 85e52555-7d6d-4346-8b37-8278447eb590
>   Total devices 6 FS bytes used 69.53GiB
>   devid1 size 931.51GiB used 731.02GiB path /dev/sda1
>   devid2 size 931.51GiB used 731.02GiB path /dev/sdb1
>   devid3 size 931.51GiB used 730.03GiB path /dev/sdc1
>   devid4 size 931.51GiB used 730.03GiB path /dev/sdd1
>   devid5 size 4.00GiB used 1.00GiB path /dev/loop0
>   devid6 size 4.00GiB used 1.00GiB path /dev/loop1
> 
> # btrfs fi df /broken
> Data, RAID0: total=2.77TiB, used=67.00GiB
> System, RAID1: total=8.00MiB, used=192.00KiB
> Metadata, RAID1: total=4.00GiB, used=2.49GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> Do I understand correctly that this could require up to 3 extra devices,
> if for instance you arrived in this situation with a RAID6 data profile?
> Or is the number even higher for profiles like RAID10?

   The minimum number of devices for each RAID level is:

single, DUP: 1
RAID-0, -1, -5:  2
RAID-6:  3
RAID-10: 4

   Hugo.

-- 
Hugo Mills | Gentlemen! You can't fight here! This is the War
hugo@... carfax.org.uk | Room!
http://carfax.org.uk/  |
PGP: E2AB1DE4  |Dr Strangelove


signature.asc
Description: Digital signature


Re: Recovery from full metadata with all device space consumed?

2018-04-19 Thread Drew Bloechl
On Thu, Apr 19, 2018 at 10:43:57PM +, Hugo Mills wrote:
>Given that both data and metadata levels here require paired
> chunks, try adding _two_ temporary devices so that it can allocate a
> new block group.

Thank you very much, that seems to have done the trick:

# fallocate -l 4GiB /var/tmp/btrfs-temp-1
# fallocate -l 4GiB /var/tmp/btrfs-temp-2
# losetup -f /var/tmp/btrfs-temp-1
# losetup -f /var/tmp/btrfs-temp-2
# btrfs device add /dev/loop0 /broken
Performing full device TRIM (4.00GiB) ...
# btrfs device add /dev/loop1 /broken
Performing full device TRIM (4.00GiB) ...
# btrfs balance start -v -dusage=1 /broken
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=1

I'm guessing that'll take a while to complete, but meanwhile, in another
terminal:

# btrfs fi show /broken
Label: 'mon_data'  uuid: 85e52555-7d6d-4346-8b37-8278447eb590
Total devices 6 FS bytes used 69.53GiB
devid1 size 931.51GiB used 731.02GiB path /dev/sda1
devid2 size 931.51GiB used 731.02GiB path /dev/sdb1
devid3 size 931.51GiB used 730.03GiB path /dev/sdc1
devid4 size 931.51GiB used 730.03GiB path /dev/sdd1
devid5 size 4.00GiB used 1.00GiB path /dev/loop0
devid6 size 4.00GiB used 1.00GiB path /dev/loop1

# btrfs fi df /broken
Data, RAID0: total=2.77TiB, used=67.00GiB
System, RAID1: total=8.00MiB, used=192.00KiB
Metadata, RAID1: total=4.00GiB, used=2.49GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Do I understand correctly that this could require up to 3 extra devices,
if for instance you arrived in this situation with a RAID6 data profile?
Or is the number even higher for profiles like RAID10?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Recovery from full metadata with all device space consumed?

2018-04-19 Thread Hugo Mills
On Thu, Apr 19, 2018 at 03:08:48PM -0700, Drew Bloechl wrote:
> I've got a btrfs filesystem that I can't seem to get back to a useful
> state. The symptom I started with is that rename() operations started
> dying with ENOSPC, and it looks like the metadata allocation on the
> filesystem is full:
> 
> # btrfs fi df /broken
> Data, RAID0: total=3.63TiB, used=67.00GiB
> System, RAID1: total=8.00MiB, used=224.00KiB
> Metadata, RAID1: total=3.00GiB, used=2.50GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> All of the consumable space on the backing devices also seems to be in
> use:
> 
> # btrfs fi show /broken
> Label: 'mon_data'  uuid: 85e52555-7d6d-4346-8b37-8278447eb590
>   Total devices 4 FS bytes used 69.50GiB
>   devid1 size 931.51GiB used 931.51GiB path /dev/sda1
>   devid2 size 931.51GiB used 931.51GiB path /dev/sdb1
>   devid3 size 931.51GiB used 931.51GiB path /dev/sdc1
>   devid4 size 931.51GiB used 931.51GiB path /dev/sdd1
> 
> Even the smallest balance operation I can start fails (this doesn't
> change even with an extra temporary device added to the filesystem):

   Given that both data and metadata levels here require paired
chunks, try adding _two_ temporary devices so that it can allocate a
new block group.

   Hugo.

> # btrfs balance start -v -dusage=1 /broken
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=1
> ERROR: error during balancing '/broken': No space left on device
> There may be more info in syslog - try dmesg | tail
> # dmesg | tail -1
> [11554.296805] BTRFS info (device sdc1): 757 enospc errors during
> balance
> 
> The current kernel is 4.15.0 from Debian's stretch-backports
> (specifically linux-image-4.15.0-0.bpo.2-amd64), but it was Debian's
> 4.9.30 when the filesystem got into this state. I upgraded it in the
> hopes that a newer kernel would be smarter, but no dice.
> 
> btrfs-progs is currently at v4.7.3.
> 
> Most of what this filesystem stores is Prometheus 1.8's TSDB for its
> metrics, which are constantly written at around 50MB/second. The
> filesystem never really gets full as far as data goes, but there's a lot
> of never-ending churn for what data is there.
> 
> Question 1: Are there other steps that can be tried to rescue a
> filesystem in this state? I still have it mounted in the same state, and
> I'm willing to try other things or extract debugging info.
> 
> Question 2: Is there something I could have done to prevent this from
> happening in the first place?
> 
> Thanks!

-- 
Hugo Mills | Always be sincere, whether you mean it or not.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |  Flanders & Swann
PGP: E2AB1DE4  |The Reluctant Cannibal


signature.asc
Description: Digital signature


Re: Recovery from full metadata with all device space consumed?

2018-04-19 Thread Timofey Titovets
2018-04-20 1:08 GMT+03:00 Drew Bloechl :
> I've got a btrfs filesystem that I can't seem to get back to a useful
> state. The symptom I started with is that rename() operations started
> dying with ENOSPC, and it looks like the metadata allocation on the
> filesystem is full:
>
> # btrfs fi df /broken
> Data, RAID0: total=3.63TiB, used=67.00GiB
> System, RAID1: total=8.00MiB, used=224.00KiB
> Metadata, RAID1: total=3.00GiB, used=2.50GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> All of the consumable space on the backing devices also seems to be in
> use:
>
> # btrfs fi show /broken
> Label: 'mon_data'  uuid: 85e52555-7d6d-4346-8b37-8278447eb590
> Total devices 4 FS bytes used 69.50GiB
> devid1 size 931.51GiB used 931.51GiB path /dev/sda1
> devid2 size 931.51GiB used 931.51GiB path /dev/sdb1
> devid3 size 931.51GiB used 931.51GiB path /dev/sdc1
> devid4 size 931.51GiB used 931.51GiB path /dev/sdd1
>
> Even the smallest balance operation I can start fails (this doesn't
> change even with an extra temporary device added to the filesystem):
>
> # btrfs balance start -v -dusage=1 /broken
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=1
> ERROR: error during balancing '/broken': No space left on device
> There may be more info in syslog - try dmesg | tail
> # dmesg | tail -1
> [11554.296805] BTRFS info (device sdc1): 757 enospc errors during
> balance
>
> The current kernel is 4.15.0 from Debian's stretch-backports
> (specifically linux-image-4.15.0-0.bpo.2-amd64), but it was Debian's
> 4.9.30 when the filesystem got into this state. I upgraded it in the
> hopes that a newer kernel would be smarter, but no dice.
>
> btrfs-progs is currently at v4.7.3.
>
> Most of what this filesystem stores is Prometheus 1.8's TSDB for its
> metrics, which are constantly written at around 50MB/second. The
> filesystem never really gets full as far as data goes, but there's a lot
> of never-ending churn for what data is there.
>
> Question 1: Are there other steps that can be tried to rescue a
> filesystem in this state? I still have it mounted in the same state, and
> I'm willing to try other things or extract debugging info.
>
> Question 2: Is there something I could have done to prevent this from
> happening in the first place?
>
> Thanks!
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Not sure why this happening,
but if you stuck at that state:
  - Reboot to ensure no other problems will exists
  - Add any other external device temporary to FS, as example zram.
After you free small part of fs, delete external dev from FS and
continue balance chunks.

Thanks.

-- 
Have a nice day,
Timofey.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Recovery from full metadata with all device space consumed?

2018-04-19 Thread Drew Bloechl
I've got a btrfs filesystem that I can't seem to get back to a useful
state. The symptom I started with is that rename() operations started
dying with ENOSPC, and it looks like the metadata allocation on the
filesystem is full:

# btrfs fi df /broken
Data, RAID0: total=3.63TiB, used=67.00GiB
System, RAID1: total=8.00MiB, used=224.00KiB
Metadata, RAID1: total=3.00GiB, used=2.50GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

All of the consumable space on the backing devices also seems to be in
use:

# btrfs fi show /broken
Label: 'mon_data'  uuid: 85e52555-7d6d-4346-8b37-8278447eb590
Total devices 4 FS bytes used 69.50GiB
devid1 size 931.51GiB used 931.51GiB path /dev/sda1
devid2 size 931.51GiB used 931.51GiB path /dev/sdb1
devid3 size 931.51GiB used 931.51GiB path /dev/sdc1
devid4 size 931.51GiB used 931.51GiB path /dev/sdd1

Even the smallest balance operation I can start fails (this doesn't
change even with an extra temporary device added to the filesystem):

# btrfs balance start -v -dusage=1 /broken
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=1
ERROR: error during balancing '/broken': No space left on device
There may be more info in syslog - try dmesg | tail
# dmesg | tail -1
[11554.296805] BTRFS info (device sdc1): 757 enospc errors during
balance

The current kernel is 4.15.0 from Debian's stretch-backports
(specifically linux-image-4.15.0-0.bpo.2-amd64), but it was Debian's
4.9.30 when the filesystem got into this state. I upgraded it in the
hopes that a newer kernel would be smarter, but no dice.

btrfs-progs is currently at v4.7.3.

Most of what this filesystem stores is Prometheus 1.8's TSDB for its
metrics, which are constantly written at around 50MB/second. The
filesystem never really gets full as far as data goes, but there's a lot
of never-ending churn for what data is there.

Question 1: Are there other steps that can be tried to rescue a
filesystem in this state? I still have it mounted in the same state, and
I'm willing to try other things or extract debugging info.

Question 2: Is there something I could have done to prevent this from
happening in the first place?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Hard link not persisted on fsync

2018-04-19 Thread Zygo Blaxell
On Mon, Apr 16, 2018 at 09:35:24AM -0500, Jayashree Mohan wrote:
> Hi,
> 
> The following seems to be a crash consistency bug on btrfs, where in
> the link count is not persisted even after a fsync on the original
> file.
> 
> Consider the following workload :
> creat foo
> link (foo, A/bar)
> fsync(foo)
> ---Crash---
> 
> Now, on recovery we expect the metadata of foo to be persisted i.e
> have a link count of 2. However in btrfs, the link count is 1 and file
> A/bar is not persisted. The expected behaviour would be to persist the
> dependencies of inode foo. That is to say, shouldn't fsync of foo
> persist A/bar and correctly update the link count?

Those dependencies are backward.  foo's inode doesn't depend on anything
but the data in the file foo, and foo's inode itself.

"foo" and "A/bar" are dirents that both depend on the inode of foo, which
implies that "A" and "." must be updated atomically with foo's inode.
If you had called fsync(A) then we'd expect A/bar to exist and the inode
to have a link count of 2.  If you'd called fsync(.) then...well, you
didn't modify "." at all, so I guess either outcome is valid as long as
the inode link count matches the number of dirents referencing the inode.

But then...why does foo exist at all?  I'd expect at least some tests
would end without foo on disk either, since all that was fsync()ed was the
foo inode, not the foo dirent in the directory '.'.  Does btrfs combine
creating foo and updating foo's inode into a single atomic operation?
I vaguely recall that it does exactly that, in order to solve a bug
some years ago.  What happens if you add a rename, e.g.

unlink foo2 # make sure foo2 doesn't exist
creat foo
rename(foo, foo2)
link(foo2, A/bar)
fsync(foo2)

Do you get foo or foo2?  I'd expect foo since you didn't fsync '.',
but maybe rename implies flush and you get foo2.

That's not to say that fsync is not a rich source of filesystem bugs.
btrfs did once have (and maybe still has?) a bug where renames and fsync
can create a dirent with no inode, e.g.

loop continuously:
creat foo
write(foo, data)
fsync(foo)
rename(foo, bar)

and crash somewhere in the middle of the loop, which will create a
dirent "foo" that points to a non-existent inode.

Removing the "fsync" works around the bug.  rename() does a flush anyway,
so the fsync() wasn't needed, but fsync() shouldn't _create_ a filesystem
inconsistency, especially when Googling recommends app developers to
sprinkle fsync()s indiscriminately in their code to prevent their data
from being mangled.

I haven't been tracking to see if that's fixed yet.  I last saw it on
4.11, but I have been aggressively avoiding fsync with eatmydata for
some years now.

> Note that ext4, xfs and f2fs recover to the correct link count of 2
> for the above workload.

Do those filesystems also work if you remove the fsync?  That may be
your answer:  they could be flushing the other metadata earlier, before
you call fsync().

> Let us know what you think about this behavior.
> 
> Thanks,
> Jayashree Mohan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


signature.asc
Description: PGP signature


[PATCH v2 09/16] btrfs: cleanup helpers that reset balance state

2018-04-19 Thread David Sterba
The function __cancel_balance name is confusing with the cancel
operation of balance and it really resets the state of balance back to
zero. The unset_balance_control helper is called only from one place and
simple enough to be inlined.

Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c   |  8 
 fs/btrfs/volumes.c | 29 +
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 582bde5b7eda..6724029443fa 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4653,10 +4653,10 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
 
 do_balance:
/*
-* Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP
-* goes to to btrfs_balance.  bctl is freed in __cancel_balance,
-* or, if restriper was paused all the way until unmount, in
-* free_fs_info.  The flag should be cleared after __cancel_balance.
+* Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP goes to
+* btrfs_balance.  bctl is freed in reset_balance_state, or, if
+* restriper was paused all the way until unmount, in free_fs_info.
+* The flag should be cleared after reset_balance_state.
 */
need_unlock = false;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 83fbe9d624f5..3b8085b5655d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3192,7 +3192,7 @@ static void update_balance_args(struct 
btrfs_balance_control *bctl)
 /*
  * Should be called with both balance and volume mutexes held to
  * serialize other volume operations (add_dev/rm_dev/resize) with
- * restriper.  Same goes for unset_balance_control.
+ * restriper.  Same goes for reset_balance_state.
  */
 static void set_balance_control(struct btrfs_balance_control *bctl)
 {
@@ -3205,9 +3205,13 @@ static void set_balance_control(struct 
btrfs_balance_control *bctl)
spin_unlock(_info->balance_lock);
 }
 
-static void unset_balance_control(struct btrfs_fs_info *fs_info)
+/*
+ * Clear the balance status in fs_info and delete the balance item from disk.
+ */
+static void reset_balance_state(struct btrfs_fs_info *fs_info)
 {
struct btrfs_balance_control *bctl = fs_info->balance_ctl;
+   int ret;
 
BUG_ON(!fs_info->balance_ctl);
 
@@ -3216,6 +3220,9 @@ static void unset_balance_control(struct btrfs_fs_info 
*fs_info)
spin_unlock(_info->balance_lock);
 
kfree(bctl);
+   ret = del_balance_item(fs_info);
+   if (ret)
+   btrfs_handle_fs_error(fs_info, ret, NULL);
 }
 
 /*
@@ -3752,16 +3759,6 @@ static inline int balance_need_close(struct 
btrfs_fs_info *fs_info)
 atomic_read(_info->balance_cancel_req) == 0);
 }
 
-static void __cancel_balance(struct btrfs_fs_info *fs_info)
-{
-   int ret;
-
-   unset_balance_control(fs_info);
-   ret = del_balance_item(fs_info);
-   if (ret)
-   btrfs_handle_fs_error(fs_info, ret, NULL);
-}
-
 /* Non-zero return value signifies invalidity */
 static inline int validate_convert_profile(struct btrfs_balance_args *bctl_arg,
u64 allowed)
@@ -3916,7 +3913,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 
if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
balance_need_close(fs_info)) {
-   __cancel_balance(fs_info);
+   reset_balance_state(fs_info);
clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
}
 
@@ -3925,7 +3922,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
return ret;
 out:
if (bctl->flags & BTRFS_BALANCE_RESUME)
-   __cancel_balance(fs_info);
+   reset_balance_state(fs_info);
else
kfree(bctl);
clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
@@ -4095,13 +4092,13 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
   atomic_read(_info->balance_running) == 0);
mutex_lock(_info->balance_mutex);
} else {
-   /* __cancel_balance needs volume_mutex */
+   /* reset_balance_state needs volume_mutex */
mutex_unlock(_info->balance_mutex);
mutex_lock(_info->volume_mutex);
mutex_lock(_info->balance_mutex);
 
if (fs_info->balance_ctl) {
-   __cancel_balance(fs_info);
+   reset_balance_state(fs_info);
clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
}
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 12/16] btrfs: track running balance in a simpler way

2018-04-19 Thread David Sterba
Currently fs_info::balance_running is 0 or 1 and does not use the
semantics of atomics. The pause and cancel check for 0, that can happen
only after __btrfs_balance exits for whatever reason.

Parallel calls to balance ioctl may enter btrfs_ioctl_balance multiple
times but will block on the balance_mutex that protects the
fs_info::flags bit.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h   |  6 +-
 fs/btrfs/disk-io.c |  1 -
 fs/btrfs/ioctl.c   |  6 +++---
 fs/btrfs/volumes.c | 18 ++
 4 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 011cb9a8a967..fda94a264eb7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -726,6 +726,11 @@ struct btrfs_delayed_root;
  * (device replace, resize, device add/delete, balance)
  */
 #define BTRFS_FS_EXCL_OP   16
+/*
+ * Indicate that balance has been set up from the ioctl and is in the main
+ * phase. The fs_info::balance_ctl is initialized.
+ */
+#define BTRFS_FS_BALANCE_RUNNING   17
 
 struct btrfs_fs_info {
u8 fsid[BTRFS_FSID_SIZE];
@@ -991,7 +996,6 @@ struct btrfs_fs_info {
/* restriper state */
spinlock_t balance_lock;
struct mutex balance_mutex;
-   atomic_t balance_running;
atomic_t balance_pause_req;
atomic_t balance_cancel_req;
struct btrfs_balance_control *balance_ctl;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c0482ecea11f..b62559dfb053 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2168,7 +2168,6 @@ static void btrfs_init_balance(struct btrfs_fs_info 
*fs_info)
 {
spin_lock_init(_info->balance_lock);
mutex_init(_info->balance_mutex);
-   atomic_set(_info->balance_running, 0);
atomic_set(_info->balance_pause_req, 0);
atomic_set(_info->balance_cancel_req, 0);
fs_info->balance_ctl = NULL;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7c99f74c200e..3dfd5ab2807b 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4506,7 +4506,7 @@ void update_ioctl_balance_args(struct btrfs_fs_info 
*fs_info, int lock,
 
bargs->flags = bctl->flags;
 
-   if (atomic_read(_info->balance_running))
+   if (test_bit(BTRFS_FS_BALANCE_RUNNING, _info->flags))
bargs->state |= BTRFS_BALANCE_STATE_RUNNING;
if (atomic_read(_info->balance_pause_req))
bargs->state |= BTRFS_BALANCE_STATE_PAUSE_REQ;
@@ -4558,7 +4558,7 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
mutex_lock(_info->balance_mutex);
if (fs_info->balance_ctl) {
/* this is either (2) or (3) */
-   if (!atomic_read(_info->balance_running)) {
+   if (!test_bit(BTRFS_FS_BALANCE_RUNNING, _info->flags)) {
mutex_unlock(_info->balance_mutex);
/*
 * Lock released to allow other waiters to continue,
@@ -4567,7 +4567,7 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
mutex_lock(_info->balance_mutex);
 
if (fs_info->balance_ctl &&
-   !atomic_read(_info->balance_running)) {
+   !test_bit(BTRFS_FS_BALANCE_RUNNING, 
_info->flags)) {
/* this is (3) */
need_unlock = false;
goto locked;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e95cc2f09fdd..a766d2f988c1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3886,13 +3886,14 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
spin_unlock(_info->balance_lock);
}
 
-   atomic_inc(_info->balance_running);
+   ASSERT(!test_bit(BTRFS_FS_BALANCE_RUNNING, _info->flags));
+   set_bit(BTRFS_FS_BALANCE_RUNNING, _info->flags);
mutex_unlock(_info->balance_mutex);
 
ret = __btrfs_balance(fs_info);
 
mutex_lock(_info->balance_mutex);
-   atomic_dec(_info->balance_running);
+   clear_bit(BTRFS_FS_BALANCE_RUNNING, _info->flags);
 
if (bargs) {
memset(bargs, 0, sizeof(*bargs));
@@ -4031,16 +4032,16 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
return -ENOTCONN;
}
 
-   if (atomic_read(_info->balance_running)) {
+   if (test_bit(BTRFS_FS_BALANCE_RUNNING, _info->flags)) {
atomic_inc(_info->balance_pause_req);
mutex_unlock(_info->balance_mutex);
 
wait_event(fs_info->balance_wait_q,
-  atomic_read(_info->balance_running) == 0);
+  !test_bit(BTRFS_FS_BALANCE_RUNNING, 
_info->flags));
 
mutex_lock(_info->balance_mutex);
/* we are good with balance_ctl ripped off from under us */
-   BUG_ON(atomic_read(_info->balance_running));

[PATCH v2 16/16] btrfs: open code set_balance_control

2018-04-19 Thread David Sterba
The helper is quite simple and I'd like to see the locking in the
caller.

Reviewed-by: Anand Jain 
Signed-off-by: David Sterba 
---
 fs/btrfs/volumes.c | 25 -
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 445380c4ac72..6bab436a94d5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3178,21 +3178,6 @@ static void update_balance_args(struct 
btrfs_balance_control *bctl)
}
 }
 
-/*
- * Should be called with balance mutex held to protect against checking the
- * balance status or progress. Same goes for reset_balance_state.
- */
-static void set_balance_control(struct btrfs_balance_control *bctl)
-{
-   struct btrfs_fs_info *fs_info = bctl->fs_info;
-
-   BUG_ON(fs_info->balance_ctl);
-
-   spin_lock(_info->balance_lock);
-   fs_info->balance_ctl = bctl;
-   spin_unlock(_info->balance_lock);
-}
-
 /*
  * Clear the balance status in fs_info and delete the balance item from disk.
  */
@@ -3878,7 +3863,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 
if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {
BUG_ON(ret == -EEXIST);
-   set_balance_control(bctl);
+   BUG_ON(fs_info->balance_ctl);
+   spin_lock(_info->balance_lock);
+   fs_info->balance_ctl = bctl;
+   spin_unlock(_info->balance_lock);
} else {
BUG_ON(ret != -EEXIST);
spin_lock(_info->balance_lock);
@@ -4015,7 +4003,10 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
"cannot set exclusive op status to balance, resume manually");
 
mutex_lock(_info->balance_mutex);
-   set_balance_control(bctl);
+   BUG_ON(fs_info->balance_ctl);
+   spin_lock(_info->balance_lock);
+   fs_info->balance_ctl = bctl;
+   spin_unlock(_info->balance_lock);
mutex_unlock(_info->balance_mutex);
 out:
btrfs_free_path(path);
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 13/16] btrfs: move and comment read-only check in btrfs_cancel_balance

2018-04-19 Thread David Sterba
Balance cannot be started on a read-only filesystem and will have to
finish/exit before eg. going to read-only via remount.

In case the filesystem is forcibly set to read-only after an error,
balance will finish anyway and if the cancel call is too fast it will
just wait for that to happen.

The last case is when the balance is paused after mount but it's
read-only and cancelling would want to delete the item. The test is
moved after the check if balance is running at all, as it looks more
logical to report "no balance running" instead of "read-only
filesystem".

Signed-off-by: David Sterba 
---
 fs/btrfs/volumes.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a766d2f988c1..9e48a4654d3a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4053,15 +4053,20 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
 
 int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 {
-   if (sb_rdonly(fs_info->sb))
-   return -EROFS;
-
mutex_lock(_info->balance_mutex);
if (!fs_info->balance_ctl) {
mutex_unlock(_info->balance_mutex);
return -ENOTCONN;
}
 
+   /*
+* A paused balance with the item stored on disk can be resumed at
+* mount time if the mount is read-write. Otherwise it's still paused
+* and we must not allow cancelling as it deletes the item.
+*/
+   if (sb_rdonly(fs_info->sb))
+   return -EROFS;
+
atomic_inc(_info->balance_cancel_req);
/*
 * if we are running just wait and return, balance item is
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 15/16] btrfs: use mutex in btrfs_resume_balance_async

2018-04-19 Thread David Sterba
While the spinlock does not cause problems, using the mutex is more
correct and consistent with others. The global status of balance is eg.
checked from btrfs_pause_balance or btrfs_cancel_balance with mutex.

Resuming balance happens during mount or ro->rw remount. In the former
case, no other user of the balance_ctl exists, in the latter, balance
cannot run until the ro/rw transition is finished.

Signed-off-by: David Sterba 
---
 fs/btrfs/volumes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1abe0fc5c105..445380c4ac72 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3938,12 +3938,12 @@ int btrfs_resume_balance_async(struct btrfs_fs_info 
*fs_info)
 {
struct task_struct *tsk;
 
-   spin_lock(_info->balance_lock);
+   mutex_lock(_info->balance_mutex);
if (!fs_info->balance_ctl) {
-   spin_unlock(_info->balance_lock);
+   mutex_unlock(_info->balance_mutex);
return 0;
}
-   spin_unlock(_info->balance_lock);
+   mutex_unlock(_info->balance_mutex);
 
if (btrfs_test_opt(fs_info, SKIP_BALANCE)) {
btrfs_info(fs_info, "force skipping balance");
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 08/16] btrfs: add sanity check when resuming balance after mount

2018-04-19 Thread David Sterba
Replace a WARN_ON with a proper check and message in case something goes
really wrong and resumed balance cannot set up its exclusive status.
The check is a user friendly assertion, I don't expect to ever happen
under normal circumstances.

Also document that the paused balance starts here and owns the exclusive
op status.

Signed-off-by: David Sterba 
---
 fs/btrfs/volumes.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5c83ebc8e199..83fbe9d624f5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4018,7 +4018,19 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
btrfs_balance_sys(leaf, item, _bargs);
btrfs_disk_balance_args_to_cpu(>sys, _bargs);
 
-   WARN_ON(test_and_set_bit(BTRFS_FS_EXCL_OP, _info->flags));
+   /*
+* This should never happen, as the paused balance state is recovered
+* during mount without any chance of other exclusive ops to collide.
+*
+* This gives the exclusive op status to balance and keeps in paused
+* state until user intervention (cancel or umount). If the ownership
+* cannot be assigned, show a message but do not fail. The balance
+* is in a paused state and must have fs_info::balance_ctl properly
+* set up.
+*/
+   if (test_and_set_bit(BTRFS_FS_EXCL_OP, _info->flags))
+   btrfs_warn(fs_info,
+   "cannot set exclusive op status to balance, resume manually");
 
mutex_lock(_info->volume_mutex);
mutex_lock(_info->balance_mutex);
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 11/16] btrfs: kill btrfs_fs_info::volume_mutex

2018-04-19 Thread David Sterba
Mutual exclusion of device add/rm and balance was done by the volume
mutex up to version 3.7. The commit 5ac00addc7ac091109 ("Btrfs: disallow
mutually exclusive admin operations from user mode") added a bit that
essentially tracked the same information.

The status bit has an advantage over a mutex that it can be set without
restrictions of function context, so it started to be used in the
mount-time resuming of balance or device replace.

But we don't really need to track the same information in two ways.

1) After the previous cleanups, the main ioctl handlers for
   add/del/resize copy the EXCL_OP bit next to the volume mutex, here
   it's clearly safe.

2) Resuming balance during mount or after rw remount will set only the
   EXCL_OP bit and the volume_mutex is held in the kernel thread that
   calls btrfs_balance.

3) Resuming device replace during mount or after rw remount is done
   after balance and is excluded by the EXCL_OP bit. It does not take
   the volume_mutex at all and completely relies on the EXCL_OP bit.

4) The resuming of balance and dev-replace cannot hapen at the same time
   as the ioctls cannot be started in parallel. Nevertheless, a crafted
   image could trigger that and a warning is printed.

5) Balance is normally excluded by EXCL_OP and also uses own mutex to
   protect against concurrent access to its status data. There's some
   trickery to maintain the right lock nesting in case we need to
   reexamine the status in btrfs_ioctl_balance. The volume_mutex is
   removed and the unlock/lock sequence is left in place as we might
   expect other waiters to proceed.

6) Similar to 5, the unlock/lock sequence is kept in
   btrfs_cancel_balance to allow waiters to continue.

Reviewed-by: Anand Jain 
Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h   |  1 -
 fs/btrfs/disk-io.c |  1 -
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/ioctl.c   | 17 -
 fs/btrfs/volumes.c | 36 ++--
 5 files changed, 15 insertions(+), 42 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0eb55825862a..011cb9a8a967 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -826,7 +826,6 @@ struct btrfs_fs_info {
struct mutex transaction_kthread_mutex;
struct mutex cleaner_mutex;
struct mutex chunk_mutex;
-   struct mutex volume_mutex;
 
/*
 * this is taken to make sure we don't set block groups ro after
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 07b5e6f7df67..c0482ecea11f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2605,7 +2605,6 @@ int open_ctree(struct super_block *sb,
mutex_init(_info->chunk_mutex);
mutex_init(_info->transaction_kthread_mutex);
mutex_init(_info->cleaner_mutex);
-   mutex_init(_info->volume_mutex);
mutex_init(_info->ro_block_group_mutex);
init_rwsem(_info->commit_root_sem);
init_rwsem(_info->cleanup_work_sem);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f30548d7e0d2..90d28a3727c6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4122,7 +4122,7 @@ static void set_avail_alloc_bits(struct btrfs_fs_info 
*fs_info, u64 flags)
  * returns target flags in extended format or 0 if restripe for this
  * chunk_type is not in progress
  *
- * should be called with either volume_mutex or balance_lock held
+ * should be called with balance_lock held
  */
 static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags)
 {
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6724029443fa..7c99f74c200e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1470,7 +1470,6 @@ static noinline int btrfs_ioctl_resize(struct file *file,
return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
}
 
-   mutex_lock(_info->volume_mutex);
vol_args = memdup_user(arg, sizeof(*vol_args));
if (IS_ERR(vol_args)) {
ret = PTR_ERR(vol_args);
@@ -1578,7 +1577,6 @@ static noinline int btrfs_ioctl_resize(struct file *file,
 out_free:
kfree(vol_args);
 out:
-   mutex_unlock(_info->volume_mutex);
clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
mnt_drop_write_file(file);
return ret;
@@ -2626,7 +2624,6 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info 
*fs_info, void __user *arg)
if (test_and_set_bit(BTRFS_FS_EXCL_OP, _info->flags))
return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
 
-   mutex_lock(_info->volume_mutex);
vol_args = memdup_user(arg, sizeof(*vol_args));
if (IS_ERR(vol_args)) {
ret = PTR_ERR(vol_args);
@@ -2641,7 +2638,6 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info 
*fs_info, void __user *arg)
 
kfree(vol_args);
 out:
-   mutex_unlock(_info->volume_mutex);
clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
return ret;
 }
@@ -2674,7 +2670,6 @@ static long 

[PATCH v2 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start

2018-04-19 Thread David Sterba
The volume mutex does not protect against anything in this case, the
comment about scrub is right but not related to locking and looks
confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
and confusing too.

The device_list_mutex is not held here to protect device lookup, but in
this case device replace cannot run in parallel with device removal (due
to exclusive op protection), so we don't need further locking here.

Reviewed-by: Anand Jain 
Signed-off-by: David Sterba 
---
 fs/btrfs/dev-replace.c | 7 +--
 fs/btrfs/volumes.c | 4 
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 346bd460f8e7..ba011af9b0d2 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -426,18 +426,13 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
struct btrfs_device *tgt_device = NULL;
struct btrfs_device *src_device = NULL;
 
-   /* the disk copy procedure reuses the scrub code */
-   mutex_lock(_info->volume_mutex);
ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
srcdev_name, _device);
-   if (ret) {
-   mutex_unlock(_info->volume_mutex);
+   if (ret)
return ret;
-   }
 
ret = btrfs_init_dev_replace_tgtdev(fs_info, tgtdev_name,
src_device, _device);
-   mutex_unlock(_info->volume_mutex);
if (ret)
return ret;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3b8085b5655d..8a5b022e9cde 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2198,10 +2198,6 @@ int btrfs_find_device_missing_or_by_path(struct 
btrfs_fs_info *fs_info,
struct btrfs_device *tmp;
 
devices = _info->fs_devices->devices;
-   /*
-* It is safe to read the devices since the volume_mutex
-* is held by the caller.
-*/
list_for_each_entry(tmp, devices, dev_list) {
if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
>dev_state) && !tmp->bdev) {
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 14/16] btrfs: drop lock parameter from update_ioctl_balance_args and rename

2018-04-19 Thread David Sterba
The parameter controls locking of the stats part but we can lock it
unconditionally, as this only happens once when balance starts. This is
not performance critical.

Add the prefix for an exported function.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/ioctl.c   | 14 +-
 fs/btrfs/volumes.c |  2 +-
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fda94a264eb7..7ac2e69bfb03 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3258,7 +3258,7 @@ int btrfs_defrag_file(struct inode *inode, struct file 
*file,
  u64 newer_than, unsigned long max_pages);
 void btrfs_get_block_group_info(struct list_head *groups_list,
struct btrfs_ioctl_space_info *space);
-void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
+void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info,
   struct btrfs_ioctl_balance_args *bargs);
 ssize_t btrfs_dedupe_file_range(struct file *src_file, u64 loff, u64 olen,
   struct file *dst_file, u64 dst_loff);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3dfd5ab2807b..b85d2a71136d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4499,7 +4499,7 @@ static long btrfs_ioctl_logical_to_ino(struct 
btrfs_fs_info *fs_info,
return ret;
 }
 
-void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
+void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info,
   struct btrfs_ioctl_balance_args *bargs)
 {
struct btrfs_balance_control *bctl = fs_info->balance_ctl;
@@ -4517,13 +4517,9 @@ void update_ioctl_balance_args(struct btrfs_fs_info 
*fs_info, int lock,
memcpy(>meta, >meta, sizeof(bargs->meta));
memcpy(>sys, >sys, sizeof(bargs->sys));
 
-   if (lock) {
-   spin_lock(_info->balance_lock);
-   memcpy(>stat, >stat, sizeof(bargs->stat));
-   spin_unlock(_info->balance_lock);
-   } else {
-   memcpy(>stat, >stat, sizeof(bargs->stat));
-   }
+   spin_lock(_info->balance_lock);
+   memcpy(>stat, >stat, sizeof(bargs->stat));
+   spin_unlock(_info->balance_lock);
 }
 
 static long btrfs_ioctl_balance(struct file *file, void __user *arg)
@@ -4709,7 +4705,7 @@ static long btrfs_ioctl_balance_progress(struct 
btrfs_fs_info *fs_info,
goto out;
}
 
-   update_ioctl_balance_args(fs_info, 1, bargs);
+   btrfs_update_ioctl_balance_args(fs_info, bargs);
 
if (copy_to_user(arg, bargs, sizeof(*bargs)))
ret = -EFAULT;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9e48a4654d3a..1abe0fc5c105 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3897,7 +3897,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 
if (bargs) {
memset(bargs, 0, sizeof(*bargs));
-   update_ioctl_balance_args(fs_info, 0, bargs);
+   btrfs_update_ioctl_balance_args(fs_info, bargs);
}
 
if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 07/16] btrfs: add proper safety check before resuming dev-replace

2018-04-19 Thread David Sterba
The device replace is paused by unmount or read only remount, and
resumed on next mount or write remount.

The exclusive status should be checked properly as it's a global
invariant and we must not allow 2 operations run. In this case, the
balance can be also paused and resumed under same conditions. It's
always checked first so dev-replace could see the EXCL_OP already taken,
BUT, the ioctl would never let start both at the same time.

Replace the WARN_ON with message and return 0, indicating no error as
this is purely theoretical and the user will be informed. Resolving that
manually should be possible by waiting for the other operation to finish
or cancel the paused state.

Signed-off-by: David Sterba 
---
 fs/btrfs/dev-replace.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 7a87ffad041e..346bd460f8e7 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -908,7 +908,17 @@ int btrfs_resume_dev_replace_async(struct btrfs_fs_info 
*fs_info)
}
btrfs_dev_replace_write_unlock(dev_replace);
 
-   WARN_ON(test_and_set_bit(BTRFS_FS_EXCL_OP, _info->flags));
+   /*
+* This could collide with a paused balance, but the exclusive op logic
+* should never allow both to start and pause. We don't want to allow
+* dev-replace to start anyway.
+*/
+   if (test_and_set_bit(BTRFS_FS_EXCL_OP, _info->flags)) {
+   btrfs_info(fs_info,
+   "cannot resume dev-replace, other exclusive operation running");
+   return 0;
+   }
+
task = kthread_run(btrfs_dev_replace_kthread, fs_info, "btrfs-devrepl");
return PTR_ERR_OR_ZERO(task);
 }
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 06/16] btrfs: move clearing of EXCL_OP out of __cancel_balance

2018-04-19 Thread David Sterba
Make the clearning visible in the callers so we can pair it with the
test_and_set part.

Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c   |  2 +-
 fs/btrfs/volumes.c | 13 +++--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b93dea445802..582bde5b7eda 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4656,7 +4656,7 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
 * Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP
 * goes to to btrfs_balance.  bctl is freed in __cancel_balance,
 * or, if restriper was paused all the way until unmount, in
-* free_fs_info.  The flag is cleared in __cancel_balance.
+* free_fs_info.  The flag should be cleared after __cancel_balance.
 */
need_unlock = false;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e0bd181dc9e0..5c83ebc8e199 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3760,8 +3760,6 @@ static void __cancel_balance(struct btrfs_fs_info 
*fs_info)
ret = del_balance_item(fs_info);
if (ret)
btrfs_handle_fs_error(fs_info, ret, NULL);
-
-   clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
 }
 
 /* Non-zero return value signifies invalidity */
@@ -3919,6 +3917,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
if ((ret && ret != -ECANCELED && ret != -ENOSPC) ||
balance_need_close(fs_info)) {
__cancel_balance(fs_info);
+   clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
}
 
wake_up(_info->balance_wait_q);
@@ -3927,10 +3926,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 out:
if (bctl->flags & BTRFS_BALANCE_RESUME)
__cancel_balance(fs_info);
-   else {
+   else
kfree(bctl);
-   clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
-   }
+   clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
+
return ret;
 }
 
@@ -4089,8 +4088,10 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
mutex_lock(_info->volume_mutex);
mutex_lock(_info->balance_mutex);
 
-   if (fs_info->balance_ctl)
+   if (fs_info->balance_ctl) {
__cancel_balance(fs_info);
+   clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
+   }
 
mutex_unlock(_info->volume_mutex);
}
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 05/16] btrfs: move volume_mutex to callers of btrfs_rm_device

2018-04-19 Thread David Sterba
Move locking and unlocking next to the BTRFS_FS_EXCL_OP bit manipulation
so it's obvious that the two happen at the same time.

Reviewed-by: Anand Jain 
Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c   | 4 
 fs/btrfs/volumes.c | 2 --
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ac85e07f567b..b93dea445802 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2674,6 +2674,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void 
__user *arg)
ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
goto out;
}
+   mutex_lock(_info->volume_mutex);
 
if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID) {
ret = btrfs_rm_device(fs_info, NULL, vol_args->devid);
@@ -2681,6 +2682,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void 
__user *arg)
vol_args->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
ret = btrfs_rm_device(fs_info, vol_args->name, 0);
}
+   mutex_unlock(_info->volume_mutex);
clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
 
if (!ret) {
@@ -2716,6 +2718,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void 
__user *arg)
ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
goto out_drop_write;
}
+   mutex_lock(_info->volume_mutex);
 
vol_args = memdup_user(arg, sizeof(*vol_args));
if (IS_ERR(vol_args)) {
@@ -2730,6 +2733,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void 
__user *arg)
btrfs_info(fs_info, "disk deleted %s", vol_args->name);
kfree(vol_args);
 out:
+   mutex_unlock(_info->volume_mutex);
clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
 out_drop_write:
mnt_drop_write_file(file);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index df2d6d06e014..e0bd181dc9e0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1932,7 +1932,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const 
char *device_path,
u64 num_devices;
int ret = 0;
 
-   mutex_lock(_info->volume_mutex);
mutex_lock(_mutex);
 
num_devices = fs_info->fs_devices->num_devices;
@@ -2048,7 +2047,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const 
char *device_path,
 
 out:
mutex_unlock(_mutex);
-   mutex_unlock(_info->volume_mutex);
return ret;
 
 error_undo:
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 01/16] btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller

2018-04-19 Thread David Sterba
The function is called once and is fairly small, we can merge it with
the caller.

Reviewed-by: Anand Jain 
Signed-off-by: David Sterba 
---
 fs/btrfs/dev-replace.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 0d203633bb96..c0397cd2a3ba 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -45,8 +45,6 @@ static void btrfs_dev_replace_update_device_in_mapping_tree(
struct btrfs_device *srcdev,
struct btrfs_device *tgtdev);
 static int btrfs_dev_replace_kthread(void *data);
-static int btrfs_dev_replace_continue_on_mount(struct btrfs_fs_info *fs_info);
-
 
 int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
 {
@@ -822,6 +820,7 @@ static int btrfs_dev_replace_kthread(void *data)
struct btrfs_fs_info *fs_info = data;
struct btrfs_dev_replace *dev_replace = _info->dev_replace;
u64 progress;
+   int ret;
 
progress = btrfs_dev_replace_progress(fs_info);
progress = div_u64(progress, 10);
@@ -832,23 +831,14 @@ static int btrfs_dev_replace_kthread(void *data)
btrfs_dev_name(dev_replace->tgtdev),
(unsigned int)progress);
 
-   btrfs_dev_replace_continue_on_mount(fs_info);
-   clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
-
-   return 0;
-}
-
-static int btrfs_dev_replace_continue_on_mount(struct btrfs_fs_info *fs_info)
-{
-   struct btrfs_dev_replace *dev_replace = _info->dev_replace;
-   int ret;
-
ret = btrfs_scrub_dev(fs_info, dev_replace->srcdev->devid,
  dev_replace->committed_cursor_left,
  btrfs_device_get_total_bytes(dev_replace->srcdev),
  _replace->scrub_progress, 0, 1);
ret = btrfs_dev_replace_finishing(fs_info, ret);
WARN_ON(ret);
+
+   clear_bit(BTRFS_FS_EXCL_OP, _info->flags);
return 0;
 }
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 04/16] btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static

2018-04-19 Thread David Sterba
The function logically belongs there and there's only a single caller,
no need to export it. No code changes.

Reviewed-by: Anand Jain 
Signed-off-by: David Sterba 
---
 fs/btrfs/dev-replace.c | 99 ++
 fs/btrfs/volumes.c | 99 --
 fs/btrfs/volumes.h |  4 --
 3 files changed, 99 insertions(+), 103 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index e3b8a79c1665..7a87ffad041e 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -188,6 +188,105 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
return ret;
 }
 
+/*
+ * Initialize a new device for device replace target from a given source dev
+ * and path.
+ *
+ * Return 0 and new device in @device_out, otherwise return < 0
+ */
+static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
+ const char *device_path,
+ struct btrfs_device *srcdev,
+ struct btrfs_device **device_out)
+{
+   struct btrfs_device *device;
+   struct block_device *bdev;
+   struct list_head *devices;
+   struct rcu_string *name;
+   u64 devid = BTRFS_DEV_REPLACE_DEVID;
+   int ret = 0;
+
+   *device_out = NULL;
+   if (fs_info->fs_devices->seeding) {
+   btrfs_err(fs_info, "the filesystem is a seed filesystem!");
+   return -EINVAL;
+   }
+
+   bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
+ fs_info->bdev_holder);
+   if (IS_ERR(bdev)) {
+   btrfs_err(fs_info, "target device %s is invalid!", device_path);
+   return PTR_ERR(bdev);
+   }
+
+   filemap_write_and_wait(bdev->bd_inode->i_mapping);
+
+   devices = _info->fs_devices->devices;
+   list_for_each_entry(device, devices, dev_list) {
+   if (device->bdev == bdev) {
+   btrfs_err(fs_info,
+ "target device is in the filesystem!");
+   ret = -EEXIST;
+   goto error;
+   }
+   }
+
+
+   if (i_size_read(bdev->bd_inode) <
+   btrfs_device_get_total_bytes(srcdev)) {
+   btrfs_err(fs_info,
+ "target device is smaller than source device!");
+   ret = -EINVAL;
+   goto error;
+   }
+
+
+   device = btrfs_alloc_device(NULL, , NULL);
+   if (IS_ERR(device)) {
+   ret = PTR_ERR(device);
+   goto error;
+   }
+
+   name = rcu_string_strdup(device_path, GFP_KERNEL);
+   if (!name) {
+   btrfs_free_device(device);
+   ret = -ENOMEM;
+   goto error;
+   }
+   rcu_assign_pointer(device->name, name);
+
+   mutex_lock(_info->fs_devices->device_list_mutex);
+   set_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state);
+   device->generation = 0;
+   device->io_width = fs_info->sectorsize;
+   device->io_align = fs_info->sectorsize;
+   device->sector_size = fs_info->sectorsize;
+   device->total_bytes = btrfs_device_get_total_bytes(srcdev);
+   device->disk_total_bytes = btrfs_device_get_disk_total_bytes(srcdev);
+   device->bytes_used = btrfs_device_get_bytes_used(srcdev);
+   device->commit_total_bytes = srcdev->commit_total_bytes;
+   device->commit_bytes_used = device->bytes_used;
+   device->fs_info = fs_info;
+   device->bdev = bdev;
+   set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, >dev_state);
+   set_bit(BTRFS_DEV_STATE_REPLACE_TGT, >dev_state);
+   device->mode = FMODE_EXCL;
+   device->dev_stats_valid = 1;
+   set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
+   device->fs_devices = fs_info->fs_devices;
+   list_add(>dev_list, _info->fs_devices->devices);
+   fs_info->fs_devices->num_devices++;
+   fs_info->fs_devices->open_devices++;
+   mutex_unlock(_info->fs_devices->device_list_mutex);
+
+   *device_out = device;
+   return 0;
+
+error:
+   blkdev_put(bdev, FMODE_EXCL);
+   return ret;
+}
+
 /*
  * called from commit_transaction. Writes changed device replace state to
  * disk.
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ca5521cc1a5b..df2d6d06e014 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2592,105 +2592,6 @@ int btrfs_init_new_device(struct btrfs_fs_info 
*fs_info, const char *device_path
return ret;
 }
 
-/*
- * Initialize a new device for device replace target from a given source dev
- * and path.
- *
- * Return 0 and new device in @device_out, otherwise return < 0
- */
-int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
- const char *device_path,
- struct btrfs_device *srcdev,
-  

[PATCH v2 03/16] btrfs: export and rename free_device

2018-04-19 Thread David Sterba
The function will be used outside of volumes.c, the allocation
btrfs_alloc_device is also exported.

Reviewed-by: Anand Jain 
Signed-off-by: David Sterba 
---
 fs/btrfs/volumes.c | 24 
 fs/btrfs/volumes.h |  1 +
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8086c5687b72..ca5521cc1a5b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -246,7 +246,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 
*fsid)
return fs_devs;
 }
 
-static void free_device(struct btrfs_device *device)
+void btrfs_free_device(struct btrfs_device *device)
 {
rcu_string_free(device->name);
bio_put(device->flush_bio);
@@ -261,7 +261,7 @@ static void free_fs_devices(struct btrfs_fs_devices 
*fs_devices)
device = list_entry(fs_devices->devices.next,
struct btrfs_device, dev_list);
list_del(>dev_list);
-   free_device(device);
+   btrfs_free_device(device);
}
kfree(fs_devices);
 }
@@ -294,7 +294,7 @@ void __exit btrfs_cleanup_fs_uuids(void)
 /*
  * Returns a pointer to a new btrfs_device on success; ERR_PTR() on error.
  * Returned struct is not linked onto any lists and must be destroyed using
- * free_device.
+ * btrfs_free_device.
  */
 static struct btrfs_device *__alloc_device(void)
 {
@@ -650,7 +650,7 @@ static void btrfs_free_stale_devices(const char *path,
} else {
fs_devs->num_devices--;
list_del(>dev_list);
-   free_device(dev);
+   btrfs_free_device(dev);
}
}
}
@@ -765,7 +765,7 @@ static noinline struct btrfs_device *device_list_add(const 
char *path,
 
name = rcu_string_strdup(path, GFP_NOFS);
if (!name) {
-   free_device(device);
+   btrfs_free_device(device);
return ERR_PTR(-ENOMEM);
}
rcu_assign_pointer(device->name, name);
@@ -878,7 +878,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
name = rcu_string_strdup(orig_dev->name->str,
GFP_KERNEL);
if (!name) {
-   free_device(device);
+   btrfs_free_device(device);
goto error;
}
rcu_assign_pointer(device->name, name);
@@ -950,7 +950,7 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices 
*fs_devices, int step)
}
list_del_init(>dev_list);
fs_devices->num_devices--;
-   free_device(device);
+   btrfs_free_device(device);
}
 
if (fs_devices->seed) {
@@ -968,7 +968,7 @@ static void free_device_rcu(struct rcu_head *head)
struct btrfs_device *device;
 
device = container_of(head, struct btrfs_device, rcu);
-   free_device(device);
+   btrfs_free_device(device);
 }
 
 static void btrfs_close_bdev(struct btrfs_device *device)
@@ -2582,7 +2582,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
if (trans)
btrfs_end_transaction(trans);
 error_free_device:
-   free_device(device);
+   btrfs_free_device(device);
 error:
blkdev_put(bdev, FMODE_EXCL);
if (seeding_dev && !unlocked) {
@@ -2653,7 +2653,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info 
*fs_info,
 
name = rcu_string_strdup(device_path, GFP_KERNEL);
if (!name) {
-   free_device(device);
+   btrfs_free_device(device);
ret = -ENOMEM;
goto error;
}
@@ -6419,7 +6419,7 @@ static struct btrfs_device *add_missing_dev(struct 
btrfs_fs_devices *fs_devices,
  *
  * Return: a pointer to a new  btrfs_device on success; ERR_PTR()
  * on error.  Returned struct is not linked onto any lists and must be
- * destroyed with free_device.
+ * destroyed with btrfs_free_device.
  */
 struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
const u64 *devid,
@@ -6442,7 +6442,7 @@ struct btrfs_device *btrfs_alloc_device(struct 
btrfs_fs_info *fs_info,
 
ret = find_next_devid(fs_info, );
if (ret) {
-   free_device(dev);
+   btrfs_free_device(dev);
return ERR_PTR(ret);
}
}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index d1fcaea9fef5..45e3ece21290 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -434,6 +434,7 @@ int 

[PATCH v2 00/16] Kill fs_info::volume_mutex

2018-04-19 Thread David Sterba
This series gets rid of the volume mutex because it's redundant. Updated
branch: git://github.com/kdave/btrfs-devel dev/remove-volume-mutex

Changes for v2:
- sanity check in balance resume is only a warning
- read-only check in balance cancel remains and is only moved
- typo fixes

The fstests seem to pass all relevant tests now and qualifies for
conditional addition to for-next.

David Sterba (16):
  btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller
  btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear
  btrfs: export and rename free_device
  btrfs: move btrfs_init_dev_replace_tgtdev to dev-replace.c and make static
  btrfs: move volume_mutex to callers of btrfs_rm_device
  btrfs: move clearing of EXCL_OP out of __cancel_balance
  btrfs: add proper safety check before resuming dev-replace
  btrfs: add sanity check when resuming balance after mount
  btrfs: cleanup helpers that reset balance state
  btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
  btrfs: kill btrfs_fs_info::volume_mutex
  btrfs: track running balance in a simpler way
  btrfs: move and comment read-only check in btrfs_cancel_balance
  btrfs: drop lock parameter from update_ioctl_balance_args and rename
  btrfs: use mutex in btrfs_resume_balance_async
  btrfs: open code set_balance_control

 fs/btrfs/ctree.h   |   9 +-
 fs/btrfs/dev-replace.c | 135 +
 fs/btrfs/disk-io.c |   2 -
 fs/btrfs/extent-tree.c |   2 +-
 fs/btrfs/ioctl.c   |  41 +++-
 fs/btrfs/volumes.c | 263 +++--
 fs/btrfs/volumes.h |   5 +-
 7 files changed, 216 insertions(+), 241 deletions(-)

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 02/16] btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear

2018-04-19 Thread David Sterba
This is a preparatory cleanup that will make clear that the only
successful way out of btrfs_init_dev_replace_tgtdev will also set the
device_out to a valid pointer. With this guarantee, the callers can be
simplified.

Reviewed-by: Anand Jain 
Signed-off-by: David Sterba 
---
 fs/btrfs/dev-replace.c | 1 -
 fs/btrfs/volumes.c | 8 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index c0397cd2a3ba..e3b8a79c1665 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -370,7 +370,6 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
dev_replace->cont_reading_from_srcdev_mode = read_src;
WARN_ON(!src_device);
dev_replace->srcdev = src_device;
-   WARN_ON(!tgt_device);
dev_replace->tgtdev = tgt_device;
 
btrfs_info_in_rcu(fs_info,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 93f8f17cacca..8086c5687b72 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2592,6 +2592,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
return ret;
 }
 
+/*
+ * Initialize a new device for device replace target from a given source dev
+ * and path.
+ *
+ * Return 0 and new device in @device_out, otherwise return < 0
+ */
 int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
  const char *device_path,
  struct btrfs_device *srcdev,
@@ -2678,7 +2684,7 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info 
*fs_info,
mutex_unlock(_info->fs_devices->device_list_mutex);
 
*device_out = device;
-   return ret;
+   return 0;
 
 error:
blkdev_put(bdev, FMODE_EXCL);
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: Add incompat flags check for btrfs_check_super_valid()

2018-04-19 Thread Nikolay Borisov


On 19.04.2018 18:31, David Sterba wrote:
> On Thu, Apr 19, 2018 at 07:10:30PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2018年04月19日 18:59, Nikolay Borisov wrote:
>>>
>>>
>>> On 19.04.2018 12:38, Qu Wenruo wrote:
 Although we have already checked incompat flags manually before really
 mounting it, we could still enhance btrfs_check_super_valid() to check
 incompat flags for later write time super block validation check.

 This patch adds such incompat flags check for btrfs_check_super_valid(),
 currently it won't be triggered, but provides the basis for later write
 time check.

 Signed-off-by: Qu Wenruo 
>>>
>>> Reviewed-by: Nikolay Borisov 
>>>
 ---
  fs/btrfs/disk-io.c | 13 +
  1 file changed, 13 insertions(+)

 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 60caa68c3618..ec123158f051 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -4104,6 +4104,19 @@ static int btrfs_check_super_valid(struct 
 btrfs_fs_info *fs_info)
>>>
>>> nit: Thinking out loud here - wouldn't it make more sense to name the
>>> function btrfs_validate_super. check_super_valid sounds a bit cumbersome
>>> to me. What do you think ?
>>
>> Indeed, I also like to remove the btrfs_ prefix since it's a static
>> function.
>> validate_super() looks much better.
> 
> It's not necessary to remove the btrfs_ prefix from all static
> functions, sometimes the functions appear on stacks or mixed with other
> subystem helpers that have generic names. The prefix makes it clear that
> it's our function.

I agree with David, just make it btrfs_validate_super
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


status page

2018-04-19 Thread Gandalf Corvotempesta
Hi to all,
as kernel 4.16 is out and 4.17 in in RC, would be possible to update
BTRFS status page
https://btrfs.wiki.kernel.org/index.php/Status to reflect 4.16 stability ?

That page is still based on kernel 4.15 (marked as EOL here:
https://www.kernel.org/)

Thank you
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fstests: btrfs/011 cleanup the functions

2018-04-19 Thread Anand Jain
Use common helper functions where needed. By doing this it improves
code readability and debugging of it.

Signed-off-by: Anand Jain 
---
 tests/btrfs/011 | 197 ++--
 1 file changed, 63 insertions(+), 134 deletions(-)

diff --git a/tests/btrfs/011 b/tests/btrfs/011
index 4aae22dc6292..95bc182a0edf 100755
--- a/tests/btrfs/011
+++ b/tests/btrfs/011
@@ -65,149 +65,85 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # real QA test starts here
 _supported_fs btrfs
 _require_scratch_nocheck
-_require_scratch_dev_pool 4
-_require_btrfs_dump_super
+_require_scratch_dev_pool 5
+_require_scratch_dev_pool_equal_size
+_require_command "$WIPEFS_PROG" wipefs
 
 rm -f $seqres.full
 rm -f $tmp.tmp
 
 echo "*** test btrfs replace"
 
+fill_scratch()
+{
+   local fssize=$1
+
+   # Fill inline extents.
+   for i in `seq 1 500`; do
+   _ddt of=$SCRATCH_MNT/s$i bs=3800 count=1
+   done > /dev/null 2>&1
+
+   # Fill data extents.
+   for i in `seq 1 500`; do
+   _ddt of=$SCRATCH_MNT/l$i bs=16385 count=1
+   done > /dev/null 2>&1
+   _ddt of=$SCRATCH_MNT/t0 bs=1M count=1 > /dev/null 2>&1
+   for i in `seq $fssize`; do
+   cp $SCRATCH_MNT/t0 $SCRATCH_MNT/t$i || _fail "cp failed"
+   done > /dev/null 2>> $seqres.full
+   sync; sync
+}
+
 workout()
 {
local mkfs_options="$1"
local num_devs4raid="$2"
local with_cancel="$3"
-   local quick="$4"
+   local fssize="$4"
local source_dev="`echo ${SCRATCH_DEV_POOL} | awk '{print $1}'`"
-   local target_dev="`echo ${SCRATCH_DEV_POOL} | awk '{print $NF}'`"
-   local fssize
+   local quick="quick"
 
-   echo >> $seqres.full
-   echo "-workout \"$1\" $2 $3 $4---" >> $seqres.full
+   [[ $fssize != 64 ]] && quick="thorough"
 
-   if [ "`echo $SCRATCH_DEV_POOL | wc -w`" -lt `expr $num_devs4raid + 1` 
]; then
-   echo "Skip workout $1 $2 $3 $4" >> $seqres.full
-   echo "Too few devices in SCRATCH_DEV_POOL $SCRATCH_DEV_POOL, 
required: " `expr $num_devs4raid + 1` >> $seqres.full
-   return 0
-   fi
+   echo -e "\\n-workout \"$1\" $2 $3 $4---" >> $seqres.full
 
-   # use min number of disks in order to fill up the disk to replace
-   # as much as possible
-   local used_devs_without_1st="`echo $SCRATCH_DEV_POOL | \
-   awk '{ORS=\" \"; for (i = 2; i <= (NF - 1 < '$num_devs4raid' ? 
NF - 1 : '$num_devs4raid'); i++) print $i}'`"
-
-   # _scratch_mkfs adds the 1st device again (which is $SCRATCH_DEV)
-   _scratch_mkfs $mkfs_options $used_devs_without_1st >> $seqres.full 2>&1 
|| _fail "mkfs failed"
-
-   # create a filesystem on the target device just for the sake of
-   # being able to query its size with btrfs-show-super
-   $MKFS_BTRFS_PROG $MKFS_OPTIONS $target_dev >> $seqres.full 2>&1 || 
_fail "mkfs target_dev failed"
-
-   # The source and target devices for the replace operation are
-   # arbitrarily chosen out of the pool. Since the target device mustn't
-   # be smaller than the source device, the requirement for this test is
-   # that all devices have _exactly_ the same size. If this is not the
-   # case, this test is not run.
-   local num_lines=`$BTRFS_SHOW_SUPER_PROG $SCRATCH_DEV 
$used_devs_without_1st $target_dev | grep dev_item.total_bytes | uniq | wc -l`
-   if [ $num_lines -gt 1 ]; then
-   _notrun "Different device sizes detected"
-   fi
+   $WIPEFS_PROG -a $SCRATCH_DEV_POOL > /dev/null 2>&1
+   _scratch_dev_pool_get $num_devs4raid
+   _spare_dev_get
 
-   if [ `$BTRFS_SHOW_SUPER_PROG $SCRATCH_DEV | grep dev_item.total_bytes | 
awk '{print $2}'` -lt 25 ]; then
-   _notrun "device size too small"
-   fi
+   _scratch_pool_mkfs $mkfs_options >> $seqres.full 2>&1 ||\
+   _fail "mkfs failed"
 
_scratch_mount
+   _require_fs_space $SCRATCH_MNT $((2 * 512 * 1024)) #2.5G
 
-   echo "$BTRFS_UTIL_PROG filesystem show" >> $seqres.full
-   $BTRFS_UTIL_PROG filesystem show >> $seqres.full
+   fill_scratch $fssize
+   _run_btrfs_util_prog filesystem show -m $SCRATCH_MNT
 
-   # Generate metadata and some minimal user data, generate 500 times
-   # 20K extents in the data chunk and fill up metadata with inline
-   # extents.
-   for i in `seq 1 500`; do
-   _ddt of=$SCRATCH_MNT/l$i bs=16385 count=1
-   _ddt of=$SCRATCH_MNT/s$i bs=3800 count=1
-   done > /dev/null 2>&1
-
-   # Generate a template once and quickly copy it multiple times.
-   _ddt of=$SCRATCH_MNT/t0 bs=1M count=1 > /dev/null 2>&1
+   echo -e "Replace from $source_dev to $SPARE_DEV\\n" >> $seqres.full
+   btrfs_replace_test $source_dev $SPARE_DEV "" $with_cancel $quick
 
-   if [ "${quick}Q" = "thoroughQ" ]; 

Re: [PATCH 1/2] btrfs: Factor out read portion of btrfs_get_blocks_direct

2018-04-19 Thread David Sterba
On Wed, Apr 11, 2018 at 10:22:58AM +0300, Nikolay Borisov wrote:
> >> +  free_extent_map(em);
> >> +  goto unlock_err;
> >> +  }
> >> +  /*
> >> +   * We need to unlock only the end area that we aren't using
> >> +   * if there is any leftover space
> >> +   */
> >> +  free_extent_state(cached_state);
> >>free_extent_map(em);
> >> -  goto unlock_err;
> >> +  return 0;
> > 
> > Please add a separate label for that, the funcion uses the single exit
> > block style (labels and one-or-two returns).
> 
> So I think this is unnecessary because in 2/2 I factor out common code
> i.e. the free_extent_map(em); return 0; outside of the 'if' branch and
> so this return disappears. The 3rd hunk in 2/2 begins with:
> 
> @@ -7780,106 +7882,8 @@ static int btrfs_get_blocks_direct(struct inode
> *inode, sector_t iblock,
>* if there is any leftover space
>*/
>   free_extent_state(cached_state);
> - free_extent_map(em);
> - return 0;
> - }

Ok.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: Add incompat flags check for btrfs_check_super_valid()

2018-04-19 Thread David Sterba
On Thu, Apr 19, 2018 at 07:10:30PM +0800, Qu Wenruo wrote:
> 
> 
> On 2018年04月19日 18:59, Nikolay Borisov wrote:
> > 
> > 
> > On 19.04.2018 12:38, Qu Wenruo wrote:
> >> Although we have already checked incompat flags manually before really
> >> mounting it, we could still enhance btrfs_check_super_valid() to check
> >> incompat flags for later write time super block validation check.
> >>
> >> This patch adds such incompat flags check for btrfs_check_super_valid(),
> >> currently it won't be triggered, but provides the basis for later write
> >> time check.
> >>
> >> Signed-off-by: Qu Wenruo 
> > 
> > Reviewed-by: Nikolay Borisov 
> > 
> >> ---
> >>  fs/btrfs/disk-io.c | 13 +
> >>  1 file changed, 13 insertions(+)
> >>
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index 60caa68c3618..ec123158f051 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -4104,6 +4104,19 @@ static int btrfs_check_super_valid(struct 
> >> btrfs_fs_info *fs_info)
> > 
> > nit: Thinking out loud here - wouldn't it make more sense to name the
> > function btrfs_validate_super. check_super_valid sounds a bit cumbersome
> > to me. What do you think ?
> 
> Indeed, I also like to remove the btrfs_ prefix since it's a static
> function.
> validate_super() looks much better.

It's not necessary to remove the btrfs_ prefix from all static
functions, sometimes the functions appear on stacks or mixed with other
subystem helpers that have generic names. The prefix makes it clear that
it's our function.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/15] Review uuid_mutex usage

2018-04-19 Thread David Sterba
On Thu, Apr 12, 2018 at 10:29:23AM +0800, Anand Jain wrote:
> uuid_mutex lock is not a per-fs lock but a global lock. The main aim of
> this patch-set is to critically review the usage of this lock, and delete
> the unnecessary once. By doing this we improve the concurrency of
> device operations across multiple btrfs filesystems is in the system.
> 
> patch 1: Was sent before, I am including it here, as its about uuid_mutex.
> 
> patch 2-9: Are cleanup and or preparatory patches.
> 
> patch 10-14: Drops the uuid_mutex and makes sure there is enough lock,
> as discussed in the patch change log.
> 
> patch 15: A generic cleanup patch around functions in the same context.
> 
> These patches are on top of
>   https://github.com/kdave/btrfs-devel.git remove-volume-mutex
> And it will be a good idea to go along with the kill-volume-mutex patches.
> 
> This is tested with xfstests and there are no _new_ regression. And I am
> trying to understand the old regressions, and notice that they are
> inconsistent.
> 
> Anand Jain (15):
>   btrfs: optimize move uuid_mutex closer to the critical section

FYI, the patches:

>   btrfs: rename struct btrfs_fs_devices::list
>   btrfs: cleanup __btrfs_open_devices() drop head pointer
>   btrfs: rename __btrfs_close_devices to close_fs_devices
>   btrfs: rename __btrfs_open_devices to open_fs_devices
>   btrfs: cleanup find_device() drop list_head pointer
>   btrfs: cleanup btrfs_rm_device() promote fs_devices pointer

have been added to misc-next as they're independent cleanups. The rest
is still under review.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: Drop add_delayed_ref_head fs_info parameter

2018-04-19 Thread David Sterba
On Thu, Apr 19, 2018 at 11:09:09AM +0300, Nikolay Borisov wrote:
> You said that while merging the deleayed refs cleanup/streamlining
> series there were some conflicts. It turns out they are due to these 3
> patches missing. I'm sending them as a separate set or I can also send
> them as part of the v2 for the delayed refs cleanup. The only thing
> that's left for that series is to rebase ontop of the delayed refs race
> fix. What will be the best way to handle this ?

I've applied the 3 cleanup patches to misc-next, the conflict was simple
to resolve.

The conflict with the delayed refs cleanup series will be bigger, the
review is pending so I'll expect maybe one more iteration, so taking the
3 cleanup makes a bit more sense.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] btrfs: Sink extent_tree arguments in try_release_extent_mapping

2018-04-19 Thread David Sterba
On Thu, Apr 19, 2018 at 10:46:34AM +0300, Nikolay Borisov wrote:
> This function already gets the page from which the two extent trees
> are referenced. Simplify its signature by moving the code getting the
> trees inside the function. No functional changes.
> 
> Signed-off-by: Nikolay Borisov 

1-6 added to next, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] btrfs: Remove btrfs_wait_and_free_delalloc_work

2018-04-19 Thread David Sterba
On Thu, Apr 19, 2018 at 10:46:39AM +0300, Nikolay Borisov wrote:
> This function is called from only 1 place and is effectively a wrapper
> over wait_completion/kfree. It doesn't really bring any value having
> those two calls in a separate function. Just open code it and remove it.
> No functional changes.
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] btrfs: Use list_empty instead of list_empty_careful

2018-04-19 Thread David Sterba
On Thu, Apr 19, 2018 at 10:46:37AM +0300, Nikolay Borisov wrote:
> list_empty_careful usually is a signal of something tricky going on. Its
> usage in btrfs is actually not needed since both lists it's used on are
> local to a function and cannot be modified concurrently. So switch to
> plain list_empty. No functional changes.
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: print-tree: output enhancement

2018-04-19 Thread David Sterba
On Thu, Apr 19, 2018 at 08:57:30PM +0800, Qu Wenruo wrote:
> 
> 
> On 2018年04月19日 20:49, David Sterba wrote:
> > On Wed, Apr 11, 2018 at 05:08:12PM +0800, Qu Wenruo wrote:
> >> This patch enhance the following things:
> >>
> >> - tree block header
> >>   * add generation and owner output for node and leaf
> >> - node pointer generation output
> >> - allow btrfs_print_tree() to not follow nodes
> >>   * just like btrfs-progs
> >>
> >> Please note that, although function btrfs_print_tree() is not called by
> >> anyone right now, it's still a pretty useful function to debug kernel.
> >> So that function is still kept for later use.
> > 
> > It will stay of course, code deletionists have no chance removing useful
> > debugging stuff.
> > 
> > Reviewed-by: David Sterba 
> > 
> > Some time ago I improved the progs output of print-tree, this would be
> > also good to port to kernel and we can then keep the files in 1:1.
> 
> Definitely!
> The current kernel print-tree is just better than no output, far from
> human readable output.
> I can't wait to port btrfs-progs print-tree to kernel, but that's over
> 1000 lines for debug facility.
> 
> If that's OK, I could start the port any time.

As print-tree is independent, you can do that incrementally, similar to
the progs patches that factor out the per-key cases into functions.
Anything you'd have we can merge rightaway, these are easy patches that
can serve as a "relaxation" and relief from the difficult ones.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] btrfs: print-tree: Add locking status output for debug build

2018-04-19 Thread David Sterba
On Wed, Apr 11, 2018 at 05:08:13PM +0800, Qu Wenruo wrote:
> It's pretty handy if we can get debug output for locking status of an
> extent buffer, specially for race related debugging.
> 
> So add the following output for btrfs_print_tree() and
> btrfs_print_leaf():
> - refs
> - write_locks (as w:%u)
> - read_locks (as r:%u)
> - blocking_writers (as bw:%u)
> - blocking_readers (as br:%u)
> - spinning_writers (as sw:%u)
> - spinning_readers (as sr:%u)
> - lock_owner
> - current->pid

Useful, but atomic is 'int' (%d), and pid_t is also int in disguise
(maybe there's a special printk specifier for that).
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: print-tree: output enhancement

2018-04-19 Thread Qu Wenruo


On 2018年04月19日 20:49, David Sterba wrote:
> On Wed, Apr 11, 2018 at 05:08:12PM +0800, Qu Wenruo wrote:
>> This patch enhance the following things:
>>
>> - tree block header
>>   * add generation and owner output for node and leaf
>> - node pointer generation output
>> - allow btrfs_print_tree() to not follow nodes
>>   * just like btrfs-progs
>>
>> Please note that, although function btrfs_print_tree() is not called by
>> anyone right now, it's still a pretty useful function to debug kernel.
>> So that function is still kept for later use.
> 
> It will stay of course, code deletionists have no chance removing useful
> debugging stuff.
> 
> Reviewed-by: David Sterba 
> 
> Some time ago I improved the progs output of print-tree, this would be
> also good to port to kernel and we can then keep the files in 1:1.

Definitely!
The current kernel print-tree is just better than no output, far from
human readable output.
I can't wait to port btrfs-progs print-tree to kernel, but that's over
1000 lines for debug facility.

If that's OK, I could start the port any time.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] btrfs: print-tree: output enhancement

2018-04-19 Thread David Sterba
On Wed, Apr 11, 2018 at 05:08:12PM +0800, Qu Wenruo wrote:
> This patch enhance the following things:
> 
> - tree block header
>   * add generation and owner output for node and leaf
> - node pointer generation output
> - allow btrfs_print_tree() to not follow nodes
>   * just like btrfs-progs
> 
> Please note that, although function btrfs_print_tree() is not called by
> anyone right now, it's still a pretty useful function to debug kernel.
> So that function is still kept for later use.

It will stay of course, code deletionists have no chance removing useful
debugging stuff.

Reviewed-by: David Sterba 

Some time ago I improved the progs output of print-tree, this would be
also good to port to kernel and we can then keep the files in 1:1.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] btrfs: clean up le_bitmap_{set, clear}()

2018-04-19 Thread David Sterba
On Wed, Apr 18, 2018 at 06:02:35PM -0700, Howard McLauchlan wrote:
> le_bitmap_set() is only used by free-space-tree, so move it there and
> make it static. le_bitmap_clear() is not used, so remove it.
> 
> Signed-off-by: Howard McLauchlan 

Added to the queue, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Filesystem full with unallocated space?

2018-04-19 Thread Matthew Lai

Hi Nikolay,

Updating to 4.15 seems to have fixed it.

Thanks!

Matthew


On 04/18/2018 05:48 PM, Nikolay Borisov wrote:


On 17.04.2018 19:08, Matthew Lai wrote:

Hello!

I am getting ENOSPC on my root filesystem with plenty of unallocated
space according to "fi usage". Any idea why that may be? This is a root
partition for Debian Stable. Not doing anything unusual that I'm aware
of. No snapshots.

Thanks!

Matthew

uname -a:
Linux bigfoot 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23)
x86_64 GNU/Linux

btrfs --version:
btrfs-progs v4.7.3

btrfs fi show:
Label: none  uuid: 2364c63f-e20c-410f-90b4-05f722ee1c77
     Total devices 1 FS bytes used 176.27GiB
     devid    1 size 246.33GiB used 185.01GiB path /dev/sda2

btrfs fi df /:
Data, single: total=183.00GiB, used=175.78GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=2.01GiB, used=504.81MiB
GlobalReserve, single: total=211.39MiB, used=0.00B



You haven't shown "btrfs fi usage". In any case there were some patches
in more recent kernels which deal with premature ENOSPC:


996478ca9c46 ("btrfs: change how we decide to commit transactions during
flushing")
17024ad0a0fd ("Btrfs: fix early ENOSPC due to delalloc")


I'd advise you update to 4.14 stable. Otherwise running:
git log --oneline --grep "enospc" fs/btrfs/

will shows you likely candidates.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: Add incompat flags check for btrfs_check_super_valid()

2018-04-19 Thread Qu Wenruo


On 2018年04月19日 18:59, Nikolay Borisov wrote:
> 
> 
> On 19.04.2018 12:38, Qu Wenruo wrote:
>> Although we have already checked incompat flags manually before really
>> mounting it, we could still enhance btrfs_check_super_valid() to check
>> incompat flags for later write time super block validation check.
>>
>> This patch adds such incompat flags check for btrfs_check_super_valid(),
>> currently it won't be triggered, but provides the basis for later write
>> time check.
>>
>> Signed-off-by: Qu Wenruo 
> 
> Reviewed-by: Nikolay Borisov 
> 
>> ---
>>  fs/btrfs/disk-io.c | 13 +
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 60caa68c3618..ec123158f051 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -4104,6 +4104,19 @@ static int btrfs_check_super_valid(struct 
>> btrfs_fs_info *fs_info)
> 
> nit: Thinking out loud here - wouldn't it make more sense to name the
> function btrfs_validate_super. check_super_valid sounds a bit cumbersome
> to me. What do you think ?

Indeed, I also like to remove the btrfs_ prefix since it's a static
function.
validate_super() looks much better.

Thanks,
Qu

>>  ret = -EINVAL;
>>  }
>>  
>> +/*
>> + * Before calling btrfs_check_super_valid() we have already checked
>> + * incompat flags. So if we developr new incompat flags, it's must be
> s/developr/detect ?
>> + * some corruption.
>> + */
>> +if (btrfs_super_incompat_flags(sb) & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
>> +btrfs_err(fs_info,
>> +"corrupted incompat flags detected 0x%llx, supported 0x%llx",
>> +  btrfs_super_incompat_flags(sb),
>> +  BTRFS_FEATURE_INCOMPAT_SUPP);
>> +ret = -EINVAL;
>> +}
>> +
>>  /*
>>   * The generation is a global counter, we'll trust it more than the 
>> others
>>   * but it's still possible that it's the one that's wrong.
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: Add incompat flags check for btrfs_check_super_valid()

2018-04-19 Thread Nikolay Borisov


On 19.04.2018 12:38, Qu Wenruo wrote:
> Although we have already checked incompat flags manually before really
> mounting it, we could still enhance btrfs_check_super_valid() to check
> incompat flags for later write time super block validation check.
> 
> This patch adds such incompat flags check for btrfs_check_super_valid(),
> currently it won't be triggered, but provides the basis for later write
> time check.
> 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Nikolay Borisov 

> ---
>  fs/btrfs/disk-io.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 60caa68c3618..ec123158f051 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4104,6 +4104,19 @@ static int btrfs_check_super_valid(struct 
> btrfs_fs_info *fs_info)

nit: Thinking out loud here - wouldn't it make more sense to name the
function btrfs_validate_super. check_super_valid sounds a bit cumbersome
to me. What do you think ?
>   ret = -EINVAL;
>   }
>  
> + /*
> +  * Before calling btrfs_check_super_valid() we have already checked
> +  * incompat flags. So if we developr new incompat flags, it's must be
s/developr/detect ?
> +  * some corruption.
> +  */
> + if (btrfs_super_incompat_flags(sb) & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
> + btrfs_err(fs_info,
> + "corrupted incompat flags detected 0x%llx, supported 0x%llx",
> +   btrfs_super_incompat_flags(sb),
> +   BTRFS_FEATURE_INCOMPAT_SUPP);
> + ret = -EINVAL;
> + }
> +
>   /*
>* The generation is a global counter, we'll trust it more than the 
> others
>* but it's still possible that it's the one that's wrong.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/3] btrfs: Add csum type check for btrfs_check_super_valid()

2018-04-19 Thread Qu Wenruo
Just like incompat flags check, although we have already done super csum
type check before calling btrfs_check_super_valid(), we can still add
such check for later write time check.

Signed-off-by: Qu Wenruo 
---
v2:
  Move csum_type check after magic check.
---
 fs/btrfs/disk-io.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ec123158f051..23d70c3fdc22 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3985,6 +3985,15 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info)
btrfs_err(fs_info, "no valid FS found");
ret = -EINVAL;
}
+   /*
+* For write time check, as for mount time we have checked csum before
+* calling btrfs_check_super_valid(), so it must be a corruption
+*/
+   if (btrfs_super_csum_type(sb) >= ARRAY_SIZE(btrfs_csum_sizes)) {
+   btrfs_err(fs_info, "corrupted csum type %u",
+ btrfs_super_csum_type(sb));
+   ret = -EINVAL;
+   }
if (btrfs_super_flags(sb) & ~BTRFS_SUPER_FLAG_SUPP) {
btrfs_err(fs_info, "unrecognized or unsupported super flag: 
%llu",
btrfs_super_flags(sb) & ~BTRFS_SUPER_FLAG_SUPP);
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/3] btrfs: Do super block verification before writing it to disk

2018-04-19 Thread Qu Wenruo
There are already 2 reports about strangely corrupted super blocks,
where csum still matches but extra garbage gets slipped into super block.

The corruption would looks like:
--
superblock: bytenr=65536, device=/dev/sdc1
-
csum_type   41700 (INVALID)
csum0x3b252d3a [match]
bytenr  65536
flags   0x1
( WRITTEN )
magic   _BHRfS_M [match]
...
incompat_flags  0x5b224169
( MIXED_BACKREF |
  COMPRESS_LZO |
  BIG_METADATA |
  EXTENDED_IREF |
  SKINNY_METADATA |
  unknown flag: 0x5b224000 )
...
--
Or
--
superblock: bytenr=65536, device=/dev/mapper/x
-
csum_type  35355 (INVALID)
csum_size  32
csum   0xf0dbeddd [match]
bytenr 65536
flags  0x1
   ( WRITTEN )
magic  _BHRfS_M [match]
...
incompat_flags 0x176d2169
   ( MIXED_BACKREF |
 COMPRESS_LZO |
 BIG_METADATA |
 EXTENDED_IREF |
 SKINNY_METADATA |
 unknown flag: 0x176d2000 )
--

Obviously, csum_type and incompat_flags get some garbage, but its csum
still matches, which means kernel calculates the csum based on corrupted
super block memory.
And after manually fixing these values, the filesystem is completely
healthy without any problem exposed by btrfs check.

Although the cause is still unknown, at least detect it and prevent further
corruption.

Reported-by: Ken Swenson 
Reported-by: Ben Parsons <9parso...@gmail.com>
Signed-off-by: Qu Wenruo 
---
v2:
  Add new comment for why we could skip bytenr check at sb write time.
  Spell fix.
  Remove unrelated intermediate number fix.
---
 fs/btrfs/disk-io.c | 36 +---
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 23d70c3fdc22..3f380e3d0195 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -55,7 +55,9 @@
 static const struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info);
+static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+  struct btrfs_super_block *sb,
+  int super_mirror);
 static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
 static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
  struct btrfs_fs_info *fs_info);
@@ -2668,7 +2670,7 @@ int open_ctree(struct super_block *sb,
 
memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
 
-   ret = btrfs_check_super_valid(fs_info);
+   ret = btrfs_check_super_valid(fs_info, fs_info->super_copy, 0);
if (ret) {
btrfs_err(fs_info, "superblock contains fatal errors");
err = -EINVAL;
@@ -3563,6 +3565,16 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors)
sb = fs_info->super_for_commit;
dev_item = >dev_item;
 
+   /*
+* super_bytenr will be updated in write_dev_supers(), even if it is
+* corrupted in current copy, it won't reach disk. So skip bytenr check.
+*/
+   if (btrfs_check_super_valid(fs_info, sb, -1)) {
+   btrfs_err(fs_info,
+   "superblock corruption detected before transaction commit");
+   return -EUCLEAN;
+   }
+
mutex_lock(_info->fs_devices->device_list_mutex);
head = _info->fs_devices->devices;
max_errors = btrfs_super_num_devices(fs_info->super_copy) - 1;
@@ -3974,9 +3986,18 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 
parent_transid, int level,
  level, first_key);
 }
 
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
+/*
+ * Check the validation of btrfs super block.
+ *
+ * @sb:super block to check
+ * @super_mirror:  the super block number to check its bytenr.
+ * 0 means the primary (1st) sb, 1 and 2 means 2nd and
+ * 3rd backup sb, while -1 means to skip bytenr check.
+ */
+static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+  struct btrfs_super_block *sb,
+  int super_mirror)
 {
-   struct btrfs_super_block *sb = fs_info->super_copy;

[PATCH v2 3/3] btrfs: Do super block verification before writing it to disk

2018-04-19 Thread Qu Wenruo
There are already 2 reports about strangely corrupted super blocks,
where csum still matches but extra garbage gets slipped into super block.

The corruption would looks like:
--
superblock: bytenr=65536, device=/dev/sdc1
-
csum_type   41700 (INVALID)
csum0x3b252d3a [match]
bytenr  65536
flags   0x1
( WRITTEN )
magic   _BHRfS_M [match]
...
incompat_flags  0x5b224169
( MIXED_BACKREF |
  COMPRESS_LZO |
  BIG_METADATA |
  EXTENDED_IREF |
  SKINNY_METADATA |
  unknown flag: 0x5b224000 )
...
--
Or
--
superblock: bytenr=65536, device=/dev/mapper/x
-
csum_type  35355 (INVALID)
csum_size  32
csum   0xf0dbeddd [match]
bytenr 65536
flags  0x1
   ( WRITTEN )
magic  _BHRfS_M [match]
...
incompat_flags 0x176d2169
   ( MIXED_BACKREF |
 COMPRESS_LZO |
 BIG_METADATA |
 EXTENDED_IREF |
 SKINNY_METADATA |
 unknown flag: 0x176d2000 )
--

Obviously, csum_type and incompat_flags get some garbage, but its csum
still matches, which means kernel calculates the csum based on corrupted
super block memory.
And after manually fixing these values, the filesystem is completely
healthy without any problem exposed by btrfs check.

Although the cause is still unknown, at least detect it and prevent further
corruption.

Reported-by: Ken Swenson 
Reported-by: Ben Parsons <9parso...@gmail.com>
Signed-off-by: Qu Wenruo 
---
v2:
  Add new comment for why we could skip bytenr check at sb write time.
  Spell fix.
  Remove unrelated intermediate number fix.
---
 fs/btrfs/disk-io.c | 36 +---
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 23d70c3fdc22..3f380e3d0195 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -55,7 +55,9 @@
 static const struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info);
+static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+  struct btrfs_super_block *sb,
+  int super_mirror);
 static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
 static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
  struct btrfs_fs_info *fs_info);
@@ -2668,7 +2670,7 @@ int open_ctree(struct super_block *sb,
 
memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
 
-   ret = btrfs_check_super_valid(fs_info);
+   ret = btrfs_check_super_valid(fs_info, fs_info->super_copy, 0);
if (ret) {
btrfs_err(fs_info, "superblock contains fatal errors");
err = -EINVAL;
@@ -3563,6 +3565,16 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors)
sb = fs_info->super_for_commit;
dev_item = >dev_item;
 
+   /*
+* super_bytenr will be updated in write_dev_supers(), even if it is
+* corrupted in current copy, it won't reach disk. So skip bytenr check.
+*/
+   if (btrfs_check_super_valid(fs_info, sb, -1)) {
+   btrfs_err(fs_info,
+   "superblock corruption detected before transaction commit");
+   return -EUCLEAN;
+   }
+
mutex_lock(_info->fs_devices->device_list_mutex);
head = _info->fs_devices->devices;
max_errors = btrfs_super_num_devices(fs_info->super_copy) - 1;
@@ -3974,9 +3986,18 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 
parent_transid, int level,
  level, first_key);
 }
 
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
+/*
+ * Check the validation of btrfs super block.
+ *
+ * @sb:super block to check
+ * @super_mirror:  the super block number to check its bytenr.
+ * 0 means the primary (1st) sb, 1 and 2 means 2nd and
+ * 3rd backup sb, while -1 means to skip bytenr check.
+ */
+static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+  struct btrfs_super_block *sb,
+  int super_mirror)
 {
-   struct btrfs_super_block *sb = fs_info->super_copy;

Re: [PATCH 3/3] btrfs: Do super block verification before writing it to disk

2018-04-19 Thread Qu Wenruo


On 2018年04月19日 18:16, David Sterba wrote:
> Looks good, some minor comments below. I'm wondering how to test that.

Currently I'm using the most stupid way to test it, insert code randomly
modifies super blocks.

> We'd have to inject either the corruption or to provide a way to
> forcibly fail the test. For the latter a debugfs should do, I'll send
> something for comments.

What about a sysfs interface to modify super blocks?
Since we already have sectorsize and nodessize, allow it to be writeable
for CONFIG_BTRFS_DEBUG looks pretty good.

Thanks,
Qu

> 
> On Thu, Apr 19, 2018 at 05:38:16PM +0800, Qu Wenruo wrote:
>> @@ -3563,6 +3565,12 @@ int write_all_supers(struct btrfs_fs_info *fs_info, 
>> int max_mirrors)
>>  sb = fs_info->super_for_commit;
>>  dev_item = >dev_item;
>>  
>> +if (btrfs_check_super_valid(fs_info, sb, -1)) {
> 
> A comment that this is skipping the bytenr check would be good.
> 
>> +btrfs_err(fs_info,
>> +"superblock corruption detected before transaction commitment");
> 
>commit
> 
> 
>> +return -EUCLEAN;
>> +}
>> +
>>  mutex_lock(_info->fs_devices->device_list_mutex);
>>  head = _info->fs_devices->devices;
>>  max_errors = btrfs_super_num_devices(fs_info->super_copy) - 1;
>> @@ -3974,9 +3982,18 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 
>> parent_transid, int level,
>>level, first_key);
>>  }
>>  
>> -static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
>> +/*
>> + * Check the validation of btrfs super block.
>> + *
>> + * @sb: super block to check
>> + * @super_mirror:   the super block number to check its bytenr.
>> + *  0 means the primary (1st) sb, 1 and 2 means 2nd and
>> + *  3rd backup sb, while -1 means to skip bytenr check.
>> + */
>> +static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
>> +   struct btrfs_super_block *sb,
>> +   int super_mirror)
>>  {
>> -struct btrfs_super_block *sb = fs_info->super_copy;
>>  u64 nodesize = btrfs_super_nodesize(sb);
>>  u64 sectorsize = btrfs_super_sectorsize(sb);
>>  int ret = 0;
>> @@ -4019,7 +4036,7 @@ static int btrfs_check_super_valid(struct 
>> btrfs_fs_info *fs_info)
>>   * Check sectorsize and nodesize first, other check will need it.
>>   * Check all possible sectorsize(4K, 8K, 16K, 32K, 64K) here.
>>   */
>> -if (!is_power_of_2(sectorsize) || sectorsize < 4096 ||
>> +if (!is_power_of_2(sectorsize) || sectorsize < SZ_4K ||
> 
> No unrelated changes please. There are some remaining raw values, send a
> separate patch if you want to convert them.
> 
>>  sectorsize > BTRFS_MAX_METADATA_BLOCKSIZE) {
>>  btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
>>  ret = -EINVAL;
>> @@ -4088,9 +4105,10 @@ static int btrfs_check_super_valid(struct 
>> btrfs_fs_info *fs_info)
>>  ret = -EINVAL;
>>  }
>>  
>> -if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
>> -btrfs_err(fs_info, "super offset mismatch %llu != %u",
>> -  btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
>> +if (super_mirror >= 0 && btrfs_super_bytenr(sb) !=
>> +btrfs_sb_offset(super_mirror)) {
>> +btrfs_err(fs_info, "super offset mismatch %llu != %llu",
>> +btrfs_super_bytenr(sb), btrfs_sb_offset(super_mirror));
>>  ret = -EINVAL;
>>  }
>>  
>> -- 
>> 2.17.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/3] btrfs: Add csum type check for btrfs_check_super_valid()

2018-04-19 Thread Qu Wenruo


On 2018年04月19日 18:09, David Sterba wrote:
> On Thu, Apr 19, 2018 at 05:38:15PM +0800, Qu Wenruo wrote:
>> Just like incompat flags check, although we have already done super csum
>> type check before calling btrfs_check_super_valid(), we can still add
>> such check for later write time check.
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>>  fs/btrfs/disk-io.c | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index ec123158f051..b189ec086bc2 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3981,6 +3981,15 @@ static int btrfs_check_super_valid(struct 
>> btrfs_fs_info *fs_info)
>>  u64 sectorsize = btrfs_super_sectorsize(sb);
>>  int ret = 0;
>>  
>> +/*
>> + * For write time check, as for mount time we have checked csum before
>> + * calling btrfs_check_super_valid(), so it must be a corruption
>> + */
>> +if (btrfs_super_csum_type(sb) >= ARRAY_SIZE(btrfs_csum_sizes)) {
>> +btrfs_err(fs_info, "corrupted csum type %u",
>> +  btrfs_super_csum_type(sb));
>> +ret = -EINVAL;
>> +}
>>  if (btrfs_super_magic(sb) != BTRFS_MAGIC) {
> 
> The test for magic should be IMO the first one, that's how the
> filesystem type is identified before anything else. Granted that the
> magic must be correct before it can be even mounted, but the code should
> follow that logic too. Otherwise ok.

Makes sense, will update the patchset.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 3/3] btrfs: Do super block verification before writing it to disk

2018-04-19 Thread David Sterba
Looks good, some minor comments below. I'm wondering how to test that.
We'd have to inject either the corruption or to provide a way to
forcibly fail the test. For the latter a debugfs should do, I'll send
something for comments.

On Thu, Apr 19, 2018 at 05:38:16PM +0800, Qu Wenruo wrote:
> @@ -3563,6 +3565,12 @@ int write_all_supers(struct btrfs_fs_info *fs_info, 
> int max_mirrors)
>   sb = fs_info->super_for_commit;
>   dev_item = >dev_item;
>  
> + if (btrfs_check_super_valid(fs_info, sb, -1)) {

A comment that this is skipping the bytenr check would be good.

> + btrfs_err(fs_info,
> + "superblock corruption detected before transaction commitment");

   commit


> + return -EUCLEAN;
> + }
> +
>   mutex_lock(_info->fs_devices->device_list_mutex);
>   head = _info->fs_devices->devices;
>   max_errors = btrfs_super_num_devices(fs_info->super_copy) - 1;
> @@ -3974,9 +3982,18 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 
> parent_transid, int level,
> level, first_key);
>  }
>  
> -static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
> +/*
> + * Check the validation of btrfs super block.
> + *
> + * @sb:  super block to check
> + * @super_mirror:the super block number to check its bytenr.
> + *   0 means the primary (1st) sb, 1 and 2 means 2nd and
> + *   3rd backup sb, while -1 means to skip bytenr check.
> + */
> +static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
> +struct btrfs_super_block *sb,
> +int super_mirror)
>  {
> - struct btrfs_super_block *sb = fs_info->super_copy;
>   u64 nodesize = btrfs_super_nodesize(sb);
>   u64 sectorsize = btrfs_super_sectorsize(sb);
>   int ret = 0;
> @@ -4019,7 +4036,7 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
> *fs_info)
>* Check sectorsize and nodesize first, other check will need it.
>* Check all possible sectorsize(4K, 8K, 16K, 32K, 64K) here.
>*/
> - if (!is_power_of_2(sectorsize) || sectorsize < 4096 ||
> + if (!is_power_of_2(sectorsize) || sectorsize < SZ_4K ||

No unrelated changes please. There are some remaining raw values, send a
separate patch if you want to convert them.

>   sectorsize > BTRFS_MAX_METADATA_BLOCKSIZE) {
>   btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
>   ret = -EINVAL;
> @@ -4088,9 +4105,10 @@ static int btrfs_check_super_valid(struct 
> btrfs_fs_info *fs_info)
>   ret = -EINVAL;
>   }
>  
> - if (btrfs_super_bytenr(sb) != BTRFS_SUPER_INFO_OFFSET) {
> - btrfs_err(fs_info, "super offset mismatch %llu != %u",
> -   btrfs_super_bytenr(sb), BTRFS_SUPER_INFO_OFFSET);
> + if (super_mirror >= 0 && btrfs_super_bytenr(sb) !=
> + btrfs_sb_offset(super_mirror)) {
> + btrfs_err(fs_info, "super offset mismatch %llu != %llu",
> + btrfs_super_bytenr(sb), btrfs_sb_offset(super_mirror));
>   ret = -EINVAL;
>   }
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] btrfs: Add csum type check for btrfs_check_super_valid()

2018-04-19 Thread David Sterba
On Thu, Apr 19, 2018 at 05:38:15PM +0800, Qu Wenruo wrote:
> Just like incompat flags check, although we have already done super csum
> type check before calling btrfs_check_super_valid(), we can still add
> such check for later write time check.
> 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/disk-io.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index ec123158f051..b189ec086bc2 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3981,6 +3981,15 @@ static int btrfs_check_super_valid(struct 
> btrfs_fs_info *fs_info)
>   u64 sectorsize = btrfs_super_sectorsize(sb);
>   int ret = 0;
>  
> + /*
> +  * For write time check, as for mount time we have checked csum before
> +  * calling btrfs_check_super_valid(), so it must be a corruption
> +  */
> + if (btrfs_super_csum_type(sb) >= ARRAY_SIZE(btrfs_csum_sizes)) {
> + btrfs_err(fs_info, "corrupted csum type %u",
> +   btrfs_super_csum_type(sb));
> + ret = -EINVAL;
> + }
>   if (btrfs_super_magic(sb) != BTRFS_MAGIC) {

The test for magic should be IMO the first one, that's how the
filesystem type is identified before anything else. Granted that the
magic must be correct before it can be even mounted, but the code should
follow that logic too. Otherwise ok.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs: Do super block verification before writing it to disk

2018-04-19 Thread Qu Wenruo
There are already 2 reports about strangely corrupted super blocks,
where csum still matches but extra garbage gets slipped into super block.

The corruption would looks like:
--
superblock: bytenr=65536, device=/dev/sdc1
-
csum_type   41700 (INVALID)
csum0x3b252d3a [match]
bytenr  65536
flags   0x1
( WRITTEN )
magic   _BHRfS_M [match]
...
incompat_flags  0x5b224169
( MIXED_BACKREF |
  COMPRESS_LZO |
  BIG_METADATA |
  EXTENDED_IREF |
  SKINNY_METADATA |
  unknown flag: 0x5b224000 )
...
--
Or
--
superblock: bytenr=65536, device=/dev/mapper/x
-
csum_type  35355 (INVALID)
csum_size  32
csum   0xf0dbeddd [match]
bytenr 65536
flags  0x1
   ( WRITTEN )
magic  _BHRfS_M [match]
...
incompat_flags 0x176d2169
   ( MIXED_BACKREF |
 COMPRESS_LZO |
 BIG_METADATA |
 EXTENDED_IREF |
 SKINNY_METADATA |
 unknown flag: 0x176d2000 )
--

Obviously, csum_type and incompat_flags get some garbage, but its csum
still matches, which means kernel calculates the csum based on corrupted
super block memory.
And after manually fixing these values, the filesystem is completely
healthy without any problem exposed by btrfs check.

Although the cause is still unknown, at least detect it and prevent further
corruption.

Reported-by: Ken Swenson 
Reported-by: Ben Parsons <9parso...@gmail.com>
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b189ec086bc2..fc62f84c3613 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -55,7 +55,9 @@
 static const struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info);
+static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+  struct btrfs_super_block *sb,
+  int super_mirror);
 static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
 static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
  struct btrfs_fs_info *fs_info);
@@ -2668,7 +2670,7 @@ int open_ctree(struct super_block *sb,
 
memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
 
-   ret = btrfs_check_super_valid(fs_info);
+   ret = btrfs_check_super_valid(fs_info, fs_info->super_copy, 0);
if (ret) {
btrfs_err(fs_info, "superblock contains fatal errors");
err = -EINVAL;
@@ -3563,6 +3565,12 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors)
sb = fs_info->super_for_commit;
dev_item = >dev_item;
 
+   if (btrfs_check_super_valid(fs_info, sb, -1)) {
+   btrfs_err(fs_info,
+   "superblock corruption detected before transaction commitment");
+   return -EUCLEAN;
+   }
+
mutex_lock(_info->fs_devices->device_list_mutex);
head = _info->fs_devices->devices;
max_errors = btrfs_super_num_devices(fs_info->super_copy) - 1;
@@ -3974,9 +3982,18 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 
parent_transid, int level,
  level, first_key);
 }
 
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
+/*
+ * Check the validation of btrfs super block.
+ *
+ * @sb:super block to check
+ * @super_mirror:  the super block number to check its bytenr.
+ * 0 means the primary (1st) sb, 1 and 2 means 2nd and
+ * 3rd backup sb, while -1 means to skip bytenr check.
+ */
+static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+  struct btrfs_super_block *sb,
+  int super_mirror)
 {
-   struct btrfs_super_block *sb = fs_info->super_copy;
u64 nodesize = btrfs_super_nodesize(sb);
u64 sectorsize = btrfs_super_sectorsize(sb);
int ret = 0;
@@ -4019,7 +4036,7 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info)
 * Check sectorsize and nodesize first, other check will need it.
 * Check all possible 

[PATCH 2/3] btrfs: Add csum type check for btrfs_check_super_valid()

2018-04-19 Thread Qu Wenruo
Just like incompat flags check, although we have already done super csum
type check before calling btrfs_check_super_valid(), we can still add
such check for later write time check.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ec123158f051..b189ec086bc2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3981,6 +3981,15 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info)
u64 sectorsize = btrfs_super_sectorsize(sb);
int ret = 0;
 
+   /*
+* For write time check, as for mount time we have checked csum before
+* calling btrfs_check_super_valid(), so it must be a corruption
+*/
+   if (btrfs_super_csum_type(sb) >= ARRAY_SIZE(btrfs_csum_sizes)) {
+   btrfs_err(fs_info, "corrupted csum type %u",
+ btrfs_super_csum_type(sb));
+   ret = -EINVAL;
+   }
if (btrfs_super_magic(sb) != BTRFS_MAGIC) {
btrfs_err(fs_info, "no valid FS found");
ret = -EINVAL;
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs: Add incompat flags check for btrfs_check_super_valid()

2018-04-19 Thread Qu Wenruo
Although we have already checked incompat flags manually before really
mounting it, we could still enhance btrfs_check_super_valid() to check
incompat flags for later write time super block validation check.

This patch adds such incompat flags check for btrfs_check_super_valid(),
currently it won't be triggered, but provides the basis for later write
time check.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60caa68c3618..ec123158f051 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4104,6 +4104,19 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info)
ret = -EINVAL;
}
 
+   /*
+* Before calling btrfs_check_super_valid() we have already checked
+* incompat flags. So if we developr new incompat flags, it's must be
+* some corruption.
+*/
+   if (btrfs_super_incompat_flags(sb) & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
+   btrfs_err(fs_info,
+   "corrupted incompat flags detected 0x%llx, supported 0x%llx",
+ btrfs_super_incompat_flags(sb),
+ BTRFS_FEATURE_INCOMPAT_SUPP);
+   ret = -EINVAL;
+   }
+
/*
 * The generation is a global counter, we'll trust it more than the 
others
 * but it's still possible that it's the one that's wrong.
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: Drop add_delayed_ref_head fs_info parameter

2018-04-19 Thread Nikolay Borisov


On 19.04.2018 11:06, Nikolay Borisov wrote:
> It's provided by the transaction handle
> 
> Signed-off-by: Nikolay Borisov 

David,

You said that while merging the deleayed refs cleanup/streamlining
series there were some conflicts. It turns out they are due to these 3
patches missing. I'm sending them as a separate set or I can also send
them as part of the v2 for the delayed refs cleanup. The only thing
that's left for that series is to rebase ontop of the delayed refs race
fix. What will be the best way to handle this ?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs: Drop fs_info parameter from add_delayed_data_ref

2018-04-19 Thread Nikolay Borisov
It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/delayed-ref.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index c0608cbf1354..f343ee0b143a 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -701,8 +701,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
  * helper to insert a delayed data ref into the rbtree.
  */
 static noinline void
-add_delayed_data_ref(struct btrfs_fs_info *fs_info,
-struct btrfs_trans_handle *trans,
+add_delayed_data_ref(struct btrfs_trans_handle *trans,
 struct btrfs_delayed_ref_head *head_ref,
 struct btrfs_delayed_ref_node *ref, u64 bytenr,
 u64 num_bytes, u64 parent, u64 ref_root, u64 owner,
@@ -719,7 +718,7 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
delayed_refs = >transaction->delayed_refs;
 
if (is_fstree(ref_root))
-   seq = atomic64_read(_info->tree_mod_seq);
+   seq = atomic64_read(>fs_info->tree_mod_seq);
 
/* first set the basic ref node struct up */
refcount_set(>refs, 1);
@@ -744,7 +743,7 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
full_ref->objectid = owner;
full_ref->offset = offset;
 
-   trace_add_delayed_data_ref(fs_info, ref, full_ref, action);
+   trace_add_delayed_data_ref(trans->fs_info, ref, full_ref, action);
 
ret = insert_delayed_ref(trans, delayed_refs, head_ref, ref);
if (ret > 0)
@@ -867,9 +866,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
*fs_info,
action, 1, _inserted,
old_ref_mod, new_ref_mod);
 
-   add_delayed_data_ref(fs_info, trans, head_ref, >node, bytenr,
-  num_bytes, parent, ref_root, owner, offset,
-  action);
+   add_delayed_data_ref(trans, head_ref, >node, bytenr, num_bytes,
+parent, ref_root, owner, offset, action);
spin_unlock(_refs->lock);
 
if (qrecord_inserted)
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs: Drop add_delayed_ref_head fs_info parameter

2018-04-19 Thread Nikolay Borisov
It's provided by the transaction handle

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/delayed-ref.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index e9a624196295..c0608cbf1354 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -532,8 +532,7 @@ update_existing_head_ref(struct btrfs_delayed_ref_root 
*delayed_refs,
  * overall modification count.
  */
 static noinline struct btrfs_delayed_ref_head *
-add_delayed_ref_head(struct btrfs_fs_info *fs_info,
-struct btrfs_trans_handle *trans,
+add_delayed_ref_head(struct btrfs_trans_handle *trans,
 struct btrfs_delayed_ref_head *head_ref,
 struct btrfs_qgroup_extent_record *qrecord,
 u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
@@ -603,14 +602,14 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
qrecord->num_bytes = num_bytes;
qrecord->old_roots = NULL;
 
-   if(btrfs_qgroup_trace_extent_nolock(fs_info,
+   if(btrfs_qgroup_trace_extent_nolock(trans->fs_info,
delayed_refs, qrecord))
kfree(qrecord);
else
qrecord_inserted = 1;
}
 
-   trace_add_delayed_ref_head(fs_info, head_ref, action);
+   trace_add_delayed_ref_head(trans->fs_info, head_ref, action);
 
existing = htree_insert(_refs->href_root,
_ref->href_node);
@@ -795,8 +794,8 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info 
*fs_info,
 * insert both the head node and the new ref without dropping
 * the spin lock
 */
-   head_ref = add_delayed_ref_head(fs_info, trans, head_ref, record,
-   bytenr, num_bytes, 0, 0, action, 0,
+   head_ref = add_delayed_ref_head(trans, head_ref, record, bytenr,
+   num_bytes, 0, 0, action, 0,
_inserted, old_ref_mod,
new_ref_mod);
 
@@ -863,8 +862,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
*fs_info,
 * insert both the head node and the new ref without dropping
 * the spin lock
 */
-   head_ref = add_delayed_ref_head(fs_info, trans, head_ref, record,
-   bytenr, num_bytes, ref_root, reserved,
+   head_ref = add_delayed_ref_head(trans, head_ref, record, bytenr,
+   num_bytes, ref_root, reserved,
action, 1, _inserted,
old_ref_mod, new_ref_mod);
 
@@ -895,9 +894,8 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info 
*fs_info,
delayed_refs = >transaction->delayed_refs;
spin_lock(_refs->lock);
 
-   add_delayed_ref_head(fs_info, trans, head_ref, NULL, bytenr,
-num_bytes, 0, 0, BTRFS_UPDATE_DELAYED_HEAD,
-extent_op->is_data, NULL, NULL, NULL);
+   add_delayed_ref_head(trans, head_ref, NULL, bytenr, num_bytes, 0, 0,
+BTRFS_UPDATE_DELAYED_HEAD, extent_op->is_data, 
NULL, NULL, NULL);
 
spin_unlock(_refs->lock);
return 0;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs: Drop fs_info parameter from btrfs_merge_delayed_refs

2018-04-19 Thread Nikolay Borisov
It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/delayed-ref.c | 2 +-
 fs/btrfs/delayed-ref.h | 1 -
 fs/btrfs/extent-tree.c | 3 +--
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index f343ee0b143a..51385c6edf04 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -286,10 +286,10 @@ static bool merge_ref(struct btrfs_trans_handle *trans,
 }
 
 void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info,
  struct btrfs_delayed_ref_root *delayed_refs,
  struct btrfs_delayed_ref_head *head)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_delayed_ref_node *ref;
struct rb_node *node;
u64 seq = 0;
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index e2e4f3f703c2..9c03f914577c 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -250,7 +250,6 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info 
*fs_info,
u64 bytenr, u64 num_bytes,
struct btrfs_delayed_extent_op *extent_op);
 void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info,
  struct btrfs_delayed_ref_root *delayed_refs,
  struct btrfs_delayed_ref_head *head);
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9749d781f1f3..afcd16ee39f2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2697,8 +2697,7 @@ static noinline int __btrfs_run_delayed_refs(struct 
btrfs_trans_handle *trans,
 * insert_inline_extent_backref()).
 */
spin_lock(_ref->lock);
-   btrfs_merge_delayed_refs(trans, fs_info, delayed_refs,
-locked_ref);
+   btrfs_merge_delayed_refs(trans, delayed_refs, locked_ref);
 
/*
 * locked_ref is the head node, so we have to go one
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] btrfs: Use list_empty instead of list_empty_careful

2018-04-19 Thread Nikolay Borisov
list_empty_careful usually is a signal of something tricky going on. Its
usage in btrfs is actually not needed since both lists it's used on are
local to a function and cannot be modified concurrently. So switch to
plain list_empty. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7b1026a7ffab..423a6eb3165c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10086,7 +10086,7 @@ static int __start_delalloc_inodes(struct btrfs_root 
*root, int delay_iput,
btrfs_wait_and_free_delalloc_work(work);
}
 
-   if (!list_empty_careful()) {
+   if (!list_empty()) {
spin_lock(>delalloc_lock);
list_splice_tail(, >delalloc_inodes);
spin_unlock(>delalloc_lock);
@@ -10148,7 +10148,7 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info 
*fs_info, int delay_iput,
 
ret = 0;
 out:
-   if (!list_empty_careful()) {
+   if (!list_empty()) {
spin_lock(_info->delalloc_root_lock);
list_splice_tail(, _info->delalloc_roots);
spin_unlock(_info->delalloc_root_lock);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/6] btrfs: Remove btrfs_wait_and_free_delalloc_work

2018-04-19 Thread Nikolay Borisov
This function is called from only 1 place and is effectively a wrapper
over wait_completion/kfree. It doesn't really bring any value having
those two calls in a separate function. Just open code it and remove it.
No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h | 1 -
 fs/btrfs/inode.c | 9 ++---
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7526d31f80a4..c6239068117c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3173,7 +3173,6 @@ struct btrfs_delalloc_work {
 
 struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
int delay_iput);
-void btrfs_wait_and_free_delalloc_work(struct btrfs_delalloc_work *work);
 
 struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
struct page *page, size_t pg_offset, u64 start,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b4158c2b6f3e..a5986402b172 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10018,12 +10018,6 @@ struct btrfs_delalloc_work 
*btrfs_alloc_delalloc_work(struct inode *inode,
return work;
 }
 
-void btrfs_wait_and_free_delalloc_work(struct btrfs_delalloc_work *work)
-{
-   wait_for_completion(>completion);
-   kfree(work);
-}
-
 /*
  * some fairly slow code that needs optimization. This walks the list
  * of all the inodes with pending delalloc and forces them to disk.
@@ -10080,7 +10074,8 @@ static int __start_delalloc_inodes(struct btrfs_root 
*root, int delay_iput,
 out:
list_for_each_entry_safe(work, next, , list) {
list_del_init(>list);
-   btrfs_wait_and_free_delalloc_work(work);
+   wait_for_completion(>completion);
+   kfree(work);
}
 
if (!list_empty()) {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/6] btrfs: Sink extent_tree arguments in try_release_extent_mapping

2018-04-19 Thread Nikolay Borisov
This function already gets the page from which the two extent trees
are referenced. Simplify its signature by moving the code getting the
trees inside the function. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent_io.c | 6 +++---
 fs/btrfs/extent_io.h | 4 +---
 fs/btrfs/inode.c | 8 +---
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fb32394fd830..9e72e4048acd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4238,13 +4238,13 @@ static int try_release_extent_state(struct 
extent_map_tree *map,
  * in the range corresponding to the page, both state records and extent
  * map records are removed
  */
-int try_release_extent_mapping(struct extent_map_tree *map,
-  struct extent_io_tree *tree, struct page *page,
-  gfp_t mask)
+int try_release_extent_mapping(struct page *page, gfp_t mask)
 {
struct extent_map *em;
u64 start = page_offset(page);
u64 end = start + PAGE_SIZE - 1;
+   struct extent_io_tree *tree = _I(page->mapping->host)->io_tree;
+   struct extent_map_tree *map = 
_I(page->mapping->host)->extent_tree;
 
if (gfpflags_allow_blocking(mask) &&
page->mapping->host->i_size > SZ_16M) {
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a53009694b16..cf131fcbecc1 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -278,9 +278,7 @@ typedef struct extent_map *(get_extent_t)(struct 
btrfs_inode *inode,
  int create);
 
 void extent_io_tree_init(struct extent_io_tree *tree, void *private_data);
-int try_release_extent_mapping(struct extent_map_tree *map,
-  struct extent_io_tree *tree, struct page *page,
-  gfp_t mask);
+int try_release_extent_mapping(struct page *page, gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 struct extent_state **cached);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1c6e997d1d3c..a5b8229aa2d8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8746,13 +8746,7 @@ btrfs_readpages(struct file *file, struct address_space 
*mapping,
 }
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
-   struct extent_io_tree *tree;
-   struct extent_map_tree *map;
-   int ret;
-
-   tree = _I(page->mapping->host)->io_tree;
-   map = _I(page->mapping->host)->extent_tree;
-   ret = try_release_extent_mapping(map, tree, page, gfp_flags);
+   int ret = try_release_extent_mapping(page, gfp_flags);
if (ret == 1) {
ClearPagePrivate(page);
set_page_private(page, 0);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] btrfs: Remove redundant tree argument from extent_readpages

2018-04-19 Thread Nikolay Borisov
This function is called only from btrfs_readpage and is already passed
the mapping. Simplify its signature by moving the code obtaining
reference to the extent tree in the function. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent_io.c | 6 +++---
 fs/btrfs/extent_io.h | 5 ++---
 fs/btrfs/inode.c | 5 ++---
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ffa7bf370f14..5cde345fa0b6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4126,9 +4126,8 @@ int extent_writepages(struct extent_io_tree *tree,
return ret;
 }
 
-int extent_readpages(struct extent_io_tree *tree,
-struct address_space *mapping,
-struct list_head *pages, unsigned nr_pages)
+int extent_readpages(struct address_space *mapping, struct list_head *pages,
+unsigned nr_pages)
 {
struct bio *bio = NULL;
unsigned page_idx;
@@ -4136,6 +4135,7 @@ int extent_readpages(struct extent_io_tree *tree,
struct page *pagepool[16];
struct page *page;
struct extent_map *em_cached = NULL;
+   struct extent_io_tree *tree = _I(mapping->host)->io_tree;
int nr = 0;
u64 prev_em_start = (u64)-1;
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index cf131fcbecc1..2f085108335d 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -424,9 +424,8 @@ int extent_writepages(struct extent_io_tree *tree,
  struct writeback_control *wbc);
 int btree_write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc);
-int extent_readpages(struct extent_io_tree *tree,
-struct address_space *mapping,
-struct list_head *pages, unsigned nr_pages);
+int extent_readpages(struct address_space *mapping, struct list_head *pages,
+unsigned nr_pages);
 int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len);
 void set_page_extent_mapped(struct page *page);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a5b8229aa2d8..7b1026a7ffab 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8740,10 +8740,9 @@ static int
 btrfs_readpages(struct file *file, struct address_space *mapping,
struct list_head *pages, unsigned nr_pages)
 {
-   struct extent_io_tree *tree;
-   tree = _I(mapping->host)->io_tree;
-   return extent_readpages(tree, mapping, pages, nr_pages);
+   return extent_readpages(mapping, pages, nr_pages);
 }
+
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
int ret = try_release_extent_mapping(page, gfp_flags);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] btrfs: Remove tree argument from extent_writepages

2018-04-19 Thread Nikolay Borisov
It can be directly referenced from the passed address_space so do that.
No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent_io.c | 5 ++---
 fs/btrfs/extent_io.h | 3 +--
 fs/btrfs/inode.c | 5 +
 3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5cde345fa0b6..22688b5ffb62 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4109,14 +4109,13 @@ int extent_write_locked_range(struct inode *inode, u64 
start, u64 end,
return ret;
 }
 
-int extent_writepages(struct extent_io_tree *tree,
- struct address_space *mapping,
+int extent_writepages(struct address_space *mapping,
  struct writeback_control *wbc)
 {
int ret = 0;
struct extent_page_data epd = {
.bio = NULL,
-   .tree = tree,
+   .tree = _I(mapping->host)->io_tree,
.extent_locked = 0,
.sync_io = wbc->sync_mode == WB_SYNC_ALL,
};
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 2f085108335d..e02bda4b85cd 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -419,8 +419,7 @@ int extent_invalidatepage(struct extent_io_tree *tree,
 int extent_write_full_page(struct page *page, struct writeback_control *wbc);
 int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
  int mode);
-int extent_writepages(struct extent_io_tree *tree,
- struct address_space *mapping,
+int extent_writepages(struct address_space *mapping,
  struct writeback_control *wbc);
 int btree_write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 423a6eb3165c..b4158c2b6f3e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8730,10 +8730,7 @@ static int btrfs_writepage(struct page *page, struct 
writeback_control *wbc)
 static int btrfs_writepages(struct address_space *mapping,
struct writeback_control *wbc)
 {
-   struct extent_io_tree *tree;
-
-   tree = _I(mapping->host)->io_tree;
-   return extent_writepages(tree, mapping, wbc);
+   return extent_writepages(mapping, wbc);
 }
 
 static int
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6] btrfs: Remove map argument from try_release_extent_state

2018-04-19 Thread Nikolay Borisov
It's not used in the function so just remove it. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent_io.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9e72e4048acd..ffa7bf370f14 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4202,8 +4202,7 @@ int extent_invalidatepage(struct extent_io_tree *tree,
  * are locked or under IO and drops the related state bits if it is safe
  * to drop the page.
  */
-static int try_release_extent_state(struct extent_map_tree *map,
-   struct extent_io_tree *tree,
+static int try_release_extent_state(struct extent_io_tree *tree,
struct page *page, gfp_t mask)
 {
u64 start = page_offset(page);
@@ -4278,7 +4277,7 @@ int try_release_extent_mapping(struct page *page, gfp_t 
mask)
free_extent_map(em);
}
}
-   return try_release_extent_state(map, tree, page, mask);
+   return try_release_extent_state(tree, page, mask);
 }
 
 /*
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [btrfs_put_block_group] WARNING: CPU: 1 PID: 14674 at fs/btrfs/disk-io.c:3675 free_fs_root+0xc2/0xd0 [btrfs]

2018-04-19 Thread Nikolay Borisov


On 19.04.2018 08:32, Fengguang Wu wrote:
> Hello,
> 
> FYI this happens in mainline kernel and at least dates back to v4.16 .
> 
> It's rather rare error and happens when running xfstests.

Yeah, so this is something which only recently was characterised as
leaking delalloc inodes. I can easily reproduce this when running
generic/019 test. A fix is in the works.

> 
> [  438.327552] BTRFS: error (device dm-0) in __btrfs_free_extent:6962: 
> errno=-5 IO failure
> [  438.336415] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:3070: 
> errno=-5 IO failure
> [  438.345590] BTRFS error (device dm-0): pending csums is 1028096
> [  438.369254] BTRFS error (device dm-0): cleaner transaction attach returned 
> -30
> [  438.377674] BTRFS info (device dm-0): at unmount delalloc count 98304
> [  438.385166] WARNING: CPU: 1 PID: 14674 at fs/btrfs/disk-io.c:3675 
> free_fs_root+0xc2/0xd0 [btrfs]
> [  438.396562] Modules linked in: dm_snapshot dm_thin_pool dm_persistent_data 
> dm_bio_prison dm_bufio dm_flakey dm_mod netconsole btrfs xor zstd_decompress 
> zstd_compress xxhash raid6_pq sd_mod sg snd_hda_codec_hdmi 
> snd_hda_codec_realtek snd_hda_codec_generic ata_generic pata_acpi intel_rapl 
> x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel kvm_intel 
> snd_hda_codec kvm irqbypass crct10dif_pclmul eeepc_wmi crc32_pclmul 
> crc32c_intel ghash_clmulni_intel pata_via asus_wmi sparse_keymap snd_hda_core 
> ata_piix snd_hwdep ppdev rfkill wmi_bmof i915 pcbc snd_pcm snd_timer 
> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops snd aesni_intel 
> parport_pc crypto_simd pcspkr libata soundcore cryptd glue_helper drm wmi 
> parport video shpchp ip_tables
> [  438.467607] CPU: 1 PID: 14674 Comm: umount Not tainted 4.17.0-rc1 #1
> [  438.474798] Hardware name: System manufacturer System Product Name/P8H67-M 
> PRO, BIOS 1002 04/01/2011
> [  438.484804] RIP: 0010:free_fs_root+0xc2/0xd0 [btrfs]
> [  438.490590] RSP: 0018:c90008b0fda8 EFLAGS: 00010282
> [  438.496641] RAX: 88017c5954b0 RBX: 880137f6d800 RCX: 
> 00018013
> [  438.504652] RDX: 0001 RSI: ea0006e93600 RDI: 
> 
> [  438.512679] RBP: 88017b36 R08: 8801ba4dd000 R09: 
> 00018013
> [  438.520644] R10: c90008b0fc70 R11:  R12: 
> c90008b0fdd0
> [  438.528657] R13: 88017b360080 R14: c90008b0fdc8 R15: 
> 
> [  438.536662] FS:  7f06c1a80fc0() GS:8801bfa8() 
> knlGS:
> [  438.545582] CS:  0010 DS:  ES:  CR0: 80050033
> [  438.552157] CR2: 7f06c12b0260 CR3: 00017d3de004 CR4: 
> 000606e0
> [  438.642653] RAX:  RBX: 0234a2d0 RCX: 
> 7f06c1359cf7
> [  438.793559] CPU: 1 PID: 14674 Comm: umount Tainted: GW 
> 4.17.0-rc1 #1
> [  438.802152] Hardware name: System manufacturer System Product Name/P8H67-M 
> PRO, BIOS 1002 04/01/2011
> [  438.812108] RIP: 0010:btrfs_put_block_group+0x41/0x60 [btrfs]
> [  438.819364] RSP: 0018:c90008b0fde0 EFLAGS: 00010206
> [  438.825378] RAX:  RBX: 8801abf63000 RCX: 
> e38e38e38e38e38f
> [  438.833307] RDX: 0001 RSI: 09f6 RDI: 
> 8801abf63000
> [  438.841230] RBP: 88017b36 R08: 88017d3b7750 R09: 
> 000180380010
> [  438.849133] R10: c90008b0fca0 R11:  R12: 
> 8801abf63000
> [  438.857047] R13: 88017b3600a0 R14: 8801abf630e0 R15: 
> dead0100
> [  438.864943] FS:  7f06c1a80fc0() GS:8801bfa8() 
> knlGS:
> [  438.873793] CS:  0010 DS:  ES:  CR0: 80050033
> [  438.880320] CR2: 7f06c12b0260 CR3: 00017d3de004 CR4: 
> 000606e0
> [  438.888226] Call Trace:
> [  438.891454]  btrfs_free_block_groups+0x138/0x3d0 [btrfs]
> [  438.897569]  close_ctree+0x13b/0x2f0 [btrfs]
> [  438.902618]  generic_shutdown_super+0x6c/0x120:
>   __read_once_size at 
> include/linux/compiler.h:188
>(inlined by) list_empty at 
> include/linux/list.h:203
>(inlined by) 
> generic_shutdown_super at fs/super.c:442
> [  438.907801]  kill_anon_super+0xe/0x20:
>   kill_anon_super at 
> fs/super.c:1038
> [  438.912223]  btrfs_kill_super+0x13/0x100 [btrfs]
> [  438.917598]  deactivate_locked_super+0x3f/0x70:
>   deactivate_locked_super at 
> fs/super.c:320
> [  438.922757]  cleanup_mnt+0x3b/0x70:
>   cleanup_mnt at 
> fs/namespace.c:1174
> [  438.926879]  task_work_run+0xa3/0xe0:
>   task_work_run at 
> kernel/task_work.c:115 (discriminator 1)
> [  438.931205]  exit_to_usermode_loop+0x9e/0xa0:
>   

[PATCH] btrfs: Add test that checks rmdir(2) can delete a subvolume

2018-04-19 Thread Misono Tomohiro
Add btrfs test that checks "rmdir" or "rm -r" command can delete a
subvolume like an ordinary drectory.

This behavior has been restricted long time but becomes allowed by
following patch in the kernel:
  btrfs: Allow rmdir(2) to delete an empty subvolume

Signed-off-by: Tomohiro Misono 
---
 tests/btrfs/159 | 140 
 tests/btrfs/159.out |   2 +
 tests/btrfs/group   |   1 +
 3 files changed, 143 insertions(+)
 create mode 100755 tests/btrfs/159
 create mode 100644 tests/btrfs/159.out

diff --git a/tests/btrfs/159 b/tests/btrfs/159
new file mode 100755
index ..2d830502
--- /dev/null
+++ b/tests/btrfs/159
@@ -0,0 +1,140 @@
+#! /bin/bash
+# FS QA Test btrfs/159
+#
+# QA test that checks rmdir(2) works for subvolumes like ordinary directories.
+#
+# This behavior has been restricted long time but becomes allowed by following
+# patch in the kernel:
+#   btrfs: Allow rmdir(2) to delete an empty subvolume
+#
+#---
+# Copyright (c) 2018 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+_create_subvol()
+{
+   $BTRFS_UTIL_PROG subvolume create $1 2>&1 | \
+   _filter_scratch >> $seqres.full
+}
+
+_create_snapshot()
+{
+   $BTRFS_UTIL_PROG subvolume snapshot $@ 2>&1 | \
+   _filter_scratch >> $seqres.full
+}
+
+_rmdir_subvol()
+{
+   rmdir $1 2>&1 | _filter_scratch
+   return ${PIPESTATUS[0]}
+}
+
+_rm_r_subvol() {
+   rm -r $1 2>&1 | _filter_scratch
+   return ${PIPESTATUS[0]}
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/btrfs
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+_scratch_mkfs > /dev/null 2>&1 || _fail "mkfs failed"
+_scratch_mount
+
+# Check that an empty subvolume can be deleted by rmdir
+_create_subvol $SCRATCH_MNT/sub1
+_rmdir_subvol $SCRATCH_MNT/sub1
+
+# Check that non-empty subvolume cannot be deleted by rmdir
+_create_subvol $SCRATCH_MNT/sub2
+touch $SCRATCH_MNT/sub2/file
+_rmdir_subvol $SCRATCH_MNT/sub2 >> $seqres.full && \
+   _fail "rmdir should fail for non-empty subvolume"
+rm $SCRATCH_MNT/sub2/file
+_rmdir_subvol $SCRATCH_MNT/sub2
+
+# Check that read-only empty subvolume can be deleted by rmdir
+_create_subvol $SCRATCH_MNT/sub3
+_create_snapshot -r $SCRATCH_MNT/sub3 $SCRATCH_MNT/snap
+$BTRFS_UTIL_PROG property set $SCRATCH_MNT/sub3 ro true 2>&1 | \
+   _filter_scratch >> $seqres.full
+_rmdir_subvol $SCRATCH_MNT/sub3
+_rmdir_subvol $SCRATCH_MNT/snap
+
+# Check that the default subvolume cannot be deleted by rmdir
+_create_subvol $SCRATCH_MNT/sub4
+subvolid=$(_btrfs_get_subvolid $SCRATCH_MNT sub4)
+$BTRFS_UTIL_PROG subvolume set-default $subvolid $SCRATCH_MNT 2>&1 | \
+   _filter_scratch >> $seqres.full
+_rmdir_subvol $SCRATCH_MNT/sub4 >> $seqres.full && \
+   _fail "rmdir should fail for the default subvolume"
+
+# Check that subvolume stub (created by snapshot) can be deleted by rmdir
+# (Note: this has been always allowed)
+_create_subvol $SCRATCH_MNT/sub5
+_create_subvol $SCRATCH_MNT/sub5/sub6
+_create_snapshot $SCRATCH_MNT/sub5 $SCRATCH_MNT/snap2
+rmdir $SCRATCH_MNT/snap2/sub6
+
+# Check that rm -r works for both non-snapshot subvolume and snapshot
+_create_subvol $SCRATCH_MNT/sub7
+mkdir $SCRATCH_MNT/sub7/dir
+_create_subvol $SCRATCH_MNT/sub7/dir/sub8
+touch $SCRATCH_MNT/sub7/dir/sub8/file
+
+_create_snapshot $SCRATCH_MNT/sub7 $SCRATCH_MNT/snap3
+_create_snapshot -r $SCRATCH_MNT/sub7 $SCRATCH_MNT/snap4
+
+_rm_r_subvol $SCRATCH_MNT/sub7
+_rm_r_subvol $SCRATCH_MNT/snap3
+_rm_r_subvol $SCRATCH_MNT/snap4 >> $seqres.full && \
+   _fail "rm -r should fail for non-empty readonly subvolume"
+
+$BTRFS_UTIL_PROG property set $SCRATCH_MNT/snap4 ro false 2>&1 | \
+   _filter_scratch >> $seqres.full
+_rm_r_subvol $SCRATCH_MNT/snap4
+
+# success, all 

Re: nvme+btrfs+compression sensibility and benchmark

2018-04-19 Thread Nikolay Borisov


On 18.04.2018 22:24, Chris Murphy wrote:
> On Wed, Apr 18, 2018 at 10:38 AM, Austin S. Hemmelgarn > wrote:
> 
>> For reference, the zstd compression in BTRFS uses level 3 by default (as
>> does zlib compression IIRC), though I'm not sure about lzop (I think it
>> uses the lowest compression setting).
>>
> 
> 
> The user space tool, zstd, does default to 3, according to its man page.
> 
>-# # compression level [1-19] (default: 3)
> 
> 
> However, the kernel is claiming it's level 0, which doesn't exist in the
> man page. So I have no idea what we're using. This is what I get with mount
> option compress=zstd
> 

Currently the kernel-mode zstd compression doesn't really support any
levels (compress_level is not set, even if it's passed, and even then
zstd_set_level is also unimplemented). So this number doesn't really
make any difference.

> [4.097858] BTRFS info (device nvme0n1p9): use zstd compression, level 0
> 
> 
> 
> --
> Chris Murphy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html