Re: [PATCH 1/3] btrfs: simplify counting number of eb pages

2018-04-23 Thread Nikolay Borisov


On 24.04.2018 02:03, David Sterba wrote:
> The eb length is nodesize, as initialized in __alloc_extent_buffer.
> Regardless of start, we should always get the same number of pages, so
> use that fact.
> 
> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/extent_io.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index a53009694b16..ee92c1289edd 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -454,8 +454,7 @@ void wait_on_extent_buffer_writeback(struct extent_buffer 
> *eb);
>  
>  static inline unsigned long num_extent_pages(u64 start, u64 len)
>  {
> - return ((start + len + PAGE_SIZE - 1) >> PAGE_SHIFT) -
> - (start >> PAGE_SHIFT);
> + return len >> PAGE_SHIFT;

Shouldn't this really be len + PAGE_SIZE -1 or in fact DIV_ROUND_DOWN
(len, PAGE_SIZE). Because with a nodesize of 4k (and basically less than
a page size) we can get into a situation where we do:

4096 >> 13 = 0

On powerpc for example we have:

arch/powerpc/include/asm/page.h:#define PAGE_SHIFT  18
arch/powerpc/include/asm/page.h:#define PAGE_SHIFT  16
arch/powerpc/include/asm/page.h:#define PAGE_SHIFT  14
arch/powerpc/include/asm/page.h:#define PAGE_SHIFT  12


>  }
>  
>  static inline void extent_buffer_get(struct extent_buffer *eb)
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs-progs: do not merge tree block refs have different root_id

2018-04-23 Thread Su Yue
For an extent item which contains many tree block backrefs, like
=
In 020-extent-ref-cases/keyed_block_ref.img

item 10 key (29470720 METADATA_ITEM 0) itemoff 3450 itemsize 222
refs 23 gen 10 flags TREE_BLOCK
tree block skinny level 0
tree block backref root 278
tree block backref root 277
tree block backref root 276
tree block backref root 275
tree block backref root 274
tree block backref root 273
tree block backref root 272
tree block backref root 271
tree block backref root 270
tree block backref root 269
tree block backref root 268
tree block backref root 267
tree block backref root 266
tree block backref root 265
tree block backref root 264
tree block backref root 263
tree block backref root 262
tree block backref root 261
tree block backref root 260
tree block backref root 259
tree block backref root 258
tree block backref root 257
=
In find_parent_nodes(), these refs's parents are 0, then __merge_refs
will merge refs to one ref. It causes only one root to be returned.

So, if both parents are 0, do not merge refs.

Lowmem check calls find_parent_nodes frequently to decide whether.
check an extent buffer or not. These bug influences bytes accounting.

Signed-off-by: Su Yue 
---
 backref.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/backref.c b/backref.c
index 51553c702187..daadb6299c79 100644
--- a/backref.c
+++ b/backref.c
@@ -507,6 +507,8 @@ static void __merge_refs(struct pref_state *prefstate, int 
mode)
} else {
if (ref1->parent != ref2->parent)
continue;
+   if (ref1->parent == 0)
+   continue;
}
 
eie = ref1->inode_list;
-- 
2.17.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs-progs: remove comments about delayed ref in backref.c

2018-04-23 Thread Su Yue
There is no delayed ref in btrfs-progs, so remove related comments.

Signed-off-by: Su Yue 
---
 backref.c | 19 +++
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/backref.c b/backref.c
index 27309e07a1e9..c144dbf060f2 100644
--- a/backref.c
+++ b/backref.c
@@ -155,19 +155,6 @@ static void init_pref_state(struct pref_state *prefstate)
  * - if you cannot add the parent or a correct key, then we will look into the
  *   block later to set a correct key
  *
- * delayed refs
- * 
- *backref type | shared | indirect | shared | indirect
- * information |   tree | tree |   data | data
- * ++--++--
- *  parent logical |y   | -|-   | -
- *  key to resolve |-   | y|y   | y
- *  tree block logical |-   | -|-   | -
- *  root for resolving |y   | y|y   | y
- *
- * - column 1:   we've the parent -> done
- * - column 2, 3, 4: we use the key to find the parent
- *
  * on disk refs (inline or keyed)
  * ==
  *backref type | shared | indirect | shared | indirect
@@ -735,9 +722,9 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
 }
 
 /*
- * this adds all existing backrefs (inline backrefs, backrefs and delayed
- * refs) for the given bytenr to the refs list, merges duplicates and resolves
- * indirect refs to their parent bytenr.
+ * this adds all existing backrefs (inline backrefs, backrefs for the given
+ * bytenr to the refs list, merges duplicates and resolves indirect refs to
+ * their parent bytenr.
  * When roots are found, they're added to the roots list
  *
  * FIXME some caching might speed things up
-- 
2.17.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs-progs: remove useless branch in __merge_refs

2018-04-23 Thread Su Yue
After call of ref_for_same_block, ref1->parent must equals to
ref2->parent, the block of exchange is never reached.

So remove the block of exchange.

Signed-off-by: Su Yue 
---
 backref.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/backref.c b/backref.c
index c144dbf060f2..51553c702187 100644
--- a/backref.c
+++ b/backref.c
@@ -497,7 +497,6 @@ static void __merge_refs(struct pref_state *prefstate, int 
mode)
for (pos2 = pos1->next, n2 = pos2->next; pos2 != head;
 pos2 = n2, n2 = pos2->next) {
struct __prelim_ref *ref2;
-   struct __prelim_ref *xchg;
struct extent_inode_elem *eie;
 
ref2 = list_entry(pos2, struct __prelim_ref, list);
@@ -505,11 +504,6 @@ static void __merge_refs(struct pref_state *prefstate, int 
mode)
if (mode == 1) {
if (!ref_for_same_block(ref1, ref2))
continue;
-   if (!ref1->parent && ref2->parent) {
-   xchg = ref1;
-   ref1 = ref2;
-   ref2 = xchg;
-   }
} else {
if (ref1->parent != ref2->parent)
continue;
-- 
2.17.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs: print-tree: Add locking status output for debug build

2018-04-23 Thread Qu Wenruo
It's pretty handy if we can get debug output for locking status of an
extent buffer, specially for race related debugging.

So add the following output for btrfs_print_tree() and
btrfs_print_leaf():
- refs
- write_locks (as w:%d)
- read_locks (as r:%d)
- blocking_writers (as bw:%d)
- blocking_readers (as br:%d)
- spinning_writers (as sw:%d)
- spinning_readers (as sr:%d)
- lock_owner
- current->pid

Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Use correct specifier for int (both atomic_t and pid_t, no special
  specifier found in Documentation/core-api/printk-formats.rst)
---
 fs/btrfs/print-tree.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index 9904cf741e1c..7f0ada51e5fe 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -179,6 +179,26 @@ static void print_uuid_item(struct extent_buffer *l, 
unsigned long offset,
}
 }
 
+/*
+ * Helper to output refs and locking status.
+ *
+ * Useful to debug race related problem
+ */
+static void print_eb_refs_lock(struct extent_buffer *eb)
+{
+#ifdef CONFIG_BTRFS_DEBUG
+   btrfs_info(eb->fs_info,
+"refs %u lock(w:%u r:%u bw:%u br:%u sw:%u sr:%u) lock_owner %u current %u",
+  atomic_read(>refs), atomic_read(>write_locks),
+  atomic_read(>read_locks),
+  atomic_read(>blocking_writers),
+  atomic_read(>blocking_readers),
+  atomic_read(>spinning_writers),
+  atomic_read(>spinning_readers),
+  eb->lock_owner, current->pid);
+#endif
+}
+
 void btrfs_print_leaf(struct extent_buffer *l)
 {
struct btrfs_fs_info *fs_info;
@@ -206,6 +226,7 @@ void btrfs_print_leaf(struct extent_buffer *l)
   "leaf %llu gen %llu total ptrs %d free space %d owner %llu",
   btrfs_header_bytenr(l), btrfs_header_generation(l), nr,
   btrfs_leaf_free_space(fs_info, l), btrfs_header_owner(l));
+   print_eb_refs_lock(l);
for (i = 0 ; i < nr ; i++) {
item = btrfs_item_nr(i);
btrfs_item_key_to_cpu(l, , i);
@@ -360,6 +381,7 @@ void btrfs_print_tree(struct extent_buffer *c, bool follow)
   btrfs_header_bytenr(c), level, btrfs_header_generation(c),
   nr, (u32)BTRFS_NODEPTRS_PER_BLOCK(fs_info) - nr,
   btrfs_header_owner(c));
+   print_eb_refs_lock(c);
for (i = 0; i < nr; i++) {
btrfs_node_key_to_cpu(c, , i);
pr_info("\tkey %d (%llu %u %llu) block %llu gen %llu\n",
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/4] btrfs: Add csum type check for btrfs_check_super_valid()

2018-04-23 Thread Qu Wenruo
Just like incompat flags check, although we have already done super csum
type check before calling btrfs_check_super_valid(), we can still add
such check for later write time check.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3968f7ff8de2..9282a6ac91db 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3985,6 +3985,15 @@ static int btrfs_validate_super(struct btrfs_fs_info 
*fs_info)
btrfs_err(fs_info, "no valid FS found");
ret = -EINVAL;
}
+   /*
+* For write time check, as for mount time we have checked csum before
+* calling btrfs_check_super_valid(), so it must be a corruption
+*/
+   if (btrfs_super_csum_type(sb) >= ARRAY_SIZE(btrfs_csum_sizes)) {
+   btrfs_err(fs_info, "corrupted csum type %u",
+ btrfs_super_csum_type(sb));
+   ret = -EINVAL;
+   }
if (btrfs_super_flags(sb) & ~BTRFS_SUPER_FLAG_SUPP) {
btrfs_err(fs_info, "unrecognized or unsupported super flag: 
%llu",
btrfs_super_flags(sb) & ~BTRFS_SUPER_FLAG_SUPP);
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/4] btrfs: Add incompat flags check for btrfs_check_super_valid()

2018-04-23 Thread Qu Wenruo
Although we have already checked incompat flags manually before really
mounting it, we could still enhance btrfs_check_super_valid() to check
incompat flags for later write time super block validation check.

This patch adds such incompat flags check for btrfs_check_super_valid(),
currently it won't be triggered, but provides the basis for later write
time check.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 23b5c90cdbb2..3968f7ff8de2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4104,6 +4104,19 @@ static int btrfs_validate_super(struct btrfs_fs_info 
*fs_info)
ret = -EINVAL;
}
 
+   /*
+* Before calling btrfs_check_super_valid() we have already checked
+* incompat flags. So if we developr new incompat flags, it's must be
+* some corruption.
+*/
+   if (btrfs_super_incompat_flags(sb) & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
+   btrfs_err(fs_info,
+   "corrupted incompat flags detected 0x%llx, supported 0x%llx",
+ btrfs_super_incompat_flags(sb),
+ BTRFS_FEATURE_INCOMPAT_SUPP);
+   ret = -EINVAL;
+   }
+
/*
 * The generation is a global counter, we'll trust it more than the 
others
 * but it's still possible that it's the one that's wrong.
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/4] btrfs: Add write time super block validation

2018-04-23 Thread Qu Wenruo
This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/write_time_sb_check

We have 2 reports about corrupted btrfs super block, which has some garbage
in its super block, but otherwise it's completely fine and its csum even
matches.

This means we develop memory corruption during btrfs mount time.
It's not clear whether it's caused by btrfs or some other kernel module,
but at least let's do write time verification to catch such corruption
early.

Changelog:
v2:
  Rename btrfs_check_super_valid() to btrfs_validate_super() suggested
  by Nikolay and David.

Qu Wenruo (4):
  btrfs: Rename btrfs_check_super_valid() to btrfs_validate_super()
  btrfs: Add incompat flags check for btrfs_check_super_valid()
  btrfs: Add csum type check for btrfs_check_super_valid()
  btrfs: Do super block verification before writing it to disk

 fs/btrfs/disk-io.c | 58 --
 1 file changed, 51 insertions(+), 7 deletions(-)

-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/4] btrfs: Rename btrfs_check_super_valid() to btrfs_validate_super()

2018-04-23 Thread Qu Wenruo
Makes the function name a little shorter and easier to read.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60caa68c3618..23b5c90cdbb2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -55,7 +55,7 @@
 static const struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info);
+static int btrfs_validate_super(struct btrfs_fs_info *fs_info);
 static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
 static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
  struct btrfs_fs_info *fs_info);
@@ -2668,7 +2668,7 @@ int open_ctree(struct super_block *sb,
 
memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
 
-   ret = btrfs_check_super_valid(fs_info);
+   ret = btrfs_validate_super(fs_info);
if (ret) {
btrfs_err(fs_info, "superblock contains fatal errors");
err = -EINVAL;
@@ -3974,7 +3974,7 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 
parent_transid, int level,
  level, first_key);
 }
 
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
+static int btrfs_validate_super(struct btrfs_fs_info *fs_info)
 {
struct btrfs_super_block *sb = fs_info->super_copy;
u64 nodesize = btrfs_super_nodesize(sb);
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/4] btrfs: Do super block verification before writing it to disk

2018-04-23 Thread Qu Wenruo
There are already 2 reports about strangely corrupted super blocks,
where csum still matches but extra garbage gets slipped into super block.

The corruption would looks like:
--
superblock: bytenr=65536, device=/dev/sdc1
-
csum_type   41700 (INVALID)
csum0x3b252d3a [match]
bytenr  65536
flags   0x1
( WRITTEN )
magic   _BHRfS_M [match]
...
incompat_flags  0x5b224169
( MIXED_BACKREF |
  COMPRESS_LZO |
  BIG_METADATA |
  EXTENDED_IREF |
  SKINNY_METADATA |
  unknown flag: 0x5b224000 )
...
--
Or
--
superblock: bytenr=65536, device=/dev/mapper/x
-
csum_type  35355 (INVALID)
csum_size  32
csum   0xf0dbeddd [match]
bytenr 65536
flags  0x1
   ( WRITTEN )
magic  _BHRfS_M [match]
...
incompat_flags 0x176d2169
   ( MIXED_BACKREF |
 COMPRESS_LZO |
 BIG_METADATA |
 EXTENDED_IREF |
 SKINNY_METADATA |
 unknown flag: 0x176d2000 )
--

Obviously, csum_type and incompat_flags get some garbage, but its csum
still matches, which means kernel calculates the csum based on corrupted
super block memory.
And after manually fixing these values, the filesystem is completely
healthy without any problem exposed by btrfs check.

Although the cause is still unknown, at least detect it and prevent further
corruption.

Reported-by: Ken Swenson 
Reported-by: Ben Parsons <9parso...@gmail.com>
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 36 +---
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9282a6ac91db..0f5771244641 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -55,7 +55,9 @@
 static const struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
-static int btrfs_validate_super(struct btrfs_fs_info *fs_info);
+static int btrfs_validate_super(struct btrfs_fs_info *fs_info,
+  struct btrfs_super_block *sb,
+  int super_mirror);
 static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
 static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
  struct btrfs_fs_info *fs_info);
@@ -2668,7 +2670,7 @@ int open_ctree(struct super_block *sb,
 
memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
 
-   ret = btrfs_validate_super(fs_info);
+   ret = btrfs_validate_super(fs_info, fs_info->super_copy, 0);
if (ret) {
btrfs_err(fs_info, "superblock contains fatal errors");
err = -EINVAL;
@@ -3563,6 +3565,16 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors)
sb = fs_info->super_for_commit;
dev_item = >dev_item;
 
+   /*
+* super_bytenr will be updated in write_dev_supers(), even if it is
+* corrupted in current copy, it won't reach disk. So skip bytenr check.
+*/
+   if (btrfs_validate_super(fs_info, sb, -1)) {
+   btrfs_err(fs_info,
+   "superblock corruption detected before transaction commit");
+   return -EUCLEAN;
+   }
+
mutex_lock(_info->fs_devices->device_list_mutex);
head = _info->fs_devices->devices;
max_errors = btrfs_super_num_devices(fs_info->super_copy) - 1;
@@ -3974,9 +3986,18 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 
parent_transid, int level,
  level, first_key);
 }
 
-static int btrfs_validate_super(struct btrfs_fs_info *fs_info)
+/*
+ * Check the validation of btrfs super block.
+ *
+ * @sb:super block to check
+ * @super_mirror:  the super block number to check its bytenr.
+ * 0 means the primary (1st) sb, 1 and 2 means 2nd and
+ * 3rd backup sb, while -1 means to skip bytenr check.
+ */
+static int btrfs_validate_super(struct btrfs_fs_info *fs_info,
+   struct btrfs_super_block *sb,
+   int super_mirror)
 {
-   struct btrfs_super_block *sb = fs_info->super_copy;
u64 nodesize = btrfs_super_nodesize(sb);
u64 sectorsize = btrfs_super_sectorsize(sb);
int ret = 0;
@@ -4088,9 +4109,10 @@ static int 

Re: [PATCH] btrfs-progs: treewide: Replace strerror(errno) with %m.

2018-04-23 Thread Su Yue



On 01/24/2018 03:42 AM, David Sterba wrote:

On Sun, Jan 07, 2018 at 01:54:21PM -0800, Rosen Penev wrote:

As btrfs is specific to Linux, %m can be used instead of strerror(errno)
in format strings. This has some size reduction benefits for embedded
systems.


Makes sense.


glibc, musl, and uclibc-ng all support %m as a modifier to printf.
A quick glance at the BIONIC libc source indicates that it has
support for %m as well. BSDs and Windows do not but I do believe
them to be beyond the scope of btrfs-progs.


Thanks for checking the compatibility. The %m can be substituted
by a wrapper if this becomes a problem in the future.


It seems a little problem in output now since it's not caused by this
patch.

$ touch /tmp/tmp
$ btrfs-image -r /tmp/tmp /tmp/test.img
ERROR: unable to read cluster: Success
ERROR: restore failed: Success
$ echo $?
1

Though btrfs-image failed, %m(strerror(errno) argument makes output
looks strange.
IMO, here we should judge errno before output. But it means size 
reduction is inavaiable for embedded systems.


Thanks,
Su


Compiled sizes on Ubuntu 16.04:

Before:
3916512 btrfs



After:
3908744 btrfs


the delta is about 7KiB, that's not much but still counts. I would not
object further optimizations towards size reduction as long as the code
remains maintainable.


233256  libbtrfs.so.0.1
4899bcp
2366560 btrfs-convert
2207432 btrfs-corrupt-block
13302   btrfs-debugfs
2151104 btrfs-debug-tree
2134968 btrfs-find-root
2281864 btrfs-image
2143536 btrfs-map-logical
2129704 btrfs-select-super
2151552 btrfstune
2130696 btrfs-zero-log
2276272 mkfs.btrfs


Some of the utilities are typically installed by default, the binaries
share a lot of code as they get built from the same object files. I had
once an idea of a compound binary that would switch the function by the
name of the executable. Similar to what busybox does.

https://github.com/kdave/btrfs-progs/commit/8fc697a7f763f39f3afe0abaa68ac13a49ac8a86

---
* btrfs
* mkfs.btrfs
* btrfstun
* btrfs-image
* btrfs-convert
* btrfs-debug-tree
* btrfs-show-super
* btrfs-find-root

The static target is also supported. The name of resulting boxed
binaries is btrfs.box and btrfs.box.static .

textdata bss dec hex filename
  550988   19120   15444  585552   8ef50 btrfs
1562099   25316   42256 1629671  18dde7 btrfs.static
  659504   21104   16492  697100   aa30c btrfs.box
1817274   27988   44088 1889350  1cd446 btrfs.box.static
---

I was not sure if this is was just another good idea waiting for a usecase (or
not), so I haven't continued past the prototype. Please let me know if you'd be
interested in this functionality, the patch is fairly trivial to update.


@@ -815,7 +815,7 @@ static const char * const cmd_filesystem_sync_usage[] = {
  
  static int cmd_filesystem_sync(int argc, char **argv)

  {
-   int fd, res, e;
+   int fd, res;
char*path;
DIR *dirstream = NULL;
  
@@ -831,10 +831,9 @@ static int cmd_filesystem_sync(int argc, char **argv)

return 1;
  
  	res = ioctl(fd, BTRFS_IOC_SYNC);

-   e = errno;
close_file_or_dir(fd, dirstream);
if( res < 0 ){
-   error("sync ioctl failed on '%s': %s", path, strerror(e));
+   error("sync ioctl failed on '%s': %m", path);


Let me use that one as example, there are a few more similar updates.

There's potentially lost errno from the ioctl if something inside
close_file_or_dir() overwrites it, as there are closedir and close. This
is highly unlikely and I'll deal with that separately, so I'm going to
apply the patch without further changes. Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


4.14.35: possible circular locking dependency detected

2018-04-23 Thread Petr Janecek
Hi,
   after scrub start, scrub cancel, umount, mount of a two disk raid1
(data + metadata):

[12999.229791] ==
[12999.236029] WARNING: possible circular locking dependency detected
[12999.242261] 4.14.35 #36 Not tainted
[12999.245806] --
[12999.252037] btrfs/4682 is trying to acquire lock:
[12999.256794]  ("%s-%s""btrfs", name){+.+.}, at: [] 
flush_workqueue+0x70/0x480
[12999.265486] 
   but task is already holding lock:
[12999.271390]  (_info->scrub_lock){+.+.}, at: [] 
btrfs_scrub_dev+0x311/0x650 [btrfs]
[12999.280887] 
   which lock already depends on the new lock.

[12999.289147] 
   the existing dependency chain (in reverse order) is:
[12999.296721] 
   -> #3 (_info->scrub_lock){+.+.}:
[12999.302949]__mutex_lock+0x66/0x9a0
[12999.307287]btrfs_scrub_dev+0x105/0x650 [btrfs]
[12999.312615]btrfs_ioctl+0x19a0/0x2030 [btrfs]
[12999.317706]do_vfs_ioctl+0x8c/0x6a0
[12999.321951]SyS_ioctl+0x6f/0x80
[12999.325860]do_syscall_64+0x64/0x170
[12999.330202]entry_SYSCALL_64_after_hwframe+0x42/0xb7
[12999.335930] 
   -> #2 (_devs->device_list_mutex){+.+.}:
[12999.342798]__mutex_lock+0x66/0x9a0
[12999.347049]reada_start_machine_worker+0xb0/0x3c0 [btrfs]
[12999.353324]btrfs_worker_helper+0x8b/0x630 [btrfs]
[12999.358926]process_one_work+0x242/0x6a0
[12999.363613]worker_thread+0x32/0x3f0
[12999.367894]kthread+0x11f/0x140
[12999.371792]ret_from_fork+0x3a/0x50
[12999.376038] 
   -> #1 ((>normal_work)){+.+.}:
[12999.382254]process_one_work+0x20c/0x6a0
[12999.386881]worker_thread+0x32/0x3f0
[12999.391216]kthread+0x11f/0x140
[12999.395107]ret_from_fork+0x3a/0x50
[12999.399290] 
   -> #0 ("%s-%s""btrfs", name){+.+.}:
[12999.405455]lock_acquire+0x93/0x220
[12999.409683]flush_workqueue+0x97/0x480
[12999.414129]drain_workqueue+0xa4/0x180
[12999.418620]destroy_workqueue+0xe/0x1e0
[12999.423202]btrfs_destroy_workqueue+0x57/0x280 [btrfs]
[12999.429137]scrub_workers_put+0x29/0x60 [btrfs]
[12999.434426]btrfs_scrub_dev+0x324/0x650 [btrfs]
[12999.439764]btrfs_ioctl+0x19a0/0x2030 [btrfs]
[12999.444870]do_vfs_ioctl+0x8c/0x6a0
[12999.449115]SyS_ioctl+0x6f/0x80
[12999.453017]do_syscall_64+0x64/0x170
[12999.457367]entry_SYSCALL_64_after_hwframe+0x42/0xb7
[12999.463103] 
   other info that might help us debug this:

[12999.471370] Chain exists of:
 "%s-%s""btrfs", name --> _devs->device_list_mutex --> 
_info->scrub_lock

[12999.484396]  Possible unsafe locking scenario:

[12999.490524]CPU0CPU1
[12999.495161]
[12999.499840]   lock(_info->scrub_lock);
[12999.504001]lock(_devs->device_list_mutex);
[12999.511341]lock(_info->scrub_lock);
[12999.518074]   lock("%s-%s""btrfs", name);
[12999.522243] 
*** DEADLOCK ***

[12999.528347] 2 locks held by btrfs/4682:
[12999.532330]  #0:  (sb_writers#15){.+.+}, at: [] 
mnt_want_write_file+0x33/0xb0
[12999.541310]  #1:  (_info->scrub_lock){+.+.}, at: [] 
btrfs_scrub_dev+0x311/0x650 [btrfs]
[12999.551389] 
   stack backtrace:
[12999.555917] CPU: 3 PID: 4682 Comm: btrfs Not tainted 4.14.35 #36
[12999.562139] Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.2a   06/27/2012
[12999.569487] Call Trace:
[12999.572049]  dump_stack+0x67/0x95
[12999.575530]  print_circular_bug.isra.42+0x1ce/0x1db
[12999.580580]  __lock_acquire+0x121c/0x1300
[12999.584758]  ? lock_acquire+0x93/0x220
[12999.588668]  lock_acquire+0x93/0x220
[12999.592316]  ? flush_workqueue+0x70/0x480
[12999.596485]  flush_workqueue+0x97/0x480
[12999.600487]  ? flush_workqueue+0x70/0x480
[12999.604596]  ? find_held_lock+0x2d/0x90
[12999.608541]  ? drain_workqueue+0xa4/0x180
[12999.612724]  drain_workqueue+0xa4/0x180
[12999.616730]  destroy_workqueue+0xe/0x1e0
[12999.620784]  btrfs_destroy_workqueue+0x57/0x280 [btrfs]
[12999.626252]  scrub_workers_put+0x29/0x60 [btrfs]
[12999.631087]  btrfs_scrub_dev+0x324/0x650 [btrfs]
[12999.635814]  ? __sb_start_write+0x137/0x1a0
[12999.640156]  ? mnt_want_write_file+0x33/0xb0
[12999.644624]  btrfs_ioctl+0x19a0/0x2030 [btrfs]
[12999.649194]  ? find_held_lock+0x2d/0x90
[12999.653146]  ? do_vfs_ioctl+0x8c/0x6a0
[12999.657095]  ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
[12999.663572]  do_vfs_ioctl+0x8c/0x6a0
[12999.667203]  ? __fget+0x100/0x1f0
[12999.670627]  SyS_ioctl+0x6f/0x80
[12999.673954]  do_syscall_64+0x64/0x170
[12999.67]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[12999.682993] RIP: 0033:0x7f409f638f07
[12999.686667] RSP: 002b:7f409f549d38 EFLAGS: 0246 ORIG_RAX: 
0010

[PATCH 3/3] btrfs: switch types to int when counting eb pages

2018-04-23 Thread David Sterba
The loops iterating eb pages use unsigned long, that's an overkill as
we know that there are at most 16 pages (64k / 4k), and 4 by default
(with nodesize 16k).

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c | 44 ++--
 fs/btrfs/extent_io.h |  2 +-
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0cc5d6ae1876..5bdfdb9c8777 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2062,7 +2062,7 @@ int repair_eb_io_failure(struct btrfs_fs_info *fs_info,
 struct extent_buffer *eb, int mirror_num)
 {
u64 start = eb->start;
-   unsigned long i, num_pages = num_extent_pages(eb);
+   int i, num_pages = num_extent_pages(eb);
int ret = 0;
 
if (sb_rdonly(fs_info->sb))
@@ -3541,7 +3541,7 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
  struct btrfs_fs_info *fs_info,
  struct extent_page_data *epd)
 {
-   unsigned long i, num_pages;
+   int i, num_pages;
int flush = 0;
int ret = 0;
 
@@ -3715,7 +3715,7 @@ static noinline_for_stack int write_one_eb(struct 
extent_buffer *eb,
struct extent_io_tree *tree = _I(fs_info->btree_inode)->io_tree;
u64 offset = eb->start;
u32 nritems;
-   unsigned long i, num_pages;
+   int i, num_pages;
unsigned long start, end;
unsigned int write_flags = wbc_to_write_flags(wbc) | REQ_META;
int ret = 0;
@@ -4648,7 +4648,7 @@ int extent_buffer_under_io(struct extent_buffer *eb)
  */
 static void btrfs_release_extent_buffer_page(struct extent_buffer *eb)
 {
-   unsigned long index;
+   int index;
struct page *page;
int mapped = !test_bit(EXTENT_BUFFER_DUMMY, >bflags);
 
@@ -4744,10 +4744,10 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, 
u64 start,
 
 struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src)
 {
-   unsigned long i;
+   int i;
struct page *p;
struct extent_buffer *new;
-   unsigned long num_pages = num_extent_pages(src);
+   int num_pages = num_extent_pages(src);
 
new = __alloc_extent_buffer(src->fs_info, src->start, src->len);
if (new == NULL)
@@ -4776,8 +4776,8 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct 
btrfs_fs_info *fs_info,
  u64 start, unsigned long len)
 {
struct extent_buffer *eb;
-   unsigned long num_pages;
-   unsigned long i;
+   int num_pages;
+   int i;
 
eb = __alloc_extent_buffer(fs_info, start, len);
if (!eb)
@@ -4843,7 +4843,7 @@ static void check_buffer_tree_ref(struct extent_buffer 
*eb)
 static void mark_extent_buffer_accessed(struct extent_buffer *eb,
struct page *accessed)
 {
-   unsigned long num_pages, i;
+   int num_pages, i;
 
check_buffer_tree_ref(eb);
 
@@ -4944,8 +4944,8 @@ struct extent_buffer *alloc_extent_buffer(struct 
btrfs_fs_info *fs_info,
  u64 start)
 {
unsigned long len = fs_info->nodesize;
-   unsigned long num_pages;
-   unsigned long i;
+   int num_pages;
+   int i;
unsigned long index = start >> PAGE_SHIFT;
struct extent_buffer *eb;
struct extent_buffer *exists = NULL;
@@ -5160,8 +5160,8 @@ void free_extent_buffer_stale(struct extent_buffer *eb)
 
 void clear_extent_buffer_dirty(struct extent_buffer *eb)
 {
-   unsigned long i;
-   unsigned long num_pages;
+   int i;
+   int num_pages;
struct page *page;
 
num_pages = num_extent_pages(eb);
@@ -5190,8 +5190,8 @@ void clear_extent_buffer_dirty(struct extent_buffer *eb)
 
 int set_extent_buffer_dirty(struct extent_buffer *eb)
 {
-   unsigned long i;
-   unsigned long num_pages;
+   int i;
+   int num_pages;
int was_dirty = 0;
 
check_buffer_tree_ref(eb);
@@ -5209,9 +5209,9 @@ int set_extent_buffer_dirty(struct extent_buffer *eb)
 
 void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 {
-   unsigned long i;
+   int i;
struct page *page;
-   unsigned long num_pages;
+   int num_pages;
 
clear_bit(EXTENT_BUFFER_UPTODATE, >bflags);
num_pages = num_extent_pages(eb);
@@ -5224,9 +5224,9 @@ void clear_extent_buffer_uptodate(struct extent_buffer 
*eb)
 
 void set_extent_buffer_uptodate(struct extent_buffer *eb)
 {
-   unsigned long i;
+   int i;
struct page *page;
-   unsigned long num_pages;
+   int num_pages;
 
set_bit(EXTENT_BUFFER_UPTODATE, >bflags);
num_pages = num_extent_pages(eb);
@@ -5239,13 +5239,13 @@ void set_extent_buffer_uptodate(struct extent_buffer 
*eb)
 int read_extent_buffer_pages(struct extent_io_tree *tree,
 struct extent_buffer *eb, int wait, int 

[PATCH 0/3] Simplify counting of extent buffer pages

2018-04-23 Thread David Sterba
Some low-hanging cleanup fruit. The argument bloat-o-meter shows some
improvements:

extent_io.c:cache_state_if_flags.part.27   -8 (8 -> 0)
extent_io.c:cache_state.part.28-8 (8 -> 0)
extent_io.c:check_buffer_tree_ref.part.31 -24 (24 -> 0)
extent_io.c:mark_extent_buffer_accessed-8 (40 -> 32)
extent_io.c:alloc_extent_state_atomic.part.35  -8 (8 -> 0)
extent_io.c:flush_write_bio.isra.40   -16 (16 -> 0)
extent_io.c:set_page_extent_mapped.part.49 -8 (8 -> 0)
extent_io.c:extent_buffer_under_io.part.50 -8 (8 -> 0)
extent_io.c:__unlock_for_delalloc.isra.38  -8 (8 -> 0)
extent_io.c:merge_state.part.45   -48 (48 -> 0)
extent_io.c:repair_eb_io_failure   -8 (72 -> 64)
extent_io.c:set_extent_buffer_dirty-8 (40 -> 32)
extent_io.c:__alloc_dummy_extent_buffer+8 (32 -> 40)
extent_io.c:write_one_eb  +16 (152 -> 168)

David Sterba (3):
  btrfs: simplify counting number of eb pages
  btrfs: pass only eb to num_extent_pages
  btrfs: switch types to int when counting eb pages

 fs/btrfs/extent_io.c | 68 ++--
 fs/btrfs/extent_io.h |  5 ++--
 2 files changed, 36 insertions(+), 37 deletions(-)

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs: pass only eb to num_extent_pages

2018-04-23 Thread David Sterba
Almost all callers pass the start and len as 2 arguments but this is not
necessary, all the information is provided by the eb. By reordering the
calls to num_extent_pages, we don't need the local variables with
start/len.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c | 30 +++---
 fs/btrfs/extent_io.h |  4 ++--
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fb32394fd830..0cc5d6ae1876 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2062,7 +2062,7 @@ int repair_eb_io_failure(struct btrfs_fs_info *fs_info,
 struct extent_buffer *eb, int mirror_num)
 {
u64 start = eb->start;
-   unsigned long i, num_pages = num_extent_pages(eb->start, eb->len);
+   unsigned long i, num_pages = num_extent_pages(eb);
int ret = 0;
 
if (sb_rdonly(fs_info->sb))
@@ -3591,7 +3591,7 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
if (!ret)
return ret;
 
-   num_pages = num_extent_pages(eb->start, eb->len);
+   num_pages = num_extent_pages(eb);
for (i = 0; i < num_pages; i++) {
struct page *p = eb->pages[i];
 
@@ -3721,7 +3721,7 @@ static noinline_for_stack int write_one_eb(struct 
extent_buffer *eb,
int ret = 0;
 
clear_bit(EXTENT_BUFFER_WRITE_ERR, >bflags);
-   num_pages = num_extent_pages(eb->start, eb->len);
+   num_pages = num_extent_pages(eb);
atomic_set(>io_pages, num_pages);
 
/* set btree blocks beyond nritems with 0 to avoid stale content. */
@@ -4654,7 +4654,7 @@ static void btrfs_release_extent_buffer_page(struct 
extent_buffer *eb)
 
BUG_ON(extent_buffer_under_io(eb));
 
-   index = num_extent_pages(eb->start, eb->len);
+   index = num_extent_pages(eb);
if (index == 0)
return;
 
@@ -4747,7 +4747,7 @@ struct extent_buffer *btrfs_clone_extent_buffer(struct 
extent_buffer *src)
unsigned long i;
struct page *p;
struct extent_buffer *new;
-   unsigned long num_pages = num_extent_pages(src->start, src->len);
+   unsigned long num_pages = num_extent_pages(src);
 
new = __alloc_extent_buffer(src->fs_info, src->start, src->len);
if (new == NULL)
@@ -4779,12 +4779,11 @@ struct extent_buffer 
*__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
unsigned long num_pages;
unsigned long i;
 
-   num_pages = num_extent_pages(start, len);
-
eb = __alloc_extent_buffer(fs_info, start, len);
if (!eb)
return NULL;
 
+   num_pages = num_extent_pages(eb);
for (i = 0; i < num_pages; i++) {
eb->pages[i] = alloc_page(GFP_NOFS);
if (!eb->pages[i])
@@ -4848,7 +4847,7 @@ static void mark_extent_buffer_accessed(struct 
extent_buffer *eb,
 
check_buffer_tree_ref(eb);
 
-   num_pages = num_extent_pages(eb->start, eb->len);
+   num_pages = num_extent_pages(eb);
for (i = 0; i < num_pages; i++) {
struct page *p = eb->pages[i];
 
@@ -4945,7 +4944,7 @@ struct extent_buffer *alloc_extent_buffer(struct 
btrfs_fs_info *fs_info,
  u64 start)
 {
unsigned long len = fs_info->nodesize;
-   unsigned long num_pages = num_extent_pages(start, len);
+   unsigned long num_pages;
unsigned long i;
unsigned long index = start >> PAGE_SHIFT;
struct extent_buffer *eb;
@@ -4968,6 +4967,7 @@ struct extent_buffer *alloc_extent_buffer(struct 
btrfs_fs_info *fs_info,
if (!eb)
return ERR_PTR(-ENOMEM);
 
+   num_pages = num_extent_pages(eb);
for (i = 0; i < num_pages; i++, index++) {
p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
if (!p) {
@@ -5164,7 +5164,7 @@ void clear_extent_buffer_dirty(struct extent_buffer *eb)
unsigned long num_pages;
struct page *page;
 
-   num_pages = num_extent_pages(eb->start, eb->len);
+   num_pages = num_extent_pages(eb);
 
for (i = 0; i < num_pages; i++) {
page = eb->pages[i];
@@ -5198,7 +5198,7 @@ int set_extent_buffer_dirty(struct extent_buffer *eb)
 
was_dirty = test_and_set_bit(EXTENT_BUFFER_DIRTY, >bflags);
 
-   num_pages = num_extent_pages(eb->start, eb->len);
+   num_pages = num_extent_pages(eb);
WARN_ON(atomic_read(>refs) == 0);
WARN_ON(!test_bit(EXTENT_BUFFER_TREE_REF, >bflags));
 
@@ -5214,7 +5214,7 @@ void clear_extent_buffer_uptodate(struct extent_buffer 
*eb)
unsigned long num_pages;
 
clear_bit(EXTENT_BUFFER_UPTODATE, >bflags);
-   num_pages = num_extent_pages(eb->start, eb->len);
+   num_pages = num_extent_pages(eb);
for (i = 0; i < num_pages; i++) {
page = eb->pages[i];
if (page)
@@ -5229,7 +5229,7 @@ void 

[PATCH 1/3] btrfs: simplify counting number of eb pages

2018-04-23 Thread David Sterba
The eb length is nodesize, as initialized in __alloc_extent_buffer.
Regardless of start, we should always get the same number of pages, so
use that fact.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a53009694b16..ee92c1289edd 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -454,8 +454,7 @@ void wait_on_extent_buffer_writeback(struct extent_buffer 
*eb);
 
 static inline unsigned long num_extent_pages(u64 start, u64 len)
 {
-   return ((start + len + PAGE_SIZE - 1) >> PAGE_SHIFT) -
-   (start >> PAGE_SHIFT);
+   return len >> PAGE_SHIFT;
 }
 
 static inline void extent_buffer_get(struct extent_buffer *eb)
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: push relocation recovery into a helper thread

2018-04-23 Thread David Sterba
On Tue, Apr 17, 2018 at 02:45:33PM -0400, Jeff Mahoney wrote:
> On a file system with many snapshots and qgroups enabled, an interrupted
> balance can end up taking a long time to mount due to recovering the
> relocations during mount.  It does this in the task performing the mount,
> which can't be interrupted and may prevent mounting (and systems booting)
> for a long time as well.  The thing is that as part of balance, this
> runs in the background all the time.  This patch pushes the recovery
> into a helper thread and allows the mount to continue normally.  We hold
> off on resuming any paused balance operation until after the relocation
> has completed, disallow any new balance operations if it's running, and
> wait for it on umount and remounting read-only.

The overall logic sounds ok.

So, this can also stall the umount, right? Eg. if I start mount,
relocation goes to background, then unmount will have to wait for
completion.

Balance pause is requested at umount time, something similar should be
possible for relocation recovery. The fs_state bit CLOSING could be
checked somewhere in the loop.

> This doesn't do anything to address the relocation recovery operation
> taking a long time but does allow the file system to mount.
> 
> Signed-off-by: Jeff Mahoney 
> ---
>  fs/btrfs/ctree.h  |7 +++
>  fs/btrfs/disk-io.c|7 ++-
>  fs/btrfs/relocation.c |   92 
> +-
>  fs/btrfs/super.c  |5 +-
>  fs/btrfs/volumes.c|6 +++
>  5 files changed, 97 insertions(+), 20 deletions(-)
> 
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1052,6 +1052,10 @@ struct btrfs_fs_info {
>   struct btrfs_work qgroup_rescan_work;
>   bool qgroup_rescan_running; /* protected by qgroup_rescan_lock */
>  
> + /* relocation recovery items */
> + bool relocation_recovery_started;
> + struct completion relocation_recovery_completion;

This seems to copy the pattern of qgroup rescan, the completion +
workqueue. I'm planning to move this scheme to the fs_state bit instead
of bool and the wait_for_war with global workqueue, but for now we can
leave as it is here.

> +
>   /* filesystem state */
>   unsigned long fs_state;
>  
> @@ -3671,7 +3675,8 @@ int btrfs_init_reloc_root(struct btrfs_t
> struct btrfs_root *root);
>  int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
>   struct btrfs_root *root);
> -int btrfs_recover_relocation(struct btrfs_root *root);
> +int btrfs_recover_relocation(struct btrfs_fs_info *fs_info);
> +void btrfs_wait_for_relocation_completion(struct btrfs_fs_info *fs_info);
>  int btrfs_reloc_clone_csums(struct inode *inode, u64 file_pos, u64 len);
>  int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
> struct btrfs_root *root, struct extent_buffer *buf,
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2999,7 +2999,7 @@ retry_root_backup:
>   goto fail_qgroup;
>  
>   mutex_lock(_info->cleaner_mutex);
> - ret = btrfs_recover_relocation(tree_root);
> + ret = btrfs_recover_relocation(fs_info);
>   mutex_unlock(_info->cleaner_mutex);
>   if (ret < 0) {
>   btrfs_warn(fs_info, "failed to recover relocation: %d",
> @@ -3017,7 +3017,8 @@ retry_root_backup:
>   if (IS_ERR(fs_info->fs_root)) {
>   err = PTR_ERR(fs_info->fs_root);
>   btrfs_warn(fs_info, "failed to read fs tree: %d", err);
> - goto fail_qgroup;
> + close_ctree(fs_info);
> + return err;
>   }
>  
>   if (sb_rdonly(sb))
> @@ -3778,6 +3779,8 @@ void close_ctree(struct btrfs_fs_info *f
>   /* wait for the qgroup rescan worker to stop */
>   btrfs_qgroup_wait_for_completion(fs_info, false);
>  
> + btrfs_wait_for_relocation_completion(fs_info);
> +
>   /* wait for the uuid_scan task to finish */
>   down(_info->uuid_tree_rescan_sem);
>   /* avoid complains from lockdep et al., set sem back to initial state */
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "ctree.h"
>  #include "disk-io.h"
>  #include "transaction.h"
> @@ -4492,14 +4493,61 @@ static noinline_for_stack int mark_garba
>  }
>  
>  /*
> - * recover relocation interrupted by system crash.
> - *
>   * this function resumes merging reloc trees with corresponding fs trees.
>   * this is important for keeping the sharing of tree blocks
>   */
> -int btrfs_recover_relocation(struct btrfs_root *root)
> +static int
> +btrfs_resume_relocation(void *data)

Please keep the type and function name on one line.

>  {
> - struct btrfs_fs_info *fs_info = root->fs_info;
> + struct btrfs_fs_info *fs_info = data;
> + struct btrfs_trans_handle *trans;
> + struct reloc_control *rc = 

Re: [RFC] Add support for BTRFS raid5/6 to GRUB

2018-04-23 Thread Goffredo Baroncelli
On 04/23/2018 01:50 PM, Daniel Kiper wrote:
> On Tue, Apr 17, 2018 at 09:57:40PM +0200, Goffredo Baroncelli wrote:
>> Hi All,
>>
>> Below you can find a patch to add support for accessing files from
>> grub in a RAID5/6 btrfs filesystem. This is a RFC because it is
>> missing the support for recovery (i.e. if some devices are missed). In
>> the next days (weeks ?) I will extend this patch to support also this
>> case.
>>
>> Comments are welcome.
> 
> More or less LGTM. Just a nitpick below... I am happy to take full blown
> patch into GRUB if it is ready.

Thanks for the comments; however now I implemented also the recovery. It is 
under testing. Let me few days and I will resubmit the patches.

> 
>> BR
>> G.Baroncelli
>>
>>
>> ---
>>
>> commit 8c80a1b7c913faf50f95c5c76b4666ed17685666
>> Author: Goffredo Baroncelli 
>> Date:   Tue Apr 17 21:40:31 2018 +0200
>>
>> Add initial support for btrfs raid5/6 chunk
>>
>> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
>> index be195448d..4c5632acb 100644
>> --- a/grub-core/fs/btrfs.c
>> +++ b/grub-core/fs/btrfs.c
>> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
>>  #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
>>  #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED0x20
>>  #define GRUB_BTRFS_CHUNK_TYPE_RAID100x40
>> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
>> +#define GRUB_BTRFS_CHUNK_TYPE_RAID60x100
>>grub_uint8_t dummy2[0xc];
>>grub_uint16_t nstripes;
>>grub_uint16_t nsubstripes;
>> @@ -764,6 +766,39 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, 
>> grub_disk_addr_t addr,
>>stripe_offset = low + chunk_stripe_length
>>  * high;
>>csize = chunk_stripe_length - low;
>> +  break;
>> +}
>> +  case GRUB_BTRFS_CHUNK_TYPE_RAID5:
>> +  case GRUB_BTRFS_CHUNK_TYPE_RAID6:
>> +{
>> +  grub_uint64_t nparities;
>> +  grub_uint64_t parity_pos;
>> +  grub_uint64_t stripe_nr, high;
>> +  grub_uint64_t low;
>> +
>> +  redundancy = 1;   /* no redundancy for now */
>> +
>> +  if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
>> +{
>> +  grub_dprintf ("btrfs", "RAID5\n");
>> +  nparities = 1;
>> +}
>> +  else
>> +{
>> +  grub_dprintf ("btrfs", "RAID6\n");
>> +  nparities = 2;
>> +}
>> +
>> +  stripe_nr = grub_divmod64 (off, chunk_stripe_length, );
>> +
>> +  high = grub_divmod64 (stripe_nr, nstripes - nparities, );
>> +  grub_divmod64 (high+nstripes-nparities, nstripes, _pos);
>> +  grub_divmod64 (parity_pos+nparities+stripen, nstripes, );
> 
> Missing spaces around "+" and "-".
> 
> Daniel
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: libbrtfsutil questions

2018-04-23 Thread Austin S. Hemmelgarn

On 2018-04-23 14:25, waxhead wrote:

Howdy!

I am pondering writing a little C program that use libmicrohttpd and 
libbtrfsutil to display some very basic (overview) details about BTRFS.


I was hoping to display the same information that'btrfs fi sh /mnt' and 
'btrfs fi us -T /mnt' do, but somewhat combined. Since I recently just 
figured out how easy it was to do svg graphics I was hoping to try to 
visualize things a bit.


What I was hoping to achieve is:
- show all filesystems
- ..show all devices in a filesystem (and mark missing devices clearly)
- show usage and/or allocation for each device
- possibly display chunks as blocks (like old defrag programs) where 
the brightness indicate how utilied a (meta)data chunk is.

- possibly mark devices with errors ( 'btrfs de st /mnt' ).

The problem is ... I looked at libbtrfsutil and it appears that there is 
mostly sync + subvolume/snapshot stuff in there. >
So my question is: Is libbtrfsutil the right choice and intended to at 
some point (in the future?) supply me with the data I need for these 
things or should I look elsewhere?
I believe that the intent is for libbtrfsutil to cover pretty much 
everything (AIUI, that's supposed to become the public API for BTRFS, 
with the CLI tools just being wrappers for that).  However, as you've 
found out, it's not really there yet.


If you've got some experience with Python and don't mind putting a bit 
more work in learning how things work at a low level in BTRFS, the 
Python interface maintained by Hans van Kranenburg and found at [1] may 
be of some interest.  I would imagine it shouldn't be much more 
difficult to wire the Python http.server module up to that than it would 
be to wire libmicrohttpd up to a C library, and it would give the 
advantage that you could use a nice templating language like Jinja2 to 
format everything.


PS! This a completely private project for my own egoistic reasons. 
However if it turns out to be useful and the code is not too 
embarrassing I am happy put the code into public domain ... if it ever 
gets written :S
For what it's worth, I think the idea is great.  Especially if it could 
be made to return the data in a JSON format, as that would then make it 
trivial to use for integrating most existing system monitoring 
infrastructure with BTRFS.



[1] https://github.com/knorrie/python-btrfs
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


libbrtfsutil questions

2018-04-23 Thread waxhead

Howdy!

I am pondering writing a little C program that use libmicrohttpd and 
libbtrfsutil to display some very basic (overview) details about BTRFS.


I was hoping to display the same information that'btrfs fi sh /mnt' and 
'btrfs fi us -T /mnt' do, but somewhat combined. Since I recently just 
figured out how easy it was to do svg graphics I was hoping to try to 
visualize things a bit.


What I was hoping to achieve is:
- show all filesystems
- ..show all devices in a filesystem (and mark missing devices clearly)
- show usage and/or allocation for each device
- possibly display chunks as blocks (like old defrag programs) where 
the brightness indicate how utilied a (meta)data chunk is.

- possibly mark devices with errors ( 'btrfs de st /mnt' ).

The problem is ... I looked at libbtrfsutil and it appears that there is 
mostly sync + subvolume/snapshot stuff in there.


So my question is: Is libbtrfsutil the right choice and intended to at 
some point (in the future?) supply me with the data I need for these 
things or should I look elsewhere?


PS! This a completely private project for my own egoistic reasons. 
However if it turns out to be useful and the code is not too 
embarrassing I am happy put the code into public domain ... if it ever 
gets written :S








--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: status page

2018-04-23 Thread David Sterba
On Thu, Apr 19, 2018 at 06:24:29PM +0200, Gandalf Corvotempesta wrote:
> Hi to all,
> as kernel 4.16 is out and 4.17 in in RC, would be possible to update
> BTRFS status page
> https://btrfs.wiki.kernel.org/index.php/Status to reflect 4.16 stability ?
> 
> That page is still based on kernel 4.15 (marked as EOL here:
> https://www.kernel.org/)

Reviewed and updated for 4.16, there's no change regarding the overall
status, though 4.16 has some raid56 fixes.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Fix wrong first_key parameter in replace_path

2018-04-23 Thread David Sterba
On Mon, Apr 23, 2018 at 05:20:06PM +0300, Nikolay Borisov wrote:
> On 23.04.2018 17:16, David Sterba wrote:
> > On Mon, Apr 23, 2018 at 05:32:04PM +0800, Qu Wenruo wrote:
> >> Commit 581c1760415c ("btrfs: Validate child tree block's level and first
> >> key") introduced new @first_key parameter for read_tree_block(), however
> >> caller in replace_path() is parasing wrong key to read_tree_block().
> >>
> >> It should use parameter @first_key other than @key.
> >>
> >> Normally it won't expose problem as @key is normally initialzied to the
> >> same value of @first_key we expect.
> >> However in relocation recovery case, @key can be set to (0, 0, 0), and
> >> since no valid key in relocation tree can be (0, 0, 0), it will cause
> >> read_tree_block() to return -EUCLEAN and interrupt relocation recovery.
> >>
> >> Fix it by setting @first_key correctly.
> >>
> >> Signed-off-by: Qu Wenruo 
> > 
> > Added to next, thanks.
> 
> This is actually -rc2 material, right?

Yes, adding to next is the 1st stage as we want to start testing. What
will end up in the next rc pull is determined later if everything is
fine. For some patches it's obvious that they're fixing a regression,
the others are clear fixes that still qualify for rc. I can be more
specific about the target branch, but this should be transparent to
developers anyway once the patch is in the pool.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Fix wrong first_key parameter in replace_path

2018-04-23 Thread Nikolay Borisov


On 23.04.2018 17:16, David Sterba wrote:
> On Mon, Apr 23, 2018 at 05:32:04PM +0800, Qu Wenruo wrote:
>> Commit 581c1760415c ("btrfs: Validate child tree block's level and first
>> key") introduced new @first_key parameter for read_tree_block(), however
>> caller in replace_path() is parasing wrong key to read_tree_block().
>>
>> It should use parameter @first_key other than @key.
>>
>> Normally it won't expose problem as @key is normally initialzied to the
>> same value of @first_key we expect.
>> However in relocation recovery case, @key can be set to (0, 0, 0), and
>> since no valid key in relocation tree can be (0, 0, 0), it will cause
>> read_tree_block() to return -EUCLEAN and interrupt relocation recovery.
>>
>> Fix it by setting @first_key correctly.
>>
>> Signed-off-by: Qu Wenruo 
> 
> Added to next, thanks.

This is actually -rc2 material, right?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Fix wrong first_key parameter in replace_path

2018-04-23 Thread David Sterba
On Mon, Apr 23, 2018 at 05:32:04PM +0800, Qu Wenruo wrote:
> Commit 581c1760415c ("btrfs: Validate child tree block's level and first
> key") introduced new @first_key parameter for read_tree_block(), however
> caller in replace_path() is parasing wrong key to read_tree_block().
> 
> It should use parameter @first_key other than @key.
> 
> Normally it won't expose problem as @key is normally initialzied to the
> same value of @first_key we expect.
> However in relocation recovery case, @key can be set to (0, 0, 0), and
> since no valid key in relocation tree can be (0, 0, 0), it will cause
> read_tree_block() to return -EUCLEAN and interrupt relocation recovery.
> 
> Fix it by setting @first_key correctly.
> 
> Signed-off-by: Qu Wenruo 

Added to next, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] btrfs: add FS_IOC_FSSETXATTR ioctl

2018-04-23 Thread David Sterba
On Mon, Apr 23, 2018 at 11:42:42AM +0900, Misono Tomohiro wrote:
> On 2018/04/21 2:02, David Sterba wrote:
> > The new ioctl is an extension to the FS_IOC_SETFLAGS and adds new
> > flags and is extensible. Don't get fooled by the XATTR in the name, it
> > does not have anything in common with the extended attributes,
> > incidentally also abbreviated as XATTRs.
> > 
> > This patch allows to set the xflags portion of the fsxattr structure,
> > other items have no meaning and non-zero values will result in
> > EOPNOTSUPP.
> > 
> > Currently supported xflags:
> > 
> > - APPEND
> > - IMMUTABLE
> > - NOATIME
> > - NODUMP
> > - SYNC
> > 
> > The structure of btrfs_ioctl_fssetxattr copies btrfs_ioctl_setflags but
> > is simpler on the flag setting side.
> > 
> > The original patch was written by Chandan Jay Sharma but was incomplete
> > and no further revision has been sent.
> > 
> > Based-on-patches-by: Chandan Jay Sharma 
> > Signed-off-by: David Sterba 
> > ---
> >  fs/btrfs/ioctl.c | 94 
> > 
> >  1 file changed, 94 insertions(+)
> > 
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 52b12ab9b82b..4fd61f191bba 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -388,6 +388,98 @@ static int btrfs_ioctl_fsgetxattr(struct file *file, 
> > void __user *arg)
> > return 0;
> >  }
> >  
> > +static int btrfs_ioctl_fssetxattr(struct file *file, void __user *arg)
> > +{
> > +   struct inode *inode = file_inode(file);
> > +   struct btrfs_inode *binode = BTRFS_I(inode);
> > +   struct btrfs_root *root = binode->root;
> > +   struct btrfs_trans_handle *trans;
> > +   struct fsxattr fa;
> > +   unsigned oldflags;
> > +   unsigned old_i_flags;
> > +   int ret = 0;
> > +
> > +   if (!inode_owner_or_capable(inode))
> > +   return -EPERM;
> > +
> > +   if (btrfs_root_readonly(root))
> > +   return -EROFS;
> > +
> > +   memset(, 0, sizeof(fa));
> > +   if (copy_from_user(, arg, sizeof(fa)))
> > +   return -EFAULT;
> > +
> > +   ret = check_xflags(fa.fsx_xflags);
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (fa.fsx_extsize != 0 || fa.fsx_projid != 0 || fa.fsx_cowextsize != 0)
> > +   return -EOPNOTSUPP;
> > +
> > +   ret = mnt_want_write_file(file);
> > +   if (ret)
> > +   return ret;
> > +
> > +   inode_lock(inode);
> > +
> > +   oldflags = binode->flags;
> > +   old_i_flags = inode->i_flags;
> > +
> > +   /* We need the capabilities to change append-only or immutable inode */
> > +   if (((oldflags & (BTRFS_INODE_APPEND | BTRFS_INODE_IMMUTABLE)) ||
> > +(fa.fsx_xflags & (FS_XFLAG_APPEND | FS_XFLAG_IMMUTABLE))) &&
> > +   !capable(CAP_LINUX_IMMUTABLE)) {
> > +   ret = -EPERM;
> > +   goto out_unlock;
> > +   }
> > +
> > +   if (fa.fsx_xflags & FS_XFLAG_SYNC)
> > +   binode->flags |= BTRFS_INODE_SYNC;
> > +   else
> > +   binode->flags &= ~BTRFS_INODE_SYNC;
> > +   if (fa.fsx_xflags & FS_XFLAG_IMMUTABLE)
> > +   binode->flags |= BTRFS_INODE_IMMUTABLE;
> > +   else
> > +   binode->flags &= ~BTRFS_INODE_IMMUTABLE;
> > +   if (fa.fsx_xflags & FS_XFLAG_APPEND)
> > +   binode->flags |= BTRFS_INODE_APPEND;
> > +   else
> > +   binode->flags &= ~BTRFS_INODE_APPEND;
> > +   if (fa.fsx_xflags & FS_XFLAG_NODUMP)
> > +   binode->flags |= BTRFS_INODE_NODUMP;
> > +   else
> > +   binode->flags &= ~BTRFS_INODE_NODUMP;
> > +   if (fa.fsx_xflags & FS_XFLAG_NOATIME)
> > +   binode->flags |= BTRFS_INODE_NOATIME;
> > +   else
> > +   binode->flags &= ~BTRFS_INODE_NOATIME;
> > +
> > +   /* 1 item for the inode */
> > +   trans = btrfs_start_transaction(root, 1);
> > +   if (IS_ERR(trans)) {
> > +   ret = PTR_ERR(trans);
> > +   goto out_drop;
> > +   }
> > +
> > +   btrfs_sync_inode_flags_to_i_flags(inode);
> > +   inode_inc_iversion(inode);
> > +   inode->i_ctime = current_time(inode);
> > +   ret = btrfs_update_inode(trans, root, inode);
> > +
> > +   btrfs_end_transaction(trans);
> 
> Shouldn't out_drop label be here?
> Otherwise, above "goto out_drop" won't unlock nor restore the flag value.

Right, will fix, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7] btrfs: rename btrfs_flags_to_ioctl to reflect which flags it touches

2018-04-23 Thread David Sterba
On Mon, Apr 23, 2018 at 11:37:08AM +0900, Misono Tomohiro wrote:
> On 2018/04/21 2:02, David Sterba wrote:
> > Converts btrfs_inode::flags to the FS_*_FL flags.
> > 
> > Signed-off-by: David Sterba 
> > ---
> >  fs/btrfs/ioctl.c | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 953473f2a136..f0e6074233fa 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -105,9 +105,10 @@ static unsigned int btrfs_mask_fsflags_for_type(struct 
> > inode *inode,
> >  }
> >  
> >  /*
> > - * Export inode flags to the format expected by the FS_IOC_GETFLAGS ioctl.
> > + * Export internal inode flags to the format expected by the 
> > FS_IOC_GETFLAGS
> > + * ioctl.
> >   */
> > -static unsigned int btrfs_flags_to_ioctl(unsigned int flags)
> > +static unsigned int btrfs_inode_flags_to_fsflags(unsigned int flags)
> >  {
> > unsigned int iflags = 0;
> >  
> > @@ -161,7 +162,7 @@ void btrfs_sync_inode_flags_to_i_flags(struct inode 
> > *inode)
> >  static int btrfs_ioctl_getflags(struct file *file, void __user *arg)
> >  {
> > struct btrfs_inode *ip = BTRFS_I(file_inode(file));
> 
> *ip here and in btrfs_ioctl_setflags() should be changed to *binode as in 1st 
> patch?

Yeah this would be at least consistent with the 1st patch, on a second
look I'm not sure if the variable renaming should be part of the patch,
so I'll move it to another patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Add support for BTRFS raid5/6 to GRUB

2018-04-23 Thread Daniel Kiper
On Tue, Apr 17, 2018 at 09:57:40PM +0200, Goffredo Baroncelli wrote:
> Hi All,
>
> Below you can find a patch to add support for accessing files from
> grub in a RAID5/6 btrfs filesystem. This is a RFC because it is
> missing the support for recovery (i.e. if some devices are missed). In
> the next days (weeks ?) I will extend this patch to support also this
> case.
>
> Comments are welcome.

More or less LGTM. Just a nitpick below... I am happy to take full blown
patch into GRUB if it is ready.

> BR
> G.Baroncelli
>
>
> ---
>
> commit 8c80a1b7c913faf50f95c5c76b4666ed17685666
> Author: Goffredo Baroncelli 
> Date:   Tue Apr 17 21:40:31 2018 +0200
>
> Add initial support for btrfs raid5/6 chunk
>
> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
> index be195448d..4c5632acb 100644
> --- a/grub-core/fs/btrfs.c
> +++ b/grub-core/fs/btrfs.c
> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
>  #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
>  #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED0x20
>  #define GRUB_BTRFS_CHUNK_TYPE_RAID100x40
> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
> +#define GRUB_BTRFS_CHUNK_TYPE_RAID60x100
>grub_uint8_t dummy2[0xc];
>grub_uint16_t nstripes;
>grub_uint16_t nsubstripes;
> @@ -764,6 +766,39 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, 
> grub_disk_addr_t addr,
> stripe_offset = low + chunk_stripe_length
>   * high;
> csize = chunk_stripe_length - low;
> +   break;
> + }
> +   case GRUB_BTRFS_CHUNK_TYPE_RAID5:
> +   case GRUB_BTRFS_CHUNK_TYPE_RAID6:
> + {
> +   grub_uint64_t nparities;
> +   grub_uint64_t parity_pos;
> +   grub_uint64_t stripe_nr, high;
> +   grub_uint64_t low;
> +
> +   redundancy = 1;   /* no redundancy for now */
> +
> +   if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
> + {
> +   grub_dprintf ("btrfs", "RAID5\n");
> +   nparities = 1;
> + }
> +   else
> + {
> +   grub_dprintf ("btrfs", "RAID6\n");
> +   nparities = 2;
> + }
> +
> +   stripe_nr = grub_divmod64 (off, chunk_stripe_length, );
> +
> +   high = grub_divmod64 (stripe_nr, nstripes - nparities, );
> +   grub_divmod64 (high+nstripes-nparities, nstripes, _pos);
> +   grub_divmod64 (parity_pos+nparities+stripen, nstripes, );

Missing spaces around "+" and "-".

Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: Fix wrong first_key parameter in replace_path

2018-04-23 Thread Qu Wenruo
Commit 581c1760415c ("btrfs: Validate child tree block's level and first
key") introduced new @first_key parameter for read_tree_block(), however
caller in replace_path() is parasing wrong key to read_tree_block().

It should use parameter @first_key other than @key.

Normally it won't expose problem as @key is normally initialzied to the
same value of @first_key we expect.
However in relocation recovery case, @key can be set to (0, 0, 0), and
since no valid key in relocation tree can be (0, 0, 0), it will cause
read_tree_block() to return -EUCLEAN and interrupt relocation recovery.

Fix it by setting @first_key correctly.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/relocation.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 00b7d3231821..b041b945a7ae 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1841,7 +1841,7 @@ int replace_path(struct btrfs_trans_handle *trans,
old_bytenr = btrfs_node_blockptr(parent, slot);
blocksize = fs_info->nodesize;
old_ptr_gen = btrfs_node_ptr_generation(parent, slot);
-   btrfs_node_key_to_cpu(parent, , slot);
+   btrfs_node_key_to_cpu(parent, _key, slot);
 
if (level <= max_level) {
eb = path->nodes[level];
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] Remove delay_iput parameter when running delalloc work

2018-04-23 Thread Nikolay Borisov


On 23.04.2018 12:27, Qu Wenruo wrote:
> 
> 
> On 2018年04月23日 15:54, Nikolay Borisov wrote:
>> While trying to make sense of the lifecycle of delayed iputs it became 
>> apparent
>> that the delay_iput parameter of btrfs_start_delalloc_roots/
>> btrfs_start_delalloc_inodes is always set to 0. (Patch 1 contains historical 
>> information of why this parameter was needed and when it became obsolete). 
>> Now
>> that the code has changed sufficiently it's no longer required to have it so
>> this series gradually removes it. 
>>
>> Nikolay Borisov (5):
>>   btrfs: Remove delayed_iput parameter of btrfs_start_delalloc_roots
>>   btrfs: Remove delayed_iput parameter from btrfs_start_delalloc_inodes
>>   btrfs: Remove delay_iput parameter from __start_delalloc_inodes
>>   btrfs: Remove delayed_iput member from btrfs_delalloc_work
>>   btrfs: Unexport btrfs_alloc_delalloc_work
> 
> Solid cleanup.
> 
> Reviewed-by: Qu Wenruo 
> 
> Just a little nitpick about the 3rd and 4th patch.
> It would be nicer the merge them, as in the the 3rd patch
> btrfs_alloc_delalloc_work() call still takes 0 as @delay_iput, but in
> next patch the @delay_iput just get removed.

I'm fine with that, I guess David if you deem it more reasonable you
could squash the 2 patches. I just did it for the sake of bisectability
and review purposes.

> 
> Thanks,
> Qu
> 
>>
>>  fs/btrfs/ctree.h   |  9 ++---
>>  fs/btrfs/dev-replace.c |  2 +-
>>  fs/btrfs/extent-tree.c |  4 ++--
>>  fs/btrfs/inode.c   | 28 +---
>>  fs/btrfs/ioctl.c   |  4 ++--
>>  5 files changed, 16 insertions(+), 31 deletions(-)
>>
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] Remove delay_iput parameter when running delalloc work

2018-04-23 Thread Qu Wenruo


On 2018年04月23日 15:54, Nikolay Borisov wrote:
> While trying to make sense of the lifecycle of delayed iputs it became 
> apparent
> that the delay_iput parameter of btrfs_start_delalloc_roots/
> btrfs_start_delalloc_inodes is always set to 0. (Patch 1 contains historical 
> information of why this parameter was needed and when it became obsolete). Now
> that the code has changed sufficiently it's no longer required to have it so
> this series gradually removes it. 
> 
> Nikolay Borisov (5):
>   btrfs: Remove delayed_iput parameter of btrfs_start_delalloc_roots
>   btrfs: Remove delayed_iput parameter from btrfs_start_delalloc_inodes
>   btrfs: Remove delay_iput parameter from __start_delalloc_inodes
>   btrfs: Remove delayed_iput member from btrfs_delalloc_work
>   btrfs: Unexport btrfs_alloc_delalloc_work

Solid cleanup.

Reviewed-by: Qu Wenruo 

Just a little nitpick about the 3rd and 4th patch.
It would be nicer the merge them, as in the the 3rd patch
btrfs_alloc_delalloc_work() call still takes 0 as @delay_iput, but in
next patch the @delay_iput just get removed.

Thanks,
Qu

> 
>  fs/btrfs/ctree.h   |  9 ++---
>  fs/btrfs/dev-replace.c |  2 +-
>  fs/btrfs/extent-tree.c |  4 ++--
>  fs/btrfs/inode.c   | 28 +---
>  fs/btrfs/ioctl.c   |  4 ++--
>  5 files changed, 16 insertions(+), 31 deletions(-)
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.17-rc1 FS went read-only during balance

2018-04-23 Thread Dmitrii Tcvetkov
> > TL;DR It seems as regression in 4.17, but I managed to find a
> > workaround to make filesystem rw mountable again.
> >
> > Kernel built from tag v4.17-rc1
> > btrfs-progs 4.16
> >
> > Tonight two my machines (PC (ECC RAM) and laptop(non-ECC RAM)) were
> > doing usual weekly balance with this command via cron:
> > btrfs balance start -musage=50 -dusage=50 
> > Both machines run same kernel version. 
> >
> > On PC that caused root and "data" filesystems to go readonly. Root
> > is on an SSD with data single and metadata DUP, "data" filesystem
> > is on 2 HDDs with RAID1 for data and metadata.
> >
> > On laptop only /home went ro, it's on NVMe SSD with data single and
> > metadata DUP. 
> >
> > Btrfs check of PC rootfs was without any errors in both modes, I did
> > them once each before reboot on readonly filesystem with --force
> > flag and then from live usb. Same output without any errors.
> >
> > After reboot kernel refused rw mount rootfs with the same error as
> > during cron balance, ro mount was accepted, error during rw mount:
> > BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117  
> >>> 
>  117 means EUCLEAN, which could be caused by the newly introduced
>  first_key and level check.
> >>> 
>  Please apply this hotfix to fix it.
>  btrfs: Only check first key for committed tree blocks
>  (Which is included in latest pull request)
> >>> 
>  Also, please consider enable CONFIG_BTRFS_DEBUG to provide extra
>  debug info.
> >>> 
>  Thanks,
>  Qu
> >>>
> >>> I tried 4.17-rc2 (as the pull request was pulled) with
> >>> CONFIG_BTRFS_DEBUG on LVM snapshot of laptop home partition (/dev/vdb)
> >>> in a VM (VM kernel sees only snapshot so no UUID collisions). Dmesg
> >>> attached.
> >>
> >> Thanks for the info and your previous btrfs-image.
> >>
> >> The image itself shows nothing wrong, so it should be runtime problem.
> >> Would you please apply these two debug patches?
> >> https://patchwork.kernel.org/patch/10335133/
> >> https://patchwork.kernel.org/patch/10335135/
> >>
> >> And the attached diff file?
> >>
> >> My guess is the parent node is not initialized correctly in this case.
> >>
> >> Thanks,
> >> Qu  
> > 
> > Dmesg from kernel with all three patches applied attached.
> >   
> Thanks for the debug info, it really helps a lot!
> 
> It turns out that I'm just a super idiot, a typo in replace_path()
> caused this, and it could not be trigger unless we enter it from
> relocation recovery.
> 
> Please try the attached patch to see if it solves the problem.
> 
> Thanks,
> Qu
Glad to help, the patch solved the problem, 
rw mount is successful and balance finished, no errors or debug output,
btrfs check is clean in both modes.

[2.842718] BTRFS: device label home devid 1 transid 277952 /dev/vdb
[2.924965] BTRFS: device label root devid 1 transid 84092 /dev/vda2
[3.072271] BTRFS info (device vda2): use lzo compression, level 0
[3.072897] BTRFS info (device vda2): enabling auto defrag
[3.073476] BTRFS info (device vda2): using free space tree
[3.074049] BTRFS info (device vda2): has skinny extents
[5.411821] BTRFS info (device vda2): using free space tree
[   24.925293] BTRFS info (device vdb): using free space tree
[   24.925324] BTRFS info (device vdb): has skinny extents
[   31.711868] BTRFS info (device vdb): continuing balance
[   31.721658] BTRFS info (device vdb): checking UUID tree
[   31.822920] BTRFS info (device vdb): relocating block group 69889687552flags 
data 
[   33.730399] BTRFS info (device vdb): found 12 extents
[   36.950699] BTRFS info (device vdb): found 12 extents
[   37.030813] BTRFS info (device vdb): relocating block group 67742203904flags 
metadata|dup 
[   37.104174] BTRFS info (device vdb): relocating block group 67708649472 
flags system|dup 
[   37.189843] BTRFS info (device vdb): found 1 extents



pgppgUIF6oj1v.pgp
Description: OpenPGP digital signature


Re: 4.17-rc1 FS went read-only during balance

2018-04-23 Thread Qu Wenruo


On 2018年04月23日 16:04, Dmitrii Tcvetkov wrote:
> TL;DR It seems as regression in 4.17, but I managed to find a
> workaround to make filesystem rw mountable again.
>
> Kernel built from tag v4.17-rc1
> btrfs-progs 4.16
>
> Tonight two my machines (PC (ECC RAM) and laptop(non-ECC RAM)) were
> doing usual weekly balance with this command via cron:
> btrfs balance start -musage=50 -dusage=50 
> Both machines run same kernel version. 
>
> On PC that caused root and "data" filesystems to go readonly. Root
> is on an SSD with data single and metadata DUP, "data" filesystem
> is on 2 HDDs with RAID1 for data and metadata.
>
> On laptop only /home went ro, it's on NVMe SSD with data single and
> metadata DUP. 
>
> Btrfs check of PC rootfs was without any errors in both modes, I did
> them once each before reboot on readonly filesystem with --force
> flag and then from live usb. Same output without any errors.
>
> After reboot kernel refused rw mount rootfs with the same error as
> during cron balance, ro mount was accepted, error during rw mount:
> BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117
>>>   
 117 means EUCLEAN, which could be caused by the newly introduced
 first_key and level check.  
>>>   
 Please apply this hotfix to fix it.
 btrfs: Only check first key for committed tree blocks
 (Which is included in latest pull request)  
>>>   
 Also, please consider enable CONFIG_BTRFS_DEBUG to provide extra
 debug info.  
>>>   
 Thanks,
 Qu  
>>>
>>> I tried 4.17-rc2 (as the pull request was pulled) with
>>> CONFIG_BTRFS_DEBUG on LVM snapshot of laptop home partition (/dev/vdb)
>>> in a VM (VM kernel sees only snapshot so no UUID collisions). Dmesg
>>> attached.  
>>
>> Thanks for the info and your previous btrfs-image.
>>
>> The image itself shows nothing wrong, so it should be runtime problem.
>> Would you please apply these two debug patches?
>> https://patchwork.kernel.org/patch/10335133/
>> https://patchwork.kernel.org/patch/10335135/
>>
>> And the attached diff file?
>>
>> My guess is the parent node is not initialized correctly in this case.
>>
>> Thanks,
>> Qu
> 
> Dmesg from kernel with all three patches applied attached.
> 
Thanks for the debug info, it really helps a lot!

It turns out that I'm just a super idiot, a typo in replace_path()
caused this, and it could not be trigger unless we enter it from
relocation recovery.

Please try the attached patch to see if it solves the problem.

Thanks,
Qu
From 4b70eb864192ec5cf54a7e67e2957ddf0e5c0f6f Mon Sep 17 00:00:00 2001
From: Qu Wenruo 
Date: Mon, 23 Apr 2018 16:13:55 +0800
Subject: [PATCH] btrfs: Fix wrong first_key parameter in replace_path

Commit 581c1760415c ("btrfs: Validate child tree block's level and first
key") introduced new @first_key parameter for read_tree_block(), however
caller in replace_path() is parasing wrong key to read_tree_block().

It should use parameter @first_key other than @key.

Normally it won't expose problem as @key is normally initialzied to the
same value of @first_key we expect.
However in relocation recovery case, @key can be set to (0, 0, 0), and
since no valid key in relocation tree can be (0, 0, 0), it will cause
read_tree_block() to return -EUCLEAN and interrupt relocation recovery.

Fix it by setting @first_key correctly.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/relocation.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 00b7d3231821..b041b945a7ae 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1841,7 +1841,7 @@ int replace_path(struct btrfs_trans_handle *trans,
 		old_bytenr = btrfs_node_blockptr(parent, slot);
 		blocksize = fs_info->nodesize;
 		old_ptr_gen = btrfs_node_ptr_generation(parent, slot);
-		btrfs_node_key_to_cpu(parent, , slot);
+		btrfs_node_key_to_cpu(parent, _key, slot);
 
 		if (level <= max_level) {
 			eb = path->nodes[level];
-- 
2.17.0



signature.asc
Description: OpenPGP digital signature


[PATCH 3/5] btrfs: Remove delay_iput parameter from __start_delalloc_inodes

2018-04-23 Thread Nikolay Borisov
It's always set to 0 so remove it

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/inode.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 855237737acb..42a2590559df 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10189,8 +10189,7 @@ struct btrfs_delalloc_work 
*btrfs_alloc_delalloc_work(struct inode *inode,
  * some fairly slow code that needs optimization. This walks the list
  * of all the inodes with pending delalloc and forces them to disk.
  */
-static int __start_delalloc_inodes(struct btrfs_root *root, int delay_iput,
-  int nr)
+static int __start_delalloc_inodes(struct btrfs_root *root, int nr)
 {
struct btrfs_inode *binode;
struct inode *inode;
@@ -10218,12 +10217,9 @@ static int __start_delalloc_inodes(struct btrfs_root 
*root, int delay_iput,
}
spin_unlock(>delalloc_lock);
 
-   work = btrfs_alloc_delalloc_work(inode, delay_iput);
+   work = btrfs_alloc_delalloc_work(inode, 0);
if (!work) {
-   if (delay_iput)
-   btrfs_add_delayed_iput(inode);
-   else
-   iput(inode);
+   iput(inode);
ret = -ENOMEM;
goto out;
}
@@ -10262,7 +10258,7 @@ int btrfs_start_delalloc_inodes(struct btrfs_root *root)
if (test_bit(BTRFS_FS_STATE_ERROR, _info->fs_state))
return -EROFS;
 
-   ret = __start_delalloc_inodes(root, 0, -1);
+   ret = __start_delalloc_inodes(root, -1);
if (ret > 0)
ret = 0;
return ret;
@@ -10291,7 +10287,7 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info 
*fs_info, int nr)
   _info->delalloc_roots);
spin_unlock(_info->delalloc_root_lock);
 
-   ret = __start_delalloc_inodes(root, 0, nr);
+   ret = __start_delalloc_inodes(root, nr);
btrfs_put_fs_root(root);
if (ret < 0)
goto out;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] btrfs: Remove delayed_iput parameter of btrfs_start_delalloc_roots

2018-04-23 Thread Nikolay Borisov
This parameter was introduced alongside the function in
eb73c1b7cea7 ("Btrfs: introduce per-subvolume delalloc inode list") to
avoid deadlocks since this function was used in the transaction commit
path. However, commit 8d875f95da43 ("btrfs: disable strict file flushes
for renames and truncates") removed that usage, rendering the parameter
obsolete.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h   | 3 +--
 fs/btrfs/dev-replace.c | 2 +-
 fs/btrfs/extent-tree.c | 4 ++--
 fs/btrfs/inode.c   | 5 ++---
 fs/btrfs/ioctl.c   | 2 +-
 5 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aac3a8bfab51..8c5ab06ccc0a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3199,8 +3199,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle 
*trans,
   u32 min_type);
 
 int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput);
-int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput,
-  int nr);
+int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr);
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
  unsigned int extra_bits,
  struct extent_state **cached_state, int dedupe);
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index f82be266ba4b..a08b30052e06 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -503,7 +503,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info 
*fs_info,
 * flush all outstanding I/O and inode extent mappings before the
 * copy operation is declared as being finished
 */
-   ret = btrfs_start_delalloc_roots(fs_info, 0, -1);
+   ret = btrfs_start_delalloc_roots(fs_info, -1);
if (ret) {
mutex_unlock(_replace->lock_finishing_cancel_unmount);
return ret;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 22ba5d500836..cf3cd23f274d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4337,7 +4337,7 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode 
*inode, u64 bytes)
need_commit--;
 
if (need_commit > 0) {
-   btrfs_start_delalloc_roots(fs_info, 0, -1);
+   btrfs_start_delalloc_roots(fs_info, -1);
btrfs_wait_ordered_roots(fs_info, U64_MAX, 0,
 (u64)-1);
}
@@ -4790,7 +4790,7 @@ static void btrfs_writeback_inodes_sb_nr(struct 
btrfs_fs_info *fs_info,
 * the filesystem is readonly(all dirty pages are written to
 * the disk).
 */
-   btrfs_start_delalloc_roots(fs_info, 0, nr_items);
+   btrfs_start_delalloc_roots(fs_info, nr_items);
if (!current->journal_info)
btrfs_wait_ordered_roots(fs_info, nr_items, 0, (u64)-1);
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c0a38917539d..74ea63159b93 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10268,8 +10268,7 @@ int btrfs_start_delalloc_inodes(struct btrfs_root 
*root, int delay_iput)
return ret;
 }
 
-int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput,
-  int nr)
+int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr)
 {
struct btrfs_root *root;
struct list_head splice;
@@ -10292,7 +10291,7 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info 
*fs_info, int delay_iput,
   _info->delalloc_roots);
spin_unlock(_info->delalloc_root_lock);
 
-   ret = __start_delalloc_inodes(root, delay_iput, nr);
+   ret = __start_delalloc_inodes(root, 0, nr);
btrfs_put_fs_root(root);
if (ret < 0)
goto out;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index de5dcdb87e97..ea192378cb3b 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -5316,7 +5316,7 @@ long btrfs_ioctl(struct file *file, unsigned int
case BTRFS_IOC_SYNC: {
int ret;
 
-   ret = btrfs_start_delalloc_roots(fs_info, 0, -1);
+   ret = btrfs_start_delalloc_roots(fs_info, -1);
if (ret)
return ret;
ret = btrfs_sync_fs(inode->i_sb, 1);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] Remove delay_iput parameter when running delalloc work

2018-04-23 Thread Nikolay Borisov
While trying to make sense of the lifecycle of delayed iputs it became apparent
that the delay_iput parameter of btrfs_start_delalloc_roots/
btrfs_start_delalloc_inodes is always set to 0. (Patch 1 contains historical 
information of why this parameter was needed and when it became obsolete). Now
that the code has changed sufficiently it's no longer required to have it so
this series gradually removes it. 

Nikolay Borisov (5):
  btrfs: Remove delayed_iput parameter of btrfs_start_delalloc_roots
  btrfs: Remove delayed_iput parameter from btrfs_start_delalloc_inodes
  btrfs: Remove delay_iput parameter from __start_delalloc_inodes
  btrfs: Remove delayed_iput member from btrfs_delalloc_work
  btrfs: Unexport btrfs_alloc_delalloc_work

 fs/btrfs/ctree.h   |  9 ++---
 fs/btrfs/dev-replace.c |  2 +-
 fs/btrfs/extent-tree.c |  4 ++--
 fs/btrfs/inode.c   | 28 +---
 fs/btrfs/ioctl.c   |  4 ++--
 5 files changed, 16 insertions(+), 31 deletions(-)

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] btrfs: Remove delayed_iput member from btrfs_delalloc_work

2018-04-23 Thread Nikolay Borisov
When allocating a delalloc work we are always setting the delayed_iput
to 0. So remove the delay_iput member of btrfs_delalloc_work, as a
result also remove it as a parameter from btrfs_alloc_delalloc_work
since it's not used anymore.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h |  4 +---
 fs/btrfs/inode.c | 11 +++
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1ac983270bb7..91f51a80334f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3165,14 +3165,12 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode 
*inode,
 /* inode.c */
 struct btrfs_delalloc_work {
struct inode *inode;
-   int delay_iput;
struct completion completion;
struct list_head list;
struct btrfs_work work;
 };
 
-struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
-   int delay_iput);
+struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode);
 
 struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
struct page *page, size_t pg_offset, u64 start,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 42a2590559df..9b8f2904ad55 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10158,15 +10158,11 @@ static void btrfs_run_delalloc_work(struct btrfs_work 
*work)
_I(inode)->runtime_flags))
filemap_flush(inode->i_mapping);
 
-   if (delalloc_work->delay_iput)
-   btrfs_add_delayed_iput(inode);
-   else
-   iput(inode);
+   iput(inode);
complete(_work->completion);
 }
 
-struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
-   int delay_iput)
+struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode)
 {
struct btrfs_delalloc_work *work;
 
@@ -10177,7 +10173,6 @@ struct btrfs_delalloc_work 
*btrfs_alloc_delalloc_work(struct inode *inode,
init_completion(>completion);
INIT_LIST_HEAD(>list);
work->inode = inode;
-   work->delay_iput = delay_iput;
WARN_ON_ONCE(!inode);
btrfs_init_work(>work, btrfs_flush_delalloc_helper,
btrfs_run_delalloc_work, NULL, NULL);
@@ -10217,7 +10212,7 @@ static int __start_delalloc_inodes(struct btrfs_root 
*root, int nr)
}
spin_unlock(>delalloc_lock);
 
-   work = btrfs_alloc_delalloc_work(inode, 0);
+   work = btrfs_alloc_delalloc_work(inode);
if (!work) {
iput(inode);
ret = -ENOMEM;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] btrfs: Remove delayed_iput parameter from btrfs_start_delalloc_inodes

2018-04-23 Thread Nikolay Borisov
It's always set to 0, so just remove it and collapse the constant value
to the only function we are passing it.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h | 2 +-
 fs/btrfs/inode.c | 4 ++--
 fs/btrfs/ioctl.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8c5ab06ccc0a..1ac983270bb7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3198,7 +3198,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle 
*trans,
   struct inode *inode, u64 new_size,
   u32 min_type);
 
-int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput);
+int btrfs_start_delalloc_inodes(struct btrfs_root *root);
 int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int nr);
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
  unsigned int extra_bits,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 74ea63159b93..855237737acb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10254,7 +10254,7 @@ static int __start_delalloc_inodes(struct btrfs_root 
*root, int delay_iput,
return ret;
 }
 
-int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput)
+int btrfs_start_delalloc_inodes(struct btrfs_root *root)
 {
struct btrfs_fs_info *fs_info = root->fs_info;
int ret;
@@ -10262,7 +10262,7 @@ int btrfs_start_delalloc_inodes(struct btrfs_root 
*root, int delay_iput)
if (test_bit(BTRFS_FS_STATE_ERROR, _info->fs_state))
return -EROFS;
 
-   ret = __start_delalloc_inodes(root, delay_iput, -1);
+   ret = __start_delalloc_inodes(root, 0, -1);
if (ret > 0)
ret = 0;
return ret;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ea192378cb3b..f94b70e3525d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -640,7 +640,7 @@ static int create_snapshot(struct btrfs_root *root, struct 
inode *dir,
wait_event(root->subv_writers->wait,
   percpu_counter_sum(>subv_writers->counter) == 0);
 
-   ret = btrfs_start_delalloc_inodes(root, 0);
+   ret = btrfs_start_delalloc_inodes(root);
if (ret)
goto dec_and_free;
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] btrfs: Unexport btrfs_alloc_delalloc_work

2018-04-23 Thread Nikolay Borisov
It's used only in inode.c so makes no sense to have it exported.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 91f51a80334f..e5c24d86fdfa 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3170,8 +3170,6 @@ struct btrfs_delalloc_work {
struct btrfs_work work;
 };
 
-struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode);
-
 struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
struct page *page, size_t pg_offset, u64 start,
u64 len, int create);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.17-rc1 FS went read-only during balance

2018-04-23 Thread Qu Wenruo


On 2018年04月23日 13:08, Dmitrii Tcvetkov wrote:
> On Mon, 23 Apr 2018 09:23:53 +0800
> Qu Wenruo  wrote:
> 
>> On 2018年04月21日 22:55, Dmitrii Tcvetkov wrote:
>>> TL;DR It seems as regression in 4.17, but I managed to find a
>>> workaround to make filesystem rw mountable again.
>>>
>>> Kernel built from tag v4.17-rc1
>>> btrfs-progs 4.16
>>>
>>> Tonight two my machines (PC (ECC RAM) and laptop(non-ECC RAM)) were
>>> doing usual weekly balance with this command via cron:
>>> btrfs balance start -musage=50 -dusage=50 
>>> Both machines run same kernel version. 
>>>
>>> On PC that caused root and "data" filesystems to go readonly. Root
>>> is on an SSD with data single and metadata DUP, "data" filesystem
>>> is on 2 HDDs with RAID1 for data and metadata.
>>>
>>> On laptop only /home went ro, it's on NVMe SSD with data single and
>>> metadata DUP. 
>>>
>>> Btrfs check of PC rootfs was without any errors in both modes, I did
>>> them once each before reboot on readonly filesystem with --force
>>> flag and then from live usb. Same output without any errors.
>>>
>>> After reboot kernel refused rw mount rootfs with the same error as
>>> during cron balance, ro mount was accepted, error during rw mount:
>>> BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117  
> 
>> 117 means EUCLEAN, which could be caused by the newly introduced
>> first_key and level check.
> 
>> Please apply this hotfix to fix it.
>> btrfs: Only check first key for committed tree blocks
>> (Which is included in latest pull request)
> 
>> Also, please consider enable CONFIG_BTRFS_DEBUG to provide extra
>> debug info.
> 
>> Thanks,
>> Qu
> 
> I tried 4.17-rc2 (as the pull request was pulled) with
> CONFIG_BTRFS_DEBUG on LVM snapshot of laptop home partition (/dev/vdb)
> in a VM (VM kernel sees only snapshot so no UUID collisions). Dmesg
> attached.

Thanks for the info and your previous btrfs-image.

The image itself shows nothing wrong, so it should be runtime problem.
Would you please apply these two debug patches?
https://patchwork.kernel.org/patch/10335133/
https://patchwork.kernel.org/patch/10335135/

And the attached diff file?

My guess is the parent node is not initialized correctly in this case.

Thanks,
Qu
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60caa68c3618..79f482578e02 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -458,6 +458,7 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 			  eb->start, first_key->objectid, first_key->type,
 			  first_key->offset, found_key.objectid,
 			  found_key.type, found_key.offset);
+		btrfs_print_tree(eb, false);
 	}
 #endif
 	return ret;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 00b7d3231821..cde0cb6c9786 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1870,6 +1870,8 @@ int replace_path(struct btrfs_trans_handle *trans,
 	 level - 1, _key);
 			if (IS_ERR(eb)) {
 ret = PTR_ERR(eb);
+btrfs_err(fs_info, "parent leaf, slot: %d:", slot);
+btrfs_print_tree(parent, false);
 break;
 			} else if (!extent_buffer_uptodate(eb)) {
 ret = -EIO;


signature.asc
Description: OpenPGP digital signature