Re: [PATCH v3 0/2] fix bug in btrfs_init_new_device()

2017-09-27 Thread Anand Jain



On 09/27/2017 10:26 PM, Nikolay Borisov wrote:



On 27.09.2017 17:22, David Sterba wrote:

On Tue, Sep 26, 2017 at 03:14:27PM +0300, Nikolay Borisov wrote:



On 26.09.2017 11:47, Anand Jain wrote:

1/2 fixes a bug which failed to reset writable when sprouting failed
2/2 fixes BUG_ON in btrfs_init_new_device()

Anand Jain (2):
   btrfs: undo writable when sprouting fails
   btrfs: fix BUG_ON in btrfs_init_new_device()


Reviewed-by: Nikolay Borisov 


Please note that this would lead to unexpected unlocks of uuid_mutex and
sb::s_umount:

2465 ret = btrfs_commit_transaction(trans);
2466
2467 if (seeding_dev) {
2468 mutex_unlock(&uuid_mutex);
2469 up_write(&sb->s_umount);

first unlocks

2470
2471 if (ret) /* transaction commit */
2472 return ret;
2473
2474 ret = btrfs_relocate_sys_chunks(fs_info);
2475 if (ret < 0)
2476 btrfs_handle_fs_error(fs_info, ret,
2477 "Failed to relocate sys chunks after device 
initialization. This can be fixed using the \"btrfs balance\"
2478 trans = btrfs_attach_transaction(root);
2479 if (IS_ERR(trans)) {
2480 if (PTR_ERR(trans) == -ENOENT)
2481 return 0;
2482 return PTR_ERR(trans);


this becomes goto 2494

2483 }
2484 ret = btrfs_commit_transaction(trans);
2485 }
2486
2487 /* Update ctime/mtime for libblkid */
2488 update_dev_time(device_path);
2489 return ret;
2490
2491 error_trans:
2492 btrfs_end_transaction(trans);
2493 rcu_string_free(device->name);
2494 btrfs_sysfs_rm_device_link(fs_info->fs_devices, device);

here

2495 kfree(device);
2496 error:
2497 blkdev_put(bdev, FMODE_EXCL);
2498 if (seeding_dev) {
2499 mutex_unlock(&uuid_mutex);
2500 up_write(&sb->s_umount);

and unlocking again

2501 }
2502 return ret;

So the in-place returns had a meaning but are quite confusing.


 Oops. I forgot about it though I noticed, will fix it.

I concur and had missed that ;(. 


 No you didn't, the changes which David is talking about isn't in this 
set. It was independent patch: But, my fault, to confuse about the patch 
set.
   [PATCH] btrfs: take the error path out if btrfs_attach_transaction() 
fails


V4 is sent to clear this.

Thanks, Anand



I saw that you suggested Anand to
overhaul the overall error handling in this function and I believe this
would be the best course of action.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 3/3] btrfs: error out if btrfs_attach_transaction() fails

2017-09-27 Thread Anand Jain
btrfs_init_new_device() calls btrfs_attach_transaction() to
commit sys chunks, and it should error out if it fails.

Signed-off-by: Anand Jain 
Reviewed-by: Qu Wenruo 
---
v4: make this patch as part of this set.
avoid double mutext unlock.
v3: not part of this set
v2: patch did not exist
v1: patch did not exist
 fs/btrfs/volumes.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4cb575fbf643..bdb8f04663a4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2323,6 +2323,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
u64 tmp;
int seeding_dev = 0;
int ret = 0;
+   bool unlocked = false;
 
if (sb_rdonly(sb) && !fs_info->fs_devices->seeding)
return -EROFS;
@@ -2482,6 +2483,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
if (seeding_dev) {
mutex_unlock(&uuid_mutex);
up_write(&sb->s_umount);
+   unlocked = true;
 
if (ret) /* transaction commit */
return ret;
@@ -2494,7 +2496,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
if (IS_ERR(trans)) {
if (PTR_ERR(trans) == -ENOENT)
return 0;
-   return PTR_ERR(trans);
+   ret = PTR_ERR(trans);
+   goto error_sysfs;
}
ret = btrfs_commit_transaction(trans);
}
@@ -2513,7 +2516,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
kfree(device);
 error:
blkdev_put(bdev, FMODE_EXCL);
-   if (seeding_dev) {
+   if (seeding_dev && !unlocked) {
mutex_unlock(&uuid_mutex);
up_write(&sb->s_umount);
}
-- 
2.13.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 2/3] btrfs: fix BUG_ON in btrfs_init_new_device()

2017-09-27 Thread Anand Jain
Instead of BUG_ON return error to the caller. And handle the fail
condition by calling the abort transaction and going through the
error path.

Signed-off-by: Anand Jain 
Reviewed-by: Nikolay Borisov 
---
v4: none;
v3: meld this with
 btrfs: cleanup btrfs_init_new_device()
v2: do not consolidate btrfs_abort_transaction()

 fs/btrfs/volumes.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9d64700cc9b6..4cb575fbf643 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2399,7 +2399,10 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
if (seeding_dev) {
sb->s_flags &= ~MS_RDONLY;
ret = btrfs_prepare_sprout(fs_info);
-   BUG_ON(ret); /* -ENOMEM */
+   if (ret) {
+   btrfs_abort_transaction(trans, ret);
+   goto error_trans;
+   }
}
 
device->fs_devices = fs_info->fs_devices;
@@ -2445,14 +2448,14 @@ int btrfs_init_new_device(struct btrfs_fs_info 
*fs_info, const char *device_path
mutex_unlock(&fs_info->chunk_mutex);
if (ret) {
btrfs_abort_transaction(trans, ret);
-   goto error_trans;
+   goto error_sysfs;
}
}
 
ret = btrfs_add_device(trans, fs_info, device);
if (ret) {
btrfs_abort_transaction(trans, ret);
-   goto error_trans;
+   goto error_sysfs;
}
 
if (seeding_dev) {
@@ -2461,7 +2464,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
ret = btrfs_finish_sprout(trans, fs_info);
if (ret) {
btrfs_abort_transaction(trans, ret);
-   goto error_trans;
+   goto error_sysfs;
}
 
/* Sprouting would change fsid of the mounted root,
@@ -2500,12 +2503,13 @@ int btrfs_init_new_device(struct btrfs_fs_info 
*fs_info, const char *device_path
update_dev_time(device_path);
return ret;
 
+error_sysfs:
+   btrfs_sysfs_rm_device_link(fs_info->fs_devices, device);
 error_trans:
if (seeding_dev)
sb->s_flags |= MS_RDONLY;
btrfs_end_transaction(trans);
rcu_string_free(device->name);
-   btrfs_sysfs_rm_device_link(fs_info->fs_devices, device);
kfree(device);
 error:
blkdev_put(bdev, FMODE_EXCL);
-- 
2.13.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 1/3] btrfs: undo writable when sprouting fails

2017-09-27 Thread Anand Jain
When new device is being added to seed FS, seed FS is marked writable,
but when we fail to bring in the new device, we missed to undo the
writable part. This patch fixes it.

Signed-off-by: Anand Jain 
Reviewed-by: Nikolay Borisov 
---
v4: none
v3: none
v2: add commit log
v1: moving sb->s_flags |= MS_RDONLY; to further below to error:
is not a good idea because, goto error; comes from a place
where sb->s_flags &= ~MS_RDONLY; has not yet been called.

 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0e8f16c305df..9d64700cc9b6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2501,6 +2501,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
return ret;
 
 error_trans:
+   if (seeding_dev)
+   sb->s_flags |= MS_RDONLY;
btrfs_end_transaction(trans);
rcu_string_free(device->name);
btrfs_sysfs_rm_device_link(fs_info->fs_devices, device);
-- 
2.13.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] btrfs-progs: Resolve memory-leak in btrfs qgroup show when quota disabled

2017-09-27 Thread Gu, Jinxiang
Hi,
I am so sorry for that this patch brings a new problem of double free in normal 
scenes.
So I send a new patch .
Please revert this one and use the new patch.


> -Original Message-
> From: David Sterba [mailto:dste...@suse.cz]
> Sent: Tuesday, September 26, 2017 7:28 PM
> To: Gu, Jinxiang/顾 金香 
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH] btrfs-progs: Resolve memory-leak in btrfs qgroup show 
> when quota disabled
> 
> On Tue, Sep 26, 2017 at 05:28:31PM +0800, Gu Jinxiang wrote:
> > When quota disabled, btrfs qgroup show exit with a error message, but
> > the allocated memory is not freed.
> >
> > By the way, this bug marked as issue#20 in github.
> >
> > Signed-off-by: Gu Jinxiang 
> 
> Applied, thanks.
> 



N�Р骒r��yb�X�肚�v�^�)藓{.n�+�伐�{�n谶�)��骅w*jg�报�茛j/�赇z罐���2���ㄨ��&�)摺�a囤���G���h��j:+v���w��佶

[PATCH] btrfs-progs: Resolve memory-leak in btrfs qgroup show when quota disabled

2017-09-27 Thread Gu Jinxiang
From: Gu JinXiang 

When quota disabled, btrfs qgroup show exit with a error message,
but the allocated memory is not freed.
By the way, this bug marked as issue#20 in github.

Signed-off-by: Gu JinXiang 
---
 cmds-qgroup.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/cmds-qgroup.c b/cmds-qgroup.c
index 38382ea9..51d4f3a1 100644
--- a/cmds-qgroup.c
+++ b/cmds-qgroup.c
@@ -369,9 +369,8 @@ static int cmd_qgroup_show(int argc, char **argv)
path = argv[optind];
fd = btrfs_open_dir(path, &dirstream, 1);
if (fd < 0) {
-   free(filter_set);
-   free(comparer_set);
-   return 1;
+   ret = 1;
+   goto out;
}
 
if (sync) {
@@ -399,13 +398,21 @@ static int cmd_qgroup_show(int argc, char **argv)
qgroupid);
}
ret = btrfs_show_qgroups(fd, filter_set, comparer_set);
-   if (ret == -ENOENT)
+   if (ret == -ENOENT) {
error("can't list qgroups: quotas not enabled");
-   else if (ret < 0)
+   goto out;
+   } else if (ret < 0) {
error("can't list qgroups: %s", strerror(-ret));
+   goto out;
+   }
close_file_or_dir(fd, dirstream);
 
+   return !!ret;
+
 out:
+   close_file_or_dir(fd, dirstream);
+   free(filter_set);
+   free(comparer_set);
return !!ret;
 }
 
-- 
2.14.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] btrfs: tree-checker: Enhance output for check_extent_data_item

2017-09-27 Thread Nikolay Borisov


On 28.09.2017 06:36, Qu Wenruo wrote:
> Output the invalid member name and its bad value, along with its
> expected value range or alignment.
> 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/tree-checker.c | 92 
> +
>  1 file changed, 70 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 52e9ab8c2a79..1324fcae93c0 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -63,6 +63,47 @@ static void generic_err(const struct btrfs_root *root,
>   va_end(args);
>  }
>  
> +/*
> + * Customized reporter for extent data item, since its key objectid and
> + * offset has its own meaning.
> + */
> +__printf(4, 5)
> +static void file_extent_err(const struct btrfs_root *root,
> + const struct extent_buffer *eb,
> + int slot, const char *fmt, ...)
> +{
> + struct btrfs_key key;
> + struct va_format vaf;
> + va_list args;
> +
> + btrfs_item_key_to_cpu(eb, &key, slot);
> + va_start(args, fmt);
> +
> + vaf.fmt = fmt;
> + vaf.va = &args;
> +
> + btrfs_crit(root->fs_info,
> + "corrupt %s root=%llu tree_block=%llu slot=%d ino=%llu 
> file_offset=%llu: %pV",

nit: Again, consider whether we should have : after the first %s so that
the string is consistent among different verifiers.

> + btrfs_header_level(eb) == 0 ? "leaf" : "node",
> + root->objectid, btrfs_header_bytenr(eb), slot,
> + key.objectid, key.offset, &vaf);
> + va_end(args);
> +}
> +
> +/*
> + * Return 0 if the btrfs_file_extent_##name is aligned to @align
> + * Else return 1
> + */
> +#define CHECK_FI_ALIGN(root, leaf, slot, fi, name, align)\
> +({   \
> + if (!IS_ALIGNED(btrfs_file_extent_##name(leaf, fi), align)) \
> + file_extent_err(root, leaf, slot,   \
> + "invalid file extent %s, have %llu, should be aligned 
> to %u",\
> + #name, btrfs_file_extent_##name(leaf, fi),  \
> + align); \
> + (!IS_ALIGNED(btrfs_file_extent_##name(leaf, fi), align));   \
> +})
> +
>  static int check_extent_data_item(struct btrfs_root *root,
> struct extent_buffer *leaf,
> struct btrfs_key *key, int slot)
> @@ -72,15 +113,19 @@ static int check_extent_data_item(struct btrfs_root 
> *root,
>   u32 item_size = btrfs_item_size_nr(leaf, slot);
>  
>   if (!IS_ALIGNED(key->offset, sectorsize)) {
> - CORRUPT("unaligned key offset for file extent",
> - leaf, root, slot);
> + file_extent_err(root, leaf, slot,
> + "unaligned key offset, have %llu should be aligned to 
> %u",
> + key->offset, sectorsize);
>   return -EUCLEAN;
>   }
>  
>   fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
>  
>   if (btrfs_file_extent_type(leaf, fi) > BTRFS_FILE_EXTENT_TYPES) {
> - CORRUPT("invalid file extent type", leaf, root, slot);
> + file_extent_err(root, leaf, slot,
> + "invalid file extent type, have %u expect range [0, 
> %u]",
> + btrfs_file_extent_type(leaf, fi),
> + BTRFS_FILE_EXTENT_TYPES);
>   return -EUCLEAN;
>   }
>  
> @@ -89,18 +134,24 @@ static int check_extent_data_item(struct btrfs_root 
> *root,
>* and must be caught in open_ctree().
>*/
>   if (btrfs_file_extent_compression(leaf, fi) > BTRFS_COMPRESS_TYPES) {
> - CORRUPT("invalid file extent compression", leaf, root, slot);
> + file_extent_err(root, leaf, slot,
> + "invalid file extent compression, have %u expect range 
> [0, %u]",
> + btrfs_file_extent_compression(leaf, fi),
> + BTRFS_COMPRESS_TYPES);
>   return -EUCLEAN;
>   }
>   if (btrfs_file_extent_encryption(leaf, fi)) {
> - CORRUPT("invalid file extent encryption", leaf, root, slot);
> + file_extent_err(root, leaf, slot,
> + "invalid file extent encryption, have %u expect 0",
> + btrfs_file_extent_encryption(leaf, fi));
>   return -EUCLEAN;
>   }
>   if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
>   /* Inline extent must have 0 as key offset */
>   if (key->offset) {
> - CORRUPT("inline extent has non-zero key offset",
> - leaf, root, slot);
> + file_extent_err(root, leaf, slot,
> + "invalid offset for inline extent, have %llu 
> expect 0",
> + 

Re: [PATCH 3/5] btrfs: tree-checker: Enhance output for btrfs_check_leaf

2017-09-27 Thread Nikolay Borisov


On 28.09.2017 06:36, Qu Wenruo wrote:
> Enhance the output to print:
> 1) Reason
> 2) Bad value
>If reason can't explain enough
> 3) Good value (range)
> 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/tree-checker.c | 26 --
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 4b615beb0ca9..4f847955fdc6 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -216,8 +216,8 @@ int btrfs_check_leaf(struct btrfs_root *root, struct 
> extent_buffer *leaf)
>   eb = btrfs_root_node(check_root);
>   /* if leaf is the root, then it's fine */
>   if (leaf != eb) {
> - CORRUPT("non-root leaf's nritems is 0",
> - leaf, check_root, 0);
> + generic_err(check_root, leaf, 0,
> + "non-root leaf's nritems is 0");
>   free_extent_buffer(eb);
>   return -EUCLEAN;
>   }
> @@ -248,7 +248,11 @@ int btrfs_check_leaf(struct btrfs_root *root, struct 
> extent_buffer *leaf)
>  
>   /* Make sure the keys are in the right order */
>   if (btrfs_comp_cpu_keys(&prev_key, &key) >= 0) {
> - CORRUPT("bad key order", leaf, root, slot);
> + generic_err(root, leaf, slot,
> + "bad key order, prev key (%llu %u %llu) current 
> key (%llu %u %llu)",
> + prev_key.objectid, prev_key.type,
> + prev_key.offset, key.objectid, key.type,
> + key.offset);
>   return -EUCLEAN;
>   }
>  
> @@ -263,7 +267,10 @@ int btrfs_check_leaf(struct btrfs_root *root, struct 
> extent_buffer *leaf)
>   item_end_expected = btrfs_item_offset_nr(leaf,
>slot - 1);
>   if (btrfs_item_end_nr(leaf, slot) != item_end_expected) {
> - CORRUPT("slot offset bad", leaf, root, slot);
> + generic_err(root, leaf, slot,
> + "uncontinuous item end, have %u expect %u",
nit: s/uncontinious/discontinious
> + btrfs_item_end_nr(leaf, slot),
> + item_end_expected);
>   return -EUCLEAN;
>   }
>  
> @@ -274,14 +281,21 @@ int btrfs_check_leaf(struct btrfs_root *root, struct 
> extent_buffer *leaf)
>*/
>   if (btrfs_item_end_nr(leaf, slot) >
>   BTRFS_LEAF_DATA_SIZE(fs_info)) {
> - CORRUPT("slot end outside of leaf", leaf, root, slot);
> + generic_err(root, leaf, slot,
> + "slot end outside of leaf, have %u expect range 
> [0, %u]",
> + btrfs_item_end_nr(leaf, slot),
> + BTRFS_LEAF_DATA_SIZE(fs_info));
>   return -EUCLEAN;
>   }
>  
>   /* Also check if the item pointer overlaps with btrfs item. */
>   if (btrfs_item_nr_offset(slot) + sizeof(struct btrfs_item) >
>   btrfs_item_ptr_offset(leaf, slot)) {
> - CORRUPT("slot overlap with its data", leaf, root, slot);
> + generic_err(root, leaf, slot,
> + "slot overlap with its data, item end %lu data 
> start %lu",
> + btrfs_item_nr_offset(slot) +
> + sizeof(struct btrfs_item),
> + btrfs_item_ptr_offset(leaf, slot));
>   return -EUCLEAN;
>   }
>  
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] btrfs: tree-checker: Enhance btrfs_check_node output

2017-09-27 Thread Nikolay Borisov


On 28.09.2017 06:36, Qu Wenruo wrote:
> Use inline function to replace macro since we don't need
> stringification.
> (Macro still exist until all caller get updated)
> 
> And add more info about the error.
> 
> For nr_items error, report if it's too large or too small, and output
> valid value range.
> 
> For blk pointer, added a new alignment checker.
> 
> For key order, also output the next key to make the problem more
> obvious.
> 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/tree-checker.c | 48 
>  1 file changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 301243a69dea..4b615beb0ca9 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -37,6 +37,32 @@
>  btrfs_header_level(eb) == 0 ? "leaf" : "node",   \
>  reason, btrfs_header_bytenr(eb), root->objectid, slot)
>  
> +/*
> + * Append the generic "corrupt leaf/node root=%llu block=%llu slot=%d: " to
> + * @fmt.
> + * Allowing user to customize their output.
> + */
> +__printf(4, 5)
> +static void generic_err(const struct btrfs_root *root,
> + const struct extent_buffer *eb,
> + int slot, const char *fmt, ...)
> +{
> + struct va_format vaf;
> + va_list args;
> +
> + va_start(args, fmt);
> +
> + vaf.fmt = fmt;
> + vaf.va = &args;
> +
> + btrfs_crit(root->fs_info,
> + "corrupt %s root=%llu block=%llu slot=%d: %pV",
> + btrfs_header_level(eb) == 0 ? "leaf" : "node",
> + root->objectid, btrfs_header_bytenr(eb), slot,
> + &vaf);
> + va_end(args);
> +}
> +
>  static int check_extent_data_item(struct btrfs_root *root,
> struct extent_buffer *leaf,
> struct btrfs_key *key, int slot)
> @@ -282,8 +308,10 @@ int btrfs_check_node(struct btrfs_root *root, struct 
> extent_buffer *node)
>  
>   if (nr == 0 || nr > BTRFS_NODEPTRS_PER_BLOCK(root->fs_info)) {
>   btrfs_crit(root->fs_info,
> -"corrupt node: block %llu root %llu nritems %lu",
> -node->start, root->objectid, nr);
> + "corrupt node: root=%llu block=%llu nritems too %s, 
> have %lu, range [1,%u]",

It's a nit but I'd rather have consistency between formatting of error
messages. Here you begin the line with "corrupt node:" whereas when
using generic_err you won't have the ':'. I'd prefer if everything is
consistent so it's easy for people to grep.

> +root->objectid, node->start,
> +nr == 0 ? "small" : "large", nr,
> +BTRFS_NODEPTRS_PER_BLOCK(root->fs_info));
>   return -EIO;
>   }
>  
> @@ -293,13 +321,25 @@ int btrfs_check_node(struct btrfs_root *root, struct 
> extent_buffer *node)
>   btrfs_node_key_to_cpu(node, &next_key, slot + 1);
>  
>   if (!bytenr) {
> - CORRUPT("invalid item slot", node, root, slot);
> + generic_err(root, node, slot,
> + "invalid node pointer, shouldn't point to zero");
>   ret = -EIO;
>   goto out;
>   }
> + if (!IS_ALIGNED(bytenr, root->fs_info->sectorsize)) {
> + generic_err(root, node, slot,
> + "unaligned pointer, have %llu, should be aligned to %u",
> +  bytenr, root->fs_info->sectorsize);
> + ret = -EUCLEAN;
> + goto out;
> + }
>  
>   if (btrfs_comp_cpu_keys(&key, &next_key) >= 0) {
> - CORRUPT("bad key order", node, root, slot);
> + generic_err(root, node, slot,
> + "bad key order, current key (%llu %u %llu) next key 
> (%llu %u %llu)",
> +  key.objectid, key.type, key.offset,
> +  next_key.objectid, next_key.type,
> +  next_key.offset);
>   ret = -EIO;
>   goto out;
>   }
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] btrfs: tree-checker: Enhance output for btrfs_check_leaf

2017-09-27 Thread Qu Wenruo
Enhance the output to print:
1) Reason
2) Bad value
   If reason can't explain enough
3) Good value (range)

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/tree-checker.c | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 4b615beb0ca9..4f847955fdc6 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -216,8 +216,8 @@ int btrfs_check_leaf(struct btrfs_root *root, struct 
extent_buffer *leaf)
eb = btrfs_root_node(check_root);
/* if leaf is the root, then it's fine */
if (leaf != eb) {
-   CORRUPT("non-root leaf's nritems is 0",
-   leaf, check_root, 0);
+   generic_err(check_root, leaf, 0,
+   "non-root leaf's nritems is 0");
free_extent_buffer(eb);
return -EUCLEAN;
}
@@ -248,7 +248,11 @@ int btrfs_check_leaf(struct btrfs_root *root, struct 
extent_buffer *leaf)
 
/* Make sure the keys are in the right order */
if (btrfs_comp_cpu_keys(&prev_key, &key) >= 0) {
-   CORRUPT("bad key order", leaf, root, slot);
+   generic_err(root, leaf, slot,
+   "bad key order, prev key (%llu %u %llu) current 
key (%llu %u %llu)",
+   prev_key.objectid, prev_key.type,
+   prev_key.offset, key.objectid, key.type,
+   key.offset);
return -EUCLEAN;
}
 
@@ -263,7 +267,10 @@ int btrfs_check_leaf(struct btrfs_root *root, struct 
extent_buffer *leaf)
item_end_expected = btrfs_item_offset_nr(leaf,
 slot - 1);
if (btrfs_item_end_nr(leaf, slot) != item_end_expected) {
-   CORRUPT("slot offset bad", leaf, root, slot);
+   generic_err(root, leaf, slot,
+   "uncontinuous item end, have %u expect %u",
+   btrfs_item_end_nr(leaf, slot),
+   item_end_expected);
return -EUCLEAN;
}
 
@@ -274,14 +281,21 @@ int btrfs_check_leaf(struct btrfs_root *root, struct 
extent_buffer *leaf)
 */
if (btrfs_item_end_nr(leaf, slot) >
BTRFS_LEAF_DATA_SIZE(fs_info)) {
-   CORRUPT("slot end outside of leaf", leaf, root, slot);
+   generic_err(root, leaf, slot,
+   "slot end outside of leaf, have %u expect range 
[0, %u]",
+   btrfs_item_end_nr(leaf, slot),
+   BTRFS_LEAF_DATA_SIZE(fs_info));
return -EUCLEAN;
}
 
/* Also check if the item pointer overlaps with btrfs item. */
if (btrfs_item_nr_offset(slot) + sizeof(struct btrfs_item) >
btrfs_item_ptr_offset(leaf, slot)) {
-   CORRUPT("slot overlap with its data", leaf, root, slot);
+   generic_err(root, leaf, slot,
+   "slot overlap with its data, item end %lu data 
start %lu",
+   btrfs_item_nr_offset(slot) +
+   sizeof(struct btrfs_item),
+   btrfs_item_ptr_offset(leaf, slot));
return -EUCLEAN;
}
 
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] btrfs-progs: Move leaf and node validation checker to tree-checker.c

2017-09-27 Thread Qu Wenruo
It's no doubt the comprehensive tree block checker will become larger
and larger, so move them into their own file is quite reasonable.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/Makefile   |   2 +-
 fs/btrfs/ctree.h|   4 +
 fs/btrfs/disk-io.c  | 284 +---
 fs/btrfs/tree-checker.c | 309 
 4 files changed, 317 insertions(+), 282 deletions(-)
 create mode 100644 fs/btrfs/tree-checker.c

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 962a95aefb81..88255e133ade 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o 
root-tree.o dir-item.o \
   export.o tree-log.o free-space-cache.o zlib.o lzo.o zstd.o \
   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
-  uuid-tree.o props.o hash.o free-space-tree.o
+  uuid-tree.o props.o hash.o free-space-tree.o tree-checker.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ea9c5648ff70..6b7c6fcbc5d5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3732,4 +3732,8 @@ static inline int btrfs_is_testing(struct btrfs_fs_info 
*fs_info)
 #endif
return 0;
 }
+
+/* Tree block validation checker */
+int btrfs_check_leaf(struct btrfs_root *root, struct extent_buffer *leaf);
+int btrfs_check_node(struct btrfs_root *root, struct extent_buffer *node);
 #endif
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c8633f2abdf1..57a9055655d3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -543,284 +543,6 @@ static int check_tree_block_fsid(struct btrfs_fs_info 
*fs_info,
return ret;
 }
 
-#define CORRUPT(reason, eb, root, slot)
\
-   btrfs_crit(root->fs_info,   \
-  "corrupt %s, %s: block=%llu, root=%llu, slot=%d",\
-  btrfs_header_level(eb) == 0 ? "leaf" : "node",   \
-  reason, btrfs_header_bytenr(eb), root->objectid, slot)
-
-static int check_extent_data_item(struct btrfs_root *root,
- struct extent_buffer *leaf,
- struct btrfs_key *key, int slot)
-{
-   struct btrfs_file_extent_item *fi;
-   u32 sectorsize = root->fs_info->sectorsize;
-   u32 item_size = btrfs_item_size_nr(leaf, slot);
-
-   if (!IS_ALIGNED(key->offset, sectorsize)) {
-   CORRUPT("unaligned key offset for file extent",
-   leaf, root, slot);
-   return -EUCLEAN;
-   }
-
-   fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
-
-   if (btrfs_file_extent_type(leaf, fi) > BTRFS_FILE_EXTENT_TYPES) {
-   CORRUPT("invalid file extent type", leaf, root, slot);
-   return -EUCLEAN;
-   }
-
-   /*
-* Support for new compression/encrption must introduce incompat flag,
-* and must be caught in open_ctree().
-*/
-   if (btrfs_file_extent_compression(leaf, fi) > BTRFS_COMPRESS_TYPES) {
-   CORRUPT("invalid file extent compression", leaf, root, slot);
-   return -EUCLEAN;
-   }
-   if (btrfs_file_extent_encryption(leaf, fi)) {
-   CORRUPT("invalid file extent encryption", leaf, root, slot);
-   return -EUCLEAN;
-   }
-   if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
-   /* Inline extent must have 0 as key offset */
-   if (key->offset) {
-   CORRUPT("inline extent has non-zero key offset",
-   leaf, root, slot);
-   return -EUCLEAN;
-   }
-
-   /* Compressed inline extent has no on-disk size, skip it */
-   if (btrfs_file_extent_compression(leaf, fi) !=
-   BTRFS_COMPRESS_NONE)
-   return 0;
-
-   /* Uncompressed inline extent size must match item size */
-   if (item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START +
-   btrfs_file_extent_ram_bytes(leaf, fi)) {
-   CORRUPT("plaintext inline extent has invalid size",
-   leaf, root, slot);
-   return -EUCLEAN;
-   }
-   return 0;
-   }
-
-   /* Regular or preallocated extent has fixed item size */
-   if (item_size != sizeof(*fi)) {
-   CORRUPT(
-   "regluar or preallocated extent data item size is invalid",
-   leaf, root, slot);
-   return -EUCLEAN;
-   }
-   if (!IS_ALIGNED(btrfs_file_extent_ram_bytes(leaf, fi), sectorsize) ||
-   

[PATCH 5/5] btrfs: tree-checker: Enhance output for check_extent_data_item

2017-09-27 Thread Qu Wenruo
Output the invalid member name and its bad value, along with its
expected value range or alignment.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/tree-checker.c | 92 +
 1 file changed, 70 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 52e9ab8c2a79..1324fcae93c0 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -63,6 +63,47 @@ static void generic_err(const struct btrfs_root *root,
va_end(args);
 }
 
+/*
+ * Customized reporter for extent data item, since its key objectid and
+ * offset has its own meaning.
+ */
+__printf(4, 5)
+static void file_extent_err(const struct btrfs_root *root,
+   const struct extent_buffer *eb,
+   int slot, const char *fmt, ...)
+{
+   struct btrfs_key key;
+   struct va_format vaf;
+   va_list args;
+
+   btrfs_item_key_to_cpu(eb, &key, slot);
+   va_start(args, fmt);
+
+   vaf.fmt = fmt;
+   vaf.va = &args;
+
+   btrfs_crit(root->fs_info,
+   "corrupt %s root=%llu tree_block=%llu slot=%d ino=%llu 
file_offset=%llu: %pV",
+   btrfs_header_level(eb) == 0 ? "leaf" : "node",
+   root->objectid, btrfs_header_bytenr(eb), slot,
+   key.objectid, key.offset, &vaf);
+   va_end(args);
+}
+
+/*
+ * Return 0 if the btrfs_file_extent_##name is aligned to @align
+ * Else return 1
+ */
+#define CHECK_FI_ALIGN(root, leaf, slot, fi, name, align)  \
+({ \
+   if (!IS_ALIGNED(btrfs_file_extent_##name(leaf, fi), align)) \
+   file_extent_err(root, leaf, slot,   \
+   "invalid file extent %s, have %llu, should be aligned 
to %u",\
+   #name, btrfs_file_extent_##name(leaf, fi),  \
+   align); \
+   (!IS_ALIGNED(btrfs_file_extent_##name(leaf, fi), align));   \
+})
+
 static int check_extent_data_item(struct btrfs_root *root,
  struct extent_buffer *leaf,
  struct btrfs_key *key, int slot)
@@ -72,15 +113,19 @@ static int check_extent_data_item(struct btrfs_root *root,
u32 item_size = btrfs_item_size_nr(leaf, slot);
 
if (!IS_ALIGNED(key->offset, sectorsize)) {
-   CORRUPT("unaligned key offset for file extent",
-   leaf, root, slot);
+   file_extent_err(root, leaf, slot,
+   "unaligned key offset, have %llu should be aligned to 
%u",
+   key->offset, sectorsize);
return -EUCLEAN;
}
 
fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
 
if (btrfs_file_extent_type(leaf, fi) > BTRFS_FILE_EXTENT_TYPES) {
-   CORRUPT("invalid file extent type", leaf, root, slot);
+   file_extent_err(root, leaf, slot,
+   "invalid file extent type, have %u expect range [0, 
%u]",
+   btrfs_file_extent_type(leaf, fi),
+   BTRFS_FILE_EXTENT_TYPES);
return -EUCLEAN;
}
 
@@ -89,18 +134,24 @@ static int check_extent_data_item(struct btrfs_root *root,
 * and must be caught in open_ctree().
 */
if (btrfs_file_extent_compression(leaf, fi) > BTRFS_COMPRESS_TYPES) {
-   CORRUPT("invalid file extent compression", leaf, root, slot);
+   file_extent_err(root, leaf, slot,
+   "invalid file extent compression, have %u expect range 
[0, %u]",
+   btrfs_file_extent_compression(leaf, fi),
+   BTRFS_COMPRESS_TYPES);
return -EUCLEAN;
}
if (btrfs_file_extent_encryption(leaf, fi)) {
-   CORRUPT("invalid file extent encryption", leaf, root, slot);
+   file_extent_err(root, leaf, slot,
+   "invalid file extent encryption, have %u expect 0",
+   btrfs_file_extent_encryption(leaf, fi));
return -EUCLEAN;
}
if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) {
/* Inline extent must have 0 as key offset */
if (key->offset) {
-   CORRUPT("inline extent has non-zero key offset",
-   leaf, root, slot);
+   file_extent_err(root, leaf, slot,
+   "invalid offset for inline extent, have %llu 
expect 0",
+   key->offset);
return -EUCLEAN;
}
 
@@ -112,8 +163,10 @@ static int check_extent_data_item(struct btrfs_root *root,
/* Uncompressed inline extent size must match item size */
if (item_size 

[PATCH 0/5] Enhance tree block validation checker

2017-09-27 Thread Qu Wenruo
The patchset can be fetched from github:
https://github.com/adam900710/linux/tree/checker_enhance

It's based on David's misc-next branch, with following commit as base:
a5e50b4b444c ("btrfs: Add checker for EXTENT_CSUM")

According to David's suggestion, enhance the output format of tree block
validation checker.

And move them into one separate file: tree-checker.c.

Some example output using btrfsck fsck-test images looks like:

For unagliend file extent member:
---
BTRFS critical (device loop0): corrupt leaf root=1 tree_block=29360128 slot=7 
ino=257 file_offset=0: invalid file extent disk_bytenr, have 755944791, should 
be aligned to 4096
BTRFS error (device loop0): open_ctree failed
---

For bad leaf holes:
---
BTRFS critical (device loop0): corrupt leaf root=1 block=29360128 slot=28: 
uncontinuous item end, have 9387 expect 15018
BTRFS error (device loop0): open_ctree failed
---


Qu Wenruo (5):
  btrfs-progs: Move leaf and node validation checker to tree-checker.c
  btrfs: tree-checker: Enhance btrfs_check_node output
  btrfs: tree-checker: Enhance output for btrfs_check_leaf
  btrfs: tree-checker: Enhance output for check_csum_item
  btrfs: tree-checker: Enhance output for check_extent_data_item

 fs/btrfs/Makefile   |   2 +-
 fs/btrfs/ctree.h|   4 +
 fs/btrfs/disk-io.c  | 284 +
 fs/btrfs/tree-checker.c | 417 
 4 files changed, 425 insertions(+), 282 deletions(-)
 create mode 100644 fs/btrfs/tree-checker.c

-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] btrfs: tree-checker: Enhance output for check_csum_item

2017-09-27 Thread Qu Wenruo
Output the bad value and expected good value (or its alignment).

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/tree-checker.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 4f847955fdc6..52e9ab8c2a79 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -147,15 +147,21 @@ static int check_csum_item(struct btrfs_root *root, 
struct extent_buffer *leaf,
u32 csumsize = btrfs_super_csum_size(root->fs_info->super_copy);
 
if (key->objectid != BTRFS_EXTENT_CSUM_OBJECTID) {
-   CORRUPT("invalid objectid for csum item", leaf, root, slot);
+   generic_err(root, leaf, slot,
+   "invalid objectid for CSUM_ITEM, have %llu expect %llu",
+   key->objectid, BTRFS_EXTENT_CSUM_OBJECTID);
return -EUCLEAN;
}
if (!IS_ALIGNED(key->offset, sectorsize)) {
-   CORRUPT("unaligned key offset for csum item", leaf, root, slot);
+   generic_err(root, leaf, slot,
+   "unaligned key offset for CSUM_ITEM, have %llu should 
be aligned to %u",
+   key->offset, sectorsize);
return -EUCLEAN;
}
if (!IS_ALIGNED(btrfs_item_size_nr(leaf, slot), csumsize)) {
-   CORRUPT("unaligned csum item size", leaf, root, slot);
+   generic_err(root, leaf, slot,
+   "unaligned item size for CSUM_ITEM, have %u should be 
aligned to %u",
+   btrfs_item_size_nr(leaf, slot), csumsize);
return -EUCLEAN;
}
return 0;
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] btrfs: tree-checker: Enhance btrfs_check_node output

2017-09-27 Thread Qu Wenruo
Use inline function to replace macro since we don't need
stringification.
(Macro still exist until all caller get updated)

And add more info about the error.

For nr_items error, report if it's too large or too small, and output
valid value range.

For blk pointer, added a new alignment checker.

For key order, also output the next key to make the problem more
obvious.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/tree-checker.c | 48 
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 301243a69dea..4b615beb0ca9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -37,6 +37,32 @@
   btrfs_header_level(eb) == 0 ? "leaf" : "node",   \
   reason, btrfs_header_bytenr(eb), root->objectid, slot)
 
+/*
+ * Append the generic "corrupt leaf/node root=%llu block=%llu slot=%d: " to
+ * @fmt.
+ * Allowing user to customize their output.
+ */
+__printf(4, 5)
+static void generic_err(const struct btrfs_root *root,
+   const struct extent_buffer *eb,
+   int slot, const char *fmt, ...)
+{
+   struct va_format vaf;
+   va_list args;
+
+   va_start(args, fmt);
+
+   vaf.fmt = fmt;
+   vaf.va = &args;
+
+   btrfs_crit(root->fs_info,
+   "corrupt %s root=%llu block=%llu slot=%d: %pV",
+   btrfs_header_level(eb) == 0 ? "leaf" : "node",
+   root->objectid, btrfs_header_bytenr(eb), slot,
+   &vaf);
+   va_end(args);
+}
+
 static int check_extent_data_item(struct btrfs_root *root,
  struct extent_buffer *leaf,
  struct btrfs_key *key, int slot)
@@ -282,8 +308,10 @@ int btrfs_check_node(struct btrfs_root *root, struct 
extent_buffer *node)
 
if (nr == 0 || nr > BTRFS_NODEPTRS_PER_BLOCK(root->fs_info)) {
btrfs_crit(root->fs_info,
-  "corrupt node: block %llu root %llu nritems %lu",
-  node->start, root->objectid, nr);
+   "corrupt node: root=%llu block=%llu nritems too %s, 
have %lu, range [1,%u]",
+  root->objectid, node->start,
+  nr == 0 ? "small" : "large", nr,
+  BTRFS_NODEPTRS_PER_BLOCK(root->fs_info));
return -EIO;
}
 
@@ -293,13 +321,25 @@ int btrfs_check_node(struct btrfs_root *root, struct 
extent_buffer *node)
btrfs_node_key_to_cpu(node, &next_key, slot + 1);
 
if (!bytenr) {
-   CORRUPT("invalid item slot", node, root, slot);
+   generic_err(root, node, slot,
+   "invalid node pointer, shouldn't point to zero");
ret = -EIO;
goto out;
}
+   if (!IS_ALIGNED(bytenr, root->fs_info->sectorsize)) {
+   generic_err(root, node, slot,
+   "unaligned pointer, have %llu, should be aligned to %u",
+bytenr, root->fs_info->sectorsize);
+   ret = -EUCLEAN;
+   goto out;
+   }
 
if (btrfs_comp_cpu_keys(&key, &next_key) >= 0) {
-   CORRUPT("bad key order", node, root, slot);
+   generic_err(root, node, slot,
+   "bad key order, current key (%llu %u %llu) next key 
(%llu %u %llu)",
+key.objectid, key.type, key.offset,
+next_key.objectid, next_key.type,
+next_key.offset);
ret = -EIO;
goto out;
}
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option

2017-09-27 Thread Qu Wenruo



On 2017年09月28日 00:20, David Sterba wrote:

On Mon, Sep 25, 2017 at 07:15:30AM -0400, Austin S. Hemmelgarn wrote:

On 2017-09-24 10:08, Goffredo Baroncelli wrote:

On 09/24/2017 12:10 PM, Anand Jain wrote:


A lot of points in this thread, let me address them here.

I don't want to remove --rootdir functionality, the fix that's being
proposed removes the minimalization -- feature that was not well known,
but I understand it's useful (and already used).

I'd like to fix that in another way, eg. as an option to mkfs or a
separate tool.

I'm not worried about adding more code or code complexity. If we do it
right it's not a problem in the long run. But people for some reason
like to delete other people's code or functionality.

Regarding guessing number of users, this is always hard. So if there's
one vocal enough to let us know about the usecase, it's IMHO good time
to explore the it, code-wise and documentation-wise, and fix it or
improve.

So, what next. I'd like to get rid of the custom chunk allocator, namely
because of the known 1MB area misuse and duplication.

Implementing the minimalization might need some preparatory work and I
don't have a realistic ETA now.


Well, if using over-reserve-then-shrink method, it could be done, 
without much hassle.


However ETA maybe delayed to middle of Oct, as I'm going to enjoy my 
holiday during 1st Oct to 7th Oct.


Thanks,
Qu


Ideally we'd fix both problems in one
version, as I'd rather avoid "temporary" solution to drop the
minimalization with the promise to put it back later.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cannot fix btrfs errors after system crash

2017-09-27 Thread Niccolò Belli

Hi,
I was trying to use AMDGPU-PRO's OpenCL stack (with the mainline 4.12.13 
kernel) while it suddently crashed the whole system, not even magic sysrq 
keys did work anymore.
With no surprise, at the next reboot I found several btrfs warnings (see 
https://paste.pound-python.org/show/S5zBG2tXZUTLG699saE5/).
Since btrfs scrub didn't find any error I decided to reboot into a live usb 
and start a btrfs check (I'm using btrfs-progs 4.13).
It did found lots of errors indeed (see 
https://paste.pound-python.org/show/IPxh9sly0EEb0MKPi2dw/).
So I made a full backup with dd and I started a btrfs check --repair (see 
https://paste.pound-python.org/show/c9AlT8ehKKJy6l5xhzXk/).

I also wiped the space cache with --clear-space-cache v1.
A subsequent btrfs check revealed it indeed fixed lots of errors (see 
https://paste.pound-python.org/show/1m2Wodd1q3n0eRlxLpZB/), but 
unfortunately i still have the following errors:


unresolved ref dir 7450239 index 2 namelen 6 name 431886 filetype 1 errors 
80, filetype mismatch
unresolved ref dir 7450595 index 2 namelen 6 name 431886 filetype 1 errors 
80, filetype mismatch
unresolved ref dir 7457122 index 2 namelen 6 name 431886 filetype 1 errors 
80, filetype mismatch


I'm already quite satisfied to be honest: two years ago repair used to eat 
my data, making things worse.
Anyway, why didn't btrfs-check repair them? Is there anything I can do to 
fix them?


Thanks,
Niccolò
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs check error and kernel stack traces

2017-09-27 Thread Johannes Ernst
A bunch of processes hung, so I rebooted. Upon reboot:

> btrfs check /dev/sdb
Checking filesystem on /dev/sdb
UUID: d53460f0-40b8-4e5c-8172-cb4b7cd26a28
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
checking quota groups
ERROR: out of memory
ERROR: Loading qgroups from disk: -2
ERROR: failed to check quota groups
found 237836521472 bytes used, error(s) found
total csum bytes: 230618852
total tree bytes: 1614397440
total fs tree bytes: 1274036224
total extent tree bytes: 60456960
btree space waste bytes: 246521070
file data blocks allocated: 290520551424
 referenced 235795329024
extent buffer leak: start 37634048 len 16384
extent buffer leak: start 590715289600 len 16384

(—mode lowmem produces the exact same result, including out of memory. There 
are no quotas on that filesystem)

> btrfs fi show /build
Label: none  uuid: d53460f0-40b8-4e5c-8172-cb4b7cd26a28
Total devices 2 FS bytes used 221.50GiB
devid1 size 298.09GiB used 265.03GiB path /dev/sdb
devid2 size 298.09GiB used 266.00GiB path /dev/sdc

> btrfs fi df /build
Data, RAID1: total=262.00GiB, used=220.00GiB
System, DUP: total=8.00MiB, used=64.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=3.50GiB, used=1.50GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=324.14MiB, used=0.00B

There are also a number of entries in the systemd journal that seem related. 
This is the first one I can find:

Sep 27 19:15:59 ubos-pc kernel: INFO: task btrfs-cleaner:364 blocked for more 
than 120 seconds.
Sep 27 19:15:59 ubos-pc kernel:   Tainted: G   O4.12.13-1-ARCH 
#1
Sep 27 19:15:59 ubos-pc kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 27 19:15:59 ubos-pc kernel: btrfs-cleaner   D0   364  2 0x
Sep 27 19:15:59 ubos-pc kernel: Call Trace:
Sep 27 19:15:59 ubos-pc kernel:  __schedule+0x236/0x870
Sep 27 19:15:59 ubos-pc kernel:  schedule+0x3d/0x90
Sep 27 19:15:59 ubos-pc kernel:  btrfs_tree_read_lock+0xbc/0x110 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  ? wake_bit_function+0x60/0x60
Sep 27 19:15:59 ubos-pc kernel:  __add_missing_keys+0xbf/0x130 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  ? free_extent_buffer+0x4b/0xa0 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  find_parent_nodes+0x358/0x920 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  ? find_parent_nodes+0x36f/0x920 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  __btrfs_find_all_roots+0xa4/0x110 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  btrfs_find_all_roots+0x55/0x70 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  btrfs_qgroup_trace_extent+0x126/0x150 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  btrfs_qgroup_trace_leaf_items+0x116/0x150 
[btrfs]
Sep 27 19:15:59 ubos-pc kernel:  btrfs_qgroup_trace_subtree+0x203/0x390 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  do_walk_down+0x463/0x590 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  walk_down_tree+0xca/0x100 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  btrfs_drop_snapshot+0x389/0x8c0 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  ? btrfs_delete_unused_bgs+0x25f/0x3b0 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  btrfs_clean_one_deleted_snapshot+0xb5/0x100 
[btrfs]
Sep 27 19:15:59 ubos-pc kernel:  cleaner_kthread+0x14a/0x180 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  kthread+0x125/0x140
Sep 27 19:15:59 ubos-pc kernel:  ? __btree_submit_bio_start+0x20/0x20 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  ? kthread_create_on_node+0x70/0x70
Sep 27 19:15:59 ubos-pc kernel:  ? SyS_exit_group+0x14/0x20
Sep 27 19:15:59 ubos-pc kernel:  ret_from_fork+0x25/0x30


Sep 27 19:15:59 ubos-pc kernel: INFO: task btrfs-transacti:365 blocked for more 
than 120 seconds.
Sep 27 19:15:59 ubos-pc kernel:   Tainted: G   O4.12.13-1-ARCH 
#1
Sep 27 19:15:59 ubos-pc kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 27 19:15:59 ubos-pc kernel: btrfs-transacti D0   365  2 0x
Sep 27 19:15:59 ubos-pc kernel: Call Trace:
Sep 27 19:15:59 ubos-pc kernel:  __schedule+0x236/0x870
Sep 27 19:15:59 ubos-pc kernel:  schedule+0x3d/0x90
Sep 27 19:15:59 ubos-pc kernel:  btrfs_commit_transaction+0x818/0x900 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  ? wake_bit_function+0x60/0x60
Sep 27 19:15:59 ubos-pc kernel:  transaction_kthread+0x193/0x1c0 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  kthread+0x125/0x140
Sep 27 19:15:59 ubos-pc kernel:  ? btrfs_cleanup_transaction+0x500/0x500 [btrfs]
Sep 27 19:15:59 ubos-pc kernel:  ? kthread_create_on_node+0x70/0x70
Sep 27 19:15:59 ubos-pc kernel:  ? SyS_exit_group+0x14/0x20
Sep 27 19:15:59 ubos-pc kernel:  ret_from_fork+0x25/0x30

Sep 27 19:15:59 ubos-pc kernel: INFO: task systemd-journal:30360 blocked for 
more than 120 seconds.
Sep 27 19:15:59 ubos-pc kernel:   Tainted: G   O4.12.13-1-ARCH 
#1
Sep 27 19:15:59 ubos-pc kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 27 19:15:59 ubos-pc kernel: systemd-journal D0 30360  3034

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-27 Thread Marc MERLIN
On Sun, Sep 10, 2017 at 05:22:14PM -0700, Marc MERLIN wrote:
> On Sun, Sep 10, 2017 at 01:16:26PM +, Josef Bacik wrote:
> > Great, if the free space cache is fucked again after the next go
> > around then I need to expand the verifier to watch entries being added
> > to the cache as well.  Thanks,
> 
> Well, I copied about 1TB of data, and nothing happened.
> So it seems clearing it and fsck may have fixed this fault I had been
> carrying for quite a while.
> If so, yeah!
> 
> I'm not sure if this needs a kernel fix to not get triggered and if
> btrfs check should also be improved to catch this, but hopefully you
> know what makes sense there.

Just to report back, it's now been another 2 weeks, and no problem.
Seems that forcing the clear cache was actually the issue. Not sure if
the kernel should have found/detected/auto fixed the problem or if btrfs
check should have.

Either way, thanks for your help.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] fstests: btrfs/150 regression test for reading compressed data

2017-09-27 Thread Liu Bo
We had a bug in btrfs compression code which could end up with a
kernel panic.

This is adding a regression test for the bug and I've also sent a
kernel patch to fix the bug.

The patch is "Btrfs: fix kernel oops while reading compressed data".

Signed-off-by: Liu Bo 
---
v3: - Enable task-filter in debugfs

v2: - Fix ambiguous copyright.
- Use /proc/$pid/make-it-fail to specify IO failure
- Use bash -c to run test only when pid is odd.
- Add test to dangerous group.

 tests/btrfs/150 | 105 
 tests/btrfs/150.out |   3 ++
 tests/btrfs/group   |   1 +
 3 files changed, 109 insertions(+)
 create mode 100755 tests/btrfs/150
 create mode 100644 tests/btrfs/150.out

diff --git a/tests/btrfs/150 b/tests/btrfs/150
new file mode 100755
index 000..97041b6
--- /dev/null
+++ b/tests/btrfs/150
@@ -0,0 +1,105 @@
+#! /bin/bash
+# FS QA Test btrfs/150
+#
+# This is a regression test which ends up with a kernel oops in btrfs.
+# It occurs when btrfs's read repair happens while reading a compressed
+# extent.
+# The patch to fix it is
+#  Btrfs: fix kernel oops while reading compressed data
+#
+#---
+# Copyright (c) 2017 Oracle.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fail_make_request
+_require_scratch_dev_pool 2
+
+SYSFS_BDEV=`_sysfs_dev $SCRATCH_DEV`
+enable_io_failure()
+{
+   echo 100 > $DEBUGFS_MNT/fail_make_request/probability
+   echo 1000 > $DEBUGFS_MNT/fail_make_request/times
+   echo 0 > $DEBUGFS_MNT/fail_make_request/verbose
+   echo 1 > $DEBUGFS_MNT/fail_make_request/task-filter
+   echo 1 > $SYSFS_BDEV/make-it-fail
+}
+
+disable_io_failure()
+{
+   echo 0 > $DEBUGFS_MNT/fail_make_request/probability
+   echo 0 > $DEBUGFS_MNT/fail_make_request/times
+   echo 0 > $DEBUGFS_MNT/fail_make_request/task-filter
+   echo 0 > $SYSFS_BDEV/make-it-fail
+}
+
+_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1
+
+# It doesn't matter which compression algorithm we use.
+_scratch_mount -ocompress
+
+# Create a file with all data being compressed
+$XFS_IO_PROG -f -c "pwrite -W 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
+
+# Raid1 consists of two copies and btrfs decides which copy to read by reader's
+# %pid.  Now we inject errors to copy #1 and copy #0 is good.  We want to read
+# the bad copy to trigger read-repair.
+while [[ -z $result ]]; do
+   # invalidate the page cache
+   $XFS_IO_PROG -f -c "fadvise -d 0 8K" $SCRATCH_MNT/foobar
+
+   enable_io_failure
+
+   result=$(bash -c "
+   if [ \$((\$\$ % 2)) == 1 ]; then
+   echo 1 > /proc/\$\$/make-it-fail
+   exec $XFS_IO_PROG -c \"pread 0 8K\" \$SCRATCH_MNT/foobar
+   fi")
+
+   disable_io_failure
+done
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/150.out b/tests/btrfs/150.out
new file mode 100644
index 000..c492c24
--- /dev/null
+++ b/tests/btrfs/150.out
@@ -0,0 +1,3 @@
+QA output created by 150
+wrote 8192/8192 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 70c3f05..e73bb1b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -152,3 +152,4 @@
 147 auto quick send
 148 auto quick rw
 149 auto quick send compress
+150 auto quick dangerous
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: btrfs/150 regression test for reading compressed data

2017-09-27 Thread Liu Bo
On Wed, Sep 27, 2017 at 05:46:44PM +0800, Eryu Guan wrote:
> On Tue, Sep 26, 2017 at 05:18:51PM -0700, Liu Bo wrote:
> > On Tue, Sep 26, 2017 at 04:37:52PM -0700, Liu Bo wrote:
> > > On Tue, Sep 26, 2017 at 05:02:36PM +0800, Eryu Guan wrote:
> > > > On Fri, Sep 22, 2017 at 05:21:27PM -0600, Liu Bo wrote:
> > > > > We had a bug in btrfs compression code which could end up with a
> > > > > kernel panic.
> > > > > 
> > > > > This is adding a regression test for the bug and I've also sent a
> > > > > kernel patch to fix the bug.
> > > > > 
> > > > > The patch is "Btrfs: fix kernel oops while reading compressed data".
> > > > > 
> > > > > Signed-off-by: Liu Bo 
> > > > 
> > > > Hmm, I can't reproduce the panic with 4.13 kernel, which doesn't have
> > > > the fix applied. Can you please help confirm if it panics on your test
> > > > environment?
> > > >
> > > 
> > > Yes, it is reproducible on my box, hrm...I'll be running it more times
> > > to double check.
> > > 
> > 
> > It worked for me...both v4.13 and v4.14.0-rc2 have the following
> > messages[1].
> > 
> > This requires two config:
> > CONFIG_FAULT_INJECTION=y
> > CONFIG_FAULT_INJECTION_DEBUG_FS=y
> > 
> > Could you please check again?
> 
> I re-compiled 4.14-rc2 kernel on my test vm with FAIL_MAKE_REQUEST
> enabled (which requires FAULT_INJECTION), and I can reproduce the crash
> now. It was so weired that previously I did have FAIL_MAKE_REQUEST
> enabled and test ran normally without hitting the bug, but now I can hit
> the bug quite reliably. Not sure what was happning in my previous test..
> 
> Thanks for confirming!

No problem at all, then I'll send a patch v3 with enabling task-filter
pointed out by Lu.

thanks,
-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option

2017-09-27 Thread David Sterba
On Mon, Sep 25, 2017 at 07:15:30AM -0400, Austin S. Hemmelgarn wrote:
> On 2017-09-24 10:08, Goffredo Baroncelli wrote:
> > On 09/24/2017 12:10 PM, Anand Jain wrote:

A lot of points in this thread, let me address them here.

I don't want to remove --rootdir functionality, the fix that's being
proposed removes the minimalization -- feature that was not well known,
but I understand it's useful (and already used).

I'd like to fix that in another way, eg. as an option to mkfs or a
separate tool.

I'm not worried about adding more code or code complexity. If we do it
right it's not a problem in the long run. But people for some reason
like to delete other people's code or functionality.

Regarding guessing number of users, this is always hard. So if there's
one vocal enough to let us know about the usecase, it's IMHO good time
to explore the it, code-wise and documentation-wise, and fix it or
improve.

So, what next. I'd like to get rid of the custom chunk allocator, namely
because of the known 1MB area misuse and duplication.

Implementing the minimalization might need some preparatory work and I
don't have a realistic ETA now. Ideally we'd fix both problems in one
version, as I'd rather avoid "temporary" solution to drop the
minimalization with the promise to put it back later.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/15] btrfs: Use pagevec_lookup_range_tag()

2017-09-27 Thread Jan Kara
We want only pages from given range in btree_write_cache_pages() and
extent_write_cache_pages(). Use pagevec_lookup_range_tag() instead of
pagevec_lookup_tag() and remove unnecessary code.

CC: linux-btrfs@vger.kernel.org
CC: David Sterba 
Reviewed-by: David Sterba 
Signed-off-by: Jan Kara 
---
 fs/btrfs/extent_io.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0f077c5db58e..9b7936ea3a88 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3819,8 +3819,8 @@ int btree_write_cache_pages(struct address_space *mapping,
if (wbc->sync_mode == WB_SYNC_ALL)
tag_pages_for_writeback(mapping, index, end);
while (!done && !nr_to_write_done && (index <= end) &&
-  (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
-   min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) {
+  (nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
+   tag, PAGEVEC_SIZE))) {
unsigned i;
 
scanned = 1;
@@ -3830,11 +3830,6 @@ int btree_write_cache_pages(struct address_space 
*mapping,
if (!PagePrivate(page))
continue;
 
-   if (!wbc->range_cyclic && page->index > end) {
-   done = 1;
-   break;
-   }
-
spin_lock(&mapping->private_lock);
if (!PagePrivate(page)) {
spin_unlock(&mapping->private_lock);
@@ -3966,8 +3961,8 @@ static int extent_write_cache_pages(struct address_space 
*mapping,
tag_pages_for_writeback(mapping, index, end);
done_index = index;
while (!done && !nr_to_write_done && (index <= end) &&
-  (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
-   min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) {
+  (nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
+   tag, PAGEVEC_SIZE))) {
unsigned i;
 
scanned = 1;
@@ -3992,12 +3987,6 @@ static int extent_write_cache_pages(struct address_space 
*mapping,
continue;
}
 
-   if (!wbc->range_cyclic && page->index > end) {
-   done = 1;
-   unlock_page(page);
-   continue;
-   }
-
if (wbc->sync_mode != WB_SYNC_NONE) {
if (PageWriteback(page))
flush_fn(data);
-- 
2.12.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/5] btrfs-progs: subvol: fix del --commit-after

2017-09-27 Thread David Sterba
On Wed, Sep 27, 2017 at 11:00:35AM +0900, Misono, Tomohiro wrote:
> Fix subvol del --commit-after to work properly:
>  - SYNC ioctl will be issued even when last delete is failed
>  - SYNC ioctl will be issued on each file system only once in the end
> 
> To achieve this, each deleted subvol's (parent's) fsid is checked each
> time. If the fsid is seen for the first time, its fd will be kept in order
> to issue SYNC ioctl in the end.
> 
> There already exists get_fsid() in cmds-property.c and seen_fsid which
> keeps fsid in hush function in cmds-filesystem.c. This patch utilizes
> them.
> 
> First patch is the independent but critical. Current code is reversed in
> --commit-after and --commit-each operation. i.e. --commit-after means
> --commit-each actually. The patch fix this.
> 
> 2rd to 4th patches make functions stated above to common and last one is
> the main part.

Thanks for the fix, I'm glad we did not have to remove the option.
Patches applied with some adjustments. Please became more familiar with
the coding style. I'm usually fixing all patches here and there, but
once the amount grows I'd rather spend time on more useful things.

Bothering with coding style is annoying but the point is that the
committed code looks exactly the same no matter when or who authored it.

> Thanks Qu for reviewing whole patches.

Appreciated.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/5] btrfs-progs: subvol: fix subvol del --commit-after

2017-09-27 Thread David Sterba
On Wed, Sep 27, 2017 at 11:03:48AM +0900, Misono, Tomohiro wrote:
> Fix subvol del --commit-after to work properly:
>  - SYNC ioctl will be issued even when last delete is failed
>  - SYNC ioctl will be issued on each file system only once in the end
> 
> To achieve this, get_fsid() and add_seen_fsid() is called after each delete
> to keep only one fd for each fs.
> 
> In the end, seen_fsid_hash will be traversed and SYNC is issued on each fs.
> 
> Signed-off-by: Tomohiro Misono 
> Reviewed-by: Qu Wenruo 

Review comments from me in a form of diff, coding style issues:

diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 00931a55807e..69c50387a984 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -265,7 +265,7 @@ static int cmd_subvol_delete(int argc, char **argv)
int commit_mode = 0;
u8 fsid[BTRFS_FSID_SIZE];
char uuidbuf[BTRFS_UUID_UNPARSED_SIZE];
-   struct seen_fsid *seen_fsid_hash[SEEN_FSID_HASH_SIZE] = {NULL,};
+   struct seen_fsid *seen_fsid_hash[SEEN_FSID_HASH_SIZE] = { NULL, };
enum { COMMIT_AFTER = 1, COMMIT_EACH = 2 };
 
while (1) {
@@ -364,8 +364,10 @@ static int cmd_subvol_delete(int argc, char **argv)
} else if (commit_mode == COMMIT_AFTER) {
res = get_fsid(dname, fsid, 0);
if (res < 0) {
-   error("unable to get fsid for '%s': %s", path, 
strerror(-res));
-   error("delete suceeded but commit may not be done in 
the end");
+   error("unable to get fsid for '%s': %s",
+   path, strerror(-res));
+   error(
+   "delete suceeded but commit may not be done in the 
end");
ret = 1;
goto out;
}
@@ -373,10 +375,14 @@ static int cmd_subvol_delete(int argc, char **argv)
if (add_seen_fsid(fsid, seen_fsid_hash, fd, dirstream) == 0) {
if (verbose > 0) {
uuid_unparse(fsid, uuidbuf);
-   printf("  new fs is found for '%s', fsid: 
%s\n", path, uuidbuf);
+   printf("  new fs is found for '%s', fsid: %s\n",
+   path, uuidbuf);
}
-   // this is the first time a subvolume on this 
filesystem is deleted
-   // keep fd in order to issue SYNC ioctl in the end
+   /*
+* This is the first time a subvolume on this
+* filesystem is deleted, keep fd in order to issue
+* SYNC ioctl in the end
+*/
goto keep_fd;
}
}
@@ -395,27 +401,32 @@ static int cmd_subvol_delete(int argc, char **argv)
goto again;
 
if (commit_mode == COMMIT_AFTER) {
-   // traverse seen_fsid_hash and issue SYNC ioctl on each 
filesystem
int slot;
-   struct seen_fsid *seen;
 
+   /*
+* Traverse seen_fsid_hash and issue SYNC ioctl on each
+* filesystem
+*/
for (slot = 0; slot < SEEN_FSID_HASH_SIZE; slot++) {
-   seen = seen_fsid_hash[slot];
+   struct seen_fsid *seen = seen_fsid_hash[slot];
+
while (seen) {
res = wait_for_commit(seen->fd);
if (res < 0) {
uuid_unparse(seen->fsid, uuidbuf);
-   error("unable to do final sync after 
deletion: %s, fsid: %s",
+   error(
+   "unable to do final sync after deletion: %s, fsid: %s",
strerror(errno), uuidbuf);
ret = 1;
} else if (verbose > 0) {
uuid_unparse(seen->fsid, uuidbuf);
-   printf("final sync is done for fsid: 
%s\n", uuidbuf);
+   printf("final sync is done for fsid: 
%s\n",
+   uuidbuf);
}
seen = seen->next;
}
}
-   // fd will also be closed in free_seen_fsid
+   /* fd will also be closed in free_seen_fsid */
free_seen_fsid(seen_fsid_hash);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][v2] btrfs: fix send ioctl on 32bit with 64bit kernel

2017-09-27 Thread Josef Bacik
We pass in a pointer in our send arg struct, this means the struct size doesn't
match with 32bit user space and 64bit kernel space.  Fix this by adding a compat
mode and doing the appropriate conversion.

Signed-off-by: Josef Bacik 
---
- move the args_32 out of the uapi header, they don't need it.
- fix compile problems with CONFIG_32BIT and !CONFIG_COMPAT

 fs/btrfs/ioctl.c | 53 -
 fs/btrfs/send.c  | 12 ++--
 fs/btrfs/send.h  |  2 +-
 3 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 050d2d9c5533..75a686a886bc 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -5489,6 +5489,53 @@ static int btrfs_ioctl_set_features(struct file *file, 
void __user *arg)
return ret;
 }
 
+#if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT)
+struct btrfs_ioctl_send_args_32 {
+   __s64 send_fd;  /* in */
+   __u64 clone_sources_count;  /* in */
+   compat_uptr_t clone_sources;/* in */
+   __u64 parent_root;  /* in */
+   __u64 flags;/* in */
+   __u64 reserved[4];  /* in */
+} __attribute__ ((__packed__));
+#define BTRFS_IOC_SEND_32 _IOW(BTRFS_IOCTL_MAGIC, 38, \
+  struct btrfs_ioctl_send_args_32)
+#endif
+
+static int _btrfs_ioctl_send(struct file *file, void __user *argp, bool compat)
+{
+   struct btrfs_ioctl_send_args *arg;
+   int ret;
+
+   if (compat) {
+#if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT)
+   struct btrfs_ioctl_send_args_32 args32;
+   ret = copy_from_user(&args32, argp, sizeof(args32));
+   if (ret)
+   return -EFAULT;
+   arg = kzalloc(sizeof(*arg), GFP_KERNEL);
+   if (!arg)
+   return -ENOMEM;
+   arg->send_fd = args32.send_fd;
+   arg->clone_sources_count = args32.clone_sources_count;
+   arg->clone_sources = compat_ptr(args32.clone_sources);
+   arg->parent_root = args32.parent_root;
+   arg->flags = args32.flags;
+   memcpy(arg->reserved, args32.reserved,
+  sizeof(args32.reserved));
+#else
+   return -ENOTTY;
+#endif
+   } else {
+   arg = memdup_user(argp, sizeof(*arg));
+   if (IS_ERR(arg))
+   return PTR_ERR(arg);
+   }
+   ret = btrfs_ioctl_send(file, arg);
+   kfree(arg);
+   return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -5594,7 +5641,11 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_set_received_subvol_32(file, argp);
 #endif
case BTRFS_IOC_SEND:
-   return btrfs_ioctl_send(file, argp);
+   return _btrfs_ioctl_send(file, argp, false);
+#if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT)
+   case BTRFS_IOC_SEND_32:
+   return _btrfs_ioctl_send(file, argp, true);
+#endif
case BTRFS_IOC_GET_DEV_STATS:
return btrfs_ioctl_get_dev_stats(fs_info, argp);
case BTRFS_IOC_QUOTA_CTL:
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 32b043ef8ac9..d5019035ad06 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "send.h"
 #include "backref.h"
@@ -6365,13 +6366,12 @@ static void btrfs_root_dec_send_in_progress(struct 
btrfs_root* root)
spin_unlock(&root->root_item_lock);
 }
 
-long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
+long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
 {
int ret = 0;
struct btrfs_root *send_root = BTRFS_I(file_inode(mnt_file))->root;
struct btrfs_fs_info *fs_info = send_root->fs_info;
struct btrfs_root *clone_root;
-   struct btrfs_ioctl_send_args *arg = NULL;
struct btrfs_key key;
struct send_ctx *sctx = NULL;
u32 i;
@@ -6407,13 +6407,6 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user 
*arg_)
goto out;
}
 
-   arg = memdup_user(arg_, sizeof(*arg));
-   if (IS_ERR(arg)) {
-   ret = PTR_ERR(arg);
-   arg = NULL;
-   goto out;
-   }
-
/*
 * Check that we don't overflow at later allocations, we request
 * clone_sources_count + 1 items, and compare to unsigned long inside
@@ -6654,7 +6647,6 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user 
*arg_)
if (sctx && !IS_ERR_OR_NULL(sctx->parent_root))
btrfs_root_dec_send_in_progress(sctx->parent_root);
 
-   kfree(arg);
kvfree(clone_sources_tmp);
 
if (sctx) {
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index 02e00166c4da..3aa4bc55754f 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -

Re: [PATCH] btrfs: fix send ioctl on 32bit with 64bit kernel

2017-09-27 Thread Josef Bacik
On Wed, Sep 27, 2017 at 12:52:50PM +0200, David Sterba wrote:
> On Tue, Sep 26, 2017 at 04:40:09PM -0400, Josef Bacik wrote:
> > We pass in a pointer in our send arg struct, this means the struct size 
> > doesn't
> > match with 32bit user space and 64bit kernel space.  Fix this by adding a 
> > compat
> > mode and doing the appropriate conversion.
> 
> How does this differ from the existing workaround that adds the 32bit
> structure (btrfs_ioctl_timespec_32 and
> btrfs_ioctl_received_subvol_args_32 at the top of ioctl.c) and the
> btrfs_compat_ioctl() (at the end of ioctl.c)?

This is different because it's for SEND not RECIEVED_SUBVOL.  We have a pointer
in the send args struct which is differently sized on 64bit from 32bit, so we
need the compat struct so it's the right size when it comes into the kernel, and
then we translate from there.  Without this send is broken with 32bit userspace
and a 64bit kernel.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: Abort transaction if btrfs_update_root fails in btrfs_ioctl_subvol_setflags

2017-09-27 Thread Nikolay Borisov


On 27.09.2017 17:00, David Sterba wrote:
> On Wed, Sep 27, 2017 at 11:48:59AM +0300, Nikolay Borisov wrote:
>> On 26.09.2017 20:39, David Sterba wrote:
>>> On Tue, Sep 26, 2017 at 05:27:41PM +0300, Nikolay Borisov wrote:
 btrfs_update_root can fail for any number of reasons, however the only 
 error
 handling we do is to revert the modified flags, yet we do not abort the
 transaction but proceed to commit it.
>>>
>>> AFAICS btrfs_update_root aborts the transaction itself, so it's not
>>> needed in the caller.
>>>
 Fix this by explicitly checking if the
 update root routine has failed and abort the transaction.

 Fixes: 0caa102da827 ("Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS 
 ioctls")
 Signed-off-by: Nikolay Borisov 
 ---
  fs/btrfs/ioctl.c | 3 +++
  1 file changed, 3 insertions(+)

 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index d6715c2bcdc4..09fcd51f0e8c 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -1842,8 +1842,11 @@ static noinline int 
 btrfs_ioctl_subvol_setflags(struct file *file,
  
ret = btrfs_update_root(trans, fs_info->tree_root,
&root->root_key, &root->root_item);
 +  if (ret < 0)
 +  btrfs_abort_transaction(trans, ret);
  
btrfs_commit_transaction(trans);
>>>
>>> I think the error from commit_transaction should be returned as well if
>>> we get here.
>>>
>>> So:
>>>
>>> ret = btrfs_update_root();
>>>
>>> if (ret < 0) {
>>> end_transaction();
>>> goto out_reset;
>>> }
>>>
>>> ret = btrfs_commit_transaction();
>>>
>>> out_reset:
>>> /* and then as it is */
>>> if (ret)
>>> btrfs_set_root_flags(...);
>>>
>>> etc.
>>
>> I think even this is not needed, since btrfs_commit_transaction actually
>> handles aborted transaction by calling btrfs_end_transaction and
>> returning the error with which the trans was aborted. So I will just
>> drop this patch
> 
> From API point of view, I'd like to see some pattern, where we don't
> call commit after transaction abort IF we have enough information from
> the context. Ie. the abort and commit would be in the same function.
> Commit has to deal with an aborted transaction, that's right, but kind
> of an implementation detail and last resort resolution.

Fair point, I will respin the patch as well as some of my other
transaction error handling patches.


> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/2] fix bug in btrfs_init_new_device()

2017-09-27 Thread Nikolay Borisov


On 27.09.2017 17:22, David Sterba wrote:
> On Tue, Sep 26, 2017 at 03:14:27PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 26.09.2017 11:47, Anand Jain wrote:
>>> 1/2 fixes a bug which failed to reset writable when sprouting failed
>>> 2/2 fixes BUG_ON in btrfs_init_new_device()
>>>
>>> Anand Jain (2):
>>>   btrfs: undo writable when sprouting fails
>>>   btrfs: fix BUG_ON in btrfs_init_new_device()
>>
>> Reviewed-by: Nikolay Borisov 
> 
> Please note that this would lead to unexpected unlocks of uuid_mutex and
> sb::s_umount:
> 
> 2465 ret = btrfs_commit_transaction(trans);
> 2466
> 2467 if (seeding_dev) {
> 2468 mutex_unlock(&uuid_mutex);
> 2469 up_write(&sb->s_umount);
> 
> first unlocks
> 
> 2470
> 2471 if (ret) /* transaction commit */
> 2472 return ret;
> 2473
> 2474 ret = btrfs_relocate_sys_chunks(fs_info);
> 2475 if (ret < 0)
> 2476 btrfs_handle_fs_error(fs_info, ret,
> 2477 "Failed to relocate sys chunks after 
> device initialization. This can be fixed using the \"btrfs balance\"
> 2478 trans = btrfs_attach_transaction(root);
> 2479 if (IS_ERR(trans)) {
> 2480 if (PTR_ERR(trans) == -ENOENT)
> 2481 return 0;
> 2482 return PTR_ERR(trans);
> 
> 
> this becomes goto 2494
> 
> 2483 }
> 2484 ret = btrfs_commit_transaction(trans);
> 2485 }
> 2486
> 2487 /* Update ctime/mtime for libblkid */
> 2488 update_dev_time(device_path);
> 2489 return ret;
> 2490
> 2491 error_trans:
> 2492 btrfs_end_transaction(trans);
> 2493 rcu_string_free(device->name);
> 2494 btrfs_sysfs_rm_device_link(fs_info->fs_devices, device);
> 
> here
> 
> 2495 kfree(device);
> 2496 error:
> 2497 blkdev_put(bdev, FMODE_EXCL);
> 2498 if (seeding_dev) {
> 2499 mutex_unlock(&uuid_mutex);
> 2500 up_write(&sb->s_umount);
> 
> and unlocking again
> 
> 2501 }
> 2502 return ret;
> 
> So the in-place returns had a meaning but are quite confusing.

I concur and had missed that ;(. I saw that you suggested Anand to
overhaul the overall error handling in this function and I believe this
would be the best course of action.

> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/2] fix bug in btrfs_init_new_device()

2017-09-27 Thread David Sterba
On Tue, Sep 26, 2017 at 03:14:27PM +0300, Nikolay Borisov wrote:
> 
> 
> On 26.09.2017 11:47, Anand Jain wrote:
> > 1/2 fixes a bug which failed to reset writable when sprouting failed
> > 2/2 fixes BUG_ON in btrfs_init_new_device()
> > 
> > Anand Jain (2):
> >   btrfs: undo writable when sprouting fails
> >   btrfs: fix BUG_ON in btrfs_init_new_device()
> 
> Reviewed-by: Nikolay Borisov 

Please note that this would lead to unexpected unlocks of uuid_mutex and
sb::s_umount:

2465 ret = btrfs_commit_transaction(trans);
2466
2467 if (seeding_dev) {
2468 mutex_unlock(&uuid_mutex);
2469 up_write(&sb->s_umount);

first unlocks

2470
2471 if (ret) /* transaction commit */
2472 return ret;
2473
2474 ret = btrfs_relocate_sys_chunks(fs_info);
2475 if (ret < 0)
2476 btrfs_handle_fs_error(fs_info, ret,
2477 "Failed to relocate sys chunks after 
device initialization. This can be fixed using the \"btrfs balance\"
2478 trans = btrfs_attach_transaction(root);
2479 if (IS_ERR(trans)) {
2480 if (PTR_ERR(trans) == -ENOENT)
2481 return 0;
2482 return PTR_ERR(trans);


this becomes goto 2494

2483 }
2484 ret = btrfs_commit_transaction(trans);
2485 }
2486
2487 /* Update ctime/mtime for libblkid */
2488 update_dev_time(device_path);
2489 return ret;
2490
2491 error_trans:
2492 btrfs_end_transaction(trans);
2493 rcu_string_free(device->name);
2494 btrfs_sysfs_rm_device_link(fs_info->fs_devices, device);

here

2495 kfree(device);
2496 error:
2497 blkdev_put(bdev, FMODE_EXCL);
2498 if (seeding_dev) {
2499 mutex_unlock(&uuid_mutex);
2500 up_write(&sb->s_umount);

and unlocking again

2501 }
2502 return ret;

So the in-place returns had a meaning but are quite confusing.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: take the error path out if btrfs_attach_transaction() fails

2017-09-27 Thread David Sterba
On Wed, Sep 27, 2017 at 05:50:52PM +0800, Anand Jain wrote:
> btrfs_init_new_device() calls btrfs_attach_transaction() to
> commit sys chunks, however take the error path out if it fails.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index fad3b10a1f81..b526d13a74da 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2494,7 +2494,8 @@ int btrfs_init_new_device(struct btrfs_fs_info 
> *fs_info, const char *device_path
>   if (IS_ERR(trans)) {
>   if (PTR_ERR(trans) == -ENOENT)
>   return 0;
> - return PTR_ERR(trans);
> + ret = PTR_ERR(trans);
> + goto error_sysfs;

The label is introduced by another patch, please resend the whole
patchset, I've seen several iterations and feedback from various people
and I'm not sure I'm looking at the latest version.

Regarding error handling in btrfs_init_new_device, the seeding device
makes it hard to read. This patch would lead to a double unlock of
uuid_mutex and sb::s_umount, because the label error_sysfs will continue
to do the cleanups, that were already partially done in the containing
'if (seeding_dev)' block where the test fails.

I'd suggest to first get rid of the in-place returns and add necessary
labels or separate exit sequences and then address the new error
handling.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Wrong device?

2017-09-27 Thread Andrei Borzenkov
26.09.2017 10:31, Lukas Pirl пишет:
> On 09/25/2017 06:11 PM, linux-bt...@oh3mqu.pp.hyper.fi wrote as excerpted:
>> After a long googling (about more complex situations) I suddenly
>> noticed "device sdb" WTF???  Filesystem is mounted from /dev/md3 (sdb
>> is part of that mdraid) so btrfs should not even know anything about
>> that /dev/sdb.
> 
> I would be interested in explanations regarding this too. It happened
> to me as well, that I was confused by /dev/sd* device paths being
> printed by btrfs in the logs, even though it runs on /dev/md-*
> (/dev/mapper/*) devices exclusively.
> 

Could be related:

https://forums.opensuse.org/showthread.php/526696-Upgrade-destroys-dmraid?p=2838207#post2838207




signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] btrfs: Abort transaction if btrfs_update_root fails in btrfs_ioctl_subvol_setflags

2017-09-27 Thread David Sterba
On Wed, Sep 27, 2017 at 11:48:59AM +0300, Nikolay Borisov wrote:
> On 26.09.2017 20:39, David Sterba wrote:
> > On Tue, Sep 26, 2017 at 05:27:41PM +0300, Nikolay Borisov wrote:
> >> btrfs_update_root can fail for any number of reasons, however the only 
> >> error
> >> handling we do is to revert the modified flags, yet we do not abort the
> >> transaction but proceed to commit it.
> > 
> > AFAICS btrfs_update_root aborts the transaction itself, so it's not
> > needed in the caller.
> > 
> >> Fix this by explicitly checking if the
> >> update root routine has failed and abort the transaction.
> >>
> >> Fixes: 0caa102da827 ("Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS 
> >> ioctls")
> >> Signed-off-by: Nikolay Borisov 
> >> ---
> >>  fs/btrfs/ioctl.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> >> index d6715c2bcdc4..09fcd51f0e8c 100644
> >> --- a/fs/btrfs/ioctl.c
> >> +++ b/fs/btrfs/ioctl.c
> >> @@ -1842,8 +1842,11 @@ static noinline int 
> >> btrfs_ioctl_subvol_setflags(struct file *file,
> >>  
> >>ret = btrfs_update_root(trans, fs_info->tree_root,
> >>&root->root_key, &root->root_item);
> >> +  if (ret < 0)
> >> +  btrfs_abort_transaction(trans, ret);
> >>  
> >>btrfs_commit_transaction(trans);
> > 
> > I think the error from commit_transaction should be returned as well if
> > we get here.
> > 
> > So:
> > 
> > ret = btrfs_update_root();
> > 
> > if (ret < 0) {
> > end_transaction();
> > goto out_reset;
> > }
> > 
> > ret = btrfs_commit_transaction();
> > 
> > out_reset:
> > /* and then as it is */
> > if (ret)
> > btrfs_set_root_flags(...);
> > 
> > etc.
> 
> I think even this is not needed, since btrfs_commit_transaction actually
> handles aborted transaction by calling btrfs_end_transaction and
> returning the error with which the trans was aborted. So I will just
> drop this patch

>From API point of view, I'd like to see some pattern, where we don't
call commit after transaction abort IF we have enough information from
the context. Ie. the abort and commit would be in the same function.
Commit has to deal with an aborted transaction, that's right, but kind
of an implementation detail and last resort resolution.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 6/6] Btrfs: heuristic add byte core set calculation

2017-09-27 Thread David Sterba
On Fri, Aug 25, 2017 at 12:18:45PM +0300, Timofey Titovets wrote:
> Calculate byte core set for data sample:
> Sort bucket's numbers in decreasing order
> Count how many numbers use 90% of sample
> If core set are low (<=25%), data are easily compressible
> If core set high (>=80%), data are not compressible
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/heuristic.c | 51 ++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index ef723e991576..df0cefa42857 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "compression.h"
> 
>  #define READ_SIZE 16
> @@ -25,6 +26,8 @@
>  #define BUCKET_SIZE 256
>  #define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED*READ_SIZE/ITER_SHIFT)
>  #define BYTE_SET_THRESHOLD 64
> +#define BYTE_CORE_SET_LOW  BYTE_SET_THRESHOLD
> +#define BYTE_CORE_SET_HIGH 200 // ~80%
> 
>  struct bucket_item {
>   u32 count;
> @@ -67,6 +70,45 @@ static struct list_head *heuristic_alloc_workspace(void)
>   return ERR_PTR(-ENOMEM);
>  }
> 
> +/* For bucket sorting */
> +static inline int bucket_compare(const void *lv, const void *rv)
> +{
> + struct bucket_item *l = (struct bucket_item *)(lv);
> + struct bucket_item *r = (struct bucket_item *)(rv);
> +
> + return r->count - l->count;
> +}
> +
> +/*
> + * Byte Core set size
> + * How many bytes use 90% of sample
> + */
> +static int byte_core_set_size(struct workspace *ws)
> +{
> + u32 a = 0;
> + u32 coreset_sum = 0;
> + u32 core_set_threshold = ws->sample_size*90/100;
> + struct bucket_item *bucket = ws->bucket;
> +
> + /* Sort in reverse order */
> + sort(bucket, BUCKET_SIZE, sizeof(*bucket),
> +  &bucket_compare, NULL);
> +
> + for (; a < BYTE_CORE_SET_LOW; a++)
> + coreset_sum += bucket[a].count;
> +
> + if (coreset_sum > core_set_threshold)
> + return a;
> +
> + for (; a < BYTE_CORE_SET_HIGH && bucket[a].count > 0; a++) {
> + coreset_sum += bucket[a].count;
> + if (coreset_sum > core_set_threshold)
> + break;
> + }
> +
> + return a;
> +}
> +
>  static u32 byte_set_size(const struct workspace *ws)
>  {
>   u32 a = 0;
> @@ -164,7 +206,14 @@ static int heuristic(struct list_head *ws, struct inode 
> *inode,
>   if (a > BYTE_SET_THRESHOLD)
>   return 2;
> 
> - return 1;
> + a = byte_core_set_size(workspace);
> + if (a <= BYTE_CORE_SET_LOW)
> + return 3;
> +
> + if (a >= BYTE_CORE_SET_HIGH)
> + return 0;
> +
> + return 4;

So the return value determines guessed compressibility, this could use
some human readable enums or defines.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 6/6] Btrfs: heuristic add byte core set calculation

2017-09-27 Thread David Sterba
On Fri, Aug 25, 2017 at 12:18:45PM +0300, Timofey Titovets wrote:
> Calculate byte core set for data sample:
> Sort bucket's numbers in decreasing order
> Count how many numbers use 90% of sample
> If core set are low (<=25%), data are easily compressible
> If core set high (>=80%), data are not compressible
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/heuristic.c | 51 ++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index ef723e991576..df0cefa42857 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "compression.h"
> 
>  #define READ_SIZE 16
> @@ -25,6 +26,8 @@
>  #define BUCKET_SIZE 256
>  #define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED*READ_SIZE/ITER_SHIFT)
>  #define BYTE_SET_THRESHOLD 64
> +#define BYTE_CORE_SET_LOW  BYTE_SET_THRESHOLD
> +#define BYTE_CORE_SET_HIGH 200 // ~80%

Please don't use the // c++ style of comments.

> 
>  struct bucket_item {
>   u32 count;
> @@ -67,6 +70,45 @@ static struct list_head *heuristic_alloc_workspace(void)
>   return ERR_PTR(-ENOMEM);
>  }
> 
> +/* For bucket sorting */

"Compare buckets by size, ascending */

> +static inline int bucket_compare(const void *lv, const void *rv)
> +{
> + struct bucket_item *l = (struct bucket_item *)(lv);
> + struct bucket_item *r = (struct bucket_item *)(rv);

The const should be used also for the specific types.

> +
> + return r->count - l->count;
> +}
> +
> +/*
> + * Byte Core set size
> + * How many bytes use 90% of sample
> + */
> +static int byte_core_set_size(struct workspace *ws)
> +{
> + u32 a = 0;
> + u32 coreset_sum = 0;
> + u32 core_set_threshold = ws->sample_size*90/100;

u32 core_set_threshold = ws->sample_size * 90 / 100;

> + struct bucket_item *bucket = ws->bucket;
> +
> + /* Sort in reverse order */

So if it's reverse, please add _rev to the end of the comparator
function. The common naming is "comp_WHAT" so I suggest to use
comp_bucket_rev.

> + sort(bucket, BUCKET_SIZE, sizeof(*bucket),
> +  &bucket_compare, NULL);
> +
> + for (; a < BYTE_CORE_SET_LOW; a++)
> + coreset_sum += bucket[a].count;
> +
> + if (coreset_sum > core_set_threshold)
> + return a;
> +
> + for (; a < BYTE_CORE_SET_HIGH && bucket[a].count > 0; a++) {
> + coreset_sum += bucket[a].count;
> + if (coreset_sum > core_set_threshold)
> + break;
> + }
> +
> + return a;
> +}
> +
>  static u32 byte_set_size(const struct workspace *ws)
>  {
>   u32 a = 0;
> @@ -164,7 +206,14 @@ static int heuristic(struct list_head *ws, struct inode 
> *inode,
>   if (a > BYTE_SET_THRESHOLD)
>   return 2;
> 
> - return 1;
> + a = byte_core_set_size(workspace);
> + if (a <= BYTE_CORE_SET_LOW)
> + return 3;
> +
> + if (a >= BYTE_CORE_SET_HIGH)
> + return 0;
> +
> + return 4;
>  }
> 
>  const struct btrfs_compress_op btrfs_heuristic = {
> --
> 2.14.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 5/6] Btrfs: heuristic add byte set calculation

2017-09-27 Thread David Sterba
On Fri, Aug 25, 2017 at 12:18:44PM +0300, Timofey Titovets wrote:
> Calculate byte set size for data sample:
> Calculate how many unique bytes has been in sample
> By count all bytes in bucket with count > 0
> If byte set low (~25%), data are easily compressible
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/heuristic.c | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index f1fa6e4f1c11..ef723e991576 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -24,6 +24,7 @@
>  #define ITER_SHIFT 256
>  #define BUCKET_SIZE 256
>  #define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED*READ_SIZE/ITER_SHIFT)
> +#define BYTE_SET_THRESHOLD 64

Explanation needed, partially covered by the changelog but we need to
see it in the code as well.

> 
>  struct bucket_item {
>   u32 count;
> @@ -66,6 +67,27 @@ static struct list_head *heuristic_alloc_workspace(void)
>   return ERR_PTR(-ENOMEM);
>  }
> 
> +static u32 byte_set_size(const struct workspace *ws)
> +{
> + u32 a = 0;

Please use 'i'.

> + u32 byte_set_size = 0;
> +
> + for (; a < BYTE_SET_THRESHOLD; a++) {

for (i = 0; ...)


> + if (ws->bucket[a].count > 0)
> + byte_set_size++;
> + }
> +
> + for (; a < BUCKET_SIZE; a++) {

So here the initialization is intentionally skipped, please add a
comment that this is expected or explain like "continue past the first
sample until ...".

> + if (ws->bucket[a].count > 0) {
> + byte_set_size++;
> + if (byte_set_size > BYTE_SET_THRESHOLD)
> + return byte_set_size;
> + }
> + }
> +
> + return byte_set_size;
> +}
> +
>  static bool sample_repeated_patterns(struct workspace *ws)
>  {
>   u32 i = 0;
> @@ -138,6 +160,10 @@ static int heuristic(struct list_head *ws, struct inode 
> *inode,
>   workspace->bucket[byte].count++;
>   }
> 
> + a = byte_set_size(workspace);
> + if (a > BYTE_SET_THRESHOLD)
> + return 2;
> +
>   return 1;
>  }
> 
> --
> 2.14.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 4/6] Btrfs: heuristic add detection of repeated data patterns

2017-09-27 Thread David Sterba
On Fri, Aug 25, 2017 at 12:18:43PM +0300, Timofey Titovets wrote:
> Walk over data sample and use memcmp to detect
> repeated data (like zeroed)
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/heuristic.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index 5192e51ab81e..f1fa6e4f1c11 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -66,6 +66,19 @@ static struct list_head *heuristic_alloc_workspace(void)
>   return ERR_PTR(-ENOMEM);
>  }
> 
> +static bool sample_repeated_patterns(struct workspace *ws)
> +{
> + u32 i = 0;
> + u8 *p = ws->sample;
> +
> + for (; i < ws->sample_size - READ_SIZE; i += READ_SIZE) {

Please use the inline initializatio of the iteration variables.

for (i = 0; i < ws->sample_size - READ_SIZE; i += READ_SIZE) {

> + if(memcpy(&p[i], &p[i + READ_SIZE], READ_SIZE))
> + return false;

I think zeros are by far the most common repeated pattern, for that we
can use some speedy tests, built on top of "find first bit", possibly
with hardware support.

OTOH if it's a more generic "any repeated bytes", this might be actually
slower, and the heuristic needs to be as fast as possible. I don't want
to start optimizing now, so the generic pattern is ok for now.

> + }
> +
> + return true;
> +}
> +
>  static int heuristic(struct list_head *ws, struct inode *inode,
>u64 start, u64 end)
>  {
> @@ -115,6 +128,9 @@ static int heuristic(struct list_head *ws, struct inode 
> *inode,
> 
>   workspace->sample_size = b;
> 
> + if (sample_repeated_patterns(workspace))
> + return 1;
> +
>   memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);
> 
>   for (a = 0; a < workspace->sample_size; a++) {
> --
> 2.14.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/6] Btrfs: implement heuristic sampling logic

2017-09-27 Thread David Sterba
On Fri, Aug 25, 2017 at 12:18:42PM +0300, Timofey Titovets wrote:
> Copy sample data from input data range to sample buffer
> then calculate byte type count for that sample into bucket.
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/heuristic.c | 38 +-
>  1 file changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index e3924c87af08..5192e51ab81e 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -69,8 +69,20 @@ static struct list_head *heuristic_alloc_workspace(void)
>  static int heuristic(struct list_head *ws, struct inode *inode,
>u64 start, u64 end)
>  {
> + struct workspace *workspace = list_entry(ws, struct workspace, list);
>   struct page *page;
>   u64 index, index_end;
> + u32 a, b;

Please use more descriptive variable names. Using 'i' for simple
iteration is ok.

> + u8 *in_data, *sample = workspace->sample;
> + u8 byte;
> +
> + /*
> +  * Compression only handle first 128kb of input range
> +  * And just shift over range in loop for compressing it.
> +  * Let's do the same.
> + */
> + if (end - start > BTRFS_MAX_UNCOMPRESSED)
> + end = start + BTRFS_MAX_UNCOMPRESSED;
> 
>   index = start >> PAGE_SHIFT;
>   index_end = end >> PAGE_SHIFT;
> @@ -79,13 +91,37 @@ static int heuristic(struct list_head *ws, struct inode 
> *inode,
>   if (!IS_ALIGNED(end, PAGE_SIZE))
>   index_end++;
> 
> + b = 0;
>   for (; index < index_end; index++) {
>   page = find_get_page(inode->i_mapping, index);
> - kmap(page);
> + in_data = kmap(page);
> + /* Handle case where start unaligned to PAGE_SIZE */
> + a = start%PAGE_SIZE;

a = start % PAGE_SIZE;

> + while (a < PAGE_SIZE - READ_SIZE) {
> + /* Prevent sample overflow */
> + if (b >= MAX_SAMPLE_SIZE)
> + break;
> + /* Don't sample mem trash from last page */
> + if (start > end - READ_SIZE)
> + break;

I think you can merge the two conditions into one, if you calculate
beforehand where b would overflow 'end -> READ_SIZE'.

> + memcpy(&sample[b], &in_data[a], READ_SIZE);
> + a += ITER_SHIFT;
> + start += ITER_SHIFT;
> + b += READ_SIZE;
> + }
>   kunmap(page);
>   put_page(page);
>   }
> 
> + workspace->sample_size = b;
> +
> + memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);
> +
> + for (a = 0; a < workspace->sample_size; a++) {
> + byte = sample[a];
> + workspace->bucket[byte].count++;
> + }
> +
>   return 1;
>  }
> 
> --
> 2.14.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/6] Btrfs: heuristic workspace add bucket and sample items

2017-09-27 Thread David Sterba
On Fri, Aug 25, 2017 at 12:18:41PM +0300, Timofey Titovets wrote:
> Heuristic workspace:
>  - Add bucket for storing byte type counters
>  - Add sample array for storing partial copy of
>input data range
>  - Add counter for store current sample size to workspace
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/heuristic.c | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index 92f9335bafd4..e3924c87af08 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -20,13 +20,29 @@
>  #include 
>  #include "compression.h"
> 
> +#define READ_SIZE 16
> +#define ITER_SHIFT 256
> +#define BUCKET_SIZE 256
> +#define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED*READ_SIZE/ITER_SHIFT)

All the defines need a short explanation what's the purpose. I've seen
that eg. READ_SIZE is used as sample size, so I suggest to rename it.

Style issue (here and in many other places): binary operators are
separate by a space from both sides:

#define MAX_SAMPLE_SIZE (BTRFS_MAX_UNCOMPRESSED * READ_SIZE / ITER_SHIFT)

> +
> +struct bucket_item {
> + u32 count;
> +};
> +
>  struct workspace {
> + u8  *sample;
> + /* Partial copy of input data */
> + u32 sample_size;
> + /* Bucket store counter for each byte type */
> + struct bucket_item bucket[BUCKET_SIZE];
>   struct list_head list;
>  };
> 
>  static void heuristic_free_workspace(struct list_head *ws)
>  {
>   struct workspace *workspace = list_entry(ws, struct workspace, list);
> +
> + kvfree(workspace->sample);
>   kfree(workspace);
>  }
> 
> @@ -38,9 +54,16 @@ static struct list_head *heuristic_alloc_workspace(void)
>   if (!workspace)
>   return ERR_PTR(-ENOMEM);
> 
> + workspace->sample = kvmalloc(MAX_SAMPLE_SIZE, GFP_KERNEL);
> + if (!workspace->sample)
> + goto fail;
> +
>   INIT_LIST_HEAD(&workspace->list);
> 
>   return &workspace->list;
> +fail:
> + heuristic_free_workspace(&workspace->list);
> + return ERR_PTR(-ENOMEM);
>  }

Regarding the workspaces: I think we'll need to separate them completely
from the compression workspaces. The size is different, the usage
pattern is different and they can be reused by all compression types.

The size is obvious, if the size is eg. 4k, and we have like 128k for
compression workspace, we'd waste too much space.

The heruistic workspace could be used more frequently, when we just
sample the data and maybe decide not to compress. Here the locks and
workspaces will not interfere with actuall compression and
decompression.

So the idea is to preallocate a few heuristic workspaces in a similar
way as we do for compression, and add similar fallback logic, when we
need them but all are used.

I'm hoping that the heuristic calculations will be fast enough that we
can avoid the fallback allocation at all.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 1/6] Btrfs: heuristic make use compression workspaces

2017-09-27 Thread David Sterba
On Fri, Aug 25, 2017 at 12:18:40PM +0300, Timofey Titovets wrote:
> Move heuristic to external file
> Implement compression workspaces support for
> heuristic resources

I think the file compression.c is suitable, there's not that much code
the heuristic adds and it its only related to compresession.

> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/Makefile  |  2 +-
>  fs/btrfs/compression.c | 18 +
>  fs/btrfs/compression.h |  7 -
>  fs/btrfs/heuristic.c   | 73 
> ++
>  4 files changed, 87 insertions(+), 13 deletions(-)
>  create mode 100644 fs/btrfs/heuristic.c
> 
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index 128ce17a80b0..6fa8479dff43 100644
> --- a/fs/btrfs/Makefile
> +++ b/fs/btrfs/Makefile
> @@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o 
> root-tree.o dir-item.o \
>  export.o tree-log.o free-space-cache.o zlib.o lzo.o \
>  compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
>  reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
> -uuid-tree.o props.o hash.o free-space-tree.o
> +uuid-tree.o props.o hash.o free-space-tree.o heuristic.o
> 
>  btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
>  btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 883ecc58fd0d..f0aaf27bcc95 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -704,6 +704,7 @@ static struct {
>  static const struct btrfs_compress_op * const btrfs_compress_op[] = {
>   &btrfs_zlib_compress,
>   &btrfs_lzo_compress,
> + &btrfs_heuristic,
>  };
> 
>  void __init btrfs_init_compress(void)
> @@ -1065,18 +1066,13 @@ int btrfs_decompress_buf2page(const char *buf, 
> unsigned long buf_start,
>   */
>  int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
>  {
> - u64 index = start >> PAGE_SHIFT;
> - u64 end_index = end >> PAGE_SHIFT;
> - struct page *page;
> - int ret = 1;
> + int ret;
> + enum btrfs_compression_type type = BTRFS_HEURISTIC;
> + struct list_head *workspace = find_workspace(type);
> 
> - while (index <= end_index) {
> - page = find_get_page(inode->i_mapping, index);
> - kmap(page);
> - kunmap(page);
> - put_page(page);
> - index++;
> - }
> + ret = btrfs_compress_op[type-1]->heuristic(workspace, inode,
> +start, end);
> 
> + free_workspace(type, workspace);
>   return ret;
>  }
> diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
> index 3b1b0ac15fdc..10e9ffa6dfa4 100644
> --- a/fs/btrfs/compression.h
> +++ b/fs/btrfs/compression.h
> @@ -99,7 +99,8 @@ enum btrfs_compression_type {
>   BTRFS_COMPRESS_NONE  = 0,
>   BTRFS_COMPRESS_ZLIB  = 1,
>   BTRFS_COMPRESS_LZO   = 2,
> - BTRFS_COMPRESS_TYPES = 2,
> + BTRFS_HEURISTIC = 3,
> + BTRFS_COMPRESS_TYPES = 3,

Compression cannot be another type, I get it's for the workspace
management, that would need to be managed in another way anyway.

>  };
> 
>  struct btrfs_compress_op {
> @@ -123,10 +124,14 @@ struct btrfs_compress_op {
> struct page *dest_page,
> unsigned long start_byte,
> size_t srclen, size_t destlen);
> +
> + int (*heuristic)(struct list_head *workspace,
> +  struct inode *inode, u64 start, u64 end);
>  };
> 
>  extern const struct btrfs_compress_op btrfs_zlib_compress;
>  extern const struct btrfs_compress_op btrfs_lzo_compress;
> +extern const struct btrfs_compress_op btrfs_heuristic;
> 
>  int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> new file mode 100644
> index ..92f9335bafd4
> --- /dev/null
> +++ b/fs/btrfs/heuristic.c
> @@ -0,0 +1,73 @@
> +/*
> + * Copyright (c) 2017
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "compression.h"
> +
> +struct workspace {
> + struct list_head list;
> +};
> +
> +static void heuristic_free_workspace(struct list_head *ws)
> +{
> + struct workspace *workspace = list_entry(ws, struct workspace, list);
> + kfree(workspace);
> +}
> +
> +static struct list_head *heuristic_alloc_workspace(void)
> +{
> + struct workspace *wor

Re: [PATCH 3/3] Btrfs: remove nr_async_submits and async_submit_draining

2017-09-27 Thread David Sterba
On Thu, Sep 07, 2017 at 11:22:22AM -0600, Liu Bo wrote:
> Now that we have the combo of flushing twice, which can make sure IO
> have started since the second flush will wait for page lock which
> won't be unlocked unless setting page writeback and queuing ordered
> extents, we don't need %async_submit_draining, %async_delalloc_pages
> and %nr_async_submits to tell whether IO have actually started.
> 
> Moreover, all the flushers in use are followed by functions that wait
> for ordered extents to complete, so %nr_async_submits, which tracks
> whether bio's async submit has made progress, doesn't really make
> sense.
> 
> However, %async_delalloc_pages is still required by shrink_delalloc()
> as that function doesn't flush twice in the normal case (just issues a
> writeback with WB_REASON_FS_FREE_SPACE).
> 
> Signed-off-by: Liu Bo 

Patches like this are almost impossible to review just from the code.
This has runtime implications that we'd need to observe in real on
various workloads.

So, the patches "look good", but I did not try to forsee all the
scenarios where threads interact through the counters. I think it should
be safe to add them to for-next and give enough time to test them.

The performance has varied wildly in the last cycle(s) so it's hard to
get a baseline, other than another major release. I do expect changes in
performance characteristics but still hope it'll be the same on average.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Give up on bcache?

2017-09-27 Thread Austin S. Hemmelgarn

On 2017-09-26 18:46, Ferry Toth wrote:

Op Tue, 26 Sep 2017 15:52:44 -0400, schreef Austin S. Hemmelgarn:


On 2017-09-26 12:50, Ferry Toth wrote:

Looking at the Phoronix benchmark here:

https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-
raid&num=2

I think it might be idle hopes to think bcache can be used as a ssd
cache for btrfs to significantly improve performance.. True, the
benchmark is using ext.

It's a benchmark.  They're inherently synthetic and workload specific,
and therefore should not be trusted to represent things accurately for
arbitrary use cases.


So what. A decent benchmark tries to measure a specific aspect of the fs.
Yes, and it usually measures it using a ridiculously unrealistic 
workload.  Some of the benchmarks in iozone are a good example of this, 
like the backwards read one (there is nearly nothing that it provides 
any useful data for).  For a benchmark to be meaningful, you have to 
test what you actually intend to use, and from a practical perspective, 
that article is primarily testing throughput, which is not something you 
should be using SSD caching for.


I think you agree that applications doing lots of fsyncs (databases,
dpkg) are slow on btrfs especially on hdd's, whatever way you measure
that (it feels slow, it measures slow, it really is slow).
Yes, but they're also slow on _everything_.  fsync() is slow.  Period. 
It just more of an issue on BTRFS because it's a CoW filesystem _and_ 
it's slower than ext4 even with that CoW layer bypassed.


On a ssd the problem is less.
And most of that is a result of the significantly higher bulk throughput 
on the SSD, which is not something that SSD caching replicates.


So if you can fix that by using a ssd cache or a hybrid solution, how
would you like to compare that? It _feels_ faster?
That depends.  If it's on a desktop, then that actually is one of the 
best ways to test it, since user perception is your primary quality 
metric (you can make the fastest system in the world, but if the user 
can't tell, you've gained nothing).  If you're on anything else, you 
test the actual workload if possible, and a benchmark that tries to 
replicate the workload if not.  Put another way, if you're building a 
PGSQL server, you should be bench-marking things with a PGSQL 
bench-marking tool, not some arbitrary that likely won't replicate a 
PGSQL workload.



But the most important one (where btrfs always shows to be a little
slow)
would be the SQLLite test. And with ext at least performance _degrades_
except for the Writeback mode, and even there is nowhere near what the
SSD is capable of.

And what makes you think it will be?  You're using it as a hot-data
cache, not a dedicated write-back cache, and you have the overhead from
bcache itself too.  Just some simple math based on examining the bcache
code suggests you can't get better than about 98% of the SSD's
performance if you're lucky, and I'd guess it's more like 80% most of
the time.


I think with btrfs it will be even worse and that it is a fundamental
problem: caching is complex and the cache can not how how the data on
the fs is used.

Actually, the improvement from using bcache with BTRFS is higher
proportionate to the baseline of not using it by a small margin than it
is when used with ext4.  BTRFS does a lot more with the disk, so you
have a lot more time spent accessing the disk, and thus more time that
can be reduced by improving disk performance.  While the CoW nature of
BTRFS does somewhat mitigate the performance improvement from using
bcache, it does not completely negate it.


I would like to reverse this, how much degradation do you suffer from
btrfs on a ssd as baseline compared to btrfs on a mixed ssd/hdd system.
Performance-wise?  It's workload dependent, but in most case it's a hit 
regardless of if you're using BTRFS or some other filesystem.


If instead you're asking what the difference in device longevity, you 
can probably expect the SSD to wear out faster in the second case. 
Unless you have a reasonably big SSD and are using write-around caching, 
every write will hit the SSD too, and you'll end up with lots of 
rewrites on the SSD.


IMHO you are hoping to get ssd performance at hdd cost.
Then you're looking at the wrong tool.  The primary use cases for SSD 
caching are smoothing latency and improving interactivity by reducing 
head movement.  Any other measure of performance is pretty much 
guaranteed to be worse with SSD caching than just using an SSD, and bulk 
throughput is often just as bad as, if not worse than, using a regular 
HDD by itself.


If you are that desperate for performance like an SSD, quit whining 
about cost and just buy an SSD.  Decent ones are down to less than 0.40 
USD per GB depending on the brand (search 'Crucial MX300' on Amazon if 
you want an example), so the cost isn't nearly as bad as people make it 
out to be, especially considering that most the time a normal person who 
isn't doing multimedia wor

Re: [PATCH] btrfs: take the error path out if btrfs_attach_transaction() fails

2017-09-27 Thread Qu Wenruo



On 2017年09月27日 17:50, Anand Jain wrote:

btrfs_init_new_device() calls btrfs_attach_transaction() to
commit sys chunks, however take the error path out if it fails.

Signed-off-by: Anand Jain 


Reviewed-by: Qu Wenruo 

Thanks,
Qu


---
  fs/btrfs/volumes.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fad3b10a1f81..b526d13a74da 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2494,7 +2494,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
if (IS_ERR(trans)) {
if (PTR_ERR(trans) == -ENOENT)
return 0;
-   return PTR_ERR(trans);
+   ret = PTR_ERR(trans);
+   goto error_sysfs;
}
ret = btrfs_commit_transaction(trans);
}


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Btrfs: remove nr_async_bios

2017-09-27 Thread David Sterba
On Thu, Sep 07, 2017 at 11:22:20AM -0600, Liu Bo wrote:
> This was intended to congest higher layers to not send bios, but as
> 
> 1) the congested bit has been taken by writeback

Can you please be more specific here?

> 2) and no one is waiting for %nr_async_bios down to zero,
> 
> we can safely remove this now.

>From the original commit it looks like mechanism to avoid some write
patterns (streaming not becoming random), but the commit is from 2008,
lot of things have changed.

I think we should at least document whats' the congestion behaviour we
rely on nowadays, so that' sfor the 1). Otherwise patch looks ok.

> 
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/ctree.h   |  1 -
>  fs/btrfs/disk-io.c |  1 -
>  fs/btrfs/volumes.c | 14 --
>  3 files changed, 16 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 3f3eb7b..27cd882 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -881,7 +881,6 @@ struct btrfs_fs_info {
>  
>   atomic_t nr_async_submits;
>   atomic_t async_submit_draining;
> - atomic_t nr_async_bios;
>   atomic_t async_delalloc_pages;
>   atomic_t open_ioctl_trans;
>  
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index f45b61f..95583e2 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2657,7 +2657,6 @@ int open_ctree(struct super_block *sb,
>   atomic_set(&fs_info->nr_async_submits, 0);
>   atomic_set(&fs_info->async_delalloc_pages, 0);
>   atomic_set(&fs_info->async_submit_draining, 0);
> - atomic_set(&fs_info->nr_async_bios, 0);
>   atomic_set(&fs_info->defrag_running, 0);
>   atomic_set(&fs_info->qgroup_op_seq, 0);
>   atomic_set(&fs_info->reada_works_cnt, 0);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index bd679bc..6e9df4d 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -450,13 +450,6 @@ static noinline void run_scheduled_bios(struct 
> btrfs_device *device)
>   pending = pending->bi_next;
>   cur->bi_next = NULL;
>  
> - /*
> -  * atomic_dec_return implies a barrier for waitqueue_active
> -  */
> - if (atomic_dec_return(&fs_info->nr_async_bios) < limit &&

And after that the variable 'limit' becomes unused, please remove it as
well.

> - waitqueue_active(&fs_info->async_submit_wait))
> - wake_up(&fs_info->async_submit_wait);
> -
>   BUG_ON(atomic_read(&cur->__bi_cnt) == 0);
>  
>   /*
> @@ -6132,13 +6125,6 @@ static noinline void btrfs_schedule_bio(struct 
> btrfs_device *device,
>   return;
>   }
>  
> - /*
> -  * nr_async_bios allows us to reliably return congestion to the
> -  * higher layers.  Otherwise, the async bio makes it appear we have
> -  * made progress against dirty pages when we've really just put it
> -  * on a queue for later
> -  */
> - atomic_inc(&fs_info->nr_async_bios);
>   WARN_ON(bio->bi_next);
>   bio->bi_next = NULL;
>  
> -- 
> 2.9.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix send ioctl on 32bit with 64bit kernel

2017-09-27 Thread David Sterba
On Tue, Sep 26, 2017 at 04:40:09PM -0400, Josef Bacik wrote:
> We pass in a pointer in our send arg struct, this means the struct size 
> doesn't
> match with 32bit user space and 64bit kernel space.  Fix this by adding a 
> compat
> mode and doing the appropriate conversion.

How does this differ from the existing workaround that adds the 32bit
structure (btrfs_ioctl_timespec_32 and
btrfs_ioctl_received_subvol_args_32 at the top of ioctl.c) and the
btrfs_compat_ioctl() (at the end of ioctl.c)?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs: Fix transaction abort during failure in btrfs_rm_dev_item

2017-09-27 Thread Nikolay Borisov
btrfs_rm_dev_item calls several function under an activa transaction, however
it fails to abort it if an error happens. Fix this by adding respective
btrfs_abort_transaction calls

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/volumes.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0e8f16c305df..640fdd35ec85 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1765,8 +1765,10 @@ static int btrfs_rm_dev_item(struct btrfs_fs_info 
*fs_info,
key.offset = device->devid;
 
ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
-   if (ret < 0)
+   if (ret < 0) {
+   btrfs_abort_transaction(trans, ret);
goto out;
+   }
 
if (ret > 0) {
ret = -ENOENT;
@@ -1775,7 +1777,7 @@ static int btrfs_rm_dev_item(struct btrfs_fs_info 
*fs_info,
 
ret = btrfs_del_item(trans, root, path);
if (ret)
-   goto out;
+   btrfs_abort_transaction(trans, ret);
 out:
btrfs_free_path(path);
btrfs_commit_transaction(trans);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs: Handle failure to add to add received uuid to uuid tree in _btrfs_ioctl_set_received_subvol

2017-09-27 Thread Nikolay Borisov
In case we failed the btrfs_uuid_tree_add we want to abort and end the 
transaction,
currently the code only does btrfs_abort_transaction and then bails out without
calling either btrfs_transaction_commit (which handles the aborted case) or
btrfs_end_transaction. Fix it by eliminating the wrong label jump

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ioctl.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 585111e055e0..7cfc68170e65 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -5154,10 +5154,8 @@ static long _btrfs_ioctl_set_received_subvol(struct file 
*file,
ret = btrfs_uuid_tree_add(trans, fs_info, sa->uuid,
  BTRFS_UUID_KEY_RECEIVED_SUBVOL,
  root->root_key.objectid);
-   if (ret < 0 && ret != -EEXIST) {
+   if (ret < 0 && ret != -EEXIST)
btrfs_abort_transaction(trans, ret);
-   goto out;
-   }
}
ret = btrfs_commit_transaction(trans);
 out:
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs: Remove unnecessary btrfs_abort_transaction on transaction commit failure

2017-09-27 Thread Nikolay Borisov
If btrfs_transaction_commit fails it will proceed to call cleanup_transaction,
which in turn already does btrfs_abort_transaction. So let's remove the
unnecessary code duplication.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ioctl.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d6715c2bcdc4..585111e055e0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -5160,11 +5160,6 @@ static long _btrfs_ioctl_set_received_subvol(struct file 
*file,
}
}
ret = btrfs_commit_transaction(trans);
-   if (ret < 0) {
-   btrfs_abort_transaction(trans, ret);
-   goto out;
-   }
-
 out:
up_write(&fs_info->subvol_sem);
mnt_drop_write_file(file);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] Few transaction semantic fixes

2017-09-27 Thread Nikolay Borisov
While looking at the transaction code and various pattern I came across some
problems which this series aim to fix. 

Patch 1 just removes some redundant code. 

Patch 2 adds a missing btrfs_abort_transaction, otherwise we don't properly 
close out the transaction. I believe this could lead to btrfs_root_item being 
modified yet we could be missing respective entry in the uuid tree. 

Patch 3 btrfs_rm_dev_item totally missed aborting the transaction in its failure
cases.

Nikolay Borisov (3):
  btrfs: Remove unnecessary btrfs_abort_transaction on transaction
commit failure
  btrfs: Handle failure to add received uuid to uuid tree in 
  _btrfs_ioctl_set_received_subvol
  btrfs: Fix transaction abort during failure in btrfs_rm_dev_item

 fs/btrfs/ioctl.c   | 9 +
 fs/btrfs/volumes.c | 6 --
 2 files changed, 5 insertions(+), 10 deletions(-)

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/2] btrfs: undo writable when sprouting fails

2017-09-27 Thread Anand Jain



On 09/27/2017 01:49 AM, David Sterba wrote:

On Tue, Sep 26, 2017 at 08:57:47PM +0800, Qu Wenruo wrote:



On 2017年09月26日 16:41, Anand Jain wrote:

When new device is being added to seed FS, seed FS is marked writable,
but when we fail to bring in the new device, we missed to undo the
writable part. This patch fixes it.

Signed-off-by: Anand Jain 
---
   fs/btrfs/volumes.c | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0e8f16c305df..9d64700cc9b6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2501,6 +2501,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
return ret;
   
   error_trans:

+   if (seeding_dev)
+   sb->s_flags |= MS_RDONLY;


Still the same question.

---
if (seeding_dev) {
mutex_unlock(&uuid_mutex);
up_write(&sb->s_umount);

if (ret) /* transaction commit */
return ret;

ret = btrfs_relocate_sys_chunks(fs_info);
if (ret < 0)
btrfs_handle_fs_error(fs_info, ret,
"Failed to relocate sys chunks after device 
initialization. This
can be fixed using the \"btrfs balance\" command.");
trans = btrfs_attach_transaction(root);
if (IS_ERR(trans)) {
if (PTR_ERR(trans) == -ENOENT)
return 0;
return PTR_ERR(trans);
}
---
In the above btrfs_attch_transaction() error handler, which just
returned without going to error_trans tags, did we misseed the s_flag
revert thing?


I think so. Also I think "sb->s_flags |= MS_RDONLY;" can be moved to the
'if' after the error: label, where we already check the 'seeding_dev'
condition.


Qu, David,

 Thanks. I have fixed that in a new patch as its wasn't about fixing 
the BUG_ON().


 Further, there is another bug that, in the original code we failed to 
undo the btrfs_prepare_sprout() part. I am attempting to fix that, its 
bit complicated. For now, fix for Qu found bug is sent to the ML.


-Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: take the error path out if btrfs_attach_transaction() fails

2017-09-27 Thread Anand Jain
btrfs_init_new_device() calls btrfs_attach_transaction() to
commit sys chunks, however take the error path out if it fails.

Signed-off-by: Anand Jain 
---
 fs/btrfs/volumes.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fad3b10a1f81..b526d13a74da 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2494,7 +2494,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, 
const char *device_path
if (IS_ERR(trans)) {
if (PTR_ERR(trans) == -ENOENT)
return 0;
-   return PTR_ERR(trans);
+   ret = PTR_ERR(trans);
+   goto error_sysfs;
}
ret = btrfs_commit_transaction(trans);
}
-- 
2.13.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: btrfs/150 regression test for reading compressed data

2017-09-27 Thread Eryu Guan
On Tue, Sep 26, 2017 at 05:18:51PM -0700, Liu Bo wrote:
> On Tue, Sep 26, 2017 at 04:37:52PM -0700, Liu Bo wrote:
> > On Tue, Sep 26, 2017 at 05:02:36PM +0800, Eryu Guan wrote:
> > > On Fri, Sep 22, 2017 at 05:21:27PM -0600, Liu Bo wrote:
> > > > We had a bug in btrfs compression code which could end up with a
> > > > kernel panic.
> > > > 
> > > > This is adding a regression test for the bug and I've also sent a
> > > > kernel patch to fix the bug.
> > > > 
> > > > The patch is "Btrfs: fix kernel oops while reading compressed data".
> > > > 
> > > > Signed-off-by: Liu Bo 
> > > 
> > > Hmm, I can't reproduce the panic with 4.13 kernel, which doesn't have
> > > the fix applied. Can you please help confirm if it panics on your test
> > > environment?
> > >
> > 
> > Yes, it is reproducible on my box, hrm...I'll be running it more times
> > to double check.
> > 
> 
> It worked for me...both v4.13 and v4.14.0-rc2 have the following
> messages[1].
> 
> This requires two config:
> CONFIG_FAULT_INJECTION=y
> CONFIG_FAULT_INJECTION_DEBUG_FS=y
> 
> Could you please check again?

I re-compiled 4.14-rc2 kernel on my test vm with FAIL_MAKE_REQUEST
enabled (which requires FAULT_INJECTION), and I can reproduce the crash
now. It was so weired that previously I did have FAIL_MAKE_REQUEST
enabled and test ran normally without hitting the bug, but now I can hit
the bug quite reliably. Not sure what was happning in my previous test..

Thanks for confirming!

Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs: Remove received_uuid during received snapshot ro->rw switch

2017-09-27 Thread Nikolay Borisov
Currently when a read-only snapshot is received and subsequently its ro property
is set to false i.e. switched to rw-mode the received_uuid of that subvol 
remains
intact. However, once the received volume is switched to RW mode we cannot
guaranteee that it contains the same data, so it makes sense to remove the
received uuid. The presence of the received_uuid can also cause problems when
the volume is being send.

Signed-off-by: Nikolay Borisov 
---

Changes since v1: 
 * Dropped patch 1/2 which was doing a bit of refactoring w.r.t transaction 
 failure handling and instead folded the once change with made sense (
 ret = btrfs_commit_transaction) under the out_reset label.

 fs/btrfs/ioctl.c | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d6715c2bcdc4..2049da6a0698 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1811,6 +1811,17 @@ static noinline int btrfs_ioctl_subvol_setflags(struct 
file *file,
goto out_drop_sem;
 
root_flags = btrfs_root_flags(&root->root_item);
+
+   /*
+* 1 - root item
+* 1 - uuid item
+*/
+   trans = btrfs_start_transaction(root, 2);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out_drop_sem;
+   }
+
if (flags & BTRFS_SUBVOL_RDONLY) {
btrfs_set_root_flags(&root->root_item,
 root_flags | BTRFS_ROOT_SUBVOL_RDONLY);
@@ -1824,27 +1835,36 @@ static noinline int btrfs_ioctl_subvol_setflags(struct 
file *file,
btrfs_set_root_flags(&root->root_item,
 root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY);
spin_unlock(&root->root_item_lock);
+   if 
(!btrfs_is_empty_uuid(root->root_item.received_uuid)) {
+   ret = btrfs_uuid_tree_rem(trans, fs_info,
+  root->root_item.received_uuid,
+  BTRFS_UUID_KEY_RECEIVED_SUBVOL,
+  root->root_key.objectid);
+
+   if (ret && ret != -ENOENT) {
+   btrfs_abort_transaction(trans, ret);
+   goto out_reset;
+   }
+
+   memset(root->root_item.received_uuid, 0,
+  BTRFS_UUID_SIZE);
+   }
} else {
spin_unlock(&root->root_item_lock);
btrfs_warn(fs_info,
   "Attempt to set subvolume %llu read-write 
during send",
   root->root_key.objectid);
ret = -EPERM;
-   goto out_drop_sem;
+   btrfs_abort_transaction(trans, ret);
+   goto out_reset;
}
}
 
-   trans = btrfs_start_transaction(root, 1);
-   if (IS_ERR(trans)) {
-   ret = PTR_ERR(trans);
-   goto out_reset;
-   }
-
ret = btrfs_update_root(trans, fs_info->tree_root,
&root->root_key, &root->root_item);
 
-   btrfs_commit_transaction(trans);
 out_reset:
+   ret = btrfs_commit_transaction(trans);
if (ret)
btrfs_set_root_flags(&root->root_item, root_flags);
 out_drop_sem:
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: Abort transaction if btrfs_update_root fails in btrfs_ioctl_subvol_setflags

2017-09-27 Thread Nikolay Borisov


On 26.09.2017 20:39, David Sterba wrote:
> On Tue, Sep 26, 2017 at 05:27:41PM +0300, Nikolay Borisov wrote:
>> btrfs_update_root can fail for any number of reasons, however the only error
>> handling we do is to revert the modified flags, yet we do not abort the
>> transaction but proceed to commit it.
> 
> AFAICS btrfs_update_root aborts the transaction itself, so it's not
> needed in the caller.
> 
>> Fix this by explicitly checking if the
>> update root routine has failed and abort the transaction.
>>
>> Fixes: 0caa102da827 ("Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctls")
>> Signed-off-by: Nikolay Borisov 
>> ---
>>  fs/btrfs/ioctl.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index d6715c2bcdc4..09fcd51f0e8c 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1842,8 +1842,11 @@ static noinline int 
>> btrfs_ioctl_subvol_setflags(struct file *file,
>>  
>>  ret = btrfs_update_root(trans, fs_info->tree_root,
>>  &root->root_key, &root->root_item);
>> +if (ret < 0)
>> +btrfs_abort_transaction(trans, ret);
>>  
>>  btrfs_commit_transaction(trans);
> 
> I think the error from commit_transaction should be returned as well if
> we get here.
> 
> So:
> 
> ret = btrfs_update_root();
> 
> if (ret < 0) {
>   end_transaction();
>   goto out_reset;
> }
> 
> ret = btrfs_commit_transaction();
> 
> out_reset:
> /* and then as it is */
> if (ret)
>   btrfs_set_root_flags(...);
> 
> etc.

I think even this is not needed, since btrfs_commit_transaction actually
handles aborted transaction by calling btrfs_end_transaction and
returning the error with which the trans was aborted. So I will just
drop this patch


> 
> 
> 
>> +
>>  out_reset:
>>  if (ret)
>>  btrfs_set_root_flags(&root->root_item, root_flags);
>> -- 
>> 2.7.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Wrong device?

2017-09-27 Thread Ari Saastamoinen

On Tue, 26 Sep 2017, Duncan wrote:


consider it worth having.  No backups means you are defining the data to
be of less value than the time/trouble/resources to make the backup, so
loss is never a big deal, because either you either have a backup if you


I know importance of backups, and this problematic machine are backups  (I 
still have originals if this crashes)



That's why btrfs is listing one of the md components as part of the
filesystem -- it obviously has the same btrfs UUID as the md device that
you actually created the filesystem on.


# for I in c d e f g h i j k l n o p q r ; do blkid /dev/sd$I ; done ; blkid 
/dev/md3
/dev/sdc: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="cffdd56f-f8d6-5191-bd7b-d528018e1106" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdd: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="5d586780-d22f-a68b-769a-5c1bfe9491fb" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sde: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="5777fb6a-7156-3a70-5384-789ac8dd5656" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdf: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="f017f1c7-df26-9279-ccb9-002b8ce1edd9" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdg: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="0de626d7-f754-e53b-792b-3217f6cb63d2" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdh: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="036e8679-2c6f-4a48-601e-1379cdd0a3eb" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdi: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="f9cb6c12-dc73-92b9-363c-fafb5b0efc81" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdj: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="7f3f09d8-d09d-c263-b292-a9d5d004c7ce" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdk: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="07bde9df-8f5b-518d-78fa-3cff3a950eb4" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdl: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="1895c727-e8b2-463a-4f67-e260ee6e98ff" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdn: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="72a6d25c-a7c0-3329-5f15-8abb7c3edbdb" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdo: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="3580f74a-1605-3e5e-9396-9e929eaeb6fc" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdp: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="006883cb-76f6-2967-1a5a-225d258d72b0" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdq: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="b72b2bc3-d983-ed27-f64a-fea9b27e42bf" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/sdr: UUID="d68e26ee-baeb-f40d-c579-2d506e18d378" 
UUID_SUB="bd7b687d-2882-3709-abe4-03f1a1010a67" LABEL="dyn-50:3" TYPE="linux_raid_member"
/dev/md3: UUID="84a31ba6-d22d-4653-b071-5525dbdfe561" 
UUID_SUB="36cf435b-e589-45dd-99a5-5efdb1439913" TYPE="btrfs"

Those sd* devices are member of md, and seems they aren't equal with that
md-device

--
Ari Saastamoinen
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/5] btrfs-progs: check: call repair_root_items() before any repair

2017-09-27 Thread Su Yue
The Annotation of repair_root_items():
"This must be run before any other repair code - not doing it so,
makes other repair code delete or modify backrefs in the extent tree
for example, which will result in an inconsistent fs after repairing
the root items."

However, the rule was broken by commit 1f728b1a514f ("Btrfs-progs,
fsck: move root items repair after root rebuilding").
The commit intends to fix failure of test-fsck/013 so it moves
repair_root_items() after check_extents_and_chunks().

The correct way is to skip calling repair_root_item() when
init_extent_tree is nonzero.
Now put repair_root_items() before do_check_chunks_and_extents() and
not call repair_root_items() if do init_extent_tree.
Then test-fsck/013 works well.

Signed-off-by: Su Yue 
---
Changelog:
v2:
   Correct commit number reported. (Suggested by Qu Wenruo)
v3:
   Change commit format. (Suggested by Qu Wenruo)
   
 cmds-check.c | 43 +++
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 93b47194..3e2f9faa 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -14644,32 +14644,35 @@ int cmd_check(int argc, char **argv)
goto close_out;
}
 
+   if (!init_extent_tree) {
+   ret = repair_root_items(info);
+   err |= !!ret;
+   if (ret < 0) {
+   error("failed to repair root items: %s",
+ strerror(-ret));
+   goto close_out;
+   }
+   if (repair) {
+   fprintf(stderr, "Fixed %d roots.\n", ret);
+   ret = 0;
+   } else if (ret > 0) {
+   fprintf(stderr,
+   "Found %d roots with an outdated root item.\n",
+   ret);
+   fprintf(stderr,
+   "Please run a filesystem check with the option 
--repair to fix them.\n");
+   ret = 1;
+   err |= !!ret;
+   goto close_out;
+   }
+   }
+
ret = do_check_chunks_and_extents(info);
err |= !!ret;
if (ret)
error(
"errors found in extent allocation tree or chunk allocation");
 
-   ret = repair_root_items(info);
-   err |= !!ret;
-   if (ret < 0) {
-   error("failed to repair root items: %s", strerror(-ret));
-   goto close_out;
-   }
-   if (repair) {
-   fprintf(stderr, "Fixed %d roots.\n", ret);
-   ret = 0;
-   } else if (ret > 0) {
-   fprintf(stderr,
-  "Found %d roots with an outdated root item.\n",
-  ret);
-   fprintf(stderr,
-   "Please run a filesystem check with the option --repair 
to fix them.\n");
-   ret = 1;
-   err |= !!ret;
-   goto close_out;
-   }
-
if (!ctx.progress_enabled) {
if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
fprintf(stderr, "checking free space tree\n");
-- 
2.14.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/5] btrfs-progs: check: call repair_root_items() before any repair

2017-09-27 Thread Su Yue
The Annotation of repair_root_items():
"This must be run before any other repair code - not doing it so,
makes other repair code delete or modify backrefs in the extent tree
for example, which will result in an inconsistent fs after repairing
the root items."

However, the rule was broken by commit 1f728b1a514f.
The commit intends to fix failure of test-fsck/013 so it moves
repair_root_items() after check_extents_and_chunks().

The correct way is to skip calling repair_root_item() when
init_extent_tree is nonzero.
Now put repair_root_items() before do_check_chunks_and_extents() and
not call repair_root_items() if do init_extent_tree.
Then test-fsck/013 works well.

---
Changelog:
v2:
   Correct commit number reported. (Suggested by Qu Wenruo)

   Signed-off-by: Su Yue 
---
 cmds-check.c | 43 +++
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 93b47194..3e2f9faa 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -14644,32 +14644,35 @@ int cmd_check(int argc, char **argv)
goto close_out;
}
 
+   if (!init_extent_tree) {
+   ret = repair_root_items(info);
+   err |= !!ret;
+   if (ret < 0) {
+   error("failed to repair root items: %s",
+ strerror(-ret));
+   goto close_out;
+   }
+   if (repair) {
+   fprintf(stderr, "Fixed %d roots.\n", ret);
+   ret = 0;
+   } else if (ret > 0) {
+   fprintf(stderr,
+   "Found %d roots with an outdated root item.\n",
+   ret);
+   fprintf(stderr,
+   "Please run a filesystem check with the option 
--repair to fix them.\n");
+   ret = 1;
+   err |= !!ret;
+   goto close_out;
+   }
+   }
+
ret = do_check_chunks_and_extents(info);
err |= !!ret;
if (ret)
error(
"errors found in extent allocation tree or chunk allocation");
 
-   ret = repair_root_items(info);
-   err |= !!ret;
-   if (ret < 0) {
-   error("failed to repair root items: %s", strerror(-ret));
-   goto close_out;
-   }
-   if (repair) {
-   fprintf(stderr, "Fixed %d roots.\n", ret);
-   ret = 0;
-   } else if (ret > 0) {
-   fprintf(stderr,
-  "Found %d roots with an outdated root item.\n",
-  ret);
-   fprintf(stderr,
-   "Please run a filesystem check with the option --repair 
to fix them.\n");
-   ret = 1;
-   err |= !!ret;
-   goto close_out;
-   }
-
if (!ctx.progress_enabled) {
if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
fprintf(stderr, "checking free space tree\n");
-- 
2.14.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] btrfs-progs: check: call repair_root_items() before any repair

2017-09-27 Thread Qu Wenruo



On 2017年09月27日 14:34, Su Yue wrote:

The Annotation of repair_root_items():
"This must be run before any other repair code - not doing it so,
makes other repair code delete or modify backrefs in the extent tree
for example, which will result in an inconsistent fs after repairing
the root items."

However, the rule was broken by commit
"6896ab801d47419fa06d886bae4bf31d0287ced1" which intends to fix


Wrong commit.

commit 6896ab801d47419fa06d886bae4bf31d0287ced1
Author: Qu Wenruo 
Date:   Mon Sep 11 15:36:12 2017 +0900

btrfs-progs: docs: mkfs: Add extra condition for rootdir option

This commit has no way affect btrfs check.

Please use first 12 characters and one line summary to describe the commit.
And of course, use correct commit.

Thanks,
Qu



failure of test-fsck/013.
The correct way is to skip calling repair_root_item() when
init_extent_tree is nonzero.

Now put repair_root_items() before do_check_chunks_and_extents() and
not call repair_root_items() if do init_extent_tree.
Then test-fsck/013 works well.

Signed-off-by: Su Yue 
---
  cmds-check.c | 43 +++
  1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 93b47194..3e2f9faa 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -14644,32 +14644,35 @@ int cmd_check(int argc, char **argv)
goto close_out;
}
  
+	if (!init_extent_tree) {

+   ret = repair_root_items(info);
+   err |= !!ret;
+   if (ret < 0) {
+   error("failed to repair root items: %s",
+ strerror(-ret));
+   goto close_out;
+   }
+   if (repair) {
+   fprintf(stderr, "Fixed %d roots.\n", ret);
+   ret = 0;
+   } else if (ret > 0) {
+   fprintf(stderr,
+   "Found %d roots with an outdated root item.\n",
+   ret);
+   fprintf(stderr,
+   "Please run a filesystem check with the option 
--repair to fix them.\n");
+   ret = 1;
+   err |= !!ret;
+   goto close_out;
+   }
+   }
+
ret = do_check_chunks_and_extents(info);
err |= !!ret;
if (ret)
error(
"errors found in extent allocation tree or chunk allocation");
  
-	ret = repair_root_items(info);

-   err |= !!ret;
-   if (ret < 0) {
-   error("failed to repair root items: %s", strerror(-ret));
-   goto close_out;
-   }
-   if (repair) {
-   fprintf(stderr, "Fixed %d roots.\n", ret);
-   ret = 0;
-   } else if (ret > 0) {
-   fprintf(stderr,
-  "Found %d roots with an outdated root item.\n",
-  ret);
-   fprintf(stderr,
-   "Please run a filesystem check with the option --repair to 
fix them.\n");
-   ret = 1;
-   err |= !!ret;
-   goto close_out;
-   }
-
if (!ctx.progress_enabled) {
if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
fprintf(stderr, "checking free space tree\n");


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html