Re: [PATCH] Btrfs: remove redundant btrfs_trans_release_metadata

2018-09-05 Thread Liu Bo
Somehow this ends up crashing in btrfs/124; I'm trying to figure out
what went wrong.

thanks,
liubo


On Tue, Sep 4, 2018 at 6:14 PM, Liu Bo  wrote:
> __btrfs_end_transaction() has done the metadata release twice,
> probably because it used to process delayed refs in between, but now
> that we don't process delayed refs any more, the 2nd release is always
> a noop.
>
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/transaction.c | 6 --
>  1 file changed, 6 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index bb1b9f526e98..94b036a74d11 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -826,12 +826,6 @@ static int __btrfs_end_transaction(struct 
> btrfs_trans_handle *trans,
> return 0;
> }
>
> -   btrfs_trans_release_metadata(trans);
> -   trans->block_rsv = NULL;
> -
> -   if (!list_empty(&trans->new_bgs))
> -   btrfs_create_pending_block_groups(trans);
> -
> trans->delayed_ref_updates = 0;
> if (!trans->sync) {
> must_run_delayed_refs =
> --
> 1.8.3.1
>
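
A minimal sketch of why a second release would be a no-op, assuming the
release helper zeroes the handle's reserved-byte counter once the space has
been returned. The toy_trans struct and toy_release_metadata() are
hypothetical stand-ins, not the kernel's types. The sketch covers only the
release call; the hunk above also drops the
btrfs_create_pending_block_groups() call, which this reasoning does not
address.

```c
#include <stdint.h>

/* Hypothetical stand-in for a transaction handle; not the kernel struct. */
struct toy_trans {
	uint64_t bytes_reserved;	/* metadata space still held by the handle */
	uint64_t pool;			/* space handed back so far */
};

/*
 * Return the reserved bytes to the pool and report how much was released.
 * Because the counter is zeroed, calling this twice in a row releases
 * nothing the second time.
 */
uint64_t toy_release_metadata(struct toy_trans *trans)
{
	uint64_t released = trans->bytes_reserved;

	trans->pool += released;
	trans->bytes_reserved = 0;
	return released;
}
```

With nothing reserving space between the two calls, the second call returns 0
and leaves the pool unchanged.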


Re: [PATCH] btrfs: qgroup: Don't trace subtree if we're dropping tree reloc tree

2018-09-05 Thread Qu Wenruo


On 2018/9/5 9:11 PM, David Sterba wrote:
> On Wed, Sep 05, 2018 at 01:03:39PM +0800, Qu Wenruo wrote:
>> Tree reloc tree doesn't contribute to qgroup numbers, as we have
> 
> I think you can call it just 'reloc tree', I'm fixing that in all
> changelogs and comments anyway.

But there is another tree called the data reloc tree.
That's why I'm sticking to 'tree reloc tree', to distinguish it from the
data reloc tree.

> 
>> accounted them at balance time (check replace_path()).
>>
>> Skip such unneeded subtree trace should reduce some performance
>> overhead.
> 
> Please provide some numbers or description of the improvement. There are
> several performance problems caused by qgroups so it would be good to
> get a better idea how much this patch is going to help. Thanks.

That's the problem.
For my internal test, with 3000+ tree blocks, metadata balance could
save about 1~2%.
But according to dump-tree, the tree layout is almost the worst case
scenario, just one metadata block group owns all the tree blocks.

To get a real-world scenario, I would need a file of hundreds of GB or even
several TB, populated with a good number of inline files and enough
CoW to fragment the metadata usage.
I don't have that much free space.

If anyone is still struggling with balance + quota, any test data is
appreciated.

Thanks,
Qu





Re: dduper - Offline btrfs deduplication tool

2018-09-05 Thread Timofey Titovets
On Fri, 24 Aug 2018 at 7:41, Lakshmipathi.G wrote:
>
> Hi -
>
> dduper is an offline dedupe tool. Instead of reading whole file blocks and
> computing checksums, it works by fetching checksums from the BTRFS csum tree.
> This hugely improves the performance.
>
> dduper works like:
> - Read csum for given two files.
> - Find matching location.
> - Pass the location to ioctl_ficlonerange directly
>   instead of ioctl_fideduperange
>
> By default, dduper adds a safety check to the above steps by creating a
> backup reflink file and compares the md5sum after dedupe.
> If the backup file matches new deduped file, then backup file is
> removed. You can skip this check by passing --skip option. Here is
> sample cli usage [1] and quick demo [2]
>
> Some performance numbers: (with --skip option)
>
> Dedupe two 1GB files with same  content - 1.2 seconds
> Dedupe two 5GB files with same  content - 8.2 seconds
> Dedupe two 10GB files with same  content - 13.8 seconds
>
> dduper requires `btrfs inspect-internal dump-csum` command, you can use
> this branch [3] or apply patch by yourself [4]
>
> [1] https://gitlab.collabora.com/laks/btrfs-progs/blob/dump_csum/Documentation/dduper_usage.md
> [2] http://giis.co.in/btrfs_dedupe.gif
> [3] git clone https://gitlab.collabora.com/laks/btrfs-progs.git -b  dump_csum
> [4] https://patchwork.kernel.org/patch/10540229/
>
> Please remember it's version 0.1, so test it out first if you plan to use
> dduper on real data.
> Let me know, if you have suggestions or feedback or bugs :)
>
> Cheers.
> Lakshmipathi.G
>

One question:
Why not ioctl_fideduperange?
i.e. you kill the main benefit of that ioctl: atomicity.


-- 
Have a nice day,
Timofey.
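
For reference, a minimal sketch of a single-range FIDEDUPERANGE call.
FIDEDUPERANGE has the kernel compare the two ranges before sharing extents,
while FICLONERANGE shares them without comparing. The helper name
dedupe_one_range() and its return convention are assumptions for
illustration and are not taken from dduper.

```c
#include <linux/fs.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/*
 * Ask the kernel to share `len` bytes at src_fd/src_off with dst_fd/dst_off.
 * Returns 0 if the ranges were identical and deduped, 1 if the kernel found
 * that they differ, -1 on any error.
 */
int dedupe_one_range(int src_fd, __u64 src_off,
		     int dst_fd, __u64 dst_off, __u64 len)
{
	struct file_dedupe_range *req;
	int ret;

	/* One destination entry follows the fixed-size request header. */
	req = calloc(1, sizeof(*req) + sizeof(struct file_dedupe_range_info));
	if (!req)
		return -1;

	req->src_offset = src_off;
	req->src_length = len;
	req->dest_count = 1;
	req->info[0].dest_fd = dst_fd;
	req->info[0].dest_offset = dst_off;

	if (ioctl(src_fd, FIDEDUPERANGE, req) < 0)
		ret = -1;
	else if (req->info[0].status == FILE_DEDUPE_RANGE_DIFFERS)
		ret = 1;
	else if (req->info[0].status < 0)
		ret = -1;	/* per-destination errno, negated */
	else
		ret = 0;

	free(req);
	return ret;
}
```

A caller that gets 1 back knows the data changed between the checksum lookup
and the ioctl, and nothing was shared.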


Re: nbdkit as a flexible alternative to loopback mounts

2018-09-05 Thread Richard W.M. Jones
On Tue, Sep 04, 2018 at 07:55:00PM -0600, Chris Murphy wrote:
> https://rwmj.wordpress.com/2018/09/04/nbdkit-as-a-flexible-alternative-to-loopback-mounts/
> 
> This is a pretty cool writeup. I can vouch that Btrfs will format, mount,
> write to, and scrub an 8EiB (virtual) disk, and that btrfs check works on it.
>
> The one thing I thought might cause a problem is that the nbd device has a
> 1KiB sector size, but Btrfs (on x86_64) still uses a 4096 byte "sector"
> and it all seems to work fine despite that.

Thanks for the kind words.  I did an updated post verifying what you
said and also noting that the ‘nbd-client -b’ option can be used to
adjust the sector size:

  https://rwmj.wordpress.com/2018/09/05/nbdkit-for-loopback-pt-5-8-exabyte-btrfs-filesystem/

Btrfs still seems to believe the sector size is 4k, although as you
say it doesn't seem to cause any issues.

> Anyway, maybe it's useful for some fstests instead of file backed
> losetup devices?

One interesting feature of nbdkit is that you can write your own
plugins.  For my demonstration, I used the nbdkit-memory-plugin which
implements a purely in-memory sparse array:

  https://github.com/libguestfs/nbdkit/blob/master/plugins/memory/memory.c
  https://github.com/libguestfs/nbdkit/tree/master/common/sparse

But to test btrfs you might want to write a custom plugin.  For
example you might choose a sparse array implementation which is more
suitable for storing specifically btrfs metadata structures, or can
spill to a disk file (which nbdkit-memory-plugin cannot, except swap).

Another thing that's interesting from a testing point of view is the
ability to inject block device errors on demand.  You can either do
this using the supplied nbdkit-error-filter:

  https://rwmj.wordpress.com/2018/09/04/nbdkit-for-loopback-pt-2-injecting-errors/

or if you were writing your own plugin you'd probably want to do it
there.

Anyway hope you find it interesting.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top


Re: [PATCH] btrfs: qgroup: Don't trace subtree if we're dropping tree reloc tree

2018-09-05 Thread David Sterba
On Wed, Sep 05, 2018 at 01:03:39PM +0800, Qu Wenruo wrote:
> Tree reloc tree doesn't contribute to qgroup numbers, as we have

I think you can call it just 'reloc tree', I'm fixing that in all
changelogs and comments anyway.

> accounted them at balance time (check replace_path()).
> 
> Skip such unneeded subtree trace should reduce some performance
> overhead.

Please provide some numbers or description of the improvement. There are
several performance problems caused by qgroups so it would be good to
get a better idea how much this patch is going to help. Thanks.


Re: [PATCH 0/3] btrfs: qgroup: Deprecate unused features for btrfs_qgroup_inherit()

2018-09-05 Thread David Sterba
On Fri, Aug 31, 2018 at 10:29:27AM +0800, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/qgroup_inherit_check
> Which is based on v4.19-rc1 tag.
> 
> This patchset will first set btrfs_qgroup_inherit structure size limit
> from PAGE_SIZE to fixed SZ_4K.
> I understand this normally will cause compatibility problem, but
> considering how minor this feature is and no sane guy should use it for
> over 100 qgroups, it should be fine in real world.

Agreed, please update the changelog of 1st patch with description on
what will stop working and under what conditions. The 4k limit sounds
good enough, the real difference would be on architectures with larger
page sizes where the feature would be used.
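
A rough sketch of what a fixed 4K limit means in entry counts. The struct
layout below is a hypothetical mirror of the inherit structure (four u64
header fields, a five-u64 limit block, then a u64 array), and the rule that
each rfer/excl copy takes two array entries is likewise an assumption; the
toy_* names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SZ_4K 4096u

/* Assumed layout, for illustration only. */
struct toy_qgroup_limit {
	uint64_t flags, max_rfer, max_excl, rsv_rfer, rsv_excl;
};

struct toy_qgroup_inherit {
	uint64_t flags, num_qgroups, num_ref_copies, num_excl_copies;
	struct toy_qgroup_limit lim;
	uint64_t qgroups[];	/* qgroup ids, then (src, dst) copy pairs */
};

/* Return 1 if a request with these counts fits in 4K, 0 otherwise. */
int toy_inherit_fits(uint64_t num_qgroups, uint64_t num_ref_copies,
		     uint64_t num_excl_copies)
{
	uint64_t room = (TOY_SZ_4K - sizeof(struct toy_qgroup_inherit)) /
			sizeof(uint64_t);

	/* Reject oversized counts first so the sum below cannot overflow. */
	if (num_qgroups > room || num_ref_copies > room ||
	    num_excl_copies > room)
		return 0;

	return num_qgroups + 2 * num_ref_copies + 2 * num_excl_copies <= room;
}
```

Under this assumed layout the header takes 72 bytes, so a 4K buffer leaves
room for 503 array entries. A PAGE_SIZE limit on a 64K-page architecture
would have allowed more, which is the compatibility difference mentioned
above.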

> The 2nd patch introduce check function for btrfs_qgroup_inherit
> structure and deprecates the following features:
> 1) limit set
>    Never utilized by btrfs-progs from the beginning.
>
> 2) copy rfer/excl
>    Although btrfs-progs provides support for it as a hidden,
>    undocumented feature, it's the easiest way to screw up qgroup
>    numbers.
>    And we already have patches wandering around the ML to remove such
>    support.

The deprecation should be done in a few steps. First issue a warning
that the feature is deprecated and will be removed in release X. Then
wait until somebody complains (or not) and remove the code in release X.

The X is something like 4.22, ie. at least 2 cycles after the
deprecation warning is added.


Re: [PATCH 4/4] btrfs-progs: print-tree: Use breadth-first search for btrfs_print_tree()

2018-09-05 Thread Nikolay Borisov



On  5.09.2018 09:29, Qu Wenruo wrote:
> btrfs_print_tree() uses depth-first search to print a subtree, it works
> fine until we have 3 level tree.
> 
> In that case, leaves and nodes will be printed in a depth-first order,
> making it pretty hard to locate level 1 nodes.
> 
> This patch will use breadth-first search for btrfs_print_tree().
> It will use btrfs_path::lowest_level to indicate current level, and
> print out tree blocks level by level (breadth-first).
> 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Nikolay Borisov 
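
To illustrate the ordering difference described in the changelog, here is a
toy level-by-level walk. It is only a sketch on a hypothetical in-memory
tree (toy_node), not the btrfs_path based approach in the patch below: for
each depth it visits every node at that depth from left to right.

```c
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Toy stand-in for a tree block; not the btrfs-progs extent_buffer. */
struct toy_node {
	int id;
	int nr_children;
	struct toy_node *children[TOY_MAX_CHILDREN];
};

/* Append the ids of all nodes exactly `depth` levels below `node`. */
size_t toy_visit_level(const struct toy_node *node, int depth,
		       int *out, size_t pos)
{
	int i;

	if (depth == 0) {
		out[pos++] = node->id;
		return pos;
	}
	for (i = 0; i < node->nr_children; i++)
		pos = toy_visit_level(node->children[i], depth - 1, out, pos);
	return pos;
}

/* Number of edges on the longest path from `node` down to a leaf. */
int toy_height(const struct toy_node *node)
{
	int i, height = 0;

	for (i = 0; i < node->nr_children; i++) {
		int h = toy_height(node->children[i]) + 1;

		if (h > height)
			height = h;
	}
	return height;
}

/* Fill `out` with node ids in breadth-first order; returns the count. */
size_t toy_bfs_order(const struct toy_node *root, int *out)
{
	size_t pos = 0;
	int depth, height = toy_height(root);

	for (depth = 0; depth <= height; depth++)
		pos = toy_visit_level(root, depth, out, pos);
	return pos;
}
```

For a root 0 with children 1 and 2, where 1 has leaves 3 and 4 and 2 has
leaf 5, a depth-first walk visits 0 1 3 4 2 5, while this walk visits
0 1 2 3 4 5, keeping the level 1 nodes together.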

> ---
>  print-tree.c | 99 ++--
>  1 file changed, 73 insertions(+), 26 deletions(-)
> 
> diff --git a/print-tree.c b/print-tree.c
> index 31f6fa12522f..0509ec3da46e 100644
> --- a/print-tree.c
> +++ b/print-tree.c
> @@ -1381,6 +1381,78 @@ void btrfs_print_leaf(struct extent_buffer *eb)
>   }
>  }
>  
> +/* Helper function to reach the most left tree block at @path->lowest_level */
> +static int search_leftmost_tree_block(struct btrfs_fs_info *fs_info,
> +   struct btrfs_path *path, int root_level)
> +{
> + int i;
> + int ret = 0;
> +
> + /* Release all nodes except path->nodes[root_level] */
> + for (i = 0; i < root_level; i++) {
> + path->slots[i] = 0;
> + if (!path->nodes[i])
> + continue;
> + free_extent_buffer(path->nodes[i]);
> + }
> +
> + /* Reach the leftmost tree block by always reading out slot 0 */
> + for (i = root_level; i > path->lowest_level; i--) {
> + struct extent_buffer *eb;
> +
> + path->slots[i] = 0;
> + eb = read_node_slot(fs_info, path->nodes[i], 0);
> + if (!extent_buffer_uptodate(eb)) {
> + ret = -EIO;
> + goto out;
> + }
> + path->nodes[i - 1] = eb;
> + }
> +out:
> + return ret;
> +}
> +
> +static void bfs_print_children(struct extent_buffer *root_eb)
> +{
> + struct btrfs_fs_info *fs_info = root_eb->fs_info;
> + struct btrfs_path path;
> + int root_level = btrfs_header_level(root_eb);
> + int cur_level;
> + int ret;
> +
> + if (root_level < 1)
> + return;
> +
> + btrfs_init_path(&path);
> + /* For path */
> + extent_buffer_get(root_eb);
> + path.nodes[root_level] = root_eb;
> +
> + for (cur_level = root_level - 1; cur_level >= 0; cur_level--) {
> + path.lowest_level = cur_level;
> +
> + /* Use the leftmost tree block as start point */
> + ret = search_leftmost_tree_block(fs_info, &path, root_level);

So what you do here is really get the leftmost tree block at level
'cur_level'.

> + if (ret < 0)
> + goto out;
> +
> + /* Print all sibling tree blocks */
> + while (1) {
> + btrfs_print_tree(path.nodes[cur_level], 0);
Then you print the block.

> + ret = btrfs_next_sibling_tree_block(fs_info, &path);
And this just loads the next block at level 'cur_level', representing
the "breadth" portion.

> + if (ret < 0)
> + goto out;
> + if (ret > 0) {
> + ret = 0;
> + break;
> + }
> + }
> + }
> +out:
> + btrfs_release_path(&path);
> + return;
> +}
> +
>  void btrfs_print_tree(struct extent_buffer *eb, int follow)
>  {
>   u32 i;
> @@ -1389,7 +1461,6 @@ void btrfs_print_tree(struct extent_buffer *eb, int 
> follow)
>   struct btrfs_fs_info *fs_info = eb->fs_info;
>   struct btrfs_disk_key disk_key;
>   struct btrfs_key key;
> - struct extent_buffer *next;
>  
>   if (!eb)
>   return;
> @@ -1431,30 +1502,6 @@ void btrfs_print_tree(struct extent_buffer *eb, int 
> follow)
>   if (follow && !fs_info)
>   return;
>  
> - for (i = 0; i < nr; i++) {
> - next = read_tree_block(fs_info,
> - btrfs_node_blockptr(eb, i),
> - btrfs_node_ptr_generation(eb, i));
> - if (!extent_buffer_uptodate(next)) {
> - fprintf(stderr, "failed to read %llu in tree %llu\n",
> - (unsigned long long)btrfs_node_blockptr(eb, i),
> - (unsigned long long)btrfs_header_owner(eb));
> - continue;
> - }
> - if (btrfs_header_level(next) != btrfs_header_level(eb) - 1) {
> - warning(
> -"eb corrupted: parent bytenr %llu slot %d level %d child bytenr %llu level 
> has %d expect %d, skipping the slot",
> - btrfs_header_bytenr(eb), i,
> - btrfs_header_level(eb),
> - btrfs_header_bytenr(next),
> - 

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-09-05 Thread Su Yue




On 2018/9/5 8:33 PM, Christoph Anton Mitterer wrote:
> On Wed, 2018-09-05 at 15:04 +0800, Su Yue wrote:
>> Agreed with Qu, btrfs-check shall not try to do any write.
>
> Well.. it could have been just some coincidence :-)
>
>> I found the errors should be blamed on something in the inode_extref
>> check in lowmem mode.
>
> So you mean errors in btrfs-check... and it was a false positive?

Neither the original nor the lowmem mode of btrfs-check is perfect.
I need to figure out what is on the actual FS; it may be a false alert or
an actual error.

>> I have written three patches to detect and report errors about
>> inode_extref. For your convenience, it's based on v4.17:
>> https://github.com/Damenly/btrfs-progs/tree/ext_ref_v4.17
>
> I hope I can test them soon; it could take a bit longer as I'm about to
> head off on vacation.

Fine, of course. Enjoy it :)

Thanks,
Su

> Cheers,
> Chris.


Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-09-05 Thread Christoph Anton Mitterer
On Wed, 2018-09-05 at 15:04 +0800, Su Yue wrote:
> Agreed with Qu, btrfs-check shall not try to do any write.

Well.. it could have been just some coincidence :-)


> I found the errors should be blamed on something in the inode_extref
> check in lowmem mode.

So you mean errors in btrfs-check... and it was a false positive?


> I have written three patches to detect and report errors about
> inode_extref. For your convenience, it's based on v4.17:
> https://github.com/Damenly/btrfs-progs/tree/ext_ref_v4.17

I hope I can test them soon; it could take a bit longer as I'm about to
head off on vacation.


Cheers,
Chris.



Re: [PATCH 3/4] btrfs-progs: Introduce function to find next sibling tree block

2018-09-05 Thread Nikolay Borisov



On  5.09.2018 09:29, Qu Wenruo wrote:
> Introduce a new function, btrfs_next_sibling_tree_block(), which could
> find any sibling tree blocks at path->lowest_level, unlike level 0
> limited btrfs_next_leaf().
> 
> Since this function is more generic than btrfs_next_leaf(), also make
> btrfs_next_leaf() a wrapper of btrfs_next_sibling_tree_block(), to keep
> the interface the same as kernel.
> 
> This would provide the basis for later breadth-first search print-tree.
> 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Nikolay Borisov 

> ---
>  ctree.c | 14 +-
>  ctree.h | 15 ++-
>  2 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/ctree.c b/ctree.c
> index 042fae19344d..43d47f19c9cd 100644
> --- a/ctree.c
> +++ b/ctree.c
> @@ -2875,18 +2875,22 @@ int btrfs_prev_leaf(struct btrfs_root *root, struct 
> btrfs_path *path)
>  }
>  
>  /*
> - * walk up the tree as far as required to find the next leaf.
> + * walk up the tree as far as required to find the next sibling tree block.
> + * more generic version of btrfs_next_leaf(), as it could find sibling nodes
> + * if @path->lowest_level is not 0.
> + *
>   * returns 0 if it found something or 1 if there are no greater leaves.
>   * returns < 0 on io errors.
>   */
> -int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path)
> +int btrfs_next_sibling_tree_block(struct btrfs_fs_info *fs_info,
> +   struct btrfs_path *path)
>  {
>   int slot;
> - int level = 1;
> + int level = path->lowest_level + 1;
>   struct extent_buffer *c;
>   struct extent_buffer *next = NULL;
> - struct btrfs_fs_info *fs_info = root->fs_info;
>  
> + BUG_ON(path->lowest_level + 1 >= BTRFS_MAX_LEVEL);
>   while(level < BTRFS_MAX_LEVEL) {
>   if (!path->nodes[level])
>   return 1;
> @@ -2915,7 +2919,7 @@ int btrfs_next_leaf(struct btrfs_root *root, struct 
> btrfs_path *path)
>   free_extent_buffer(c);
>   path->nodes[level] = next;
>   path->slots[level] = 0;
> - if (!level)
> + if (level == path->lowest_level)
>   break;
>   if (path->reada)
>   reada_for_search(fs_info, path, level, 0, 0);
> diff --git a/ctree.h b/ctree.h
> index 6df6075865c2..939c584d0301 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -2633,7 +2633,20 @@ static inline int btrfs_insert_empty_item(struct 
> btrfs_trans_handle *trans,
>   return btrfs_insert_empty_items(trans, root, path, key, &data_size, 1);
>  }
>  
> -int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path);
> +int btrfs_next_sibling_tree_block(struct btrfs_fs_info *fs_info,
> +   struct btrfs_path *path);
> +/*
> + * walk up the tree as far as required to find the next leaf.
> + * returns 0 if it found something or 1 if there are no greater leaves.
> + * returns < 0 on io errors.
> + */
> +static inline int btrfs_next_leaf(struct btrfs_root *root,
> +   struct btrfs_path *path)
> +{
> + path->lowest_level = 0;
> + return btrfs_next_sibling_tree_block(root->fs_info, path);
> +}
> +
>  static inline int btrfs_next_item(struct btrfs_root *root,
> struct btrfs_path *p)
>  {
> 


Re: [PATCH 5/8] btrfs-progs: Wire up delayed refs

2018-09-05 Thread Nikolay Borisov



On  5.09.2018 10:46, Qu Wenruo wrote:
> 
> 
> On 2018/9/5 3:41 PM, Nikolay Borisov wrote:
>>
>>
>> On  5.09.2018 08:53, Qu Wenruo wrote:
>>>
>>>
>>> On 2018/9/5 1:42 PM, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On  5.09.2018 05:10, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2018/8/16 9:10 PM, Nikolay Borisov wrote:
>>>>>> This commit enables the delayed refs infrastructures. This entails doing
>>>>>> the following:
>>>>>>
>>>>>> 1. Replacing existing calls of btrfs_extent_post_op (which is the
>>>>>> equivalent of delayed refs) with the proper btrfs_run_delayed_refs.
>>>>>> As well as eliminating open-coded calls to finish_current_insert and
>>>>>> del_pending_extents which execute the delayed ops.
>>>>>>
>>>>>> 2. Wiring up the addition of delayed refs when freeing extents
>>>>>> (btrfs_free_extent) and when adding new extents (alloc_tree_block).
>>>>>>
>>>>>> 3. Adding calls to btrfs_run_delayed_refs in the transaction commit
>>>>>> path alongside comments why every call is needed, since it's not always
>>>>>> obvious (those call sites were derived empirically by running and
>>>>>> debugging existing tests)
>>>>>>
>>>>>> 4. Correctly flagging the transaction in which we are reinitialising
>>>>>> the extent tree.
>>>>>>
>>>>>> 5. Moving btrfs_write_dirty_block_groups to btrfs_write_dirty_block_groups
>>>>>> since blockgroups should be written to disk after the last delayed refs
>>>>>> have been run.
>>>>>>
>>>>>> Signed-off-by: Nikolay Borisov 
>>>>>> Signed-off-by: David Sterba 
>>>>>
>>>>> Is there something (maybe btrfs_run_delayed_refs()?) missing in 
>>>>> btrfs-image?
>>>>>
>>>>> btrfs-image from devel branch can't restore image correctly, the block
>>>>> group used bytes is not correct, thus it can't pass misc nor fsck tests.
>>>>
>>>> This is really strange, all fsck/misc tests passed with those patches.
>>>> Can you be more specific which tests exactly you mean ?
>>>
>>> One case is fsck/020 with lowmem mode. (Original mode lacks block
>>> group->used check).
>>>
>>> More specifically, fsck/020/keyed_data_ref_with_shared_leaf.img
>>>
>>> Using btrfs-image from my distribution (v4.17.1) and devel branch btrfs
>>> check: (cwd is btrfs-progs, devel branch)
>>>
>>> $ btrfs-image -r
>>> tests/fsck-tests/020-extent-ref-cases/keyed_data_ref_with_shared_leaf.img 
>>> ~/test.img
>>> $ btrfs check --mode=lowmem ~/test.img
>>> Opening filesystem to check...
>>> Checking filesystem on /home/adam/test.img
>>> UUID: 12dabcf2-d4da-4a70-9701-9f3d48074e73
>>> [1/7] checking root items
>>> [2/7] checking extents
>>> [3/7] checking free space cache
>>> [4/7] checking fs roots
>>> [5/7] checking only csums items (without verifying data)
>>> [6/7] checking root refs done with fs roots in lowmem mode, skipping
>>> [7/7] checking quota groups skipped (not enabled on this FS)
>>> found 1208320 bytes used, no error found
>>> total csum bytes: 512
>>> total tree bytes: 684032
>>> total fs tree bytes: 638976
>>> total extent tree bytes: 16384
>>> btree space waste bytes: 305606
>>> file data blocks allocated: 93847552
>>>  referenced 1773568
>>>
>>> But if using btrfs-image with your delayed ref patch:
>>> $ ./btrfs-image -r
>>> tests/fsck-tests/020-extent-ref-cases/keyed_data_ref_with_shared_leaf.img 
>>> ~/test.img
>>>
>>> # No matter if I'm using btrfs-check from devel or 4.17.1
>>> $ btrfs check --mode=lowmem ~/test.img
>>> Opening filesystem to check...
>>> Checking filesystem on /home/adam/test.img
>>> UUID: 12dabcf2-d4da-4a70-9701-9f3d48074e73
>>> [1/7] checking root items
>>> [2/7] checking extents
>>> ERROR: block group[4194304 8388608] used 20480 but extent items used 24576
>>> ERROR: block group[20971520 16777216] used 659456 but extent items used
>>> 655360
>>> ERROR: errors found in extent allocation tree or chunk allocation
>>> [3/7] checking free space cache
>>> [4/7] checking fs roots
>>> [5/7] checking only csums items (without verifying data)
>>> [6/7] checking root refs done with fs roots in lowmem mode, skipping
>>> [7/7] checking quota groups skipped (not enabled on this FS)
>>> found 1208320 bytes used, error(s) found
>>> total csum bytes: 512
>>> total tree bytes: 684032
>>> total fs tree bytes: 638976
>>> total extent tree bytes: 16384
>>> btree space waste bytes: 305606
>>> file data blocks allocated: 93847552
>>>  referenced 1773568
>>>
>>> I'd say, although lowmem check is still far from perfect, it indeed has
>>> extra checks original mode lacks, and in this case it indeed exposes
>>> problem.
>>
>>
>> I'm not able to reproduce it: 
>>
>> make TEST_ENABLE_OVERRIDE=true TEST_ARGS_CHECK="--mode=lowmem"  test-fsck
>> [TEST]   fsck-tests.sh
>> [TEST/fsck]   001-bad-file-extent-bytenr
>> [TEST/fsck]   002-bad-transid
>> [TEST/fsck]   003-shift-offsets
>> [TEST/fsck]   004-no-dir-index
>> [TEST/fsck]   005-bad-item-offset
>> [TEST/fsck]   006-bad-root-items
>> [TEST/fsck]   007-bad-offset-snapshots
>> [TEST/fsck]   008-bad-dir-index-name
>> [TEST/fsck]   

Re: [PATCH 5/8] btrfs-progs: Wire up delayed refs

2018-09-05 Thread Qu Wenruo



On 2018/9/5 3:41 PM, Nikolay Borisov wrote:
> 
> 
> On  5.09.2018 08:53, Qu Wenruo wrote:
>>
>>
>> On 2018/9/5 1:42 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On  5.09.2018 05:10, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2018/8/16 9:10 PM, Nikolay Borisov wrote:
>>>>> This commit enables the delayed refs infrastructures. This entails doing
>>>>> the following:
>>>>>
>>>>> 1. Replacing existing calls of btrfs_extent_post_op (which is the
>>>>> equivalent of delayed refs) with the proper btrfs_run_delayed_refs.
>>>>> As well as eliminating open-coded calls to finish_current_insert and
>>>>> del_pending_extents which execute the delayed ops.
>>>>>
>>>>> 2. Wiring up the addition of delayed refs when freeing extents
>>>>> (btrfs_free_extent) and when adding new extents (alloc_tree_block).
>>>>>
>>>>> 3. Adding calls to btrfs_run_delayed_refs in the transaction commit
>>>>> path alongside comments why every call is needed, since it's not always
>>>>> obvious (those call sites were derived empirically by running and
>>>>> debugging existing tests)
>>>>>
>>>>> 4. Correctly flagging the transaction in which we are reinitialising
>>>>> the extent tree.
>>>>>
>>>>> 5. Moving btrfs_write_dirty_block_groups to btrfs_write_dirty_block_groups
>>>>> since blockgroups should be written to disk after the last delayed refs
>>>>> have been run.
>>>>>
>>>>> Signed-off-by: Nikolay Borisov 
>>>>> Signed-off-by: David Sterba 
>>>>
>>>> Is there something (maybe btrfs_run_delayed_refs()?) missing in 
>>>> btrfs-image?
>>>>
>>>> btrfs-image from devel branch can't restore image correctly, the block
>>>> group used bytes is not correct, thus it can't pass misc nor fsck tests.
>>>
>>> This is really strange, all fsck/misc tests passed with those patches.
>>> Can you be more specific which tests exactly you mean ?
>>
>> One case is fsck/020 with lowmem mode. (Original mode lacks block
>> group->used check).
>>
>> More specifically, fsck/020/keyed_data_ref_with_shared_leaf.img
>>
>> Using btrfs-image from my distribution (v4.17.1) and devel branch btrfs
>> check: (cwd is btrfs-progs, devel branch)
>>
>> $ btrfs-image -r
>> tests/fsck-tests/020-extent-ref-cases/keyed_data_ref_with_shared_leaf.img 
>> ~/test.img
>> $ btrfs check --mode=lowmem ~/test.img
>> Opening filesystem to check...
>> Checking filesystem on /home/adam/test.img
>> UUID: 12dabcf2-d4da-4a70-9701-9f3d48074e73
>> [1/7] checking root items
>> [2/7] checking extents
>> [3/7] checking free space cache
>> [4/7] checking fs roots
>> [5/7] checking only csums items (without verifying data)
>> [6/7] checking root refs done with fs roots in lowmem mode, skipping
>> [7/7] checking quota groups skipped (not enabled on this FS)
>> found 1208320 bytes used, no error found
>> total csum bytes: 512
>> total tree bytes: 684032
>> total fs tree bytes: 638976
>> total extent tree bytes: 16384
>> btree space waste bytes: 305606
>> file data blocks allocated: 93847552
>>  referenced 1773568
>>
>> But if using btrfs-image with your delayed ref patch:
>> $ ./btrfs-image -r
>> tests/fsck-tests/020-extent-ref-cases/keyed_data_ref_with_shared_leaf.img 
>> ~/test.img
>>
>> # No matter if I'm using btrfs-check from devel or 4.17.1
>> $ btrfs check --mode=lowmem ~/test.img
>> Opening filesystem to check...
>> Checking filesystem on /home/adam/test.img
>> UUID: 12dabcf2-d4da-4a70-9701-9f3d48074e73
>> [1/7] checking root items
>> [2/7] checking extents
>> ERROR: block group[4194304 8388608] used 20480 but extent items used 24576
>> ERROR: block group[20971520 16777216] used 659456 but extent items used
>> 655360
>> ERROR: errors found in extent allocation tree or chunk allocation
>> [3/7] checking free space cache
>> [4/7] checking fs roots
>> [5/7] checking only csums items (without verifying data)
>> [6/7] checking root refs done with fs roots in lowmem mode, skipping
>> [7/7] checking quota groups skipped (not enabled on this FS)
>> found 1208320 bytes used, error(s) found
>> total csum bytes: 512
>> total tree bytes: 684032
>> total fs tree bytes: 638976
>> total extent tree bytes: 16384
>> btree space waste bytes: 305606
>> file data blocks allocated: 93847552
>>  referenced 1773568
>>
>> I'd say, although lowmem check is still far from perfect, it indeed has
>> extra checks original mode lacks, and in this case it indeed exposes
>> problem.
> 
> 
> I'm not able to reproduce it: 
> 
> make TEST_ENABLE_OVERRIDE=true TEST_ARGS_CHECK="--mode=lowmem"  test-fsck
> [TEST]   fsck-tests.sh
> [TEST/fsck]   001-bad-file-extent-bytenr
> [TEST/fsck]   002-bad-transid
> [TEST/fsck]   003-shift-offsets
> [TEST/fsck]   004-no-dir-index
> [TEST/fsck]   005-bad-item-offset
> [TEST/fsck]   006-bad-root-items
> [TEST/fsck]   007-bad-offset-snapshots
> [TEST/fsck]   008-bad-dir-index-name
> [TEST/fsck]   009-no-dir-item-or-index
> [TEST/fsck]   010-no-rootdir-inode-item
> [TEST/fsck]   011-no-inode-item
> [TEST/fsck]   012-leaf-corruption
> [TEST/fsck]   

Re: [PATCH 1/4] btrfs-progs: print-tree: Skip deprecated blockptr / nodesize output

2018-09-05 Thread Nikolay Borisov



On  5.09.2018 09:29, Qu Wenruo wrote:
> When printing tree nodes, we output slots like:
> key (EXTENT_TREE ROOT_ITEM 0) block 73625600 (17975) gen 16
> 
> The number in the parentheses is blockptr / nodesize.
> 
> However this number doesn't really do anything useful.
> And in fact for unaligned metadata block group (block group start bytenr
> is not aligned to 16K), the number doesn't even make sense as it's
> rounded down.
> 
> In fact the kernel doesn't ever output such divided result in its
> print-tree.c
> 
> Remove it so later reader won't wonder what the number means.
> 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Nikolay Borisov 
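
A small arithmetic sketch of the rounding issue from the changelog, using
the block number from the sample line. The helper names are hypothetical.
The printed 17975 equals 73625600 / 4096, which suggests a 4K divisor for
that sample; with a 16K divisor the same bytenr no longer divides evenly
and the quotient silently drops the remainder.

```c
#include <stdint.h>

/* The value the old output printed in parentheses: bytenr / nodesize. */
uint64_t toy_blockptr_quotient(uint64_t bytenr, uint32_t nodesize)
{
	return bytenr / nodesize;
}

/* Returns 1 when the division above discards a remainder. */
int toy_quotient_is_lossy(uint64_t bytenr, uint32_t nodesize)
{
	return bytenr % nodesize != 0;
}
```

For bytenr 73625600, a divisor of 4096 gives exactly 17975, while a divisor
of 16384 gives 4493 with 12288 bytes left over, so the quotient cannot be
mapped back to the block.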

> ---
>  print-tree.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/print-tree.c b/print-tree.c
> index a09ecfbb28f0..31f6fa12522f 100644
> --- a/print-tree.c
> +++ b/print-tree.c
> @@ -1420,9 +1420,8 @@ void btrfs_print_tree(struct extent_buffer *eb, int 
> follow)
>   btrfs_disk_key_to_cpu(&key, &disk_key);
>   printf("\t");
>   btrfs_print_key(&disk_key);
> - printf(" block %llu (%llu) gen %llu\n",
> + printf(" block %llu gen %llu\n",
>  (unsigned long long)blocknr,
> -(unsigned long long)blocknr / eb->len,
>  (unsigned long long)btrfs_node_ptr_generation(eb, i));
>   fflush(stdout);
>   }
> 


Re: [PATCH 2/4] btrfs-progs: Replace root parameter using fs_info for reada_for_search()

2018-09-05 Thread Nikolay Borisov



On  5.09.2018 09:29, Qu Wenruo wrote:
> As the @root parameter is only used to get @fs_info, use fs_info
> directly instead.
> 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Nikolay Borisov 

> ---
>  cmds-restore.c |  4 ++--
>  ctree.c| 11 +--
>  ctree.h|  4 ++--
>  3 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/cmds-restore.c b/cmds-restore.c
> index d12c1a924059..30ea8a7e93d1 100644
> --- a/cmds-restore.c
> +++ b/cmds-restore.c
> @@ -259,7 +259,7 @@ again:
>   }
>  
>   if (path->reada)
> - reada_for_search(root, path, level, slot, 0);
> + reada_for_search(fs_info, path, level, slot, 0);
>  
>   next = read_node_slot(fs_info, c, slot);
>   if (extent_buffer_uptodate(next))
> @@ -276,7 +276,7 @@ again:
>   if (!level)
>   break;
>   if (path->reada)
> - reada_for_search(root, path, level, 0, 0);
> + reada_for_search(fs_info, path, level, 0, 0);
>   next = read_node_slot(fs_info, next, 0);
>   if (!extent_buffer_uptodate(next))
>   goto again;
> diff --git a/ctree.c b/ctree.c
> index d8a6883aa85f..042fae19344d 100644
> --- a/ctree.c
> +++ b/ctree.c
> @@ -1000,10 +1000,9 @@ static int noinline push_nodes_for_insert(struct 
> btrfs_trans_handle *trans,
>  /*
>   * readahead one full node of leaves
>   */
> -void reada_for_search(struct btrfs_root *root, struct btrfs_path *path,
> -  int level, int slot, u64 objectid)
> +void reada_for_search(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> +   int level, int slot, u64 objectid)
>  {
> - struct btrfs_fs_info *fs_info = root->fs_info;
>   struct extent_buffer *node;
>   struct btrfs_disk_key disk_key;
>   u32 nritems;
> @@ -1203,7 +1202,7 @@ again:
>   break;
>  
>   if (should_reada)
> - reada_for_search(root, p, level, slot,
> + reada_for_search(fs_info, p, level, slot,
>key->objectid);
>  
>   b = read_node_slot(fs_info, b, slot);
> @@ -2902,7 +2901,7 @@ int btrfs_next_leaf(struct btrfs_root *root, struct 
> btrfs_path *path)
>   }
>  
>   if (path->reada)
> - reada_for_search(root, path, level, slot, 0);
> + reada_for_search(fs_info, path, level, slot, 0);
>  
>   next = read_node_slot(fs_info, c, slot);
>   if (!extent_buffer_uptodate(next))
> @@ -2919,7 +2918,7 @@ int btrfs_next_leaf(struct btrfs_root *root, struct 
> btrfs_path *path)
>   if (!level)
>   break;
>   if (path->reada)
> - reada_for_search(root, path, level, 0, 0);
> + reada_for_search(fs_info, path, level, 0, 0);
>   next = read_node_slot(fs_info, next, 0);
>   if (!extent_buffer_uptodate(next))
>   return -EIO;
> diff --git a/ctree.h b/ctree.h
> index 4719962df67d..6df6075865c2 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -2562,8 +2562,8 @@ btrfs_check_node(struct btrfs_root *root, struct 
> btrfs_disk_key *parent_key,
>  enum btrfs_tree_block_status
>  btrfs_check_leaf(struct btrfs_root *root, struct btrfs_disk_key *parent_key,
>struct extent_buffer *buf);
> -void reada_for_search(struct btrfs_root *root, struct btrfs_path *path,
> -  int level, int slot, u64 objectid);
> +void reada_for_search(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> +   int level, int slot, u64 objectid);
>  struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
>  struct extent_buffer *parent, int slot);
>  int btrfs_previous_item(struct btrfs_root *root,
> 


Re: [PATCH 5/8] btrfs-progs: Wire up delayed refs

2018-09-05 Thread Nikolay Borisov



On  5.09.2018 08:53, Qu Wenruo wrote:
> 
> 
> On 2018/9/5 1:42 PM, Nikolay Borisov wrote:
>>
>>
>> On  5.09.2018 05:10, Qu Wenruo wrote:
>>>
>>>
>>> On 2018/8/16 9:10 PM, Nikolay Borisov wrote:
>>>> This commit enables the delayed refs infrastructures. This entails doing
>>>> the following:
>>>>
>>>> 1. Replacing existing calls of btrfs_extent_post_op (which is the
>>>> equivalent of delayed refs) with the proper btrfs_run_delayed_refs.
>>>> As well as eliminating open-coded calls to finish_current_insert and
>>>> del_pending_extents which execute the delayed ops.
>>>>
>>>> 2. Wiring up the addition of delayed refs when freeing extents
>>>> (btrfs_free_extent) and when adding new extents (alloc_tree_block).
>>>>
>>>> 3. Adding calls to btrfs_run_delayed_refs in the transaction commit
>>>> path alongside comments why every call is needed, since it's not always
>>>> obvious (those call sites were derived empirically by running and
>>>> debugging existing tests)
>>>>
>>>> 4. Correctly flagging the transaction in which we are reinitialising
>>>> the extent tree.
>>>>
>>>> 5. Moving btrfs_write_dirty_block_groups to btrfs_write_dirty_block_groups
>>>> since blockgroups should be written to disk after the last delayed refs
>>>> have been run.
>>>>
>>>> Signed-off-by: Nikolay Borisov 
>>>> Signed-off-by: David Sterba 
>>>
>>> Is there something (maybe btrfs_run_delayed_refs()?) missing in btrfs-image?
>>>
>>> btrfs-image from devel branch can't restore image correctly, the block
>>> group used bytes is not correct, thus it can't pass misc nor fsck tests.
>>
>> This is really strange, all fsck/misc tests passed with those patches.
>> Can you be more specific which tests exactly you mean ?
> 
> One case is fsck/020 with lowmem mode. (Original mode lacks block
> group->used check).
> 
> More specifically, fsck/020/keyed_data_ref_with_shared_leaf.img
> 
> Using btrfs-image from my distribution (v4.17.1) and devel branch btrfs
> check: (cwd is btrfs-progs, devel branch)
> 
> $ btrfs-image -r \
>     tests/fsck-tests/020-extent-ref-cases/keyed_data_ref_with_shared_leaf.img \
>     ~/test.img
> $ btrfs check --mode=lowmem ~/test.img
> Opening filesystem to check...
> Checking filesystem on /home/adam/test.img
> UUID: 12dabcf2-d4da-4a70-9701-9f3d48074e73
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs done with fs roots in lowmem mode, skipping
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 1208320 bytes used, no error found
> total csum bytes: 512
> total tree bytes: 684032
> total fs tree bytes: 638976
> total extent tree bytes: 16384
> btree space waste bytes: 305606
> file data blocks allocated: 93847552
>  referenced 1773568
> 
> But if using btrfs-image with your delayed ref patch:
> $ ./btrfs-image -r \
>     tests/fsck-tests/020-extent-ref-cases/keyed_data_ref_with_shared_leaf.img \
>     ~/test.img
> 
> # No matter if I'm using btrfs-check from devel or 4.17.1
> $ btrfs check --mode=lowmem ~/test.img
> Opening filesystem to check...
> Checking filesystem on /home/adam/test.img
> UUID: 12dabcf2-d4da-4a70-9701-9f3d48074e73
> [1/7] checking root items
> [2/7] checking extents
> ERROR: block group[4194304 8388608] used 20480 but extent items used 24576
> ERROR: block group[20971520 16777216] used 659456 but extent items used
> 655360
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space cache
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs done with fs roots in lowmem mode, skipping
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 1208320 bytes used, error(s) found
> total csum bytes: 512
> total tree bytes: 684032
> total fs tree bytes: 638976
> total extent tree bytes: 16384
> btree space waste bytes: 305606
> file data blocks allocated: 93847552
>  referenced 1773568
> 
> I'd say that, although the lowmem check is still far from perfect, it does
> have extra checks that the original mode lacks, and in this case it does
> expose a problem.
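
One thing the two ERROR lines quoted above have in common is that each is off by 4096 bytes, in opposite directions. A small sketch of that arithmetic follows. The struct and function names are made up for illustration and are not btrfs-progs code, and reading the two bracketed numbers as block group start and length is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* One "block group[...] used X but extent items used Y" line from the log. */
struct bg_report {
	uint64_t start;        /* assumed: first bracketed number */
	uint64_t length;       /* assumed: second bracketed number */
	uint64_t used;         /* value reported as "used" */
	uint64_t extent_used;  /* value reported as "extent items used" */
};

/* Signed difference: positive when "used" is larger than the extent sum. */
int64_t bg_used_delta(const struct bg_report *bg)
{
	return (int64_t)bg->used - (int64_t)bg->extent_used;
}

/* Sum of all deltas; 0 means the totals agree even if single groups do not. */
int64_t bg_total_delta(const struct bg_report *bgs, int nr)
{
	int64_t total = 0;
	int i;

	assert(nr >= 0);
	for (i = 0; i < nr; i++)
		total += bg_used_delta(&bgs[i]);
	return total;
}
```

Fed with the two reported groups, the per-group deltas are -4096 and +4096 and the total is 0. That would be consistent with 4096 bytes being accounted to the wrong block group rather than lost, though the log alone does not prove it.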


I'm not able to reproduce it: 

make TEST_ENABLE_OVERRIDE=true TEST_ARGS_CHECK="--mode=lowmem"  test-fsck
[TEST]   fsck-tests.sh
[TEST/fsck]   001-bad-file-extent-bytenr
[TEST/fsck]   002-bad-transid
[TEST/fsck]   003-shift-offsets
[TEST/fsck]   004-no-dir-index
[TEST/fsck]   005-bad-item-offset
[TEST/fsck]   006-bad-root-items
[TEST/fsck]   007-bad-offset-snapshots
[TEST/fsck]   008-bad-dir-index-name
[TEST/fsck]   009-no-dir-item-or-index
[TEST/fsck]   010-no-rootdir-inode-item
[TEST/fsck]   011-no-inode-item
[TEST/fsck]   012-leaf-corruption
[TEST/fsck]   013-extent-tree-rebuild
[TEST/fsck]   014-no-extent-info
[TEST/fsck]   015-tree-reloc-tree
[TEST/fsck]   016-wrong-inode-nbytes
[TEST/fsck]   017-missing-all-file-extent
[TEST/fsck]   

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-09-05 Thread Su Yue




On 09/04/2018 04:24 AM, Christoph Anton Mitterer wrote:

Hey.


On Fri, 2018-08-31 at 10:33 +0800, Su Yue wrote:

Can you please fetch btrfs-progs from my repo and run lowmem check
in readonly?
Repo: https://github.com/Damenly/btrfs-progs/tree/lowmem_debug
It's based on v4.17.1 plus additional output for debug only.


I've adapted your patch to 4.17 from Debian (i.e. not the 4.17.1).


First I ran it again with the pristine 4.17 from Debian:
# btrfs check --mode=lowmem /dev/mapper/system ; echo $?
Checking filesystem on /dev/mapper/system
UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
checking extents
checking free space cache
checking fs roots
ERROR: errors found in fs roots
found 435924422656 bytes used, error(s) found
total csum bytes: 423418948
total tree bytes: 2218328064
total fs tree bytes: 1557168128
total extent tree bytes: 125894656
btree space waste bytes: 429599230
file data blocks allocated: 5193373646848
  referenced 555255164928
[ 1248.687628] ------------[ cut here ]------------
[ 1248.688352] generic_make_request: Trying to write to read-only block-device 
dm-0 (partno 0)
[ 1248.689127] WARNING: CPU: 3 PID: 933 at 
/build/linux-LgHyGB/linux-4.17.17/block/blk-core.c:2180 
generic_make_request_checks+0x43d/0x610
[ 1248.689909] Modules linked in: dm_crypt algif_skcipher af_alg dm_mod 
snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl snd_hda_codec_generic 
x86_pkg_temp_thermal intel_powerclamp i915 iwlwifi btusb coretemp btrtl btbcm 
uvcvideo kvm_intel snd_hda_intel btintel videobuf2_vmalloc bluetooth 
snd_hda_codec kvm videobuf2_memops videobuf2_v4l2 videobuf2_common cfg80211 
snd_hda_core irqbypass videodev jitterentropy_rng drm_kms_helper 
crct10dif_pclmul snd_hwdep crc32_pclmul drbg ghash_clmulni_intel intel_cstate 
snd_pcm ansi_cprng ppdev intel_uncore drm media ecdh_generic iTCO_wdt snd_timer 
iTCO_vendor_support rtsx_pci_ms crc16 snd intel_rapl_perf memstick joydev 
mei_me rfkill evdev soundcore sg parport_pc pcspkr serio_raw fujitsu_laptop mei 
i2c_algo_bit parport shpchp sparse_keymap pcc_cpufreq lpc_ich button
[ 1248.693639]  video battery ac ip_tables x_tables autofs4 btrfs 
zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic 
raid1 raid0 multipath linear md_mod sd_mod uas usb_storage crc32c_intel 
rtsx_pci_sdmmc mmc_core ahci xhci_pci libahci aesni_intel ehci_pci aes_x86_64 
libata crypto_simd xhci_hcd ehci_hcd cryptd glue_helper psmouse i2c_i801 
scsi_mod rtsx_pci e1000e usbcore usb_common
[ 1248.696956] CPU: 3 PID: 933 Comm: btrfs Not tainted 4.17.0-3-amd64 #1 Debian 
4.17.17-1
[ 1248.698118] Hardware name: FUJITSU LIFEBOOK E782/FJNB253, BIOS Version 2.11 
07/15/2014
[ 1248.699299] RIP: 0010:generic_make_request_checks+0x43d/0x610
[ 1248.700495] RSP: 0018:ac89827c7d88 EFLAGS: 00010286
[ 1248.701702] RAX:  RBX: 98f4848a9200 RCX: 0006
[ 1248.702930] RDX: 0007 RSI: 0082 RDI: 98f49e2d6730
[ 1248.704170] RBP: 98f484f6d460 R08: 033e R09: 00aa
[ 1248.705422] R10: ac89827c7e60 R11:  R12: 
[ 1248.706675] R13: 0001 R14:  R15: 
[ 1248.707928] FS:  7f92842018c0() GS:98f49e2c() 
knlGS:
[ 1248.709190] CS:  0010 DS:  ES:  CR0: 80050033
[ 1248.710448] CR2: 55fc6fe1a5b0 CR3: 000407f62001 CR4: 001606e0
[ 1248.711707] Call Trace:
[ 1248.712960]  ? do_writepages+0x4b/0xe0
[ 1248.714201]  ? blkdev_readpages+0x20/0x20
[ 1248.715441]  ? do_writepages+0x4b/0xe0
[ 1248.716684]  generic_make_request+0x64/0x400
[ 1248.717935]  ? finish_wait+0x80/0x80
[ 1248.719181]  ? mempool_alloc+0x67/0x1a0
[ 1248.720425]  ? submit_bio+0x6c/0x140
[ 1248.721663]  submit_bio+0x6c/0x140
[ 1248.722902]  submit_bio_wait+0x53/0x80
[ 1248.724139]  blkdev_issue_flush+0x7c/0xb0
[ 1248.725377]  blkdev_fsync+0x2f/0x40
[ 1248.726612]  do_fsync+0x38/0x60
[ 1248.727849]  __x64_sys_fsync+0x10/0x20
[ 1248.729086]  do_syscall_64+0x55/0x110
[ 1248.730323]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1248.731565] RIP: 0033:0x7f928354d161
[ 1248.732805] RSP: 002b:7ffd35e3f5d8 EFLAGS: 0246 ORIG_RAX: 
004a
[ 1248.734067] RAX: ffda RBX: 55fc09c0c260 RCX: 7f928354d161
[ 1248.735342] RDX: 55fc09c13e28 RSI: 55fc0899f820 RDI: 0004
[ 1248.736614] RBP: 55fc09c0c2d0 R08: 0005 R09: 55fc09c0da70
[ 1248.738001] R10: 009e R11: 0246 R12: 
[ 1248.739272] R13: 55fc0899d213 R14: 55fc09c0c290 R15: 0001
[ 1248.740542] Code: 24 54 03 00 00 48 8d 74 24 08 48 89 df c6 05 3e 03 d9 00 01 e8 
d5 63 01 00 44 89 e2 48 89 c6 48 c7 c7 80 e1 e6 ad e8 a3 4e d1 ff <0f> 0b 4c 8b 
63 08 e9 7b fc ff ff 80 3d 15 03 d9 00 00 0f 85 94
[ 1248.741909] ---[ end trace c2f580dbd579028c ]---
1

Not really sure why 

[PATCH] fstests: btrfs/149 make it sectorsize independent

2018-09-05 Thread Anand Jain
Originally this test case was designed to work only with a 4K sectorsize.
Enhance it to work with any sectorsize by making the following changes:
- Keep any trace of the sector size out of the output file.
- Use the max_inline=0 mount option so that the test meets its requirement
  of a regular (non-inline) extent.
- Don't log the md5sum results to the output file, as the data size varies
  with the sectorsize.

Signed-off-by: Anand Jain 
---
 common/btrfs|  7 +++
 common/filter   |  5 +
 tests/btrfs/149 | 29 -
 tests/btrfs/149.out | 12 ++--
 4 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/common/btrfs b/common/btrfs
index 79c687f73376..e6a218d6b63a 100644
--- a/common/btrfs
+++ b/common/btrfs
@@ -367,3 +367,10 @@ _run_btrfs_balance_start()
 
run_check $BTRFS_UTIL_PROG balance start $bal_opt $*
 }
+
+#return the sector size of the btrfs scratch fs
+_scratch_sectorsize()
+{
+   $BTRFS_UTIL_PROG inspect-internal dump-super $SCRATCH_DEV |\
+   grep sectorsize | awk '{print $2}'
+}
diff --git a/common/filter b/common/filter
index 3965c2eb752b..e87740ddda3f 100644
--- a/common/filter
+++ b/common/filter
@@ -271,6 +271,11 @@ _filter_xfs_io_pages_modified()
_filter_xfs_io_units_modified "Page" $PAGE_SIZE
 }
 
+_filter_xfs_io_numbers()
+{
+_filter_xfs_io | sed -E 's/[0-9]+//g'
+}
+
 _filter_test_dir()
 {
# TEST_DEV may be a prefix of TEST_DIR (e.g. /mnt, /mnt/ovl-mnt)
diff --git a/tests/btrfs/149 b/tests/btrfs/149
index 3e955a305e0f..3958fa844c8b 100755
--- a/tests/btrfs/149
+++ b/tests/btrfs/149
@@ -44,21 +44,27 @@ rm -fr $send_files_dir
 mkdir $send_files_dir
 
 _scratch_mkfs >>$seqres.full 2>&1
-_scratch_mount "-o compress"
+# On 64K pagesize systems the compression is more efficient, so max_inline
+# helps to create regular (non inline) extent irrespective of the final
+# write size.
+_scratch_mount "-o compress -o max_inline=0"
 
 # Write to our file using direct IO, so that this way the write ends up not
 # getting compressed, that is, we get a regular extent which is neither
 # inlined nor compressed.
 # Alternatively, we could have mounted the fs without compression enabled,
 # which would result as well in an uncompressed regular extent.
-$XFS_IO_PROG -f -d -c "pwrite -S 0xab 0 4K" $SCRATCH_MNT/foobar | 
_filter_xfs_io
+sectorsize=$(_scratch_sectorsize)
+$XFS_IO_PROG -f -d -c "pwrite -S 0xab 0 $sectorsize" $SCRATCH_MNT/foobar |\
+   _filter_xfs_io_numbers
 
 $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
$SCRATCH_MNT/mysnap1 > /dev/null
 
 # Clone the regular (not inlined) extent.
-$XFS_IO_PROG -c "reflink $SCRATCH_MNT/foobar 0 8K 4K" $SCRATCH_MNT/foobar \
-   | _filter_xfs_io
+$XFS_IO_PROG -c \
+   "reflink $SCRATCH_MNT/foobar 0 $((2 * $sectorsize)) $sectorsize" \
+   $SCRATCH_MNT/foobar | _filter_xfs_io_numbers
 
 $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
$SCRATCH_MNT/mysnap2 > /dev/null
@@ -76,21 +82,26 @@ $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 -f 
$send_files_dir/2.snap \
 $SCRATCH_MNT/mysnap2 2>&1 >/dev/null | _filter_scratch
 
 echo "File digests in the original filesystem:"
-md5sum $SCRATCH_MNT/mysnap1/foobar | _filter_scratch
-md5sum $SCRATCH_MNT/mysnap2/foobar | _filter_scratch
+sum_src_snap1=$(md5sum $SCRATCH_MNT/mysnap1/foobar | awk '{print $1}')
+sum_src_snap2=$(md5sum $SCRATCH_MNT/mysnap2/foobar | awk '{print $1}')
+echo "src checksum created"
 
 # Now recreate the filesystem by receiving both send streams and verify we get
 # the same file content that the original filesystem had.
 _scratch_unmount
 _scratch_mkfs >>$seqres.full 2>&1
-_scratch_mount "-o compress"
+_scratch_mount "-o compress,max_inline=0"
 
 $BTRFS_UTIL_PROG receive -f $send_files_dir/1.snap $SCRATCH_MNT > /dev/null
 $BTRFS_UTIL_PROG receive -f $send_files_dir/2.snap $SCRATCH_MNT > /dev/null
 
 echo "File digests in the new filesystem:"
-md5sum $SCRATCH_MNT/mysnap1/foobar | _filter_scratch
-md5sum $SCRATCH_MNT/mysnap2/foobar | _filter_scratch
+sum_dest_snap1=$(md5sum $SCRATCH_MNT/mysnap1/foobar | awk '{print $1}')
+sum_dest_snap2=$(md5sum $SCRATCH_MNT/mysnap2/foobar | awk '{print $1}')
+echo "dest checksum created"
+
+[[ $sum_src_snap1 == $sum_dest_snap1 ]] && echo "src and dest checksum matched"
+[[ $sum_src_snap2 == $sum_dest_snap2 ]] && echo "src and dest checksum matched"
 
 status=0
 exit
diff --git a/tests/btrfs/149.out b/tests/btrfs/149.out
index 303de928d35a..6ba251799ff2 100644
--- a/tests/btrfs/149.out
+++ b/tests/btrfs/149.out
@@ -1,14 +1,14 @@
 QA output created by 149
-wrote 4096/4096 bytes at offset 0
+wrote / bytes at offset 
 XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-linked 4096/4096 bytes at offset 8192
+linked / bytes at offset 
 XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 At subvol SCRATCH_MNT/mysnap1
 At subvol SCRATCH_MNT/mysnap2
 File digests in the original filesystem:

[PATCH 4/4] btrfs-progs: print-tree: Use breadth-first search for btrfs_print_tree()

2018-09-05 Thread Qu Wenruo
btrfs_print_tree() uses depth-first search to print a subtree. That works
fine until the tree has 3 levels.

In that case, leaves and nodes are printed in depth-first order, which
makes it pretty hard to locate level 1 nodes.

Use breadth-first search for btrfs_print_tree() instead.
It uses btrfs_path::lowest_level to indicate the current level, and
prints out tree blocks level by level (breadth-first).

Signed-off-by: Qu Wenruo 
---
 print-tree.c | 99 ++--
 1 file changed, 73 insertions(+), 26 deletions(-)

diff --git a/print-tree.c b/print-tree.c
index 31f6fa12522f..0509ec3da46e 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -1381,6 +1381,78 @@ void btrfs_print_leaf(struct extent_buffer *eb)
}
 }
 
+/* Helper function to reach the leftmost tree block at @path->lowest_level */
+static int search_leftmost_tree_block(struct btrfs_fs_info *fs_info,
+ struct btrfs_path *path, int root_level)
+{
+   int i;
+   int ret = 0;
+
+   /* Release all nodes except path->nodes[root_level] */
+   for (i = 0; i < root_level; i++) {
+   path->slots[i] = 0;
+   if (!path->nodes[i])
+   continue;
+   free_extent_buffer(path->nodes[i]);
+   }
+
+   /* Reach the leftmost tree block by always reading out slot 0 */
+   for (i = root_level; i > path->lowest_level; i--) {
+   struct extent_buffer *eb;
+
+   path->slots[i] = 0;
+   eb = read_node_slot(fs_info, path->nodes[i], 0);
+   if (!extent_buffer_uptodate(eb)) {
+   ret = -EIO;
+   goto out;
+   }
+   path->nodes[i - 1] = eb;
+   }
+out:
+   return ret;
+}
+
+static void bfs_print_children(struct extent_buffer *root_eb)
+{
+   struct btrfs_fs_info *fs_info = root_eb->fs_info;
+   struct btrfs_path path;
+   int root_level = btrfs_header_level(root_eb);
+   int cur_level;
+   int ret;
+
+   if (root_level < 1)
+   return;
+
+   btrfs_init_path(&path);
+   /* For path */
+   extent_buffer_get(root_eb);
+   path.nodes[root_level] = root_eb;
+
+   for (cur_level = root_level - 1; cur_level >= 0; cur_level--) {
+   path.lowest_level = cur_level;
+
+   /* Use the leftmost tree block as start point */
+   ret = search_leftmost_tree_block(fs_info, &path, root_level);
+   if (ret < 0)
+   goto out;
+
+   /* Print all sibling tree blocks */
+   while (1) {
+   btrfs_print_tree(path.nodes[cur_level], 0);
+   ret = btrfs_next_sibling_tree_block(fs_info, &path);
+   if (ret < 0)
+   goto out;
+   if (ret > 0) {
+   ret = 0;
+   break;
+   }
+   }
+   }
+out:
+   btrfs_release_path(&path);
+   return;
+}
+
 void btrfs_print_tree(struct extent_buffer *eb, int follow)
 {
u32 i;
@@ -1389,7 +1461,6 @@ void btrfs_print_tree(struct extent_buffer *eb, int 
follow)
struct btrfs_fs_info *fs_info = eb->fs_info;
struct btrfs_disk_key disk_key;
struct btrfs_key key;
-   struct extent_buffer *next;
 
if (!eb)
return;
@@ -1431,30 +1502,6 @@ void btrfs_print_tree(struct extent_buffer *eb, int 
follow)
if (follow && !fs_info)
return;
 
-   for (i = 0; i < nr; i++) {
-   next = read_tree_block(fs_info,
-   btrfs_node_blockptr(eb, i),
-   btrfs_node_ptr_generation(eb, i));
-   if (!extent_buffer_uptodate(next)) {
-   fprintf(stderr, "failed to read %llu in tree %llu\n",
-   (unsigned long long)btrfs_node_blockptr(eb, i),
-   (unsigned long long)btrfs_header_owner(eb));
-   continue;
-   }
-   if (btrfs_header_level(next) != btrfs_header_level(eb) - 1) {
-   warning(
-"eb corrupted: parent bytenr %llu slot %d level %d child bytenr %llu level has 
%d expect %d, skipping the slot",
-   btrfs_header_bytenr(eb), i,
-   btrfs_header_level(eb),
-   btrfs_header_bytenr(next),
-   btrfs_header_level(next),
-   btrfs_header_level(eb) - 1);
-   free_extent_buffer(next);
-   continue;
-   }
-   btrfs_print_tree(next, 1);
-   free_extent_buffer(next);
-   }
-
+   bfs_print_children(eb);
return;
 }
-- 
2.18.0
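
To illustrate the difference in output order that the changelog above describes, here is a minimal sketch on a toy in-memory tree. Everything in it (struct toy_block and the toy_* functions) is hypothetical and not btrfs-progs code. It also walks each level recursively instead of using a path and btrfs_next_sibling_tree_block() as the patch does, so it only shows the resulting order, not the patch's mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tree block: level 0 is a leaf, higher levels hold child pointers. */
struct toy_block {
	int id;
	int level;
	int nr_children;
	struct toy_block **children;
};

/*
 * Depth-first: record a block, then recurse into each child in turn.
 * @out must have room for every block in the tree; returns the new count.
 */
int toy_dfs_order(const struct toy_block *eb, int *out, int n)
{
	int i;

	out[n++] = eb->id;
	for (i = 0; i < eb->nr_children; i++)
		n = toy_dfs_order(eb->children[i], out, n);
	return n;
}

/* Record, left to right, every block under @eb that sits at @level. */
static int toy_collect_level(const struct toy_block *eb, int level,
			     int *out, int n)
{
	int i;

	if (eb->level == level) {
		out[n++] = eb->id;
		return n;
	}
	for (i = 0; i < eb->nr_children; i++)
		n = toy_collect_level(eb->children[i], level, out, n);
	return n;
}

/* Breadth-first: the root first, then each lower level in full. */
int toy_bfs_order(const struct toy_block *root, int *out)
{
	int level;
	int n = 0;

	assert(root != NULL);
	for (level = root->level; level >= 0; level--)
		n = toy_collect_level(root, level, out, n);
	return n;
}
```

For a 3-level toy tree whose root 1 has children 2 (leaves 4, 5) and 3 (leaf 6), the depth-first order is 1 2 4 5 3 6, while the breadth-first order is 1 2 3 4 5 6, which keeps the level 1 nodes together.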



[PATCH 2/4] btrfs-progs: Replace root parameter using fs_info for reada_for_search()

2018-09-05 Thread Qu Wenruo
As the @root parameter is only used to get @fs_info, use fs_info
directly instead.

Signed-off-by: Qu Wenruo 
---
 cmds-restore.c |  4 ++--
 ctree.c| 11 +--
 ctree.h|  4 ++--
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index d12c1a924059..30ea8a7e93d1 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -259,7 +259,7 @@ again:
}
 
if (path->reada)
-   reada_for_search(root, path, level, slot, 0);
+   reada_for_search(fs_info, path, level, slot, 0);
 
next = read_node_slot(fs_info, c, slot);
if (extent_buffer_uptodate(next))
@@ -276,7 +276,7 @@ again:
if (!level)
break;
if (path->reada)
-   reada_for_search(root, path, level, 0, 0);
+   reada_for_search(fs_info, path, level, 0, 0);
next = read_node_slot(fs_info, next, 0);
if (!extent_buffer_uptodate(next))
goto again;
diff --git a/ctree.c b/ctree.c
index d8a6883aa85f..042fae19344d 100644
--- a/ctree.c
+++ b/ctree.c
@@ -1000,10 +1000,9 @@ static int noinline push_nodes_for_insert(struct 
btrfs_trans_handle *trans,
 /*
  * readahead one full node of leaves
  */
-void reada_for_search(struct btrfs_root *root, struct btrfs_path *path,
-int level, int slot, u64 objectid)
+void reada_for_search(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
+ int level, int slot, u64 objectid)
 {
-   struct btrfs_fs_info *fs_info = root->fs_info;
struct extent_buffer *node;
struct btrfs_disk_key disk_key;
u32 nritems;
@@ -1203,7 +1202,7 @@ again:
break;
 
if (should_reada)
-   reada_for_search(root, p, level, slot,
+   reada_for_search(fs_info, p, level, slot,
 key->objectid);
 
b = read_node_slot(fs_info, b, slot);
@@ -2902,7 +2901,7 @@ int btrfs_next_leaf(struct btrfs_root *root, struct 
btrfs_path *path)
}
 
if (path->reada)
-   reada_for_search(root, path, level, slot, 0);
+   reada_for_search(fs_info, path, level, slot, 0);
 
next = read_node_slot(fs_info, c, slot);
if (!extent_buffer_uptodate(next))
@@ -2919,7 +2918,7 @@ int btrfs_next_leaf(struct btrfs_root *root, struct 
btrfs_path *path)
if (!level)
break;
if (path->reada)
-   reada_for_search(root, path, level, 0, 0);
+   reada_for_search(fs_info, path, level, 0, 0);
next = read_node_slot(fs_info, next, 0);
if (!extent_buffer_uptodate(next))
return -EIO;
diff --git a/ctree.h b/ctree.h
index 4719962df67d..6df6075865c2 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2562,8 +2562,8 @@ btrfs_check_node(struct btrfs_root *root, struct 
btrfs_disk_key *parent_key,
 enum btrfs_tree_block_status
 btrfs_check_leaf(struct btrfs_root *root, struct btrfs_disk_key *parent_key,
 struct extent_buffer *buf);
-void reada_for_search(struct btrfs_root *root, struct btrfs_path *path,
-int level, int slot, u64 objectid);
+void reada_for_search(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
+ int level, int slot, u64 objectid);
 struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
   struct extent_buffer *parent, int slot);
 int btrfs_previous_item(struct btrfs_root *root,
-- 
2.18.0



[PATCH 3/4] btrfs-progs: Introduce function to find next sibling tree block

2018-09-05 Thread Qu Wenruo
Introduce a new function, btrfs_next_sibling_tree_block(), which can
find the next sibling tree block at path->lowest_level, unlike
btrfs_next_leaf(), which is limited to level 0.

Since this function is more generic than btrfs_next_leaf(), also make
btrfs_next_leaf() a wrapper of btrfs_next_sibling_tree_block(), to keep
the interface the same as the kernel's.

This provides the basis for the later breadth-first print-tree.

Signed-off-by: Qu Wenruo 
---
 ctree.c | 14 +-
 ctree.h | 15 ++-
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/ctree.c b/ctree.c
index 042fae19344d..43d47f19c9cd 100644
--- a/ctree.c
+++ b/ctree.c
@@ -2875,18 +2875,22 @@ int btrfs_prev_leaf(struct btrfs_root *root, struct 
btrfs_path *path)
 }
 
 /*
- * walk up the tree as far as required to find the next leaf.
+ * walk up the tree as far as required to find the next sibling tree block.
+ * more generic version of btrfs_next_leaf(), as it could find sibling nodes
+ * if @path->lowest_level is not 0.
+ *
  * returns 0 if it found something or 1 if there are no greater leaves.
  * returns < 0 on io errors.
  */
-int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path)
+int btrfs_next_sibling_tree_block(struct btrfs_fs_info *fs_info,
+ struct btrfs_path *path)
 {
int slot;
-   int level = 1;
+   int level = path->lowest_level + 1;
struct extent_buffer *c;
struct extent_buffer *next = NULL;
-   struct btrfs_fs_info *fs_info = root->fs_info;
 
+   BUG_ON(path->lowest_level + 1 >= BTRFS_MAX_LEVEL);
while(level < BTRFS_MAX_LEVEL) {
if (!path->nodes[level])
return 1;
@@ -2915,7 +2919,7 @@ int btrfs_next_leaf(struct btrfs_root *root, struct 
btrfs_path *path)
free_extent_buffer(c);
path->nodes[level] = next;
path->slots[level] = 0;
-   if (!level)
+   if (level == path->lowest_level)
break;
if (path->reada)
reada_for_search(fs_info, path, level, 0, 0);
diff --git a/ctree.h b/ctree.h
index 6df6075865c2..939c584d0301 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2633,7 +2633,20 @@ static inline int btrfs_insert_empty_item(struct 
btrfs_trans_handle *trans,
return btrfs_insert_empty_items(trans, root, path, key, &data_size, 1);
 }
 
-int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path);
+int btrfs_next_sibling_tree_block(struct btrfs_fs_info *fs_info,
+ struct btrfs_path *path);
+/*
+ * walk up the tree as far as required to find the next leaf.
+ * returns 0 if it found something or 1 if there are no greater leaves.
+ * returns < 0 on io errors.
+ */
+static inline int btrfs_next_leaf(struct btrfs_root *root,
+ struct btrfs_path *path)
+{
+   path->lowest_level = 0;
+   return btrfs_next_sibling_tree_block(root->fs_info, path);
+}
+
 static inline int btrfs_next_item(struct btrfs_root *root,
  struct btrfs_path *p)
 {
-- 
2.18.0
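
The walk-up/walk-down idea behind the patch above can be sketched on a toy in-memory tree. This is a simplified model under stated assumptions, not the btrfs-progs implementation: struct toy_block, struct toy_path and toy_next_sibling() are hypothetical names, and there is no I/O, readahead or extent buffer refcounting.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 8

/* Toy tree block: level 0 is a leaf, higher levels hold child pointers. */
struct toy_block {
	int level;
	int nr_children;
	struct toy_block **children;
};

/* Toy counterpart of a path: one block and one slot per level. */
struct toy_path {
	struct toy_block *nodes[TOY_MAX_LEVEL];
	int slots[TOY_MAX_LEVEL];
	int lowest_level;
};

/*
 * Move nodes[lowest_level] to its next sibling.
 * Returns 0 if a sibling was found, 1 if the level is exhausted.
 */
int toy_next_sibling(struct toy_path *path)
{
	int level = path->lowest_level + 1;

	assert(level < TOY_MAX_LEVEL);

	/* Walk up until some ancestor still has an unvisited slot. */
	while (level < TOY_MAX_LEVEL) {
		struct toy_block *parent = path->nodes[level];

		if (!parent)
			return 1;
		if (path->slots[level] + 1 < parent->nr_children) {
			path->slots[level]++;
			break;
		}
		level++;
	}
	if (level == TOY_MAX_LEVEL)
		return 1;

	/* Walk back down, taking slot 0 below the slot that was advanced. */
	while (level > path->lowest_level) {
		struct toy_block *parent = path->nodes[level];

		path->nodes[level - 1] = parent->children[path->slots[level]];
		level--;
		path->slots[level] = 0;
	}
	return 0;
}
```

With lowest_level set to 0 this behaves like a next-leaf step. With lowest_level set to 1 it steps between level 1 nodes instead, which is the generalisation the changelog describes.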



[PATCH 0/4] btrfs-progs: print-tree: breadth-first tree print order

2018-09-05 Thread Qu Wenruo
This patchset can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/dump_tree_enhance

The main point of this patchset is to make "btrfs ins dump-tree"
print tree blocks in breadth-first order when level is higher than 2.

The 1st patch is just a minor cleanup that removes some unused and
meaningless output.

The 2nd patch does a root<->fs_info cleanup, which provides the basis for
the later btrfs_next_sibling_tree_block().

The 3rd patch implements a new function, btrfs_next_sibling_tree_block(),
to find the next sibling tree block at any level, not only leaves.

The final patch implements BFS for btrfs_print_tree().
The BFS search itself is implemented using a path along with
path::lowest_level and btrfs_next_sibling_tree_block() to iterate over all
sibling tree blocks in a level.

Since BFS order is more human-friendly for taller trees, replace DFS
order with BFS directly.

Qu Wenruo (4):
  btrfs-progs: print-tree: Skip deprecated blockptr / nodesize output
  btrfs-progs: Replace root parameter using fs_info for
reada_for_search()
  btrfs-progs: Introduce function to find next sibling tree block
  btrfs-progs: print-tree: Use breadth-first search for
btrfs_print_tree()

 cmds-restore.c |   4 +-
 ctree.c|  25 ++--
 ctree.h|  19 +++--
 print-tree.c   | 102 +++--
 4 files changed, 106 insertions(+), 44 deletions(-)

-- 
2.18.0



[PATCH 1/4] btrfs-progs: print-tree: Skip deprecated blockptr / nodesize output

2018-09-05 Thread Qu Wenruo
When printing tree nodes, we output slots like:
key (EXTENT_TREE ROOT_ITEM 0) block 73625600 (17975) gen 16

The number in the parentheses is blockptr / nodesize.

However this number doesn't really do anything useful.
In fact, for an unaligned metadata block group (block group start bytenr
is not aligned to 16K), the number doesn't even make sense, as it's
rounded down.

The kernel never outputs such a divided result in its print-tree.c
either.

Remove it so later readers won't wonder what the number means.

Signed-off-by: Qu Wenruo 
---
 print-tree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/print-tree.c b/print-tree.c
index a09ecfbb28f0..31f6fa12522f 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -1420,9 +1420,8 @@ void btrfs_print_tree(struct extent_buffer *eb, int 
follow)
btrfs_disk_key_to_cpu(&key, &disk_key);
printf("\t");
btrfs_print_key(&disk_key);
-   printf(" block %llu (%llu) gen %llu\n",
+   printf(" block %llu gen %llu\n",
   (unsigned long long)blocknr,
-  (unsigned long long)blocknr / eb->len,
   (unsigned long long)btrfs_node_ptr_generation(eb, i));
fflush(stdout);
}
-- 
2.18.0
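
To make the rounding mentioned in the changelog concrete, here is a small sketch of the division that the removed output performed. The helper names are made up for illustration and are not part of btrfs-progs. The block number is the one from the example line above; pairing it with a 16384-byte nodesize is a hypothetical case chosen only to show the rounding.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the removed "blocknr / eb->len" expression.
 * Unsigned integer division silently rounds down.
 */
uint64_t blockptr_div_nodesize(uint64_t blocknr, uint64_t nodesize)
{
	assert(nodesize != 0);
	return blocknr / nodesize;
}

/* Bytes dropped by the rounding; non-zero means blocknr is not a multiple. */
uint64_t blockptr_remainder(uint64_t blocknr, uint64_t nodesize)
{
	assert(nodesize != 0);
	return blocknr % nodesize;
}
```

For block 73625600, a 4096-byte nodesize divides exactly and gives the 17975 shown in the example. A 16384-byte nodesize would give 4493 with 12288 bytes left over, so the quotient would no longer identify the block.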



Re: [PATCH v2] Btrfs: remove confusing tracepoint in btrfs_add_reserved_bytes

2018-09-05 Thread Nikolay Borisov



On  5.09.2018 04:55, Liu Bo wrote:
> Here we're not releasing any space, but transferring bytes from
> ->bytes_may_use to ->bytes_reserved.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: Nikolay Borisov 

> ---
> v2: Add missing commit log.
> 
>  fs/btrfs/extent-tree.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 41a02cbb5a4a..76ee5ebef2b9 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -6401,10 +6401,6 @@ static int btrfs_add_reserved_bytes(struct 
> btrfs_block_group_cache *cache,
>   } else {
>   cache->reserved += num_bytes;
>   space_info->bytes_reserved += num_bytes;
> -
> - trace_btrfs_space_reservation(cache->fs_info,
> - "space_info", space_info->flags,
> - ram_bytes, 0);
>   space_info->bytes_may_use -= ram_bytes;
>   if (delalloc)
>   cache->delalloc_bytes += num_bytes;
>