On Mon, Jul 25, 2016 at 8:19 AM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
> This patch will disable clone detection of send.
>
> The main problem of clone detetion in send is its memory usage and long
> execution time.
>
> The clone detection is done by iterating all backrefs and adding backref
> whose root is the source.
>
> However iterating all backrefs is already quite a bad idea, we should
> never try to do it in a loop, and unfortunately in-band/out-of-band and
> reflink can easily create a file whose file extents are point to the
> same extent.
>
> In that case, btrfs will do backref walk for the same extent again and
> again, until either OOM or soft lockup is triggered.
>
> So disabling clone detection until we find a method that iterates
> backrefs of one extent only once, just like what balance/qgroup is doing.

Is this really a common scenario?
I don't recall ever seeing a report of such slow down or oom due to
the same file pointing to the same extent thousands of times.
Sure it's easy to create such scenario with artificial test data for
our inband deduplication feature (and we should really strive to make
what we have stable and reliable rather than add yet more features).

What needs to be fixed here is the common backref walking code, called
by send plus a few other places (fiemap and some ioctls at least), for
which IIRC there was
some recent patch from one of your colleagues.

Disabling this code makes the problem go away for the scenario of the
same file pointing to the same extent thousands of times (or tens or
hundreds of thousands, whatever).
But what happens if a file points to an extent once but the extent is
referenced by other 100k files (possibly in different snapshots or
subvolumes), isn't it the same problem? The backref code has to find
all such inodes/offsets/roots and take again the same long time...

Following your logic, it seems like everything that uses the backref
walking code should be disabled.

>
> Cc: Filipe Manana <fdman...@gmail.com>
> Reported-by: Tsutomu Itoh <t-i...@jp.fujitsu.com>
> Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
> ---
>  fs/btrfs/send.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 2db8dc8..eed3f1c 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -1166,6 +1166,7 @@ struct backref_ctx {
>         int found_itself;
>  };
>
> +#if 0
>  static int __clone_root_cmp_bsearch(const void *key, const void *elt)
>  {
>         u64 root = (u64)(uintptr_t)key;
> @@ -1177,6 +1178,7 @@ static int __clone_root_cmp_bsearch(const void *key, 
> const void *elt)
>                 return 1;
>         return 0;
>  }
> +#endif
>
>  static int __clone_root_cmp_sort(const void *e1, const void *e2)
>  {
> @@ -1190,6 +1192,7 @@ static int __clone_root_cmp_sort(const void *e1, const 
> void *e2)
>         return 0;
>  }
>
> +#if 0
>  /*
>   * Called for every backref that is found for the current extent.
>   * Results are collected in sctx->clone_roots->ino/offset/found_refs
> @@ -1445,6 +1448,7 @@ out:
>         kfree(backref_ctx);
>         return ret;
>  }
> +#endif
>
>  static int read_symlink(struct btrfs_root *root,
>                         u64 ino,
> @@ -5291,7 +5295,6 @@ static int process_extent(struct send_ctx *sctx,
>                           struct btrfs_path *path,
>                           struct btrfs_key *key)
>  {
> -       struct clone_root *found_clone = NULL;
>         int ret = 0;
>
>         if (S_ISLNK(sctx->cur_inode_mode))
> @@ -5333,12 +5336,27 @@ static int process_extent(struct send_ctx *sctx,
>                 }
>         }
>
> +       /*
> +        * Current clone detection is both time and memory consuming.
> +        *
> +        * Time consuming is caused by iterating all backref of extent.
> +        * Memory consuming is caused by allocating "found_clone" every
> +        * time for a backref.
> +        *
> +        * XXX: Disabling it is never the best method, but at least it
> +        * won't cause OOM nor super long execution time.
> +        * The root fix needs to change the iteration basis, from iterating
> +        * file extents to iterating extents, so find_parent_nodes() and
> +        * backref walk should be called only once for one extent.
> +        */
> +#if 0
>         ret = find_extent_clone(sctx, path, key->objectid, key->offset,
>                         sctx->cur_inode_size, &found_clone);
>         if (ret != -ENOENT && ret < 0)
>                 goto out;
> +#endif
>
> -       ret = send_write_or_clone(sctx, path, key, found_clone);
> +       ret = send_write_or_clone(sctx, path, key, NULL);
>         if (ret)
>                 goto out;
>  out_hole:
> --
> 2.9.0
>
>
>



-- 
Filipe David Manana,

"People will forget what you said,
 people will forget what you did,
 but people will never forget how you made them feel."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to