On Tue, Oct 20, 2015 at 04:29:46PM +0300, Timofey Titovets wrote:
> For performance reason, leave data at the start of disk, is preferable
> while deduping
> It's might sense for the reasons:
> 1. Spinning rust - start of the disk is much faster
> 2. Btrfs can deallocate empty data chunk from the end of fs - ie it's compact 
> fs

"src" is the extent that is kept, and "dst" is the extent that is
discarded.  When both extents are shared, the dedup userspace has to
pass a common "src" with many different "dst" over several extent-same
calls in order to get rid of all of the references to the "dst" extent.

If "src" and "dst" are arbitrarily swapped over multiple extent-same calls
then it becomes impossible to dedup shared extents.  Heck, if there are
more than two extents even in one extent-same call then it stops working.

It would be possible to have dedup figure out which extent the kernel
picked after the fact, but that's totally unnecessary extra work in
cases where the userspace has a good reason to pick the extents it did
(e.g. administrator hints about future usage of the files where the
extents were found).

Dedup userspace can figure out the physical addresses of the extents
and rearrange the arguments itself if desired.

> Signed-off-by: Timofey Titovets <nefelim...@gmail.com>
> ---
>  fs/btrfs/ioctl.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 3e3e613..3eb77c0 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3074,8 +3074,13 @@ static int btrfs_extent_same(struct inode *src,
> u64 loff, u64 olen,
> 
>   /* pass original length for comparison so we stay within i_size */
>   ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> - if (ret == 0)
> -     ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> + if (ret == 0) {
> +     /* prefer inode with lowest offset as source for clone*/
> +     if (loff > dest_loff)
> +         ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 1);
> +     else
> +         ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> + }
> 
>   if (same_inode)
>   unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> -- 
> 2.6.1

> From 5ed3822bc308c726d91a837fbd97ebacaa51e58d Mon Sep 17 00:00:00 2001
> From: Timofey Titovets <nefelim...@gmail.com>
> Date: Tue, 20 Oct 2015 15:53:20 +0300
> Subject: [RFC PATCH] btrfs/ioctl.c: Prefer inode with lowest offset as source 
> for
>  clone
> 
> For performance reason, leave data at the start of disk, is preferable
> 
> Signed-off-by: Timofey Titovets <nefelim...@gmail.com>
> ---
>  fs/btrfs/ioctl.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 3e3e613..3eb77c0 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3074,8 +3074,13 @@ static int btrfs_extent_same(struct inode *src, u64 
> loff, u64 olen,
>  
>       /* pass original length for comparison so we stay within i_size */
>       ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> -     if (ret == 0)
> -             ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> +     if (ret == 0) {
> +             /* prefer inode with lowest offset as source for clone*/
> +             if (loff > dest_loff)
> +                     ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 
> 1);
> +             else
> +                     ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 
> 1);
> +     }
>  
>       if (same_inode)
>               unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> -- 
> 2.6.1
> 

Attachment: signature.asc
Description: Digital signature

Reply via email to