Re: out-of-band dedup status?

2016-12-09 Thread Chris Murphy
On Fri, Dec 9, 2016 at 11:16 AM, Darrick J. Wong wrote:
> [adding mark fasheh (duperemove maintainer) to cc]
>
> On Fri, Dec 09, 2016 at 07:29:21AM -0500, Austin S. Hemmelgarn wrote:
>> On 2016-12-08 21:54, Chris Murphy wrote:
>> >On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong  
>> >wrote:
>> >>On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote:
>> >>>OK something's wrong.
>> >>>
>> >>>Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
>> >>>(mkfs.btrfs -dsingle -msingle, default mount options) and two
>> >>>identical files separately copied.
>> >>>
>> >>>[chris@f25s]$ ls -li /mnt/test
>> >>>total 2811904
>> >>>260 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
>> >>>Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> >>>259 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
>> >>>Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>> >>>
>> >>>[chris@f25s]$ filefrag /mnt/test/*
>> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
>> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found
>> >>>
>> >>>
>> >>>[chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
>> >>>Using 128K blocks
>> >>>Using hash: murmur3
>> >>>Gathering file list...
>> >>>Using 4 threads for file hashing phase
>> >>>[1/2] (50.00%) csum: 
>> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> >>>[2/2] (100.00%) csum: 
>> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>> >>>Total files:  2
>> >>>Total hashes: 21968
>> >>>Loading only duplicated hashes from hashfile.
>> >>>Using 4 threads for dedupe phase
>> >>>[0xba8400] (1/10947) Try to dedupe extents with id e47862ea
>> >>>[0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
>> >>>[0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
>> >>>[0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
>> >>>[0xba8540] Add extent for file
>> >>>"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
>> >>>1182924800 (4)
>> >>>[0xba8540] Add extent for file
>> >>>"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
>> >>>1182924800 (5)
>> >>>[0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
>> >>>131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
>> >>
>> >>Ew, it's deduping these two 1.4GB files 128K at a time, which results in
>> >>12000 ioctl calls.  Each of those 12000 calls has to lock the two
>> >>inodes, read the file contents, remap the blocks, etc.  instead of
>> >>finding the maximal identical range and making a single call for the
>> >>whole range.
>> >>
>> >>That's probably why it's taking forever to dedupe.
>> >
>> >Yes but it looks like it's also heavily fragmenting the files as a
>> >result as well.
>
> I'm not sure why btrfs has that behavior... XFS doesn't do that, and
> evidently there's a bug in ocfs2 such that it sometimes merges records
> and sometimes does not.  Hmm, I'll have to take a second look at ocfs2.

I don't know if it's a kernel regression or a duperemove regression,
but I'm reasonably certain it's a regression because I used kernel
circa 4.6 and duperemove 0.10 in June and it did not do this; or at
the least it was not this verbose with thousands of entries per file
even with -v. I must've deduped 300GiB inside of 30 minutes. So for
two 1.4GiB ISOs to take more than 10 minutes to dedupe is not at all
what I'd expect.




-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: out-of-band dedup status?

2016-12-09 Thread Darrick J. Wong
[adding mark fasheh (duperemove maintainer) to cc]

On Fri, Dec 09, 2016 at 07:29:21AM -0500, Austin S. Hemmelgarn wrote:
> On 2016-12-08 21:54, Chris Murphy wrote:
> >On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong  
> >wrote:
> >>On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote:
> >>>OK something's wrong.
> >>>
> >>>Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
> >>>(mkfs.btrfs -dsingle -msingle, default mount options) and two
> >>>identical files separately copied.
> >>>
> >>>[chris@f25s]$ ls -li /mnt/test
> >>>total 2811904
> >>>260 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
> >>>Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
> >>>259 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
> >>>Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
> >>>
> >>>[chris@f25s]$ filefrag /mnt/test/*
> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found
> >>>
> >>>
> >>>[chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
> >>>Using 128K blocks
> >>>Using hash: murmur3
> >>>Gathering file list...
> >>>Using 4 threads for file hashing phase
> >>>[1/2] (50.00%) csum: 
> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
> >>>[2/2] (100.00%) csum: 
> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
> >>>Total files:  2
> >>>Total hashes: 21968
> >>>Loading only duplicated hashes from hashfile.
> >>>Using 4 threads for dedupe phase
> >>>[0xba8400] (1/10947) Try to dedupe extents with id e47862ea
> >>>[0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
> >>>[0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
> >>>[0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
> >>>[0xba8540] Add extent for file
> >>>"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> >>>1182924800 (4)
> >>>[0xba8540] Add extent for file
> >>>"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> >>>1182924800 (5)
> >>>[0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
> >>>131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
> >>
> >>Ew, it's deduping these two 1.4GB files 128K at a time, which results in
> >>12000 ioctl calls.  Each of those 12000 calls has to lock the two
> >>inodes, read the file contents, remap the blocks, etc.  instead of
> >>finding the maximal identical range and making a single call for the
> >>whole range.
> >>
> >>That's probably why it's taking forever to dedupe.
> >
> >Yes but it looks like it's also heavily fragmenting the files as a
> >result as well.

I'm not sure why btrfs has that behavior... XFS doesn't do that, and
evidently there's a bug in ocfs2 such that it sometimes merges records
and sometimes does not.  Hmm, I'll have to take a second look at ocfs2.

> This kind of reinforces what I've been telling people recently, namely that
> while generic batch deduplication generally works, it's quite often better
> to do a custom tool that understands your data-set and knows how to handle
> it efficiently.
> 
> As an example, one of the cases where I use deduplication is on a set of
> directories that are disjoint sets of a larger tree.  So, the directories
> look something like this:
> + a
> | + file1
> | \ file2
> + b
> | + file3
> | \ file2
> \ c
>   + file1
>   \ file3
> 
> In this case, I know that if a/file1 and c/file1 have the same mtime and
> size, they're (supposed to be) copies of the same file.  Given this, the
> tool I use for this just checks for duplicate names with the same size and
> mtime, and then counts on the ioctl's check to verify that the files are
> actually identical (and throws a warning if they aren't), and does some
> special stuff to submit things such that any given file both has the fewest
> possible number of extents and all the extents are roughly the same size.
> On average, even with the fancy extent size calculation logic, this still
> takes less than a quarter of the time that duperemove took on the same
> data-set.

It sure would be nice if duperemove could group all the files that are
the same size and perform whole-file dedupe on the identical ones
instead of doing everything chunk by chunk, particularly since all three
filesystems can actually handle that case.
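
A minimal sketch of that grouping idea, not duperemove's actual code: bucket files by size, then by a whole-file digest, and hand only the surviving groups to the dedupe ioctl. The function name and the choice of SHA-256 are assumptions made for illustration; the digest is only a hint, since the kernel still compares the ranges itself before sharing anything.

```python
import hashlib
import os
from collections import defaultdict


def whole_file_duplicates(paths, chunk=1 << 20):
    """Return groups of paths whose size and SHA-256 digest both match."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for size, same_size in by_size.items():
        # A unique size cannot have a whole-file duplicate; empty files
        # have nothing to share.
        if size == 0 or len(same_size) < 2:
            continue
        by_digest = defaultdict(list)
        for path in same_size:
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                for block in iter(lambda: handle.read(chunk), b""):
                    digest.update(block)
            by_digest[digest.hexdigest()].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Each returned group could then be submitted as one whole-file request per pair instead of one request per 128K block.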

--D


Re: out-of-band dedup status?

2016-12-09 Thread Austin S. Hemmelgarn

On 2016-12-08 21:54, Chris Murphy wrote:
> On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong wrote:
>> On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote:
>>> OK something's wrong.
>>>
>>> Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
>>> (mkfs.btrfs -dsingle -msingle, default mount options) and two
>>> identical files separately copied.
>>>
>>> [chris@f25s]$ ls -li /mnt/test
>>> total 2811904
>>> 260 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
>>> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>>> 259 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
>>> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>>>
>>> [chris@f25s]$ filefrag /mnt/test/*
>>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
>>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found
>>>
>>>
>>> [chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
>>> Using 128K blocks
>>> Using hash: murmur3
>>> Gathering file list...
>>> Using 4 threads for file hashing phase
>>> [1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>>> [2/2] (100.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>>> Total files:  2
>>> Total hashes: 21968
>>> Loading only duplicated hashes from hashfile.
>>> Using 4 threads for dedupe phase
>>> [0xba8400] (1/10947) Try to dedupe extents with id e47862ea
>>> [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
>>> [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
>>> [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
>>> [0xba8540] Add extent for file
>>> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
>>> 1182924800 (4)
>>> [0xba8540] Add extent for file
>>> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
>>> 1182924800 (5)
>>> [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
>>> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
>>
>> Ew, it's deduping these two 1.4GB files 128K at a time, which results in
>> 12000 ioctl calls.  Each of those 12000 calls has to lock the two
>> inodes, read the file contents, remap the blocks, etc.  instead of
>> finding the maximal identical range and making a single call for the
>> whole range.
>>
>> That's probably why it's taking forever to dedupe.
>
> Yes but it looks like it's also heavily fragmenting the files as a
> result as well.

This kind of reinforces what I've been telling people recently, namely 
that while generic batch deduplication generally works, it's quite often 
better to do a custom tool that understands your data-set and knows how 
to handle it efficiently.


As an example, one of the cases where I use deduplication is on a set of 
directories that are disjoint sets of a larger tree.  So, the 
directories look something like this:

+ a
| + file1
| \ file2
+ b
| + file3
| \ file2
\ c
  + file1
  \ file3

In this case, I know that if a/file1 and c/file1 have the same mtime and 
size, they're (supposed to be) copies of the same file.  Given this, the 
tool I use for this just checks for duplicate names with the same size 
and mtime, and then counts on the ioctl's check to verify that the files 
are actually identical (and throws a warning if they aren't), and does 
some special stuff to submit things such that any given file both has 
the fewest possible number of extents and all the extents are roughly 
the same size.  On average, even with the fancy extent size calculation 
logic, this still takes less than a quarter of the time that duperemove 
took on the same data-set.
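
For illustration only, the candidate-selection step described above might look roughly like the following. This is not the actual tool; the function name is hypothetical, and the extent-size logic is left out entirely. Files are matched purely on (name, size, mtime), on the assumption that the dedupe ioctl's own comparison catches any pair that is not really identical.

```python
import os
import stat
from collections import defaultdict


def candidate_groups(roots):
    """Group regular files under `roots` by (basename, size, mtime)."""
    groups = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                info = os.lstat(path)
                # Skip symlinks, devices and so on; only regular files
                # are candidates for dedupe.
                if not stat.S_ISREG(info.st_mode):
                    continue
                key = (name, info.st_size, info.st_mtime_ns)
                groups[key].append(path)
    # Only keys seen more than once are worth submitting.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

With the a/b/c layout shown earlier, a/file1 and c/file1 end up in one group when their size and mtime agree, and no file contents are read in userspace at all.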



Re: out-of-band dedup status?

2016-12-09 Thread Adam Borowski
On Thu, Dec 08, 2016 at 03:15:38PM -0500, Jeff Mahoney wrote:
> On 12/8/16 1:36 PM, Christoph Anton Mitterer wrote:
> > I just wondered whether out-of-band/"offline" dedup is safe for general
> > use... https://btrfs.wiki.kernel.org/index.php/Status kinda implies so
> > (it tells about unspecified performance issues), but this seems again
> > already outdated (kernel 4.7)...
> 
> SUSE supports it in SLE12 using our 3.12 and 4.4 -based kernels.  There
> haven't been a lot of changes to the kernel component of it.  It's
> pretty simple: check to see if the ranges are identical between two
> files and then reflink between them.
> 
> > Any other things in terms of possible issues, data corruption, etc.
> > that one should know when using deduplication?
> 
> There shouldn't be.  We haven't had any bug reports at SUSE.

I use it on busy machines on ancient kernels (3.14, one 3.13) without any
hint of problems other than dedupe itself being slow.


Meow!
-- 
u-boot problems can be solved with the help of your old SCSI manuals, the
parts that deal with goat termination.  You need a black-handled knife, and
an appropriate set of candles (number and color matters).  Or was it a
silver-handled knife?  Crap, need to look that up.


Re: out-of-band dedup status?

2016-12-09 Thread Adam Borowski
On Thu, Dec 08, 2016 at 07:54:39PM -0700, Chris Murphy wrote:
> On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong  
> wrote:
> > Ew, it's deduping these two 1.4GB files 128K at a time, which results in
> > 12000 ioctl calls.  Each of those 12000 calls has to lock the two
> > inodes, read the file contents, remap the blocks, etc.  instead of
> > finding the maximal identical range and making a single call for the
> > whole range.
> >
> > That's probably why it's taking forever to dedupe.
> 
> Yes but it looks like it's also heavily fragmenting the files as a
> result as well.

Thus I think it's better to do whole-file dedupe only, other than in some
special cases (like VM images).  Much simpler, faster and doesn't cause
fragmentation.

-- 
u-boot problems can be solved with the help of your old SCSI manuals, the
parts that deal with goat termination.  You need a black-handled knife, and
an appropriate set of candles (number and color matters).  Or was it a
silver-handled knife?  Crap, need to look that up.


Re: out-of-band dedup status?

2016-12-08 Thread Chris Murphy
On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong  wrote:
> On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote:
>> OK something's wrong.
>>
>> Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
>> (mkfs.btrfs -dsingle -msingle, default mount options) and two
>> identical files separately copied.
>>
>> [chris@f25s]$ ls -li /mnt/test
>> total 2811904
>> 260 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
>> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> 259 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
>> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>>
>> [chris@f25s]$ filefrag /mnt/test/*
>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found
>>
>>
>> [chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
>> Using 128K blocks
>> Using hash: murmur3
>> Gathering file list...
>> Using 4 threads for file hashing phase
>> [1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> [2/2] (100.00%) csum: 
>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>> Total files:  2
>> Total hashes: 21968
>> Loading only duplicated hashes from hashfile.
>> Using 4 threads for dedupe phase
>> [0xba8400] (1/10947) Try to dedupe extents with id e47862ea
>> [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
>> [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
>> [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
>> [0xba8540] Add extent for file
>> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
>> 1182924800 (4)
>> [0xba8540] Add extent for file
>> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
>> 1182924800 (5)
>> [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
>> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
>
> Ew, it's deduping these two 1.4GB files 128K at a time, which results in
> 12000 ioctl calls.  Each of those 12000 calls has to lock the two
> inodes, read the file contents, remap the blocks, etc.  instead of
> finding the maximal identical range and making a single call for the
> whole range.
>
> That's probably why it's taking forever to dedupe.

Yes but it looks like it's also heavily fragmenting the files as a
result as well.

-- 
Chris Murphy


Re: out-of-band dedup status?

2016-12-08 Thread Darrick J. Wong
On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote:
> OK something's wrong.
> 
> Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
> (mkfs.btrfs -dsingle -msingle, default mount options) and two
> identical files separately copied.
> 
> [chris@f25s]$ ls -li /mnt/test
> total 2811904
> 260 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
> 259 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
> 
> [chris@f25s]$ filefrag /mnt/test/*
> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found
> 
> 
> [chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
> Using 128K blocks
> Using hash: murmur3
> Gathering file list...
> Using 4 threads for file hashing phase
> [1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
> [2/2] (100.00%) csum: 
> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
> Total files:  2
> Total hashes: 21968
> Loading only duplicated hashes from hashfile.
> Using 4 threads for dedupe phase
> [0xba8400] (1/10947) Try to dedupe extents with id e47862ea
> [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
> [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
> [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
> [0xba8540] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 1182924800 (4)
> [0xba8540] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 1182924800 (5)
> [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"

Ew, it's deduping these two 1.4GB files 128K at a time, which results in
12000 ioctl calls.  Each of those 12000 calls has to lock the two
inodes, read the file contents, remap the blocks, etc.  instead of
finding the maximal identical range and making a single call for the
whole range.

That's probably why it's taking forever to dedupe.
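
As a rough check on that figure, using only the file size and block size from the output quoted above (the helper name is made up for illustration):

```python
FILE_SIZE = 1439694848     # bytes per ISO, from the `ls -li` output
BLOCK_SIZE = 128 * 1024    # "Using 128K blocks"


def blocks_per_file(size, block):
    """Number of block-sized chunks needed to cover `size` bytes."""
    return (size + block - 1) // block


per_file = blocks_per_file(FILE_SIZE, BLOCK_SIZE)
print(per_file)        # 10984 blocks in each file
print(per_file * 2)    # 21968, the "Total hashes" reported for both files
```

So a block-at-a-time pass over this pair means on the order of eleven thousand dedupe requests, where a whole-range approach would need far fewer.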

--D

> [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 543293440 (4)
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 543293440 (5)
> [0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
> [0xba8540] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 1182924800 (5)
> [0xba8540] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 1182924800 (4)
> [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2"
> [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 543293440 (5)
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 543293440 (4)
> [0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2"
> [0xba84f0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 101580800 (4)
> [0xba84f0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 101580800 (5)
> [0xba84f0] Dedupe 1 extents (id: ffeefcdd) with target: (101580800,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
> [0xba84a0] (5/10947) Try to dedupe extents with id ffe24eaf
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 171835392 (4)
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 171835392 (5)
> [0xba84a0] Dedupe 1 extents (id: ffe24eaf) with target: (171835392,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
> [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
> [0xba8540] (6/10947) Try to dedupe extents with id ffe116c8
> [0xba8400] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 52035584 (4)
> [0xba8400] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 52035584 (5)
> [0xba8400] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 52166656 (5)
> [0xba8400] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 60030976 (5)
> [0xba8400] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 

Re: out-of-band dedup status?

2016-12-08 Thread Chris Murphy
OK something's wrong.

Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
(mkfs.btrfs -dsingle -msingle, default mount options) and two
identical files separately copied.

[chris@f25s]$ ls -li /mnt/test
total 2811904
260 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
259 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2

[chris@f25s]$ filefrag /mnt/test/*
/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found


[chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
Using 128K blocks
Using hash: murmur3
Gathering file list...
Using 4 threads for file hashing phase
[1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
[2/2] (100.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
Total files:  2
Total hashes: 21968
Loading only duplicated hashes from hashfile.
Using 4 threads for dedupe phase
[0xba8400] (1/10947) Try to dedupe extents with id e47862ea
[0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
[0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
[0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
[0xba8540] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
1182924800 (4)
[0xba8540] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
1182924800 (5)
[0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
[0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
543293440 (4)
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
543293440 (5)
[0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
[0xba8540] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
1182924800 (5)
[0xba8540] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
1182924800 (4)
[0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2"
[0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
543293440 (5)
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
543293440 (4)
[0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2"
[0xba84f0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
101580800 (4)
[0xba84f0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
101580800 (5)
[0xba84f0] Dedupe 1 extents (id: ffeefcdd) with target: (101580800,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
[0xba84a0] (5/10947) Try to dedupe extents with id ffe24eaf
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
171835392 (4)
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
171835392 (5)
[0xba84a0] Dedupe 1 extents (id: ffe24eaf) with target: (171835392,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
[0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
[0xba8540] (6/10947) Try to dedupe extents with id ffe116c8
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
52035584 (4)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
52035584 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
52166656 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60030976 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60162048 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60293120 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60424192 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60555264 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60686336 (5)

[...snip...]

10 minutes later...

[0xba84f0] (06233/10947) Try to dedupe extents with id 703ebf5c
[0xba8400] (06234/10947) Try to dedupe extents with 

Re: out-of-band dedup status?

2016-12-08 Thread Marc Joliet
On Thursday 08 December 2016 13:41:36 Chris Murphy wrote:
> Pretty sure it will not dedupe extents that are referenced in a read
> only subvolume.

I've used duperemove to de-duplicate files in read-only snapshots (of 
different systems) on my backup drive, so unless you're referencing some 
specific issue, I'm pretty sure you're wrong about that.  Maybe you're 
thinking of the occasionally mentioned old dedup kernel implementation?

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: out-of-band dedup status?

2016-12-08 Thread Christoph Anton Mitterer
On Thu, 2016-12-08 at 13:41 -0700, Chris Murphy wrote:
> Pretty sure it will not dedupe extents that are referenced in a read
> only subvolume.

Oh... hm.. well that would be quite some limitation, cause as soon as
one has a snapshot of the full fs (which is probably not so unlikely) it
won't work anymore, cause everything is referenced by the backup ro-
snapshots...

:(


Cheers,
Chris.



Re: out-of-band dedup status?

2016-12-08 Thread Chris Murphy
Pretty sure it will not dedupe extents that are referenced in a read
only subvolume.


Chris Murphy


Re: out-of-band dedup status?

2016-12-08 Thread Jeff Mahoney
On 12/8/16 1:36 PM, Christoph Anton Mitterer wrote:
> Hey.
> 
> I just wondered whether out-of-band/"offline" dedup is safe for general
> use... https://btrfs.wiki.kernel.org/index.php/Status kinda implies so
> (it tells about unspecified performance issues), but this seems again
> already outdated (kernel 4.7)...
> :-(

SUSE supports it in SLE12 using our 3.12 and 4.4 -based kernels.  There
haven't been a lot of changes to the kernel component of it.  It's
pretty simple: check to see if the ranges are identical between two
files and then reflink between them.

> My intention was to use it with duperemove, but AFAIU, the kernel
> itself will anyway do a byte-by-byte comparison before any
> deduplication, so in principle it should be totally safe regardless of
> the stability of the userland tool, right?
> Especially I wouldn't want that "identity" is only assumed because of
> some checksum identity (or collision ;) ).

Yep.  It does a full check in the kernel for precisely that reason.
It's not even enough to do it in userspace because we don't want dedupe
to be race prone.  It's either atomically identical or it's not, and we
don't dedupe if it's not.  If it changes immediately after the ioctl
returns, that's fine -- the cloned range will be CoW'd properly.
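
A minimal userspace sketch of one such request, for readers who want to see the shape of the interface. The request number and struct layout below are assumptions taken from my reading of <linux/fs.h> (FIDEDUPERANGE, with the generic ioctl encoding used on x86-64); the helper names are hypothetical and this has not been run against a real filesystem.

```python
import fcntl
import os
import struct

# Assumed: _IOWR(0x94, 54, struct file_dedupe_range) with generic encoding.
FIDEDUPERANGE = 0xC0189436
FILE_DEDUPE_RANGE_SAME = 0      # ranges were identical and are now shared
FILE_DEDUPE_RANGE_DIFFERS = 1   # kernel comparison found a difference

# struct file_dedupe_range: src_offset, src_length, dest_count, 2x reserved
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped,
# status, reserved
INFO = struct.Struct("=qQQiI")


def build_request(src_offset, length, dest_fd, dest_offset):
    """Pack a header followed by a single destination record."""
    return bytearray(
        HEADER.pack(src_offset, length, 1, 0, 0)
        + INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    )


def dedupe_range(src_path, dest_path, offset, length):
    """Ask the kernel to share one range; returns (status, bytes_deduped)."""
    src_fd = os.open(src_path, os.O_RDONLY)
    dest_fd = os.open(dest_path, os.O_RDWR)
    try:
        buf = build_request(offset, length, dest_fd, offset)
        # The kernel writes status and bytes_deduped back into the buffer.
        fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)
        _fd, _off, bytes_deduped, status, _rsv = INFO.unpack_from(
            buf, HEADER.size)
        return status, bytes_deduped
    finally:
        os.close(src_fd)
        os.close(dest_fd)
```

A status of FILE_DEDUPE_RANGE_DIFFERS means the comparison failed and nothing was shared, which is the property that makes a buggy userspace tool harmless; a negative status carries an errno.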

> Also, is there anything to take note of when this is used with
> compression and snapshots?

I don't believe so.  IIRC dedupe maps the file to see if it's already
cloned, so it's safe for snapshots (or could relink extents in a
snapshot that diverged and then were restored to their original
contents).  Dedupe works with the uncompressed data, so compression
shouldn't matter here.  I haven't tested it, though.

> What when I use it with incremental send/receive... i.e. I dedupe the
> "master" and then send/receive this to another btrfs... will it work
> (that is will the copy be also deduplicated, with no longer needed
> extents properly being freed)... or at least not cause any corruptions?

It should.  IIRC send also maps the file (using a different mechanism)
and receive will clone those ranges on the other end.

> Any other things in terms of possible issues, data corruption, etc.
> that one should know when using deduplication?

There shouldn't be.  We haven't had any bug reports at SUSE.

-Jeff

-- 
Jeff Mahoney
SUSE Labs





out-of-band dedup status?

2016-12-08 Thread Christoph Anton Mitterer
Hey.

I just wondered whether out-of-band/"offline" dedup is safe for general
use... https://btrfs.wiki.kernel.org/index.php/Status kinda implies so
(it tells about unspecified performance issues), but this seems again
already outdated (kernel 4.7)...
:-(

My intention was to use it with duperemove, but AFAIU, the kernel
itself will anyway do a byte-by-byte comparison before any
deduplication, so in principle it should be totally safe regardless of
the stability of the userland tool, right?
Especially I wouldn't want that "identity" is only assumed because of
some checksum identity (or collision ;) ).

Also, is there anything to take note of when this is used with
compression and snapshots?

What when I use it with incremental send/receive... i.e. I dedupe the
"master" and then send/receive this to another btrfs... will it work
(that is will the copy be also deduplicated, with no longer needed
extents properly being freed)... or at least not cause any corruptions?

Any other things in terms of possible issues, data corruption, etc.
that one should know when using deduplication?


Thanks :)

Chris.
