Re: out-of-band dedup status?
On Fri, Dec 9, 2016 at 11:16 AM, Darrick J. Wongwrote: > [adding mark fasheh (duperemove maintainer) to cc] > > On Fri, Dec 09, 2016 at 07:29:21AM -0500, Austin S. Hemmelgarn wrote: >> On 2016-12-08 21:54, Chris Murphy wrote: >> >On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong >> >wrote: >> >>On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote: >> >>>OK something's wrong. >> >>> >> >>>Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system >> >>>(mkfs.btrfs -dsingle -msingle, default mount options) and two >> >>>identical files separately copied. >> >>> >> >>>[chris@f25s]$ ls -li /mnt/test >> >>>total 2811904 >> >>>260 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 >> >>>Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso >> >>>259 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 >> >>>Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 >> >>> >> >>>[chris@f25s]$ filefrag /mnt/test/* >> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found >> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found >> >>> >> >>> >> >>>[chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/* >> >>>Using 128K blocks >> >>>Using hash: murmur3 >> >>>Gathering file list... >> >>>Using 4 threads for file hashing phase >> >>>[1/2] (50.00%) csum: >> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso >> >>>[2/2] (100.00%) csum: >> >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 >> >>>Total files: 2 >> >>>Total hashes: 21968 >> >>>Loading only duplicated hashes from hashfile. 
>> >>>Using 4 threads for dedupe phase
>> >>>[0xba8400] (1/10947) Try to dedupe extents with id e47862ea
>> >>>[0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
>> >>>[0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
>> >>>[0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
>> >>>[0xba8540] Add extent for file
>> >>>"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
>> >>>1182924800 (4)
>> >>>[0xba8540] Add extent for file
>> >>>"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
>> >>>1182924800 (5)
>> >>>[0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
>> >>>131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
>> >>
>> >>Ew, it's deduping these two 1.4GB files 128K at a time, which results in
>> >>12000 ioctl calls. Each of those 12000 calls has to lock the two
>> >>inodes, read the file contents, remap the blocks, etc. instead of
>> >>finding the maximal identical range and making a single call for the
>> >>whole range.
>> >>
>> >>That's probably why it's taking forever to dedupe.
>> >
>> >Yes but it looks like it's also heavily fragmenting the files as a
>> >result as well.
>
> I'm not sure why btrfs has that behavior... XFS doesn't do that, and
> evidently there's a bug in ocfs2 such that it sometimes merges records
> and sometimes does not. Hmm, I'll have to take a second look at ocfs2.

I don't know if it's a kernel regression or a duperemove regression, but I'm reasonably certain it's a regression because I used kernel circa 4.6 and duperemove 0.10 in June and it did not do this; or at the least it was not this verbose with thousands of entries per file even with -v.

I must've deduped 300GiB inside of 30 minutes. So for two 1.4GiB ISOs to take more than 10 minutes to dedupe is not at all what I'd expect.
-- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: out-of-band dedup status?
[adding mark fasheh (duperemove maintainer) to cc] On Fri, Dec 09, 2016 at 07:29:21AM -0500, Austin S. Hemmelgarn wrote: > On 2016-12-08 21:54, Chris Murphy wrote: > >On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong> >wrote: > >>On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote: > >>>OK something's wrong. > >>> > >>>Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system > >>>(mkfs.btrfs -dsingle -msingle, default mount options) and two > >>>identical files separately copied. > >>> > >>>[chris@f25s]$ ls -li /mnt/test > >>>total 2811904 > >>>260 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 > >>>Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso > >>>259 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 > >>>Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 > >>> > >>>[chris@f25s]$ filefrag /mnt/test/* > >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found > >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found > >>> > >>> > >>>[chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/* > >>>Using 128K blocks > >>>Using hash: murmur3 > >>>Gathering file list... > >>>Using 4 threads for file hashing phase > >>>[1/2] (50.00%) csum: > >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso > >>>[2/2] (100.00%) csum: > >>>/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 > >>>Total files: 2 > >>>Total hashes: 21968 > >>>Loading only duplicated hashes from hashfile. 
> >>>Using 4 threads for dedupe phase > >>>[0xba8400] (1/10947) Try to dedupe extents with id e47862ea > >>>[0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2 > >>>[0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd > >>>[0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64 > >>>[0xba8540] Add extent for file > >>>"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset > >>>1182924800 (4) > >>>[0xba8540] Add extent for file > >>>"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset > >>>1182924800 (5) > >>>[0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800, > >>>131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" > >> > >>Ew, it's deduping these two 1.4GB files 128K at a time, which results in > >>12000 ioctl calls. Each of those 12000 calls has to lock the two > >>inodes, read the file contents, remap the blocks, etc. instead of > >>finding the maximal identical range and making a single call for the > >>whole range. > >> > >>That's probably why it's taking forever to dedupe. > > > >Yes but it looks like it's also heavily fragmenting the files as a > >result as well. I'm not sure why btrfs has that behavior... XFS doesn't do that, and evidently there's a bug in ocfs2 such that it sometimes merges records and sometimes does not. Hmm, I'll have to take a second look at ocfs2. > This kind of reinforces what I've been telling people recently, namely that > while generic batch deduplication generally works, it's quite often better > to do a custom tool that understands your data-set and knows how to handle > it efficiently. > > As an example, one of the cases where I use deduplication is on a set of > directories that are disjoint sets of a larger tree. 
So, the directories
> look something like this:
> + a
> | + file1
> | \ file2
> + b
> | + file3
> | \ file2
> \ c
>   + file1
>   \ file3
>
> In this case, I know that if a/file1 and c/file1 have the same mtime and
> size, they're (supposed to be) copies of the same file. Given this, the
> tool I use for this just checks for duplicate names with the same size and
> mtime, and then counts on the ioctl's check to verify that the files are
> actually identical (and throws a warning if they aren't), and does some
> special stuff to submit things such that any given file both has the fewest
> possible number of extents and all the extents are roughly the same size.
> On average, even with the fancy extent size calculation logic, this still
> takes less than a quarter of the time that duperemove took on the same
> data-set.

It sure would be nice if duperemove could group all the files that are the same size and perform whole-file dedupe on the identical ones instead of doing everything chunk by chunk, particularly since all three filesystems can actually handle that case.

--D
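The grouping suggested above (bucket files by size, then treat only byte-identical ones as whole-file dedupe candidates) can be sketched roughly as follows. This is not how duperemove works internally; the function name `whole_file_duplicates` and the choice of SHA-256 are assumptions made for illustration, and the sketch only finds the candidates, it does not issue any dedupe request.

```python
import hashlib
import os
from collections import defaultdict


def whole_file_duplicates(paths, chunk_size=1 << 20):
    """Return sorted lists of paths whose full contents are identical.

    Hypothetical helper: files are bucketed by size first, so a file is
    only read and hashed if at least one other file has the same size.
    """
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for size, same_size in by_size.items():
        if size == 0 or len(same_size) < 2:
            continue  # empty files and unique sizes have nothing to share
        by_digest = defaultdict(list)
        for path in same_size:
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in fixed-size chunks so large ISOs do not fill memory.
                for block in iter(lambda: f.read(chunk_size), b""):
                    digest.update(block)
            by_digest[digest.hexdigest()].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return sorted(groups)
```

Each returned group could then be handed to the dedupe ioctl as one whole-file range per pair, with the kernel's own comparison acting as the final check.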
Re: out-of-band dedup status?
On 2016-12-08 21:54, Chris Murphy wrote: On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wongwrote: On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote: OK something's wrong. Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system (mkfs.btrfs -dsingle -msingle, default mount options) and two identical files separately copied. [chris@f25s]$ ls -li /mnt/test total 2811904 260 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso 259 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 [chris@f25s]$ filefrag /mnt/test/* /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found [chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/* Using 128K blocks Using hash: murmur3 Gathering file list... Using 4 threads for file hashing phase [1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso [2/2] (100.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 Total files: 2 Total hashes: 21968 Loading only duplicated hashes from hashfile. Using 4 threads for dedupe phase [0xba8400] (1/10947) Try to dedupe extents with id e47862ea [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2 [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64 [0xba8540] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset 1182924800 (4) [0xba8540] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 1182924800 (5) [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800, 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" Ew, it's deduping these two 1.4GB files 128K at a time, which results in 12000 ioctl calls. Each of those 12000 calls has to lock the two inodes, read the file contents, remap the blocks, etc. 
instead of finding the maximal identical range and making a single call for the whole range.

That's probably why it's taking forever to dedupe.

Yes but it looks like it's also heavily fragmenting the files as a result as well.

This kind of reinforces what I've been telling people recently, namely that while generic batch deduplication generally works, it's quite often better to do a custom tool that understands your data-set and knows how to handle it efficiently.

As an example, one of the cases where I use deduplication is on a set of directories that are disjoint sets of a larger tree. So, the directories look something like this:

+ a
| + file1
| \ file2
+ b
| + file3
| \ file2
\ c
  + file1
  \ file3

In this case, I know that if a/file1 and c/file1 have the same mtime and size, they're (supposed to be) copies of the same file. Given this, the tool I use for this just checks for duplicate names with the same size and mtime, and then counts on the ioctl's check to verify that the files are actually identical (and throws a warning if they aren't), and does some special stuff to submit things such that any given file both has the fewest possible number of extents and all the extents are roughly the same size. On average, even with the fancy extent size calculation logic, this still takes less than a quarter of the time that duperemove took on the same data-set.
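The candidate-selection step described above (same name, same size, same mtime) might look something like the sketch below. The actual tool is not shown in this thread, so the function name `candidate_sets` and its return shape are hypothetical; the extent-size logic and the dedupe call itself are left out.

```python
import os
import stat
from collections import defaultdict


def candidate_sets(root):
    """Group regular files under root by (basename, size, mtime in ns).

    Hypothetical helper: only keys shared by two or more files are kept,
    because only those are worth submitting to the dedupe ioctl.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path, follow_symlinks=False)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and other non-regular files
            groups[(name, st.st_size, st.st_mtime_ns)].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

No file contents are read here, which is presumably where most of the time saving comes from; correctness still rests on the kernel refusing to share ranges that differ.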
Re: out-of-band dedup status?
On Thu, Dec 08, 2016 at 03:15:38PM -0500, Jeff Mahoney wrote:
> On 12/8/16 1:36 PM, Christoph Anton Mitterer wrote:
> > I just wondered whether out-of-band/"offline" dedup is safe for general
> > use... https://btrfs.wiki.kernel.org/index.php/Status kinda implies so
> > (it tells about unspecified performance issues), but this seems again
> > already outdated (kernel 4.7)...
>
> SUSE supports it in SLE12 using our 3.12 and 4.4 -based kernels. There
> haven't been a lot of changes to the kernel component of it. It's
> pretty simple: check to see if the ranges are identical between two
> files and then reflink between them.
>
> > Any other things in terms of possible issues, data corruption, etc.
> > that one should know when using deduplication?
>
> There shouldn't be. We haven't had any bug reports at SUSE.

I use it on busy machines on ancient kernels (3.14, one 3.13) without any hint of problems other than dedupe itself being slow.

Meow!
--
u-boot problems can be solved with the help of your old SCSI manuals, the parts that deal with goat termination. You need a black-handled knife, and an appropriate set of candles (number and color matters). Or was it a silver-handled knife? Crap, need to look that up.
Re: out-of-band dedup status?
On Thu, Dec 08, 2016 at 07:54:39PM -0700, Chris Murphy wrote:
> On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong wrote:
> > Ew, it's deduping these two 1.4GB files 128K at a time, which results in
> > 12000 ioctl calls. Each of those 12000 calls has to lock the two
> > inodes, read the file contents, remap the blocks, etc. instead of
> > finding the maximal identical range and making a single call for the
> > whole range.
> >
> > That's probably why it's taking forever to dedupe.
>
> Yes but it looks like it's also heavily fragmenting the files as a
> result as well.

Thus I think it's better to do whole-file dedupe only, other than in some special cases (like VM images). Much simpler, faster and doesn't cause fragmentation.

--
u-boot problems can be solved with the help of your old SCSI manuals, the parts that deal with goat termination. You need a black-handled knife, and an appropriate set of candles (number and color matters). Or was it a silver-handled knife? Crap, need to look that up.
Re: out-of-band dedup status?
On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wongwrote: > On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote: >> OK something's wrong. >> >> Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system >> (mkfs.btrfs -dsingle -msingle, default mount options) and two >> identical files separately copied. >> >> [chris@f25s]$ ls -li /mnt/test >> total 2811904 >> 260 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 >> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso >> 259 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 >> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 >> >> [chris@f25s]$ filefrag /mnt/test/* >> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found >> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found >> >> >> [chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/* >> Using 128K blocks >> Using hash: murmur3 >> Gathering file list... >> Using 4 threads for file hashing phase >> [1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso >> [2/2] (100.00%) csum: >> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 >> Total files: 2 >> Total hashes: 21968 >> Loading only duplicated hashes from hashfile. >> Using 4 threads for dedupe phase >> [0xba8400] (1/10947) Try to dedupe extents with id e47862ea >> [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2 >> [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd >> [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64 >> [0xba8540] Add extent for file >> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset >> 1182924800 (4) >> [0xba8540] Add extent for file >> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset >> 1182924800 (5) >> [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800, >> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" > > Ew, it's deduping these two 1.4GB files 128K at a time, which results in > 12000 ioctl calls. 
Each of those 12000 calls has to lock the two
> inodes, read the file contents, remap the blocks, etc. instead of
> finding the maximal identical range and making a single call for the
> whole range.
>
> That's probably why it's taking forever to dedupe.

Yes but it looks like it's also heavily fragmenting the files as a result as well.

--
Chris Murphy
Re: out-of-band dedup status?
On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote: > OK something's wrong. > > Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system > (mkfs.btrfs -dsingle -msingle, default mount options) and two > identical files separately copied. > > [chris@f25s]$ ls -li /mnt/test > total 2811904 > 260 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 > Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso > 259 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 > Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 > > [chris@f25s]$ filefrag /mnt/test/* > /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found > /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found > > > [chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/* > Using 128K blocks > Using hash: murmur3 > Gathering file list... > Using 4 threads for file hashing phase > [1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso > [2/2] (100.00%) csum: > /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 > Total files: 2 > Total hashes: 21968 > Loading only duplicated hashes from hashfile. > Using 4 threads for dedupe phase > [0xba8400] (1/10947) Try to dedupe extents with id e47862ea > [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2 > [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd > [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64 > [0xba8540] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset > 1182924800 (4) > [0xba8540] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset > 1182924800 (5) > [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800, > 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" Ew, it's deduping these two 1.4GB files 128K at a time, which results in 12000 ioctl calls. Each of those 12000 calls has to lock the two inodes, read the file contents, remap the blocks, etc. 
instead of finding the maximal identical range and making a single call for the whole range. That's probably why it's taking forever to dedupe. --D > [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64 > [0xba84a0] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset > 543293440 (4) > [0xba84a0] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset > 543293440 (5) > [0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440, > 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" > [0xba8540] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset > 1182924800 (5) > [0xba8540] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset > 1182924800 (4) > [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800, > 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" > [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2 > [0xba84a0] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset > 543293440 (5) > [0xba84a0] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset > 543293440 (4) > [0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440, > 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" > [0xba84f0] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset > 101580800 (4) > [0xba84f0] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset > 101580800 (5) > [0xba84f0] Dedupe 1 extents (id: ffeefcdd) with target: (101580800, > 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" > [0xba84a0] (5/10947) Try to dedupe extents with id ffe24eaf > [0xba84a0] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset > 171835392 (4) > [0xba84a0] Add 
extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset > 171835392 (5) > [0xba84a0] Dedupe 1 extents (id: ffe24eaf) with target: (171835392, > 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" > [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd > [0xba8540] (6/10947) Try to dedupe extents with id ffe116c8 > [0xba8400] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset > 52035584 (4) > [0xba8400] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset > 52035584 (5) > [0xba8400] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset > 52166656 (5) > [0xba8400] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset > 60030976 (5) > [0xba8400] Add extent for file > "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset >
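The call-count estimate above can be checked against the numbers in the quoted output. The file size and the 128K block size come straight from the `ls -li` and duperemove output; the 16 MiB request length is only an assumed, illustrative per-call cap and is not taken from this thread.

```python
FILE_SIZE = 1439694848          # bytes, from the `ls -li` output
BLOCK_SIZE = 128 * 1024         # "Using 128K blocks"
ASSUMED_CAP = 16 * 1024 * 1024  # assumption: an illustrative per-request limit


def dedupe_calls(file_size, request_len):
    """Number of dedupe requests needed to cover file_size bytes."""
    return -(-file_size // request_len)  # integer ceiling division


blocks_per_file = dedupe_calls(FILE_SIZE, BLOCK_SIZE)
print(blocks_per_file)                       # 10984 blocks of 128K per file
print(2 * blocks_per_file)                   # 21968, matching "Total hashes: 21968"
print(dedupe_calls(FILE_SIZE, ASSUMED_CAP))  # 86 requests at 16 MiB each
print(dedupe_calls(FILE_SIZE, FILE_SIZE))    # 1 request if the whole range is accepted
```

So block-by-block dedupe needs on the order of 11000 requests per file pair (the log counts 10947 duplicated hashes), while extent-sized or whole-file requests would need a few dozen at most.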
Re: out-of-band dedup status?
OK something's wrong. Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system (mkfs.btrfs -dsingle -msingle, default mount options) and two identical files separately copied. [chris@f25s]$ ls -li /mnt/test total 2811904 260 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso 259 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26 Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 [chris@f25s]$ filefrag /mnt/test/* /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found [chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/* Using 128K blocks Using hash: murmur3 Gathering file list... Using 4 threads for file hashing phase [1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso [2/2] (100.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2 Total files: 2 Total hashes: 21968 Loading only duplicated hashes from hashfile. 
Using 4 threads for dedupe phase [0xba8400] (1/10947) Try to dedupe extents with id e47862ea [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2 [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64 [0xba8540] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset 1182924800 (4) [0xba8540] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 1182924800 (5) [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800, 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64 [0xba84a0] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset 543293440 (4) [0xba84a0] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 543293440 (5) [0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440, 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" [0xba8540] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 1182924800 (5) [0xba8540] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset 1182924800 (4) [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800, 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2 [0xba84a0] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 543293440 (5) [0xba84a0] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset 543293440 (4) [0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440, 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" [0xba84f0] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset 101580800 (4) [0xba84f0] Add extent 
for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 101580800 (5) [0xba84f0] Dedupe 1 extents (id: ffeefcdd) with target: (101580800, 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" [0xba84a0] (5/10947) Try to dedupe extents with id ffe24eaf [0xba84a0] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset 171835392 (4) [0xba84a0] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 171835392 (5) [0xba84a0] Dedupe 1 extents (id: ffe24eaf) with target: (171835392, 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd [0xba8540] (6/10947) Try to dedupe extents with id ffe116c8 [0xba8400] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset 52035584 (4) [0xba8400] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 52035584 (5) [0xba8400] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 52166656 (5) [0xba8400] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 60030976 (5) [0xba8400] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 60162048 (5) [0xba8400] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 60293120 (5) [0xba8400] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 60424192 (5) [0xba8400] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 60555264 (5) [0xba8400] Add extent for file "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset 60686336 (5) [...snip...] 10 minutes later... [0xba84f0] (06233/10947) Try to dedupe extents with id 703ebf5c [0xba8400] (06234/10947) Try to dedupe extents with
Re: out-of-band dedup status?
On Thursday 08 December 2016 13:41:36 Chris Murphy wrote:
> Pretty sure it will not dedupe extents that are referenced in a read
> only subvolume.

I've used duperemove to de-duplicate files in read-only snapshots (of different systems) on my backup drive, so unless you're referencing some specific issue, I'm pretty sure you're wrong about that. Maybe you're thinking of the occasionally mentioned old dedup kernel implementation?

--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup
Re: out-of-band dedup status?
On Thu, 2016-12-08 at 13:41 -0700, Chris Murphy wrote:
> Pretty sure it will not dedupe extents that are referenced in a read
> only subvolume.

Oh... hm.. well that would be quite some limitation, cause as soon as one has a snapshot of the full fs (which is probably not so unlikely) it won't work anymore, cause everything is referenced by the backup ro-snapshots... :(

Cheers,
Chris.
Re: out-of-band dedup status?
Pretty sure it will not dedupe extents that are referenced in a read only subvolume.

Chris Murphy
Re: out-of-band dedup status?
On 12/8/16 1:36 PM, Christoph Anton Mitterer wrote:
> Hey.
>
> I just wondered whether out-of-band/"offline" dedup is safe for general
> use... https://btrfs.wiki.kernel.org/index.php/Status kinda implies so
> (it tells about unspecified performance issues), but this seems again
> already outdated (kernel 4.7)...
> :-(

SUSE supports it in SLE12 using our 3.12 and 4.4-based kernels. There haven't been a lot of changes to the kernel component of it. It's pretty simple: check to see if the ranges are identical between two files and then reflink between them.

> My intention was to use it with duperemove, but AFAIU, the kernel
> itself will anyway do a byte-by-byte comparison before any
> deduplication, so in principle it should be totally safe regardless of
> the stability of the userland tool, right?
> Especially I wouldn't want that "identity" is only assumed because of
> some checksum identity (or collision ;) ).

Yep. It does a full check in the kernel for precisely that reason. It's not even enough to do it in userspace because we don't want dedupe to be race prone. It's either atomically identical or it's not, and we don't dedupe if it's not. If it changes immediately after the ioctl returns, that's fine -- the cloned range will be CoW'd properly.

> Also, is there anything to take note of when this is used with
> compression and snapshots?

I don't believe so. IIRC dedupe maps the file to see if it's already cloned, so it's safe for snapshots (or could relink extents in a snapshot that diverged and then were restored to their original contents). Dedupe works with the uncompressed data, so compression shouldn't matter here. I haven't tested it, though.

> What when I use it with incremental send/receive... i.e. I dedupe the
> "master" and then send/receive this to another btrfs... will it work
> (that is will the copy be also deduplicated, with no longer needed
> extents properly being freed)... or at least not cause any corruptions?

It should. IIRC send also maps the file (using a different mechanism) and receive will clone those ranges on the other end.

> Any other things in terms of possible issues, data corruption, etc.
> that one should know when using deduplication?

There shouldn't be. We haven't had any bug reports at SUSE.

-Jeff

--
Jeff Mahoney
SUSE Labs
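For reference, the "compare in the kernel, then share" step described above is exposed to userspace as the FIDEDUPERANGE ioctl on current kernels (btrfs originally offered it as BTRFS_IOC_FILE_EXTENT_SAME). Below is a minimal sketch of a single-destination request from Python. The helper names `iowr` and `dedupe_range` are made up for illustration, and the ioctl number is computed with the common `_IOWR` encoding, which is an assumption that holds on x86 and ARM but not on every architecture.

```python
import fcntl
import struct

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def iowr(ioc_type, nr, size):
    """_IOWR() using the generic Linux encoding (assumed; differs on e.g. MIPS)."""
    return (3 << 30) | (size << 16) | (ioc_type << 8) | nr


FIDEDUPERANGE = iowr(0x94, 54, HEADER.size)


def dedupe_range(src_fd, src_off, length, dest_fd, dest_off):
    """Ask the kernel to share one range; returns (bytes_deduped, status).

    status 0 means the ranges matched and were shared, 1 means the data
    differed and nothing was changed, and a negative value is -errno.
    """
    buf = bytearray(
        HEADER.pack(src_off, length, 1, 0, 0)
        + INFO.pack(dest_fd, dest_off, 0, 0, 0)
    )
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)  # the kernel fills in the result fields
    _fd, _off, bytes_deduped, status, _reserved = INFO.unpack_from(buf, HEADER.size)
    return bytes_deduped, status
```

A caller would open both files, pass the source descriptor as `src_fd` and a writable destination descriptor as `dest_fd`, and loop while `bytes_deduped` is smaller than the requested length, since the kernel may process less than was asked for in one call.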
out-of-band dedup status?
Hey.

I just wondered whether out-of-band/"offline" dedup is safe for general use... https://btrfs.wiki.kernel.org/index.php/Status kinda implies so (it tells about unspecified performance issues), but this seems again already outdated (kernel 4.7)... :-(

My intention was to use it with duperemove, but AFAIU, the kernel itself will anyway do a byte-by-byte comparison before any deduplication, so in principle it should be totally safe regardless of the stability of the userland tool, right? Especially I wouldn't want that "identity" is only assumed because of some checksum identity (or collision ;) ).

Also, is there anything to take note of when this is used with compression and snapshots?

What when I use it with incremental send/receive... i.e. I dedupe the "master" and then send/receive this to another btrfs... will it work (that is will the copy be also deduplicated, with no longer needed extents properly being freed)... or at least not cause any corruptions?

Any other things in terms of possible issues, data corruption, etc. that one should know when using deduplication?

Thanks :)
Chris.