Re: Announcing btrfs-dedupe 1.1.0
Yeah, OK — can you create an account on my source tracker and open an issue for this please?

https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe

I am fairly sure I can fix this without too much difficulty. ;-)

James

On 26/01/17 18:16, Robert Krig wrote:

I've tried your binaries, which also seem to work fine on Debian Stretch (at least using the latest Ubuntu xenial binary). I've only run into one little issue: btrfs-dedupe will abort with "Serialization error: invalid value: Path contains invalid UTF-8 characters at line 0 column 0" if I run it on some large top-level directories. Unfortunately it doesn't list which directory it has a problem with. Wouldn't it be better if btrfs-dedupe simply ignored directories it has a problem with and continued with the rest?

On 13.01.2017 20:08, James Pharaoh wrote:

Did you try the binaries? I can build binaries for other platforms if you let me know which you are interested in.

In any case, you'll need to install Rust:

https://www.rust-lang.org/install.html

which will tell you to do this on Linux, and presumably all Unix platforms:

    curl https://sh.rustup.rs -sSf | sh

You can either log out and in again or reload your profile to get the installed software into your PATH:

    source ~/.profile

Then you can check out btrfs-dedupe, e.g. from my public GitLab over HTTPS (I'll assume you have git installed):

    git clone https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe.git

Then cd in and build using cargo:

    cd btrfs-dedupe
    cargo build --release

There is basically just one binary, which will end up in target/release/btrfs-dedupe. I'll add these instructions to the README later.

James

On 13/01/17 13:56, Robert Krig wrote:

Hi, could you include some build instructions for people who are unfamiliar with compiling Rust code?

On 08.01.2017 17:57, James Pharaoh wrote:

Hi everyone,

I'm pleased to announce a new version of my btrfs-dedupe tool, written in Rust, available here:

http://btrfs-dedupe.com/

Binary packages built on Ubuntu (they will probably work elsewhere, but I haven't tried this) are available at:

https://dist.wellbehavedsoftware.com/btrfs-dedupe/

This version is considered ready for production use. It maintains a compressed database of the filesystem state, tracking file metadata, file content hashes and extent-map contents in order to work out what needs to be deduplicated.

This is a whole-file deduplication tool, similar to bedup, but since it is written in Rust and designed to work with the dedupe ioctl, I think it is more suitable for production use.

As normal for open source, this comes without any warranty etc., but the only updates are performed via the defragment and deduplication ioctls, so assuming they work correctly this should not cause any corruption.

Please feel free to contact me with any questions/problems.
Re: Announcing btrfs-dedupe 1.1.0
I've tried your binaries, which also seem to work fine on Debian Stretch (at least using the latest Ubuntu xenial binary).

I've only run into one little issue: btrfs-dedupe will abort with "Serialization error: invalid value: Path contains invalid UTF-8 characters at line 0 column 0" if I run it on some large top-level directories. Unfortunately it doesn't list which directory it has a problem with. Wouldn't it be better if btrfs-dedupe simply ignored directories it has a problem with and continued with the rest?

On 13.01.2017 20:08, James Pharaoh wrote:
> Did you try the binaries? I can build binaries for other platforms if
> you let me know which you are interested in.
>
> In any case, you'll need to install Rust:
>
>     https://www.rust-lang.org/install.html
>
> which will tell you to do this on Linux, and presumably all Unix platforms:
>
>     curl https://sh.rustup.rs -sSf | sh
>
> You can either log out and in again or reload your profile to get the
> installed software into your PATH:
>
>     source ~/.profile
>
> Then you can check out btrfs-dedupe, e.g. from my public GitLab over HTTPS
> (I'll assume you have git installed):
>
>     git clone https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe.git
>
> Then cd in and build using cargo:
>
>     cd btrfs-dedupe
>     cargo build --release
>
> There is basically just one binary, which will end up in
> target/release/btrfs-dedupe.
>
> I'll add these instructions to the README later.
>
> James
>
> On 13/01/17 13:56, Robert Krig wrote:
>> Hi, could you include some build instructions for people who are
>> unfamiliar with compiling Rust code?
>>
>> On 08.01.2017 17:57, James Pharaoh wrote:
>>> Hi everyone,
>>>
>>> I'm pleased to announce a new version of my btrfs-dedupe tool, written
>>> in Rust, available here:
>>>
>>> http://btrfs-dedupe.com/
>>>
>>> Binary packages built on Ubuntu (they will probably work elsewhere, but
>>> I haven't tried this) are available at:
>>>
>>> https://dist.wellbehavedsoftware.com/btrfs-dedupe/
>>>
>>> This version is considered ready for production use. It maintains a
>>> compressed database of the filesystem state, tracking file metadata,
>>> file content hashes and extent-map contents in order to work out what
>>> needs to be deduplicated.
>>>
>>> This is a whole-file deduplication tool, similar to bedup, but since
>>> it is written in Rust and designed to work with the dedupe ioctl, I
>>> think it is more suitable for production use.
>>>
>>> As normal for open source, this comes without any warranty etc., but
>>> the only updates are performed via the defragment and deduplication
>>> ioctls, so assuming they work correctly this should not cause any
>>> corruption.
>>>
>>> Please feel free to contact me with any questions/problems.
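One way a scanner can behave the way Robert suggests is to check each file name before it reaches the serializer and skip it (with a warning naming the path) if it is not well-formed UTF-8, rather than aborting the whole run. The following is a minimal, self-contained C sketch of that skip-and-continue idea — not code from btrfs-dedupe — checked against paths passed on the command line:

#include <stdio.h>
#include <string.h>

/* Strict UTF-8 check: rejects stray continuation bytes, overlong forms,
 * UTF-16 surrogates and code points above U+10FFFF. */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;
        unsigned long cp, min;
        if (c < 0x80) { i++; continue; }
        if      ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;                        /* invalid lead byte */
        if (i + need >= len) return 0;        /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF) return 0;      /* overlong or out of range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate */
        i += need + 1;
    }
    return 1;
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        const char *path = argv[i];
        if (!is_valid_utf8((const unsigned char *)path, strlen(path))) {
            /* A real scanner would log this and carry on with the next entry. */
            fprintf(stderr, "skipping non-UTF-8 path: %s\n", path);
            continue;
        }
        printf("ok to serialize: %s\n", path);
    }
    return 0;
}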
Re: Announcing btrfs-dedupe 1.1.0
Did you try the binaries? I can build binaries for other platforms if you let me know which you are interested in.

In any case, you'll need to install Rust:

https://www.rust-lang.org/install.html

which will tell you to do this on Linux, and presumably all Unix platforms:

    curl https://sh.rustup.rs -sSf | sh

You can either log out and in again or reload your profile to get the installed software into your PATH:

    source ~/.profile

Then you can check out btrfs-dedupe, e.g. from my public GitLab over HTTPS (I'll assume you have git installed):

    git clone https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe.git

Then cd in and build using cargo:

    cd btrfs-dedupe
    cargo build --release

There is basically just one binary, which will end up in target/release/btrfs-dedupe. I'll add these instructions to the README later.

James

On 13/01/17 13:56, Robert Krig wrote:

Hi, could you include some build instructions for people who are unfamiliar with compiling Rust code?

On 08.01.2017 17:57, James Pharaoh wrote:

Hi everyone,

I'm pleased to announce a new version of my btrfs-dedupe tool, written in Rust, available here:

http://btrfs-dedupe.com/

Binary packages built on Ubuntu (they will probably work elsewhere, but I haven't tried this) are available at:

https://dist.wellbehavedsoftware.com/btrfs-dedupe/

This version is considered ready for production use. It maintains a compressed database of the filesystem state, tracking file metadata, file content hashes and extent-map contents in order to work out what needs to be deduplicated.

This is a whole-file deduplication tool, similar to bedup, but since it is written in Rust and designed to work with the dedupe ioctl, I think it is more suitable for production use.

As normal for open source, this comes without any warranty etc., but the only updates are performed via the defragment and deduplication ioctls, so assuming they work correctly this should not cause any corruption.

Please feel free to contact me with any questions/problems.
Re: Announcing btrfs-dedupe 1.1.0
Hi, could you include some build instructions for people that are unfamiliar with compiling rust code? On 08.01.2017 17:57, James Pharaoh wrote: > Hi everyone, > > I'm pleased to announce a new version of my btrfs-dedupe tool, written > in rust, available here: > > http://btrfs-dedupe.com/ > > Binary packages built on ubuntu (probably will work elsewhere, but > haven't tried this), are available at: > > https://dist.wellbehavedsoftware.com/btrfs-dedupe/ > > This version is considered ready for production use. It maintains a > compressed database of the filesystem state, and it tracks file > metadata, hashes file contents, and the extent-map contents, in order > to work out what needs to be deduplicated. > > This is a whole-file deduplication tool, similar to bedup, but since > it is written in Rust, and designed to work with the dedupe ioctl, I > think it's more suitable for production use. > > As normal for open source, this comes without any warranty etc, but > the only updates are performed via the defragment and deduplication > ioctls, and so assuming they work correctly then this should not cause > any corruption. > > Please feel free to contact me with any questions/problems. > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe 1.1.0
It's supposed to be public! I will have to look into that.

In any case, it's also on GitHub here:

https://github.com/wellbehavedsoftware/btrfs-dedupe

James

On 08/01/17 22:22, j...@mailb.org wrote:

hey,

On 01/08/2017 05:57 PM, James Pharaoh wrote:
> As normal for open source

where is the source?

https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe does not have a way to browse the code.

    git clone https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe

asks for:

    Username for 'https://gitlab.wellbehavedsoftware.com':

the site has a tooltip suggesting otherwise:

j
Announcing btrfs-dedupe 1.1.0
Hi everyone,

I'm pleased to announce a new version of my btrfs-dedupe tool, written in Rust, available here:

http://btrfs-dedupe.com/

Binary packages built on Ubuntu (they will probably work elsewhere, but I haven't tried this) are available at:

https://dist.wellbehavedsoftware.com/btrfs-dedupe/

This version is considered ready for production use. It maintains a compressed database of the filesystem state, tracking file metadata, file content hashes and extent-map contents in order to work out what needs to be deduplicated.

This is a whole-file deduplication tool, similar to bedup, but since it is written in Rust and designed to work with the dedupe ioctl, I think it is more suitable for production use.

As normal for open source, this comes without any warranty etc., but the only updates are performed via the defragment and deduplication ioctls, so assuming they work correctly this should not cause any corruption.

Please feel free to contact me with any questions/problems.
Re: Announcing btrfs-dedupe
On Thursday 17 November 2016 04:01:52 CET, Zygo Blaxell wrote:
> Duperemove does use a lot of memory, but the logs at that URL only show
> 2G of RAM in duperemove--not nearly enough to trigger OOM under normal
> conditions on an 8G machine. There's another process with 6G of virtual
> address space (although much less than that resident) that looks more
> interesting (i.e. duperemove might just be the victim of some interaction
> between baloo_file and the OOM killer).

Thanks, I killed baloo_file before starting duperemove and it somehow improved (it reached 99.73% before getting killed by the OOM killer once again):

[ 6342.147251] Purging GPU memory, 0 pages freed, 18268 pages still pinned.
[ 6342.147253] 48 and 0 pages still available in the bound and unbound GPU page lists.
[ 6342.147340] Xorg invoked oom-killer: gfp_mask=0x240c0d0(GFP_TEMPORARY|__GFP_COMP|__GFP_ZERO), order=3, oom_score_adj=0
[ 6342.147341] Xorg cpuset=/ mems_allowed=0
[ 6342.147346] CPU: 3 PID: 650 Comm: Xorg Not tainted 4.8.8-2-ARCH #1
[ 6342.147347] Hardware name: Dell Inc. XPS 13 9343/0F5KF3, BIOS A09 08/29/2016
[ 6342.147348] 0286 9b89a9c8 88020752f598 812fde10
[ 6342.147351] 88020752f758 8801edc62ac0 88020752f608 81205fa2
[ 6342.147353] 000188020752f5a0 9b89a9c8
[ 6342.147356] Call Trace:
[ 6342.147361] [] dump_stack+0x63/0x83
[ 6342.147364] [] dump_header+0x5c/0x1ea
[ 6342.147366] [] oom_kill_process+0x265/0x410
[ 6342.147368] [] ? has_capability_noaudit+0x17/0x20
[ 6342.147369] [] out_of_memory+0x380/0x420
[ 6342.147373] [] ? find_next_bit+0x18/0x20
[ 6342.147374] [] __alloc_pages_nodemask+0xda0/0xde0
[ 6342.147377] [] alloc_pages_current+0x95/0x140
[ 6342.147380] [] kmalloc_order_trace+0x2e/0xf0
[ 6342.147382] [] __kmalloc+0x1ea/0x200
[ 6342.147397] [] ? alloc_gen8_temp_bitmaps+0x2e/0x80 [i915]
[ 6342.147407] [] alloc_gen8_temp_bitmaps+0x47/0x80 [i915]
[ 6342.147417] [] gen8_alloc_va_range_3lvl+0x98/0x9c0 [i915]
[ 6342.147419] [] ? shmem_getpage_gfp+0xed/0xc30
[ 6342.147421] [] ? sg_init_table+0x1a/0x40
[ 6342.147423] [] ? swiotlb_map_sg_attrs+0x53/0x130
[ 6342.147432] [] gen8_alloc_va_range+0x256/0x490 [i915]
[ 6342.147442] [] i915_vma_bind+0x9b/0x190 [i915]
[ 6342.147453] [] i915_gem_object_do_pin+0x86b/0xa90 [i915]
[ 6342.147463] [] i915_gem_object_pin+0x2d/0x30 [i915]
[ 6342.147472] [] i915_gem_execbuffer_reserve_vma.isra.7+0x9f/0x180 [i915]
[ 6342.147482] [] i915_gem_execbuffer_reserve.isra.8+0x396/0x3c0 [i915]
[ 6342.147491] [] i915_gem_do_execbuffer.isra.14+0x68b/0x1270 [i915]
[ 6342.147493] [] ? unix_stream_read_generic+0x281/0x8a0
[ 6342.147503] [] i915_gem_execbuffer2+0x104/0x270 [i915]
[ 6342.147509] [] drm_ioctl+0x200/0x4f0 [drm]
[ 6342.147518] [] ? i915_gem_execbuffer+0x330/0x330 [i915]
[ 6342.147520] [] ? enqueue_hrtimer+0x3d/0xa0
[ 6342.147522] [] ? timerqueue_del+0x24/0x70
[ 6342.147523] [] ? __remove_hrtimer+0x3c/0x90
[ 6342.147525] [] do_vfs_ioctl+0xa3/0x5f0
[ 6342.147527] [] ? do_setitimer+0x12b/0x230
[ 6342.147529] [] ? __fget+0x77/0xb0
[ 6342.147531] [] SyS_ioctl+0x79/0x90
[ 6342.147533] [] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 6342.147535] Mem-Info:
[ 6342.147538] active_anon:76311 inactive_anon:76782 isolated_anon:0 active_file:347581 inactive_file:1415592 isolated_file:64 unevictable:8 dirty:482 writeback:0 unstable:0 slab_reclaimable:27219 slab_unreclaimable:14772 mapped:20714 shmem:30458 pagetables:10557 bounce:0 free:25642 free_pcp:327 free_cma:0
[ 6342.147541] Node 0 active_anon:305244kB inactive_anon:307128kB active_file:1390324kB inactive_file:5662368kB unevictable:32kB isolated(anon):0kB isolated(file):256kB mapped:82856kB dirty:1928kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 81920kB anon_thp: 121832kB writeback_tmp:0kB unstable:0kB pages_scanned:32 all_unreclaimable? no
[ 6342.147542] Node 0 DMA free:15688kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15896kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:208kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 6342.147545] lowmem_reserve[]: 0 3395 7850 7850 7850
[ 6342.147548] Node 0 DMA32 free:48772kB min:29172kB low:36464kB high:43756kB active_anon:84724kB inactive_anon:87164kB active_file:555728kB inactive_file:2639796kB unevictable:0kB writepending:696kB present:3564504kB managed:3488752kB mlocked:0kB slab_reclaimable:47196kB slab_unreclaimable:11472kB kernel_stack:192kB pagetables:200kB bounce:0kB free_pcp:1284kB local_pcp:0kB free_cma:0kB
[ 6342.147553] lowmem_reserve[]: 0 0 4454 4454 4454
[ 6342.147555] Node 0 Normal free:38108kB min:38276kB low:47844kB high:57412kB active_anon:220520kB inactive_anon:219964kB active_file:834596kB inactive_
Re: Announcing btrfs-dedupe
On Wed, Nov 16, 2016 at 11:24:33PM +0100, Niccolò Belli wrote: > On martedì 15 novembre 2016 18:52:01 CET, Zygo Blaxell wrote: > >Like I said, millions of extents per week... > > > >64K is an enormous dedup block size, especially if it comes with a 64K > >alignment constraint as well. > > > >These are the top ten duplicate block sizes from a sample of 95251 > >dedup ops on a medium-sized production server with 4TB of filesystem > >(about one machine-day of data): > > Which software do you use to dedupe your data? I tried duperemove but it > gets killed by the OOM killer because it triggers some kind of memory leak: > https://github.com/markfasheh/duperemove/issues/163 Duperemove does use a lot of memory, but the logs at that URL only show 2G of RAM in duperemove--not nearly enough to trigger OOM under normal conditions on an 8G machine. There's another process with 6G of virtual address space (although much less than that resident) that looks more interesting (i.e. duperemove might just be the victim of some interaction between baloo_file and the OOM killer). On the other hand, the logs also show kernel 4.8. 100% of my test machines failed to finish booting before they were cut down by OOM on 4.7.x kernels. The same problem occurs on early kernels in the 4.8.x series. I am having good results with 4.8.6 and later, but you should be aware that significant changes have been made to the way OOM works in these kernel versions, and maybe you're hitting a regression for your use case. > Niccolò Belli > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: Digital signature
Re: Announcing btrfs-dedupe
On martedì 15 novembre 2016 18:52:01 CET, Zygo Blaxell wrote: Like I said, millions of extents per week... 64K is an enormous dedup block size, especially if it comes with a 64K alignment constraint as well. These are the top ten duplicate block sizes from a sample of 95251 dedup ops on a medium-sized production server with 4TB of filesystem (about one machine-day of data): Which software do you use to dedupe your data? I tried duperemove but it gets killed by the OOM killer because it triggers some kind of memory leak: https://github.com/markfasheh/duperemove/issues/163 Niccolò Belli -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On Tue, Nov 15, 2016 at 07:26:53AM -0500, Austin S. Hemmelgarn wrote:
> On 2016-11-14 16:10, Zygo Blaxell wrote:
> > Why is deduplicating thousands of blocks of data crazy? I already
> > deduplicate four orders of magnitude more than that per week.
> You missed the 'tiny' quantifier. I'm talking really small blocks, on the
> order of less than 64k (so, IOW, stuff that's not much bigger than a few
> filesystem blocks), and that is somewhat crazy because it ends up not only
> taking _really_ long to do compared to larger chunks (because you're running
> more independent hashes than with bigger blocks), but also because it will
> often split extents unnecessarily and contribute to fragmentation, which
> will lead to all kinds of other performance problems on the FS.

Like I said, millions of extents per week...

64K is an enormous dedup block size, especially if it comes with a 64K alignment constraint as well.

These are the top ten duplicate block sizes from a sample of 95251 dedup ops on a medium-sized production server with 4TB of filesystem (about one machine-day of data):

    total bytes    extent count    dup size
     2750808064           20987      131072
      803733504            1533      524288
      123801600             975      126976
      103575552            8429       12288
       97443840             793      122880
       82051072           10016        8192
       77492224           18919        4096
       71331840             645      110592
       64143360             540      118784
       63897600             650       98304

      all bytes     all extents    average dup size
     6129995776           95251               64356

128K and 512K are the most common sizes due to btrfs compression (it limits the block size to 128K for compressed extents and seems to limit uncompressed extents to 512K for some reason). 12K is #4, and 3 of the top ten sizes are below 16K. The average size is just a little below 64K.

These are the duplicates with block sizes smaller than 64K:

    total bytes    extent count    extent size
       41615360             635          65536
       46264320             753          61440
       45817856             799          57344
       41267200             775          53248
       45760512             931          49152
       46948352            1042          45056
       43417600            1060          40960
       47296512            1283          36864
       59277312            1809          32768
       49029120            1710          28672
       43745280            1780          24576
       53616640            2618          20480
       43466752            2653          16384
      103575552            8429          12288
       82051072           10016           8192
       77492224           18919           4096

    all bytes <=64K    extents <=64K    average dup size <=64K
          870641664            55212                     15769

14% of my duplicate bytes are in blocks smaller than 64K or blocks not aligned to a 64K boundary within a file. It's too large a space saving to ignore on machines that have constrained storage.

It may be worthwhile skipping 4K and 8K dedups--at 250 ms per dedup, they're 30% of the total run time and only 2.6% of the total dedup bytes. On the other hand, this machine is already deduping everything fast enough to keep up with new data, so there's no performance problem to solve here.
Re: Announcing btrfs-dedupe
On 2016-11-14 16:10, Zygo Blaxell wrote: On Mon, Nov 14, 2016 at 02:56:51PM -0500, Austin S. Hemmelgarn wrote: On 2016-11-14 14:51, Zygo Blaxell wrote: Deduplicating an extent that may might be concurrently modified during the dedup is a reasonable userspace request. In the general case there's no way for userspace to ensure that it's not happening. I'm not even talking about the locking, I'm talking about the data comparison that the ioctl does to ensure they are the same before deduplicating them, and specifically that protecting against userspace just passing in two random extents that happen to be the same size but not contain the same data (because deduplication _should_ reject such a situation, that's what the clone ioctl is for). If I'm deduping a VM image, and the virtual host is writing to said image (which is likely since an incremental dedup will be intentionally doing dedup over recently active data sets), the extent I just compared in userspace might be different by the time the kernel sees it. This is an important reason why the whole lock/read/compare/replace step is an atomic operation from userspace's PoV. The read also saves having to confirm a short/weak hash isn't a collision. The RAM savings from using weak hashes (~48 bits) are a huge performance win. The locking overhead is very small compared to the reading overhead, and (in the absence of bugs) it will only block concurrent writes to the same offset range in the src/dst inodes (based on a read of the code...I don't know if there's also an inode-level or backref-level barrier that expands the locking scope). I'm not arguing that it's a bad thing that the kernel is doing this, I'm just saying that the locking overhead is minuscule in most cases compared to the data comparison. It is absolutely necessary for exactly the reasons you are outlining. I'm not sure the ioctl is well designed for simply throwing random data at it, especially not entire files (it can't handle files over 16MB anyway). It will read more data than it has to compared to a block-by-block comparison from userspace with prefetches or a pair of IO threads. If userspace reads both copies of the data just before issuing the extent-same call, the kernel will read the data from cache reasonably quickly. It still depends on the use case to a certain extent. In the case I was using as an example, I know to a reasonably certain degree (barring tampering, bugs, or hardware failure) that any two files are identical, and I actually don't want to trash the page-cache just to deduplicate data faster (he data set in question is large, but most of it is idle at any given point in time), so there's no point in me prereading everything in userspace, which in turn makes the script I use much simpler (the most complex part is figuring out how to split extents for files bigger than the ioctl can handle such that I don't have tiny tail extents but still have a minimum number per file). The locking is perfectly reasonable and shouldn't contribute that much to the overhead (unless you're being crazy and deduplicating thousands of tiny blocks of data). Why is deduplicating thousands of blocks of data crazy? I already deduplicate four orders of magnitude more than that per week. You missed the 'tiny' quantifier. 
I'm talking really small blocks, on the order of less than 64k (so, IOW, stuff that's not much bigger than a few filesystem blocks), and that is somewhat crazy because it ends up not only taking _really_ long to do compared to larger chunks (because you're running more independent hashes than with bigger blocks), but also because it will often split extents unnecessarily and contribute to fragmentation, which will lead to all kinds of other performance problems on the FS.
Re: Announcing btrfs-dedupe
On Mon, Nov 14, 2016 at 09:07:51PM +0100, James Pharaoh wrote: > On 14/11/16 20:51, Zygo Blaxell wrote: > >On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote: > >>On 2016-11-14 13:22, James Pharaoh wrote: > >>>One thing I am keen to understand is if BTRFS will automatically ignore > >>>a request to deduplicate a file if it is already deduplicated? Given the > >>>performance I see when doing a repeat deduplication, it seems to me that > >>>it can't be doing so, although this could be caused by the CPU usage you > >>>mention above. > >> > >>What's happening is that the dedupe ioctl does a byte-wise comparison of the > >>ranges to make sure they're the same before linking them. This is actually > >>what takes most of the time when calling the ioctl, and is part of why it > >>takes longer the larger the range to deduplicate is. In essence, it's > >>behaving like an OS should and not trusting userspace to make reasonable > >>requests (which is also why there's a separate ioctl to clone a range from > >>another file instead of deduplicating existing data). > > > > - the extent-same ioctl could check to see which extents > > are referenced by the src and dst ranges, and return success > > immediately without reading data if they are the same (but > > userspace should already know this, or it's wasting a huge amount > > of time before it even calls the kernel). > > Yes, this is what I am talking about. I believe I should be able to read > data about the BTRFS data structures and determine if this is the case. I > don't care if there are false matches, due to concurrent updates, but > there'll be a /lot/ of repeat deduplications unless I do this, because even > if the file is identical, the mtime etc hasn't changed, and I have a record > of previously doing a dedupe, there's no guarantee that the file hasn't been > rewritten in place (eg by rsync), and no way that I know of to reliably > detect if a file has been changed. > > I am sure there are libraries out there which can look into the data > structures of a BTRFS file system, I haven't researched this in detail > though. I imagine that with some kind of lock on a BTRFS root, this could be > achieved by simply reading the data from the disk, since I believe that > everything is copy-on-write, so no existing data should be overwritten until > all roots referring to it are updated. Perhaps I'm missing something > though... FIEMAP (VFS) and SEARCH_V2 (btrfs-specific) will both give you access to the underlying physical block numbers. SEARCH_V2 is non-trivial to use without reverse-engineering significant parts of btrfs-progs. SEARCH_V2 is a generic tree-searching tool which will give you all kinds of information about btrfs structures...it's essential for a sophisticated deduplicator and overkill for a simple one. For full-file dedup using FIEMAP you only need to look at the "physical" field of the first extent (if it's zero or the same as the other file, the files cannot be deduplicated or are already deduplicated, respectively). The source for 'filefrag' (from e2fsprogs) is good for learning how FIEMAP works. For block-level dedup you need to look at each extent individually. That's much slower and full of additional caveats. If you're going down that road it's probably better to just improve duperemove instead. > James signature.asc Description: Digital signature
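As a concrete illustration of the FIEMAP approach described above — a plain-C sketch against the kernel's uapi headers, not code from any of the tools discussed — the call below asks for a single extent starting at offset zero and prints its fe_physical field. If two files were already deduplicated as whole files they would report the same physical address for their first extents; differing addresses mean the files are not currently sharing that extent.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fiemap.h>   /* struct fiemap, struct fiemap_extent */
#include <linux/fs.h>       /* FS_IOC_FIEMAP */

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Room for the header plus a single extent record. */
    struct fiemap *fm = calloc(1, sizeof(*fm) + sizeof(struct fiemap_extent));
    if (!fm) { perror("calloc"); return 1; }

    fm->fm_start = 0;
    fm->fm_length = ~0ULL;                 /* map the whole file */
    fm->fm_flags = FIEMAP_FLAG_SYNC;       /* flush delalloc so extents are real */
    fm->fm_extent_count = 1;               /* only the first extent matters here */

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) { perror("FS_IOC_FIEMAP"); return 1; }

    if (fm->fm_mapped_extents == 0) {
        printf("%s: no mapped extents (empty or entirely sparse)\n", argv[1]);
    } else {
        const struct fiemap_extent *fe = &fm->fm_extents[0];
        printf("%s: first extent logical=%llu physical=%llu length=%llu\n",
               argv[1],
               (unsigned long long)fe->fe_logical,
               (unsigned long long)fe->fe_physical,
               (unsigned long long)fe->fe_length);
    }

    free(fm);
    close(fd);
    return 0;
}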
Re: Announcing btrfs-dedupe
On Mon, Nov 14, 2016 at 02:56:51PM -0500, Austin S. Hemmelgarn wrote: > On 2016-11-14 14:51, Zygo Blaxell wrote: > >Deduplicating an extent that may might be concurrently modified during the > >dedup is a reasonable userspace request. In the general case there's > >no way for userspace to ensure that it's not happening. > I'm not even talking about the locking, I'm talking about the data > comparison that the ioctl does to ensure they are the same before > deduplicating them, and specifically that protecting against userspace just > passing in two random extents that happen to be the same size but not > contain the same data (because deduplication _should_ reject such a > situation, that's what the clone ioctl is for). If I'm deduping a VM image, and the virtual host is writing to said image (which is likely since an incremental dedup will be intentionally doing dedup over recently active data sets), the extent I just compared in userspace might be different by the time the kernel sees it. This is an important reason why the whole lock/read/compare/replace step is an atomic operation from userspace's PoV. The read also saves having to confirm a short/weak hash isn't a collision. The RAM savings from using weak hashes (~48 bits) are a huge performance win. The locking overhead is very small compared to the reading overhead, and (in the absence of bugs) it will only block concurrent writes to the same offset range in the src/dst inodes (based on a read of the code...I don't know if there's also an inode-level or backref-level barrier that expands the locking scope). I'm not sure the ioctl is well designed for simply throwing random data at it, especially not entire files (it can't handle files over 16MB anyway). It will read more data than it has to compared to a block-by-block comparison from userspace with prefetches or a pair of IO threads. If userspace reads both copies of the data just before issuing the extent-same call, the kernel will read the data from cache reasonably quickly. > The locking is perfectly reasonable and shouldn't contribute that much to > the overhead (unless you're being crazy and deduplicating thousands of tiny > blocks of data). Why is deduplicating thousands of blocks of data crazy? I already deduplicate four orders of magnitude more than that per week. > >That said, some optimization is possible (although there are good reasons > >not to bother with optimization in the kernel): > > > > - VFS could recognize when it has two separate references to > > the same physical extent and not re-read the same data twice > > (but that requires teaching VFS how to do CoW in general, and is > > hard for political reasons on top of the obvious technical ones). > > > > - the extent-same ioctl could check to see which extents > > are referenced by the src and dst ranges, and return success > > immediately without reading data if they are the same (but > > userspace should already know this, or it's wasting a huge amount > > of time before it even calls the kernel). > > > >>TBH, even though it's kind of annoying from a performance perspective, it's > >>a rather nice safety net to have. For example, one of the cases where I do > >>deduplication is a couple of directories where each directory is an > >>overlapping partial subset of one large tree which I keep elsewhere. 
In > >>this case, I can tell just by filename exactly what files might be > >>duplicates, so the ioctl's check lets me just call the ioctl on all > >>potential duplicates (after checking size, no point in wasting time if the > >>files obviously aren't duplicates), and have it figure out whether or not > >>they can be deduplicated. > >>> > >>>In any case, I'm considering some digging into the filesystem structures > >>>to see if I can work this out myself before i do any deduplication. I'm > >>>fairly sure this should be relatively simple to work out, at least well > >>>enough for my purposes. > >>Sadly, there's no way to avoid doing so right now. > >> > >>-- > >>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > >>the body of a message to majord...@vger.kernel.org > >>More majordomo info at http://vger.kernel.org/majordomo-info.html > signature.asc Description: Digital signature
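To put a rough number on the weak-hash saving mentioned above: because the kernel's extent-same ioctl re-reads and byte-compares the ranges before sharing anything, a dedup index only has to nominate candidates, so a short truncated hash is enough and a false match merely costs one extra comparison. A sketch of such an index record follows (FNV-1a truncated to 48 bits purely for illustration; this is not the hash any particular tool uses, and the block size is an assumption):

#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a, truncated to 48 bits.  Collisions are acceptable: a false
 * match only triggers an extra byte-by-byte comparison, which the kernel's
 * extent-same ioctl performs before sharing extents in any case. */
static uint64_t weak_hash48(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL;    /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;               /* FNV prime */
    }
    return h & 0xFFFFFFFFFFFFULL;            /* keep the low 48 bits */
}

/* One index entry: 6 bytes of hash plus a 64-bit block address packs into
 * 14 bytes, versus 40 bytes for a full SHA-256 digest plus address. */
struct dedup_entry {
    unsigned char hash48[6];
    uint64_t      block_addr;
} __attribute__((packed));

static void fill_entry(struct dedup_entry *e, const void *block, size_t len,
                       uint64_t block_addr)
{
    uint64_t h = weak_hash48(block, len);
    for (int i = 0; i < 6; i++)
        e->hash48[i] = (unsigned char)(h >> (8 * i));
    e->block_addr = block_addr;
}

int main(void)
{
    char block[4096] = "example block contents";   /* assumed 4KiB block */
    struct dedup_entry e;
    fill_entry(&e, block, sizeof(block), 123456789ULL);
    printf("entry size: %zu bytes, hash48=%012llx\n",
           sizeof(e), (unsigned long long)weak_hash48(block, sizeof(block)));
    return 0;
}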
Re: Announcing btrfs-dedupe
On 14/11/16 20:51, Zygo Blaxell wrote: On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote: On 2016-11-14 13:22, James Pharaoh wrote: One thing I am keen to understand is if BTRFS will automatically ignore a request to deduplicate a file if it is already deduplicated? Given the performance I see when doing a repeat deduplication, it seems to me that it can't be doing so, although this could be caused by the CPU usage you mention above. >> What's happening is that the dedupe ioctl does a byte-wise comparison of the ranges to make sure they're the same before linking them. This is actually what takes most of the time when calling the ioctl, and is part of why it takes longer the larger the range to deduplicate is. In essence, it's behaving like an OS should and not trusting userspace to make reasonable requests (which is also why there's a separate ioctl to clone a range from another file instead of deduplicating existing data). - the extent-same ioctl could check to see which extents are referenced by the src and dst ranges, and return success immediately without reading data if they are the same (but userspace should already know this, or it's wasting a huge amount of time before it even calls the kernel). Yes, this is what I am talking about. I believe I should be able to read data about the BTRFS data structures and determine if this is the case. I don't care if there are false matches, due to concurrent updates, but there'll be a /lot/ of repeat deduplications unless I do this, because even if the file is identical, the mtime etc hasn't changed, and I have a record of previously doing a dedupe, there's no guarantee that the file hasn't been rewritten in place (eg by rsync), and no way that I know of to reliably detect if a file has been changed. I am sure there are libraries out there which can look into the data structures of a BTRFS file system, I haven't researched this in detail though. I imagine that with some kind of lock on a BTRFS root, this could be achieved by simply reading the data from the disk, since I believe that everything is copy-on-write, so no existing data should be overwritten until all roots referring to it are updated. Perhaps I'm missing something though... James -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On 2016-11-14 14:51, Zygo Blaxell wrote: On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote: On 2016-11-14 13:22, James Pharaoh wrote: One thing I am keen to understand is if BTRFS will automatically ignore a request to deduplicate a file if it is already deduplicated? Given the performance I see when doing a repeat deduplication, it seems to me that it can't be doing so, although this could be caused by the CPU usage you mention above. What's happening is that the dedupe ioctl does a byte-wise comparison of the ranges to make sure they're the same before linking them. This is actually what takes most of the time when calling the ioctl, and is part of why it takes longer the larger the range to deduplicate is. In essence, it's behaving like an OS should and not trusting userspace to make reasonable requests (which is also why there's a separate ioctl to clone a range from another file instead of deduplicating existing data). Deduplicating an extent that may might be concurrently modified during the dedup is a reasonable userspace request. In the general case there's no way for userspace to ensure that it's not happening. I'm not even talking about the locking, I'm talking about the data comparison that the ioctl does to ensure they are the same before deduplicating them, and specifically that protecting against userspace just passing in two random extents that happen to be the same size but not contain the same data (because deduplication _should_ reject such a situation, that's what the clone ioctl is for). The locking is perfectly reasonable and shouldn't contribute that much to the overhead (unless you're being crazy and deduplicating thousands of tiny blocks of data). That said, some optimization is possible (although there are good reasons not to bother with optimization in the kernel): - VFS could recognize when it has two separate references to the same physical extent and not re-read the same data twice (but that requires teaching VFS how to do CoW in general, and is hard for political reasons on top of the obvious technical ones). - the extent-same ioctl could check to see which extents are referenced by the src and dst ranges, and return success immediately without reading data if they are the same (but userspace should already know this, or it's wasting a huge amount of time before it even calls the kernel). TBH, even though it's kind of annoying from a performance perspective, it's a rather nice safety net to have. For example, one of the cases where I do deduplication is a couple of directories where each directory is an overlapping partial subset of one large tree which I keep elsewhere. In this case, I can tell just by filename exactly what files might be duplicates, so the ioctl's check lets me just call the ioctl on all potential duplicates (after checking size, no point in wasting time if the files obviously aren't duplicates), and have it figure out whether or not they can be deduplicated. In any case, I'm considering some digging into the filesystem structures to see if I can work this out myself before i do any deduplication. I'm fairly sure this should be relatively simple to work out, at least well enough for my purposes. Sadly, there's no way to avoid doing so right now. 
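On the chunk-splitting point above: the extent-same ioctl only accepts a limited range per call (about 16 MiB on the kernels discussed in this thread), so large files have to be carved into several requests, and naive fixed-size chunking produces exactly the kind of tiny tail extent mentioned. One way around it is to pick the chunk count first and then spread the length evenly; a small illustrative sketch, where the 16 MiB cap is an assumption:

#include <stdint.h>
#include <stdio.h>

#define DEDUPE_MAX_CHUNK (16ULL * 1024 * 1024)   /* assumed per-call limit */

/* Split [0, file_size) into the fewest chunks that all fit under the limit,
 * sized as evenly as possible so there is never a tiny tail chunk. */
static void plan_chunks(uint64_t file_size, uint64_t block_size)
{
    if (file_size == 0)
        return;

    uint64_t nchunks = (file_size + DEDUPE_MAX_CHUNK - 1) / DEDUPE_MAX_CHUNK;
    uint64_t offset = 0;

    for (uint64_t i = 0; i < nchunks; i++) {
        /* Even split of the remaining bytes, rounded up to the block size
         * (except the final chunk, which takes whatever is left). */
        uint64_t remaining = file_size - offset;
        uint64_t chunks_left = nchunks - i;
        uint64_t len = (remaining + chunks_left - 1) / chunks_left;
        if (i + 1 < nchunks)
            len = ((len + block_size - 1) / block_size) * block_size;
        if (len > remaining)
            len = remaining;

        printf("chunk %llu: offset=%llu length=%llu\n",
               (unsigned long long)i,
               (unsigned long long)offset,
               (unsigned long long)len);
        /* A real tool would issue one extent-same request per chunk here. */
        offset += len;
    }
}

int main(void)
{
    /* Example: a 100 MiB file with 4 KiB blocks splits into 7 chunks of
     * roughly 14.3 MiB each instead of six 16 MiB chunks plus a 4 MiB tail. */
    plan_chunks(100ULL * 1024 * 1024, 4096);
    return 0;
}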
Re: Announcing btrfs-dedupe
On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote: > On 2016-11-14 13:22, James Pharaoh wrote: > >One thing I am keen to understand is if BTRFS will automatically ignore > >a request to deduplicate a file if it is already deduplicated? Given the > >performance I see when doing a repeat deduplication, it seems to me that > >it can't be doing so, although this could be caused by the CPU usage you > >mention above. > What's happening is that the dedupe ioctl does a byte-wise comparison of the > ranges to make sure they're the same before linking them. This is actually > what takes most of the time when calling the ioctl, and is part of why it > takes longer the larger the range to deduplicate is. In essence, it's > behaving like an OS should and not trusting userspace to make reasonable > requests (which is also why there's a separate ioctl to clone a range from > another file instead of deduplicating existing data). Deduplicating an extent that may might be concurrently modified during the dedup is a reasonable userspace request. In the general case there's no way for userspace to ensure that it's not happening. That said, some optimization is possible (although there are good reasons not to bother with optimization in the kernel): - VFS could recognize when it has two separate references to the same physical extent and not re-read the same data twice (but that requires teaching VFS how to do CoW in general, and is hard for political reasons on top of the obvious technical ones). - the extent-same ioctl could check to see which extents are referenced by the src and dst ranges, and return success immediately without reading data if they are the same (but userspace should already know this, or it's wasting a huge amount of time before it even calls the kernel). > TBH, even though it's kind of annoying from a performance perspective, it's > a rather nice safety net to have. For example, one of the cases where I do > deduplication is a couple of directories where each directory is an > overlapping partial subset of one large tree which I keep elsewhere. In > this case, I can tell just by filename exactly what files might be > duplicates, so the ioctl's check lets me just call the ioctl on all > potential duplicates (after checking size, no point in wasting time if the > files obviously aren't duplicates), and have it figure out whether or not > they can be deduplicated. > > > >In any case, I'm considering some digging into the filesystem structures > >to see if I can work this out myself before i do any deduplication. I'm > >fairly sure this should be relatively simple to work out, at least well > >enough for my purposes. > Sadly, there's no way to avoid doing so right now. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: Digital signature
Re: Announcing btrfs-dedupe
On Mon, Nov 14, 2016 at 07:22:59PM +0100, James Pharaoh wrote: > On 14/11/16 19:07, Zygo Blaxell wrote: > >There is also a still-unresolved problem where the filesystem CPU usage > >rises exponentially for some operations depending on the number of shared > >references to an extent. Files which contain blocks with more than a few > >thousand shared references can trigger this problem. A file over 1TB can > >keep the kernel busy at 100% CPU for over 40 minutes at a time. > > Yes, I see this all the time. For my use cases, I don't really care about > "shared references" as blocks of files, but am happy to simply deduplicate > at the whole-file level. I wonder if this still will have the same effect, > however. I guess that this could be mitigated in a tool, but this is going > to be both annoying and not the most elegant solution. If you have huge files (1TB+) this can be a problem even with whole-file deduplications (which are really just extent-level deduplications applied to the entire file). The CPU time is a product of file size and extent reference count with some other multipliers on top. I've hacked around it by timing how long it takes to manipulate the data, and blacklisting any hash value or block address that takes more than 10 seconds to process (if such a block is found after blacklisting, just skip processing the block/extent/file entirely). It turns out there are very few of these in practice (only a few hundred per TB) but these few hundred block hash values occur millions of times in a large data corpus. > One thing I am keen to understand is if BTRFS will automatically ignore a > request to deduplicate a file if it is already deduplicated? Given the > performance I see when doing a repeat deduplication, it seems to me that it > can't be doing so, although this could be caused by the CPU usage you > mention above. As far as I can tell btrfs doesn't do anything different in this case--it'll happily repeat the entire lock/read/compare/delete/insert sequence even if the outcome cannot be different from the initial conditions. Due to limitations of VFS caching it'll read the same blocks from storage hardware twice, too. > In any case, I'm considering some digging into the filesystem structures to > see if I can work this out myself before i do any deduplication. I'm fairly > sure this should be relatively simple to work out, at least well enough for > my purposes. I used FIEMAP (then later replaced it with SEARCH_V2 for speed) to map the extents to physical addresses before deduping them. If you're only going to do whole-file dedup then you only need to care about the physical address of the first non-hole extent. signature.asc Description: Digital signature
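The timing-based blacklist described above is straightforward to wrap around whatever actually performs the dedup. A rough sketch follows — not Zygo's code; the 10-second threshold comes from the message above, and do_dedupe_op() is a placeholder standing in for the real extent-same call:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define TOXIC_SECONDS 10.0    /* threshold taken from the discussion above */
#define MAX_BLACKLIST 1024

static uint64_t blacklist[MAX_BLACKLIST];
static size_t blacklist_len;

static int is_blacklisted(uint64_t hash)
{
    for (size_t i = 0; i < blacklist_len; i++)
        if (blacklist[i] == hash)
            return 1;
    return 0;
}

static void blacklist_add(uint64_t hash)
{
    if (blacklist_len < MAX_BLACKLIST)
        blacklist[blacklist_len++] = hash;
}

/* Placeholder for the real work (extent-same ioctl, reference walking, ...). */
static void do_dedupe_op(uint64_t hash) { (void)hash; }

/* Skip hashes already known to be slow; time everything else and blacklist
 * any hash whose processing exceeds the threshold, so later occurrences of
 * the same heavily-shared extent are not processed again. */
static void dedupe_one(uint64_t hash)
{
    struct timespec t0, t1;

    if (is_blacklisted(hash)) {
        printf("skipping toxic hash %016llx\n", (unsigned long long)hash);
        return;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    do_dedupe_op(hash);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (elapsed > TOXIC_SECONDS) {
        printf("blacklisting hash %016llx after %.1fs\n",
               (unsigned long long)hash, elapsed);
        blacklist_add(hash);
    }
}

int main(void)
{
    dedupe_one(0xdeadbeefULL);   /* hypothetical hash values */
    dedupe_one(0xdeadbeefULL);
    return 0;
}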
Re: Announcing btrfs-dedupe
On 2016-11-14 13:22, James Pharaoh wrote: On 14/11/16 19:07, Zygo Blaxell wrote: On Mon, Nov 07, 2016 at 07:49:51PM +0100, James Pharaoh wrote: Annoyingly I can't find this now, but I definitely remember reading someone, apparently someone knowledgable, claim that the latest version of the kernel which I was using at the time, still suffered from issues regarding the dedupe code. This was a while ago, and I would be very pleased to hear that there is high confidence in the current implementation! I'll post a link if I manage to find the comments. I've been running the btrfs dedup ioctl 7 times per second on average over 42TB of test data for most of a year (and at a lower rate for two years). I have not found any data corruptions due to _dedup_. I did find three distinct data corruption kernel bugs unrelated to dedup, and two test machines with bad RAM, so I'm pretty sure my corruption detection is working. That said, I wouldn't run dedup on a kernel older than 4.4. LTS kernels might be OK too, but only if they're up to date with backported btrfs fixes. Ok, I think this might have referred to the 4.2 kernel, which was newly released at the time. I wish I could find the post! Kernels older than 3.13 lack the FILE_EXTENT_SAME ioctl and can only deduplicate static data (i.e. data you are certain is not being concurrently modified). Before 3.12 there are so many bugs you might as well not bother. Yes well I don't need to be told that, sadly. Older kernels are bad for dedup because of non-corruption reasons. Between 3.13 and 4.4, the following bugs were fixed: - false-negative capability checks (e.g. same-inode, EOF extent) reduce dedup efficiency - ctime updates (older versions would update ctime when a file was deduped) mess with incremental backup tools, build systems, etc. - kernel memory leaks (self-explanatory) - multiple kernel hang/panic bugs (e.g. a deadlock if two threads try to read the same extent at the same time, and at least one of those threads is dedup; and there was some race condition leading to invalid memory access on dedup's comparison reads) which won't eat your data, but they might ruin your day anyway. Ok, I have thing I've seen some stuff like this, I certainly have problems, but never a loss of data. Things can take a LONG time to get out of the filesystem, though. There is also a still-unresolved problem where the filesystem CPU usage rises exponentially for some operations depending on the number of shared references to an extent. Files which contain blocks with more than a few thousand shared references can trigger this problem. A file over 1TB can keep the kernel busy at 100% CPU for over 40 minutes at a time. Yes, I see this all the time. For my use cases, I don't really care about "shared references" as blocks of files, but am happy to simply deduplicate at the whole-file level. I wonder if this still will have the same effect, however. I guess that this could be mitigated in a tool, but this is going to be both annoying and not the most elegant solution. The issue is at the extent level, so it will impact whole files too (but it will have less impact on defragmented files that are then deduplicated as whole files). Pretty much anything that pins references to extents will impact this, so cloned extents and snapshots will also have an impact. There might also be a correlation between delalloc data and hangs in extent-same, but I have NOT been able to confirm this. 
All I know at this point is that doing a fsync() on the source FD just before doing the extent-same ioctl dramatically reduces filesystem hang rates: several weeks between hangs (or no hangs at all) with fsync, vs. 18 hours or less without.

Interesting, I'll maybe see if I can make use of this.

One thing I am keen to understand is if BTRFS will automatically ignore a request to deduplicate a file if it is already deduplicated? Given the performance I see when doing a repeat deduplication, it seems to me that it can't be doing so, although this could be caused by the CPU usage you mention above.

What's happening is that the dedupe ioctl does a byte-wise comparison of the ranges to make sure they're the same before linking them. This is actually what takes most of the time when calling the ioctl, and is part of why it takes longer the larger the range to deduplicate is. In essence, it's behaving like an OS should and not trusting userspace to make reasonable requests (which is also why there's a separate ioctl to clone a range from another file instead of deduplicating existing data).

TBH, even though it's kind of annoying from a performance perspective, it's a rather nice safety net to have. For example, one of the cases where I do deduplication is a couple of directories where each directory is an overlapping partial subset of one large tree which I keep elsewhere. In this case, I can tell just by filename exactly what files might be duplicates, so the ioctl's check lets me just call the ioctl on all potential duplicates (after checking size, no point in wasting time if the files obviously aren't duplicates), and have it figure out whether or not they can be deduplicated.
Re: Announcing btrfs-dedupe
On Tue, Nov 08, 2016 at 12:06:01PM +0100, Niccolò Belli wrote: > Nice, you should probably update the btrfs wiki as well, because there is no > mention of btrfs-dedupe. > > First question, why this name? Don't you plan to support xfs as well? Does XFS plan to support LOGICAL_INO, INO_PATHS, and something analogous to SEARCH_V2? POSIX API + FILE_EXTENT_SAME is OK for the lowest common denominator across arbitrary filesystems, but a btrfs-specific tool can do a lot better. Especially for incremental dedup and low-RAM algorithms. signature.asc Description: Digital signature
Re: Announcing btrfs-dedupe
On 14/11/16 19:07, Zygo Blaxell wrote: On Mon, Nov 07, 2016 at 07:49:51PM +0100, James Pharaoh wrote: Annoyingly I can't find this now, but I definitely remember reading someone, apparently someone knowledgable, claim that the latest version of the kernel which I was using at the time, still suffered from issues regarding the dedupe code. This was a while ago, and I would be very pleased to hear that there is high confidence in the current implementation! I'll post a link if I manage to find the comments. I've been running the btrfs dedup ioctl 7 times per second on average over 42TB of test data for most of a year (and at a lower rate for two years). I have not found any data corruptions due to _dedup_. I did find three distinct data corruption kernel bugs unrelated to dedup, and two test machines with bad RAM, so I'm pretty sure my corruption detection is working. That said, I wouldn't run dedup on a kernel older than 4.4. LTS kernels might be OK too, but only if they're up to date with backported btrfs fixes. Ok, I think this might have referred to the 4.2 kernel, which was newly released at the time. I wish I could find the post! Kernels older than 3.13 lack the FILE_EXTENT_SAME ioctl and can only deduplicate static data (i.e. data you are certain is not being concurrently modified). Before 3.12 there are so many bugs you might as well not bother. Yes well I don't need to be told that, sadly. Older kernels are bad for dedup because of non-corruption reasons. Between 3.13 and 4.4, the following bugs were fixed: - false-negative capability checks (e.g. same-inode, EOF extent) reduce dedup efficiency - ctime updates (older versions would update ctime when a file was deduped) mess with incremental backup tools, build systems, etc. - kernel memory leaks (self-explanatory) - multiple kernel hang/panic bugs (e.g. a deadlock if two threads try to read the same extent at the same time, and at least one of those threads is dedup; and there was some race condition leading to invalid memory access on dedup's comparison reads) which won't eat your data, but they might ruin your day anyway. Ok, I have thing I've seen some stuff like this, I certainly have problems, but never a loss of data. Things can take a LONG time to get out of the filesystem, though. There is also a still-unresolved problem where the filesystem CPU usage rises exponentially for some operations depending on the number of shared references to an extent. Files which contain blocks with more than a few thousand shared references can trigger this problem. A file over 1TB can keep the kernel busy at 100% CPU for over 40 minutes at a time. Yes, I see this all the time. For my use cases, I don't really care about "shared references" as blocks of files, but am happy to simply deduplicate at the whole-file level. I wonder if this still will have the same effect, however. I guess that this could be mitigated in a tool, but this is going to be both annoying and not the most elegant solution. There might also be a correlation between delalloc data and hangs in extent-same, but I have NOT been able to confirm this. All I know at this point is that doing a fsync() on the source FD just before doing the extent-same ioctl dramatically reduces filesystem hang rates: several weeks between hangs (or no hangs at all) with fsync, vs. 18 hours or less without. Interesting, I'll maybe see if I can make use of this. One thing I am keen to understand is if BTRFS will automatically ignore a request to deduplicate a file if it is already deduplicated? 
Given the performance I see when doing a repeat deduplication, it seems to me that it can't be doing so, although this could be caused by the CPU usage you mention above. In any case, I'm considering some digging into the filesystem structures to see if I can work this out myself before i do any deduplication. I'm fairly sure this should be relatively simple to work out, at least well enough for my purposes. James -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On Mon, Nov 07, 2016 at 07:49:51PM +0100, James Pharaoh wrote: > Annoyingly I can't find this now, but I definitely remember reading someone, > apparently someone knowledgable, claim that the latest version of the kernel > which I was using at the time, still suffered from issues regarding the > dedupe code. > This was a while ago, and I would be very pleased to hear that there is high > confidence in the current implementation! I'll post a link if I manage to > find the comments. I've been running the btrfs dedup ioctl 7 times per second on average over 42TB of test data for most of a year (and at a lower rate for two years). I have not found any data corruptions due to _dedup_. I did find three distinct data corruption kernel bugs unrelated to dedup, and two test machines with bad RAM, so I'm pretty sure my corruption detection is working. That said, I wouldn't run dedup on a kernel older than 4.4. LTS kernels might be OK too, but only if they're up to date with backported btrfs fixes. Kernels older than 3.13 lack the FILE_EXTENT_SAME ioctl and can only deduplicate static data (i.e. data you are certain is not being concurrently modified). Before 3.12 there are so many bugs you might as well not bother. Older kernels are bad for dedup because of non-corruption reasons. Between 3.13 and 4.4, the following bugs were fixed: - false-negative capability checks (e.g. same-inode, EOF extent) reduce dedup efficiency - ctime updates (older versions would update ctime when a file was deduped) mess with incremental backup tools, build systems, etc. - kernel memory leaks (self-explanatory) - multiple kernel hang/panic bugs (e.g. a deadlock if two threads try to read the same extent at the same time, and at least one of those threads is dedup; and there was some race condition leading to invalid memory access on dedup's comparison reads) which won't eat your data, but they might ruin your day anyway. There is also a still-unresolved problem where the filesystem CPU usage rises exponentially for some operations depending on the number of shared references to an extent. Files which contain blocks with more than a few thousand shared references can trigger this problem. A file over 1TB can keep the kernel busy at 100% CPU for over 40 minutes at a time. There might also be a correlation between delalloc data and hangs in extent-same, but I have NOT been able to confirm this. All I know at this point is that doing a fsync() on the source FD just before doing the extent-same ioctl dramatically reduces filesystem hang rates: several weeks between hangs (or no hangs at all) with fsync, vs. 18 hours or less without. > James > > On 07/11/16 18:59, Mark Fasheh wrote: > >Hi James, > > > >Re the following text on your project page: > > > >"IMPORTANT CAVEAT — I have read that there are race and/or error > >conditions which can cause filesystem corruption in the kernel > >implementation of the deduplication ioctl." > > > >Can you expound on that? I'm not aware of any bugs right now but if > >there is any it'd absolutely be worth having that info on the btrfs > >list. > > > >Thanks, > >--Mark > > > > > >On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh > > wrote: > >>Hi all, > >> > >>I'm pleased to announce my btrfs deduplication utility, written in Rust. > >>This operates on whole files, is fast, and I believe complements the > >>existing utilities (duperemove, bedup), which exist currently. 
> >> > >>Please visit the homepage for more information: > >> > >>http://btrfs-dedupe.com > >> > >>James Pharaoh > >>-- > >>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > >>the body of a message to majord...@vger.kernel.org > >>More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
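To make the fsync-before-extent-same workaround described above concrete, here is a minimal C sketch; it is not taken from any of the tools in this thread, and the file arguments, the 16 MiB chunk size and the bare-bones error handling are illustrative assumptions. It fsyncs the source file and then asks the kernel to deduplicate the whole destination file against it using FIDEDUPERANGE, the post-4.5 name for the btrfs extent-same ioctl (older kernels expose the same structure as BTRFS_IOC_FILE_EXTENT_SAME in linux/btrfs.h):

/* dedupe-one.c: whole-file dedupe of DST against SRC via FIDEDUPERANGE.
 * Minimal sketch only; real tools verify contents and handle errors
 * more carefully. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>   /* FIDEDUPERANGE, struct file_dedupe_range */

#define CHUNK (16ULL * 1024 * 1024)   /* assumed per-call limit (btrfs historically capped at 16 MiB) */

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s SRC DST\n", argv[0]);
		return 1;
	}

	int src = open(argv[1], O_RDONLY);
	int dst = open(argv[2], O_RDWR);
	if (src < 0 || dst < 0) { perror("open"); return 1; }

	struct stat st;
	if (fstat(src, &st) < 0) { perror("fstat"); return 1; }

	/* The workaround discussed above: flush the source before comparing,
	 * which appears to dramatically reduce extent-same hangs. */
	if (fsync(src) < 0) { perror("fsync"); return 1; }

	struct file_dedupe_range *args =
		calloc(1, sizeof *args + sizeof(struct file_dedupe_range_info));
	if (!args) return 1;
	args->dest_count = 1;
	args->info[0].dest_fd = dst;

	for (uint64_t off = 0; off < (uint64_t) st.st_size; ) {
		uint64_t len = (uint64_t) st.st_size - off;
		if (len > CHUNK)
			len = CHUNK;

		args->src_offset = off;
		args->src_length = len;
		args->info[0].dest_offset = off;

		if (ioctl(src, FIDEDUPERANGE, args) < 0) { perror("FIDEDUPERANGE"); return 1; }

		if (args->info[0].status == FILE_DEDUPE_RANGE_DIFFERS) {
			fprintf(stderr, "data differs at offset %llu\n", (unsigned long long) off);
			return 1;
		}
		if (args->info[0].status < 0) {
			fprintf(stderr, "dedupe failed: %s\n", strerror(-args->info[0].status));
			return 1;
		}
		if (args->info[0].bytes_deduped == 0)
			break;   /* nothing more the kernel will do */
		off += args->info[0].bytes_deduped;
	}

	free(args);
	close(src);
	close(dst);
	return 0;
}

Build with something like "gcc -O2 -o dedupe-one dedupe-one.c" and run it as "./dedupe-one /path/src /path/dst"; both files need to live on the same dedupe-capable filesystem.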
Re: Announcing btrfs-dedupe
I've updated the BTRFS wiki here with all the new tools people have mentioned: https://btrfs.wiki.kernel.org/index.php/Deduplication#Other_tools Please let me know if anyone who does not have access to the wiki has any additions, updates or corrections to what I've written here. James On 08/11/16 23:36, Saint Germain wrote: On Sun, 6 Nov 2016 14:30:52 +0100, James Pharaoh wrote : Hi all, I'm pleased to announce my btrfs deduplication utility, written in Rust. This operates on whole files, is fast, and I believe complements the existing utilities (duperemove, bedup), which exist currently. Please visit the homepage for more information: http://btrfs-dedupe.com Thanks for having shared your work. Please be aware of these other similar softwares: - jdupes: https://github.com/jbruchon/jdupes - rmlint: https://github.com/sahib/rmlint And of course fdupes. Some intesting points I have seen in them: - use xxhash to identify potential duplicates (huge speedup) - ability to deduplicate read-only snapshots - identify potential reflinked files (see also my email here: https://www.spinics.net/lists/linux-btrfs/msg60081.html) - ability to filter out hardlinks - triangle problem: see jdupes readme - jdupes has started the process to be included in Debian I hope that will help and that you can share some codes with them ! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote: > [1]. For some reasons zfs-on-linux guys didn't implement this yet, despite > it being an obvious thing on ZFS. In my understanding, the COW mechanics are different: there are no extent back references, so this would require some design updates. See issue 405 in the ZoL tracker. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On Wed, 09 Nov 2016 12:24:51 +0100, Niccolò Belli wrote: > > On martedì 8 novembre 2016 23:36:25 CET, Saint Germain wrote: > > Please be aware of these other similar softwares: > > - jdupes: https://github.com/jbruchon/jdupes > > - rmlint: https://github.com/sahib/rmlint > > And of course fdupes. > > > > Some intesting points I have seen in them: > > - use xxhash to identify potential duplicates (huge speedup) > > - ability to deduplicate read-only snapshots > > - identify potential reflinked files (see also my email here: > > https://www.spinics.net/lists/linux-btrfs/msg60081.html) > > - ability to filter out hardlinks > > - triangle problem: see jdupes readme > > - jdupes has started the process to be included in Debian > > > > I hope that will help and that you can share some codes with them ! > > > Hi, > What do you think about jdupes? I'm searching an alternative to > duperemove and rmlint doesn't seem to support btrfs deduplication, so > I would like to try jdupes. My main problem with duperemove is a > memory leak, also it seems to lead to greater disk usage: > https://github.com/markfasheh/duperemove/issues/163 rmlint does support btrfs deduplication: rmlint --algorithm=xxhash --types="duplicates" --hidden --config=sh:handler=clone --no-hardlinked I've used jdupes and rmlint to deduplicate 2TB with 4GB RAM and it took a few hours. So it is acceptable from a performance point of view. The problems I found have been corrected by both. The jdupes author is really kind and responsive! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
Hi, What do you think about jdupes? I'm searching an alternative to duperemove and rmlint doesn't seem to support btrfs deduplication, so I would like to try jdupes. My main problem with duperemove is a memory leak, also it seems to lead to greater disk usage: https://github.com/markfasheh/duperemove/issues/163 Niccolo' Belli On martedì 8 novembre 2016 23:36:25 CET, Saint Germain wrote: Please be aware of these other similar softwares: - jdupes: https://github.com/jbruchon/jdupes - rmlint: https://github.com/sahib/rmlint And of course fdupes. Some intesting points I have seen in them: - use xxhash to identify potential duplicates (huge speedup) - ability to deduplicate read-only snapshots - identify potential reflinked files (see also my email here: https://www.spinics.net/lists/linux-btrfs/msg60081.html) - ability to filter out hardlinks - triangle problem: see jdupes readme - jdupes has started the process to be included in Debian I hope that will help and that you can share some codes with them ! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On Sun, 6 Nov 2016 14:30:52 +0100, James Pharaoh wrote: > Hi all, > > I'm pleased to announce my btrfs deduplication utility, written in > Rust. This operates on whole files, is fast, and I believe > complements the existing utilities (duperemove, bedup), which exist > currently. > > Please visit the homepage for more information: > > http://btrfs-dedupe.com > Thanks for sharing your work. Please be aware of these other similar tools: - jdupes: https://github.com/jbruchon/jdupes - rmlint: https://github.com/sahib/rmlint And of course fdupes. Some interesting points I have seen in them: - use xxhash to identify potential duplicates (huge speedup) - ability to deduplicate read-only snapshots - identify potential reflinked files (see also my email here: https://www.spinics.net/lists/linux-btrfs/msg60081.html) - ability to filter out hardlinks - triangle problem: see jdupes readme - jdupes has started the process to be included in Debian I hope that helps and that you can share some code with them! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
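As a rough illustration of the xxhash point in the list above (an illustration only; this is not how jdupes or rmlint are actually structured), a whole-file XXH64 is cheap to compute with the streaming API from libxxhash, assuming that library is installed (link with -lxxhash). Files that share a size and a hash become dedupe candidates, which still need a byte-for-byte comparison, or the extent-same ioctl itself, to confirm:

/* xxh-file.c: print "hash  path" for each argument; equal sizes plus
 * equal hashes mark candidate duplicates. Sketch only. */
#include <stdint.h>
#include <stdio.h>
#include <xxhash.h>

static int xxh64_file(const char *path, uint64_t *out)
{
	FILE *f = fopen(path, "rb");
	if (!f)
		return -1;

	XXH64_state_t *state = XXH64_createState();
	if (!state) { fclose(f); return -1; }
	XXH64_reset(state, 0);                    /* fixed seed */

	char buf[64 * 1024];
	size_t n;
	while ((n = fread(buf, 1, sizeof buf, f)) > 0)
		XXH64_update(state, buf, n);

	*out = XXH64_digest(state);
	XXH64_freeState(state);
	fclose(f);
	return 0;
}

int main(int argc, char **argv)
{
	for (int i = 1; i < argc; i++) {
		uint64_t hash;
		if (xxh64_file(argv[i], &hash) == 0)
			printf("%016llx  %s\n", (unsigned long long) hash, argv[i]);
		else
			fprintf(stderr, "failed to hash %s\n", argv[i]);
	}
	return 0;
}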
Re: Announcing btrfs-dedupe
On Tue, Nov 08, 2016 at 10:59:56AM -0800, Mark Fasheh wrote: > On Mon, Nov 7, 2016 at 6:17 PM, Darrick J. Wong > wrote: > > On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote: > >> Mark has already included XFS in documentation of duperemove, all that > >> looks > >> amiss is btrfs-extent-same having an obsolete name. But then, I never did > >> any non-superficial tests on XFS, beyond "seems to work". > > I'd actually be ok dropping btrfs-extent-same completely at this point > but I'm concerned that it would leave some users behind. > > > > /me wonders if ocfs2 will ever catch up to the reflink/dedupe party. ;) > > Hey, Ocfs2 started the reflink party! But yeah it's fallen behind > since then with respect to cow and dedupe. More importantly though I'd > like to see some extra extent tracking in there like XFS did with the > reflink b+tree. Perhaps this should move to the ocfs2 list, but... ...as I understand ocfs2, each inode can point to the head of a refcount tree that maintains refcounts for all the physical blocks that are mapped by any of the files that share that refcount tree. It wouldn't be difficult to hook up this existing refcount structure to the reflink and dedupe vfs ioctls, with the huge caveat that both inodes will end up belonging to the same refcount tree (or the call fails). This might not be such a huge issue for reflink since we're generally only using it during a file copy anyway, but for dedupe this could have disastrous consequences if someone does an fs-wide dedupe and every file in the fs ends up with the same refcount tree. So I guess you could give each block group its own refcount tree or something so that all the writes in the fs don't end up contending for a single data structure. --D >--Mark > > -- > "When the going gets weird, the weird turn pro." > Hunter S. Thompson > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On Mon, Nov 7, 2016 at 6:17 PM, Darrick J. Wong wrote: > On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote: >> Mark has already included XFS in documentation of duperemove, all that looks >> amiss is btrfs-extent-same having an obsolete name. But then, I never did >> any non-superficial tests on XFS, beyond "seems to work". I'd actually be ok dropping btrfs-extent-same completely at this point but I'm concerned that it would leave some users behind. > /me wonders if ocfs2 will ever catch up to the reflink/dedupe party. ;) Hey, Ocfs2 started the reflink party! But yeah it's fallen behind since then with respect to cow and dedupe. More importantly though I'd like to see some extra extent tracking in there like XFS did with the reflink b+tree. --Mark -- "When the going gets weird, the weird turn pro." Hunter S. Thompson -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On Mon, Nov 7, 2016 at 6:40 PM, Christoph Anton Mitterer wrote: > On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote: >> I think adding a whole-file dedup mode to duperemove would be better >> (from user's POV) than writing a whole new tool > > What would IMO be really good from a user's POV was, if one of the > tools, deemed to be the "best", would be added to the btrfs-progs and > simply become "the official" one. Yeah, there are two problems. One is that the extent-same ioctl (and duperemove) is cross-filesystem now, so it is no longer btrfs-specific. The other one, which James touches on, is that there's a non-trivial amount of complexity in duperemove, so shoving it into btrfs-progs just means we're going to have parallel development streams solving some different problems. That's not to say that every dedupe tool has to be complex - we have xfs_io to run the ioctl and I don't think it'd be a bad idea if btrfs-progs had a simple interface to it too. --Mark -- "When the going gets weird, the weird turn pro." Hunter S. Thompson -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On martedì 8 novembre 2016 17:58:52 CET, James Pharaoh wrote: Yes, everything you have described here is something I intend to create, and might as well include in the tool itself. I'll add it to the roadmap ;-) Sounds good, but I have yet another feature request which is even more interesting in my opinion. If you have ever used snapper you have probably found yourself in the position where you want to free some space and you actually can't, because the files you want to delete are already present in countless snapshots. So you have to delete the unwanted files from every snapshot, which is a tedious task, and even more difficult if you have moved/renamed those files. What I actually do is use duperemove's hashfile to grep for the checksum and obtain all the paths. Then I have to switch the snapshots to rw, manually delete each file and finally switch them back to ro. A tool which automates these tasks would be awesome. Niccolo' -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
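For reference, the rw/ro switching described above does not need anything exotic: it is the same BTRFS_IOC_SUBVOL_GETFLAGS / BTRFS_IOC_SUBVOL_SETFLAGS ioctl pair that "btrfs property set -ts <snapshot> ro false" uses. A minimal sketch follows, assuming appropriate privileges, with the usual caveat that making a received snapshot writable clears its received_uuid and can therefore break incremental send/receive chains:

/* subvol-rw.c: flip a btrfs subvolume/snapshot between ro and rw.
 * Sketch only; a real tool would restore the original state carefully. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>   /* BTRFS_IOC_SUBVOL_{GET,SET}FLAGS, BTRFS_SUBVOL_RDONLY */

int main(int argc, char **argv)
{
	if (argc != 3 || (strcmp(argv[2], "ro") != 0 && strcmp(argv[2], "rw") != 0)) {
		fprintf(stderr, "usage: %s SNAPSHOT-DIR ro|rw\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY | O_DIRECTORY);
	if (fd < 0) { perror("open"); return 1; }

	__u64 flags;
	if (ioctl(fd, BTRFS_IOC_SUBVOL_GETFLAGS, &flags) < 0) {
		perror("BTRFS_IOC_SUBVOL_GETFLAGS");
		return 1;
	}

	if (strcmp(argv[2], "ro") == 0)
		flags |= BTRFS_SUBVOL_RDONLY;     /* make read-only again */
	else
		flags &= ~BTRFS_SUBVOL_RDONLY;    /* make writable for dedupe/cleanup */

	if (ioctl(fd, BTRFS_IOC_SUBVOL_SETFLAGS, &flags) < 0) {
		perror("BTRFS_IOC_SUBVOL_SETFLAGS");
		return 1;
	}

	close(fd);
	return 0;
}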
Re: Announcing btrfs-dedupe
On 2016-11-08 11:57, Darrick J. Wong wrote: On Tue, Nov 08, 2016 at 08:26:02AM -0500, Austin S. Hemmelgarn wrote: On 2016-11-07 21:40, Christoph Anton Mitterer wrote: On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote: I think adding a whole-file dedup mode to duperemove would be better (from user's POV) than writing a whole new tool What would IMO be really good from a user's POV was, if one of the tools, deemed to be the "best", would be added to the btrfs-progs and simply become "the official" one. The problem is that for deduplication, most tools won't work well for everything. For example the cases I use it in are very specific and have horrible performance using pretty much any available tool (I have a couple cases where I have disjoint subsets of the same directory tree with different prefixes, so I can tell exactly which files are duplicated, and that any duplicate file is 100% duplicate, as well as a couple of cases where changes are small, scattered, and highly predictable (and thus it's easier to find what's changed and dedupe everything else instead of finding what's the same), and none of the existing options do well in either situation). I'd argue at minimum for having the extent-same tool from duperemove in btrfs-progs, as that lets people do deduplication how they want without having to write C code. Something equivalent that would let you call any BTRFS ioctl with (reasonably) arbitrary arguments might actually be even better (I can see such a tool being wonderful for debugging). Since xfsprogs 4.3, xfs_io has a 'dedupe' command that can talk to FIDEDUPERANGE (f.k.a. EXTENT SAME): $ xfs_io -c 'dedupe /mnt/srcfile srcoffset dstoffset length' /mnt/destfile I actually hadn't known about this, thanks. It means that xfs_io just got even more useful despite me not running XFS. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
Yes, everything you have described here is something I intend to create, and might as well include in the tool itself. I'll add it to the roadmap ;-) James On 08/11/16 17:57, Niccolò Belli wrote: On martedì 8 novembre 2016 12:38:48 CET, James Pharaoh wrote: You can't deduplicate a read-only snapshot, but you can create read-write snapshots from them, deduplicate those, and then recreate the read-only ones. This is what I've done. Since snapper creates hundreds of snapshots, isn't this something that the deduplication software could do for me if I explicitely tell it to do so? I mean momentarily switching the snapshot to rw in order to deduplicate it, then switching it back to ro. In theory, once this has been done once, it shouldn't have to be done again, at least for those snapshots, unless you want to modify the deduplication. It's probably a good idea to defragment files and directories first, as well. I can't defragment anything, because it would take too much free space to do so with so many snapshots. Instead, the deduplication software could defragment each file before calling the extent-same ioctl, that would be feasible. Such a way you will not need hilarious amounts of free space to defragment the fs. It should be possible to deduplicate a read-only file to a read-write one, but that's probably not worth the effort in many real-world use cases. This is exactly what I would expect a deduplication tool to do when it encounters a ro snapshot, except when I explicitely tell it to momentarily switch the snapshot to rw in order to deduplicate it. Niccolo' Belli -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On Tue, Nov 08, 2016 at 08:26:02AM -0500, Austin S. Hemmelgarn wrote: > On 2016-11-07 21:40, Christoph Anton Mitterer wrote: > >On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote: > >>I think adding a whole-file dedup mode to duperemove would be better > >>(from user's POV) than writing a whole new tool > > > >What would IMO be really good from a user's POV was, if one of the > >tools, deemed to be the "best", would be added to the btrfs-progs and > >simply become "the official" one. > > The problem is that for deduplication, most tools won't work well for > everything. For example the cases I use it in are very specific and have > horrible performance using pretty much any available tool (I have a couple > cases where I have disjoint subsets of the same directory tree with > different prefixes, so I can tell exactly which files are duplicated, and > that any duplicate file is 100% duplicate, as well as a couple of cases > where changes are small, scattered, and highly predictable (and thus it's > easier to find what's changed and dedupe everything else instead of finding > what's the same), and none of the existing options do well in either > situation). > > I'd argue at minimum for having the extent-same tool from duperemove in > btrfs-progs, as that lets people do deduplication how they want without > having to write C code. Something equivalent that would let you call any > BTRFS ioctl with (reasonably) arbitrary arguments might actually be even > better (I can see such a tool being wonderful for debugging). Since xfsprogs 4.3, xfs_io has a 'dedupe' command that can talk to FIDEDUPERANGE (f.k.a. EXTENT SAME): $ xfs_io -c 'dedupe /mnt/srcfile srcoffset dstoffset length' /mnt/destfile --D > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On martedì 8 novembre 2016 12:38:48 CET, James Pharaoh wrote: You can't deduplicate a read-only snapshot, but you can create read-write snapshots from them, deduplicate those, and then recreate the read-only ones. This is what I've done. Since snapper creates hundreds of snapshots, isn't this something that the deduplication software could do for me if I explicitly tell it to do so? I mean momentarily switching the snapshot to rw in order to deduplicate it, then switching it back to ro. In theory, once this has been done once, it shouldn't have to be done again, at least for those snapshots, unless you want to modify the deduplication. It's probably a good idea to defragment files and directories first, as well. I can't defragment anything, because it would take too much free space to do so with so many snapshots. Instead, the deduplication software could defragment each file before calling the extent-same ioctl; that would be feasible. That way you would not need ridiculous amounts of free space to defragment the fs. It should be possible to deduplicate a read-only file to a read-write one, but that's probably not worth the effort in many real-world use cases. This is exactly what I would expect a deduplication tool to do when it encounters a ro snapshot, except when I explicitly tell it to momentarily switch the snapshot to rw in order to deduplicate it. Niccolo' Belli -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On 2016-11-07 21:40, Christoph Anton Mitterer wrote: On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote: I think adding a whole-file dedup mode to duperemove would be better (from user's POV) than writing a whole new tool What would IMO be really good from a user's POV was, if one of the tools, deemed to be the "best", would be added to the btrfs-progs and simply become "the official" one. The problem is that for deduplication, most tools won't work well for everything. For example the cases I use it in are very specific and have horrible performance using pretty much any available tool (I have a couple cases where I have disjoint subsets of the same directory tree with different prefixes, so I can tell exactly which files are duplicated, and that any duplicate file is 100% duplicate, as well as a couple of cases where changes are small, scattered, and highly predictable (and thus it's easier to find what's changed and dedupe everything else instead of finding what's the same), and none of the existing options do well in either situation). I'd argue at minimum for having the extent-same tool from duperemove in btrfs-progs, as that lets people do deduplication how they want without having to write C code. Something equivalent that would let you call any BTRFS ioctl with (reasonably) arbitrary arguments might actually be even better (I can see such a tool being wonderful for debugging). -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On 08/11/16 12:06, Niccolò Belli wrote: Nice, you should probably update the btrfs wiki as well, because there is no mention of btrfs-dedupe. I am planning to, I had to apply for an account, which has now been approved. First question, why this name? Don't you plan to support xfs as well? It didn't occur to me, to be honest. I might support XFS as well, but I don't use it, and will possibly be adding other btrfs-specific stuff to it. You'll notice it's part of a bigger wbs-backup repo, with other tools, which I'm developing to manage my storage and backup requirements. I'll take a look at it, and certainly see if it works out of the box. Second question, I'm trying deduplication tools for the very first time and I still have to figure out how to handle snapper snapshots, which are read only. I currently tried duperemove 0.11 git and I get tons of "Error 30: Read-only file system while opening "/.../@snapshots/4385/...". How am I supposed to handle snapper snapshots? > Is btrfs-dedupe able to handle snapper snapshots? You can't deduplicate a read-only snapshot, but you can create read-write snapshots from them, deduplicate those, and then recreate the read-only ones. This is what I've done. In theory, once this has been done once, it shouldn't have to be done again, at least for those snapshots, unless you want to modify the deduplication. It's probably a good idea to defragment files and directories first, as well. It should be possible to deduplicate a read-only file to a read-write one, but that's probably not worth the effort in many real-world use cases. James -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
Nice, you should probably update the btrfs wiki as well, because there is no mention of btrfs-dedupe. First question, why this name? Don't you plan to support xfs as well? Second question, I'm trying deduplication tools for the very first time and I still have to figure out how to handle snapper snapshots, which are read only. I currently tried duperemove 0.11 git and I get tons of "Error 30: Read-only file system while opening "/.../@snapshots/4385/...". How am I supposed to handle snapper snapshots? I do not run duperemove from a live distro, instead I run it directly on the system I want to deduplicate: sudo mount -o noatime,compress=lzo,autodefrag /dev/mapper/cryptroot /home/niko/nosnap/rootfs/ sudo duperemove -drh --dedupe-options=nofiemap --hashfile=/home/niko/nosnap/rootfs.hash /home/niko/nosnap/rootfs/ Is btrfs-dedupe able to handle snapper snapshots? Thanks, Niccolo' Belli -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
Perhaps the complexity of doing this efficiently makes it inappropriate for inclusion in the tool itself, whereas I believe the core implementation's focus is on in-band deduplication, automatic and behind the scenes. On 08/11/16 03:40, Christoph Anton Mitterer wrote: On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote: I think adding a whole-file dedup mode to duperemove would be better (from user's POV) than writing a whole new tool What would IMO be really good from a user's POV was, if one of the tools, deemed to be the "best", would be added to the btrfs-progs and simply become "the official" one. Cheers, Chris. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote: > I think adding a whole-file dedup mode to duperemove would be better > (from user's POV) than writing a whole new tool What would IMO be really good from a user's POV is if one of the tools, deemed to be the "best", were added to btrfs-progs and simply became "the official" one. Cheers, Chris.
Re: Announcing btrfs-dedupe
On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote: > On Mon, Nov 07, 2016 at 09:48:41AM -0800, Mark Fasheh wrote: > > also on XFS with the dedupe ioctl (I believe this should be out with > > Linux-4.9). > > It's already there in 4.9-rc1, although you need a special version of > xfsprogs (possibly already released, I didn't check). It's an experimental > feature that needs to be enabled with "-m reflink=1". The code will be available in xfsprogs 4.9, due out after Linux 4.9. You'll still have to pass '-m reflink=1' to enable reflink until we declare the feature stable, however. > Despite that experimental status, I'd strongly recommend James to test his > tool on xfs as well, as it's the second major implementation of this API[1]. Agreed. :) > Mark has already included XFS in documentation of duperemove, all that looks > amiss is btrfs-extent-same having an obsolete name. But then, I never did > any non-superficial tests on XFS, beyond "seems to work". /me wonders if ocfs2 will ever catch up to the reflink/dedupe party. ;) --Darrick > > > Meow! > > [1]. For some reasons zfs-on-linux guys didn't implement this yet, despite > it being an obvious thing on ZFS. > -- > A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol, 1kg > raspberries, 0.4kg sugar; put into a big jar for 1 month. Filter out and > throw away the fruits (can dump them into a cake, etc), let the drink age > at least 3-6 months. > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
On Mon, Nov 07, 2016 at 09:48:41AM -0800, Mark Fasheh wrote: > also on XFS with the dedupe ioctl (I believe this should be out with > Linux-4.9). It's already there in 4.9-rc1, although you need a special version of xfsprogs (possibly already released, I didn't check). It's an experimental feature that needs to be enabled with "-m reflink=1". Despite that experimental status, I'd strongly recommend that James test his tool on xfs as well, as it's the second major implementation of this API[1]. Mark has already included XFS in the documentation of duperemove; all that looks amiss is btrfs-extent-same having an obsolete name. But then, I never did any non-superficial tests on XFS, beyond "seems to work". Meow! [1]. For some reason the zfs-on-linux guys haven't implemented this yet, despite it being an obvious thing on ZFS. -- A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol, 1kg raspberries, 0.4kg sugar; put into a big jar for 1 month. Filter out and throw away the fruits (can dump them into a cake, etc), let the drink age at least 3-6 months. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
FWIW I have updated my comments about duperemove and also the "caveat" section you mentioned in your other mail in the readme. http://btrfs-dedupe.com James On 07/11/16 19:49, James Pharaoh wrote: Annoyingly I can't find this now, but I definitely remember reading someone, apparently someone knowledgable, claim that the latest version of the kernel which I was using at the time, still suffered from issues regarding the dedupe code. This was a while ago, and I would be very pleased to hear that there is high confidence in the current implementation! I'll post a link if I manage to find the comments. James On 07/11/16 18:59, Mark Fasheh wrote: Hi James, Re the following text on your project page: "IMPORTANT CAVEAT — I have read that there are race and/or error conditions which can cause filesystem corruption in the kernel implementation of the deduplication ioctl." Can you expound on that? I'm not aware of any bugs right now but if there is any it'd absolutely be worth having that info on the btrfs list. Thanks, --Mark On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh wrote: Hi all, I'm pleased to announce my btrfs deduplication utility, written in Rust. This operates on whole files, is fast, and I believe complements the existing utilities (duperemove, bedup), which exist currently. Please visit the homepage for more information: http://btrfs-dedupe.com James Pharaoh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
Annoyingly I can't find this now, but I definitely remember reading someone, apparently someone knowledgable, claim that the latest version of the kernel which I was using at the time, still suffered from issues regarding the dedupe code. This was a while ago, and I would be very pleased to hear that there is high confidence in the current implementation! I'll post a link if I manage to find the comments. James On 07/11/16 18:59, Mark Fasheh wrote: Hi James, Re the following text on your project page: "IMPORTANT CAVEAT — I have read that there are race and/or error conditions which can cause filesystem corruption in the kernel implementation of the deduplication ioctl." Can you expound on that? I'm not aware of any bugs right now but if there is any it'd absolutely be worth having that info on the btrfs list. Thanks, --Mark On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh wrote: Hi all, I'm pleased to announce my btrfs deduplication utility, written in Rust. This operates on whole files, is fast, and I believe complements the existing utilities (duperemove, bedup), which exist currently. Please visit the homepage for more information: http://btrfs-dedupe.com James Pharaoh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
Hi James, Re the following text on your project page: "IMPORTANT CAVEAT — I have read that there are race and/or error conditions which can cause filesystem corruption in the kernel implementation of the deduplication ioctl." Can you expound on that? I'm not aware of any bugs right now but if there is any it'd absolutely be worth having that info on the btrfs list. Thanks, --Mark On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh wrote: > Hi all, > > I'm pleased to announce my btrfs deduplication utility, written in Rust. > This operates on whole files, is fast, and I believe complements the > existing utilities (duperemove, bedup), which exist currently. > > Please visit the homepage for more information: > > http://btrfs-dedupe.com > > James Pharaoh > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Announcing btrfs-dedupe
Hi David and James, On Mon, Nov 7, 2016 at 6:02 AM, David Sterba wrote: > On Sun, Nov 06, 2016 at 02:30:52PM +0100, James Pharaoh wrote: >> I'm pleased to announce my btrfs deduplication utility, written in Rust. >> This operates on whole files, is fast, and I believe complements the >> existing utilities (duperemove, bedup), which exist currently. > > Mark can correct me if I'm wrong, but AFAIK, duperemove can consume > output of fdupes, which does the whole file scanning for duplicates. And > I think adding a whole-file dedup mode to duperemove would be better > (from user's POV) than writing a whole new tool, eg. because of existing > availability of duperemove in the distros. Yeah you are correct - fdupes -r /foo | duperemove --fdupes will get you the same effect. There's been a request for us to do all of that internally so that the whole file dedupe works with the mtime checking code. This is entirely doable. I would probably either add a field to the files table or add a new table to hold whole-file hashes. We can then squeeze down our existing block hashes into one big one or just rehash the whole file. > Also looking to your roadmap, some of the items are implemented in > duperemove: database of existing csums, cross filesystem boundary, > mtime-based speedups). Yeah, rescanning based on mtime was a huge speedup for Duperemove as was keeping checksums in a db. We do all this today, also on XFS with the dedupe ioctl (I believe this should be out with Linux-4.9). Btw, there's lots of little details and bug fixes which I feel add up to a relatively complete (though far from perfect!) tool. For example, the dedupe code can handle multiple kernel versions including old kernels which couldn't dedupe on non aligned block boundaries. Every major step in duperemove is threaded at this point too which has also been an enormous performance increase (which new features benefit from). Thanks, --Mark -- "When the going gets weird, the weird turn pro." Hunter S. Thompson -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
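For what it's worth, the mtime-based rescanning mentioned above boils down to a cheap stat() comparison before any hashing is done. A toy sketch follows; the stored size and mtime would really come out of the tool's hash database, and are hard-coded here purely for illustration:

/* rescan-check.c: decide whether a file needs rehashing. Sketch only. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

struct stored {
	long long size;
	long long mtime_sec;
	long long mtime_nsec;
};

/* Returns 1 if the file changed (or is unreadable) and must be rehashed. */
static int needs_rehash(const char *path, const struct stored *old)
{
	struct stat st;
	if (stat(path, &st) != 0)
		return 1;
	return st.st_size != old->size
	    || st.st_mtim.tv_sec != old->mtime_sec
	    || st.st_mtim.tv_nsec != old->mtime_nsec;
}

int main(void)
{
	struct stored old = { 1048576, 1478563200, 0 };   /* values from a previous run */
	printf("rehash? %s\n", needs_rehash("/etc/hostname", &old) ? "yes" : "no");
	return 0;
}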
Re: Announcing btrfs-dedupe
On Sun, Nov 06, 2016 at 02:30:52PM +0100, James Pharaoh wrote: > I'm pleased to announce my btrfs deduplication utility, written in Rust. > This operates on whole files, is fast, and I believe complements the > existing utilities (duperemove, bedup), which exist currently. Mark can correct me if I'm wrong, but AFAIK, duperemove can consume the output of fdupes, which does the whole-file scanning for duplicates. And I think adding a whole-file dedup mode to duperemove would be better (from user's POV) than writing a whole new tool, e.g. because of the existing availability of duperemove in the distros. Also, looking at your roadmap, some of the items are implemented in duperemove: a database of existing csums, crossing filesystem boundaries, mtime-based speedups. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Announcing btrfs-dedupe
Hi all, I'm pleased to announce my btrfs deduplication utility, written in Rust. This operates on whole files, is fast, and I believe complements the existing utilities (duperemove, bedup), which exist currently. Please visit the homepage for more information: http://btrfs-dedupe.com James Pharaoh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html