Re: btrfs dedup - available or experimental? Or yet to be?
Rich Freeman r-bt...@thefreemanclan.net wrote:

> On Sun, Mar 29, 2015 at 7:43 AM, Kai Krakow hurikha...@gmail.com wrote:
>> With the planned performance improvements, I'm guessing the best way will become mounting the root subvolume (subvolid 0) and letting duperemove work on that as a whole - including crossing all fs boundaries.
>
> Why cross filesystem boundaries by default? If you scan from the root subvolume you're guaranteed to traverse every file on the filesystem (which is all that can be deduped) without crossing any filesystem boundaries. Even if you have btrfs on non-btrfs on btrfs there must be some other path that reaches the same files when scanning from subvolid 0.

Yes, the chosen default is probably not the best for this kind of utility. But I suppose it follows the principle of least surprise. At least every utility I use daily (like find) follows this default route.

By the way, I wrote "default" because one should keep in mind that it is not recursive by default (so crossing the boundary wouldn't even apply in the default configuration), which only strengthens my point about the principle of least surprise. I'd leave changing the default open for discussion here; all I suggested was that duperemove should not try to become smart about it as the only choice (otherwise behavior will be undefined when deploying this on a vast number of individually configured systems). I could imagine a cmdline option to make it smart.

The idea of using subvolid 0: that is purely how I intend to use it for my personal purposes. By no means should this be in any default deployment.

--
Replies to list only preferred.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs dedup - available or experimental? Or yet to be?
On Sun, Mar 29, 2015 at 7:43 AM, Kai Krakow hurikha...@gmail.com wrote:
> With the planned performance improvements, I'm guessing the best way will become mounting the root subvolume (subvolid 0) and letting duperemove work on that as a whole - including crossing all fs boundaries.

Why cross filesystem boundaries by default? If you scan from the root subvolume you're guaranteed to traverse every file on the filesystem (which is all that can be deduped) without crossing any filesystem boundaries. Even if you have btrfs on non-btrfs on btrfs there must be some other path that reaches the same files when scanning from subvolid 0.

--
Rich
Re: btrfs dedup - available or experimental? Or yet to be?
On Sun, 2015-03-29 at 13:43 +0200, Kai Krakow wrote:
> Concluding that: duperemove should probably not try to become smart about filesystem boundaries. It should either cross them or not as it is now - the option is left to the user (as is the task to supply proper cmdline arguments with that).

Couldn't it by default simply cross boundaries only within the same btrfs fs (i.e. amongst all its subvolumes), since this seems to be the natural choice users want in most cases, and then, via a --no-xdev option or something like that, be allowed to cross other filesystem boundaries as well?

Cheers,
Chris.
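To make the boundary question above concrete, here is a minimal Python sketch of a tree walk that refuses to descend into directories on a different device, similar in spirit to find -xdev. It is not duperemove's code, and the name walk_one_device is hypothetical. Note that btrfs reports a distinct st_dev for each subvolume, so a plain device comparison like this one also stops at subvolume boundaries; staying inside one btrfs filesystem while still crossing its subvolumes would need a different check than the one shown.

```python
import os
from typing import Iterator


def walk_one_device(root: str) -> Iterator[str]:
    """Yield regular files under root, never entering another device.

    Hypothetical helper for illustration only.
    """
    root_dev = os.stat(root).st_dev
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            entries = list(os.scandir(current))
        except OSError:
            continue  # unreadable directory: skip it
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError:
                continue  # entry vanished or is unreadable
            if entry.is_dir(follow_symlinks=False):
                # Descend only if the directory lives on the starting device.
                if st.st_dev == root_dev:
                    stack.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                yield entry.path
```

Symbolic links are neither followed nor reported, which keeps the walk from leaving the tree through a link.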
Re: btrfs dedup - available or experimental? Or yet to be?
On Sun, 2015-03-29 at 16:44 +0200, Kai Krakow wrote:
> Yes, the chosen default is probably not the best for this kind of utility. But I suppose it follows the principle of least surprise. At least every utility I'm daily using (like find) follows this default route.

But all of these tools operate on the file hierarchy and by default don't care about filesystems at all - or at least not in their original meaning. Dedup, however, is IMHO a more filesystem-internal operation... more like defragmentation or tune2fs.

Cheers.
Re: btrfs dedup - available or experimental? Or yet to be?
Rich Freeman r-bt...@thefreemanclan.net wrote:

> On Thu, Mar 26, 2015 at 8:07 PM, Martin m_bt...@ml1.co.uk wrote:
>> Anyone with any comments on how well duperemove performs for TB-sized volumes?
>
> Took many hours but less than a day for a few TB - I'm not sure whether it is smart enough to take less time on subsequent scans like bedup.
>
>> Does it work across subvolumes? (Presumably not...)
>
> As far as I can tell, yes. Unless you pass a command-line option it crosses filesystem boundaries and even scans non-btrfs filesystems (like /proc, /dev, etc). Obviously you'll want to avoid that since it only wastes time and I can just imagine it trying to hash kcore and such.
>
> Other than being less-than-ideal intelligence-wise, it seemed effective. I can live with that in an early release like this.

This is mainly in there to support deduping across different subvolumes within the same device pool. So I think the idea was neither less than ideal nor unintelligent, and it has nothing to do with performance.

But your warning is still valid: one should take care not to dedupe special filesystems (but that is the same with every other tool out there, like rsync, cp, essentially everything that supports recursion). Nor is it very effective for the deduplication process to cross a boundary to a non-btrfs device, with a few exceptions: you may want duperemove to write hashes for a non-btrfs device and use the result for other purposes outside of duperemove's scope, or you are nesting btrfs into non-btrfs into btrfs mounts, or...

Concluding that: duperemove should probably not try to become smart about filesystem boundaries. It should either cross them or not as it is now - the option is left to the user (as is the task to supply proper cmdline arguments with that).

With the planned performance improvements, I'm guessing the best way will become mounting the root subvolume (subvolid 0) and letting duperemove work on that as a whole - including crossing all fs boundaries.
Re: btrfs dedup - available or experimental? Or yet to be?
On Fri, Mar 27, 2015 at 12:07:29AM +, Martin wrote:
> Excellent and very rapid packaging, thanks! Already compiled, installed, and soon to be tried on a test subvolume...
>
> Anyone with any comments on how well duperemove performs for TB-sized volumes?

https://github.com/markfasheh/duperemove/wiki/Performance-Numbers

That page has some sample performance numbers. Keep in mind that the tests were done on reasonably nice hardware. TB-size is definitely on the larger end of what I expect it to be handling these days. The biggest problem you would see is memory usage - versions 0.09 and below store all hashes in memory, so if everything else is fast enough that's likely the first bump you'll hit.

Master branch has some code which reduces our memory consumption dramatically by using a bloom filter and temporarily storing the hashes on disk. That branch needs some more features and bug fixing before I'm ready to call it stable.

> Does it work across subvolumes? (Presumably not...)

Yep it will dedupe across subvolumes for you!
	--Mark

--
Mark Fasheh
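As a rough illustration of why a bloom filter helps with the memory problem described above, the following Python sketch keeps only a fixed-size bit array while scanning and retains just those block hashes that the filter reports as possibly seen before. This is an assumption-laden toy, not the code in duperemove's master branch: the class BloomFilter, the function candidate_duplicates, the filter size, and the use of SHA-256 to derive bit positions are all choices made for this example, and the on-disk storage mentioned in the message is not modelled.

```python
import hashlib
from typing import Iterable, Iterator, List


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive several bit positions by salting the item with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> bool:
        """Insert item; return True if it may have been inserted before."""
        maybe_seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                maybe_seen = False  # at least one bit was unset: definitely new
                self.bits[byte] |= 1 << bit
        return maybe_seen


def candidate_duplicates(block_hashes: Iterable[bytes]) -> List[bytes]:
    """Return hashes the filter flags as possibly repeated.

    Memory use is bounded by the filter size plus the candidates, rather
    than growing with every hash scanned.
    """
    bloom = BloomFilter()
    return [h for h in block_hashes if bloom.add(h)]
```

Because a bloom filter can report false positives, the candidates would still need an exact comparison before any extents are actually deduplicated.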
Re: btrfs dedup - available or experimental? Or yet to be?
On Tue, Mar 24, 2015 at 09:30:52PM -0400, Rich Freeman wrote:
> On Mon, Mar 23, 2015 at 7:22 PM, Hugo Mills h...@carfax.org.uk wrote:
>> On Mon, Mar 23, 2015 at 11:10:46PM +, Martin wrote:
>>> As titled: Does btrfs have dedup (on raid1 multiple disks) that can be enabled?
>>
>> The current state of play is on the wiki:
>> https://btrfs.wiki.kernel.org/index.php/Deduplication
>
> I hadn't realized that bedup was deprecated. This seems unfortunate since it seemed to be a lot smarter about detecting what has and hasn't already been scanned, and it also supported defragmenting files while de-duplicating them.

Hi, just FYI: only rescanning files that have changed since the last scan is a feature I've been working on in duperemove for some time now. I have some rudimentary code that works, which will be going into master branch in a week or so (I wanted to finish it this week but other things have kept me busy). But anyway, that should help with the lack of intelligence on what files to scan.
	--Mark

--
Mark Fasheh
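One simple way to picture "only rescan what changed" is to remember a cheap fingerprint per file between runs and hash only the files whose fingerprint differs. The Python sketch below does that with modification time and size. It is only an illustration under that assumption; the message above does not say how duperemove decides that a file has changed, and files_to_rescan is a hypothetical name.

```python
import os
from typing import Dict, Iterable, List, Tuple

Fingerprint = Tuple[int, int]  # (mtime in nanoseconds, size in bytes)


def files_to_rescan(
    paths: Iterable[str], previous: Dict[str, Fingerprint]
) -> Tuple[List[str], Dict[str, Fingerprint]]:
    """Return the paths that are new or changed, plus the state for next run."""
    changed: List[str] = []
    state: Dict[str, Fingerprint] = {}
    for path in paths:
        try:
            st = os.stat(path)
        except OSError:
            continue  # file disappeared or is unreadable: drop it from state
        fingerprint = (st.st_mtime_ns, st.st_size)
        state[path] = fingerprint
        if previous.get(path) != fingerprint:
            changed.append(path)
    return changed, state
```

On the first run, previous is empty and every file is reported; on later runs only new or modified files come back, so only those need to be hashed again.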
Re: btrfs dedup - available or experimental? Or yet to be?
On 25/03/15 01:30, Rich Freeman wrote:
> On Mon, Mar 23, 2015 at 7:22 PM, Hugo Mills h...@carfax.org.uk wrote:
>> On Mon, Mar 23, 2015 at 11:10:46PM +, Martin wrote:
>>> As titled: Does btrfs have dedup (on raid1 multiple disks) that can be enabled?
>>
>> The current state of play is on the wiki:
>> https://btrfs.wiki.kernel.org/index.php/Deduplication
>
> I hadn't realized that bedup was deprecated. This seems unfortunate since it seemed to be a lot smarter about detecting what has and hasn't already been scanned, and it also supported defragmenting files while de-duplicating them.
>
> I'll give duperemove a shot. I just packaged it on Gentoo.

Excellent and very rapid packaging, thanks! Already compiled, installed, and soon to be tried on a test subvolume...

Anyone with any comments on how well duperemove performs for TB-sized volumes?

Does it work across subvolumes? (Presumably not...)

Thanks,
Martin
Re: btrfs dedup - available or experimental? Or yet to be?
On Thu, Mar 26, 2015 at 8:07 PM, Martin m_bt...@ml1.co.uk wrote:
> Anyone with any comments on how well duperemove performs for TB-sized volumes?

Took many hours but less than a day for a few TB - I'm not sure whether it is smart enough to take less time on subsequent scans like bedup.

> Does it work across subvolumes? (Presumably not...)

As far as I can tell, yes. Unless you pass a command-line option it crosses filesystem boundaries and even scans non-btrfs filesystems (like /proc, /dev, etc). Obviously you'll want to avoid that since it only wastes time and I can just imagine it trying to hash kcore and such.

Other than being less-than-ideal intelligence-wise, it seemed effective. I can live with that in an early release like this.

--
Rich
Re: btrfs dedup - available or experimental? Or yet to be?
On Mon, Mar 23, 2015 at 7:22 PM, Hugo Mills h...@carfax.org.uk wrote:
> On Mon, Mar 23, 2015 at 11:10:46PM +, Martin wrote:
>> As titled: Does btrfs have dedup (on raid1 multiple disks) that can be enabled?
>
> The current state of play is on the wiki:
> https://btrfs.wiki.kernel.org/index.php/Deduplication

I hadn't realized that bedup was deprecated. This seems unfortunate since it seemed to be a lot smarter about detecting what has and hasn't already been scanned, and it also supported defragmenting files while de-duplicating them.

I'll give duperemove a shot. I just packaged it on Gentoo.

--
Rich
btrfs dedup - available or experimental? Or yet to be?
As titled: Does btrfs have dedup (on raid1 multiple disks) that can be enabled?

Can anyone relate any experiences?

Is there (or will there be) a bad fragmentation penalty?

(For kernel 3.18.9)

Thanks,
Martin
Re: btrfs dedup - available or experimental? Or yet to be?
On Mon, Mar 23, 2015 at 11:10:46PM +, Martin wrote:
> As titled: Does btrfs have dedup (on raid1 multiple disks) that can be enabled?

The current state of play is on the wiki:
https://btrfs.wiki.kernel.org/index.php/Deduplication

> Can anyone relate any experiences?

duperemove is reported as working.

> Is there (or will there be,) a bad penalty of fragmentation?

duperemove operates on an extent scale, not at the level of individual blocks, so the fragmentation isn't so bad.

Hugo.

--
Hugo Mills | ©1973 Unclear Research Ltd
hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: 65E74AC0
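The difference between working per block and working per extent can be shown with a little arithmetic. The Python sketch below merges consecutive duplicate block numbers into (start, length) runs, so a long stretch of identical data is treated as one range instead of many small ones. It only illustrates the idea; it is not duperemove's algorithm, and coalesce_blocks is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def coalesce_blocks(block_indices: Iterable[int]) -> List[Tuple[int, int]]:
    """Merge consecutive block indices into (start, length) runs."""
    runs: List[Tuple[int, int]] = []
    for index in sorted(set(block_indices)):
        if runs and runs[-1][0] + runs[-1][1] == index:
            # This block directly follows the previous run: extend it.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            runs.append((index, 1))
    return runs


# Six duplicate blocks collapse into three ranges.
print(coalesce_blocks([0, 1, 2, 5, 6, 9]))  # [(0, 3), (5, 2), (9, 1)]
```

Fewer, larger ranges mean fewer separate pieces for the filesystem to track, which is the intuition behind the fragmentation remark above.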