Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-29 Thread Kai Krakow
Rich Freeman r-bt...@thefreemanclan.net wrote:

 On Sun, Mar 29, 2015 at 7:43 AM, Kai Krakow hurikha...@gmail.com wrote:

 With the planned performance improvements, I'm guessing the best way will
 become mounting the root subvolume (subvolid 0) and letting duperemove
 work on that as a whole - including crossing all fs boundaries.

 
 Why cross filesystem boundaries by default?  If you scan from the root
 subvolume you're guaranteed to traverse every file on the filesystem
 (which is all that can be deduped) without crossing any filesystem
 boundaries.  Even if you have btrfs on non-btrfs on btrfs there must
 be some other path that reaches the same files when scanning from
 subvolid 0.

Yes, the chosen default is probably not the best for this kind of utility.
But I suppose it follows the principle of least surprise. At least every
utility I use daily (like find) follows this default route. By the way,
I wrote "default" because one should keep in mind that duperemove is not
recursive by default (so crossing a boundary doesn't even apply in the
default configuration), which only strengthens my point about the principle
of least surprise. Whether to change the default I'd leave open for
discussion here; all I suggested was that being smart about boundaries
should not be duperemove's only mode (otherwise behavior becomes
unpredictable when it is deployed on a vast number of individually
configured systems). I could imagine a cmdline option to make it smart.

As for the subvolid 0 idea: that is purely how I intend to use it for my
own purposes. By no means should it be part of any default deployment.

-- 
Replies to list only preferred.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-29 Thread Rich Freeman
On Sun, Mar 29, 2015 at 7:43 AM, Kai Krakow hurikha...@gmail.com wrote:

 With the planned performance improvements, I'm guessing the best way will
 become mounting the root subvolume (subvolid 0) and letting duperemove work
 on that as a whole - including crossing all fs boundaries.


Why cross filesystem boundaries by default?  If you scan from the root
subvolume you're guaranteed to traverse every file on the filesystem
(which is all that can be deduped) without crossing any filesystem
boundaries.  Even if you have btrfs on non-btrfs on btrfs there must
be some other path that reaches the same files when scanning from
subvolid 0.

--
Rich


Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-29 Thread Christoph Anton Mitterer
On Sun, 2015-03-29 at 13:43 +0200, Kai Krakow wrote: 
 Concluding that: duperemove should probably not try to become smart about 
 filesystem boundaries. It should either cross them or not as it is now - the 
 option is left to the user (as is the task to supply proper cmdline 
 arguments with that).
Couldn't it by default simply cross boundaries only within the same
btrfs fs (i.e. amongst all its subvolumes), since this seems to be the
natural choice users want in most cases, and be allowed to cross other
boundaries via a --no-xdev option or something like that?


Cheers,
Chris.




Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-29 Thread Christoph Anton Mitterer
On Sun, 2015-03-29 at 16:44 +0200, Kai Krakow wrote: 
 Yes, the chosen default is probably not the best for this kind of utility. 
 But I suppose it follows the principle of least surprise. At least every 
 utility I use daily (like find) follows this default route.
But all these tools operate on the file hierarchy and by default don't
care about filesystems at all - or at least not in their original
meaning.

Dedup, however, is IMHO a more filesystem-internal operation, more like
defragmentation or tune2fs.


Cheers.




Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-29 Thread Kai Krakow
Rich Freeman r-bt...@thefreemanclan.net wrote:

 On Thu, Mar 26, 2015 at 8:07 PM, Martin m_bt...@ml1.co.uk wrote:

 Anyone with any comments on how well duperemove performs for TB-sized
 volumes?
 
 Took many hours but less than a day for a few TB - I'm not sure
 whether it is smart enough to take less time on subsequent scans like
 bedup.
 

 Does it work across subvolumes? (Presumably not...)
 
 As far as I can tell, yes.  Unless you pass a command-line option it
 crosses filesystem boundaries and even scans non-btrfs filesystems
 (like /proc, /dev, etc).  Obviously you'll want to avoid that since it
 only wastes time and I can just imagine it trying to hash kcore and
 such.
 
 Other than being less-than-ideal intelligence-wise, it seemed
 effective.  I can live with that in an early release like this.

This is mainly in there to support deduping across different subvolumes 
within the same device pool. So I think the idea was neither less-than-
ideal, nor unintelligent, and it has nothing to do with performance.

But your warning is still valid: one should take care not to dedupe
special filesystems (but that is the same with every other tool out there,
like rsync, cp, essentially everything that supports recursion). Nor is it
very effective for the deduplication process to cross a boundary to a
non-btrfs device, with one or more exceptions: you may want duperemove to
write hashes for a non-btrfs device and use the result for other purposes
outside of duperemove's scope, or you are nesting btrfs into non-btrfs into
btrfs mounts, or...

Concluding that: duperemove should probably not try to become smart about 
filesystem boundaries. It should either cross them or not as it is now - the 
option is left to the user (as is the task to supply proper cmdline 
arguments with that).

With the planned performance improvements, I'm guessing the best way will 
become mounting the root subvolume (subvolid 0) and letting duperemove work 
on that as a whole - including crossing all fs boundaries.

-- 
Replies to list only preferred.



Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-27 Thread Mark Fasheh
On Fri, Mar 27, 2015 at 12:07:29AM +, Martin wrote:
 Excellent and very rapid packaging, thanks!
 
 
 Already compiled, installed, and soon to be tried on a test subvolume...
 
 
 Anyone with any comments on how well duperemove performs for TB-sized
 volumes?

https://github.com/markfasheh/duperemove/wiki/Performance-Numbers

That page has some sample performance numbers. Keep in mind that the tests
were done on reasonably nice hardware.

TB-size is definitely on the larger end of what I expect it to be handling
these days. The biggest problem you would see is memory usage - versions
0.09 and below store all hashes in memory, so if everything else is
fast enough that's likely the first bump you'll hit.

Master branch has some code which reduces our memory consumption
dramatically by using a bloom filter and temporarily storing the hashes on
disk.
That branch needs some more features and bug fixing before I'm ready to call
it stable.
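
To illustrate the memory point, here is a minimal sketch of how a bloom
filter can bound memory while hashing: only hashes that the filter reports
as possibly seen before need to be kept (or written to disk) for an exact
comparison later. This is not duperemove's actual code; the class name, the
filter size, the number of hash functions and the use of SHA-256 are all
assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over byte strings (illustrative only)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        # num_hashes must be <= 8 because one SHA-256 digest is sliced
        # into 4-byte words below.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions from a single digest.
        digest = hashlib.sha256(item).digest()
        for i in range(self.num_hashes):
            word = int.from_bytes(digest[4 * i:4 * i + 4], "big")
            yield word % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def possibly_duplicated(block_hashes):
    """Return hashes the filter has probably seen before.

    False positives are possible, false negatives are not, so every real
    repeat ends up in the result and must still be verified exactly.
    """
    seen = BloomFilter()
    repeats = []
    for h in block_hashes:
        if h in seen:
            repeats.append(h)
        else:
            seen.add(h)
    return repeats
```

The filter itself stays at a fixed size (128 KiB with the defaults above)
no matter how many hashes pass through it; the trade-off is that the
candidates it returns still need an exact check.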


 Does it work across subvolumes? (Presumably not...)

Yep it will dedupe across subvolumes for you!
--Mark

--
Mark Fasheh


Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-27 Thread Mark Fasheh
On Tue, Mar 24, 2015 at 09:30:52PM -0400, Rich Freeman wrote:
 On Mon, Mar 23, 2015 at 7:22 PM, Hugo Mills h...@carfax.org.uk wrote:
  On Mon, Mar 23, 2015 at 11:10:46PM +, Martin wrote:
  As titled:
 
 
  Does btrfs have dedup (on raid1 multiple disks) that can be enabled?
 
 The current state of play is on the wiki:
 
  https://btrfs.wiki.kernel.org/index.php/Deduplication
 
 
 I hadn't realized that bedup was deprecated.
 
 This seems unfortunate since it seemed to be a lot smarter about
 detecting what has and hasn't already been scanned, and it also
 supported defragmenting files while de-duplicating them.

Hi, just FYI: only rescanning files that have changed since the last scan is
a feature I've been working on in duperemove for some time now. I have some
rudimentary code that works which will be going into master branch in a week
or so (I wanted to finish it this week but other things have kept me busy).

But anyway that should help with the lack of intelligence on what files to
scan.
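
One simple way to picture "only rescan what changed" is the sketch below.
The message does not say how duperemove detects changes, so comparing a
stored (mtime, size) pair per path is purely an assumption for
illustration, and the function name is hypothetical.

```python
import os


def files_to_rescan(paths, previous):
    """Return (changed_paths, new_state).

    `previous` maps path -> (mtime_ns, size) as recorded by the last scan.
    A file is rescanned if it is new or if its recorded pair differs.
    """
    changed = []
    state = {}
    for path in paths:
        try:
            st = os.stat(path)
        except OSError:
            continue  # vanished since it was listed; nothing to scan
        signature = (st.st_mtime_ns, st.st_size)
        state[path] = signature
        if previous.get(path) != signature:
            changed.append(path)
    return changed, state
```

The returned state would be saved after each run and passed back in as
`previous` on the next one, so an unchanged tree costs one stat() per file
instead of a full re-hash.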
--Mark

--
Mark Fasheh


Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-26 Thread Martin
On 25/03/15 01:30, Rich Freeman wrote:
 On Mon, Mar 23, 2015 at 7:22 PM, Hugo Mills h...@carfax.org.uk wrote:
 On Mon, Mar 23, 2015 at 11:10:46PM +, Martin wrote:
 As titled:


 Does btrfs have dedup (on raid1 multiple disks) that can be enabled?

The current state of play is on the wiki:

 https://btrfs.wiki.kernel.org/index.php/Deduplication

 
 I hadn't realized that bedup was deprecated.
 
 This seems unfortunate since it seemed to be a lot smarter about
 detecting what has and hasn't already been scanned, and it also
 supported defragmenting files while de-duplicating them.
 
 I'll give duperemove a shot.   I just packaged it on Gentoo.

Excellent and very rapid packaging, thanks!


Already compiled, installed, and soon to be tried on a test subvolume...


Anyone with any comments on how well duperemove performs for TB-sized
volumes?

Does it work across subvolumes? (Presumably not...)


Thanks,
Martin



Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-26 Thread Rich Freeman
On Thu, Mar 26, 2015 at 8:07 PM, Martin m_bt...@ml1.co.uk wrote:

 Anyone with any comments on how well duperemove performs for TB-sized
 volumes?

Took many hours but less than a day for a few TB - I'm not sure
whether it is smart enough to take less time on subsequent scans like
bedup.


 Does it work across subvolumes? (Presumably not...)

As far as I can tell, yes.  Unless you pass a command-line option it
crosses filesystem boundaries and even scans non-btrfs filesystems
(like /proc, /dev, etc).  Obviously you'll want to avoid that since it
only wastes time and I can just imagine it trying to hash kcore and
such.

Other than being less-than-ideal intelligence-wise, it seemed
effective.  I can live with that in an early release like this.
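
The kind of filtering discussed above can be sketched as follows: collect
only regular files and optionally stay on the device of the starting
directory, similar in spirit to `find -xdev`. This is an illustration, not
how duperemove selects files; the function names are hypothetical. Note
that, as far as I know, btrfs subvolumes report distinct device numbers,
so a plain st_dev check like this would also stop at subvolume boundaries.

```python
import os
import stat


def _same_device(path, dev):
    """True if path can be stat'ed and lives on device `dev`."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def candidate_files(root, one_file_system=True):
    """Yield regular files under root, optionally staying on root's device."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        if one_file_system:
            # Prune directories on other devices so os.walk never descends
            # into them (this is what keeps /proc or /dev out of a scan
            # that starts at /).
            dirnames[:] = [d for d in dirnames
                           if _same_device(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Skip symlinks, sockets, FIFOs and device nodes. Files such as
            # /proc/kcore look like regular files, so only the device check
            # above keeps those out.
            if stat.S_ISREG(st.st_mode):
                yield path
```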

--
Rich


Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-24 Thread Rich Freeman
On Mon, Mar 23, 2015 at 7:22 PM, Hugo Mills h...@carfax.org.uk wrote:
 On Mon, Mar 23, 2015 at 11:10:46PM +, Martin wrote:
 As titled:


 Does btrfs have dedup (on raid1 multiple disks) that can be enabled?

The current state of play is on the wiki:

 https://btrfs.wiki.kernel.org/index.php/Deduplication


I hadn't realized that bedup was deprecated.

This seems unfortunate since it seemed to be a lot smarter about
detecting what has and hasn't already been scanned, and it also
supported defragmenting files while de-duplicating them.

I'll give duperemove a shot.   I just packaged it on Gentoo.

--
Rich


btrfs dedup - available or experimental? Or yet to be?

2015-03-23 Thread Martin
As titled:


Does btrfs have dedup (on raid1 multiple disks) that can be enabled?

Can anyone relate any experiences?

Is there (or will there be,) a bad penalty of fragmentation?


(For kernel 3.18.9)

Thanks,
Martin



Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-23 Thread Hugo Mills
On Mon, Mar 23, 2015 at 11:10:46PM +, Martin wrote:
 As titled:
 
 
 Does btrfs have dedup (on raid1 multiple disks) that can be enabled?

   The current state of play is on the wiki:

https://btrfs.wiki.kernel.org/index.php/Deduplication

 Can anyone relate any experiences?

   duperemove is reported as working.

 Is there (or will there be,) a bad penalty of fragmentation?

   duperemove operates at the extent level, not at the level of
individual blocks, so the fragmentation isn't so bad.

   Hugo.

-- 
Hugo Mills | ©1973 Unclear Research Ltd
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: 65E74AC0  |

