Re: Announcing btrfs-dedupe 1.1.0

2017-01-26 Thread James Pharaoh
Yeah, ok, can you create an account on my source tracker and file an 
issue for this please?


https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe

I am fairly sure I can fix this without too much difficulty. ;-)

James

On 26/01/17 18:16, Robert Krig wrote:

I've tried your binaries, which also seem to work fine on Debian
Stretch (at least the latest Ubuntu Xenial binary).

I've only run into one little issue: btrfs-dedupe will abort with
"Serialization error: invalid value: Path contains invalid UTF-8
characters at line 0 column 0" if I run it on some large top-level
directories. Unfortunately it doesn't say which directory it has a
problem with. Wouldn't it be better if btrfs-dedupe simply ignored
directories it has a problem with and continued with the rest?


On 13.01.2017 20:08, James Pharaoh wrote:

Did you try the binaries? I can build binaries for other platforms if
you let me know what you are interested in.

In any case, you'll need to install Rust:

https://www.rust-lang.org/install.html

This will tell you to run the following on Linux, and presumably on all
Unix platforms:

curl https://sh.rustup.rs -sSf | sh

You can either log in and out or reload your profile to get the
installed software in your PATH:

source ~/.profile

Then you can check out btrfs-dedupe, e.g. via my public GitLab HTTPS URL
(I'll assume you have git installed):

git clone
https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe.git

Then cd in and build using cargo:

cd btrfs-dedupe
cargo build --release

There is basically just one binary which will end up in
target/release/btrfs-dedupe.

I'll add these instructions to the README later.

James

On 13/01/17 13:56, Robert Krig wrote:

Hi, could you include some build instructions for people that are
unfamiliar with compiling rust code?


On 08.01.2017 17:57, James Pharaoh wrote:

Hi everyone,

I'm pleased to announce a new version of my btrfs-dedupe tool, written
in rust, available here:

http://btrfs-dedupe.com/

Binary packages built on ubuntu (probably will work elsewhere, but
haven't tried this), are available at:

https://dist.wellbehavedsoftware.com/btrfs-dedupe/

This version is considered ready for production use. It maintains a
compressed database of the filesystem state, and it tracks file
metadata, hashes file contents, and the extent-map contents, in order
to work out what needs to be deduplicated.

This is a whole-file deduplication tool, similar to bedup, but since
it is written in Rust, and designed to work with the dedupe ioctl, I
think it's more suitable for production use.

As normal for open source, this comes without any warranty etc, but
the only updates are performed via the defragment and deduplication
ioctls, and so assuming they work correctly then this should not cause
any corruption.

Please feel free to contact me with any questions/problems.
--
To unsubscribe from this list: send the line "unsubscribe
linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





Re: Announcing btrfs-dedupe 1.1.0

2017-01-26 Thread Robert Krig
I've tried your binaries, which also seem to work fine on Debian
Stretch (at least the latest Ubuntu Xenial binary).

I've only run into one little issue: btrfs-dedupe will abort with
"Serialization error: invalid value: Path contains invalid UTF-8
characters at line 0 column 0" if I run it on some large top-level
directories. Unfortunately it doesn't say which directory it has a
problem with. Wouldn't it be better if btrfs-dedupe simply ignored
directories it has a problem with and continued with the rest?


On 13.01.2017 20:08, James Pharaoh wrote:
> Did you try the binaries? I can build binaries for other platforms if
> you let me know what you are interested in.
>
> In any case, you'll need to install Rust:
>
> https://www.rust-lang.org/install.html
>
> This will tell you to run the following on Linux, and presumably on all
> Unix platforms:
>
> curl https://sh.rustup.rs -sSf | sh
>
> You can either log in and out or reload your profile to get the
> installed software in your PATH:
>
> source ~/.profile
>
> Then you can check out btrfs-dedupe, e.g. via my public GitLab HTTPS URL
> (I'll assume you have git installed):
>
> git clone
> https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe.git
>
> Then cd in and build using cargo:
>
> cd btrfs-dedupe
> cargo build --release
>
> There is basically just one binary which will end up in
> target/release/btrfs-dedupe.
>
> I'll add these instructions to the README later.
>
> James
>
> On 13/01/17 13:56, Robert Krig wrote:
>> Hi, could you include some build instructions for people that are
>> unfamiliar with compiling rust code?
>>
>>
>> On 08.01.2017 17:57, James Pharaoh wrote:
>>> Hi everyone,
>>>
>>> I'm pleased to announce a new version of my btrfs-dedupe tool, written
>>> in rust, available here:
>>>
>>> http://btrfs-dedupe.com/
>>>
>>> Binary packages built on ubuntu (probably will work elsewhere, but
>>> haven't tried this), are available at:
>>>
>>> https://dist.wellbehavedsoftware.com/btrfs-dedupe/
>>>
>>> This version is considered ready for production use. It maintains a
>>> compressed database of the filesystem state, and it tracks file
>>> metadata, hashes file contents, and the extent-map contents, in order
>>> to work out what needs to be deduplicated.
>>>
>>> This is a whole-file deduplication tool, similar to bedup, but since
>>> it is written in Rust, and designed to work with the dedupe ioctl, I
>>> think it's more suitable for production use.
>>>
>>> As normal for open source, this comes without any warranty etc, but
>>> the only updates are performed via the defragment and deduplication
>>> ioctls, and so assuming they work correctly then this should not cause
>>> any corruption.
>>>
>>> Please feel free to contact me with any questions/problems.



Re: Announcing btrfs-dedupe 1.1.0

2017-01-13 Thread James Pharaoh
Did you try the binaries? I can build binaries for other platforms if 
you let me know what you are interested in.


In any case, you'll need to install Rust:

https://www.rust-lang.org/install.html

This will tell you to run the following on Linux, and presumably on all Unix platforms:

curl https://sh.rustup.rs -sSf | sh

You can either log in and out or reload your profile to get the 
installed software in your PATH:


source ~/.profile

Then you can check out btrfs-dedupe, e.g. via my public GitLab HTTPS URL 
(I'll assume you have git installed):


git clone 
https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe.git


Then cd in and build using cargo:

cd btrfs-dedupe
cargo build --release

There is basically just one binary which will end up in 
target/release/btrfs-dedupe.


I'll add these instructions to the README later.

James

On 13/01/17 13:56, Robert Krig wrote:

Hi, could you include some build instructions for people that are
unfamiliar with compiling rust code?


On 08.01.2017 17:57, James Pharaoh wrote:

Hi everyone,

I'm pleased to announce a new version of my btrfs-dedupe tool, written
in rust, available here:

http://btrfs-dedupe.com/

Binary packages built on ubuntu (probably will work elsewhere, but
haven't tried this), are available at:

https://dist.wellbehavedsoftware.com/btrfs-dedupe/

This version is considered ready for production use. It maintains a
compressed database of the filesystem state, and it tracks file
metadata, hashes file contents, and the extent-map contents, in order
to work out what needs to be deduplicated.

This is a whole-file deduplication tool, similar to bedup, but since
it is written in Rust, and designed to work with the dedupe ioctl, I
think it's more suitable for production use.

As normal for open source, this comes without any warranty etc, but
the only updates are performed via the defragment and deduplication
ioctls, and so assuming they work correctly then this should not cause
any corruption.

Please feel free to contact me with any questions/problems.


Re: Announcing btrfs-dedupe 1.1.0

2017-01-13 Thread Robert Krig
Hi, could you include some build instructions for people that are
unfamiliar with compiling rust code?


On 08.01.2017 17:57, James Pharaoh wrote:
> Hi everyone,
>
> I'm pleased to announce a new version of my btrfs-dedupe tool, written
> in rust, available here:
>
> http://btrfs-dedupe.com/
>
> Binary packages built on ubuntu (probably will work elsewhere, but
> haven't tried this), are available at:
>
> https://dist.wellbehavedsoftware.com/btrfs-dedupe/
>
> This version is considered ready for production use. It maintains a
> compressed database of the filesystem state, and it tracks file
> metadata, hashes file contents, and the extent-map contents, in order
> to work out what needs to be deduplicated.
>
> This is a whole-file deduplication tool, similar to bedup, but since
> it is written in Rust, and designed to work with the dedupe ioctl, I
> think it's more suitable for production use.
>
> As normal for open source, this comes without any warranty etc, but
> the only updates are performed via the defragment and deduplication
> ioctls, and so assuming they work correctly then this should not cause
> any corruption.
>
> Please feel free to contact me with any questions/problems.


Re: Announcing btrfs-dedupe 1.1.0

2017-01-08 Thread James Pharaoh

It's supposed to be public! I'll have to look into that.

In any case, it's also on github here:

https://github.com/wellbehavedsoftware/btrfs-dedupe

James

On 08/01/17 22:22, j...@mailb.org wrote:

hey,

On 01/08/2017 05:57 PM, James Pharaoh wrote:

As normal for open source


where is the source?

https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe does
not have a way to browse the code

git clone 
https://gitlab.wellbehavedsoftware.com/well-behaved-software/btrfs-dedupe

asks for Username for 'https://gitlab.wellbehavedsoftware.com':

the site has a tooltip suggesting otherwise:


j




Announcing btrfs-dedupe 1.1.0

2017-01-08 Thread James Pharaoh

Hi everyone,

I'm pleased to announce a new version of my btrfs-dedupe tool, written 
in Rust, available here:


http://btrfs-dedupe.com/

Binary packages, built on Ubuntu (they will probably work elsewhere, but 
I haven't tried this), are available at:


https://dist.wellbehavedsoftware.com/btrfs-dedupe/

This version is considered ready for production use. It maintains a 
compressed database of the filesystem state, tracking file metadata, 
hashes of file contents, and extent-map contents, in order to work out 
what needs to be deduplicated.


This is a whole-file deduplication tool, similar to bedup, but since it 
is written in Rust, and designed to work with the dedupe ioctl, I think 
it's more suitable for production use.


As is normal for open source, this comes without any warranty etc., but 
the only updates are performed via the defragment and deduplication 
ioctls, so assuming those work correctly, this should not cause any 
corruption.


Please feel free to contact me with any questions/problems.


Re: Announcing btrfs-dedupe

2016-11-18 Thread Niccolò Belli

On giovedì 17 novembre 2016 04:01:52 CET, Zygo Blaxell wrote:

Duperemove does use a lot of memory, but the logs at that URL only show
2G of RAM in duperemove--not nearly enough to trigger OOM under normal
conditions on an 8G machine.  There's another process with 6G of virtual
address space (although much less than that resident) that looks more
interesting (i.e. duperemove might just be the victim of some interaction
between baloo_file and the OOM killer).


Thanks, I killed baloo_file before starting duperemove and it somehow 
improved (it reached 99.73% before getting killed by the OOM killer once 
again):


[ 6342.147251] Purging GPU memory, 0 pages freed, 18268 pages still pinned.
[ 6342.147253] 48 and 0 pages still available in the bound and unbound GPU 
page lists.
[ 6342.147340] Xorg invoked oom-killer: 
gfp_mask=0x240c0d0(GFP_TEMPORARY|__GFP_COMP|__GFP_ZERO), order=3, 
oom_score_adj=0

[ 6342.147341] Xorg cpuset=/ mems_allowed=0
[ 6342.147346] CPU: 3 PID: 650 Comm: Xorg Not tainted 4.8.8-2-ARCH #1
[ 6342.147347] Hardware name: Dell Inc. XPS 13 9343/0F5KF3, BIOS A09 
08/29/2016
[ 6342.147348]  0286 9b89a9c8 88020752f598 
812fde10
[ 6342.147351]  88020752f758 8801edc62ac0 88020752f608 
81205fa2
[ 6342.147353]  000188020752f5a0 9b89a9c8  


[ 6342.147356] Call Trace:
[ 6342.147361]  [] dump_stack+0x63/0x83
[ 6342.147364]  [] dump_header+0x5c/0x1ea
[ 6342.147366]  [] oom_kill_process+0x265/0x410
[ 6342.147368]  [] ? has_capability_noaudit+0x17/0x20
[ 6342.147369]  [] out_of_memory+0x380/0x420
[ 6342.147373]  [] ? find_next_bit+0x18/0x20
[ 6342.147374]  [] __alloc_pages_nodemask+0xda0/0xde0
[ 6342.147377]  [] alloc_pages_current+0x95/0x140
[ 6342.147380]  [] kmalloc_order_trace+0x2e/0xf0
[ 6342.147382]  [] __kmalloc+0x1ea/0x200
[ 6342.147397]  [] ? alloc_gen8_temp_bitmaps+0x2e/0x80 
[i915]
[ 6342.147407]  [] alloc_gen8_temp_bitmaps+0x47/0x80 
[i915]
[ 6342.147417]  [] gen8_alloc_va_range_3lvl+0x98/0x9c0 
[i915]

[ 6342.147419]  [] ? shmem_getpage_gfp+0xed/0xc30
[ 6342.147421]  [] ? sg_init_table+0x1a/0x40
[ 6342.147423]  [] ? swiotlb_map_sg_attrs+0x53/0x130
[ 6342.147432]  [] gen8_alloc_va_range+0x256/0x490 [i915]
[ 6342.147442]  [] i915_vma_bind+0x9b/0x190 [i915]
[ 6342.147453]  [] i915_gem_object_do_pin+0x86b/0xa90 
[i915]

[ 6342.147463]  [] i915_gem_object_pin+0x2d/0x30 [i915]
[ 6342.147472]  [] 
i915_gem_execbuffer_reserve_vma.isra.7+0x9f/0x180 [i915]
[ 6342.147482]  [] 
i915_gem_execbuffer_reserve.isra.8+0x396/0x3c0 [i915]
[ 6342.147491]  [] 
i915_gem_do_execbuffer.isra.14+0x68b/0x1270 [i915]

[ 6342.147493]  [] ? unix_stream_read_generic+0x281/0x8a0
[ 6342.147503]  [] i915_gem_execbuffer2+0x104/0x270 
[i915]

[ 6342.147509]  [] drm_ioctl+0x200/0x4f0 [drm]
[ 6342.147518]  [] ? i915_gem_execbuffer+0x330/0x330 
[i915]

[ 6342.147520]  [] ? enqueue_hrtimer+0x3d/0xa0
[ 6342.147522]  [] ? timerqueue_del+0x24/0x70
[ 6342.147523]  [] ? __remove_hrtimer+0x3c/0x90
[ 6342.147525]  [] do_vfs_ioctl+0xa3/0x5f0
[ 6342.147527]  [] ? do_setitimer+0x12b/0x230
[ 6342.147529]  [] ? __fget+0x77/0xb0
[ 6342.147531]  [] SyS_ioctl+0x79/0x90
[ 6342.147533]  [] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 6342.147535] Mem-Info:
[ 6342.147538] active_anon:76311 inactive_anon:76782 isolated_anon:0
   active_file:347581 inactive_file:1415592 isolated_file:64
   unevictable:8 dirty:482 writeback:0 unstable:0
   slab_reclaimable:27219 slab_unreclaimable:14772
   mapped:20714 shmem:30458 pagetables:10557 bounce:0
   free:25642 free_pcp:327 free_cma:0
[ 6342.147541] Node 0 active_anon:305244kB inactive_anon:307128kB 
active_file:1390324kB inactive_file:5662368kB unevictable:32kB 
isolated(anon):0kB isolated(file):256kB mapped:82856kB dirty:1928kB 
writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 81920kB anon_thp: 
121832kB writeback_tmp:0kB unstable:0kB pages_scanned:32 all_unreclaimable? 
no
[ 6342.147542] Node 0 DMA free:15688kB min:132kB low:164kB high:196kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB writepending:0kB present:15984kB managed:15896kB 
mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:208kB kernel_stack:0kB 
pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

[ 6342.147545] lowmem_reserve[]: 0 3395 7850 7850 7850
[ 6342.147548] Node 0 DMA32 free:48772kB min:29172kB low:36464kB 
high:43756kB active_anon:84724kB inactive_anon:87164kB active_file:555728kB 
inactive_file:2639796kB unevictable:0kB writepending:696kB 
present:3564504kB managed:3488752kB mlocked:0kB slab_reclaimable:47196kB 
slab_unreclaimable:11472kB kernel_stack:192kB pagetables:200kB bounce:0kB 
free_pcp:1284kB local_pcp:0kB free_cma:0kB

[ 6342.147553] lowmem_reserve[]: 0 0 4454 4454 4454
[ 6342.147555] Node 0 Normal free:38108kB min:38276kB low:47844kB 
high:57412kB active_anon:220520kB inactive_anon:219964kB 
active_file:834596kB inactive_

Re: Announcing btrfs-dedupe

2016-11-16 Thread Zygo Blaxell
On Wed, Nov 16, 2016 at 11:24:33PM +0100, Niccolò Belli wrote:
> On martedì 15 novembre 2016 18:52:01 CET, Zygo Blaxell wrote:
> >Like I said, millions of extents per week...
> >
> >64K is an enormous dedup block size, especially if it comes with a 64K
> >alignment constraint as well.
> >
> >These are the top ten duplicate block sizes from a sample of 95251
> >dedup ops on a medium-sized production server with 4TB of filesystem
> >(about one machine-day of data):
> 
> Which software do you use to dedupe your data? I tried duperemove but it
> gets killed by the OOM killer because it triggers some kind of memory leak:
> https://github.com/markfasheh/duperemove/issues/163

Duperemove does use a lot of memory, but the logs at that URL only show
2G of RAM in duperemove--not nearly enough to trigger OOM under normal
conditions on an 8G machine.  There's another process with 6G of virtual
address space (although much less than that resident) that looks more
interesting (i.e. duperemove might just be the victim of some interaction
between baloo_file and the OOM killer).

On the other hand, the logs also show kernel 4.8.  100% of my test
machines failed to finish booting before they were cut down by OOM on
4.7.x kernels.  The same problem occurs on early kernels in the 4.8.x
series.  I am having good results with 4.8.6 and later, but you should
be aware that significant changes have been made to the way OOM works
in these kernel versions, and maybe you're hitting a regression for your
use case.

> Niccolò Belli




Re: Announcing btrfs-dedupe

2016-11-16 Thread Niccolò Belli

On martedì 15 novembre 2016 18:52:01 CET, Zygo Blaxell wrote:

Like I said, millions of extents per week...

64K is an enormous dedup block size, especially if it comes with a 64K
alignment constraint as well.

These are the top ten duplicate block sizes from a sample of 95251
dedup ops on a medium-sized production server with 4TB of filesystem
(about one machine-day of data):


Which software do you use to dedupe your data? I tried duperemove but it 
gets killed by the OOM killer because it triggers some kind of memory leak: 
https://github.com/markfasheh/duperemove/issues/163


Niccolò Belli


Re: Announcing btrfs-dedupe

2016-11-15 Thread Zygo Blaxell
On Tue, Nov 15, 2016 at 07:26:53AM -0500, Austin S. Hemmelgarn wrote:
> On 2016-11-14 16:10, Zygo Blaxell wrote:
> >Why is deduplicating thousands of blocks of data crazy?  I already
> >deduplicate four orders of magnitude more than that per week.
> You missed the 'tiny' quantifier.  I'm talking really small blocks, on the
> order of less than 64k (so, IOW, stuff that's not much bigger than a few
> filesystem blocks), and that is somewhat crazy because it ends up not only
> taking _really_ long to do compared to larger chunks (because you're running
> more independent hashes than with bigger blocks), but also because it will
> often split extents unnecessarily and contribute to fragmentation, which
> will lead to all kinds of other performance problems on the FS.

Like I said, millions of extents per week...

64K is an enormous dedup block size, especially if it comes with a 64K
alignment constraint as well.

These are the top ten duplicate block sizes from a sample of 95251
dedup ops on a medium-sized production server with 4TB of filesystem
(about one machine-day of data):

total bytes     extent count    dup size
2750808064      20987           131072
803733504       1533            524288
123801600       975             126976
103575552       8429            12288
97443840        793             122880
82051072        10016           8192
77492224        18919           4096
71331840        645             110592
64143360        540             118784
63897600        650             98304

all bytes       all extents     average dup size
6129995776      95251           64356

128K and 512K are the most common sizes due to btrfs compression (it
limits the block size to 128K for compressed extents and seems to limit
uncompressed extents to 512K for some reason).  12K is #4, and 3 of the
top ten sizes are below 16K.  The average size is just a little below 64K.

These are the duplicates with block sizes of 64K and smaller:

total bytes     extent count    extent size
41615360        635             65536
46264320        753             61440
45817856        799             57344
41267200        775             53248
45760512        931             49152
46948352        1042            45056
43417600        1060            40960
47296512        1283            36864
59277312        1809            32768
49029120        1710            28672
43745280        1780            24576
53616640        2618            20480
43466752        2653            16384
103575552      8429             12288
82051072        10016           8192
77492224        18919           4096

all bytes <=64K    extents <=64K    average dup size <=64K
870641664          55212            15769

14% of my duplicate bytes are in blocks smaller than 64K or blocks not
aligned to a 64K boundary within a file.  It's too large a space saving
to ignore on machines that have constrained storage.

It may be worthwhile skipping 4K and 8K dedups--at 250 ms per dedup,
they're 30% of the total run time and only 2.6% of the total dedup bytes.
On the other hand, this machine is already deduping everything fast enough
to keep up with new data, so there's no performance problem to solve here.





Re: Announcing btrfs-dedupe

2016-11-15 Thread Austin S. Hemmelgarn

On 2016-11-14 16:10, Zygo Blaxell wrote:

On Mon, Nov 14, 2016 at 02:56:51PM -0500, Austin S. Hemmelgarn wrote:

On 2016-11-14 14:51, Zygo Blaxell wrote:

Deduplicating an extent that might be concurrently modified during the
dedup is a reasonable userspace request.  In the general case there's
no way for userspace to ensure that it's not happening.

I'm not even talking about the locking, I'm talking about the data
comparison that the ioctl does to ensure they are the same before
deduplicating them, and specifically that protecting against userspace just
passing in two random extents that happen to be the same size but not
contain the same data (because deduplication _should_ reject such a
situation, that's what the clone ioctl is for).


If I'm deduping a VM image, and the virtual host is writing to said image
(which is likely since an incremental dedup will be intentionally doing
dedup over recently active data sets), the extent I just compared in
userspace might be different by the time the kernel sees it.

This is an important reason why the whole lock/read/compare/replace step
is an atomic operation from userspace's PoV.

The read also saves having to confirm a short/weak hash isn't a collision.
The RAM savings from using weak hashes (~48 bits) are a huge performance
win.

The locking overhead is very small compared to the reading overhead,
and (in the absence of bugs) it will only block concurrent writes to the
same offset range in the src/dst inodes (based on a read of the code...I
don't know if there's also an inode-level or backref-level barrier that
expands the locking scope).
I'm not arguing that it's a bad thing that the kernel is doing this, I'm 
just saying that the locking overhead is minuscule in most cases 
compared to the data comparison.  It is absolutely necessary for exactly 
the reasons you are outlining.


I'm not sure the ioctl is well designed for simply throwing random
data at it, especially not entire files (it can't handle files over
16MB anyway).  It will read more data than it has to compared to a
block-by-block comparison from userspace with prefetches or a pair of
IO threads.  If userspace reads both copies of the data just before
issuing the extent-same call, the kernel will read the data from cache
reasonably quickly.
It still depends on the use case to a certain extent.  In the case I was 
using as an example, I know to a reasonably certain degree (barring 
tampering, bugs, or hardware failure) that any two files are identical, 
and I actually don't want to trash the page cache just to deduplicate 
data faster (the data set in question is large, but most of it is idle at 
any given point in time), so there's no point in me prereading 
everything in userspace.  That in turn makes the script I use much 
simpler (the most complex part is figuring out how to split extents for 
files bigger than the ioctl can handle such that I don't have tiny tail 
extents but still have a minimum number per file).



The locking is perfectly reasonable and shouldn't contribute that much to
the overhead (unless you're being crazy and deduplicating thousands of tiny
blocks of data).


Why is deduplicating thousands of blocks of data crazy?  I already
deduplicate four orders of magnitude more than that per week.
You missed the 'tiny' quantifier.  I'm talking really small blocks, on 
the order of less than 64k (so, IOW, stuff that's not much bigger than a 
few filesystem blocks).  That is somewhat crazy because it ends up not 
only taking _really_ long compared to larger chunks (because you're 
running more independent hashes than with bigger blocks), but also 
because it will often split extents unnecessarily and contribute to 
fragmentation, which will lead to all kinds of other performance 
problems on the FS.



Re: Announcing btrfs-dedupe

2016-11-14 Thread Zygo Blaxell
On Mon, Nov 14, 2016 at 09:07:51PM +0100, James Pharaoh wrote:
> On 14/11/16 20:51, Zygo Blaxell wrote:
> >On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote:
> >>On 2016-11-14 13:22, James Pharaoh wrote:
> >>>One thing I am keen to understand is if BTRFS will automatically ignore
> >>>a request to deduplicate a file if it is already deduplicated? Given the
> >>>performance I see when doing a repeat deduplication, it seems to me that
> >>>it can't be doing so, although this could be caused by the CPU usage you
> >>>mention above.
> >>
> >>What's happening is that the dedupe ioctl does a byte-wise comparison of the
> >>ranges to make sure they're the same before linking them.  This is actually
> >>what takes most of the time when calling the ioctl, and is part of why it
> >>takes longer the larger the range to deduplicate is.  In essence, it's
> >>behaving like an OS should and not trusting userspace to make reasonable
> >>requests (which is also why there's a separate ioctl to clone a range from
> >>another file instead of deduplicating existing data).
> >
> > - the extent-same ioctl could check to see which extents
> > are referenced by the src and dst ranges, and return success
> > immediately without reading data if they are the same (but
> > userspace should already know this, or it's wasting a huge amount
> > of time before it even calls the kernel).
> 
> Yes, this is what I am talking about. I believe I should be able to read
> data about the BTRFS data structures and determine if this is the case. I
> don't care if there are false matches, due to concurrent updates, but
> there'll be a /lot/ of repeat deduplications unless I do this, because even
> if the file is identical, the mtime etc hasn't changed, and I have a record
> of previously doing a dedupe, there's no guarantee that the file hasn't been
> rewritten in place (eg by rsync), and no way that I know of to reliably
> detect if a file has been changed.
> 
> I am sure there are libraries out there which can look into the data
> structures of a BTRFS file system, I haven't researched this in detail
> though. I imagine that with some kind of lock on a BTRFS root, this could be
> achieved by simply reading the data from the disk, since I believe that
> everything is copy-on-write, so no existing data should be overwritten until
> all roots referring to it are updated. Perhaps I'm missing something
> though...

FIEMAP (VFS) and SEARCH_V2 (btrfs-specific) will both give you access
to the underlying physical block numbers.  SEARCH_V2 is non-trivial
to use without reverse-engineering significant parts of btrfs-progs.
SEARCH_V2 is a generic tree-searching tool which will give you all kinds
of information about btrfs structures...it's essential for a sophisticated
deduplicator and overkill for a simple one.

For full-file dedup using FIEMAP you only need to look at the "physical"
field of the first extent (if it's zero or the same as the other file, the
files cannot be deduplicated or are already deduplicated, respectively).
The source for 'filefrag' (from e2fsprogs) is good for learning how
FIEMAP works.

For block-level dedup you need to look at each extent individually.
That's much slower and full of additional caveats.  If you're going down
that road it's probably better to just improve duperemove instead.

> James




Re: Announcing btrfs-dedupe

2016-11-14 Thread Zygo Blaxell
On Mon, Nov 14, 2016 at 02:56:51PM -0500, Austin S. Hemmelgarn wrote:
> On 2016-11-14 14:51, Zygo Blaxell wrote:
> >Deduplicating an extent that might be concurrently modified during the
> >dedup is a reasonable userspace request.  In the general case there's
> >no way for userspace to ensure that it's not happening.
> I'm not even talking about the locking, I'm talking about the data
> comparison that the ioctl does to ensure they are the same before
> deduplicating them, and specifically that protecting against userspace just
> passing in two random extents that happen to be the same size but not
> contain the same data (because deduplication _should_ reject such a
> situation, that's what the clone ioctl is for).

If I'm deduping a VM image, and the virtual host is writing to said image
(which is likely since an incremental dedup will be intentionally doing
dedup over recently active data sets), the extent I just compared in
userspace might be different by the time the kernel sees it.

This is an important reason why the whole lock/read/compare/replace step
is an atomic operation from userspace's PoV.

The read also saves having to confirm a short/weak hash isn't a collision.
The RAM savings from using weak hashes (~48 bits) are a huge performance
win.
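A back-of-envelope illustration of that RAM saving (my own numbers and helper name, assuming 4 KiB blocks and a SHA-256 baseline; real tools also store offsets and pointers on top of this):

```python
import hashlib

def weak_hash(block: bytes) -> bytes:
    """Truncate a strong digest to 48 bits. Collisions are resolved later by
    the ioctl's byte-wise compare, so a false positive only costs a read."""
    return hashlib.sha256(block).digest()[:6]

# Hash-table payload for 1 TiB of data in 4 KiB blocks:
blocks = (1 << 40) // 4096   # 268,435,456 blocks
full = blocks * 32           # 32-byte SHA-256 per block -> 8 GiB
weak = blocks * 6            # 6-byte weak hash per block -> 1.5 GiB
```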

The locking overhead is very small compared to the reading overhead,
and (in the absence of bugs) it will only block concurrent writes to the
same offset range in the src/dst inodes (based on a read of the code...I
don't know if there's also an inode-level or backref-level barrier that
expands the locking scope).

I'm not sure the ioctl is well designed for simply throwing random
data at it, especially not entire files (it can't handle files over
16MB anyway).  It will read more data than it has to compared to a
block-by-block comparison from userspace with prefetches or a pair of
IO threads.  If userspace reads both copies of the data just before
issuing the extent-same call, the kernel will read the data from cache
reasonably quickly.
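A sketch of issuing the extent-same call by packing the argument by hand (my own helper; the struct layout and ioctl number follow <linux/fs.h>, where the VFS name FIDEDUPERANGE shares the number of the original BTRFS_IOC_FILE_EXTENT_SAME):

```python
import fcntl
import struct

# struct file_dedupe_range followed by one file_dedupe_range_info, per <linux/fs.h>
DEDUPE_HDR = "QQHHI"   # src_offset, src_length, dest_count, reserved1, reserved2
DEDUPE_INFO = "qQQiI"  # dest_fd, dest_offset, bytes_deduped, status, reserved
FIDEDUPERANGE = 0xC0189436  # _IOWR(0x94, 54, ...)

FILE_DEDUPE_RANGE_SAME = 0
FILE_DEDUPE_RANGE_DIFFERS = 1

def dedupe_range(src_fd, src_off, length, dst_fd, dst_off):
    """Ask the kernel to share one source range into one destination.
    Returns (bytes_deduped, status); a negative status is a negated errno."""
    arg = bytearray(
        struct.pack(DEDUPE_HDR, src_off, length, 1, 0, 0)
        + struct.pack(DEDUPE_INFO, dst_fd, dst_off, 0, 0, 0)
    )
    fcntl.ioctl(src_fd, FIDEDUPERANGE, arg)
    _, _, deduped, status, _ = struct.unpack_from(
        DEDUPE_INFO, arg, struct.calcsize(DEDUPE_HDR))
    return deduped, status
```

Because of the per-call cap mentioned above, a caller has to walk larger files in chunks of at most 16 MiB.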

> The locking is perfectly reasonable and shouldn't contribute that much to
> the overhead (unless you're being crazy and deduplicating thousands of tiny
> blocks of data).

Why is deduplicating thousands of blocks of data crazy?  I already
deduplicate four orders of magnitude more than that per week.

> >That said, some optimization is possible (although there are good reasons
> >not to bother with optimization in the kernel):
> >
> > - VFS could recognize when it has two separate references to
> > the same physical extent and not re-read the same data twice
> > (but that requires teaching VFS how to do CoW in general, and is
> > hard for political reasons on top of the obvious technical ones).
> >
> > - the extent-same ioctl could check to see which extents
> > are referenced by the src and dst ranges, and return success
> > immediately without reading data if they are the same (but
> > userspace should already know this, or it's wasting a huge amount
> > of time before it even calls the kernel).
> >
> >>TBH, even though it's kind of annoying from a performance perspective, it's
> >>a rather nice safety net to have.  For example, one of the cases where I do
> >>deduplication is a couple of directories where each directory is an
> >>overlapping partial subset of one large tree which I keep elsewhere.  In
> >>this case, I can tell just by filename exactly what files might be
> >>duplicates, so the ioctl's check lets me just call the ioctl on all
> >>potential duplicates (after checking size, no point in wasting time if the
> >>files obviously aren't duplicates), and have it figure out whether or not
> >>they can be deduplicated.
> >>>
> >>>In any case, I'm considering some digging into the filesystem structures
> >>>to see if I can work this out myself before I do any deduplication. I'm
> >>>fairly sure this should be relatively simple to work out, at least well
> >>>enough for my purposes.
> >>Sadly, there's no way to avoid doing so right now.
> >>
> >>--
> >>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> >>the body of a message to majord...@vger.kernel.org
> >>More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 




Re: Announcing btrfs-dedupe

2016-11-14 Thread James Pharaoh

On 14/11/16 20:51, Zygo Blaxell wrote:

On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote:

On 2016-11-14 13:22, James Pharaoh wrote:

One thing I am keen to understand is if BTRFS will automatically ignore
a request to deduplicate a file if it is already deduplicated? Given the
performance I see when doing a repeat deduplication, it seems to me that
it can't be doing so, although this could be caused by the CPU usage you
mention above.

>>

What's happening is that the dedupe ioctl does a byte-wise comparison of the
ranges to make sure they're the same before linking them.  This is actually
what takes most of the time when calling the ioctl, and is part of why it
takes longer the larger the range to deduplicate is.  In essence, it's
behaving like an OS should and not trusting userspace to make reasonable
requests (which is also why there's a separate ioctl to clone a range from
another file instead of deduplicating existing data).
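In userspace terms, the check described above amounts to something like this (a hedged sketch of the behaviour, not the kernel code):

```python
def ranges_identical(fa, fb, off_a, off_b, length, chunk=1 << 20):
    """Chunked byte-wise comparison of two file ranges, mirroring the
    check the dedupe ioctl performs before linking extents."""
    fa.seek(off_a)
    fb.seek(off_b)
    remaining = length
    while remaining:
        n = min(chunk, remaining)
        a = fa.read(n)
        b = fb.read(n)
        if a != b or len(a) < n:  # mismatch, or the range runs past EOF
            return False
        remaining -= n
    return True
```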


- the extent-same ioctl could check to see which extents
are referenced by the src and dst ranges, and return success
immediately without reading data if they are the same (but
userspace should already know this, or it's wasting a huge amount
of time before it even calls the kernel).


Yes, this is what I am talking about. I believe I should be able to read 
data about the BTRFS data structures and determine if this is the case. 
I don't care if there are false matches, due to concurrent updates, but 
there'll be a /lot/ of repeat deduplications unless I do this: even if 
the file is identical, the mtime etc hasn't changed, and I have a record 
of previously doing a dedupe, there's still no guarantee that the file 
hasn't been rewritten in place (eg by rsync), and no way that I know of 
to reliably detect whether a file has changed.


I am sure there are libraries out there which can look into the data 
structures of a BTRFS file system, I haven't researched this in detail 
though. I imagine that with some kind of lock on a BTRFS root, this 
could be achieved by simply reading the data from the disk, since I 
believe that everything is copy-on-write, so no existing data should be 
overwritten until all roots referring to it are updated. Perhaps I'm 
missing something though...


James


Re: Announcing btrfs-dedupe

2016-11-14 Thread Austin S. Hemmelgarn

On 2016-11-14 14:51, Zygo Blaxell wrote:

On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote:

On 2016-11-14 13:22, James Pharaoh wrote:

One thing I am keen to understand is if BTRFS will automatically ignore
a request to deduplicate a file if it is already deduplicated? Given the
performance I see when doing a repeat deduplication, it seems to me that
it can't be doing so, although this could be caused by the CPU usage you
mention above.

What's happening is that the dedupe ioctl does a byte-wise comparison of the
ranges to make sure they're the same before linking them.  This is actually
what takes most of the time when calling the ioctl, and is part of why it
takes longer the larger the range to deduplicate is.  In essence, it's
behaving like an OS should and not trusting userspace to make reasonable
requests (which is also why there's a separate ioctl to clone a range from
another file instead of deduplicating existing data).


Deduplicating an extent that might be concurrently modified during the
dedup is a reasonable userspace request.  In the general case there's
no way for userspace to ensure that it's not happening.
I'm not even talking about the locking, I'm talking about the data 
comparison that the ioctl does to ensure they are the same before 
deduplicating them, and specifically that it protects against userspace 
just passing in two random extents that happen to be the same size but 
not contain the same data (because deduplication _should_ reject such a 
situation, that's what the clone ioctl is for).


The locking is perfectly reasonable and shouldn't contribute that much 
to the overhead (unless you're being crazy and deduplicating thousands 
of tiny blocks of data).


That said, some optimization is possible (although there are good reasons
not to bother with optimization in the kernel):

- VFS could recognize when it has two separate references to
the same physical extent and not re-read the same data twice
(but that requires teaching VFS how to do CoW in general, and is
hard for political reasons on top of the obvious technical ones).

- the extent-same ioctl could check to see which extents
are referenced by the src and dst ranges, and return success
immediately without reading data if they are the same (but
userspace should already know this, or it's wasting a huge amount
of time before it even calls the kernel).


TBH, even though it's kind of annoying from a performance perspective, it's
a rather nice safety net to have.  For example, one of the cases where I do
deduplication is a couple of directories where each directory is an
overlapping partial subset of one large tree which I keep elsewhere.  In
this case, I can tell just by filename exactly what files might be
duplicates, so the ioctl's check lets me just call the ioctl on all
potential duplicates (after checking size, no point in wasting time if the
files obviously aren't duplicates), and have it figure out whether or not
they can be deduplicated.


In any case, I'm considering some digging into the filesystem structures
to see if I can work this out myself before I do any deduplication. I'm
fairly sure this should be relatively simple to work out, at least well
enough for my purposes.

Sadly, there's no way to avoid doing so right now.





Re: Announcing btrfs-dedupe

2016-11-14 Thread Zygo Blaxell
On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote:
> On 2016-11-14 13:22, James Pharaoh wrote:
> >One thing I am keen to understand is if BTRFS will automatically ignore
> >a request to deduplicate a file if it is already deduplicated? Given the
> >performance I see when doing a repeat deduplication, it seems to me that
> >it can't be doing so, although this could be caused by the CPU usage you
> >mention above.
> What's happening is that the dedupe ioctl does a byte-wise comparison of the
> ranges to make sure they're the same before linking them.  This is actually
> what takes most of the time when calling the ioctl, and is part of why it
> takes longer the larger the range to deduplicate is.  In essence, it's
> behaving like an OS should and not trusting userspace to make reasonable
> requests (which is also why there's a separate ioctl to clone a range from
> another file instead of deduplicating existing data).

Deduplicating an extent that might be concurrently modified during the
dedup is a reasonable userspace request.  In the general case there's
no way for userspace to ensure that it's not happening.

That said, some optimization is possible (although there are good reasons
not to bother with optimization in the kernel):

- VFS could recognize when it has two separate references to
the same physical extent and not re-read the same data twice
(but that requires teaching VFS how to do CoW in general, and is
hard for political reasons on top of the obvious technical ones).

- the extent-same ioctl could check to see which extents
are referenced by the src and dst ranges, and return success
immediately without reading data if they are the same (but
userspace should already know this, or it's wasting a huge amount
of time before it even calls the kernel).

> TBH, even though it's kind of annoying from a performance perspective, it's
> a rather nice safety net to have.  For example, one of the cases where I do
> deduplication is a couple of directories where each directory is an
> overlapping partial subset of one large tree which I keep elsewhere.  In
> this case, I can tell just by filename exactly what files might be
> duplicates, so the ioctl's check lets me just call the ioctl on all
> potential duplicates (after checking size, no point in wasting time if the
> files obviously aren't duplicates), and have it figure out whether or not
> they can be deduplicated.
> >
> >In any case, I'm considering some digging into the filesystem structures
> >to see if I can work this out myself before I do any deduplication. I'm
> >fairly sure this should be relatively simple to work out, at least well
> >enough for my purposes.
> Sadly, there's no way to avoid doing so right now.
> 




Re: Announcing btrfs-dedupe

2016-11-14 Thread Zygo Blaxell
On Mon, Nov 14, 2016 at 07:22:59PM +0100, James Pharaoh wrote:
> On 14/11/16 19:07, Zygo Blaxell wrote:
> >There is also a still-unresolved problem where the filesystem CPU usage
> >rises exponentially for some operations depending on the number of shared
> >references to an extent.  Files which contain blocks with more than a few
> >thousand shared references can trigger this problem.  A file over 1TB can
> >keep the kernel busy at 100% CPU for over 40 minutes at a time.
> 
> Yes, I see this all the time. For my use cases, I don't really care about
> "shared references" as blocks of files, but am happy to simply deduplicate
> at the whole-file level. I wonder if this still will have the same effect,
> however. I guess that this could be mitigated in a tool, but this is going
> to be both annoying and not the most elegant solution.

If you have huge files (1TB+) this can be a problem even with whole-file
deduplications (which are really just extent-level deduplications applied
to the entire file).  The CPU time is a product of file size and extent
reference count with some other multipliers on top.

I've hacked around it by timing how long it takes to manipulate the data,
and blacklisting any hash value or block address that takes more than
10 seconds to process (if such a block is found after blacklisting, just
skip processing the block/extent/file entirely).  It turns out there are
very few of these in practice (only a few hundred per TB) but these few
hundred block hash values occur millions of times in a large data corpus.
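That blacklisting hack can be sketched like this (a hypothetical helper of my own, not the actual code; the 10-second cutoff becomes a parameter):

```python
import time

def make_guarded(action, limit=10.0, clock=time.monotonic):
    """Wrap a per-hash dedup action: any hash whose processing exceeds
    `limit` seconds is blacklisted and skipped on every later sighting.
    `clock` is injectable so the wrapper can be tested without sleeping."""
    blacklist = set()

    def guarded(hash_value, *args):
        if hash_value in blacklist:
            return None  # known-expensive extent: skip it entirely
        start = clock()
        result = action(hash_value, *args)
        if clock() - start > limit:
            blacklist.add(hash_value)
        return result

    return guarded
```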

> One thing I am keen to understand is if BTRFS will automatically ignore a
> request to deduplicate a file if it is already deduplicated? Given the
> performance I see when doing a repeat deduplication, it seems to me that it
> can't be doing so, although this could be caused by the CPU usage you
> mention above.

As far as I can tell btrfs doesn't do anything different in this
case--it'll happily repeat the entire lock/read/compare/delete/insert
sequence even if the outcome cannot be different from the initial
conditions.  Due to limitations of VFS caching it'll read the same blocks
from storage hardware twice, too.

> In any case, I'm considering some digging into the filesystem structures to
> see if I can work this out myself before I do any deduplication. I'm fairly
> sure this should be relatively simple to work out, at least well enough for
> my purposes.

I used FIEMAP (then later replaced it with SEARCH_V2 for speed) to map
the extents to physical addresses before deduping them.  If you're only
going to do whole-file dedup then you only need to care about the physical
address of the first non-hole extent.





Re: Announcing btrfs-dedupe

2016-11-14 Thread Austin S. Hemmelgarn

On 2016-11-14 13:22, James Pharaoh wrote:

On 14/11/16 19:07, Zygo Blaxell wrote:

On Mon, Nov 07, 2016 at 07:49:51PM +0100, James Pharaoh wrote:

Annoyingly I can't find this now, but I definitely remember reading
someone,
apparently someone knowledgeable, claim that the latest version of the
kernel
which I was using at the time, still suffered from issues regarding the
dedupe code.



This was a while ago, and I would be very pleased to hear that there
is high
confidence in the current implementation! I'll post a link if I
manage to
find the comments.


I've been running the btrfs dedup ioctl 7 times per second on average
over 42TB of test data for most of a year (and at a lower rate for two
years).  I have not found any data corruptions due to _dedup_.  I did
find
three distinct data corruption kernel bugs unrelated to dedup, and two
test machines with bad RAM, so I'm pretty sure my corruption detection
is working.

That said, I wouldn't run dedup on a kernel older than 4.4.  LTS kernels
might be OK too, but only if they're up to date with backported btrfs
fixes.


Ok, I think this might have referred to the 4.2 kernel, which was newly
released at the time. I wish I could find the post!


Kernels older than 3.13 lack the FILE_EXTENT_SAME ioctl and can
only deduplicate static data (i.e. data you are certain is not being
concurrently modified).  Before 3.12 there are so many bugs you might
as well not bother.


Yes well I don't need to be told that, sadly.


Older kernels are bad for dedup because of non-corruption reasons.
Between 3.13 and 4.4, the following bugs were fixed:

- false-negative capability checks (e.g. same-inode, EOF extent)
reduce dedup efficiency

- ctime updates (older versions would update ctime when a file was
deduped) mess with incremental backup tools, build systems, etc.

- kernel memory leaks (self-explanatory)

- multiple kernel hang/panic bugs (e.g. a deadlock if two threads
try to read the same extent at the same time, and at least one
of those threads is dedup; and there was some race condition
leading to invalid memory access on dedup's comparison reads)
which won't eat your data, but they might ruin your day anyway.


Ok, I think I've seen some stuff like this; I certainly have
problems, but never a loss of data. Things can take a LONG time to get
out of the filesystem, though.


There is also a still-unresolved problem where the filesystem CPU usage
rises exponentially for some operations depending on the number of shared
references to an extent.  Files which contain blocks with more than a few
thousand shared references can trigger this problem.  A file over 1TB can
keep the kernel busy at 100% CPU for over 40 minutes at a time.


Yes, I see this all the time. For my use cases, I don't really care
about "shared references" as blocks of files, but am happy to simply
deduplicate at the whole-file level. I wonder if this still will have
the same effect, however. I guess that this could be mitigated in a
tool, but this is going to be both annoying and not the most elegant
solution.
The issue is at the extent level, so it will impact whole files too (but 
it will have less impact on defragmented files that are then 
deduplicated as whole files).  Pretty much anything that pins references 
to extents will impact this, so cloned extents and snapshots will also 
have an impact.



There might also be a correlation between delalloc data and hangs in
extent-same, but I have NOT been able to confirm this.  All I know
at this point is that doing a fsync() on the source FD just before
doing the extent-same ioctl dramatically reduces filesystem hang rates:
several weeks between hangs (or no hangs at all) with fsync, vs. 18 hours
or less without.


Interesting, I'll maybe see if I can make use of this.

One thing I am keen to understand is if BTRFS will automatically ignore
a request to deduplicate a file if it is already deduplicated? Given the
performance I see when doing a repeat deduplication, it seems to me that
it can't be doing so, although this could be caused by the CPU usage you
mention above.
What's happening is that the dedupe ioctl does a byte-wise comparison of 
the ranges to make sure they're the same before linking them.  This is 
actually what takes most of the time when calling the ioctl, and is part 
of why it takes longer the larger the range to deduplicate is.  In 
essence, it's behaving like an OS should and not trusting userspace to 
make reasonable requests (which is also why there's a separate ioctl to 
clone a range from another file instead of deduplicating existing data).


TBH, even though it's kind of annoying from a performance perspective, 
it's a rather nice safety net to have.  For example, one of the cases 
where I do deduplication is a couple of directories where each directory 
is an overlapping partial subset of one large tree which I keep 
elsewhere.  In this case, I can tell just by filename exactly what

Re: Announcing btrfs-dedupe

2016-11-14 Thread Zygo Blaxell
On Tue, Nov 08, 2016 at 12:06:01PM +0100, Niccolò Belli wrote:
> Nice, you should probably update the btrfs wiki as well, because there is no
> mention of btrfs-dedupe.
> 
> First question, why this name? Don't you plan to support xfs as well?

Does XFS plan to support LOGICAL_INO, INO_PATHS, and something analogous
to SEARCH_V2?

POSIX API + FILE_EXTENT_SAME is OK for the lowest common denominator
across arbitrary filesystems, but a btrfs-specific tool can do a lot
better.  Especially for incremental dedup and low-RAM algorithms.



signature.asc
Description: Digital signature


Re: Announcing btrfs-dedupe

2016-11-14 Thread James Pharaoh

On 14/11/16 19:07, Zygo Blaxell wrote:

On Mon, Nov 07, 2016 at 07:49:51PM +0100, James Pharaoh wrote:

Annoyingly I can't find this now, but I definitely remember reading someone,
apparently someone knowledgeable, claim that the latest version of the kernel
which I was using at the time, still suffered from issues regarding the
dedupe code.



This was a while ago, and I would be very pleased to hear that there is high
confidence in the current implementation! I'll post a link if I manage to
find the comments.


I've been running the btrfs dedup ioctl 7 times per second on average
over 42TB of test data for most of a year (and at a lower rate for two
years).  I have not found any data corruptions due to _dedup_.  I did find
three distinct data corruption kernel bugs unrelated to dedup, and two
test machines with bad RAM, so I'm pretty sure my corruption detection
is working.

That said, I wouldn't run dedup on a kernel older than 4.4.  LTS kernels
might be OK too, but only if they're up to date with backported btrfs
fixes.


Ok, I think this might have referred to the 4.2 kernel, which was newly 
released at the time. I wish I could find the post!



Kernels older than 3.13 lack the FILE_EXTENT_SAME ioctl and can
only deduplicate static data (i.e. data you are certain is not being
concurrently modified).  Before 3.12 there are so many bugs you might
as well not bother.


Yes well I don't need to be told that, sadly.


Older kernels are bad for dedup because of non-corruption reasons.
Between 3.13 and 4.4, the following bugs were fixed:

- false-negative capability checks (e.g. same-inode, EOF extent)
reduce dedup efficiency

- ctime updates (older versions would update ctime when a file was
deduped) mess with incremental backup tools, build systems, etc.

- kernel memory leaks (self-explanatory)

- multiple kernel hang/panic bugs (e.g. a deadlock if two threads
try to read the same extent at the same time, and at least one
of those threads is dedup; and there was some race condition
leading to invalid memory access on dedup's comparison reads)
which won't eat your data, but they might ruin your day anyway.


Ok, I think I've seen some stuff like this; I certainly have 
problems, but never a loss of data. Things can take a LONG time to get 
out of the filesystem, though.



There is also a still-unresolved problem where the filesystem CPU usage
rises exponentially for some operations depending on the number of shared
references to an extent.  Files which contain blocks with more than a few
thousand shared references can trigger this problem.  A file over 1TB can
keep the kernel busy at 100% CPU for over 40 minutes at a time.


Yes, I see this all the time. For my use cases, I don't really care 
about "shared references" as blocks of files, but am happy to simply 
deduplicate at the whole-file level. I wonder if this still will have 
the same effect, however. I guess that this could be mitigated in a 
tool, but this is going to be both annoying and not the most elegant 
solution.



There might also be a correlation between delalloc data and hangs in
extent-same, but I have NOT been able to confirm this.  All I know
at this point is that doing a fsync() on the source FD just before
doing the extent-same ioctl dramatically reduces filesystem hang rates:
several weeks between hangs (or no hangs at all) with fsync, vs. 18 hours
or less without.


Interesting, I'll maybe see if I can make use of this.

One thing I am keen to understand is if BTRFS will automatically ignore 
a request to deduplicate a file if it is already deduplicated? Given the 
performance I see when doing a repeat deduplication, it seems to me that 
it can't be doing so, although this could be caused by the CPU usage you 
mention above.


In any case, I'm considering some digging into the filesystem structures 
to see if I can work this out myself before I do any deduplication. I'm 
fairly sure this should be relatively simple to work out, at least well 
enough for my purposes.


James


Re: Announcing btrfs-dedupe

2016-11-14 Thread Zygo Blaxell
On Mon, Nov 07, 2016 at 07:49:51PM +0100, James Pharaoh wrote:
> Annoyingly I can't find this now, but I definitely remember reading someone,
> apparently someone knowledgeable, claim that the latest version of the kernel
> which I was using at the time, still suffered from issues regarding the
> dedupe code.

> This was a while ago, and I would be very pleased to hear that there is high
> confidence in the current implementation! I'll post a link if I manage to
> find the comments.

I've been running the btrfs dedup ioctl 7 times per second on average
over 42TB of test data for most of a year (and at a lower rate for two
years).  I have not found any data corruptions due to _dedup_.  I did find
three distinct data corruption kernel bugs unrelated to dedup, and two
test machines with bad RAM, so I'm pretty sure my corruption detection
is working.

That said, I wouldn't run dedup on a kernel older than 4.4.  LTS kernels
might be OK too, but only if they're up to date with backported btrfs
fixes.

Kernels older than 3.13 lack the FILE_EXTENT_SAME ioctl and can
only deduplicate static data (i.e. data you are certain is not being
concurrently modified).  Before 3.12 there are so many bugs you might
as well not bother.

Older kernels are bad for dedup because of non-corruption reasons.
Between 3.13 and 4.4, the following bugs were fixed:

- false-negative capability checks (e.g. same-inode, EOF extent)
reduce dedup efficiency

- ctime updates (older versions would update ctime when a file was
deduped) mess with incremental backup tools, build systems, etc.

- kernel memory leaks (self-explanatory)

- multiple kernel hang/panic bugs (e.g. a deadlock if two threads
try to read the same extent at the same time, and at least one
of those threads is dedup; and there was some race condition
leading to invalid memory access on dedup's comparison reads)
which won't eat your data, but they might ruin your day anyway.

There is also a still-unresolved problem where the filesystem CPU usage
rises exponentially for some operations depending on the number of shared
references to an extent.  Files which contain blocks with more than a few
thousand shared references can trigger this problem.  A file over 1TB can
keep the kernel busy at 100% CPU for over 40 minutes at a time.

There might also be a correlation between delalloc data and hangs in
extent-same, but I have NOT been able to confirm this.  All I know
at this point is that doing a fsync() on the source FD just before
doing the extent-same ioctl dramatically reduces filesystem hang rates:
several weeks between hangs (or no hangs at all) with fsync, vs. 18 hours
or less without.

> James
> 
> On 07/11/16 18:59, Mark Fasheh wrote:
> >Hi James,
> >
> >Re the following text on your project page:
> >
> >"IMPORTANT CAVEAT — I have read that there are race and/or error
> >conditions which can cause filesystem corruption in the kernel
> >implementation of the deduplication ioctl."
> >
> >Can you expound on that? I'm not aware of any bugs right now but if
> >there is any it'd absolutely be worth having that info on the btrfs
> >list.
> >
> >Thanks,
> >--Mark
> >
> >
> >On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh
> > wrote:
> >>Hi all,
> >>
> >>I'm pleased to announce my btrfs deduplication utility, written in Rust.
> >>This operates on whole files, is fast, and I believe complements the
> >>existing utilities (duperemove, bedup), which exist currently.
> >>
> >>Please visit the homepage for more information:
> >>
> >>http://btrfs-dedupe.com
> >>
> >>James Pharaoh




Re: Announcing btrfs-dedupe

2016-11-13 Thread James Pharaoh
I've updated the BTRFS wiki here with all the new tools people have 
mentioned:


https://btrfs.wiki.kernel.org/index.php/Deduplication#Other_tools

Please let me know if anyone who does not have access to the wiki has 
any additions, updates or corrections to what I've written here.


James

On 08/11/16 23:36, Saint Germain wrote:

On Sun, 6 Nov 2016 14:30:52 +0100, James Pharaoh
 wrote :


Hi all,

I'm pleased to announce my btrfs deduplication utility, written in
Rust. This operates on whole files, is fast, and I believe
complements the existing utilities (duperemove, bedup), which exist
currently.

Please visit the homepage for more information:

http://btrfs-dedupe.com



Thanks for having shared your work.
Please be aware of these other similar tools:
- jdupes: https://github.com/jbruchon/jdupes
- rmlint: https://github.com/sahib/rmlint
And of course fdupes.

Some interesting points I have seen in them:
- use xxhash to identify potential duplicates (huge speedup)
- ability to deduplicate read-only snapshots
- identify potential reflinked files (see also my email here:
  https://www.spinics.net/lists/linux-btrfs/msg60081.html)
- ability to filter out hardlinks
- triangle problem: see jdupes readme
- jdupes has started the process to be included in Debian

I hope that will help and that you can share some code with them!
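The size-then-fast-hash pipeline those tools use can be sketched as follows (my own helpers; xxhash is not in the Python standard library, so a truncated blake2b stands in for it here, same role, not the same speed):

```python
import hashlib
import os
from collections import defaultdict

def fast_hash(path, chunk=1 << 16):
    """Cheap whole-file fingerprint. jdupes/rmlint use xxhash for this
    stage; stdlib blake2b (truncated to 64 bits) plays that role here."""
    h = hashlib.blake2b(digest_size=8)
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.digest()

def duplicate_groups(paths):
    """Group candidate duplicates: first by size (free), then by fast hash.
    A byte-wise check (or the dedupe ioctl itself) must still confirm."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size can never have a duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[fast_hash(p)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```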





Re: Announcing btrfs-dedupe

2016-11-09 Thread David Sterba
On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote:
> [1]. For some reasons zfs-on-linux guys didn't implement this yet, despite
> it being an obvious thing on ZFS.

In my understanding, the COW mechanics are different, there are no
extent back references, so this would require some design updates. See
issue 405 at ZoL tracker.


Re: Announcing btrfs-dedupe

2016-11-09 Thread Saint Germain
On Wed, 09 Nov 2016 12:24:51 +0100, Niccolò Belli
 wrote :
> 
> On martedì 8 novembre 2016 23:36:25 CET, Saint Germain wrote:
> > Please be aware of these other similar software:
> > - jdupes: https://github.com/jbruchon/jdupes
> > - rmlint: https://github.com/sahib/rmlint
> > And of course fdupes.
> >
> > Some interesting points I have seen in them:
> > - use xxhash to identify potential duplicates (huge speedup)
> > - ability to deduplicate read-only snapshots
> > - identify potential reflinked files (see also my email here:
> >   https://www.spinics.net/lists/linux-btrfs/msg60081.html)
> > - ability to filter out hardlinks
> > - triangle problem: see jdupes readme
> > - jdupes has started the process to be included in Debian
> >
> > I hope that will help, and that you can share some code with them!
> > 
> Hi,
> What do you think about jdupes? I'm searching for an alternative to
> duperemove, and rmlint doesn't seem to support btrfs deduplication, so
> I would like to try jdupes. My main problem with duperemove is a
> memory leak; it also seems to lead to greater disk usage: 
> https://github.com/markfasheh/duperemove/issues/163

rmlint does support btrfs deduplication:
rmlint --algorithm=xxhash --types="duplicates" --hidden 
--config=sh:handler=clone --no-hardlinked

I've used jdupes and rmlint to deduplicate 2TB with 4GB RAM and it took
a few hours, so performance is acceptable. The problems I found have
been fixed in both.

The jdupes author is really kind and responsive!


Re: Announcing btrfs-dedupe

2016-11-09 Thread Niccolò Belli

Hi,
What do you think about jdupes? I'm searching for an alternative to 
duperemove, and rmlint doesn't seem to support btrfs deduplication, so I 
would like to try jdupes. My main problem with duperemove is a memory 
leak; it also seems to lead to greater disk usage: 
https://github.com/markfasheh/duperemove/issues/163


Niccolo' Belli

On martedì 8 novembre 2016 23:36:25 CET, Saint Germain wrote:

Please be aware of these other similar software:
- jdupes: https://github.com/jbruchon/jdupes
- rmlint: https://github.com/sahib/rmlint
And of course fdupes.

Some interesting points I have seen in them:
- use xxhash to identify potential duplicates (huge speedup)
- ability to deduplicate read-only snapshots
- identify potential reflinked files (see also my email here:
  https://www.spinics.net/lists/linux-btrfs/msg60081.html)
- ability to filter out hardlinks
- triangle problem: see jdupes readme
- jdupes has started the process to be included in Debian

I hope that will help, and that you can share some code with them!



Re: Announcing btrfs-dedupe

2016-11-08 Thread Saint Germain
On Sun, 6 Nov 2016 14:30:52 +0100, James Pharaoh
 wrote :

> Hi all,
> 
> I'm pleased to announce my btrfs deduplication utility, written in
> Rust. This operates on whole files, is fast, and I believe
> complements the existing utilities (duperemove, bedup).
> 
> Please visit the homepage for more information:
> 
> http://btrfs-dedupe.com
> 

Thanks for sharing your work.
Please be aware of these other similar software:
- jdupes: https://github.com/jbruchon/jdupes
- rmlint: https://github.com/sahib/rmlint
And of course fdupes.

Some interesting points I have seen in them:
- use xxhash to identify potential duplicates (huge speedup)
- ability to deduplicate read-only snapshots
- identify potential reflinked files (see also my email here:
  https://www.spinics.net/lists/linux-btrfs/msg60081.html)
- ability to filter out hardlinks
- triangle problem: see jdupes readme
- jdupes has started the process to be included in Debian

I hope that will help, and that you can share some code with them!



Re: Announcing btrfs-dedupe

2016-11-08 Thread Darrick J. Wong
On Tue, Nov 08, 2016 at 10:59:56AM -0800, Mark Fasheh wrote:
> On Mon, Nov 7, 2016 at 6:17 PM, Darrick J. Wong  
> wrote:
> > On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote:
> >> Mark has already included XFS in documentation of duperemove, all that 
> >> looks
> >> amiss is btrfs-extent-same having an obsolete name.  But then, I never did
> >> any non-superficial tests on XFS, beyond "seems to work".
> 
> I'd actually be ok dropping btrfs-extent-same completely at this point
> but I'm concerned that it would leave some users behind.
> 
> 
> > /me wonders if ocfs2 will ever catch up to the reflink/dedupe party. ;)
> 
> Hey, Ocfs2 started the reflink party! But yeah it's fallen behind
> since then with respect to cow and dedupe. More importantly though I'd
> like to see some extra extent tracking in there like XFS did with the
> reflink b+tree.

Perhaps this should move to the ocfs2 list, but...

...as I understand ocfs2, each inode can point to the head of a refcount
tree that maintains refcounts for all the physical blocks that are
mapped by any of the files that share that refcount tree.  It wouldn't
be difficult to hook up this existing refcount structure to the reflink
and dedupe vfs ioctls, with the huge caveat that both inodes will end up
belonging to the same refcount tree (or the call fails).  This might not
be such a huge issue for reflink since we're generally only using it
during a file copy anyway, but for dedupe this could have disastrous
consequences if someone does an fs-wide dedupe and every file in the fs
ends up with the same refcount tree.

So I guess you could give each block group its own refcount tree or
something so that all the writes in the fs don't end up contending for a
single data structure.

--D

>--Mark
> 
> -- 
> "When the going gets weird, the weird turn pro."
> Hunter S. Thompson


Re: Announcing btrfs-dedupe

2016-11-08 Thread Mark Fasheh
On Mon, Nov 7, 2016 at 6:17 PM, Darrick J. Wong  wrote:
> On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote:
>> Mark has already included XFS in documentation of duperemove, all that looks
>> amiss is btrfs-extent-same having an obsolete name.  But then, I never did
>> any non-superficial tests on XFS, beyond "seems to work".

I'd actually be ok dropping btrfs-extent-same completely at this point
but I'm concerned that it would leave some users behind.


> /me wonders if ocfs2 will ever catch up to the reflink/dedupe party. ;)

Hey, Ocfs2 started the reflink party! But yeah it's fallen behind
since then with respect to cow and dedupe. More importantly though I'd
like to see some extra extent tracking in there like XFS did with the
reflink b+tree.
   --Mark

-- 
"When the going gets weird, the weird turn pro."
Hunter S. Thompson


Re: Announcing btrfs-dedupe

2016-11-08 Thread Mark Fasheh
On Mon, Nov 7, 2016 at 6:40 PM, Christoph Anton Mitterer
 wrote:
> On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:
>> I think adding a whole-file dedup mode to duperemove would be better
>> (from user's POV) than writing a whole new tool
>
> What would IMO be really good from a user's POV was, if one of the
> tools, deemed to be the "best", would be added to the btrfs-progs and
> simply become "the official" one.

Yeah, there are two problems. One is that the extent-same ioctl (and
duperemove) is cross-filesystem now. The other, which James touches on,
is that there's a non-trivial amount of complexity in duperemove, so
shoving it into btrfs-progs just means we're going to have parallel
development streams solving somewhat different problems.

That's not to say that every dedupe tool has to be complex - we have
xfs_io to run the ioctl and I don't think it'd be a bad idea if
btrfs-progs had a simple interface to it too.
   --Mark



-- 
"When the going gets weird, the weird turn pro."
Hunter S. Thompson


Re: Announcing btrfs-dedupe

2016-11-08 Thread Niccolò Belli

On martedì 8 novembre 2016 17:58:52 CET, James Pharaoh wrote:
Yes, everything you have described here is something I intend 
to create, and might as well include in the tool itself. I'll 
add it to the roadmap ;-)


Sounds good, but I have yet another feature request which is even more 
interesting in my opinion.
If you have ever used snapper, you have probably found yourself in the 
position where you want to free some space and actually can't, because 
the files you want to delete are already present in countless snapshots. 
You then have to delete the unwanted files from every snapshot, which is 
a tedious task, even more difficult if you have moved or renamed those 
files. What I actually do is exploit duperemove's hashfile to grep for 
the checksum and obtain all the paths. Then I have to switch the 
snapshots to rw, manually delete each file, and finally switch them back 
to ro. A tool which automates these tasks would be awesome.
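Grepping the hashfile for a checksum, as described above, could be automated with a small query. This is a hypothetical sketch: the hashes(digest, path) table below is a stand-in schema for illustration, not duperemove's actual hashfile layout:

```python
import sqlite3

def paths_sharing_digest(db_path, digest):
    """Return every stored path whose digest matches `digest`.

    Assumes a toy hashes(digest, path) table; duperemove's real
    hashfile schema differs and would need to be inspected first."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT DISTINCT path FROM hashes WHERE digest = ?", (digest,)
        ).fetchall()
    finally:
        con.close()
    return [row[0] for row in rows]
```

Given such a list, a wrapper could then delete the matching file from each snapshot in turn.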


Niccolo'


Re: Announcing btrfs-dedupe

2016-11-08 Thread Austin S. Hemmelgarn

On 2016-11-08 11:57, Darrick J. Wong wrote:

On Tue, Nov 08, 2016 at 08:26:02AM -0500, Austin S. Hemmelgarn wrote:

On 2016-11-07 21:40, Christoph Anton Mitterer wrote:

On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:

I think adding a whole-file dedup mode to duperemove would be better
(from user's POV) than writing a whole new tool


What would IMO be really good from a user's POV was, if one of the
tools, deemed to be the "best", would be added to the btrfs-progs and
simply become "the official" one.


The problem is that for deduplication, most tools won't work well for
everything.  For example the cases I use it in are very specific and have
horrible performance using pretty much any available tool (I have a couple
cases where I have disjoint subsets of the same directory tree with
different prefixes, so I can tell exactly which files are duplicated, and
that any duplicate file is 100% duplicate, as well as a couple of cases
where changes are small, scattered, and highly predictable (and thus it's
easier to find what's changed and dedupe everything else instead of finding
what's the same), and none of the existing options do well in either
situation).

I'd argue at minimum for having the extent-same tool from duperemove in
btrfs-progs, as that lets people do deduplication how they want without
having to write C code.  Something equivalent that would let you call any
BTRFS ioctl with (reasonably) arbitrary arguments might actually be even
better (I can see such a tool being wonderful for debugging).


Since xfsprogs 4.3, xfs_io has a 'dedupe' command that can talk to
FIDEDUPERANGE (f.k.a. EXTENT SAME):

$ xfs_io -c 'dedupe /mnt/srcfile srcoffset dstoffset length' /mnt/destfile

I actually hadn't known about this, thanks.  It means that xfs_io just 
got even more useful despite me not running XFS.




Re: Announcing btrfs-dedupe

2016-11-08 Thread James Pharaoh
Yes, everything you have described here is something I intend to create, 
and might as well include in the tool itself. I'll add it to the roadmap ;-)


James

On 08/11/16 17:57, Niccolò Belli wrote:

On martedì 8 novembre 2016 12:38:48 CET, James Pharaoh wrote:

You can't deduplicate a read-only snapshot, but you can create
read-write snapshots from them, deduplicate those, and then recreate
the read-only ones. This is what I've done.


Since snapper creates hundreds of snapshots, isn't this something that
the deduplication software could do for me if I explicitly tell it to
do so? I mean momentarily switching the snapshot to rw in order to
deduplicate it, then switching it back to ro.


In theory, once this has been done once, it shouldn't have to be done
again, at least for those snapshots, unless you want to modify the
deduplication. It's probably a good idea to defragment files and
directories first, as well.


I can't defragment anything, because it would take too much free space
to do so with so many snapshots. Instead, the deduplication software
could defragment each file before calling the extent-same ioctl; that
would be feasible. That way you would not need huge amounts of
free space to defragment the fs.


It should be possible to deduplicate a read-only file to a read-write
one, but that's probably not worth the effort in many real-world use
cases.


This is exactly what I would expect a deduplication tool to do when it
encounters a ro snapshot, except when I explicitly tell it to
momentarily switch the snapshot to rw in order to deduplicate it.

Niccolo' Belli
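The rw-toggle workflow could be scripted along these lines. A rough sketch, assuming the "btrfs property set <path> ro" command is available; dedupe_cmd is a placeholder for whichever deduplication tool you run:

```python
import subprocess

def dedupe_snapshot_commands(snapshot_path, dedupe_cmd):
    """Build the command sequence: make the snapshot writable,
    deduplicate it, then restore the read-only flag."""
    return [
        ["btrfs", "property", "set", snapshot_path, "ro", "false"],
        list(dedupe_cmd) + [snapshot_path],
        ["btrfs", "property", "set", snapshot_path, "ro", "true"],
    ]

def dedupe_snapshot(snapshot_path, dedupe_cmd):
    """Run the sequence; restore ro even if deduplication fails."""
    make_rw, dedupe, make_ro = dedupe_snapshot_commands(
        snapshot_path, dedupe_cmd)
    subprocess.run(make_rw, check=True)
    try:
        subprocess.run(dedupe, check=True)
    finally:
        subprocess.run(make_ro, check=True)
```

Note that flipping a received snapshot to rw can invalidate it for incremental btrfs send/receive, so this is best reserved for snapshots you do not replicate.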



Re: Announcing btrfs-dedupe

2016-11-08 Thread Darrick J. Wong
On Tue, Nov 08, 2016 at 08:26:02AM -0500, Austin S. Hemmelgarn wrote:
> On 2016-11-07 21:40, Christoph Anton Mitterer wrote:
> >On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:
> >>I think adding a whole-file dedup mode to duperemove would be better
> >>(from user's POV) than writing a whole new tool
> >
> >What would IMO be really good from a user's POV was, if one of the
> >tools, deemed to be the "best", would be added to the btrfs-progs and
> >simply become "the official" one.
> 
> The problem is that for deduplication, most tools won't work well for
> everything.  For example the cases I use it in are very specific and have
> horrible performance using pretty much any available tool (I have a couple
> cases where I have disjoint subsets of the same directory tree with
> different prefixes, so I can tell exactly which files are duplicated, and
> that any duplicate file is 100% duplicate, as well as a couple of cases
> where changes are small, scattered, and highly predictable (and thus it's
> easier to find what's changed and dedupe everything else instead of finding
> what's the same), and none of the existing options do well in either
> situation).
> 
> I'd argue at minimum for having the extent-same tool from duperemove in
> btrfs-progs, as that lets people do deduplication how they want without
> having to write C code.  Something equivalent that would let you call any
> BTRFS ioctl with (reasonably) arbitrary arguments might actually be even
> better (I can see such a tool being wonderful for debugging).

Since xfsprogs 4.3, xfs_io has a 'dedupe' command that can talk to
FIDEDUPERANGE (f.k.a. EXTENT SAME):

$ xfs_io -c 'dedupe /mnt/srcfile srcoffset dstoffset length' /mnt/destfile
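For reference, the same FIDEDUPERANGE call can be driven from other languages too; here is an illustrative Python sketch for a single destination range. The ioctl number and struct layouts are transcribed from linux/fs.h for the common 64-bit ABI, and the call only succeeds on a dedupe-capable filesystem (btrfs, or XFS with reflink enabled):

```python
import fcntl
import os
import struct

# Values from linux/fs.h; FIDEDUPERANGE is _IOWR(0x94, 54, ...) for
# the usual 64-bit ABI.
FIDEDUPERANGE = 0xC0189436
FILE_DEDUPE_RANGE_SAME = 0
FILE_DEDUPE_RANGE_DIFFERS = 1

def dedupe_range(src_fd, src_off, length, dst_fd, dst_off):
    """Ask the kernel to share one range between two open files.

    Returns the number of bytes deduplicated; raises OSError on
    filesystems without dedupe support (e.g. EOPNOTSUPP)."""
    # struct file_dedupe_range followed by one file_dedupe_range_info
    hdr = struct.pack("=QQHHI", src_off, length, 1, 0, 0)
    info = struct.pack("=qQQiI", dst_fd, dst_off, 0, 0, 0)
    buf = bytearray(hdr + info)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)
    _fd, _off, bytes_deduped, status, _r = struct.unpack_from(
        "=qQQiI", buf, len(hdr))
    if status == FILE_DEDUPE_RANGE_DIFFERS:
        raise ValueError("range contents differ; nothing deduplicated")
    if status < 0:
        raise OSError(-status, os.strerror(-status))
    return bytes_deduped
```

The kernel compares the two ranges itself before sharing extents, which is why userspace hashing only needs to be good enough to find candidates.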

--D



Re: Announcing btrfs-dedupe

2016-11-08 Thread Niccolò Belli

On martedì 8 novembre 2016 12:38:48 CET, James Pharaoh wrote:
You can't deduplicate a read-only snapshot, but you can create 
read-write snapshots from them, deduplicate those, and then 
recreate the read-only ones. This is what I've done.


Since snapper creates hundreds of snapshots, isn't this something that the 
deduplication software could do for me if I explicitly tell it to do so? I 
mean momentarily switching the snapshot to rw in order to deduplicate it, 
then switching it back to ro.


In theory, once this has been done once, it shouldn't have to 
be done again, at least for those snapshots, unless you want to 
modify the deduplication. It's probably a good idea to 
defragment files and directories first, as well.


I can't defragment anything, because it would take too much free space to 
do so with so many snapshots. Instead, the deduplication software could 
defragment each file before calling the extent-same ioctl; that would be 
feasible. That way you would not need huge amounts of free space to 
defragment the fs.


It should be possible to deduplicate a read-only file to a 
read-write one, but that's probably not worth the effort in many 
real-world use cases.


This is exactly what I would expect a deduplication tool to do when it 
encounters a ro snapshot, except when I explicitly tell it to momentarily 
switch the snapshot to rw in order to deduplicate it.


Niccolo' Belli
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Announcing btrfs-dedupe

2016-11-08 Thread Austin S. Hemmelgarn

On 2016-11-07 21:40, Christoph Anton Mitterer wrote:

On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:

I think adding a whole-file dedup mode to duperemove would be better
(from user's POV) than writing a whole new tool


What would IMO be really good from a user's POV was, if one of the
tools, deemed to be the "best", would be added to the btrfs-progs and
simply become "the official" one.


The problem is that for deduplication, most tools won't work well for 
everything.  For example the cases I use it in are very specific and 
have horrible performance using pretty much any available tool (I have a 
couple cases where I have disjoint subsets of the same directory tree 
with different prefixes, so I can tell exactly which files are 
duplicated, and that any duplicate file is 100% duplicate, as well as a 
couple of cases where changes are small, scattered, and highly 
predictable (and thus it's easier to find what's changed and dedupe 
everything else instead of finding what's the same), and none of the 
existing options do well in either situation).


I'd argue at minimum for having the extent-same tool from duperemove in 
btrfs-progs, as that lets people do deduplication how they want without 
having to write C code.  Something equivalent that would let you call 
any BTRFS ioctl with (reasonably) arbitrary arguments might actually be 
even better (I can see such a tool being wonderful for debugging).



Re: Announcing btrfs-dedupe

2016-11-08 Thread James Pharaoh

On 08/11/16 12:06, Niccolò Belli wrote:

Nice, you should probably update the btrfs wiki as well, because there
is no mention of btrfs-dedupe.


I am planning to; I had to apply for an account, which has now been 
approved.



First question, why this name? Don't you plan to support xfs as well?


It didn't occur to me, to be honest. I might support XFS as well, but I 
don't use it, and will possibly be adding other btrfs-specific stuff to 
it. You'll notice it's part of a bigger wbs-backup repo, with other 
tools, which I'm developing to manage my storage and backup requirements.


I'll take a look at it, and certainly see if it works out of the box.


Second question, I'm trying deduplication tools for the very first time
and I still have to figure out how to handle snapper snapshots, which
are read only. I currently tried duperemove 0.11 git and I get tons of
"Error 30: Read-only file system while opening
"/.../@snapshots/4385/...". How am I supposed to handle snapper snapshots?


> Is btrfs-dedupe able to handle snapper snapshots?

You can't deduplicate a read-only snapshot, but you can create 
read-write snapshots from them, deduplicate those, and then recreate the 
read-only ones. This is what I've done.


In theory, once this has been done once, it shouldn't have to be done 
again, at least for those snapshots, unless you want to modify the 
deduplication. It's probably a good idea to defragment files and 
directories first, as well.


It should be possible to deduplicate a read-only file to a read-write 
one, but that's probably not worth the effort in many real-world use cases.


James


Re: Announcing btrfs-dedupe

2016-11-08 Thread Niccolò Belli
Nice, you should probably update the btrfs wiki as well, because there is 
no mention of btrfs-dedupe.


First question, why this name? Don't you plan to support xfs as well?

Second question, I'm trying deduplication tools for the very first time and 
I still have to figure out how to handle snapper snapshots, which are read 
only. I currently tried duperemove 0.11 git and I get tons of "Error 30: 
Read-only file system while opening "/.../@snapshots/4385/...". How am I 
supposed to handle snapper snapshots?


I do not run duperemove from a live distro, instead I run it directly on 
the system I want to deduplicate:


sudo mount -o noatime,compress=lzo,autodefrag /dev/mapper/cryptroot 
/home/niko/nosnap/rootfs/
sudo duperemove -drh --dedupe-options=nofiemap 
--hashfile=/home/niko/nosnap/rootfs.hash /home/niko/nosnap/rootfs/


Is btrfs-dedupe able to handle snapper snapshots?

Thanks,
Niccolo' Belli


Re: Announcing btrfs-dedupe

2016-11-07 Thread James Pharaoh
Perhaps the complexity of doing this efficiently makes it inappropriate 
for inclusion in the tool itself, whereas I believe the core 
implementation's focus is on in-band deduplication, automatic and behind 
the scenes.


On 08/11/16 03:40, Christoph Anton Mitterer wrote:

On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:

I think adding a whole-file dedup mode to duperemove would be better
(from user's POV) than writing a whole new tool


What would IMO be really good from a user's POV was, if one of the
tools, deemed to be the "best", would be added to the btrfs-progs and
simply become "the official" one.

Cheers,
Chris.




Re: Announcing btrfs-dedupe

2016-11-07 Thread Christoph Anton Mitterer
On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:
> I think adding a whole-file dedup mode to duperemove would be better
> (from user's POV) than writing a whole new tool

What would IMO be really good from a user's POV would be if one of the
tools, deemed the "best", were added to btrfs-progs and
simply became "the official" one.

Cheers,
Chris.



Re: Announcing btrfs-dedupe

2016-11-07 Thread Darrick J. Wong
On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote:
> On Mon, Nov 07, 2016 at 09:48:41AM -0800, Mark Fasheh wrote:
> > also on XFS with the dedupe ioctl (I believe this should be out with
> > Linux-4.9).
> 
> It's already there in 4.9-rc1, although you need a special version of
> xfsprogs (possibly already released, I didn't check).  It's an experimental
> feature that needs to be enabled with "-m reflink=1".

The code will be available in xfsprogs 4.9, due out after Linux 4.9.

You'll still have to pass '-m reflink=1' to enable reflink until we
declare the feature stable, however.

> Despite that experimental status, I'd strongly recommend James to test his
> tool on xfs as well, as it's the second major implementation of this API[1].

Agreed. :)

> Mark has already included XFS in documentation of duperemove, all that looks
> amiss is btrfs-extent-same having an obsolete name.  But then, I never did
> any non-superficial tests on XFS, beyond "seems to work".

/me wonders if ocfs2 will ever catch up to the reflink/dedupe party. ;)

--Darrick

> 
> 
> Meow!
> 
> [1]. For some reason the zfs-on-linux guys haven't implemented this yet,
> despite it being an obvious thing on ZFS.
> -- 
> A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol, 1kg
> raspberries, 0.4kg sugar; put into a big jar for 1 month.  Filter out and
> throw away the fruits (can dump them into a cake, etc), let the drink age
> at least 3-6 months.


Re: Announcing btrfs-dedupe

2016-11-07 Thread Adam Borowski
On Mon, Nov 07, 2016 at 09:48:41AM -0800, Mark Fasheh wrote:
> also on XFS with the dedupe ioctl (I believe this should be out with
> Linux-4.9).

It's already there in 4.9-rc1, although you need a special version of
xfsprogs (possibly already released, I didn't check).  It's an experimental
feature that needs to be enabled with "-m reflink=1".

Despite that experimental status, I'd strongly recommend James to test his
tool on xfs as well, as it's the second major implementation of this API[1].


Mark has already included XFS in documentation of duperemove, all that looks
amiss is btrfs-extent-same having an obsolete name.  But then, I never did
any non-superficial tests on XFS, beyond "seems to work".


Meow!

[1]. For some reason the zfs-on-linux guys haven't implemented this yet,
despite it being an obvious thing on ZFS.
-- 
A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol, 1kg
raspberries, 0.4kg sugar; put into a big jar for 1 month.  Filter out and
throw away the fruits (can dump them into a cake, etc), let the drink age
at least 3-6 months.


Re: Announcing btrfs-dedupe

2016-11-07 Thread James Pharaoh
FWIW I have updated my comments about duperemove and also the "caveat" 
section you mentioned in your other mail in the readme.


http://btrfs-dedupe.com

James

On 07/11/16 19:49, James Pharaoh wrote:

Annoyingly I can't find this now, but I definitely remember reading
someone, apparently someone knowledgeable, claim that the latest version
of the kernel which I was using at the time, still suffered from issues
regarding the dedupe code.

This was a while ago, and I would be very pleased to hear that there is
high confidence in the current implementation! I'll post a link if I
manage to find the comments.

James

On 07/11/16 18:59, Mark Fasheh wrote:

Hi James,

Re the following text on your project page:

"IMPORTANT CAVEAT — I have read that there are race and/or error
conditions which can cause filesystem corruption in the kernel
implementation of the deduplication ioctl."

Can you expound on that? I'm not aware of any bugs right now but if
there is any it'd absolutely be worth having that info on the btrfs
list.

Thanks,
--Mark


On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh
 wrote:

Hi all,

I'm pleased to announce my btrfs deduplication utility, written in Rust.
This operates on whole files, is fast, and I believe complements the
existing utilities (duperemove, bedup).

Please visit the homepage for more information:

http://btrfs-dedupe.com

James Pharaoh


Re: Announcing btrfs-dedupe

2016-11-07 Thread James Pharaoh
Annoyingly I can't find this now, but I definitely remember reading 
someone, apparently someone knowledgeable, claim that the latest version 
of the kernel which I was using at the time, still suffered from issues 
regarding the dedupe code.


This was a while ago, and I would be very pleased to hear that there is 
high confidence in the current implementation! I'll post a link if I 
manage to find the comments.


James

On 07/11/16 18:59, Mark Fasheh wrote:

Hi James,

Re the following text on your project page:

"IMPORTANT CAVEAT — I have read that there are race and/or error
conditions which can cause filesystem corruption in the kernel
implementation of the deduplication ioctl."

Can you expound on that? I'm not aware of any bugs right now but if
there is any it'd absolutely be worth having that info on the btrfs
list.

Thanks,
--Mark


On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh
 wrote:

Hi all,

I'm pleased to announce my btrfs deduplication utility, written in Rust.
This operates on whole files, is fast, and I believe complements the
existing utilities (duperemove, bedup).

Please visit the homepage for more information:

http://btrfs-dedupe.com

James Pharaoh


Re: Announcing btrfs-dedupe

2016-11-07 Thread Mark Fasheh
Hi James,

Re the following text on your project page:

"IMPORTANT CAVEAT — I have read that there are race and/or error
conditions which can cause filesystem corruption in the kernel
implementation of the deduplication ioctl."

Can you expound on that? I'm not aware of any bugs right now but if
there is any it'd absolutely be worth having that info on the btrfs
list.

Thanks,
--Mark


On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh
 wrote:
> Hi all,
>
> I'm pleased to announce my btrfs deduplication utility, written in Rust.
> This operates on whole files, is fast, and I believe complements the
> existing utilities (duperemove, bedup).
>
> Please visit the homepage for more information:
>
> http://btrfs-dedupe.com
>
> James Pharaoh


Re: Announcing btrfs-dedupe

2016-11-07 Thread Mark Fasheh
Hi David and James,

On Mon, Nov 7, 2016 at 6:02 AM, David Sterba  wrote:
> On Sun, Nov 06, 2016 at 02:30:52PM +0100, James Pharaoh wrote:
>> I'm pleased to announce my btrfs deduplication utility, written in Rust.
>> This operates on whole files, is fast, and I believe complements the
>> existing utilities (duperemove, bedup).
>
> Mark can correct me if I'm wrong, but AFAIK duperemove can consume the
> output of fdupes, which does the whole-file scanning for duplicates. And
> I think adding a whole-file dedup mode to duperemove would be better
> (from a user's POV) than writing a whole new tool, e.g. because of the
> existing availability of duperemove in the distros.

Yeah, you are correct - "fdupes -r /foo | duperemove --fdupes" will get
you the same effect.

There's been a request for us to do all of that internally so that
whole-file dedupe works with the mtime-checking code. This is entirely
doable. I would probably either add a field to the files table or add
a new table to hold whole-file hashes. We can then squeeze our existing
block hashes down into one big one or just rehash the whole file.
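[Editor's note: as a rough illustration of the whole-file pass described
above - this is a hedged sketch, not duperemove's actual code, and the
function name and size-first bucketing are my own assumptions - such a
scan might look like this:]

```python
import hashlib
import os
from collections import defaultdict

def whole_file_duplicates(paths, chunk_size=1 << 20):
    """Group byte-identical files: roughly what fdupes does, and what a
    whole-file mode in duperemove might record in its hash database."""
    # Bucket by size first: a file with a unique size cannot have a
    # whole-file duplicate, so it never needs to be hashed.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue
        for path in same_size:
            # Hash the full contents in chunks to bound memory use.
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    h.update(chunk)
            groups[(size, h.hexdigest())].append(path)

    # Only groups of two or more identical files are dedupe candidates.
    return [g for g in groups.values() if len(g) > 1]
```

[Each returned group could then be handed to the kernel dedupe ioctl,
or stored alongside the per-block hashes Mark mentions.]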


> Also, looking at your roadmap, some of the items are already implemented
> in duperemove: a database of existing csums, crossing filesystem
> boundaries, mtime-based speedups.

Yeah, rescanning based on mtime was a huge speedup for duperemove, as
was keeping checksums in a db. We do all this today, also on XFS with
the dedupe ioctl (I believe this should be out with Linux 4.9).

Btw, there are lots of little details and bug fixes which I feel add up
to a relatively complete (though far from perfect!) tool. For example,
the dedupe code can handle multiple kernel versions, including old
kernels which couldn't dedupe on non-aligned block boundaries. Every
major step in duperemove is threaded at this point too, which has been
an enormous performance increase (and which new features benefit from).
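[Editor's note: the block-boundary issue on old kernels can be
illustrated with a small sketch - a hypothetical helper, not
duperemove's code - that trims a dedupe request so both ranges start
and end on block boundaries, and drops it if nothing aligned remains:]

```python
BLOCK_SIZE = 4096  # typical btrfs block size

def align_dedupe_range(src_offset, dst_offset, length, block_size=BLOCK_SIZE):
    """Trim a (src, dst, length) dedupe request to block boundaries, as
    a tool might for old kernels that rejected non-aligned requests.
    Returns the trimmed triple, or None if nothing aligned remains."""
    # Both offsets must sit at the same distance from a block boundary;
    # otherwise no single shift can align them simultaneously.
    skip = (-src_offset) % block_size
    if (-dst_offset) % block_size != skip:
        return None
    # Round the remaining length down to a whole number of blocks.
    usable = (length - skip) // block_size * block_size
    if usable <= 0:
        return None
    return src_offset + skip, dst_offset + skip, usable
```

[On newer kernels the alignment restriction no longer applies, so a
real tool would only take this path on affected kernel versions.]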

Thanks,
--Mark

-- 
"When the going gets weird, the weird turn pro."
Hunter S. Thompson


Re: Announcing btrfs-dedupe

2016-11-07 Thread David Sterba
On Sun, Nov 06, 2016 at 02:30:52PM +0100, James Pharaoh wrote:
> I'm pleased to announce my btrfs deduplication utility, written in Rust. 
> This operates on whole files, is fast, and I believe complements the
> existing utilities (duperemove, bedup).

Mark can correct me if I'm wrong, but AFAIK duperemove can consume the
output of fdupes, which does the whole-file scanning for duplicates. And
I think adding a whole-file dedup mode to duperemove would be better
(from a user's POV) than writing a whole new tool, e.g. because of the
existing availability of duperemove in the distros.

Also, looking at your roadmap, some of the items are already implemented
in duperemove: a database of existing csums, crossing filesystem
boundaries, mtime-based speedups.


Announcing btrfs-dedupe

2016-11-06 Thread James Pharaoh

Hi all,

I'm pleased to announce my btrfs deduplication utility, written in Rust. 
This operates on whole files, is fast, and I believe complements the
existing utilities (duperemove, bedup).


Please visit the homepage for more information:

http://btrfs-dedupe.com

James Pharaoh