On 04.01.2017 00:43 Hans van Kranenburg wrote:
> On 01/04/2017 12:12 AM, Peter Becker wrote:
>> Good hint, this would be an option and I will try it.
>>
>> Regardless of that, curiosity has gotten hold of me and I will try to
>> figure out where the problem with the low transfer rate is.
>>
>> 2017-01-04 0:07 GMT+01:00 Hans van Kranenburg 
>> <hans.van.kranenb...@mendix.com>:
>>> On 01/03/2017 08:24 PM, Peter Becker wrote:
>>>> All objections are justified, but not relevant in (offline) backup
>>>> and archive scenarios.
>>>>
>>>> For example, you have multiple versions of append-only log files or
>>>> append-only DB files (each more than 100 GB in size), like this:
>>>>
>>>>> Snapshot_01_01_2017
>>>> -> file1.log .. 201 GB
>>>>
>>>>> Snapshot_02_01_2017
>>>> -> file1.log .. 205 GB
>>>>
>>>>> Snapshot_03_01_2017
>>>> -> file1.log .. 221 GB
>>>>
>>>> The first 201 GB would be the same every time.
>>>> Files are copied at night from Windows, Linux or BSD systems and
>>>> snapshotted after the copy.
>>> XY problem?
>>>
>>> Why not use rsync --inplace in combination with btrfs snapshots? Even if
>>> the remote does not support rsync and you need to pull the full file
>>> first, you could again use rsync locally.
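
A minimal sketch of that local fallback, with hypothetical paths
(remote:/data/file1.log for the remote file, /srv/pull/file1.log for the
freshly pulled copy, /mnt/backup/current for the btrfs subvolume holding
yesterday's copy); all flags shown are standard rsync options:

  # pull the full file from the remote by whatever means it does support
  scp remote:/data/file1.log /srv/pull/file1.log

  # rewrite only the blocks that actually differ into the existing copy;
  # --inplace avoids replacing the whole file (which would break extent
  # sharing), --no-whole-file re-enables rsync's delta algorithm, which is
  # otherwise off by default for purely local transfers
  rsync --inplace --no-whole-file /srv/pull/file1.log /mnt/backup/current/file1.log
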
> <annoyed>please don't toppost</annoyed>
>
> Also, there is a rather huge difference between the two approaches, given
> the way btrfs works internally.
>
> Say, I have a subvolume with thousands of directories and millions of
> files with random data in it, and I want to have a second deduped copy
> of it.
>
> Approach 1:
>
> Create a full copy of everything (compare: retrieving the remote file
> again), so that 200% of the data storage is used, and after that run
> deduplication, so that again only 100% of the data storage is used.
>
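To make approach 1 concrete: one possible offline deduper is duperemove; a
minimal sketch, assuming a hypothetical layout under /mnt/backup:

  # full copy first (for a while, 200% of the data is stored)
  cp -a /mnt/backup/original /mnt/backup/copy

  # then let the deduper find identical blocks and turn them into shared
  # extents, bringing usage back towards 100%
  duperemove -dr /mnt/backup/original /mnt/backup/copy
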
> Approach 2:
>
> cp -av --reflink original/ copy/
>
> By doing this, you end up with the same result as approach 1, assuming your
> deduper is the most ideal one in the world (and the files are so random that
> they don't contain any duplicate blocks within themselves).
>
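If you want to check that such a reflinked copy really shares its extents, a
sufficiently recent btrfs-progs (assumption: one that already ships the
'filesystem du' subcommand) can show it:

  # 'Set shared' should cover roughly the whole data size right after the
  # copy, since nothing has been rewritten yet
  btrfs filesystem du -s original/ copy/
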
> Approach 3:
>
> btrfs sub snap original copy
>
> W00t, that was fast, and the only thing that happened was writing a few new
> 16KiB metadata pages (one for the top-level tree page that got cloned into a
> new filesystem tree, and a few for the blocks one level lower, to add
> backreferences to the new root).
>
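You can watch that happen on a live filesystem (assuming it is mounted at a
hypothetical /mnt/backup): the data counters do not move at all, and only a
tiny amount of metadata gets written:

  btrfs filesystem df /mnt/backup                        # note Data 'used'
  btrfs sub snap /mnt/backup/original /mnt/backup/copy
  btrfs filesystem df /mnt/backup                        # Data 'used' is unchanged
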
> So:
>
> The big difference in the end result between approaches 1 and 2 on the one
> hand and approach 3 on the other is that with the first two, while
> deduplicating your data, you're actually duplicating all your metadata at
> the same time.
>
> In your situation, doing an rsync --inplace from the remote if possible, so
> that only the changed/appended data gets stored, and then using native btrfs
> snapshotting would seem the most effective approach.
>
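Putting those two steps together, a sketch of such a nightly run; the layout
(a working subvolume /mnt/backup/current plus dated read-only snapshots next
to it, named like the example above) is an assumption, not anything btrfs
prescribes:

  # update the working copy in place, so unchanged blocks keep their extents
  rsync --inplace --partial remote:/data/file1.log /mnt/backup/current/

  # freeze tonight's state as a read-only snapshot; only metadata is written
  btrfs subvolume snapshot -r /mnt/backup/current \
      /mnt/backup/Snapshot_$(date +%d_%m_%Y)
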
Or use UrBackup as backup software. It uses the snapshot-then-modify
approach with btrfs, plus you get file-level deduplication between
clients using reflinks.

