Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-14 Thread Paul Slootman via rsync
On Thu 14 Feb 2019, Delian Krustev via rsync wrote:
> On Wednesday, February 13, 2019 6:25:59 PM EET Remi Gauvin wrote:
> > If the --inplace delta is as large as the filesize, then the
> > structure/location of the data has changed enough that the whole file
> > would have to be written out in any case.
> 
> This is not the case.
> If you look at my original post, you will see that the delta transfer
> finds only about 20 MB of differences within the almost 2 GB of data.

I think you're missing the point of Remi's message.

Say the original file is:

ABCDEFGHIJ

The new file is:

XABCDEFGHI

Then the delta is just 10%, but the entire file needs to be rewritten because
every byte has moved to a new offset.
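
The example can be checked with a few lines of Python. This is only a
sketch: it treats each letter as one block, and the variable names are made
up for illustration.

```python
# Compare the two layouts from the example, position by position.
old = b"ABCDEFGHIJ"
new = b"XABCDEFGHI"

# Blocks that occur in the new file but nowhere in the old one:
# this is the data that has to be sent.
new_data = [bytes([b]) for b in new if b not in old]

# Offsets whose content differs between old and new:
# this is what has to be rewritten when the file is updated in place.
changed_offsets = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]

print(f"new data: {len(new_data)} of {len(new)} blocks")
print(f"offsets to rewrite: {len(changed_offsets)} of {len(new)}")
```

Only "X" is new, yet no offset keeps its old content, so an in-place
update touches the whole file.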


Paul

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Delian Krustev via rsync
On Wednesday, February 13, 2019 6:25:59 PM EET Remi Gauvin wrote:
> If the --inplace delta is as large as the filesize, then the
> structure/location of the data has changed enough that the whole file
> would have to be written out in any case.

This is not the case.
If you look at my original post, you will see that the delta transfer
finds only about 20 MB of differences within the almost 2 GB of data.

The problem with --inplace without --backup-dir is that delta transfers can no
longer work efficiently.


Cheers
--
Delian


Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Delian Krustev via rsync
On Wednesday, February 13, 2019 6:20:13 PM EET Remi Gauvin via rsync wrote:
> Have you run nilfs-clean before checking this free space comparison?
> Maybe there is just large amplification created by rsync's many small
> writes when using --inplace.

nilfs-clean is suspended for the duration of the backup. It would have idled
anyway if the fullness threshold of the FS (90% by default) had not been
reached.

The problem is probably that these mysqldump files have changed data near the
beginning of the files, which shifts everything after it, so all later blocks
have to be overwritten. To avoid this, "rsync" would have to allocate and
deallocate space in the middle of the file:

  http://man7.org/linux/man-pages/man2/fallocate.2.html

Unfortunately, the respective syscalls are not portable, are quite new, and
are filesystem specific.

It would have been nice to have these for all OSes and filesystems, though,
and better yet without having to align on the FS block size. E.g.:

  - give me 5 new blocks in the middle of file F starting at POS
  - do not use the entire last block of these 5 but rather only X bytes of it.

or
  - replace block 5 with "this" partial block data
  - truncate blocks 6 to 20

I can see uses for them in many application workflows, from text editors
through databases to backup software.
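
The difference such an operation would make can be sketched with a toy
model in Python. Everything here is hypothetical: the file is modelled as a
list of block references, and insert_range() only mirrors the idea behind
FALLOC_FL_INSERT_RANGE from fallocate(2). It is not a real API and it
ignores the block alignment rules of the real call.

```python
# Toy model: a file is a list of block references (a simplified extent map).

def overwrite_in_place(old_blocks, new_blocks):
    """Count the blocks that must be written when offsets cannot shift."""
    written = 0
    for i, block in enumerate(new_blocks):
        if i >= len(old_blocks) or old_blocks[i] != block:
            written += 1
    return written

def insert_range(old_blocks, pos, inserted):
    """Hypothetical splice: new blocks go in at pos, existing ones stay put."""
    result = old_blocks[:pos] + inserted + old_blocks[pos:]
    return result, len(inserted)

old = ["hdr", "row1", "row2", "row3", "row4"]
new = ["hdr", "rowX", "row1", "row2", "row3", "row4"]

in_place_writes = overwrite_in_place(old, new)
spliced, splice_writes = insert_range(old, 1, ["rowX"])

print("blocks written in place:", in_place_writes)
print("blocks written with an insert:", splice_writes)
print("results match:", spliced == new)
```

In this model one inserted block near the start costs five block writes in
place, but only one if the filesystem can shift the remaining blocks in its
own metadata.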


Cheers
--
Delian



Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Remi Gauvin via rsync
On 2019-02-13 10:47 a.m., Delian Krustev via rsync wrote:
>
> 
> Free space at the beginning and end of the backup:
> Filesystem      1M-blocks  Used Available Use% Mounted on
> /dev/mapper/bkp    102392 76872     20400  80% /mnt/bkp
> /dev/mapper/bkp    102392 78768     18504  81% /mnt/bkp
> 
> 
> 
> As can be seen "rsync" has sent about 20M and received 300K of data. However 
> the filesystem has allocated almost 2G, which is the total size of the files 
> being backed up.
> 
> The filesystem mounted on "/mnt/bkp" is of type "nilfs2", which is a log 
> structured filesystem. I'm using its snapshotting feature to keep backups for 
> past dates.


Have you run nilfs-clean before checking this free space comparison?
Maybe there is just large amplification created by rsync's many small
writes when using --inplace.
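
To put a rough number on the possible amplification, the figures posted in
this thread can be compared directly. This is a sketch only: the values are
copied from the rsync --stats output and the df listing, df's 1M-blocks are
assumed to be MiB, and the ratio says nothing about what actually causes
the extra allocation.

```python
MIB = 1024 * 1024

literal_bytes = 21_421_681     # "Literal data" from rsync --stats
matched_bytes = 1_923_101_023  # "Matched data" from rsync --stats
total_bytes = 1_944_522_704    # "Total file size" from rsync --stats
used_before_mib = 76_872       # df "Used" at the beginning of the backup
used_after_mib = 78_768        # df "Used" at the end of the backup

# Literal plus matched data should account for every byte of the files.
assert literal_bytes + matched_bytes == total_bytes

allocated_mib = used_after_mib - used_before_mib
literal_mib = literal_bytes / MIB
total_mib = total_bytes / MIB
ratio = allocated_mib / literal_mib

print(f"literal data:        {literal_mib:.1f} MiB")
print(f"total file size:     {total_mib:.1f} MiB")
print(f"newly allocated:     {allocated_mib} MiB")
print(f"allocated / literal: {ratio:.0f}x")
```

The newly allocated space is close to the total file size rather than to
the literal data, which is the discrepancy being discussed.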


Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Remi Gauvin via rsync
On 2019-02-13 5:26 p.m., Delian Krustev via rsync wrote:

> 
> The copy is needed for the comparison of the blocks as "--inplace" overwrites 
> the destination file. I've tried without "--backup" but then the delta 
> transfers too much data - close to the size of the backed-up files.
> 

It's cool that --backup can be used as source data that way, a feature I
was unaware of, but I think you found the cause of your problem right
here as well.

If the --inplace delta is as large as the filesize, then the
structure/location of the data has changed enough that the whole file
would have to be written out in any case.




Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Kevin Korb via rsync
It can't do what you want.  The closest thing would be --compare-dest.

On 2/13/19 5:26 PM, Delian Krustev wrote:
> On Wednesday, February 13, 2019 11:29:44 AM EET Kevin Korb via rsync wrote:
>> With --backup in order to end up with 2 files it has to write out a
>> whole new file.
>> Sure, it only sent the differences (normally that means
>> over the network but there is no network here) but the writing end was
>> told to duplicate the file being updated before updating it.
> 
> The copy is needed for the comparison of the blocks as "--inplace" overwrites 
> the destination file. I've tried without "--backup" but then the delta 
> transfers too much data - close to the size of the backed-up files.
> 
> The copy is in a temp file system which is discarded after the backup (by
> "rm -rf"). This temp filesystem is not log structured or copy-on-write, so
> having a copy there is not a big problem. What I want, though, is not a
> backup of all modified files but rather a TMPDIR.
> 
> The ideal workflow would be to compare SRC and DST and write changed blocks
> to the TMPDIR, then read them from TMPDIR and apply them to DST.
> 
> 
> 
>  
> Cheers
> --
> Delian
> 

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
Kevin Korb                  Phone:    (407) 252-6853
Systems Administrator       Internet:
FutureQuest, Inc.           ke...@futurequest.net  (work)
Orlando, Florida            k...@sanitarium.net (personal)
Web page:                   https://sanitarium.net/
PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,




Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Delian Krustev via rsync
On Wednesday, February 13, 2019 11:29:44 AM EET Kevin Korb via rsync wrote:
> With --backup in order to end up with 2 files it has to write out a
> whole new file.
> Sure, it only sent the differences (normally that means
> over the network but there is no network here) but the writing end was
> told to duplicate the file being updated before updating it.

The copy is needed for the comparison of the blocks as "--inplace" overwrites 
the destination file. I've tried without "--backup" but then the delta 
transfers too much data - close to the size of the backed-up files.

The copy is in a temp file system which is discarded after the backup (by
"rm -rf"). This temp filesystem is not log structured or copy-on-write, so
having a copy there is not a big problem. What I want, though, is not a
backup of all modified files but rather a TMPDIR.

The ideal workflow would be to compare SRC and DST and write changed blocks
to the TMPDIR, then read them from TMPDIR and apply them to DST.
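
A minimal Python sketch of that workflow, under loud assumptions: it is not
how rsync works, it compares blocks only at identical offsets (so data that
has shifted is treated as changed), the 4096-byte block size is arbitrary,
and an in-memory list stands in for the TMPDIR. The function names are
invented for illustration.

```python
import os
import tempfile

BLOCK = 4096  # assumed block size, not taken from the thread

def changed_blocks(src_path, dst_path, block=BLOCK):
    """Yield (offset, data) for each SRC block that differs from DST at the same offset."""
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        offset = 0
        while True:
            s = src.read(block)
            if not s:
                break
            d = dst.read(block)
            if s != d:
                yield offset, s
            offset += len(s)

def apply_blocks(dst_path, blocks, final_size):
    """Write only the staged blocks into DST, then set its final length."""
    with open(dst_path, "r+b") as dst:
        for offset, data in blocks:
            dst.seek(offset)
            dst.write(data)
        dst.truncate(final_size)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.sql")
    dst = os.path.join(tmp, "dst.sql")
    with open(dst, "wb") as f:
        f.write(b"a" * BLOCK * 4)
    with open(src, "wb") as f:
        f.write(b"a" * BLOCK * 2 + b"b" * BLOCK + b"a" * BLOCK)

    # Stage all changes first (the TMPDIR step), then touch DST.
    staged = list(changed_blocks(src, dst))
    apply_blocks(dst, staged, os.path.getsize(src))

    with open(src, "rb") as a, open(dst, "rb") as b:
        identical = a.read() == b.read()

print("blocks written:", len(staged), "identical:", identical)
```

Staging the changed blocks before writing means DST is never read after it
has been partly overwritten, which is the job the copy in the backup
directory does in the rsync run.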



 
Cheers
--
Delian




Re: rsync rewrites all blocks of large files although it uses delta transfer

2019-02-13 Thread Kevin Korb via rsync
With --backup in order to end up with 2 files it has to write out a
whole new file.  Sure, it only sent the differences (normally that means
over the network but there is no network here) but the writing end was
told to duplicate the file being updated before updating it.

On 2/13/19 10:47 AM, Delian Krustev via rsync wrote:
>   Hi All,
> 
> For backup purposes I'm trying to transfer only the changed blocks of
> large files. Thus I've run "rsync" with the appropriate options:
> 
>   # Scratch directory that receives the pre-update copies of changed files
>   RSYNC_BKPDIR=$(mktemp -d)
>   rsync \
>     --archive \
>     --no-whole-file \
>     --inplace \
>     --backup \
>     --backup-dir="$RSYNC_BKPDIR" \
>     --verbose \
>     --stats \
>     /var/backups/mysql-dbs/. \
>     /mnt/bkp/var/backups/mysql-dbs/.
> 
> The problem is that although "rsync" shows that delta transfer is used (when
> run with -vv) and only a small amount of data is transferred, the target
> files appear to be overwritten in full.
> 
> Here is the output of "rsync" and some more debugging info:
> 
> 
> 
> sending incremental file list
> ./
> horde.data.sql
> horde.schema.sql
> LARGEDB.data.sql
> LARGEDB.schema.sql
> mysql.data.sql
> mysql.schema.sql
> phpmyadmin.data.sql
> phpmyadmin.schema.sql
> 
> Number of files: 9 (reg: 8, dir: 1)
> Number of created files: 0
> Number of deleted files: 0
> Number of regular files transferred: 8
> Total file size: 1,944,522,704 bytes
> Total transferred file size: 1,944,522,704 bytes
> Literal data: 21,421,681 bytes
> Matched data: 1,923,101,023 bytes
> File list size: 0
> File list generation time: 0.001 seconds
> File list transfer time: 0.000 seconds
> Total bytes sent: 21,612,218
> Total bytes received: 323,302
> 
> sent 21,612,218 bytes  received 323,302 bytes  259,591.95 bytes/sec
> total size is 1,944,522,704  speedup is 88.65
> 
> # du -m 1.9G /tmp/tmp.8gBzjNQOQZ
> 1.9G /tmp/tmp.8gBzjNQOQZ
> 
> # tree -a /tmp/tmp.8gBzjNQOQZ
> /tmp/tmp.8gBzjNQOQZ
> ├── horde.data.sql
> ├── horde.schema.sql
> ├── LARGEDB.data.sql
> ├── LARGEDB.schema.sql
> ├── mysql.data.sql
> ├── mysql.schema.sql
> ├── phpmyadmin.data.sql
> └── phpmyadmin.schema.sql
> 
> 0 directories, 8 files
> 
> Free space at the beginning and end of the backup:
> Filesystem      1M-blocks  Used Available Use% Mounted on
> /dev/mapper/bkp    102392 76872     20400  80% /mnt/bkp
> /dev/mapper/bkp    102392 78768     18504  81% /mnt/bkp
> 
> 
> 
> As can be seen "rsync" has sent about 20M and received 300K of data. However 
> the filesystem has allocated almost 2G, which is the total size of the files 
> being backed up.
> 
> The filesystem mounted on "/mnt/bkp" is of type "nilfs2", which is a log 
> structured filesystem. I'm using its snapshotting feature to keep backups for 
> past dates.
> 
> 
> Is there anything that can be done to make "rsync" overwrite only the
> changed blocks?
> 
> 
> 
> 
> P.S. I guess that it will be the same for copy-on-write filesystems, e.g. 
> BTRFS or ZFS.
> 
> 
> 
> Cheers
> --
> Delian
> 

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
Kevin Korb                  Phone:    (407) 252-6853
Systems Administrator       Internet:
FutureQuest, Inc.           ke...@futurequest.net  (work)
Orlando, Florida            k...@sanitarium.net (personal)
Web page:                   https://sanitarium.net/
PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,


