Re: delta copies

2007-09-24 Thread Fabian Cenedese

Not necessarily.  Depending on how pg_dump works, it could be that
small changes to the database are resulting in unnecessarily large
changes to the dump.  Make sure you are using the uncompressed format
because most compression algorithms defeat the delta-transfer
algorithm almost completely.  Then you might take a look at two
consecutive dumps and check whether records common to both appear in
the same order in each dump.  (If pg_dump is dumping the records in a
different order each time, that would also defeat the delta-transfer
algorithm because no block of several consecutive records could be
matched.)

I was wondering if it would be possible to add a switch (probably coupled
to -v(+) ) that would report the number of matched blocks per file. Maybe
even with the offset of the block. That of course wouldn't help much in a
production environment, but it could help while setting up the whole
backup strategy. This would have helped the original poster find
out whether rsync is working correctly.

bye  Fabi


-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: delta copies

2007-09-24 Thread Matt McCutchen
On 9/24/07, Fabian Cenedese [EMAIL PROTECTED] wrote:
 I was wondering if it would be possible to add a switch (probably coupled
 to -v(+) ) that would report the number of matched blocks per file. Maybe
 even with the offset of the block.

Rsync already lists all matched blocks by their offsets and lengths if
given -vvv.

Matt


Re: delta copies

2007-09-24 Thread Fabian Cenedese
At 07:51 24.09.2007 -0400, Matt McCutchen wrote:
On 9/24/07, Fabian Cenedese [EMAIL PROTECTED] wrote:
 I was wondering if it would be possible to add a switch (probably coupled
 to -v(+) ) that would report the number of matched blocks per file. Maybe
 even with the offset of the block.

Rsync already lists all matched blocks by their offsets and lengths if
given -vvv.

Right, sorry about that. I didn't find it because the manual doesn't state
what info is given for more than two v's, only that they are used for
debugging :)

bye  Fabi




Re: delta copies

2007-09-23 Thread Robert Fitzpatrick
On Sun, 2007-09-23 at 00:56 -0400, Matt McCutchen wrote:
 On 9/22/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
 Either the delta transfer algorithm is not being used due to a
 misconfiguration, or the pgsql backups are changing in a perverse way
 that prevents it from matching any data.  Please send the command line
 and output for the remote transfers the same way you did for the local
 test so I can see what is happening.
 

Doing some more testing this morning, maybe what you suggested about the
pgsql backup is what is happening. I took a closer look and realized that
my sql backup is actually smaller some days than the destination. Will
that affect delta copies? This is because the db is a mail cache and
items expire each night and get purged from the database, which is then
vacuumed. So the size stays roughly the same, but fluctuates: sometimes
larger, sometimes smaller than the destination copy.

I noticed this when I tried a simple pg_dump of a single db on a test
machine here in our lab to the same destination server, and it all worked
great. So I rsynced that same sql backup to the problem source server and
tried from there and, wouldn't you know it, it worked great.

I'm doing a fresh test now of my problematic backup to be sure it is
carried out exactly the same way as my successful backup, except of course
for the db being backed up. If it is still a problem and you don't think
the size is an issue, I'll send you the results as you requested.

Thanks for all the help!

-- 
Robert



Re: delta copies

2007-09-23 Thread Matt McCutchen
On 9/23/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
 Doing some more testing this morning, maybe what you suggested about the
 pgsql backup is what is happening. I took a closer look and realized that
 my sql backup is actually smaller some days than the destination, will
 that affect delta copies?

No, rsync will match as much data as it can regardless of the relative
sizes of the source and old destination files.

Matt


Re: delta copies

2007-09-23 Thread Matt McCutchen
On 9/23/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
 On Sun, 2007-09-23 at 00:56 -0400, Matt McCutchen wrote:
  Either the delta transfer algorithm is not being used due to a
  misconfiguration, or the pgsql backups are changing in a perverse way
  that prevents it from matching any data.

 Doing some more testing this morning, maybe what you suggested about the
 pgsql backup is what is happening.

The perverse way I mentioned would be along the lines of updating a
set of timestamps that appear every few hundred bytes in the backup
file, regardless of how many values in the database have actually
changed.  This would prevent any of the blocks into which rsync splits
the old destination file from matching the source file.
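
To see the effect concretely, here is a minimal Python sketch of block
matching. It is a simplification and not rsync's actual code: it hashes
every window with SHA-256 instead of using a rolling checksum, and the
700-byte block size, the record layout in make_dump, and all function
names are assumptions made up for illustration.

```python
import hashlib


def matched_bytes(old: bytes, new: bytes, block: int = 700) -> int:
    """Count bytes of `new` that can be copied from fixed-size blocks of `old`."""
    # Signatures of the old file's blocks, cut at multiples of `block`.
    sigs = {
        hashlib.sha256(old[i:i + block]).digest()
        for i in range(0, len(old) - block + 1, block)
    }
    matched = 0
    pos = 0
    # Try every offset in the new file, as a rolling checksum would allow.
    while pos + block <= len(new):
        if hashlib.sha256(new[pos:pos + block]).digest() in sigs:
            matched += block
            pos += block  # skip past the matched block
        else:
            pos += 1  # this byte would have to be sent as literal data
    return matched


def make_dump(stamp: str, rows: int = 200, changed_row: int = -1) -> bytes:
    """Hypothetical dump: one record per line, each carrying a timestamp."""
    lines = []
    for n in range(rows):
        payload = ("y" if n == changed_row else "x") * 180
        lines.append(f"{n:06d}|{stamp}|{payload}\n")
    return "".join(lines).encode()


old = make_dump("2007-09-23 14:14")
one_row_changed = make_dump("2007-09-23 14:14", changed_row=100)
all_restamped = make_dump("2007-09-23 15:55")

print(len(old), "bytes in the old dump")
print(matched_bytes(old, one_row_changed), "bytes matched, one record changed")
print(matched_bytes(old, all_restamped), "bytes matched, every record restamped")
```

In this toy layout each record is about 200 bytes, so every 700-byte
block contains at least one timestamp. Changing the timestamp alone
leaves nothing to match, while changing one record's payload costs only
the block that contains it.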

Matt


Re: delta copies

2007-09-23 Thread WebTent
On Sun, 2007-09-23 at 12:09 -0400, Matt McCutchen wrote:
 On 9/23/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
  On Sun, 2007-09-23 at 00:56 -0400, Matt McCutchen wrote:
   Either the delta transfer algorithm is not being used due to a
   misconfiguration, or the pgsql backups are changing in a perverse way
   that prevents it from matching any data.
 
  Doing some more testing this morning, maybe what you suggested about the
  pgsql backup is what is happening.
 
 The perverse way I mentioned would be along the lines of updating a
 set of timestamps that appear every few hundred bytes in the backup
 file, regardless of how many values in the database have actually
 changed.  This would prevent any of the blocks into which rsync splits
 the old destination file from matching the source file.

Well, I am getting matched data, but it just doesn't seem to be matching
very much considering the small change in file size. I tested one dump
after another, rsyncing in between dumps, and got very little matched
data :(

esmtp# ls -la maia.sql
-rw-r--r--  1 root  wheel  997960610 Sep 23 14:14 maia.sql
mx1# pg_dump -Fc -Upgsql maia > maia.sql
mx1# ls -la maia.sql 
-rw-r--r--  1 root  wheel  999709040 Sep 23 15:55 maia.sql
mx1# rsync -az --stats --progress data/maia.sql esmtp:/data/backup/mx1.webtent.net/db/data/
building file list ... 
1 file to consider
maia.sql
   999709040 100%  369.39kB/s    0:44:02 (xfer#1, to-check=0/1)

Number of files: 1
Number of files transferred: 1
Total file size: 999709040 bytes
Total transferred file size: 999709040 bytes
Literal data: 987800910 bytes
Matched data: 11908130 bytes
File list size: 35
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 987566347
Total bytes received: 221228

sent 987566347 bytes  received 221228 bytes  371000.03 bytes/sec
total size is 999709040  speedup is 1.01

Since this database is very active as a mail cache, I guess it is
changing more data than it seems.

-- 
Robert



Re: delta copies

2007-09-23 Thread Matt McCutchen
On 9/23/07, WebTent [EMAIL PROTECTED] wrote:
 Well, I am getting matched data, but it just doesn't seem to be matching
 very much considering the small change in file size. I tested one dump
 after another rsyncing in between dumps and got very little matched
 data :(

 mx1# pg_dump -Fc -Upgsql maia > maia.sql

 Total transferred file size: 999709040 bytes
 Literal data: 987800910 bytes
 Matched data: 11908130 bytes

 Since this database is very active as a mail cache, I guess it is
 changing more data than it seems.

Not necessarily.  Depending on how pg_dump works, it could be that
small changes to the database are resulting in unnecessarily large
changes to the dump.  Make sure you are using the uncompressed format
because most compression algorithms defeat the delta-transfer
algorithm almost completely.  Then you might take a look at two
consecutive dumps and check whether records common to both appear in
the same order in each dump.  (If pg_dump is dumping the records in a
different order each time, that would also defeat the delta-transfer
algorithm because no block of several consecutive records could be
matched.)
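
Both checks can be tried on synthetic data first. The Python sketch
below is only an illustration under stated assumptions: zlib stands in
for whatever compression the dump format uses, the records are random
text, and the helper names shared_fraction and same_relative_order are
hypothetical. It measures how many 64-byte blocks of an old dump still
occur in a new one, with and without compression, and tests whether the
records common to both dumps keep their order.

```python
import random
import zlib


def shared_fraction(old: bytes, new: bytes, block: int = 64) -> float:
    """Fraction of `old`'s fixed-size blocks that occur anywhere in `new`."""
    blocks = [old[i:i + block] for i in range(0, len(old) - block + 1, block)]
    if not blocks:
        return 0.0
    return sum(1 for b in blocks if b in new) / len(blocks)


def same_relative_order(old_records: list, new_records: list) -> bool:
    """True if records present in both dumps appear in the same order.

    Assumes each record is unique within its dump.
    """
    common = set(old_records) & set(new_records)
    return ([r for r in old_records if r in common]
            == [r for r in new_records if r in common])


rng = random.Random(0)  # fixed seed so the run is repeatable
letters = "abcdefghijklmnopqrstuvwxyz"
records = [
    f"{n}\t" + "".join(rng.choice(letters) for _ in range(60)) + "\n"
    for n in range(2000)
]
# Simulate a nightly purge: ten early records expire, the rest are untouched.
survivors = records[:5] + records[15:]

old_plain = "".join(records).encode()
new_plain = "".join(survivors).encode()

plain = shared_fraction(old_plain, new_plain)
packed = shared_fraction(zlib.compress(old_plain), zlib.compress(new_plain))
print(f"plain text dumps share {plain:.0%} of blocks")
print(f"zlib-compressed dumps share {packed:.0%} of blocks")
print("common records in same order:", same_relative_order(records, survivors))
```

On real dumps the same idea applies: read two consecutive plain-text
dumps line by line and pass the lines to same_relative_order, keeping in
mind that repeated lines (blank lines, for example) violate the
uniqueness assumption and should be filtered out first.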

Matt


Re: delta copies

2007-09-23 Thread WebTent
On Sun, 2007-09-23 at 17:46 -0400, Matt McCutchen wrote:
 On 9/23/07, WebTent [EMAIL PROTECTED] wrote:
 Not necessarily.  Depending on how pg_dump works, it could be that
 small changes to the database are resulting in unnecessarily large
 changes to the dump.  Make sure you are using the uncompressed format
 because most compression algorithms defeat the delta-transfer
 algorithm almost completely.  Then you might take a look at two
 consecutive dumps and check whether records common to both appear in
 the same order in each dump.  (If pg_dump is dumping the records in a
 different order each time, that would also defeat the delta-transfer
 algorithm because no block of several consecutive records could be
 matched.)

Yeah, I understand what you're saying. What blows me away is that I have
cwRsync keeping two MSSQL backups in sync and this works great. Microsoft
SQL Server backup makes a binary compressed backup, yet it matches data
very well and later transfers take a fraction of the initial transfer.
But this larger pgsql compressed backup is so different. I think I've
already tried a plain text backup, but I will run it tonight, and if it
still doesn't match any better, I'll see if I can find out why on the
PostgreSQL list this week.

Thanks again for the insight!

-- 
Robert



Re: delta copies

2007-09-22 Thread Matt McCutchen
On 9/22/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
 I do a
 simple tar of /var twice and then rsync the two files only as a test, no
 folders involved, trying to determine why the entire file is being
 copied in its entirety and not synchronized bit-by-bit.

 esmtp# rsync -az --progress --stats test/ test2/
 building file list ...
 2 files to consider
 ./
 testvar.tar
   1298128896 100%   13.00MB/s    0:01:35  (1, 100.0% of 2)

 Number of files: 2
 Number of files transferred: 1
 Total file size: 1298128896 bytes
 Total transferred file size: 1298128896 bytes
 Literal data: 1298128896 bytes
 Matched data: 0 bytes
 File list size: 53
 File list generation time: 0.001 seconds
 File list transfer time: 0.000 seconds
 Total bytes sent: 178178479
 Total bytes received: 48

 sent 178178479 bytes  received 48 bytes  1846409.61 bytes/sec
 total size is 1298128896  speedup is 7.29

Delta transfers reduce network traffic between the sending and
receiving rsync processes at the cost of some extra CPU time and disk
I/O (e.g., the receiver has to read the old destination file).  The
reduction in network traffic is only relevant if the two processes are
on different machines.  When they are on the same machine as in your
example, there is no point in doing delta transfers, so by default
rsync does not do them.  You can force rsync to do delta transfers by
passing --no-whole-file, but I can't imagine why you would want to do
this unless you are testing to see how much traffic reduction delta
transfers would achieve.

 Also, I assume the "Matched data" bytes show the amount
 matched when making delta copies?

Yes.  More precisely, it shows the amount that rsync found to be
matched.  Rsync's goal is to match as much data as convenient, not to
give you an exact measure of how much of the file changed.
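
As a rough guide to reading the --stats output, the small Python sketch
below recomputes two figures from the numbers in the local test quoted
above. The function names are made up for illustration, and the speedup
formula (total file size divided by bytes sent plus bytes received) is
an assumption that happens to agree with the output shown.

```python
def matched_fraction(literal: int, matched: int) -> float:
    """Share of the transferred file that rsync rebuilt from the old copy."""
    return matched / (literal + matched)


def speedup(total_size: int, sent: int, received: int) -> float:
    """Total file size divided by the bytes that crossed the connection."""
    return total_size / (sent + received)


# Figures copied from the local tar test quoted above.
print(f"matched: {matched_fraction(1298128896, 0):.2%}")
print(f"speedup: {speedup(1298128896, 178178479, 48):.2f}")
```

With zero matched data, the speedup above 1 in that run presumably
comes from the -z compression alone, not from the delta-transfer
algorithm.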

Matt


Re: delta copies

2007-09-22 Thread Robert Fitzpatrick
On Sat, 2007-09-22 at 20:00 -0400, Matt McCutchen wrote:
 On 9/22/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
 Delta transfers reduce network traffic between the sending and
 receiving rsync processes at the cost of some extra CPU time and disk
 I/O (e.g., the receiver has to read the old destination file).  The
 reduction in network traffic is only relevant if the two processes are
 on different machines.  When they are on the same machine as in your
 example, there is no point in doing delta transfers, so by default
 rsync does not do them.  You can force rsync to do delta transfers by
 passing --no-whole-file, but I can't imagine why you would want to do
 this unless you are testing to see how much traffic reduction delta
 transfers would achieve.

Yes, of course, I want to use this for remote transfers. But since I
was not able to get it to work with our remote transfers, I was trying
to do it locally to speed up testing. Are you saying that rsync knows
the difference and will enable/disable delta transfers depending on
whether the transfer is local or not? However, again, I get the same
result doing remote transfers. Thanks for the tip, I will try some more
testing tomorrow using the --no-whole-file option...

-- 
Robert



Re: delta copies

2007-09-22 Thread Matt McCutchen
On 9/22/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
 Are you saying that rsync knows
 the difference and will enable/disable delta transfers depending on
 whether local or not?

Yes.

 However, again, I get the same result doing remote
 transfers.

Either the delta transfer algorithm is not being used due to a
misconfiguration, or the pgsql backups are changing in a perverse way
that prevents it from matching any data.  Please send the command line
and output for the remote transfers the same way you did for the local
test so I can see what is happening.

Matt