Re: delta copies
> Not necessarily. Depending on how pg_dump works, it could be that small changes to the database are resulting in unnecessarily large changes to the dump. Make sure you are using the uncompressed format because most compression algorithms defeat the delta-transfer algorithm almost completely. Then you might take a look at two consecutive dumps and check whether records common to both appear in the same order in each dump. (If pg_dump is dumping the records in a different order each time, that would also defeat the delta-transfer algorithm because no block of several consecutive records could be matched.)

I was wondering if it would be possible to add a switch (probably coupled to -v(+)) that would report the number of matched blocks per file, maybe even with the offset of each block. That of course wouldn't help much in a production environment, but it could help while setting up the whole backup strategy. It would have helped the original poster find out whether rsync is working correctly.

bye Fabi

--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: delta copies
On 9/24/07, Fabian Cenedese [EMAIL PROTECTED] wrote:
> I was wondering if it would be possible to add a switch (probably coupled to -v(+) ) that would report the number of matched blocks per file. Maybe even with the offset of the block.

Rsync already lists all matched blocks by their offsets and lengths if given -vvv.

Matt
Re: delta copies
At 07:51 24.09.2007 -0400, Matt McCutchen wrote:
> On 9/24/07, Fabian Cenedese [EMAIL PROTECTED] wrote:
> > I was wondering if it would be possible to add a switch (probably coupled to -v(+) ) that would report the number of matched blocks per file. Maybe even with the offset of the block.
>
> Rsync already lists all matched blocks by their offsets and lengths if given -vvv.

Right, sorry about that. I didn't find it because the manual doesn't state what info is given for more than 2 v's, only used for debugging :)

bye Fabi
Re: delta copies
On Sun, 2007-09-23 at 00:56 -0400, Matt McCutchen wrote:
> On 9/22/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
>
> Either the delta transfer algorithm is not being used due to a misconfiguration, or the pgsql backups are changing in a perverse way that prevents it from matching any data. Please send the command line and output for the remote transfers the same way you did for the local test so I can see what is happening.

Doing some more testing this morning, maybe what you suggested about the pgsql backup is what is happening. I took a closer look and realize that my sql backup is actually smaller some days than the destination, will that affect delta copies? This is because the db is a mail cache and items expire each night and get purged from the database, which is then vacuumed. So the size will stay relatively the same, but will fluctuate, sometimes larger and sometimes smaller than the destination copy.

I noticed this when I tried a simple pg_dump of a single db on a test machine here in our lab to the same destination server, and it all works great. So I rsynced that same sql backup to the problem source server and tried from there and, wouldn't you know it, it worked great. I'm doing a fresh test now of my problematic backup to be sure it is carried out exactly the same way as my successful backup, except for the db that is being backed up of course. If it is still a problem and you don't think the size is an issue, I'll send you the results as you requested.

Thanks for all the help!

-- Robert
Re: delta copies
On 9/23/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
> Doing some more testing this morning, maybe what you suggested about the pgsql backup is what is happening. I took a closer look and realize that my sql backup is actually smaller some days than the destination, will that affect delta copies?

No, rsync will match as much data as it can regardless of the relative sizes of the source and old destination files.

Matt
Re: delta copies
On 9/23/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
> On Sun, 2007-09-23 at 00:56 -0400, Matt McCutchen wrote:
> > Either the delta transfer algorithm is not being used due to a misconfiguration, or the pgsql backups are changing in a perverse way that prevents it from matching any data.
>
> Doing some more testing this morning, maybe what you suggested about the pgsql backup is what is happening.

The perverse way I mentioned would be along the lines of updating a set of timestamps that appear every few hundred bytes in the backup file, regardless of how many values in the database have actually changed. This would prevent any of the blocks into which rsync splits the old destination file from matching the source file.

Matt
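The timestamp scenario described above can be illustrated with a minimal Python sketch. It is not rsync's implementation: it only mimics the idea of splitting the old file into fixed-size blocks and looking for each block at any offset in the new file. The 700-byte block size, the row layout, and the helper names `matched_fraction` and `make_rows` are assumptions made for the illustration.

```python
import random

BLOCK_SIZE = 700  # assumed for illustration; rsync chooses its own block size


def matched_fraction(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Fraction of fixed-size blocks of `old` that occur at any offset in `new`."""
    blocks = [old[i:i + block_size]
              for i in range(0, len(old) - block_size + 1, block_size)]
    if not blocks:
        return 0.0
    return sum(block in new for block in blocks) / len(blocks)


def make_rows(stamp: str, count: int = 500, seed: int = 1) -> list[bytes]:
    """Hypothetical dump rows: id, a per-dump timestamp, 180 hex chars of payload."""
    rng = random.Random(seed)
    return [f"{i:06d}|{stamp}|{rng.getrandbits(720):0180x}\n".encode()
            for i in range(count)]


old_rows = make_rows("2007-09-23")
old_dump = b"".join(old_rows)

# Case 1: a single row is added in the middle; everything else is unchanged.
grown_rows = old_rows[:250] + [b"999999|2007-09-23|new row\n"] + old_rows[250:]
one_row_added = matched_fraction(old_dump, b"".join(grown_rows))

# Case 2: identical payloads, but the stamp inside every row is rewritten.
restamped = matched_fraction(old_dump, b"".join(make_rows("2007-09-24")))

print(f"one row added:       {one_row_added:.1%} of blocks matched")
print(f"every row restamped: {restamped:.1%} of blocks matched")
```

In the first case only the block that straddles the insertion point fails to match. In the second case every block contains at least one rewritten stamp, so nothing matches even though the payload data is identical.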
Re: delta copies
On Sun, 2007-09-23 at 12:09 -0400, Matt McCutchen wrote:
> On 9/23/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
> > On Sun, 2007-09-23 at 00:56 -0400, Matt McCutchen wrote:
> > > Either the delta transfer algorithm is not being used due to a misconfiguration, or the pgsql backups are changing in a perverse way that prevents it from matching any data.
> >
> > Doing some more testing this morning, maybe what you suggested about the pgsql backup is what is happening.
>
> The perverse way I mentioned would be along the lines of updating a set of timestamps that appear every few hundred bytes in the backup file, regardless of how many values in the database have actually changed. This would prevent any of the blocks into which rsync splits the old destination file from matching the source file.

Well, I am getting matched data, but it just doesn't seem to be matching very much considering the small change in file size. I tested one dump after another, rsyncing in between dumps, and got very little matched data :(

esmtp# ls -la maia.sql
-rw-r--r--  1 root  wheel  997960610 Sep 23 14:14 maia.sql

mx1# pg_dump -Fc -Upgsql maia > maia.sql
mx1# ls -la maia.sql
-rw-r--r--  1 root  wheel  999709040 Sep 23 15:55 maia.sql
mx1# rsync -az --stats --progress data/maia.sql esmtp:/data/backup/mx1.webtent.net/db/data/
building file list ...
1 file to consider
maia.sql
   999709040 100%  369.39kB/s    0:44:02 (xfer#1, to-check=0/1)

Number of files: 1
Number of files transferred: 1
Total file size: 999709040 bytes
Total transferred file size: 999709040 bytes
Literal data: 987800910 bytes
Matched data: 11908130 bytes
File list size: 35
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 987566347
Total bytes received: 221228

sent 987566347 bytes  received 221228 bytes  371000.03 bytes/sec
total size is 999709040  speedup is 1.01

Since this database is very active as a mail cache, I guess it is changing more data than it seems.

-- Robert
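To put the Literal data and Matched data counters from the --stats report above into proportion, here is a small Python sketch that parses the "Name: N bytes" lines and computes the matched share. The sample text is copied from the report in this message; the function name `parse_stats` is made up for the example, and tolerating digit-group separators is an assumption about how other rsync versions may print the numbers.

```python
import re

# Counters copied from the --stats report quoted in this thread.
SAMPLE_STATS = """\
Total file size: 999709040 bytes
Total transferred file size: 999709040 bytes
Literal data: 987800910 bytes
Matched data: 11908130 bytes
"""


def parse_stats(text: str) -> dict[str, int]:
    """Collect every 'Name: N bytes' counter from a --stats report."""
    pairs = re.findall(r"^(.+?): ([\d,]+) bytes$", text, re.MULTILINE)
    # Commas are stripped in case the numbers are printed with separators.
    return {name: int(value.replace(",", "")) for name, value in pairs}


stats = parse_stats(SAMPLE_STATS)
matched_share = stats["Matched data"] / stats["Total transferred file size"]
print(f"matched {matched_share:.2%} of the transferred file")
```

For this report, literal plus matched data add up exactly to the transferred file size, and the matched share is a little over one percent.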
Re: delta copies
On 9/23/07, WebTent [EMAIL PROTECTED] wrote:
> Well, I am getting matched data, but it just doesn't seem to be matching very much considering the small change in file size. I tested one dump after another rsyncing in between dumps and got very little matched data :(
>
> mx1# pg_dump -Fc -Upgsql maia > maia.sql
>
> Total transferred file size: 999709040 bytes
> Literal data: 987800910 bytes
> Matched data: 11908130 bytes
>
> Since this database is very active as a mail cache, I guess it is changing more data than it seems.

Not necessarily. Depending on how pg_dump works, it could be that small changes to the database are resulting in unnecessarily large changes to the dump. Make sure you are using the uncompressed format because most compression algorithms defeat the delta-transfer algorithm almost completely. Then you might take a look at two consecutive dumps and check whether records common to both appear in the same order in each dump. (If pg_dump is dumping the records in a different order each time, that would also defeat the delta-transfer algorithm because no block of several consecutive records could be matched.)

Matt
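Both effects mentioned here, compression and record reordering, can be tried out on synthetic data. The Python sketch below is only an approximation under stated assumptions: zlib stands in for whatever compression the dump format uses, the block matcher is a simplified stand-in for rsync's algorithm, and the row layout, the 700-byte block size, and the name `matched_fraction` are invented for the example.

```python
import random
import zlib

BLOCK_SIZE = 700  # illustrative only; rsync picks its own block size


def matched_fraction(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Fraction of fixed-size blocks of `old` found at any offset in `new`."""
    blocks = [old[i:i + block_size]
              for i in range(0, len(old) - block_size + 1, block_size)]
    if not blocks:
        return 0.0
    return sum(block in new for block in blocks) / len(blocks)


rng = random.Random(7)
# Hypothetical dump rows: an id plus 180 hex characters of payload.
rows = [f"{i:06d}|{rng.getrandbits(720):0180x}\n".encode() for i in range(500)]

old_plain = b"".join(rows)
new_plain = b"new row\n" + old_plain        # one record added at the front

shuffled_rows = rows[:]
rng.shuffle(shuffled_rows)
reordered_plain = b"".join(shuffled_rows)    # same records, different order

plain_fraction = matched_fraction(old_plain, new_plain)
packed_fraction = matched_fraction(zlib.compress(old_plain),
                                   zlib.compress(new_plain))
shuffled_fraction = matched_fraction(old_plain, reordered_plain)

print(f"uncompressed, one row added: {plain_fraction:.1%}")
print(f"compressed, one row added:   {packed_fraction:.1%}")
print(f"uncompressed, rows shuffled: {shuffled_fraction:.1%}")
```

Running the same comparison on two real consecutive dumps (reading the files instead of generating rows) would be one way to carry out the check suggested above.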
Re: delta copies
On Sun, 2007-09-23 at 17:46 -0400, Matt McCutchen wrote:
> On 9/23/07, WebTent [EMAIL PROTECTED] wrote:
>
> Not necessarily. Depending on how pg_dump works, it could be that small changes to the database are resulting in unnecessarily large changes to the dump. Make sure you are using the uncompressed format because most compression algorithms defeat the delta-transfer algorithm almost completely. Then you might take a look at two consecutive dumps and check whether records common to both appear in the same order in each dump. (If pg_dump is dumping the records in a different order each time, that would also defeat the delta-transfer algorithm because no block of several consecutive records could be matched.)

Yeah, I understand what you're saying. What blows me away is that I have cwRsync keeping two MSSQL backups in sync and this works great. Microsoft SQL Server backup makes a binary compressed backup, yet it matches data very well and later transfers take a fraction of the initial transfer. But this larger pgsql compressed backup behaves so differently. I think I've already tried a plain text backup, but I will run it tonight, and if it still doesn't match any better, I'll see if I can find out why on the Postgresql list this week.

Thanks again for the insight!

-- Robert
Re: delta copies
On 9/22/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
> I do a simple tar of /var twice and then rsync the two files only as a test, no folders involved, trying to determine why the entire file is being copied in its entirety and not synchronized bit-by-bit.
>
> esmtp# rsync -az --progress --stats test/ test2/
> building file list ...
> 2 files to consider
> ./
> testvar.tar
>   1298128896 100%   13.00MB/s    0:01:35 (1, 100.0% of 2)
>
> Number of files: 2
> Number of files transferred: 1
> Total file size: 1298128896 bytes
> Total transferred file size: 1298128896 bytes
> Literal data: 1298128896 bytes
> Matched data: 0 bytes
> File list size: 53
> File list generation time: 0.001 seconds
> File list transfer time: 0.000 seconds
> Total bytes sent: 178178479
> Total bytes received: 48
>
> sent 178178479 bytes  received 48 bytes  1846409.61 bytes/sec
> total size is 1298128896  speedup is 7.29

Delta transfers reduce network traffic between the sending and receiving rsync processes at the cost of some extra CPU time and disk I/O (e.g., the receiver has to read the old destination file). The reduction in network traffic is only relevant if the two processes are on different machines. When they are on the same machine as in your example, there is no point in doing delta transfers, so by default rsync does not do them. You can force rsync to do delta transfers by passing --no-whole-file, but I can't imagine why you would want to do this unless you are testing to see how much traffic reduction delta transfers would achieve.

> Also, does the match data bytes show the amount matched when making delta copies, I assume?

Yes. More precisely, it shows the amount that rsync found to be matched. Rsync's goal is to match as much data as convenient, not to give you an exact measure of how much of the file changed.

Matt
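A reader may wonder how the local test can report "speedup is 7.29" with zero matched bytes. The figures in both --stats reports in this thread are consistent with the speedup being the total file size divided by the bytes sent plus received, so a high speedup can presumably come from -z compression alone. The short Python check below reproduces both numbers; the function name `speedup` is made up for the example.

```python
def speedup(total_size: int, bytes_sent: int, bytes_received: int) -> float:
    """Total file size divided by the bytes exchanged between the two sides."""
    return total_size / (bytes_sent + bytes_received)


# Figures copied from the two --stats reports in this thread.
local_tar_test = speedup(1298128896, 178178479, 48)
remote_pg_dump = speedup(999709040, 987566347, 221228)

print(f"local tar test: {local_tar_test:.2f}")
print(f"remote pg_dump: {remote_pg_dump:.2f}")
```

The Literal data and Matched data lines, not the speedup figure, are therefore the ones to watch when judging whether delta transfers are finding anything.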
Re: delta copies
On Sat, 2007-09-22 at 20:00 -0400, Matt McCutchen wrote:
> On 9/22/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
>
> Delta transfers reduce network traffic between the sending and receiving rsync processes at the cost of some extra CPU time and disk I/O (e.g., the receiver has to read the old destination file). The reduction in network traffic is only relevant if the two processes are on different machines. When they are on the same machine as in your example, there is no point in doing delta transfers, so by default rsync does not do them. You can force rsync to do delta transfers by passing --no-whole-file, but I can't imagine why you would want to do this unless you are testing to see how much traffic reduction delta transfers would achieve.

Yes, of course, I want to use this doing remote transfers. But since I was not able to get it to work with our remote transfers, I was trying to do it locally to speed up testing. Are you saying that rsync knows the difference and will enable/disable delta transfers depending on whether local or not? However, again, I get the same result doing remote transfers.

Thanks for the tip, I will try some more testing tomorrow using the --no-whole-file option...

-- Robert
Re: delta copies
On 9/22/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
> Are you saying that rsync knows the difference and will enable/disable delta transfers depending on whether local or not?

Yes.

> However, again, I get the same result doing remote transfers.

Either the delta transfer algorithm is not being used due to a misconfiguration, or the pgsql backups are changing in a perverse way that prevents it from matching any data. Please send the command line and output for the remote transfers the same way you did for the local test so I can see what is happening.

Matt