Re: [BackupPC-users] RsyncP problem
Harald Amtmann wrote at about 19:29:07 +0100 on Monday, December 7, 2009:

> So, for anyone who cares (doesn't seem to be anyone on this list who
> noticed), I found this post from 2006 stating and analyzing my exact
> problem:

You are assuming something that is not true...

> http://www.topology.org/linux/backuppc.html
>
> On this site, search for "Design flaw: Avoidable re-transmission of
> massive amounts of data". For future reference and archiving, I quote
> here in full:
>
> 2006-6-7: During the last week while using BackupPC in earnest, I have
> noticed a very serious design flaw which is totally avoidable by making
> a small change to the software. First I will describe the flaw with an
> example.
>
> [details snipped]
>
> The design flaw here is crystal clear. Consider a single file
> home1/xyz.txt. The authors have designed the BackupPC system so that
> the file home1/xyz.txt is sent in full from client1 to server1 unless
>
> [details snipped]
>
> The cure for this design flaw is very easy indeed, and it would save me
> several days of saturated LAN bandwidth when I make back-ups. It's very
> sad that the authors did not design the software correctly. Here is how
> the software design flaw can be fixed.

This is an open source project -- rather than repetitively talking about serious design flaws in a very workable piece of software (to which I believe you have contributed nothing), and instead of talking about how sad it is that the authors didn't correct it, why don't you stop complaining and code a better version? I'm sure that if you produce a demonstrably better version and test it under a range of use cases to validate its robustness, people would be more than happy to use your fix for this "serious design flaw". And you win a bigger bonus if you do all this using tar or rsync without requiring any client software or any other remotely executed commands...
> The above design concept would make BackupPC much more efficient even
> under normal circumstances where the variable $Conf{RsyncShareName} is
> unchanging. At present, rsyncd will only refrain from sending a file if
> it is present in the same path in the same module in a previous full
> back-up. If server1 already has the same identical file in any other
> location, the file is sent by rsyncd and then discarded after it
> arrives.

It sounds like you know what you want to do, so start coding and stop complaining...

> If the above serious design flaw is not fixed, it will not do much harm
> to people whose files are rarely changing and rarely moving. But if,
> for example, you move a directory tree from one place to another,
> BackupPC will re-send the whole lot across the LAN, and then it will
> discard the files when they arrive on the BackupPC server. This will
> keep on happening until after you have made a full back-up of the files
> in the new location.

No one is stopping you from fixing this "serious design flaw", which obviously is not keeping the bulk of us users up at night worrying. And for the record, I don't necessarily disagree with you that there are things that can be improved, but your attitude is going to get you less than nowhere. Also, the coders are hardly stupid, and there are good reasons for the various tradeoffs they have made that you would be wise to try to understand before disparaging them and their software.

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
Re: [BackupPC-users] RsyncP problem
> And for the record, I don't necessarily disagree with you that there
> are things that can be improved, but your attitude is going to get you
> less than nowhere. Also, the coders are hardly stupid, and there are
> good reasons for the various tradeoffs they have made that you would be
> wise to try to understand before disparaging them and their software.

Hi, I didn't want to sound rude. This was my 6th mail regarding this problem (5 to this list, 1 personally to Craig), I think. In the first 5 mails I was reporting my observations and asking whether what I am seeing is expected behaviour or an error on my part, each mail providing more detail as I tried to find the source of the problem. In my personal mail to Craig I asked the same question and asked for pointers as to where in RsyncP the problem might be, so that I could start working on a fix (if possible). Not a single one of those mails got a reply, so I kept looking for an answer myself, both in Google and in the source code. This last mail was just me being happy that I had found out that this is indeed expected behaviour, that I can stop looking for problems in my setup, and a record for any future users who observe this behaviour.

Regards
Harald
Re: [BackupPC-users] RsyncP problem
Les Mikesell wrote at about 14:11:12 -0600 on Monday, December 7, 2009:

> It applies to full rsync or rsyncd backups. An interrupted full should
> be marked as a 'partial' in your backup summary - and the subsequent
> full retry should not transfer the completed files again, although it
> will take the time to do a block checksum compare over them. I don't
> think it applies to incomplete files, so if you have one huge file that
> didn't finish, I think it would retry from the start. This and
> $Conf{IncrLevels} are fairly recent additions - be sure you have a
> current BackupPC version and that the code and documentation match.
> Even the current version won't find new or moved content if it exists
> in the pool, though.

Is there any reason the rsync option --partial couldn't be implemented in perl-File-RsyncP (if it isn't already there)? This would presumably allow partial backups of single files to be resumed. I'm not sure how hard it would be, but intuitively I wouldn't think it would be too hard. This could be important when backing up large files (e.g., video, databases, ISOs), particularly over a slow link.
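The --partial idea above can be sketched independently of File::RsyncP. The following is a minimal illustration (not BackupPC or rsync code) of offset-based resume: keep the partially transferred destination file and continue from its current size instead of restarting. Real rsync goes further and can also use the kept partial file as a basis for its delta algorithm; here `read_chunk` is a hypothetical stand-in for the network fetch.

```python
import os

def resume_offset(partial_path):
    """Return the byte offset to resume from: the size of the kept
    partial file if one exists, else 0 (start from scratch)."""
    try:
        return os.path.getsize(partial_path)
    except OSError:
        return 0

def transfer(read_chunk, partial_path, total_size, chunk_size=65536):
    """Append chunks to the partial file starting at the resume offset.

    read_chunk(offset, size) stands in for fetching bytes from the
    remote side; only the bytes after the resume offset are requested.
    """
    offset = resume_offset(partial_path)
    with open(partial_path, "ab") as out:
        while offset < total_size:
            data = read_chunk(offset, min(chunk_size, total_size - offset))
            out.write(data)
            offset += len(data)
    return offset
```

After an interruption the partial file is left in place, so the next run only asks for the remaining tail rather than the whole file.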
Re: [BackupPC-users] RsyncP problem
So, for anyone who cares (doesn't seem to be anyone on this list who noticed), I found this post from 2006 stating and analyzing my exact problem:

http://www.topology.org/linux/backuppc.html

On this site, search for "Design flaw: Avoidable re-transmission of massive amounts of data". For future reference and archiving, I quote it here in full:

> 2006-6-7: During the last week while using BackupPC in earnest, I have
> noticed a very serious design flaw which is totally avoidable by making
> a small change to the software. First I will describe the flaw with an
> example.
>
> 1. First I back up the rsyncd module home from computer client1 to
>    computer server1 using the rsyncd method. This uses the following
>    line in the server1 config.pl file:
>
>       $Conf{RsyncShareName} = ['home'];
>
> 2. Then I do an incremental back-up of module home from client1 to
>    server1. This back-up correctly sends only the changes in the
>    file-system module home over the network. So the back-up is very
>    quick.
>
> 3. Now I modify the variable $Conf{RsyncShareName} on server1 as
>    follows:
>
>       $Conf{RsyncShareName} = ['home', 'home1'];
>
> 4. Next, I make an incremental back-up. Naturally, the home module is
>    sent very efficiently over the LAN and home1 is sent in full,
>    essentially uncompressed. Well, this isn't quite natural. In fact,
>    it's quite avoidable, but I'll explain why later.
>
> 5. Now I make a second incremental back-up of home and home1. Since I
>    have already backed up these two modules, I expect them both to be
>    very quick. But this does not happen. In fact, all of home1 is sent
>    in full over the LAN, which in my case takes about 10 hours. This is
>    a real nuisance. This problem occurs even if I have this in the
>    config.pl file on server1:
>
>       $Conf{IncrFill} = 1;
>
> 6. Next, I make a full back-up. This sends only the changes to home
>    over the LAN, but sends the full contents of home1, uncompressed,
>    over the LAN, even though I have already sent this module in full
>    twice.
>
> 7. Now when I make future back-ups, the modules home and home1 are both
>    sent efficiently and quickly.
>
> The design flaw here is crystal clear. Consider a single file
> home1/xyz.txt. The authors have designed the BackupPC system so that
> the file home1/xyz.txt is sent in full from client1 to server1 unless
>
> 1. the file home1/xyz.txt is already on server1 with the identical path
>    in the identical module home1, and
>
> 2. the back-up in which home1/xyz.txt exists is a full back-up, not an
>    incremental back-up.
>
> If the above conditions do not both hold, the full file is transmitted
> by rsyncd on client1; then it is discarded by server1 if it is already
> present on server1 in either the same path in an earlier back-up, or in
> any path at all in any other module in any kind of earlier back-up. So
> the software correctly discards duplicate files when they arrive on
> server1, but they are still transmitted anyway.
>
> The cure for this design flaw is very easy indeed, and it would save me
> several days of saturated LAN bandwidth when I make back-ups. It's very
> sad that the authors did not design the software correctly. Here is how
> the software design flaw can be fixed.
>
> 1. When an rsync file-system module module1 is to be transmitted from
>    client1 to server1, first transmit the hash (e.g. MD5) of each file
>    from client1 to server1. This can be done (a) on a file-by-file
>    basis, (b) for all the files in module1 at the same time, or (c) in
>    bundles of, say, a few hundred or thousand hashes at a time.
>
> 2. The BackupPC server server1 matches the received file hashes against
>    the global hash table of all files on server1, both full back-up
>    files and incremental back-up files.
>
> 3. Then server1 requests rsyncd on client1 to transmit only the files
>    which are not already present on server1. Notice that the files on
>    server1 do not have to be in the same path in the same module on
>    server1 in a full back-up, which is the case in the current BackupPC
>    software design.
>
> 4. Then client1 sends only the files which are requested, which are the
>    files not already present on server1.
>
> The above design concept would make BackupPC much more efficient even
> under normal circumstances where the variable $Conf{RsyncShareName} is
> unchanging. At present, rsyncd will only refrain from sending a file if
> it is present in the same path in the same module in a previous full
> back-up. If server1 already has the same identical file in any other
> location, the file is sent by rsyncd and then discarded after it
> arrives.
>
> If the above serious design flaw is not fixed, it will not do much harm
> to people whose files are rarely changing and rarely moving. But if,
> for example, you move a directory tree from one place to another,
> BackupPC will re-send the whole lot across the LAN, and then it will
> discard the files when they arrive on the BackupPC server. This will
> keep on happening until after you have made a full back-up of the files
> in the new location.
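The four steps quoted above amount to a content-addressed pre-check: exchange digests first, then transmit only files the server has never seen. As a rough sketch (purely illustrative, not BackupPC's or rsync's actual protocol, and glossing over the hash-collision problem), the proposed exchange could look like:

```python
import hashlib

def file_hashes(files):
    """Step 1 of the proposal: the client hashes every file it wants to
    send. `files` maps path -> content bytes; returns path -> MD5 hex."""
    return {p: hashlib.md5(data).hexdigest() for p, data in files.items()}

def request_missing(pool, hashes):
    """Steps 2-3: the server matches the received hashes against its
    pool (here just a set of digests it already stores) and requests
    only the files whose digests are unknown."""
    return [p for p, h in hashes.items() if h not in pool]

def send_requested(files, wanted):
    """Step 4: the client transmits only the requested files."""
    return {p: files[p] for p in wanted}
```

With a pool already containing the digest of an unchanged file, only the genuinely new file crosses the wire, regardless of its path or module. Note this is exactly the scheme Les questions below: digest equality alone does not prove the contents are identical.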
Re: [BackupPC-users] RsyncP problem
Harald Amtmann wrote:

> So, for anyone who cares (doesn't seem to be anyone on this list who
> noticed), I found this post from 2006 stating and analyzing my exact
> problem: http://www.topology.org/linux/backuppc.html On this site,
> search for "Design flaw: Avoidable re-transmission of massive amounts
> of data".

It's documented behavior, so not a surprise.

> 5. Now I make a second incremental back-up of home and home1. Since I
>    have already backed up these two modules, I expect them both to be
>    very quick. But this does not happen. In fact, all of home1 is sent
>    in full over the LAN, which in my case takes about 10 hours. This is
>    a real nuisance. This problem occurs even if I have this in the
>    config.pl file on server1:
>
>       $Conf{IncrFill} = 1;

You have the wrong expectations. Do you have a reasonably current version, and did you read the section on $Conf{IncrLevels} in http://backuppc.sourceforge.net/faq/BackupPC.html? You can also just do full runs instead of incrementals - they take a long time, since the target has to read the files to verify the block checksums, but not a lot of bandwidth.

> The cure for this design flaw is very easy indeed, and it would save me
> several days of saturated LAN bandwidth when I make back-ups. It's very
> sad that the authors did not design the software correctly. Here is how
> the software design flaw can be fixed.
>
> 1. When an rsync file-system module module1 is to be transmitted from
>    client1 to server1, first transmit the hash (e.g. MD5) of each file
>    from client1 to server1. This can be done (a) on a file-by-file
>    basis, (b) for all the files in module1 at the same time, or (c) in
>    bundles of, say, a few hundred or thousand hashes at a time.

The rsync binary on the target isn't going to do that.

> 2. The BackupPC server server1 matches the received file hashes against
>    the global hash table of all files on server1, both full back-up
>    files and incremental back-up files.

Aside from not matching rsync, the file hashes have expected collisions that can only be resolved by a full data comparison. And there's no reason to expect all of the files in the pool to have been collected with an rsync transfer method.

--
Les Mikesell
lesmikes...@gmail.com
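The collision point can be made concrete. A content pool keyed by any fixed-size hash must be prepared for distinct files that share a digest, so a digest match alone cannot confirm identity; each digest has to map to a chain of candidates that are compared byte-for-byte. A toy sketch (not BackupPC's actual pool code, which stores pooled files on disk), using a deliberately weak hash to force collisions:

```python
from collections import defaultdict

class Pool:
    """Toy content pool keyed by a possibly colliding hash. Files with
    the same digest form a chain; membership requires a full compare."""

    def __init__(self, digest):
        self.digest = digest          # hash function, e.g. MD5 hexdigest
        self.chains = defaultdict(list)

    def add(self, data):
        chain = self.chains[self.digest(data)]
        # Only append if no byte-identical copy is already chained.
        if not any(existing == data for existing in chain):
            chain.append(data)

    def contains(self, data):
        # A digest match alone is not proof of identity: compare bytes.
        return any(existing == data
                   for existing in self.chains[self.digest(data)])
```

Two different files with the same digest coexist in one chain, and a lookup for a third colliding file still correctly reports it as absent - which is why a digest-only pre-check protocol would occasionally skip files it should have sent.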
Re: [BackupPC-users] RsyncP problem
-------- Original Message --------
Date: Mon, 07 Dec 2009 13:08:52 -0600
From: Les Mikesell lesmikes...@gmail.com
To: "General list for user discussion, questions and support" backuppc-users@lists.sourceforge.net
Subject: Re: [BackupPC-users] RsyncP problem

> Harald Amtmann wrote:
>> So, for anyone who cares (doesn't seem to be anyone on this list who
>> noticed), I found this post from 2006 stating and analyzing my exact
>> problem: http://www.topology.org/linux/backuppc.html On this site,
>> search for "Design flaw: Avoidable re-transmission of massive amounts
>> of data".
>
> It's documented behavior, so not a surprise.

"With the rsync transfer method the partial backup is used to resume the next full backup, avoiding the need to retransfer the file data already in the partial backup."

This is also from the docs and doesn't work. I have 40 GB of data and do a first full backup. It gets interrupted. I start it again and all the data is retransmitted. Does the rsync transfer method not include the rsyncd method, which I am using?
Re: [BackupPC-users] RsyncP problem
Harald Amtmann wrote:

>> It's documented behavior, so not a surprise.
>
> "With the rsync transfer method the partial backup is used to resume
> the next full backup, avoiding the need to retransfer the file data
> already in the partial backup."
>
> This is also from the docs and doesn't work. I have 40 GB of data and
> do a first full backup. It gets interrupted. I start it again and all
> the data is retransmitted. Does the rsync transfer method not include
> the rsyncd method, which I am using?

It applies to full rsync or rsyncd backups. An interrupted full should be marked as a 'partial' in your backup summary - and the subsequent full retry should not transfer the completed files again, although it will take the time to do a block checksum compare over them. I don't think it applies to incomplete files, so if you have one huge file that didn't finish, I think it would retry from the start. This and $Conf{IncrLevels} are fairly recent additions - be sure you have a current BackupPC version and that the code and documentation match. Even the current version won't find new or moved content if it exists in the pool, though.

--
Les Mikesell
lesmikes...@gmail.com
Re: [BackupPC-users] RsyncP problem
> This and $Conf{IncrLevels} are fairly recent additions - be sure you
> have a current BackupPC version and that the code and documentation
> match. Even the current version won't find new or moved content if it
> exists in the pool, though.

Are you referring to 3.2.0 beta 1 or 3.1.0 as the recent version? I am using 3.1.0 from Debian.
Re: [BackupPC-users] RsyncP problem
Harald Amtmann wrote:

> Are you referring to 3.2.0 beta 1 or 3.1.0 as the recent version? I am
> using 3.1.0 from Debian.

From the changelog here http://sourceforge.net/project/shownotes.php?release_id=673692 I'd say the features should be in 3.1.0, but there could have been bugs with subsequent fixes.

--
Les Mikesell
lesmikes...@gmail.com