Re: [BackupPC-users] Matching files against the pool remotely.
Les Mikesell <gmail.com> writes:
> Shawn Perry wrote:
> > Take a look at how Unison does its compares.
>
> It's not impossible - it just can't be done with the existing tools and
> storage scheme.

This does indeed look like a complicated problem. There is an rsync patch
which adds an option --link-by-hash=DIR, pretty similar to the BackupPC
pool. It creates hard links of identical files, organized by MD4 sum, on
the fly during the sync process. I tested it and I'm looking at the C code
right now. I'm sure it would take less than five lines to make it skip a
remote file whose hash matches a file in the pool.

This implies a patched rsync. That may be acceptable for specific cases
like mine where bandwidth is critical, but only on the BackupPC side of
rsync (the client). This means the rsyncd side would stay regular rsync
and send checksums of whole files, without following the BackupPC_Link
logic. So it would in turn require changing the way BackupPC_Link manages
the pool, etc. And there is also the duplicate-hash problem.

My best idea so far would be to use a patched rsync and a separate pool
directory for the rsyncd method only. Matching against this pool would be
done only if the classic matching by path failed. The matching file would
also not simply be hard-linked, but would rather serve as the base file
for the rsync update, so if we have several candidates we can safely use
the first one found. This way it could be integrated into BackupPC with
nothing more than some host configuration.

Malik.
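A minimal sketch of the matching order proposed above, in Python rather
than the patch's C; the pool path, directory fan-out, and helper names are
all hypothetical (the real --link-by-hash patch computes MD4 sums inside
rsync itself):

    import os

    RSYNCD_POOL = "/var/lib/backuppc/rsyncd-pool"  # hypothetical separate pool

    def pool_path(digest):
        # Fan the pool out by digest prefix; this layout is illustrative only.
        return os.path.join(RSYNCD_POOL, digest[:2], digest[2:4], digest)

    def choose_basis_file(remote_digest, previous_backup_path):
        """Pick the local basis file for the rsync delta transfer."""
        # 1. Classic matching by path: the same file in the previous backup.
        if previous_backup_path and os.path.exists(previous_backup_path):
            return previous_backup_path
        # 2. Fall back to the hash pool.  The match is only a *basis* for
        #    the rsync update, not a final hard link, so if several
        #    candidates share a digest the first one found is safe: rsync's
        #    delta algorithm corrects any difference during the transfer.
        candidate = pool_path(remote_digest)
        if os.path.exists(candidate):
            return candidate
        # 3. No basis at all: the file must be uploaded in full.
        return None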
Re: [BackupPC-users] Matching files against the pool remotely.
Shawn Perry wrote:
> Take a look at how Unison does its compares.

It's not impossible - it just can't be done with the existing tools and
storage scheme.

--
Les Mikesell
lesmikes...@gmail.com
Re: [BackupPC-users] Matching files against the pool remotely.
Take a look at how Unison does its compares.

On Fri, Dec 18, 2009 at 9:12 AM, Les Mikesell wrote:
> Malik Recoing. wrote:
>> I know a file will be skipped if it is present in the previous backup,
>> but what happens if the file has been backed up for another host?
>>> It is required to be uploaded first as otherwise there's nothing to
>>> compare it to (yeah, I know, that's a pain[1]).
>>>
>>> It might theoretically be sufficient to let the remote side calculate
>>> a hash and compare it against the files in the pool with matching
>>> hashes, and then let rsync do full compares against all the matching
>>> hashes in the pool (since hash collisions happen), but I don't believe
>>> anyone has tried to code this up yet, and it would only be of limited
>>> use in systems that were network bandwidth constrained rather than
>>> disk bandwidth constrained.
>>
>> I'm quite sure it would be an improvement for both. Globally there
>> would be no overhead. Better: the hash calculation would effectively be
>> "clustered", delegated to the clients. Matching identical hashes is
>> done by BackupPC_Link anyway, so BackupPC_Link would become pointless
>> in an rsync-only configuration. Disk and network traffic would both be
>> reduced, as many files wouldn't be transferred at all.
>
> There are two problems: one is that the remote agent is a standard rsync
> binary that knows nothing about BackupPC's hashes; the other is that
> hash collisions are normal and expected - and disambiguated by a full
> data comparison.
>
>> I thought of a similar solution. When your clients are mostly "full
>> system tree" backups, you may keep ready-to-copy backups of the
>> different OS trees. When a new client is added, you copy the
>> corresponding OS directory as if it were the first full backup.
>
> Yes, if your remote machines are essentially clones of each other, you
> could create their pc directories as clones with a tool that knows how
> to make a tree of hardlinks.
>
> A better solution might be to have a local machine at the site running
> backuppc and work out some way to get an offsite copy. If bandwidth is
> such an issue, you are also going to have trouble doing a restore. But
> if you've followed this mailing list very long, you'd know that the
> 'offsite copy' problem doesn't have a good solution yet either.
>
> --
> Les Mikesell
> lesmikes...@gmail.com
Re: [BackupPC-users] Matching files against the pool remotely.
Malik Recoing. wrote:
>>> I know a file will be skipped if it is present in the previous backup,
>>> but what happens if the file has been backed up for another host?
>>
>> It is required to be uploaded first as otherwise there's nothing to
>> compare it to (yeah, I know, that's a pain[1]).
>>
>> It might theoretically be sufficient to let the remote side calculate a
>> hash and compare it against the files in the pool with matching hashes,
>> and then let rsync do full compares against all the matching hashes in
>> the pool (since hash collisions happen), but I don't believe anyone has
>> tried to code this up yet, and it would only be of limited use in
>> systems that were network bandwidth constrained rather than disk
>> bandwidth constrained.
>
> I'm quite sure it would be an improvement for both. Globally there would
> be no overhead. Better: the hash calculation would effectively be
> "clustered", delegated to the clients. Matching identical hashes is done
> by BackupPC_Link anyway, so BackupPC_Link would become pointless in an
> rsync-only configuration. Disk and network traffic would both be
> reduced, as many files wouldn't be transferred at all.

There are two problems: one is that the remote agent is a standard rsync
binary that knows nothing about BackupPC's hashes; the other is that hash
collisions are normal and expected - and disambiguated by a full data
comparison.

> I thought of a similar solution. When your clients are mostly "full
> system tree" backups, you may keep ready-to-copy backups of the
> different OS trees. When a new client is added, you copy the
> corresponding OS directory as if it were the first full backup.

Yes, if your remote machines are essentially clones of each other, you
could create their pc directories as clones with a tool that knows how to
make a tree of hardlinks.

A better solution might be to have a local machine at the site running
backuppc and work out some way to get an offsite copy. If bandwidth is
such an issue, you are also going to have trouble doing a restore. But if
you've followed this mailing list very long, you'd know that the 'offsite
copy' problem doesn't have a good solution yet either.

--
Les Mikesell
lesmikes...@gmail.com
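To illustrate the second problem, a rough Python sketch of what a
collision-aware pool lookup has to do. The directory fan-out and the
numeric collision suffixes are modeled loosely on BackupPC's pool, not its
exact on-disk layout:

    import filecmp
    import os

    def match_in_pool(uploaded_file, digest, pool_dir):
        """Return a pool file with identical contents, or None."""
        # Files sharing a digest sit in a collision chain (the suffix
        # scheme here is illustrative).  A matching hash alone is never
        # proof of identity, so every candidate gets a full byte-for-byte
        # comparison before it may be hard-linked.
        base = os.path.join(pool_dir, digest[:2], digest[2:4], digest)
        candidate, i = base, 0
        while os.path.exists(candidate):
            if filecmp.cmp(uploaded_file, candidate, shallow=False):
                return candidate          # identical data: safe to link
            candidate = f"{base}_{i}"     # try the next collision entry
            i += 1
        return None                       # new content (or true collision)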
Re: [BackupPC-users] Matching files against the pool remotely.
Tim Connors <gmail.com> writes:
> On Fri, 18 Dec 2009, Malik Recoing. wrote:
>> I know a file will be skipped if it is present in the previous backup,
>> but what happens if the file has been backed up for another host?
>
> It is required to be uploaded first as otherwise there's nothing to
> compare it to (yeah, I know, that's a pain[1]).
>
> It might theoretically be sufficient to let the remote side calculate a
> hash and compare it against the files in the pool with matching hashes,
> and then let rsync do full compares against all the matching hashes in
> the pool (since hash collisions happen), but I don't believe anyone has
> tried to code this up yet, and it would only be of limited use in
> systems that were network bandwidth constrained rather than disk
> bandwidth constrained.

I'm quite sure it would be an improvement for both. Globally there would
be no overhead. Better: the hash calculation would effectively be
"clustered", delegated to the clients. Matching identical hashes is done
by BackupPC_Link anyway, so BackupPC_Link would become pointless in an
rsync-only configuration. Disk and network traffic would both be reduced,
as many files wouldn't be transferred at all.

If such a feature existed, it would give BackupPC a "magic" touch, backing
up a whole tree of well-known files in a minute, even over a slow network.
What a pity I'm not fluent in Perl...

> [1] I just worked around this myself by copying a large set of files
> onto sneakernet (my USB key), copying them into a directory on the local
> backup server, backing that directory up, then moving the corresponding
> directory in the backup tree into the previous backup of the remote
> system, so it will be picked up and compared against the same files when
> that remote system is next backed up. I find out tomorrow whether that
> actually worked :)

I thought of a similar solution. When your clients are mostly "full system
tree" backups, you may keep ready-to-copy backups of the different OS
trees. When a new client is added, you copy the corresponding OS directory
as if it were the first full backup.

Malik.
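A rough sketch of the two-round exchange this would amount to. The
function names are hypothetical, plain MD5 stands in for BackupPC's own
digest, and collision handling (the full compares discussed elsewhere in
the thread) is left out:

    import hashlib
    import os

    def client_side_digests(root):
        # Runs on the client: hash every file locally, so the hashing work
        # is effectively spread ("clustered") across all clients.
        digests = {}
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                h = hashlib.md5()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                digests[path] = h.hexdigest()
        return digests

    def server_side_filter(digests, pooled_digests):
        # Runs on the server: only files whose digest is unknown to the
        # pool need to cross the network; the rest are pool-match
        # candidates, still subject to a full compare against collisions.
        return [p for p, d in digests.items() if d not in pooled_digests]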
Re: [BackupPC-users] Matching files against the pool remotely.
On Fri, 18 Dec 2009, Malik Recoing. wrote:
> The Holy Doc says (Barratt:Design:operation:2): "it checks each file in
> the backup to see if it is identical to an existing file from any
> previous backup of any PC. It does this without needing to write the
> file to disk."
>
> But it doesn't say "without the need to upload the file into memory".
>
> I know a file will be skipped if it is present in the previous backup,
> but what happens if the file has been backed up for another host?

It is required to be uploaded first, as otherwise there's nothing to
compare it to (yeah, I know, that's a pain[1]).

It might theoretically be sufficient to let the remote side calculate a
hash and compare it against the files in the pool with matching hashes,
and then let rsync do full compares against all the matching hashes in the
pool (since hash collisions happen), but I don't believe anyone has tried
to code this up yet, and it would only be of limited use in systems that
were network bandwidth constrained rather than disk bandwidth constrained.

[1] I just worked around this myself by copying a large set of files onto
sneakernet (my USB key), copying them into a directory on the local backup
server, backing that directory up, then moving the corresponding directory
in the backup tree into the previous backup of the remote system, so it
will be picked up and compared against the same files when that remote
system is next backed up. I find out tomorrow whether that actually
worked :)

--
TimC
Computer screens simply ooze buckets of yang. To balance this, place some
women around the corners of the room. -- Kaz Cooke, Dumb Feng Shui
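The footnote's workaround boils down to grafting a seeded directory into
the remote host's previous backup tree. A rough Python sketch, with
hypothetical paths and host names, ignoring BackupPC's attrib files and
its 'f'-prefixed name mangling:

    import shutil
    from pathlib import Path

    TOPDIR = Path("/var/lib/backuppc/pc")   # hypothetical data directory

    def graft_seeded_tree(local_host, local_num,
                          remote_host, remote_num, share_and_dir):
        # Move the directory backed up via the local host (from the USB
        # copy) into the remote host's most recent backup, so the next
        # rsync run compares the remote files against local, identical
        # copies instead of uploading them over the slow link.
        src = TOPDIR / local_host / str(local_num) / share_and_dir
        dst = TOPDIR / remote_host / str(remote_num) / share_and_dir
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))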
[BackupPC-users] Matching files against the pool remotely.
Hello,

I'm trying to optimize BackupPC for use over the Internet with a lot of
clients (say 100 per server). The clients run rsyncd and are connected via
DSL lines of variable speed. Many discussions on this list have helped me
a lot, but I can't figure out one thing: does BackupPC use rsync features
to skip a file already in the pool _before_ uploading it? Or does it need
to upload the file first, after which it is matched against the pool and,
if identical, replaced by a hard link? In the first case this saves both
bandwidth and disk space; in the second case, only disk space. Is BackupPC
able to match a file remotely?

The Holy Doc says (Barratt:Design:operation:2): "it checks each file in
the backup to see if it is identical to an existing file from any previous
backup of any PC. It does this without needing to write the file to disk."

But it doesn't say "without the need to upload the file into memory".

I know a file will be skipped if it is present in the previous backup, but
what happens if the file has been backed up for another host?

Thank you for your enlightenment.

Malik.
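For reference, the disk-only saving in the second case comes down to
something like this (a minimal sketch, assuming uncompressed pool files
and ignoring collision chains):

    import os

    def link_into_pool(new_copy, pool_file):
        # After the transfer, BackupPC_Link-style pooling replaces the
        # fresh copy with a hard link into the pool, so every backup
        # shares one copy on disk.  The upload has already happened by
        # this point, which is why it saves disk space but not bandwidth.
        if os.path.exists(pool_file):
            os.unlink(new_copy)
            os.link(pool_file, new_copy)   # same inode as the pool copy
        else:
            os.link(new_copy, pool_file)   # first occurrence seeds the pool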