Here is how I think an algorithm could work. It requires no changes on the
remote side, so it should work with the plain rsync protocol, but it will
likely require some changes to rsyncp.
Let:

  B           be the BackupPC_dump process
  P           be the peer (the PC to be backed up)
  R           be the remote rsync process
  F           be a file on P that BackupPC wants to save
  Gc, Gc2     be two compressed files in the same chain in the server pool
  G1, G2      be their uncompressed data
  F[x], G1[x] be the xth block of F or G1 respectively
  HR(b)       be the rsync rdelta-hash of block b
  HL(b)       be BackupPC's hash-for-pooling of block b

0. B has already contacted P and R has been spawned. Maybe some files
   have already been transferred. Now B arrives at F and determines that
   it needs to transfer the file (i.e. it's new, or the attributes have
   changed).

1. Since we need the first 8 blocks to determine the pool chain, we just
   receive the first 8 blocks. Depending on the protocol, this possibly
   requires us to ask for the hashes first and to pretend that each hash
   simply does not match our (fake) local hashes:

     newfile = [];   // array of blocks
     for( x = 0; x < 8; ++x ) {
       tmphash = recv( HR( F[x] ));
       if tmphash == IMPOSSIBLE_HASH { explode(); }
       newfile += recv( F[x] );
     }

2. Okay, now we have the first 8 blocks and can determine the pool
   chain, if there is one:

     localfiles = get_pool_files( newfile[0], newfile[7] );

3. Now iteratively ask the remote for hashsums, compare them to the
   corresponding blocks in the pool files (which may be cached!), and
   assemble the new file:

     x = 8;
     while !F.eof() {
       tmphash  = recv( HR( F[x] ));
       tmpblock = "";
       for G in localfiles {   // iterate G1, G2, …
         if tmphash == HR( G[x] ) {
           tmpblock = G[x];
           /* possible optimisation: replace localfiles with just this
            * one file, so that for successive blocks not all local
            * files have to be compared. */
           break;
         }
       }
       if !tmpblock { tmpblock = recv( F[x] ); }
       newfile += tmpblock;
       ++x;
     }

4. At the very end, determine whether the new data is identical to
   existing data and hardlink. This can either be done with a full-file
   hashsum (to be safe), or the above algorithm can determine it while
   it's running.

If this logic cannot be implemented inside or on top of rsyncp, then
maybe the following logic would work:

1. Create a buffer in memory and fill it with zeroes.
2. Pass this buffer to rsyncp and ask it to synchronise F into it.
3. After 8×128k blocks have been determined:
   a. determine whether there is a corresponding pool file;
   b. if yes, uncompress it and write it into the buffer.
4. Let rsyncp continue, with the potentially modified buffer.

A rough Python sketch of steps 1-3 of the first variant is included at
the end of this mail.

Thoughts?

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"i've got a bike, you can ride it if you like,
 it's got a basket, a bell that rings,
 and things to make it look good.
 i'd give it to you if i could, but i borrowed it."
                                          -- syd barrett, 1967

spamtraps: madduck.bo...@madduck.net
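P.S. To make the control flow of steps 1-3 concrete, here is a minimal
Python sketch. It is only illustrative and rests on assumptions: conn and
pool stand in for the rsyncp protocol handle and the pool lookup,
recv_block_hash(), recv_block(), eof() and get_pool_files() are
hypothetical helpers (not real File::RsyncP or BackupPC APIs), and
rsync_block_hash() uses MD5 merely as a placeholder for rsync's actual
per-block checksums.

  import hashlib

  def rsync_block_hash(block):
      # Placeholder for rsync's per-block checksum (the real protocol
      # uses an Adler-32 rolling hash plus an MD4/MD5 strong hash);
      # MD5 alone is enough to illustrate the comparison.
      return hashlib.md5(block).digest()

  def assemble_file(conn, pool):
      # conn is assumed to expose recv_block_hash(x), recv_block(x) and
      # eof(); pool is assumed to expose get_pool_files(first_blocks)
      # returning candidate files as lists of uncompressed blocks
      # (G1, G2, ...). Both are hypothetical interfaces.
      newfile = []

      # Step 1: unconditionally receive the first 8 blocks; we need them
      # to compute the pool hash and find the candidate chain.
      for x in range(8):
          conn.recv_block_hash(x)            # HR(F[x]) is of no use yet
          newfile.append(conn.recv_block(x))

      # Step 2: candidate pool files determined from the first 8 blocks.
      localfiles = pool.get_pool_files(newfile[:8])

      # Step 3: for every further block, compare the remote hash against
      # the corresponding block of each candidate; fetch the block from
      # the remote only if no candidate matches.
      x = 8
      while not conn.eof():
          tmphash = conn.recv_block_hash(x)
          tmpblock = None
          for g in localfiles:
              if x < len(g) and rsync_block_hash(g[x]) == tmphash:
                  tmpblock = g[x]
                  # possible optimisation: shrink localfiles to [g] here
                  break
          if tmpblock is None:
              tmpblock = conn.recv_block(x)
          newfile.append(tmpblock)
          x += 1

      # Step 4 (full-file comparison and hardlinking) is left out here.
      return b"".join(newfile)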