On 15/07/10 15:39, James Courtier-Dutton wrote:

Take one central-site PC called "A".
Take two remote-site PCs called "B" and "C".

B has already sent a full backup to A.
C wishes to send a full backup to A, but lots of the data on C is the same as B.
C generates hashes of its files, and only sends the hashes to A.
A responds to C saying which hashes it has not already got from B.
C then only sends a subset of the data, i.e. data that was not already
sent from B.

Thus, a lot of WAN bandwidth is saved.
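
The exchange described above could be sketched roughly as follows. This is only an illustration, not any existing tool's protocol: the function names (file_hash, missing_hashes, plan_upload) are made up, SHA-256 is an assumed choice of hash, and files are held in memory as a dict for simplicity.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Hash of one file's contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def missing_hashes(known: set[str], offered: list[str]) -> list[str]:
    """Runs on A: which of the offered hashes does A not hold yet?"""
    return [h for h in offered if h not in known]


def plan_upload(files: dict[str, bytes], known_on_a: set[str]) -> dict[str, bytes]:
    """Runs on C: hash every file, ask A, return only the files A lacks."""
    by_hash = {file_hash(data): name for name, data in files.items()}
    wanted = missing_hashes(known_on_a, list(by_hash))
    return {by_hash[h]: files[by_hash[h]] for h in wanted}


# B has already backed up two files to A.
files_b = {"a.txt": b"hello", "b.txt": b"world"}
known_on_a = {file_hash(d) for d in files_b.values()}

# C shares one file's content with B, so only the other one is sent.
files_c = {"x.txt": b"hello", "y.txt": b"new data"}
to_send = plan_upload(files_c, known_on_a)
```

Note that this sketch trusts the hash completely: a file is skipped whenever A already holds something with the same hash.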

The problem is that hash collisions can occur. Two files with the same hash are /probably/ the same file, but probably isn't good enough -- a backup system has to be 100% sure. And the only way to be certain is to get both files and compare them byte by byte.

BackupPC uses hashes for file names, but also checks for hash collisions and deals with them when they happen.
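
As a rough illustration of checking for collisions, here is a toy pool that uses the hash only to find candidates and then compares the actual bytes. It is not BackupPC's code; the Pool class and its digest_bytes parameter are invented for this sketch, and the hash is deliberately truncated so that collisions actually happen.

```python
import hashlib


class Pool:
    """Toy content pool: hash -> list of distinct contents sharing that hash."""

    def __init__(self, digest_bytes: int = 32):
        # digest_bytes is hypothetical; small values force collisions for testing.
        self.digest_bytes = digest_bytes
        self.buckets: dict[str, list[bytes]] = {}

    def key(self, data: bytes) -> str:
        return hashlib.sha256(data).digest()[: self.digest_bytes].hex()

    def add(self, data: bytes) -> tuple[str, int]:
        """Store data; return (hash, position in that hash's collision chain)."""
        k = self.key(data)
        chain = self.buckets.setdefault(k, [])
        for i, existing in enumerate(chain):
            if existing == data:  # full byte-by-byte comparison
                return k, i       # genuinely the same content
        chain.append(data)        # new content (possibly a collision)
        return k, len(chain) - 1


# A 1-byte hash has 256 values, so 300 distinct items must collide somewhere.
pool = Pool(digest_bytes=1)
ids = [pool.add(str(i).encode()) for i in range(300)]
```

The comparison needs both copies in the same place, which is exactly the cost for the WAN case: A cannot do it for C's file without receiving the file.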


There is also the possibility of doing this on a per-site basis. So, one
machine at the site de-dupes all the data for that site, and then just
sends the de-duped data over the WAN link.

That could work, but needs more software at the client end. Does rsync do anything like that?
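
For what it's worth, the site-level idea might look something like this. It is a sketch only: dedupe_site is a made-up name, SHA-256 is assumed, and it says nothing about what rsync itself does.

```python
import hashlib


def dedupe_site(machines: dict[str, dict[str, bytes]]):
    """Collect every machine's files at one site into a set of unique blobs.

    Returns (blobs, manifest): blobs maps hash -> content and is what would
    go over the WAN; manifest maps machine -> path -> hash so each machine's
    tree can be rebuilt. Assumption: equal hashes are treated as equal
    content, with no byte-by-byte check.
    """
    blobs: dict[str, bytes] = {}
    manifest: dict[str, dict[str, str]] = {}
    for machine, files in machines.items():
        manifest[machine] = {}
        for path, data in files.items():
            h = hashlib.sha256(data).hexdigest()
            blobs.setdefault(h, data)  # keep the first copy only
            manifest[machine][path] = h
    return blobs, manifest


site = {
    "pc1": {"/etc/motd": b"welcome", "/home/a/notes": b"pc1 notes"},
    "pc2": {"/etc/motd": b"welcome"},
}
blobs, manifest = dedupe_site(site)
```

Here three files become two blobs to send, plus a small manifest.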

cheers

Chris
--
Chris Dennis                                  cgden...@btinternet.com
Fordingbridge, Hampshire, UK

--
Please post to: Hampshire@mailman.lug.org.uk
Web Interface: https://mailman.lug.org.uk/mailman/listinfo/hampshire
LUG URL: http://www.hantslug.org.uk