On 15 July 2010 20:44, Chris Dennis <cgden...@btinternet.com> wrote:
> On 15/07/10 15:39, James Courtier-Dutton wrote:
>>
>> Take one central-site PC called "A".
>> Take two remote-site PCs called "B" and "C".
>>
>> B has already sent a full backup to A.
>> C wishes to send a full backup to A, but lots of the data on C is the
>> same as on B.
>> C generates hashes of its files, and sends only the hashes to A.
>> A responds to C saying which hashes it has not already got from B.
>> C then sends only a subset of the data, i.e. data that was not already
>> sent from B.
>>
>> Thus, a lot of WAN bandwidth is saved.
>
> The problem is that hash collisions can occur. Two files with the same
> hash are /probably/ the same file, but probably isn't good enough -- a
> backup system has to be 100% sure. And the only way to be certain is to
> get both files and compare them byte by byte.
>
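To make the scheme above concrete, here is a minimal Python sketch of the
hash exchange. It is only an illustration: the file hashing is real, but
the set "hashes_already_stored" stands in for site A's answer to "which of
these hashes do you already hold?", which in practice would be some kind
of RPC or HTTP round trip that is not shown here.

    import hashlib

    def file_hash(path):
        """SHA-256 hex digest of a file, read in 1 MiB chunks."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def plan_upload(files, hashes_already_stored):
        """Return only the files site A does not already have a copy of."""
        return [p for p in files if file_hash(p) not in hashes_already_stored]

C would then transfer just the files returned by plan_upload(), plus the
hash-to-file mapping so A can reconstruct C's backup from blocks it
already holds.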
There are algorithms that detect such collisions without having to send
the entire file; rsync, for example, uses them. Say you change one byte
in a large file: rsync will not send the entire file again, it will send
only the changes (there is a rough sketch of how that works after my
sig). If I followed your statement, I would have to stop using rsync.

Kind Regards

James
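P.S. For anyone curious, here is a rough, much-simplified Python sketch of
the two-level checksum trick that rsync-style delta transfer relies on. It
is not rsync's actual implementation or wire format: the block size and
the use of adler32 plus MD5 are just stand-ins for a weak rolling checksum
and a strong checksum, and real rsync updates the weak checksum
incrementally as the window slides rather than recomputing it at every
offset as this sketch does.

    import hashlib
    import zlib

    BLOCK = 4096  # fixed block size; rsync chooses this adaptively

    def signatures(old):
        """Map each old block's weak checksum to (strong hash, block index).

        Weak-checksum clashes between old blocks are ignored for brevity.
        """
        sigs = {}
        for i in range(0, len(old), BLOCK):
            block = old[i:i + BLOCK]
            sigs[zlib.adler32(block)] = (hashlib.md5(block).digest(), i // BLOCK)
        return sigs

    def delta(new, sigs):
        """Emit ('copy', block_index) or ('literal', bytes) instructions."""
        out, literal, pos = [], bytearray(), 0
        while pos < len(new):
            window = new[pos:pos + BLOCK]
            hit = sigs.get(zlib.adler32(window))  # cheap weak check first
            if hit and hashlib.md5(window).digest() == hit[0]:  # then strong check
                if literal:
                    out.append(("literal", bytes(literal)))
                    literal = bytearray()
                out.append(("copy", hit[1]))
                pos += BLOCK
            else:
                literal.append(new[pos])
                pos += 1
        if literal:
            out.append(("literal", bytes(literal)))
        return out

The point is that the receiver only ever sends per-block signatures, and
the sender only sends literal bytes for the regions that do not match, so
a one-byte change in a large file costs roughly one block of literal data
plus the signature exchange rather than a full retransfer.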