On 15 July 2010 20:44, Chris Dennis <cgden...@btinternet.com> wrote:
> On 15/07/10 15:39, James Courtier-Dutton wrote:
>>
>> Take 1 central site PC called "A"
>> Take two remote site PCs called "B" and "C".
>>
>> B has already sent a full backup to A.
>> C wishes to send a full backup to A, but much of the data on C is the same
>> as on B.
>> C generates hashes of its files, and sends only the hashes to A.
>> A responds to C, saying which hashes it has not already got from B.
>> C then sends only a subset of the data, i.e. data that was not already
>> sent by B.
>>
>> Thus, a lot of WAN bandwidth is saved.
>
> The problem is that hash collisions can occur.  Two files with the same hash
> are /probably/ the same file, but probably isn't good enough -- a backup
> system has to be 100% sure.  And the only way to be certain is to get both
> files and compare them byte by byte.
>

There are algorithms that detect collisions without having to transfer the
entire file; rsync, for example, uses them.
Say you change one byte in a large file: rsync will not send the
entire file again, it will send only the changes.
If I followed your statement, I would have to stop using rsync.
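To illustrate the idea (a simplified Python sketch, not rsync's actual code:
the 4 KiB block size and MD5 as the strong hash are my assumptions, and real
rsync slides a rolling checksum over every byte offset rather than checking
only block boundaries), the receiver publishes cheap signatures of what it
already holds, and the sender transfers only the blocks whose strong hash is
not already present -- much like the A/B/C hash exchange described above:

import hashlib
import zlib

BLOCK_SIZE = 4096  # arbitrary block size chosen for this sketch

def weak_sum(block):
    # Cheap checksum; real rsync uses a rolling checksum that can slide
    # over the data one byte at a time.
    return zlib.adler32(block)

def strong_sum(block):
    # Stronger hash used to confirm a weak match before trusting it.
    return hashlib.md5(block).hexdigest()

def receiver_signatures(old_data):
    # The receiver ("A") summarises the blocks it already holds and sends
    # only these small signatures, never the data itself.
    sigs = {}
    for off in range(0, len(old_data), BLOCK_SIZE):
        block = old_data[off:off + BLOCK_SIZE]
        sigs.setdefault(weak_sum(block), []).append((strong_sum(block), off))
    return sigs

def sender_delta(new_data, sigs):
    # The sender ("C") walks its file and emits either a reference to a
    # block the receiver already has, or the literal bytes it lacks.
    delta = []
    for off in range(0, len(new_data), BLOCK_SIZE):
        block = new_data[off:off + BLOCK_SIZE]
        match = None
        for strong, remote_off in sigs.get(weak_sum(block), []):
            # A weak-checksum collision is caught here: the strong hash
            # must also agree before the block is skipped.
            if strong == strong_sum(block):
                match = remote_off
                break
        delta.append(("ref", match) if match is not None else ("data", block))
    return delta

# Change one byte of a 1 MiB file: the delta is one literal block plus
# references for everything else.
old = bytes(1024 * 1024)
new = bytearray(old)
new[500000] = 0xFF
delta = sender_delta(bytes(new), receiver_signatures(old))
literal = sum(1 for kind, _ in delta if kind == "data")
print(literal, "literal block(s) out of", len(delta))

A collision on the weak checksum costs only a re-check of one block's strong
hash, not a retransfer of the whole file.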

Kind Regards

James

