On Thu, Aug 31, 2017 at 10:02 PM, Mike Small <sma...@sdf.org> wrote:
> John Abreau <abre...@gmail.com> writes:
>
>> I've heard of tools using MD5 or SHA1 hashes to identify duplicates, and
>> potential issues with hash collisions causing false positives.
>
> By accident or maliciously? The numbers seem off for accidental
> collisions. An md5 sum is a 32 digit hex number (16 bytes). That gives
> 340282366920938463463374607431768211456 potential hash sums (or does the
> algorithm offer only a smaller subset?). I'm not going to bother to
> compute the probability of a collision. It's a very remote possibility,
> yes? For the malicious case, if someone's able to mess with the hashes
> used by deduplication code in your file system or in your hopefully
> almost as good userland equivalent (which of course must use git in some
> way or another for reasons that are not clear to me) you have unsolvable
> problems.
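
For the accidental case, the probability is easy enough to estimate with
the usual birthday-bound approximation. The sketch below assumes MD5
output behaves like a uniformly random 128-bit value, which is only an
idealization; the function name and the file counts are made up for
illustration.

```python
import math

MD5_BITS = 128
HASH_SPACE = 2 ** MD5_BITS  # number of distinct 128-bit digests


def accidental_collision_probability(n_files: int, space: int = HASH_SPACE) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).

    Assumes digests are uniformly distributed, i.e. no attacker involved.
    """
    exponent = n_files * (n_files - 1) / (2 * space)
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    print("possible digests:", HASH_SPACE)
    for n in (10**6, 10**9, 10**12):
        print(f"{n} files -> p ~= {accidental_collision_probability(n):.3e}")
```

Under that assumption, even a billion distinct files give a collision
probability on the order of 1e-21, so "very remote" seems like a fair
description of the non-malicious case.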

Does git compare only the checksum, or does it also look at file size?
I would think that comparing file size might make it even harder to
get a collision.
The only duplicate checksum that I've ever seen in practice was on
zero-length files.
Zero-length files are, of course, all perfect duplicates of each other... :-)
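
As far as I understand it, git does not compare sizes separately, but the
size is folded into the hash input: a blob's ID is the SHA-1 of a small
header ("blob", the byte length, a NUL) followed by the content. Here is
a rough sketch of that scheme, assuming a traditional SHA-1 repository;
git_blob_id is just a name I picked for illustration, not a git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Sketch of a git blob ID: SHA-1 over 'blob <size>\\0' + content.

    Assumes the classic SHA-1 object format; repositories set up with a
    different hash algorithm would give different IDs.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Every zero-length file hashes to the same blob ID.
    print("empty blob:", git_blob_id(b""))
    # A plain MD5 of empty input is likewise the same for every empty file.
    print("empty md5: ", hashlib.md5(b"").hexdigest())
```

Any two empty files go through exactly the same bytes, so they always
land on the same ID, which matches what I've seen.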

Bill Bogstad
_______________________________________________
Discuss mailing list
Discuss@blu.org
http://lists.blu.org/mailman/listinfo/discuss
