Martin wrote:
On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano
<ste...@remove.this.cybersource.com.au> wrote:
The checksum does look at every byte in each file. Checksumming isn't a
way to avoid looking at each byte of the two files; it is a way of
mapping all the bytes to a single number.

My understanding of the original question was that it asked for a way
to determine whether two files are equal or not. Creating a checksum of
1-n files and comparing those checksums IMHO is a valid way to do that.
I know it's a (one-way) mapping between a (possibly) longer byte
sequence and another one; how does checksumming not take each byte in
the original sequence into account?
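
For concreteness, the checksum approach Martin describes might look
something like the sketch below (the file names and chunk size are
just placeholders, not anything from the thread):

    import hashlib

    def file_md5(path, chunk_size=64 * 1024):
        # Feed the file to MD5 in chunks so large files need not fit
        # in memory; return the hex digest.
        digest = hashlib.md5()
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
        return digest.hexdigest()

    # Files whose digests differ are certainly different; files whose
    # digests match are only (very) probably the same.
    maybe_equal = file_md5('a.dat') == file_md5('b.dat')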

The fact that two MD5 hashes are equal does not mean that the sources they were generated from are equal. To establish that, you must still perform a byte-by-byte comparison, which is much less work for the processor than generating an MD5 or SHA hash.
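
A minimal sketch of such a byte-by-byte (block-by-block, really)
comparison; the chunk size and the size short-circuit are just
reasonable defaults, not anything prescribed above:

    import os

    def files_equal(path_a, path_b, chunk_size=64 * 1024):
        # Cheap short-circuit: different sizes means different contents.
        if os.path.getsize(path_a) != os.path.getsize(path_b):
            return False
        with open(path_a, 'rb') as fa, open(path_b, 'rb') as fb:
            while True:
                block_a = fa.read(chunk_size)
                block_b = fb.read(chunk_size)
                if block_a != block_b:
                    return False
                if not block_a:      # both files exhausted together
                    return True

The standard library's filecmp.cmp(path_a, path_b, shallow=False) does
essentially the same thing.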

If you insist on using a hashing algorithm to determine the equivalence of two files, you will eventually realise that it is a flawed plan, because sooner or later you will find two files with different contents that nonetheless hash to the same value.

The more files you test with, the quicker you will discover this basic truth.

This is not complex; it's a simple fact about how hashing algorithms work.

  n

--
http://mail.python.org/mailman/listinfo/python-list
