Martin wrote:
On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano
<ste...@remove.this.cybersource.com.au> wrote:
The checksum does look at every byte in each file. Checksumming isn't a
way to avoid looking at each byte of the two files; it is a way of
mapping all the bytes to a single number.

My understanding of the original question was that it asked for a way
to determine whether two files are equal or not. Creating a checksum of
1-n files and comparing those checksums IMHO is a valid way to do that.
I know it's a (one-way) mapping between a (possibly) longer byte
sequence and another one; how does checksumming not take each byte in
the original sequence into account?
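
For concreteness, the checksum approach Martin describes might look
something like the sketch below (the file names and chunk size are
just placeholders, not anything from the thread):

    import hashlib

    def file_md5(path, chunk_size=64 * 1024):
        # Feed the file to MD5 in chunks so large files need not fit
        # in memory; return the hex digest.
        digest = hashlib.md5()
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
        return digest.hexdigest()

    # Files whose digests differ are certainly different; files whose
    # digests match are only (very) probably the same.
    maybe_equal = file_md5('a.dat') == file_md5('b.dat')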

The fact that two MD5 hashes are equal does not mean that the sources they were generated from are equal. To establish that, you must still perform a byte-by-byte comparison, which is much less work for the processor than generating an MD5 or SHA hash.
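
A minimal sketch of such a byte-by-byte (block-by-block, really)
comparison; the chunk size and the size short-circuit are just
reasonable defaults, not anything prescribed above:

    import os

    def files_equal(path_a, path_b, chunk_size=64 * 1024):
        # Cheap short-circuit: different sizes means different contents.
        if os.path.getsize(path_a) != os.path.getsize(path_b):
            return False
        with open(path_a, 'rb') as fa, open(path_b, 'rb') as fb:
            while True:
                block_a = fa.read(chunk_size)
                block_b = fb.read(chunk_size)
                if block_a != block_b:
                    return False
                if not block_a:      # both files exhausted together
                    return True

The standard library's filecmp.cmp(path_a, path_b, shallow=False) does
essentially the same thing.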

If you insist on using a hashing algorithm to determine the equivalence of two files, you will eventually realise that it is a flawed plan, because sooner or later you will find two files with different contents that nonetheless hash to the same value.

The more files you test with, the quicker you will discover this basic truth.

This is not complex; it's a simple fact about how hashing algorithms work.

  n

--
http://mail.python.org/mailman/listinfo/python-list
