On Wed, 15 Apr 2009 07:54:20 +0200, Martin wrote:

>> Perhaps I'm being dim, but how else are you going to decide if two
>> files are the same unless you compare the bytes in the files?
>
> I'd say checksums, just about every download relies on checksums to
> verify you do have indeed the same file.

The checksum does look at every byte in each file. Checksumming isn't a
way to avoid looking at each byte of the two files; it is a way of
mapping all the bytes to a single number.

>> You could hash them and compare the hashes, but that's a lot more work
>> than just comparing the two byte streams.
>
> hashing is not exactly much work in its simplest form it's 2 lines per
> file.

Hashing is a *lot* more work than just comparing two bytes. The MD5
checksum has been specifically designed to be fast and compact, and the
algorithm is still complicated:

http://en.wikipedia.org/wiki/MD5#Pseudocode

The reference implementation is here:

http://www.fastsum.com/rfc1321.php#APPENDIXA

SHA-1 is even more complicated still:

http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-1_pseudocode

Just because *calling* some checksum function is easy doesn't make the
checksum function itself simple. They do a LOT more work than just a
simple comparison between bytes, and that's totally unnecessary work if
you are making a one-off comparison of two local files.

-- 
Steven
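
P.S. To make the comparison concrete, here is a rough sketch of the two
approaches side by side. It is only an illustration: the function names
(same_bytes, md5_of, same_hash) and the 64 KiB chunk size are made up for
this example and are not taken from anyone's actual code in the thread.

```python
import hashlib

CHUNK = 64 * 1024  # arbitrary read size, chosen only for illustration


def same_bytes(path_a, path_b, chunk=CHUNK):
    """Compare two files directly, stopping at the first difference."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if a != b:
                return False
            if not a:  # both files reached EOF at the same point
                return True


def md5_of(path, chunk=CHUNK):
    """Hash a whole file; every byte is still read and fed through MD5."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def same_hash(path_a, path_b):
    """Compare two files by comparing their MD5 digests."""
    return md5_of(path_a) == md5_of(path_b)
```

Both versions read the files, but the direct comparison can stop at the
first differing chunk, while the hashing version always reads both files
to the end and runs the digest over every byte. For the direct approach
the standard library already offers filecmp.cmp(a, b, shallow=False).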