Re: Calculate sha1 hash of a binary file
LaundroMat [EMAIL PROTECTED] writes:

Hi - I'm trying to calculate unique hash values for binary files, independent of their location and filename, and I was wondering whether I'm going in the right direction. Basically, the hash values are calculated thusly:

    f = open('binaryfile.bin')
    import hashlib
    h = hashlib.sha1()
    h.update(f.read())
    hash = h.hexdigest()
    f.close()

A quick try-out shows that effectively, after renaming a file, its hash remains the same as it was before. I have my doubts however as to the usefulness of this. As f.read() does not seem to read until the end of the file (for a 3.3MB file only a string of 639 bytes is being returned, perhaps a 00-byte counts as EOF?), is there a high danger of collision? Are there better ways of calculating hash values of binary files?

Apart from opening the file in binary mode, I would consider reading the file and updating the hash in chunks of e.g. 512 KB. The above code is probably going to perform horribly for sufficiently large files, since you try to read the entire file into memory.

Best,
-Nikolaus

--
»It is not worth an intelligent man's time to be in the majority. By definition, there are already enough people to do that.« -J.H. Hardy
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
-- http://mail.python.org/mailman/listinfo/python-list
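The two suggestions in this reply (binary mode, and feeding the hash in chunks) could be sketched roughly as follows. This is only an illustration: the function name `sha1_of_file` and the constant `CHUNK_SIZE` are made up here, and 512 KB is simply the figure mentioned in the message, not a tuned value.

```python
import hashlib

# Chunk size suggested in the message above; an assumption, not a tuned value.
CHUNK_SIZE = 512 * 1024


def sha1_of_file(path, chunk_size=CHUNK_SIZE):
    """Return the SHA-1 hex digest of the file at `path`.

    The file is opened in binary mode and read piece by piece, so memory
    use stays around `chunk_size` bytes regardless of the file size.
    """
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # An empty result means end of file.
                break
            h.update(chunk)
    return h.hexdigest()
```

Calling `update()` repeatedly with consecutive chunks gives the same digest as a single `update()` on the whole content, so the result does not depend on the chunk size chosen.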
Re: Calculate sha1 hash of a binary file
Thanks all!
Re: Calculate sha1 hash of a binary file
I did some testing, and calculating the hash value of a 1 GB file does take some time using this method. Would it be wise to calculate the hash value based on, say, the first MB only? Is there a much larger chance of collision this way (I suppose not)? If it's helpful, the files would primarily be media (video) files.

Thanks,
Mathieu
Re: Calculate sha1 hash of a binary file
LaundroMat [EMAIL PROTECTED] writes:

Would it be wise to calculate the hash value based on, say, the first MB only? Is there a much larger chance of collision this way (I suppose not)? If it's helpful, the files would primarily be media (video) files.

The usual purpose of using this type of hash is to detect corruption and/or tampering. So you want to hash the whole file, not just part of it. If you're not worried about intentional tampering, md5 should be somewhat faster than sha, but there are some attacks against it and you shouldn't use it for high security applications where you want security against forgery. It should still have almost no chance of accidental collisions.
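Switching between md5 and sha1 as discussed here does not need two separate code paths. A minimal sketch, assuming a hypothetical helper named `hex_digest_of_file` that takes the algorithm name as a parameter and passes it to `hashlib.new()`:

```python
import hashlib


def hex_digest_of_file(path, algorithm='sha1', chunk_size=512 * 1024):
    """Hash the whole file at `path` with the named algorithm.

    `algorithm` is any name accepted by hashlib.new(), e.g. 'md5' or 'sha1'.
    The helper name and its defaults are illustrative assumptions.
    """
    h = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()
```

Whether md5 is actually faster on a given machine is something to measure rather than assume; for large files on slow disks, reading the data may well dominate the run time.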
Re: Calculate sha1 hash of a binary file
On Aug 7, 2:22 pm, Paul Rubin http://[EMAIL PROTECTED] wrote:

LaundroMat [EMAIL PROTECTED] writes:

Would it be wise to calculate the hash value based on, say, the first MB only? Is there a much larger chance of collision this way (I suppose not)? If it's helpful, the files would primarily be media (video) files.

The usual purpose of using this type of hash is to detect corruption and/or tampering. So you want to hash the whole file, not just part of it. If you're not worried about intentional tampering, md5 should be somewhat faster than sha, but there are some attacks against it and you shouldn't use it for high security applications where you want security against forgery. It should still have almost no chance of accidental collisions.

Well, what I really intend to do is store the file hashes, in order to be able to recognise the files later on when they are stored in another location, and under another filename. It's not so much tampering I'm concerned with.
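The use case described here (store hashes now, recognise the same content later under a different path) could look something like the sketch below. All names (`sha1_of_file`, `build_index`, `find_known`) are hypothetical, the index is a plain in-memory dict rather than any particular storage, and only the files directly inside one directory are indexed.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=512 * 1024):
    """Return the SHA-1 hex digest of a file, read in binary mode in chunks."""
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def build_index(directory):
    """Map digest -> path for every regular file directly inside `directory`."""
    index = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            index[sha1_of_file(path)] = path
    return index


def find_known(path, index):
    """Return the indexed path whose content matches `path`, or None."""
    return index.get(sha1_of_file(path))
```

Because the digest depends only on the bytes in the file, a copy that has been moved or renamed maps to the same key. If two indexed files have identical content, this sketch keeps only the last one seen.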
Calculate sha1 hash of a binary file
Hi - I'm trying to calculate unique hash values for binary files, independent of their location and filename, and I was wondering whether I'm going in the right direction. Basically, the hash values are calculated thusly:

    f = open('binaryfile.bin')
    import hashlib
    h = hashlib.sha1()
    h.update(f.read())
    hash = h.hexdigest()
    f.close()

A quick try-out shows that effectively, after renaming a file, its hash remains the same as it was before. I have my doubts however as to the usefulness of this. As f.read() does not seem to read until the end of the file (for a 3.3MB file only a string of 639 bytes is being returned, perhaps a 00-byte counts as EOF?), is there a high danger of collision? Are there better ways of calculating hash values of binary files?

Thanks in advance,
Mathieu
Re: Calculate sha1 hash of a binary file
LaundroMat wrote:

Hi - I'm trying to calculate unique hash values for binary files, independent of their location and filename, and I was wondering whether I'm going in the right direction. Basically, the hash values are calculated thusly:

    f = open('binaryfile.bin')
    import hashlib
    h = hashlib.sha1()
    h.update(f.read())
    hash = h.hexdigest()
    f.close()

A quick try-out shows that effectively, after renaming a file, its hash remains the same as it was before. I have my doubts however as to the usefulness of this. As f.read() does not seem to read until the end of the file (for a 3.3MB file only a string of 639 bytes is being returned, perhaps a 00-byte counts as EOF?), is there a high danger of collision?

Guess: you're running on Windows? You need to open binary files by using open(filename, 'rb') to indicate that Windows shouldn't treat certain characters -- specifically character 26 -- as special.

TJG
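One way to check whether a read really reached the end of the file is to compare the number of bytes returned with the size the operating system reports. The sketch below assumes two hypothetical helpers, `read_all_bytes` and `read_is_complete`; it opens the file with 'rb' as recommended above, so character 26 (Ctrl-Z) and zero bytes are returned like any other byte.

```python
import os


def read_all_bytes(path):
    """Read the entire file in binary mode and return its content as bytes."""
    with open(path, 'rb') as f:
        return f.read()


def read_is_complete(path):
    """Return True if a binary-mode read yields as many bytes as the file holds."""
    return len(read_all_bytes(path)) == os.path.getsize(path)
```

Reading everything at once is fine for a quick diagnostic like this, but for large files the hash itself is better fed in chunks.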