Re: Question regarding checksuming of a file
A script I use for comparing files by MD5 sum uses the following function, which you may find helpful:

    def getSum(self):
        md5Sum = md5.new()
        f = open(self.filename, 'rb')
        for line in f:
            md5Sum.update(line)
        f.close()
        return md5Sum.hexdigest()

-- http://mail.python.org/mailman/listinfo/python-list
Re: Question regarding checksuming of a file
Ant [EMAIL PROTECTED] writes:

> def getSum(self):
>     md5Sum = md5.new()
>     f = open(self.filename, 'rb')
>     for line in f:
>         md5Sum.update(line)
>     f.close()
>     return md5Sum.hexdigest()

This should work, but there is one hazard if the file is very large and is not a text file. You're trying to read one line at a time from it, which means a contiguous string of characters up to a newline. Depending on the file contents, that could mean gigabytes which get read into memory. So it's best to read a fixed-size amount in each operation, e.g. (untested):

    def getblocks(f, blocksize=1024):
        while True:
            s = f.read(blocksize)
            if not s:
                return
            yield s

then change "for line in f" to "for line in f.getblocks()".

I actually think an iterator like the above should be added to the stdlib, since the "for line in f" idiom is widely used and sometimes inadvisable, like the fixed-size buffers in those old C programs that led to buffer overflow bugs.
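
A minimal sketch of the fixed-size-read idea for current Python versions. It assumes hashlib.md5 in place of the old md5 module (which was removed in Python 3), and it calls the generator as getblocks(f), since file objects have no getblocks method. The name file_md5 and the 64 KiB default block size are illustrative choices, not something from the thread.

```python
import hashlib


def getblocks(f, blocksize=64 * 1024):
    """Yield successive chunks of at most `blocksize` bytes from a binary file object."""
    while True:
        s = f.read(blocksize)
        if not s:          # empty bytes object means end of file
            return
        yield s


def file_md5(path, blocksize=64 * 1024):
    """Return the hex MD5 digest of the file at `path`, reading it chunk by chunk."""
    md5sum = hashlib.md5()
    with open(path, 'rb') as f:      # binary mode: no newline translation
        for block in getblocks(f, blocksize):
            md5sum.update(block)
    return md5sum.hexdigest()
```

Memory use is bounded by the block size no matter how long a "line" in the file happens to be.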
Re: Question regarding checksuming of a file
When I run the script, I get an error that the file object does not have the attribute getblocks. Did you mean this instead?

    def getblocks(f, blocksize=1024):
        while True:
            s = f.read(blocksize)
            if not s:
                return
            yield s

    def getsum(self):
        md5sum = md5.new()
        f = open(self.file_name, 'rb')
        for line in getblocks(f):
            md5sum.update(line)
        f.close()
        return md5sum.hexdigest()
Re: Question regarding checksuming of a file
On Sunday 14 May 2006 20:51, Andrew Robert wrote:

> def getblocks(f, blocksize=1024):
>     while True:
>         s = f.read(blocksize)
>         if not s:
>             return
>         yield s

This won't work. The following will:

    def getblocks(f, blocksize=1024):
        while True:
            s = f.read(blocksize)
            if not s:
                break
            yield s

--- Heiko.
Re: Question regarding checksuming of a file
Andrew Robert [EMAIL PROTECTED] writes:

> When I run the script, I get an error that the file object does not have the attribute getblocks.

Whoops, yes, you have to call getblocks(f). Also, Heiko says you can't use return to break out of the generator; I thought you could, but maybe I got confused.
Re: Question regarding checksuming of a file
On Sunday 14 May 2006 22:29, Paul Rubin wrote:

> Andrew Robert [EMAIL PROTECTED] writes:
>> When I run the script, I get an error that the file object does not have the attribute getblocks.
>
> Whoops, yes, you have to call getblocks(f). Also, Heiko says you can't use return to break out of the generator; I thought you could, but maybe I got confused.

Yeah, you can. You can't use "return arg" in a generator (of course, this raises a SyntaxError), but you can use a bare return, which ends the generator by raising StopIteration. So, it wasn't you who was confused... ;-)

--- Heiko.
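
A small sketch of the point about return inside a generator: a bare return and a break out of the loop both end the iteration in the same way. The function names and sample data are made up for illustration. Note that the SyntaxError for "return arg" describes the Python 2 of this thread; since Python 3.3 a generator may return a value, which is attached to the StopIteration exception.

```python
def with_return(items):
    """Stop at the first None using a bare return."""
    for item in items:
        if item is None:
            return          # generator finishes; the caller's loop just ends
        yield item


def with_break(items):
    """Stop at the first None by breaking out of the loop."""
    for item in items:
        if item is None:
            break           # falls off the end of the function, same effect
        yield item


data = [1, 2, None, 3]
from_return = list(with_return(data))
from_break = list(with_break(data))
```

Both lists contain only the items before the None.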
Question regarding checksuming of a file
Good evening,

I need to generate checksums of a file, store the value in a variable, and pass it along for later comparison. The MD5 module would seem to do the trick but I'm sketchy on implementation. The nearest I can see would be

    import md5
    m=md5.new()
    contents = open(self.file_name,rb).read()
    check=md5.update(contents)

However this does not appear to be actually returning the checksum. Does anyone have insight into where I am going wrong?

Any help you can provide would be greatly appreciated.

Thanks
Re: Question regarding checksuming of a file
Andrew Robert wrote:

> m=md5.new()
> contents = open(self.file_name,rb).read()
> check=md5.update(contents)
>
> However this does not appear to be actually returning the checksum.

the docs are your friend, use them. hint: first you eat, then you...

http://docs.python.org/lib/module-md5.html

--
Edward Elliott
UC Berkeley School of Law (Boalt Hall)
complangpython at eddeye dot net
Re: Question regarding checksuming of a file
Actually, I think I got it but would like to confirm this looks right.

    import md5
    checksum = md5.new()
    mfn = open(self.file_name, 'r')
    for line in mfn.readlines():
        checksum.update(line)
    mfn.close()
    cs = checksum.hexdigest()
    print cs

The value cs should contain the MD5 checksum or did I miss something?

Any help you can provide would be greatly appreciated.

Thanks
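
A short sketch of why feeding the hash line by line gives the same checksum as hashing the whole content: successive update() calls are equivalent to a single call on the concatenated data. It assumes hashlib.md5 in place of the old md5 module, and the sample bytes are made up. One caveat is that the data should be read in binary mode ('rb'): text mode can translate line endings on Windows, and in Python 3 it yields str objects, which update() rejects.

```python
import hashlib

# Hypothetical file content, held in memory to keep the example self-contained.
data = b"first line\nsecond line\nlast line without newline"

# Hash everything in one call.
whole = hashlib.md5(data).hexdigest()

# Hash the same bytes one line at a time.
piecewise = hashlib.md5()
for line in data.splitlines(True):   # True keeps the line endings
    piecewise.update(line)
piecewise_hex = piecewise.hexdigest()
```

The two hex digests are identical, so the loop in the message above is correct as long as the file is opened in binary mode.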
Re: Question regarding checksuming of a file
In article [EMAIL PROTECTED], Andrew Robert [EMAIL PROTECTED] wrote:

> Good evening,
>
> I need to generate checksums of a file, store the value in a variable, and pass it along for later comparison. The MD5 module would seem to do the trick but I'm sketchy on implementation. The nearest I can see would be
>
> import md5
> m=md5.new()
> contents = open(self.file_name,rb).read()
> check=md5.update(contents)
>
> However this does not appear to be actually returning the checksum. Does anyone have insight into where I am going wrong?

After calling update(), you need to call digest(). update() only updates the internal state of the md5 state machine; digest() returns the hash.

Also, for the code above, it's m.update(), not md5.update(). update() is a method of an md5 instance object, not of the md5 module itself.

Lastly, the md5 algorithm is known to be weak. If you're doing md5 to maintain compatibility with some pre-existing implementation, that's one thing. But if you're starting something new from scratch, I would suggest using SHA-1 instead (see the sha module). SHA-1 is much stronger cryptographically than md5. The Python API is virtually identical, so it's no added work to switch to the stronger algorithm.
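
A minimal sketch of the update()/digest() distinction, assuming hashlib (the current home of both md5 and sha1) rather than the old md5 and sha modules. The variable names and the input b"abc" are illustrative only.

```python
import hashlib

m = hashlib.md5()
result = m.update(b"abc")     # update() only feeds data in; it returns None

digest_bytes = m.digest()     # raw hash as a bytes object
digest_hex = m.hexdigest()    # same hash as a hexadecimal string

# Switching algorithms only changes the constructor; the methods are the same.
s = hashlib.sha1()
s.update(b"abc")
sha1_hex = s.hexdigest()
```

Assigning the return value of update() to a variable, as in the quoted code, therefore stores None rather than the checksum.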
Re: Question regarding checksuming of a file
Roy Smith wrote:

>> However this does not appear to be actually returning the checksum. Does anyone have insight into where I am going wrong?
>
> After calling update(), you need to call digest(). update() only updates the internal state of the md5 state machine; digest() returns the hash.
>
> Also, for the code above, it's m.update(), not md5.update(). update() is a method of an md5 instance object, not of the md5 module itself.
>
> Lastly, the md5 algorithm is known to be weak. If you're doing md5 to maintain compatibility with some pre-existing implementation, that's one thing. But if you're starting something new from scratch, I would suggest using SHA-1 instead (see the sha module). SHA-1 is much stronger cryptographically than md5. The Python API is virtually identical, so it's no added work to switch to the stronger algorithm.

Hi Roy,

This is strictly for checking whether a file was corrupted during transit over an MQSeries channel. The check is not intended to be used for crypto purposes.
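
A rough sketch of the corruption check described here: compute a checksum before sending, carry it along with the data, and compare it with a checksum computed on arrival. The helper names checksum and arrived_intact are hypothetical, the payloads are made up, and nothing here touches MQSeries itself; hashlib.md5 is assumed in place of the old md5 module.

```python
import hashlib


def checksum(payload):
    """Hex MD5 digest of a bytes payload (hypothetical helper)."""
    return hashlib.md5(payload).hexdigest()


def arrived_intact(payload, expected):
    """True if the received payload hashes to the checksum computed by the sender."""
    return checksum(payload) == expected


sent = b"message body"
expected = checksum(sent)          # computed on the sending side

ok = arrived_intact(b"message body", expected)    # unchanged in transit
bad = arrived_intact(b"message bodx", expected)   # one byte altered
```

This detects accidental corruption; it is not a defence against deliberate tampering, which matches the stated purpose.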