Re: Question regarding checksuming of a file

2006-05-14 Thread Ant
A script I use for comparing files by MD5 sum uses the following
function, which you may find helps:

def getSum(self):
    md5Sum = md5.new()

    f = open(self.filename, 'rb')

    for line in f:
        md5Sum.update(line)

    f.close()

    return md5Sum.hexdigest()

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Question regarding checksuming of a file

2006-05-14 Thread Paul Rubin
Ant writes:
> def getSum(self):
>     md5Sum = md5.new()
>     f = open(self.filename, 'rb')
>     for line in f:
>         md5Sum.update(line)
>     f.close()
>     return md5Sum.hexdigest()

This should work, but there is one hazard if the file is very large
and is not a text file.  Iterating over the file reads one line at a
time, that is, a contiguous run of bytes up to the next newline.
Depending on the file contents, a single "line" could be gigabytes
long, all of which gets read into memory at once.  So it's best to read
a fixed-size block in each operation, e.g. (untested):

    def getblocks(f, blocksize=1024):
        while True:
            s = f.read(blocksize)
            if not s: return
            yield s

then change "for line in f" to "for line in f.getblocks()".

I actually think an iterator like the above should be added to the
stdlib, since the "for line in f" idiom is widely used and sometimes
inadvisable, much like the fixed-size buffers in old C programs that
led to buffer overflow bugs.


Re: Question regarding checksuming of a file

2006-05-14 Thread Andrew Robert

When I run the script, I get an error that the file object does not have
the attribute getblocks.

Did you mean this instead?

def getblocks(f, blocksize=1024):
    while True:
        s = f.read(blocksize)
        if not s: return
        yield s

def getsum(self):
    md5sum = md5.new()
    f = open(self.file_name, 'rb')
    for line in getblocks(f):
        md5sum.update(line)
    f.close()
    return md5sum.hexdigest()



Re: Question regarding checksuming of a file

2006-05-14 Thread Heiko Wundram
On Sunday, 14 May 2006 at 20:51, Andrew Robert wrote:
> def getblocks(f, blocksize=1024):
>     while True:
>         s = f.read(blocksize)
>         if not s: return
>         yield s

This won't work. The following will:

def getblocks(f,blocksize=1024):
    while True:
        s = f.read(blocksize)
        if not s: break
        yield s

--- Heiko.


Re: Question regarding checksuming of a file

2006-05-14 Thread Paul Rubin
Andrew Robert writes:
> When I run the script, I get an error that the file object does not have
> the attribute getblocks.

Whoops, yes, you have to call getblocks(f).  Also, Heiko says you can't
use return to break out of the generator; I thought you could, but
maybe I got confused.


Re: Question regarding checksuming of a file

2006-05-14 Thread Heiko Wundram
On Sunday, 14 May 2006 at 22:29, Paul Rubin wrote:
> Andrew Robert writes:
>> When I run the script, I get an error that the file object does not have
>> the attribute getblocks.
>
> Whoops, yes, you have to call getblocks(f).  Also, Heiko says you can't
> use return to break out of the generator; I thought you could, but
> maybe I got confused.

Yeah, you can. You can't use "return arg" in a generator (that raises a
SyntaxError, of course), but a bare return is fine: it ends the generator
by raising StopIteration. So, it wasn't you who was confused... ;-)

--- Heiko.
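
A small sketch of the bare-return behaviour described above.  The
generator first_n is a hypothetical example, not code from the thread.
The SyntaxError for "return arg" applies to the Python versions of the
thread's era; Python 3.3 and later accept a return value in a generator,
so only the bare form is shown here:

```python
def first_n(items, n):
    """Yield at most n items, using a bare return to stop early."""
    count = 0
    for item in items:
        if count >= n:
            return  # bare return: the generator simply ends here
        yield item
        count += 1


print(list(first_n("abcdef", 2)))  # ['a', 'b']

# Once the generator has finished, a further next() raises StopIteration.
gen = first_n("abc", 1)
next(gen)
try:
    next(gen)
except StopIteration:
    print("generator finished")
```

Inside a "while True" loop with nothing after it, "return" and "break"
end the generator in the same way, so both versions of getblocks
behave identically.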


Question regarding checksuming of a file

2006-05-13 Thread Andrew Robert
Good evening,

I need to generate checksums of a file, store the value in a variable,
and pass it along for later comparison.

The MD5 module would seem to do the trick but I'm sketchy on implementation.


The nearest I can see would be

import md5

m=md5.new()
contents = open(self.file_name,rb).read()
check=md5.update(contents)

However this does not appear to be actually returning the checksum.

Does anyone have insight into where I am going wrong?

Any help you can provide would be greatly appreciated.

Thanks


Re: Question regarding checksuming of a file

2006-05-13 Thread Edward Elliott
Andrew Robert wrote:

> m=md5.new()
> contents = open(self.file_name,rb).read()
> check=md5.update(contents)
>
> However this does not appear to be actually returning the checksum.

The docs are your friend; use them.  Hint: first you eat, then you...
http://docs.python.org/lib/module-md5.html

-- 
Edward Elliott
UC Berkeley School of Law (Boalt Hall)
complangpython at eddeye dot net


Re: Question regarding checksuming of a file

2006-05-13 Thread Andrew Robert
Actually, I think I got it but would like to confirm this looks right.

import md5
checksum = md5.new()
mfn = open(self.file_name, 'r')
for line in mfn.readlines():
    checksum.update(line)
mfn.close()
cs = checksum.hexdigest()
print cs

The value cs should contain the MD5 checksum, or did I miss something?

Any help you can provide would be greatly appreciated.

Thanks


Re: Question regarding checksuming of a file

2006-05-13 Thread Roy Smith
Andrew Robert wrote:

> Good evening,
>
> I need to generate checksums of a file, store the value in a variable,
> and pass it along for later comparison.
>
> The MD5 module would seem to do the trick but I'm sketchy on implementation.
>
>
> The nearest I can see would be
>
> import md5
>
> m=md5.new()
> contents = open(self.file_name,rb).read()
> check=md5.update(contents)
>
> However this does not appear to be actually returning the checksum.
>
> Does anyone have insight into where I am going wrong?

After calling update(), you need to call digest().  update() only updates
the internal state of the md5 state machine; digest() returns the hash.
Also, for the code above, it's m.update(), not md5.update().  update() is a
method of an md5 instance object, not of the md5 module itself.

Lastly, the md5 algorithm is known to be weak.  If you're using md5 to
maintain compatibility with some pre-existing implementation, that's one
thing.  But if you're starting something new from scratch, I would suggest
using SHA-1 instead (see the sha module).  SHA-1 is much stronger
cryptographically than md5.  The Python API is virtually identical, so it's
no added work to switch to the stronger algorithm.
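
To illustrate the update-then-digest sequence, here is a rough sketch
using hashlib, which is assumed here as a stand-in for the separate md5
and sha modules mentioned in this thread.  The input b"abc" is just a
well-known test string, and hexdigest() is used instead of digest() to
get a printable result:

```python
import hashlib

contents = b"abc"

m = hashlib.md5()            # hash object; the thread's md5.new() plays this role
result = m.update(contents)  # update() only feeds data in; it returns None
checksum = m.hexdigest()     # the digest call is what returns the checksum

print(result)    # None
print(checksum)  # 900150983cd24fb0d6963f7d28e17f72

# Feeding the data in pieces gives the same digest as feeding it all at once.
piecewise = hashlib.md5()
piecewise.update(b"a")
piecewise.update(b"bc")
print(piecewise.hexdigest() == checksum)  # True

# Switching algorithm changes only the constructor.
s = hashlib.sha1()
s.update(contents)
print(s.hexdigest())  # a9993e364706816aba3e25717850c26c9cd0d89d
```

This is why "check = md5.update(contents)" in the original question
never produced a checksum: update() is called on the instance and
returns nothing useful by itself.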


Re: Question regarding checksuming of a file

2006-05-13 Thread Andrew Robert
Roy Smith wrote:

>> However this does not appear to be actually returning the checksum.
>>
>> Does anyone have insight into where I am going wrong?
>
> After calling update(), you need to call digest().  update() only updates
> the internal state of the md5 state machine; digest() returns the hash.
> Also, for the code above, it's m.update(), not md5.update().  update() is a
> method of an md5 instance object, not of the md5 module itself.
>
> Lastly, the md5 algorithm is known to be weak.  If you're using md5 to
> maintain compatibility with some pre-existing implementation, that's one
> thing.  But if you're starting something new from scratch, I would suggest
> using SHA-1 instead (see the sha module).  SHA-1 is much stronger
> cryptographically than md5.  The Python API is virtually identical, so it's
> no added work to switch to the stronger algorithm.

Hi Roy,

This is strictly for checking if a file was corrupted during transit
over an MQSeries channel.

The check is not intended to be used for crypto purposes.