[issue17436] pass a file object to hashlib.update
STINNER Victor added the comment:

> obj.update(buffer[:size])

This code makes a needless memory copy; obj.update(memoryview(buffer)[:size]) can be used instead.

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17436 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
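To illustrate the point about the copy, here is a small sketch. The 8-byte buffer and the value of size are made up for the example and do not come from the thread. Slicing a bytearray builds a new bytearray holding a copy of the bytes, while slicing a memoryview yields another view onto the same memory; both feed the same data to the hash object.

```python
import hashlib

buffer = bytearray(b"abcdefgh")   # illustrative buffer, not from the original post
size = 5                          # pretend readinto() filled only 5 bytes

copied = buffer[:size]            # new bytearray: the 5 bytes are copied
view = memoryview(buffer)[:size]  # view onto the same memory: no copy

# Both spellings hand the same bytes to the hash object.
digest_copy = hashlib.sha256(copied).hexdigest()
digest_view = hashlib.sha256(view).hexdigest()

# The view follows later changes to the underlying buffer; the copy does not.
buffer[0] = ord("X")
first_bytes = (bytes(copied[:1]), bytes(view[:1]))
```

The last two lines are only there to make the sharing visible: the copy still starts with the original byte, while the view sees the modified one.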
[issue17436] pass a file object to hashlib.update
Changes by anatoly techtonik techto...@gmail.com:

--
title: pass a string to hashlib.update -> pass a file object to hashlib.update
[issue17436] pass a file object to hashlib.update
anatoly techtonik added the comment:

Otherwise you need to repeat this code.

    import hashlib

    def filehash(filepath):
        blocksize = 64*1024
        sha = hashlib.sha256()
        with open(filepath, 'rb') as fp:
            # Read the file in fixed-size blocks until EOF.
            while True:
                data = fp.read(blocksize)
                if not data:
                    break
                sha.update(data)
        return sha.hexdigest()
[issue17436] pass a file object to hashlib.update
STINNER Victor added the comment:

> It makes sense to allow hashlib.update accept file like object to read from.

Not update directly, but I agree that a helper would be convenient.

Here is another proposal, using an unbuffered file and readinto() with a bytearray. It should be faster, but I have not benchmarked it. I also wrote two functions, because sometimes you have a file object rather than a file path.

---
import hashlib, sys

def hash_readfile_obj(obj, fp, buffersize=64 * 1024):
    # Reuse one preallocated buffer for every read.
    buffer = bytearray(buffersize)
    while True:
        size = fp.readinto(buffer)
        if not size:
            break
        if size == buffersize:
            obj.update(buffer)
        else:
            # Short read: hash only the bytes that were filled.
            obj.update(buffer[:size])

def hash_readfile(obj, filepath, buffersize=64 * 1024):
    # buffering=0 opens the file unbuffered (raw).
    with open(filepath, 'rb', buffering=0) as fp:
        hash_readfile_obj(obj, fp, buffersize)

def file_sha256(filepath):
    sha = hashlib.sha256()
    hash_readfile(sha, filepath)
    return sha.hexdigest()

for name in sys.argv[1:]:
    print("%s %s" % (file_sha256(name), name))
---

readfile() and readfile_obj() should be methods of a hash object.

--
nosy: +haypo
[issue17436] pass a file object to hashlib.update
Changes by Jesús Cea Avión j...@jcea.es:

--
nosy: +jcea
[issue17436] pass a file object to hashlib.update
anatoly techtonik added the comment:

Why would unbuffered be faster?
[issue17436] pass a file object to hashlib.update
anatoly techtonik added the comment:

Even though I mentioned passing a file object in the title of this bug report, what I really need is the following API:

    hexhash = hashlib.sha256().readfile(filename).hexdigest()
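For reference, the chained call requested above could be approximated today with a thin wrapper. This is only a sketch: FileHasher and its readfile() method are hypothetical names invented for this example, since hashlib hash objects do not provide readfile(). The temporary file exists only to make the example runnable.

```python
import hashlib
import os
import tempfile

class FileHasher:
    """Hypothetical wrapper; hashlib objects themselves have no readfile()."""

    def __init__(self, name="sha256"):
        self._hash = hashlib.new(name)

    def readfile(self, filename, blocksize=64 * 1024):
        # Feed the file to the hash in fixed-size blocks.
        with open(filename, "rb") as fp:
            while True:
                data = fp.read(blocksize)
                if not data:
                    break
                self._hash.update(data)
        return self  # returning self is what makes chaining possible

    def hexdigest(self):
        return self._hash.hexdigest()

# Demo on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    filename = tmp.name
try:
    hexhash = FileHasher("sha256").readfile(filename).hexdigest()
finally:
    os.remove(filename)
```

The only design point is that readfile() returns the wrapper itself rather than None, which is what allows .hexdigest() to be called on the result.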
[issue17436] pass a file object to hashlib.update
STINNER Victor added the comment:

2013/3/16 anatoly techtonik rep...@bugs.python.org:
> Why unbuffered will be faster??

Well, I'm not sure that it is faster. But I would prefer to avoid buffering if it is not needed.
[issue17436] pass a file object to hashlib.update
anatoly techtonik added the comment:

I don't get that. I thought that buffered reading should be faster, although I agree that the OS should handle this better. Why is buffering turned on by default then?

(I miss the ability to fork discussions from the tracker, but there is no choice.)
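Since nobody in the thread measured it, here is a rough timing sketch one could use to compare the two approaches. The helper names hash_buffered and hash_unbuffered, the 4 MiB test file and the repeat count are all assumptions made for this example. The numbers will vary with the OS, the file system cache and the Python version, so no result is claimed here.

```python
import hashlib
import os
import tempfile
import timeit

def hash_buffered(path, blocksize=64 * 1024):
    # Default (buffered) open with read(): each call returns a new bytes object.
    sha = hashlib.sha256()
    with open(path, "rb") as fp:
        while True:
            data = fp.read(blocksize)
            if not data:
                break
            sha.update(data)
    return sha.hexdigest()

def hash_unbuffered(path, buffersize=64 * 1024):
    # Raw (buffering=0) open with readinto() into one reusable bytearray.
    sha = hashlib.sha256()
    buffer = bytearray(buffersize)
    view = memoryview(buffer)
    with open(path, "rb", buffering=0) as fp:
        while True:
            size = fp.readinto(buffer)
            if not size:
                break
            sha.update(view[:size])  # slice of the view: no copy
    return sha.hexdigest()

# Throwaway test file filled with random bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * 1024 * 1024))
    path = tmp.name
try:
    same = hash_buffered(path) == hash_unbuffered(path)
    t_buf = timeit.timeit(lambda: hash_buffered(path), number=5)
    t_raw = timeit.timeit(lambda: hash_unbuffered(path), number=5)
    print("buffered:   %.4f s" % t_buf)
    print("unbuffered: %.4f s" % t_raw)
finally:
    os.remove(path)
```

Both functions must return the same digest; only the timings are of interest, and they should be read as a rough indication on one machine rather than a general answer.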