[issue21146] update gzip usage examples in docs
Roundup Robot added the comment: New changeset ae1528beae67 by Andrew Kuchling in branch 'default': #21146: give a more efficient recipe in gzip docs https://hg.python.org/cpython/rev/ae1528beae67 -- nosy: +python-dev ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue21146 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21146] update gzip usage examples in docs
Changes by A.M. Kuchling a...@amk.ca:
--
resolution: -> fixed
stage: -> resolved
status: open -> closed
[issue21146] update gzip usage examples in docs
A.M. Kuchling added the comment: Applied to trunk. Wolfgang Maier: thanks for your patch!
-- nosy: +akuchling
[issue21146] update gzip usage examples in docs
Wolfgang Maier added the comment: Well, buffering is not the issue here. The file iterator used in the current example is line-based, so whatever the buffer size, you are doing unnecessary inspection to find and split on line terminators.
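The point about line-based iteration can be illustrated with a minimal sketch. The sample data and temporary file name are hypothetical and only serve the demonstration: even with a large buffer, iterating over a binary file yields one item per line, whereas an explicit read() returns a block without looking for newlines.

```python
import os
import tempfile

# Hypothetical sample data: three short lines, far smaller than the buffer.
data = b"alpha\nbeta\ngamma\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(data)

    # A large buffer changes how much is fetched from the OS at a time,
    # but iterating over the file still yields one item per line.
    with open(path, "rb", buffering=1024 * 1024) as f:
        line_items = list(f)

    # An explicit read() returns a block without splitting on newlines.
    with open(path, "rb", buffering=1024 * 1024) as f:
        block = f.read(1024)
```

Here line_items holds three separate lines while block holds the whole content in one piece, which is the behaviour that makes writelines(f_in) do extra work.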
[issue21146] update gzip usage examples in docs
Matt Chaput added the comment: The patch looks good to me.
-- nosy: +maatt
[issue21146] update gzip usage examples in docs
Éric Araujo added the comment: Isn’t there a buffering argument in open that can be used to avoid line buffering?
-- nosy: +eric.araujo versions: +Python 3.5 -Python 3.2, Python 3.3
[issue21146] update gzip usage examples in docs
Wolfgang Maier added the comment: OK, I've prepared the patch using the elegant shutil solution.
-- keywords: +patch
Added file: http://bugs.python.org/file34765/gzip_example_usage_patch.diff
[issue21146] update gzip usage examples in docs
New submission from Wolfgang Maier:

The current documentation of the gzip module should have its section 12.2.1, "Examples of usage", updated to reflect the changes made to the module in Python 3.2 (https://docs.python.org/3.2/whatsnew/3.2.html#gzip-and-zipfile).

Currently, the recipe given for gz-compressing a file is:

    import gzip
    with open('/home/joe/file.txt', 'rb') as f_in:
        with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
            f_out.writelines(f_in)

which is clearly sub-optimal because it is line-based. An equally simple, but more efficient recipe would be:

    chunk_size = 1024
    with open('/home/joe/file.txt', 'rb') as f_in:
        with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
            while True:
                c = f_in.read(chunk_size)
                if not c:
                    break
                d = f_out.write(c)

Comparing the two examples, I find a >= 2x performance gain (both in terms of CPU time and wall time).

In the inverse scenario of file *de*-compression (which is not part of the docs, though), the performance increase of substituting:

    with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
        with open('/home/joe/file.txt', 'wb') as f_out:
            f_out.writelines(f_in)

with:

    with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
        with open('/home/joe/file.txt', 'wb') as f_out:
            while True:
                c = f_in.read(chunk_size)
                if not c:
                    break
                d = f_out.write(c)

is even higher (4-5x speed-ups).

In the de-compression case, another >= 2x speed-up can be achieved by avoiding the gzip module completely and going through a zlib.decompressobj instead, but of course this is a bit more complicated and should be documented in the zlib docs rather than the gzip docs (if you're interested, I could provide my code for it, though). Using the zlib library, compression/decompression speed becomes comparable to Linux gzip/gunzip.

--
assignee: docs@python
components: Documentation
messages: 215440
nosy: docs@python, wolma
priority: normal
severity: normal
status: open
title: update gzip usage examples in docs
type: performance
versions: Python 3.2, Python 3.3, Python 3.4
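The zlib.decompressobj route mentioned in the submission was not posted in this thread. The following is only a minimal sketch of what such an approach might look like, not the submitter's code: the function name gunzip_stream and the chunk size are hypothetical, and it assumes a single-member gzip stream.

```python
import gzip
import io
import zlib


def gunzip_stream(f_in, f_out, chunk_size=16 * 1024):
    """Decompress gzip data from f_in to f_out via zlib (illustrative sketch).

    Assumes the input holds a single gzip member.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = f_in.read(chunk_size)
        if not chunk:
            break
        f_out.write(d.decompress(chunk))
    # Emit anything still held in the decompressor's internal buffers.
    f_out.write(d.flush())


# Usage with in-memory streams (hypothetical sample payload).
payload = b"example data\n" * 1000
src = io.BytesIO(gzip.compress(payload))
dst = io.BytesIO()
gunzip_stream(src, dst)
```

The sketch makes no claim about speed; any gain over the gzip module would need to be measured on real files.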
[issue21146] update gzip usage examples in docs
INADA Naoki added the comment: Maybe shutil.copyfileobj() is good.

    import gzip
    import shutil

    with open(src, 'rb') as f_in:
        with gzip.open(dst, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)

-- nosy: +naoki
[issue21146] update gzip usage examples in docs
Wolfgang Maier added the comment:

    with open(src, 'rb') as f_in:
        with gzip.open(dst, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)

+1! Exactly as fast as my suggestion (for both compression and de-compression), but a lot clearer. I hadn't thought of it.
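Since the comment covers both directions, a small round-trip sketch may help. The file names and sample payload are hypothetical, and a temporary directory stands in for real paths.

```python
import gzip
import os
import shutil
import tempfile

payload = b"some text\n" * 1000  # hypothetical sample content

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.txt")
    dst = os.path.join(tmp, "file.txt.gz")
    restored = os.path.join(tmp, "restored.txt")

    with open(src, "wb") as f:
        f.write(payload)

    # Compression: copy raw bytes into a gzip-compressed file.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

    # De-compression: the same call with the roles of the files swapped.
    with gzip.open(dst, "rb") as f_in, open(restored, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

    with open(restored, "rb") as f:
        round_trip = f.read()
```

The same copyfileobj call serves both directions; only which side is opened through gzip.open changes.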
[issue21146] update gzip usage examples in docs
Wolfgang Maier added the comment: The same speed is not surprising, though, as shutil.copyfileobj is implemented like this:

    def copyfileobj(fsrc, fdst, length=16*1024):
        """copy data from file-like object fsrc to file-like object fdst"""
        while 1:
            buf = fsrc.read(length)
            if not buf:
                break
            fdst.write(buf)

which is essentially what I was proposing :)
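The length parameter shown in that implementation controls the chunk size of each read. As a rough illustration, the sketch below uses a hypothetical RecordingWriter helper and a deliberately tiny length to make the chunking visible; real code would use a much larger value or the default.

```python
import io
import shutil


class RecordingWriter(io.BytesIO):
    """Hypothetical helper: a BytesIO that remembers each chunk written."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return super().write(data)


src = io.BytesIO(b"0123456789")
dst = RecordingWriter()

# A tiny length, chosen only so that the individual chunks are easy to see.
shutil.copyfileobj(src, dst, 4)
```

After the call, dst.chunks holds the pieces in the order they were written, and dst.getvalue() holds the complete copy.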