[issue21146] update gzip usage examples in docs

2015-04-14 Thread Roundup Robot

Roundup Robot added the comment:

New changeset ae1528beae67 by Andrew Kuchling in branch 'default':
#21146: give a more efficient recipe in gzip docs
https://hg.python.org/cpython/rev/ae1528beae67

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21146
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21146] update gzip usage examples in docs

2015-04-14 Thread A.M. Kuchling

Changes by A.M. Kuchling a...@amk.ca:


--
resolution:  -> fixed
stage:  -> resolved
status: open -> closed




[issue21146] update gzip usage examples in docs

2015-04-14 Thread A.M. Kuchling

A.M. Kuchling added the comment:

Applied to trunk.  Wolfgang Maier: thanks for your patch!

--
nosy: +akuchling




[issue21146] update gzip usage examples in docs

2014-04-15 Thread Wolfgang Maier

Wolfgang Maier added the comment:

well, buffering is not the issue here. It's that the file iterator used in the 
current example is line-based, so whatever the buffer size you're doing 
unnecessary inspection to find and split on line terminators.
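
To illustrate the point, here is a minimal sketch (not from the thread) that counts how many pieces each approach produces for the same binary data. The sample data and the helper names count_pieces_by_line and count_pieces_by_chunk are made up for this example; the 16 KiB chunk size is an assumption.

```python
import io

# Hypothetical sample data: 3000 short lines, 17000 bytes in total.
data = b"alpha\nbeta\ngamma\n" * 1000


def count_pieces_by_line(f):
    """Iterate over the file object; each step scans for a line terminator."""
    return sum(1 for _ in f)


def count_pieces_by_chunk(f, chunk_size=16 * 1024):
    """Read fixed-size blocks; no scanning for line terminators is needed."""
    n = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        n += 1
    return n


lines = count_pieces_by_line(io.BytesIO(data))
chunks = count_pieces_by_chunk(io.BytesIO(data))
print(lines, chunks)
```

With line iteration, the number of writes grows with the number of newlines in the input, whatever buffer size the underlying file uses.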

--




[issue21146] update gzip usage examples in docs

2014-04-14 Thread Matt Chaput

Matt Chaput added the comment:

The patch looks good to me.

--
nosy: +maatt




[issue21146] update gzip usage examples in docs

2014-04-14 Thread Éric Araujo

Éric Araujo added the comment:

Isn’t there a buffering argument in open that can be used to avoid line 
buffering?

--
nosy: +eric.araujo
versions: +Python 3.5 -Python 3.2, Python 3.3




[issue21146] update gzip usage examples in docs

2014-04-08 Thread Wolfgang Maier

Wolfgang Maier added the comment:

ok, I've prepared the patch using the elegant shutil solution.

--
keywords: +patch
Added file: http://bugs.python.org/file34765/gzip_example_usage_patch.diff




[issue21146] update gzip usage examples in docs

2014-04-03 Thread Wolfgang Maier

New submission from Wolfgang Maier:

The current documentation of the gzip module should have its section 12.2.1. 
Examples of usage updated to reflect the changes made to the module in 
Python3.2 (https://docs.python.org/3.2/whatsnew/3.2.html#gzip-and-zipfile).

Currently, the recipe given for gz-compressing a file is:

import gzip
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        f_out.writelines(f_in)

which is clearly sub-optimal because it is line-based.

An equally simple, but more efficient recipe would be:

chunk_size = 1024
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        while True:
            c = f_in.read(chunk_size)
            if not c: break
            d = f_out.write(c)

Comparing the two examples I find a >= 2x performance gain (both in terms of
CPU time and wall time).

In the inverse scenario of file *de*-compression (which is not part of the docs 
though), the performance increase of substituting:

with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
    with open('/home/joe/file.txt', 'wb') as f_out:
        f_out.writelines(f_in)

with:

with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
    with open('/home/joe/file.txt', 'wb') as f_out:
        while True:
            c = f_in.read(chunk_size)
            if not c: break
            d = f_out.write(c)

is even higher (4-5x speed-ups).

In the de-compression case, another >= 2x speed-up can be achieved by avoiding
the gzip module completely and going through a zlib.decompressobj instead, but
of course this is a bit more complicated and should be documented in the zlib
docs rather than the gzip docs (if you're interested, I could provide my code
for it though).
Using the zlib library directly, compression/decompression speed becomes
comparable to Linux gzip/gunzip.
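
For readers who want to see what the zlib route might look like, the following is a rough sketch only. It is not the code offered in this message, which was never posted in the thread. The function name gunzip_stream and the 16 KiB chunk size are assumptions, and the sketch handles a single-member gzip stream only.

```python
import gzip
import io
import zlib


def gunzip_stream(f_in, f_out, chunk_size=16 * 1024):
    """Decompress a single-member gzip stream from f_in into f_out."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = f_in.read(chunk_size)
        if not chunk:
            break
        f_out.write(decomp.decompress(chunk))
    # Emit whatever is still held in the decompressor's internal buffers.
    f_out.write(decomp.flush())


# Round trip on in-memory data (hypothetical payload).
payload = b"hello gzip\n" * 5000
out = io.BytesIO()
gunzip_stream(io.BytesIO(gzip.compress(payload)), out)
print(out.getvalue() == payload)
```

Files made of several concatenated gzip members would need extra handling (for example, restarting a fresh decompressobj on the leftover unused_data), which this sketch leaves out.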

--
assignee: docs@python
components: Documentation
messages: 215440
nosy: docs@python, wolma
priority: normal
severity: normal
status: open
title: update gzip usage examples in docs
type: performance
versions: Python 3.2, Python 3.3, Python 3.4




[issue21146] update gzip usage examples in docs

2014-04-03 Thread INADA Naoki

INADA Naoki added the comment:

Maybe, shutil.copyfileobj() is good.

import gzip
import shutil

with open(src, 'rb') as f_in:
    with gzip.open(dst, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

--
nosy: +naoki




[issue21146] update gzip usage examples in docs

2014-04-03 Thread Wolfgang Maier

Wolfgang Maier added the comment:

> with open(src, 'rb') as f_in:
>     with gzip.open(dst, 'wb') as f_out:
>         shutil.copyfileobj(f_in, f_out)

+1 !!
exactly as fast as my suggestion (with compression and de-compression), but a 
lot clearer !
Hadn't thought of it.

--




[issue21146] update gzip usage examples in docs

2014-04-03 Thread Wolfgang Maier

Wolfgang Maier added the comment:

same speed is not surprising though as shutil.copyfileobj is implemented like
this:

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

which is essentially what I was proposing :)
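
As a concrete illustration, here is a small sketch (not from the thread) that applies shutil.copyfileobj in both directions and checks the round trip. The helper names gzip_file and gunzip_file, the temporary file names, and the sample contents are all made up for this example.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(src, dst):
    """Compress src into dst, copying in fixed-size blocks."""
    with open(src, 'rb') as f_in, gzip.open(dst, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)


def gunzip_file(src, dst):
    """Decompress src into dst, copying in fixed-size blocks."""
    with gzip.open(src, 'rb') as f_in, open(dst, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, 'file.txt')
    compressed = os.path.join(tmp, 'file.txt.gz')
    restored = os.path.join(tmp, 'file.restored.txt')

    # Hypothetical sample contents.
    with open(original, 'wb') as f:
        f.write(b"some text\n" * 10000)

    gzip_file(original, compressed)
    gunzip_file(compressed, restored)

    with open(original, 'rb') as a, open(restored, 'rb') as b:
        round_trip_ok = a.read() == b.read()

print(round_trip_ok)
```

Both helpers share the same copy loop; only the side on which gzip.open is used changes.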

--
