New submission from Gunnar Aastrand Grimnes:

When reading large files with fileinput, it works as expected and only 
processes a line at a time when used normally, but if you add a hook_encoded 
openhook it reads the whole file into memory before returning the first 
line. 

Verify by running this program on a large text file: 

import fileinput

for l in fileinput.input(openhook=fileinput.hook_encoded('iso-8859-1')):
    raw_input()

and check how much memory it uses. Remove the openhook and memory usage goes 
down to nothing.

The problem is that fileinput calls readlines with a size-hint and in 
codecs.StreamReader, readlines explicitly ignores this hint and reads all lines 
into memory. 
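
The difference is easy to see outside of fileinput. The following is a minimal 
sketch, not code from fileinput itself; the sample data and variable names are 
made up for illustration. It passes the same small size hint to a plain byte 
stream and to a codecs.StreamReader wrapped around the same bytes:

```python
import codecs
import io

# Made-up sample data: three short lines.
data = b"first\nsecond\nthird\n"

# A plain byte stream stops collecting lines once the hint is reached.
raw_lines = io.BytesIO(data).readlines(1)

# The same bytes behind a codecs.StreamReader: the hint is accepted
# as an argument but not used, so every line is returned.
reader = codecs.getreader("iso-8859-1")(io.BytesIO(data))
decoded_lines = reader.readlines(1)

print(len(raw_lines), len(decoded_lines))
```

With a large file instead of three lines, the second call is where the memory 
goes.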

http://bugs.python.org/issue20501 is open for fixing up the documentation for 
fileinput, but a fix would also be nice.

I see two options: 

1. As suggested by r.david.murray: Give us a way of signaling to fileinput that 
it should not use readlines, for instance by setting buffer=None

2. Fix the codecs module to allow StreamReader to respect the hint if given. 
Although the comment there says there is no efficient way to do this, at least 
an inefficient way would be better than reading a possibly infinite stream in. 
A simple solution would be to repeatedly call readline. A more complicated 
solution would be to read chunks from the stream, and then decode them, just 
like the readline method does. 
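
To make the "repeatedly call readline" idea from option 2 concrete, here is a 
rough sketch. The helper name readlines_with_hint is hypothetical and is not 
part of codecs; it only assumes that the reader has a readline method that 
returns an empty string at end of stream, and it counts decoded characters 
rather than bytes:

```python
import codecs
import io


def readlines_with_hint(reader, sizehint):
    """Hypothetical helper: collect lines until roughly sizehint characters."""
    lines = []
    total = 0
    while total < sizehint:
        line = reader.readline()
        if not line:
            # An empty string means the stream is exhausted.
            break
        lines.append(line)
        total += len(line)
    return lines


# Made-up sample data: five lines of six characters each.
data = "".join("line%d\n" % i for i in range(5)).encode("iso-8859-1")
reader = codecs.getreader("iso-8859-1")(io.BytesIO(data))

first = readlines_with_hint(reader, 10)    # stops after passing the hint
rest = readlines_with_hint(reader, 1000)   # picks up where the first call stopped
print(first, rest)
```

This is the inefficient variant, but each call returns only as many lines as 
the hint asks for instead of draining the whole stream.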

BTW, this issue is py2.7 only. I tested a file object from io.open with 
encoding in 3.3 and it supports readlines just fine.

----------
components: IO
messages: 210371
nosy: gromgull
priority: normal
severity: normal
status: open
title: fileinput module will read whole file into memory when using 
fileinput.hook_encoded due to codecs.StreamReader.readlines
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20528>
_______________________________________
