[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-03-03 Thread Roundup Robot

Roundup Robot added the comment:

New changeset d37f963394aa by Serhiy Storchaka in branch '2.7':
Correct and improve comments in test_fileinput (closes #20501).
http://hg.python.org/cpython/rev/d37f963394aa

New changeset 204ef3bca9c8 by Serhiy Storchaka in branch '3.3':
Correct comments and improve failure reports in test_fileinput (closes #20501).
http://hg.python.org/cpython/rev/204ef3bca9c8

New changeset c47cc6351ce7 by Serhiy Storchaka in branch 'default':
Correct comments and improve failure reports in test_fileinput (closes #20501).
http://hg.python.org/cpython/rev/c47cc6351ce7

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-03-03 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you Vajrasky and Zachary.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

My bad. I had cleaned patch before committing to 3.x, but forgot to revert and 
correct it in 2.7. Thank you Vajrasky an Zachary.

 Does it mean: UTF-7 is an unlikely locale encoding?

Sorry for my bad English. I meant that UTF-7 is used here because it is 
unlikely that it is locale encoding. Is an needed here?

 I would actually vote for using writeTmp, and make a new issue for making 
 writeTmp use a context manager.

I think that explicit two-line code is better than calling superfluous 
function. We don't need generated filename, we write only one chunk of code, we 
use context manager, so the code is only two lines long.

 inside a `try ... except UnicodeDecodeError: self.fail('Read to end of 
 file')` block.

May be, but in 2.7 a traceback doesn't contain info about original exception 
and this can hide necessary details in case of failure.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-28 Thread Zachary Ware

Zachary Ware added the comment:

For the UTF-7 comment, I'd go with something like locale encoding is probably 
not UTF-7 or UTF-7 is a convenient, seldom used encoding, or something to 
that effect.

About writeTmp vs. context manager: as you wish.  It would still be good to 
convert writeTmp to use a context manager though, but that is a separate issue. 
 Just cleaning up 2.7 to match 3.x is enough to make me happy :)

For the explicit failure message, perhaps the best we can do in 2.7 is a 
comment saying that the most likely failure is a UnicodeDecodeError due to the 
entire file being read when it shouldn't have been.  That can be deduced from 
the separate comments you already have there (and especially from the issue 
link), but I think it may be better to just come right out and say it.  If 
exception chaining means the original exception is kept with a self.fail call 
in 3.x, I think 3.x should get the explicit failure message, though.  I don't 
feel too strongly about it, though; I'll leave it to you to change or not as 
you like :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-27 Thread Zachary Ware

Zachary Ware added the comment:

Agreed with Vajrasky, the 2.7 commit wasn't as clean as it could be (sorry I 
didn't get an actual review done before you committed, I posted my last test 
results in a bit of a hurry).  The commented out line should either be removed, 
or used instead of the 'with' block above it.  I would actually vote for using 
writeTmp, and make a new issue for making writeTmp use a context manager.

Also, I think the new test_readline could be made a little clearer by putting

+fi = FileInput(files=TESTFN, openhook=hook_encoded('ascii'), bufsize=8)
+self.assertEqual(fi.readline(), u'A\n')
+self.assertEqual(fi.readline(), u'B\r\n')
+self.assertEqual(fi.readline(), u'C\r')

inside a `try ... except UnicodeDecodeError: self.fail('Read to end of file')` 
block.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-26 Thread Zachary Ware

Zachary Ware added the comment:

Output attached.

--
Added file: http://bugs.python.org/file34231/20501.output

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Same as on Linux (and differs from 3.x). Thank you Zachary.

Here is corrected patch. Added also a test which tests that readline() doesn't 
read whole file.

--
Added file: http://bugs.python.org/file34232/fileinput_hook_encoded_3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-26 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Removed file: http://bugs.python.org/file33934/fileinput_hook_encoded.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-26 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Removed file: http://bugs.python.org/file34222/fileinput_hook_encoded_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-26 Thread Zachary Ware

Zachary Ware added the comment:

New patch passes on Windows.  Without the patch to fileinput.py, the new 
hook_encoded tests pass and the new test_readline test fails (as expected).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-26 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 1a1a9d6fb278 by Serhiy Storchaka in branch '2.7':
Issue #20501: fileinput module no longer reads whole file into memory when using
http://hg.python.org/cpython/rev/1a1a9d6fb278

New changeset b4a139713b3b by Serhiy Storchaka in branch '3.3':
Added tests for issue #20501.
http://hg.python.org/cpython/rev/b4a139713b3b

New changeset 1a38fa1f701d by Serhiy Storchaka in branch 'default':
Added tests for issue #20501.
http://hg.python.org/cpython/rev/1a38fa1f701d

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-26 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
assignee: docs@python - serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you Zachary that ran tests for me.

Thank you Gunnar for your report.

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-26 Thread Vajrasky Kok

Vajrasky Kok added the comment:

From http://hg.python.org/cpython/rev/1a1a9d6fb278

+t1 = TESTFN
+#t1 = writeTmp(1, ['A\nB\r\nC\rD+IKw-'], mode='wb')
+self.addCleanup(safe_unlink, TESTFN)

You left out the debugging statement.

And this English statement:

# Unlikely UTF-7 is locale encoding

could be made clearer. Does it mean: UTF-7 is an unlikely locale encoding?

--
nosy: +vajrasky

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-25 Thread Zachary Ware

Zachary Ware added the comment:

 Can anyone please test the patch on Windows?

It seems to work; memory usage is much lower with the patch than without using 
an 8 MB file.  I don't notice any behavioral change with the patch; if there's 
anything specific to look for on that front, give me a test script in either 
English or Python :)

--
nosy: +zach.ware

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you Zachary.

Here is a patch with a test. I'm not sure that it is successful on Windows.

--
Added file: http://bugs.python.org/file34222/fileinput_hook_encoded_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-25 Thread Zachary Ware

Zachary Ware added the comment:

The new test passes on Windows with the whole patch applied, but fails without 
the changes to fileinput.py.  Is this change meant to fix behavior, or just the 
memory usage issue?

Just for completeness, here's the failure output (with unpatched fileinput.py):

==
FAIL: test_modes (__main__.Test_hook_encoded)
--
Traceback (most recent call last):
  File P:\ath\to\2.7\cpython\lib\test\test_fileinput.py, line 235, in 
test_modes
check('r', [u'A\n', u'B\r\n', u'C\rD\u20ac'])
  File P:\ath\to\2.7\cpython\lib\test\test_fileinput.py, line 233, in check
self.assertEqual(lines, expected_lines)
AssertionError: Lists differ: [u'A\n', u'B\r\n', u'C\r', u'D... != [u'A\n', 
u'B\r\n', u'C\rD\u20a...

First differing element 2:
C
D\u20ac

First list contains 1 additional elements.
First extra element 3:
D\u20ac

- [u'A\n', u'B\r\n', u'C\r', u'D\u20ac']
? -

+ [u'A\n', u'B\r\n', u'C\rD\u20ac']

--

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What would be the failure output when comment out first 1, 2 or three checks?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Can anyone please test the patch on Windows?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-05 Thread Gunnar Aastrand Grimnes

Gunnar Aastrand Grimnes added the comment:

Agreed that a doc-fix is the first step. 

Shall I make another issue for the fix in fileinput?

 I also wonder if the comment in codecs is not too negative - it is not easy to 
get it 100% right, but the method just above readlines is readline, which does 
somehow manage to read a single line. To me it seems like it would be better to 
makes readlines repeatedly call readline. It would also be inefficient, but 
still better than reading the whole (possibly infinite) stream in ...

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-05 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

As far as the issue is in the codecs module, this is 2.7 only issue, because 
codecs.open is used only in 2.7. And I suppose that replacing codecs.open by 
io.open should fix this issue.

--
keywords: +patch
nosy: +benjamin.peterson, hynek, pitrou, serhiy.storchaka, stutzbach
stage:  - patch review
versions:  -Python 3.3, Python 3.4
Added file: http://bugs.python.org/file33934/fileinput_hook_encoded.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-03 Thread Gunnar Aastrand Grimnes

New submission from Gunnar Aastrand Grimnes:

When reading large files with fileinput, it will work as expected and only 
process a line at a time when used normally, but if you add an hook_encoded 
openhook it will read the whole file into memory before returning the first 
line. 

Verify by running this program on a large text file: 

import fileinput

for l in fileinput.input(openhook=fileinput.hook_encoded('iso-8859-1')):
raw_input()

and check how much memory it uses. Remove the openhook and memory usage goes 
down to nothing.

--
components: IO
messages: 210130
nosy: gromgull
priority: normal
severity: normal
status: open
title: fileinput module will read whole file into memory when using 
fileinput.hook_encoded
type: behavior
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-03 Thread Gunnar Aastrand Grimnes

Gunnar Aastrand Grimnes added the comment:

The problem lies in codecs.py here: 
http://hg.python.org/cpython/file/ae7facd874ba/Lib/codecs.py#l581

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20501] fileinput module will read whole file into memory when using fileinput.hook_encoded

2014-02-03 Thread R. David Murray

R. David Murray added the comment:

This requires (1) a doc update to indicate the problem and (2) a way to tell 
fileinput to *not* use readlines to optimize by calling readlines(bufsize), 
since in the case of using codecs it can be a problem.  Presumably buffer=None 
would be the logical way to spell this.
I'm marking this as a doc bug, a separate issue should be opened for adding the 
fileinput buffer=None enhancement.

(The codec module's comments indicate there's no practical way to implement 
sizehint.)

--
assignee:  - docs@python
components: +Documentation
nosy: +docs@python, r.david.murray
versions: +Python 3.3, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com