New issue 2071: file.readinto() uses too much memory
https://bitbucket.org/pypy/pypy/issue/2071/filereadinto-uses-too-much-memory
Andrew Dalke:
I am using CFFI to read a file containing 7 GB of uint64_t data. I use
ffi.new() to allocate the space, then call readinto() to fill the pre-allocated
buffer, as suggested by the CFFI documentation.
(Note: the docstring for readinto says "Undocumented. Don't use this; it may go
away".)
It appears that something internal to readinto() makes a copy of the buffer
passed to it, because the call ends up running out of memory on my 16 GB box,
which has 15 GB free.
I am able to reproduce the problem using the array module, so it is not some
oddity of the CFFI implementation. Here is an example of what causes a problem
on my machine:
```
#!python
>>>> import array
# s is a 2 GiB string built earlier (not shown); len(s) == 2147483648
>>>> a = array.array("c", s)
>>>> a.extend(s)
>>>> a.extend(s)
# do some cleanup, to be on the safe side.
>>>> del s
>>>> import gc
>>>> gc.collect()
0
# Read ~6GB from a file with >7GB in it
>>>> len(a)
6442450944
>>>> filename = "pubchem.14"
>>>> import os
>>>> os.path.getsize(filename)
7662345264
>>>> infile = open(filename, "rb")
# Currently, virtual memory size = 8.87 GB
>>>> infile.readinto(a)
^CTerminated
# I killed it when the virtual memory was at 14 GB and still growing
```
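For convenience, here is the same reproduction as a standalone script. The 2 GiB seed string is a stand-in for my real data; everything else matches the session above:

```
#!python
# Standalone version of the session above. "pubchem.14" is the >7 GB
# input file on my disk; substitute any file larger than the buffer.
import array
import gc
import os

s = "x" * (2 * 1024**3)          # 2 GiB of data (len(s) == 2147483648)
a = array.array("c", s)
a.extend(s)
a.extend(s)                      # len(a) == 6442450944, ~6 GiB
del s                            # clean up, to be on the safe side
gc.collect()

filename = "pubchem.14"
print os.path.getsize(filename)  # 7662345264 on my machine

infile = open(filename, "rb")
infile.readinto(a)               # memory grows past 14 GB before this finishes
```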