On Mon, 29 Jun 2015 at 16:13 Ozan Çağlayan <ozan...@gmail.com> wrote:
> Hello all,
>
> Well I am searching my dream scientific language :)
>
> The current codebase that I am working with is related to a language
> translation software written in C++. I wanted to re-implement parts of
> it in Python and/or Julia to both learn it (as I didn't write the C++
> stuff) and maybe to make it available for other people who are
> interested.
>
> I saw Pyston last night then I came back to PyPy.
>
> As a first step, I tried to parse a 300MB structured text file
> containing 1.1M lines like these:
>
> 0 ||| I love you mother . ||| label=number1 number2 number3 number4
> label2=number5 number6 ... number19 ||| number20
>
> Line-by-line accessing was actually pretty fast *but* trying to store
> the lines in a Python list drains RAM on my 4G laptop. This is
> disappointing. A raw text file (utf-8) of 300MB takes more than 1GB of
> memory.

The first question is always: can you avoid storing everything in memory?
There are improvements you could make, but you'll still hit the memory
limit again with a slightly bigger file. So rethink that part of the code
if possible.

This is a fundamental algorithmic point: as long as your approach requires
everything to be stored in memory, it has an upper size limit. You can
raise that limit incrementally, with diminishing returns, but it's usually
better to think of a way to remove the upper limit altogether.

You're storing a Python object for each line in the file. Each of these
objects has an associated dict. That probably represents a significant
part of the memory usage. Try using __slots__, which is intended for the
situation where you have lots of small instances. (Not sure how much
difference it makes with PyPy though.)

You can also get significantly better memory efficiency if you store the
data in arrays of some kind rather than in many separate Python objects. I
would probably use numpy record arrays for this problem.

--
Oscar
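A minimal sketch of the streaming and __slots__ suggestions above, for
illustration only. The names Hypothesis, parse_line and iter_hypotheses
are hypothetical, and the field layout (index ||| sentence |||
label=values ||| score) is assumed from the single sample line in the
quoted message. The standard-library array module is used here as a
stand-in for the numpy record arrays mentioned above, only to keep the
example dependency-free.

```python
from array import array
import io


class Hypothesis:
    """One parsed line. __slots__ avoids a per-instance __dict__."""
    __slots__ = ("index", "text", "features", "score")

    def __init__(self, index, text, features, score):
        self.index = index
        self.text = text
        self.features = features  # dict: label -> array('d') of floats
        self.score = score


def parse_line(line):
    # Assumed layout: index ||| sentence ||| label=v1 v2 ... ||| score
    index, text, feats, score = [f.strip() for f in line.split("|||")]
    features = {}
    current = None
    for token in feats.split():
        if "=" in token:
            # A token such as "lm=-1.5" starts a new label.
            label, _, value = token.partition("=")
            current = features.setdefault(label, array("d"))
            if value:
                current.append(float(value))
        elif current is not None:
            # A bare number belongs to the most recent label.
            current.append(float(token))
    return Hypothesis(int(index), text, features, float(score))


def iter_hypotheses(lines):
    """Yield one Hypothesis at a time instead of building a list."""
    for line in lines:
        line = line.strip()
        if line:
            yield parse_line(line)


# Usage: any iterable of lines works, including an open file object.
# The StringIO below is made-up sample data, not from the real file.
sample = io.StringIO(
    "0 ||| I love you mother . ||| lm=-1.5 2.0 tm=0.25 ||| -3.75\n"
    "1 ||| I love you mum . ||| lm=-1.0 2.5 tm=0.5 ||| -2.5\n"
)
best = max(iter_hypotheses(sample), key=lambda h: h.score)
```

Because iter_hypotheses is a generator, only one parsed line needs to be
alive at a time as long as the consumer (here max) does not keep them all.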
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev