Bugs item #2814892, was opened at 2009-06-30 18:42
Message generated for change (Tracker Item Submitted) made by batripler
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=453021&aid=2814892&group_id=48422
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Source
Group: rpy2
Status: Open
Resolution: None
Priority: 5
Private: Yes
Submitted By: batripler (batripler)
Assigned to: Nobody/Anonymous (nobody)
Summary: memory leak in SexpVector
Initial Comment:
Hi Laurent,
Thank you for creating a wonderfully useful piece of software. I've been
using it for a few weeks now, and I think I have uncovered a relatively
serious problem in rpy2.rinterface.SexpVector, which is at the heart of the
system. Here is a manifestation of the problem. Perhaps I am doing something
wrong.
Start a new python session and run:
{{{
import numpy; x=numpy.zeros(int(2e7))
}}}
You can change the size of the array, and memory consumption will vary with
the numpy defaults on your machine. On my machine a double is 8 bytes, so 2e7
elements take ~150MB. I see the process at ~162MB; the extra is the Python
interpreter footprint.
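For reference, the arithmetic behind that figure can be sketched as below. This
is only a back-of-the-envelope check, assuming the array holds C doubles and
nothing else; the variable names are made up for illustration.

```python
import struct

n = int(2e7)                              # number of elements in the test array
bytes_per_double = struct.calcsize("d")   # size of a C double on this platform
expected_mib = n * bytes_per_double / 2.0**20

print("expected payload: %.1f MiB" % expected_mib)
```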
Now, kill this session, start a new one, and run the following:
{{{
import rpy2.robjects, rpy2.rinterface as rint
reval=rint.baseNameSpaceEnv['eval']
rparse=rint.baseNameSpaceEnv['parse']
x=reval(rparse(text=rint.StrSexpVector(["numeric(2e7)"])))
}}}
In this case, we are just creating a REALSXP vector on the R side. I see the
process coming in at 203MB, which is reasonable given that both Python and R
interpreters are now running. Again, this is assuming that every element of
the REALSXP vector is 8 bytes.
Now, finally, in a new Python process, let's create an array on the Python side
and copy it over to the R side:
{{{
import numpy, rpy2.robjects, rpy2.rinterface as rint
x=numpy.zeros(int(2e7)); y=rint.SexpVector(x, rint.REALSXP)
}}}
I would expect the max size of this process to be the sum of the previous two,
since it contains both the Python object and an equally large R object.
Instead it comes in at a whopping 950MB!!
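One way to take such a reading from inside the process itself is sketched
below. It assumes a Linux box, where ru_maxrss is reported in kilobytes (other
platforms use different units), and peak_rss_mib is a hypothetical helper, not
part of rpy2. The list allocation is just a stand-in for the conversion being
measured.

```python
import resource

def peak_rss_mib():
    """Peak resident set size of this process in MiB (assumes Linux units)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

before = peak_rss_mib()
data = [0.0] * 10**6      # stand-in for the allocation under test
after = peak_rss_mib()

print("peak RSS before: %.1f MiB, after: %.1f MiB" % (before, after))
```

Because this is a peak value, it never goes down, so it shows how high the
process got but not whether memory was later returned to the OS.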
Incidentally, if I run the following code:
{{{
import numpy; x=numpy.zeros(int(2e7)); y=list(x)
}}}
... it weighs in at a hefty 985MB.
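A rough estimate of why the list is so heavy, as a sketch only: every element
becomes a separate boxed object, and the list holds one pointer per element.
The snippet uses a plain Python float as a stand-in for the numpy scalar that
list(x) actually produces (numpy.float64 subclasses float, so it is at least as
large), and it ignores allocator overhead, so treat the result as a lower
bound.

```python
import struct
import sys

n = int(2e7)
pointer_size = struct.calcsize("P")    # one list slot per element
boxed_float_size = sys.getsizeof(1.0)  # stand-in for one boxed scalar
per_element = pointer_size + boxed_float_size

raw_mib = n * struct.calcsize("d") / 2.0**20
boxed_mib = n * per_element / 2.0**20

print("raw doubles: %.0f MiB, list of boxed floats: at least %.0f MiB"
      % (raw_mib, boxed_mib))
```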
Now, I had a look at your code in rinterface.c:newSEXP, and noticed that you
are iterating using the Python sequence protocol and creating intermediate
objects. I don't mind the temporary memory bloat (though it would be much
faster and leaner to special-case numpy arrays and avoid the round trip
through object space), but somehow these intermediaries also seem to be
hanging around. Either that, or the allocator is, for some reason, not
returning space to the OS. A few of these conversions and our process is toast.
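To illustrate what I mean by skipping object space, here is a minimal sketch
of a buffer-level copy. It is not rpy2 code: array.array stands in for the
numpy array and a ctypes buffer stands in for the data area of the REALSXP
vector. In the C extension the equivalent would presumably be a single memcpy
into the vector's storage.

```python
import array
import ctypes

src = array.array("d", [0.5, 1.5, 2.5])   # stand-in for the numpy array
n = len(src)
dst = (ctypes.c_double * n)()             # stand-in for the R vector's storage

# Copy the raw bytes in one call; no per-element Python objects are created.
address, count = src.buffer_info()
ctypes.memmove(dst, address, count * src.itemsize)

print(list(dst))
```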
Also, for proper 64-bit compatibility, the index variable "i" should be
Py_ssize_t.
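The difference in range can be checked from Python, as a quick sketch. It
assumes a typical 64-bit Linux build, where a C int is 32 bits while
Py_ssize_t is 64 bits; sys.maxsize is the largest value a Py_ssize_t can hold.

```python
import ctypes
import sys

# Largest value each signed index type can hold on this platform.
int_max = 2 ** (8 * ctypes.sizeof(ctypes.c_int) - 1) - 1
ssize_max = 2 ** (8 * ctypes.sizeof(ctypes.c_ssize_t) - 1) - 1

print("int max:        %d" % int_max)
print("Py_ssize_t max: %d" % ssize_max)
```

With an int index, a sequence longer than int_max elements cannot be walked
correctly, even though Python itself allows lengths up to sys.maxsize.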
FYI - I'm compiling from source on a 64-bit Linux box. Running Python 2.5.4,
with numpy 1.3.0, and rpy2 2.0.5.
Thanks again for a great tool.
----------------------------------------------------------------------