Jason Roberts wrote:
> Hi Laurent,
>
>> The libR is used as a shared library.
>> Under win32, and AFAIUI, that should translate as using an unbound DLL
>> (otherwise the same version of libR will be required) and hold as long
>> as the names used from the symbol table presented by libR.so do not change.
>> In the case it does, then a new(er) version of rpy2 should be available.
>
> Ok. AFAIK that should work. I wondered why rpy did not work this way. The
> only thing I could guess is that rpy wanted to support a large range of
> versions of R, including very early versions where the R team was still
> deciding on the names and definitions of very core functions.
I think that this is mostly what happened.
Duncan with RSPython, Walter and Greg with rpy, and Simon Urbanek with
JRI, probably caused changes in the R API.
> Hopefully
> those functions are stable now and you will not need to take the same course
> of action with rpy2.
>
> What version of R do you use when compiling rpy2? I noticed a comment saying
> rpy2 is compatible with R 2.7.0 and later. Are you compiling using 2.7.0
> then?
I did so when compiling the first win32 builds.
Laurent Oget has been contributing the win32 builds since the release
2.0.0b1.
>> On a related note, I'd like to offer the option to have R really
>> embedded in rpy2 (with an R install inside the rpy2 installed module)...
>> so if someone has the time...
>
> This sounds interesting. It would allow Python users to call R without
> having to install R separately, and you could ensure there were no version
> compatibility problems. On the minus side, it would mean you need to release
> a new version of rpy2 whenever a new R was released.
That would be an option someone making a compiled build could switch on
(that does not mean that I will provide such builds, and this for the
reason you mention).
This is probably of interest for standalone solutions, and will probably
have to wait.
> Unfortunately I do not have time to work on this, at least right now.
>
>> Thanks. An ultimate patch would likely be a little more complex
>> (by checking that os.path.join(R_HOME, 'bin') is not the _first_ R found
>> in the PATH... but I am not sure what PATH is needed for here - can
>> someone with win32 try when just removing the PATH creation ?)
>
> I would be happy to try this for you, but I'm not sure exactly what you want
> me to do. Are you unsure whether all three of those directories (bin,
> modules, and lib) need to be in the PATH? I can try them in different
> combinations and see what happens. Let me know if this is what you want.
I meant: just try commenting out the three lines
os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')
...but your code below shows why PATH is indeed needed (and makes my
request irrelevant).
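For reference, a minimal sketch of a guarded version of those three lines, so that a second interpreter start in the same process does not append the directories again. The helper names are made up for illustration and this is not what rinterface currently does. It compares entries as exact strings, whereas Windows paths are case-insensitive, so a real patch may want to normalize first.

```python
import os


def append_to_path_once(path_value, directory, sep=os.pathsep):
    """Return path_value with directory appended, unless it is already listed."""
    if directory in path_value.split(sep):
        return path_value
    if not path_value:
        return directory
    return path_value + sep + directory


def add_r_directories(environ, r_home, subdirs=("bin", "modules", "lib")):
    """Append the R sub-directories to environ['PATH'], skipping duplicates."""
    path_value = environ.get("PATH", "")
    for subdir in subdirs:
        path_value = append_to_path_once(path_value, os.path.join(r_home, subdir))
    environ["PATH"] = path_value
```

Calling add_r_directories(os.environ, R_HOME) repeatedly should then leave PATH unchanged after the first call.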
> I can say that with my R 2.8.1 installation, there is no directory called
> lib. There is a directory called library, but that is where all the R
> packages go. There are no binaries in there. So I would suggest that you
> could remove lib from the PATH, but before you did this, we should check all
> the versions of R back to 2.7.0. I can do that if you want.
>
> In the modules directory, there are some shared libraries, such as
> lapack.dll. I am not sure what the difference is between lapack.dll there
> and the Rlapack.dll in the bin directory, but I do recall there being some
> issues with lapack in the past. I suggest you keep modules in the PATH.
>
> In my wrapper around rpy, I have the following related code which might be
> of interest to you:
>
> # Before importing rpy, capture the PATH environment
> # variable. rpy is going to add some R directories to it.
> # Because R imposes a maximum length on environment
> # variables (perhaps 1019 characters), we need to move
> # these directories to the front of the PATH to ensure
> # they are not truncated by the maximum length limiter.
> # This will work around the issue described by MGET ticket
> # #286.
>
> oldPath = os.environ['PATH'].split(os.pathsep)
>
> # Now import rpy.
>
> from GeoEco.AssimilatedModules.rpy import rpy
> RDependency._rpy = rpy
>
> # Move the paths that rpy appended to the front of the
> # PATH.
>
> newPath = os.environ['PATH'].split(os.pathsep)
> newPath = os.pathsep.join(newPath[len(oldPath):] +
> newPath[:len(oldPath)])
>
> # To work around MGET ticket #203 (Evaluate R Statements
> # tools fail with "lapack routines cannot be loaded" error
> # when running a glm), set the PATH environment variable
> # seen by the R interpreter to that seen by Python, so R
> # sees the changes that rpy attempted to make to the PATH.
>
> rpy.r('Sys.setenv(PATH="%s")' % newPath.replace('\\', '\\\\'))
>
> Finally, regarding the memory and handle leak tests:
>
> I used ArcGIS 9.3 SP1, Python 2.5.1, R 2.8.1, rpy2 2.0.3, WinXP SP3 with
> latest updates. In ArcGIS, I created a geoprocessing model with a single
> instance of the tool I mentioned in my previous message. I configured the
> model to run 100000 times and started it. Using Windows Task Manager, I
> monitored VM Size (equivalent to Private Bytes in perfmon) and Handles.
>
> I first ran the test a few times with these two lines of the script
> commented out:
>
> #from rpy2 import robjects
> #sqrt_x = robjects.r.sqrt(x)[0]
>
> Then I ran it again with the comments removed. This allowed me to see if
> there were leaks when rpy2 was not even imported. Interpreting the results is
> difficult because ArcGIS exhibited a bug (how typical) in which it said the
> model was complete before the progress bar reached 100000.
>
> Without rpy2:
>
> Iterations  Memory    Memory  Handles   Handles
> completed   at start  at end  at start  at end
> ----------  --------  ------  --------  -------
> 7921        190 MB    395 MB  1409      1403
> 7891        395 MB    490 MB  1406      1405
> 7930        489 MB    599 MB  1405      1405
>
> With rpy2:
>
> Iterations  Memory    Memory  Handles   Handles
> completed   at start  at end  at start  at end
> ----------  --------  ------  --------  -------
> 15408       599 MB    692 MB  1408      1408
> 59168       692 MB    784 MB  1409      1408
>
> The very first time I ran this, it looks like ArcGIS allocated about 200 MB that
> it did not immediately release. I do not consider this to necessarily be a
> leak. It may have an internal allocator that is configured to hold on to a
> bunch of memory for a while. But in every subsequent run, it allocated about
> 100 MB more, including the runs with rpy2 enabled.
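The per-run growth can be read off the two tables directly. A small sketch, using only the start and end figures quoted above (the variable and function names are made up for illustration):

```python
# (memory at start, memory at end) in MB, copied from the two tables above
runs_without_rpy2 = [(190, 395), (395, 490), (489, 599)]
runs_with_rpy2 = [(599, 692), (692, 784)]


def growth_per_run(runs):
    """Return the memory growth in MB for each (start, end) pair."""
    return [end - start for start, end in runs]


print(growth_per_run(runs_without_rpy2))  # [205, 95, 110]
print(growth_per_run(runs_with_rpy2))     # [93, 92]
```

Apart from the first run, the growth is roughly 90 to 110 MB per run in both configurations.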
>
> These results are tricky to interpret. First of all, I do not understand why
> the progress bar reported many more iterations with rpy2 enabled. It may be
> that the progress bar is broken, and that 100000 iterations completed in all
> cases, but that the script executed so quickly that progress events were
> dropped by ArcGIS, or something like that. This would explain why more
> iterations were reported with rpy2, because the script would go slower and
> not overwhelm the progress bar as much. It would also explain why about the
> same amount of memory is leaked with and without rpy, regardless of the
> number of iterations completed.
>
> In any case, it does not appear that substantially more memory was leaked
> with rpy2 enabled. This is a good sign, and because of this, I'm not going
> to bother trying to determine whether the progress bar is broken or ArcGIS
> is truly halting the iteration before 100000 is reached. In either
> situation, there is a bug with ArcGIS, not rpy2. ArcGIS has always been a
> buggy program, despite its popularity.
Isn't GRASS a worthy Open Source alternative to it ? (I am not so much
into GIS, so you will know better - I am just being curious here)
> Finally, it is clear that no handles are leaked.
Glad to hear that.
> There is probably at least one place in rpy2 that is leaking a module
> handle, in rinterface/__init__.py:
>
> win32api.LoadLibrary( Rlib )
>
> This will not cause a handle leak in the usual sense. Instead it will just
> cause the process's internal reference count for R.dll to increment every
> time rpy2 is imported. This is sub-optimal, but there is probably little
> harm. The reference leak will prevent R.dll from ever being unloaded but
> given that rpy2 and Python itself do not shut down very cleanly, it might be
> very hard to achieve proper unloading of R.dll anyway. I don't think you
> need to address this.
It doesn't harm to do things cleanly either.
Do not hesitate to share what would be better if you have it available.
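One possible direction, as a sketch only: pair each load with a matching release registered for interpreter shutdown. The function name is hypothetical, and the load and free callables are passed in so the sketch runs anywhere. On win32 they would presumably be win32api.LoadLibrary and win32api.FreeLibrary. It assumes the host finalizes the interpreter normally, so that atexit handlers run, and it has not been tried inside ArcGIS.

```python
import atexit


def load_and_release_at_exit(path, load_library, free_library):
    """Load a DLL and register a matching release for interpreter shutdown.

    load_library and free_library are injected for illustration; on win32
    they would be win32api.LoadLibrary and win32api.FreeLibrary.
    """
    handle = load_library(path)
    # Balance the reference taken above when the interpreter exits.
    atexit.register(free_library, handle)
    return handle
```

This only balances the reference count taken by the explicit load. Whether R.dll can actually be unloaded still depends on everything else that holds a reference to it.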
> These results look pretty good to me. I am going to investigate integrating
> rpy2 into our application!
Good.
Let us know how it goes.
L.
> Jason
>
> -----Original Message-----
> From: Laurent Gautier [mailto:[email protected]]
> Sent: Friday, March 20, 2009 3:51 AM
> To: Jason Roberts
> Cc: 'RPy help, support and design discussion list'
> Subject: Re: FW: rpy2 in ArcGIS 9.3
>
> Jason Roberts wrote:
>> Laurent,
>>
>> Thank you very much for the reply.
>>
>>> I am not certain which way the risk stands (compile each
>>> time, or compile once and hope for the best). Time will tell.
>> So rpy2 does not require recompilation every time R is released? How is it
>> binding to R then? (I have not looked at the C code yet. If you can just
>> point me in the right direction I can figure it out myself.)
>
> The libR is used as a shared library.
> Under win32, and AFAIUI, that should translate as using an unbound DLL
> (otherwise the same version of libR will be required) and hold as long
> as the names used from the symbol table presented by libR.so do not change.
> In the case it does, then a new(er) version of rpy2 should be available.
> Admittedly not an absolutely perfect option, but I wanted to avoid
> version-specific conditional definitions in the code; rpy had it, but I
> had to start from a simple base. This does not mean this aspect of rpy
> will not be added in the future, but I'd like to explore options first.
>
> On a related note, I'd like to offer the option to have R really
> embedded in rpy2 (with an R install inside the rpy2 installed module)...
> so if someone has the time...
>
>>> You could try with a dummy minimal extension to ArcGIS and tell us.
>> I tried this out using ArcGIS 9.3 SP1, Python 2.5.1 (comes with ArcGIS 9.3),
>> and rpy2-2.0.3.win32-py2.5.exe. I created a Python-based geoprocessing tool
>> with the following code to exercise rpy2 in a minimal way:
>>
>> # Initialize the ArcGIS geoprocessor object, so we can communicate
>> # with ArcGIS.
>>
>> import arcgisscripting
>> gp = arcgisscripting.create()
>>
>> # Using rpy2, calculate the square root of the input parameter. If we
>> # catch an exception, report a traceback to ArcGIS.
>>
>> import os, traceback
>> try:
>>     x = gp.GetParameter(0)
>>     from rpy2 import robjects
>>     sqrt_x = robjects.r.sqrt(x)[0]
>> except:
>>     gp.AddError(traceback.format_exc())
>>     raise
>>
>> It worked (!!!) and the performance appeared to be quite good. I am running
>> it in a loop now to check for leaks. I'll send a followup on that later.
>
> If you are having an issue, check the following:
> http://www.mail-archive.com/[email protected]/msg01696.html
>
>
>> There was one problem that I noticed immediately. Currently, line 37 of
>> rinterface/__init__.py blindly adds R directories to the PATH:
>>
>> # Win32-specific code copied from RPy-1.x
>> if sys.platform == 'win32':
>>     import win32api
>>     os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
>>     os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
>>     os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')
>
> I see.
>
>> The new PATH is persisted in the environment of the calling ArcGIS process.
>> When that process initializes the Python interpreter a second time, this
>> code is called again, adding duplicate entries to PATH. This can go on until
>> the PATH reaches 32767 characters, and then putenv will raise an OSError. In
>> my case, my tool ran 335 times before this occurred. I observed the problem
>> happen by adding additional logging statements to my minimal example above,
>> and watched the len(os.environ['PATH']) grow close to 32767 before putenv
>> failed.
>>
>> To fix, something like this is appropriate:
>>
>> # Win32-specific code copied from RPy-1.x
>> if sys.platform == 'win32':
>>     import win32api
>>     if os.path.join(R_HOME, 'bin') not in os.environ['PATH'].split(';'):
>>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
>>     if os.path.join(R_HOME, 'modules') not in os.environ['PATH'].split(';'):
>>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
>>     if os.path.join(R_HOME, 'lib') not in os.environ['PATH'].split(';'):
>>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')
>
> Thanks. An ultimate patch would likely be a little more complex
> (by checking that os.path.join(R_HOME, 'bin') is not the _first_ R found
> in the PATH... but I am not sure what PATH is needed for here - can
> someone with win32 try when just removing the PATH creation ?)
>
>> I'm currently running it 100000 times, monitoring memory and handles. I'll
>> let you know how it turns out.
>>
>> I'm pretty hopeful this will work out well. There could be problems with R
>> packages that do fancy things (like link to other C libraries) but even if
>> that's a problem, just having the ability to do basic R from ArcGIS 9.3 in a
>> performant manner will be very, very nice for us and our users.
>>
>> Jason
>>
>>
>> -----Original Message-----
>> From: Laurent Gautier [mailto:[email protected]]
>> Sent: Thursday, March 19, 2009 1:57 AM
>> To: RPy help, support and design discussion list
>> Cc: Jason Roberts
>> Subject: Re: FW: rpy2 in ArcGIS 9.3
>>
>> Jason Roberts wrote:
>>> Greetings rpy2 developers,
>>>
>>>
>>>
>>> I am the primary developer of an open source Python package called
>>> Marine Geospatial Ecology Tools
>>> (http://code.env.duke.edu/projects/mget). These tools perform various
>>> jobs that are useful to marine ecologists. Many of the tools are
>>> designed to be invoked from ArcGIS, a desktop GIS application that runs
>>> on Windows.
>>>
>> rpy2 works best on UNIX-alikes at the moment
>> (some features are not working on win32).
>>
>>> To date, we have had good success accessing R using rpy. Thank you very
>>> much for making this package freely available.
>> I can't take those credits:
>> rpy is Walter and Greg's work, with the help of contributors.
>>
>>> But we noted last year
>>> that rpy is no longer being maintained, and rpy2 is the new replacement.
>> Kind of. I started with rpy2 about a year ago, as what I was trying to
>> do did not appear possible with rpy. Rpy is still available, although
>> its development is in the slow lane at the moment, I think.
>>
>>> It will be a big job for us to switch to rpy2, so we have been delaying
>>> the switch. In the interim, we've been compiling rpy every time a new R
>>> release has come out. This is probably increasingly risky, so we're
>>> becoming more motivated to make the switch.
>> I am not certain which way the risk stands (compile each
>> time, or compile once and hope for the best). Time will tell.
>>
>>> In addition, there is an
>>> ArcGIS 9.3 / rpy compatibility problem that is pretty inconvenient.
>>> Basically we are wondering if this problem exists with rpy2.
>>>
>>>
>>>
>>> The problem was discussed last year; see
>>>
>>> <http://sourceforge.net/tracker/?func=detail&atid=453021&aid=2062627&group_id=48422>.
>>> In brief: Every time ArcGIS 9.3 runs a Python-based tool, it initializes
>>> a new instance of the Python interpreter in the ArcGIS process
>>> (typically ArcCatalog.exe or ArcMap.exe). The interpreter instance
>>> eventually loads the rpy extension module (e.g. _rpy2070.dll). The
>>> interpreter exits when the tool completes. But this does not cause the
>>> rpy extension module to be unloaded from the process, and when ArcGIS
>>> runs the tool a second time, creating a new Python interpreter, rpy
>>> fails to initialize.
>>>
>>>
>>>
>>> In last year's bug report, lgautier mentioned that "the problem was
>>> fixed a few weeks ago" (i.e. last summer). Is it correct then that this
>>> procedure of initializing the interpreter, using rpy2, shutting down the
>>> interpreter, and so on, can be done indefinitely from a single process
>>> without any ill effects?
>>>
>> Maybe, maybe not.
>> I have not looked at whether the C-level part of rpy2 does what it
>> should regarding the creation and destruction of Python interpreters.
>>
>> You could try with a dummy minimal extension to ArcGIS and tell us.
>>
>>
>>
>> Hoping this helps,
>>
>>
>>
>> L.
>>
>>> Thanks for your help! And thanks again to you guys for developing this
>>> great reusable software.
>>>
>>>
>>>
>>> Jason
>>>
>>>
>>
>
>
>
> ------------------------------------------------------------------------------
> Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are
> powering Web 2.0 with engaging, cross-platform capabilities. Quickly and
> easily build your RIAs with Flex Builder, the Eclipse(TM)based development
> software that enables intelligent coding and step-through debugging.
> Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com
> _______________________________________________
> rpy-list mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rpy-list