Re: [Rd] Large discrepancies in the same object being saved to .RData

2010-07-11 Thread Prof Brian Ripley

On Sun, 11 Jul 2010, Tony Plate wrote:

Another way of seeing the environments referenced in an object is using 
str(), e.g.:



> f1 <- function() {
+ junk <- rnorm(1000)
+ x <- 1:3
+ y <- rnorm(3)
+ lm(y ~ x)
+ }
> v1 <- f1()
> object.size(f1)
1636 bytes
> grep("Environment", capture.output(str(v1)), value=TRUE)
[1] "  .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "
[2] "  .. .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "

'Some of the environments in a few cases': remember environments have 
environments (and so on), and that namespaces and packages are also 
environments.  So we need to know about the environment of 
environment(v1$terms), which also gets saved (either as a reference or 
as an environment, depending on what it is).
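To make the chain concrete, here is a minimal sketch that walks the enclosing environments of environment(v1$terms) up to the empty environment. env_chain is a hypothetical helper written for this illustration, not part of R, and the f1 definition simply repeats the example from this thread.

```r
# Hypothetical helper: collect a label for every enclosing environment,
# starting at `env` and stopping at the empty environment.
env_chain <- function(env) {
  labels <- character(0)
  while (!identical(env, emptyenv())) {
    labels <- c(labels, environmentName(env))  # "" for unnamed frames
    env <- parent.env(env)
  }
  labels
}

f1 <- function() {
  junk <- rnorm(1000)
  x <- 1:3
  y <- rnorm(3)
  lm(y ~ x)
}
v1 <- f1()

chain <- env_chain(environment(v1$terms))
print(chain)
```

The first label is the empty string because the evaluation frame of f1 has no name; the later entries are whatever encloses f1 (the global environment and the search path when f1 is defined at top level), ending with the base environment.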


And this approach does not work for many of the commonest cases:


> f <- function() {
+ x <- pi
+ g <- function() print(x)
+ return(g)
+ }
> g <- f()
> str(g)
function ()
 - attr(*, "source")= chr "function() print(x)"
> ls(environment(g))
[1] "g" "x"

In fact I think it works only for formulae.
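For closures the environment has to be inspected directly. A small sketch, assuming the f and g from the example above: env_sizes is a hypothetical helper (not a base R function) that reports object.size() for each object in a closure's environment, so that oversized captured locals stand out.

```r
f <- function() {
  x <- pi
  g <- function() print(x)
  return(g)
}
g <- f()

# Hypothetical helper: size in bytes of every object visible to ls()
# in the environment of the closure `fn`.
env_sizes <- function(fn) {
  env <- environment(fn)
  vapply(ls(env),
         function(nm) as.numeric(object.size(get(nm, envir = env))),
         numeric(1))
}

sizes <- env_sizes(g)
print(sizes)
```

Note that ls() skips names beginning with a dot by default, so this is only a rough inventory.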


-- Tony Plate

On 7/10/2010 10:10 PM, bill.venab...@csiro.au wrote:

Well, I have answered one of my questions below.  The hidden
environment is attached to the 'terms' component of v1.


Well, not really hidden.  A terms component is a formula (see 
?terms.object), and a formula has an environment just as a closure 
does.  In neither case does the print() method tell you about it -- 
but ?formula does.



To see this



> lapply(v1, environment)


$coefficients
NULL

$residuals
NULL

$effects
NULL

$rank
NULL

$fitted.values
NULL

$assign
NULL

$qr
NULL

$df.residual
NULL

$xlevels
NULL

$call
NULL

$terms
<environment: 0x021b9e18>

$model
NULL



> rm(junk, envir = with(v1, environment(terms)))
> usedVcells()
[1] 96532





This is still a bit of a trap for young (and old!) players...

I think the main point in my mind is why is it that object.size()
excludes enclosing environments in its reckonings?
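The gap between the two measurements can be sketched as follows. This is an illustration only, assuming the f0 example from this thread; it uses serialize() as a stand-in for what save() has to write, which is an approximation rather than the exact .RData size.

```r
f0 <- function() {
  junk <- rnorm(1000)   # local object kept alive by the formula's environment
  y ~ x
}
v0 <- f0()

# object.size() excludes the size of environments.
in_memory <- as.numeric(object.size(v0))

# Serialization follows environment(v0), so junk is written out too.
serialized <- length(serialize(v0, connection = NULL))

# Dropping junk from the captured environment shrinks the serialized form.
rm(junk, envir = environment(v0))
serialized_after <- length(serialize(v0, connection = NULL))

print(c(in_memory = in_memory,
        serialized = serialized,
        serialized_after = serialized_after))
```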

Bill Venables.

-----Original Message-----
From: Venables, Bill (CMIS, Cleveland)
Sent: Sunday, 11 July 2010 11:40 AM
To: 'Duncan Murdoch'; 'Paul Johnson'
Cc: 'r-devel@r-project.org'; Taylor, Julian (CMIS, Waite Campus)
Subject: RE: [Rd] Large discrepancies in the same object being saved to 
.RData


I'm still a bit puzzled by the original question.  I don't think it
has much to do with .RData files and their sizes.  For me the puzzle
comes much earlier.  Here is an example of what I mean using a little
session



> usedVcells <- function() gc()["Vcells", "used"]
> usedVcells()    ### the base load
[1] 96345

### Now look at what happens when a function returns a formula as the
### value, with a big item floating around in the function closure:



> f0 <- function() {
+ junk <- rnorm(1000)
+ y ~ x
+ }
> v0 <- f0()
> usedVcells()   ### much bigger than base, why?
[1] 10096355
> v0 ### no obvious environment
y ~ x
> object.size(v0)  ### so far, no clue given where
                   ### the extra Vcells are located.
372 bytes

### Does v0 have an enclosing environment?

> environment(v0) ### yep.
<environment: 0x021cc538>
> ls(envir = environment(v0)) ### as expected, there's the junk
[1] "junk"
> rm(junk, envir = environment(v0))  ### this does the trick.
> usedVcells()
[1] 96355

### Now consider a second example where the object
### is not a formula, but contains one.



> f1 <- function() {
+ junk <- rnorm(1000)
+ x <- 1:3
+ y <- rnorm(3)
+ lm(y ~ x)
+ }
> v1 <- f1()
> usedVcells()  ### as might have been expected.
[1] 10096455

### in this case, though, there is no
### (obvious) enclosing environment

> environment(v1)
NULL
> object.size(v1)  ### so where are the junk Vcells located?
7744 bytes
> ls(envir = environment(v1))  ### clearly will not work
Error in ls(envir = environment(v1)) : invalid 'envir' argument
> rm(v1) ### removing the object does clear out the junk.
> usedVcells()
[1] 96366




And in this second case, as noted by Julian Taylor, if you save() the
object the .RData file is also huge.  There is an environment attached
to the object somewhere, but it appears to be occluded and entirely
inaccessible.  (I have poked around the object components trying to
find the thing but without success.)
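One way to hunt for such an attached environment is a recursive walk over list components and attributes. The following is only a sketch: find_envs is a hypothetical helper (not part of R), it repeats the f1 example from this thread, and it does not look inside closures, so it will miss environments reachable only through functions.

```r
# Hypothetical helper: return the access path of every environment found
# among the components and attributes of `obj`.
find_envs <- function(obj, path = "obj") {
  if (is.environment(obj)) return(path)
  hits <- character(0)
  # Attributes first: formulas and terms keep theirs in ".Environment".
  for (nm in names(attributes(obj))) {
    hits <- c(hits, find_envs(attr(obj, nm, exact = TRUE),
                              sprintf('attr(%s, "%s")', path, nm)))
  }
  # Then recurse into list-like objects, component by component.
  if (is.list(obj)) {
    for (i in seq_along(obj)) {
      hits <- c(hits, find_envs(obj[[i]], sprintf("%s[[%d]]", path, i)))
    }
  }
  hits
}

f1 <- function() {
  junk <- rnorm(1000)
  x <- 1:3
  y <- rnorm(3)
  lm(y ~ x)
}
v1 <- f1()

hits <- find_envs(v1, "v1")
print(hits)
```

For an lm fit the paths point at the terms component and at the "terms" attribute of the model frame, both of which refer to the same environment.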

Have I missed something?

Bill Venables.

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] 
On Behalf Of Duncan Murdoch

Sent: Sunday, 11 July 2010 10:36 AM
To: Paul Johnson
Cc: r-devel@r-project.org
Subject: Re: [Rd] Large discrepancies in the same object being saved to 
.RData


On 10/07/2010 2:33 PM, Paul Johnson wrote:

On Wed, Jul 7, 2010 at 7:12 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:




On 06/07/2010 9:04 PM, julian.tay...@csiro.au wrote:



Hi developers,



After some investigation I have found there can be large discrepancies in
the same object being saved as an external xx.RData file. The immediate
repercussion of this is the possible increased size of your .RData workspace


Re: [Rd] LinkingTo and C++

2010-07-11 Thread Dominick Samperi
While linking to package shared libs is not possible in general, as Simon
points out, it is possible under Windows, provided Windows knows how to find
the library linked to at runtime (this requires a customized Makefile.win).
One way this is done under Windows is simply to place the package/libs
directories containing the package shared libs to be linked to in the Windows
search path, but this may be a problem for packages released to CRAN since
this would require updates to CRAN's PATH environment variable.

Another possibility is to load all packages containing shared libs to be
linked to before using any shared lib that is dynamically linked to them. For
example, if B.dll is dynamically linked to A.dll (and both A and B are
packages), and if foo() is a function in B.dll that uses functions in A.dll,
then this will work:

library(A)
library(B)
.Call('foo')

But this is not a very natural or convenient solution. It requires that
library() commands be used with every instance of Rscript, for example.

A better solution would be some variant of LinkingTo: A that somehow has the
same effect as setting Windows search path to include the libs directory
containing A.dll and B.dll.

Of course, most of the issues disappear when linking to static libs instead
of dynamic ones, and it is not clear that the extra effort needed to support
dynamic libs will yield much benefit in this case.

Dominick

On Thu, Feb 11, 2010 at 12:55 PM, Simon Urbanek <simon.urba...@r-project.org> wrote:

 Romain,

 I think you're confusing two entirely different concepts here:

 1) LinkingTo: allows a package to provide C-level functions to other
 packages (see R-ext 5.4). Let's say package A provides a function foo by
 calling R_RegisterCCallable for that function. If a package B wants to use
 that function, it uses LinkingTo: and calls R_GetCCallable to obtain the
 function pointer. It does not actually link to package A because that is in
 general not possible - it simply obtains the pointers through R. In
 addition, LinkingTo: makes sure that you have access to the header files of
 package A which help you to cast the functions and define any data
 structures you may need. Since C++ is a superset of C you can use this
 facility with C++ as long as you don't depend on anything outside of the
 header files.

 2) linking directly to another package's shared object is in general not
 possible, because packages are not guaranteed to be dynamic libraries. They
 are usually shared objects which may or may not be compatible with a dynamic
 library on a given platform. Therefore the R-ext describes another way in
 which you may provide some library independently of the package shared
 object to other packages (see R-ext 5.8). The issue is that you have to
 create a separate library (PKG/libs[/arch]/PKG.so won't work in general!)
 and provide this to other packages. As 5.8 says, this is in general not
 trivial because it is very platform dependent and the most portable way is
 to offer a static library.

 To come back to your example, LinkingTo: A and B will work if you remove
 Makevars from B (you don't want to link) and put your hello method into the
 A.h header:

  > library(B)
  Loading required package: A
  > .Call("say_hello", PACKAGE = "B")
  [1] "hello"

 However, you're not really using the LinkingTo: facilities for the
 functions so it's essentially just helping you to find the header file.

 Cheers,
 Simon



 On Feb 11, 2010, at 4:08 AM, Romain Francois wrote:

  Hello,
 
  I've been trying to make LinkingTo work when the package linked to has
  C++ code.

  I've put dummy packages to illustrate this email here:
  http://addictedtor.free.fr/misc/linkingto
 
  Package A defines this C++ class:
 
  class A {
  public:
      A() ;
      ~A() ;
      SEXP hello() ;
  } ;
 
  Package B has this function :
 
  SEXP say_hello(){
      A a ;
      return a.hello() ;
  }
 
  Headers of package A are copied into inst/include so that package B can
  have
 
  LinkingTo: A
 
  in its DESCRIPTION file.
 
  Also, package B has the R function ;
 
  g <- function(){
      .Call("say_hello", PACKAGE = "B")
  }
 
  With this I can compile A and B, but then I get :
 
  $ Rscript -e "B::g()"
  Error in dyn.load(file, DLLpath = DLLpath, ...) :
   unable to load shared library '/usr/local/lib/R/library/B/libs/B.so':
   /usr/local/lib/R/library/B/libs/B.so: undefined symbol: _ZN1AD1Ev
  Calls: :: ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
 
  If I then add a Makevars in B with this :
 
 
  # find the root directory where A is installed
  ADIR=$(shell $(R_HOME)/bin/Rscript -e "cat(system.file(package='A'))" )
 
  PKG_LIBS= $(ADIR)/libs/A$(DYLIB_EXT)
 
 
  Then it works:
 
  $ Rscript -e "B::g()"
  [1] "hello"
 
  So it appears that adding the -I flag, which is what LinkingTo does, is
  not enough when the package linking from (B) actually has to link to the
  linked-to package (A).
 
  I've been looking at
 

Re: [Rd] Large discrepancies in the same object being saved to .RData

2010-07-11 Thread Duncan Murdoch

On 11/07/2010 1:30 PM, Prof Brian Ripley wrote:

On Sun, 11 Jul 2010, Tony Plate wrote:

Another way of seeing the environments referenced in an object is using 
str(), e.g.:



> f1 <- function() {
+ junk <- rnorm(1000)
+ x <- 1:3
+ y <- rnorm(3)
+ lm(y ~ x)
+ }
> v1 <- f1()
> object.size(f1)
1636 bytes
> grep("Environment", capture.output(str(v1)), value=TRUE)
[1] "  .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "
[2] "  .. .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "


'Some of the environments in a few cases': remember environments have 
environments (and so on), and that namespaces and packages are also 
environments.  So we need to know about the environment of 
environment(v1$terms), which also gets saved (either as a reference or 
as an environment, depending on what it is).


And this approach does not work for many of the commonest cases:


> f <- function() {
+ x <- pi
+ g <- function() print(x)
+ return(g)
+ }
> g <- f()
> str(g)
function ()
 - attr(*, "source")= chr "function() print(x)"
> ls(environment(g))
[1] "g" "x"

In fact I think it works only for formulae.


-- Tony Plate

On 7/10/2010 10:10 PM, bill.venab...@csiro.au wrote:

Well, I have answered one of my questions below.  The hidden
environment is attached to the 'terms' component of v1.


Well, not really hidden.  A terms component is a formula (see 
?terms.object), and a formula has an environment just as a closure 
does.  In neither case does the print() method tell you about it -- 
but ?formula does.




I've just changed the default print method for formulas to display the 
environment if it is not globalenv(), which is the rule used for 
closures as well.  So now in R-devel:


> as.formula(y ~ x)
y ~ x

as before, but

> as.formula(y ~ x, env=new.env())
y ~ x
<environment: 01f83400>

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Saving an R program as C Code

2010-07-11 Thread Aaron J. Ferguson
Is there any way to save R procedures or packages as C code? Ideally, I want
to run a logistic regression in R but have the code available in C, or Java,
or whatever. Thoughts?



[Rd] S4 class extends data.frame, getDataPart sees list

2010-07-11 Thread Daniel Murphy
R-Devel:

When I get the data part of an S4 class that has contains="data.frame", it
gives me a list, even when the data.frame is the S4 version:

> d <- data.frame(x=1:3)
> isS4(d)
[1] FALSE   # of course
> dS4 <- new("data.frame", d)
> isS4(dS4)
[1] TRUE    # ok
> class(dS4)
[1] "data.frame"   # good
attr(,"package")
[1] "methods"
> setClass("A", representation(label="character"), contains="data.frame")
[1] "A"
> a <- new("A", dS4, label="myFrame")
> getDataPart(a)
[[1]]  # oh?
[1] 1 2 3

> class(a@.Data)
[1] "list"   # hmm
> names(a)
[1] "x" # sure, that makes sense
> a
Object of class "A"
  x
1 1
2 2
3 3
Slot "label":
[1] "myFrame"


Was I wrong to have expected the data part of 'a' to be a data.frame?

Thanks.

Dan Murphy
