On Sun, 11 Jul 2010, Tony Plate wrote:

Another way of seeing the environments referenced in an object is using str(), e.g.:

f1 <- function() {
+ junk <- rnorm(10000000)
+ x <- 1:3
+ y <- rnorm(3)
+ lm(y ~ x)
+ }
v1 <- f1()
object.size(f1)
1636 bytes
grep("Environment", capture.output(str(v1)), value=TRUE)
[1] "  .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "
[2] "  .. .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "

'Some of the environments in a few cases': remember environments have environments (and so on), and that namespaces and packages are also environments. So we need to know about the environment of environment(v1$terms), which also gets saved (either as a reference or as an environment, depending on what it is).
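
A minimal sketch of what "environments have environments" means in practice, assuming a scaled-down version of the f1() example from this thread. The helper name env_chain is hypothetical, not part of R:

```r
# Hypothetical helper: return an environment followed by all of its
# enclosures, ending at the empty environment.
env_chain <- function(env) {
  chain <- list()
  repeat {
    chain[[length(chain) + 1L]] <- env
    if (identical(env, emptyenv())) break
    env <- parent.env(env)
  }
  chain
}

f1 <- function() {
  junk <- rnorm(1000)   # small stand-in for the large vector in the thread
  x <- 1:3
  y <- rnorm(3)
  lm(y ~ x)
}
v1 <- f1()

chain <- env_chain(environment(v1$terms))
ls(chain[[1]])                                  # the frame of the f1() call
vapply(chain, environmentName, character(1))    # "" for unnamed frames
```

The first element is the evaluation frame of f1(); the later elements are whatever f1 was defined in, followed by attached packages and the base environment.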

And this approach does not work for many of the commonest cases:

f <- function() {
+ x <- pi
+ g <- function() print(x)
+ return(g)
+ }
g <- f()
str(g)
function ()
 - attr(*, "source")= chr "function() print(x)"
ls(environment(g))
[1] "g" "x"

In fact I think it works only for formulae.
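
For closures, the enclosing environment can be queried directly with environment() rather than searched for in str() output. A minimal sketch, reusing the f and g of the example above:

```r
f <- function() {
  x <- pi
  g <- function() print(x)
  return(g)
}
g <- f()

env <- environment(g)     # the frame of the f() call that created g
ls(env)                   # names kept alive by the closure
get("x", envir = env)     # the captured value
```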

-- Tony Plate

On 7/10/2010 10:10 PM, bill.venab...@csiro.au wrote:
Well, I have answered one of my questions below.  The hidden
environment is attached to the 'terms' component of v1.

Well, not really hidden. A terms component is a formula (see ?terms.object), and a formula has an environment just as a closure does. In neither case does the print() method tell you about it -- but ?formula does.
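
A small sketch of the point about formulae, assuming a hypothetical function make_formula whose local variable is not mentioned in the formula itself:

```r
make_formula <- function() {
  hidden <- "local variable, not part of the formula"
  y ~ x
}
fo <- make_formula()

environment(fo)                                        # frame of make_formula()
identical(environment(fo), attr(fo, ".Environment"))   # stored as an attribute
ls(environment(fo))                                    # locals travel with fo
```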

To see this


lapply(v1, environment)

$coefficients
NULL

$residuals
NULL

$effects
NULL

$rank
NULL

$fitted.values
NULL

$assign
NULL

$qr
NULL

$df.residual
NULL

$xlevels
NULL

$call
NULL

$terms
<environment: 0x021b9e18>

$model
NULL


rm(junk, envir = with(v1, environment(terms)))
usedVcells()

[1] 96532
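
The lapply() listing above can be condensed so that only components carrying an environment are shown. A sketch, assuming a smaller junk vector than in the original session:

```r
f1 <- function() {
  junk <- rnorm(1e5)   # scaled down from the 1e7 used in the thread
  x <- 1:3
  y <- rnorm(3)
  lm(y ~ x)
}
v1 <- f1()

# Keep only the components for which environment() is not NULL.
envs <- Filter(Negate(is.null), lapply(v1, environment))
names(envs)
ls(envs$terms)   # junk is still here, alongside x and y
```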



This is still a bit of a trap for young (and old!) players...

I think the main point in my mind is why is it that object.size()
excludes enclosing environments in its reckonings?

Bill Venables.
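
One way to see the discrepancy in numbers is to compare object.size() with the length of the serialized object, since serialization follows the formula's environment. This is a sketch with a scaled-down junk vector; the serialized length is only a rough proxy for the size of a file written by save(), which compresses by default:

```r
f0 <- function() {
  junk <- rnorm(1e5)   # about 800 kB of doubles
  y ~ x
}
v0 <- f0()

reported   <- as.numeric(object.size(v0))           # ignores the environment
serialized <- length(serialize(v0, connection = NULL))
c(reported = reported, serialized = serialized)

# Dropping junk from the formula's environment shrinks the serialized form.
rm(junk, envir = environment(v0))
after <- length(serialize(v0, connection = NULL))
after
```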

-----Original Message-----
From: Venables, Bill (CMIS, Cleveland)
Sent: Sunday, 11 July 2010 11:40 AM
To: 'Duncan Murdoch'; 'Paul Johnson'
Cc: 'r-devel@r-project.org'; Taylor, Julian (CMIS, Waite Campus)
Subject: RE: [Rd] Large discrepancies in the same object being saved to .RData

I'm still a bit puzzled by the original question.  I don't think it
has much to do with .RData files and their sizes.  For me the puzzle
comes much earlier.  Here is an example of what I mean using a little
session


usedVcells<- function() gc()["Vcells", "used"]
usedVcells()        ### the base load

[1] 96345

### Now look at what happens when a function returns a formula as the
### value, with a big item floating around in the function closure:


f0<- function() {

+ junk<- rnorm(10000000)
+ y ~ x
+ }

v0<- f0()
usedVcells()   ### much bigger than base, why?

[1] 10096355

v0             ### no obvious environment

y ~ x

object.size(v0)  ### so far, no clue given where
                 ### the extra Vcells are located.

372 bytes

### Does v0 have an enclosing environment?


environment(v0)             ### yep.

<environment: 0x021cc538>

ls(envir = environment(v0)) ### as expected, there's the junk

[1] "junk"

rm(junk, envir = environment(v0))  ### this does the trick.
usedVcells()

[1] 96355

### Now consider a second example where the object
### is not a formula, but contains one.


f1<- function() {

+ junk<- rnorm(10000000)
+ x<- 1:3
+ y<- rnorm(3)
+ lm(y ~ x)
+ }


v1<- f1()
usedVcells()  ### as might have been expected.

[1] 10096455

### in this case, though, there is no
### (obvious) enclosing environment


environment(v1)

NULL

object.size(v1)  ### so where are the junk Vcells located?

7744 bytes

ls(envir = environment(v1))  ### clearly will not work

Error in ls(envir = environment(v1)) : invalid 'envir' argument


rm(v1)     ### removing the object does clear out the junk.
usedVcells()

[1] 96366


And in this second case, as noted by Julian Taylor, if you save() the
object the .RData file is also huge.  There is an environment attached
to the object somewhere, but it appears to be occluded and entirely
inaccessible.  (I have poked around the object components trying to
find the thing but without success.)

Have I missed something?

Bill Venables.

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Duncan Murdoch
Sent: Sunday, 11 July 2010 10:36 AM
To: Paul Johnson
Cc: r-devel@r-project.org
Subject: Re: [Rd] Large discrepancies in the same object being saved to .RData

On 10/07/2010 2:33 PM, Paul Johnson wrote:

On Wed, Jul 7, 2010 at 7:12 AM, Duncan Murdoch<murdoch.dun...@gmail.com> wrote:


On 06/07/2010 9:04 PM, julian.tay...@csiro.au wrote:


Hi developers,



After some investigation I have found there can be large discrepancies in the same object being saved as an external "xx.RData" file. The immediate repercussion of this is the possible increased size of your .RData workspace for no apparent reason.





I haven't worked through your example, but in general the way that local
objects get captured is when part of the return value includes an
environment.


Hi, can I ask a follow up question?

Is there a tool to browse *.Rdata files without loading them into R?


I don't know of one.  You can load the whole file into an empty
environment, but then you lose information about "where did it come from"?

Duncan Murdoch
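
A minimal sketch of loading a file into an empty environment, as suggested above. The objects x and y and the temporary file are made up for illustration:

```r
x <- 1:10
y <- letters
path <- tempfile(fileext = ".RData")
save(x, y, file = path)

e <- new.env()
loaded <- load(path, envir = e)   # load() returns the names it restored
loaded
# Per-object sizes; as discussed in this thread, object.size() does not
# count anything held in enclosing environments.
sapply(mget(loaded, envir = e), object.size)
unlink(path)
```

Nothing is written into the workspace; everything from the file lives in e and can be inspected or discarded.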

In HDF5 (a data storage format we use sometimes), there is a CLI
program "h5dump" that will spit out line-by-line all the contents of a
storage entity.  It will literally track through all the metadata, all
the vectors of scores, etc.  I've found that handy to "see what's
really in there" in cases like the one that OP asked about.
Sometimes, we find that there are things that are "in there" by
mistake, as Duncan describes, and then we can try to figure why they
are in there.

pj




______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel







--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

