On 21/10/2021 2:09 a.m., Jinsong Zhao wrote:
This example shows the same behaviour as my problem.

If I
  > save(formula, file = "abc.RData")
and then, in a newly launched R session, I
  > load("abc.RData")
  > formula
x ~ y
<environment: 0x00000000171e4be8>

I want to know what is stored in <environment: 0x00000000171e4be8>,
how to access it, and how to save the object without the environment.

Using Henrik's example, the environment would contain all the local variables of the make_formula call. In his case, that's just the "large" variable, but in real examples, it can be quite a few things.

To access it, you can do

e <- environment(formula)

ls(e) # shows just "large"
e$large  # extracts that value
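
To see which captured variable is taking up the space, one option is to measure every binding in that environment. This is only a minimal sketch: it reuses Henrik's make_formula() with a smaller vector, and the name "sizes" is just for illustration.

```r
make_formula <- function() { large <- rnorm(1e5); x ~ y }
formula <- make_formula()

e <- environment(formula)

# Size in bytes of every variable captured in the formula's environment,
# largest first
sizes <- sapply(mget(ls(e), envir = e), object.size)
sort(sizes, decreasing = TRUE)
```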

It is possible to save the formula without the environment, but you should *never* do that. That changes the meaning of the formula and is almost certain to lead to bugs in the future.

For example, consider this slightly more complicated variation on Henrik's example:

make_formula <- function() {
  x <- rnorm(100)
  y <- rnorm(100)
  x ~ y
}
formula <- make_formula()
lm(formula)
#>
#> Call:
#> lm(formula = formula)
#>
#> Coefficients:
#> (Intercept)            y
#>     -0.1584      -0.0805

Here the lm() function finds the variables used in the formula in the formula's attached environment. You'd get a completely different answer (probably wrong) if you removed the environment.
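
To make that concrete, here is a minimal sketch that re-points the formula at the global environment, which is one way of "dropping" the captured variables (f_stripped is an illustrative name, not something from your code). It assumes there are no variables called x and y in your workspace; if there were, lm() would silently fit those instead of failing.

```r
make_formula <- function() {
  x <- rnorm(100)
  y <- rnorm(100)
  x ~ y
}
f <- make_formula()
fit <- lm(f)   # works: x and y are found in environment(f)

# Same formula, but no longer attached to the environment holding x and y
f_stripped <- f
environment(f_stripped) <- globalenv()

# lm() now looks for x and y starting from the global environment
res <- tryCatch(lm(f_stripped), error = function(e) e)
if (inherits(res, "error")) message(conditionMessage(res))
```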

In your real example where the save files are too big, the solution is to find where those RDA objects were created, and make sure there are no unused local variables at the time you return the result. Any local variable that's mentioned in the formula should be kept, but other variables that may have been used to construct them can be removed, e.g.

make_formula <- function() {
  # Create a local variable
  large <- rnorm(100000)

  # Use it to create variables in the formula
  x <- large + 1
  y <- large + rnorm(100000)

  # Remove the temporary one
  rm(large)

  # Return the formula
  x ~ y
}
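
If you want to check that the clean-up actually helps, you can compare serialized sizes with and without the rm() call. This is a sketch with smaller vectors; leaky(), tidy() and ser_size() are names made up for the comparison.

```r
# Version that leaves the temporary variable behind
leaky <- function() {
  large <- rnorm(1e5)
  x <- large + 1
  y <- large + rnorm(1e5)
  x ~ y
}

# Version that removes it before returning
tidy <- function() {
  large <- rnorm(1e5)
  x <- large + 1
  y <- large + rnorm(1e5)
  rm(large)
  x ~ y
}

# Bytes that save()/saveRDS() would have to write, before compression
ser_size <- function(obj) length(serialize(obj, connection = NULL))

ser_size(leaky())  # x, y and large all travel with the formula
ser_size(tidy())   # only x and y do

# The tidy formula still carries everything lm() needs
coef(lm(tidy()))
```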

Duncan Murdoch


Best,
Jinsong

On 2021/10/21 4:06, Henrik Bengtsson wrote:
Example illustrating what Duncan says:

make_formula <- function() { large <- rnorm(1e6); x ~ y }
formula <- make_formula()

# "Apparent" size of object
object.size(formula)
728 bytes

# Actual serialization size
length(serialize(formula, connection = NULL))
[1] 8000203

# A better size estimate
lobstr::obj_size(formula)
8,000,888 B

/Henrik

On Wed, Oct 20, 2021 at 12:57 PM Duncan Murdoch
<murdoch.dun...@gmail.com> wrote:

On 20/10/2021 9:20 a.m., Jinsong Zhao wrote:
On 2021/10/20 21:05, Duncan Murdoch wrote:
On 20/10/2021 8:57 a.m., Jinsong Zhao wrote:
Hi there,

I have an RData file, created by save.image(), with a size of about
74.0 MB (77,608,222 bytes).

After loading it into R, I measured the size of each object with object.size():

object.size(combn.rda.m)
105448 bytes
object.size(cross)
102064 bytes
object.size(denitr.1)
25032 bytes
object.size(rda.denitr.1)
600280 bytes
object.size(xh)
7792 bytes
object.size(xh.x)
6064 bytes
object.size(xh.x.1)
24144 bytes
object.size(xh.x.2)
24144 bytes
object.size(xh.x.3)
24144 bytes
object.size(xh.y)
2384 bytes

These are all small objects.

If I delete the largest one, "rda.denitr.1", and then call
save.image("xx.RData"), the file is 22.6 KB (23,244 bytes). All seems OK.

However, when I save(rda.denitr.1, file = "yy.RData"), the file is
73.9 MB (77,574,869 bytes).

I don't know why...

Any hint?

As the docs for object.size() say, "Exactly which parts of the memory
allocation should be attributed to which object is not clear-cut."  In
particular, if a function or formula has an associated environment, it
isn't included, but it is sometimes saved in the image.

So I'd suspect rda.denitr.1 contains something that references an
environment, and it's an environment that would be saved.  (I forget the
exact rules, but I think that means it's not the global environment and
it's not a package environment.)
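
A small sketch of that difference, using serialize() as a stand-in for what save() has to write (ser_size(), f_local and f_global are names invented for the illustration):

```r
ser_size <- function(obj) length(serialize(obj, connection = NULL))

# Formula created inside a function: its environment is the call frame,
# which still holds `large`
make_formula <- function() { large <- rnorm(1e5); x ~ y }
f_local <- make_formula()

# Formula attached to the global environment, as one typed at the prompt would be
f_global <- x ~ y
environment(f_global) <- globalenv()

ser_size(f_local)   # includes the 1e5 doubles in `large`
ser_size(f_global)  # small: the global environment is written only as a reference
```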

Duncan Murdoch


rda.denitr.1 is only a list of length 2:
rda.denitr.1[[1]] is a vector of length 10;
rda.denitr.1[[2]] is a list of length 10. rda.denitr.1[[2]][[1]]
to rda.denitr.1[[2]][[10]] are small RDA objects generated by rda() from
the vegan package.

If I
    > a <- rda.denitr.1[[2]][[1]]
    > object.size(a)
59896 bytes
    > save(a, file = "abc.RData")
the resulting file is also large: 73.9 MB (77,536,611 bytes).

Jinsong


The rda() function uses formulas.  If it saves the formula in the
result, then it references the environment of that formula, typically
the environment where the formula was created.
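
One way to find out which component is dragging an environment along is to serialize each component separately. The sketch below does not use vegan: fit_model() is a hypothetical stand-in for a fitting function that stores its formula in the result, and its component names are made up.

```r
# Hypothetical fitting function that keeps its formula in the result
fit_model <- function() {
  big <- rnorm(1e5)            # leftover local variable
  list(
    scores = 1:10,
    call   = quote(fit_model(x ~ y)),
    terms  = x ~ y             # captures this call's environment, including `big`
  )
}
res <- fit_model()

ser_size <- function(obj) length(serialize(obj, connection = NULL))

# Serialized size of each component; the one holding the formula stands out
sizes <- vapply(res, ser_size, numeric(1))
sizes

# Then look inside the environment of the suspicious component
ls(environment(res$terms))
```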

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


