Re: [Rd] how to control the environment of a formula
thanks. yes, I was considering using as.character(f), but your solution 2
is much better -- did not know ' was an R function as well. just checked:
model.frame does not get confused, and this will be used to evaluate the
formula by all functions in my packages.

however, there could be related problems with memory. I noticed that some
of my processes use unexpectedly large amounts of memory. how can one
trace this?

I am not desperate to save disk space: the problem is that file transfer
and sharing (like dropbox) suffer when each simulation result fills 8M
instead of 130K just because a large data set is invisibly sitting in the
saved file.

Duncan Murdoch writes:

> On 13-04-19 2:57 PM, Thomas Alexander Gerds wrote:
>> hmm. I have tested a bit more, and found this perhaps more difficult
>> to solve situation. even though I delete x, since x is part of the
>> output of the formula, the size of the object is twice as much as it
>> should be:
>>
>> test <- function(x){
>>   x <- rnorm(100)
>>   out <- list(x=x)
>>   rm(x)
>>   out$f <- as.formula(a~b)
>>   out
>> }
>> v <- test(1)
>> x <- rnorm(100)
>> save(v,file="~/tmp/v.rda")
>> save(x,file="~/tmp/x.rda")
>> system("ls -lah ~/tmp/*.rda")
>>
>> -rw-rw-r-- 1 tag tag 15M Apr 19 20:52 /home/tag/tmp/v.rda
>> -rw-rw-r-- 1 tag tag 7,4M Apr 19 20:52 /home/tag/tmp/x.rda
>>
>> can you solve this as well?
>
> Yes, this is tricky. The problem is that "out" is in the environment
> of out$f, so you get two copies when you save it. (I think you won't
> have two copies in memory, because R only makes a copy when it needs
> to, but I haven't traced this.)
>
> Here are two solutions, both have some problems.
>
> 1. Don't put out in the environment:
>
> test <- function(x) {
>   x <- rnorm(100)
>   out <- list(x=x)
>   out$f <- a ~ b   # the as.formula() was never needed
>   # temporarily create a new environment
>   local({
>     # get a copy of what you want to keep
>     out <- out
>     # remove everything that you don't need from the formula
>     rm(list=c("x", "out"), envir=environment(out$f))
>     # return the local copy
>     out
>   })
> }
>
> I don't like this because it is too tricky, but you could probably
> wrap the tricky bits into a little function (a variant on return()
> that cleans out the environment first), so it's probably what I would
> use if I was desperate to save space in saved copies.
>
> 2. Never evaluate the formula in the first place, so it doesn't pick
> up the environment:
>
> test <- function(x) {
>   x <- rnorm(100)
>   out <- list(x=x)
>   out$f <- quote(a ~ b)
>   out
> }
>
> This is a lot simpler, but it might not work with some modelling
> functions, which would be confused by receiving the model formula
> unevaluated. It also has the problems that you get with using
> .GlobalEnv as the environment of the formula, but maybe to a slightly
> lesser extent: rather than having what is possibly the wrong
> environment, it doesn't have one at all.
>
> Duncan Murdoch
>
>> thanks! thomas
>>
>> Duncan Murdoch writes:
>>
>>> On 13-04-18 11:39 AM, Thomas Alexander Gerds wrote:
>>>> Dear Duncan
>>>> thank you for taking the time to answer my questions! It will be
>>>> quite some work to delete all the objects generated inside the
>>>> function ... but if there is no other way to avoid a large
>>>> environment then this is what I will do.
>>>
>>> It's not really that hard. Use names <- ls() in the function to
>>> get a list of all of them; remove the names of variables that might
>>> be needed in the formula (and the name of the formula itself); then
>>> use rm(list=names) to delete everything else just before returning
>>> it.
>>>
>>> Duncan Murdoch

--
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics Copenhagen
University of Copenhagen, Oester Farimagsgade 5, 1014 Copenhagen, Denmark

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] how to control the environment of a formula
On Sat, Apr 20, 2013 at 1:44 PM, Duncan Murdoch wrote:
> On 13-04-19 2:57 PM, Thomas Alexander Gerds wrote:
>>
>> hmm. I have tested a bit more, and found this perhaps more difficult
>> to solve situation. even though I delete x, since x is part of the
>> output of the formula, the size of the object is twice as much as it
>> should be:
>>
>> test <- function(x){
>>   x <- rnorm(100)
>>   out <- list(x=x)
>>   rm(x)
>>   out$f <- as.formula(a~b)
>>   out
>> }
>> v <- test(1)
>> x <- rnorm(100)
>> save(v,file="~/tmp/v.rda")
>> save(x,file="~/tmp/x.rda")
>> system("ls -lah ~/tmp/*.rda")
>>
>> -rw-rw-r-- 1 tag tag 15M Apr 19 20:52 /home/tag/tmp/v.rda
>> -rw-rw-r-- 1 tag tag 7,4M Apr 19 20:52 /home/tag/tmp/x.rda
>>
>> can you solve this as well?
>
> Yes, this is tricky. The problem is that "out" is in the environment of
> out$f, so you get two copies when you save it. (I think you won't have
> two copies in memory, because R only makes a copy when it needs to, but
> I haven't traced this.)
>
> Here are two solutions, both have some problems.
>
> 1. Don't put out in the environment:
>
> test <- function(x) {
>   x <- rnorm(100)
>   out <- list(x=x)
>   out$f <- a ~ b   # the as.formula() was never needed
>   # temporarily create a new environment
>   local({
>     # get a copy of what you want to keep
>     out <- out
>     # remove everything that you don't need from the formula
>     rm(list=c("x", "out"), envir=environment(out$f))
>     # return the local copy
>     out
>   })
> }
>
> I don't like this because it is too tricky, but you could probably wrap
> the tricky bits into a little function (a variant on return() that
> cleans out the environment first), so it's probably what I would use if
> I was desperate to save space in saved copies.
>
> 2. Never evaluate the formula in the first place, so it doesn't pick up
> the environment:
>
> test <- function(x) {
>   x <- rnorm(100)
>   out <- list(x=x)
>   out$f <- quote(a ~ b)
>   out
> }
>
> This is a lot simpler, but it might not work with some modelling
> functions, which would be confused by receiving the model formula
> unevaluated. It also has the problems that you get with using
> .GlobalEnv as the environment of the formula, but maybe to a slightly
> lesser extent: rather than having what is possibly the wrong
> environment, it doesn't have one at all.

An approach along the lines of Duncan's last solution is to pass the
formula as a character string. This works with lm, but may or may not
work with other regression-style functions:

    fit <- lm("demand ~ Time", BOD)

As long as you are only saving the input you should be OK, but if you
are saving the output of lm then you are back to the same problem, since
the "lm" object will contain a formula:

    > class(formula(fit))
    [1] "formula"
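A small sketch of the point about the lm output, using the BOD data set that ships with R. It only checks that the formula recovered from the fit is a real formula object with an environment attached; which environment that is depends on lm internals and is not asserted here.

```r
# BOD is a small data set shipped with R (datasets package).
fit <- lm("demand ~ Time", data = BOD)   # formula passed as a character string

f <- formula(fit)
class(f)                         # the fitted object still yields a formula
is.environment(environment(f))   # and that formula carries an environment
```

So the character string avoids the problem only for objects that store the string itself, not for anything derived from the fitted model.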
Re: [Rd] how to control the environment of a formula
On 13-04-19 2:57 PM, Thomas Alexander Gerds wrote:
> hmm. I have tested a bit more, and found this perhaps more difficult
> to solve situation. even though I delete x, since x is part of the
> output of the formula, the size of the object is twice as much as it
> should be:
>
> test <- function(x){
>   x <- rnorm(100)
>   out <- list(x=x)
>   rm(x)
>   out$f <- as.formula(a~b)
>   out
> }
> v <- test(1)
> x <- rnorm(100)
> save(v,file="~/tmp/v.rda")
> save(x,file="~/tmp/x.rda")
> system("ls -lah ~/tmp/*.rda")
>
> -rw-rw-r-- 1 tag tag 15M Apr 19 20:52 /home/tag/tmp/v.rda
> -rw-rw-r-- 1 tag tag 7,4M Apr 19 20:52 /home/tag/tmp/x.rda
>
> can you solve this as well?

Yes, this is tricky. The problem is that "out" is in the environment of
out$f, so you get two copies when you save it. (I think you won't have
two copies in memory, because R only makes a copy when it needs to, but
I haven't traced this.)

Here are two solutions, both have some problems.

1. Don't put out in the environment:

test <- function(x) {
  x <- rnorm(100)
  out <- list(x=x)
  out$f <- a ~ b   # the as.formula() was never needed
  # temporarily create a new environment
  local({
    # get a copy of what you want to keep
    out <- out
    # remove everything that you don't need from the formula
    rm(list=c("x", "out"), envir=environment(out$f))
    # return the local copy
    out
  })
}

I don't like this because it is too tricky, but you could probably wrap
the tricky bits into a little function (a variant on return() that
cleans out the environment first), so it's probably what I would use if
I was desperate to save space in saved copies.

2. Never evaluate the formula in the first place, so it doesn't pick up
the environment:

test <- function(x) {
  x <- rnorm(100)
  out <- list(x=x)
  out$f <- quote(a ~ b)
  out
}

This is a lot simpler, but it might not work with some modelling
functions, which would be confused by receiving the model formula
unevaluated. It also has the problems that you get with using .GlobalEnv
as the environment of the formula, but maybe to a slightly lesser
extent: rather than having what is possibly the wrong environment, it
doesn't have one at all.

Duncan Murdoch

> thanks! thomas
>
> Duncan Murdoch writes:
>
>> On 13-04-18 11:39 AM, Thomas Alexander Gerds wrote:
>>> Dear Duncan
>>> thank you for taking the time to answer my questions! It will be
>>> quite some work to delete all the objects generated inside the
>>> function ... but if there is no other way to avoid a large
>>> environment then this is what I will do.
>>
>> It's not really that hard. Use names <- ls() in the function to get a
>> list of all of them; remove the names of variables that might be
>> needed in the formula (and the name of the formula itself); then use
>> rm(list=names) to delete everything else just before returning it.
>>
>> Duncan Murdoch
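One possible way to get the effect of "a variant on return() that cleans out the environment first" is on.exit(). This is a swapped-in technique, not the code posted above, and it assumes that nothing in the formula needs the function's local variables. The argument n and the name frame are illustrative only.

```r
test <- function(n = 100) {
  # Assumption: the formula never needs the locals, so the whole frame can
  # be emptied. on.exit() runs after the return value has been computed,
  # so the list in `out` is still returned intact.
  frame <- environment()
  on.exit(rm(list = ls(frame, all.names = TRUE), envir = frame))

  x <- rnorm(n)
  out <- list(x = x)
  out$f <- a ~ b
  out
}

v <- test()
ls(environment(v$f), all.names = TRUE)   # bindings left in the formula's environment

# Enclosing environments are still reachable, so model.frame() can find
# ordinary functions such as list().
model.frame(v$f, data = data.frame(a = 1, b = 1))
```

Unlike setting the environment to emptyenv(), the emptied frame keeps its parent, so symbol lookup for functions continues to work.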
Re: [Rd] how to control the environment of a formula
hmm. I have tested a bit more, and found this perhaps more difficult to
solve situation. even though I delete x, since x is part of the output
of the formula, the size of the object is twice as much as it should be:

test <- function(x){
  x <- rnorm(100)
  out <- list(x=x)
  rm(x)
  out$f <- as.formula(a~b)
  out
}
v <- test(1)
x <- rnorm(100)
save(v,file="~/tmp/v.rda")
save(x,file="~/tmp/x.rda")
system("ls -lah ~/tmp/*.rda")

-rw-rw-r-- 1 tag tag 15M Apr 19 20:52 /home/tag/tmp/v.rda
-rw-rw-r-- 1 tag tag 7,4M Apr 19 20:52 /home/tag/tmp/x.rda

can you solve this as well?

thanks!
thomas

Duncan Murdoch writes:

> On 13-04-18 11:39 AM, Thomas Alexander Gerds wrote:
>> Dear Duncan
>> thank you for taking the time to answer my questions! It will be
>> quite some work to delete all the objects generated inside the
>> function ... but if there is no other way to avoid a large
>> environment then this is what I will do.
>
> It's not really that hard. Use names <- ls() in the function to get a
> list of all of them; remove the names of variables that might be
> needed in the formula (and the name of the formula itself); then use
> rm(list=names) to delete everything else just before returning it.
>
> Duncan Murdoch

--
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics Copenhagen
University of Copenhagen, Oester Farimagsgade 5, 1014 Copenhagen, Denmark
Re: [Rd] how to control the environment of a formula
Duncan,
I stand by all my comments. Well behaved functions -- those that look
only at their input arguments -- do just fine with a simple env.

Now as to formulas --- the part of R that has most aggressively messed
with normal evaluation rules. It is quite possible that there is/was no
other way to implement their functionality set, so I'm not throwing
rocks at that. However, as soon as they enter the scene the consequences
multiply like rabbits and I feel like I've fallen into a hall of
mirrors. Nothing else has caused me as much ongoing confusion and
wonderment in the survival package. As soon as you introduce them, all
my arguments are irrelevant.

Terry T

On 4/19/13 9:05 AM, "Duncan Murdoch" wrote:

>On 13-04-19 8:41 AM, Therneau, Terry M., Ph.D. wrote:
>> I went through the same problem and discovery process 2 years ago
>>with the survival package. With pspline() terms the return object from
>>coxph includes a simple 6 line function for enhanced printout, which by
>>default carried along another 30 irrelevant things some of which were
>>huge.
>> I personally think that setting environment(f) <- .GlobalEnv is the
>>clearest and most simple solution.
>> Note that R does not save the environment of functions defined at the
>>top level; the prior line says to treat your function as "one of those".
>> It works very well as long as your function is an actual function,
>>i.e. it depends only on its input arguments.
>>
>> \begin{opinion}
>>S started out as a pure functional language. That is, a function
>>depends ONLY on its arguments. Many of the strengths of S/R flow
>>directly from the simplicity and rigor that this gives.
>> There is an adage in programming, going back to at least the earliest
>>Fortran compilers, that all successful languages have a way to break
>>their own rules; and S indeed had some hidden workarounds. Formalizing
>>these non-functional back doors as R has done with environments is a
>>good thing.
>>
>> However, the back doors should be used only with extreme reluctance. I
>>cringe at each new "how to be sneaky" discussion on the mailing lists.
>>The 'solution' is rarely worth the long term price.
>> \end{opinion}
>
>Hmmm, it seems to me that your first paragraph contradicts your opinion.
>If you set the environment of a formula to .GlobalEnv then suddenly
>the way that formula acts depends on all sorts of things that weren't
>there when it was created.
>
>Attaching the environment at the time of creation of a formula means
>that the names within it refer to data that is currently in scope.
>That's generally a good thing. It means that code will act the same
>when you run it at the top level or in a function.
>
>For example, consider this:
>
>f <- function() {
>  x <- 1:10
>  x2 <- x^2
>  y <- rnorm(10, mean=x2)
>  formula <- y ~ x + x2
>  formula
>}
>
>fit <- lm(f())
>update(fit, . ~ . - x)
>
>This code works fine, all because the formula keeps the environment
>where it was created. If I modify it like this:
>
>f <- function() {
>  x <- 1:10
>  x2 <- x^2
>  y <- rnorm(10, mean=x2)
>  formula <- y ~ x + x2
>  environment(formula) <- .GlobalEnv
>  formula
>}
>
>fit <- lm(f())
>update(fit, . ~ . - x)
>
>then I really have no idea what it will produce, because it depends on
>global variables y, x and x2, not the local ones created in the
>function. If I'm lucky, I'll get an "object not found" error; if I'm
>not lucky, it'll just go find some other variables and use those.
>
>Duncan Murdoch
Re: [Rd] how to control the environment of a formula
On 13-04-19 8:41 AM, Therneau, Terry M., Ph.D. wrote:
> I went through the same problem and discovery process 2 years ago with
> the survival package. With pspline() terms the return object from
> coxph includes a simple 6 line function for enhanced printout, which
> by default carried along another 30 irrelevant things some of which
> were huge.
> I personally think that setting environment(f) <- .GlobalEnv is the
> clearest and most simple solution.
> Note that R does not save the environment of functions defined at the
> top level; the prior line says to treat your function as "one of
> those".
> It works very well as long as your function is an actual function,
> i.e. it depends only on its input arguments.
>
> \begin{opinion}
> S started out as a pure functional language. That is, a function
> depends ONLY on its arguments. Many of the strengths of S/R flow
> directly from the simplicity and rigor that this gives.
> There is an adage in programming, going back to at least the earliest
> Fortran compilers, that all successful languages have a way to break
> their own rules; and S indeed had some hidden workarounds. Formalizing
> these non-functional back doors as R has done with environments is a
> good thing.
>
> However, the back doors should be used only with extreme reluctance. I
> cringe at each new "how to be sneaky" discussion on the mailing lists.
> The 'solution' is rarely worth the long term price.
> \end{opinion}

Hmmm, it seems to me that your first paragraph contradicts your opinion.
If you set the environment of a formula to .GlobalEnv then suddenly the
way that formula acts depends on all sorts of things that weren't there
when it was created.

Attaching the environment at the time of creation of a formula means
that the names within it refer to data that is currently in scope.
That's generally a good thing. It means that code will act the same when
you run it at the top level or in a function.

For example, consider this:

f <- function() {
  x <- 1:10
  x2 <- x^2
  y <- rnorm(10, mean=x2)
  formula <- y ~ x + x2
  formula
}

fit <- lm(f())
update(fit, . ~ . - x)

This code works fine, all because the formula keeps the environment
where it was created. If I modify it like this:

f <- function() {
  x <- 1:10
  x2 <- x^2
  y <- rnorm(10, mean=x2)
  formula <- y ~ x + x2
  environment(formula) <- .GlobalEnv
  formula
}

fit <- lm(f())
update(fit, . ~ . - x)

then I really have no idea what it will produce, because it depends on
global variables y, x and x2, not the local ones created in the
function. If I'm lucky, I'll get an "object not found" error; if I'm not
lucky, it'll just go find some other variables and use those.

Duncan Murdoch
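The "unlucky" case can be made concrete with a small sketch. To keep it self-contained, a separate environment called decoy (a hypothetical stand-in for a cluttered .GlobalEnv) holds unrelated variables that happen to be named x and y; make_formula and its env argument are illustrative names, not anything from the thread.

```r
make_formula <- function(env = NULL) {
  x <- 1:10
  y <- 2 * x                 # local data: the slope is 2
  f <- y ~ x
  if (!is.null(env)) environment(f) <- env   # re-home the formula
  f
}

# Hypothetical stand-in for .GlobalEnv holding unrelated x and y.
decoy <- new.env()
assign("x", 1:10, envir = decoy)
assign("y", -3 * (1:10), envir = decoy)

slope_local <- coef(lm(make_formula()))[["x"]]        # uses the function's x and y
slope_decoy <- coef(lm(make_formula(decoy)))[["x"]]   # silently uses the decoys
c(local = slope_local, decoy = slope_decoy)
```

No error or warning is raised in the second call; the fit is simply computed from different data.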
Re: [Rd] how to control the environment of a formula
On 13-04-18 11:39 AM, Thomas Alexander Gerds wrote:
> Dear Duncan
> thank you for taking the time to answer my questions! It will be quite
> some work to delete all the objects generated inside the function ...
> but if there is no other way to avoid a large environment then this is
> what I will do.

It's not really that hard. Use names <- ls() in the function to get a
list of all of them; remove the names of variables that might be needed
in the formula (and the name of the formula itself); then use
rm(list=names) to delete everything else just before returning it.

Duncan Murdoch

> Cheers
> Thomas
>
> Duncan Murdoch writes:
>
>> On 13-04-18 1:09 AM, Thomas Alexander Gerds wrote:
>>> Dear List
>>> I have experienced that objects generated with one of my packages
>>> used a lot of space when saved on disc (object.size did not show
>>> this!).
>>> some debugging revealed that formula and call objects carried the
>>> full environment of subroutines along, including even stuff not
>>> needed by the formula or call. here is a sketch of the problem
>>> ,----
>>> | test <- function(x){
>>> |   x <- rnorm(100)
>>> |   out <- list()
>>> |   out$f <- a~b
>>> |   out
>>> | }
>>> | v <- test(1)
>>> | save(v,file="~/tmp/v.rda")
>>> | system("ls -lah ~/tmp/v.rda")
>>> |
>>> | -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
>>> `----
>>> I tried to replace line 3 by
>>> ,----
>>> | as.formula(a~b,env=emptyenv())
>>> | or
>>> | as.formula(a~b,env=NULL)
>>> `----
>>> without the desired effect. Instead adding either
>>> ,----
>>> | environment(out$f) <- emptyenv()
>>> | or
>>> | environment(out$f) <- NULL
>>> `----
>>> has the desired effect (i.e. the saved object size is
>>> shrunken). unfortunately there is a new problem:
>>> ,----
>>> | test <- function(x){
>>> |   x <- rnorm(100)
>>> |   out <- list()
>>> |   out$f <- a~b
>>> |   environment(out$f) <- emptyenv()
>>> |   out
>>> | }
>>> | d <- data.frame(a=1,b=1)
>>> | v <- test(1)
>>> | model.frame(v$f,data=d)
>>> |
>>> | Error in eval(expr, envir, enclos) : could not find function "list"
>>> `----
>>> Same with NULL in place of emptyenv()
>>> Finally using .GlobalEnv in place of emptyenv() seems to remove both
>>> problems.
>>
>> But it will cause other, less obvious problems. In a formula, the
>> symbols mean something. By setting the environment to .GlobalEnv
>> you're changing the meaning. You'll get nonsense in certain cases
>> when functions look up the meaning of those symbols and find the
>> wrong thing. (I don't have an example at hand, but I imagine it would
>> be easy to put one together with update().)
>>
>>> My questions:
>>> 1) why does the argument env of as.formula have no effect?
>>
>> Because the first argument already had an associated environment. You
>> passed a ~ b, which is evaluated to a formula; calling as.formula on
>> a formula does nothing. The env argument is only used when a new
>> formula needs to be constructed. (You can see this in the source
>> code; as.formula is a very simple function.)
>>
>>> 2) is there a better way to tell formula not to copy unrelated stuff
>>> into the associated environment?
>>
>> Yes, delete it. For example, you could write your function as
>>
>> test <- function(x){
>>   x <- rnorm(100)
>>   out <- list()
>>   out$f <- a~b
>>   rm(x)
>>   out
>> }
>>
>>> 3) why does object.size not show the size of the environments that
>>> formulas can carry along?
>>
>> Because many objects can share the same environment. See ?object.size
>> for more details.
>>
>> Duncan Murdoch
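The ls() / rm(list=...) recipe could look roughly like the sketch below. The variable scratch and the argument n are made up for illustration. Only out is kept, because it has to be returned, which is exactly why out itself still sits in the formula's environment afterwards.

```r
test <- function(n = 100) {
  x <- rnorm(n)
  scratch <- cumsum(x)              # stand-in for large intermediate results
  out <- list(total = sum(scratch))
  out$f <- a ~ b

  # Collect all local names, keep the ones that must survive, and remove
  # the rest (including the helper variable `drop`) just before returning.
  drop <- setdiff(ls(), "out")
  rm(list = c(drop, "drop"))
  out
}

v <- test()
ls(environment(v$f))   # what is still attached to the formula
```

If the formula referred to local data, those names would be added to the kept set alongside "out".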
Re: [Rd] how to control the environment of a formula
I went through the same problem and discovery process 2 years ago with
the survival package. With pspline() terms the return object from coxph
includes a simple 6 line function for enhanced printout, which by
default carried along another 30 irrelevant things some of which were
huge.

I personally think that setting environment(f) <- .GlobalEnv is the
clearest and most simple solution.

Note that R does not save the environment of functions defined at the
top level; the prior line says to treat your function as "one of those".
It works very well as long as your function is an actual function, i.e.
it depends only on its input arguments.

\begin{opinion}
S started out as a pure functional language. That is, a function depends
ONLY on its arguments. Many of the strengths of S/R flow directly from
the simplicity and rigor that this gives.

There is an adage in programming, going back to at least the earliest
Fortran compilers, that all successful languages have a way to break
their own rules; and S indeed had some hidden workarounds. Formalizing
these non-functional back doors as R has done with environments is a
good thing.

However, the back doors should be used only with extreme reluctance. I
cringe at each new "how to be sneaky" discussion on the mailing lists.
The 'solution' is rarely worth the long term price.
\end{opinion}

Terry Therneau
Re: [Rd] how to control the environment of a formula
Dear Duncan

thank you for taking the time to answer my questions! It will be quite
some work to delete all the objects generated inside the function ...
but if there is no other way to avoid a large environment then this is
what I will do.

Cheers
Thomas

Duncan Murdoch writes:

> On 13-04-18 1:09 AM, Thomas Alexander Gerds wrote:
>> Dear List
>> I have experienced that objects generated with one of my packages
>> used a lot of space when saved on disc (object.size did not show
>> this!).
>> some debugging revealed that formula and call objects carried the
>> full environment of subroutines along, including even stuff not
>> needed by the formula or call. here is a sketch of the problem
>> ,----
>> | test <- function(x){
>> |   x <- rnorm(100)
>> |   out <- list()
>> |   out$f <- a~b
>> |   out
>> | }
>> | v <- test(1)
>> | save(v,file="~/tmp/v.rda")
>> | system("ls -lah ~/tmp/v.rda")
>> |
>> | -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
>> `----
>> I tried to replace line 3 by
>> ,----
>> | as.formula(a~b,env=emptyenv())
>> | or
>> | as.formula(a~b,env=NULL)
>> `----
>> without the desired effect. Instead adding either
>> ,----
>> | environment(out$f) <- emptyenv()
>> | or
>> | environment(out$f) <- NULL
>> `----
>> has the desired effect (i.e. the saved object size is
>> shrunken). unfortunately there is a new problem:
>> ,----
>> | test <- function(x){
>> |   x <- rnorm(100)
>> |   out <- list()
>> |   out$f <- a~b
>> |   environment(out$f) <- emptyenv()
>> |   out
>> | }
>> | d <- data.frame(a=1,b=1)
>> | v <- test(1)
>> | model.frame(v$f,data=d)
>> |
>> | Error in eval(expr, envir, enclos) : could not find function "list"
>> `----
>> Same with NULL in place of emptyenv()
>> Finally using .GlobalEnv in place of emptyenv() seems to remove both
>> problems.
>
> But it will cause other, less obvious problems. In a formula, the
> symbols mean something. By setting the environment to .GlobalEnv
> you're changing the meaning. You'll get nonsense in certain cases
> when functions look up the meaning of those symbols and find the wrong
> thing. (I don't have an example at hand, but I imagine it would be
> easy to put one together with update().)
>
>> My questions:
>> 1) why does the argument env of as.formula have no effect?
>
> Because the first argument already had an associated environment. You
> passed a ~ b, which is evaluated to a formula; calling as.formula on a
> formula does nothing. The env argument is only used when a new formula
> needs to be constructed. (You can see this in the source code;
> as.formula is a very simple function.)
>
>> 2) is there a better way to tell formula not to copy unrelated stuff
>> into the associated environment?
>
> Yes, delete it. For example, you could write your function as
>
> test <- function(x){
>   x <- rnorm(100)
>   out <- list()
>   out$f <- a~b
>   rm(x)
>   out
> }
>
>> 3) why does object.size not show the size of the environments that
>> formulas can carry along?
>
> Because many objects can share the same environment. See ?object.size
> for more details.
>
> Duncan Murdoch

--
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics Copenhagen
University of Copenhagen, Oester Farimagsgade 5, 1014 Copenhagen, Denmark
Re: [Rd] how to control the environment of a formula
On 13-04-18 1:09 AM, Thomas Alexander Gerds wrote:
> Dear List
> I have experienced that objects generated with one of my packages used
> a lot of space when saved on disc (object.size did not show this!).
> some debugging revealed that formula and call objects carried the full
> environment of subroutines along, including even stuff not needed by
> the formula or call. here is a sketch of the problem
> ,----
> | test <- function(x){
> |   x <- rnorm(100)
> |   out <- list()
> |   out$f <- a~b
> |   out
> | }
> | v <- test(1)
> | save(v,file="~/tmp/v.rda")
> | system("ls -lah ~/tmp/v.rda")
> |
> | -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
> `----
> I tried to replace line 3 by
> ,----
> | as.formula(a~b,env=emptyenv())
> | or
> | as.formula(a~b,env=NULL)
> `----
> without the desired effect. Instead adding either
> ,----
> | environment(out$f) <- emptyenv()
> | or
> | environment(out$f) <- NULL
> `----
> has the desired effect (i.e. the saved object size is shrunken).
> unfortunately there is a new problem:
> ,----
> | test <- function(x){
> |   x <- rnorm(100)
> |   out <- list()
> |   out$f <- a~b
> |   environment(out$f) <- emptyenv()
> |   out
> | }
> | d <- data.frame(a=1,b=1)
> | v <- test(1)
> | model.frame(v$f,data=d)
> |
> | Error in eval(expr, envir, enclos) : could not find function "list"
> `----
> Same with NULL in place of emptyenv()
> Finally using .GlobalEnv in place of emptyenv() seems to remove both
> problems.

But it will cause other, less obvious problems. In a formula, the
symbols mean something. By setting the environment to .GlobalEnv you're
changing the meaning. You'll get nonsense in certain cases when
functions look up the meaning of those symbols and find the wrong thing.
(I don't have an example at hand, but I imagine it would be easy to put
one together with update().)

> My questions:
> 1) why does the argument env of as.formula have no effect?

Because the first argument already had an associated environment. You
passed a ~ b, which is evaluated to a formula; calling as.formula on a
formula does nothing. The env argument is only used when a new formula
needs to be constructed. (You can see this in the source code;
as.formula is a very simple function.)

> 2) is there a better way to tell formula not to copy unrelated stuff
> into the associated environment?

Yes, delete it. For example, you could write your function as

test <- function(x){
  x <- rnorm(100)
  out <- list()
  out$f <- a~b
  rm(x)
  out
}

> 3) why does object.size not show the size of the environments that
> formulas can carry along?

Because many objects can share the same environment. See ?object.size
for more details.

Duncan Murdoch
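The answer to question 1 can be checked directly. In this sketch the env argument is ignored when as.formula() receives an existing formula and honoured when it has to build one from a character string. The variable names are illustrative.

```r
# Already a formula: as.formula() returns it unchanged, env is not used.
f_from_formula <- as.formula(a ~ b, env = emptyenv())

# Character input: a new formula is constructed, so env is applied.
f_from_string <- as.formula("a ~ b", env = emptyenv())

identical(environment(f_from_formula), emptyenv())
identical(environment(f_from_string), emptyenv())
```

A formula whose environment really is emptyenv() runs into the model.frame() failure shown in the original post, so this illustrates the mechanism rather than recommending it.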
[Rd] how to control the environment of a formula
Dear List

I have experienced that objects generated with one of my packages used a
lot of space when saved on disc (object.size did not show this!).

some debugging revealed that formula and call objects carried the full
environment of subroutines along, including even stuff not needed by the
formula or call. here is a sketch of the problem

,----
| test <- function(x){
|   x <- rnorm(100)
|   out <- list()
|   out$f <- a~b
|   out
| }
| v <- test(1)
| save(v,file="~/tmp/v.rda")
| system("ls -lah ~/tmp/v.rda")
|
| -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
`----

I tried to replace line 3 by

,----
| as.formula(a~b,env=emptyenv())
| or
| as.formula(a~b,env=NULL)
`----

without the desired effect. Instead adding either

,----
| environment(out$f) <- emptyenv()
| or
| environment(out$f) <- NULL
`----

has the desired effect (i.e. the saved object size is shrunken).
unfortunately there is a new problem:

,----
| test <- function(x){
|   x <- rnorm(100)
|   out <- list()
|   out$f <- a~b
|   environment(out$f) <- emptyenv()
|   out
| }
| d <- data.frame(a=1,b=1)
| v <- test(1)
| model.frame(v$f,data=d)
|
| Error in eval(expr, envir, enclos) : could not find function "list"
`----

Same with NULL in place of emptyenv()

Finally using .GlobalEnv in place of emptyenv() seems to remove both
problems.

My questions:

1) why does the argument env of as.formula have no effect?
2) is there a better way to tell formula not to copy unrelated stuff
   into the associated environment?
3) why does object.size not show the size of the environments that
   formulas can carry along?

Regards
Thomas

--
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics
University of Copenhagen, Øster Farimagsgade 5, 1014 Copenhagen, Denmark
Office: CSS-15.2.07 (Gamle Kommunehospital)
tel: 35327914 (sec: 35327901)
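The gap between object.size() and the saved size can be seen without writing a file. This sketch compares object.size() with the length of the serialized byte stream, which does include the formula's environment. The vector length 1e5 is an arbitrary choice for illustration, not the value used in the post.

```r
test <- function(n = 1e5) {
  x <- rnorm(n)      # large local object, never referenced by the formula
  out <- list()
  out$f <- a ~ b
  out
}
v <- test()

reported   <- as.numeric(object.size(v))   # does not follow the environment
serialized <- length(serialize(v, NULL))   # bytes save() has to encode, before compression
c(object.size = reported, serialized = serialized)
```

Comparing the two numbers for a suspicious object is one way to find results that silently drag a large environment along.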