Re: [Rd] how to control the environment of a formula

2013-04-20 Thread Thomas Alexander Gerds

thanks. yes, I was considering using as.character(f), but your solution
2 is much better -- I did not know that quote() was an R function as well.
just checked: model.frame does not get confused by it, and model.frame is
what all functions in my packages use to evaluate the formula.

however, there could be related problems with memory. I noticed that
some of my processes use unexpectedly large amounts of memory. how can
one trace this?

I am not desperate to save disk space: the problem is that file transfer
and sharing (like dropbox) suffer when each simulation result fills 8M
instead of 130K just because a large data set is invisibly sitting in
the saved file.

Duncan Murdoch  writes:

> On 13-04-19 2:57 PM, Thomas Alexander Gerds wrote:
>> hmm. I have tested a bit more, and found a situation that is perhaps
>> more difficult to solve. even though I delete x, since x is part of the
>> output of the function, the size of the object is twice as much as it
>> should be:
>>
>> test <- function(x){
>>   x <- rnorm(100)
>>   out <- list(x=x)
>>   rm(x)
>>   out$f <- as.formula(a~b)
>>   out
>> }
>> v <- test(1)
>> x <- rnorm(100)
>> save(v,file="~/tmp/v.rda")
>> save(x,file="~/tmp/x.rda")
>> system("ls -lah ~/tmp/*.rda")
>>
>> -rw-rw-r-- 1 tag tag  15M Apr 19 20:52 /home/tag/tmp/v.rda
>> -rw-rw-r-- 1 tag tag 7,4M Apr 19 20:52 /home/tag/tmp/x.rda
>>
>> can you solve this as well?
>
> Yes, this is tricky.  The problem is that "out" is in the environment
> of out$f, so you get two copies when you save it.  (I think you won't
> have two copies in memory, because R only makes a copy when it needs
> to, but I haven't traced this.)
>
> Here are two solutions, both have some problems.
>
> 1.  Don't put out in the environment:
>
> test <- function(x) {
>   x <- rnorm(100)
>   out <- list(x=x)
>   out$f <- a ~ b  # the as.formula() was never needed
>   # temporarily create a new environment
>   local({
>     # get a copy of what you want to keep
>     out <- out
>     # remove everything that you don't need from the formula's environment
>     rm(list=c("x", "out"), envir=environment(out$f))
>     # return the local copy
>     out
>   })
> }
>
> I don't like this because it is too tricky, but you could probably
> wrap the tricky bits into a little function (a variant on return()
> that cleans out the environment first), so it's probably what I would
> use if I was desperate to save space in saved copies.
>
> 2. Never evaluate the formula in the first place, so it doesn't pick
> up the environment:
>
> test <- function(x) {
>   x <- rnorm(100)
>   out <- list(x=x)
>   out$f <- quote(a ~ b)
>   out
> }
>
> This is a lot simpler, but it might not work with some modelling
> functions, which would be confused by receiving the model formula
> unevaluated.  It also has the problems that you get with using
> .GlobalEnv as the environment of the formula, but maybe to a slightly
> lesser extent: rather than having what is possibly the wrong
> environment, it doesn't have one at all.
>
> Duncan Murdoch
>
>> thanks!  thomas
>> Duncan Murdoch  writes:
>>
>>> On 13-04-18 11:39 AM, Thomas Alexander Gerds wrote:
>>>> Dear Duncan thank you for taking the time to answer my questions!
>>>> It will be quite some work to delete all the objects generated
>>>> inside the function ... but if there is no other way to avoid a
>>>> large environment then this is what I will do.
>>> It's not really that hard.  Use names <- ls() in the function to
>>> get a list of all of them; remove the names of variables that might
>>> be needed in the formula (and the name of the formula itself); then
>>> use rm(list=names) to delete everything else just before returning
>>> it.
>>> Duncan Murdoch
>>>

-- 
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics Copenhagen
University of Copenhagen, Oester Farimagsgade 5, 1014 Copenhagen, Denmark

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] how to control the environment of a formula

2013-04-20 Thread Gabor Grothendieck
On Sat, Apr 20, 2013 at 1:44 PM, Duncan Murdoch wrote:
> On 13-04-19 2:57 PM, Thomas Alexander Gerds wrote:
>>
>>
>> hmm. I have tested a bit more, and found a situation that is perhaps more
>> difficult to solve. even though I delete x, since x is part of the output
>> of the function, the size of the object is twice as much as it should be:
>>
>> test <- function(x){
>>   x <- rnorm(100)
>>   out <- list(x=x)
>>   rm(x)
>>   out$f <- as.formula(a~b)
>>   out
>> }
>> v <- test(1)
>> x <- rnorm(100)
>> save(v,file="~/tmp/v.rda")
>> save(x,file="~/tmp/x.rda")
>> system("ls -lah ~/tmp/*.rda")
>>
>> -rw-rw-r-- 1 tag tag  15M Apr 19 20:52 /home/tag/tmp/v.rda
>> -rw-rw-r-- 1 tag tag 7,4M Apr 19 20:52 /home/tag/tmp/x.rda
>>
>> can you solve this as well?
>
>
> Yes, this is tricky.  The problem is that "out" is in the environment of
> out$f, so you get two copies when you save it.  (I think you won't have two
> copies in memory, because R only makes a copy when it needs to, but I
> haven't traced this.)
>
> Here are two solutions, both have some problems.
>
> 1.  Don't put out in the environment:
>
>
> test <- function(x) {
>   x <- rnorm(100)
>   out <- list(x=x)
>   out$f <- a ~ b  # the as.formula() was never needed
>   # temporarily create a new environment
>   local({
>     # get a copy of what you want to keep
>     out <- out
>     # remove everything that you don't need from the formula's environment
>     rm(list=c("x", "out"), envir=environment(out$f))
>     # return the local copy
>     out
>   })
> }
>
> I don't like this because it is too tricky, but you could probably wrap the
> tricky bits into a little function (a variant on return() that cleans out
> the environment first), so it's probably what I would use if I was desperate
> to save space in saved copies.
>
> 2. Never evaluate the formula in the first place, so it doesn't pick up the
> environment:
>
>
> test <- function(x) {
>   x <- rnorm(100)
>   out <- list(x=x)
>   out$f <- quote(a ~ b)
>   out
> }
>
> This is a lot simpler, but it might not work with some modelling functions,
> which would be confused by receiving the model formula unevaluated.  It also
> has the problems that you get with using .GlobalEnv as the environment of
> the formula, but maybe to a slightly lesser extent:  rather than having what
> is possibly the wrong environment, it doesn't have one at all.

An approach along the lines of Duncan's last solution that works with
lm but may or may not work with other regression-style functions is to
use a character string:

fit <- lm("demand ~ Time", BOD)

As long as you are only saving the input you should be OK but if you
are saving the output of lm then you are back to the same problem
since the "lm" object will contain a formula.

> class(formula(fit))
[1] "formula"
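
A small sketch of that last point, using the same BOD example. It only
checks that the formula recovered from the fit is a genuine formula object
with an environment attached; which environment that is depends on lm's
internals and is not asserted here.

```r
# Fit from a character string, as above (BOD ships with R's datasets package).
fit <- lm("demand ~ Time", BOD)

# The fitted object still yields a genuine formula ...
f <- formula(fit)
inherits(f, "formula")

# ... and that formula carries an environment, so saving `fit`
# can run into the same problem again.
is.environment(environment(f))
```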



Re: [Rd] how to control the environment of a formula

2013-04-20 Thread Duncan Murdoch

On 13-04-19 2:57 PM, Thomas Alexander Gerds wrote:
> hmm. I have tested a bit more, and found a situation that is perhaps more
> difficult to solve. even though I delete x, since x is part of the output
> of the function, the size of the object is twice as much as it should be:
>
> test <- function(x){
>    x <- rnorm(100)
>    out <- list(x=x)
>    rm(x)
>    out$f <- as.formula(a~b)
>    out
> }
> v <- test(1)
> x <- rnorm(100)
> save(v,file="~/tmp/v.rda")
> save(x,file="~/tmp/x.rda")
> system("ls -lah ~/tmp/*.rda")
>
> -rw-rw-r-- 1 tag tag  15M Apr 19 20:52 /home/tag/tmp/v.rda
> -rw-rw-r-- 1 tag tag 7,4M Apr 19 20:52 /home/tag/tmp/x.rda
>
> can you solve this as well?


Yes, this is tricky.  The problem is that "out" is in the environment of 
out$f, so you get two copies when you save it.  (I think you won't have 
two copies in memory, because R only makes a copy when it needs to, but 
I haven't traced this.)
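
A minimal sketch of where the second copy comes from. The argument n, the
vector length 1e5 and the use of serialize() as a rough stand-in for the
size of a saved file are assumptions made for illustration; they are not
taken from the original posts.

```r
test <- function(n) {
  x <- rnorm(n)
  out <- list(x = x)
  rm(x)
  out$f <- a ~ b   # the formula records the frame of this call
  out
}

v <- test(1e5)

# rm(x) worked, but `out` itself is still bound in the formula's environment
ls(environment(v$f))

# serialize() produces roughly what save() writes, before compression
size_x <- length(serialize(v$x, NULL))   # the data alone
size_v <- length(serialize(v, NULL))     # the data plus the copy reachable via environment(v$f)
size_v / size_x
```

The ratio comes out close to 2, in line with the 15M versus 7,4M files
reported above.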


Here are two solutions, both have some problems.

1.  Don't put out in the environment:

test <- function(x) {
  x <- rnorm(100)
  out <- list(x=x)
  out$f <- a ~ b  # the as.formula() was never needed
  # temporarily create a new environment
  local({
    # get a copy of what you want to keep
    out <- out
    # remove everything that you don't need from the formula's environment
    rm(list=c("x", "out"), envir=environment(out$f))
    # return the local copy
    out
  })
}

I don't like this because it is too tricky, but you could probably wrap 
the tricky bits into a little function (a variant on return() that 
cleans out the environment first), so it's probably what I would use if 
I was desperate to save space in saved copies.
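
One way the "little function" could look, as a sketch only. The name
return_clean and its keep argument are hypothetical, and unlike return()
it has to be the last expression of the calling function.

```r
# Hypothetical helper: evaluate `value`, empty the caller's frame
# (except for names listed in `keep`), then hand `value` back.
return_clean <- function(value, keep = character()) {
  frame <- parent.frame()
  force(value)   # evaluate before anything is removed
  rm(list = setdiff(ls(frame, all.names = TRUE), keep), envir = frame)
  value
}

test <- function(n) {
  x <- rnorm(n)
  out <- list(x = x)
  out$f <- a ~ b
  return_clean(out)   # must be the final expression
}

v <- test(1e5)
ls(environment(v$f), all.names = TRUE)   # nothing is left in the formula's environment

# the formula is still usable, because its environment keeps its parent
model.frame(v$f, data = data.frame(a = 1, b = 1))
```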


2. Never evaluate the formula in the first place, so it doesn't pick up 
the environment:


test <- function(x) {
  x <- rnorm(100)
  out <- list(x=x)
  out$f <- quote(a ~ b)
  out
}

This is a lot simpler, but it might not work with some modelling 
functions, which would be confused by receiving the model formula 
unevaluated.  It also has the problems that you get with using 
.GlobalEnv as the environment of the formula, but maybe to a slightly 
lesser extent:  rather than having what is possibly the wrong 
environment, it doesn't have one at all.
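
A short sketch of what the unevaluated version is. The data frame d is made
up for illustration, and evaluating the call explicitly with eval() is one
possible way to use it, not the only one.

```r
f <- quote(a ~ b)

class(f)         # an unevaluated call, not a formula
environment(f)   # NULL: nothing is carried along

# Evaluating the call where it is needed creates the formula there,
# with the environment of that place.
d <- data.frame(a = 1:3, b = 4:6)
mf <- model.frame(eval(f), data = d)
dim(mf)
```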


Duncan Murdoch






Re: [Rd] how to control the environment of a formula

2013-04-19 Thread Thomas Alexander Gerds

hmm. I have tested a bit more, and found a situation that is perhaps more
difficult to solve. even though I delete x, since x is part of the output
of the function, the size of the object is twice as much as it should be:

test <- function(x){
  x <- rnorm(100)
  out <- list(x=x)
  rm(x)
  out$f <- as.formula(a~b)
  out
}
v <- test(1)
x <- rnorm(100)
save(v,file="~/tmp/v.rda")
save(x,file="~/tmp/x.rda")
system("ls -lah ~/tmp/*.rda")

-rw-rw-r-- 1 tag tag  15M Apr 19 20:52 /home/tag/tmp/v.rda
-rw-rw-r-- 1 tag tag 7,4M Apr 19 20:52 /home/tag/tmp/x.rda

can you solve this as well?

thanks!
thomas

Duncan Murdoch  writes:

> On 13-04-18 11:39 AM, Thomas Alexander Gerds wrote:
>> Dear Duncan
>> thank you for taking the time to answer my questions! It will be
>> quite some work to delete all the objects generated inside the
>> function ... but if there is no other way to avoid a large
>> environment then this is what I will do.
>
> It's not really that hard.  Use names <- ls() in the function to get a
> list of all of them; remove the names of variables that might be
> needed in the formula (and the name of the formula itself); then use
> rm(list=names) to delete everything else just before returning it.
>
> Duncan Murdoch
>
-- 
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics Copenhagen
University of Copenhagen, Oester Farimagsgade 5, 1014 Copenhagen, Denmark



Re: [Rd] how to control the environment of a formula

2013-04-19 Thread Therneau, Terry M., Ph.D.
Duncan,
 I stand by all my comments.  Well-behaved functions -- those that look
only at their input arguments -- do just fine with a simple env.
 Now as to formulas --- the part of R that has most aggressively messed
with normal evaluation rules.  It is quite possible that there is/was no
other way to implement their functionality set, so I'm not throwing rocks
at that.  However, as soon as they enter the scene the consequences
multiply like rabbits and I feel like I've fallen into a hall of mirrors.
Nothing else has caused me as much ongoing confusion and wonderment in the
survival package.
  As soon as you introduce them, all my arguments become irrelevant.

Terry T


On 4/19/13 9:05 AM, "Duncan Murdoch"  wrote:

>On 13-04-19 8:41 AM, Therneau, Terry M., Ph.D. wrote:
>>   I went through the same problem and discovery process 2 years ago
>>with the survival package.  With pspline()  terms the return object from
>>coxph includes a simple 6 line function for enhanced printout, which by
>>default carried along another 30 irrelevant things some of which were
>>huge.
>> I personally think that setting environment(f) <- .GlobalEnv is the
>>clearest and simplest solution.
>> Note that R does not save the environment of functions defined at the
>>top level; the prior line says to treat your function as "one of those".
>> It works very well as long as your function is an actual function,
>>i.e. it depends only on its input arguments.
>>
>> \begin {opinion}
>>S started out as a pure functional language.  That is, a function
>>depends ONLY on its arguments.   Many of the strengths of S/R flow
>>directly from the simplicity and rigor that this gives.
>> There is an adage in programming, going back to at least the earliest
>>Fortran compilers,  that all successful languages have a way to break
>>their own rules;  and S indeed had some hidden workarounds.  Formalizing
>>these non-functional back doors as R has done with environments is a
>>good thing.
>>
>> However, the back doors should be used only with extreme reluctance.  I
>>cringe at each new "how to be sneaky" discussion on the mailing lists.
>>The 'solution' is rarely worth the long term price.
>>   \end{opinion}
>
>Hmmm, it seems to me that your first paragraph contradicts your opinion.
>  If you set the environment of a formula to .GlobalEnv then suddenly
>the way that formula acts depends on all sorts of things that weren't
>there when it was created.
>
>Attaching the environment at the time of creation of a formula means that
>the names within it refer to data that is currently in scope.  That's
>generally a good thing.  It means that code will act the same when you
>run it at the top level or in a function.
>
>For example, consider this:
>
>f <- function() {
>   x <- 1:10
>   x2 <- x^2
>   y <- rnorm(10, mean=x2)
>   formula <- y ~ x + x2
>   formula
>}
>
>fit <- lm(f())
>update(fit, . ~ . - x)
>
>
>This code works fine, all because the formula keeps the environment
>where it was created.  If I modify it like this:
>
>f <- function() {
>   x <- 1:10
>   x2 <- x^2
>   y <- rnorm(10, mean=x2)
>   formula <- y ~ x + x2
>   environment(formula) <- .GlobalEnv
>   formula
>}
>
>fit <- lm(f())
>update(fit, . ~ . - x)
>
>
>then I really have no idea what it will produce, because it depends on
>global variables y, x and x2, not the local ones created in the
>function.  If I'm lucky, I'll get an "object not found" error; if I'm
>not lucky, it'll just go find some other variables and use those.
>
>Duncan Murdoch



Re: [Rd] how to control the environment of a formula

2013-04-19 Thread Duncan Murdoch

On 13-04-19 8:41 AM, Therneau, Terry M., Ph.D. wrote:
> I went through the same problem and discovery process 2 years ago with the
> survival package.  With pspline() terms the return object from coxph includes
> a simple 6 line function for enhanced printout, which by default carried along
> another 30 irrelevant things some of which were huge.
> I personally think that setting environment(f) <- .GlobalEnv is the clearest
> and simplest solution.
> Note that R does not save the environment of functions defined at the top
> level; the prior line says to treat your function as "one of those".  It works
> very well as long as your function is an actual function, i.e. it depends only
> on its input arguments.
>
> \begin {opinion}
> S started out as a pure functional language.  That is, a function depends
> ONLY on its arguments.  Many of the strengths of S/R flow directly from the
> simplicity and rigor that this gives.
> There is an adage in programming, going back to at least the earliest Fortran
> compilers, that all successful languages have a way to break their own rules;
> and S indeed had some hidden workarounds.  Formalizing these non-functional
> back doors as R has done with environments is a good thing.
>
> However, the back doors should be used only with extreme reluctance.  I cringe
> at each new "how to be sneaky" discussion on the mailing lists.  The 'solution'
> is rarely worth the long term price.
> \end{opinion}


Hmmm, it seems to me that your first paragraph contradicts your opinion. 
 If you set the environment of a formula to .GlobalEnv then suddenly 
the way that formula acts depends on all sorts of things that weren't 
there when it was created.


Attaching the environment at the time of creation of a formula means that 
the names within it refer to data that is currently in scope.  That's 
generally a good thing.  It means that code will act the same when you 
run it at the top level or in a function.


For example, consider this:

f <- function() {
   x <- 1:10
   x2 <- x^2
   y <- rnorm(10, mean=x2)
   formula <- y ~ x + x2
   formula
}

fit <- lm(f())
update(fit, . ~ . - x)


This code works fine, all because the formula keeps the environment 
where it was created.  If I modify it like this:


f <- function() {
   x <- 1:10
   x2 <- x^2
   y <- rnorm(10, mean=x2)
   formula <- y ~ x + x2
   environment(formula) <- .GlobalEnv
   formula
}

fit <- lm(f())
update(fit, . ~ . - x)


then I really have no idea what it will produce, because it depends on 
global variables y, x and x2, not the local ones created in the 
function.  If I'm lucky, I'll get an "object not found" error; if I'm 
not lucky, it'll just go find some other variables and use those.


Duncan Murdoch



Re: [Rd] how to control the environment of a formula

2013-04-19 Thread Duncan Murdoch

On 13-04-18 11:39 AM, Thomas Alexander Gerds wrote:
> Dear Duncan
>
> thank you for taking the time to answer my questions! It will be quite
> some work to delete all the objects generated inside the function
> ... but if there is no other way to avoid a large environment then this
> is what I will do.


It's not really that hard.  Use names <- ls() in the function to get a 
list of all of them; remove the names of variables that might be needed 
in the formula (and the name of the formula itself); then use 
rm(list=names) to delete everything else just before returning it.
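
The recipe can be sketched as follows. The variables a, b, x and the
argument n are invented for illustration; only the ls() / rm(list=names)
pattern comes from the text above.

```r
test <- function(n) {
  x <- rnorm(n)             # large intermediate result
  a <- x[1:10]              # needed by the formula
  b <- cumsum(x)[1:10]      # needed by the formula
  f <- a ~ b

  names <- ls()                               # everything created so far
  names <- setdiff(names, c("a", "b", "f"))   # keep what the formula needs
  rm(list = names)                            # drop the rest (here: n and x)
  f
}

f <- test(1e5)
ls(environment(f))   # a, b, f and the small helper vector `names` remain
coef(lm(f))          # the formula still finds a and b
```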


Duncan Murdoch





Re: [Rd] how to control the environment of a formula

2013-04-19 Thread Therneau, Terry M., Ph.D.
 I went through the same problem and discovery process 2 years ago with the 
survival package.  With pspline()  terms the return object from coxph includes 
a simple 6 line function for enhanced printout, which by default carried along 
another 30 irrelevant things some of which were huge.
I personally think that setting environment(f) <- .GlobalEnv is the clearest 
and simplest solution.
Note that R does not save the environment of functions defined at the top 
level; the prior line says to treat your function as "one of those".  It works 
very well as long as your function is an actual function, i.e. it depends only 
on its input arguments.
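
A sketch of the effect described here, for a function rather than a
formula. make_printer, the vector length 1e5 and the use of serialize() as
a rough proxy for saved size are assumptions for illustration; this is not
the survival package's code.

```r
make_printer <- function(n) {
  big <- rnorm(n)          # irrelevant to the returned function
  function(x) print(x)
}

p <- make_printer(1e5)
before <- length(serialize(p, NULL))   # includes `big` via environment(p)

environment(p) <- globalenv()          # treat p as a top-level function
after <- length(serialize(p, NULL))    # the global environment is stored by reference only

c(before = before, after = after)
```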

\begin {opinion}
  S started out as a pure functional language.  That is, a function depends 
ONLY on its arguments.   Many of the strengths of S/R flow directly from the 
simplicity and rigor that this gives.
There is an adage in programming, going back to at least the earliest Fortran 
compilers,  that all successful languages have a way to break their own rules;  
and S indeed had some hidden workarounds.  Formalizing these non-functional 
back doors as R has done with environments is a good thing.

However, the back doors should be used only with extreme reluctance.  I cringe 
at each new "how to be sneaky" discussion on the mailing lists.  The 'solution' 
is rarely worth the long term price.
 \end{opinion}

Terry Therneau



Re: [Rd] how to control the environment of a formula

2013-04-18 Thread Thomas Alexander Gerds
Dear Duncan 

thank you for taking the time to answer my questions! It will be quite
some work to delete all the objects generated inside the function
... but if there is no other way to avoid a large environment then this
is what I will do.

Cheers
Thomas

Duncan Murdoch  writes:

> On 13-04-18 1:09 AM, Thomas Alexander Gerds wrote:
>> Dear List
>> I have experienced that objects generated with one of my packages
>> used a lot of space when saved on disc (object.size did not show
>> this!).
>> some debugging revealed that formula and call objects carried the
>> full environment of subroutines along, including even stuff not
>> needed by the formula or call. here is a sketch of the problem
>> ,
>> | test <- function(x){
>> |   x <- rnorm(100)
>> |   out <- list()
>> |   out$f <- a~b
>> |   out
>> | }
>> | v <- test(1)
>> | save(v,file="~/tmp/v.rda")
>> | system("ls -lah ~/tmp/v.rda")
>> |
>> | -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
>> `
>> I tried to replace line 3 by
>> ,
>> | as.formula(a~b,env=emptyenv())
>> | or
>> | as.formula(a~b,env=NULL)
>> `
>> without the desired effect. Instead adding either
>> ,
>> | environment(out$f) <- emptyenv()
>> | or
>> | environment(out$f) <- NULL
>> `
>> has the desired effect (i.e. the saved object size is
>> shrunken). unfortunately there is a new problem:
>> ,
>> | test <- function(x){
>> |   x <- rnorm(100)
>> |   out <- list()
>> |   out$f <- a~b
>> |   environment(out$f) <- emptyenv()
>> |   out
>> | }
>> | d <- data.frame(a=1,b=1)
>> | v <- test(1)
>> | model.frame(v$f,data=d)
>> |
>> | Error in eval(expr, envir, enclos) : could not find function "list"
>> `
>> Same with NULL in place of emptyenv()
>> Finally using .GlobalEnv in place of emptyenv() seems to remove both
>> problems.
>
> But it will cause other, less obvious problems.  In a formula, the
> symbols mean something.  By setting the environment to .GlobalEnv
> you're changing the meaning.  You'll get nonsense in certain cases
> when functions look up the meaning of those symbols and find the wrong
> thing. (I don't have an example at hand, but I imagine it would be
> easy to put one together with update().)
>
>> My questions:
>> 1) why does the argument env of as.formula have no effect?
>
> Because the first argument already had an associated environment.  You
> passed a ~ b, which is evaluated to a formula; calling as.formula on a
> formula does nothing. The env argument is only used when a new formula
> needs to be constructed.  (You can see this in the source code;
> as.formula is a very simple function.)
>
>> 2) is there a better way to tell formula not to copy unrelated stuff
>> into the associated environment?
>
> Yes, delete it.  For example, you could write your function as
>
>  test <- function(x){
>    x <- rnorm(100)
>    out <- list()
>    out$f <- a~b
>    rm(x)
>    out
>  }
>
>> 3) why does object.size not show the size of the environments that
>> formulas can carry along?
>
> Because many objects can share the same environment.  See ?object.size
> for more details.
>
> Duncan Murdoch

-- 
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics Copenhagen
University of Copenhagen, Oester Farimagsgade 5, 1014 Copenhagen, Denmark



Re: [Rd] how to control the environment of a formula

2013-04-18 Thread Duncan Murdoch

On 13-04-18 1:09 AM, Thomas Alexander Gerds wrote:
> Dear List
>
> I have experienced that objects generated with one of my packages used
> a lot of space when saved on disc (object.size did not show this!).
>
> some debugging revealed that formula and call objects carried the full
> environment of subroutines along, including even stuff not needed by the
> formula or call. here is a sketch of the problem
>
> ,
> | test <- function(x){
> |   x <- rnorm(100)
> |   out <- list()
> |   out$f <- a~b
> |   out
> | }
> | v <- test(1)
> | save(v,file="~/tmp/v.rda")
> | system("ls -lah ~/tmp/v.rda")
> |
> | -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
> `
>
> I tried to replace line 3 by
>
> ,
> | as.formula(a~b,env=emptyenv())
> | or
> | as.formula(a~b,env=NULL)
> `
>
> without the desired effect. Instead adding either
>
> ,
> | environment(out$f) <- emptyenv()
> | or
> | environment(out$f) <- NULL
> `
>
> has the desired effect (i.e. the saved object size is
> shrunken). unfortunately there is a new problem:
>
> ,
> | test <- function(x){
> |   x <- rnorm(100)
> |   out <- list()
> |   out$f <- a~b
> |   environment(out$f) <- emptyenv()
> |   out
> | }
> | d <- data.frame(a=1,b=1)
> | v <- test(1)
> | model.frame(v$f,data=d)
> |
> | Error in eval(expr, envir, enclos) : could not find function "list"
> `
>
> Same with NULL in place of emptyenv()
>
> Finally using .GlobalEnv in place of emptyenv() seems to remove both problems.


But it will cause other, less obvious problems.  In a formula, the 
symbols mean something.  By setting the environment to .GlobalEnv you're 
changing the meaning.  You'll get nonsense in certain cases when 
functions look up the meaning of those symbols and find the wrong thing. 
 (I don't have an example at hand, but I imagine it would be easy to 
put one together with update().)



> My questions:
>
> 1)  why does the argument env of as.formula have no effect?


Because the first argument already had an associated environment.  You 
passed a ~ b, which is evaluated to a formula; calling as.formula on a 
formula does nothing. The env argument is only used when a new formula 
needs to be constructed.  (You can see this in the source code; 
as.formula is a very simple function.)
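
A brief sketch of that difference. The environment e is created only for
this illustration.

```r
e <- new.env()

# Already a formula: as.formula() returns it unchanged and `env` is ignored.
f1 <- as.formula(a ~ b, env = e)
identical(environment(f1), e)

# A character string has no environment yet, so `env` is used.
f2 <- as.formula("a ~ b", env = e)
identical(environment(f2), e)
```

The first comparison is FALSE because a ~ b was turned into a formula,
with the caller's environment, before as.formula() ever saw it.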



> 2)  is there a better way to tell formula not to copy unrelated stuff
>     into the associated environment?


Yes, delete it.  For example, you could write your function as

 test <- function(x){
   x <- rnorm(100)
   out <- list()
   out$f <- a~b
   rm(x)
   out
 }



> 3)  why does object.size not show the size of the environments that
>     formulas can carry along?


Because many objects can share the same environment.  See ?object.size 
for more details.
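
A sketch that makes the gap visible. make_formula and the vector length
1e5 are made up for illustration, and serialize() is used as a rough proxy
for what ends up in a saved file.

```r
make_formula <- function(n) {
  big <- rnorm(n)   # lives only in the formula's environment
  a ~ b
}

f <- make_formula(1e5)

object.size(f)               # small: the environment is not followed
length(serialize(f, NULL))   # large: serialization does follow it
```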


Duncan Murdoch



[Rd] how to control the environment of a formula

2013-04-18 Thread Thomas Alexander Gerds
Dear List

I have experienced that objects generated with one of my packages used
a lot of space when saved on disc (object.size did not show this!).

some debugging revealed that formula and call objects carried the full
environment of subroutines along, including even stuff not needed by the
formula or call. here is a sketch of the problem

,
| test <- function(x){
|   x <- rnorm(100)
|   out <- list()
|   out$f <- a~b
|   out
| }
| v <- test(1)
| save(v,file="~/tmp/v.rda")
| system("ls -lah ~/tmp/v.rda")
| 
| -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
`

I tried to replace line 3 by

,
| as.formula(a~b,env=emptyenv())
| or
| as.formula(a~b,env=NULL)
`

without the desired effect. Instead adding either

,
| environment(out$f) <- emptyenv()
| or
| environment(out$f) <- NULL
`

has the desired effect (i.e. the saved object size is
shrunken). unfortunately there is a new problem:

,
| test <- function(x){
|   x <- rnorm(100)
|   out <- list()
|   out$f <- a~b
|   environment(out$f) <- emptyenv()
|   out
| }
| d <- data.frame(a=1,b=1)
| v <- test(1)
| model.frame(v$f,data=d)
| 
| Error in eval(expr, envir, enclos) : could not find function "list"
`

Same with NULL in place of emptyenv()

Finally using .GlobalEnv in place of emptyenv() seems to remove both problems.
My questions:

1)  why does the argument env of as.formula have no effect?
2)  is there a better way to tell formula not to copy unrelated stuff
into the associated environment?
3)  why does object.size not show the size of the environments that
formulas can carry along?

Regards
Thomas


--
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics
University of Copenhagen, Øster Farimagsgade 5, 1014 Copenhagen, Denmark
Office: CSS-15.2.07 (Gamle Kommunehospital)
tel: 35327914 (sec: 35327901) 
