Re: [Rd] how to control the environment of a formula

2013-04-21 Thread Thomas Alexander Gerds

thanks. yes, I was considering to use as.character(f) but your solution
2 is much better -- did not know ' was a R function as well. just
checked: model.frame does not get confused and this will be used to
evaluate formula by all functions in my packages.

however, there could be related problems with memory. I noticed that
some of my processes use unexpectedly much memory. how can one trace
this?

I am not desperate to save disk space: the problem is that file transfer
and sharing (like dropbox) suffer when each simulation result fills 8M
instead of 130K just because a large data set is invisibly sitting in
the saved file.



-- 
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics Copenhagen
University of Copenhagen, Oester Farimagsgade 5, 1014 Copenhagen, Denmark

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] how to control the environment of a formula

2013-04-20 Thread Duncan Murdoch

On 13-04-19 2:57 PM, Thomas Alexander Gerds wrote:


hmm. I have tested a bit more, and found a situation that is perhaps
more difficult to solve. even though I delete x, since x is part of the
output of the formula, the size of the object is twice as much as it should be:

test <- function(x){
   x <- rnorm(100)
   out <- list(x=x)
   rm(x)
   out$f <- as.formula(a~b)
   out
}
v <- test(1)
x <- rnorm(100)
save(v,file="~/tmp/v.rda")
save(x,file="~/tmp/x.rda")
system("ls -lah ~/tmp/*.rda")

-rw-rw-r-- 1 tag tag  15M Apr 19 20:52 /home/tag/tmp/v.rda
-rw-rw-r-- 1 tag tag 7,4M Apr 19 20:52 /home/tag/tmp/x.rda

can you solve this as well?


Yes, this is tricky.  The problem is that out is in the environment of 
out$f, so you get two copies when you save it.  (I think you won't have 
two copies in memory, because R only makes a copy when it needs to, but 
I haven't traced this.)
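
The duplication can be checked without writing any files. What follows is a minimal sketch, not code from the thread: it assumes that length(serialize(obj, NULL)) is a reasonable proxy for what save() writes before compression, and the vector length n is an arbitrary illustrative choice.

```r
# Sketch: measure the serialized size of the returned object.
n <- 1e5                      # illustrative size, not taken from the thread

test <- function(x) {
  x <- rnorm(n)
  out <- list(x = x)
  rm(x)
  out$f <- a ~ b              # the formula captures this function's frame
  out
}
v <- test(1)

# The frame captured by the formula still holds `out`, and `out` holds the data
ls(environment(v$f))

# Compare the whole object with the data vector on its own
size_v <- length(serialize(v, NULL))
size_x <- length(serialize(v$x, NULL))
size_v / size_x               # roughly 2 when the data is written twice
```

If the ratio is close to 2, the data vector is being written once as v$x and once more through the formula's environment.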


Here are two solutions; both have some problems.

1.  Don't put out in the environment:

test <- function(x) {
  x <- rnorm(100)
  out <- list(x=x)
  out$f <- a ~ b  # the as.formula() was never needed
  # temporarily create a new environment
  local({
    # get a copy of what you want to keep
    out <- out
    # remove everything that you don't need from the formula
    rm(list=c("x", "out"), envir=environment(out$f))
    # return the local copy
    out
  })
}

I don't like this because it is too tricky, but you could probably wrap 
the tricky bits into a little function (a variant on return() that 
cleans out the environment first), so it's probably what I would use if 
I was desperate to save space in saved copies.
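
One possible shape for such a helper is sketched below. The name clean_return and its keep argument are hypothetical, invented here for illustration; nothing like it is proposed in the thread or provided by base R. It assumes the formula was created in the calling function's frame.

```r
# Hypothetical helper: empty the caller's frame (except for the names in
# `keep`), then hand back `value`.
clean_return <- function(value, keep = character(), env = parent.frame()) {
  force(value)                  # evaluate the result before removing anything
  force(env)                    # pin down the caller's frame
  drop <- setdiff(ls(envir = env, all.names = TRUE), keep)
  rm(list = drop, envir = env)
  value
}

test <- function(x) {
  x <- rnorm(1e5)
  out <- list(x = x)
  out$f <- a ~ b
  clean_return(out)             # last expression, so its value is returned
}
v <- test(1)

ls(environment(v$f))            # the captured frame is now empty
length(serialize(v, NULL)) / length(serialize(v$x, NULL))
```

Because the frame is emptied rather than replaced, the formula keeps an environment whose parent is still the one in which test() was defined, so ordinary lookups from the formula should continue to work.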


2. Never evaluate the formula in the first place, so it doesn't pick up 
the environment:


test <- function(x) {
  x <- rnorm(100)
  out <- list(x=x)
  out$f <- quote(a ~ b)
  out
}

This is a lot simpler, but it might not work with some modelling 
functions, which would be confused by receiving the model formula 
unevaluated.  It also has the problems that you get with using 
.GlobalEnv as the environment of the formula, but maybe to a slightly 
lesser extent:  rather than having what is possibly the wrong 
environment, it doesn't have one at all.
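
The difference between an unevaluated and an evaluated formula can be seen directly. This is a small sketch with illustrative variable names (f_call, f_eval); it only uses quote(), eval(), class() and environment() from base R.

```r
# An unevaluated formula is just a call to `~`; it has no environment attached
f_call <- quote(a ~ b)
class(f_call)
environment(f_call)

# Evaluating the call is what creates the formula and attaches the
# environment in which the evaluation takes place
f_eval <- eval(f_call, globalenv())
class(f_eval)
environment(f_eval)
```

Whoever evaluates the stored call later decides which environment the resulting formula gets, which is why the result depends on the function that receives it.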


Duncan Murdoch






Re: [Rd] how to control the environment of a formula

2013-04-20 Thread Gabor Grothendieck

An approach along the lines of Duncan's last solution that works with
lm but may or may not work with other regression-style functions is to
use a character string:

fit <- lm("demand ~ Time", BOD)

As long as you are only saving the input you should be OK but if you
are saving the output of lm then you are back to the same problem
since the lm object will contain a formula.

> class(formula(fit))
[1] "formula"
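
The point can be checked with the BOD data set that ships with R. This is a sketch only; the name fo is illustrative, and which environment ends up attached to the fitted object's formula is deliberately not asserted here, since it depends on how the modelling function converts the string.

```r
# A character string carries no environment, so storing it is cheap
fo <- "demand ~ Time"
is.null(environment(fo))

# lm() converts the string to a formula internally
fit <- lm(fo, BOD)

# The fitted object carries a real formula again, environment included
class(formula(fit))
is.environment(environment(formula(fit)))
```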



Re: [Rd] how to control the environment of a formula

2013-04-19 Thread Duncan Murdoch

On 13-04-19 8:41 AM, Therneau, Terry M., Ph.D. wrote:

  I went through the same problem and discovery process 2 years ago with the 
survival package.  With pspline()  terms the return object from coxph includes 
a simple 6 line function for enhanced printout, which by default carried along 
another 30 irrelevant things some of which were huge.
I personally think that setting environment(f) <- .GlobalEnv is the clearest 
and simplest solution.
Note that R does not save the environment of functions defined at the top level; the 
prior line says to treat your function as one of those.  It works very well 
as long as your function is an actual function, i.e. it depends only on its input 
arguments.

\begin {opinion}
   S started out as a pure functional language.  That is, a function depends 
ONLY on its arguments.   Many of the strengths of S/R flow directly from the 
simplicity and rigor that this gives.
There is an adage in programming, going back to at least the earliest Fortran 
compilers,  that all successful languages have a way to break their own rules;  
and S indeed had some hidden workarounds.  Formalizing these non-functional 
back doors as R has done with environments is a good thing.

However, the back doors should be used only with extreme reluctance.  I cringe at each 
new "how to be sneaky" discussion on the mailing lists.  The 'solution' is 
rarely worth the long term price.
  \end{opinion}


Hmmm, it seems to me that your first paragraph contradicts your opinion. 
 If you set the environment of a formula to .GlobalEnv then suddenly 
the way that formula acts depends on all sorts of things that weren't 
there when it was created.


Attaching the environment at the time of creation of a formula means that 
the names within it refer to data that is currently in scope.  That's 
generally a good thing.  It means that code will act the same when you 
run it at the top level or in a function.


For example, consider this:

f <- function() {
   x <- 1:10
   x2 <- x^2
   y <- rnorm(10, mean=x2)
   formula <- y ~ x + x2
   formula
}

fit <- lm(f())
update(fit, . ~ . - x)


This code works fine, all because the formula keeps the environment 
where it was created.  If I modify it like this:


f <- function() {
   x <- 1:10
   x2 <- x^2
   y <- rnorm(10, mean=x2)
   formula <- y ~ x + x2
   environment(formula) <- .GlobalEnv
   formula
}

fit <- lm(f())
update(fit, . ~ . - x)


then I really have no idea what it will produce, because it depends on 
global variables y, x and x2, not the local ones created in the 
function.  If I'm lucky, I'll get an "object not found" error; if I'm 
not lucky, it'll just go find some other variables and use those.
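
The unlucky case can be made concrete. In this sketch the three global variables are invented for illustration; they are created with assign() only so that the example behaves the same wherever it is run, which at top level is equivalent to plain assignment.

```r
f <- function() {
  x <- 1:10
  x2 <- x^2
  y <- rnorm(10, mean = x2)
  formula <- y ~ x + x2
  environment(formula) <- .GlobalEnv
  formula
}

# Unrelated globals that happen to share the names used in the formula
assign("x",  101:110,         envir = .GlobalEnv)
assign("x2", rep(c(0, 1), 5), envir = .GlobalEnv)
assign("y",  10 * (101:110),  envir = .GlobalEnv)

fit <- lm(f())

# The slope for x reflects the global y = 10 * x, not the data
# simulated inside f()
coef(fit)[["x"]]
```

No error or warning signals that the local x, x2 and y were ignored, which is the silent failure described above.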


Duncan Murdoch



Re: [Rd] how to control the environment of a formula

2013-04-19 Thread Therneau, Terry M., Ph.D.
Duncan,
 I stand by all my comments.  Well-behaved functions -- those that look
only at their input arguments -- do just fine with a simple env.
 Now as to formulas --- the part of R that has most aggressively messed
with normal evaluation rules.  It is quite possible that there is/was no
other way to implement their functionality set, so I'm not throwing rocks
at that.  However, as soon as they enter the scene the consequences
multiply like rabbits and I feel like I've fallen into a hall of mirrors.
Nothing else has caused me as much ongoing confusion and wonderment in the
survival package.
  As soon as you introduce them, all my arguments become irrelevant.

Terry T




Re: [Rd] how to control the environment of a formula

2013-04-19 Thread Thomas Alexander Gerds

hmm. I have tested a bit more, and found a situation that is perhaps
more difficult to solve. even though I delete x, since x is part of the
output of the formula, the size of the object is twice as much as it should be:

test <- function(x){
  x <- rnorm(100)
  out <- list(x=x)
  rm(x)
  out$f <- as.formula(a~b)
  out
}
v <- test(1)
x <- rnorm(100)
save(v,file="~/tmp/v.rda")
save(x,file="~/tmp/x.rda")
system("ls -lah ~/tmp/*.rda")

-rw-rw-r-- 1 tag tag  15M Apr 19 20:52 /home/tag/tmp/v.rda
-rw-rw-r-- 1 tag tag 7,4M Apr 19 20:52 /home/tag/tmp/x.rda

can you solve this as well?

thanks!
thomas

Duncan Murdoch murdoch.dun...@gmail.com writes:

 On 13-04-18 11:39 AM, Thomas Alexander Gerds wrote:
 Dear Duncan
 thank you for taking the time to answer my questions! It will be
 quite some work to delete all the objects generated inside the
 function ... but if there is no other way to avoid a large
 environment then this is what I will do.

 It's not really that hard.  Use names <- ls() in the function to get a
 list of all of them; remove the names of variables that might be
 needed in the formula (and the name of the formula itself); then use
 rm(list=names) to delete everything else just before returning it.

 Duncan Murdoch
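
A minimal sketch of that recipe follows. The intermediate variable tmp is hypothetical and stands in for whatever large temporaries a real function creates; here the only name worth keeping is out, which holds the formula.

```r
test <- function(x) {
  x <- rnorm(1e5)
  tmp <- x * 2                       # stand-in for large intermediate results
  out <- list()
  out$f <- a ~ b

  names <- ls()                      # everything currently in this frame
  names <- setdiff(names, "out")     # keep what the result still needs
  rm(list = names)                   # delete the rest just before returning
  out
}
v <- test(1)

ls(environment(v$f))                 # only small objects are left behind
length(serialize(v, NULL))           # small: no data vector travels along
```

The character vector names itself stays in the frame because it is created after ls() is called, but it is tiny compared with the data it helps remove.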

-- 
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics Copenhagen
University of Copenhagen, Oester Farimagsgade 5, 1014 Copenhagen, Denmark



[Rd] how to control the environment of a formula

2013-04-18 Thread Thomas Alexander Gerds
Dear List

I have experienced that objects generated with one of my packages used
a lot of space when saved on disc (object.size did not show this!).

some debugging revealed that formula and call objects carried the full
environment of subroutines along, including even stuff not needed by the
formula or call. here is a sketch of the problem

,----
| test <- function(x){
|   x <- rnorm(100)
|   out <- list()
|   out$f <- a~b
|   out
| }
| v <- test(1)
| save(v,file="~/tmp/v.rda")
| system("ls -lah ~/tmp/v.rda")
| 
| -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
`----

I tried to replace line 3 by

,----
| as.formula(a~b,env=emptyenv())
| or
| as.formula(a~b,env=NULL)
`----

without the desired effect. Instead adding either

,----
| environment(out$f) <- emptyenv()
| or
| environment(out$f) <- NULL
`----

has the desired effect (i.e. the size of the saved object is
reduced). unfortunately there is a new problem:

,----
| test <- function(x){
|   x <- rnorm(100)
|   out <- list()
|   out$f <- a~b
|   environment(out$f) <- emptyenv()
|   out
| }
| d <- data.frame(a=1,b=1)
| v <- test(1)
| model.frame(v$f,data=d)
| 
| Error in eval(expr, envir, enclos) : could not find function "list"
`----

Same with NULL in place of emptyenv()

Finally using .GlobalEnv in place of emptyenv() seems to remove both problems.
My questions:

1)  why does the argument env of as.formula have no effect?
2)  is there a better way to tell formula not to copy unrelated stuff
into the associated environment?
3)  why does object.size not show the size of the environments that
formulas can carry along?

Regards
Thomas


--
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics
University of Copenhagen, Øster Farimagsgade 5, 1014 Copenhagen, Denmark
Office: CSS-15.2.07 (Gamle Kommunehospital)
tel: 35327914 (sec: 35327901) 



Re: [Rd] how to control the environment of a formula

2013-04-18 Thread Duncan Murdoch

On 13-04-18 1:09 AM, Thomas Alexander Gerds wrote:


Finally using .GlobalEnv in place of emptyenv() seems to remove both problems.


But it will cause other, less obvious problems.  In a formula, the 
symbols mean something.  By setting the environment to .GlobalEnv you're 
changing the meaning.  You'll get nonsense in certain cases when 
functions look up the meaning of those symbols and find the wrong thing. 
 (I don't have an example at hand, but I imagine it would be easy to 
put one together with update().)



My questions:

1)  why does the argument env of as.formula have no effect?


Because the first argument already had an associated environment.  You 
passed a ~ b, which is evaluated to a formula; calling as.formula on a 
formula does nothing. The env argument is only used when a new formula 
needs to be constructed.  (You can see this in the source code; 
as.formula is a very simple function.)
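
The two cases can be compared side by side. This is a short sketch with illustrative names (e, f1, f2), using a fresh environment rather than emptyenv() so that the result is easy to test.

```r
e <- new.env()

# Case 1: the argument is already a formula, so it is returned unchanged
# and `env` is ignored
f1 <- as.formula(a ~ b, env = e)
identical(environment(f1), e)

# Case 2: the formula has to be built from a string, so `env` is used
f2 <- as.formula("a ~ b", env = e)
identical(environment(f2), e)
```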



2)  is there a better way to tell formula not to copy unrelated stuff
 into the associated environment?


Yes, delete it.  For example, you could write your function as

 test <- function(x){
   x <- rnorm(100)
   out <- list()
   out$f <- a~b
   rm(x)
   out
 }



3)  why does object.size not show the size of the environments that
 formulas can carry along?


Because many objects can share the same environment.  See ?object.size 
for more details.
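
The gap between the two measurements is easy to reproduce. In this sketch the vector length is an arbitrary illustrative choice, and length(serialize(v, NULL)) is used as an assumed stand-in for the uncompressed size that save() would write.

```r
test <- function(x) {
  x <- rnorm(1e5)
  out <- list()
  out$f <- a ~ b
  out
}
v <- test(1)

# object.size() does not follow the formula's environment
object.size(v)

# serialization does, so the data vector left in the frame is included
length(serialize(v, NULL))
```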


Duncan Murdoch



Re: [Rd] how to control the environment of a formula

2013-04-18 Thread Thomas Alexander Gerds
Dear Duncan 

thank you for taking the time to answer my questions! It will be quite
some work to delete all the objects generated inside the function
... but if there is no other way to avoid a large environment then this
is what I will do.

Cheers
Thomas
