Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-11 Thread Gabor Grothendieck
Just one simple shortening of DR's solution:

tt <- function (n) {
   x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
   print(sapply(x, function(...) subset(...), select = n))
}

n <- "b"
tt("a")


On 10/11/05, Dimitris Rizopoulos <[EMAIL PROTECTED]> wrote:
> As Gabor said, the issue here is that subset.data.frame() evaluates
> the value of the `select' argument in the parent.frame(); Thus, if you
> create a local function within lapply() (or sapply()) it works:
>
> tt <- function (n) {
>    x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
>    print(lapply(x, function(y, n) subset(y, select = n), n = n))
>    print(sapply(x, function(y, n) subset(y, select = n), n = n))
> }
>
> tt("a")
>

Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-11 Thread Thomas Lumley
On Tue, 11 Oct 2005, joerg van den hoff wrote:
> many thanks to thomas and gabor for their help. both solutions solve my 
> problem perfectly.
>
> but just as an attempt to improve my understanding of the inner workings of R
> (similar problems are sure to come up ...) two more questions:
>
> 1.
> why does the call of the "[" function (thomas' solution) behave differently
> from "subset" in that the lookup of the variable "n" works without providing
> lapply with the current environment (which is nice)?

"[" behaves like nearly all functions in R: the value of the argument is 
passed.   subset() does some tricky things to subvert the usual argument 
passing.  Quite a few of the modelling functions do similar tricky things, 
and they do sometimes get confused when passed as arguments to another 
function.
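
As a rough illustration of that difference (a sketch only; by_value and
by_expression are made-up helper names, not functions from this thread or
from R): an ordinary function sees the value of its argument, whereas a
function that calls substitute(), as subset() does for its select argument,
sees the unevaluated expression and then has to decide for itself where to
look the name up.

```r
# Hypothetical helpers for illustration only.
by_value <- function(arg) arg                   # standard evaluation: uses the value
by_expression <- function(arg) substitute(arg)  # captures the expression as typed

col_name <- "a"
v <- by_value(col_name)        # the string "a"
e <- by_expression(col_name)   # the symbol col_name, not its value
print(v)
print(e)
```

Once only the symbol is left, the result depends on which environment it is
evaluated in, which is where the trouble inside lapply() comes from.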

> 2.
> using 'subset' in this context becomes more cumbersome if sapply is used. it
> seems that I then need
> ...
> environment(sapply) <- environment(lapply) <- environment()
> sapply(x, subset, select = n)
> ...
> to get it working (and that means you must know that sapply uses lapply). or
> can I somehow avoid the additional explicit definition of the
> lapply-environment?

You really don't want to go around playing with environment() on 
functions. That way lies madness.  Use subset at the command line and [ or 
[[ in programming.  I don't think I have ever set environment() on a 
function (only on formulas).
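
For the sapply case in question 2, a minimal sketch of the "[[" route might
look like the following (get_column is a made-up name used only for this
example). Because "[[" receives the value of the column name as an ordinary
argument, no environment needs to be changed.

```r
# get_column is a hypothetical helper, not part of the thread.
get_column <- function(frames, col) {
  # `[[` is called as frames[[i]][[col]] for each data frame
  sapply(frames, `[[`, col)
}

frames <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
col_a <- get_column(frames, "a")
print(col_a)
```

With one row per data frame, sapply simplifies the result to a plain numeric
vector holding the chosen column of each frame.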


-thomas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-11 Thread Peter Dalgaard
"Dimitris Rizopoulos" <[EMAIL PROTECTED]> writes:

> As Gabor said, the issue here is that subset.data.frame() evaluates 
> the value of the `select' argument in the parent.frame(); Thus, if you 
> create a local function within lapply() (or sapply()) it works:

It's more complicated than that: It evaluates the select argument in a
named list with names duplicating those of the data frame, and *then*
in parent.frame. This is convenient for command line use, because you
can specify ranges of variables as in

  dfsub <- subset(dfr,select=c(sex:treat, x_pre:x_24))

but it is quite risky to try and do this inside a function - if you're
passing in a variable, the result depends on whether there is a
variable of the same name in the data frame! You can probably get
around it using substitute() constructions, but I think it is safer to
avoid using functions with nonstandard semantics inside functions.
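
A small sketch of that risk (pick_clash and pick_safe are made-up names for
this illustration): the two functions differ only in the name of their second
argument, yet they select different columns, because a name that matches a
column is resolved against the data frame before the calling frame is
consulted.

```r
dfr <- data.frame(a = 1, b = 2)

# Hypothetical wrappers; the only difference is the argument name.
pick_clash <- function(d, a)   subset(d, select = a)    # 'a' is also a column name
pick_safe  <- function(d, col) subset(d, select = col)  # 'col' is not a column name

clash <- names(pick_clash(dfr, "b"))  # 'a' resolves to column a, the "b" is ignored
safe  <- names(pick_safe(dfr, "b"))   # 'col' is found in the function, giving "b"
print(clash)
print(safe)
```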
 
 
> tt <- function (n) {
> x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
> print(lapply(x, function(y, n) subset(y, select = n), n = n))
> print(sapply(x, function(y, n) subset(y, select = n), n = n))
> }
> 
> tt("a")

Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-11 Thread Dimitris Rizopoulos
As Gabor said, the issue here is that subset.data.frame() evaluates 
the value of the `select' argument in the parent.frame(); Thus, if you 
create a local function within lapply() (or sapply()) it works:

tt <- function (n) {
    x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
    print(lapply(x, function(y, n) subset(y, select = n), n = n))
    print(sapply(x, function(y, n) subset(y, select = n), n = n))
}

tt("a")


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm





Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-11 Thread joerg van den hoff
Gabor Grothendieck wrote:
> The problem is that subset looks into its parent frame but in this
> case the parent frame is not the environment in tt but the environment
> in lapply since tt does not call subset directly but rather lapply does.
> 
> Try this which is similar except we have added the line beginning
> with environment before the print statement.
> 
> tt <- function (n) {
>    x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
>    environment(lapply) <- environment()
>    print(lapply(x, subset, select = n))
> }
> 
> n <- "b"
> tt("a")
> 
> What this does is create a new version of lapply whose
> parent is the environment in tt.
> 

many thanks to thomas and gabor for their help. both solutions solve my 
problem perfectly.

but just as an attempt to improve my understanding of the inner workings
of R (similar problems are sure to come up ...) two more questions:

1.
why does the call of the "[" function (thomas' solution) behave
differently from "subset" in that the lookup of the variable "n" works
without providing lapply with the current environment (which is nice)?

2.
using 'subset' in this context becomes more cumbersome if sapply is
used. it seems that I then need
...
environment(sapply) <- environment(lapply) <- environment()
sapply(x, subset, select = n)
...
to get it working (and that means you must know that sapply uses
lapply). or can I somehow avoid the additional explicit definition of
the lapply-environment?


again: many thanks

joerg



Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-10 Thread Gabor Grothendieck
The problem is that subset evaluates its 'select' argument in its parent
frame, and in this case the parent frame is not the environment of tt but
that of lapply: tt does not call subset directly, lapply does.

Try this which is similar except we have added the line beginning
with environment before the print statement.

tt <- function (n) {
   x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
   environment(lapply) <- environment()
   print(lapply(x, subset, select = n))
}

n <- "b"
tt("a")

What this does is create a new, local version of lapply whose
enclosing environment is the environment in tt, so n is found there.
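
The lookup described above can be seen in a small sketch. It assumes that no
object called col_name_demo exists in the global environment or on the search
path (both tt_demo and col_name_demo are made-up names for this
illustration). Since subset is called from inside lapply, the argument of
tt_demo is never consulted and the call fails.

```r
# tt_demo and col_name_demo are hypothetical names for this sketch.
tt_demo <- function(col_name_demo) {
  x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
  # Return the error message instead of stopping, so the failure can be inspected.
  tryCatch(lapply(x, subset, select = col_name_demo),
           error = function(e) conditionMessage(e))
}

res <- tt_demo("a")
print(res)   # an error message complaining that col_name_demo was not found
```

If a global variable of that name did exist, its value would be used
silently instead, which is the behaviour reported in the original question.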




Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-10 Thread Thomas Lumley
On Mon, 10 Oct 2005, joerg van den hoff wrote:

> I need to extract identically named columns from several data frames in
> a list. the column name is a variable (i.e. not known in advance). the
> whole thing occurs within a function body. I'd like to use lapply with a
> variable 'select' argument.

You would probably be better off using "[" rather than subset().

tt <- function (n) {
 x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
 print(lapply(x,"[",n))
}

seems to do what you want.
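
A sketch of what this returns, for readers who want to check (pick_cols is a
made-up wrapper name, not from the thread): each list element stays a data
frame, and the same call also accepts a vector of several column names.

```r
# pick_cols is a hypothetical wrapper around the lapply(x, "[", n) idiom.
pick_cols <- function(frames, cols) lapply(frames, "[", cols)

x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
one  <- pick_cols(x, "a")          # list of one-column data frames
both <- pick_cols(x, c("b", "a"))  # columns come back in the requested order
print(one)
print(names(both[[1]]))
```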

-thomas


Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



[R] problem with lapply(x, subset, ...) and variable select argument

2005-10-10 Thread joerg van den hoff
I need to extract identically named columns from several data frames in 
a list. the column name is a variable (i.e. not known in advance). the 
whole thing occurs within a function body. I'd like to use lapply with a
variable 'select' argument.


example:

tt <- function (n) {
   x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
   for (xx in x) print(subset(xx, select = n))   ### works
   print (lapply(x, subset, select = a))   ### works
   print (lapply(x, subset, select = "a"))  ### works
   print (lapply(x, subset, select = n))  ### does not work as intended
}
n = "b"
tt("a")  #works (but selects not the intended column)
rm(n)
tt("a")   #no longer works in the lapply call including variable 'n'


question: how can I enforce evaluation of the variable n such that
the lapply call works? I suspect it has something to do with eval and
specifying the correct evaluation frame, but how? 


many thanks

joerg
