Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-11 Thread joerg van den hoff
many thanks to thomas and gabor for their help. both solutions solve my 
problem perfectly.

but just as an attempt to improve my understanding of the inner workings 
of R (similar problems are sure to come up ...) two more questions:

1.
why does the call of the "[" function (thomas' solution) behave 
differently from subset, in that the lookup of the variable n works 
without providing lapply with the current environment (which is nice)?

2.
using 'subset' in this context becomes more cumbersome if sapply is 
used. it seems that I then need
...
environment(sapply) <- environment(lapply) <- environment()
sapply(x, subset, select = n)
...
to get it working (and that means you must know that sapply uses 
lapply). or can I somehow avoid the additional explicit definition of 
the lapply environment?


again: many thanks

joerg

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-11 Thread Dimitris Rizopoulos
As Gabor said, the issue here is that subset.data.frame() evaluates 
the value of the 'select' argument in the parent.frame(). Thus, if you 
wrap subset() in a local function passed to lapply() (or sapply()), it works:

tt <- function (n) {
    x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
    print(lapply(x, function(y, n) subset(y, select = n), n = n))
    print(sapply(x, function(y, n) subset(y, select = n), n = n))
}

tt("a")
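
A minimal sketch of why the wrapper helps, assuming a variable n also exists outside the function as a decoy. The name tt_wrapped is made up for this illustration; the idea is only that subset() is now called from the anonymous function, so the frame it falls back to is one where n holds the value that was passed in.

```r
# Decoy with the same name as the function argument below.
n <- "b"

# tt_wrapped is a hypothetical name; it mirrors the wrapper idea above.
tt_wrapped <- function(n) {
  x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
  # The anonymous function calls subset() itself, so subset()'s parent
  # frame is that function's frame, where n is the value passed in.
  lapply(x, function(y, n) subset(y, select = n), n = n)
}

res <- tt_wrapped("a")
sapply(res, names)   # names of the column selected from each data frame
```

With the decoy set to "b" and the argument set to "a", the selected column should be "a" in both data frames.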


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm





Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-11 Thread Peter Dalgaard
Dimitris Rizopoulos writes:

> As Gabor said, the issue here is that subset.data.frame() evaluates 
> the value of the 'select' argument in the parent.frame(); Thus, if you 
> create a local function within lapply() (or sapply()) it works:

It's more complicated than that: It evaluates the select argument in a
named list with names duplicating those of the data frame, and *then*
in parent.frame. This is convenient for command line use, because you
can specify ranges of variables as in

  dfsub <- subset(dfr, select = c(sex:treat, x_pre:x_24))

but it is quite risky to try and do this inside a function - if you're
passing in a variable, the result depends on whether there is a
variable of the same name in the data frame! You can probably get
around it using substitute() constructions, but I think it is safer to
avoid using functions with nonstandard semantics inside functions.
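
The risk described above can be sketched as follows. The data frames d1 and d2 and the helper names pick and pick_safe are invented for this illustration; the only assumption is that one data frame happens to contain a column with the same name as the function argument.

```r
# Two data frames; the second happens to contain a column called "n".
d1 <- data.frame(a = 1, b = 2)
d2 <- data.frame(a = 3, n = 4)

# pick() passes a variable to select; pick_safe() uses "[" instead.
pick      <- function(df, n) subset(df, select = n)
pick_safe <- function(df, n) df[, n, drop = FALSE]

names(pick(d1, "a"))       # "a": n is not a column, so the argument is used
names(pick(d2, "a"))       # "n": n is matched against the column names first
names(pick_safe(d2, "a"))  # "a": ordinary argument passing
```

The same call gives a different column depending on the contents of the data frame, which is the reason given above for avoiding subset() inside functions.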
 
 

Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-11 Thread Thomas Lumley
On Tue, 11 Oct 2005, joerg van den hoff wrote:
> many thanks to thomas and gabor for their help. both solutions solve my 
> problem perfectly.
>
> but just as an attempt to improve my understanding of the inner workings of R 
> (similar problems are sure to come up ...) two more questions:
>
> 1.
> why does the call of the "[" function (thomas' solution) behave differently 
> from subset, in that the lookup of the variable n works without providing 
> lapply with the current environment (which is nice)?

[ behaves like nearly all functions in R: the value of the argument is 
passed.   subset() does some tricky things to subvert the usual argument 
passing.  Quite a few of the modelling functions do similar tricky things, 
and they do sometimes get confused when passed as arguments to another 
function.
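
A small sketch of the difference, under the assumption that the "tricky things" amount to capturing the unevaluated argument with substitute(). The helpers show_value and show_expr are hypothetical and exist only for this illustration.

```r
# show_value() and show_expr() are hypothetical helpers for illustration.
show_value <- function(arg) arg               # ordinary evaluation
show_expr  <- function(arg) substitute(arg)   # captures the unevaluated expression

f <- function(n) list(value = show_value(n), expr = show_expr(n))
out <- f("a")
out$value        # "a": the value was passed
class(out$expr)  # "name": the symbol n, which must be looked up somewhere later

# "[" only ever sees the value, so it works unchanged through lapply().
x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
tt <- function(n) lapply(x, "[", n)
names(tt("b")[[1]])  # "b"
```

A function that keeps only the symbol has to decide later where to look it up, and that lookup is what goes wrong when another function such as lapply() sits in between.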

> 2.
> using 'subset' in this context becomes more cumbersome if sapply is used. it 
> seems that I then need
> ...
> environment(sapply) <- environment(lapply) <- environment()
> sapply(x, subset, select = n)
> ...
> to get it working (and that means you must know that sapply uses lapply). or 
> can I somehow avoid the additional explicit definition of the 
> lapply environment?

You really don't want to go around playing with environment() on 
functions. That way lies madness.  Use subset at the command line and [ or 
[[ in programming.  I don't think I have ever set environment() on a 
function (only on formulas).
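
For the sapply case, a sketch along the lines of this advice, using the example data from the thread. The function names get_cols and get_vals are made up here.

```r
x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))

# "[" keeps each result as a one-column data frame.
get_cols <- function(n) lapply(x, "[", n)

# "[[" extracts the column itself, so sapply() can simplify to a vector.
get_vals <- function(n) sapply(x, "[[", n)

get_vals("b")   # 2 4
```

Neither version needs any change to the environment of lapply or sapply, because n is evaluated as an ordinary argument before "[" or "[[" is called.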


-thomas



Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-11 Thread Gabor Grothendieck
Just one simple shortening of DR's solution:

tt <- function (n) {
   x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
   print(sapply(x, function(...) subset(...), select = n))
}

n <- "b"
tt("a")




[R] problem with lapply(x, subset, ...) and variable select argument

2005-10-10 Thread joerg van den hoff
I need to extract identically named columns from several data frames in 
a list. the column name is a variable (i.e. not known in advance). the 
whole thing occurs within a function body. I'd like to use lapply with a
variable 'select' argument.


example:

tt <- function (n) {
   x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
   for (xx in x) print(subset(xx, select = n))   ### works
   print (lapply(x, subset, select = a))   ### works
   print (lapply(x, subset, select = "a"))  ### works
   print (lapply(x, subset, select = n))  ### does not work as intended
}
n = "b"
tt("a")  #works (but selects not the intended column)
rm(n)
tt("a")   #no longer works in the lapply call including variable 'n'


question: how  can I enforce evaluation of the variable n such that
the lapply call works? I suspect it has something to do with eval and
specifying the correct evaluation frame, but how? 


many thanks

joerg



Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-10 Thread Thomas Lumley
On Mon, 10 Oct 2005, joerg van den hoff wrote:

> I need to extract identically named columns from several data frames in
> a list. the column name is a variable (i.e. not known in advance). the
> whole thing occurs within a function body. I'd like to use lapply with a
> variable 'select' argument.

You would probably be better off using [ rather than subset().

tt <- function (n) {
    x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
    print(lapply(x, "[", n))
}

seems to do what you want.

-thomas



Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] problem with lapply(x, subset, ...) and variable select argument

2005-10-10 Thread Gabor Grothendieck
The problem is that subset looks into its parent frame but in this
case the parent frame is not the environment in tt but the environment
in lapply since tt does not call subset directly but rather lapply does.

Try this, which is the same except for the added line beginning
with environment() before the print statement.

tt <- function (n) {
   x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
   environment(lapply) <- environment()
   print(lapply(x, subset, select = n))
}

n <- "b"
tt("a")

What this does is create a new version of lapply whose
parent is the environment in tt.
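
A sketch of what that assignment does, assuming nothing beyond base R. The function demo_env is a made-up name; it only inspects environments and does not call subset().

```r
demo_env <- function() {
  # The replacement form assigns a modified copy of lapply in this frame;
  # the original in the base namespace is not changed.
  environment(lapply) <- environment()
  list(
    local_copy_points_here = identical(environment(lapply), environment()),
    base_env_name = environmentName(environment(base::lapply))
  )
}

res <- demo_env()
res$local_copy_points_here   # TRUE
res$base_env_name            # "base"
```

Because the copy is local to the function, the change disappears when the function returns. Any other function that calls lapply internally, such as sapply, would still use the untouched original.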

