Re: [R] programming: telling a function where to look for the entered variables

2011-04-01 Thread Nick Sabbe
See the warning in ?subset.
Passing the column name of lvar is not the same as passing the 'contextual
column' (as I coin it in these circumstances).
You can solve it by indeed using [] instead.

For my own comfort, here is the relevant line from your original function:
Data.tmp <- subset(Fulldf, lvar==subgroup, select=c(xvar,yvar))
Which should become something like (untested but should be close):
Data.tmp <- Fulldf[Fulldf[,"lvar"]==subgroup, c("xvar","yvar")]

This should be a lot easier to translate based on column names, as the
column names are now used as such.
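
To illustrate the difference, here is a minimal sketch on a small made-up data frame (the names df, colname and grp are assumptions for the example, not taken from the original post). When the column name is held in a variable, subset() ends up comparing the string itself with the subgroup value, whereas [] uses the string to look the column up:

```r
# Toy data; the data frame and its values are hypothetical
df <- data.frame(lvar = c("yes", "no", "yes"),
                 xvar = c("blue", "red", "green"),
                 stringsAsFactors = TRUE)

colname <- "lvar"   # column name held in a variable
grp <- "yes"

# subset() evaluates the condition inside df first; 'colname' is not a
# column, so it is found outside df and the test becomes "lvar" == "yes"
via_subset <- subset(df, colname == grp)
nrow(via_subset)    # 0: no rows are selected

# [] treats the string as a column name
via_bracket <- df[df[, colname] == grp, "xvar", drop = FALSE]
nrow(via_bracket)   # 2: the two "yes" rows
```

This is the situation the warning in ?subset is about: subset() is convenient interactively, but inside functions the bracket form tends to be more predictable.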

HTH,


Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove




-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of E Hofstadler
Sent: Friday 1 April 2011 13:09
To: r-help@r-project.org
Subject: [R] programming: telling a function where to look for the entered
variables

Hi there,

Could someone help me with the following programming problem?

I have written a function that works for my intended purpose, but it
is quite closely tied to a particular dataframe and the names of the
variables in this dataframe. However, I'd like to use the same
function for different dataframes and variables. My problem is that
I'm not quite sure how to tell my function in which dataframe the
entered variables are located.

Here's some reproducible data and the function:

# create reproducible data
set.seed(124)
xvar <- sample(0:3, 1000, replace = TRUE)
yvar <- sample(0:1, 1000, replace = TRUE)
zvar <- rnorm(100)
lvar <- sample(0:1, 1000, replace = TRUE)
Fulldf <- as.data.frame(cbind(xvar, yvar, zvar, lvar))
Fulldf$xvar <- factor(xvar, labels = c("blue", "green", "red", "yellow"))
Fulldf$yvar <- factor(yvar, labels = c("area1", "area2"))
Fulldf$lvar <- factor(lvar, labels = c("yes", "no"))

And here's the function in its current working form: from a
subset of the dataframe Fulldf, a contingency table is created (in my
actual data, several other operations are then performed on that
contingency table, but these are not relevant to the problem in
question, so I've deleted them).

# function as it currently works: tailored to a particular dataframe (Fulldf)

myfunct <- function(subgroup){
  # enter a particular subgroup for which the contingency table should be
  # calculated (i.e. a particular value of the factor lvar)
  Data.tmp <- subset(Fulldf, lvar==subgroup, select=c(xvar,yvar))
  # restrict dataframe to given subgroup and two columns of the original
  # dataframe
  Data.tmp <- na.omit(Data.tmp) # exclude missing values
  indextable <- table(Data.tmp$xvar, Data.tmp$yvar) # make contingency table
  return(indextable)
}

Since I need to use the function with different dataframes and
variable names, I'd like to be able to tell my function the name of
the dataframe and variables it should use for calculating the index.
This is how I tried to modify the first part of the function, but it
didn't work:

# function as I would like it to work: independent of any particular
# dataframe or variable names (doesn't work)

myfunct.better <- function(subgroup, lvarname, yvarname, dataframe){
  # enter the subgroup, the variable names to be used and the dataframe
  # in which they are found
  Data.tmp <- subset(dataframe, lvarname==subgroup,
                     select=c(xvar, deparse(substitute(yvarname))))
  # trying to subset the given dataframe for the given subgroup of the
  # given variable (the variable xvar happens to have the same name in all
  # dataframes, but the variable yvarname has different names in the
  # different dataframes)
  Data.tmp <- na.omit(Data.tmp)
  # create the contingency table on the basis of the entered variables
  indextable <- table(Data.tmp$xvar, Data.tmp$yvarname)
  return(indextable)
}

calling

myfunct.better("yes", lvarname=lvar, yvarname=yvar, dataframe=Fulldf)

results in the following error:

Error in `[.data.frame`(x, r, vars, drop = drop) :
  undefined columns selected

My feeling is that R doesn't know where to look for the entered
variables (lvar, yvar), but I'm not sure how to solve this problem. I
tried using with() and even attach() within the function, but that
didn't work.

Any help is greatly appreciated.

Best,
Esther

P.S.:
Are there books that cover programming in R for beginners in some depth? I
mean things like how best to use vectorization instead of loops, and
general best-practice tips for programming. Most of the books I've
been looking at focus on applying R to particular statistical
analyses and deal only comparatively briefly with more general
programming aspects. I was wondering if there are any books or tutorials
out there that cover the latter aspects in a more elaborate and
systematic way?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] programming: telling a function where to look for the entered variables

2011-04-01 Thread E Hofstadler
Thanks Nick and Juan for your replies.

Nick, thanks for pointing out the warning in subset(). I'm not sure
though I understand the example you provided -- because despite using
subset() rather than bracket notation, the original function (myfunct)
does what is expected of it. The problem I have is with the second
function (myfunct.better), where variable names + dataframe are not
fixed within the function but passed to the function when calling it
-- and even with bracket notation I don't quite manage to tell R where
to look for the columns that relate to the entered column names.
(but then perhaps I misunderstood you)

This is what I tried (using bracket notation):

myfunct.better <- function(dataframe, subgroup, lvarname, yvarname){
  Data.tmp <- dataframe[dataframe[,deparse(substitute(lvarname))]==subgroup,
                        c("xvar",deparse(substitute(yvarname)))]
}

but this creates an empty contingency table only -- perhaps because my
use of deparse() is flawed (I think what is converted into a string is
"lvarname" and "yvarname", rather than the column names that these two
function-variables represent in the dataframe)?
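
A small sketch of what deparse(substitute()) returns may help here (the helper names show_arg and wrapper are made up for illustration). It converts whatever was typed in the call into a string, so a bare name and a quoted name give different results, and a function that passes its own argument on only exposes that argument's name:

```r
# Hypothetical helper: returns its argument, as typed, converted to a string
show_arg <- function(x) deparse(substitute(x))

bare   <- show_arg(lvar)     # "lvar": usable as a column name
quoted <- show_arg("lvar")   # "\"lvar\"": the quotes become part of the string

# When a wrapper passes its own argument on, only the wrapper's
# argument name is seen, not what the user originally typed
wrapper <- function(colname) show_arg(colname)
nested <- wrapper(lvar)      # "colname"
```

If the column names are passed as ordinary strings, none of this is needed and the string can be used directly inside [].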



Re: [R] programming: telling a function where to look for the entered variables

2011-04-01 Thread Nick Sabbe
This should be a version that does what you want.
Because you named the variable lvarname, I assumed you were already passing
"lvar" instead of trying to pass lvar (without the quotes), which is in no
way a 'name'.

myfunct.better <- function(subgroup, lvarname, xvarname, yvarname,
                           dataframe)
{
  # enter the subgroup, the variable names to be used and the dataframe
  # in which they are found
  Data.tmp <- dataframe[dataframe[,lvarname]==subgroup,
                        c(xvarname,yvarname)]
  Data.tmp <- na.omit(Data.tmp)
  # create the contingency table on the basis of the entered variables
  indextable <- table(Data.tmp[,xvarname], Data.tmp[,yvarname])
  # actually, if I remember well, you could simply use
  # indextable <- table(Data.tmp) here
  # that would allow for some more simplifications (replace xvarname
  # and yvarname by columnsOfInterest or similar, and pass that instead
  # of c(xvarname, yvarname))
  return(indextable)
}

myfunct.better("yes", lvarname="lvar", xvarname="xvar", yvarname="yvar",
               dataframe=Fulldf)
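
The simplification mentioned in the comments above could look roughly like this. It is only a sketch: the function name myfunct.simple and the toy data are made up for illustration, and columnsOfInterest follows the name suggested in the comments.

```r
# Sketch: all column names are passed as strings
myfunct.simple <- function(subgroup, lvarname, columnsOfInterest, dataframe)
{
  keep <- dataframe[, lvarname] == subgroup
  Data.tmp <- na.omit(dataframe[keep, columnsOfInterest])
  table(Data.tmp)   # two columns give a two-way contingency table
}

# Toy data for the example (not the data from the original post)
toy <- data.frame(lvar = factor(c("yes", "yes", "no", "yes")),
                  xvar = factor(c("blue", "red", "blue", "blue")),
                  yvar = factor(c("area1", "area2", "area1", "area1")))

tab <- myfunct.simple("yes", lvarname = "lvar",
                      columnsOfInterest = c("xvar", "yvar"), dataframe = toy)
tab["blue", "area1"]   # 2
```

Passing a character vector of column names keeps the function independent of any particular dataframe, at the cost of requiring the caller to quote the names.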


HTH,


Nick Sabbe





Re: [R] programming: telling a function where to look for the entered variables

2011-04-01 Thread E Hofstadler
2011/4/1 Nick Sabbe nick.sa...@ugent.be:
> This should be a version that does what you want.

Indeed it does, thank you very much!

> Because you named the variable lvarname, I assumed you were already passing
> "lvar" instead of trying to pass lvar (without the quotes), which is in no
> way a 'name'.

Sorry about that, I can see how my variable names were somewhat confusing.

Many thanks once again!



