[R] using lm() with variable formula

2007-05-17 Thread Chris Elsaesser
New to R; please excuse me if this is a dumb question.  I tried to RTFM;
it didn't help.

I want to do a series of regressions over the columns in a data.frame,
systematically varying the response variable and the terms, and not
necessarily including all the non-response columns.  In my case, the
columns are time series. I don't know if that makes a difference; it
does mean I have to call lag() to offset non-response terms. I cannot
assume a specific number of columns in the data.frame; there might be 3
or there might be 20.

My central problem is that the formula given to lm() is different each
time.  For example, say a data.frame had columns with the following
headings:  height, weight, BP (blood pressure), and Cals (calorie intake
per time frame).  In that case, I'd need something like the following:

lm(height ~ weight + BP + Cals)
lm(height ~ weight + BP)
lm(height ~ weight + Cals)
lm(height ~ BP + Cals)
lm(weight ~ height + BP)
lm(weight ~ height + Cals)
etc.

In general, I'll have to read the header to get the argument labels.

Do I have to write several functions, each taking a different number of
arguments?  I'd like to construct a string or list representing the
variables in the formula and apply lm() to it, so to speak.  [I'm mainly a
Lisp programmer, and there that part would be very simple. Does anyone have
a Lisp API for R? :-}]

Thanks,
chris

Chris Elsaesser, PhD
Principal Scientist, Machine Learning
SPADAC Inc.
7921 Jones Branch Dr. Suite 600  
McLean, VA 22102  

703.371.7301 (m)
703.637.9421 (o)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using lm() with variable formula

2007-05-17 Thread Gabor Grothendieck
Try this:


lm(Sepal.Length ~., iris[1:3])

# or

cn <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
lm(Sepal.Length ~., iris[cn])
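Extending this to a changing response is a matter of pasting the response name in front of `~ .`, because `.` expands to every other column of the data frame passed to lm(). A minimal sketch, assuming the numeric `iris` columns stand in for height, weight, BP and Cals; the names `dat` and `fits` are illustrative only.

```r
# Numeric iris columns stand in for height, weight, BP, Cals
dat <- iris[c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# One model per response; "." expands to all remaining columns of 'dat'
fits <- lapply(names(dat), function(resp) {
  lm(as.formula(paste(resp, "~ .")), data = dat)
})
names(fits) <- names(dat)

# Each element is an ordinary "lm" object
coef(fits[["Petal.Width"]])
```

A subset of predictors works the same way: pass `dat[c(resp, chosen)]` instead of `dat`, where `chosen` is whatever character vector of column names you want on the right-hand side.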





Re: [R] using lm() with variable formula

2007-05-17 Thread Richard M. Heiberger
tmp <- data.frame(matrix(rnorm(40), 10, 4,
                         dimnames = list(NULL, c("Y", "A", "B", "C"))))
tmp
tmp.form <- paste(names(tmp)[1], paste(names(tmp)[-1], collapse = " + "),
                  sep = " ~ ")
tmp.form
lm(as.formula(tmp.form), data = tmp)
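As an alternative to pasting strings, `reformulate()` from the stats package builds the formula object directly from a character vector of term labels and a response name. A minimal sketch; the data frame is rebuilt so the block runs on its own, and the `set.seed()` call is an addition for reproducibility.

```r
set.seed(1)  # added only so the example is reproducible
tmp <- data.frame(matrix(rnorm(40), 10, 4,
                         dimnames = list(NULL, c("Y", "A", "B", "C"))))

# reformulate(termlabels, response) returns a formula, not a string
tmp.form <- reformulate(names(tmp)[-1], response = names(tmp)[1])
tmp.form   # Y ~ A + B + C
fit <- lm(tmp.form, data = tmp)
```

Dropping predictors just means passing a shorter `termlabels` vector, e.g. `reformulate(c("A", "C"), response = "Y")`.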

The R language is powerful enough to do most of the Lisp-like things
you may want to do.
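To make the Lisp comparison concrete: a formula is just an unevaluated call to `~`, so it can be assembled from symbols much like an s-expression. A sketch, assuming the response and predictor names arrive as character vectors; `make_formula` is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: build  resp ~ p1 + p2 + ...  from character names
make_formula <- function(resp, preds) {
  syms <- lapply(preds, as.name)                        # strings -> symbols
  rhs  <- Reduce(function(a, b) call("+", a, b), syms)  # fold into nested `+` calls
  f    <- call("~", as.name(resp), rhs)                 # unevaluated call to `~`
  eval(f, envir = parent.frame())                       # evaluating `~` yields a formula
}

f <- make_formula("height", c("weight", "BP", "Cals"))
f   # height ~ weight + BP + Cals

# The result is used with lm() like any hand-written formula
fit <- lm(make_formula("Sepal.Length", c("Sepal.Width", "Petal.Length")),
          data = iris)
```

In everyday use `reformulate()` does the same job; the explicit fold only shows that formulas are ordinary language objects that can be built and manipulated programmatically.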

Rich



Re: [R] using lm() with variable formula

2007-05-17 Thread Bert Gunter
... and note that if a matrix of responses is on the left of the ~, separate
regressions will be fit simultaneously to each of the columns of the matrix.
Note that this **is** in TFM -- ?lm.
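A minimal sketch of that multi-response form, with `iris` columns as stand-ins: `cbind()` on the left of `~` produces the matrix response, and the fit is an object of class `"mlm"` with one column of coefficients per response.

```r
# Two responses fitted against the same predictors in a single call
fit <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Length + Petal.Width,
          data = iris)
class(fit)   # "mlm" "lm"
coef(fit)    # 3 x 2 matrix: one column per response
```

This only helps when every response shares the same right-hand side; varying the set of predictors still needs one formula per model.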


Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
Sent: Thursday, May 17, 2007 8:22 AM
To: Chris Elsaesser
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] using lm() with variable formula



Re: [R] using lm() with variable formula

2007-05-21 Thread Vladimir Eremeev

I was solving a similar problem some time ago.
Here is my script.
I had a data frame containing a response and several other variables,
which were assumed to be predictors.
I was trying to choose the best linear approximation.
This approach now seems useless to me; please don't blame me for that.
However, the script might be useful to you.


library(forward)  # provides fwd.combn()

# dfr is a data.frame that contains everything.
# The response variable is named med5x.
# The following lines construct linear models for all possible formulas
# of the form
#   med5x ~ T + a + height
#   med5x ~ a + height + RH
# where T, a, RH, etc. are the names of possible predictors.

# dfr was a very large data frame containing a lot of variables;
# here we have chosen only a subset of them.
inputs <- names(dfr)[c(10:30, 1)]

# The linear models were assumed to have at least 11 terms.
for (nc in 11:length(inputs)) {
  # Generate a character vector of formulas with nc predictors each.
  formulas <- paste("med5x",
                    fwd.combn(inputs, nc,
                              fun = function(x) paste(x, collapse = "+")),
                    sep = "~")

  # Fit every formula and append its residual sum of squares to a file.
  for (f in formulas) {
    lms <- lm(as.formula(f), data = dfr)
    cat(file = "linear_models.txt", f, sum(residuals(lms)^2), "\n",
        sep = "\t", append = TRUE)
  }
}


Hmm, looking back, I see that this is a rather inefficient script.
For example, the inner loop could easily be replaced with a function from the apply family.
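For comparison, a sketch of the same exhaustive search using `utils::combn()`, which accepts a `FUN` argument much like `fwd.combn()` does in the script, with `vapply()` in place of the inner loop. It assumes the built-in `mtcars` data and the response `mpg` as stand-ins for `dfr` and `med5x`, and a fixed subset size `k` to keep it small; `dat`, `resp`, `preds`, `k` and `rss` are illustrative names.

```r
dat   <- mtcars
resp  <- "mpg"
preds <- setdiff(names(dat), resp)

# All formulas with exactly k predictors, e.g. "mpg ~ cyl + disp"
k <- 2
formulas <- as.character(combn(preds, k, FUN = function(x) {
  paste(resp, "~", paste(x, collapse = " + "))
}))

# Residual sum of squares for each candidate model, named by formula
rss <- vapply(formulas, function(f) {
  sum(residuals(lm(as.formula(f), data = dat))^2)
}, numeric(1))

head(sort(rss))   # best-fitting subsets first
```

Keeping the results in a named vector makes them easy to sort; for very large searches, appending to a file as the script does may still be preferable.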





Re: [R] using lm() with variable formula [Broadcast]

2007-05-18 Thread Liaw, Andy
One way to do it is by giving a data frame with the right variables to
lm() as the first argument each time.  If lm() is given a data frame as
the first argument, it will treat the first variable as the LHS and the
rest as the RHS of the formula.

For example, you can do:

lm(myData[c("height", "weight", "BP", "Cals")])

(The drawback to this is that the "formula" in the fitted model object
looks a bit strange...)
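Building on that behaviour, a sketch that lets each variable take a turn as the response simply by reordering the columns so the response comes first. `mtcars` columns stand in for `myData`, and `vars` and `fits` are illustrative names.

```r
vars <- c("mpg", "wt", "hp", "qsec")   # stand-ins for height, weight, BP, Cals

fits <- lapply(vars, function(resp) {
  # Response first, the rest after it: lm() treats this as resp ~ others
  lm(mtcars[c(resp, setdiff(vars, resp))])
})
names(fits) <- vars

# Response hp, predictors mpg, wt, qsec
formula(fits[["hp"]])
```

As noted above, the stored call shows the data-frame expression rather than a formula, but `formula()` on the fit still recovers the model formula from its terms.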

Andy



