[R] OLS variables

2005-11-06 Thread Leaf Sun
Dear all,

Is there any simple way in R to put all the interactions of the
variables into an OLS model?

e.g.

I have a bunch of variables, x1, x2, ..., x20. I expect them to have
interactions (e.g. x1*x2, x3*x4*x5, ...) in some combinations (two-way or
higher order).

Is there any way that I can write the model more simply?

Thanks!

Leaf

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] OLS variables

2005-11-06 Thread John Fox
Dear Leaf,

I assume that you're using lm() to fit the model, and that you don't really
want *all* of the interactions among 20 predictors: You'd need quite a lot
of data to fit a model with 2^20 terms in it, and might have trouble
interpreting the results. 

If you know which interactions you're looking for, then why not specify them
directly, as in lm(y ~ x1*x2 + x3*x4*x5 + etc.)? On the other hand, if you
want to include all interactions, say, up to three-way, and you've put the
variables in a data frame, then lm(y ~ .^3, data=DataFrame) will do it.
There are many terms in this model, however, if not quite 2^20.
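
A minimal sketch of the two formula styles mentioned above, run on simulated data. The data frame contents, the use of five predictors instead of twenty, and the object names fit_specific and fit_threeway are assumptions made only for illustration.

```r
# Simulated data: five predictors x1..x5 and a response y (illustrative only)
set.seed(1)
DataFrame <- as.data.frame(matrix(rnorm(200 * 5), ncol = 5))
names(DataFrame) <- paste0("x", 1:5)
DataFrame$y <- rnorm(200)

# Only the interactions named explicitly (plus their lower-order terms)
fit_specific <- lm(y ~ x1*x2 + x3*x4*x5, data = DataFrame)
labels_specific <- attr(terms(fit_specific), "term.labels")
print(labels_specific)

# All main effects and all two-way and three-way interactions
fit_threeway <- lm(y ~ .^3, data = DataFrame)
n_threeway <- length(attr(terms(fit_threeway), "term.labels"))
print(n_threeway)   # choose(5,1) + choose(5,2) + choose(5,3)

# The same count for 20 predictors, excluding the intercept
n_terms_20 <- sum(choose(20, 1:3))
print(n_terms_20)
```

The last line shows why even the three-way model is large with 20 predictors, although it is far smaller than 2^20.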

The introductory manual that comes with R has information on model formulas
in Section 11.

I hope this helps,
 John 


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 



Re: [R] OLS variables

2005-11-06 Thread Adaikalavan Ramasamy
IMHO, the Details section of help(formula) gives a nicer explanation.

Regards, Adai


-- 
Adaikalavan Ramasamy                    [EMAIL PROTECTED]
Centre for Statistics in Medicine       http://www.ihs.ox.ac.uk/csm/
Wolfson College Annexe                  Tel : 01865 284 408
Linton Road, Oxford OX2 6UD             Fax : 01865 284 424



Re: [R] OLS variables

2005-11-06 Thread Leaf Sun
Thanks for the information!

Leaf
  

Re: [R] OLS variables

2005-11-06 Thread Kjetil Brinchmann halvorsen
John Fox wrote:
> Dear Leaf,
> 
> I assume that you're using lm() to fit the model, and that you don't really
> want *all* of the interactions among 20 predictors: You'd need quite a lot
> of data to fit a model with 2^20 terms in it, and might have trouble
> interpreting the results. 
> 
> If you know which interactions you're looking for, then why not specify them
> directly, as in lm(y ~ x1*x2 + x3*x4*x5 + etc.)? On the other hand, if you
> want to include all interactions, say, up to three-way, and you've put the
> variables in a data frame, then lm(y ~ .^3, data=DataFrame) will do it.  

This is nice with factors, but it will not do for continuous variables
when a response-surface type of model is needed. For instance, with
variables x, y, z in data frame dat,
lm(y ~ (x+z)^2, data=dat)
gives a model with the terms x, z and x:z (the product of x and z), but
not the square terms. There is a need for a semi-automatic way to get
these; for instance, use poly() or polym() as in:

lm(y ~ polym(x,z,degree=2), data=dat)
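
The contrast above can be checked with a small sketch. The simulated data and the names labels_int, fit_poly and n_coef are assumptions for illustration, not part of the original example.

```r
# Simulated data frame with continuous x, z and response y (illustrative)
set.seed(42)
dat <- data.frame(x = rnorm(50), z = rnorm(50))
dat$y <- 1 + dat$x + dat$z + dat$x^2 + rnorm(50)

# (x + z)^2 expands to the main effects and the x:z product only
labels_int <- attr(terms(y ~ (x + z)^2), "term.labels")
print(labels_int)

# polym() with degree = 2 also supplies the squared terms:
# five polynomial columns plus the intercept
fit_poly <- lm(y ~ polym(x, z, degree = 2), data = dat)
n_coef <- length(coef(fit_poly))
print(n_coef)
```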

Kjetil



Re: [R] OLS variables

2005-11-07 Thread Prof Brian Ripley
On Sun, 6 Nov 2005, Kjetil Brinchmann halvorsen wrote:

> John Fox wrote:
>>
>> I assume that you're using lm() to fit the model, and that you don't really
>> want *all* of the interactions among 20 predictors: You'd need quite a lot
>> of data to fit a model with 2^20 terms in it, and might have trouble
>> interpreting the results.
>>
>> If you know which interactions you're looking for, then why not specify them
>> directly, as in lm(y ~ x1*x2 + x3*x4*x5 + etc.)? On the other hand, if you
>> want to include all interactions, say, up to three-way, and you've put the
>> variables in a data frame, then lm(y ~ .^3, data=DataFrame) will do it.
>
> This is nice with factors, but it will not do for continuous variables
> when a response-surface type of model is needed. For instance, with
> variables x, y, z in data frame dat,
> lm(y ~ (x+z)^2, data=dat)
> gives a model with the terms x, z and x:z (the product of x and z), but
> not the square terms.
> There is a need for a semi-automatic way to get these, for instance,
> use poly() or polym() as in:
>
> lm(y ~ polym(x,z,degree=2), data=dat)

This is an R-S difference (FAQ 3.3.2).  R's formula parser always takes
x^2 = x, whereas the S one does so only for factors.  This makes sense if
you interpret `interaction' strictly as in John's description; S chose
to see an interaction of any two continuous variables as multiplication
(something which puzzled me when I first encountered it, as it was not
well documented back in 1991).

I have often wondered if this difference was thought to be an improvement,
or if it is just a different implementation of the Wilkinson-Rogers syntax.
Should we consider changing it?
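
A minimal sketch of the R behaviour described above: inside a formula, x^2 collapses to x, and I() is the usual way to protect arithmetic from the formula parser. The variable names labels_plain, labels_protected and labels_surface are illustrative assumptions.

```r
# In an R formula, ^ is a formula operator, so x^2 reduces to x
labels_plain <- attr(terms(y ~ x^2), "term.labels")
print(labels_plain)

# I() protects the arithmetic meaning of ^, giving a genuine squared term
labels_protected <- attr(terms(y ~ x + I(x^2)), "term.labels")
print(labels_protected)

# A full second-order response surface written out by hand
labels_surface <- attr(terms(y ~ (x + z)^2 + I(x^2) + I(z^2)), "term.labels")
print(labels_surface)
```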

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] OLS variables

2005-11-07 Thread John Fox
Dear Brian,

I don't have a strong opinion, but R's interpretation seems more consistent
to me, and as Kjetil points out, one can use polym() to specify a
full-polynomial model. It occurs to me that ^ and ** could be differentiated
in model formulae to provide both.

Regards,
 John




Re: [R] OLS variables

2005-11-07 Thread Prof Brian Ripley
On Mon, 7 Nov 2005, John Fox wrote:

> Dear Brian,
>
> I don't have a strong opinion, but R's interpretation seems more consistent
> to me, and as Kjetil points out, one can use polym() to specify a
> full-polynomial model. It occurs to me that ^ and ** could be differentiated
> in model formulae to provide both.

However, poly[m] only provide orthogonal polynomials, and I have from time 
to time considered extending them to provide raw polynomials too.
Is that a better-supported idea?
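
For readers on current versions of R, poly() and polym() accept a raw = TRUE argument. The sketch below, on simulated data with illustrative object names (fit_orth, fit_raw, fit_I), compares an orthogonal and a raw quadratic fit; it assumes nothing about how the feature discussed here was eventually implemented.

```r
# Simulated quadratic relationship (illustrative data)
set.seed(7)
x <- rnorm(30)
y <- 2 + 3 * x - x^2 + rnorm(30, sd = 0.1)

fit_orth <- lm(y ~ poly(x, 2))              # orthogonal polynomial basis
fit_raw  <- lm(y ~ poly(x, 2, raw = TRUE))  # raw powers x and x^2
fit_I    <- lm(y ~ x + I(x^2))              # the same raw powers by hand

# Both bases span the same column space, so the fitted values agree
same_fit <- isTRUE(all.equal(fitted(fit_orth), fitted(fit_raw)))

# The raw-basis coefficients match the hand-written I(x^2) model
same_coef <- isTRUE(all.equal(unname(coef(fit_raw)), unname(coef(fit_I))))

print(c(same_fit = same_fit, same_coef = same_coef))
```

The orthogonal basis is usually better conditioned numerically, while the raw basis gives coefficients that can be read directly as the multipliers of x and x^2.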



Re: [R] OLS variables

2005-11-07 Thread John Fox
Dear Brian,

I like the idea of providing support for raw polynomials in poly() and
polym(), if only for pedagogical reasons.

Regards,
 John

