Re: [R] R2 always increases as variables are added?

李俊杰 Mon, 21 May 2007 21:10:11 -0700

Hi, Lynch,

First, thank you for your attention.


I am not a statistician either and have taken only a few statistics classes,
so it is natural for us to ask questions that seem naive to statisticians.

I am sorry, but I cannot agree with your point that we must always include
an intercept in our model, because if the true intercept is zero, the
strategy of you or your textbook incurs two losses. First, there is an
interpretation problem. If the true intercept is zero and your estimate of
it is not, the regression result is misleading. This may be less serious
than judging coefficients that are actually zero to be non-zero, but the
misjudgment is still a loss to some extent. Second, if the true intercept
is zero, the predictive ability of your strategy is often lower than that
of strategies that do not always include an intercept.
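
A minimal sketch of the prediction claim above; it is not the attached
code. The setup is assumed purely for illustration: two standard-normal
predictors, true slopes 1 and 2, unit noise variance, a true intercept of
zero, 10 training points and 1000 test points per repetition. It fits the
model with and without an intercept and compares the average squared
prediction error on fresh data.

```r
# Sketch only: all sizes, slopes and the seed are illustrative assumptions.
set.seed(1)
n.train <- 10
n.test  <- 1000
n.rep   <- 500
beta    <- c(1, 2)   # assumed true slopes; the true intercept is 0

mse <- replicate(n.rep, {
  # training data generated with a zero intercept
  x  <- matrix(rnorm(2 * n.train), ncol = 2)
  y  <- drop(x %*% beta) + rnorm(n.train)
  # fresh test data from the same process
  x0 <- matrix(rnorm(2 * n.test), ncol = 2)
  y0 <- drop(x0 %*% beta) + rnorm(n.test)

  b.with <- coef(lm(y ~ x))       # intercept plus two slopes
  b.no   <- coef(lm(y ~ x - 1))   # two slopes only

  c(with.int = mean((y0 - drop(cbind(1, x0) %*% b.with))^2),
    no.int   = mean((y0 - drop(x0 %*% b.no))^2))
})

# average test error of each fit over all repetitions
avg.mse <- rowMeans(mse)
print(avg.mse)
```

Under these assumptions the no-intercept fit should typically show the
smaller average error, because it does not spend a parameter estimating a
quantity that is truly zero. If the true intercept were not zero, the
comparison could easily go the other way.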

If you are interested in the performance of your strategies, e.g. maximizing
adjusted R^2 always with an intercept, you can run the code I put in the
attachment.
It will show that maximizing adjusted R^2 without always including an
intercept beats maximizing adjusted R^2 always with an intercept.

Junjie





2007/5/22, Paul Lynch <[EMAIL PROTECTED]>:


Junjie,
   First, a disclaimer:  I am not a statistician, and have only taken
one statistics class, but I just took it this Spring, so the concepts
of linear regression are relatively fresh in my head and hopefully I
will not be too inaccurate.
   According to my statistics textbook, when selecting variables for
a model, the intercept term is always present.  The "variables" under
consideration do not include the constant "1" that multiplies the
intercept term.  I don't think it makes sense to compare models with
and without an intercept term.  (Also, I don't know what the point of
using a model without an intercept term would be, but that is probably
just my ignorance.)
   Similarly, the formula you were using for R**2 seems to only be
useful in the context of a standard linear regression (i.e., one that
includes an intercept term).  As your example shows, it is easy to
construct a "fit" (e.g. y = 10,000,000*x) so that SSR > SST if one is
not deriving the fit from the regular linear regression process.
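
A minimal sketch of why the two R**2 values in the original post differ.
As far as I understand `summary.lm`, when the model has no intercept it
reports the uncentered ratio sum(y.hat^2)/sum(y^2) instead of the
mean-centered one. The data below are made up for illustration and the
variable names are not from the original post.

```r
# Sketch only: illustrative data, arbitrary seed.
set.seed(2)
x <- rnorm(10)
y <- runif(10)

fit   <- lm(y ~ x - 1)   # regression through the origin
y.hat <- fitted(fit)

# Mean-centered formula from the original post; without an intercept the
# fitted values need not average to mean(y), so this ratio is not
# guaranteed to lie in [0, 1].
r2.centered <- sum((y.hat - mean(y))^2) / sum((y - mean(y))^2)

# Uncentered formula; for a least-squares fit through the origin,
# sum(y^2) = sum(y.hat^2) + sum(residuals^2), so this is at most 1.
r2.uncentered <- sum(y.hat^2) / sum(y^2)

# Value reported by summary() for the no-intercept model
r2.summary <- summary(fit)$r.squared

print(c(centered = r2.centered,
        uncentered = r2.uncentered,
        summary = r2.summary))
print(all.equal(r2.uncentered, r2.summary))
```

Because the two definitions use different baselines (zero versus the mean
of y), an R**2 from a no-intercept fit and one from a fit with an
intercept are not directly comparable.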
         --Paul

On 5/19/07, 李俊杰 <[EMAIL PROTECTED]> wrote:
> I know that "-1" removes the intercept term. But my question is why the
> intercept term CANNOT be treated as a variable term, since we place a
> column consisting of 1s in the predictor matrix.
>
> If I insist on comparing a model with an intercept and one without on
> adjusted R-square, I now think the strategy is to always use another
> definition of R-square or adjusted R-square, in which
> r-square=sum(( y.hat)^2)/sum((y)^2).
>
> Am I on the right track?
>
> Thanks
>
> Li Junjie
>
>
> 2007/5/19, Paul Lynch <[EMAIL PROTECTED]>:
> > In case you weren't aware, the meaning of the "-1" in y ~ x - 1 is to
> > remove the intercept term that would otherwise be implied.
> >     --Paul
> >
> > On 5/17/07, 李俊杰 <[EMAIL PROTECTED]> wrote:
> > > Hi, everybody,
> > >
> > > 3 questions about R-square:
> > > ---------(1)----------- Does R2 always increase as variables are
added?
> > > ---------(2)----------- Does R2 always greater than 1?
> > > ---------(3)----------- How is R2 in summary(lm(y~x-1))$r.squared
> > > calculated? It is different from (r.square=sum((y.hat-mean
> > > (y))^2)/sum((y-mean(y))^2))
> > >
> > > > I will illustrate these problems with the following code:
> > > > ---------(1)-----------  R2 doesn't always increase as variables are added
> > >
> > > > x=matrix(rnorm(20),ncol=2)
> > > > y=rnorm(10)
> > > >
> > > > lm=lm(y~1)
> > > > y.hat=rep(1*lm$coefficients,length(y))
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > [1] 2.646815e-33
> > > >
> > > > lm=lm(y~x-1)
> > > > y.hat=x%*%lm$coefficients
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > [1] 0.4443356
> > > >
> > > > ################ This is the biggest model, but its R2 is not the
> biggest,
> > > why?
> > > > lm=lm(y~x)
> > > > y.hat=cbind(rep(1,length(y)),x)%*%lm$coefficients
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > [1] 0.2704789
> > >
> > >
> > > ---------(2)-----------  R2  can greater than 1
> > >
> > > > x=rnorm(10)
> > > > y=runif(10)
> > > > lm=lm(y~x-1)
> > > > y.hat=x*lm$coefficients
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > [1] 3.513865
> > >
> > >
> > >  ---------(3)----------- How is R2 in summary(lm(y~x-1))$r.squared
> > > calculated? It is different from (r.square=sum((y.hat-mean
> > > (y))^2)/sum((y-mean(y))^2))
> > > > x=matrix(rnorm(20),ncol=2)
> > > > xx=cbind(rep(1,10),x)
> > > > y=x%*%c(1,2)+rnorm(10)
> > > > ### r2 calculated by lm(y~x)
> > > > lm=lm(y~x)
> > > > summary(lm)$r.squared
> > > [1] 0.9231062
> > > > ### r2 calculated by lm(y~xx-1)
> > > > lm=lm(y~xx-1)
> > > > summary(lm)$r.squared
> > > [1] 0.9365253
> > > > ### r2 calculated by me
> > > > y.hat=xx%*%lm$coefficients
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > [1] 0.9231062
> > >
> > >
> > > > Thanks a lot for any clue :)
> > >
> > >
> > >
> > >
> > > --
> > > Junjie Li,                  [EMAIL PROTECTED]
> > > > Undergraduate in DEP of Tsinghua University,
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> > --
> > Paul Lynch
> > Aquilent, Inc.
> > National Library of Medicine (Contractor)
> >
>
>
>
> --
>
> Junjie Li,                  [EMAIL PROTECTED]
> Undergraduate in DEP of Tsinghua University,


--
Paul Lynch
Aquilent, Inc.
National Library of Medicine (Contractor)




--
Junjie Li,                  [EMAIL PROTECTED]
Undergraduate in DEP of Tsinghua University,

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
