Hi John and Peter,

Thanks for your quick response.

1) I plotted the residuals vs. fitted values graph and it was fairly random;
there was no observable pattern in it, so I guess we are OK on that front.
The Q-Q plot was a stretched 'S' shape, with the middle part lying mostly on
the straight line. I tried taking the log of the dependent variable to make
this more linear, but that didn't help.
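
The diagnostics above can be reproduced with a short sketch; the data frame and
variable names here are placeholders, not the actual model:

```r
# Minimal sketch of the diagnostics described above (x, z, y are
# placeholder variables; the real model has more predictors).
set.seed(1)
d <- data.frame(x = runif(100), z = runif(100))
d$y <- 2 + 3 * d$x - d$z + rnorm(100)

fit <- lm(y ~ x + z, data = d)

par(mfrow = c(2, 2))
plot(fit)   # residuals vs. fitted, normal Q-Q, scale-location, leverage

# A log transform of the response is one common remedy for a curved
# Q-Q plot; it only makes sense when y is strictly positive.
if (all(d$y > 0)) {
  fit_log <- lm(log(y) ~ x + z, data = d)
}
```

If the log transform does not straighten the Q-Q plot, other transformations
(square root, Box-Cox) are sometimes tried, but none is guaranteed to help.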

2) I began the model-building process with the one variable which I felt had
the most effect. Then I built on it, adding new variables and dropping them
when they didn't have much effect (i.e., a negative impact on the p-values,
the adjusted R-squared, or the F-statistic). I discretized variables when
doing so had a positive effect.

------------------------------------------------ ANOVA starts ------------------------------------------------
Analysis of Variance Table
Response: (dependent_variable)

            Df     Sum Sq   Mean Sq  F value     Pr(>F)
X            4   17869888   4467472   7.2345  3.438e-05 ***
Y            3  105155343  35051781  56.7616  < 2.2e-16 ***
Z            2   71488149  35744075  57.8826  < 2.2e-16 ***
A            1   28396895  28396895  45.9849  6.995e-10 ***
B            1    8056873   8056873  13.0470   0.000466 ***
C            5   11912948   2382590   3.8583   0.002985 **
D            1     644076    644076   1.0430   0.309452
E            1    3827020   3827020   6.1973   0.014349 *
F            1   12611611  12611611  20.4228  1.621e-05 ***
Residuals  106   65457844    617527
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------ ANOVA ends -------------------------------------------------

------------------------------------------------ Model stats ------------------------------------------------
Residual standard error: 785.8 on 106 degrees of freedom
Multiple R-squared: 0.7989,     Adjusted R-squared: 0.7628
F-statistic: 22.16 on 19 and 106 DF,  p-value: < 2.2e-16
-------------------------------------------------------------------------------------------------------------
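
The add/drop procedure described in 2) can be sketched as follows; the data and
variable names are placeholders, and add1()/drop1() report the F-test impact of
each candidate term:

```r
# Hedged sketch of iterative model building (x, z, w, y are placeholders).
set.seed(2)
d <- data.frame(x = runif(130), z = runif(130), w = runif(130))
d$y <- 1 + 4 * d$x + 2 * d$z + rnorm(130)

fit <- lm(y ~ x, data = d)           # start from the strongest predictor
add1(fit, ~ x + z + w, test = "F")   # which addition helps most?
fit <- update(fit, . ~ . + z)        # keep terms that improve the fit
drop1(fit, test = "F")               # drop terms with weak F-tests
summary(fit)$adj.r.squared           # track adjusted R-squared as you go
```

One caveat with this kind of stepwise selection: p-values from the final model
are optimistic, since the same data chose the terms and tests them.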

-Ankit

On 10/8/07, John Sorkin <[EMAIL PROTECTED]> wrote:
>
> Ankit,
>
> (1) Not necessarily. Linear regression has a number of assumptions. I
> suggest you get a basic statistics
> textbook and do some reading. A brief summary of the assumptions:
> (a) The relation between outcome and predictor variables lies along a line
> (or plane for a regression with
> multiple predictor variables) or some surface that can be modeled using a
> linear function
> (b) The predictor variables are independent of one another
> (c) The residuals from the regression are normally distributed
> (d) The variance of the residuals is constant throughout the range of the
> independent variables.
> (e) The predictor variables are measured without error.
>
> Even if the above assumptions are violated, you can still get a
> significant f statistic, significance for some,
> or all of your predictor variables, etc. If the assumptions are violated,
> the meaning of the results you
> obtain from your regression analysis can be questionable, if not
> outright incorrect. There are a number of
> tests that you can perform to make sure your model conforms to (or at least
> does not wildly violate) the basic
> assumptions. Some commonly performed tests, such as examining the pattern of
> residuals, can be done in R
> by simply plotting the fit you obtain, i.e.
>
> fit1 <- lm(y ~ x + z)
> plot(fit1)   # This produces a number of helpful graphs that
>              # will help you evaluate your model.
>
> Fortunately, linear regression is fairly robust to minor violations of
> several of the assumptions noted above;
> however, to fully evaluate the appropriateness of your model, you
> will need to read a textbook, speak
> to people with more experience than you, and play, play, play with data.
>
> (2) The more predictor variables you have the more observations you need.
> Although there is no absolute
> rule, many people like to have a minimum of five to ten observations per
> independent variable. I like to have
> at least ten. Given that you have eight independent variables, you would,
> by my criteria need at least
> 80 observations. You have 130 so you should be OK, assuming that your
> observations are independent of
> one-another.
>
> Sorry I can't be of more help; statistics cannot be learned in a single
> E-mail message. The fact that you
> are asking important questions about what you are doing reflects well on
> you. I suspect that in a year or
> so you will be answering, rather than asking questions posted on the R
> Listserv mailing list!
>
> John
>
>
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> Baltimore VA Medical Center GRECC,
> University of Maryland School of Medicine Claude D. Pepper OAIC,
> University of Maryland Clinical Nutrition Research Unit, and
> Baltimore VA Center Stroke of Excellence
>
> University of Maryland School of Medicine
> Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
>
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> [EMAIL PROTECTED]
> >>> "rocker turtle" <[EMAIL PROTECTED]> 10/07/07 2:32 PM >>>
> Hi,
>
> First of all, kudos to the creators of and contributors to R! This is a
> great package and I am finding it very useful in my research; I would love
> to contribute any modules and datasets which I develop to the project.
>
> While doing multiple regression I arrived at the following peculiar
> situation.
> Out of 8 variables, only 4 have p-values (of the t-statistic) below 0.04;
> the rest have p-values between 0.1 and 1.0, and the coefficient of
> regression comes out around ~0.8 (adjusted ~0.78). The F-statistic is
> around 30 and its own p-value is ~0. Also, I am constrained to a dataset of
> 130 datapoints.
>
> Being new to statistics, I would really appreciate it if someone could help
> me understand these values.
> 1) Do the above test values indicate a statistically sound and significant
> model?
> 2) Is a dataset of 130 enough to run linear regression with ~7-10
> variables? If not, what is approximately a good size?
>
> Thanks in advance.
> -Ankit
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

