Hi John, Peter,

Thanks for your quick response.
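To make point 1) concrete, here is a minimal sketch of the diagnostic checks in base R. The model formula and the built-in `cars` dataset are stand-ins, since my actual data isn't attached:

```r
# Stand-in model: the built-in `cars` data takes the place of my dataset.
fit <- lm(dist ~ speed, data = cars)

# The standard diagnostic plots: residuals vs. fitted, normal Q-Q,
# scale-location, and residuals vs. leverage.
par(mfrow = c(2, 2))
plot(fit)

# The log transform of the dependent variable I mention in point 1):
fit_log <- lm(log(dist) ~ speed, data = cars)
qqnorm(residuals(fit_log))
qqline(residuals(fit_log))
```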
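Point 2) (the add-one/drop-one process) can be sketched with base R's step(), again with the built-in mtcars data standing in for mine. Note that step() compares models by AIC rather than by p-values, so it only approximates what I did by hand:

```r
# Start from the single predictor believed to matter most (wt, as a
# stand-in), then let step() add/drop candidates from the scope.
# trace = 0 suppresses the per-step printout.
start  <- lm(mpg ~ wt, data = mtcars)
chosen <- step(start, scope = ~ wt + hp + qsec + am,
               direction = "both", trace = 0)

summary(chosen)$adj.r.squared  # adjusted R-squared of the selected model
anova(chosen)                  # sequential ANOVA table like the one in point 2)
```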
1) I plotted the residuals vs. fitted values and they looked fairly random, with no observable pattern, so I think we are OK on that front. The Q-Q plot was a stretched 'S' shape, with the middle section lying mostly on the straight line. I tried taking the log of the dependent variable to make it more linear, but that didn't help.

2) I began the model-building process with the one variable I felt had the most effect. I then built on it, adding new variables and dropping them when they didn't contribute much (i.e., they hurt the p-values, adjusted R-squared, or F-statistic). I discretized variables when doing so had a positive effect.

------------------------------------------------ ANOVA starts --------------------------------------------------------
Analysis of Variance Table

Response: (dependent_variable)
            Df    Sum Sq   Mean Sq  F value     Pr(>F)
X            4  17869888   4467472   7.2345  3.438e-05 ***
Y            3 105155343  35051781  56.7616  < 2.2e-16 ***
Z            2  71488149  35744075  57.8826  < 2.2e-16 ***
A            1  28396895  28396895  45.9849  6.995e-10 ***
B            1   8056873   8056873  13.0470   0.000466 ***
C            5  11912948   2382590   3.8583   0.002985 **
D            1    644076    644076   1.0430   0.309452
E            1   3827020   3827020   6.1973   0.014349 *
F            1  12611611  12611611  20.4228  1.621e-05 ***
Residuals  106  65457844    617527
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------ ANOVA ends --------------------------------------------------------

------------------------------------------------ Model stats --------------------------------------------------------
Residual standard error: 785.8 on 106 degrees of freedom
Multiple R-Squared: 0.7989, Adjusted R-squared: 0.7628
F-statistic: 22.16 on 19 and 106 DF, p-value: < 2.2e-16
----------------------------------------------------------------------------------------------------------------------

-Ankit

On 10/8/07, John Sorkin <[EMAIL PROTECTED]> wrote:
>
> Ankit,
>
> (1) Not necessarily. Linear regression has a number of assumptions.
> I suggest you get a basic statistics textbook and do some reading. A brief
> summary of the assumptions:
> (a) The relation between the outcome and predictor variables lies along a
> line (or plane, for a regression with multiple predictor variables) or
> some surface that can be modeled using a linear function.
> (b) The predictor variables are independent of one another.
> (c) The residuals from the regression are normally distributed.
> (d) The variance of the residuals is constant throughout the range of the
> independent variables.
> (e) The predictor variables are measured without error.
>
> Even if the above assumptions are violated, you can still get a
> significant F statistic, significance for some or all of your predictor
> variables, etc. If the assumptions are violated, the meaning of the
> results you obtain from your regression analysis can be questionable, if
> not outright incorrect. There are a number of tests you can perform to
> make sure your model conforms to (or at least does not wildly violate) the
> basic assumptions. Some commonly performed checks, like examining the
> pattern of residuals, can be done in R simply by plotting the fit you
> obtain, i.e.
>
> fit1 <- lm(y ~ x + z)
> plot(fit1)  # produces a number of helpful graphs that will
>             # help you evaluate your model
>
> Fortunately, linear regression is fairly robust to minor violations of
> several of the assumptions noted above. However, in order to fully
> evaluate the appropriateness of your model, you will need to read a
> textbook, speak to people with more experience than you, and play, play,
> play with data.
>
> (2) The more predictor variables you have, the more observations you need.
> Although there is no absolute rule, many people like to have a minimum of
> five to ten observations per independent variable. I like to have at least
> ten. Given that you have eight independent variables, you would, by my
> criteria, need at least 80 observations.
> You have 130, so you should be OK, assuming that your observations are
> independent of one another.
>
> Sorry I can't be of more help; statistics cannot be learned in a single
> e-mail message. The fact that you are asking important questions about
> what you are doing reflects well on you. I suspect that in a year or so
> you will be answering, rather than asking, questions posted on the R-help
> mailing list!
>
> John
>
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> Baltimore VA Medical Center GRECC,
> University of Maryland School of Medicine Claude D. Pepper OAIC,
> University of Maryland Clinical Nutrition Research Unit, and
> Baltimore VA Center Stroke of Excellence
>
> University of Maryland School of Medicine
> Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
>
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call the phone number above prior to faxing)
> [EMAIL PROTECTED]
>
> >>> "rocker turtle" <[EMAIL PROTECTED]> 10/07/07 2:32 PM >>>
> Hi,
>
> First of all, kudos to the creators of and contributors to R! This is a
> great package and I am finding it very useful in my research; I would love
> to contribute any modules and datasets I develop to the project.
>
> While doing multiple regression I arrived at the following peculiar
> situation. Out of 8 variables, only 4 have p-values (of the t-statistic)
> below 0.04; the rest all have p-values between 0.1 and 1.0, and the
> coefficient of determination is coming out around ~0.8 (adjusted ~0.78).
> The F-statistic is around 30 and its own p-value is ~0. Also, I am
> constrained to a dataset of 130 data points.
>
> Being new to statistics, I would really appreciate it if someone could
> help me understand these values.
> 1) Do the above test values indicate a statistically sound and significant
> model?
> 2) Is a dataset of 130 enough to run linear regression with ~7-10
> variables? If not, what is approximately a good size?
>
> Thanks in advance.
> -Ankit
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.