rocker turtle wrote:
> Hi,
>
> First of all, kudos to the creators of and contributors to R! This is a
> great package and I am finding it very useful in my research; I would
> love to contribute any modules and datasets which I develop to the
> project.
>
> While doing multiple regression I arrived at the following peculiar
> situation. Out of 8 variables only 4 have p-values (of the t-statistic)
> below 0.04; the rest all have p-values between 0.1 and 1.0, and the
> coefficient of determination (R^2) is coming out around ~0.8 (adjusted
> ~0.78). The F-statistic is around 30 and its own p-value is ~0. Also, I
> am constrained to a dataset of 130 data points.

Nothing particularly peculiar about this...
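For concreteness, here is a hypothetical sketch in R (simulated data, not the poster's dataset) of a situation like the one described: 8 predictors of which only 4 have real effects, n = 130, yet the overall fit looks strong.

```r
## Hypothetical illustration with simulated data: only x1..x4 truly
## matter, x5..x8 are noise, yet R^2 and the overall F test can still
## look very good.
set.seed(42)
n <- 130
X <- as.data.frame(matrix(rnorm(n * 8), n, 8))
names(X) <- paste0("x", 1:8)
X$y <- with(X, 2 * x1 + 1.5 * x2 + x3 + 0.8 * x4 + rnorm(n))

fit <- lm(y ~ ., data = X)
summary(fit)   # per-variable t tests, R^2, adjusted R^2, overall F test
confint(fit)   # coefficient standard errors translated into intervals
```

The `summary()` output is where the quoted p-values, R^2, and F-statistic come from, and `confint()` shows how precisely each coefficient is actually pinned down.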
> Being new to statistics, I would really appreciate it if someone can
> help me understand these values.
>
> 1) Do the above test values indicate a statistically sound and
> significant model?

Significant, yes, in a sense (see below). Soundness is something you cannot really see from the output of a regression analysis, because it contains results which are valid _provided_ the model assumptions hold. To check the assumptions there is a battery of techniques, e.g. residual plots and interaction tests -- there are books about this, which won't really fit into a short email....

Re. significance, it is important to realise that you generally need to compare multiple model fits to assess which variables are important. With one fit, you can only say what happens if you drop single variables from the model; so in your case, you have four seven-variable models that do not fit any worse than the full model. You can't really say anything about what happens if you remove two or more variables. You can also see what happens if you drop all variables; this is the overall F test, which in your case is highly significant, so at least one variable must be required. You can be fairly confident that variables with very small p-values cannot be removed, whereas borderline cases may end up with their p-values becoming insignificant when other variables are removed.

> 2) Is a dataset of 130 enough to run linear regression with ~7-10
> variables? If not, what is approximately a good size?

Wrong question, I think. Some people suggest heuristics like 10-20 observations per variable, but this contains an implicit understanding that you are dealing with "typical problems" in e.g. clinical epidemiology.
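The single-variable drops and the overall F test described above can be sketched in R. This is a hypothetical, self-contained example on simulated data (names `x1`..`x8`, `y` are made up for illustration):

```r
## Hypothetical sketch with simulated data (not the poster's dataset)
set.seed(1)
d <- data.frame(matrix(rnorm(130 * 8), 130, 8))
names(d) <- paste0("x", 1:8)
d$y <- 2 * d$x1 + d$x2 + rnorm(130)

fit <- lm(y ~ ., data = d)

## F test for dropping each single variable from the full model:
## one seven-variable model per row
drop1(fit, test = "F")

## Explicit comparison: full model vs. one borderline variable removed
anova(update(fit, . ~ . - x8), fit)

## The overall F test reported by summary(fit) is equivalent to
## comparing against the intercept-only model
anova(update(fit, . ~ 1), fit)
```

Note that `drop1()` only examines one deletion at a time; it says nothing about removing two or more variables jointly, which is exactly the limitation described above.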
Designed experiments can contain many more parameters; data with strong correlations require more observations to untangle which variables are important; and even otherwise, you might be looking for effects that are small compared to the residual variation and consequently require more observations. When you do have the data, I think it is more sound to look at the standard errors of the regression coefficients and discuss whether they are sufficiently small for the kinds of conclusions you want to make.

> Thanks in advance.
> -Ankit
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])        FAX: (+45) 35327907