Hi,

this looks rather like homework, for which the policy of this list is not to respond. But, being far from an expert in statistics, I can only offer my opinion: your height variable behaves like a two-level factor, and the 190 value points to a rather suspicious value in weight if I look at the plot
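For instance, the two-level behaviour is easy to confirm from the posted vectors (a quick sketch, reusing the data from the quoted message below):

```r
# Data as posted in the quoted message
scores <- c(2, 6, 10, 12, 14, 20)
weight <- c(60, 70, 80, 75, 80, 85)
height <- c(180, 180, 190, 180, 180, 180)

# Only two distinct values (180 and 190), i.e. a de-facto two-level factor
table(height)
```

and the plot of the two remaining variables then shows the weight value in question: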
plot(scores, weight)

Regards
Petr

> Dear members of the R-help list,
>
> I have sent the email below to the R-SIG-ME list to ask for help in
> interpreting some R output of fitted linear models. Unfortunately, I
> haven't yet received any answers. As I am not sure whether my email
> reached that mailing list, I am asking for help here:
>
> Dear members of the R-SIG-ME list,
>
> I am new to linear models and struggling with interpreting some of the
> R output, but hope to get some advice here. I created the following
> dummy data set:
>
> scores <- c(2,6,10,12,14,20)
> weight <- c(60,70,80,75,80,85)
> height <- c(180,180,190,180,180,180)
>
> The scores of a game/match should depend on the weight of the player
> but not on the height. The output of the following two linear models
> makes sense to me:
>
> (lm1 <- summary(lm(scores ~ weight)))
>
> Call:
> lm(formula = scores ~ weight)
>
> Residuals:
>        1        2        3        4        5        6
>  1.08333 -1.41667 -3.91667  1.33333  0.08333  2.83333
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) -38.0833    10.0394  -3.793  0.01921 *
> weight        0.6500     0.1331   4.885  0.00813 **
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 2.661 on 4 degrees of freedom
> Multiple R-squared: 0.8564, Adjusted R-squared: 0.8205
> F-statistic: 23.86 on 1 and 4 DF, p-value: 0.008134
>
> (lm2 <- summary(lm(scores ~ height)))
>
> Call:
> lm(formula = scores ~ height)
>
> Residuals:
>          1          2          3          4          5          6
> -8.800e+00 -4.800e+00  1.377e-14  1.200e+00  3.200e+00  9.200e+00
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  25.2000   139.6175   0.180    0.866
> height       -0.0800     0.7684  -0.104    0.922
>
> Residual standard error: 7.014 on 4 degrees of freedom
> Multiple R-squared: 0.002703, Adjusted R-squared: -0.2466
> F-statistic: 0.01084 on 1 and 4 DF, p-value: 0.9221
>
> The p-value of the first model is 0.008134, which makes sense, as
> scores and weight are highly correlated and the scores can therefore
> be "explained" by the explanatory variable weight very well; hence the
> R-squared value is close to 1. For the second model it also makes
> sense that the p-value is almost 1 (p = 0.9221), as there is hardly
> any correlation between scores and height.
>
> What is not clear to me is shown in my third linear model, which
> includes both weight and height:
>
> (lm3 <- summary(lm(scores ~ weight + height)))
>
> Call:
> lm(formula = scores ~ weight + height)
>
> Residuals:
>          1          2          3          4          5          6
>  1.189e+00 -1.946e+00 -2.165e-15  4.865e-01 -1.081e+00  1.351e+00
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) 49.45946   33.50261   1.476  0.23635
> weight       0.71351    0.08716   8.186  0.00381 **
> height      -0.50811    0.19096  -2.661  0.07628 .
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 1.677 on 3 degrees of freedom
> Multiple R-squared: 0.9573, Adjusted R-squared: 0.9288
> F-statistic: 33.6 on 2 and 3 DF, p-value: 0.008833
>
> It makes sense that the R-squared value is higher when both
> explanatory variables are added to the linear model: the more
> variables are added, the more variance is explained and the better the
> fit of the model. However, I do NOT understand why the p-value of
> height (Pr(>|t|) = 0.07628) is now almost significant. I also do NOT
> understand why the overall p-value of 0.008833 is less significant
> than the one from model lm1, which was 0.008134.
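Regarding the 0.07628 for height: note that the same number reappears as Pr(>F) in your anova table further down. That is not a coincidence. When a single term is added, the partial F-test from anova() and the t-test on that coefficient in the larger model are the same test (F = t^2, here (-2.661)^2 = 7.08). A quick check with your data (just a sketch; fit1/fit3 are my own names for your models):

```r
scores <- c(2, 6, 10, 12, 14, 20)
weight <- c(60, 70, 80, 75, 80, 85)
height <- c(180, 180, 190, 180, 180, 180)

fit1 <- lm(scores ~ weight)           # your lm1
fit3 <- lm(scores ~ weight + height)  # your lm3

t_height <- coef(summary(fit3))["height", "t value"]  # -2.661
t_height^2                       # about 7.08, the F value in anova(fit1, fit3)
anova(fit1, fit3)$"Pr(>F)"[2]    # 0.07628, identical to Pr(>|t|) for height
```

So the anova comparison cannot give a p-value near 1 while lm3 reports 0.07628 for height; by construction they must agree.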
> The p-value of weight being low (p = 0.00381) makes sense, as this
> factor "explains" the scores very well.
>
> After fitting the three models (lm1, lm2 and lm3) I wanted to compare
> lm1 with lm3 using the anova function, to check whether the factor
> height significantly improves the model. In other words, I wanted to
> check whether adding height to the model helps explain the scores of
> the players. The output of the anova looks as follows:
>
> lm1 <- lm(scores ~ weight)
> lm2 <- lm(scores ~ weight + height)
> anova(lm1, lm2)
> Analysis of Variance Table
>
> Model 1: scores ~ weight
> Model 2: scores ~ weight + height
>   Res.Df     RSS Df Sum of Sq      F  Pr(>F)
> 1      4 28.3333
> 2      3  8.4324  1    19.901 7.0801 0.07628 .
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> In my opinion the p-value should be almost 1, not close to
> significance (0.07), as we have seen from model lm2 that height does
> not "explain" the scores at all. I thought that a significant p-value
> here means that the factor height adds significant value to the model.
>
> I would be very grateful if anyone could help me interpret this R
> output.
>
> Best regards
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Help-needed-in-interpreting-linear-models-tp4291670p4291670.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.