On 17-Mar-09 23:04:25, Erin Hodgess wrote:
> Dear R People:
>
> Here is a small data frame and two particular formulas:
>
>> test.df
>             y  x
> 1  -0.9261650  1
> 2   1.5702700  2
> 3   0.1673920  3
> 4   0.7893085  4
> 5   0.3576875  5
> 6  -1.4620915  6
> 7  -0.5506215  7
> 8  -0.3480292  8
> 9  -1.2344036  9
> 10  0.8502660 10
>
>> summary(lm(exp(y)~x))
>
> Call:
> lm(formula = exp(y) ~ x)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -1.6360 -0.6435 -0.4722  0.4215  2.9127
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   2.1689     0.9782   2.217   0.0574 .
> x            -0.1368     0.1577  -0.868   0.4108
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 1.432 on 8 degrees of freedom
> Multiple R-squared: 0.08604, Adjusted R-squared: -0.0282
> F-statistic: 0.7532 on 1 and 8 DF, p-value: 0.4108
>
>> summary(lm(I(y^2)~x))
>
> Call:
> lm(formula = I(y^2) ~ x)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -0.9584 -0.6387 -0.2651  0.5754  1.4412
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  1.10084    0.62428   1.763    0.116
> x           -0.03813    0.10061  -0.379    0.715
>
> Residual standard error: 0.9138 on 8 degrees of freedom
> Multiple R-squared: 0.01764, Adjusted R-squared: -0.1052
> F-statistic: 0.1436 on 1 and 8 DF, p-value: 0.7146
>
> These both work just fine.
>
> My question is: when do you know to use I() and just the function of
> the variable, please?
>
> thanks in advance,
> Erin
> PS Happy St Pat's Day!
In the case of your formula you will find it works just as well
without I():

summary(lm(y^2 ~ x))

Call:
lm(formula = y^2 ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-0.9584 -0.6387 -0.2651  0.5754  1.4412

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.10084    0.62428   1.763    0.116
x           -0.03813    0.10061  -0.379    0.715

The point of I() is that it forces arithmetic evaluation of an
expression whose operators could otherwise be interpreted as
symbolic model-formula operators.

Thus if X1 and X2 are numeric, and you want to regress Y on the
numerical values of X1*X2, then you should use I(X1*X2). Written
without I(), the formula Y ~ X1*X2 is interpreted as (essentially)
fitting both linear terms and their interaction (which for numeric
variables is their product), namely

  Y = a + b1*X1 + b2*X2 + b12*X1*X2

In order to force the fitted equation to be

  Y = a + b*X1*X2

you would use Y ~ I(X1*X2).

This issue does not arise when a product is on the left-hand side
of the model formula, because the left-hand side is evaluated
arithmetically. That is why y^2 ~ x and I(y^2) ~ x give the same
fit above, and why you could simply use

  X1*X2 ~ Y

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Mar-09                                       Time: 23:31:21
------------------------------ XFMail ------------------------------
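
To see the difference Ted describes in practice, here is a minimal sketch in R. The data frame d, the seed and the simulated coefficients are made up for illustration; only the names X1, X2 and Y come from the reply. The last fit, using ^ on the right-hand side, is not discussed in the thread and is included as a closely related case where the same rule should apply.

```r
# Hypothetical simulated data; only the names X1, X2, Y come from the thread.
set.seed(1)
d <- data.frame(X1 = rnorm(20), X2 = rnorm(20))
d$Y <- 1 + 2 * d$X1 * d$X2 + rnorm(20)

# Right-hand side, no I(): * is a formula operator,
# so this fits X1, X2 and the interaction X1:X2.
fit_formula <- lm(Y ~ X1 * X2, data = d)

# Right-hand side with I(): the product is computed arithmetically
# and enters the model as a single regressor.
fit_product <- lm(Y ~ I(X1 * X2), data = d)

print(names(coef(fit_formula)))  # intercept, X1, X2 and X1:X2
print(names(coef(fit_product)))  # intercept plus one product term

# Left-hand side: evaluated arithmetically with or without I().
fit_lhs_plain <- lm(Y^2 ~ X1, data = d)
fit_lhs_I     <- lm(I(Y^2) ~ X1, data = d)
print(all.equal(unname(coef(fit_lhs_plain)), unname(coef(fit_lhs_I))))

# Related case (an addition, not from the thread): ^ on the right-hand
# side is also a formula operator, so X1^2 reduces to plain X1.
fit_power <- lm(Y ~ X1^2, data = d)
print(names(coef(fit_power)))
```

A rule of thumb that follows from this: on the right-hand side, wrap any arithmetic that uses +, -, *, /, ^ or : in I(); function calls such as exp(x) or log(x) need no wrapping because they contain no formula operators at the top level.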