I was surprised to find that just changing the base level of a factor variable changed the number of significant coefficients in the solution.
I was surprised at this and want to know how I should choose the order of the factors, if the order affects the result. Here is the small example. It is taken from 'The R Book', Crawley p. 365. The data is at http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/competition.txt In R > comp<-read.table("C:\\Temp\\competition.txt", header=T) > attach(comp) Data has dependent variable 'biomass' and different types of 'clipping' that were done: Control (none), n25, n50, r10, r5: > summary(comp) biomass clipping Min. :415.0 control:6 1st Qu.:508.8 n25 :6 Median :568.0 n50 :6 Mean :561.8 r10 :6 3rd Qu.:631.8 r5 :6 Max. :731.0 List mean Biomass of each type of Clipping: > aggregate (comp$biomass, list (comp$clipping) , mean) Group.1 x control 465.1667 n25 553.3333 n50 569.3333 r10 610.6667 r5 610.5000 do regression - get same result as book p. 365 Clipping type 'control' is not in list of coefficients because it is first alphabetically so it is folded into Intercept > model<-lm(biomass ~ clipping) > summary(model) Call: lm(formula = biomass ~ clipping) Residuals: Min 1Q Median 3Q Max -103.333 -49.667 3.417 43.375 177.667 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 465.17 28.75 16.177 9.4e-15 *** clippingn25 88.17 40.66 2.168 0.03987 * clippingn50 104.17 40.66 2.562 0.01683 * clippingr10 145.50 40.66 3.578 0.00145 ** clippingr5 145.33 40.66 3.574 0.00147 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 70.43 on 25 degrees of freedom Multiple R-squared: 0.4077, Adjusted R-squared: 0.3129 F-statistic: 4.302 on 4 and 25 DF, p-value: 0.008752 Relevel - make 'n25' the base level of Clipping: > comp$clipping <- relevel (comp$clipping, ref="n25") > summary(comp) biomass clipping Min. :415.0 n25 :6 1st Qu.:508.8 control:6 Median :568.0 n50 :6 Mean :561.8 r10 :6 3rd Qu.:631.8 r5 :6 Max. :731.0 Redo LM with releveled data > modelRelev<-lm(biomass~clipping, data=comp) Different results. (Some parts, Residuals and Std Errors, are the same) Especially note the Pr and Signifcance columns are different. > summary(modelRelev) Call: lm(formula = biomass ~ clipping, data = comp) Residuals: Min 1Q Median 3Q Max -103.333 -49.667 3.417 43.375 177.667 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 553.33 28.75 19.244 <2e-16 *** clippingcontrol -88.17 40.66 -2.168 0.0399 * clippingn50 16.00 40.66 0.393 0.6973 clippingr10 57.33 40.66 1.410 0.1709 clippingr5 57.17 40.66 1.406 0.1721 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 70.43 on 25 degrees of freedom Multiple R-squared: 0.4077, Adjusted R-squared: 0.3129 F-statistic: 4.302 on 4 and 25 DF, p-value: 0.008752 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.