Hi R help,
Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical?

Sincerely,
Andrea Bernasconi DG

PROBLEM EXAMPLE

I consider the Latin square example described on page 157 of the book:
Statistics for Experimenters: Design, Innovation, and Discovery
by George E. P. Box, J. Stuart Hunter, William G. Hunter.

This example uses the data file /BHH2-Data/tab0408.dat from ftp://ftp.wiley.com/ in /sci_tech_med/statistics_experimenters/BHH2-Data.zip.

The file tab0408.dat contains the following DATA:

> DATA
   driver cars additive  y
1       1    1        A 19
2       2    1        D 23
3       3    1        B 15
4       4    1        C 19
5       1    2        B 24
6       2    2        C 24
7       3    2        D 14
8       4    2        A 18
9       1    3        D 23
10      2    3        A 19
11      3    3        C 15
12      4    3        B 19
13      1    4        C 26
14      2    4        B 30
15      3    4        A 16
16      4    4        D 16

Now

> summary( aov(MODEL, data=DATA) )
            Df Sum Sq Mean Sq F value Pr(>F)
cars         1   12.8  12.800  0.8889 0.3680
driver       1  115.2 115.200  8.0000 0.0179 *
additive     3   40.0  13.333  0.9259 0.4634
Residuals   10  144.0  14.400
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These results differ from the book's result on p. 159, because "aov" treats "cars" and "driver" as numerical variables: each gets 1 degree of freedom (a linear trend) instead of the 3 expected for a four-level factor.

BRUTE FORCE SOLUTION

After manually transforming "cars" and "driver" into categorical variables, I obtain the correct result:

> DATA_AB
   driver cars additive  y
1      D1   C1        A 19
2      D2   C1        D 23
3      D3   C1        B 15
4      D4   C1        C 19
5      D1   C2        B 24
6      D2   C2        C 24
7      D3   C2        D 14
8      D4   C2        A 18
9      D1   C3        D 23
10     D2   C3        A 19
11     D3   C3        C 15
12     D4   C3        B 19
13     D1   C4        C 26
14     D2   C4        B 30
15     D3   C4        A 16
16     D4   C4        D 16

> summary( aov(MODEL, data=DATA_AB) )
            Df Sum Sq Mean Sq F value   Pr(>F)
cars         3     24   8.000     1.5 0.307174
driver       3    216  72.000    13.5 0.004466 **
additive     3     40  13.333     2.5 0.156490
Residuals    6     32   5.333
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

QUESTION

Which is the easiest (most elegant) way to force "driver" and "cars" from DATA to be treated as categorical variables by "aov"?

More generally, which is the easiest way to force "aov" to treat numerical variables as categorical?
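For reference, one possible way to do the conversion without editing the data by hand is sketched below. It rebuilds the DATA printed above and turns the numeric codes into factors with factor() before calling aov(). Note the assumptions: MODEL is never shown in the post, so the formula y ~ cars + driver + additive is a guess inferred from the term order in the printed ANOVA tables, and the name DATA_F is purely illustrative.

```r
# Rebuild the data frame from the values printed in the post.
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = c("A", "D", "B", "C",
               "B", "C", "D", "A",
               "D", "A", "C", "B",
               "C", "B", "A", "D"),
  y        = c(19, 23, 15, 19,
               24, 24, 14, 18,
               23, 19, 15, 19,
               26, 30, 16, 16)
)

# ASSUMPTION: MODEL is not defined in the post; this formula is inferred
# from the order of terms in the printed ANOVA tables.
MODEL <- y ~ cars + driver + additive

# Convert the numeric codes to factors so that aov() estimates one effect
# per level instead of a single linear slope. DATA_F is a hypothetical name.
DATA_F <- transform(DATA,
                    driver   = factor(driver),
                    cars     = factor(cars),
                    additive = factor(additive))

fit <- aov(MODEL, data = DATA_F)
tab <- summary(fit)[[1]]   # ANOVA table as a data frame
print(tab)
```

With this formula the degrees of freedom and sums of squares should match the brute-force table (3, 3, 3, 6 and 24, 216, 40, 32). An alternative that leaves DATA untouched is to wrap the variables directly in the formula, for example aov(y ~ factor(cars) + factor(driver) + additive, data = DATA).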
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.