Re: [R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?
I think I found the solution!

    > cc <- factor(cars)
    > dd <- factor(driver)
    > MODEL <- y ~ cc + dd + additive
    > summary(aov(MODEL, data = DATA))

On 14 Jun, 2010, at 2:52 PM, Andrea Bernasconi DG wrote:

> Hi R help,
>
> Which is the easiest (most elegant) way to force "aov" to treat numerical
> variables as categorical?
>
> Sincerely, Andrea Bernasconi DG
Re: [R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?
Hi,

See ?factor, e.g.:

    DATA$driver <- factor(DATA$driver)

See also the levels= argument if you want to change the order of your levels.

HTH,
Ivan

On 6/14/2010 14:52, Andrea Bernasconi DG wrote:

> Hi R help,
>
> Which is the easiest (most elegant) way to force "aov" to treat numerical
> variables as categorical?
>
> Sincerely, Andrea Bernasconi DG
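The suggestion above can be sketched as follows. This is only a minimal illustration, not the poster's exact session: the data frame is retyped from the table in the original post, `block_cols` is a hypothetical helper name, and the definition of `MODEL` is an assumption inferred from the row names of the ANOVA output (the thread never shows it).

```r
# Data retyped from the table in the original post (file tab0408.dat).
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = factor(c("A", "D", "B", "C",  "B", "C", "D", "A",
                      "D", "A", "C", "B",  "C", "B", "A", "D")),
  y        = c(19, 23, 15, 19,  24, 24, 14, 18,
               23, 19, 15, 19,  26, 30, 16, 16)
)

# Convert both blocking columns in one step; lapply() returns a list of
# factors that is assigned back into the data frame.
block_cols <- c("driver", "cars")   # hypothetical helper name
DATA[block_cols] <- lapply(DATA[block_cols], factor)
str(DATA)

# Assumed model formula (not shown in the thread).
MODEL <- y ~ cars + driver + additive
tab <- summary(aov(MODEL, data = DATA))[[1]]
print(tab)

# The levels= argument sets the level order explicitly, here reversed.
DATA$driver <- factor(DATA$driver, levels = c("4", "3", "2", "1"))
print(levels(DATA$driver))
```

Converting the columns inside the data frame means every later model fitted with `data = DATA` treats them as categorical, without extra objects in the workspace.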
[R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?
Hi R help,

Which is the easiest (most elegant) way to force "aov" to treat numerical
variables as categorical?

Sincerely, Andrea Bernasconi DG

PROBLEM EXAMPLE

I consider the Latin square example described on page 157 of the book:
Statistics for Experimenters: Design, Innovation, and Discovery by George E.
P. Box, J. Stuart Hunter, William G. Hunter.

This example uses the data file /BHH2-Data/tab0408.dat from
ftp://ftp.wiley.com/ in /sci_tech_med/statistics_experimenters/BHH2-Data.zip.

The file tab0408.dat contains the following DATA:

> DATA
   driver cars additive  y
1       1    1        A 19
2       2    1        D 23
3       3    1        B 15
4       4    1        C 19
5       1    2        B 24
6       2    2        C 24
7       3    2        D 14
8       4    2        A 18
9       1    3        D 23
10      2    3        A 19
11      3    3        C 15
12      4    3        B 19
13      1    4        C 26
14      2    4        B 30
15      3    4        A 16
16      4    4        D 16

Now

> summary( aov(MODEL, data=DATA) )
            Df Sum Sq Mean Sq F value Pr(>F)
cars         1   12.8  12.800  0.8889 0.3680
driver       1  115.2 115.200  8.0000 0.0179 *
additive     3   40.0  13.333  0.9259 0.4634
Residuals   10  144.0  14.400
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These results differ from the book's result on p. 159, since "cars" and
"driver" are treated as numerical variables by "aov".

BRUTE FORCE SOLUTION

Manually transforming "cars" and "driver" into categorical variables, I
obtain the correct result:

> DATA_AB
   driver cars additive  y
1      D1   C1        A 19
2      D2   C1        D 23
3      D3   C1        B 15
4      D4   C1        C 19
5      D1   C2        B 24
6      D2   C2        C 24
7      D3   C2        D 14
8      D4   C2        A 18
9      D1   C3        D 23
10     D2   C3        A 19
11     D3   C3        C 15
12     D4   C3        B 19
13     D1   C4        C 26
14     D2   C4        B 30
15     D3   C4        A 16
16     D4   C4        D 16

> summary( aov(MODEL, data=DATA_AB) )
            Df Sum Sq Mean Sq F value   Pr(>F)
cars         3     24   8.000     1.5 0.307174
driver       3    216  72.000    13.5 0.004466 **
additive     3     40  13.333     2.5 0.156490
Residuals    6     32   5.333
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

QUESTION

Which is the easiest (most elegant) way to force "driver" and "cars" from
DATA to be treated as categorical variables by "aov"?
More generally, which is the easiest way to force "aov" to treat numerical
variables as categorical?
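The difference between the two ANOVA tables in this message can be reproduced with a short sketch. It rests on some assumptions: the data frame is retyped from the DATA table above rather than read from tab0408.dat, and the formula `y ~ cars + driver + additive` is inferred from the row names of the output, since the definition of MODEL is not shown. Wrapping a numeric column in `factor()` directly inside the formula is one possible way to get the categorical fit without building a second data frame.

```r
# Data retyped from the DATA table above; driver and cars are integers.
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = factor(c("A", "D", "B", "C",  "B", "C", "D", "A",
                      "D", "A", "C", "B",  "C", "B", "A", "D")),
  y        = c(19, 23, 15, 19,  24, 24, 14, 18,
               23, 19, 15, 19,  26, 30, 16, 16)
)

# Column classes show why aov() fits slopes for driver and cars.
print(sapply(DATA, class))

# Numeric columns enter the model as single linear terms (1 Df each).
fit_num <- aov(y ~ cars + driver + additive, data = DATA)
tab_num <- summary(fit_num)[[1]]
print(tab_num)

# factor() inside the formula gives one effect per level (3 Df each here).
fit_fac <- aov(y ~ factor(cars) + factor(driver) + additive, data = DATA)
tab_fac <- summary(fit_fac)[[1]]
print(tab_fac)
```

Checking `sapply(DATA, class)` before fitting is a quick way to spot columns that were read as numbers but are really labels.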