Hi R help,

What is the easiest (most elegant) way to force "aov" to treat numerical 
variables as categorical?

Sincerely, Andrea Bernasconi DG

PROBLEM EXAMPLE

I consider the Latin square example described on page 157 of the book:
Statistics for Experimenters: Design, Innovation, and Discovery by George E. P. 
Box, J. Stuart Hunter, William G. Hunter.

This example uses the data file /BHH2-Data/tab0408.dat from ftp://ftp.wiley.com/ 
in /sci_tech_med/statistics_experimenters/BHH2-Data.zip.

The file tab0408.dat contains the following DATA:
> DATA
   driver cars additive  y
1       1    1        A 19
2       2    1        D 23
3       3    1        B 15
4       4    1        C 19
5       1    2        B 24
6       2    2        C 24
7       3    2        D 14
8       4    2        A 18
9       1    3        D 23
10      2    3        A 19
11      3    3        C 15
12      4    3        B 19
13      1    4        C 26
14      2    4        B 30
15      3    4        A 16
16      4    4        D 16

Now, fitting MODEL (y as a function of cars, driver, and additive) gives:
> summary( aov(MODEL, data=DATA) )
            Df Sum Sq Mean Sq F value Pr(>F)  
cars         1   12.8  12.800  0.8889 0.3680  
driver       1  115.2 115.200  8.0000 0.0179 *
additive     3   40.0  13.333  0.9259 0.4634  
Residuals   10  144.0  14.400                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

These results differ from the book's result on p. 159, because "aov" treats 
"cars" and "driver" as numerical variables and fits each with a single degree 
of freedom instead of three.
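The cause can be checked directly. The sketch below rebuilds DATA by hand from the listing above instead of reading tab0408.dat, and it assumes that MODEL is the formula y ~ cars + driver + additive, which the post never shows explicitly.

```r
# DATA re-entered by hand from the listing above (not read from tab0408.dat).
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = factor(c("A", "D", "B", "C",
                      "B", "C", "D", "A",
                      "D", "A", "C", "B",
                      "C", "B", "A", "D")),
  y        = c(19, 23, 15, 19, 24, 24, 14, 18,
               23, 19, 15, 19, 26, 30, 16, 16)
)

# Assumption: the post does not show MODEL; this formula matches the terms
# printed in the ANOVA tables.
MODEL <- y ~ cars + driver + additive

# driver and cars are stored as numbers, so each is fitted as a single slope.
print(sapply(DATA, class))

fit_numeric <- aov(MODEL, data = DATA)
print(summary(fit_numeric))
```

With numeric columns, cars and driver each take 1 degree of freedom, which leaves 10 for the residuals instead of 6.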

BRUTE FORCE SOLUTION

Manually transforming "cars" and "driver" into categorical variables, I obtain 
the correct result:
> DATA_AB
   driver cars additive  y
1      D1   C1        A 19
2      D2   C1        D 23
3      D3   C1        B 15
4      D4   C1        C 19
5      D1   C2        B 24
6      D2   C2        C 24
7      D3   C2        D 14
8      D4   C2        A 18
9      D1   C3        D 23
10     D2   C3        A 19
11     D3   C3        C 15
12     D4   C3        B 19
13     D1   C4        C 26
14     D2   C4        B 30
15     D3   C4        A 16
16     D4   C4        D 16
> summary( aov(MODEL, data=DATA_AB) )
            Df Sum Sq Mean Sq F value   Pr(>F)   
cars         3     24   8.000     1.5 0.307174   
driver       3    216  72.000    13.5 0.004466 **
additive     3     40  13.333     2.5 0.156490   
Residuals    6     32   5.333                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
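The manual relabeling does not have to be typed in by hand. As a rough sketch, assuming the same hand-entered DATA and the assumed formula y ~ cars + driver + additive for MODEL, the D1..D4 and C1..C4 labels can be generated with paste0():

```r
# DATA re-entered by hand from the listing above (not read from tab0408.dat).
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = factor(c("A", "D", "B", "C",
                      "B", "C", "D", "A",
                      "D", "A", "C", "B",
                      "C", "B", "A", "D")),
  y        = c(19, 23, 15, 19, 24, 24, 14, 18,
               23, 19, 15, 19, 26, 30, 16, 16)
)

# Assumption: MODEL is not defined in the post; this matches the printed terms.
MODEL <- y ~ cars + driver + additive

# Prefix the numeric codes so they become labels rather than numbers.
DATA_AB <- transform(DATA,
                     driver = paste0("D", driver),
                     cars   = paste0("C", cars))

fit_ab <- aov(MODEL, data = DATA_AB)
print(summary(fit_ab))
```

Because the relabeled columns are no longer numeric, the model treats them as categorical, and each term gets 3 degrees of freedom.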

QUESTION

What is the easiest (most elegant) way to force "driver" and "cars" from DATA 
to be treated as categorical variables by "aov"?
More generally, what is the easiest way to force "aov" to treat numerical 
variables as categorical?
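One common approach, sketched here under the same assumptions (DATA re-entered by hand, MODEL assumed to be y ~ cars + driver + additive), is to wrap the numeric columns in factor(), either inside the formula or once in the data frame. The name DATA_F is only illustrative.

```r
# DATA re-entered by hand from the listing above (not read from tab0408.dat).
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = factor(c("A", "D", "B", "C",
                      "B", "C", "D", "A",
                      "D", "A", "C", "B",
                      "C", "B", "A", "D")),
  y        = c(19, 23, 15, 19, 24, 24, 14, 18,
               23, 19, 15, 19, 26, 30, 16, 16)
)

# Option 1: convert inside the formula and leave DATA untouched.
fit_formula <- aov(y ~ factor(cars) + factor(driver) + additive, data = DATA)
print(summary(fit_formula))

# Option 2: convert the columns once, then reuse the original formula.
# Assumption: MODEL is not defined in the post; this matches the printed terms.
MODEL <- y ~ cars + driver + additive
DATA_F <- DATA  # illustrative name for the converted copy
DATA_F[c("driver", "cars")] <- lapply(DATA_F[c("driver", "cars")], factor)
fit_columns <- aov(MODEL, data = DATA_F)
print(summary(fit_columns))
```

Option 1 is convenient for a one-off fit; option 2 keeps the formula readable and keeps the term names in the table as plain cars and driver.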

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.