Re: [R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?

2010-06-14 Thread Andrea Bernasconi DG
I think I found the solution!

> cc <- factor(DATA$cars)    # cars as a categorical variable
> dd <- factor(DATA$driver)  # driver as a categorical variable
> MODEL <- y ~ cc + dd + additive
> summary(aov(MODEL, data = DATA))
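
For readers without the data file, here is a self-contained sketch of the same idea. The data frame is retyped from the DATA listing in the original post (assuming that listing matches tab0408.dat), and factor() is applied directly inside the formula, so no helper variables are needed:

```r
# Data retyped from the DATA listing in this thread (a 4x4 Latin square).
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = factor(c("A", "D", "B", "C",  "B", "C", "D", "A",
                      "D", "A", "C", "B",  "C", "B", "A", "D")),
  y        = c(19, 23, 15, 19,  24, 24, 14, 18,
               23, 19, 15, 19,  26, 30, 16, 16)
)

# factor() inside the formula makes aov() treat the integer codes as categories.
fit <- aov(y ~ factor(cars) + factor(driver) + additive, data = DATA)
summary(fit)
```

Each term should then carry 3 Df, with 6 Df left for the residuals, as in the brute-force result reported in the original post.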

On 14 Jun, 2010, at 2:52 PM, Andrea Bernasconi DG wrote:

> Hi R help,
> 
> Which is the easiest (most elegant) way to force "aov" to treat numerical 
> variables as categorical ?
> 
> Sincerely, Andrea Bernasconi DG

Re: [R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?

2010-06-14 Thread Ivan Calandra
Hi,

See ?factor
e.g.: DATA$driver <- factor(DATA$driver)
See also the levels= argument if you want to change the order of your levels.

HTH,
Ivan
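
A minimal sketch of what the levels= and labels= arguments of factor() do. The vector `driver` is a hypothetical stand-in for DATA$driver, and the "D1".."D4" labels simply mimic the hand-made DATA_AB table from the original post:

```r
# Hypothetical stand-in for the integer driver codes in DATA$driver.
driver <- rep(1:4, times = 4)

# levels= fixes the order of the categories; labels= renames them.
driver_f <- factor(driver, levels = 1:4, labels = paste0("D", 1:4))
levels(driver_f)

# A different levels= order changes which level comes first
# (the first level is the reference level under the default contrasts).
driver_rev <- factor(driver, levels = 4:1)
levels(driver_rev)
```

Reordering levels does not change the ANOVA table itself; it only affects how the individual coefficients are reported.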

On 6/14/2010 14:52, Andrea Bernasconi DG wrote:
> Hi R help,
>
> Which is the easiest (most elegant) way to force "aov" to treat numerical 
> variables as categorical ?
>
> Sincerely, Andrea Bernasconi DG

[R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?

2010-06-14 Thread Andrea Bernasconi DG
Hi R help,

Which is the easiest (most elegant) way to force "aov" to treat numerical 
variables as categorical ?

Sincerely, Andrea Bernasconi DG

PROBLEM EXAMPLE

I consider the Latin square example described on page 157 of the book:
Statistics for Experimenters: Design, Innovation, and Discovery by George E. P.
Box, J. Stuart Hunter, William G. Hunter.

This example uses the data file /BHH2-Data/tab0408.dat from ftp://ftp.wiley.com/
in /sci_tech_med/statistics_experimenters/BHH2-Data.zip.

The file tab0408.dat contains the following DATA:
> DATA
   driver cars additive  y
1       1    1        A 19
2       2    1        D 23
3       3    1        B 15
4       4    1        C 19
5       1    2        B 24
6       2    2        C 24
7       3    2        D 14
8       4    2        A 18
9       1    3        D 23
10      2    3        A 19
11      3    3        C 15
12      4    3        B 19
13      1    4        C 26
14      2    4        B 30
15      3    4        A 16
16      4    4        D 16

Now
> summary( aov(MODEL, data=DATA) )
            Df Sum Sq Mean Sq F value Pr(>F)
cars         1   12.8  12.800  0.8889 0.3680
driver       1  115.2 115.200  8.0000 0.0179 *
additive     3   40.0  13.333  0.9259 0.4634
Residuals   10  144.0  14.400
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

These results differ from the book's results on p. 159, since "cars" and "driver" are
treated as numerical variables by "aov".
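
A quick way to confirm this diagnosis is to inspect the column classes. The sketch below rebuilds DATA from the listing above (the integer storage of the codes is an assumption about how the file was read) and shows that each integer column is fitted as a single slope, hence 1 Df instead of 3:

```r
# Data retyped from the DATA listing above; driver and cars are integer codes.
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = factor(c("A", "D", "B", "C",  "B", "C", "D", "A",
                      "D", "A", "C", "B",  "C", "B", "A", "D")),
  y        = c(19, 23, 15, 19,  24, 24, 14, 18,
               23, 19, 15, 19,  26, 30, 16, 16)
)

# Column classes: anything that is not a factor enters the model as a number.
col_classes <- sapply(DATA, class)
col_classes

# Numeric predictors get one regression slope each, so 1 Df per term.
fit_num <- aov(y ~ cars + driver + additive, data = DATA)
summary(fit_num)
```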

BRUTE FORCE SOLUTION

Manually transforming "cars" and "driver" into categorical variables, I obtain 
the correct result:
> DATA_AB
   driver cars additive  y
1      D1   C1        A 19
2      D2   C1        D 23
3      D3   C1        B 15
4      D4   C1        C 19
5      D1   C2        B 24
6      D2   C2        C 24
7      D3   C2        D 14
8      D4   C2        A 18
9      D1   C3        D 23
10     D2   C3        A 19
11     D3   C3        C 15
12     D4   C3        B 19
13     D1   C4        C 26
14     D2   C4        B 30
15     D3   C4        A 16
16     D4   C4        D 16
> summary( aov(MODEL, data=DATA_AB) )
            Df Sum Sq Mean Sq F value   Pr(>F)
cars         3     24   8.000     1.5 0.307174
driver       3    216  72.000    13.5 0.004466 **
additive     3     40  13.333     2.5 0.156490
Residuals    6     32   5.333
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

QUESTION

Which is the easiest (most elegant) way to force "driver" and "cars" from DATA 
to be treated as categorical variables by "aov"?
More generally, which is the easiest way to force "aov"  to treat numerical 
variables as categorical ?

Sincerely, Andrea Bernasconi DG
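
For the more general question, one possible approach (a sketch, not the only way) is to convert all the relevant columns in one step with lapply(). The vector name `cols` is introduced here purely for illustration, and DATA is retyped from the listing above:

```r
# Data retyped from the DATA listing above.
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = factor(c("A", "D", "B", "C",  "B", "C", "D", "A",
                      "D", "A", "C", "B",  "C", "B", "A", "D")),
  y        = c(19, 23, 15, 19,  24, 24, 14, 18,
               23, 19, 15, 19,  26, 30, 16, 16)
)

# Columns that hold category codes rather than measurements (illustrative name).
cols <- c("driver", "cars")

# Convert every listed column to a factor in place.
DATA[cols] <- lapply(DATA[cols], factor)

# The original formula now works unchanged.
fit <- aov(y ~ cars + driver + additive, data = DATA)
summary(fit)
```

Converting the columns once in the data frame means every later model call sees them as categorical, without repeating factor() in each formula.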
