[R] slow computation of mixed ANOVA using aov

Steven Lacey Fri, 18 Mar 2005 08:12:27 -0800

Dear R-help list,
 
I am trying to do a mixed ANOVA on a 8960 x 5 dataframe. I have 3 factors
for which I want to test all main effects and interactions : f1 (40 levels),
f2 (7 levels), and f3 (4 levels). I also have a subject factor, subject, and
a dependent variable, dv. 
 
Some more information about the factors:
f2 is a between-subject factor. That is, for each level of f2 there are 8
nested levels of the subject factor. For example, levels 1-8 of subject are
nested in level 1 of f2. Levels 9-16 of subject are nested in level 2 of f2.
In other words, the subjects that participated in any level of f2 are
different from the subjects that participated in any other level of f2. 
 
In contrast, f1 and f3 are within-subject factors. That is, for any one of
the 56 subjects, we have a 160 medians corresponding to each condition from
a complete crossing of factors f1 and f2. While it is true that we do have
replicate observations for any subject in each of these conditions, we take
the median of those values and operate as if there is only a single
observation for each subject in each of the 160 within-subject conditions. 
 
Below is code that will generate dataframe d, which is just like the one I
am working with:
 
f1<-gl(40,1,8960,ordered=T)
f2<-gl(7,1280,8960,ordered=T)
f3<-gl(4,40,8960,ordered=T)
subject<-gl(56,160,8960,ordered=T)
dv<-rnorm(8960,mean=500,sd=50)
d <- data.frame(f1,f2,f3,f4,dv)
 
To run the mixed ANOVA I use the following call (modeled after J. Baron):
aov(dv~f1*f2*f3+Error(subject/(f1*f2)),data=d)


WARNING: Exert caution when running the aov command. I have run the exact
same command on Windows and Unix machines (each with 1GB of RAM; allocated
up to 3 or 4GB of memory for the analysis ) and it has taken many, many
hours to finish. That said, this is not a new problem posted on the R-help
list. There are several posts where analysts have run into similar problems.
My general impression of these posts, and correct me if I am wrong, is that
because aov is a wrapper around lm, the extra time is required to build and
manipulate a design matrix (via qr decomposition) that is 8960 x several
thousand columns large! Is that so? It seems fitting because if I call aov
with only a single factor, then it returns in a few seconds. 
 
In order to test if the computational slowness was something unique to R, I
ran the same analysis, including all 3 factors, in SPSS. To my surprise SPSS
returned almost instantaneously. I do not know much about the algorithm in
SPSS, but I suspect it may be calculating condition means and sums of
squares rather than generating a design matrix. Does that sound plausible?
 
At this point I am a dedicated R user. However, I do the kind of analysis
described above quite often. It is important that my statistical package be
able to handle it more efficiently than what I have been able to get R to do
at this point. Am I doing anything obviously wrong? Is there a method in R
that more closely resembles the algorithm used in SPSS? If not, are there
any other methods R has to do these kind of analyses? Could I split up the
analysis in some principled way to ease the processing demand on R?
 
Thanks in advanvce for any further insight into this issue, 
Steve Lacey

        [[alternative HTML version deleted]]

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] slow computation of mixed ANOVA using aov

Reply via email to