On Mon, 24 Aug 2009, big permie wrote:
Dear R users,
I have a matrix a and a classification vector b such that
str(a)
num [1:50, 1:800000]
and
str(b)
Factor w/ 3 levels "cond1","cond2","cond3"
I'd like to do an anova on all 800000 columns and record the F statistic for
each test; I currently do this using
f.stat.vec <- numeric(length(a[1,])
for (i in 1:length(a[1,]) {
f.test.frame <- data.frame(nums = a[,i], cond = b)
aov.vox <- aov(nums ~ cond, data = f.test.frame)
f.stat <- summary(aov.vox)[[1]][1,4]
f.stat.vec[i] <- f.stat
}
The problem is that this code takes about 70 minutes to run.
Using lsfit(), my five year old windows XP PC does 100k columns in about
40 seconds, so I reckon that in 5 minutes it could do 800000:
x <- factor(sample(1:3,50,repl=T))
x.mat <- model.matrix( ~x )[, 2:3 ] # drop intercept here
y <- matrix(rnorm(50*100000),nc=100000)
system.time({fit <- lsfit( x.mat, y );ls.pr <- ls.print( fit, print.it=FALSE
)})
user system elapsed
39.16 0.58 39.79
# check F-statistic:
summary(as.numeric( ls.pr[[ 'summary' ]][, "F-value" ]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.2932 0.7095 1.0460 1.4270 17.1100
# theoretical 1st Qu., Median, 3rd Qu match
qf(c(.25,0.5, 0.75 ), 2, 47 )
[1] 0.2894502 0.7034708 1.4280000
If it needed to be faster, I would take the part out of ls.print() that
does just the F-statistic and work on it.
Of course, a newer computer would be a lot faster, too.
HTH,
Chuck
Is there a faster way to do an anova & record the F stat for each column?
Any help would be appreciated.
Thanks
Heath
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.