On Mon, 24 Aug 2009, big permie wrote:

Dear R users,

I have a matrix a and a classification vector b such that

str(a)
num [1:50, 1:800000]
and
str(b)
Factor w/ 3 levels "cond1","cond2","cond3"

I'd like to do an anova on all 800000 columns and record the F statistic for
each test; I currently do this using

f.stat.vec <- numeric(length(a[1,])

for (i in 1:length(a[1,]) {
 f.test.frame <- data.frame(nums = a[,i], cond = b)
 aov.vox <- aov(nums ~ cond, data = f.test.frame)
 f.stat <- summary(aov.vox)[[1]][1,4]
 f.stat.vec[i] <- f.stat
}

The problem is that this code takes about 70 minutes to run.

Using lsfit(), my five year old windows XP PC does 100k columns in about 40 seconds, so I reckon that in 5 minutes it could do 800000:

x <- factor(sample(1:3,50,repl=T))
x.mat <- model.matrix( ~x )[, 2:3 ] # drop intercept here
y <- matrix(rnorm(50*100000),nc=100000)
system.time({fit <- lsfit( x.mat, y );ls.pr <- ls.print( fit,  print.it=FALSE 
)})
   user  system elapsed
  39.16    0.58   39.79

# check F-statistic:
summary(as.numeric( ls.pr[[ 'summary' ]][, "F-value" ]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.2932  0.7095  1.0460  1.4270 17.1100

# theoretical 1st Qu., Median, 3rd Qu match

qf(c(.25,0.5, 0.75 ), 2, 47 )
[1] 0.2894502 0.7034708 1.4280000



If it needed to be faster, I would take the part out of ls.print() that does just the F-statistic and work on it.

Of course, a newer computer would be a lot faster, too.

HTH,

Chuck


Is there a faster way to do an anova & record the F stat for each column?

Any help would be appreciated.

Thanks
Heath

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to