OK, thanks to Matthew's pointer to an earlier thread with a similar request, I now have three faster solutions (see below). The last one is the fastest, but the first two are also faster than the for-loop, apply(lm(formula)) and sapply(lm(formula)) versions in my last mail.

One problem only: with lsfit I can't directly get measures such as r.squared ...
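
On the r.squared point: lsfit() does return the residuals, so R^2 can be recovered from them. The following is only a sketch, assuming an intercept is fitted (the lsfit default) and using simulated data shaped like the example further down. The names Ymat, fit, rss, tss, rsq.lsfit and rsq.cor are made up for illustration, and the seed is arbitrary.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
t.length <- 5
d.dim <- c(t.length, 7, 8, 9)        # dimensions: time, x, y, z
X <- c(1, 3, 2, 4, 5)
Y <- array(rep(1:t.length, prod(d.dim[2:4])) + rnorm(prod(d.dim), 0, 0.1), d.dim)

## reshape so that each column holds one series
Ymat <- matrix(Y, nrow = t.length)

## one lsfit() call for all series; residuals come back as a matrix
fit <- lsfit(X, Ymat)
rss <- colSums(fit$residuals^2)                    # residual sum of squares
tss <- colSums(sweep(Ymat, 2, colMeans(Ymat))^2)   # total sum of squares
rsq.lsfit <- array(1 - rss / tss, dim = d.dim[2:4])

## with a single predictor, R^2 is also the squared correlation
rsq.cor <- array(cor(X, Ymat)^2, dim = d.dim[2:4])
```

The cor() shortcut only holds for one predictor plus intercept; the residual-based version should carry over to a design matrix with several columns.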

---------------

## using lm with a matrix response (recommended by BDR)
date()
fit <- lm(array(c(Y), dim = c(t.length, prod(d.dim[2:4]))) ~ X)
## summary() of a matrix-response fit is a list with one summary per series
rsq <- sapply(summary(fit), function(s) s$r.squared)
names(rsq) <- NULL
rsq <- array(rsq, dim = d.dim[2:4])
date()



## using sapply and lsfit instead of lm (recommended by Kevin Wright)
date()
fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4])))
z <- sapply(split(as.vector(Y), fac), FUN = function(x) lsfit(X, x)$coef[2])
dim(z) <- d.dim[2:4]
date()

## using lsfit with a matrix response (this gives the slopes, not r.squared):
date()
slope <- lsfit(X, array(c(Y), dim = c(t.length, prod(d.dim[2:4]))))$coef[2,]
names(slope) <- NULL
slope <- array(slope, dim = d.dim[2:4])
date()

------------------

thanks
Christoph

Wiener, Matthew wrote:
Christoph --

There was just a thread on this earlier this week.  You can search in the
archives for the title:   "refitting lm() with same x, different y".

(Actually, it doesn't turn up in the R site search yet, at least for me.
But if you just go to the archive of recent messages, available through
CRAN, you can search on refitting and find it.  The original post was from
William Valdar, on April 19.)

Hope this helps,

Matt Wiener

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christoph Lehmann
Sent: Thursday, April 21, 2005 9:24 AM
To: R-help@stat.math.ethz.ch
Subject: [R] apply vs sapply vs loop - lm() call appl(y)ied on array


Dear useRs

(The code for the small example mentioned here is below.)

I have 7 * 8 * 9 = 504 series of data (each of length 5). For each of these series I want to compute an lm(), where the design matrix X is the same for all these computations.

The 504 series are in an array of dimension d.dim <- c(5, 7, 8, 9),
i.e. the first dimension holds the data series.
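
To make that layout concrete: because R stores arrays with the first index varying fastest, reshaping to a t.length x 504 matrix puts one complete series in each column. A small sketch with dummy values (Ymat, k, j, i and l are illustrative names only):

```r
t.length <- 5
d.dim <- c(t.length, 7, 8, 9)
Y <- array(seq_len(prod(d.dim)), d.dim)   # dummy values, so cells are easy to identify

## the first index varies fastest, so each column is one complete series
Ymat <- matrix(Y, nrow = t.length)

## column l of Ymat corresponds to Y[, k, j, i]
k <- 3; j <- 2; i <- 4
l <- k + (j - 1) * d.dim[2] + (i - 1) * d.dim[2] * d.dim[3]
stopifnot(identical(Ymat[, l], Y[, k, j, i]))
```

This is also why a per-column result vector can be folded back with array(result, dim = d.dim[2:4]) without any reordering.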

The lm computation needs performance optimization, since in fact the dimensions are much larger. I compared the following approaches:

a for-loop, apply, and sapply. All of these require roughly the same computation time. I was astonished, since I expected at least sapply to outperform the for-loop.

Do you have another, faster solution for me? Many thanks.

here is the code
## ------------------------------------------------------
t.length <- 5
d.dim <- c(t.length,7,8,9) # dimensions: time, x, y, z
Y <- array( rep(1:t.length, prod(d.dim[2:4])) + rnorm(prod(d.dim), 0, 0.1), d.dim)
X <- c(1,3,2,4,5)


## -------- performance tests
## using for loop
date()
z <- rep(0, prod(d.dim[2:4]))
l <- 0
for (i in 1:dim(Y)[4])
  for (j in 1:dim(Y)[3])
   for (k in 1:dim(Y)[2]) {
     l <- l + 1
     z[l] <- summary(lm(Y[,k, j, i] ~ X))$r.squared
   }
date()

## using apply
date()
z <- apply(Y, 2:4, function(x) summary(lm(x ~ X))$r.squared)
date()

## using sapply
date()
fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4])))
z <- sapply(split(as.vector(Y), fac), FUN = function(x) summary(lm(x ~ X))$r.squared)
dim(z) <- d.dim[2:4]
date()


## ------------------------------------------------------
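
A note on the timing itself: date() only resolves to whole seconds, so for a small example like this one system.time() is probably the better tool. A rough sketch, assuming the same simulated data as above; time.apply, time.mlm, z.apply and z.mlm are illustrative names, and the actual timings will depend on the machine.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
t.length <- 5
d.dim <- c(t.length, 7, 8, 9)
X <- c(1, 3, 2, 4, 5)
Y <- array(rep(1:t.length, prod(d.dim[2:4])) + rnorm(prod(d.dim), 0, 0.1), d.dim)

## one lm() call per series
time.apply <- system.time(
  z.apply <- apply(Y, 2:4, function(y) summary(lm(y ~ X))$r.squared)
)

## a single lm() call with a matrix response
time.mlm <- system.time({
  fit <- lm(matrix(Y, nrow = t.length) ~ X)
  z.mlm <- array(sapply(summary(fit), function(s) s$r.squared), dim = d.dim[2:4])
})

## elapsed seconds for both variants
print(c(apply = time.apply[["elapsed"]], mlm = time.mlm[["elapsed"]]))
```

Both variants should give the same r.squared values up to numerical tolerance, which can be checked with all.equal(z.apply, z.mlm).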


