Dear all,

Dimitris and Andy, thanks for your great help. I have progressed to the following code, which runs very fast and effectively:

mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10)
mat[mat>45] <- NA
#mat<-NA
mat
temps <- c(35, 37, 39)
ind <- rbind(
    t(sapply(temps, function(temp)
        rowSums(mat > temp, na.rm=TRUE) )),
    rowSums(!is.na(mat), na.rm=FALSE),
    apply(mat, 1, max, na.rm=TRUE))
ind <- t(ind)
ind

However, some weather stations have missing values for the whole year.
Unfortunately, the code breaks down (when uncommenting mat<-NA).

I have tried 'ifelse' statements in the functions, but it becomes even
more of a mess. I could subset the matrix beforehand, but this would
mean merging with a complete matrix afterwards to make it compatible
with other years. That would slow things down.

How can I make the code robust for rows containing all missing values?

Thanks for your help,

Sander.

Dimitris Rizopoulos wrote:
> for the maximum you could use something like:
>
> ind[, 1] <- apply(mat, 2, max)
>
> I hope it helps.
>
> Best,
> Dimitris
>
> ----
> Dimitris Rizopoulos
> Ph.D. Student
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
>
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/16/336899
> Fax: +32/16/337015
> Web: http://www.med.kuleuven.ac.be/biostat/
>      http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
>
>
> ----- Original Message -----
> From: "Sander Oom" <[EMAIL PROTECTED]>
> To: "Dimitris Rizopoulos" <[EMAIL PROTECTED]>
> Cc: <r-help@stat.math.ethz.ch>
> Sent: Friday, June 10, 2005 12:10 PM
> Subject: Re: [R] Replacing for loop with tapply!?
>
>
>>Thanks Dimitris,
>>
>>Very impressive! Much faster than before.
>>
>>Thanks to new found R.basic, I can simply rotate the result with
>>rotate270{R.basic}:
>>
>>>mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000)
>>>temps <- c(37, 39, 41)
>>>#################
>>>#ind <- matrix(0, length(temps), ncol(mat))
>>>ind <- matrix(0, 4, ncol(mat))
>>>(startDate <- date())
>>[1] "Fri Jun 10 12:08:01 2005"
>>>for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i])
>>>ind[4, ] <- colMeans(max(mat))
>>Error in colMeans(max(mat)) : 'x' must be an array of at least two
>>dimensions
>>>(endDate <- date())
>>[1] "Fri Jun 10 12:08:02 2005"
>>>ind <- rotate270(ind)
>>>ind[1:10,]
>>    V4 V3 V2 V1
>>1    0 56 75 80
>>2    0 46 53 60
>>3    0 50 58 67
>>4    0 60 72 80
>>5    0 59 68 76
>>6    0 55 67 74
>>7    0 62 77 93
>>8    0 45 57 67
>>9    0 57 68 75
>>10   0 61 66 76
>>
>>However, I have not managed to get the row maximum using your method?
>>It should be 50 for most rows, but my first guess code gives an error!
>>
>>Any suggestions?
>>
>>Sander
>>
>>
>>
>>Dimitris Rizopoulos wrote:
>>>maybe you are looking for something along these lines:
>>>
>>>mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000)
>>>temps <- c(37, 39, 41)
>>>#################
>>>ind <- matrix(0, length(temps), ncol(mat))
>>>for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i])
>>>ind
>>>
>>>
>>>I hope it helps.
>>>
>>>Best,
>>>Dimitris
>>>
>>>
>>>----- Original Message -----
>>>From: "Sander Oom" <[EMAIL PROTECTED]>
>>>To: <r-help@stat.math.ethz.ch>
>>>Sent: Friday, June 10, 2005 10:50 AM
>>>Subject: [R] Replacing for loop with tapply!?
>>>
>>>
>>>>Dear all,
>>>>
>>>>We have a large data set with temperature data for weather stations
>>>>across the globe (15000 stations).
>>>>
>>>>For each station, we need to calculate the number of days a certain
>>>>temperature is exceeded.
>>>>
>>>>So far we used the following S code, where mat88 is a matrix
>>>>containing rows of 365 daily temperatures for each of 15000 weather
>>>>stations:
>>>>
>>>>m <- 37
>>>>n <- 2
>>>>outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88))
>>>>for(i in 1:nrow(mat88)) {
>>>># i <- 3
>>>>row1 <- as.data.frame(df88[i, ])
>>>>temprow37 <- select.rows(row1, row1 > m)
>>>>temprow39 <- select.rows(row1, row1 > m + n)
>>>>temprow41 <- select.rows(row1, row1 > m + 2 * n)
>>>>outmat88[i, 1] <- max(row1, na.rm = T)
>>>>outmat88[i, 2] <- count.rows(temprow37)
>>>>outmat88[i, 3] <- count.rows(temprow39)
>>>>outmat88[i, 4] <- count.rows(temprow41)
>>>>}
>>>>outmat88
>>>>
>>>>We have transferred the data to a more potent Linux box running R,
>>>>but still hope to speed up the code.
>>>>
>>>>I know a for loop should be avoided when looking for speed. I also
>>>>know the answer is in something like tapply, but my understanding of
>>>>these commands is still too limited to see the solution. Could
>>>>someone show me the way!?
>>>>
>>>>Thanks in advance,
>>>>
>>>>Sander.
>>>>--
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
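
On the question at the top of this thread (rows that are entirely NA), here is a minimal sketch of one possible approach, not a tested answer from the list. Two things are worth noting first. As written, `mat<-NA` replaces the whole matrix with a single NA, so `rowSums` fails because its argument is no longer an array; the sketch assumes the intent was to blank out individual rows, e.g. `mat[3, ] <- NA`. Also, `max(x, na.rm=TRUE)` on an all-NA row returns -Inf with a warning, so the sketch returns NA for such rows instead. The helper name `row_summary`, the column names, and the fixed seed are made up for illustration.

```r
# Hypothetical helper, not from the thread: per-row summaries that
# tolerate rows consisting entirely of NA.
row_summary <- function(mat, temps) {
  # Days above each threshold; NA comparisons are dropped, so an
  # all-NA row simply counts 0.
  counts <- vapply(temps,
                   function(temp) rowSums(mat > temp, na.rm = TRUE),
                   numeric(nrow(mat)))
  counts <- matrix(counts, nrow = nrow(mat))  # stay a matrix even for one row

  # Number of non-missing observations per row.
  n_obs <- rowSums(!is.na(mat))

  # Row maximum; an all-NA row yields NA rather than -Inf plus a warning.
  row_max <- apply(mat, 1, function(x) {
    if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
  })

  out <- cbind(counts, n_obs, row_max)
  colnames(out) <- c(paste0("gt", temps), "n_obs", "max")
  out
}

set.seed(1)       # assumption: fixed seed only so the example is repeatable
mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10)
mat[mat > 45] <- NA
mat[3, ] <- NA    # assumption: one station with no data for the whole year
temps <- c(35, 37, 39)
ind <- row_summary(mat, temps)
ind
```

The result has one row per station with the columns in the same order as `t(ind)` in the post above (threshold counts, number of non-missing days, maximum), so no subsetting and re-merging should be needed; stations without data show 0 counts, 0 observations and NA for the maximum.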