Re: [R] Replacing for loop with tapply!?
ations and 365 days. > > Regards, Adai > > > > On Fri, 2005-06-10 at 20:05 +0200, Sander Oom wrote: >>Dear all, >> >>Dimitris and Andy, thanks for your great help. I have progressed to the >>following code which runs very fast and effective: >> >>mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10) >>mat[mat>45] <- NA >>mat<-NA >>mat >>temps <- c(35, 37, 39) >>ind <- rbind( >> t(sapply(temps, function(temp) >> rowSums(mat > temp, na.rm=TRUE) )), >> rowSums(!is.na(mat), na.rm=FALSE), >> apply(mat, 1, max, na.rm=TRUE)) >>ind <- t(ind) >>ind >> >>However, some weather stations have missing values for the whole year. >>Unfortunately, the code breaks down (when uncommenting mat<-NA). >> >>I have tried 'ifelse' statements in the functions, but it becomes even >>more of a mess. I could subset the matrix before hand, but this would >>mean merging with a complete matrix afterwards to make it compatible >>with other years. That would slow things down. >> >>How can I make the code robust for rows containing all missing values? >> >>Thanks for your help, >> >>Sander. >> >>Dimitris Rizopoulos wrote: >>>for the maximum you could use something like: >>> >>>ind[, 1] <- apply(mat, 2, max) >>> >>>I hope it helps. >>> >>>Best, >>>Dimitris >>> >>> >>>Dimitris Rizopoulos >>>Ph.D. Student >>>Biostatistical Centre >>>School of Public Health >>>Catholic University of Leuven >>> >>>Address: Kapucijnenvoer 35, Leuven, Belgium >>>Tel: +32/16/336899 >>>Fax: +32/16/337015 >>>Web: http://www.med.kuleuven.ac.be/biostat/ >>> http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm >>> >>> >>> >>>- Original Message - >>>From: "Sander Oom" <[EMAIL PROTECTED]> >>>To: "Dimitris Rizopoulos" <[EMAIL PROTECTED]> >>>Cc: >>>Sent: Friday, June 10, 2005 12:10 PM >>>Subject: Re: [R] Replacing for loop with tapply!? >>> >>> >>>>Thanks Dimitris, >>>> >>>>Very impressive! Much faster than before. >>>> >>>>Thanks to new found R.basic, I can simply rotate the result with >>>>rotate270{R.basic}: >>>> >>>>>mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) >>>>>temps <- c(37, 39, 41) >>>>># >>>>>#ind <- matrix(0, length(temps), ncol(mat)) >>>>>ind <- matrix(0, 4, ncol(mat)) >>>>>(startDate <- date()) >>>>[1] "Fri Jun 10 12:08:01 2005" >>>>>for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) >>>>>ind[4, ] <- colMeans(max(mat)) >>>>Error in colMeans(max(mat)) : 'x' must be an array of at least two >>>>dimensions >>>>>(endDate <- date()) >>>>[1] "Fri Jun 10 12:08:02 2005" >>>>>ind <- rotate270(ind) >>>>>ind[1:10,] >>>> V4 V3 V2 V1 >>>>1 0 56 75 80 >>>>2 0 46 53 60 >>>>3 0 50 58 67 >>>>4 0 60 72 80 >>>>5 0 59 68 76 >>>>6 0 55 67 74 >>>>7 0 62 77 93 >>>>8 0 45 57 67 >>>>9 0 57 68 75 >>>>10 0 61 66 76 >>>> >>>>However, I have not managed to get the row maximum using your >>>>method? It >>>>should be 50 for most rows, but my first guess code gives an error! >>>> >>>>Any suggestions? >>>> >>>>Sander >>>> >>>> >>>> >>>>Dimitris Rizopoulos wrote: >>>>>maybe you are looking for something along these lines: >>>>> >>>>>mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) >>>>>temps <- c(37, 39, 41) >>>>># >>>>>ind <- matrix(0, length(temps), ncol(mat)) >>>>>for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) >>>>>ind >>>>> >>>>> >>>>>I hope it helps. >>>>> >>>>>Best, >>>>>Dimitris >>>>> >>>>> >>>>>Dimitris Rizopoulos >>>&
Re: [R] Replacing for loop with tapply!?
Hi On 10 Jun 2005 at 20:05, Sander Oom wrote: > Dear all, > > Dimitris and Andy, thanks for your great help. I have progressed to > the following code which runs very fast and effective: > > mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10) > mat[mat>45] <- NA > mat<-NA By this you redefine mat as > str(mat) logi NA > and your code gives an error that it has to have some dimensions + apply(mat, 1, max, na.rm=TRUE)) Error in rowSums(mat > temp, na.rm = TRUE) : 'x' must be an array of at least two dimensions > If your matrix has one row full of NA's it only complains but computes a value. > mat[3,]<-NA > temps <- c(35, 37, 39) > ind <- rbind( + t(sapply(temps, function(temp) +rowSums(mat > temp, na.rm=TRUE) )), + rowSums(!is.na(mat), na.rm=FALSE), + apply(mat, 1, max, na.rm=TRUE)) Warning message: no finite arguments to max; returning -Inf > ind <- t(ind) > ind > ind [,1] [,2] [,3] [,4] [,5] [1,]5539 48 [2,]1119 42 [3,]0000 -Inf > mat > temps <- c(35, 37, 39) > ind <- rbind( > t(sapply(temps, function(temp) >rowSums(mat > temp, na.rm=TRUE) )), > rowSums(!is.na(mat), na.rm=FALSE), > apply(mat, 1, max, na.rm=TRUE)) > ind <- t(ind) > ind > > However, some weather stations have missing values for the whole year. > Unfortunately, the code breaks down (when uncommenting mat<-NA). > > I have tried 'ifelse' statements in the functions, but it becomes even > more of a mess. I could subset the matrix before hand, but this would > mean merging with a complete matrix afterwards to make it compatible > with other years. That would slow things down. > > How can I make the code robust for rows containing all missing values? which(rowSums(!is.na(mat))==0) This gives you indices which lines of your matrix has all values NA and you can use it for fine tuning of your code. What you need to do depends on what results do you want, how ind matrix should look like after processing mat with one or more rows full of NA's. HTH Petr > > Thanks for your help, > > Sander. > > Dimitris Rizopoulos wrote: > > for the maximum you could use something like: > > > > ind[, 1] <- apply(mat, 2, max) > > > > I hope it helps. > > > > Best, > > Dimitris > > > > > > Dimitris Rizopoulos > > Ph.D. Student > > Biostatistical Centre > > School of Public Health > > Catholic University of Leuven > > > > Address: Kapucijnenvoer 35, Leuven, Belgium > > Tel: +32/16/336899 > > Fax: +32/16/337015 > > Web: http://www.med.kuleuven.ac.be/biostat/ > > http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm > > > > > > > > - Original Message - > > From: "Sander Oom" <[EMAIL PROTECTED]> > > To: "Dimitris Rizopoulos" <[EMAIL PROTECTED]> Cc: > > Sent: Friday, June 10, 2005 12:10 PM > > Subject: Re: [R] Replacing for loop with tapply!? > > > > > >>Thanks Dimitris, > >> > >>Very impressive! Much faster than before. > >> > >>Thanks to new found R.basic, I can simply rotate the result with > >>rotate270{R.basic}: > >> > >>>mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) > >>>temps <- c(37, 39, 41) > >>># > >>>#ind <- matrix(0, length(temps), ncol(mat)) > >>>ind <- matrix(0, 4, ncol(mat)) > >>>(startDate <- date()) > >>[1] "Fri Jun 10 12:08:01 2005" > >>>for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) > >>>ind[4, ] <- colMeans(max(mat)) > >>Error in colMeans(max(mat)) : 'x' must be an array of at least two > >>dimensions > >>>(endDate <- date()) > >>[1] "Fri Jun 10 12:08:02 2005" > >>>ind <- rotate270(ind) > >>>ind[1:10,] > >> V4 V3 V2 V1 > >>1 0 56 75 80 > >>2 0 46 53 60 > >>3 0 50 58 67 > >>4 0 60 72 80 > >>5 0 59 68 76 > >>6 0 55 67 74 > >>7 0 62 77 93 > >>8 0 45 57 67 > >>9 0 57 68 75 > >>10 0 61 66 76 > >> > >>However, I have not managed to get the row maximum using your > >>method? It > >>should be 50 for most rows, but my first guess code gives an error! > >> > >>Any suggestions? > >> > >>Sander > >> > >> > >> > &g
Re: [R] Replacing for loop with tapply!?
OK, so you want to find some summary statistics for each column, where some columns could be completely missing. Writing a small wrapper should help. When you use apply(), you are actually applying a function to every column (or row). First, let us simulate a dataset with 15 days/rows and 10 stations/columns ### simulate data set.seed(1)# for reproducibility mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10) mat[ mat > 45 ] <- NA # create some missing values mat[ ,9 ] <- NA # station 9's data is completely missing Here are two example of such wrappers : find.stats1 <- function( data, threshold=c(37,39,41) ){ n <- length(threshold) out <- matrix( nrow=(n + 1), ncol=ncol(data) ) # initialise out[1, ] <- apply(data, 2, function(x) ifelse( all(is.na(x)), NA, max(x, na.rm=T) )) for(i in 1:n) out[ i+1, ] <- colSums( data > threshold[i], na.rm=T ) rownames(out) <- c( "daily_max", paste("above", threshold, sep="_") ) colnames(out) <- rownames(data) # name of the stations return( out ) } find.stats2 <- function( data, threshold=c(37,39,41) ){ n <- length(threshold) excess <- numeric( n ) out<- matrix( nrow=(n + 1), ncol=ncol(data) ) # initialise good <- which( apply( data, 2, function(x) !all(is.na(x)) ) ) # colums that are not completely missing out[ , good] <- apply( data[ , good], 2, function(x){ m <- max( x, na.rm=T ) for(i in 1:n){ excess[i] <- sum( x > threshold[i], na.rm=TRUE ) } return( c(m, excess) ) } ) rownames(out) <- c( "daily_max", paste("above", threshold, sep="_") ) colnames(out) <- rownames(data) # name of the stations return( out ) } find.stats1( mat ) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] daily_max 44 42 39 41 45 43 42 45 NA42 above_37 212132210 1 above_39 210132110 1 above_41 210022110 1 find.stats2( mat ) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] daily_max 44 42 39 41 45 43 42 45 NA42 above_37 21213221 NA 1 above_39 21013211 NA 1 above_41 21002211 NA 1 On my laptop 'find.stats1' and 'find.stats2' (which is more flexible) takes 7 and 6 seconds respectively to execute on a dataset with 1 stations and 365 days. Regards, Adai On Fri, 2005-06-10 at 20:05 +0200, Sander Oom wrote: > Dear all, > > Dimitris and Andy, thanks for your great help. I have progressed to the > following code which runs very fast and effective: > > mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10) > mat[mat>45] <- NA > mat<-NA > mat > temps <- c(35, 37, 39) > ind <- rbind( > t(sapply(temps, function(temp) >rowSums(mat > temp, na.rm=TRUE) )), > rowSums(!is.na(mat), na.rm=FALSE), > apply(mat, 1, max, na.rm=TRUE)) > ind <- t(ind) > ind > > However, some weather stations have missing values for the whole year. > Unfortunately, the code breaks down (when uncommenting mat<-NA). > > I have tried 'ifelse' statements in the functions, but it becomes even > more of a mess. I could subset the matrix before hand, but this would > mean merging with a complete matrix afterwards to make it compatible > with other years. That would slow things down. > > How can I make the code robust for rows containing all missing values? > > Thanks for your help, > > Sander. > > Dimitris Rizopoulos wrote: > > for the maximum you could use something like: > > > > ind[, 1] <- apply(mat, 2, max) > > > > I hope it helps. > > > > Best, > > Dimitris > > > > > > Dimitris Rizopoulos > > Ph.D. Student > > Biostatistical Centre > > School of Public Health > > Catholic University of Leuven > > > > Address: Kapucijnenvoer 35, Leuven, Belgium > > Tel: +32/16/336899 > > Fax: +32/16/337015 > > Web: http://www.med.kuleuven.ac.be/biostat/ > > http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm > > > > > > > > - Original Message - > > From: "Sander Oom" <[EMAIL PROTECTED]> > > To: "Dimitris Rizopoulos" <[EMAIL PROTECTED]> > > Cc: > > Sent: Friday, June 10, 2005 12:10 PM > > Subject: Re: [R] Replacing for loop with tapply!? > > > > > >>Thanks Dimitris, > >> > >>Very impressive! Much faster than before.
Re: [R] Replacing for loop with tapply!?
Sander Oom wrote: >Dear all, > >We have a large data set with temperature data for weather stations >across the globe (15000 stations). > >For each station, we need to calculate the number of days a certain >temperature is exceeded. > >So far we used the following S code, where mat88 is a matrix containing >rows of 365 daily temperatures for each of 15000 weather stations: > > m <- 37 > n <- 2 > outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88)) > for(i in 1:nrow(mat88)) { > # i <- 3 > row1 <- as.data.frame(df88[i, ]) > temprow37 <- select.rows(row1, row1 > m) > temprow39 <- select.rows(row1, row1 > m + n) > temprow41 <- select.rows(row1, row1 > m + 2 * n) > outmat88[i, 1] <- max(row1, na.rm = T) > outmat88[i, 2] <- count.rows(temprow37) > outmat88[i, 3] <- count.rows(temprow39) > outmat88[i, 4] <- count.rows(temprow41) > } > outmat88 > > > What you need is not tapply but apply. Something like apply(mat88, 1, function(x) sum(x > 30)) where your treshold should replace 30 and the `1' refers to rows. For multiple tresholds: apply(mat88, 1, function(x) c( sum(x>20), sum(x>25), sum(x>30))) Kjetil >We have transferred the data to a more potent Linux box running R, but >still hope to speed up the code. > >I know a for loop should be avoided when looking for speed. I also know >the answer is in something like tapply, but my understanding of these >commands is still to limited to see the solution. Could someone show me >the way!? > >Thanks in advance, > >Sander. > > -- Kjetil Halvorsen. Peace is the most effective weapon of mass construction. -- Mahdi Elmandjra -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Replacing for loop with tapply!?
Dear all, Dimitris and Andy, thanks for your great help. I have progressed to the following code which runs very fast and effective: mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10) mat[mat>45] <- NA mat<-NA mat temps <- c(35, 37, 39) ind <- rbind( t(sapply(temps, function(temp) rowSums(mat > temp, na.rm=TRUE) )), rowSums(!is.na(mat), na.rm=FALSE), apply(mat, 1, max, na.rm=TRUE)) ind <- t(ind) ind However, some weather stations have missing values for the whole year. Unfortunately, the code breaks down (when uncommenting mat<-NA). I have tried 'ifelse' statements in the functions, but it becomes even more of a mess. I could subset the matrix before hand, but this would mean merging with a complete matrix afterwards to make it compatible with other years. That would slow things down. How can I make the code robust for rows containing all missing values? Thanks for your help, Sander. Dimitris Rizopoulos wrote: > for the maximum you could use something like: > > ind[, 1] <- apply(mat, 2, max) > > I hope it helps. > > Best, > Dimitris > > > Dimitris Rizopoulos > Ph.D. Student > Biostatistical Centre > School of Public Health > Catholic University of Leuven > > Address: Kapucijnenvoer 35, Leuven, Belgium > Tel: +32/16/336899 > Fax: +32/16/337015 > Web: http://www.med.kuleuven.ac.be/biostat/ > http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm > > > > - Original Message - > From: "Sander Oom" <[EMAIL PROTECTED]> > To: "Dimitris Rizopoulos" <[EMAIL PROTECTED]> > Cc: > Sent: Friday, June 10, 2005 12:10 PM > Subject: Re: [R] Replacing for loop with tapply!? > > >>Thanks Dimitris, >> >>Very impressive! Much faster than before. >> >>Thanks to new found R.basic, I can simply rotate the result with >>rotate270{R.basic}: >> >>>mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) >>>temps <- c(37, 39, 41) >>># >>>#ind <- matrix(0, length(temps), ncol(mat)) >>>ind <- matrix(0, 4, ncol(mat)) >>>(startDate <- date()) >>[1] "Fri Jun 10 12:08:01 2005" >>>for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) >>>ind[4, ] <- colMeans(max(mat)) >>Error in colMeans(max(mat)) : 'x' must be an array of at least two >>dimensions >>>(endDate <- date()) >>[1] "Fri Jun 10 12:08:02 2005" >>>ind <- rotate270(ind) >>>ind[1:10,] >> V4 V3 V2 V1 >>1 0 56 75 80 >>2 0 46 53 60 >>3 0 50 58 67 >>4 0 60 72 80 >>5 0 59 68 76 >>6 0 55 67 74 >>7 0 62 77 93 >>8 0 45 57 67 >>9 0 57 68 75 >>10 0 61 66 76 >> >>However, I have not managed to get the row maximum using your >>method? It >>should be 50 for most rows, but my first guess code gives an error! >> >>Any suggestions? >> >>Sander >> >> >> >>Dimitris Rizopoulos wrote: >>>maybe you are looking for something along these lines: >>> >>>mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) >>>temps <- c(37, 39, 41) >>># >>>ind <- matrix(0, length(temps), ncol(mat)) >>>for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) >>>ind >>> >>> >>>I hope it helps. >>> >>>Best, >>>Dimitris >>> >>> >>>Dimitris Rizopoulos >>>Ph.D. Student >>>Biostatistical Centre >>>School of Public Health >>>Catholic University of Leuven >>> >>>Address: Kapucijnenvoer 35, Leuven, Belgium >>>Tel: +32/16/336899 >>>Fax: +32/16/337015 >>>Web: http://www.med.kuleuven.ac.be/biostat/ >>> http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm >>> >>> >>>- Original Message - >>>From: "Sander Oom" <[EMAIL PROTECTED]> >>>To: >>>Sent: Friday, June 10, 2005 10:50 AM >>>Subject: [R] Replacing for loop with tapply!? >>> >>> >>>>Dear all, >>>> >>>>We have a large data set with temperature data for weather stations >>>>across the globe (15000 stations). >>>> >>>>For each station, we need to calculate the number of days a certain >>>>temperature is exceeded. >>>> >>>>So far we used the following S code, where mat88 is a matrix >>>>containi
Re: [R] Replacing for loop with tapply!?
for the maximum you could use something like: ind[, 1] <- apply(mat, 2, max) I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/16/336899 Fax: +32/16/337015 Web: http://www.med.kuleuven.ac.be/biostat/ http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm - Original Message - From: "Sander Oom" <[EMAIL PROTECTED]> To: "Dimitris Rizopoulos" <[EMAIL PROTECTED]> Cc: Sent: Friday, June 10, 2005 12:10 PM Subject: Re: [R] Replacing for loop with tapply!? > Thanks Dimitris, > > Very impressive! Much faster than before. > > Thanks to new found R.basic, I can simply rotate the result with > rotate270{R.basic}: > > > mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) > > temps <- c(37, 39, 41) > > # > > #ind <- matrix(0, length(temps), ncol(mat)) > > ind <- matrix(0, 4, ncol(mat)) > > (startDate <- date()) > [1] "Fri Jun 10 12:08:01 2005" > > for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) > > ind[4, ] <- colMeans(max(mat)) > Error in colMeans(max(mat)) : 'x' must be an array of at least two > dimensions > > (endDate <- date()) > [1] "Fri Jun 10 12:08:02 2005" > > ind <- rotate270(ind) > > ind[1:10,] >V4 V3 V2 V1 > 1 0 56 75 80 > 2 0 46 53 60 > 3 0 50 58 67 > 4 0 60 72 80 > 5 0 59 68 76 > 6 0 55 67 74 > 7 0 62 77 93 > 8 0 45 57 67 > 9 0 57 68 75 > 10 0 61 66 76 > > However, I have not managed to get the row maximum using your > method? It > should be 50 for most rows, but my first guess code gives an error! > > Any suggestions? > > Sander > > > > Dimitris Rizopoulos wrote: >> maybe you are looking for something along these lines: >> >> mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) >> temps <- c(37, 39, 41) >> # >> ind <- matrix(0, length(temps), ncol(mat)) >> for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) >> ind >> >> >> I hope it helps. >> >> Best, >> Dimitris >> >> >> Dimitris Rizopoulos >> Ph.D. Student >> Biostatistical Centre >> School of Public Health >> Catholic University of Leuven >> >> Address: Kapucijnenvoer 35, Leuven, Belgium >> Tel: +32/16/336899 >> Fax: +32/16/337015 >> Web: http://www.med.kuleuven.ac.be/biostat/ >> http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm >> >> >> - Original Message - >> From: "Sander Oom" <[EMAIL PROTECTED]> >> To: >> Sent: Friday, June 10, 2005 10:50 AM >> Subject: [R] Replacing for loop with tapply!? >> >> >>>Dear all, >>> >>>We have a large data set with temperature data for weather stations >>>across the globe (15000 stations). >>> >>>For each station, we need to calculate the number of days a certain >>>temperature is exceeded. >>> >>>So far we used the following S code, where mat88 is a matrix >>>containing >>>rows of 365 daily temperatures for each of 15000 weather stations: >>> >>>m <- 37 >>>n <- 2 >>>outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88)) >>>for(i in 1:nrow(mat88)) { >>># i <- 3 >>>row1 <- as.data.frame(df88[i, ]) >>>temprow37 <- select.rows(row1, row1 > m) >>>temprow39 <- select.rows(row1, row1 > m + n) >>>temprow41 <- select.rows(row1, row1 > m + 2 * n) >>>outmat88[i, 1] <- max(row1, na.rm = T) >>>outmat88[i, 2] <- count.rows(temprow37) >>>outmat88[i, 3] <- count.rows(temprow39) >>>outmat88[i, 4] <- count.rows(temprow41) >>>} >>>outmat88 >>> >>>We have transferred the data to a more potent Linux box running R, >>>but >>>still hope to speed up the code. >>> >>>I know a for loop should be avoided when looking for speed. I also >>>know >>>the answer is in something like tapply, but my understanding of >>>these >>>commands is still to limited to see the solution. Could someone >>>show >>>me >>>the way!? >>> >>>Thanks in advance, >>> >>>Sander. >>>-- >>> >>>Dr Sander P. Oom >>>Animal, Plant and Environmental Sciences, >>
Re: [R] Replacing for loop with tapply!?
Thanks Dimitris, Very impressive! Much faster than before. Thanks to new found R.basic, I can simply rotate the result with rotate270{R.basic}: > mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) > temps <- c(37, 39, 41) > # > #ind <- matrix(0, length(temps), ncol(mat)) > ind <- matrix(0, 4, ncol(mat)) > (startDate <- date()) [1] "Fri Jun 10 12:08:01 2005" > for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) > ind[4, ] <- colMeans(max(mat)) Error in colMeans(max(mat)) : 'x' must be an array of at least two dimensions > (endDate <- date()) [1] "Fri Jun 10 12:08:02 2005" > ind <- rotate270(ind) > ind[1:10,] V4 V3 V2 V1 1 0 56 75 80 2 0 46 53 60 3 0 50 58 67 4 0 60 72 80 5 0 59 68 76 6 0 55 67 74 7 0 62 77 93 8 0 45 57 67 9 0 57 68 75 10 0 61 66 76 However, I have not managed to get the row maximum using your method? It should be 50 for most rows, but my first guess code gives an error! Any suggestions? Sander Dimitris Rizopoulos wrote: > maybe you are looking for something along these lines: > > mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) > temps <- c(37, 39, 41) > # > ind <- matrix(0, length(temps), ncol(mat)) > for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) > ind > > > I hope it helps. > > Best, > Dimitris > > > Dimitris Rizopoulos > Ph.D. Student > Biostatistical Centre > School of Public Health > Catholic University of Leuven > > Address: Kapucijnenvoer 35, Leuven, Belgium > Tel: +32/16/336899 > Fax: +32/16/337015 > Web: http://www.med.kuleuven.ac.be/biostat/ > http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm > > > - Original Message - > From: "Sander Oom" <[EMAIL PROTECTED]> > To: > Sent: Friday, June 10, 2005 10:50 AM > Subject: [R] Replacing for loop with tapply!? > > >>Dear all, >> >>We have a large data set with temperature data for weather stations >>across the globe (15000 stations). >> >>For each station, we need to calculate the number of days a certain >>temperature is exceeded. >> >>So far we used the following S code, where mat88 is a matrix >>containing >>rows of 365 daily temperatures for each of 15000 weather stations: >> >>m <- 37 >>n <- 2 >>outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88)) >>for(i in 1:nrow(mat88)) { >># i <- 3 >>row1 <- as.data.frame(df88[i, ]) >>temprow37 <- select.rows(row1, row1 > m) >>temprow39 <- select.rows(row1, row1 > m + n) >>temprow41 <- select.rows(row1, row1 > m + 2 * n) >>outmat88[i, 1] <- max(row1, na.rm = T) >>outmat88[i, 2] <- count.rows(temprow37) >>outmat88[i, 3] <- count.rows(temprow39) >>outmat88[i, 4] <- count.rows(temprow41) >>} >>outmat88 >> >>We have transferred the data to a more potent Linux box running R, >>but >>still hope to speed up the code. >> >>I know a for loop should be avoided when looking for speed. I also >>know >>the answer is in something like tapply, but my understanding of >>these >>commands is still to limited to see the solution. Could someone show >>me >>the way!? >> >>Thanks in advance, >> >>Sander. >>-- >> >>Dr Sander P. Oom >>Animal, Plant and Environmental Sciences, >>University of the Witwatersrand >>Private Bag 3, Wits 2050, South Africa >>Tel (work) +27 (0)11 717 64 04 >>Tel (home) +27 (0)18 297 44 51 >>Fax +27 (0)18 299 24 64 >>Email [EMAIL PROTECTED] >>Web www.oomvanlieshout.net/sander >> >>__ >>R-help@stat.math.ethz.ch mailing list >>https://stat.ethz.ch/mailman/listinfo/r-help >>PLEASE do read the posting guide! >>http://www.R-project.org/posting-guide.html >> > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > -- Dr Sander P. Oom Animal, Plant and Environmental Sciences, University of the Witwatersrand Private Bag 3, Wits 2050, South Africa Tel (work) +27 (0)11 717 64 04 Tel (home) +27 (0)18 297 44 51 Fax +27 (0)18 299 24 64 Email [EMAIL PROTECTED] Web www.oomvanlieshout.net/sander __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Replacing for loop with tapply!?
maybe you are looking for something along these lines: mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) temps <- c(37, 39, 41) # ind <- matrix(0, length(temps), ncol(mat)) for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) ind I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/16/336899 Fax: +32/16/337015 Web: http://www.med.kuleuven.ac.be/biostat/ http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm - Original Message - From: "Sander Oom" <[EMAIL PROTECTED]> To: Sent: Friday, June 10, 2005 10:50 AM Subject: [R] Replacing for loop with tapply!? > Dear all, > > We have a large data set with temperature data for weather stations > across the globe (15000 stations). > > For each station, we need to calculate the number of days a certain > temperature is exceeded. > > So far we used the following S code, where mat88 is a matrix > containing > rows of 365 daily temperatures for each of 15000 weather stations: > > m <- 37 > n <- 2 > outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88)) > for(i in 1:nrow(mat88)) { > # i <- 3 > row1 <- as.data.frame(df88[i, ]) > temprow37 <- select.rows(row1, row1 > m) > temprow39 <- select.rows(row1, row1 > m + n) > temprow41 <- select.rows(row1, row1 > m + 2 * n) > outmat88[i, 1] <- max(row1, na.rm = T) > outmat88[i, 2] <- count.rows(temprow37) > outmat88[i, 3] <- count.rows(temprow39) > outmat88[i, 4] <- count.rows(temprow41) > } > outmat88 > > We have transferred the data to a more potent Linux box running R, > but > still hope to speed up the code. > > I know a for loop should be avoided when looking for speed. I also > know > the answer is in something like tapply, but my understanding of > these > commands is still to limited to see the solution. Could someone show > me > the way!? > > Thanks in advance, > > Sander. > -- > > Dr Sander P. Oom > Animal, Plant and Environmental Sciences, > University of the Witwatersrand > Private Bag 3, Wits 2050, South Africa > Tel (work) +27 (0)11 717 64 04 > Tel (home) +27 (0)18 297 44 51 > Fax +27 (0)18 299 24 64 > Email [EMAIL PROTECTED] > Web www.oomvanlieshout.net/sander > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Replacing for loop with tapply!?
Dear all, We have a large data set with temperature data for weather stations across the globe (15000 stations). For each station, we need to calculate the number of days a certain temperature is exceeded. So far we used the following S code, where mat88 is a matrix containing rows of 365 daily temperatures for each of 15000 weather stations: m <- 37 n <- 2 outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88)) for(i in 1:nrow(mat88)) { # i <- 3 row1 <- as.data.frame(df88[i, ]) temprow37 <- select.rows(row1, row1 > m) temprow39 <- select.rows(row1, row1 > m + n) temprow41 <- select.rows(row1, row1 > m + 2 * n) outmat88[i, 1] <- max(row1, na.rm = T) outmat88[i, 2] <- count.rows(temprow37) outmat88[i, 3] <- count.rows(temprow39) outmat88[i, 4] <- count.rows(temprow41) } outmat88 We have transferred the data to a more potent Linux box running R, but still hope to speed up the code. I know a for loop should be avoided when looking for speed. I also know the answer is in something like tapply, but my understanding of these commands is still to limited to see the solution. Could someone show me the way!? Thanks in advance, Sander. -- Dr Sander P. Oom Animal, Plant and Environmental Sciences, University of the Witwatersrand Private Bag 3, Wits 2050, South Africa Tel (work) +27 (0)11 717 64 04 Tel (home) +27 (0)18 297 44 51 Fax +27 (0)18 299 24 64 Email [EMAIL PROTECTED] Web www.oomvanlieshout.net/sander __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html