dear R wizards: here is a strange question for the day. It seems to me that nrow() is very slow. Let me explain what I mean:
ds <- data.frame( NA, x=rnorm(10000) )   ## a sample data set

## doing nothing takes virtually no time
> system.time( { for (i in 1:10000) NA } )
   user  system elapsed
  0.000   0.000   0.001

## this is something that should take time; we need to add 10,000 values 10,000 times
> system.time( { for (i in 1:10000) mean(ds$x) } )
   user  system elapsed
  0.416   0.001   0.416

## alas, this should be very fast, since it just reads off an attribute of ds,
## yet it takes almost a third of the time of mean()!
> system.time( { for (i in 1:10000) nrow(ds) } )
   user  system elapsed
  0.124   0.001   0.125

## here is an alternative way to get the number of rows, which is already much faster:
> system.time( { for (i in 1:10000) length(ds$x) } )
   user  system elapsed
  0.041   0.000   0.041

Is there a faster way to learn how big a data frame is? I know this sounds
silly, but this is inside a "by" statement, where I figure out how many
observations are in each subset, and strangely this takes a whole lot of time.
I don't believe it is possible to ask "by" to attach an attribute to the data
frame that stores the number of observations it is actually passing.

Pointers appreciated.

regards, /iaw

--
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
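
PS: for completeness, a few other candidates I am aware of for reading off the
number of rows. This is just a sketch, not a careful benchmark, and I am not
sure whether leaning on the internal .row_names_info() helper counts as good
practice:

## all of these should return 10000 for the ds above
nrow(ds)                  ## generic: goes through dim(), which dispatches to dim.data.frame()
dim(ds)[1L]               ## same dispatch, so presumably roughly the same cost
length(ds$x)              ## length of one column: no data-frame method dispatch
length(ds[[1L]])          ## same idea, by position rather than by name
.row_names_info(ds, 2L)   ## number of rows implied by the row.names attribute (see ?row.names)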
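
PPS: on the "by" side, one idea would be to count the group sizes once with
table() instead of calling nrow() on every subset, or to read the length of a
single column inside the function passed to by(). The grouping column g below
is made up purely for illustration; it stands in for whatever the real grouping
variable is:

ds$g <- sample(letters[1:5], 10000, replace=TRUE)   ## hypothetical grouping variable

counts <- table(ds$g)      ## all group sizes in one pass, no per-subset nrow()

## or, if by() is needed anyway, avoid calling nrow() inside the function:
res <- by(ds, ds$g, function(d) length(d$x))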