dear R wizards:  here is the strange question for the day.  It seems to me
that nrow() is very slow.  Let me explain what I mean:

ds <- data.frame( NA, x = rnorm(10000) )   ## a sample data set

## doing nothing takes virtually no time
> system.time( { for (i in 1:10000) NA } )
   user  system elapsed
  0.000   0.000   0.001

## this is something that should take time; we need to add up 10,000
## values, 10,000 times
> system.time( { for (i in 1:10000) mean(ds$x) } )
   user  system elapsed
  0.416   0.001   0.416

## alas, this should be very fast; it should just be reading off an
## attribute of ds.  yet it takes roughly a third of the time of mean()!
> system.time( { for (i in 1:10000) nrow(ds) } )
   user  system elapsed
  0.124   0.001   0.125
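
If I am reading the base sources correctly, nrow() is just dim(x)[1L], so
for a data frame every call has to dispatch to dim.data.frame(); I suspect
that dispatch is where the time goes.  A sketch only (timings will differ
by machine):

## calling dim() directly still pays for the dim.data.frame() dispatch
system.time( { for (i in 1:10000) dim(ds)[1L] } )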

## here is an alternative way to get the row count, which is already
## much faster:
> system.time( { for (i in 1:10000) length(ds$x) } )
   user  system elapsed
  0.041   0.000   0.041
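
Two more candidates that, as far as I can tell, skip the dim() dispatch
entirely; a sketch only, since I have not benchmarked these carefully:

## .row_names_info(ds, 2L) is documented in ?row.names to return the
## number of rows; length(attr(ds, "row.names")) should give the same
system.time( { for (i in 1:10000) .row_names_info(ds, 2L) } )
system.time( { for (i in 1:10000) length(attr(ds, "row.names")) } )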

Is there a faster way to find out how many observations a data frame has?
I know this sounds silly, but the call sits inside a "by" statement, where
I figure out how many observations are in each subset, and these repeated
nrow() calls take a surprising amount of time.  I don't believe it is
possible to ask "by" to attach an attribute to each data frame it passes
that stores the number of observations.
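
To make the context concrete, here is a minimal sketch of the kind of "by"
call I mean; the grouping column g is invented purely for illustration.  If
all I need per group is a count, something like table() may sidestep the
repeated nrow() calls, though I have not timed that carefully:

ds$g <- sample(letters[1:5], nrow(ds), replace = TRUE)  ## hypothetical grouping column
by(ds, ds$g, function(subset) nrow(subset))             ## what I do now: per-subset counts
table(ds$g)                                             ## the same counts without by()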

pointers appreciated.

regards,

/iaw
-- 
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
