Thanks, Stefan.

I tested the expressions on data frames of several different shapes.
The results show that 2) and 3) are faster than 1), especially on data frames
with a large number of columns. The third one is probably the best: it was the
fastest, or tied for fastest, in every test below.

 1) subset(df, V1 > 0.5, V2)  or  subset(df, V1 > 0.5, V2)$V2
 2) df[df$V1 > 0.5, "V2"]
 3) df$V2[df$V1 > 0.5]
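As a quick sanity check (not part of the original tests), the sketch below uses a small made-up data frame to confirm that the three expressions select the same values from runif() data and differ only in return type. It also shows one behavioural difference the timings do not capture: subset() treats an NA in the condition as FALSE, whereas plain logical indexing returns NA for that row. The names r1, r1v, r2, r3 and df_na are only illustrative.

```r
# Small example data frame (10 rows x 5 columns, named V1..V5);
# set.seed() only makes the random values reproducible.
set.seed(1)
df <- as.data.frame(matrix(runif(50), nrow = 10))

r1  <- subset(df, V1 > 0.5, V2)     # 1): a one-column data frame
r1v <- subset(df, V1 > 0.5, V2)$V2  # 1): numeric vector
r2  <- df[df$V1 > 0.5, "V2"]        # 2): numeric vector (drop = TRUE by default)
r3  <- df$V2[df$V1 > 0.5]           # 3): numeric vector

# Same selected values; only the return type of the first form differs.
stopifnot(is.data.frame(r1), identical(r1v, r2), identical(r2, r3))

# NA handling differs: made-up data with a missing V1 value.
df_na <- data.frame(V1 = c(0.9, NA, 0.1), V2 = c(1, 2, 3))
subset(df_na, V1 > 0.5, V2)$V2  # [1] 1     (NA condition treated as FALSE)
df_na[df_na$V1 > 0.5, "V2"]     # [1]  1 NA (NA condition gives NA)
df_na$V2[df_na$V1 > 0.5]        # [1]  1 NA
```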


== TESTS ==

1. Test on a 1000000 x 10 data frame (rows x columns)

> df <- as.data.frame.matrix(matrix(runif(10000000),1000000))
> system.time(subset(df, V1 > 0.5, V2), gcFirst=T)
   user  system elapsed
  0.260   0.044   0.302
> system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=T)
   user  system elapsed
  0.256   0.044   0.300
> system.time(df[df$V1 > 0.5, "V2"], gcFirst=T)
   user  system elapsed
  0.100   0.016   0.117
> system.time(df$V2[df$V1 > 0.5], gcFirst=T)
   user  system elapsed
  0.104   0.012   0.117


2. Test on a 100000 x 100 data frame (rows x columns)

> df <- as.data.frame.matrix(matrix(runif(10000000),100000))
> system.time(subset(df, V1 > 0.5, V2), gcFirst=T)
   user  system elapsed
   0.04    0.00    0.04
> system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=T)
   user  system elapsed
  0.040   0.000   0.042
> system.time(df[df$V1 > 0.5, "V2"], gcFirst=T)
   user  system elapsed
  0.012   0.000   0.011
> system.time(df$V2[df$V1 > 0.5], gcFirst=T)
   user  system elapsed
  0.012   0.000   0.011


3. Test on a 10000 x 1000 data frame (rows x columns)

> df <- as.data.frame.matrix(matrix(runif(10000000),10000))
> system.time(subset(df, V1 > 0.5, V2), gcFirst=T)
   user  system elapsed
  0.008   0.000   0.008
> system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=T)
   user  system elapsed
  0.004   0.000   0.005
> system.time(df[df$V1 > 0.5, "V2"], gcFirst=T)
   user  system elapsed
  0.004   0.000   0.001
> system.time(df$V2[df$V1 > 0.5], gcFirst=T)
   user  system elapsed
  0.004   0.000   0.001


4. Test on a 100 x 100000 data frame (rows x columns)

> df <- as.data.frame.matrix(matrix(runif(10000000),100))
> system.time(subset(df, V1 > 0.5, V2), gcFirst=T)
   user  system elapsed
  0.336   0.000   0.336
> system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=T)
   user  system elapsed
  0.332   0.000   0.330
> system.time(df[df$V1 > 0.5, "V2"], gcFirst=T)
   user  system elapsed
  0.004   0.000   0.005
> system.time(df$V2[df$V1 > 0.5], gcFirst=T)
   user  system elapsed
      0       0       0


5. Test on a 10 x 1000000 data frame (rows x columns)

> df <- as.data.frame.matrix(matrix(runif(10000000),10))
> system.time(subset(df, V1 > 0.5, V2), gcFirst=T)
   user  system elapsed
 26.698   0.000  26.698
> system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=T)
   user  system elapsed
 26.678   0.004  26.678
> system.time(df[df$V1>0.5, "V2"], gcFirst=T)
   user  system elapsed
  0.060   0.000   0.057
> system.time(df$V2[df$V1>0.5], gcFirst=T)
   user  system elapsed
      0       0       0
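Several of the timings above (0.001 s, or 0) are close to the resolution of a single system.time() call, so small differences there may be noise. A minimal sketch of a more repeatable comparison, assuming it is acceptable to re-run each expression several times and take the median elapsed time, is shown below. median_elapsed() and compare_shapes() are hypothetical helper names, and the default ncells = 1e6 is smaller than the 1e7 cells used above to keep the run short. The example call omits the very wide 10-row shape, which was by far the slowest case for subset() above. (The CRAN package microbenchmark is another option for this kind of measurement.)

```r
# Hypothetical helper: median elapsed time (seconds) over `reps` calls of f().
median_elapsed <- function(f, reps = 10L) {
  times <- vapply(seq_len(reps), function(k) {
    system.time(f(), gcFirst = TRUE)[["elapsed"]]
  }, numeric(1))
  median(times)
}

# Hypothetical helper: time expressions 1), 2) and 3) on data frames of
# different shapes. Each data frame has `ncells` cells, so `ncells` must be
# divisible by every value in `nrows`.
compare_shapes <- function(nrows, ncells = 1e6, reps = 10L) {
  res <- lapply(nrows, function(n) {
    df <- as.data.frame(matrix(runif(ncells), nrow = n))
    data.frame(
      rows    = n,
      cols    = ncol(df),
      subset  = median_elapsed(function() subset(df, V1 > 0.5, V2), reps),
      bracket = median_elapsed(function() df[df$V1 > 0.5, "V2"], reps),
      dollar  = median_elapsed(function() df$V2[df$V1 > 0.5], reps)
    )
  })
  do.call(rbind, res)  # one row of median timings per shape
}

compare_shapes(c(1e5, 1e4, 1e3, 100))
```

Taking the median rather than the mean keeps an occasional slow run (for example one that triggers extra memory allocation) from dominating the result.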


2009/9/26 Stefan Grosse <singularit...@gmx.net>

> On Sat, 26 Sep 2009 15:26:12 +0900 You Hyun Jo <youhyu...@gmail.com>
> wrote:
>
> YHJ> Is there any (performance) difference (except the difference of
> YHJ> the return types)
> YHJ> between the following two computations?
>
> Try it yourself.
> ?system.time
> is useful for that purpose.
>
> Stefan
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
