Thanks, Stefan.
I tested the expressions on data frames of various sizes.
The results show that 2) and 3) are faster than 1), especially on a data
frame with a large number of columns. The third one is probably the best.
1) subset(df, V1 > 0, V2) or subset(df, V1 > 0, V2)$V2
2) df[df$V1 > 0.5, "V2"]
3) df$V2[df$V1 > 0]
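To make the difference in return types concrete, here is a minimal sketch. It assumes a threshold of 0.5, the default column names V1, V2, ... that as.data.frame() gives a matrix, and an arbitrary seed chosen only so the run is repeatable.

```r
set.seed(1)  # arbitrary seed (an assumption, not from the tests)
df <- as.data.frame(matrix(runif(1000), 100))  # 100 rows, columns V1..V10

r1 <- subset(df, V1 > 0.5, V2)   # one-column data frame
r2 <- df[df$V1 > 0.5, "V2"]      # plain numeric vector (single column is dropped)
r3 <- df$V2[df$V1 > 0.5]         # plain numeric vector

is.data.frame(r1)        # form 1 keeps the data frame structure
is.numeric(r2)           # forms 2 and 3 return bare vectors
identical(r1$V2, r2)     # same values once the column is extracted
identical(r2, r3)
```

All three select the same values; only form 1 wraps them in a data frame, which is part of why it does more work.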
== TESTS ==
1. test over 100*10 matrix
df <- as.data.frame.matrix(matrix(runif(1000),100))
system.time(subset(df, V1 > 0.5, V2), gcFirst=TRUE)
user system elapsed
0.260 0.044 0.302
system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=TRUE)
user system elapsed
0.256 0.044 0.300
system.time(df[df$V1 > 0.5, "V2"], gcFirst=TRUE)
user system elapsed
0.100 0.016 0.117
system.time(df$V2[df$V1 > 0.5], gcFirst=TRUE)
user system elapsed
0.104 0.012 0.117
2. test over 10*100 matrix
df <- as.data.frame.matrix(matrix(runif(1000),10))
system.time(subset(df, V1 > 0.5, V2), gcFirst=TRUE)
user system elapsed
0.04 0.00 0.04
system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=TRUE)
user system elapsed
0.040 0.000 0.042
system.time(df[df$V1 > 0.5, "V2"], gcFirst=TRUE)
user system elapsed
0.012 0.000 0.011
system.time(df$V2[df$V1 > 0.5], gcFirst=TRUE)
user system elapsed
0.012 0.000 0.011
3. test over 1*1000 matrix
df <- as.data.frame.matrix(matrix(runif(1000),1))
system.time(subset(df, V1 > 0.5, V2), gcFirst=TRUE)
user system elapsed
0.008 0.000 0.008
system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=TRUE)
user system elapsed
0.004 0.000 0.005
system.time(df[df$V1 > 0.5, "V2"], gcFirst=TRUE)
user system elapsed
0.004 0.000 0.001
system.time(df$V2[df$V1 > 0.5], gcFirst=TRUE)
user system elapsed
0.004 0.000 0.001
4. test over 100*10 matrix
df <- as.data.frame.matrix(matrix(runif(1000),100))
system.time(subset(df, V1 > 0.5, V2), gcFirst=TRUE)
user system elapsed
0.336 0.000 0.336
system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=TRUE)
user system elapsed
0.332 0.000 0.330
system.time(df[df$V1 > 0.5, "V2"], gcFirst=TRUE)
user system elapsed
0.004 0.000 0.005
system.time(df$V2[df$V1 > 0.5], gcFirst=TRUE)
user system elapsed
0 0 0
5. test over 10*100 matrix
df <- as.data.frame.matrix(matrix(runif(1000),10))
system.time(subset(df, V1 > 0.5, V2), gcFirst=TRUE)
user system elapsed
26.698 0.000 26.698
system.time(subset(df, V1 > 0.5, V2)$V2, gcFirst=TRUE)
user system elapsed
26.678 0.004 26.678
system.time(df[df$V1 > 0.5, "V2"], gcFirst=TRUE)
user system elapsed
0.060 0.000 0.057
system.time(df$V2[df$V1 > 0.5], gcFirst=TRUE)
user system elapsed
0 0 0
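Several of the timings above are at or below the clock resolution (0 0 0), so a single call does not separate forms 2) and 3). One common way around that is to repeat each expression many times inside system.time(). The sketch below does this; the repetition count n and the result names are my own choices for illustration, not part of the tests above.

```r
df <- as.data.frame(matrix(runif(1000), 10))  # 10 rows, 100 columns

n <- 1000  # hypothetical repetition count; raise it if timings are still near 0

# Time n evaluations of each form instead of a single call
t1 <- system.time(for (i in seq_len(n)) subset(df, V1 > 0.5, V2))
t2 <- system.time(for (i in seq_len(n)) df[df$V1 > 0.5, "V2"])
t3 <- system.time(for (i in seq_len(n)) df$V2[df$V1 > 0.5])

# Elapsed seconds for n repetitions of each form
elapsed <- c(subset  = t1[["elapsed"]],
             bracket = t2[["elapsed"]],
             dollar  = t3[["elapsed"]])
elapsed
```

Dividing each entry by n gives a rough per-call cost. The absolute numbers will vary by machine and R version, so only the ordering is worth comparing.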
2009/9/26 Stefan Grosse <singularit...@gmx.net>

> On Sat, 26 Sep 2009 15:26:12 +0900 You Hyun Jo <youhyu...@gmail.com>
> wrote:
>
> YHJ> Is there any (performance) difference (except the difference of
> YHJ> the return types)
> YHJ> between the following two computations?
>
> Try it yourself.
>
> ?system.time
>
> is useful for that purpose.
>
> Stefan
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.