Hi, I'm subsetting a named vector using character indices. My vector of indices (or keys) is 10x longer than the vector I'm subsetting. All my keys are distinct and only 10% of them are valid (i.e. match a name of the vector being subsetted). It is surprisingly slow:

  x1 <- 1:1000
  names(x1) <- paste("a", x1, sep="")
  keys <- sample(c(names(x1), paste("b", 1:9000, sep="")))
  > system.time(y1 <- x1[keys])
     user  system elapsed
    0.410   0.000   0.416

  x2 <- 1:2000
  names(x2) <- paste("a", x2, sep="")
  keys <- sample(c(names(x2), paste("b", 1:18000, sep="")))
  > system.time(y2 <- x2[keys])
     user  system elapsed
    1.730   0.000   1.736

  x3 <- 1:4000
  names(x3) <- paste("a", x3, sep="")
  keys <- sample(c(names(x3), paste("b", 1:36000, sep="")))
  > system.time(y3 <- x3[keys])
     user  system elapsed
    8.900   0.010   9.227

  x4 <- 1:8000
  names(x4) <- paste("a", x4, sep="")
  keys <- sample(c(names(x4), paste("b", 1:72000, sep="")))
  > system.time(y4 <- x4[keys])
     user  system elapsed
  130.390   0.000 132.316

And it's apparently worse than quadratic in time! I'm wondering why this subsetting by name is so slow since it seems it could be implemented with x4[match(keys, names(x4))], which is very fast: only 0.012s!

This is with R-2.11.0 and R-2.12.0.

Thanks,
H.

-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
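
For reference, here is a minimal sketch of the match()-based lookup mentioned above, checked against plain subsetting by name at the smallest size used in the timings. The variable names (x, y_sub, y_match) and the seed are only illustrative and not part of the original runs. The comparison is restricted to the values because the two forms may label unmatched entries differently in names().

```r
# Illustrative sizes only: 1000 named elements, 10000 distinct keys,
# of which 10% match a name (same shape as the x1 case above).
set.seed(1)  # hypothetical seed, only to make the sketch reproducible
x <- 1:1000
names(x) <- paste("a", x, sep="")
keys <- sample(c(names(x), paste("b", 1:9000, sep="")))

# Direct subsetting by name: unmatched keys yield NA values.
y_sub <- x[keys]

# Lookup via match(): match() returns NA for keys that are not in names(x),
# and subsetting with an NA integer index also yields NA.
y_match <- x[match(keys, names(x))]

# The two results carry the same values in the same order.
stopifnot(identical(unname(y_sub), unname(y_match)))
```

Wrapping either subsetting line in system.time() gives a comparison for a particular R version. Note that this equivalence relies on the names of x being distinct, as they are here.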