Note that you are talking about very small times here.
Yes, it probably switches early for ns=1, but is that a common usage?
Do people really do lots of single lookups from long vectors -- if so they
deserve what they get, and it would be better to use a hashed environment.
(Indeed a strategy considered but not implemented was to attach a hash for
future use.)
On Tue, 18 Nov 2008, Martin Morgan wrote:
This creates a named vector of length nx, then repeatedly draws a
single sample from it.
lkup - function(nx, m=1L) {
tbl - seq_len(nx)
names(tbl) - as.character(tbl)
v - sample(names(tbl), m, replace=TRUE)
system.time(for(k in v) tbl[k], gcFirst=TRUE)
}
There is an abrupt performance degredation at nx=1000
lkup(1000)
user system elapsed
0.180 0.000 0.179
lkup(1001)
user system elapsed
2.444 0.016 2.462
This is because of the heuristic at stringSubscript.c:424, which
switches from a 'naive' nx * ns algorithm (ns is the number of
elements to be extracted, ns = 1 above) to a hash-based strategy when
nx * ns 1.
It seems like the 'naive' algorithm takes nx * ns time, whereas the
hash-based algorithm takes C(nx) + ns, where C(nx) is the cost of
creating the hash. Guessing that the cost of building the hash is
about linear in nx, the results above suggest a new heuristic for
switching at ns of about 15.
(I don't quite follow the last sentence of the comment above
stringSubscript, so perhaps I misunderstand entirely).
Martin
sessionInfo()
R version 2.9.0 Under development (unstable) (2008-11-15 r46953)
i686-pc-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax: +44 1865 272595
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel