Roger,
On Mar 15, 2010, at 15:25 , Roger Peng wrote:
If I recall correctly, I thought indexing a vector/list with a
character vector uses hashing if the vector is over a certain length
(I can't remember the cutoff). Otherwise, it's a linear operation.
What I was talking about was lookup of one element in a named generic
vector of length n so that is linear (~O(n)), because we were
discussing l[[i]].
But, yes, you're right that if you use a vector for indexing x[y] (a
slightly different topic but applicable to the data table example)
then match() is used (=hashing) if either the subscript or names are
long enough (subscrip...@493 has the exact formula).
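
To make the two cases concrete, here is a minimal sketch; the list `l`, the key names, and the length `n` are made up for illustration, and nothing here measures the actual cutoff:

```r
# Hypothetical named generic vector (list) with n elements
n <- 1000L
l <- setNames(as.list(seq_len(n)), paste0("key", seq_len(n)))

# Case 1: lookup of a single element by name, l[[i]]
one <- l[["key500"]]

# Case 2: indexing with a character vector, x[y]
y <- c("key3", "key999", "key42")
many <- l[y]

# The same positions found explicitly with match()
pos <- match(y, names(l))
same <- identical(many, l[pos])
```

Both forms return the same elements; the difference discussed above is only in how the names are searched.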
Cheers,
Simon
On Thu, Mar 11, 2010 at 8:09 PM, Ben <mi...@emerose.org> wrote:
lists are generic vectors with names so lookup is O(n). Environments
in R are true hash tables for that purpose:
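
A minimal sketch of using an environment as a string-keyed table, assuming character keys only; the keys and values are made up for illustration:

```r
# Create a hashed environment with no parent scope to inherit from
h <- new.env(hash = TRUE, parent = emptyenv())

# Insert values under character keys
assign("alpha", 1, envir = h)
h[["beta"]] <- 2

# Look a key up
a <- get("alpha", envir = h)

# Test for a missing key without searching enclosing environments
has_gamma <- exists("gamma", envir = h, inherits = FALSE)

# List the stored keys (sorted here so the order is predictable)
keys <- sort(ls(h))
```

Using `emptyenv()` as the parent keeps lookups from falling through to other scopes.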
Ahh, thanks for the information! A function I wrote before indexing
on a data frame was slower than I expected, and now I know why.
I don't quite understand - characters are (after raw vectors) the
most expressive data type, so I'm not quite sure why that would be a
limitation .. You can cast anything (but raw vector with nulls) into
a character.
It's no big problem, it's just that if the solution is to convert to
character type, then there are some implementation details to worry
about. For instance, I assume that as.character(x) is a reversible
1-1 mapping if x is an integer (and not NA or NULL, etc). But
apparently that isn't exactly true for floats, and it would get more
complicated for other data types. So that's why I said it would not
be elegant, but that is a very subjective statement.
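
A minimal sketch of the round-trip concern, assuming the default as.character() conversion; the sample values are made up, and the sprintf() line is only one possible workaround, not something proposed in this thread:

```r
# Integers (no NA) survive the round trip through character
i <- c(-3L, 0L, 42L, .Machine$integer.max)
int_ok <- identical(as.integer(as.character(i)), i)

# A double that needs more than 15 significant digits does not
x <- 0.1 + 0.2
key <- as.character(x)
dbl_ok <- as.numeric(key) == x

# Asking for 17 significant digits keeps enough precision here
key17 <- sprintf("%.17g", x)
dbl17_ok <- as.numeric(key17) == x
```

Here `int_ok` and `dbl17_ok` come out TRUE while `dbl_ok` is FALSE, so two distinct doubles can end up under the same character key.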
On a deeper level, it seems counterintuitive to me that indexing in R
is O(n). Furthermore, associative arrays are a fundamental data type,
so I think it's weird that I can read the R tutorial, the R language
definition, and even the manual page for new.env() and still not have
enough information to build a decent one. So IMHO things would be
better if R had a built-in easy-to-use general purpose associative
array.
I don't see a problem thus I'm not surprised it didn't come up
;). But maybe I'm just missing your point ...
Nope, this has come up before---I think R and I are just on different
wavelengths. Various things that I think are a problem with R are
apparently not, and it's fine the way it is.
Anyway, sorry for getting off topic ;-) You posted everything I
need to know and I really appreciate your help.
--
Ben Escoto
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
--
Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/