Roger,

On Mar 15, 2010, at 15:25 , Roger Peng wrote:

If I recall correctly, I thought indexing a vector/list with a
character vector uses hashing if the vector is over a certain length
(I can't remember the cutoff). Otherwise, it's a linear operation.


What I was talking about was lookup of one element in a named generic vector of length n so that is linear (~O(n)), because we were discussing l[[i]].

But, yes, you're right that if you use a vector for indexing x[y] (a slightly different topic but applicable to the data table example) then match() is used (=hashing) if either the subscript or names are long enough (subscrip...@493 has the exact formula).

Cheers,
Simon




On Thu, Mar 11, 2010 at 8:09 PM, Ben <mi...@emerose.org> wrote:
lists are generic vectors with names so lookup is O(n). Environments
in R are true hash tables for that purpose:

Ahh, thanks for the information!  A function I wrote before indexing
on a data frame was slower than I expected, and now I know why.

I don't quite understand - characters are (after raw vectors) the
most expressive data type, so I'm not quite sure why that would be a
limitation .. You can cast anything (but raw vector with nulls) into
to a character.

It's no big problem, it's just that if the solution is to convert to
character type, then there are some implementation details to worry
about.  For instance, I assume that as.character(x) is a reversible
1-1 mapping if x is an integer (and not NA or NULL, etc).  But
apparently that isn't exactly true for floats, and it would get more
complicated for other data types.  So that's why I said it would not
be elegant, but that is a very subjective statement.

On a deeper level, it seems counterintuitive to me that indexing in R
is O(n).  Futhermore, associative arrays are a fundamental data type,
so I think it's weird that I can read the R tutorial, the R language
definition, and even the manual page for new.env() and still not have
enough information to build a decent one.  So IMHO things would be
better if R had a built-in easy-to-use general purpose associative
array.

I don't see a problem thus I'm not surprised it didn't come up
;). But maybe I'm just missing your point ...

Nope, this has come up before---I think R and I are just on different
wavelengths.  Various things that I think are a problem with R are
apparently not, and it's fine the way it is.

Anyway, sorry for getting off topic ;-) You posted everything I need to know and I really appreciate your help.


--
Ben Escoto

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/



______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to