03-Nov-2013 02:37, Charles Hixson wrote:
I'm contemplating an associative array that will eventually grow to be
an estimated 64KB in size, assuming it's about half full.  It would then
be holding around 90,000 entries.  Is this reasonable, or should I go
with a sorted array, and do binary searches?  I estimate that binary
searches would average around 15 accesses/hit, but that's still a lot
faster than disk access.  The array would be nearly full, so there would
be less wasted space, but, of course, insertions would become quite
expensive.  (Deletions aren't expected.)  In either case the keys would
be fixed length character arrays, and the values would also be simple
structs without reference types (so the garbage collector could pretty
much ignore things).

90k elements is a small AA. It would work just fine with any sane hash function (e.g. the built-in one).

At around 10M+ elements it may be worth considering space-saving techniques. That said, on x64 100M+ is perfectly OK; on 32-bit I'd advise against it, but only because of the GC.
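
For concreteness, here is a minimal sketch of the built-in AA with a fixed-length character key and a plain value struct (the exact key size and value fields are assumptions, the post doesn't give them):

import std.stdio;

alias Key = char[8];            // fixed-length character array key (assumed size)

struct Value                    // plain struct, no reference types,
{                               // so the GC has nothing to scan inside it
    int    count;
    double weight;
}

void main()
{
    Value[Key] table;           // built-in associative array

    Key k = "abcdefgh";         // literal of matching length fills the static array
    table[k] = Value(1, 0.5);

    if (auto p = k in table)    // the in operator yields a pointer to the value, or null
        writeln("found: ", p.count);

    writeln(table.length, " entries");
}

The in operator also avoids a second lookup when you need to test for presence and then read the value.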

FWIW I'm expecting this array to be around the entire time the program
is running, but it would probably be rolled out most of the time, as
frequently accessed items would be pulled into a much smaller structure,
and if I go with the array rather than the hash table I may only do
updates as a batch.  (A batch update is linear in the size of the array, but essentially independent of the number of items being added.)
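
To make that concrete, the batch update can be done as a single merge pass; the Entry layout below is hypothetical, since the post only specifies fixed-length character keys and simple value structs:

import std.algorithm.sorting : sort;

struct Entry                      // hypothetical layout
{
    char[8] key;
    long    value;
}

// Merge a batch of new entries into an already-sorted array in one pass:
// O(old.length + batch.length), i.e. linear in the array size and nearly
// independent of how many items the batch adds.
Entry[] batchInsert(Entry[] sortedOld, Entry[] batch)
{
    sort!((a, b) => a.key[] < b.key[])(batch);   // sort only the small batch

    auto merged = new Entry[](sortedOld.length + batch.length);
    size_t i, j, k;
    while (i < sortedOld.length && j < batch.length)
        merged[k++] = (sortedOld[i].key[] <= batch[j].key[])
                      ? sortedOld[i++] : batch[j++];
    while (i < sortedOld.length) merged[k++] = sortedOld[i++];
    while (j < batch.length)     merged[k++] = batch[j++];
    return merged;
}

Lookups in the sorted array would then stay a plain binary search, e.g. via std.range.assumeSorted.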

My temptation is to go with the hash table...but I'm not sure of the
underlying mechanics.

Think of it as a sparse array that has "holes" and takes around 2-4 times the size of a normal array. That's not only because of the holes but because each slot is a bit bigger.
In return, lookups/inserts/deletions are cheap: O(1) with high probability.
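
As a rough back-of-envelope for 90k entries, assuming an 8-byte key and an 8-byte value (the post doesn't give exact sizes) and the 2-4x factor above:

import std.stdio;
import std.math : ceil, log2;

alias Key = char[8];
alias Val = long;

void main()
{
    enum entries  = 90_000;
    enum slotSize = Key.sizeof + Val.sizeof;        // 16 bytes per entry

    auto packed = entries * slotSize;               // dense sorted array
    writefln("sorted array: ~%s KiB", packed / 1024);
    writefln("built-in AA : ~%s-%s KiB", 2 * packed / 1024, 4 * packed / 1024);

    // A binary search over the sorted array touches about log2(90_000) ~ 17
    // entries per lookup; the AA does a constant number of probes instead.
    writefln("binary search probes: ~%s",
             cast(int) ceil(log2(cast(double) entries)));
}

Either way it is on the order of a few MiB, which is why 90k elements counts as small.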

--
Dmitry Olshansky
