If AAs don't have any hidden penalty, like causing the garbage collector to thrash, then that's the way to go
Any time you insert an item into an AA it might allocate. So it might cause a garbage collection cycle. I'm unsure what you mean by "causing the garbage collector to trash" memory leeaks? If it's not acceptable that the gc will collect then either build a custom hash map class with determenistic memory
management or implement your sorted array version.
Note: Lookup into AA's does not allocate
I've used larger AAs in simple programs without problems, but this one will probably run continually for months.
I have never written anything that is supposed to run for months at a time so take anything i say with a grain of salt.
is only one piece, so I'm a bit concerned about things that I can't see any easy way to test. So I thought I'd ask. P.S.: The external codes are NOT known at compile time. Not even at the start of run time. They are added incrementally, and there is no actual rule about what they will be. (This is a part of the reason for all the "about" and "estimated" in my original statement (unless I edited them out).)
Since the codes have no rule the best you can do for an hash function is to use a gernerall one, this might not be good enough for your purposes. You could test it with the build in string hash function on random strings of a specific length but it's no guarantee that it will work well for you specific data. It's however highly likely that it will still be faster then 15 misses to 1 hit ratio.
So it sounds like I should plan on using AAs, and just keep the array option around as a fallback position.
I would aggre. If the extra memory and non-deterministisism is not acceptable go with the array.