Just don't forget that a RadixTree lookup is O(L) in the length L of the string, while a Set lookup is O(1) on average (worse the more collisions you have), since a String stores its hashCode in an instance field. But that hash is calculated lazily, so for a brand-new String the lookup in a Set is also O(L) in the length of the string you're looking up.
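To make the comparison concrete, here is a minimal sketch of both duplicate checks (class and method names are hypothetical). The HashSet version relies on add() returning false when an equal string is already present. The second version is a plain uncompressed trie with one char per node, swapped in for illustration; it is NOT the compressed radix tree from code.google.com/p/radixtree, whose API I'm not assuming. Both walk each string once, so the trie's per-insert cost is O(L) just like computing a fresh String's hash. Assumes Java 9+ for List.of.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateCheck {

    /** HashSet approach: add() returns false if an equal string is already present. */
    static boolean allUniqueHashSet(List<String> strings) {
        // Presize so the set doesn't rehash while filling (default load factor 0.75).
        Set<String> seen = new HashSet<>((int) (strings.size() / 0.75f) + 1);
        for (String s : strings) {
            // The first hashCode() call on a String instance walks all its chars;
            // after that the result is cached in the instance.
            if (!seen.add(s)) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical trie node: one char per edge, so not a compressed radix tree. */
    static final class TrieNode {
        final Map<Character, TrieNode> children = new HashMap<>();
        boolean terminal; // true if some inserted string ends at this node
    }

    /** Trie approach: each insert walks the string once, O(L) in its length. */
    static boolean allUniqueTrie(List<String> strings) {
        TrieNode root = new TrieNode();
        for (String s : strings) {
            TrieNode node = root;
            for (int i = 0; i < s.length(); i++) {
                node = node.children.computeIfAbsent(s.charAt(i), c -> new TrieNode());
            }
            if (node.terminal) {
                return false; // an earlier string ended at the same node: duplicate
            }
            node.terminal = true;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> unique = List.of("apple", "app", "banana");
        List<String> dup = List.of("apple", "app", "apple");
        System.out.println(allUniqueHashSet(unique)); // true
        System.out.println(allUniqueHashSet(dup));    // false
        System.out.println(allUniqueTrie(unique));    // true
        System.out.println(allUniqueTrie(dup));       // false
    }
}
```

Note that a HashMap per trie node is far from space-saving; the memory win andreas mentions comes from the radix tree collapsing single-child chains, which this sketch doesn't do.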
On Fri, Sep 4, 2009 at 8:59 AM, andreasp7n <andr...@petersson.at> wrote:
>
> On 3 Sep., 17:14, Barney <barney.h...@gmail.com> wrote:
>> Is it realistic to use HashSet to determine if a large amount of
>> string data (2 000 000 strings of length 20) is composed of unique
>> entries?
>
> I needed something like this recently; I used a radix tree data
> structure to store all strings. Quite space-saving: I stored 3M customer
> names and addresses in memory and it was no problem memory-wise. There is a
> practical implementation over at http://code.google.com/p/radixtree/
>
> While building up the radix tree you can easily check whether you have
> any duplicates.

--
http://mapsdev.blogspot.com/
Marcelo Takeshi Fukushima

You received this message because you are subscribed to the Google Groups "The Java Posse" group.
To post to this group, send email to javaposse@googlegroups.com
To unsubscribe from this group, send email to javaposse+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/javaposse?hl=en