On Fri, Mar 19, 2010 at 5:42 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> It sounds like I'm missing something here... A quick check of running 20000
> random Strings of 30 characters from a-zA-Z0-1 + 20 different national
> characters through Java's Collator returned an average collatorKey-length of
> 175 bytes. On http://wiki.apache.org/solr/UnicodeCollation it is stated that
> a standard sort is used, which - to my knowledge - loads the Strings into
> memory. For my quick test, this means a tripling of memory usage for the sort
> field when indexing collatorKeys?
>

Right, JDK collation sucks. Use ICU for the collation keys too:
http://site.icu-project.org/charts/collation-icu4j-sun

At 1.59 bytes/char, that's less than UTF-16.

-- 
Robert Muir
rcm...@gmail.com
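
For reference, the measurement described in the quoted message can be reproduced roughly as below. This is only a sketch, not the original test: the class name, the alphabet (ASCII letters and digits plus a handful of national characters rather than 20), the da-DK locale and the fixed seed are all assumptions made for illustration. It uses only the JDK's java.text.Collator, so the numbers it prints will depend on the JDK and locale in use.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.Random;

final class CollationKeyLength {
    // Assumed alphabet: a-z, A-Z, 0-9 plus a few national characters
    // (ae, o-slash, a-ring, a/o/u-umlaut in both cases), written as escapes.
    static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        + "\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc";

    // Builds one random string of the given length from ALPHABET.
    static String randomString(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Average size in bytes of the collation keys for `count` random strings.
    static double averageKeyBytes(Collator collator, int count, int length, long seed) {
        Random random = new Random(seed);
        long total = 0;
        for (int i = 0; i < count; i++) {
            String s = randomString(random, length);
            total += collator.getCollationKey(s).toByteArray().length;
        }
        return (double) total / count;
    }

    public static void main(String[] args) {
        // Locale is an assumption; the original message does not name one.
        Collator collator = Collator.getInstance(Locale.forLanguageTag("da-DK"));
        int length = 30;
        double avg = averageKeyBytes(collator, 20000, length, 42L);
        System.out.printf("average key: %.1f bytes, %.2f bytes/char (UTF-16: %d bytes)%n",
                avg, avg / length, length * 2);
    }
}
```

To compare against ICU, the same loop should work with ICU4J's com.ibm.icu.text.Collator, which also provides getCollationKey(String) and a toByteArray() on the resulting key; that requires the ICU4J jar on the classpath and is not shown here.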