On Fri, May 14, 2010 at 11:23 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Fri, May 14, 2010 at 11:21 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>> On Fri, May 14, 2010 at 10:59 AM, Yonik Seeley
>> <yo...@lucidimagination.com> wrote:
>>> On Fri, May 14, 2010 at 7:29 AM, Robert Muir <rcm...@gmail.com> wrote:
>>>> On Fri, May 14, 2010 at 5:14 AM, Michael McCandless
>>>> <luc...@mikemccandless.com> wrote:
>>>>> Or just cutover to UTF8 order for trunk.
>>>>
>>>> I would really prefer we go this route, instead of trying to do any
>>>> hacks at this point!
>>>
>>> Sounds good...
>>> So it seems like the biggest issue we might have in cutting over would
>>> be the field cache and sorting? Instead of using String.compareTo we
>>> need one that compares as UTF-32 (or longer term, don't even create
>>> strings of course...)
>>
>> Actually, I think on changing to unicode codepoint order, the
>> StringIndex returned by FieldCache would in fact be sorted in
>> codepoint order (even though it's still a String[]), because it just
>> enums the terms from TermsEnum.
>
> Right... the FieldCache will be ordered correctly... but when the sort
> code compares values across segments?
And even worse... when a binary search is done to convert an ord from
one segment to another.

-Yonik
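
To make the ordering mismatch concrete, here is a minimal Java sketch (not
Lucene code). The class name CodePointOrder and its compare method are
hypothetical, and the sketch assumes well-formed UTF-16 input with no unpaired
surrogates. String.compareTo orders by UTF-16 code unit, so a supplementary
character (encoded as a surrogate pair) sorts before U+FF5E, while code point
order, which matches UTF-8 byte order, puts it after. The same mismatch means a
binary search over terms sorted in code point order can miss a term if it
compares with String.compareTo.

```java
import java.util.Arrays;

public class CodePointOrder {

    /**
     * Compares two strings by Unicode code point rather than by UTF-16
     * code unit. For well-formed input this matches UTF-8 byte order.
     */
    public static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int cpA = a.codePointAt(i);
            int cpB = b.codePointAt(j);
            if (cpA != cpB) {
                return cpA < cpB ? -1 : 1;
            }
            // Equal code points occupy the same number of chars in both strings.
            i += Character.charCount(cpA);
            j += Character.charCount(cpB);
        }
        // One string is a prefix of the other; the shorter one sorts first.
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";            // U+FF5E, a single UTF-16 code unit
        String supp = "\uD835\uDD38";     // U+1D538, a surrogate pair

        // UTF-16 order: 0xFF5E > 0xD835, so bmp sorts after supp.
        System.out.println("String.compareTo > 0: " + (bmp.compareTo(supp) > 0));
        // Code point order: 0xFF5E < 0x1D538, so bmp sorts before supp.
        System.out.println("code point compare < 0: " + (compare(bmp, supp) < 0));

        // Terms as they would come out of a code-point-ordered enumeration.
        String[] terms = {"a", bmp, supp};

        // Searching with the matching comparator finds the term at index 2.
        int found = Arrays.binarySearch(terms, supp, CodePointOrder::compare);
        // Searching with natural (UTF-16) order misses it and returns a negative value.
        int missed = Arrays.binarySearch(terms, supp);

        System.out.println("with code point comparator: " + found);
        System.out.println("with String.compareTo found: " + (missed >= 0));
    }
}
```

The point of the sketch is only that the comparator used for cross-segment
comparisons and ord lookups has to agree with the order the terms were
enumerated in; avoiding String creation entirely, as mentioned earlier in the
thread, would sidestep the issue.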