Just as a follow-up, I looked into this collation difference. For Lucene, the difference between the ICU/Harmony implementation and the RI implementation can be seen here:
Collator.getInstance(new Locale("")).compare("Ø", "U")

So I think this is covered under differences in CLDR data (HARMONY-5406), since it comes down to differences in the collation rules for the root Locale.

On Tue, Sep 14, 2010 at 12:12 PM, Robert Muir <rcm...@gmail.com> wrote:

>
> On Tue, Sep 14, 2010 at 12:06 PM, Tim Ellison <t.p.elli...@gmail.com> wrote:
>
>>
>> A quick look at the harmony code, and it seems we are explicitly
>> throwing IOOBEs from our code, e.g.
>>
>>     if (index < 0 || index >= len) {
>>         throw new IndexOutOfBoundsException();
>>     }
>>
>> In this case I see no reason why we wouldn't match the RI behavior, even
>> though it is not required by the spec.
>>
>
> OK, I can open a JIRA issue for this one.
>
>>
>> Yes, again I think this is beyond the spec. As you say, Harmony defers to
>> the ICU project to provide the i18n functionality. If you consider the
>> collation or break iterators to be producing a result that is 'wrong'
>> we'd raise it with ICU and get their opinion.
>>
>
> Yes, in both cases Lucene supports "jdk or icu" impl for collation and
> break iteration, so we have encountered these differences before.
> I can't comment on the collation difference (except we already
> conditionalized our test case for ICU here).
> For the break iterator difference, a user from Laos commented that when he
> uses the ICU implementation with Thai, he gets better results... so maybe
> ICU simply improved the Thai dictionary?
>
> --
> Robert Muir
> rcm...@gmail.com
>

--
Robert Muir
rcm...@gmail.com
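
For anyone who wants to check which ordering a given runtime produces, here is a minimal sketch of the comparison quoted at the top of this message. The class and method names are hypothetical (they are not part of Lucene or Harmony), and Locale.ROOT is used as the equivalent of new Locale(""). The sign it prints depends on the collation rules the runtime ships, so no expected value is shown.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene or Harmony.
class RootCollationCheck {

    // Compare two strings with the root-locale collator and
    // normalize the result to -1, 0, or 1.
    static int compareInRoot(String a, String b) {
        // Locale.ROOT is the empty locale, equivalent to new Locale("").
        Collator collator = Collator.getInstance(Locale.ROOT);
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        // \u00D8 is LATIN CAPITAL LETTER O WITH STROKE.
        int sign = compareInRoot("\u00D8", "U");
        System.out.println(System.getProperty("java.vendor")
                + ": compare(\"\u00D8\", \"U\") sign = " + sign);
    }
}
```

Running this on both an RI and a Harmony build and comparing the printed signs should show whether the two runtimes disagree on the root-locale ordering of these two letters.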