Re: JDK 9 Build 111 seems to miss some locale data, Lucene tests fail with Farsi and Thai language

Alan Bateman Sat, 26 Mar 2016 07:12:06 -0700

On 26/03/2016 11:56, Uwe Schindler wrote:

Hi,


After also testing the separate "Jigsaw" build on jdk9.java.net, I see the same
problems. So both build 111 variants are broken.

To me it looks like the Unicode data files are missing some information, which
could again be a packaging bug. As said before, build 110 does not have this
problem, so it seems to be a side-effect of the Jigsaw merge.

The following does not work:

(1) The Thai locale does not have a working dictionary-based BreakIterator. The
following check in Lucene fails because the iterator does not detect the word
boundary correctly:

   /**
    * True if the JRE supports a working dictionary-based breakiterator for Thai.
    * If this is false, this tokenizer will not work at all!
    */
   public static final boolean DBBI_AVAILABLE;
   private static final BreakIterator proto = BreakIterator.getWordInstance(new Locale("th"));
   static {
     // check that we have a working dictionary-based break iterator for thai
     proto.setText("ภาษาไทย");
     DBBI_AVAILABLE = proto.isBoundary(4);
   }

After this static initializer runs, DBBI_AVAILABLE is false. This causes some
tests to be ignored, but 2 tests fail because of it (which might be an oversight
on our side). Nevertheless, this is a bug in build 111.

I just tried to reproduce this on OS X and Linux without success. The log you linked to suggests this is Linux; is that right? Is this the JDK bundle? I haven't checked the JRE bundle, but I would be surprised if anything is missing. The JDK has several tests for Thai, so if it were completely broken I would have expected it to be noticed. I've no doubt that it is not working in your environment; we just need to figure out what is different.
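To compare environments, a standalone version of the Lucene check can be run on each machine. The sketch below is not code from Lucene or the JDK; the class `ThaiBreakIteratorCheck` and its helpers are hypothetical names. It prints the Java version and home directory, lists every word boundary the Thai word iterator reports for "ภาษาไทย" (written as Unicode escapes so the source file encoding does not matter), and repeats the `isBoundary(4)` test. `Locale.forLanguageTag("th")` is used in place of `new Locale("th")`; both denote the same locale. With working dictionary data one would expect a boundary at offset 4, between ภาษา and ไทย; without it, the whole run may come back as a single word.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic: a standalone copy of Lucene's DBBI_AVAILABLE check.
final class ThaiBreakIteratorCheck {

    // The same 7-character Thai sample as the Lucene check ("Thai language")
    static final String SAMPLE = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";

    /** Every boundary offset the word iterator reports, from first() to the last one. */
    static List<Integer> wordBoundaries(Locale locale, String text) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            offsets.add(b);
        }
        return offsets;
    }

    /** Same test as Lucene's static initializer: is offset 4 a word boundary? */
    static boolean dictionaryBreakIteratorAvailable() {
        BreakIterator it = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        it.setText(SAMPLE);
        return it.isBoundary(4);
    }

    public static void main(String[] args) {
        // Environment details that may explain why machines behave differently
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("boundaries   = "
                + wordBoundaries(Locale.forLanguageTag("th"), SAMPLE));
        System.out.println("DBBI available: " + dictionaryBreakIteratorAvailable());
    }
}
```

Running it on both a build 110 and a build 111 installation and diffing the output would show whether the boundary list itself changes.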


(2) The Arabic ("ar") collator, which the Farsi tests use, does not work
correctly. This also looks like missing data.

Collator collator = Collator.getInstance(new Locale("ar"));

Are there any exceptions here? Or does the test exercise the collator with compare()?
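A rough way to see whether Arabic collation data is being picked up is to compare the Arabic collator's rules with the root collator's and to call `compare()` directly. The sketch below is hypothetical (`ArabicCollatorCheck` and `hasLocaleSpecificRules` are illustrative names) and rests on an assumption: that when the JRE finds no Arabic collation data it falls back to the default rules, so identical rule strings would suggest the data is missing. The heuristic only applies when both collators are `RuleBasedCollator` instances.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical diagnostic, not Lucene test code.
final class ArabicCollatorCheck {

    /** Negative, zero or positive, as the locale's collator orders a and b. */
    static int compare(Locale locale, String a, String b) {
        return Collator.getInstance(locale).compare(a, b);
    }

    /**
     * Heuristic (assumption): if the locale's collator has exactly the same rules
     * as the root collator, the JRE probably fell back to the default rules
     * because no locale-specific collation data was found.
     */
    static boolean hasLocaleSpecificRules(Locale locale) {
        Collator c = Collator.getInstance(locale);
        Collator root = Collator.getInstance(Locale.ROOT);
        if (c instanceof RuleBasedCollator && root instanceof RuleBasedCollator) {
            String rules = ((RuleBasedCollator) c).getRules();
            String rootRules = ((RuleBasedCollator) root).getRules();
            return !rules.equals(rootRules);
        }
        // Not rule-based (e.g. from another provider): this heuristic cannot tell,
        // so report true and flag only a definite fallback
        return true;
    }

    public static void main(String[] args) {
        Locale ar = Locale.forLanguageTag("ar");
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("ar has locale-specific rules: " + hasLocaleSpecificRules(ar));
        // ARABIC LETTER ALEF (U+0627) versus ARABIC LETTER BEH (U+0628)
        System.out.println("compare(alef, beh) = " + compare(ar, "\u0627", "\u0628"));
    }
}
```

If `hasLocaleSpecificRules` prints false on build 111 but true on build 110, that would point at the collation data rather than at the Lucene test.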


-Alan
