On 26/03/2016 11:56, Uwe Schindler wrote:
Hi,
after also testing the separate "Jigsaw" build on jdk9.java.net I see the same
problems. So both builds 111 are wrong.
To me it looks like the Unicode data files are missing some information - which
could again be a packaging bug. As said before, build 110 does not have this
problem, so it seems to be a side-effect of Jigsaw merging.
The following stuff does not work:
(1) The Thai locale does not have a working dictionary-based BreakIterator
available. The following "check" in Lucene for this fails, because it cannot
detect a boundary correctly:
  /**
   * True if the JRE supports a working dictionary-based breakiterator for Thai.
   * If this is false, this tokenizer will not work at all!
   */
  public static final boolean DBBI_AVAILABLE;
  private static final BreakIterator proto = BreakIterator.getWordInstance(new Locale("th"));
  static {
    // check that we have a working dictionary-based break iterator for thai
    proto.setText("ภาษาไทย");
    DBBI_AVAILABLE = proto.isBoundary(4);
  }
After this static initializer, DBBI_AVAILABLE is false. This causes some tests
to be ignored, but 2 fail because of it (which might be an oversight on our
side). Nevertheless, this is a bug in build 111.
I just tried to duplicate this on OSX and Linux without success. The log
you linked to suggests this is Linux, is that right? Is this the JDK
bundle? I haven't checked the JRE bundle but would be surprised if anything
is missing. The JDK has several tests for Thai, so if it were completely
broken I would have expected it to have been seen. I have no doubt
that it is not working in your environment; we just need to figure out
what is different.
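A standalone check along the following lines might help compare environments. It is only a sketch: the class name ThaiBreakCheck and its helper are made up for illustration and are not part of Lucene or the JDK tests. It repeats the isBoundary(4) probe from the quoted code, lists every word boundary the iterator reports, and prints whether "th" appears in BreakIterator.getAvailableLocales(). The sample string is written with Unicode escapes so that source file encoding cannot affect the result.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical standalone reproducer; not taken from Lucene or the JDK test suite.
class ThaiBreakCheck {
    // Same Thai sample string as the quoted Lucene check, as Unicode escapes.
    static final String SAMPLE = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";

    // Collects every boundary offset the word iterator reports for the text.
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same locale as new Locale("th") in the quoted code.
        Locale thai = Locale.forLanguageTag("th");
        BreakIterator proto = BreakIterator.getWordInstance(thai);
        proto.setText(SAMPLE);

        System.out.println("java.version     = " + System.getProperty("java.version"));
        System.out.println("os.name          = " + System.getProperty("os.name"));
        System.out.println("th in available  = "
                + Arrays.asList(BreakIterator.getAvailableLocales()).contains(thai));
        System.out.println("boundaries       = " + boundaries(SAMPLE, thai));
        System.out.println("isBoundary(4)    = " + proto.isBoundary(4));
    }
}
```

Running this with both build 110 and build 111 on the same machine and comparing the output should show whether the difference lies in the locale data visible to the runtime or somewhere else.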
(2) The collator for the Arabic (Farsi) language fails to work correctly. This
also looks like missing data.
Collator collator = Collator.getInstance(new Locale("ar"));
Are there any exceptions or anything here? Or maybe it tests the
collator with compare?
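To make the question concrete, here is a rough sketch of the kind of check I have in mind. The class name ArabicCollatorCheck and the two sample letters (alef and beh) are my own choices for illustration; I do not know which strings the Lucene test actually compares. It reports whether "ar" is listed in Collator.getAvailableLocales(), prints the concrete Collator class, and prints the result of compare, without assuming what the right answer is.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical standalone check; the sample strings are assumptions, not Lucene's test data.
class ArabicCollatorCheck {
    // Compares two strings with the collator the runtime returns for the locale.
    static int compare(Locale locale, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        return collator.compare(a, b);
    }

    // True if the runtime lists the locale among its installed collator locales.
    static boolean hasCollatorData(Locale locale) {
        return Arrays.asList(Collator.getAvailableLocales()).contains(locale);
    }

    public static void main(String[] args) {
        // Same locale as new Locale("ar") in the quoted code.
        Locale arabic = Locale.forLanguageTag("ar");
        String alef = "\u0627"; // ARABIC LETTER ALEF
        String beh = "\u0628";  // ARABIC LETTER BEH

        System.out.println("java.version    = " + System.getProperty("java.version"));
        System.out.println("ar in available = " + hasCollatorData(arabic));
        System.out.println("collator class  = " + Collator.getInstance(arabic).getClass().getName());
        System.out.println("compare(alef, beh) = " + compare(arabic, alef, beh));
    }
}
```

If the output differs between build 110 and build 111, that would at least tell us whether the collator is falling back to different rules rather than throwing.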
-Alan