[ https://issues.apache.org/jira/browse/SOLR-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620999#comment-14620999 ]
Hoss Man commented on SOLR-7770:
--------------------------------
Misc comments from Uwe on the mailing list regarding this...
{quote}
I debugged the date parsing problems with a new test (TestDateUtil in solrj).
The failure has the following two causes, which are related (possibly even the same bug):
- https://bugs.openjdk.java.net/browse/JDK-8129881 is triggered: Tika uses Date#toString(), which inserts a broken time zone abbreviation into the resulting date. This cannot be parsed anymore! This happens all the time in the ROOT locale (see below).
- Solr uses Locale.ROOT to parse the date (of course, because it's language independent). In OpenJDK's CLDR locale data, this locale is missing all text representations of weekdays and time zones, so it cannot parse the weekday or the time zone. If I change DateUtil to use Locale.ENGLISH, it works as expected.
I will open a bug report at Oracle.
{quote}
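Uwe's observation that parsing works once Locale.ENGLISH is used instead of Locale.ROOT can be illustrated with plain SimpleDateFormat. This is a minimal sketch, assuming the input has the Date#toString() shape seen in the test failure quoted in the issue description; the class name DateToStringParser and its parse method are hypothetical and are not Solr's DateUtil code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not Solr's DateUtil.
final class DateToStringParser {
    // Pattern for the output of java.util.Date#toString(),
    // e.g. "Tue Mar 09 13:44:49 GMT+07:00 2010".
    static final String DATE_TOSTRING_PATTERN = "EEE MMM dd HH:mm:ss zzz yyyy";

    // Parses a Date#toString()-style string using Locale.ENGLISH weekday and
    // month names instead of Locale.ROOT. SimpleDateFormat is not thread-safe,
    // so a new instance is created per call.
    static Date parse(String text) {
        SimpleDateFormat fmt = new SimpleDateFormat(DATE_TOSTRING_PATTERN, Locale.ENGLISH);
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid Date String:'" + text + "'", e);
        }
    }

    public static void main(String[] args) {
        Date d = parse("Tue Mar 09 13:44:49 GMT+07:00 2010");
        // The GMT+07:00 offset is applied, so the instant is 06:44:49 UTC.
        System.out.println(d.toInstant()); // prints 2010-03-09T06:44:49Z
    }
}
```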
...
bq. I opened a report (Review ID: JI-9022158): "Change to CLDR Locale data in JDK 9 b71 causes SimpleDateFormat parsing errors"
...
{quote}
I think the real issue here is the following (Rory, can you add this to the issue?):
According to Unicode, all locales should fall back to the ROOT locale if the specific locale does not have data (e.g., http://cldr.unicode.org/development/development-process/design-proposals/generic-calendar-data). The problem now is that the CLDR Java implementation does seem to fall back to the root locale, but the root locale has no weekday names or short time zone names. Our test verifies this: the ROOT locale is missing all of this information. This causes all the bugs, including the one in https://bugs.openjdk.java.net/browse/JDK-8129881. The root locale should have the default English weekday and time zone names (see http://cldr.unicode.org/development/development-process/design-proposals/generic-calendar-data).
I think the ROOT locale and the fallback mechanism in the JDK's CLDR implementation should be revisited; there seems to be a bug there (either data is missing or the fallback to defaults does not work correctly).
{quote}
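The gap between ROOT and English locale data described above can be inspected by printing the short weekday, month, and time zone names a locale exposes through java.text.DateFormatSymbols and java.util.TimeZone. This is a hypothetical diagnostic sketch, not the TestDateUtil test mentioned on the mailing list; the class and method names are made up, and the output for Locale.ROOT depends on the JDK build and the java.locale.providers setting, so no expected values are given for it.

```java
import java.text.DateFormatSymbols;
import java.util.Arrays;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical diagnostic, not Solr's TestDateUtil.
final class LocaleSymbolCheck {
    // True if the locale's short weekday names contain `name` (e.g. "Tue").
    // getShortWeekdays() is indexed by Calendar.SUNDAY..SATURDAY; index 0 is "".
    static boolean hasShortWeekday(Locale locale, String name) {
        return Arrays.asList(DateFormatSymbols.getInstance(locale).getShortWeekdays())
                .contains(name);
    }

    // True if the locale's short month names contain `name` (e.g. "Mar").
    static boolean hasShortMonth(Locale locale, String name) {
        return Arrays.asList(DateFormatSymbols.getInstance(locale).getShortMonths())
                .contains(name);
    }

    public static void main(String[] args) {
        TimeZone ny = TimeZone.getTimeZone("America/New_York");
        for (Locale locale : new Locale[] {Locale.ROOT, Locale.ENGLISH}) {
            DateFormatSymbols sym = DateFormatSymbols.getInstance(locale);
            // Output for ROOT varies with the JDK's locale data providers.
            System.out.println("locale='" + locale + "'"
                    + " weekdays=" + Arrays.toString(sym.getShortWeekdays())
                    + " months=" + Arrays.toString(sym.getShortMonths())
                    + " zone=" + ny.getDisplayName(false, TimeZone.SHORT, locale)
                    + " hasTue=" + hasShortWeekday(locale, "Tue"));
        }
    }
}
```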
from Balchandra...
bq. Here is the JBS id: https://bugs.openjdk.java.net/browse/JDK-8130845
> Date field problems using ExtractingRequestHandler and java 9 (b71)
> -------------------------------------------------------------------
>
> Key: SOLR-7770
> URL: https://issues.apache.org/jira/browse/SOLR-7770
> Project: Solr
> Issue Type: Bug
> Reporter: Hoss Man
>
> Tracking bug to note that the (Tika based) ExtractingRequestHandler will not
> work properly with jdk9 starting with build71.
> This first manifested itself with failures like this from the tests...
> {noformat}
> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=ExtractingRequestHandlerTest -Dtests.method=testArabicPDF -Dtests.seed=232D0A5404C2ADED -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=en_JM -Dtests.timezone=Etc/GMT-7 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> [junit4] ERROR 0.58s | ExtractingRequestHandlerTest.testArabicPDF <<<
> [junit4] > Throwable #1: org.apache.solr.common.SolrException: Invalid Date String:'Tue Mar 09 13:44:49 GMT+07:00 2010'
> {noformat}
> Workaround noted by Uwe...
> {quote}
> The test passes on JDK 9 b71 with:
> -Dargs="-Djava.locale.providers=JRE,SPI"
> This re-enables the old locale data. I will add this to the build parameters of Policeman Jenkins to stop this from failing. To me it looks like the locale data somehow cannot correctly parse weekdays and/or time zones. I will check this out tomorrow and report a bug to the OpenJDK people. There is something fishy with the CLDR locale data.
> There are already some bugs open, so the work is not yet finished (e.g., sometimes it uses wrong time zone abbreviations, ...)
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)