[ https://issues.apache.org/jira/browse/SOLR-13580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hoss Man updated SOLR-13580:
----------------------------
    Description: 
Per [JDK-8221432|https://bugs.openjdk.java.net/browse/JDK-8221432] Java 13 has updated to [CLDR 35.1|http://cldr.unicode.org/] – which controls the definition of language & locale specific formatting characters – in a non-backwards compatible way due to "French" changes in [CLDR 34|http://cldr.unicode.org/index/downloads/cldr-34#TOC-Detailed-Data-Changes]

This impacts people who use any of the "ParseNumeric" UpdateProcessors in conjunction with the "locale=fr" or "locale=fr_FR" init param and expect the existing (pre Java 13) behavior of treating U+00A0 (NO BREAK SPACE) as a "grouping" character (ie: between thousands and millions, between millions and billions, etc...). Starting with Java 13 the JVM expects U+202F (NARROW NO BREAK SPACE) in its place.

Notably: upgrading to jdk13-ea+26 caused failures in Solr's ParsingFieldUpdateProcessorsTest, which initially had hardcoded test data that used U+00A0. ParsingFieldUpdateProcessorsTest has since been updated to account for this discrepancy by modifying the test data to use the "expected" grouping character for the current JVM, but there is nothing Solr or the ParseNumeric UpdateProcessors can do to help mitigate this change in behavior for end users who upgrade to Java 13.

Affected users with U+00A0 characters in their incoming SolrInputDocuments will see the ParseNumeric UpdateProcessors (configured with locale=fr...) "skip" these values as unparsable, most likely resulting in a failure to index into a numeric field since the original "String" value will be left as is.

  was:
ParsingFieldUpdateProcessorsTest has uncovered a JDK 13-ea+26 bug when dealing with the fr_FR Locale (which may affect other locales as well) which causes the grouping separator ( U+00A0 in fr_FR ) to be ignored when parsing, treating it as a termination character -- example: "10 898" is parsed as "10" instead of "10898", leaving the " 898" portion of the string unparsed.
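The truncated parse described above can be reproduced outside Solr with a small standalone sketch. The class name PartialParseDemo and the helper parseWhole are hypothetical names chosen for illustration; this is not the code used by the ParseNumeric UpdateProcessors, only a minimal example of checking whether a NumberFormat consumed the whole input.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical demo class, not part of Solr.
class PartialParseDemo {

    /**
     * Returns the parsed Number only if the entire string was consumed,
     * otherwise null (the caller would then leave the String value as is).
     */
    static Number parseWhole(String value, Locale locale) {
        NumberFormat format = NumberFormat.getInstance(locale);
        ParsePosition pos = new ParsePosition(0);
        Number parsed = format.parse(value, pos);
        // parse() stops at the first character it does not recognize, so a
        // grouping character the JVM does not expect ends the number early.
        return (parsed != null && pos.getIndex() == value.length()) ? parsed : null;
    }

    public static void main(String[] args) {
        String noBreakSpace = "10\u00A0898";       // U+00A0 NO BREAK SPACE
        String narrowNoBreakSpace = "10\u202F898"; // U+202F NARROW NO BREAK SPACE

        // Which of these parses fully depends on the locale data shipped
        // with the running JVM, so no expected output is shown here.
        System.out.println("U+00A0: " + parseWhole(noBreakSpace, Locale.FRANCE));
        System.out.println("U+202F: " + parseWhole(narrowNoBreakSpace, Locale.FRANCE));
    }
}
```

Running this on a pre Java 13 JVM and on a Java 13 JVM should show which of the two inputs each JVM accepts as a complete number for fr_FR.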

The way the ParseNumeric UpdateProcessors are implemented, the fact that the NumberFormat instance does not recognize the entire string as a Number results in the String value being left "as is" in the input documents.

In ParsingFieldUpdateProcessorsTest this has manifested as jenkins failures like this...

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=ParsingFieldUpdateProcessorsTest -Dtests.method=testParseFloatNonRootLocale -Dtests.seed=AE6C840917DD963B -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=us -Dtests.timezone=GMT -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] FAILURE 0.03s | ParsingFieldUpdateProcessorsTest.testParseFloatNonRootLocale <<<
   [junit4]    > Throwable #1: java.lang.AssertionError
   [junit4]    > 	at __randomizedtesting.SeedInfo.seed([AE6C840917DD963B:B5B079D8B7786A26]:0)
   [junit4]    > 	at org.apache.solr.update.processor.ParsingFieldUpdateProcessorsTest.testParseFloatNonRootLocale(ParsingFieldUpdateProcessorsTest.java:471)
   [junit4]    > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]    > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]    > 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]    > 	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
   [junit4]    > 	at java.base/java.lang.Thread.run(Thread.java:830)
{noformat}

        Summary: java 13 changes to locale specific Numeric parsing rules affect ParseNumeric UpdateProcessors when using 'locale' config option - notably affects French  (was: java 13-ea NumberFormat.parse bugs in some Locales, affects ParseNumeric UpdateProcessors when using the 'locale' config option)

updated summary & description to be helpful to end users who might see a change in behavior and think there is a bug in the UpdateProcessors

> java 13 changes to locale specific Numeric parsing rules affect ParseNumeric
> UpdateProcessors when using 'locale' config option - notably affects French
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-13580
>                 URL: https://issues.apache.org/jira/browse/SOLR-13580
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>            Priority: Major
>              Labels: Java13
>         Attachments: SOLR-13580.patch
>
>
> Per [JDK-8221432|https://bugs.openjdk.java.net/browse/JDK-8221432] Java 13 has
> updated to [CLDR 35.1|http://cldr.unicode.org/] – which controls the
> definition of language & locale specific formatting characters – in a
> non-backwards compatible way due to "French" changes in [CLDR
> 34|http://cldr.unicode.org/index/downloads/cldr-34#TOC-Detailed-Data-Changes]
> This impacts people who use any of the "ParseNumeric" UpdateProcessors in
> conjunction with the "locale=fr" or "locale=fr_FR" init param and expect the
> existing (pre Java 13) behavior of treating U+00A0 (NO BREAK SPACE) as a
> "grouping" character (ie: between thousands and millions, between millions and
> billions, etc...). Starting with Java 13 the JVM expects U+202F (NARROW NO
> BREAK SPACE) in its place.
> Notably: upgrading to jdk13-ea+26 caused failures in Solr's
> ParsingFieldUpdateProcessorsTest, which initially had hardcoded test data
> that used U+00A0. ParsingFieldUpdateProcessorsTest has since been updated to
> account for this discrepancy by modifying the test data to use the
> "expected" grouping character for the current JVM, but there is nothing Solr
> or the ParseNumeric UpdateProcessors can do to help mitigate this change in
> behavior for end users who upgrade to Java 13.
> Affected users with U+00A0 characters in their incoming SolrInputDocuments
> will see the ParseNumeric UpdateProcessors (configured with locale=fr...)
> "skip" these values as unparsable, most likely resulting in a failure to
> index into a numeric field since the original "String" value will be left as
> is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
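
As a rough illustration of test data that uses the "expected" grouping character for the current JVM, the sketch below asks the running JVM for the fr_FR grouping separator instead of hardcoding U+00A0. The class and method names are hypothetical, and this is an assumption about the general approach, not the actual change made to ParsingFieldUpdateProcessorsTest.

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical demo class, not part of Solr.
class JvmGroupingSeparatorDemo {

    /** Grouping separator the running JVM uses for the given locale. */
    static char groupingSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getGroupingSeparator();
    }

    /** Builds "10<sep>898" with whatever separator this JVM expects. */
    static String groupedSample(Locale locale) {
        return "10" + groupingSeparator(locale) + "898";
    }

    /** Parses the JVM-specific sample; returns null if it was not fully consumed. */
    static Number parseSample(Locale locale) {
        String sample = groupedSample(locale);
        ParsePosition pos = new ParsePosition(0);
        Number parsed = NumberFormat.getInstance(locale).parse(sample, pos);
        return (parsed != null && pos.getIndex() == sample.length()) ? parsed : null;
    }

    public static void main(String[] args) {
        Locale fr = Locale.FRANCE;
        // Prints the code point so the invisible character can be identified.
        System.out.printf("fr_FR grouping separator on this JVM: U+%04X%n",
                (int) groupingSeparator(fr));
        System.out.println("parsed: " + parseSample(fr));
    }
}
```

Because the sample string is built from the JVM's own locale data, it should parse as a complete number regardless of which CLDR version the JVM ships with. This only helps test data; it does not change how incoming documents containing U+00A0 are handled.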