[ 
https://issues.apache.org/jira/browse/SOLR-13580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-13580:
----------------------------
    Description: 
Per [JDK-8221432|https://bugs.openjdk.java.net/browse/JDK-8221432] Java13 has 
updated to [CLDR 35.1|http://cldr.unicode.org/] – which controls the definition 
of language & locale specific formatting characters – in a non-backwards 
compatible way due to "French" changes in [CLDR 
34|http://cldr.unicode.org/index/downloads/cldr-34#TOC-Detailed-Data-Changes]

This impacts people who use any of the "ParseNumeric" UpdateProcessors in 
conjunction with the "locale=fr" or "locale=fr_FR" init param and expect the 
existing (pre java13) behavior of treating U+00A0 (NO-BREAK SPACE) as a 
"grouping" character (i.e., between thousands and millions, between millions 
and billions, etc.). Starting with java13 the JVM expects U+202F (NARROW 
NO-BREAK SPACE) in its place.
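
To check which grouping character a given JVM actually uses for French, a small 
probe along these lines may help. This is only a sketch: the class and method 
names (FrenchGroupingProbe, groupingSeparator, codePointLabel) are made up for 
illustration and are not part of Solr.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper class, not part of Solr.
final class FrenchGroupingProbe {

    /** Returns the grouping separator the running JVM reports for the locale. */
    static char groupingSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getGroupingSeparator();
    }

    /** Renders a char as a code point label such as U+00A0. */
    static String codePointLabel(char c) {
        return String.format(Locale.ROOT, "U+%04X", (int) c);
    }

    public static void main(String[] args) {
        // The printed separator depends on the JVM version and its locale data.
        System.out.println("java.version   = " + System.getProperty("java.version"));
        System.out.println("fr_FR grouping = "
            + codePointLabel(groupingSeparator(Locale.FRANCE)));
    }
}
```

Running this on the old and the new JVM before upgrading should show whether the 
separator differs between them in a given environment.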

Notably: upgrading to jdk13-ea+26 caused failures in Solr's 
ParsingFieldUpdateProcessorsTest, which initially had hardcoded test data 
that used U+00A0. ParsingFieldUpdateProcessorsTest has since been updated to 
account for this discrepancy by modifying the test data to use the 
"expected" character for the current JVM, but there is nothing Solr or the 
ParseNumeric UpdateProcessors can do to help mitigate this change in behavior 
for end users who upgrade to java13.
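
As a rough illustration of what "determine the expected character for the 
current JVM" can look like, the sketch below builds its sample input from the 
separator the JVM reports, instead of hardcoding U+00A0. It is not the actual 
code in ParsingFieldUpdateProcessorsTest; JvmAwareTestData and its methods are 
hypothetical names.

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical sketch, not the real Solr test code.
final class JvmAwareTestData {

    /** Builds "10<sep>898" using the grouping separator this JVM reports. */
    static String groupedSample(Locale locale) {
        char sep = DecimalFormatSymbols.getInstance(locale).getGroupingSeparator();
        return "10" + sep + "898";
    }

    /** Parses the sample with the same locale; fails if not fully consumed. */
    static long parseSample(Locale locale) {
        String input = groupedSample(locale);
        ParsePosition pos = new ParsePosition(0);
        Number parsed = NumberFormat.getInstance(locale).parse(input, pos);
        if (parsed == null || pos.getIndex() != input.length()) {
            throw new IllegalStateException("not fully parsed: " + input);
        }
        return parsed.longValue();
    }

    public static void main(String[] args) {
        // The separator and the parser come from the same JVM locale data,
        // so the sample is consumed in full on either side of the change.
        System.out.println(parseSample(Locale.FRANCE));
    }
}
```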

Affected users with U+00A0 characters in their incoming SolrInputDocuments will 
see the ParseNumeric UpdateProcessors (configured with locale=fr...) "skip" 
these values as unparsable, most likely resulting in a failure to index into a 
numeric field since the original "String" value will be left as is.
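
The "skip" behavior can be pictured with a whole-string parse check like the 
one below. This is a simplified sketch of the idea, not the UpdateProcessor 
source: parseOrKeep and formatWithGrouping are hypothetical names, and the 
grouping separator is set explicitly so the example does not depend on the 
locale data of the JVM it runs on.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical sketch, not the real UpdateProcessor implementation.
final class WholeStringParseSketch {

    /** Returns a Number only if the whole input was consumed, else the input. */
    static Object parseOrKeep(String value, NumberFormat format) {
        ParsePosition pos = new ParsePosition(0);
        Number number = format.parse(value, pos);
        if (number == null || pos.getIndex() != value.length()) {
            return value; // left "as is", still a String
        }
        return number;
    }

    /** Assumption for the demo: a format with a fixed grouping separator. */
    static NumberFormat formatWithGrouping(char groupingSeparator) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(groupingSeparator);
        return new DecimalFormat("#,##0.###", symbols);
    }

    public static void main(String[] args) {
        NumberFormat java13Like = formatWithGrouping('\u202F');
        // Input using the separator the format expects.
        System.out.println(
            parseOrKeep("10\u202F898", java13Like).getClass().getSimpleName());
        // Input using the older U+00A0 separator.
        System.out.println(
            parseOrKeep("10\u00A0898", java13Like).getClass().getSimpleName());
    }
}
```

A value that comes back as a String here corresponds to the case where a 
document would later fail to index into a numeric field.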
  

  was:
ParsingFieldUpdateProcessorsTest has uncovered a JDK 13-ea+26 bug when dealing 
with the fr_FR Locale (which may affect other locales as well) which causes the 
grouping separator ( U+00A0 in fr_FR ) to be ignored when parsing, treating 
it as a termination character -- example: "10 898" is parsed as "10" instead 
of "10898", leaving the " 898" portion of the string unparsed.

The way the ParseNumeric UpdateProcessors are implemented, the fact that the 
NumberFormat instance does not recognize the entire string as a Number results 
in the String value being left "as is" in the input documents.

In ParsingFieldUpdateProcessorsTest this has manifested as jenkins failures 
like this...

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=ParsingFieldUpdateProcessorsTest 
-Dtests.method=testParseFloatNonRootLocale -Dtests.seed=AE6C840917DD963B 
-Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=us 
-Dtests.timezone=GMT -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] FAILURE 0.03s | 
ParsingFieldUpdateProcessorsTest.testParseFloatNonRootLocale <<<
   [junit4]    > Throwable #1: java.lang.AssertionError
   [junit4]    >        at 
__randomizedtesting.SeedInfo.seed([AE6C840917DD963B:B5B079D8B7786A26]:0)
   [junit4]    >        at 
org.apache.solr.update.processor.ParsingFieldUpdateProcessorsTest.testParseFloatNonRootLocale(ParsingFieldUpdateProcessorsTest.java:471)
   [junit4]    >        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]    >        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]    >        at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]    >        at 
java.base/java.lang.reflect.Method.invoke(Method.java:567)
   [junit4]    >        at java.base/java.lang.Thread.run(Thread.java:830)
{noformat}

        Summary: java 13 changes to locale specific Numeric parsing rules 
affect ParseNumeric UpdateProcessors when using 'locale' config option - 
notably affects French  (was: java 13-ea NumberFormat.parse bugs in some 
Locales, affects ParseNumeric UpdateProcessors when using the 'locale' config 
option)

Updated summary & description to be helpful to end users who might see a change 
in behavior and think there is a bug in the UpdateProcessors.

> java 13 changes to locale specific Numeric parsing rules affect ParseNumeric 
> UpdateProcessors when using 'locale' config option - notably affects French
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13580
>                 URL: https://issues.apache.org/jira/browse/SOLR-13580
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>            Priority: Major
>              Labels: Java13
>         Attachments: SOLR-13580.patch
>


