[ https://issues.apache.org/jira/browse/DAFFODIL-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954683#comment-16954683 ]
Mike Beckerle commented on DAFFODIL-2218:
-----------------------------------------

Ah, if backing out is just trading bugs, then yeah, we should submit a patch to ICU and fix this bug once they have a release that incorporates the fix.

Looks to me like the ICU project has grown up and is now on GitHub, with a standard JIRA, etc. [http://site.icu-project.org/bugs]. Regular pull-request cycle for patches.

> ICU behavior incompatible - textNumberCheckPolicy lax is lax about "+" signs.
> Was not before.
> ----------------------------------------------------------------------------------------------
>
>                 Key: DAFFODIL-2218
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2218
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End, ICU, Libraries
>            Reporter: Mike Beckerle
>            Priority: Minor
>             Fix For: 2.5.0
>
>
> ICU libraries changed behavior and now strict behavior is being lax about +
> signs.
> Daffodil should revert back to the latest ICU version that doesn't have this
> problem.
> Likely we have to determine what ICU version this changed in, and back out to
> a prior one, as this new behavior is not implementing the DFDL spec behavior.
> See also https://issues.apache.org/jira/browse/DAFFODIL-845
> This from a DFDL Workgroup email thread on this subject:
> {code:java}
> Re: [DFDL-WG] Action 313: Plus '+' sign and lax textNumberCheckPolicy
> Steve Hanson <s...@uk.ibm.com> Fri, Aug 30, 10:56 AM
> to me, slawrence, DFDL-WG, Liam
>
> ICU changing behaviour in an incompatible way is not good.
>
> IBM DFDL is way behind, and is still on ICU 51.2. We are limited in what we
> can do as we try to keep the same level as IBM Integration Bus & WTX as we
> have had C namespacing issues in the past.
>
> Looking at the links, there are other changes that have crept in when
> lenient.
>
> - The string must contain a complete prefix and suffix.
>   For example, if the pattern is "{#};(#)", then "{123}" or "(123)" would
>   match, but "{123", "123}", and "123" would all fail.
>   (The latter strings would be accepted in lenient mode.)
> - Minus and plus signs can only appear if specified in the pattern.
>   In lenient mode, a plus or minus sign can always precede a number.
>
> In typical ICU fashion, even this is not complete. It says nothing about what
> happens if the pattern has a sign and the data doesn't.
>
> I suggest you test all the combos with Daffodil and establish the truth.
>
> Then we need to decide what to do. If there is no way of controlling this
> (eg, parameter or env var) then the safest option is to back off Daffodil to
> the latest ICU release that matches the DFDL 1.0 spec, and change the spec so
> that the link to ICU is specific rather than the generic link which is in the
> spec today
> (http://www.icu-project.org/apiref/icu4c/classDecimalFormat.html#_details)
> and which floats to the latest release. We can't have a moving target.
>
> Regards
>
> Steve Hanson
> IBM Hybrid Integration, Hursley, UK
> Architect, IBM DFDL
> Co-Chair, OGF DFDL Working Group
> s...@uk.ibm.com
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
>
> From:    Mike Beckerle <mbeckerle.d...@gmail.com>
> To:      DFDL-WG <dfdl...@ogf.org>
> Date:    29/08/2019 19:49
> Subject: [DFDL-WG] Action 313: Plus '+' sign and lax textNumberCheckPolicy
> Sent by: "dfdl-wg" <dfdl-wg-boun...@ogf.org>
>
> Looks like ICU changed behavior....
>
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Thursday, August 29, 2019 1:30 PM
> To: us...@daffodil.apache.org
> Subject: Re: Plus '+' sign and lax textNumberCheckPolicy - was: Re: How
> to model a fixed-length integer that may be padded with space on the left?
>
> I think this is a difference in ICU version?
>
> A little grepping through ICU source, I found a change [1] to their
> number parsing logic in Dec 2017:
>
> + if (!isStrict) {
> +     parser.addMatcher(WhitespaceMatcher.getInstance());
> +     parser.addMatcher(new PlusSignMatcher());
> + }
>
> That looks to me like a change to make it so plus signs are always
> matched in lax/lenient mode regardless of the pattern (Daffodil's current
> behavior). A couple minor changes have been made to that section, but
> nothing that allows you to turn it off if lenient is on.
>
> It's hard to tell in the git history what release that was in, but it
> looks like around version 61, which is relatively new (Daffodil is on
> version 62).
>
> Also, the latest version of DecimalFormatProperties.java (looks to be an
> internal implementation, so no online javadocs), has javadocs that
> state that plus signs are always allowed in lenient/lax mode [2].
>
> I think this is a change in ICU behavior in newer versions.
>
> - Steve
>
> [1]
> https://github.com/unicode-org/icu/commit/68340c8464bd988477d6c88f46f9dfe4562a6d02#diff-565b07c255337881b4e06f766691667cR119-R122
> [2]
> https://github.com/unicode-org/icu/blob/master/icu4j/main/classes/core/src/com/ibm/icu/impl/number/DecimalFormatProperties.java#L53-L54
>
> --
> dfdl-wg mailing list
> dfdl...@ogf.org
> https://www.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> 3AU
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
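
The plus-sign rule quoted in the thread can be laid out as a small test matrix of pattern, input, and strict/lenient mode. The Java below is only a sketch of that rule, not ICU or Daffodil code: the class `PlusSignPolicySketch`, the method `accepts`, and the two supported patterns (`#` and `+#`) are hypothetical names chosen for illustration. It models only an optional leading `+` followed by ASCII digits. The thread notes that the documentation says nothing about a pattern with a sign and data without one, so rejecting that case in strict mode is an assumption here, drawn from the complete-prefix rule.

```java
// Toy model of the plus-sign rule quoted in the thread. This is NOT ICU code:
// the class, the method, and the pattern handling are hypothetical and cover
// only an optional leading '+' followed by ASCII digits.
class PlusSignPolicySketch {

    // Returns true if `text` is accepted for `pattern` under this model.
    // `pattern` must be "#" or "+#"; anything else is rejected as unsupported.
    static boolean accepts(String pattern, String text, boolean strict) {
        if (!pattern.equals("#") && !pattern.equals("+#")) {
            throw new IllegalArgumentException("unsupported pattern: " + pattern);
        }
        boolean patternHasPlus = pattern.startsWith("+");
        boolean textHasPlus = text.startsWith("+");
        String digits = textHasPlus ? text.substring(1) : text;
        if (digits.isEmpty() || !digits.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return false; // not a plain run of digits after the optional sign
        }
        if (!strict) {
            // Lenient: a plus sign may always precede the number, and a
            // prefix named in the pattern may be missing from the data.
            return true;
        }
        // Strict: the sign appears exactly when the pattern specifies it.
        // Rejecting "123" for "+#" is an assumption taken from the
        // complete-prefix rule; the quoted documentation does not cover it.
        return patternHasPlus == textHasPlus;
    }

    public static void main(String[] args) {
        String[] patterns = {"#", "+#"};
        String[] inputs = {"123", "+123"};
        boolean[] modes = {true, false};
        // Print every combination so the matrix can be compared by eye.
        for (String pattern : patterns) {
            for (String text : inputs) {
                for (boolean strict : modes) {
                    System.out.printf("pattern=%-2s text=%-4s strict=%-5b -> %b%n",
                            pattern, text, strict, accepts(pattern, text, strict));
                }
            }
        }
    }
}
```

To establish what a given ICU release actually does, the same matrix would have to be run against the real parser, presumably by toggling `setParseStrict` on ICU4J's `DecimalFormat`, and the results compared with this model.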