[ 
https://issues.apache.org/jira/browse/DAFFODIL-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Beckerle updated DAFFODIL-2218:
------------------------------------
    Description: 
 

ICU changed behavior: with textNumberCheckPolicy lax, parsing now accepts + 
signs even when the pattern does not specify them. It did not before.

Daffodil should revert to the latest ICU version that doesn't have this 
problem.

We likely have to determine which ICU version introduced the change, and back 
out to a prior one.
This from a DFDL Workgroup email thread on this subject:
{code:java}
Re: [DFDL-WG] Action 313: Plus '+' sign and lax textNumberCheckPolicy

Steve Hanson <s...@uk.ibm.com> Fri, Aug 30, 10:56 AM
to me, slawrence, DFDL-WG, Liam

ICU changing behaviour in an incompatible way is not good.

IBM DFDL is way behind, and is still on ICU 51.2.  We are limited in what we
can do as we try to keep the same level as IBM Integration Bus & WTX as we
have had C namespacing issues in the past.

Looking at the links, there are other changes that have crept in when lenient.

- The string must contain a complete prefix and suffix.
  For example, if the pattern is "{#};(#)", then "{123}" or "(123)" would
  match, but "{123", "123}", and "123" would all fail.
  (The latter strings would be accepted in lenient mode.)

- Minus and plus signs can only appear if specified in the pattern.
  In lenient mode, a plus or minus sign can always precede a number.

In typical ICU fashion, even this is not complete. It says nothing about what
happens if the pattern has a sign and the data doesn't.

I suggest you test all the combos with Daffodil and establish the truth.

Then we need to decide what to do. If there is no way of controlling this
(e.g., parameter or env var) then the safest option is to back off Daffodil to
the latest ICU release that matches the DFDL 1.0 spec, and change the spec so
that the link to ICU is specific rather than the generic link which is in the
spec today
(http://www.icu-project.org/apiref/icu4c/classDecimalFormat.html#_details)
and which floats to the latest release. We can't have a moving target.

Regards

Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
s...@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday

From:    Mike Beckerle <mbeckerle.d...@gmail.com>
To:      DFDL-WG <dfdl...@ogf.org>
Date:    29/08/2019 19:49
Subject: [DFDL-WG] Action 313: Plus '+' sign and lax textNumberCheckPolicy
Sent by: "dfdl-wg" <dfdl-wg-boun...@ogf.org>

Looks like ICU changed behavior....

From: Steve Lawrence <slawre...@apache.org>
Sent: Thursday, August 29, 2019 1:30 PM
To: us...@daffodil.apache.org
Subject: Re: Plus '+' sign and lax textNumberCheckPolicy - was: Re: How
to model a fixed-length integer that may be padded with space on the left?

I think this is a difference in ICU version?

A little grepping through ICU source, I found a change [1] to their
number parsing logic in Dec 2017:

+        if (!isStrict) {
+            parser.addMatcher(WhitespaceMatcher.getInstance());
+            parser.addMatcher(new PlusSignMatcher());
+        }

That looks to me like a change to make it so plus signs are always
matched in lax/lenient mode regardless of the pattern (Daffodil's current
behavior). A couple minor changes have been made to that section, but
nothing that allows you to turn it off if lenient is on.

It's hard to tell in the git history what release that was in, but it
looks like around version 61, which is relatively new (Daffodil is on
version 62).

Also, the latest version of DecimalFormatProperties.java (looks to be an
internal implementation, so no online javadocs), has javadocs that
state that plus signs are always allowed in lenient/lax mode [2].

I think this is a change in ICU behavior in newer versions.

- Steve

[1] https://github.com/unicode-org/icu/commit/68340c8464bd988477d6c88f46f9dfe4562a6d02#diff-565b07c255337881b4e06f766691667cR119-R122
[2] https://github.com/unicode-org/icu/blob/master/icu4j/main/classes/core/src/com/ibm/icu/impl/number/DecimalFormatProperties.java#L53-L54

--
  dfdl-wg mailing list
  dfdl...@ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
{code}
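
To make the quoted change easier to reason about, here is a minimal, self-contained toy model of it. This is not ICU code: the Matcher interface, PLUS_SIGN, DIGITS, optionalMatchers and accepts are all hypothetical names invented for illustration, and the model only covers a pattern like "#" that has no sign of its own. It deliberately says nothing about the case where the pattern has a sign and the data doesn't, since the thread notes that case is undocumented and needs testing against the real library.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the matcher setup quoted in the thread. None of these types are ICU classes. */
public class LaxPlusSignSketch {

    /** Consumes a prefix of text starting at offset; returns the new offset, or -1 if nothing matched. */
    interface Matcher {
        int consume(String text, int offset);
    }

    /** Hypothetical stand-in for the PlusSignMatcher named in the quoted diff. */
    static final Matcher PLUS_SIGN = (text, offset) ->
            offset < text.length() && text.charAt(offset) == '+' ? offset + 1 : -1;

    /** Consumes one or more ASCII digits, standing in for the sign-less pattern "#". */
    static final Matcher DIGITS = (text, offset) -> {
        int i = offset;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        return i > offset ? i : -1;
    };

    /** Optional matchers tried before the digits, for a pattern that has no sign of its own. */
    static List<Matcher> optionalMatchers(boolean isStrict) {
        List<Matcher> matchers = new ArrayList<>();
        if (!isStrict) {
            // Mirrors the shape of the quoted change: lax mode always gets a plus-sign matcher.
            matchers.add(PLUS_SIGN);
        }
        return matchers;
    }

    /** Returns true if the whole text is consumed as an optional prefix followed by digits. */
    static boolean accepts(String text, boolean isStrict) {
        int offset = 0;
        for (Matcher m : optionalMatchers(isStrict)) {
            int next = m.consume(text, offset);
            if (next >= 0) {
                offset = next;
            }
        }
        return DIGITS.consume(text, offset) == text.length();
    }

    public static void main(String[] args) {
        // Print the 2x2 matrix of data (with and without '+') against strict and lax.
        for (String text : new String[] {"123", "+123"}) {
            for (boolean isStrict : new boolean[] {true, false}) {
                System.out.println("\"" + text + "\" strict=" + isStrict
                        + " accepted=" + accepts(text, isStrict));
            }
        }
    }
}
```

In this model "+123" is accepted only when isStrict is false, which is the behavior the thread describes for lax mode. Whether a given ICU release actually behaves this way still has to be established by running the combinations against that release.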



> ICU behavior incompatible - textNumberCheckPolicy lax is lax about "+" signs. 
> Was not before. 
> ----------------------------------------------------------------------------------------------
>
>                 Key: DAFFODIL-2218
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2218
>             Project: Daffodil
>          Issue Type: Bug
>            Reporter: Mike Beckerle
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
