[ https://issues.apache.org/jira/browse/MATH-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

尹茂椿萱 updated MATH-1688:
-----------------------
    Description: 
Description

ComplexFormat.parse exhibits inconsistent and undocumented behavior when 
parsing inputs containing commas.

Commas are silently ignored in numeric components, but not treated as 
structural separators. In addition, error handling behavior differs between 
versions (e.g., returning null vs throwing an exception), which further 
complicates usage.

---

Reproducible Example (Commons Math 4.0)

```java
import org.apache.commons.math4.legacy.util.ComplexFormat;

public class StringUtils {
    public static void main(String[] args) {
        ComplexFormat format = new ComplexFormat();

        System.out.println(format.parse(",,7+,,,2i"));   // (7.0, 2.0)
        System.out.println(format.parse(",8+,,3i"));     // (8.0, 3.0)

        System.out.println(format.parse(",7"));          // (7.0, 0.0)
        System.out.println(format.parse("7,,8"));        // (78.0, 0.0)
        System.out.println(format.parse("#7"));          // throws MathParseException
    }
}
```

---

Observed Behavior

- Commas are ignored when they appear inside numeric components:
  - ",,7" → 7
  - ",,,2" → 2

- As a result:
  - ",,7+,,,2i" → (7.0, 2.0)
  - ",8+,,3i" → (8.0, 3.0)

- Input such as "7,,8" is parsed as (78.0, 0.0).
  This shows that commas are not treated as delimiters between values;
  they are silently dropped during numeric parsing.

- Other invalid characters (e.g. '#') are not ignored:
  - "#7" results in a MathParseException

- Error handling differs from earlier versions:
  - In some earlier versions, invalid input may return null
  - In Commons Math 4.0, invalid input throws an exception

---

Expected Behavior

Parsing should be consistent and predictable:
 - Either commas are explicitly supported as separators, and this is documented
 - Or commas, like other invalid characters, cause parsing to fail uniformly

In particular:
 - If commas are treated as delimiters, "7,,8" should not collapse into 78
 - If commas are not valid syntax, inputs containing them should fail
   consistently

Additionally, error handling behavior should be clearly defined and consistent
across versions.
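One way to make failure reporting uniform on the caller side is sketched below, under stated assumptions: `java.text.NumberFormat` stands in for ComplexFormat, since its `parse(String, ParsePosition)` overload (the one CompositeFormat.parseNumber calls) returns null on failure rather than throwing. `UniformParse.parseFully` is a hypothetical helper, not part of Commons Math; it turns both a null result and unconsumed trailing input into a `java.text.ParseException`.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.Locale;

final class UniformParse {

    /**
     * Hypothetical helper: parses the whole of text or throws.
     * Never returns null, so callers see a single failure mode.
     */
    static Number parseFully(NumberFormat format, String text) throws ParseException {
        ParsePosition pos = new ParsePosition(0);
        Number result = format.parse(text, pos);
        if (result == null) {
            // ParsePosition-style failure: report where the parser gave up
            int at = pos.getErrorIndex() >= 0 ? pos.getErrorIndex() : pos.getIndex();
            throw new ParseException("Unparseable input: \"" + text + "\"", at);
        }
        if (pos.getIndex() != text.length()) {
            // Partial parse: trailing characters were left unconsumed
            throw new ParseException("Trailing input in: \"" + text + "\"", pos.getIndex());
        }
        return result;
    }

    public static void main(String[] args) {
        NumberFormat format = NumberFormat.getInstance(Locale.US);
        for (String s : new String[] {"7", "#7", "7#", "7,,8"}) {
            try {
                System.out.println(s + " -> " + parseFully(format, s));
            } catch (ParseException e) {
                System.out.println(s + " -> error at index " + e.getErrorOffset());
            }
        }
    }
}
```

This only unifies how failures are reported. Because grouping is still enabled, "7,,8" is consumed in full and accepted as 78, so the comma ambiguity itself is not addressed by this wrapper.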

---

Root Cause Analysis

The behavior originates from CompositeFormat.parseNumber:

    Number number = format.parse(source, pos);

This delegates parsing to NumberFormat (typically DecimalFormat).

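DecimalFormat treats ',' as a grouping separator (when grouping is enabled and
',' is the locale's grouping separator, as in Locale.US) and skips it during
parsing, even when it is leading or repeated: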

    ",,7" → 7
    "7,,8" → 78

This is confirmed by observing that:
 - pos.getIndex() advances after parsing ",,7"
 - startIndex != endIndex, so parsing is considered successful
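The grouping behavior and the index check above can be reproduced with `java.text` alone. This is a minimal sketch, assuming a `Locale.US` number format (grouping enabled, ',' as the grouping separator); `GroupingParseDemo` and its `parse` helper are illustrative names, and the success test is a simplified re-creation of the `startIndex != endIndex` check, not the Commons Math source.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

final class GroupingParseDemo {

    /**
     * Parses text from index 0 and returns {value, endIndex},
     * or null when the parse position did not move (treated as failure).
     */
    static long[] parse(String text) {
        NumberFormat format = NumberFormat.getInstance(Locale.US); // grouping is on by default
        ParsePosition pos = new ParsePosition(0);
        int startIndex = pos.getIndex();
        Number number = format.parse(text, pos);
        int endIndex = pos.getIndex();
        if (startIndex == endIndex) {
            return null; // nothing consumed: parse failed
        }
        return new long[] {number.longValue(), endIndex};
    }

    public static void main(String[] args) {
        for (String s : new String[] {",,7", "7,,8", "#7"}) {
            long[] r = parse(s);
            System.out.println(s + " -> "
                    + (r == null ? "failure" : r[0] + " (end index " + r[1] + ")"));
        }
    }
}
```

On a standard JDK this should report 7 (end index 3) for ",,7" and 78 (end index 4) for "7,,8": every comma is consumed, so an index-based check treats both inputs as clean successes, while "#7" leaves the index at 0 and is reported as a failure.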

Therefore:
 - ',' is implicitly skipped inside numeric components by NumberFormat
 - whereas ComplexFormat's own structural parsing (the sign and the imaginary
   character) does not skip ','

---

Consequence

This leads to inconsistent parsing behavior:
 - ',' is ignored inside numeric values
 - ',' is not treated as a structural delimiter
 - ',' is rejected in structural positions (e.g. immediately before '+' or 'i';
   a comma right after '+' is absorbed by the imaginary part, as in ",,7+,,,2i")

As a result:
 - Parsing becomes context-dependent and non-intuitive
 - Malformed input may be silently accepted and misinterpreted
 - Behavior differs across versions (null vs exception)
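The asymmetry between numeric and structural positions can be illustrated with a deliberately simplified model of the parse sequence: real part, then '+' or '-', then imaginary part, then 'i'. `MiniComplexParser` is hypothetical and is not the actual ComplexFormat code; it omits whitespace handling, special values such as NaN, and custom imaginary characters, and it signals failure by returning null.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Arrays;
import java.util.Locale;

final class MiniComplexParser {

    /** Returns {re, im}, or null if the input does not fit "re[+|-]im i". */
    static double[] parse(String s) {
        NumberFormat nf = NumberFormat.getInstance(Locale.US); // lenient: skips ',' inside numbers
        ParsePosition pos = new ParsePosition(0);

        Number re = nf.parse(s, pos);                 // numeric position: commas absorbed
        if (re == null) {
            return null;
        }
        if (pos.getIndex() == s.length()) {
            return new double[] {re.doubleValue(), 0.0}; // real-only input
        }

        char op = s.charAt(pos.getIndex());           // structural position: no skipping
        if (op != '+' && op != '-') {
            return null;                              // e.g. "7,+2i" stops here on ','
        }
        pos.setIndex(pos.getIndex() + 1);

        Number im = nf.parse(s, pos);                 // numeric position again
        if (im == null) {
            return null;
        }
        if (pos.getIndex() >= s.length() || s.charAt(pos.getIndex()) != 'i') {
            return null;                              // e.g. "7+2,i" stops here on ','
        }
        double sign = (op == '+') ? 1.0 : -1.0;
        return new double[] {re.doubleValue(), sign * im.doubleValue()};
    }

    public static void main(String[] args) {
        for (String s : new String[] {",,7+,,,2i", "7,,8", "7,+2i", "7+2,i", "#7"}) {
            System.out.println(s + " -> " + Arrays.toString(parse(s)));
        }
    }
}
```

In this model ",,7+,,,2i" and "7,,8" succeed as (7, 2) and (78, 0), while "7,+2i" and "7+2,i" fail because the comma lands where a sign or 'i' is expected, which is the context-dependent pattern described in this section.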

---

Additional Notes

This behavior is not documented in the ComplexFormat API and may surprise users 
expecting strict parsing.

The issue arises from the interaction between:
 - a lenient numeric parser (NumberFormat)
 - and a stricter structural parser (ComplexFormat)

---

Possible Improvements

 - Disable grouping parsing in NumberFormat when used by ComplexFormat
 - Or explicitly handle separators at the ComplexFormat level
 - Or document the current behavior clearly

Providing a strict parsing mode could also help avoid ambiguity.
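The first suggested improvement can be tried in isolation with `NumberFormat.setGroupingUsed(false)`, a standard `java.text` method. This sketch only exercises `java.text`; it assumes, without checking the 4.0-beta1 API, that ComplexFormat could be given a number format configured this way. `NoGroupingDemo.describe` is an illustrative helper.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

final class NoGroupingDemo {

    /** Parses text from index 0 and describes the value and where parsing stopped. */
    static String describe(String text, boolean groupingUsed) {
        NumberFormat format = NumberFormat.getInstance(Locale.US);
        format.setGroupingUsed(groupingUsed); // false: ',' is no longer skipped
        ParsePosition pos = new ParsePosition(0);
        Number n = format.parse(text, pos);
        return n == null ? "failure" : n + " (stopped at index " + pos.getIndex() + ")";
    }

    public static void main(String[] args) {
        for (String s : new String[] {",,7", "7,,8"}) {
            System.out.println(s + " grouping on:  " + describe(s, true));
            System.out.println(s + " grouping off: " + describe(s, false));
        }
    }
}
```

With grouping disabled, ",,7" fails outright and "7,,8" stops at index 1, leaving the comma for the structural parser to reject instead of silently producing 78. The trade-off is that input with legitimate thousands separators, such as "1,000", would also stop at the first comma.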



> ComplexFormat.parse exhibits inconsistent behavior due to implicit comma 
> skipping by NumberFormat
> -------------------------------------------------------------------------------------------------
>
>                 Key: MATH-1688
>                 URL: https://issues.apache.org/jira/browse/MATH-1688
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 4.0-beta1
>            Reporter: 尹茂椿萱
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
