Alaksiej Ščarbaty created NIFI-16061:
----------------------------------------

             Summary: JsonRecordSetWriter promotes quoted JSON strings to 
numbers when field schema is CHOICE(INT, STRING)
                 Key: NIFI-16061
                 URL: https://issues.apache.org/jira/browse/NIFI-16061
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Extensions
    Affects Versions: 2.10.0
            Reporter: Alaksiej Ščarbaty


h3. Description

When a JSON field carries a quoted string value in one record and a bare number 
in another, JsonRecordSetWriter writes the quoted string as a bare number.
h3. Root cause

Schema inference sees `TextNode("42")` as STRING and `IntNode(7)` as INT, so 
`FieldTypeInference` merges the field to `CHOICE(INT, STRING)`. At write time, 
`DataTypeUtils.findMostSuitableType` sorts candidates by `RecordFieldType` enum 
ordinal (INT=3 before STRING=13) and returns the first type the string value is 
convertible to. Because `"42"` is convertible to INT, the string is silently 
coerced to a number.

The same issue applies to any type narrower than STRING that appears in a 
CHOICE, including BOOLEAN: `"false"` is promoted to bare `false`.
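
The selection rule described above can be illustrated with a small standalone model. This is only a sketch, not NiFi code: the `FieldType` enum, `isConvertible`, `isExactMatch`, and both `resolve*` methods are hypothetical stand-ins, and the enum's declaration order only mirrors the relative order mentioned above (BOOLEAN and INT before STRING), not the real `RecordFieldType` ordinals. `resolveExactFirst` shows one possible way to avoid narrowing, assuming that preferring the value's own runtime type is acceptable.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class ChoiceResolutionSketch {

    // Hypothetical, reduced stand-in for RecordFieldType (not the real enum).
    enum FieldType { BOOLEAN, INT, STRING }

    // Loose convertibility check: a String that parses as an int counts as INT,
    // and "true"/"false" strings count as BOOLEAN.
    static boolean isConvertible(Object value, FieldType type) {
        switch (type) {
            case BOOLEAN:
                if (value instanceof Boolean) return true;
                if (value instanceof String) {
                    String s = (String) value;
                    return s.equalsIgnoreCase("true") || s.equalsIgnoreCase("false");
                }
                return false;
            case INT:
                if (value instanceof Integer) return true;
                if (value instanceof String) {
                    try {
                        Integer.parseInt((String) value);
                        return true;
                    } catch (NumberFormatException e) {
                        return false;
                    }
                }
                return false;
            case STRING:
                return true; // any value can be rendered as a string
            default:
                return false;
        }
    }

    // Strict check: the value's runtime class already is the candidate type.
    static boolean isExactMatch(Object value, FieldType type) {
        switch (type) {
            case BOOLEAN: return value instanceof Boolean;
            case INT:     return value instanceof Integer;
            case STRING:  return value instanceof String;
            default:      return false;
        }
    }

    // Models the reported behavior: sort by ordinal, first convertible type wins.
    static Optional<FieldType> resolveByOrdinal(Object value, List<FieldType> choices) {
        return choices.stream()
                .sorted(Comparator.comparingInt(FieldType::ordinal))
                .filter(t -> isConvertible(value, t))
                .findFirst();
    }

    // Possible alternative: prefer an exact type match, then fall back.
    static Optional<FieldType> resolveExactFirst(Object value, List<FieldType> choices) {
        Optional<FieldType> exact = choices.stream()
                .filter(t -> isExactMatch(value, t))
                .findFirst();
        return exact.isPresent() ? exact : resolveByOrdinal(value, choices);
    }

    public static void main(String[] args) {
        List<FieldType> choice = Arrays.asList(FieldType.INT, FieldType.STRING);
        System.out.println(resolveByOrdinal("42", choice).get());  // INT (string narrowed)
        System.out.println(resolveExactFirst("42", choice).get()); // STRING (type preserved)
        System.out.println(resolveExactFirst(7, choice).get());    // INT
    }
}
```

In this model the ordinal-first rule narrows `"42"` to INT and `"false"` to BOOLEAN, while the exact-match-first rule keeps both as STRING and still resolves a bare `7` to INT.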
h3. Steps to reproduce

Use any flow with JsonTreeReader + JsonRecordSetWriter and inferred schema with 
records:
{code:java}
{"val":"42"}
{"val":7}{code}

Expected output:
{code:java}
[{"val":"42"},{"val":7}]{code}

Actual output:
{code:java}
[{"val":42},{"val":7}]{code}
 
h3. Open questions

Is implicit type narrowing desired by default? 

Should we avoid type narrowing in these situations and adhere to the actual 
field type? Or should this behavior at least be configurable?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
