Alaksiej Ščarbaty created NIFI-16061:
----------------------------------------
Summary: JsonRecordSetWriter promotes quoted JSON strings to
numbers when field schema is CHOICE(INT, STRING)
Key: NIFI-16061
URL: https://issues.apache.org/jira/browse/NIFI-16061
Project: Apache NiFi
Issue Type: Improvement
Components: Extensions
Affects Versions: 2.10.0
Reporter: Alaksiej Ščarbaty
h3. Description
When a JSON field carries a quoted string value in one record and a bare number
in another, JsonRecordSetWriter writes the quoted string as a bare number.
h3. Root cause
Schema inference sees `TextNode("42")` as STRING and `IntNode(7)` as INT, so
`FieldTypeInference` merges the field to `CHOICE(INT, STRING)`. At write time,
`DataTypeUtils.findMostSuitableType` sorts candidates by `RecordFieldType` enum
ordinal (INT=3 before STRING=13) and returns the first type the string value is
convertible to. Because `"42"` is convertible to INT, the string is silently
coerced to a number.
The same issue applies to any type narrower than STRING that appears in a
CHOICE, including BOOLEAN: `"false"` is promoted to bare `false`.
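The selection logic described above can be modelled without any NiFi classes. The following is only a minimal sketch under stated assumptions: `FieldType`, `isConvertible` and `firstConvertibleByOrdinal` are hypothetical stand-ins, not the real `RecordFieldType` or `DataTypeUtils` code, and the enum keeps only the relative order (BOOLEAN, then INT, then STRING) that matters for this report.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Simplified model of "sort CHOICE candidates by enum ordinal, take the first
// convertible one". All names here are hypothetical, not NiFi API.
final class ChoiceNarrowingSketch {

    // Stand-in for RecordFieldType; only the relative order is relevant.
    enum FieldType { BOOLEAN, INT, STRING }

    // Assumed, simplified convertibility rules for the three modelled types.
    static boolean isConvertible(Object value, FieldType type) {
        switch (type) {
            case BOOLEAN:
                if (value instanceof Boolean) {
                    return true;
                }
                return value instanceof String
                        && (((String) value).equalsIgnoreCase("true")
                        || ((String) value).equalsIgnoreCase("false"));
            case INT:
                if (value instanceof Integer) {
                    return true;
                }
                if (value instanceof String) {
                    try {
                        Integer.parseInt((String) value);
                        return true;
                    } catch (NumberFormatException e) {
                        return false;
                    }
                }
                return false;
            case STRING:
                return true; // anything can be rendered as a string
            default:
                return false;
        }
    }

    // Narrower types have lower ordinals, so they win whenever the value
    // happens to be convertible, regardless of how the value was written.
    static Optional<FieldType> firstConvertibleByOrdinal(Object value, List<FieldType> choices) {
        return choices.stream()
                .sorted(Comparator.comparingInt(FieldType::ordinal))
                .filter(type -> isConvertible(value, type))
                .findFirst();
    }

    public static void main(String[] args) {
        List<FieldType> intOrString = List.of(FieldType.STRING, FieldType.INT);
        List<FieldType> boolOrString = List.of(FieldType.STRING, FieldType.BOOLEAN);

        System.out.println(firstConvertibleByOrdinal("42", intOrString).get());     // INT
        System.out.println(firstConvertibleByOrdinal(7, intOrString).get());        // INT
        System.out.println(firstConvertibleByOrdinal("abc", intOrString).get());    // STRING
        System.out.println(firstConvertibleByOrdinal("false", boolOrString).get()); // BOOLEAN
    }
}
```

In this model the quoted `"42"` and the bare `7` both resolve to INT, which matches the reported output. Only a string that fails numeric parsing, such as `"abc"`, stays STRING.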
h3. Steps to reproduce
Run any flow that uses JsonTreeReader and JsonRecordSetWriter with an inferred
schema on these records:
{code:java}
{"val":"42"}
{"val":7}{code}
Expected output:
{code:java}
[{"val":"42"},{"val":7}]{code}
Actual output:
{code:java}
[{"val":42},{"val":7}]{code}
h3. Open questions
Is implicit type narrowing desired by default?
Should we avoid type narrowing in these situations and adhere to the actual
field type, or at least make this behavior configurable?
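To make the second question concrete, here is one possible shape for a configurable behavior, offered as a sketch and not as a proposed patch. Everything in it is hypothetical: the `Policy` enum, the `naturalType` helper and the `choose` method do not exist in NiFi, and the three-value `FieldType` enum is a stand-in for `RecordFieldType`. With `PRESERVE`, the value's own runtime type is used whenever the CHOICE contains it. With `NARROW`, the ordinal-first selection is kept.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a configurable narrowing policy; not NiFi API.
final class NarrowingPolicySketch {

    enum FieldType { BOOLEAN, INT, STRING }

    // NARROW keeps ordinal-first selection; PRESERVE prefers the value's own type.
    enum Policy { NARROW, PRESERVE }

    // Maps a raw value to the type it was actually read as (assumed mapping).
    static FieldType naturalType(Object value) {
        if (value instanceof Boolean) {
            return FieldType.BOOLEAN;
        }
        if (value instanceof Integer) {
            return FieldType.INT;
        }
        return FieldType.STRING;
    }

    // Assumed, simplified convertibility rules for the three modelled types.
    static boolean isConvertible(Object value, FieldType type) {
        if (type == FieldType.STRING || naturalType(value) == type) {
            return true;
        }
        if (!(value instanceof String)) {
            return false;
        }
        String text = (String) value;
        if (type == FieldType.BOOLEAN) {
            return text.equalsIgnoreCase("true") || text.equalsIgnoreCase("false");
        }
        try { // remaining case: INT
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static FieldType choose(Object value, List<FieldType> choices, Policy policy) {
        FieldType natural = naturalType(value);
        if (policy == Policy.PRESERVE && choices.contains(natural)) {
            return natural; // keep the type the value was read as
        }
        // Fall back to ordinal-first selection among convertible candidates.
        return choices.stream()
                .sorted(Comparator.comparingInt(FieldType::ordinal))
                .filter(type -> isConvertible(value, type))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("No suitable type for " + value));
    }

    public static void main(String[] args) {
        List<FieldType> intOrString = List.of(FieldType.INT, FieldType.STRING);
        System.out.println(choose("42", intOrString, Policy.NARROW));   // INT
        System.out.println(choose("42", intOrString, Policy.PRESERVE)); // STRING
        System.out.println(choose(7, intOrString, Policy.PRESERVE));    // INT
    }
}
```

Under `PRESERVE` the two records from the reproduction would keep their original shapes, while `NARROW` reproduces the current output. Which of the two should be the default is the open question.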