[ https://issues.apache.org/jira/browse/NIFI-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18093061#comment-18093061 ]
David Handermann commented on NIFI-16061:
-----------------------------------------
Thanks for describing the current behavior [~awelless].
The current type narrowing is expected and preferred in general for {{CHOICE}}
record field types, as it attempts to provide a more consistent shape for
output record values, regardless of input format.
The {{Serialized JSON Input Handling}} property on the {{JsonRecordSetWriter}}
presents one possible solution, provided that the input meets the criteria for
that option.
Another possibility could be the introduction of a new configuration property,
but this seems more like a Schema Inference question. In other words, if the
output schema defined the field type as a String, that would avoid the type
narrowing.
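A minimal sketch of that idea, assuming a hypothetical inference rule that widens a field to STRING whenever any record carries a JSON string, instead of producing {{CHOICE(INT, STRING)}}. The class, enum and method names are illustrative only and are not part of the NiFi API.

{code:java}
import java.util.List;

public class StringWideningSketch {
    // Hypothetical stand-in for observed JSON value types (not NiFi's RecordFieldType).
    // Values in this model are assumed to be either Integer or String.
    enum ObservedType { INT, STRING }

    // Hypothetical merge rule (not NiFi's FieldTypeInference): if any record
    // carried a JSON string, widen the field to STRING instead of CHOICE(INT, STRING).
    static ObservedType mergeFieldType(List<Object> values) {
        for (Object value : values) {
            if (value instanceof String) {
                return ObservedType.STRING;
            }
        }
        return ObservedType.INT;
    }

    // Render one {"val": ...} record using the merged field type.
    // No JSON escaping is performed; this is only safe for the demo values.
    static String writeRecord(Object value, ObservedType fieldType) {
        if (fieldType == ObservedType.STRING) {
            // Every value is quoted, including the bare 7 from the second record.
            return "{\"val\":\"" + value + "\"}";
        }
        return "{\"val\":" + value + "}";
    }

    public static void main(String[] args) {
        List<Object> values = List.of("42", 7);
        ObservedType fieldType = mergeFieldType(values);
        StringBuilder out = new StringBuilder("[");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(writeRecord(values.get(i), fieldType));
        }
        out.append(']');
        System.out.println(out); // [{"val":"42"},{"val":"7"}]
    }
}
{code}

In this model the bare 7 is also written as a quoted "7", so the result differs from the reporter's expected output; the sketch assumes the writer coerces every value to the declared STRING type.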
> JsonRecordSetWriter promotes quoted JSON strings to numbers when field schema
> is CHOICE(INT, STRING)
> ----------------------------------------------------------------------------------------------------
>
> Key: NIFI-16061
> URL: https://issues.apache.org/jira/browse/NIFI-16061
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: 2.10.0
> Reporter: Alaksiej Ščarbaty
> Priority: Major
>
> h3. Description
> When a JSON field carries a quoted string value in one record and a bare
> number in another, the output incorrectly promotes the quoted string to a
> bare number.
> h3. Root cause
> Schema inference sees _TextNode("42")_ as STRING and _IntNode(7)_ as INT, so
> _FieldTypeInference_ merges the field to {_}CHOICE(INT, STRING){_}. At write
> time, _DataTypeUtils.findMostSuitableType_ sorts candidates by
> _RecordFieldType_ enum ordinal (INT=3 before STRING=13) and returns the first
> type the string value is convertible to. Because _"42"_ is convertible to
> INT, the string is silently coerced to a number.
> The same issue applies to any type narrower than STRING that appears in a
> CHOICE, including BOOLEAN: _"false"_ is promoted to bare {_}false{_}.
> h3. Steps to reproduce
> Use any flow with a _JsonTreeReader_ and a _JsonRecordSetWriter_, using an
> inferred schema, with the following records:
> {code:java}
> {"val":"42"}
> {"val":7}{code}
>
> Expected writer output:
> {code:java}
> [{"val":"42"},{"val":7}]{code}
> Actual output:
> {code:java}
> [{"val":42},{"val":7}]{code}
> h3. Open questions
> Is implicit type narrowing desired by default?
> Should we avoid type narrowing in these situations and preserve the actual
> field type, or at least make this behavior configurable?
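The root cause and reproduction described in the issue can be modelled in a short, self-contained Java sketch. It is not NiFi code: _FieldType_, _inferChoice_, _resolve_ and the convertibility rules are hypothetical stand-ins for _RecordFieldType_, _FieldTypeInference_ and _DataTypeUtils.findMostSuitableType_. The sketch assumes only what the description states: candidates are ordered by enum ordinal and the first type the value converts to wins.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ChoiceNarrowingSketch {
    // Hypothetical, simplified stand-in for RecordFieldType. Only the relative order
    // matters: narrower types come before STRING, like INT=3 before STRING=13.
    enum FieldType { BOOLEAN, INT, STRING }

    // Inference step: the type of each raw JSON value as the reader sees it.
    static FieldType inferType(Object value) {
        if (value instanceof Boolean) return FieldType.BOOLEAN;
        if (value instanceof Integer) return FieldType.INT;
        return FieldType.STRING;
    }

    // Merge step: the distinct observed types across records form the CHOICE candidates.
    static Set<FieldType> inferChoice(List<Object> values) {
        Set<FieldType> choice = EnumSet.noneOf(FieldType.class);
        for (Object value : values) {
            choice.add(inferType(value));
        }
        return choice;
    }

    // Simplified convertibility rules (assumed, not copied from DataTypeUtils).
    static boolean isConvertible(Object value, FieldType type) {
        switch (type) {
            case BOOLEAN:
                return value instanceof Boolean
                        || "true".equalsIgnoreCase(value.toString())
                        || "false".equalsIgnoreCase(value.toString());
            case INT:
                if (value instanceof Integer) return true;
                try {
                    Integer.parseInt(value.toString());
                    return true;
                } catch (NumberFormatException e) {
                    return false;
                }
            default:
                return true; // anything can be rendered as a string
        }
    }

    // Write-time resolution: sort candidates by ordinal, take the first convertible one.
    static FieldType resolve(Object value, Set<FieldType> choice) {
        List<FieldType> candidates = new ArrayList<>(choice);
        candidates.sort(Comparator.comparingInt(FieldType::ordinal));
        for (FieldType candidate : candidates) {
            if (isConvertible(value, candidate)) return candidate;
        }
        return FieldType.STRING;
    }

    static String write(List<Object> values) {
        Set<FieldType> choice = inferChoice(values);
        List<String> records = new ArrayList<>();
        for (Object value : values) {
            String rendered = resolve(value, choice) == FieldType.STRING
                    ? "\"" + value + "\""   // quoted JSON string (no escaping; demo values only)
                    : value.toString();      // bare JSON number or boolean
            records.add("{\"val\":" + rendered + "}");
        }
        return "[" + String.join(",", records) + "]";
    }

    public static void main(String[] args) {
        System.out.println(write(List.of("42", 7)));       // [{"val":42},{"val":7}]
        System.out.println(write(List.of("false", true))); // [{"val":false},{"val":true}]
    }
}
{code}

Because the hypothetical enum declares BOOLEAN and INT before STRING, any quoted value that parses as one of them is written bare, which reproduces both the {{"42"}} and {{"false"}} cases from the description.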