[ https://issues.apache.org/jira/browse/NIFI-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18093061#comment-18093061 ]

David Handermann commented on NIFI-16061:
-----------------------------------------

Thanks for describing the current behavior [~awelless].

In general, the current type narrowing is expected and preferred for {{CHOICE}} 
record field types, because it aims to provide a more consistent shape for 
output record values regardless of the input format.
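
As a rough illustration of that narrowing, the following Java sketch models the selection described under Root cause in the issue quoted below: the {{CHOICE}} members are ordered by enum ordinal and the first type the value can be converted to wins. The {{NarrowingSketch}} class, {{SimpleType}} enum and {{pickNarrowestType}} method are hypothetical simplifications, not NiFi's {{RecordFieldType}} or {{DataTypeUtils}}; only the relative order (narrower types before STRING) is modeled.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of ordinal-based narrowing for a CHOICE field.
// SimpleType is not NiFi's RecordFieldType; only the relative order (STRING last) is modeled.
class NarrowingSketch {

    enum SimpleType { BOOLEAN, INT, STRING }

    // True if the value's string form can be converted to the given type.
    static boolean isConvertible(SimpleType type, Object value) {
        String text = String.valueOf(value);
        switch (type) {
            case BOOLEAN:
                return text.equalsIgnoreCase("true") || text.equalsIgnoreCase("false");
            case INT:
                try {
                    Integer.parseInt(text);
                    return true;
                } catch (NumberFormatException e) {
                    return false;
                }
            default:
                return true; // every value has a string form
        }
    }

    // Sorts the CHOICE members by ordinal and returns the first convertible one.
    static Optional<SimpleType> pickNarrowestType(Object value, List<SimpleType> choice) {
        return choice.stream()
                .sorted(Comparator.comparingInt(SimpleType::ordinal))
                .filter(type -> isConvertible(type, value))
                .findFirst();
    }

    public static void main(String[] args) {
        List<SimpleType> intOrString = List.of(SimpleType.STRING, SimpleType.INT);
        System.out.println(pickNarrowestType("42", intOrString));    // INT: the quoted "42" is narrowed
        System.out.println(pickNarrowestType("hello", intOrString)); // STRING: not convertible to INT
        System.out.println(pickNarrowestType("false",
                List.of(SimpleType.STRING, SimpleType.BOOLEAN)));   // BOOLEAN
    }
}
```

In this model the chosen type depends on the content of the value rather than on how it was represented in the input, which is one way to read the consistency argument above.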

The {{Serialized JSON Input Handling}} property on the {{JsonRecordSetWriter}} 
offers one possible solution, provided that the input meets that property's 
criteria.

Another possibility would be a new configuration property, but this seems 
more like a Schema Inference question. In other words, if the output schema 
defined the field type as a String, the writer would not narrow the type.
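
To make the trade-off concrete, here is a minimal Java sketch assuming a simplified writer that renders a STRING field using the value's string form and a {{CHOICE(INT, STRING)}} field by trying INT first. The {{DeclaredTypeSketch}} class, {{Declared}} enum and {{renderValue}}/{{renderRecords}} methods are hypothetical names, not NiFi APIs, and the quoting performs no JSON escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of rendering one field value as a JSON literal, depending on
// the type declared in the output schema. Not NiFi's implementation; no JSON escaping.
class DeclaredTypeSketch {

    enum Declared { STRING, CHOICE_INT_STRING }

    static String renderValue(Declared declared, Object value) {
        String text = String.valueOf(value);
        if (declared == Declared.CHOICE_INT_STRING) {
            // INT is tried before STRING, so anything that parses as an int is narrowed.
            try {
                return Integer.toString(Integer.parseInt(text));
            } catch (NumberFormatException e) {
                // Not an int: fall through to the STRING rendering below.
            }
        }
        // STRING rendering: the value's string form in quotes.
        return "\"" + text + "\"";
    }

    // Renders records shaped like {"val": ...} as a JSON array.
    static String renderRecords(Declared declared, List<Object> values) {
        return values.stream()
                .map(v -> "{\"val\":" + renderValue(declared, v) + "}")
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        List<Object> values = List.of("42", 7);
        System.out.println(renderRecords(Declared.CHOICE_INT_STRING, values)); // [{"val":42},{"val":7}]
        System.out.println(renderRecords(Declared.STRING, values));            // [{"val":"42"},{"val":"7"}]
    }
}
```

Under this model a declared STRING type keeps {{"42"}} quoted, but it also writes the bare {{7}} as {{"7"}}, so the output becomes uniformly string-typed rather than mirroring each input record.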

> JsonRecordSetWriter promotes quoted JSON strings to numbers when field schema 
> is CHOICE(INT, STRING)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-16061
>                 URL: https://issues.apache.org/jira/browse/NIFI-16061
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 2.10.0
>            Reporter: Alaksiej Ščarbaty
>            Priority: Major
>
> h3. Description
> When a JSON field carries a quoted string value in one record and a bare 
> number in another, the output incorrectly promotes the quoted string to a 
> bare number.
> h3. Root cause
> Schema inference sees _TextNode("42")_ as STRING and _IntNode(7)_ as INT, so 
> _FieldTypeInference_ merges the field to {_}CHOICE(INT, STRING){_}. At write 
> time, _DataTypeUtils.findMostSuitableType_ sorts candidates by 
> _RecordFieldType_ enum ordinal (INT=3 before STRING=13) and returns the first 
> type the string value is convertible to. Because _"42"_ is convertible to 
> INT, the string is silently coerced to a number.
> The same issue applies to any type narrower than STRING that appears in a 
> CHOICE, including BOOLEAN: _"false"_ is promoted to bare {_}false{_}.
> h3. Steps to reproduce
> Use any flow with a _JsonTreeReader_ (inferred schema) and a 
> _JsonRecordSetWriter_, processing the following records:
> {code:java}
> {"val":"42"}
> {"val":7}{code}
>  
> Expected writer output: 
> {code:java}
> [{"val":"42"},{"val":7}]{code}
> Actual output: 
> {code:java}
> [{"val":42},{"val":7}]{code}
> h3. Open questions
> Is implicit type narrowing desired by default? 
> Should we avoid type narrowing in these situations and preserve the actual 
> field type, or at least make this behavior configurable?
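
For the open question of making this behavior configurable, one possible shape, sketched here in Java under the same simplifying assumptions as the Root cause description, is a flag that keeps the value's own type when the {{CHOICE}} already contains it and otherwise falls back to ordinal-based narrowing. The {{ChoiceResolutionSketch}} class, {{SimpleType}} enum, {{resolve}} method and {{preserveValueType}} flag are illustrative names only, not existing NiFi APIs or properties.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a configurable CHOICE resolution; none of these names come from NiFi.
class ChoiceResolutionSketch {

    // Simplified stand-in for record field types; only the relative order (STRING last) matters.
    enum SimpleType { BOOLEAN, INT, STRING }

    // The type a Java value already carries, if it maps to one of the modeled types.
    static Optional<SimpleType> actualType(Object value) {
        if (value instanceof Boolean) return Optional.of(SimpleType.BOOLEAN);
        if (value instanceof Integer) return Optional.of(SimpleType.INT);
        if (value instanceof String) return Optional.of(SimpleType.STRING);
        return Optional.empty();
    }

    // True if the value's string form can be converted to the given type.
    static boolean isConvertible(SimpleType type, Object value) {
        String text = String.valueOf(value);
        switch (type) {
            case BOOLEAN:
                return text.equalsIgnoreCase("true") || text.equalsIgnoreCase("false");
            case INT:
                try {
                    Integer.parseInt(text);
                    return true;
                } catch (NumberFormatException e) {
                    return false;
                }
            default:
                return true;
        }
    }

    static Optional<SimpleType> resolve(Object value, List<SimpleType> choice, boolean preserveValueType) {
        if (preserveValueType) {
            Optional<SimpleType> actual = actualType(value);
            if (actual.isPresent() && choice.contains(actual.get())) {
                return actual; // keep the value's own type because the CHOICE allows it
            }
        }
        // Fallback: ordinal-based narrowing, as described in the issue.
        return choice.stream()
                .sorted(Comparator.comparingInt(SimpleType::ordinal))
                .filter(type -> isConvertible(type, value))
                .findFirst();
    }

    public static void main(String[] args) {
        List<SimpleType> intOrString = List.of(SimpleType.INT, SimpleType.STRING);
        for (Object value : List.<Object>of("42", 7)) {
            System.out.println(value + ": narrowed=" + resolve(value, intOrString, false)
                    + ", preserved=" + resolve(value, intOrString, true));
        }
    }
}
```

With the flag off, the sketch reproduces the current narrowing; with it on, {{"42"}} resolves to STRING and {{7}} to INT, which matches the expected output in the issue.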



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
