[ 
https://issues.apache.org/jira/browse/NIFI-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18093290#comment-18093290
 ] 

Sönke Liebau commented on NIFI-16067:
-------------------------------------

We opened a PR for this: https://github.com/apache/nifi/pull/11387

> PutIcebergRecord maps fields by ordinal position instead of column name
> -----------------------------------------------------------------------
>
>                 Key: NIFI-16067
>                 URL: https://issues.apache.org/jira/browse/NIFI-16067
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2, 2.9.0, 2.10.0
>            Reporter: Sönke Liebau
>            Priority: Major
>
> h3. Description
> PutIcebergRecord (added in NIFI-15062) doesn't match columns by name. It 
> writes each
> record's values into the Iceberg table columns by position, and the position 
> it uses is
> the one from the *incoming* record schema, not the table schema. So the 
> mapping is wrong
> whenever the input field order doesn't line up with the table's column order.
> If the record has fewer fields than the table, the missing columns aren't set 
> to NULL
> either - the values just shift left into whatever table column happens to sit 
> at that
> position. Sometimes that lands on an incompatible type and you get a 
> ClassCastException,
> which is the good outcome because at least it fails loudly. When the types 
> happen to line
> up, the row is written with values in the wrong columns and nothing complains.
> This gets worse with schema evolution: as soon as you add a column, the 
> producer and the
> table no longer agree on the column set, and existing flows start silently 
> writing data
> into the wrong place.
> h3. Steps to reproduce
> Create an Iceberg table whose column order differs from the incoming record 
> (or add a
> new column so the record has fewer fields than the table), then push records 
> through
> PutIcebergRecord into it. Depending on the type alignment you'll either see a
> ClassCastException or rows written into the wrong columns.
> h3. Root cause
> {{DelegatedRecord.get(int position)}} looks the position up against the input 
> record
> schema instead of the Iceberg struct the wrapper already holds:
> nifi-iceberg-processors/.../record/DelegatedRecord.java, lines 78-82:
> {code:java}
> @Override
> public Object get(final int position) {
>     final RecordField recordField = record.getSchema().getField(position); // 
> input schema
>     return record.getValue(recordField);
> }
> {code}
> The Iceberg write path walks the table struct position by position and calls 
> `get( i )`
> for column i, but '{{{}get( i )'{}}} hands back input field i, so column i 
> ends up with input
> column i's value no matter what it's named. The struct is only exposed through
> {{struct()}} and never used for the actual lookup. `{{{}get(int, Class)`{}}} 
> just delegates to `{{{}get(int)`{}}}, so it's wrong too.
> h3. Fix
> Look the field up by name instead of id, something along the lines below:
> {code:java}
> @Override
> public Object get(final int position) {
>     final Types.NestedField field = struct.fields().get(position);
>     return record.getValue(field.name());
> }
> {code}
> We are happy to open a PR for this if the issue is confirmed.
> h3. Related
> NIFI-14162 and NIFI-14614 are the same ordinal-vs-name problem in QueryRecord 
> (both
> fixed for 2.7.0). This is the Iceberg-bundle version of it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to