[ https://issues.apache.org/jira/browse/NIFI-15866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sönke Liebau updated NIFI-15866:
--------------------------------
Description:
The PutIcebergRecord processor currently does not support columns of type
Date.
Every insert into a table with a Date column throws a ClassCastException:
the value arrives as java.sql.Date (from the FlowFile), but the Iceberg writer
used by PutIcebergRecord requires java.time.LocalDate.
This occurs with Avro as well as Parquet input and their respective readers
configured in PutIcebergRecord, using a hardcoded Avro schema that references
the Date column.
This bug is very similar to an already fixed bug regarding
Datetimes/Timestamps: https://issues.apache.org/jira/browse/NIFI-15568.
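The type mismatch can be shown outside NiFi. The following is a minimal sketch, not the actual NiFi or Iceberg code: the class name DateCastSketch and the helper toIcebergDate are hypothetical and only illustrate the failing cast and one possible conversion via java.sql.Date#toLocalDate().

```java
import java.time.LocalDate;

public class DateCastSketch {

    // Hypothetical helper (not a NiFi or Iceberg API): converts a
    // java.sql.Date record value into the java.time.LocalDate that the
    // writer expects; all other values pass through unchanged.
    public static Object toIcebergDate(Object value) {
        if (value instanceof java.sql.Date) {
            return ((java.sql.Date) value).toLocalDate();
        }
        return value;
    }

    public static void main(String[] args) {
        // A date value as it arrives from the record, typed as java.sql.Date.
        Object fromRecord = java.sql.Date.valueOf(LocalDate.of(2024, 1, 15));

        try {
            // The same kind of cast that fails in the stack trace.
            LocalDate broken = (LocalDate) fromRecord;
            System.out.println(broken);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getMessage());
        }

        // Converting first makes the cast safe.
        LocalDate converted = (LocalDate) toIcebergDate(fromRecord);
        System.out.println(converted);
    }
}
```

Where such a conversion should live in PutIcebergRecord is left open here; the sketch only shows that the two types are not cast-compatible.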
*How to reproduce*
* Create an Iceberg table with one Date column
* Generate a record FlowFile with GenerateFlowFile containing a value for this
column, for example as CSV or JSON. Provide an explicit Avro schema as an
attribute in GenerateFlowFile:
+CSV:+
{code:java}
dateCol
234234
{code}
+Avro Schema:+
{code:java}
{
  "type": "record",
  "name": "Document",
  "namespace": "com.example",
  "fields": [
    {
      "name": "dateCol",
      "type": [
        "null",
        {
          "type": "int",
          "logicalType": "date"
        }
      ]
    }
  ]
}
{code}
* Use a ConvertRecord processor to transform the CSV into an Avro FlowFile,
using an AvroRecordSetWriter with the setting "Use 'Schema Text' Property"
* Write into Iceberg with PutIcebergRecord and a default AvroReader
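For reference, the Avro "date" logical type in the schema above annotates an int that counts days since 1970-01-01. A minimal sketch of that mapping follows; the class and method names are hypothetical, and it makes no claim about how the NiFi readers interpret the sample CSV value.

```java
import java.time.LocalDate;

public class AvroDateSketch {

    // Avro "date" logical type: an int holding days since the Unix epoch.
    public static LocalDate fromAvroDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    // Inverse mapping; throws ArithmeticException if the value does not fit an int.
    public static int toAvroDate(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    public static void main(String[] args) {
        // Day 0 is the epoch itself.
        System.out.println(fromAvroDate(0)); // prints 1970-01-01
        // The sample value from the CSV, read as an epoch-day count.
        System.out.println(fromAvroDate(234234));
    }
}
```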
Stacktrace:
{code:java}
PutIcebergRecord[id=8e5f6f95-9ec0-37d7-ad34-9aa4500355f7] Write Rows to Table
[xyz.test] failed FlowFile[filename=b13a8af8-851e-4f00-93dd-69f40a36e7b5]:
java.lang.ClassCastException: class java.sql.Date cannot be cast to class
java.time.LocalDate (java.sql.Date is in module java.sql of loader 'platform';
java.time.LocalDate is in module java.base of loader 'bootstrap')
java.lang.ClassCastException: class java.sql.Date cannot be cast to class
java.time.LocalDate (java.sql.Date is in module java.sql of loader 'platform';
java.time.LocalDate is in module java.base of loader 'bootstrap')
    at org.apache.iceberg.data.parquet.GenericParquetWriter$DateWriter.write(GenericParquetWriter.java:91)
    at org.apache.iceberg.parquet.ParquetValueWriters$OptionWriter.write(ParquetValueWriters.java:421)
    at org.apache.iceberg.parquet.ParquetValueWriters$StructWriter.write(ParquetValueWriters.java:665)
    at org.apache.iceberg.parquet.ParquetWriter.add(ParquetWriter.java:138)
    at org.apache.iceberg.io.DataWriter.write(DataWriter.java:71)
    at org.apache.iceberg.io.BaseTaskWriter$RollingFileWriter.write(BaseTaskWriter.java:401)
    at org.apache.iceberg.io.BaseTaskWriter$RollingFileWriter.write(BaseTaskWriter.java:384)
    at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.write(BaseTaskWriter.java:311)
    at org.apache.iceberg.io.UnpartitionedWriter.write(UnpartitionedWriter.java:42)
    at org.apache.nifi.services.iceberg.parquet.io.ParquetIcebergRowWriter.write(ParquetIcebergRowWriter.java:39)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Unknown Source)
    at java.base/java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:251)
    at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler$ProxiedReturnObjectInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:237)
    at jdk.proxy515/jdk.proxy515.$Proxy706.write(Unknown Source)
    at org.apache.nifi.processors.iceberg.PutIcebergRecord.writeRecords(PutIcebergRecord.java:240)
    at org.apache.nifi.processors.iceberg.PutIcebergRecord.processFlowFiles(PutIcebergRecord.java:176)
    at org.apache.nifi.processors.iceberg.PutIcebergRecord.onTrigger(PutIcebergRecord.java:156)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1274)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:229)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:102)
    at org.apache.nifi.engine.FlowEngine.lambda$wrap$1(FlowEngine.java:105)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
{code}
Summary: Inserting Date values in Iceberg tables results in Exception
(was: nserting Date values in Iceberg tables )
> Inserting Date values in Iceberg tables results in Exception
> ------------------------------------------------------------
>
> Key: NIFI-15866
> URL: https://issues.apache.org/jira/browse/NIFI-15866
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 2.9.0
> Reporter: Sönke Liebau
> Priority: Minor
--
This message was sent by Atlassian Jira
(v8.20.10#820010)