[ 
https://issues.apache.org/jira/browse/NIFI-15568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nir Yanay updated NIFI-15568:
-----------------------------
    Description: 
While working with PutIcebergRecord in NiFi 2.7.2, I encountered two separate 
issues when writing to Apache Iceberg tables using an on-prem S3-compatible 
object store and an Iceberg REST catalog.
h3. *Issue 1: On-Prem S3 Configuration Not Supported by S3IcebergFileIOProvider*

NiFi's default S3IcebergFileIOProvider does not expose the configuration 
options required to connect to an on-prem S3-compatible object store 
(e.g., MinIO).

Specifically, it does not allow configuring:
 * Custom S3 endpoint
 * Path-style access
 * Storage class

As a result, PutIcebergRecord cannot be used with an on-prem S3 backend out of 
the box. To resolve this, I extended S3IcebergFileIOProvider to support the 
missing properties, enabling connectivity to on-prem S3-compatible storage 
systems.
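
For reference, a minimal sketch of how these options map onto Iceberg's own 
S3FileIO, assuming the standard Iceberg S3 catalog property keys (the endpoint, 
credentials, and storage class below are placeholders, not NiFi property names; 
the s3.write.storage-class key is available in recent Iceberg releases):

{code:java}
import java.util.Map;

import org.apache.iceberg.aws.s3.S3FileIO;
import org.apache.iceberg.io.FileIO;

public class OnPremS3FileIOExample {

    public static FileIO buildFileIO() {
        // Placeholder on-prem configuration; endpoint and credentials are illustrative only.
        final Map<String, String> properties = Map.of(
                "s3.endpoint", "https://minio.example.internal:9000", // custom S3 endpoint
                "s3.path-style-access", "true",       // S3-compatible stores typically need path-style URLs
                "s3.write.storage-class", "STANDARD", // storage class applied to written objects
                "s3.access-key-id", "ACCESS_KEY",
                "s3.secret-access-key", "SECRET_KEY");

        final S3FileIO fileIO = new S3FileIO();
        fileIO.initialize(properties);
        return fileIO;
    }
}
{code}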
h3. *Issue 2: Timestamp Type Mismatch Between NiFi and Iceberg*

After enabling on-prem S3 support, I encountered a timestamp compatibility 
issue when writing records containing timestamp fields: NiFi represents 
timestamps as java.sql.Timestamp, while Iceberg expects them as 
java.time.LocalDateTime (see 
[GenericParquetWriter|https://github.com/apache/iceberg/blob/730ce29d5cd722b1751a1984d9eabb68542eba39/parquet/src/main/java/org/apache/iceberg/data/parquet/GenericParquetWriter.java#L122]).
h4. Unpartitioned Tables

Initially, I added a converter to handle the type conversion, which resolved 
the issue for unpartitioned Iceberg tables.
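
The conversion itself is small; a minimal sketch (field iteration and the 
NiFi record plumbing are omitted, and time-zone semantics depend on how the 
record reader produced the Timestamp):

{code:java}
import java.sql.Timestamp;
import java.time.LocalDateTime;

public final class TimestampConversion {

    private TimestampConversion() {
    }

    // Convert NiFi's java.sql.Timestamp into the java.time.LocalDateTime
    // representation that Iceberg's generic writers expect for timestamp columns.
    public static LocalDateTime toLocalDateTime(final Timestamp timestamp) {
        return timestamp == null ? null : timestamp.toLocalDateTime();
    }
}
{code}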
h4. Partitioned Tables

However, when the timestamp column was used as a partition key, writes failed 
again. Further investigation showed that Iceberg internally expects timestamp 
partition key values to be represented both as Long and as LocalDateTime at 
different stages of the write path.

To resolve this, I leveraged Iceberg's InternalRecordWrapper, which correctly 
handles this dual representation and allows partitioned writes to succeed.
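
For context, a rough sketch of the partitioning step using that class; the 
schema, spec, and record are assumed to come from the target table and the 
converted NiFi record, and this follows the usual pattern from Iceberg's data 
module rather than NiFi's actual implementation:

{code:java}
import org.apache.iceberg.PartitionKey;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.data.InternalRecordWrapper;
import org.apache.iceberg.data.Record;

public class PartitionKeyExample {

    public static PartitionKey partitionKeyFor(final Schema schema, final PartitionSpec spec,
            final Record record) {
        // InternalRecordWrapper exposes timestamp values to the partition transforms
        // in their internal Long (microseconds) form, while the data writer itself
        // still receives LocalDateTime values from the record.
        final InternalRecordWrapper wrapper = new InternalRecordWrapper(schema.asStruct());
        final PartitionKey partitionKey = new PartitionKey(spec, schema);
        partitionKey.partition(wrapper.wrap(record));
        return partitionKey;
    }
}
{code}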

*PR*

I am currently working on a pull request; I will add a link to it here soon.



> Iceberg S3 on-prem support and iceberg-parquet timestamp fix.
> -------------------------------------------------------------
>
>                 Key: NIFI-15568
>                 URL: https://issues.apache.org/jira/browse/NIFI-15568
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2
>            Reporter: Nir Yanay
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
