Nir Yanay created NIFI-15568:
--------------------------------

             Summary: Iceberg S3 on-prem support and iceberg-parquet timestamp 
fix.
                 Key: NIFI-15568
                 URL: https://issues.apache.org/jira/browse/NIFI-15568
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
    Affects Versions: 2.7.2, 2.7.1, 2.7.0, 2.8.0
            Reporter: Nir Yanay


While working with PutIcebergRecord in NiFi 2.7.2, I encountered two separate 
issues when writing to Apache Iceberg tables using an on-prem S3-compatible 
object store and an Iceberg REST catalog.
h3. *Issue 1: On-Prem S3 Configuration Not Supported by S3IcebergFileIOProvider*

NiFi's default S3IcebergFileIOProvider does not expose the necessary 
configuration options required to connect to an on-prem S3-compatible storage 
(e.g., MinIO).

Specifically, it does not allow configuring:
 * Custom S3 endpoint
 * Path-style access
 * Storage class

As a result, PutIcebergRecord cannot be used with an on-prem S3 backend out of 
the box. To resolve this, I extended S3IcebergFileIOProvider to support the 
missing properties, enabling connectivity to on-prem S3-compatible storage 
systems.
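For context, Iceberg's own S3FileIO already understands catalog properties for these settings ({{s3.endpoint}}, {{s3.path-style-access}}, etc.); the gap is that NiFi's provider does not expose them. A minimal stdlib-only sketch of the property set an on-prem store like MinIO needs (the endpoint and credential values are placeholders, not from the PR):

```java
import java.util.HashMap;
import java.util.Map;

public class OnPremS3Config {
    // Sketch of the FileIO properties an on-prem S3-compatible store needs.
    // The keys are Iceberg's standard S3FileIO property names; the endpoint
    // and credentials below are placeholder values.
    static Map<String, String> onPremS3Properties() {
        Map<String, String> props = new HashMap<>();
        props.put("s3.endpoint", "http://minio.internal:9000"); // custom endpoint
        props.put("s3.path-style-access", "true");              // MinIO requires path-style URLs
        props.put("s3.access-key-id", "my-access-key");
        props.put("s3.secret-access-key", "my-secret-key");
        return props;
    }
}
```

Passing such a map through to the underlying S3FileIO is essentially what the extended provider enables.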
h3. *Issue 2: Timestamp Type Mismatch Between NiFi and Iceberg*

After enabling on-prem S3 support, I encountered a timestamp compatibility 
issue when writing records containing timestamp fields: NiFi represents 
timestamps as java.sql.Timestamp, while Iceberg's Parquet writer expects 
java.time.LocalDateTime (see 
[GenericParquetWriter|https://github.com/apache/iceberg/blob/730ce29d5cd722b1751a1984d9eabb68542eba39/parquet/src/main/java/org/apache/iceberg/data/parquet/GenericParquetWriter.java#L122]).
h4. Unpartitioned Tables

Initially, I added a converter to handle the type conversion, which resolved 
the issue for unpartitioned Iceberg tables.
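The conversion itself is straightforward in java.time; a minimal sketch of the kind of converter involved (not the exact code from the PR):

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

public class TimestampConverter {
    // Convert NiFi's java.sql.Timestamp into the java.time.LocalDateTime
    // that Iceberg's Parquet value writer expects for timestamp columns.
    static LocalDateTime toIcebergTimestamp(Object value) {
        if (value instanceof Timestamp) {
            return ((Timestamp) value).toLocalDateTime(); // preserves nanosecond precision
        }
        if (value instanceof LocalDateTime) {
            return (LocalDateTime) value; // already in the expected representation
        }
        throw new IllegalArgumentException("Unsupported timestamp type: " + value.getClass());
    }
}
```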
h4. Partitioned Tables

However, when the timestamp column was used as a partition key, writes failed 
again. Further investigation showed that Iceberg internally expects timestamp 
partition key values to be represented both as Long and as LocalDateTime at 
different points in the write path.

To resolve this, I leveraged Iceberg's InternalRecordWrapper, which correctly 
handles this dual representation and allows partitioned writes to succeed.
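As context for the dual representation: Iceberg's partition transforms operate on timestamps encoded as microseconds since the Unix epoch (a Long), while the value writer sees a LocalDateTime, and InternalRecordWrapper bridges the two when partition keys are computed. A stdlib-only sketch of that encoding (the epoch constant and conversion mirror what Iceberg does internally, but this is not the project's code):

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class TimestampMicros {
    // Unix epoch as a LocalDateTime, the reference point for the encoding.
    static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

    // Encode a timestamp as microseconds since the epoch, the Long form
    // Iceberg's partition transforms expect for timestamp partition keys.
    static long microsFromTimestamp(LocalDateTime ts) {
        return ChronoUnit.MICROS.between(EPOCH, ts);
    }
}
```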

*PR*

A pull request has been opened addressing both issues: LINK



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
