Hi Owen,

This is a great idea. I think we could support this to some degree,
for example by running a user-provided function in the pre-commit
aggregator:
https://github.com/apache/iceberg/blob/15485f5523d08aae2a503c143c51b6df2debb655/flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergWriteAggregator.java#L109
This would be fairly limited, but it would, for instance, allow
keeping track of the watermark.
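
To make this a bit more concrete, here is a rough sketch of what such a
user-provided function might look like. It merges the per-writer
properties of one checkpoint into a single map that the committer could
then add to the snapshot summary. Everything here is hypothetical: the
class name, the method names, and the property keys are made up for
illustration, and no such hook exists in the Iceberg Flink sink today.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: neither this class nor the property keys
// are part of the Iceberg or Flink APIs.
final class EventTimeBounds {
  // Assumed property keys, with values holding epoch milliseconds as strings.
  static final String MIN_KEY = "app.event-time.min";
  static final String MAX_KEY = "app.event-time.max";

  private EventTimeBounds() {}

  // Merges two property maps, keeping the widest event-time range.
  static Map<String, String> merge(Map<String, String> left, Map<String, String> right) {
    Map<String, String> merged = new HashMap<>(left);
    right.forEach((key, value) -> merged.merge(key, value, (a, b) -> pick(key, a, b)));
    return merged;
  }

  // Resolves a conflict for a key that is present on both sides.
  private static String pick(String key, String a, String b) {
    if (MIN_KEY.equals(key)) {
      return Long.toString(Math.min(Long.parseLong(a), Long.parseLong(b)));
    }
    if (MAX_KEY.equals(key)) {
      return Long.toString(Math.max(Long.parseLong(a), Long.parseLong(b)));
    }
    return b; // for any other key, the later value wins
  }

  // Folds the properties reported by all writers of one checkpoint.
  static Map<String, String> aggregate(List<Map<String, String>> perWriter) {
    Map<String, String> result = new HashMap<>();
    for (Map<String, String> properties : perWriter) {
      result = merge(result, properties);
    }
    return result;
  }
}
```

The aggregator would call something like aggregate() once per checkpoint,
before handing the result to the committer. A plain map of strings is
used because snapshot summary properties are string key/value pairs.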

Another extension point would be IcebergSinkWriter:
https://github.com/apache/iceberg/blob/15485f5523d08aae2a503c143c51b6df2debb655/flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSinkWriter.java#L99

Cheers,
Max

On Tue, Nov 25, 2025 at 9:03 PM Owen Zhang via dev
<[email protected]> wrote:
>
> Hi Team,
>
> I'd like to initiate a discussion on a feature that appears to be valuable 
> for Flink + Iceberg users. (Related issue: #14662)
>
> Currently, Iceberg's FlinkSink offers a set of data statistics in snapshot 
> summary. However, there is no mechanism for application-level code to 
> populate custom/application-defined statistics into Iceberg snapshot 
> properties at commit time. An example use case:
>
> A Flink job computes the event-time boundaries for data ingested in each 
> checkpoint (min/max event time in that batch) and aims to include this 
> information in the snapshot summary, alongside the built-in statistics. The 
> snapshot summary is a natural place for such metadata, since the statistics 
> directly describe the data in that specific snapshot and belong with the 
> snapshot itself. At the same time, the logic for computing these statistics 
> is application-specific, making it difficult to handle entirely within the 
> Iceberg framework.
>
> We've explored workarounds (such as static variables and an external store, see 
> this PR) to pass these values to the committer, but these approaches are 
> either not robust or add unnecessary complexity.
>
> Is there a recommended Flink-native approach for allowing applications to 
> propagate custom, per-checkpoint metadata from Flink operators to the Iceberg 
> committer so it can be written to the snapshot summary? If not, would the community be 
> interested in supporting such a feature?
>
> Any guidance or pointers to related work would be appreciated. We’re also 
> happy to contribute if this aligns with project goals.
>
> Thanks,
> Owen
