Steven,

we were also wondering if it is a strict requirement that "later" updates to Iceberg subsume earlier updates. In the current version, you only check whether checkpoint X made it to Iceberg and then discard all committable state from Flink state for checkpoints smaller X.

If we go with a (somewhat random) nonce, this would not work. Instead the sink would have to check for each set of committables seperately if they had already been committed. Do you think this is feasible? During normal operation this set would be very small, it would usually only be the committables for the last checkpoint. Only when there is an outage would multiple sets of committables pile up.

We were thinking to extend the GlobalCommitter interface to allow it to report success or failure and then let the framework retry. I think this is something that you would need for the Iceberg case. The signature could be like this:

CommitStatus commitGlobally(List<Committable>, Nonce)

where CommitStatus could be an enum of SUCCESS, TERMINAL_FAILURE, and RETRY.

Best,
Aljoscha

Reply via email to