On 22.09.20 06:06, Steven Wu wrote:
In addition, it is undesirable to do the committed-or-not check in the commit method, which happens for each checkpoint cycle. CommitResult already indicates SUCCESS or not. when framework calls commit with a list of GlobalCommittableT, it should be certain they are uncommitted. The only time we aren't sure is when a list of GlobalCommittableT is restored from a checkpoint. `*recoverGlobalCommittables*` is the ideal place to do such a check and filter out the ones that were already committed. Retained ones will be committed in the next checkpoint cycle. Since framework takes care of the checkpoint and restore, we need some hook for the sink to add the custom logic on the restored list.
I think we don't need the `recoverGlobalCommittables()` hook. The sink implementation would have to do the filtering once, so it can either do it in the recover hook or it could do it in the next `commit()` call. Both of these would mean we only have to do one pass through the list and connect to Iceberg. Doing the check in `commit()` would mean the interface of GlobalCommittable is simpler and to me it seems natural that we do the check in the commit() method to ensure that commits are idempotent.
What do you think? Aljoscha