I have been working on the protocol for splitting/checkpointing bundles for use with SplittableDoFn, but in the meantime I wanted to share a proposal for bundle finalization [1].
Bundle finalization addresses integrations with external systems that require acknowledgement (such as queue-based sources), where the acknowledgement should only be sent once the output of a bundle has been durably persisted. The idea is that after a bundle completes and the runner durably persists its output, the runner makes a best-effort finalization call back to the same SDK harness instance. This allows the SDK harness to send any "acknowledgements" to the external system. Because finalization can fail, the external system must be able to restore anything that wasn't acknowledged. I also discuss why I don't believe we gain much by providing "guaranteed" finalization.

Please take a look at the doc I shared and feel free to comment.

1: https://s.apache.org/beam-finalizing-bundles
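
For anyone who prefers code to prose, here is a rough toy model of the flow described above. It is only a sketch: every name in it (FakeQueue, SdkHarness, Runner, finalize_fails) is hypothetical and made up for illustration, and none of it is the Beam API or the protocol from the doc. It assumes the external system redelivers anything that was not acknowledged.

```python
# Toy model of best-effort bundle finalization.
# All class and method names are hypothetical, not Beam APIs.


class FakeQueue:
    """Stands in for an external queue that redelivers unacknowledged messages."""

    def __init__(self, messages):
        # message id -> payload, for everything not yet acknowledged
        self.pending = dict(enumerate(messages))

    def pull(self):
        # Unacknowledged messages stay pending, so they are handed out again.
        return list(self.pending.items())

    def ack(self, ids):
        for message_id in ids:
            self.pending.pop(message_id, None)


class SdkHarness:
    """Processes bundles and remembers what to acknowledge on finalization."""

    def __init__(self, queue):
        self.queue = queue
        self.callbacks = {}  # bundle id -> finalization callback

    def process_bundle(self, bundle_id):
        batch = self.queue.pull()
        ids = [message_id for message_id, _ in batch]
        # Do not ack yet; only register what to do once output is durable.
        self.callbacks[bundle_id] = lambda: self.queue.ack(ids)
        return [payload.upper() for _, payload in batch]

    def finalize_bundle(self, bundle_id):
        callback = self.callbacks.pop(bundle_id, None)
        if callback is not None:
            callback()


class Runner:
    """Persists bundle output, then makes a best-effort finalization call."""

    def __init__(self, harness):
        self.harness = harness
        self.durable_output = []

    def run_bundle(self, bundle_id, finalize_fails=False):
        output = self.harness.process_bundle(bundle_id)
        self.durable_output.extend(output)  # stands in for durable persistence
        try:
            if finalize_fails:
                # Simulate the finalization call never reaching the harness.
                raise ConnectionError("SDK harness unreachable")
            self.harness.finalize_bundle(bundle_id)
        except ConnectionError:
            pass  # best effort: the queue will redeliver unacked messages


queue = FakeQueue(["a", "b"])
runner = Runner(SdkHarness(queue))
runner.run_bundle("bundle-1", finalize_fails=True)  # output persisted, no ack
runner.run_bundle("bundle-2")  # messages redelivered, then acknowledged
```

In this toy model a failed finalization leaves the messages pending, so the next bundle sees them again and the persisted output contains them twice. That is the cost of best-effort finalization: the external system has to tolerate redelivery, and anything downstream has to tolerate the resulting duplicates.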