I have started to work on how to change the user facing API within the Java
SDK to support splitting/checkpointing[1], backlog reporting[2] and bundle
finalization[3].

I have this PR[4] which contains minimal interface/type definitions to
convey how the API surface would change with these 4 changes:
1) Exposes the ability for @SplitRestriction to take a backlog suggestion
on how to perform splitting and for how many restrictions should be
returned.
2) Adds the ability for RestrictionTrackers to report backlog
3) Updates @ProcessElement to be required to take a generic
RestrictionTracker instead of the users own restriction tracker type.
4) Adds the ability for @StartBundle/@ProcessElement/@FinishBundle to
register a callback that is invoked after bundle finalization.

The details are in the javadoc comments as to how I would expect the
contract to play out.
Feel free to comment on the ML/PR around the contract and after the
feedback is received/digested/implemented, I would like to get the changes
submitted so that work can start  towards providing an implementation in
the Java SDK, Python SDK, and Go SDK and the shared runner portability
library.

I would like to call out special attention to 3 since with this change it
will enable us to remove the synchronization requirement for users as we
will wrap the underlying restriction tracker allowing us to add appropriate
synchronization as needed and also to watch any calls that pass through the
object such as the claim calls. I also believe this prevents people from
writing RestrictionTrackers where the contract of tryClaim is subverted
since markDone is outside the purview of tryClaim as in
ByteKeyRangeTracker[5].

1: https://s.apache.org/beam-checkpoint-and-split-bundles
2: https://s.apache.org/beam-bundles-backlog-splitting
3: https://s.apache.org/beam-finalizing-bundles
4: https://github.com/apache/beam/pull/6969
5: https://github.com/apache/beam/pull/6949

Reply via email to