Hey all,

From my recent experience continuing the implementation of Splittable DoFn, I would like to propose a few changes to its API. They fix a bug, make parts of its semantics better defined and easier for a user to get right, and reduce the assumptions about the runner implementation.
In short:
- Add c.updateWatermark() and report the watermark continuously via this method.
- Make SDF.@ProcessElement return void, which is simpler for users, though it does not allow resuming after a specified time.
- Declare that SDF.@ProcessElement must guarantee that, after it returns, the entire tracker.currentRestriction() has been processed.
- Add a bool RestrictionTracker.done() method to enforce the bullet above.
- For resuming after a specified time, use a regular DoFn with the state and timers API.

The only downside is the removal (from SDF) of the ability to suspend the call for a certain amount of time. The suggestion is that, if you need that, you should use a regular DoFn and the timers API.

Please see the full proposal in the following doc; comment there and vote on this thread.
https://docs.google.com/document/d/1BGc8pM1GOvZhwR9SARSVte-20XEoBUxrGJ5gTWXdv3c/edit?usp=sharing

I am going to start prototyping some parts of this proposal concurrently, because the current implementation is simply wrong and this proposal is the only way to fix it that I can think of, but I will adjust my implementation based on the discussion.

I believe this proposal should not affect runner authors: I can make all the necessary changes myself.

Thanks!
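P.S. To make the proposed contract a bit more concrete, here is a rough, self-contained sketch of how a runner might check it. Everything in it is a hypothetical stand-in rather than the actual Beam API: OffsetRange, OffsetTracker, tryClaim(), ProcessFn, Result and invoke() are invented for illustration, and only updateWatermark(), currentRestriction() and done() correspond to names in the bullets above. The real signatures may well end up different; see the doc for the actual proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the Beam classes.
final class OffsetRange {
    final long from; // inclusive
    final long to;   // exclusive
    OffsetRange(long from, long to) { this.from = from; this.to = to; }
}

final class OffsetTracker {
    private final OffsetRange range;
    private long lastAttempted;

    OffsetTracker(OffsetRange range) {
        this.range = range;
        this.lastAttempted = range.from - 1; // nothing attempted yet
    }

    OffsetRange currentRestriction() { return range; }

    // Assumed claim method: records the attempt and reports whether the
    // offset lies inside the restriction.
    boolean tryClaim(long offset) {
        lastAttempted = offset;
        return offset < range.to;
    }

    // Sketch of the proposed done(): true once a claim attempt has reached
    // or passed the last offset of the current restriction.
    boolean done() { return lastAttempted >= range.to - 1; }
}

final class SdfSketch {
    interface ProcessContext {
        void output(long value);
        void updateWatermark(long timestamp); // proposed: report watermark continuously
    }

    // Proposed shape: the process method returns void.
    interface ProcessFn {
        void process(ProcessContext c, OffsetTracker tracker);
    }

    static final class Result {
        final List<Long> outputs = new ArrayList<>();
        long watermark = Long.MIN_VALUE;
    }

    // What a runner could do: run the call, then use done() to enforce that
    // the whole current restriction was processed before the call returned.
    static Result invoke(ProcessFn fn, OffsetRange range) {
        OffsetTracker tracker = new OffsetTracker(range);
        Result result = new Result();
        fn.process(new ProcessContext() {
            @Override public void output(long value) { result.outputs.add(value); }
            @Override public void updateWatermark(long timestamp) { result.watermark = timestamp; }
        }, tracker);
        if (!tracker.done()) {
            throw new IllegalStateException(
                "process returned before the current restriction was fully processed");
        }
        return result;
    }
}
```

In this sketch, a function that loops "for (long i = from; tracker.tryClaim(i); i++)" and returns when the claim fails passes the check, while one that claims a single offset of a larger range and returns early is rejected by invoke().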
