Hi,
as a follow-up to the previous design draft, I'd like to promote the
document [1] and the associated PR [2] to a proposal.
The PR contains a working implementation for:
- non-portable batch Flink and batch Spark (legacy)
- all non-portable streaming runners that use StatefulDoFnRunner
(direct, Samza, Dataflow)
- portable Flink (batch, streaming)
There are still some unresolved issues:
a) there is no way to specify allowed lateness (it is currently fixed
at zero, so late data should be dropped)
b) a way is needed to specify a user-defined function (UDF) for
extracting the timestamp (according to [3], that option would be useful)
c) more tests need to be added (e.g. for late data)
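To make issue a) concrete, here is a minimal sketch, in plain Java and
independent of Beam, of what time-sorted input with zero allowed
lateness could look like. It is not the code from the PR: the class
TimeSortedBuffer, its methods, and the explicit advanceWatermark call
are hypothetical names chosen only for illustration. Elements are
buffered, released in timestamp order once the watermark passes them,
and dropped if they arrive behind the watermark.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not the implementation from the PR.
final class TimeSortedBuffer {

    // A value paired with its event timestamp (milliseconds).
    static final class Element {
        final long timestamp;
        final String value;

        Element(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Min-heap ordered by event timestamp.
    private final PriorityQueue<Element> buffer =
        new PriorityQueue<>(Comparator.comparingLong((Element e) -> e.timestamp));
    private long watermark = Long.MIN_VALUE;
    private int dropped = 0;

    // Buffers the element, or drops it when it is behind the watermark.
    // Allowed lateness is assumed to be zero, as in issue a).
    boolean add(long timestamp, String value) {
        if (timestamp < watermark) {
            dropped++;
            return false;
        }
        buffer.add(new Element(timestamp, value));
        return true;
    }

    // Advances the watermark (it never moves backwards) and returns, in
    // timestamp order, the values whose timestamp is now before it.
    List<String> advanceWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        List<String> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp < watermark) {
            ready.add(buffer.poll().value);
        }
        return ready;
    }

    int droppedCount() {
        return dropped;
    }
}
```

In this sketch, supporting a non-zero allowed lateness would mean
comparing against the watermark minus the lateness in both add and
advanceWatermark, at the cost of holding elements in the buffer longer.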
The plan is to postpone resolving issues a) and b) until after the
proposal is merged. I'd like to gather more feedback on the proposal,
iterate on it again, add more tests, and then put it to a vote.
Unrelated: during implementation, a bug [4] was found in the Samza runner.
Looking forward to any comments!
Jan
[1]
https://docs.google.com/document/d/1ObLVUFsf1NcG8ZuIZE4aVy2RYKx2FfyMhkZYWPnI9-c/
[2] https://github.com/apache/beam/pull/8774
[3]
https://lists.apache.org/thread.html/813429e78b895a336d4f5507e3b2330282e2904fa25d52d6d441741a@%3Cdev.beam.apache.org%3E
[4] https://issues.apache.org/jira/browse/BEAM-8529
On 5/23/19 4:10 PM, Jan Lukavský wrote:
Hi,
I have written a very brief draft of how it might be possible to
implement @RequireTimeSortedInput discussed in [1]. I see the document
[2] as a starting point for a discussion. There are several open
questions, which I believe can be resolved by this great community. :-)
Jan
[1]
https://lists.apache.org/thread.html/4609a1bb1662690d67950e76d2f1108b51327b8feaf9580de659552e@%3Cdev.beam.apache.org%3E
[2]
https://docs.google.com/document/d/1ObLVUFsf1NcG8ZuIZE4aVy2RYKx2FfyMhkZYWPnI9-c/