Hi,

as a follow-up from previous design draft, I'd like to promote the document [1] and associated PR [2] to proposal.

The PR contains working implementation for:

 - non-portable batch flink and batch spark (legacy)

 - all non-portable streaming runners that use StatefulDoFnRunner (direct, samza, dataflow)

 - portable flink (batch, streaming)

There are still some unresolved issues:

 a) no way to specify allowed lateness (currently is simply zero, late data should be dropped)

 b) need a way to specify user UDF for extracting timestamp (according to [3] it would be useful to have that option)

 c) need to add more tests (e.g. late data)

The plan is to postpone resolution of issues a) and b) after the proposal is merged. I'd like to gather some more feedback on the proposal, iterate over that again, add more tests and then pass this to a vote.

Unrelated - during implementation a bug [4] in Samza runner was found.

Looking forward to any comments!

Jan

[1] https://docs.google.com/document/d/1ObLVUFsf1NcG8ZuIZE4aVy2RYKx2FfyMhkZYWPnI9-c/

[2] https://github.com/apache/beam/pull/8774

[3] https://lists.apache.org/thread.html/813429e78b895a336d4f5507e3b2330282e2904fa25d52d6d441741a@%3Cdev.beam.apache.org%3E

[4] https://issues.apache.org/jira/browse/BEAM-8529


On 5/23/19 4:10 PM, Jan Lukavský wrote:
Hi,

I have written a very brief draft of how it might be possible to implement @RequireTimeSortedInput discussed in [1]. I see the document [2] a starting point for a discussion. There are several open questions, which I believe can be resolved by this great community. :-)

Jan

[1] https://lists.apache.org/thread.html/4609a1bb1662690d67950e76d2f1108b51327b8feaf9580de659552e@%3Cdev.beam.apache.org%3E

[2] https://docs.google.com/document/d/1ObLVUFsf1NcG8ZuIZE4aVy2RYKx2FfyMhkZYWPnI9-c/

Reply via email to