[GitHub] flink issue #5342: [FLINK-8479] Timebounded stream join
Github user kl0u commented on the issue: https://github.com/apache/flink/pull/5342 Thanks for the work @florianschmidt1994 ! Merging this. ---
[GitHub] flink issue #5342: [FLINK-8479] Timebounded stream join
Github user kl0u commented on the issue: https://github.com/apache/flink/pull/5342 Thanks @florianschmidt1994 . I will, but maybe not today. ---
[GitHub] flink issue #5342: [FLINK-8479] Timebounded stream join
Github user florianschmidt1994 commented on the issue: https://github.com/apache/flink/pull/5342 @kl0u I made some changes to how I handle events on either side of the stream. By introducing some generic methods, we can now reuse large parts of the code for either input stream and remove a lot of code duplication. Could you have another look at this? ---
[GitHub] flink issue #5342: [FLINK-8479] Timebounded stream join
Github user kl0u commented on the issue: https://github.com/apache/flink/pull/5342 Changes look good to me! I will let it run on Travis and then merge. ---
[GitHub] flink issue #5342: [FLINK-8479] Timebounded stream join
Github user florianschmidt1994 commented on the issue: https://github.com/apache/flink/pull/5342 @hequn8128 Thank you for the review. Regarding your concern about delaying the watermark, I added some sketches and a description of my thought process to the design document. ---
[GitHub] flink issue #5342: [FLINK-8479] Timebounded stream join
Github user florianschmidt1994 commented on the issue: https://github.com/apache/flink/pull/5342 @bowenli86 the document should now be public for everyone to comment on. Yes, it caches data on either side, and for each incoming element it looks up eligible records from the other side, then joins and emits those if they fulfil the user criteria. Entries get removed from the cache whenever they are too old to be joined, which is determined by a combination of the current watermark and the time boundary defined by the user. ---
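The caching and cleanup behaviour described in the comment above can be illustrated with a minimal, self-contained sketch. This is not the operator from the pull request: the class `TimeBoundedJoinSketch`, its method names, the string-valued records, and the exact cleanup conditions are all assumptions made for illustration. It assumes a right element joins a left element when `right.ts` lies in `[left.ts + lowerBound, left.ts + upperBound]`, and that a watermark `w` means no element with timestamp `<= w` will still arrive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Illustrative only; hypothetical names, not the code from the pull request. */
class TimeBoundedJoinSketch {
    private final long lowerBound; // inclusive, relative to the left timestamp
    private final long upperBound; // inclusive, relative to the left timestamp
    // One cache per input side, bucketed by event timestamp.
    private final TreeMap<Long, List<String>> leftBuffer = new TreeMap<>();
    private final TreeMap<Long, List<String>> rightBuffer = new TreeMap<>();

    TimeBoundedJoinSketch(long lowerBound, long upperBound) {
        if (lowerBound > upperBound) {
            throw new IllegalArgumentException("lowerBound must be <= upperBound");
        }
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
    }

    /** Caches a left element and joins it with every eligible cached right element. */
    List<String> processLeft(long ts, String value) {
        leftBuffer.computeIfAbsent(ts, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (List<String> bucket
                : rightBuffer.subMap(ts + lowerBound, true, ts + upperBound, true).values()) {
            for (String right : bucket) {
                out.add(value + "+" + right);
            }
        }
        return out;
    }

    /** Caches a right element and joins it with every eligible cached left element. */
    List<String> processRight(long ts, String value) {
        rightBuffer.computeIfAbsent(ts, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        // Eligible left timestamps satisfy ts - upperBound <= left.ts <= ts - lowerBound.
        for (List<String> bucket
                : leftBuffer.subMap(ts - upperBound, true, ts - lowerBound, true).values()) {
            for (String left : bucket) {
                out.add(left + "+" + value);
            }
        }
        return out;
    }

    /** Drops cached entries that can no longer find a join partner. */
    void advanceWatermark(long watermark) {
        // A left entry is dead once left.ts + upperBound <= watermark.
        leftBuffer.headMap(watermark - upperBound, true).clear();
        // A right entry is dead once right.ts - lowerBound <= watermark.
        rightBuffer.headMap(watermark + lowerBound, true).clear();
    }

    /** Number of elements currently cached on both sides. */
    int bufferedCount() {
        int count = 0;
        for (List<String> bucket : leftBuffer.values()) count += bucket.size();
        for (List<String> bucket : rightBuffer.values()) count += bucket.size();
        return count;
    }
}
```

With bounds of 0 and 30 (say, minutes), a left element at timestamp 10 stays cached until the watermark reaches 40, so a right element arriving up to 30 units later in event time can still be joined with it. The sketch ignores late elements and keyed state, both of which a real operator would have to handle.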
[GitHub] flink issue #5342: [FLINK-8479] Timebounded stream join
Github user bowenli86 commented on the issue: https://github.com/apache/flink/pull/5342 Very interesting! Two things: 1. Can you make the Google doc publicly viewable? I cannot access it right now. 2. How does it handle event-time window joins of two streams, where data in one stream always arrives much later than data in the other? For example, we are joining streams A and B on 10 min event-time tumbling windows (12:00 - 12:10, 12:10 - 12:20), and data in stream B always arrives 30 mins later than the data in stream A. How does the operator handle that? Does it cache A's data until B's data arrives, do the join, and remove A's data from the cache? (I haven't read the code in detail, I'm just trying to get a general idea of the overall design.) ---
[GitHub] flink issue #5342: [FLINK-8479] Timebounded stream join
Github user florianschmidt1994 commented on the issue: https://github.com/apache/flink/pull/5342 Yeah seems like I made a typo there. It's actually https://issues.apache.org/jira/browse/FLINK-8479, I just fixed the title. ---