[ https://issues.apache.org/jira/browse/FLINK-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578890#comment-17578890 ]
Chesnay Schepler edited comment on FLINK-28719 at 8/12/22 9:42 AM: ------------------------------------------------------------------- ??As far as I understand, op1 and op2 should have watermark 1 and 2 respectively, because those subtasks don't have any events in them (besides 1 and 2, of course, which create those watermarks). Then, why do they get the maximum watermark of 7 after first step??? Watermarks are broadcasted to all downstream operators. So, if a source emits watermark 7, then all maps get it as an input. ??Also, why after op1 and op2, op4 and op5 get consumed? Is there any strategy that dictates in which order to process subtasks? Whichever arrives first at the downstream operator. You have 7 map subtasks all sending data to the window operators at the same time, and the order in which they arrive is not deterministic. So they _may_ arrive in a perfect round-robin pattern, or sequentially, or in any other pattern. The only guarantee is that the elements from a particular map subtask arrive in the same order that they were sent in. ??Also, why op3 to op7 have such watermarks: W2, W5, W6, W6, W6, after first step? I thought those subtasks should have watermark of Long.MinValue, because there were no elements before? The above 2 comments should answer it; all watermarks are sent to all map subtasks, and are consumed by the window operator in any order. was (Author: zentol): > As far as I understand, op1 and op2 should have watermark 1 and 2 > respectively, because those subtasks don't have any events in them (besides 1 > and 2, of course, which create those watermarks). Then, why do they get the > maximum watermark of 7 after first step? Watermarks are broadcasted to all downstream operators. So, if a source emits watermark 7, then all maps get it as an input. > Also, why after op1 and op2, op4 and op5 get consumed? Is there any strategy > that dictates in which order to process subtasks? Whichever arrives first at the downstream operator. You have 7 map subtasks all sending data to the window operators at the same time, and the order in which they arrive is not deterministic. So they _may_ arrive in a perfect round-robin pattern, or sequentially, or in any other pattern. The only guarantee is that the elements from a particular map subtask arrive in the same order that they were sent in. > Also, why op3 to op7 have such watermarks: W2, W5, W6, W6, W6, after first > step? I thought those subtasks should have watermark of Long.MinValue, > because there were no elements before? The above 2 comments should answer it; all watermarks are sent to all map subtasks, and are consumed by the window operator in any order. > Mapping a data source before window aggregation causes Flink to stop handle > late events correctly. > -------------------------------------------------------------------------------------------------- > > Key: FLINK-28719 > URL: https://issues.apache.org/jira/browse/FLINK-28719 > Project: Flink > Issue Type: Bug > Components: API / DataStream > Affects Versions: 1.15.1 > Reporter: Mykyta Mykhailenko > Priority: Major > > I have created a > [repository|https://github.com/mykytamykhailenko/flink-map-with-issue] where > I describe this issue in detail. > I have provided a few tests and source code so that you can reproduce the > issue on your own machine. -- This message was sent by Atlassian Jira (v8.20.10#820010)