[ 
https://issues.apache.org/jira/browse/FLINK-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578890#comment-17578890
 ] 

Chesnay Schepler edited comment on FLINK-28719 at 8/12/22 9:43 AM:
-------------------------------------------------------------------

??As far as I understand, op1 and op2 should have watermark 1 and 2 
respectively, because those subtasks don't have any events in them (besides 1 
and 2, of course, which create those watermarks). Then, why do they get the 
maximum watermark of 7 after first step???

Watermarks are broadcast to all downstream operators. So, if a source emits 
watermark 7, then every map subtask receives it as an input, whether or not 
that subtask received any events.
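
To make the difference between records and watermarks concrete, here is a 
minimal plain-Python sketch. It is not Flink code: the inbox model and the 
names {{make_inboxes}}, {{send_record}} and {{broadcast_watermark}} are made up 
for illustration, and the routing of events 1 and 2 to subtasks 0 and 1 is an 
assumption based on your description.

```python
from collections import deque


def make_inboxes(parallelism):
    """One FIFO inbox per downstream map subtask (hypothetical model)."""
    return [deque() for _ in range(parallelism)]


def send_record(inboxes, target, value):
    """A record is routed to exactly one downstream subtask."""
    inboxes[target].append(("record", value))


def broadcast_watermark(inboxes, timestamp):
    """A watermark is copied to every downstream subtask."""
    for inbox in inboxes:
        inbox.append(("watermark", timestamp))


inboxes = make_inboxes(7)
send_record(inboxes, 0, 1)        # event 1 goes only to subtask 0
send_record(inboxes, 1, 2)        # event 2 goes only to subtask 1
broadcast_watermark(inboxes, 7)   # watermark 7 reaches all 7 subtasks

for i, inbox in enumerate(inboxes):
    print(f"map subtask {i}: {list(inbox)}")
```

Subtasks that never saw an event still end up with watermark 7 in their inbox, 
which is why none of them stays at 1, 2 or Long.MinValue.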

??Also, why after op1 and op2, op4 and op5 get consumed? Is there any strategy 
that dictates in which order to process subtasks???

There is no such strategy; the window operator processes whichever element 
arrives first. You have 7 map subtasks all sending data to the window operators 
at the same time, and the order in which their elements arrive is not 
deterministic. So they _may_ arrive in a perfect round-robin pattern, or 
sequentially, or in any other pattern. The only guarantee is that the elements 
from a particular map subtask arrive in the same order that they were sent in.
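
As a rough illustration of that guarantee, the following plain-Python sketch 
merges several FIFO channels in a random order. It is a simulation, not Flink 
code: {{interleave}} is a hypothetical helper, and the assumption that each of 
the 7 map subtasks forwards watermarks W2, W5 and W7 is only there to have some 
data to merge.

```python
import random
from collections import deque


def interleave(channels, rng):
    """Merge several FIFO channels in an arbitrary order.

    Each step picks any non-empty channel at random and takes its oldest
    element, so the order across channels varies from run to run while the
    order within each channel is preserved.
    """
    queues = [deque(ch) for ch in channels]
    merged = []
    while any(queues):
        idx = rng.choice([i for i, q in enumerate(queues) if q])
        merged.append((idx, queues[idx].popleft()))
    return merged


# Assumed input: 7 map subtasks, each forwarding the same three watermarks.
channels = [[f"W{w}" for w in (2, 5, 7)] for _ in range(7)]
arrival = interleave(channels, random.Random())

for idx, element in arrival:
    print(f"from map subtask {idx}: {element}")
```

Running this repeatedly prints a different interleaving each time, but for any 
single map subtask the sequence is always W2, W5, W7.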

??Also, why op3 to op7 have such watermarks: W2, W5, W6, W6,  W6, after first 
step? I thought those subtasks should have watermark of Long.MinValue, because 
there were no elements before???

The two answers above should cover this: every watermark is sent to all map 
subtasks, and the window operator consumes the forwarded watermarks in whatever 
order they arrive.
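
A small sketch of why the per-input watermarks can differ at a given moment. It 
is plain Python, not Flink's implementation; {{ChannelWatermarks}} is a 
hypothetical class. It assumes the usual Flink rule that an operator tracks one 
watermark per input channel, starting at Long.MinValue, and uses the minimum of 
them as its own watermark.

```python
LONG_MIN = -(2 ** 63)  # stand-in for Java's Long.MIN_VALUE


class ChannelWatermarks:
    """Tracks the latest watermark seen on each input channel."""

    def __init__(self, num_channels):
        self.per_channel = [LONG_MIN] * num_channels

    def on_watermark(self, channel, timestamp):
        # A channel's watermark never moves backwards.
        self.per_channel[channel] = max(self.per_channel[channel], timestamp)
        return self.current()

    def current(self):
        # The operator's own watermark is the minimum over all channels.
        return min(self.per_channel)


tracker = ChannelWatermarks(3)
tracker.on_watermark(0, 7)   # channel 0 has already delivered W7
tracker.on_watermark(1, 5)   # channel 1 has only delivered up to W5
print(tracker.per_channel, tracker.current())
tracker.on_watermark(2, 2)   # channel 2 delivers its first watermark
print(tracker.per_channel, tracker.current())
```

A snapshot taken partway through shows each channel at a different watermark, 
depending only on how much of that channel has been consumed so far.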


> Mapping a data source before window aggregation causes Flink to stop handle 
> late events correctly.
> --------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28719
>                 URL: https://issues.apache.org/jira/browse/FLINK-28719
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.15.1
>            Reporter: Mykyta Mykhailenko
>            Priority: Major
>
> I have created a 
> [repository|https://github.com/mykytamykhailenko/flink-map-with-issue] where 
> I describe this issue in detail. 
> I have provided a few tests and source code so that you can reproduce the 
> issue on your own machine. 



