[ https://issues.apache.org/jira/browse/FLINK-26334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501135#comment-17501135 ]
realdengziqi commented on FLINK-26334: -------------------------------------- [~martijnvisser] Thanks for your attention, I've updated the problem description to make it clearer. And I came up with my solution in the description. Hope the community can assign this issue to me. thanks;) > When timestamp - offset + windowSize < 0, elements cannot be assigned to the > correct window > ------------------------------------------------------------------------------------------- > > Key: FLINK-26334 > URL: https://issues.apache.org/jira/browse/FLINK-26334 > Project: Flink > Issue Type: Bug > Components: API / DataStream > Affects Versions: 1.15.0, 1.14.3 > Environment: flink version 1.14.3 > Reporter: realdengziqi > Priority: Major > Attachments: image-2022-03-04-11-28-26-616.png, > image-2022-03-04-11-37-10-035.png > > Original Estimate: 3h > Remaining Estimate: 3h > > h2. issue > Hello! > When we were studying the flink source code, we found that there was > a problem with its algorithm for calculating the window start time. When > _timestamp - offset + windowSize < 0_ , the element will be incorrectly > allocated to a window with a WindowSize larger than its own timestamp. > The problem is in > _org.apache.flink.streaming.api.windowing.windows.TimeWindow_ > {code:java} > public static long getWindowStartWithOffset(long timestamp, long offset, long > windowSize) { > return timestamp - (timestamp - offset + windowSize) % windowSize; > } {code} > _!image-2022-03-04-11-28-26-616.png|width=710,height=251!_ > We believe that this violates the constraints between time and > window. That is, an element should fall within a window whose start time is > less than its own timestamp and whose end time is greater than its own > timestamp. However, the current situation is when {_}timestamp - offset + > windowSize < 0{_}, the element falls into a future time window. > h2. Solution > In fact, the original algorithm is no problem in python, the key to > this problem is the processing of the remainder operation by the programming > language. > We finally think that it should be modified to the following > algorithm. > {code:java} > public static long getWindowStartWithOffset(long timestamp, long offset, long > windowSize) { > return timestamp > - (timestamp - offset) % windowSize > - (windowSize & (timestamp - offset) >> 63); > } {code} > _windowSize & (timestamp - offset) >> 63_ The function of this > formula is to subtract windowSize from the overall operation result when > {_}timestamp - offset<0{_}, otherwise do nothing. This way we can handle both > positive and negative timestamps. > Finally, the element can be assigned to the correct window. > !image-2022-03-04-11-37-10-035.png|width=712,height=284! > This code can pass current unit tests. > h2. getWindowStartWithOffset methods in other packages > I think that there should be many places in > {_}getWindowStartWithOffset{_}. We searched for this method in the project > and found that the problem of negative timestamps is handled in _flink.table._ > Below is their source code. > > _{{org.apache.flink.table.runtime.operators.window.grouping.WindowsGrouping}}_ > {code:java} > private long getWindowStartWithOffset(long timestamp, long offset, long > windowSize) { > long remainder = (timestamp - offset) % windowSize; > // handle both positive and negative cases > if (remainder < 0) { > return timestamp - (remainder + windowSize); > } else { > return timestamp - remainder; > } > } {code} > h2. Can we make a pull request? > If the community deems it necessary to revise it, hopefully this task > can be handed over to us. Our members are all students who have just > graduated from school, and it is a great encouragement for us to contribute > code to flink. > Thank you so much! > From Deng Ziqi & Lin Wanni & Guo Yuanfang -- This message was sent by Atlassian Jira (v8.20.1#820001)