[ https://issues.apache.org/jira/browse/KAFKA-12718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355521#comment-17355521 ]
Matthias J. Sax edited comment on KAFKA-12718 at 6/2/21, 7:09 AM:
------------------------------------------------------------------

`key=[k1@0/5]` is the key of a session, with data key `k1`, a session start time of 0, and a session end time of 5. The format is `[dataKey@windowStart/windowEnd]`.

Given the input data, we observe and expect the following: The first record creates a new session `k1@0/0`. The second record extends the existing session (gap is set to 5); for this case, we get a tombstone for the existing session and a second record for the new session. Thus, after processing the first two input records, we have 3 output records.

Seems the first 6 output records are actually the same as in the expected result, but output records 7 and 8 are not expected in the result. Given that grace-period is zero, the fourth input record `k2` with ts=6 actually closes the session `k1@0/5` (before your fix), and thus the 5th input record was not expected to produce any output. However, with the fix, `k2` no longer closes the window, and thus we get more result records.

I guess the goal of the test was to verify that the first session gets closed, so I think the right fix is to change the input data, ie, the timestamp of input record key=k2 should be changed from 6 to 11 to bump the time beyond session-end plus gap?

was (Author: mjsax):

`key=[k1@0/5]` is the key of a session, with data key `k1`, a session start time of 0, and a session end time of 5. The format is `[dataKey@windowStart/windowEnd]`.

Given the input data, we observe and expect the following: The first record creates a new session `k1@0/0`. The second record extends the existing session (gap is set to 5); for this case, we get a tombstone for the existing session and a second record for the new session. Thus, after processing the first two input records, we have 3 output records.

Seems the first 6 output records are actually the same as in the expected result, but output records 7 and 8 are not expected in the result. Given that grace-period is zero, the fourth input record `k2` with ts=6 actually closes the session `k1@0/5`, and thus the 5th input record should not result in any output. Thus, the expected result seems to be correct, while the observed output records 7 and 8 are incorrect: seems this is an issue introduced with your code change?

Does this help?

> SessionWindows are closed too early
> -----------------------------------
>
>                 Key: KAFKA-12718
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12718
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Juan C. Gonzalez-Zurita
>            Priority: Major
>              Labels: beginner, easy-fix, newbie
>             Fix For: 3.0.0
>
>
> SessionWindows are defined based on a {{gap}} parameter, and also support an
> additional {{grace-period}} configuration to handle out-of-order data.
> To incorporate the session-gap, a session window should only be closed at
> {{window-end + gap}}, and to incorporate grace-period, the close time should
> be pushed out further to {{window-end + gap + grace}}.
> However, atm we compute the window close time as {{window-end + grace}},
> omitting the {{gap}} parameter.
> Because the default grace-period is 24h, most users might not notice this issue.
> Even if they set a grace period explicitly (eg, when using suppress()), they
> would most likely set a grace-period larger than gap-time, not hitting the
> issue (or maybe only realize it when inspecting the behavior closely).
> However, if a user wants to disable the grace-period and sets it to zero (or
> any other value smaller than gap-time), sessions might be closed too early and
> users might notice.
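
To make the arithmetic in the description and the comment concrete, here is a minimal sketch of the two close-time formulas. It is not the actual Kafka Streams code: the class `SessionCloseTimeSketch` and its method names are hypothetical, and the strict comparison in `isClosed` (a session counts as closed only once stream time moves past the close time) is an assumption made for illustration.

```java
// Hypothetical helper, not part of Kafka Streams: it only mirrors the
// formulas quoted in the issue description.
final class SessionCloseTimeSketch {

    // Close time as described for the buggy behavior: window-end + grace.
    static long closeTimeWithoutGap(long windowEnd, long graceMs) {
        return windowEnd + graceMs;
    }

    // Close time as the issue says it should be: window-end + gap + grace.
    static long closeTimeWithGap(long windowEnd, long gapMs, long graceMs) {
        return windowEnd + gapMs + graceMs;
    }

    // Assumption: a session is closed once stream time is strictly
    // greater than its close time.
    static boolean isClosed(long closeTime, long streamTime) {
        return streamTime > closeTime;
    }
}
```

For session `k1@0/5` with gap 5 and grace-period 0, `closeTimeWithoutGap(5, 0)` is 5, so a record with ts=6 closes the session under this sketch. `closeTimeWithGap(5, 5, 0)` is 10, so ts=6 leaves the session open and ts=11 closes it, which matches the suggestion to change the timestamp of `k2` from 6 to 11.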