I have a very naive question. I know Jan suggested using two successive fixed
overlapping windows with an offset as a temporary solution to dedup the events.
However, I am wondering whether using a single fixed window of length, let's
say, 1 day followed by a deduplicate function would be a good alternative?
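One caveat with the single-window idea, sketched below in plain Python rather than with any Beam API: deduplication then happens independently inside each window, so two copies of an event that fall on either side of a window boundary both survive. That is presumably what the overlapping windows with an offset are meant to cover. The function name and the sample events are made up for illustration.

```python
from collections import defaultdict

DAY = 24 * 60 * 60  # window length in seconds


def dedup_in_fixed_windows(events, window_size=DAY):
    """Drop duplicates independently inside each fixed window.

    events: iterable of (event_id, timestamp_seconds).
    Returns the events that survive, in input order.
    """
    seen_per_window = defaultdict(set)
    survivors = []
    for event_id, ts in events:
        window_start = ts - ts % window_size  # start of the window containing ts
        if event_id not in seen_per_window[window_start]:
            seen_per_window[window_start].add(event_id)
            survivors.append((event_id, ts))
    return survivors


events = [
    ("a", 100), ("a", 200),          # same window: the second copy is dropped
    ("b", DAY - 1), ("b", DAY + 1),  # two seconds apart, but in different windows
]
result = dedup_in_fixed_windows(events)
```

Here both copies of "b" end up in `result`, even though they are only two seconds apart.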
> If the user chooses to create a window of 10 years, I'd say it is
> expected behavior that the state will be kept for as long as this duration.
State will be kept; the problem is that each key in the window will
carry a cleanup timer, although there might be nothing to clear (there
is no state
If the user chooses to create a window of 10 years, I'd say it is
expected behavior that the state will be kept for as long as this duration.
GlobalWindows are different because they represent the default case
where the user does not even use windowing. I think it warrants to be
treated differ
Window triggering is, afaik, an operation that is specific to GBK. Stateful
DoFns can have (as shown in the case of deduplication) timers set for
the GC only; triggering has no effect there. And yes, if we have timers
other than GC (any user timers), then we have to have a GC timer (because
timers are
The inefficiency described happens if and only if the following two conditions
are met:
a) there are many timers per single window (as otherwise they will be
negligible)
b) there are many keys which actually contain no state (as otherwise the
timer would be negligible wrt the state size)
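To make the two conditions concrete, here is a back-of-the-envelope sketch. It assumes one cleanup timer per key, and the byte sizes are invented placeholders, not measurements from any runner.

```python
def timer_overhead_fraction(num_keys, keys_with_state,
                            state_bytes_per_key, timer_bytes):
    """Fraction of the per-window footprint taken by cleanup timers.

    Assumes exactly one cleanup timer per key. All sizes are hypothetical
    inputs chosen by the caller.
    """
    timer_total = num_keys * timer_bytes
    state_total = keys_with_state * state_bytes_per_key
    total = timer_total + state_total
    return timer_total / total if total else 0.0


# Many keys, almost none of which hold state: timers dominate.
mostly_empty = timer_overhead_fraction(
    num_keys=1_000_000, keys_with_state=1_000,
    state_bytes_per_key=1_000, timer_bytes=100)

# Every key holds state: timers are small relative to the state.
all_stateful = timer_overhead_fraction(
    num_keys=1_000_000, keys_with_state=1_000_000,
    state_bytes_per_key=1_000, timer_bytes=100)
```

With these made-up sizes the timers account for most of the footprint in the first case and only a small share in the second, which is the point of condition b).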
On 8/25/20 9:27 PM, Maximilian Michels wrote:
I agree that this probably solves the described issue in the most
straightforward way, but special handling for global window feels
weird, as there is really nothing special about global window wrt
state cleanup.
Why is special handling for the global window weird? After all, it is a
special ca
I'd suggest a modified option (2) which does not use a timer to perform
the cleanup (as mentioned, this will cause problems with migrating state).
Instead, whenever we receive a watermark which closes the global window,
we enumerate all keys and clean up the associated state.
This is the clean
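A toy sketch of that modified option (2), in plain Python and not tied to any runner's internals: no per-key timer is registered, and the cleanup is driven purely by the watermark. `KeyedStateStore`, `on_watermark` and `GLOBAL_WINDOW_END` are hypothetical names; a real runner would use its own maximum timestamp rather than infinity.

```python
# Stand-in for the end of the global window (assumption for this sketch).
GLOBAL_WINDOW_END = float("inf")


class KeyedStateStore:
    """Minimal per-key state holder with watermark-driven cleanup."""

    def __init__(self):
        self._state = {}

    def put(self, key, value):
        self._state[key] = value

    def keys(self):
        # Copy the keys so that clearing while iterating is safe.
        return list(self._state)

    def clear(self, key):
        self._state.pop(key, None)

    def on_watermark(self, watermark):
        # Only a watermark that closes the global window triggers cleanup.
        if watermark >= GLOBAL_WINDOW_END:
            for key in self.keys():
                self.clear(key)


store = KeyedStateStore()
for k in ("k1", "k2", "k3"):
    store.put(k, "some state")

store.on_watermark(1_000)              # window still open: nothing is cleared
keys_before = len(store.keys())
store.on_watermark(GLOBAL_WINDOW_END)  # window closed: every key is cleaned up
keys_after = len(store.keys())
```

The cost moves from one timer per key to one enumeration of all keys at the very end, which only works if the state backend can list its keys.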
Awesome! Thanks a lot for the memory profile. A couple of remarks:
a) I can see that there are about 378k keys and each of them sets a timer.
b) Based on the settings for DeduplicatePerKey you posted, you will keep
track of all keys of the last 30 minutes.
Unless you have much fewer keys, the behav
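To illustrate why the number of timers follows the number of keys, here is a toy simulation. It is not the DeduplicatePerKey implementation, just a sketch that assumes one expiry timer per newly seen key; `DedupSimulator` and the one-key-per-second workload are invented for the example.

```python
import heapq


class DedupSimulator:
    """Remember each key for `retention_seconds`, with one timer per key."""

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.seen = {}    # key -> expiry time
        self.timers = []  # min-heap of (expiry, key)

    def process(self, key, now):
        """Return True if the key is new (emitted), False if it is a duplicate."""
        self._fire_timers(now)
        if key in self.seen:
            return False
        expiry = now + self.retention
        self.seen[key] = expiry
        heapq.heappush(self.timers, (expiry, key))
        return True

    def _fire_timers(self, now):
        # Fire every timer whose expiry has been reached and drop its key.
        while self.timers and self.timers[0][0] <= now:
            _, key = heapq.heappop(self.timers)
            del self.seen[key]


sim = DedupSimulator(retention_seconds=30 * 60)
for t in range(3600):            # one brand-new key per second for an hour
    sim.process(f"key-{t}", t)

live_keys = len(sim.seen)
live_timers = len(sim.timers)
duplicate_emitted = sim.process("key-3599", 3599)
```

In this setup the live timers always equal the distinct keys seen within the last 30 minutes, so a high key cardinality translates directly into a high timer count.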