[ https://issues.apache.org/jira/browse/BEAM-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209371#comment-17209371 ]
Sam Whittle commented on BEAM-8212:
-----------------------------------
This also affects Dataflow pipelines. Timers are per-key, so in-memory
deduping doesn't help when the keyspace is unbounded: every key gets its own
timer, so the timers are all unique. From the streaming runner's point of view,
these timers may be valid end-of-global-window timers due to the triggering
policy. For stateful garbage collection, I think we should simply not set these
timers: they can only fire once the pipeline is drained or completing, and at
that point the state can be GC'd for the entire pipeline at once instead.
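A minimal Java sketch of the two points in this comment, using a hypothetical GcTimerTracker class rather than Beam's real timer APIs. It assumes a timer is identified by its (key, GC timestamp) pair, so in-memory dedup collapses repeats for one key but never across keys, and it models the proposal as an option that skips timers targeting the end of the global window. END_OF_GLOBAL_WINDOW is an assumed sentinel value, not Beam's actual constant.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of per-key GC timers; not Beam's actual API.
final class GcTimerTracker {
    // Assumed sentinel for the end of the global window (not Beam's real value).
    static final long END_OF_GLOBAL_WINDOW = Long.MAX_VALUE - 1;

    private final boolean skipGlobalWindowGcTimers;
    // Pending timers identified by (key, GC timestamp). The set dedups repeats
    // for the same key, but each distinct key still adds its own entry.
    private final Set<Map.Entry<String, Long>> pendingTimers = new HashSet<>();

    GcTimerTracker(boolean skipGlobalWindowGcTimers) {
        this.skipGlobalWindowGcTimers = skipGlobalWindowGcTimers;
    }

    /** Called for every element; gcTime is when the key's window state may be dropped. */
    void onElement(String key, long gcTime) {
        if (skipGlobalWindowGcTimers && gcTime >= END_OF_GLOBAL_WINDOW) {
            // Proposed behavior: set no per-key timer and instead clean up all
            // state once the pipeline is drained or completes.
            return;
        }
        pendingTimers.add(Map.entry(key, gcTime));
    }

    int pendingTimerCount() {
        return pendingTimers.size();
    }

    public static void main(String[] args) {
        GcTimerTracker dedupOnly = new GcTimerTracker(false);
        GcTimerTracker skipping = new GcTimerTracker(true);
        for (int i = 0; i < 1000; i++) {
            String key = "key-" + (i % 100); // 100 distinct keys, 10 records each
            dedupOnly.onElement(key, END_OF_GLOBAL_WINDOW);
            skipping.onElement(key, END_OF_GLOBAL_WINDOW);
        }
        System.out.println(dedupOnly.pendingTimerCount()); // 100
        System.out.println(skipping.pendingTimerCount());  // 0
    }
}
```

With 100 distinct keys in the global window, the dedup-only tracker still holds one timer per key, while the skipping tracker holds none; timers for non-global windows are still set as before.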
> StatefulParDoFn creates GC timers for every record
> ---------------------------------------------------
>
> Key: BEAM-8212
> URL: https://issues.apache.org/jira/browse/BEAM-8212
> Project: Beam
> Issue Type: Bug
> Components: runner-core
> Reporter: Akshay Iyangar
> Assignee: Sam Whittle
> Priority: P3
>
> Hi
> So currently StatefulParDoFn creates timers for every record.
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/StatefulDoFnRunner.java#L211]
> This becomes a problem if you are using GlobalWindows for streaming, where
> these timers get created and never get cleaned up, since the window never
> closes.
> This is especially a problem if you are memory-bound in RocksDB, where these
> timers take up space and slow the pipelines down considerably.
> I was wondering: if the pipeline runs in the global window, should we avoid
> adding these timers at all?
>
>
>
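The behavior described in the issue can be modeled roughly as follows. This is a hypothetical simulation, not StatefulDoFnRunner's code: it assumes the GC time is the window's max timestamp plus allowed lateness (capped at an assumed end-of-global-window sentinel), and that a timer fires once the input watermark passes its timestamp. Fixed-window timers are cleared as the watermark advances, while timers set at the end of the global window stay pending until the final watermark at drain or completion.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of per-record GC timers; not StatefulDoFnRunner's code.
final class WatermarkGcSimulation {
    // Assumed sentinel for the end of the global window.
    static final long END_OF_GLOBAL_WINDOW = Long.MAX_VALUE - 1;

    private final long allowedLateness; // assumed to be >= 0
    // Pending GC timers as (key, GC timestamp) pairs.
    private final Set<Map.Entry<String, Long>> gcTimers = new HashSet<>();

    WatermarkGcSimulation(long allowedLateness) {
        this.allowedLateness = allowedLateness;
    }

    /** Sets (or re-sets) a GC timer for every record, as the issue describes. */
    void processElement(String key, long windowMaxTimestamp) {
        // Cap the GC time at the sentinel so adding lateness cannot overflow.
        long gcTime = windowMaxTimestamp >= END_OF_GLOBAL_WINDOW - allowedLateness
            ? END_OF_GLOBAL_WINDOW
            : windowMaxTimestamp + allowedLateness;
        gcTimers.add(Map.entry(key, gcTime));
    }

    /** Fires and removes every timer the watermark has passed; returns how many fired. */
    int advanceWatermark(long watermark) {
        int before = gcTimers.size();
        gcTimers.removeIf(timer -> timer.getValue() < watermark);
        return before - gcTimers.size();
    }

    int pendingTimers() {
        return gcTimers.size();
    }

    public static void main(String[] args) {
        WatermarkGcSimulation fixed = new WatermarkGcSimulation(0L);
        WatermarkGcSimulation global = new WatermarkGcSimulation(0L);
        for (int k = 0; k < 50; k++) {
            fixed.processElement("key-" + k, 59_999L); // fixed window ending at 59_999 ms
            global.processElement("key-" + k, END_OF_GLOBAL_WINDOW);
        }
        System.out.println(fixed.advanceWatermark(60_000L));  // 50
        System.out.println(global.advanceWatermark(60_000L)); // 0
        System.out.println(global.pendingTimers());           // 50
        // Only the final watermark (drain or completion) clears global-window timers.
        System.out.println(global.advanceWatermark(Long.MAX_VALUE)); // 50
    }
}
```

Running main prints 50, 0, 50 and 50: every fixed-window timer fires once the watermark passes the window, while the 50 global-window timers accumulate (one per key) until the pipeline finishes.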
--
This message was sent by Atlassian Jira
(v8.3.4#803005)