[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow
[ https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284759#comment-15284759 ] Ben Chambers commented on BEAM-175: --- I think in general this seems OK. I like that we're making the behavior explicit rather than trying to guessi t. 1. This may be too much configuration. There may be other/better ways of getting pane indices (eg., once we have a sink API or a state API). We should make sure we understand the use cases before exposing the knob. Especially since ReduceFnRunner is already complicated -- this is just adding more permutations of cases it needs to handle. 2. I worry that the default, at least in the Pane Index case, is actually *less* performant than the ZERO case, which is likely what was desired 90% of the time. If we go this direction, I would propose we change the default. 3. You should flesh this out to address error cases. When do we detect that the user is accessing the PaneIndex with the ZERO behavior specified. What kind of error message do they get? Etc. > Leak garbage collection timers in GlobalWindow > -- > > Key: BEAM-175 > URL: https://issues.apache.org/jira/browse/BEAM-175 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Mark Shields >Assignee: Mark Shields > > Consider the transform: > Window > .into(new GlobalWindows()) > .triggering( > Repeatedly.forever( > AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...))) > .discardingFiredPanes() > This is a common idiom for 'process elements bunched by arrival time'. > Currently we create an end-of-window timer per key, which clearly will only > fire if the pipeline is drained. > Better would be to avoid creating end-of-window timers if there's no state > which needs to be processed at end-of-window (ie at drain if the Global > window). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow
[ https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260884#comment-15260884 ] Mark Shields commented on BEAM-175: --- Just to note: there's a few overflow-from-GlobalWindow.maxTimestamp bugs which need to be fixed as part of this change. > Leak garbage collection timers in GlobalWindow > -- > > Key: BEAM-175 > URL: https://issues.apache.org/jira/browse/BEAM-175 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Mark Shields >Assignee: Mark Shields > > Consider the transform: > Window > .into(new GlobalWindows()) > .triggering( > Repeatedly.forever( > AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...))) > .discardingFiredPanes() > This is a common idiom for 'process elements bunched by arrival time'. > Currently we create an end-of-window timer per key, which clearly will only > fire if the pipeline is drained. > Better would be to avoid creating end-of-window timers if there's no state > which needs to be processed at end-of-window (ie at drain if the Global > window). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow
[ https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250716#comment-15250716 ] Mark Shields commented on BEAM-175: --- Actually, BigQuerySink is fine since the Reshuffle is short-circuited to avoid ReduceFnRunner and its associated timers, etc. So this will only hit folks who are explicitly GBKing in the GlobalWindow. > Leak garbage collection timers in GlobalWindow > -- > > Key: BEAM-175 > URL: https://issues.apache.org/jira/browse/BEAM-175 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Mark Shields >Assignee: Mark Shields > > Consider the transform: > Window > .into(new GlobalWindows()) > .triggering( > Repeatedly.forever( > AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...))) > .discardingFiredPanes() > This is a common idiom for 'process elements bunched by arrival time'. > Currently we create an end-of-window timer per key, which clearly will only > fire if the pipeline is drained. > Better would be to avoid creating end-of-window timers if there's no state > which needs to be processed at end-of-window (ie at drain if the Global > window). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow
[ https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227344#comment-15227344 ] Mark Shields commented on BEAM-175: --- For the Global window every pane will be EARLY, so it is just getIndex standing between us and nirvana. I'll code up a fix which replaces the monotonic index with a random, fix the timers, and we can at least see where we're at. > Leak garbage collection timers in GlobalWindow > -- > > Key: BEAM-175 > URL: https://issues.apache.org/jira/browse/BEAM-175 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Mark Shields >Assignee: Mark Shields > > Consider the transform: > Window > .into(new GlobalWindows()) > .triggering( > Repeatedly.forever( > AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...))) > .discardingFiredPanes() > This is a common idiom for 'process elements bunched by arrival time'. > Currently we create an end-of-window timer per key, which clearly will only > fire if the pipeline is drained. > Better would be to avoid creating end-of-window timers if there's no state > which needs to be processed at end-of-window (ie at drain if the Global > window). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow
[ https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227319#comment-15227319 ] Mark Shields commented on BEAM-175: --- So we can be clever about creating and deleting end-of-window timers when state comes in. I think that's a worthwhile fix even with the PaneInfo tracking problem since pane info is plain-old state (easy to flush to backing store) but timers are more fiddly and may need to be resident. Getting rid of PaneInfo tracking is tricky. As a first step we could abandon PaneInfo.getIndex and PaneInfo.getNonSpeculativeIndex, which were added in-lieu of user-accessible state. (Idea is user can pass those indexes to external systems for disambiguation. Eg filename, key). That leaves making sure the EARLY / ON_TIME / LATE timing transitions are handled correctly. Currently the previous pane's timing is used to prevent multiple ON_TIME firings. Ie once ON_TIME, all subsequent are LATE even if the output watermark has not yet gone byond the end of the window. So I think we should consider this fix for Beam only since it requires a breaking API change. > Leak garbage collection timers in GlobalWindow > -- > > Key: BEAM-175 > URL: https://issues.apache.org/jira/browse/BEAM-175 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Mark Shields >Assignee: Mark Shields > > Consider the transform: > Window > .into(new GlobalWindows()) > .triggering( > Repeatedly.forever( > AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...))) > .discardingFiredPanes() > This is a common idiom for 'process elements bunched by arrival time'. > Currently we create an end-of-window timer per key, which clearly will only > fire if the pipeline is drained. > Better would be to avoid creating end-of-window timers if there's no state > which needs to be processed at end-of-window (ie at drain if the Global > window). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow
[ https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227290#comment-15227290 ] Mark Shields commented on BEAM-175: --- Extra problem, for no extra cost: In the global window we dutifully increment the pane index for each successive pane. So even if we fix the timer problem we still have state-per-key-till-end-of-time. Thankfully we don't store 'false' in the trigger finished bits, so we don't have to worry about that for Repeatedly triggers. > Leak garbage collection timers in GlobalWindow > -- > > Key: BEAM-175 > URL: https://issues.apache.org/jira/browse/BEAM-175 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Mark Shields >Assignee: Mark Shields > > Consider the transform: > Window > .into(new GlobalWindows()) > .triggering( > Repeatedly.forever( > AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...))) > .discardingFiredPanes() > This is a common idiom for 'process elements bunched by arrival time'. > Currently we create an end-of-window timer per key, which clearly will only > fire if the pipeline is drained. > Better would be to avoid creating end-of-window timers if there's no state > which needs to be processed at end-of-window (ie at drain if the Global > window). -- This message was sent by Atlassian JIRA (v6.3.4#6332)