[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow

2016-05-16 Thread Ben Chambers (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284759#comment-15284759
 ] 

Ben Chambers commented on BEAM-175:
---

I think in general this seems OK. I like that we're making the behavior 
explicit rather than trying to guessi t.

1. This may be too much configuration. There may be other/better ways of 
getting pane indices (eg., once we have a sink API or a state API). We should 
make sure we understand the use cases before exposing the knob. Especially 
since ReduceFnRunner is already complicated -- this is just adding more 
permutations of cases it needs to handle.
2. I worry that the default, at least in the Pane Index case, is actually 
*less* performant than the ZERO case, which is likely what was desired 90% of 
the time. If we go this direction, I would propose we change the default.
3. You should flesh this out to address error cases. When do we detect that the 
user is accessing the PaneIndex with the ZERO behavior specified. What kind of 
error message do they get? Etc.

> Leak garbage collection timers in GlobalWindow
> --
>
> Key: BEAM-175
> URL: https://issues.apache.org/jira/browse/BEAM-175
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> Consider the  transform:
>   Window
> .into(new GlobalWindows())
> .triggering(
>   Repeatedly.forever(
> AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...)))
> .discardingFiredPanes()
> This is a common idiom for 'process elements bunched by arrival time'.
> Currently we create an end-of-window timer per key, which clearly will only 
> fire if the pipeline is drained.
> Better would be to avoid creating end-of-window timers if there's no state 
> which needs to be processed at end-of-window (ie at drain if the Global 
> window).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow

2016-04-27 Thread Mark Shields (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260884#comment-15260884
 ] 

Mark Shields commented on BEAM-175:
---

Just to note: there's a few overflow-from-GlobalWindow.maxTimestamp bugs which 
need to be fixed as part of this change.

> Leak garbage collection timers in GlobalWindow
> --
>
> Key: BEAM-175
> URL: https://issues.apache.org/jira/browse/BEAM-175
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> Consider the  transform:
>   Window
> .into(new GlobalWindows())
> .triggering(
>   Repeatedly.forever(
> AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...)))
> .discardingFiredPanes()
> This is a common idiom for 'process elements bunched by arrival time'.
> Currently we create an end-of-window timer per key, which clearly will only 
> fire if the pipeline is drained.
> Better would be to avoid creating end-of-window timers if there's no state 
> which needs to be processed at end-of-window (ie at drain if the Global 
> window).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow

2016-04-20 Thread Mark Shields (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250716#comment-15250716
 ] 

Mark Shields commented on BEAM-175:
---

Actually, BigQuerySink is fine since the Reshuffle is short-circuited to avoid 
ReduceFnRunner and its associated timers, etc.

So this will only hit folks who are explicitly GBKing in the GlobalWindow.

> Leak garbage collection timers in GlobalWindow
> --
>
> Key: BEAM-175
> URL: https://issues.apache.org/jira/browse/BEAM-175
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> Consider the  transform:
>   Window
> .into(new GlobalWindows())
> .triggering(
>   Repeatedly.forever(
> AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...)))
> .discardingFiredPanes()
> This is a common idiom for 'process elements bunched by arrival time'.
> Currently we create an end-of-window timer per key, which clearly will only 
> fire if the pipeline is drained.
> Better would be to avoid creating end-of-window timers if there's no state 
> which needs to be processed at end-of-window (ie at drain if the Global 
> window).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow

2016-04-05 Thread Mark Shields (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227344#comment-15227344
 ] 

Mark Shields commented on BEAM-175:
---

For the Global window every pane will be EARLY, so it is just getIndex standing 
between us and nirvana.

I'll code up a fix which replaces the monotonic index with a random, fix the 
timers, and we can at least see where we're at.

> Leak garbage collection timers in GlobalWindow
> --
>
> Key: BEAM-175
> URL: https://issues.apache.org/jira/browse/BEAM-175
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> Consider the  transform:
>   Window
> .into(new GlobalWindows())
> .triggering(
>   Repeatedly.forever(
> AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...)))
> .discardingFiredPanes()
> This is a common idiom for 'process elements bunched by arrival time'.
> Currently we create an end-of-window timer per key, which clearly will only 
> fire if the pipeline is drained.
> Better would be to avoid creating end-of-window timers if there's no state 
> which needs to be processed at end-of-window (ie at drain if the Global 
> window).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow

2016-04-05 Thread Mark Shields (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227319#comment-15227319
 ] 

Mark Shields commented on BEAM-175:
---

So we can be clever about creating and deleting end-of-window timers when state 
comes in. I think that's a worthwhile fix even with the PaneInfo tracking 
problem since pane info is plain-old state (easy to flush to backing store) but 
timers are more fiddly and may need to be resident.

Getting rid of PaneInfo tracking is tricky. As a first step we could abandon 
PaneInfo.getIndex and PaneInfo.getNonSpeculativeIndex, which were added in-lieu 
of user-accessible state. (Idea is user can pass those indexes to external 
systems for disambiguation. Eg filename, key). That leaves making sure the 
EARLY / ON_TIME / LATE timing transitions are handled correctly. Currently the 
previous pane's timing is used to prevent multiple ON_TIME firings. Ie once 
ON_TIME, all subsequent are LATE even if the output watermark has not yet gone 
byond the end of the window.

So I think we should consider this fix for Beam only since it requires a 
breaking API change.

> Leak garbage collection timers in GlobalWindow
> --
>
> Key: BEAM-175
> URL: https://issues.apache.org/jira/browse/BEAM-175
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> Consider the  transform:
>   Window
> .into(new GlobalWindows())
> .triggering(
>   Repeatedly.forever(
> AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...)))
> .discardingFiredPanes()
> This is a common idiom for 'process elements bunched by arrival time'.
> Currently we create an end-of-window timer per key, which clearly will only 
> fire if the pipeline is drained.
> Better would be to avoid creating end-of-window timers if there's no state 
> which needs to be processed at end-of-window (ie at drain if the Global 
> window).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-175) Leak garbage collection timers in GlobalWindow

2016-04-05 Thread Mark Shields (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227290#comment-15227290
 ] 

Mark Shields commented on BEAM-175:
---

Extra problem, for no extra cost:
In the global window we dutifully increment the pane index for each successive 
pane.
So even if we fix the timer problem we still have 
state-per-key-till-end-of-time.

Thankfully we don't store 'false' in the trigger finished bits, so we don't 
have to worry about that for Repeatedly triggers. 

> Leak garbage collection timers in GlobalWindow
> --
>
> Key: BEAM-175
> URL: https://issues.apache.org/jira/browse/BEAM-175
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> Consider the  transform:
>   Window
> .into(new GlobalWindows())
> .triggering(
>   Repeatedly.forever(
> AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...)))
> .discardingFiredPanes()
> This is a common idiom for 'process elements bunched by arrival time'.
> Currently we create an end-of-window timer per key, which clearly will only 
> fire if the pipeline is drained.
> Better would be to avoid creating end-of-window timers if there's no state 
> which needs to be processed at end-of-window (ie at drain if the Global 
> window).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)