[ https://issues.apache.org/jira/browse/BEAM-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Beam JIRA Bot updated BEAM-4702:
--------------------------------
    Labels: stale-P2  (was: )

> After SQL GROUP BY <windowing> the result should be globally windowed
> ---------------------------------------------------------------------
>
>                 Key: BEAM-4702
>                 URL: https://issues.apache.org/jira/browse/BEAM-4702
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql
>            Reporter: Kenneth Knowles
>            Priority: P2
>              Labels: stale-P2
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Beam SQL runs in two contexts:
> 1. As a PTransform in a pipeline. A PTransform operates on a PCollection,
> which is always implicitly windowed, and a PTransform should operate
> per-window so that it automatically works on bounded and unbounded data.
> This only works if the query has no windowing operators, in which case the
> GROUP BY <non-window stuff> should operate per-window.
> 2. As a top-level shell that starts and ends with SQL. In the relational
> model there are no implicit windows. Calcite has some extensions for
> windowing, but they manifest (IMO correctly) as just items in the GROUP BY
> list. The output of the aggregation is "just rows" again. So it should be
> globally windowed.
> The problem is that this semantic fix makes it so we cannot join windowed
> stream subqueries. Because we don't have retractions, we only support
> GroupByKey-based equijoins over windowed streams, with the default trigger.
> _These joins implicitly also join windows_. For example:
> {code}
> JOIN(left.id = right.id)
>   SELECT ... GROUP BY id, TUMBLE(1 hour)
>   SELECT ... GROUP BY id, TUMBLE(1 hour)
> {code}
> Semantically, there may be a joined row for 1:00pm on the left and 10:00pm on
> the right. But by the time the right-hand row for 10:00pm shows up, the left
> one may be GC'd. So this is implicitly, but nondeterministically, joining on
> the window as well. Before this PR, we left the windowing strategies for left
> and right in place, and asserted that they matched.
> If we re-window into the global window always, there _are no windowed
> streams_ so you just can't do stream joins. The solution is probably to track
> which field of a stream is the window and allow joins which also explicitly
> express the equijoin over the window field.
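
To make the proposed direction concrete, here is a minimal sketch in plain Python (not Beam or Calcite code) of what "the window is just a field" could look like. The helper names `tumble_count` and `join_on_id_and_window` are hypothetical and exist only for this illustration. It assumes timestamps in seconds and a one-hour tumbling window. The aggregation emits ordinary rows that carry `window_start` as a column, and the join states the window equality explicitly instead of relying on matching windowing strategies.

```python
from collections import defaultdict

HOUR = 3600  # tumbling window size in seconds (assumption for this sketch)


def tumble_count(events, size=HOUR):
    """Roughly: SELECT id, window_start, COUNT(*) ... GROUP BY id, TUMBLE(1 hour).

    events is an iterable of (id, timestamp_seconds). The result is a list of
    plain rows (id, window_start, count); the window is an ordinary column.
    """
    counts = defaultdict(int)
    for key, ts in events:
        window_start = ts - ts % size  # start of the window containing ts
        counts[(key, window_start)] += 1
    return [(key, start, n) for (key, start), n in sorted(counts.items())]


def join_on_id_and_window(left, right):
    """Equijoin on left.id = right.id AND left.window_start = right.window_start."""
    index = defaultdict(list)
    for key, start, value in right:
        index[(key, start)].append(value)
    return [
        (key, start, left_value, right_value)
        for key, start, left_value in left
        for right_value in index.get((key, start), [])
    ]


left_rows = tumble_count([("a", 13 * HOUR + 5), ("a", 13 * HOUR + 50), ("b", 13 * HOUR + 7)])
right_rows = tumble_count([("a", 13 * HOUR + 30), ("a", 22 * HOUR + 1)])
joined = join_on_id_and_window(left_rows, right_rows)
```

With these inputs, the 13:00 rows for `a` join with each other, while the 22:00 row on the right finds no partner. Because the window equality is part of the join condition, the result no longer depends on when the 13:00 row on the left would have been garbage collected.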