[ 
https://issues.apache.org/jira/browse/BEAM-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beam JIRA Bot updated BEAM-4702:
--------------------------------
    Labels: stale-P2  (was: )

> After SQL GROUP BY <windowing> the result should be globally windowed
> ---------------------------------------------------------------------
>
>                 Key: BEAM-4702
>                 URL: https://issues.apache.org/jira/browse/BEAM-4702
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql
>            Reporter: Kenneth Knowles
>            Priority: P2
>              Labels: stale-P2
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Beam SQL runs in two contexts:
> 1. As a PTransform in a pipeline. A PTransform operates on a PCollection, 
> which is always implicitly windows and a PTransform should operate per-window 
> so it automatically works on bounded and unbounded data. This only works if 
> the query has no windowing operators, in which case the GROUP BY <non-window 
> stuff> should operate per-window.
> 2. As a top-level shell that starts and ends with SQL. In the relational 
> model there are no implicit windows. Calcite has some extensions for 
> windowing, but they manifest (IMO correctly) as just items in the GROUP BY 
> list. The output of the aggregation is "just rows" again. So it should be 
> globally windowed.
> The problem is that this semantic fix makes it so we cannot join windowing 
> stream subqueries. Because we don't have retractions, we only support 
> GroupByKey-based equijoins over windowed streams, with the default trigger. 
> _These joins implicitly also join windows_. For example:
> {code}
> JOIN(left.id = right.id)
>   SELECT ... GROUP BY id, TUMBLE(1 hour)
>   SELECT ... GROUP BY id, TUMBLE(1 hour)  
> {code}
> Semantically, there may be a joined row for 1:00pm on the left and 10:00pm on 
> the right. But by the time the right-hand row for 10:00pm shows up, the left 
> one may be GC'd. So this is implicitly, but nondeterministically, joining on 
> the window as well. Before this PR, we left the windowing strategies for left 
> and right in place, and asserted that they matched.
> If we re-window into the global window always, there _are no windowed 
> streams_ so you just can't do stream joins. The solution is probably to track 
> which field of a stream is the window and allow joins which also explicitly 
> express the equijoin over the window field.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to