Short answer: yes, we should allow it.

The design falls into 3 parts:
* Validation. We should allow any combination: table-table, stream-table and
stream-stream joins, as long as the query can make progress. Where a stream is
involved, that often means the join condition should involve a monotonic
expression. A stream-table join can make progress without a monotonic
expression, but a join of two streams needs one.
* Translation to relational algebra. Inspired by the product rule of
differential calculus [1], "stream(x join y)" becomes "x join stream(y) union
all stream(x) join y". Suppose that products is a table (i.e. we do not
receive notifications of new products); then "stream(products)" is empty.
Suppose that orders is both a stream and a table, i.e. a stream with history.
Because stream(products) is empty, "stream(products join orders)" is simply
"products join stream(orders)". These rewrites would happen in a
DeltaJoinTransposeRule.
* Updates to relations. Suppose that the products table is updated two or three
times each day. How quickly does the end user expect those updated records to
appear in the output of the stream-table join? If the table is updated at 10am,
should the new data be loaded only when processing transactions from 10am
(which might not hit the join until, say, 10:07am)? There is no 'right answer'
here; we should offer the end user a choice of policies. A good basic policy
would be "cache for no more than T seconds" or "cache as long as you like",
plus a manual way to flush the cache.
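To make the validation rule concrete, here is a minimal Python sketch of the decision it describes. It is not Calcite code (the validator is Java), and `Kind` and `join_can_make_progress` are hypothetical names, not SqlValidatorImpl methods. It assumes the validator has already worked out each input's modality and whether the join condition contains a monotonic expression; deriving those facts is the hard part and is not shown.

```python
from enum import Enum


class Kind(Enum):
    TABLE = "table"
    STREAM = "stream"


def join_can_make_progress(left: Kind, right: Kind,
                           has_monotonic_condition: bool) -> bool:
    """Any combination of inputs is allowed, but a join of two streams
    needs a monotonic expression in its join condition."""
    if left is Kind.STREAM and right is Kind.STREAM:
        # stream-stream: e.g. a condition on each stream's monotonically
        # increasing timestamp column (a hypothetical schema detail).
        return has_monotonic_condition
    # table-table or stream-table: allowed with or without one.
    return True


# Milinda's query: orders (stream) join products (table) on a plain key.
print(join_can_make_progress(Kind.STREAM, Kind.TABLE, False))   # True
print(join_can_make_progress(Kind.STREAM, Kind.STREAM, False))  # False
```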
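The product-rule rewrite can be checked on toy data. This is a minimal Python sketch, not Calcite code: relations are lists of dict rows, and `join`, `delta_join`, `as_bag` and the matching functions are hypothetical helpers. It assumes rows arrive one at a time, so at each step only one input changes; under that assumption, accumulating the two terms of the rewrite step by step yields the same rows as recomputing the whole join. The first demo is the products/orders case. The second interleaves two small finite streams and keeps every row of both in memory, which is only feasible because they are finite.

```python
from collections import Counter


def join(left, right, on):
    """Bag (multiset) equi-join of two lists of dict rows."""
    return [{**a, **b} for a in left for b in right if on(a, b)]


def delta_join(x, y, dx, dy, on):
    """stream(x join y) = x join stream(y) UNION ALL stream(x) join y.

    Sketch assumption: at most one input changes per step, so no row of
    dx can meet a row of dy in the same step.
    """
    assert not (dx and dy), "sketch assumes one input changes per step"
    return join(x, dy, on) + join(dx, y, on)


def as_bag(rows):
    """Order-insensitive multiset of rows, for comparing results."""
    return Counter(tuple(sorted(r.items())) for r in rows)


def product_matches(p, o):
    return p["id"] == o["productId"]


# products is a table, so stream(products) is always empty and the rewrite
# reduces to: products join stream(orders).
products = [{"id": 1, "name": "apple"}, {"id": 2, "name": "pear"}]
orders = []   # history of the orders stream
emitted = []  # output of stream(products join orders)
for order in [{"orderId": 10, "productId": 1},
              {"orderId": 11, "productId": 2},
              {"orderId": 12, "productId": 3}]:  # no product 3: no output
    emitted += delta_join(products, orders, [], [order], product_matches)
    orders.append(order)

assert as_bag(emitted) == as_bag(join(products, orders, product_matches))
print([(r["orderId"], r["name"]) for r in emitted])  # [(10, 'apple'), (11, 'pear')]


# Two finite streams whose rows arrive interleaved: both terms are needed.
def same_key(a, b):
    return a["k"] == b["k"]


left, right, out = [], [], []
for side, row in [("L", {"k": 1, "a": "x1"}), ("R", {"k": 1, "b": "y1"}),
                  ("R", {"k": 2, "b": "y2"}), ("L", {"k": 2, "a": "x2"})]:
    if side == "L":
        out += delta_join(left, right, [row], [], same_key)
        left.append(row)
    else:
        out += delta_join(left, right, [], [row], same_key)
        right.append(row)

assert as_bag(out) == as_bag(join(left, right, same_key))
```

In the first demo the "stream(products) join orders" term always joins an empty list, which is the simplification described above for a table that sends no notifications.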
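The two caching policies could be sketched as a small Python class. The name `TableCache`, the `max_age` parameter and the injected `clock` are hypothetical, chosen for illustration. `max_age=None` models "cache as long as you like" and a number models "cache for no more than T seconds"; both support a manual `flush()`. The clock is injected so that the usage at the end is deterministic.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TableCache(Generic[T]):
    """Holds a snapshot of a table used by a stream-table join.

    max_age=None -> "cache as long as you like" (until flush() is called)
    max_age=secs -> "cache for no more than secs seconds"
    """

    def __init__(self, loader: Callable[[], T],
                 max_age: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._max_age = max_age
        self._clock = clock
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def flush(self) -> None:
        """Manual flush: the next get() reloads the table."""
        self._loaded_at = None

    def get(self) -> T:
        now = self._clock()
        expired = (self._max_age is not None
                   and self._loaded_at is not None
                   and now - self._loaded_at > self._max_age)
        if self._loaded_at is None or expired:
            self._value = self._loader()
            self._loaded_at = now
        return self._value  # type: ignore[return-value]


# Simulated table contents and clock.
table = {"version": 1}
now = [0.0]
bounded = TableCache(lambda: dict(table), max_age=60, clock=lambda: now[0])
unbounded = TableCache(lambda: dict(table), clock=lambda: now[0])
bounded.get()                       # both load version 1 at t=0
unbounded.get()
table["version"] = 2                # the table is updated
now[0] = 30.0
print(bounded.get()["version"])     # 1: still within 60 seconds
now[0] = 61.0
print(bounded.get()["version"])     # 2: snapshot expired and was reloaded
print(unbounded.get()["version"])   # 1: kept until flushed
unbounded.flush()
print(unbounded.get()["version"])   # 2
```

This sketch reloads lazily, on the first lookup after expiry, so an update at 10am shows up at the first join after the snapshot expires rather than at 10am exactly.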

Can you please log a JIRA case to track this? The next step would be to write
some sample queries and decide whether they are valid.

Julian

[1] https://en.wikipedia.org/wiki/Product_rule

> On Nov 13, 2015, at 9:35 PM, Milinda Pathirage <[email protected]> wrote:
> 
> Hi devs,
> 
> The current SqlValidatorImpl doesn't allow queries like the following:
> 
> select stream orders.orderId, orders.productId, products.name from
> orders join products on orders.productId = products.id
> 
> if 'products' is a relation. This query fails at the modality check.
> But I am not sure whether fixing (or changing) the modality checking logic
> is enough to solve this. Do we need to change planner rules as well? I'd
> really appreciate any ideas on this.
> 
> Thanks
> Milinda
> 
> p.s. I am trying to get this base case working, where every element from a
> stream is joined with a relation. Stream-to-stream joins require changes
> to the parser as well, to support windowing. That's my understanding; Julian
> may have better ideas.
> 
> -- 
> Milinda Pathirage
> 
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
> 
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
