On Jan 28, 2015, at 10:02 AM, Yi Pan <nickpa...@gmail.com> wrote:

> I try to understand your comments below: "But there is not a simple
> mapping between
> true SQL and a data-flow graph that you can execute." What is the specific
> meaning of this statement? Could you elaborate on this a bit more?

The structure of a SQL query (and its AST) is different from the structure of 
the relational algebra that it translates to. The elements of a SQL query are 
its clauses (FROM, WHERE, GROUP BY, SELECT, HAVING, ORDER BY), and the elements 
of a relational algebra expression are the relational operators (scan, join, 
filter, aggregate, project, sort). For simple queries there is a simple mapping. 
But the mapping becomes complex when there are sub-queries, especially 
correlated ones, and even a 3-way outer join can be complex. In Calcite, 
SqlToRelConverter, which performs this task, started off 100 lines long and is 
now 5,000.
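
To illustrate the simple case only, here is a toy sketch in Python. It is not 
Calcite's code: the dict-of-clauses query and the nested-tuple plan are 
hypothetical representations invented for this example. It shows that the 
clauses are applied in an order that differs from the order in which they are 
written (SELECT is written first but projected near the end).

```python
def to_algebra(query):
    """Translate a toy single-table query (a dict of clauses) into a nested
    tuple of relational operators. Hypothetical representation, not Calcite's."""
    plan = ("scan", query["from"])                  # FROM     -> scan
    if "where" in query:
        plan = ("filter", query["where"], plan)     # WHERE    -> filter
    if "group_by" in query:
        plan = ("aggregate", query["group_by"], plan)  # GROUP BY -> aggregate
    if "having" in query:
        plan = ("filter", query["having"], plan)    # HAVING   -> filter above aggregate
    plan = ("project", query["select"], plan)       # SELECT   -> project
    if "order_by" in query:
        plan = ("sort", query["order_by"], plan)    # ORDER BY -> sort
    return plan


query = {
    "select": ["deptno", "COUNT(*)"],
    "from": "emp",
    "where": "sal > 1000",
    "group_by": ["deptno"],
    "order_by": ["deptno"],
}
plan = to_algebra(query)
```

A sketch like this has no answer for a sub-query in the WHERE clause or for a 
correlated reference, which is where the real complexity comes from.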

My point was that you shouldn’t conflate the SQL AST with the logical algebra. 
It sounds like the point is already taken.

In non-streaming databases, it is almost possible to execute the logical 
algebra as is. (You need to use iterators, i.e. convert relations into streams, 
and when joining, you need to be careful not to create Cartesian products 
before you start applying filters, but otherwise you’re safe.)
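
A minimal sketch of that iterator style, using Python generators. The operator 
names and the sample emp/dept rows are made up for illustration. The filter 
runs before the join, and the join condition is checked inside the join 
instead of being applied to a full Cartesian product afterwards.

```python
def scan(rows):
    # Turn a stored relation into a stream of rows.
    for row in rows:
        yield row


def filter_rows(pred, child):
    # Pass through only the rows that satisfy the predicate.
    for row in child:
        if pred(row):
            yield row


def nested_loop_join(left, right_rows, cond):
    # right_rows is a list so that it can be re-scanned for every left row.
    for l in left:
        for r in right_rows:
            if cond(l, r):
                yield {**l, **r}


emps = [
    {"ename": "a", "deptno": 10, "sal": 500},
    {"ename": "b", "deptno": 20, "sal": 2000},
]
depts = [
    {"deptno": 10, "dname": "x"},
    {"deptno": 20, "dname": "y"},
]

# Filter first, then join on deptno.
high_paid = filter_rows(lambda e: e["sal"] > 1000, scan(emps))
result = list(
    nested_loop_join(high_paid, depts, lambda e, d: e["deptno"] == d["deptno"])
)
```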

But in streaming databases, the logical algebra is not implementable. You 
cannot literally implement the stream-to-relation or relation-to-stream 
operators, or, heaven forbid, the r-stream, which re-transmits the whole table 
every clock tick. So, in addition to the logical algebra, you need a physical 
algebra. The stream-to-relation and relation-to-stream operators are in the 
logical algebra but very likely have disappeared by the time you get to the 
physical algebra. And the physical algebra introduces new constructs like 
lookups into time-varying materializations and partitioning.
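
As one possible illustration of such a lookup, here is a toy Python sketch. 
The event shapes, field names and the single interleaved event list are 
assumptions made for the example, not how any particular system does it. 
The products table is never re-transmitted: it is kept as a materialization 
that changes over time, and each order looks up the row that is current when 
the order arrives.

```python
def stream_table_join(events):
    """Join an order stream against a products table that changes over time.

    `events` is a hypothetical interleaved sequence of ("product", row)
    updates and ("order", row) stream records.
    """
    products = {}  # time-varying materialization: latest row per product id
    for kind, payload in events:
        if kind == "product":
            # Table update: overwrite the materialized row for this key.
            products[payload["id"]] = payload
        elif kind == "order":
            # Stream record: look up the current state of the table.
            product = products.get(payload["product_id"])
            if product is not None:
                yield {
                    "order_id": payload["order_id"],
                    "price": product["price"],
                }


events = [
    ("product", {"id": 1, "price": 10}),
    ("order", {"order_id": 100, "product_id": 1}),
    ("product", {"id": 1, "price": 12}),
    ("order", {"order_id": 101, "product_id": 1}),
]
out = list(stream_table_join(events))
```

The two orders see different prices because the materialization changed 
between them. In a partitioned deployment, the dictionary would hold only the 
keys routed to that partition.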

Julian
