Re: Materialized view query rewriting with joins

Michael Mior Sun, 12 Jun 2016 11:34:02 -0700

I'm close to getting something working for the case of two tables. I don't
think it should be too challenging to extend to multiple tables. The gist
of the approach is to match a Join over two TableScans and then use
JoinToMultiJoin rule to get a MultiJoin instance. This is useful because
it's much easier to deal with with join conditions across more than two
tables with a MultiJoin than several join instances. From there, the plan
is to find a materialized view with matching conditions and rewrite the
original to join to a LogicalProject over a TableScan over the materialized
view.


To test this out, I created two instances of a new class called DummyTable
with infinite cost and created a materialized view over these tables. I
hoped this would be helpful for debugging because it forces Calcite to use
the materialized view to execute the query (since actually performing scans
over these tables has infinite cost).

The matching and substitution part seems to be working great, but Calcite
still fails to find a plan for the query. Any help would be appreciated.
Below are some relevant components of different parts of the plan. (Sorry,
they're quite long). Thanks!






Original query matched by the new rule

LogicalProject(following=[$5], time=[$1], tweet_id=[$0]): rowcount =
1.7976931348623157E308, cumulative cost = {1.7976931348623157E308
 rows, Infinity cpu, 0.0 io}, id = 641
  LogicalJoin(subset=[rel#636:Subset#22.NONE.[]], condition=[=($5, $3)],
joinType=[inner]): rowcount = 1.7976931348623157E308, cumulat
ive cost = {1.7976931348623157E308 rows, 0.0 cpu, 0.0 io}, id = 634
    DummyScan(subset=[rel#33:Subset#0.ENUMERABLE.[]], table=[[twissandra,
tweet]]): rowcount = 1.7976931348623157E308, cu
mulative cost = {inf}, id = 0
    DummyScan(subset=[rel#98:Subset#8.ENUMERABLE.[]], table=[[twissandra,
followers]]): rowcount = 1.7976931348623157E308
, cumulative cost = {inf}, id = 18

Original query rewritten to use a MultiJoin

LogicalProject(following=[$3], time=[$1], tweet_id=[$0]): rowcount = 1.0,
cumulative cost = {inf}, id = 667
  MultiJoin(joinFilter=[=($3, $2)], isFullOuterJoin=[false],
joinTypes=[[INNER, INNER]], outerJoinConditions=[[NULL, NULL]], projField
s=[[{0, 1}, {0}]]): rowcount = 1.0, cumulative cost = {inf}, id = 664
    LogicalProject(tweet_id=[$0], time=[$1], username=[$3]): rowcount =
1.7976931348623157E308, cumulative cost = {inf}, id = 651
      DummyScan(table=[[twissandra, tweet]]): rowcount =
1.7976931348623157E308, cumulative cost = {inf}, id = 0
    LogicalProject(following=[$1]): rowcount = 1.7976931348623157E308,
cumulative cost = {inf}, id = 652
      DummyScan(table=[[twissandra, followers]]): rowcount =
1.7976931348623157E308, cumulative cost = {inf}, id = 18

Replacement scan over the materialized view (this is what is substituted in
the original query)

LogicalProject(following=[$1], time=[$2], tweet_id=[$0]): rowcount = 100.0,
cumulative cost = {300.0 rows, 701.0 cpu, 0.0 io}, id = 69
3
  LogicalProject(username=[$0], time=[CAST($1):TIMESTAMP(0)],
tweet_id=[CAST($2):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-88
59-1$en_US$primary"]): rowcount = 100.0, cumulative cost = {200.0 rows,
401.0 cpu, 0.0 io}, id = 44
    CassandraTableScan(table=[[twissandra, timeline]]): rowcount = 100.0,
cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 23

--
Michael Mior
michael.m...@gmail.com

2016-06-07 1:39 GMT-04:00 Amogh Margoor <amo...@qubole.com>:

> Sorry, I missed Lattices as I didn't know about Lattices without
> aggregations. Btw there is this rule where we try to replace star table
> scan by Tiles of Lattices with filters:
>
> https://github.com/qubole/quark/blob/master/optimizer/src/main/java/com/qubole/quark/planner/FilterAggStarRule.java
> I thought it might be useful reference if we are trying to optimise for
> materialised views defined on joins and filters(  e.g., MV defined by
> "select * from A join B on A.id=B.aid where A.x >10 and B.y < '2015-01-10'"
> )
>
> Regards,
> Amogh
>
> On Tue, Jun 7, 2016 at 7:06 AM, Julian Hyde <jh...@apache.org> wrote:
>
> > Yes, I think you could make Lattices work without aggregation, as long as
> > you keep the key assumption, which is that dropping joined tables does
> not
> > change the number of rows (so, primary key and foreign key integrity,
> > relationships are all many-to-one from the root (fact) table, and you
> have
> > to include the root table in your query).
> >
> > I might end up using lattices with and without aggregation in the Druid
> > adapter. Because Druid cannot do joins, I plan to map each Druid table
> to a
> > lattice, so that you can satisfy a star query using a Druid table. Druid
> is
> > *mainly* for aggregation, but Druid just introduced a “select” query[1]
> > that returns raw rows, and the Druid adapter can execute non-aggregation
> > queries.
> >
> > Once you’ve translated a query "A join B join C” onto a virtual table ABC
> > I suppose you could then apply other materialized view techniques.
> >
> > If you were able to implement the Goldstein & Larson algorithm it would
> be
> > interesting to compose it with other materialized view substitutions in
> > Calcite. After all, each materialized view substitution just generates
> more
> > alternative queries to optimize, so I suspect that we could compose
> > algorithms.
> >
> > Julian
> >
> > [1] http://druid.io/docs/latest/querying/select-query.html <
> > http://druid.io/docs/latest/querying/select-query.html>
> >
> >
> > > On Jun 6, 2016, at 6:06 PM, Michael Mior <mm...@uwaterloo.ca> wrote:
> > >
> > > Thanks for the pointer. Any chance there's a good example somewhere
> using
> > > lattices that would help me get something working? Also, is it possible
> > to
> > > have lattices with no aggregation? Unfortunately either way it doesn't
> > > completely suit my use case since I'm going to have to do rewriting on
> > > joins aside from star queries. That said if I can get basic equijoins
> > > working that would be enough for me.
> > >
> > > I started trying to write a unification rule but decided it would be
> > easier
> > > to do something like MaterializedViewFilterScanRule. The first problem
> > that
> > > I've run into is trying to get a Join instance in a rule where the left
> > and
> > > right inputs are TableScan instances. Even if I have a
> RelOptRuleOperand
> > > that looks like the following I end up with a join whose left and right
> > > inputs are RelSubsets.
> > >
> > > operand(LogicalJoin.class, operand(TableScan.class, any()),
> > > operand(TableScan.class, any()))
> > >
> > > Anyway, I've been looking into implementing the approach from the
> > following
> > > paper. It's very nicely written and easy to follow. It also doesn't
> > require
> > > trying different join permutations. I'm starting with several
> additional
> > > restrictions (only equijoins, no aggregations, etc.)
> > >
> > >
> >
> ftp://ftp.cse.buffalo.edu/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf
> > >
> > > --
> > > Michael Mior
> > > michael.m...@gmail.com
> > >
> > > 2016-06-06 19:34 GMT-04:00 Julian Hyde <jh...@apache.org>:
> > >
> > >> Short answer: yes, if you use lattices.
> > >>
> > >> Remember, Calcite has two mechanisms for matching materialized views:
> > the
> > >> "general purpose" mechanism, and lattices.
> > >>
> > >> Using general purpose mechanism you can declare a materialized view
> > based
> > >> on any query, but there needs to be a unification rule to rewrite your
> > >> query onto that view. (That is, rewrite your query using
> > >> semantics-preserving algebraic transformations such that one branch of
> > the
> > >> rewritten query is identical to the query that defines the
> materialized
> > >> view. See http://www.slideshare.net/julianhyde/calcite-algebraedw2015
> <
> > >> http://www.slideshare.net/julianhyde/calcite-algebraedw2015> slides
> > 15+.)
> > >> There are unification rules for scan, filter, project and some kinds
> of
> > >> aggregation, but not join.
> > >>
> > >> Knowing how difficult it was to write unification rules, and knowing
> how
> > >> common was the use case of star queries (scan - filter - join -
> > aggregate),
> > >> I invented lattices. First you define a lattice (a collection of
> tables
> > >> joined using many-to-one relationships), then you (or the system)
> > defines
> > >> materialized views on top of that lattice. If an incoming query uses
> > tables
> > >> from that lattice, joined using the same join keys, then Calcite will
> > >> consider substituting any materialized views in that lattice.
> > >>
> > >> Matching queries to lattice materialized views is much more efficient
> > than
> > >> matching to general purpose materialized views, two reasons. (a) The
> > joins
> > >> are removed, the materialized view and the query become just projects
> of
> > >> the lattice’s columns, and we avoid the combinatorial explosion that
> > occurs
> > >> when joins are permuted. (b) A lattice can contain dozens of
> > materialized
> > >> views but it is clear, because they are partially ordered, which is
> the
> > >> best for a given query, and they are all considered at the same time.
> > >>
> > >> Julian
> > >>
> > >>> On Jun 5, 2016, at 18:58, amo...@qubole.com wrote:
> > >>>
> > >>> AFAIK it doesn't. Not sure if some work has been done towards it.
> > >>> Sent from my iPhone
> > >>>
> > >>>> On 06-Jun-2016, at 7:08 AM, Michael Mior <mm...@uwaterloo.ca>
> wrote:
> > >>>>
> > >>>> Am I correct in understanding that Calcite doesn't currently rewrite
> > >>>> queries to use materialized views if the query which defines the
> view
> > >>>> includes a join? If this is the case, has there been any work
> towards
> > >>>> making this happen?
> > >>>>
> > >>>> --
> > >>>> Michael Mior
> > >>>> michael.m...@gmail.com
> > >>
> >
> >
>

Re: Materialized view query rewriting with joins

Reply via email to