Any further ideas of what might be going on or suggestions on how to debug
this? I can put together a sample of the code I have so far if anyone is
willing to spend a bit of time looking through it (~300 LOC). Thanks!

--
Michael Mior
mm...@uwaterloo.ca

2016-06-13 14:49 GMT-04:00 Michael Mior <mm...@uwaterloo.ca>:

> Julian,
>
> I understand that MultiJoin cannot be implemented directly. I'm only using
> it to collect information on multiple joins into a single node since it
> makes further steps of the algorithm much easier. Ultimate the rewrite rule
> is attempting to match a LogicalProject on top of a LogicalJoin of two
> TableScans and convert it into a LogicalProject on top of a TableScan of
> the materialized view. The MultiJoin is only used internally within the
> rule and I never actually perform any transformation using the MultiJoin.
> The only transformation I am attempting is to replace a Join with a
> TableScan. This is what is failing.
>
> --
> Michael Mior
> michael.m...@gmail.com
>
> 2016-06-13 14:44 GMT-04:00 Julian Hyde <jh...@apache.org>:
>
>> Using MultiJoin is a good idea. But remember that it was designed with
>> only one, very specific purpose: to gather joins to be optimized by a
>> greedy algorithm.
>>
>> MultiJoin cannot be implemented directly. In fact the only rule that can
>> digest it and convert it into regular Join nodes is LoptOptimizeJoinRule.
>> That might explain why you cannot find a plan.
>>
>> Julian
>>
>>
>> > On Jun 12, 2016, at 11:33 AM, Michael Mior <mm...@uwaterloo.ca> wrote:
>> >
>> > I'm close to getting something working for the case of two tables. I
>> don't
>> > think it should be too challenging to extend to multiple tables. The
>> gist
>> > of the approach is to match a Join over two TableScans and then use
>> > JoinToMultiJoin rule to get a MultiJoin instance. This is useful because
>> > it's much easier to deal with with join conditions across more than two
>> > tables with a MultiJoin than several join instances. From there, the
>> plan
>> > is to find a materialized view with matching conditions and rewrite the
>> > original to join to a LogicalProject over a TableScan over the
>> materialized
>> > view.
>> >
>> > To test this out, I created two instances of a new class called
>> DummyTable
>> > with infinite cost and created a materialized view over these tables. I
>> > hoped this would be helpful for debugging because it forces Calcite to
>> use
>> > the materialized view to execute the query (since actually performing
>> scans
>> > over these tables has infinite cost).
>> >
>> > The matching and substitution part seems to be working great, but
>> Calcite
>> > still fails to find a plan for the query. Any help would be appreciated.
>> > Below are some relevant components of different parts of the plan.
>> (Sorry,
>> > they're quite long). Thanks!
>> >
>> >
>> >
>> >
>> >
>> >
>> > Original query matched by the new rule
>> >
>> > LogicalProject(following=[$5], time=[$1], tweet_id=[$0]): rowcount =
>> > 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308
>> > rows, Infinity cpu, 0.0 io}, id = 641
>> >  LogicalJoin(subset=[rel#636:Subset#22.NONE.[]], condition=[=($5, $3)],
>> > joinType=[inner]): rowcount = 1.7976931348623157E308, cumulat
>> > ive cost = {1.7976931348623157E308 rows, 0.0 cpu, 0.0 io}, id = 634
>> >    DummyScan(subset=[rel#33:Subset#0.ENUMERABLE.[]], table=[[twissandra,
>> > tweet]]): rowcount = 1.7976931348623157E308, cu
>> > mulative cost = {inf}, id = 0
>> >    DummyScan(subset=[rel#98:Subset#8.ENUMERABLE.[]], table=[[twissandra,
>> > followers]]): rowcount = 1.7976931348623157E308
>> > , cumulative cost = {inf}, id = 18
>> >
>> > Original query rewritten to use a MultiJoin
>> >
>> > LogicalProject(following=[$3], time=[$1], tweet_id=[$0]): rowcount =
>> 1.0,
>> > cumulative cost = {inf}, id = 667
>> >  MultiJoin(joinFilter=[=($3, $2)], isFullOuterJoin=[false],
>> > joinTypes=[[INNER, INNER]], outerJoinConditions=[[NULL, NULL]],
>> projField
>> > s=[[{0, 1}, {0}]]): rowcount = 1.0, cumulative cost = {inf}, id = 664
>> >    LogicalProject(tweet_id=[$0], time=[$1], username=[$3]): rowcount =
>> > 1.7976931348623157E308, cumulative cost = {inf}, id = 651
>> >      DummyScan(table=[[twissandra, tweet]]): rowcount =
>> > 1.7976931348623157E308, cumulative cost = {inf}, id = 0
>> >    LogicalProject(following=[$1]): rowcount = 1.7976931348623157E308,
>> > cumulative cost = {inf}, id = 652
>> >      DummyScan(table=[[twissandra, followers]]): rowcount =
>> > 1.7976931348623157E308, cumulative cost = {inf}, id = 18
>> >
>> > Replacement scan over the materialized view (this is what is
>> substituted in
>> > the original query)
>> >
>> > LogicalProject(following=[$1], time=[$2], tweet_id=[$0]): rowcount =
>> 100.0,
>> > cumulative cost = {300.0 rows, 701.0 cpu, 0.0 io}, id = 69
>> > 3
>> >  LogicalProject(username=[$0], time=[CAST($1):TIMESTAMP(0)],
>> > tweet_id=[CAST($2):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-88
>> > 59-1$en_US$primary"]): rowcount = 100.0, cumulative cost = {200.0 rows,
>> > 401.0 cpu, 0.0 io}, id = 44
>> >    CassandraTableScan(table=[[twissandra, timeline]]): rowcount = 100.0,
>> > cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 23
>> >
>> > --
>> > Michael Mior
>> > michael.m...@gmail.com <mailto:michael.m...@gmail.com>
>> >
>> > 2016-06-07 1:39 GMT-04:00 Amogh Margoor <amo...@qubole.com <mailto:
>> amo...@qubole.com>>:
>> >
>> >> Sorry, I missed Lattices as I didn't know about Lattices without
>> >> aggregations. Btw there is this rule where we try to replace star table
>> >> scan by Tiles of Lattices with filters:
>> >>
>> >>
>> https://github.com/qubole/quark/blob/master/optimizer/src/main/java/com/qubole/quark/planner/FilterAggStarRule.java
>> <
>> https://github.com/qubole/quark/blob/master/optimizer/src/main/java/com/qubole/quark/planner/FilterAggStarRule.java
>> >
>> >> I thought it might be useful reference if we are trying to optimise for
>> >> materialised views defined on joins and filters(  e.g., MV defined by
>> >> "select * from A join B on A.id=B.aid where A.x >10 and B.y <
>> '2015-01-10'"
>> >> )
>> >>
>> >> Regards,
>> >> Amogh
>> >>
>> >> On Tue, Jun 7, 2016 at 7:06 AM, Julian Hyde <jh...@apache.org <mailto:
>> jh...@apache.org>> wrote:
>> >>
>> >>> Yes, I think you could make Lattices work without aggregation, as
>> long as
>> >>> you keep the key assumption, which is that dropping joined tables does
>> >> not
>> >>> change the number of rows (so, primary key and foreign key integrity,
>> >>> relationships are all many-to-one from the root (fact) table, and you
>> >> have
>> >>> to include the root table in your query).
>> >>>
>> >>> I might end up using lattices with and without aggregation in the
>> Druid
>> >>> adapter. Because Druid cannot do joins, I plan to map each Druid table
>> >> to a
>> >>> lattice, so that you can satisfy a star query using a Druid table.
>> Druid
>> >> is
>> >>> *mainly* for aggregation, but Druid just introduced a “select”
>> query[1]
>> >>> that returns raw rows, and the Druid adapter can execute
>> non-aggregation
>> >>> queries.
>> >>>
>> >>> Once you’ve translated a query "A join B join C” onto a virtual table
>> ABC
>> >>> I suppose you could then apply other materialized view techniques.
>> >>>
>> >>> If you were able to implement the Goldstein & Larson algorithm it
>> would
>> >> be
>> >>> interesting to compose it with other materialized view substitutions
>> in
>> >>> Calcite. After all, each materialized view substitution just generates
>> >> more
>> >>> alternative queries to optimize, so I suspect that we could compose
>> >>> algorithms.
>> >>>
>> >>> Julian
>> >>>
>> >>> [1] http://druid.io/docs/latest/querying/select-query.html <
>> >>> http://druid.io/docs/latest/querying/select-query.html <
>> http://druid.io/docs/latest/querying/select-query.html>>
>> >>>
>> >>>
>> >>>> On Jun 6, 2016, at 6:06 PM, Michael Mior <mm...@uwaterloo.ca> wrote:
>> >>>>
>> >>>> Thanks for the pointer. Any chance there's a good example somewhere
>> >> using
>> >>>> lattices that would help me get something working? Also, is it
>> possible
>> >>> to
>> >>>> have lattices with no aggregation? Unfortunately either way it
>> doesn't
>> >>>> completely suit my use case since I'm going to have to do rewriting
>> on
>> >>>> joins aside from star queries. That said if I can get basic equijoins
>> >>>> working that would be enough for me.
>> >>>>
>> >>>> I started trying to write a unification rule but decided it would be
>> >>> easier
>> >>>> to do something like MaterializedViewFilterScanRule. The first
>> problem
>> >>> that
>> >>>> I've run into is trying to get a Join instance in a rule where the
>> left
>> >>> and
>> >>>> right inputs are TableScan instances. Even if I have a
>> >> RelOptRuleOperand
>> >>>> that looks like the following I end up with a join whose left and
>> right
>> >>>> inputs are RelSubsets.
>> >>>>
>> >>>> operand(LogicalJoin.class, operand(TableScan.class, any()),
>> >>>> operand(TableScan.class, any()))
>> >>>>
>> >>>> Anyway, I've been looking into implementing the approach from the
>> >>> following
>> >>>> paper. It's very nicely written and easy to follow. It also doesn't
>> >>> require
>> >>>> trying different join permutations. I'm starting with several
>> >> additional
>> >>>> restrictions (only equijoins, no aggregations, etc.)
>> >>>>
>> >>>>
>> >>>
>> >>
>> ftp://ftp.cse.buffalo.edu/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf
>> >>>>
>> >>>> --
>> >>>> Michael Mior
>> >>>> michael.m...@gmail.com
>> >>>>
>> >>>> 2016-06-06 19:34 GMT-04:00 Julian Hyde <jh...@apache.org>:
>> >>>>
>> >>>>> Short answer: yes, if you use lattices.
>> >>>>>
>> >>>>> Remember, Calcite has two mechanisms for matching materialized
>> views:
>> >>> the
>> >>>>> "general purpose" mechanism, and lattices.
>> >>>>>
>> >>>>> Using general purpose mechanism you can declare a materialized view
>> >>> based
>> >>>>> on any query, but there needs to be a unification rule to rewrite
>> your
>> >>>>> query onto that view. (That is, rewrite your query using
>> >>>>> semantics-preserving algebraic transformations such that one branch
>> of
>> >>> the
>> >>>>> rewritten query is identical to the query that defines the
>> >> materialized
>> >>>>> view. See
>> http://www.slideshare.net/julianhyde/calcite-algebraedw2015
>> >> <
>> >>>>> http://www.slideshare.net/julianhyde/calcite-algebraedw2015> slides
>> >>> 15+.)
>> >>>>> There are unification rules for scan, filter, project and some kinds
>> >> of
>> >>>>> aggregation, but not join.
>> >>>>>
>> >>>>> Knowing how difficult it was to write unification rules, and knowing
>> >> how
>> >>>>> common was the use case of star queries (scan - filter - join -
>> >>> aggregate),
>> >>>>> I invented lattices. First you define a lattice (a collection of
>> >> tables
>> >>>>> joined using many-to-one relationships), then you (or the system)
>> >>> defines
>> >>>>> materialized views on top of that lattice. If an incoming query uses
>> >>> tables
>> >>>>> from that lattice, joined using the same join keys, then Calcite
>> will
>> >>>>> consider substituting any materialized views in that lattice.
>> >>>>>
>> >>>>> Matching queries to lattice materialized views is much more
>> efficient
>> >>> than
>> >>>>> matching to general purpose materialized views, two reasons. (a) The
>> >>> joins
>> >>>>> are removed, the materialized view and the query become just
>> projects
>> >> of
>> >>>>> the lattice’s columns, and we avoid the combinatorial explosion that
>> >>> occurs
>> >>>>> when joins are permuted. (b) A lattice can contain dozens of
>> >>> materialized
>> >>>>> views but it is clear, because they are partially ordered, which is
>> >> the
>> >>>>> best for a given query, and they are all considered at the same
>> time.
>> >>>>>
>> >>>>> Julian
>> >>>>>
>> >>>>>> On Jun 5, 2016, at 18:58, amo...@qubole.com wrote:
>> >>>>>>
>> >>>>>> AFAIK it doesn't. Not sure if some work has been done towards it.
>> >>>>>> Sent from my iPhone
>> >>>>>>
>> >>>>>>> On 06-Jun-2016, at 7:08 AM, Michael Mior <mm...@uwaterloo.ca>
>> >> wrote:
>> >>>>>>>
>> >>>>>>> Am I correct in understanding that Calcite doesn't currently
>> rewrite
>> >>>>>>> queries to use materialized views if the query which defines the
>> >> view
>> >>>>>>> includes a join? If this is the case, has there been any work
>> >> towards
>> >>>>>>> making this happen?
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Michael Mior
>> >>>>>>> michael.m...@gmail.com
>>
>>
>

Reply via email to