Julian, I understand that MultiJoin cannot be implemented directly. I'm only using it to collect information on multiple joins into a single node since it makes further steps of the algorithm much easier. Ultimate the rewrite rule is attempting to match a LogicalProject on top of a LogicalJoin of two TableScans and convert it into a LogicalProject on top of a TableScan of the materialized view. The MultiJoin is only used internally within the rule and I never actually perform any transformation using the MultiJoin. The only transformation I am attempting is to replace a Join with a TableScan. This is what is failing.
-- Michael Mior michael.m...@gmail.com 2016-06-13 14:44 GMT-04:00 Julian Hyde <jh...@apache.org>: > Using MultiJoin is a good idea. But remember that it was designed with > only one, very specific purpose: to gather joins to be optimized by a > greedy algorithm. > > MultiJoin cannot be implemented directly. In fact the only rule that can > digest it and convert it into regular Join nodes is LoptOptimizeJoinRule. > That might explain why you cannot find a plan. > > Julian > > > > On Jun 12, 2016, at 11:33 AM, Michael Mior <mm...@uwaterloo.ca> wrote: > > > > I'm close to getting something working for the case of two tables. I > don't > > think it should be too challenging to extend to multiple tables. The gist > > of the approach is to match a Join over two TableScans and then use > > JoinToMultiJoin rule to get a MultiJoin instance. This is useful because > > it's much easier to deal with with join conditions across more than two > > tables with a MultiJoin than several join instances. From there, the plan > > is to find a materialized view with matching conditions and rewrite the > > original to join to a LogicalProject over a TableScan over the > materialized > > view. > > > > To test this out, I created two instances of a new class called > DummyTable > > with infinite cost and created a materialized view over these tables. I > > hoped this would be helpful for debugging because it forces Calcite to > use > > the materialized view to execute the query (since actually performing > scans > > over these tables has infinite cost). > > > > The matching and substitution part seems to be working great, but Calcite > > still fails to find a plan for the query. Any help would be appreciated. > > Below are some relevant components of different parts of the plan. > (Sorry, > > they're quite long). Thanks! > > > > > > > > > > > > > > Original query matched by the new rule > > > > LogicalProject(following=[$5], time=[$1], tweet_id=[$0]): rowcount = > > 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 > > rows, Infinity cpu, 0.0 io}, id = 641 > > LogicalJoin(subset=[rel#636:Subset#22.NONE.[]], condition=[=($5, $3)], > > joinType=[inner]): rowcount = 1.7976931348623157E308, cumulat > > ive cost = {1.7976931348623157E308 rows, 0.0 cpu, 0.0 io}, id = 634 > > DummyScan(subset=[rel#33:Subset#0.ENUMERABLE.[]], table=[[twissandra, > > tweet]]): rowcount = 1.7976931348623157E308, cu > > mulative cost = {inf}, id = 0 > > DummyScan(subset=[rel#98:Subset#8.ENUMERABLE.[]], table=[[twissandra, > > followers]]): rowcount = 1.7976931348623157E308 > > , cumulative cost = {inf}, id = 18 > > > > Original query rewritten to use a MultiJoin > > > > LogicalProject(following=[$3], time=[$1], tweet_id=[$0]): rowcount = 1.0, > > cumulative cost = {inf}, id = 667 > > MultiJoin(joinFilter=[=($3, $2)], isFullOuterJoin=[false], > > joinTypes=[[INNER, INNER]], outerJoinConditions=[[NULL, NULL]], projField > > s=[[{0, 1}, {0}]]): rowcount = 1.0, cumulative cost = {inf}, id = 664 > > LogicalProject(tweet_id=[$0], time=[$1], username=[$3]): rowcount = > > 1.7976931348623157E308, cumulative cost = {inf}, id = 651 > > DummyScan(table=[[twissandra, tweet]]): rowcount = > > 1.7976931348623157E308, cumulative cost = {inf}, id = 0 > > LogicalProject(following=[$1]): rowcount = 1.7976931348623157E308, > > cumulative cost = {inf}, id = 652 > > DummyScan(table=[[twissandra, followers]]): rowcount = > > 1.7976931348623157E308, cumulative cost = {inf}, id = 18 > > > > Replacement scan over the materialized view (this is what is substituted > in > > the original query) > > > > LogicalProject(following=[$1], time=[$2], tweet_id=[$0]): rowcount = > 100.0, > > cumulative cost = {300.0 rows, 701.0 cpu, 0.0 io}, id = 69 > > 3 > > LogicalProject(username=[$0], time=[CAST($1):TIMESTAMP(0)], > > tweet_id=[CAST($2):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-88 > > 59-1$en_US$primary"]): rowcount = 100.0, cumulative cost = {200.0 rows, > > 401.0 cpu, 0.0 io}, id = 44 > > CassandraTableScan(table=[[twissandra, timeline]]): rowcount = 100.0, > > cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 23 > > > > -- > > Michael Mior > > michael.m...@gmail.com <mailto:michael.m...@gmail.com> > > > > 2016-06-07 1:39 GMT-04:00 Amogh Margoor <amo...@qubole.com <mailto: > amo...@qubole.com>>: > > > >> Sorry, I missed Lattices as I didn't know about Lattices without > >> aggregations. Btw there is this rule where we try to replace star table > >> scan by Tiles of Lattices with filters: > >> > >> > https://github.com/qubole/quark/blob/master/optimizer/src/main/java/com/qubole/quark/planner/FilterAggStarRule.java > < > https://github.com/qubole/quark/blob/master/optimizer/src/main/java/com/qubole/quark/planner/FilterAggStarRule.java > > > >> I thought it might be useful reference if we are trying to optimise for > >> materialised views defined on joins and filters( e.g., MV defined by > >> "select * from A join B on A.id=B.aid where A.x >10 and B.y < > '2015-01-10'" > >> ) > >> > >> Regards, > >> Amogh > >> > >> On Tue, Jun 7, 2016 at 7:06 AM, Julian Hyde <jh...@apache.org <mailto: > jh...@apache.org>> wrote: > >> > >>> Yes, I think you could make Lattices work without aggregation, as long > as > >>> you keep the key assumption, which is that dropping joined tables does > >> not > >>> change the number of rows (so, primary key and foreign key integrity, > >>> relationships are all many-to-one from the root (fact) table, and you > >> have > >>> to include the root table in your query). > >>> > >>> I might end up using lattices with and without aggregation in the Druid > >>> adapter. Because Druid cannot do joins, I plan to map each Druid table > >> to a > >>> lattice, so that you can satisfy a star query using a Druid table. > Druid > >> is > >>> *mainly* for aggregation, but Druid just introduced a “select” query[1] > >>> that returns raw rows, and the Druid adapter can execute > non-aggregation > >>> queries. > >>> > >>> Once you’ve translated a query "A join B join C” onto a virtual table > ABC > >>> I suppose you could then apply other materialized view techniques. > >>> > >>> If you were able to implement the Goldstein & Larson algorithm it would > >> be > >>> interesting to compose it with other materialized view substitutions in > >>> Calcite. After all, each materialized view substitution just generates > >> more > >>> alternative queries to optimize, so I suspect that we could compose > >>> algorithms. > >>> > >>> Julian > >>> > >>> [1] http://druid.io/docs/latest/querying/select-query.html < > >>> http://druid.io/docs/latest/querying/select-query.html < > http://druid.io/docs/latest/querying/select-query.html>> > >>> > >>> > >>>> On Jun 6, 2016, at 6:06 PM, Michael Mior <mm...@uwaterloo.ca> wrote: > >>>> > >>>> Thanks for the pointer. Any chance there's a good example somewhere > >> using > >>>> lattices that would help me get something working? Also, is it > possible > >>> to > >>>> have lattices with no aggregation? Unfortunately either way it doesn't > >>>> completely suit my use case since I'm going to have to do rewriting on > >>>> joins aside from star queries. That said if I can get basic equijoins > >>>> working that would be enough for me. > >>>> > >>>> I started trying to write a unification rule but decided it would be > >>> easier > >>>> to do something like MaterializedViewFilterScanRule. The first problem > >>> that > >>>> I've run into is trying to get a Join instance in a rule where the > left > >>> and > >>>> right inputs are TableScan instances. Even if I have a > >> RelOptRuleOperand > >>>> that looks like the following I end up with a join whose left and > right > >>>> inputs are RelSubsets. > >>>> > >>>> operand(LogicalJoin.class, operand(TableScan.class, any()), > >>>> operand(TableScan.class, any())) > >>>> > >>>> Anyway, I've been looking into implementing the approach from the > >>> following > >>>> paper. It's very nicely written and easy to follow. It also doesn't > >>> require > >>>> trying different join permutations. I'm starting with several > >> additional > >>>> restrictions (only equijoins, no aggregations, etc.) > >>>> > >>>> > >>> > >> > ftp://ftp.cse.buffalo.edu/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf > >>>> > >>>> -- > >>>> Michael Mior > >>>> michael.m...@gmail.com > >>>> > >>>> 2016-06-06 19:34 GMT-04:00 Julian Hyde <jh...@apache.org>: > >>>> > >>>>> Short answer: yes, if you use lattices. > >>>>> > >>>>> Remember, Calcite has two mechanisms for matching materialized views: > >>> the > >>>>> "general purpose" mechanism, and lattices. > >>>>> > >>>>> Using general purpose mechanism you can declare a materialized view > >>> based > >>>>> on any query, but there needs to be a unification rule to rewrite > your > >>>>> query onto that view. (That is, rewrite your query using > >>>>> semantics-preserving algebraic transformations such that one branch > of > >>> the > >>>>> rewritten query is identical to the query that defines the > >> materialized > >>>>> view. See > http://www.slideshare.net/julianhyde/calcite-algebraedw2015 > >> < > >>>>> http://www.slideshare.net/julianhyde/calcite-algebraedw2015> slides > >>> 15+.) > >>>>> There are unification rules for scan, filter, project and some kinds > >> of > >>>>> aggregation, but not join. > >>>>> > >>>>> Knowing how difficult it was to write unification rules, and knowing > >> how > >>>>> common was the use case of star queries (scan - filter - join - > >>> aggregate), > >>>>> I invented lattices. First you define a lattice (a collection of > >> tables > >>>>> joined using many-to-one relationships), then you (or the system) > >>> defines > >>>>> materialized views on top of that lattice. If an incoming query uses > >>> tables > >>>>> from that lattice, joined using the same join keys, then Calcite will > >>>>> consider substituting any materialized views in that lattice. > >>>>> > >>>>> Matching queries to lattice materialized views is much more efficient > >>> than > >>>>> matching to general purpose materialized views, two reasons. (a) The > >>> joins > >>>>> are removed, the materialized view and the query become just projects > >> of > >>>>> the lattice’s columns, and we avoid the combinatorial explosion that > >>> occurs > >>>>> when joins are permuted. (b) A lattice can contain dozens of > >>> materialized > >>>>> views but it is clear, because they are partially ordered, which is > >> the > >>>>> best for a given query, and they are all considered at the same time. > >>>>> > >>>>> Julian > >>>>> > >>>>>> On Jun 5, 2016, at 18:58, amo...@qubole.com wrote: > >>>>>> > >>>>>> AFAIK it doesn't. Not sure if some work has been done towards it. > >>>>>> Sent from my iPhone > >>>>>> > >>>>>>> On 06-Jun-2016, at 7:08 AM, Michael Mior <mm...@uwaterloo.ca> > >> wrote: > >>>>>>> > >>>>>>> Am I correct in understanding that Calcite doesn't currently > rewrite > >>>>>>> queries to use materialized views if the query which defines the > >> view > >>>>>>> includes a join? If this is the case, has there been any work > >> towards > >>>>>>> making this happen? > >>>>>>> > >>>>>>> -- > >>>>>>> Michael Mior > >>>>>>> michael.m...@gmail.com > >