[ https://issues.apache.org/jira/browse/CALCITE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145678#comment-16145678 ]
Jesus Camacho Rodriguez commented on CALCITE-1965: -------------------------------------------------- [~christian.beikov], this is all very exciting, thanks for putting together the examples. One aspect that needs more thought is the structure of the OEQ, since just storing the predicates doing a DFS traversal of the join tree will not work for more complex cases. In particular, I think it would be necessary to add structural identifiers to OEQ, e.g. pre-post-depth identifiers are an example, to actually know the exact position of the predicate when we have multi-level join trees. For instance, consider the following examples, which have the same OEQ but are not equivalent: {noformat} EQ1={c.b, b.b} OEQ1={e.d, d.d}{c.c, d.c}{a.a, b.a} ljoin(e.d = d.d) / \ table(e) ljoin(c.c = d.c) / \ ijoin(c.b = b.b) table(d) / \ table(c) ljoin(a.a = b.a) / \ table(a) table(b) EQ2={c.b, b.b} OEQ2={e.d, d.d}{c.c, d.c}{a.a, b.a} ljoin(e.d = d.d) / \ table(e) ljoin(c.c = d.c) / \ ijoin(a.a = b.b) table(d) / \ table(a) ljoin(c.b = b.a) / \ table(c) table(b) {noformat} However, adding the identifiers: {noformat} EQ'1= 4-6-3{c.b, b.b} OEQ'1= 1-9-1{e.d, d.d} 3-8-2{c.c, d.c} 6-5-4{a.a, b.a} EQ'2= 6-5-4{c.b, b.b} OEQ'2= 1-9-1{e.d, d.d} 3-8-2{c.c, d.c} 4-6-3{a.a, b.a} {noformat} Other schemes for the identifiers could be considered too. I have not thought how this would play with your idea for the matching / rewriting as it will add certain complexity, what do you think? Another question I have is how to deal with multiple repetitions of same table... With the identifiers this seems easier, but without them, I am not sure whether we would be able to rewrite those cases. Wrt the normalization step (rewriting to left outer joins), it can be done via transformation rule {{JoinCommuteRule.java}}, which could be triggered before attempting the rewriting. Another aspect to control would be where to trigger the rule, trying to minimize the number of executions (but this can be done later on). About subqueries, for your particular example above, when the subqueries are parsed they will be rewritten into a plan with a left outer join over a left outer join, thus it would not be a problem. I was talking about these kind of plans that can result from the subquery or the ON clause having other Filter expressions: *Query1 With Filter* {noformat} select * from a left join ( select b.a as x from b left join c on b.b = c.b where b.s = 'foo' ) on a.a = x ijoin(a.a = b.a) / \ / filter(b.s = 'foo) / \ table(a) ljoin(b.b = c.b) / \ table(b) table(c) : {noformat} *Query2 With Filter* {noformat} select * from a left join ( select b.a as x from (select b.a, b.b from b where b.s = 'foo') b left join c on b.b = c.b ) on a.a = x ijoin(a.a = b.a) / \ table(a) ljoin(b.b = c.b) / \ filter(b.s = 'foo') \ / \ table(b) table(c) : {noformat} *Query3 With Filter* {noformat} select * from a left join ( select b.a as x from b left join (select c.b from c where c.s = 'foo') c on b.b = c.b ) on a.a = x ijoin(a.a = b.a) / \ table(a) ljoin(b.b = c.b) / \ / filter(c.s = 'foo') / \ table(b) table(c) : {noformat} In those cases, we could bail out from the rewriting right now, but I just wanted to know whether you had considered how to tackle them, as having some filtering in the query is a pattern that will be quite common, at least on top of the table scan operators as in the last two examples. Another more complex example also for a follow-up: *Query4 With Filter* {noformat} select * from a left join ( select b.a as x from b left join c on b.b = c.b and b.s = 'foo' ) on a.a = x ijoin(a.a = b.a) / \ table(a) ljoin(b.b = c.b and b.s = 'foo') / \ table(b) table(c) : {noformat} I was not aware that the same authors had tackled the rewriting problem for outer joins too, my bad, thanks! The approach that they discuss in that article seems to cover the case when you have a Filter operator on top of the TS too. Also it seems to play well with the approach in the previous paper / what we already implemented in Calcite. I will check it further too and let you know. > Support outer joins for materialized views > ------------------------------------------ > > Key: CALCITE-1965 > URL: https://issues.apache.org/jira/browse/CALCITE-1965 > Project: Calcite > Issue Type: Improvement > Components: core > Reporter: Christian Beikov > Assignee: Julian Hyde > > Currently, only inner joins are supported for materialized view > substitutions. The support for outer joins involves creating new pulled up > predicates in case of outer joins that represent semantics of the join. For a > join predicate like "a.id = b.id" the inner join just pulls up that > predicate. When having a left join like e.g. {{select * from a left join b on > a.id = b.id}}, the actual pulled up predicate would be {{OR(=(a.id, > b.id),ISNULL(b.id))}}. For a right join it would be {{OR(=(a.id, > b.id),ISNULL(a.id))}} and for a full outer join it would be {{OR(=(a.id, > b.id),ISNULL(a.id),ISNULL(b.id))}} -- This message was sent by Atlassian JIRA (v6.4.14#64029)