[jira] [Commented] (CALCITE-1965) Support outer joins for materialized views

Jesus Camacho Rodriguez (JIRA) Tue, 29 Aug 2017 10:02:29 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145678#comment-16145678
 ]


Jesus Camacho Rodriguez commented on CALCITE-1965:
--------------------------------------------------

[~christian.beikov], this is all very exciting, thanks for putting together the 
examples.

One aspect that needs more thought is the structure of the OEQ, since just 
storing the predicates doing a DFS traversal of the join tree will not work for 
more complex cases. In particular, I think it would be necessary to add 
structural identifiers to OEQ, e.g. pre-post-depth identifiers are an example, 
to actually know the exact position of the predicate when we have multi-level 
join trees.
For instance, consider the following examples, which have the same OEQ but are 
not equivalent:

{noformat}
EQ1={c.b, b.b}
OEQ1={e.d, d.d}{c.c, d.c}{a.a, b.a}

           ljoin(e.d = d.d)
          /               \
       table(e)            ljoin(c.c = d.c)
                           /             \
              ijoin(c.b = b.b)            table(d)
               /            \
            table(c)   ljoin(a.a = b.a)
                        /           \
                     table(a)    table(b) 


EQ2={c.b, b.b}
OEQ2={e.d, d.d}{c.c, d.c}{a.a, b.a}
           ljoin(e.d = d.d)
          /               \
       table(e)            ljoin(c.c = d.c)
                           /             \
              ijoin(a.a = b.b)            table(d)
               /            \
            table(a)   ljoin(c.b = b.a)
                        /           \
                     table(c)    table(b) 
{noformat}

However, adding the identifiers:
{noformat}
EQ'1= 4-6-3{c.b, b.b}
OEQ'1= 1-9-1{e.d, d.d} 3-8-2{c.c, d.c} 6-5-4{a.a, b.a}

EQ'2= 6-5-4{c.b, b.b}
OEQ'2= 1-9-1{e.d, d.d} 3-8-2{c.c, d.c} 4-6-3{a.a, b.a}
{noformat}
Other schemes for the identifiers could be considered too. I have not thought 
how this would play with your idea for the matching / rewriting as it will add 
certain complexity, what do you think?

Another question I have is how to deal with multiple repetitions of same 
table... With the identifiers this seems easier, but without them, I am not 
sure whether we would be able to rewrite those cases.

Wrt the normalization step (rewriting to left outer joins), it can be done via 
transformation rule {{JoinCommuteRule.java}}, which could be triggered before 
attempting the rewriting.

Another aspect to control would be where to trigger the rule, trying to 
minimize the number of executions (but this can be done later on).

About subqueries, for your particular example above, when the subqueries are 
parsed they will be rewritten into a plan with a left outer join over a left 
outer join, thus it would not be a problem. I was talking about these kind of 
plans that can result from the subquery or the ON clause having other Filter 
expressions:

*Query1 With Filter*
{noformat}
select *
from a
left join (
        select b.a as x
        from b 
        left join c on b.b = c.b
        where b.s = 'foo'
) on a.a = x

   ijoin(a.a = b.a)
      /             \
     /           filter(b.s = 'foo)
    /                 \
 table(a)   ljoin(b.b = c.b)
                 /        \
          table(b)    table(c) :
{noformat}

*Query2 With Filter*
{noformat}
select *
from a
left join (
        select b.a as x
        from (select b.a, b.b from b where b.s = 'foo') b 
        left join c on b.b = c.b
) on a.a = x

   ijoin(a.a = b.a)
    /            \
 table(a)   ljoin(b.b = c.b)
                /       \
 filter(b.s = 'foo')     \
             /            \
          table(b)       table(c) :
{noformat}

*Query3 With Filter*
{noformat}
select *
from a
left join (
        select b.a as x
        from b 
        left join (select c.b from c where c.s = 'foo') c on b.b = c.b
) on a.a = x

   ijoin(a.a = b.a)
    /            \
 table(a)   ljoin(b.b = c.b)
                /      \
               /    filter(c.s = 'foo')
             /            \
          table(b)      table(c) :
{noformat}

In those cases, we could bail out from the rewriting right now, but I just 
wanted to know whether you had considered how to tackle them, as having some 
filtering in the query is a pattern that will be quite common, at least on top 
of the table scan operators as in the last two examples.

Another more complex example also for a follow-up:
*Query4 With Filter*
{noformat}
select *
from a
left join (
        select b.a as x
        from b 
        left join c on b.b = c.b and b.s = 'foo'
) on a.a = x

   ijoin(a.a = b.a)
    /            \
 table(a)   ljoin(b.b = c.b and b.s = 'foo')
                /        \
          table(b)    table(c) :
{noformat}

I was not aware that the same authors had tackled the rewriting problem for 
outer joins too, my bad, thanks!
The approach that they discuss in that article seems to cover the case when you 
have a Filter operator on top of the TS too. Also it seems to play well with 
the approach in the previous paper / what we already implemented in Calcite. I 
will check it further too and let you know.

> Support outer joins for materialized views
> ------------------------------------------
>
>                 Key: CALCITE-1965
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1965
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Christian Beikov
>            Assignee: Julian Hyde
>
> Currently, only inner joins are supported for materialized view 
> substitutions. The support for outer joins involves creating new pulled up 
> predicates in case of outer joins that represent semantics of the join. For a 
> join predicate like "a.id = b.id" the inner join just pulls up that 
> predicate. When having a left join like e.g. {{select * from a left join b on 
> a.id = b.id}}, the actual pulled up predicate would be {{OR(=(a.id, 
> b.id),ISNULL(b.id))}}. For a right join it would be  {{OR(=(a.id, 
> b.id),ISNULL(a.id))}} and for a full outer join it would be  {{OR(=(a.id, 
> b.id),ISNULL(a.id),ISNULL(b.id))}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (CALCITE-1965) Support outer joins for materialized views

Reply via email to