Paul Rogers created DRILL-7556:
----------------------------------

             Summary: Generalize the "Base" storage plugin filter push down 
mechanism
                 Key: DRILL-7556
                 URL: https://issues.apache.org/jira/browse/DRILL-7556
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.18.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
             Fix For: 1.18.0


DRILL-7458 adds a Base framework for storage plugins which includes a 
simplified representation of filters that Drill can push down into a plugin. 
It assumes that plugins can generally only handle filters of the form:

{code}
column relop constant
{code}

For example, {{`foo` < 10}} or {{`bar` = "Fred"}}. (The code "flips" 
expressions of the form {{constant relop column}}.)
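
As a minimal sketch of the "flip" (the class and method names here are hypothetical and are not taken from the Base framework), note that flipping mirrors the operator rather than negating it: {{10 < foo}} becomes {{foo > 10}}.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Drill: rewrites "constant relop column"
// into the canonical "column relop constant" form.
final class RelOpFlipper {
  // Each operator maps to its mirror image (operands swapped), not its negation.
  private static final Map<String, String> MIRROR = new HashMap<>();
  static {
    MIRROR.put("<", ">");
    MIRROR.put("<=", ">=");
    MIRROR.put(">", "<");
    MIRROR.put(">=", "<=");
    MIRROR.put("=", "=");
    MIRROR.put("<>", "<>");
  }

  // Returns the flipped expression as text, for illustration only.
  static String flip(String constant, String relop, String column) {
    String mirrored = MIRROR.get(relop);
    if (mirrored == null) {
      throw new IllegalArgumentException("Unsupported operator: " + relop);
    }
    return column + " " + mirrored + " " + constant;
  }
}
```

For instance, {{RelOpFlipper.flip("10", "<", "foo")}} yields {{foo > 10}}.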

[~volodymyr] notes that this is too narrow and suggests two additional cases:

{code}
column-expr relop constant
fn(column) = constant
{code}

Examples:

{code:sql}
foo + 10 = 20
substr(bar, 2, 6) = 'Fred'
{code}

The first case should be handled by a general expression rewriter that 
simplifies constant expressions:

{code:sql}
foo + 10 = 20 --> foo = 10
{code}

Filter push-down then needs to handle only the simplified expression, rather 
than every push-down mechanism having to do the simplification itself.
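
The rewrite could look roughly like the following sketch. It is an illustration under stated assumptions, not Drill's rewriter: it handles only integer {{+}} and {{-}} with the constant on the right of the column, and the {{ConstantFolder}} name is hypothetical. A real rewriter would also need to respect SQL types, NULLs and overflow rules.

```java
// Hypothetical helper, not part of Drill: folds the constant in
// "column op operand = target" over to the right-hand side.
final class ConstantFolder {
  static String fold(String column, char op, long operand, long target) {
    long folded;
    switch (op) {
      case '+':
        // column + operand = target  -->  column = target - operand
        folded = Math.subtractExact(target, operand);
        break;
      case '-':
        // column - operand = target  -->  column = target + operand
        folded = Math.addExact(target, operand);
        break;
      default:
        throw new IllegalArgumentException("Unsupported operator: " + op);
    }
    return column + " = " + folded;
  }
}
```

Here {{ConstantFolder.fold("foo", '+', 10, 20)}} produces {{foo = 10}}. The exact-arithmetic calls throw {{ArithmeticException}} on overflow instead of producing a wrong filter silently.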

For this ticket, we wish to handle the second case: any expression that 
contains a single column associated with the target table. Provide a new 
push-down node to handle the non-relop case so that simple plugins can just 
ignore such expressions, but more complex plugins (such as Parquet) can 
optionally handle them.
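
One possible shape for this, sketched with hypothetical names (this is not the existing Base framework API), is a listener whose non-relop callback declines by default. Simple plugins then implement only the relop callback, and more capable plugins override the second one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface, for illustration only.
interface PushDownListener {
  // Called for "column relop constant"; return true if the plugin takes the filter.
  boolean acceptRelOp(String column, String relop, String constant);

  // Called for any other expression over a single column, such as
  // substr(bar, 2, 6) = 'Fred'. Declining by default leaves the filter in Drill.
  default boolean acceptSingleColumnExpr(String column, String expr) {
    return false;
  }
}

// A simple plugin: handles relops and inherits the default "ignore" behavior.
final class SimplePlugin implements PushDownListener {
  final List<String> pushed = new ArrayList<>();

  @Override
  public boolean acceptRelOp(String column, String relop, String constant) {
    pushed.add(column + " " + relop + " " + constant);
    return true;
  }
}

// A more capable plugin: also accepts single-column expressions.
final class AdvancedPlugin implements PushDownListener {
  final List<String> pushed = new ArrayList<>();

  @Override
  public boolean acceptRelOp(String column, String relop, String constant) {
    pushed.add(column + " " + relop + " " + constant);
    return true;
  }

  @Override
  public boolean acceptSingleColumnExpr(String column, String expr) {
    pushed.add(expr);
    return true;
  }
}
```

With this design, existing simple plugins would need no changes when the new node type is introduced.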

A second improvement is to handle the more complex case: two or more columns, 
all of which come from the same target table. For example:

{code:sql}
foo + bar = 20
{code}

Where both {{foo}} and {{bar}} are from the same table. It would be a very 
sophisticated plugin indeed (maybe the JDBC storage plugin) which can handle 
this case, but it should be available.
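
The qualifying test for this case is that every column in the expression resolves to the one target table. A minimal sketch, assuming columns are already resolved to {{table.column}} strings (the {{SameTableCheck}} name is hypothetical):

```java
import java.util.List;

// Hypothetical helper, not part of Drill: checks whether every column
// referenced by an expression belongs to a single target table.
final class SameTableCheck {
  static boolean allFromTable(List<String> qualifiedColumns, String table) {
    if (qualifiedColumns.isEmpty()) {
      return false; // no columns at all: nothing to push to this table
    }
    String prefix = table + ".";
    for (String column : qualifiedColumns) {
      if (!column.startsWith(prefix)) {
        return false; // at least one column comes from another table
      }
    }
    return true;
  }
}
```

An expression such as {{t1.foo + t1.bar = 20}} would pass this check for {{t1}}, while {{t1.foo + t2.bar = 20}} would not.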

As part of this work, we must handle join-equivalent columns:

{code:sql}
SELECT ... FROM t1, t2
  WHERE t1.a = t2.b
  AND t1.a = 20
{code}

If the plugin for table {{t2}} can handle filter push-down, then the expression 
{{t1.a = 20}} is join-equivalent to {{t2.b = 20}}.
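
The derivation can be sketched as follows. This is an illustration with hypothetical names, not a description of what the Drill planner does today, and it assumes an inner equi-join; outer joins need more care because the filter does not transfer to the other side in general.

```java
import java.util.Optional;

// Hypothetical helper, not part of Drill: given an inner equi-join condition
// joinLeft = joinRight and a filter "filterColumn relop constant", derives the
// equivalent filter on the other side of the join, if there is one.
final class JoinEquivalence {
  static Optional<String> derive(String joinLeft, String joinRight,
      String filterColumn, String relop, String constant) {
    if (filterColumn.equals(joinLeft)) {
      return Optional.of(joinRight + " " + relop + " " + constant);
    }
    if (filterColumn.equals(joinRight)) {
      return Optional.of(joinLeft + " " + relop + " " + constant);
    }
    // The filter column is not part of this join condition.
    return Optional.empty();
  }
}
```

For the query above, {{JoinEquivalence.derive("t1.a", "t2.b", "t1.a", "=", "20")}} returns {{t2.b = 20}}, which could then be offered to the plugin for {{t2}}.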

It is not clear if the Drill logical plan already handles join equivalence. If 
not, it should be added. If so, the filter push-down mechanism should add 
documentation that describes how the mechanism works.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
