Paul Rogers created DRILL-7558:
----------------------------------

             Summary: Generalize filter push-down planner phase
                 Key: DRILL-7558
                 URL: https://issues.apache.org/jira/browse/DRILL-7558
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.18.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
             Fix For: 1.18.0


DRILL-7458 provides a base framework for storage plugins, including a 
simplified filter push-down mechanism. [~volodymyr] notes that it may be *too* 
simple:

{quote}
What about the case when this rule was applied for one filter, but planner at 
some point pushed another filter above the scan, for example, if we have such 
case:

{code}
Filter(a=2)
  Join(t1.b=t2.b, type=inner)
    Filter(b=3)
      Scan(t1)
    Scan(t2)
{code}

Filter b=3 will be pushed into scan, planner will push filter above join:

{code}
Join(t1.b=t2.b, type=inner)
  Filter(a=2)
    Scan(t1, b=3)
  Scan(t2)
{code}

In this case, check whether filter was pushed is not enough.
{quote}
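
To make the concern concrete, here is a minimal sketch, not Drill code, of a 
scan that records *which* predicates it has absorbed rather than a single "was 
pushed" flag. The class {{PushDownSketch}}, the nested {{Scan}} type, and all 
method names are hypothetical stand-ins chosen for illustration; predicates are 
plain strings rather than Calcite expressions.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PushDownSketch {

  /** Hypothetical stand-in for a scan node; this is not Drill's GroupScan. */
  static final class Scan {
    final String table;
    final Set<String> pushedPredicates;

    Scan(String table, Set<String> pushedPredicates) {
      this.table = table;
      this.pushedPredicates = pushedPredicates;
    }

    /** Returns a copy of this scan with one more predicate absorbed. */
    Scan withPredicate(String predicate) {
      Set<String> merged = new LinkedHashSet<>(pushedPredicates);
      merged.add(predicate);
      return new Scan(table, merged);
    }

    /** Flag-style check: true as soon as any predicate has been pushed. */
    boolean hasPushedAnything() {
      return !pushedPredicates.isEmpty();
    }

    /** Per-predicate check: true only if this exact predicate was pushed. */
    boolean hasPushed(String predicate) {
      return pushedPredicates.contains(predicate);
    }
  }

  /** Pushes every predicate that the scan has not already absorbed. */
  static Scan pushFilters(Scan scan, List<String> filterPredicates) {
    Scan result = scan;
    for (String predicate : filterPredicates) {
      if (!result.hasPushed(predicate)) {
        result = result.withPredicate(predicate);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Scan t1 = new Scan("t1", new LinkedHashSet<>());

    // First firing of the rule: Filter(b=3) sits directly above Scan(t1).
    Scan afterFirst = pushFilters(t1, List.of("b=3"));

    // A flag-style guard would now report "done" and skip Filter(a=2).
    System.out.println(afterFirst.hasPushedAnything());

    // The per-predicate guard still lets the second filter through.
    Scan afterSecond = pushFilters(afterFirst, List.of("a=2"));
    System.out.println(afterSecond.pushedPredicates);
  }
}
```

In this sketch the flag-style check is already true after the first firing, 
which is the situation the quote describes: a guard of that kind would leave 
{{a=2}} unpushed once the planner moves it above the scan.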

Drill divides planning into a number of *phases*, each defined by a set of 
*rules*. Most storage plugins perform filter push-down during the physical 
planning stage. However, by this point, Drill has already decided on the degree 
of parallelism: it is too late to use filter push-down to set the degree of 
parallelism. Yet, if using something like a REST API, we want to use filters to 
help us shard the query (that is, to set the degree of parallelism).
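
As an illustration of sharding by filter, the sketch below turns the values of 
a pushed-down {{IN}} list into one REST request per value, so that the number 
of requests becomes the degree of parallelism. Everything here is an 
assumption made for the example: the class {{ShardSketch}}, the method 
{{shardRequests}}, the URL, and the column name do not come from Drill or from 
any real service.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardSketch {

  /**
   * Builds one request URL per filter value. With no pushed filter, the
   * query cannot be split, so a single unfiltered request is returned.
   * Values are appended as-is; a real plugin would URL-encode them.
   */
  static List<String> shardRequests(String baseUrl, String column,
                                    List<String> values) {
    List<String> requests = new ArrayList<>();
    if (values.isEmpty()) {
      requests.add(baseUrl);
      return requests;
    }
    for (String value : values) {
      requests.add(baseUrl + "?" + column + "=" + value);
    }
    return requests;
  }

  public static void main(String[] args) {
    // Hypothetical endpoint and column, used only for illustration.
    String baseUrl = "http://example.com/api/orders";

    // Filter known before parallelization: region IN ('east', 'west', 'north').
    List<String> sharded =
        shardRequests(baseUrl, "region", List.of("east", "west", "north"));
    System.out.println(sharded.size());

    // Filter not yet pushed when parallelism is chosen: one shard only.
    List<String> unsharded = shardRequests(baseUrl, "region", List.of());
    System.out.println(unsharded.size());
  }
}
```

The point of the sketch is the ordering: the three-way split is only possible 
if the filter values are available before the degree of parallelism is fixed.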
 
DRILL-7458 performs filter push-down at *logical* planning time to work around 
the above limitation. (In Drill, there are three different phases that could be 
considered the logical phase, depending on which planning options are set to 
control Calcite.)

[~volodymyr] points out that the logical planning phase may be the wrong place 
for push-down, because the planner will still perform rewrites of the type he 
cited.

Thus, we need to research where to insert filter push-down. It must come:

* After rewrites of the kind described above.
* After join equivalence computations. (See DRILL-7556.)
* Before the decision is made about the number of minor fragments.
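
The three constraints above can be sketched as a search over an ordered list 
of planner phases. This is only a model for reasoning about the problem: the 
class {{PhaseOrderSketch}}, the {{Kind}} enum, and the example phase orderings 
are hypothetical and do not describe Drill's actual planner phases.

```java
import java.util.ArrayList;
import java.util.List;

public class PhaseOrderSketch {

  /** Hypothetical classification of what a planner phase does. */
  enum Kind { REWRITE, JOIN_EQUIVALENCE, PARALLELIZE, OTHER }

  /**
   * Returns every slot where a push-down phase could be inserted.
   * Slot i means "immediately before phases.get(i)"; slot phases.size()
   * means "after the last phase". A slot is valid when it lies after the
   * last REWRITE or JOIN_EQUIVALENCE phase and no later than the first
   * PARALLELIZE phase.
   */
  static List<Integer> validSlots(List<Kind> phases) {
    int lastPrerequisite = -1;
    int firstParallelize = phases.size();
    for (int i = 0; i < phases.size(); i++) {
      Kind kind = phases.get(i);
      if (kind == Kind.REWRITE || kind == Kind.JOIN_EQUIVALENCE) {
        lastPrerequisite = i;
      }
      if (kind == Kind.PARALLELIZE && firstParallelize == phases.size()) {
        firstParallelize = i;
      }
    }
    List<Integer> slots = new ArrayList<>();
    for (int slot = lastPrerequisite + 1; slot <= firstParallelize; slot++) {
      slots.add(slot);
    }
    return slots;
  }

  public static void main(String[] args) {
    // An ordering with room for push-down between the prerequisites
    // and the parallelization decision.
    System.out.println(validSlots(List.of(
        Kind.REWRITE, Kind.JOIN_EQUIVALENCE, Kind.OTHER, Kind.PARALLELIZE)));

    // An ordering where parallelism is chosen before join equivalence
    // is computed: no slot satisfies all three constraints.
    System.out.println(validSlots(List.of(
        Kind.REWRITE, Kind.PARALLELIZE, Kind.JOIN_EQUIVALENCE)));
  }
}
```

An empty result in this model corresponds to the case where no position in the 
existing ordering works and the phases themselves would have to change.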

The goal of this ticket is to either:

* Research to identify an existing phase which satisfies these requirements, or
* Create a new phase.

Due to the way Calcite works, it is not a good idea to have a single phase 
handle two tasks that depend on one another. That is, we cannot perform filter 
push-down in the phase that defines the filters, nor can we add filter 
push-down to the phase that chooses parallelism.

Background: Calcite is a rule-based query planner inspired by 
[Volcano|https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf].
The above issue is a flaw with rule-based planners and was identified as early 
as the 
[Cascades query framework paper|https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf],
which was the follow-up to Volcano.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
