Paul Rogers created DRILL-7558:
----------------------------------

Summary: Generalize filter push-down planner phase
Key: DRILL-7558
URL: https://issues.apache.org/jira/browse/DRILL-7558
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.18.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Fix For: 1.18.0
DRILL-7458 provides a base framework for storage plugins, including a simplified filter push-down mechanism. [~volodymyr] notes that it may be *too* simple:

{quote}
What about the case when this rule was applied for one filter, but planner at some point pushed another filter above the scan, for example, if we have such case:
{code}
Filter(a=2)
  Join(t1.b=t2.b, type=inner)
    Filter(b=3)
      Scan(t1)
    Scan(t2)
{code}
Filter b=3 will be pushed into scan, planner will push filter above join:
{code}
Join(t1.b=t2.b, type=inner)
  Filter(a=2)
    Scan(t1, b=3)
  Scan(t2)
{code}
In this case, check whether filter was pushed is not enough.
{quote}

Drill divides planning into a number of *phases*, each defined by a set of *rules*. Most storage plugins perform filter push-down during the physical planning phase. By that point, however, Drill has already decided on the degree of parallelism, so it is too late for filter push-down to influence it. Yet for a source such as a REST API, we want to use filters to help shard the query (that is, to set the degree of parallelism).

DRILL-7458 performs filter push-down at *logical* planning time to work around this limitation. (In Drill, there are three different phases that could be considered the logical phase, depending on which planning options are set to control Calcite.) [~volodymyr] points out that the logical planning phase may be the wrong place, because that phase itself performs rewrites of the kind he cited, possibly after the push-down rule has already fired.

Thus, we need to research where to insert filter push-down. It must come:

* After rewrites of the kind described above.
* After join equivalence computations. (See DRILL-7556.)
* Before the decision is made about the number of minor fragments.

The goal of this ticket is to either:

* Research to identify an existing phase which satisfies these requirements, or
* Create a new phase.

Due to the way Calcite works, it is not a good idea to have a single phase handle two tasks that depend on one another.
That is, we cannot combine filter push-down with a phase that defines the filters, nor can we add filter push-down to a phase that chooses parallelism.

Background: Calcite is a rule-based query planner inspired by [Volcano|https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf]. The issue above is a flaw of rule-based planners. It was identified as early as the [Cascades query framework paper|https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf], the follow-up to Volcano.
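To make [~volodymyr]'s example concrete, here is a minimal Java sketch of why a "was a filter already pushed into this scan?" check is not enough. The classes {{PlanNode}}, {{Scan}}, {{Filter}} and {{PushDownChecks}} are hypothetical stand-ins, not Drill's or Calcite's plan-node API. Conditions are plain strings here, whereas a real rule would compare normalized expressions. The condition-aware check is only an illustration of the failure mode, not the fix this ticket proposes (the ticket looks for the right planner phase instead).

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical, heavily simplified plan model for illustration only;
// it is not Drill's or Calcite's RelNode API.
abstract class PlanNode {}

final class Scan extends PlanNode {
  final String table;
  // Conditions the scan has already absorbed (strings stand in for expressions).
  final Set<String> pushedConditions = new HashSet<>();

  Scan(String table) {
    this.table = table;
  }
}

final class Filter extends PlanNode {
  final Set<String> conditions;
  final PlanNode input;

  Filter(Set<String> conditions, PlanNode input) {
    this.conditions = new HashSet<>(conditions);
    this.input = input;
  }
}

final class PushDownChecks {
  // Naive check: fire only if nothing has been pushed into the scan yet.
  static boolean naiveShouldPush(Filter filter) {
    if (!(filter.input instanceof Scan)) {
      return false;
    }
    Scan scan = (Scan) filter.input;
    return scan.pushedConditions.isEmpty();
  }

  // Condition-aware check: fire if the filter carries any condition
  // that the scan directly below it has not yet absorbed.
  static boolean shouldPush(Filter filter) {
    if (!(filter.input instanceof Scan)) {
      return false;
    }
    Scan scan = (Scan) filter.input;
    return !scan.pushedConditions.containsAll(filter.conditions);
  }
}
{code}

For the rewritten plan in the quote, {{Scan(t1)}} already holds {{b=3}} when the planner moves {{Filter(a=2)}} directly above it. {{naiveShouldPush}} then returns false, so {{a=2}} is never pushed, while {{shouldPush}} returns true.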
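The placement requirements listed in this ticket can be written down as an ordering check. This is a minimal Java sketch under stated assumptions: a planner is modeled as an ordered list of phases, each phase is the set of tasks it performs, and the {{Task}} names and the {{PhaseOrder}} helper are hypothetical. They do not correspond to Drill's actual planner phases.

{code:java}
import java.util.List;
import java.util.Set;

// Hypothetical planning tasks; these names are illustrative and do not
// correspond to Drill's actual planner phases or rule sets.
enum Task { REWRITE_FILTERS, COMPUTE_JOIN_EQUIVALENCE, PUSH_DOWN_FILTERS, CHOOSE_PARALLELISM }

final class PhaseOrder {
  // Index of the first phase that performs the given task, or -1 if none does.
  static int phaseOf(List<Set<Task>> phases, Task task) {
    for (int i = 0; i < phases.size(); i++) {
      if (phases.get(i).contains(task)) {
        return i;
      }
    }
    return -1;
  }

  // Push-down must run in a strictly later phase than the filter rewrites and
  // join equivalence it depends on, and in a strictly earlier phase than the
  // one that chooses parallelism. Sharing a phase with either side is rejected.
  static boolean isValid(List<Set<Task>> phases) {
    int rewrite = phaseOf(phases, Task.REWRITE_FILTERS);
    int joins = phaseOf(phases, Task.COMPUTE_JOIN_EQUIVALENCE);
    int push = phaseOf(phases, Task.PUSH_DOWN_FILTERS);
    int parallel = phaseOf(phases, Task.CHOOSE_PARALLELISM);
    if (rewrite < 0 || joins < 0 || push < 0 || parallel < 0) {
      return false;
    }
    return rewrite < push && joins < push && push < parallel;
  }
}
{code}

For example, the ordering {{[REWRITE_FILTERS], [COMPUTE_JOIN_EQUIVALENCE], [PUSH_DOWN_FILTERS], [CHOOSE_PARALLELISM]}} is valid. Putting {{PUSH_DOWN_FILTERS}} in the same phase as {{REWRITE_FILTERS}} (roughly the logical-phase approach) is rejected, as is placing it in or after the phase that chooses parallelism (roughly the physical-phase approach).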