[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21230 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21230#discussion_r187388065 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala --- @@ -23,17 +23,10 @@ import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project import org.apache.spark.sql.catalyst.rules.Rule object PushDownOperatorsToDataSource extends Rule[LogicalPlan] { - override def apply( - plan: LogicalPlan): LogicalPlan = plan transformUp { + override def apply(plan: LogicalPlan): LogicalPlan = plan.mapChildren { --- End diff -- Could you update the PR description (`transformDown` -> `mapChildren`), too? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21230#discussion_r185991248 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala --- @@ -23,49 +23,46 @@ import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project import org.apache.spark.sql.catalyst.rules.Rule object PushDownOperatorsToDataSource extends Rule[LogicalPlan] { - override def apply( - plan: LogicalPlan): LogicalPlan = plan transformUp { -// PhysicalOperation guarantees that filters are deterministic; no need to check -case PhysicalOperation(project, newFilters, relation : DataSourceV2Relation) => - // merge the filters - val filters = relation.filters match { -case Some(existing) => - existing ++ newFilters -case _ => - newFilters - } + override def apply(plan: LogicalPlan): LogicalPlan = { +var pushed = false +plan transformDown { + // PhysicalOperation guarantees that filters are deterministic; no need to check + case PhysicalOperation(project, filters, relation: DataSourceV2Relation) if !pushed => --- End diff -- `PhysicalOperation` just accumulates project and filter above a specific node, if we transform down a tree, and only transform once, we will never hit `PhysicalOperation` move than once. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21230#discussion_r185986217 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala --- @@ -23,49 +23,46 @@ import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project import org.apache.spark.sql.catalyst.rules.Rule object PushDownOperatorsToDataSource extends Rule[LogicalPlan] { - override def apply( - plan: LogicalPlan): LogicalPlan = plan transformUp { -// PhysicalOperation guarantees that filters are deterministic; no need to check -case PhysicalOperation(project, newFilters, relation : DataSourceV2Relation) => - // merge the filters - val filters = relation.filters match { -case Some(existing) => - existing ++ newFilters -case _ => - newFilters - } + override def apply(plan: LogicalPlan): LogicalPlan = { +var pushed = false +plan transformDown { + // PhysicalOperation guarantees that filters are deterministic; no need to check + case PhysicalOperation(project, filters, relation: DataSourceV2Relation) if !pushed => --- End diff -- Is that possible one plan has multiple `PhysicalOperation`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/21230 [SPARK-24172][SQL] we should not apply operator pushdown to data source v2 many times ## What changes were proposed in this pull request? In `PushDownOperatorsToDataSource`, we use `transformUp` to match `PhysicalOperation` and apply pushdown. This is problematic if we have multiple `Filter` and `Project` above the data source v2 relation. e.g. for a query ``` Project Filter DataSourceV2Relation ``` The pattern match will be triggered twice and we will do operator pushdown twice. This is unnecessary, we can use `transformDown` to only apply pushdown once. ## How was this patch tested? existing test You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark step2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21230.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21230 commit e224f8a798ed30319efab386720c997227e1b421 Author: Wenchen FanDate: 2018-05-03T15:14:01Z we should not apply operator pushdown to data source v2 many times --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org