[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

2018-05-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21230


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

2018-05-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21230#discussion_r187388065
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala
 ---
@@ -23,17 +23,10 @@ import 
org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project
 import org.apache.spark.sql.catalyst.rules.Rule
 
 object PushDownOperatorsToDataSource extends Rule[LogicalPlan] {
-  override def apply(
-  plan: LogicalPlan): LogicalPlan = plan transformUp {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan.mapChildren {
--- End diff --

Could you update the PR description (`transformDown` -> `mapChildren`), too?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21230#discussion_r185991248
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala
 ---
@@ -23,49 +23,46 @@ import 
org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project
 import org.apache.spark.sql.catalyst.rules.Rule
 
 object PushDownOperatorsToDataSource extends Rule[LogicalPlan] {
-  override def apply(
-  plan: LogicalPlan): LogicalPlan = plan transformUp {
-// PhysicalOperation guarantees that filters are deterministic; no 
need to check
-case PhysicalOperation(project, newFilters, relation : 
DataSourceV2Relation) =>
-  // merge the filters
-  val filters = relation.filters match {
-case Some(existing) =>
-  existing ++ newFilters
-case _ =>
-  newFilters
-  }
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+var pushed = false
+plan transformDown {
+  // PhysicalOperation guarantees that filters are deterministic; no 
need to check
+  case PhysicalOperation(project, filters, relation: 
DataSourceV2Relation) if !pushed =>
--- End diff --

`PhysicalOperation` just accumulates project and filter above a specific 
node, if we transform down a tree, and only transform once, we will never hit 
`PhysicalOperation` move than once.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

2018-05-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21230#discussion_r185986217
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala
 ---
@@ -23,49 +23,46 @@ import 
org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project
 import org.apache.spark.sql.catalyst.rules.Rule
 
 object PushDownOperatorsToDataSource extends Rule[LogicalPlan] {
-  override def apply(
-  plan: LogicalPlan): LogicalPlan = plan transformUp {
-// PhysicalOperation guarantees that filters are deterministic; no 
need to check
-case PhysicalOperation(project, newFilters, relation : 
DataSourceV2Relation) =>
-  // merge the filters
-  val filters = relation.filters match {
-case Some(existing) =>
-  existing ++ newFilters
-case _ =>
-  newFilters
-  }
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+var pushed = false
+plan transformDown {
+  // PhysicalOperation guarantees that filters are deterministic; no 
need to check
+  case PhysicalOperation(project, filters, relation: 
DataSourceV2Relation) if !pushed =>
--- End diff --

Is that possible one plan has multiple `PhysicalOperation`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21230: [SPARK-24172][SQL] we should not apply operator p...

2018-05-03 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/21230

[SPARK-24172][SQL] we should not apply operator pushdown to data source v2 
many times

## What changes were proposed in this pull request?

In `PushDownOperatorsToDataSource`, we use `transformUp` to match 
`PhysicalOperation` and apply pushdown. This is problematic if we have multiple 
`Filter` and `Project` above the data source v2 relation.

e.g. for a query
```
Project
  Filter
DataSourceV2Relation
```

The pattern match will be triggered twice and we will do operator pushdown 
twice. This is unnecessary, we can use `transformDown` to only apply pushdown 
once.

## How was this patch tested?

existing test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark step2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21230


commit e224f8a798ed30319efab386720c997227e1b421
Author: Wenchen Fan 
Date:   2018-05-03T15:14:01Z

we should not apply operator pushdown to data source v2 many times




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org