[GitHub] spark pull request #22518: [SPARK-25482][SQL] ReuseSubquery can be useless w...

cloud-fan Mon, 12 Nov 2018 08:17:25 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22518#discussion_r232720903
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
    @@ -1268,4 +1269,16 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
           assert(getNumSortsInQuery(query5) == 1)
         }
       }
    +
    +  test("SPARK-25482: Reuse same Subquery in order to execute it only once") {
    +    withTempView("t1", "t2") {
    +      sql("create temporary view t1(a int) using parquet")
    +      sql("create temporary view t2(b int) using parquet")
    +      val plan = sql("select * from t2 where b > (select max(a) from t1)")
    --- End diff --
    
    I think you are right about it, but pushing the filter down also means the
    data source scan must wait until the subquery has finished. We need to weigh
    that tradeoff carefully.
    
    I'd suggest we open a new ticket for pushing scalar subquery filters down to
    the data source, and forbid it in this PR.
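
To make the tradeoff concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark's API: `PushdownTradeoff`, `ScanResult`, `scalarSubquery`, `scan`, `withPushdown`, and `withoutPushdown` are all hypothetical names, and the in-memory sequences stand in for `t1` and `t2` from the test query. It assumes `t1` is non-empty. With pushdown, the scan emits fewer rows but cannot start until the subquery value is known; without it, the scan and the subquery can run concurrently, and the filter is applied afterwards.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical toy model of the tradeoff; none of these names come from Spark.
object PushdownTradeoff {
  // Rows returned to the query, plus how many rows the "source" had to emit.
  final case class ScanResult(rows: Seq[Int], rowsRead: Int)

  // Stand-in for the scalar subquery `select max(a) from t1`.
  // Assumes t1 is non-empty (real SQL would return NULL for an empty table).
  def scalarSubquery(t1: Seq[Int]): Int = t1.max

  // Stand-in for a data source scan with an optional pushed-down predicate.
  def scan(t2: Seq[Int], pushed: Option[Int => Boolean]): ScanResult = {
    val emitted = pushed.fold(t2)(p => t2.filter(p))
    ScanResult(emitted, emitted.size)
  }

  // Pushdown: the scan cannot start until the subquery value is known.
  def withPushdown(t1: Seq[Int], t2: Seq[Int]): ScanResult = {
    val threshold = scalarSubquery(t1) // must finish before the scan begins
    scan(t2, Some((b: Int) => b > threshold))
  }

  // No pushdown: the scan and the subquery are started independently,
  // and the filter `b > threshold` is applied once both have completed.
  def withoutPushdown(t1: Seq[Int], t2: Seq[Int]): ScanResult = {
    val subquery = Future(scalarSubquery(t1))
    val scanned = Future(scan(t2, None))
    val combined = for {
      threshold <- subquery
      result <- scanned
    } yield result.copy(rows = result.rows.filter(_ > threshold))
    Await.result(combined, 10.seconds)
  }
}
```

For `t1 = Seq(1, 5, 3)` and `t2 = Seq(2, 6, 9, 4)`, both strategies return the rows `6` and `9`, but the pushdown variant reads 2 rows from the source while the other reads all 4. Whether the saved I/O outweighs the lost overlap depends on how expensive the subquery is relative to the scan.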


