[GitHub] spark pull request #22518: [SPARK-25482][SQL] ReuseSubquery can be useless w...

mgaido91 Mon, 12 Nov 2018 01:16:38 -0800

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22518#discussion_r232580202
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
    @@ -1268,4 +1269,16 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
           assert(getNumSortsInQuery(query5) == 1)
         }
       }
    +
    +  test("SPARK-25482: Reuse same Subquery in order to execute it only once") {
    +    withTempView("t1", "t2") {
    +      sql("create temporary view t1(a int) using parquet")
    +      sql("create temporary view t2(b int) using parquet")
    +      val plan = sql("select * from t2 where b > (select max(a) from t1)")
    --- End diff --
    
    Sure, could you please check the PR description? I think the context is
    explained quite well there.
    
    Anyway, as a quick summary: in this case `b > (select max(a) from t1)` is
    pushed down as a datasource filter, so the plan contains 2 instances of
    `b > (select max(a) from t1)` and the subquery result is not reused. It is
    not reused because the copied plan satisfies `==`, so even when
    `ReuseSubquery` replaces it, the change is ignored.
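    
    To illustrate the mechanism, here is a toy sketch in plain Scala. It is
    not Spark code: `SubqueryRef`, `instanceId` and `transformKeepingEqual`
    are hypothetical names, and the sketch simply assumes a transform that
    keeps the original node whenever the rule's result is `==` to it.

```scala
// Toy model only: these classes are hypothetical and are not Spark's.
// Equality looks at the subquery text and ignores which instance it is,
// so two separate copies of the same subquery compare equal.
final class SubqueryRef(val sql: String, val instanceId: Int) {
  override def equals(other: Any): Boolean = other match {
    case that: SubqueryRef => this.sql == that.sql
    case _                 => false
  }
  override def hashCode(): Int = sql.hashCode
}

object ReuseSketch {
  // Assumed behaviour: apply `rule`, but treat a result that is `==` to the
  // input as "no change" and return the original node.
  def transformKeepingEqual(node: SubqueryRef)(rule: SubqueryRef => SubqueryRef): SubqueryRef = {
    val replaced = rule(node)
    if (replaced == node) node else replaced
  }

  def main(args: Array[String]): Unit = {
    // One instance stays in the filter, the other is the pushed-down copy.
    val inFilter   = new SubqueryRef("select max(a) from t1", instanceId = 1)
    val pushedDown = new SubqueryRef("select max(a) from t1", instanceId = 2)

    // The "reuse" rule tries to point the copy at the first instance.
    val result = transformKeepingEqual(pushedDown)(_ => inFilter)

    // Reference check: did the replacement actually take effect?
    println(s"replacement applied: ${result eq inFilter}")
  }
}
```

    In this sketch the replacement is dropped because the two instances are
    `==`, so the second copy stays in place and would still be executed on
    its own.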


