[spark] branch branch-3.0 updated: [SPARK-33372][SQL] Fix InSet bucket pruning

wenchen Mon, 09 Nov 2020 00:42:19 -0800

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new c157fa3  [SPARK-33372][SQL] Fix InSet bucket pruning
c157fa3 is described below

commit c157fa36e8e1aa18dcf87b6c774970b8122df0dc
Author: Yuming Wang <yumw...@ebay.com>
AuthorDate: Mon Nov 9 08:32:51 2020 +0000

    [SPARK-33372][SQL] Fix InSet bucket pruning
    
    ### What changes were proposed in this pull request?
    
    This pr fix `InSet` bucket pruning because of it's values should not be 
`Literal`:
    
https://github.com/apache/spark/blob/cbd3fdea62dab73fc4a96702de8fd1f07722da66/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L253-L255
    
    ### Why are the changes needed?
    
    Fix bug.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Unit test and manual test:
    
    ```scala
    spark.sql("select id as a, id as b from range(10000)").write.bucketBy(100, 
"a").saveAsTable("t")
    spark.sql("select * from t where a in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
11)").show
    ```
    
    Before this PR | After this PR
    -- | --
    
![image](https://user-images.githubusercontent.com/5399861/98380788-fb120980-2083-11eb-8fae-4e21ad873e9b.png)
 | 
![image](https://user-images.githubusercontent.com/5399861/98381095-5ba14680-2084-11eb-82ca-2d780c85305c.png)
    
    Closes #30279 from wangyum/SPARK-33372.
    
    Authored-by: Yuming Wang <yumw...@ebay.com>
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
    (cherry picked from commit 69799c514ff9874c57bf94d4de21ea4cd0cbbf8d)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
---
 .../apache/spark/sql/execution/datasources/FileSourceStrategy.scala  | 5 ++---
 .../test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala  | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
index b1e03fe..4b9e0c6 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
@@ -89,9 +89,8 @@ object FileSourceStrategy extends Strategy with Logging {
       case expressions.In(a: Attribute, list)
         if list.forall(_.isInstanceOf[Literal]) && a.name == bucketColumnName 
=>
         getBucketSetFromIterable(a, list.map(e => e.eval(EmptyRow)))
-      case expressions.InSet(a: Attribute, hset)
-        if hset.forall(_.isInstanceOf[Literal]) && a.name == bucketColumnName 
=>
-        getBucketSetFromIterable(a, hset.map(e => 
expressions.Literal(e).eval(EmptyRow)))
+      case expressions.InSet(a: Attribute, hset) if a.name == bucketColumnName 
=>
+        getBucketSetFromIterable(a, hset)
       case expressions.IsNull(a: Attribute) if a.name == bucketColumnName =>
         getBucketSetFromValue(a, null)
       case expressions.And(left, right) =>
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
index 558bfd7..df8ca33 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
@@ -188,7 +188,7 @@ abstract class BucketedReadSuite extends QueryTest with 
SQLTestUtils {
 
       // Case 4: InSet
       val inSetExpr = expressions.InSet($"j".expr,
-        Set(bucketValue, bucketValue + 1, bucketValue + 2, bucketValue + 
3).map(lit(_).expr))
+        Set(bucketValue, bucketValue + 1, bucketValue + 2, bucketValue + 3))
       checkPrunedAnswers(
         bucketSpec,
         bucketValues = Seq(bucketValue, bucketValue + 1, bucketValue + 2, 
bucketValue + 3),


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-33372][SQL] Fix InSet bucket pruning

Reply via email to