[ https://issues.apache.org/jira/browse/SPARK-16186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davies Liu resolved SPARK-16186.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0

Issue resolved by pull request 13887
[https://github.com/apache/spark/pull/13887]

> Support partition batch pruning with `IN` predicate in InMemoryTableScanExec
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-16186
>                 URL: https://issues.apache.org/jira/browse/SPARK-16186
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Dongjoon Hyun
>             Fix For: 2.1.0
>
>
> One of the most frequent usage patterns for Spark SQL is querying **cached tables**.
> This issue improves `InMemoryTableScanExec` to handle `IN` predicates efficiently by pruning partition batches; a rough sketch of the pruning test appears at the end of this description.
> Of course, the performance improvement varies with the query and the dataset. For the following simple query, the query duration shown in the Spark UI drops from 9 seconds to 50~90ms, roughly a 100x speedup.
> {code}
> $ bin/spark-shell --driver-memory 6G
> scala> val df = spark.range(2000000000)
> scala> df.createOrReplaceTempView("t")
> scala> spark.catalog.cacheTable("t")
> scala> sql("select id from t where id = 1").collect()        // About 2 mins (the first run materializes the cache)
> scala> sql("select id from t where id = 1").collect()        // less than 90ms
> scala> sql("select id from t where id in (1,2,3)").collect() // 9 seconds (Before)
> scala> sql("select id from t where id in (1,2,3)").collect() // less than 90ms (After)
> {code}
> This change benefits over 35 TPC-DS queries when the tables are cached.
> Note that this optimization is applied only to `IN` predicates. To cover `IN` lists with more than 10 items, increase the *spark.sql.optimizer.inSetConversionThreshold* option, as shown in the last snippet below.
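>
> As a rough standalone illustration of the idea (not Spark's actual internals; `BatchStats` and `mayContain` are made-up names), a cached batch can be skipped whenever no value in the `IN` list falls inside the batch's min/max statistics:
> {code}
> // Sketch: prune batches whose [min, max] range misses every IN-list value.
> case class BatchStats(min: Long, max: Long)
>
> // A batch may contain a match only if some IN value lies within its range.
> def mayContain(stats: BatchStats, inList: Seq[Long]): Boolean =
>   inList.exists(v => stats.min <= v && v <= stats.max)
>
> val batches = Seq(BatchStats(0L, 9999L), BatchStats(10000L, 19999L))
> val inList  = Seq(1L, 2L, 3L)
> // Only the first batch survives; the second is skipped without being scanned.
> val survivors = batches.filter(mayContain(_, inList))
> {code}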
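>
> For example, to keep an `IN` list of 12 items eligible for this pruning (above the threshold, which defaults to 10, the optimizer converts `IN` to `InSet`), the threshold can be raised in spark-shell; the value 100 here is arbitrary:
> {code}
> scala> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "100")
> scala> sql("select id from t where id in (1,2,3,4,5,6,7,8,9,10,11,12)").collect()
> {code}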