[ https://issues.apache.org/jira/browse/SPARK-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657646#comment-16657646 ]

Apache Spark commented on SPARK-25784:
--------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/22778

> Infer filters from constraints after rewriting predicate subquery
> -----------------------------------------------------------------
>
>                 Key: SPARK-25784
>                 URL: https://issues.apache.org/jira/browse/SPARK-25784
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> Benchmark:
> {code:scala}
> withTempView("t1", "t2") {
>   withTempDir { dir =>
>     // c1 is always null, c2 is null for even ids, c3 is never null.
>     spark.range(3000000)
>       .selectExpr("cast(null as int) as c1", "if(id % 2 = 0, null, id) as c2", "id as c3")
>       .coalesce(1)
>       .orderBy("c2")
>       .write
>       .mode("overwrite")
>       .option("parquet.block.size", 10485760)
>       .parquet(dir.getCanonicalPath)
>     // Both views read the same Parquet files.
>     spark.read.parquet(dir.getCanonicalPath).createTempView("t1")
>     spark.read.parquet(dir.getCanonicalPath).createTempView("t2")
>     Seq("c1", "c2", "c3").foreach { column =>
>       val benchmark = new Benchmark(s"join key $column", 10)
>       // Run the same IN-subquery query with constraint propagation off, then on.
>       Seq(false, true).foreach { inferFilters =>
>         benchmark.addCase(s"Is infer filters $inferFilters", numIters = 5) { _ =>
>           withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> inferFilters.toString) {
>             sql(s"select t1.* from t1 where t1.$column in (select $column from t2)").count()
>           }
>         }
>       }
>       benchmark.run()
>     }
>   }
> }
> {code}
> Benchmark result:
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
> Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> join key c1:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Is infer filters false                        2005 / 2163          0.0   200481431.0       1.0X
> Is infer filters true                          190 /  207          0.0    18962935.7      10.6X
>
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
> Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> join key c2:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Is infer filters false                        2368 / 2498          0.0   236803743.1       1.0X
> Is infer filters true                         1234 / 1268          0.0   123443912.3       1.9X
>
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
> Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> join key c3:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Is infer filters false                        2754 / 2907          0.0   275376009.7       1.0X
> Is infer filters true                         2237 / 2255          0.0   223739457.8       1.2X
> {noformat}
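
A note on why null join keys matter for the query in the benchmark. Under standard SQL semantics, a row whose key is NULL can never satisfy `key IN (SELECT key FROM t2)`, so dropping NULL keys from both inputs before the join cannot change the result. The sketch below illustrates that point with plain Scala collections. It is not Spark's optimizer code and not the code from the pull request: `InferNotNullSketch`, `Row`, `table`, `semiJoin`, `semiJoinNotNull` and the `byC1`/`byC2`/`byC3` key functions are all hypothetical names made up for this illustration, and the table is shrunk to 10 rows.

```scala
// Illustrative sketch only: plain Scala collections stand in for the Parquet
// tables, and none of these names come from Spark or from the pull request.
object InferNotNullSketch {
  // One row of the benchmark table; nullable columns are modelled as Option.
  final case class Row(c1: Option[Int], c2: Option[Long], c3: Long)

  // Mirrors the benchmark's selectExpr: c1 is always null, c2 is null for
  // even ids, c3 is never null.
  def table(n: Long): Vector[Row] =
    (0L until n).toVector.map { id =>
      Row(None, if (id % 2 == 0) None else Some(id), id)
    }

  // Join keys for the three benchmark columns.
  val byC1: Row => Option[Long] = _.c1.map(_.toLong)
  val byC2: Row => Option[Long] = _.c2
  val byC3: Row => Option[Long] = r => Some(r.c3)

  // `WHERE key IN (SELECT key FROM right)` evaluated as a left semi join.
  // A NULL key never satisfies the equality, so those rows are never returned.
  def semiJoin(left: Vector[Row], right: Vector[Row], key: Row => Option[Long]): Vector[Row] = {
    val buildSide: Set[Long] = right.flatMap(row => key(row).toList).toSet
    left.filter(row => key(row).exists(buildSide.contains))
  }

  // The same join with a "key IS NOT NULL" filter applied to both inputs first.
  // Returns the join result and the number of rows that reach the join.
  def semiJoinNotNull(left: Vector[Row], right: Vector[Row], key: Row => Option[Long]): (Vector[Row], Int) = {
    val l = left.filter(row => key(row).isDefined)
    val r = right.filter(row => key(row).isDefined)
    (semiJoin(l, r, key), l.size + r.size)
  }

  def main(args: Array[String]): Unit = {
    val t = table(10)
    for ((name, key) <- Seq("c1" -> byC1, "c2" -> byC2, "c3" -> byC3)) {
      val plain = semiJoin(t, t, key)
      val (filtered, joinInput) = semiJoinNotNull(t, t, key)
      println(s"$name: same result = ${plain == filtered}, rows reaching join = $joinInput of ${2 * t.size}")
    }
  }
}
```

In this toy model the pre-filtered join returns the same rows for all three keys, while the number of rows reaching the join falls to none for c1, half for c2, and stays unchanged for c3. That ordering matches the ordering of the speedups reported above, though the sketch says nothing about where the time is actually spent in Spark.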