Yuming Wang created SPARK-29485:
-----------------------------------

             Summary: Improve BooleanSimplification performance
                 Key: SPARK-29485
                 URL: https://issues.apache.org/jira/browse/SPARK-29485
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Yuming Wang

How to reproduce:
{code:scala}
import org.apache.spark.sql.catalyst.rules.RuleExecutor

// Create two tables with 1000 columns each (id0 .. id999).
val columns = (0 until 1000).map { i => s"id as id$i" }
spark.range(1).selectExpr(columns: _*).write.saveAsTable("t1")
spark.range(1).selectExpr(columns: _*).write.saveAsTable("t2")

// Join on 799 columns (id1 .. id799) and dump the time spent per rule.
RuleExecutor.resetMetrics()
spark.table("t1").join(spark.table("t2"), (1 until 800).map(i => s"id${i}")).show(false)
// logWarning needs the Logging trait; use println in spark-shell.
logWarning(RuleExecutor.dumpTimeSpent())
{code}

{noformat}
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 20157
Total time: 12.918977054 seconds

Rule                                                                 Effective Time / Total Time   Effective Runs / Total Runs

org.apache.spark.sql.catalyst.optimizer.BooleanSimplification        0 / 9835799647                0 / 3
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints  1532613008 / 1532613008       1 / 1
...
{noformat}

Per-rule times are in nanoseconds: {{BooleanSimplification}} takes about 9.8 of the 12.9 seconds, yet none of its 3 runs changes the plan.

If we disable {{BooleanSimplification}}:

{noformat}
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 20154
Total time: 3.715814437 seconds

Rule                                                                 Effective Time / Total Time   Effective Runs / Total Runs

org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints  2081338100 / 2081338100       1 / 1
...
{noformat}
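
The report does not say how {{BooleanSimplification}} was disabled for the second measurement. One way that should work, sketched here as an assumption rather than the reporter's actual method, is the {{spark.sql.optimizer.excludedRules}} setting (available since Spark 2.4). The snippet assumes tables t1 and t2 from the reproduction above already exist, and it uses println so that it also runs in spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.rules.RuleExecutor

// Reuses the active session in spark-shell; otherwise starts a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-29485")
  .getOrCreate()

// Assumption: the rule is excluded through this optimizer setting;
// the original report does not state which mechanism was used.
val rule = "org.apache.spark.sql.catalyst.optimizer.BooleanSimplification"
spark.conf.set("spark.sql.optimizer.excludedRules", rule)

// Re-run the same join with fresh metrics (t1 and t2 must already exist).
RuleExecutor.resetMetrics()
spark.table("t1").join(spark.table("t2"), (1 until 800).map(i => s"id$i")).show(false)
println(RuleExecutor.dumpTimeSpent())
```

With the rule excluded, it should no longer appear in the dumped metrics, which makes it easy to compare the total time against the first run.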