This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 5d09828  [SPARK-37447][SQL] Cache LogicalPlan.isStreaming() result in a lazy val
5d09828 is described below

commit 5d098286a003eac26d1fe6c026d8123f3cbe9e9c
Author: Josh Rosen <joshro...@databricks.com>
AuthorDate: Mon Nov 29 16:46:28 2021 +0800

    [SPARK-37447][SQL] Cache LogicalPlan.isStreaming() result in a lazy val

    ### What changes were proposed in this pull request?

    This PR adds caching to `LogicalPlan.isStreaming()`: the default implementation's result will now be cached in a `private lazy val`.

    ### Why are the changes needed?

    This improves the performance of the `DeduplicateRelations` analyzer rule.

    The default implementation of `isStreaming` recursively visits every node in the tree. `DeduplicateRelations.renewDuplicatedRelations` is recursively invoked on every node in the tree and each invocation calls `isStreaming`. This leads to `O(n^2)` invocations of `isStreaming` on leaf nodes.

    Caching `isStreaming` avoids this performance problem.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Correctness should be covered by existing tests.

    This significantly improved `DeduplicateRelations` performance in local microbenchmarking with large query plans (~20% reduction in that rule's runtime in one of my tests).

    Closes #34691 from JoshRosen/cache-LogicalPlan.isStreaming.

    Authored-by: Josh Rosen <joshro...@databricks.com>
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
---
 .../org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
index 7c31a00..4aa7bf1 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
@@ -41,7 +41,8 @@ abstract class LogicalPlan
   def metadataOutput: Seq[Attribute] = children.flatMap(_.metadataOutput)
 
   /** Returns true if this subtree has data from a streaming data source. */
-  def isStreaming: Boolean = children.exists(_.isStreaming)
+  def isStreaming: Boolean = _isStreaming
+  private[this] lazy val _isStreaming = children.exists(_.isStreaming)
 
   override def verboseStringWithSuffix(maxFields: Int): String = {
     super.verboseString(maxFields) + statsCache.map(", " + _.toString).getOrElse("")

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
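
The quadratic behaviour described in the commit message can be reproduced with a small standalone model. This is only a sketch under stated assumptions, not Spark code: `Node`, `Uncached`, `Cached`, `visitAll`, `chain`, and the `evaluations` counter are hypothetical names introduced for illustration, and `visitAll` merely stands in for a rule that, like `DeduplicateRelations.renewDuplicatedRelations`, is invoked on every node and calls `isStreaming` each time. The sketch uses `private lazy val` rather than the patch's `private[this] lazy val`.

```scala
object IsStreamingSketch {
  // Hypothetical counter: how many times the recursive computation actually runs.
  var evaluations = 0

  sealed trait Node {
    def children: Seq[Node]
    def isStreaming: Boolean
  }

  // Models the old default: every call re-walks the whole subtree.
  final class Uncached(val children: Seq[Node]) extends Node {
    def isStreaming: Boolean = {
      evaluations += 1
      children.exists(_.isStreaming)
    }
  }

  // Models the patched default: the subtree is walked once, then the result is reused.
  final class Cached(val children: Seq[Node]) extends Node {
    def isStreaming: Boolean = _isStreaming
    private lazy val _isStreaming: Boolean = {
      evaluations += 1
      children.exists(_.isStreaming)
    }
  }

  // Stand-in for a rule that recurses over the plan and queries isStreaming at
  // every node. Returns how many nodes reported streaming.
  def visitAll(node: Node): Int =
    (if (node.isStreaming) 1 else 0) + node.children.map(visitAll).sum

  // Builds a linear plan of `depth` nodes (one leaf plus depth - 1 unary parents).
  def chain(depth: Int, mk: Seq[Node] => Node): Node =
    (1 until depth).foldLeft(mk(Nil))((child, _) => mk(Seq(child)))

  // Resets the counter, runs the traversal, and reports the evaluation count.
  def countEvaluations(root: Node): Int = {
    evaluations = 0
    visitAll(root)
    evaluations
  }

  def main(args: Array[String]): Unit = {
    println(countEvaluations(chain(100, new Uncached(_)))) // prints 5050
    println(countEvaluations(chain(100, new Cached(_))))   // prints 100
  }
}
```

For a chain of 100 non-streaming nodes, the uncached variant evaluates the recursive body 100 + 99 + ... + 1 = 5050 times, while the cached variant evaluates it once per node. Real plans are trees rather than chains, and `exists` can short-circuit on streaming children, so the exact counts in Spark will differ from this model.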