JoshRosen opened a new pull request #34691: URL: https://github.com/apache/spark/pull/34691
### What changes were proposed in this pull request?

This PR adds caching to `LogicalPlan.isStreaming()`: the default implementation's result will now be cached in a `private lazy val`.

### Why are the changes needed?

This improves the performance of the `DeduplicateRelations` analyzer rule.

The default implementation of `isStreaming` recursively visits every node in the tree. `DeduplicateRelations.renewDuplicatedRelations` is recursively invoked on every node in the tree and each invocation calls `isStreaming`. This leads to `O(n^2)` invocations of `isStreaming` on leaf nodes.

Caching `isStreaming` avoids this performance problem.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Correctness should be covered by existing tests.

This significantly improved `DeduplicateRelations` performance in local microbenchmarking with large query plans.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: reviews-h...@spark.apache.org
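
To illustrate the quadratic behaviour and the effect of caching described in this PR, here is a minimal, self-contained sketch. It does not use Spark's real classes: `Node`, `Leaf`, `Uncached`, `Cached`, `visitAll`, `leftDeep`, and the `leafCalls` counter are all hypothetical names invented for this example, and `visitAll` only imitates a rule that asks every node for `isStreaming` while recursing. The actual change in the PR may differ in detail.

```scala
object IsStreamingSketch {
  // Hypothetical counter, used only to make leaf visits observable.
  var leafCalls: Int = 0

  // Simplified stand-in for a plan node; NOT Spark's LogicalPlan.
  abstract class Node {
    def children: Seq[Node]
    def isStreaming: Boolean
  }

  // A leaf answers directly and records that it was asked.
  case class Leaf(streaming: Boolean) extends Node {
    val children: Seq[Node] = Nil
    def isStreaming: Boolean = { leafCalls += 1; streaming }
  }

  // Uncached variant: every call walks the whole subtree again.
  case class Uncached(children: Seq[Node]) extends Node {
    def isStreaming: Boolean = children.exists(_.isStreaming)
  }

  // Cached variant: the subtree is walked at most once per node,
  // because the result is stored in a private lazy val.
  case class Cached(children: Seq[Node]) extends Node {
    private lazy val cachedIsStreaming: Boolean = children.exists(_.isStreaming)
    def isStreaming: Boolean = cachedIsStreaming
  }

  // Imitates a recursive rule that calls isStreaming at every node.
  def visitAll(node: Node): Unit = {
    node.isStreaming
    node.children.foreach(visitAll)
  }

  // Builds a left-deep tree with the given number of non-streaming leaves.
  def leftDeep(leaves: Int, mk: Seq[Node] => Node): Node =
    (2 to leaves).foldLeft(Leaf(streaming = false): Node) { (acc, _) =>
      mk(Seq(acc, Leaf(streaming = false)))
    }

  // Resets the counter, runs the traversal, and reports leaf visits.
  def leafCallsFor(plan: Node): Int = {
    leafCalls = 0
    visitAll(plan)
    leafCalls
  }
}
```

In this sketch, `IsStreamingSketch.leafCallsFor(IsStreamingSketch.leftDeep(n, IsStreamingSketch.Uncached(_)))` grows quadratically with `n`, while the `Cached` variant grows linearly, since each inner node walks its subtree only on the first call.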