[ https://issues.apache.org/jira/browse/SPARK-37447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447828#comment-17447828 ]
Apache Spark commented on SPARK-37447:
--------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/34691

> Cache LogicalPlan.isStreaming() in a lazy val
> ---------------------------------------------
>
>                 Key: SPARK-37447
>                 URL: https://issues.apache.org/jira/browse/SPARK-37447
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 3.2.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Major
>
> The default implementation of `LogicalPlan.isStreaming()` calls
> `children.exists(_.isStreaming)`, so every call re-traverses the subtree
> below the node. This can be expensive for large trees, so as a performance
> optimization I think we should cache the result in a private lazy val.
>
> This is especially important for programs that programmatically construct
> huge query plans, because that will result in multiple analysis passes (and
> therefore multiple invocations of rules which call `isStreaming`). For
> example, the `isStreaming` check accounts for a significant portion of
> the time in `DeduplicateRelations` (> 20% in my local tests).

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
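
For illustration only, here is a minimal sketch of the idea described in the issue. It does not use Spark's real `LogicalPlan` and is not the code from the pull request. The names `UncachedPlan`, `CachedPlan`, `Counter`, `countUncached`, and `countCached` are hypothetical and exist only for this demo. It compares a recursive `children.exists(_.isStreaming)` check evaluated on every call with the same check stored in a private lazy val.

```scala
object IsStreamingCacheSketch {
  // Demo-only instrumentation: counts how often a node evaluates its check.
  final class Counter { var n: Long = 0L }

  // Uncached variant (hypothetical): each call re-walks the subtree below the node.
  sealed trait UncachedPlan {
    def counter: Counter
    def children: Seq[UncachedPlan]
    def isStreaming: Boolean = {
      counter.n += 1
      children.exists(_.isStreaming)
    }
  }
  final case class UncachedLeaf(counter: Counter) extends UncachedPlan {
    def children: Seq[UncachedPlan] = Nil
  }
  final case class UncachedNode(child: UncachedPlan, counter: Counter) extends UncachedPlan {
    def children: Seq[UncachedPlan] = Seq(child)
  }

  // Cached variant (hypothetical): the result is computed at most once per node.
  sealed trait CachedPlan {
    def counter: Counter
    def children: Seq[CachedPlan]
    private lazy val cachedIsStreaming: Boolean = {
      counter.n += 1
      children.exists(_.isStreaming)
    }
    def isStreaming: Boolean = cachedIsStreaming
  }
  final case class CachedLeaf(counter: Counter) extends CachedPlan {
    def children: Seq[CachedPlan] = Nil
  }
  final case class CachedNode(child: CachedPlan, counter: Counter) extends CachedPlan {
    def children: Seq[CachedPlan] = Seq(child)
  }

  // Builds a chain of `depth` unary nodes over one non-streaming leaf, then
  // calls isStreaming once on every node, as a rule visiting each node might.
  def countUncached(depth: Int): Long = {
    val c = new Counter
    var node: UncachedPlan = UncachedLeaf(c)
    var all = List(node)
    for (_ <- 1 to depth) { node = UncachedNode(node, c); all = node :: all }
    all.foreach(_.isStreaming)
    c.n
  }

  def countCached(depth: Int): Long = {
    val c = new Counter
    var node: CachedPlan = CachedLeaf(c)
    var all = List(node)
    for (_ <- 1 to depth) { node = CachedNode(node, c); all = node :: all }
    all.foreach(_.isStreaming)
    c.n
  }

  def main(args: Array[String]): Unit = {
    println(s"uncached evaluations: ${countUncached(100)}")
    println(s"cached evaluations:   ${countCached(100)}")
  }
}
```

In this toy chain the uncached count grows quadratically with depth while the cached count grows linearly. That is the effect the issue is after when rules call `isStreaming` repeatedly across analysis passes. Caching like this assumes plan nodes are immutable, so a node's answer cannot change after it is first computed.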