Josh Rosen created SPARK-37447:
----------------------------------

Summary: Cache LogicalPlan.isStreaming() in a lazy val
Key: SPARK-37447
URL: https://issues.apache.org/jira/browse/SPARK-37447
Project: Spark
Issue Type: Improvement
Components: Optimizer
Affects Versions: 3.2.0
Reporter: Josh Rosen

The default implementation of `LogicalPlan.isStreaming()` calls `children.exists(_.isStreaming)`, which re-walks the subtree on every call. This can be expensive for large trees, so as a performance optimization I think we should cache the result in a private lazy val.

This is especially important for programs that programmatically construct huge query plans, because doing so results in multiple analysis passes (and therefore multiple invocations of rules which call `isStreaming`). For example, the `isStreaming` check accounts for a significant portion of the time in `DeduplicateRelations` (> 20% in my local tests).
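
To make the proposal concrete, here is a minimal sketch of the caching idea on a simplified tree. `PlanNode`, `Inner`, and `CountingLeaf` are hypothetical stand-ins invented for illustration, not Spark's actual classes, and the sketch assumes a node's children never change after construction (otherwise the cached value could go stale).

```scala
// Hypothetical, simplified stand-in for a plan node; NOT Spark's LogicalPlan.
abstract class PlanNode {
  def children: Seq[PlanNode]

  // Public accessor stays a def so leaf nodes can still override it.
  def isStreaming: Boolean = _isStreaming

  // Evaluated at most once per node, on first access, then reused.
  // Assumes children are immutable after construction.
  private lazy val _isStreaming: Boolean = children.exists(_.isStreaming)
}

// Interior node that relies on the cached default.
final class Inner(val children: Seq[PlanNode]) extends PlanNode

// Leaf that overrides isStreaming and counts how often it is consulted,
// so the effect of caching in its ancestors can be observed.
final class CountingLeaf(streaming: Boolean) extends PlanNode {
  var calls: Int = 0
  val children: Seq[PlanNode] = Nil
  override def isStreaming: Boolean = { calls += 1; streaming }
}
```

With this shape, calling `isStreaming` repeatedly on the root of a tree consults each leaf only once, because every interior node remembers its own answer after the first traversal. Without the lazy val, each call would repeat the full walk.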