sigmod commented on a change in pull request #32742: URL: https://github.com/apache/spark/pull/32742#discussion_r644614374
########## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala ########## @@ -603,6 +603,19 @@ case class AdaptiveSparkPlanExec( (newPlan, optimized) } + /** + * Clean up logical plan stats before re-optimize + */ + private def cleanupStats(logicalPlan: LogicalPlan): Unit = { + logicalPlan.invalidateStatsCache() + // We must invalidate ineffective rules before re-optimize since AQE Optimizer may introduce + // LocalRelation that can affect result. Review comment: Ok, IIUC, wether a stage is "materialized or not" is kept as an external varying state outside of plan nodes? If that's the case, the same rule object is invoked multiple times for the same logical plan but violates the contract of passing `ruleId`s: https://github.com/databricks/runtime/blob/88acfbbb14f1396c312c394ed8a2a645738f83f0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L445-L446 I'd prefer us keeping ineffective rule bits internal to TreeNodes for simplicity, otherwise it might be difficult to reason about certain behaviors. I suspect we don't have that many fixpoint iterations in AQEOptimizer so that passing `ruleId` doesn't help that much? What `ruleId` helped most are Analyzer rules, because the fix-point batch can run 5~8 iterations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org