mslapek commented on code in PR #5623: URL: https://github.com/apache/datafusion/pull/5623#discussion_r1664441817
########## datafusion/optimizer/src/optimizer.rs: ########## @@ -330,15 +324,14 @@ impl Optimizer { } log_plan(&format!("Optimized plan (pass {i})"), &new_plan); - // TODO this is an expensive way to see if the optimizer did anything and - // it would be better to change the OptimizerRule trait to return an Option - // instead - if old_plan.as_ref() == &new_plan { + // HashSet::insert returns, whether the value was newly inserted. + let plan_is_fresh = + previous_plans.insert(LogicalPlanSignature::new(&new_plan)); + if !plan_is_fresh { Review Comment: Yes, a cycle shows a logical error in the optimizer - because at each step the performance should be improved. At the same time, it does NOT imply a correctness error. The plan probably will be still correct (but not optimal). > IMO cycle should be error condition We should take the perspective of a user. If a data scientist does research, should we harm the availability of the database? IMO Maybe a log of this situation could be useful, with a configuration flag turning this into an error... @findepi Btw. What motivated you to review this merged (one year ago) PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org