mslapek commented on code in PR #5623:
URL: https://github.com/apache/datafusion/pull/5623#discussion_r1664441817


##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -330,15 +324,14 @@ impl Optimizer {
             }
             log_plan(&format!("Optimized plan (pass {i})"), &new_plan);
 
-            // TODO this is an expensive way to see if the optimizer did 
anything and
-            // it would be better to change the OptimizerRule trait to return 
an Option
-            // instead
-            if old_plan.as_ref() == &new_plan {
+            // HashSet::insert returns, whether the value was newly inserted.
+            let plan_is_fresh =
+                previous_plans.insert(LogicalPlanSignature::new(&new_plan));
+            if !plan_is_fresh {

Review Comment:
   Yes, a cycle shows a logical error in the optimizer - because at each step 
the performance should be improved.
   
   At the same time, it does NOT imply a correctness error. The plan probably 
will be still correct (but not optimal).
   
   > IMO cycle should be error condition
   
   We should take the perspective of a user. If a data scientist does research, 
should we harm the availability of the database?
   
   IMO Maybe a log of this situation could be useful, with a configuration flag 
turning this into an error...
   
   @findepi Btw. What motivated you to review this merged (one year ago) PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to