alamb commented on issue #14231:
URL: https://github.com/apache/datafusion/issues/14231#issuecomment-2612024839

   My preference is to keep the core of datafusion focused on executing 
the plans as provided as much as possible, and on performing "always good 
optimizations".
   
   For optimizations where there is some tradeoff (such as choosing between 
sorting for a sort merge join and hashing), I strongly suggest we keep 
as much of that out of the core as possible (and use user defined passes 
instead).
   
   The rationale is that when tradeoffs are present, no particular choice will 
be ideal for all use cases (which is why we already have `prefer_existing_sort`). I 
can imagine some systems that want to prioritize plans that require less memory 
but more compute, as well as other systems that would prefer maximum 
performance even if it takes more memory, etc.
   
   If we give the optimizer passes in the core of datafusion baked-in 
tradeoffs/heuristics, I think it will just get more and more complicated as 
people try to change how the tradeoffs work.
   
   I feel strongly enough about this to help with the project.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

