pjmore opened a new pull request #1485: URL: https://github.com/apache/arrow-datafusion/pull/1485
Adds an experimental e-graph-based query optimizer.

# Rationale for this change

An e-graph is a data structure that stores many equivalent representations of an expression at once. The library used here, [egg](https://github.com/egraphs-good/egg), also includes a DSL for writing new rewrite rules. Together, these make it simple to write new rules that integrate well with the existing ones, because all rules are applied at once.

## Remaining work

1. Add round-trip tests for expressions and plans.
2. Improve documentation.
3. Ensure that invariants, such as expression naming, are preserved by the optimizer. Add tests to ensure they are upheld.
4. Write invariant-checking functionality.
5. Add support for timezones in TokomakScalar for timestamp types.
6. Add support for UserDefinedLogicalPlans.
7. Add support for plans containing Values.
8. Replace uses of Symbol with another type. Symbol stores its string value in a global cache protected by a mutex, which could cause heavy lock contention if the optimizer is run from multiple threads at once.
9. Add a custom DSL with some conveniences, such as specifying the type of node that will be matched.

# What changes are included in this PR?

- Add the Tokomak crate to the workspace.
- Add an experimental-tokomak feature and an optional Tokomak dependency to datafusion-cli.
- Add an experimental-tokomak feature and an optional Tokomak dependency to tpch.
- Make some types in the logical_plans module PartialOrd, Ord, and Hash.
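
For readers unfamiliar with e-graphs, the rationale can be illustrated with a toy sketch. This is not egg's API and not code from this PR: `ToyEGraph`, `Node`, and `ClassId` are hypothetical names, and the rule `x * 2 => x << 1` is an assumed example. The sketch only shows the core idea that a rewrite adds an equivalent form to a class instead of replacing the original; it omits the congruence closure and pattern matching that a real e-graph such as egg provides.

```rust
use std::collections::HashMap;

/// Identifier of an equivalence class (hypothetical name, not egg's `Id`).
type ClassId = usize;

/// A toy expression node. Children refer to classes, not to concrete nodes.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Node {
    Column(String),
    Int(i64),
    Mul(ClassId, ClassId),
    Shl(ClassId, ClassId),
}

/// Toy e-graph: a hash-cons table of nodes plus a union-find over class ids.
#[derive(Default)]
struct ToyEGraph {
    parent: Vec<ClassId>,         // union-find parent pointers
    memo: HashMap<Node, ClassId>, // node -> class it was first added to
}

impl ToyEGraph {
    /// Return the canonical representative of a class.
    fn find(&self, mut id: ClassId) -> ClassId {
        while self.parent[id] != id {
            id = self.parent[id];
        }
        id
    }

    /// Add a node, reusing its class if the same node was added before.
    fn add(&mut self, node: Node) -> ClassId {
        if let Some(&id) = self.memo.get(&node) {
            return self.find(id);
        }
        let id = self.parent.len();
        self.parent.push(id);
        self.memo.insert(node, id);
        id
    }

    /// Record that two classes are equivalent and return the merged class.
    fn union(&mut self, a: ClassId, b: ClassId) -> ClassId {
        let (a, b) = (self.find(a), self.find(b));
        if a != b {
            self.parent[b] = a;
        }
        a
    }

    /// List every node currently stored in the given class.
    fn nodes_in(&self, class: ClassId) -> Vec<&Node> {
        let root = self.find(class);
        self.memo
            .iter()
            .filter(|(_, c)| self.find(**c) == root)
            .map(|(n, _)| n)
            .collect()
    }
}

fn main() {
    let mut g = ToyEGraph::default();
    let x = g.add(Node::Column("x".to_string()));
    let two = g.add(Node::Int(2));
    let one = g.add(Node::Int(1));
    let mul = g.add(Node::Mul(x, two));

    // Applying the assumed rule `x * 2 => x << 1` adds the new form and
    // merges it with the old one; the original `x * 2` is not destroyed.
    let shl = g.add(Node::Shl(x, one));
    g.union(mul, shl);

    assert_eq!(g.find(mul), g.find(shl));
    println!("equivalent forms: {:?}", g.nodes_in(mul));
}
```

Because both forms stay in the same class, a later rule can still match `x * 2` even after `x << 1` has been added, which is why rule ordering matters less than in a destructive rewriter.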