erratic-pattern commented on PR #10358: URL: https://github.com/apache/datafusion/pull/10358#issuecomment-2095040396
I've made a new algorithm for this that should in theory reduce the amount of work needed to be done by short-circuiting earlier once there is a consecutive sequence of unchanged expression trees equal to the number of rewriters. However, it ended up being harder than I thought to get actual performance improvements when comparing with local benchmarks. I tried several different approaches, but most were actually slower than the code that I have here in this PR. I did eventually come up with something that could possibly compete with the simpler algorithm in this PR. You can compare the branch [here](https://github.com/apache/datafusion/compare/main...erratic-pattern:adam/loop-expr-simplifier-static-dispatch?expand=1) My guess is that the additional overhead of checking each iteration is actual significant when we are only running 3 rewriters. I think we would see bigger improvements in cases where the number of optimization rules is larger. Many of the approaches I tried required dynamic dispatch on the `TreeNodeRewriter`s which was surprisingly a larger cost that I expected. The approach I ended up with avoids the dynamic dispatch which seems to be the main reason it's faster. I would be interested in scaling this up to run across all of the `OptimizationRules` where we should see a bigger improvement. However, I don't think everything has fully migrated from `try_optimize` so we would have to wait for that, I think. Once that happens, it should be possible to generalize the new branch code to work with both `OptimizationRule`s as well as `TreeNodeRewriter`s . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org