adriangb commented on PR #19639: URL: https://github.com/apache/datafusion/pull/19639#issuecomment-3724684141
> So I guess the main factor is expressions like this being super expensive to evaluate (query 9):

I wonder if it's the expression being expensive to evaluate or if evaluating it where it is currently causes the issue. That is, if this was evaluated in a `FilterExec` right before the `HashJoin -> RepartitionExec` (and thus lifted work out of the hash join) would it perform better? We should also try with `SET datafusion.optimizer.hash_join_inlist_pushdown_max_size = 0`.

> A run with join filter pushdown disabled and DATAFUSION_OPTIMIZER_REPARTITION_FILE_MIN_SIZE = 128 * 1024 shows almost no regression for tpch

I guess we need to test both of those to understand how each one impacts results...
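
To separate the effect of the two changes, one option is to benchmark every combination of them rather than only both together. Here is a minimal sketch of that run matrix. The knob labels and values are hypothetical placeholders for illustration, not real DataFusion option names, and the benchmark invocation itself is left out.

```python
from itertools import product

# Hypothetical labels for the two knobs discussed above. Each list holds the
# baseline state first and the changed state second. These are illustrative
# labels only, not actual DataFusion configuration keys.
KNOBS = {
    "join_filter_pushdown": ["enabled", "disabled"],
    "repartition_file_min_size": ["default", str(128 * 1024)],
}


def benchmark_matrix(knobs):
    """Return every combination of knob settings as a list of dicts."""
    names = list(knobs)
    return [
        dict(zip(names, combo))
        for combo in product(*(knobs[name] for name in names))
    ]


if __name__ == "__main__":
    # Print the four runs; each would be benchmarked separately.
    for run in benchmark_matrix(KNOBS):
        print(run)
```

Comparing the two runs that change only one knob against the baseline would show how much each change contributes on its own.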
