2010YOUY01 commented on code in PR #17636:
URL: https://github.com/apache/datafusion/pull/17636#discussion_r2366274431
########## benchmarks/src/hj.rs: ##########
@@ -150,6 +149,20 @@ const HASH_QUERIES: &[&str] = &[
         FULL JOIN range(30000) AS t2
         ON (t1.value % 2) = (t2.value % 2)
     "#,
+    // Q13: INNER 30K x 30K | MEDIUM ~33% | double predicate
+    r#"
+        SELECT t1.value, t2.value
+        FROM range(30000) AS t1
+        INNER JOIN range(30000) AS t2
+        ON (t1.value = t2.value) AND (t1.value > 10000 and t2.value < 20000)

Review Comment:
It seems `(t1.value > 10000 and t2.value < 20000)` will be pushed down below the join instead of being evaluated inside `HashJoinExec`, because each half of it references only one side of the join.

I think we can use `ON (t1.value = t2.value) AND ((t1.value+t2.value)%10 > 0)` here for high selectivity. Since that filter references columns from both sides, it cannot be pushed down and has to be evaluated inside `HashJoinExec`.

########## benchmarks/src/hj.rs: ##########
@@ -58,10 +58,9 @@ const HASH_QUERIES: &[&str] = &[
     // equality on key + cheap filter to downselect
     r#"
         SELECT t1.value, t2.value
-        FROM range(10000) AS t1
+        FROM generate_series(0,10000, 1000) AS t1(value)

Review Comment:
Thanks for the update. I think it would be better to apply this change to all the other queries as well.
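To make the row counts behind these comments concrete, here is a small standalone Rust sketch. It is not DataFusion code: `range`, `generate_series`, and `join_count` are hypothetical helpers. It assumes `range(n)` yields `0..n` (stop excluded) and `generate_series(start, stop, step)` includes `stop`, and it uses a toy build/probe hash join as a stand-in for `HashJoinExec`.

```rust
use std::collections::HashMap;

// Hypothetical model of the benchmark tables, not DataFusion's implementation.
// Assumption: `range(n)` yields 0, 1, ..., n - 1 (stop exclusive).
fn range(n: i64) -> Vec<i64> {
    (0..n).collect()
}

// Assumption: `generate_series(start, stop, step)` includes `stop`.
fn generate_series(start: i64, stop: i64, step: i64) -> Vec<i64> {
    (start..=stop).step_by(step as usize).collect()
}

// Toy hash join: counts rows of
//   t1 INNER JOIN t2 ON t1.value = t2.value AND pred(t1.value, t2.value)
fn join_count(t1: &[i64], t2: &[i64], pred: impl Fn(i64, i64) -> bool) -> usize {
    // Build phase: index the right side by its join key.
    let mut build: HashMap<i64, Vec<i64>> = HashMap::new();
    for &w in t2 {
        build.entry(w).or_default().push(w);
    }
    // Probe phase: look up equal keys, then apply the extra join filter.
    t1.iter()
        .map(|&v| {
            build
                .get(&v)
                .map_or(0, |ws| ws.iter().filter(|&&w| pred(v, w)).count())
        })
        .sum()
}

fn main() {
    let t = range(30000);
    let total = t.len() as f64;

    // Q13 as written. Each half references one side, so it can be pushed down;
    // the final row count is the same either way.
    let current = join_count(&t, &t, |a, b| a > 10000 && b < 20000);

    // Suggested filter. With a == b from the equi-join, (a + b) % 10 == (2 * a) % 10,
    // which is 0 exactly when a % 5 == 0.
    let suggested = join_count(&t, &t, |a, b| (a + b) % 10 > 0);

    println!("Q13 current filter:   {current} rows ({:.1}%)", 100.0 * current as f64 / total);
    println!("Q13 suggested filter: {suggested} rows ({:.1}%)", 100.0 * suggested as f64 / total);
    println!("range(10000): {} rows", range(10000).len());
    println!("generate_series(0,10000, 1000): {} rows", generate_series(0, 10000, 1000).len());
}
```

Under those assumptions the current Q13 filter keeps 9999 of 30000 joined rows (about 33%, matching the `MEDIUM ~33%` label), the suggested filter keeps 24000 (80%), and `generate_series(0,10000, 1000)` produces 11 rows where `range(10000)` produced 10000.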