2010YOUY01 commented on code in PR #17636:
URL: https://github.com/apache/datafusion/pull/17636#discussion_r2366274431


##########
benchmarks/src/hj.rs:
##########
@@ -150,6 +149,20 @@ const HASH_QUERIES: &[&str] = &[
         FULL JOIN range(30000) AS t2
           ON (t1.value % 2) = (t2.value % 2)
     "#,
+    // Q13: INNER 30K x 30K | MEDIUM ~33% | double predicate
+    r#"
+        SELECT t1.value, t2.value
+        FROM range(30000) AS t1
+        INNER JOIN range(30000) AS t2
+          ON (t1.value = t2.value) AND (t1.value > 10000 and t2.value < 20000)

Review Comment:
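   It seems `(t1.value > 10000 and t2.value < 20000)` will be pushed down below
the join instead of being evaluated inside `HashJoinExec`, because each half
of it references only one side of the join.
   I think we can use
`ON (t1.value = t2.value) AND ((t1.value+t2.value)%10 > 0)` here for high
selectivity: that filter references both sides, so it has to stay in the join.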
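
As a rough sanity check of how many equi-matched rows each filter keeps, here is a minimal sketch. It assumes `range(30000)` yields the integers 0 through 29999 and that the equality `t1.value = t2.value` has already paired each value with itself. The helper names (`count_passing`, `original_filter`, `suggested_filter`) are hypothetical and are not part of the benchmark code.

```rust
/// Counts the values `v` in `0..n` for which `filter(v, v)` holds.
/// Both arguments are the same value because the equi-join condition
/// `t1.value = t2.value` is assumed to have matched them already.
fn count_passing(n: i64, filter: fn(i64, i64) -> bool) -> usize {
    (0..n).filter(|&v| filter(v, v)).count()
}

/// The filter from the PR: each conjunct touches only one join side.
fn original_filter(t1: i64, t2: i64) -> bool {
    t1 > 10_000 && t2 < 20_000
}

/// The filter suggested in the review: it references both sides at once.
fn suggested_filter(t1: i64, t2: i64) -> bool {
    (t1 + t2) % 10 > 0
}

fn main() {
    let n = 30_000;
    println!("original:  {} of {}", count_passing(n, original_filter), n);
    println!("suggested: {} of {}", count_passing(n, suggested_filter), n);
}
```

Under those assumptions the original filter keeps 9,999 of the 30,000 matched rows (about 33%), and the suggested one keeps 24,000 (80%), since `(v + v) % 10 > 0` only rejects multiples of 5. If a different pass rate is wanted, the modulus or the comparison can be adjusted.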



##########
benchmarks/src/hj.rs:
##########
@@ -58,10 +58,9 @@ const HASH_QUERIES: &[&str] = &[
     // equality on key + cheap filter to downselect
     r#"
         SELECT t1.value, t2.value
-        FROM range(10000) AS t1
+        FROM generate_series(0,10000, 1000) AS t1(value)

Review Comment:
   Thanks for the update. I think it would be better to apply the same change
to all the other queries as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

