2010YOUY01 commented on code in PR #21821:
URL: https://github.com/apache/datafusion/pull/21821#discussion_r3286130643


##########
benchmarks/src/hj.rs:
##########
@@ -303,6 +301,110 @@ const HASH_QUERIES: &[HashJoinQuery] = &[
         build_size: "100K_(20%_dups)",
         probe_size: "60M",
     },
+    // RightSemi Join benchmarks with Int32 keys
+    //
+    // Fanout (average build rows matched per probe row, as measured by running
+    // the equivalent INNER JOIN under `EXPLAIN ANALYZE` and reading the
+    // `HashJoinExec` metrics): 1 for Q16-Q18. Build keys here are primary
+    // keys (`n_nationkey`, `s_suppkey`), so each probe row matches at most
+    // one build row. `prob_hit` controls what fraction of probe rows find
+    // that one match.
+    //
+    // Fanout still matters because semi joins short-circuit after the first
+    // match. Coverage of fanout > 1 (build-side duplicates) is left for a
+    // follow-up.

Review Comment:
   Yes, it would be great to add this coverage in a follow-up.
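
For reference when writing that follow-up, here is a minimal sketch of why fanout > 1 changes the picture. It is not the `HashJoinExec` implementation; the function names are hypothetical, and it assumes the build side is hashed and the right semi join emits each matching probe row once. With duplicate build keys, the inner join output grows with fanout while the semi join output stays the same, which is the case the current queries do not exercise.

```rust
use std::collections::HashMap;

/// Hypothetical helper: map each build-side key to the number of build rows
/// that carry it. Duplicated keys are what push fanout above 1.
fn build_counts(build_keys: &[i32]) -> HashMap<i32, usize> {
    let mut counts = HashMap::new();
    for &k in build_keys {
        *counts.entry(k).or_insert(0) += 1;
    }
    counts
}

/// Inner join output size: each probe row emits one row per matching build row.
fn inner_join_rows(build_keys: &[i32], probe_keys: &[i32]) -> usize {
    let counts = build_counts(build_keys);
    probe_keys
        .iter()
        .map(|k| counts.get(k).copied().unwrap_or(0))
        .sum()
}

/// Right semi join output size (assumed semantics: a probe row is emitted at
/// most once, as soon as any build row matches, so duplicates add no output).
fn right_semi_rows(build_keys: &[i32], probe_keys: &[i32]) -> usize {
    let counts = build_counts(build_keys);
    probe_keys.iter().filter(|k| counts.contains_key(*k)).count()
}
```

With unique build keys `[1, 2, 3]` and probe keys `[1, 2, 9, 9]`, both functions count two rows. With build keys `[1, 1, 1, 2]` the inner join counts four rows while the semi join still counts two.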



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

