2010YOUY01 commented on PR #17482:
URL: https://github.com/apache/datafusion/pull/17482#issuecomment-3395859957

   I have generated some queries to benchmark, the result looks amazing ๐Ÿš€ I 
think we can iterate from here, I'll review the implementation soon.
   
   <details>
     <summary>In-equality join benchmark</summary>
     To run it, manually swap the queries in `benchmark/nlj.rs`
   
   ```
   const NLJ_QUERIES: &[&str] = &[
       r#"
       set datafusion.optimizer.allow_piecewise_merge_join=true
   "#,
       // Q1: INNER 20 x 10K | Ultra-Low ~0.1%
       r#"
   SELECT *
   FROM generate_series(0, 19, 1) AS t1(v1)
   JOIN generate_series(0, 9999, 1) AS t2(v2)
   ON t1.v1 > t2.v2;
   "#,
       // Q2: INNER 2K x 10K | Low ~10%
       r#"
   SELECT *
   FROM generate_series(0, 1999, 1) AS t1(v1)
   JOIN generate_series(0, 9999, 1) AS t2(v2)
   ON t1.v1 > t2.v2;
   "#,
       // Q3: INNER 10K x 10K | Medium ~50%
       r#"
   SELECT *
   FROM generate_series(0, 9999, 1) AS t1(v1)
   JOIN generate_series(0, 9999, 1) AS t2(v2)
   ON t1.v1 > t2.v2;
   "#,
       // Q4: INNER 1K x 10K | High ~95% (LHS near top)
       r#"
   SELECT *
   FROM generate_series(9000, 9999, 1) AS t1(v1)
   JOIN generate_series(0, 9999, 1) AS t2(v2)
   ON t1.v1 > t2.v2;
   "#,
       // Q5: INNER 30K x 30K | Medium ~50% | comparator symmetry
       r#"
   SELECT *
   FROM generate_series(0, 29999, 1) AS t1(v1)
   JOIN generate_series(0, 29999, 1) AS t2(v2)
   ON t1.v1 < t2.v2;
   "#,
       // Q6: INNER 1K x 200K | Low ~0.25% (small โ†’ large)
       r#"
   SELECT *
   FROM generate_series(0, 999, 1) AS t1(v1)
   JOIN generate_series(0, 199999, 1) AS t2(v2)
   ON t1.v1 > t2.v2;
   "#,
       // Q7: INNER 200K x 10K | Med-Low ~2.5% (large โ†’ small)
       r#"
   SELECT *
   FROM generate_series(0, 199999, 1) AS t1(v1)
   JOIN generate_series(0, 9999, 1) AS t2(v2)
   ON t1.v1 < t2.v2;
   "#,
       // Q8: LEFT OUTER 20K x 300 | Low ~0.75% (outer-fill path)
       r#"
   SELECT *
   FROM generate_series(0, 19999, 1) AS t1(v1)
   LEFT JOIN generate_series(19700, 19999, 1) AS t2(v2)
   ON t1.v1 > t2.v2;
   "#,
       // Q9: RIGHT OUTER 100 x 10K | Zero-match (pure outer-extend check)
       r#"
   SELECT *
   FROM generate_series(0, 99, 1) AS t1(v1)
   RIGHT JOIN generate_series(19900, 19999, 1) AS t2(v2)
   ON t1.v1 > t2.v2;
   "#,
   ];
   ```
   
   </details>
   
   ```
   Result: yongting@Yongtings-MacBook-Pro-2 ~/C/d/benchmarks (pwmj *)> 
./bench.sh compare nlj pwmj
   Comparing nlj and pwmj
   --------------------
   Benchmark nlj.json
   --------------------
   โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
   โ”ƒ Query        โ”ƒ       nlj โ”ƒ      pwmj โ”ƒ          Change โ”ƒ
   โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
   โ”‚ QQuery 1     โ”‚   0.25 ms โ”‚   0.02 ms โ”‚  +13.85x faster โ”‚
   โ”‚ QQuery 2     โ”‚   8.86 ms โ”‚   0.30 ms โ”‚  +29.34x faster โ”‚
   โ”‚ QQuery 3     โ”‚  87.69 ms โ”‚   2.70 ms โ”‚  +32.50x faster โ”‚
   โ”‚ QQuery 4     โ”‚   6.92 ms โ”‚  51.96 ms โ”‚    7.51x slower โ”‚
   โ”‚ QQuery 5     โ”‚ 198.44 ms โ”‚  11.43 ms โ”‚  +17.36x faster โ”‚
   โ”‚ QQuery 6     โ”‚   8.08 ms โ”‚ 136.61 ms โ”‚   16.91x slower โ”‚
   โ”‚ QQuery 7     โ”‚ 473.14 ms โ”‚   2.83 ms โ”‚ +167.34x faster โ”‚
   โ”‚ QQuery 8     โ”‚   5.47 ms โ”‚  49.55 ms โ”‚    9.05x slower โ”‚
   โ”‚ QQuery 9     โ”‚   0.20 ms โ”‚   0.63 ms โ”‚    3.12x slower โ”‚
   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
   ```
   
   Note there are several slower queries, that's because one join side is very 
small, so the brute-force nested loop join become optimal, I suspect in some 
cases NLJ can even win Hash Join.
   The planner/optimizer should take those cases into account, I think this is 
a good follow-up project to do.
   
   When I was trying different queries, I noticed one query with full join 
case: piecewise merge join is slower than NLJ, but it should get faster, we 
should take a closer look:
   ```
   set datafusion.optimizer.allow_piecewise_merge_join = true;
   
   SELECT *
   FROM range(100000) AS t1
   FULL JOIN range(100000) AS t2
   ON (t1.value > t2.value);
   ```
   NLJ is around 1.2s on my machine, while PWMJ is around 2.5s.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to