UBarney commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3078931663

   > Thanks [@nuno-faria](https://github.com/nuno-faria) that's a great insight 
(for TPC-H / very nested joins we probably should implement a smarter join 
order algorithm).
   > 
   > For h2o joins however, it seems it is only doing a single join 
(https://github.com/apache/datafusion/blob/main/benchmarks/queries/h2o/join.sql)
 so I would expect the order to be correct there?
   
   The join order seems to be correct.
   
   `benchmarks/results/join/h2o_join.json` is the result from running the 
original join.sql, while 
`benchmarks/results/join/h2o_join_exch_join_order.json` is the result after 
swapping all the join orders in `join.sql`. As can be seen, the latter is 
slower.
   
   <details>
   <summary>h2o_join_exch_join_order.sql</summary>
   
   ```sql
   SELECT x.id1, x.id2, x.id3, x.id4 as xid4, small.id4 as smallid4, x.id5, 
x.id6, x.v1, small.v2 FROM small INNER JOIN x ON x.id1 = small.id1;
   
   SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as xid4, 
medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, x.v1, 
medium.v2 FROM medium INNER JOIN x ON x.id2 = medium.id2;
   
   SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as xid4, 
medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, x.v1, 
medium.v2 FROM medium RIGHT JOIN x ON x.id2 = medium.id2;
   
   SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as xid4, 
medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, x.v1, 
medium.v2 FROM medium JOIN x ON x.id5 = medium.id5;
   
   SELECT x.id1 as xid1, large.id1 as largeid1, x.id2 as xid2, large.id2 as 
largeid2, x.id3, x.id4 as xid4, large.id4 as largeid4, x.id5 as xid5, large.id5 
as largeid5, x.id6 as xid6, large.id6 as largeid6, x.v1, large.v2 FROM large 
JOIN x ON x.id3 = large.id3;
   
   ```
   
   </details>
   
   
   ```
   √ devhomeinsp  ~/c/datafusion > python3 benchmarks/compare.py 
benchmarks/results/join/h2o_join.json 
benchmarks/results/join/h2o_join_exch_join_order.json 
   ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
   ┃ Query        ┃       join ┃       join ┃       Change ┃
   ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
   │ QQuery 1     │    1.77 ms │    1.70 ms │    no change │
   │ QQuery 2     │   12.38 ms │   12.90 ms │    no change │
   │ QQuery 3     │   12.23 ms │   12.31 ms │    no change │
   │ QQuery 4     │   10.96 ms │   12.17 ms │ 1.11x slower │
   │ QQuery 5     │ 2206.66 ms │ 3882.40 ms │ 1.76x slower │
   └──────────────┴────────────┴────────────┴──────────────┘
   ┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
   ┃ Benchmark Summary      ┃           ┃
   ┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
   │ Total Time (join)      │ 2244.00ms │
   │ Total Time (join)      │ 3921.49ms │
   │ Average Time (join)    │  448.80ms │
   │ Average Time (join)    │  784.30ms │
   │ Queries Faster         │         0 │
   │ Queries Slower         │         2 │
   │ Queries with No Change │         3 │
   │ Queries with Failure   │         0 │
   └────────────────────────┴───────────┘
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to