LiaCastaneda commented on PR #17529:
URL: https://github.com/apache/datafusion/pull/17529#issuecomment-3429689554

   👋 I was playing around with this feature today, here are some results for 
sf1 and sf10, claude did these nice summaries
   
   ```
   | Query | Baseline Time | Hash Pushdown Time | Time Change    | Baseline 
Memory | Hash Memory | Memory Change |
   
|-------|---------------|--------------------|----------------|-----------------|-------------|---------------|
   | Q1    | 55.80 ms      | 59.15 ms           | 1.06x slower   | 378.9 MB     
   | 349.0 MB    | -29.9 MB      |
   | Q2    | 32.31 ms      | 21.93 ms           | 1.47x faster   | 243.4 MB     
   | 223.8 MB    | -19.6 MB      |
   | Q3    | 50.02 ms      | 43.97 ms           | 1.14x faster   | 465.0 MB     
   | 438.5 MB    | -26.5 MB      |
   | Q4    | 34.93 ms      | 36.87 ms           | 1.06x slower   | 639.0 MB     
   | 656.9 MB    | +17.9 MB      |
   | Q5    | 58.00 ms      | 52.43 ms           | 1.11x faster   | 458.2 MB     
   | 453.5 MB    | -4.7 MB       |
   | Q6    | 24.32 ms      | 26.13 ms           | 1.07x slower   | 430.4 MB     
   | 422.9 MB    | -7.5 MB       |
   | Q7    | 61.62 ms      | 39.10 ms           | 1.58x faster   | 522.3 MB     
   | 494.2 MB    | -28.1 MB      |
   | Q8    | 54.45 ms      | 47.40 ms           | 1.15x faster   | 597.3 MB     
   | 576.9 MB    | -20.4 MB      |
   | Q9    | 61.88 ms      | 78.72 ms           | 1.27x slower   | 649.0 MB     
   | 729.8 MB    | +80.8 MB      |
   | Q10   | 61.38 ms      | 43.34 ms           | 1.42x faster   | 493.4 MB     
   | 457.4 MB    | -36.0 MB      |
   | Q11   | 21.78 ms      | 15.00 ms           | 1.45x faster   | 213.7 MB     
   | 173.5 MB    | -40.2 MB      |
   | Q12   | 56.02 ms      | 53.81 ms           | 1.04x faster   | 507.0 MB     
   | 480.0 MB    | -27.0 MB      |
   | Q13   | 27.48 ms      | 26.66 ms           | no change      | 386.6 MB     
   | 411.0 MB    | +24.4 MB      |
   | Q14   | 25.54 ms      | 29.37 ms           | 1.15x slower   | 410.0 MB     
   | 406.6 MB    | -3.4 MB       |
   | Q15   | 41.22 ms      | 40.93 ms           | no change      | 420.3 MB     
   | 420.8 MB    | +0.5 MB       |
   | Q16   | 18.24 ms      | 16.11 ms           | 1.13x faster   | 196.8 MB     
   | 193.5 MB    | -3.3 MB       |
   | Q17   | 68.96 ms      | 54.58 ms           | 1.26x faster   | 739.0 MB     
   | 702.2 MB    | -36.8 MB      |
   | Q18   | 78.27 ms      | 99.72 ms           | 1.27x slower   | 646.4 MB     
   | 763.2 MB    | +116.8 MB     |
   | Q19   | 40.37 ms      | 37.09 ms           | 1.09x faster   | 546.1 MB     
   | 555.2 MB    | +9.1 MB       |
   | Q20   | 34.68 ms      | 35.96 ms           | no change      | 466.8 MB     
   | 477.9 MB    | +11.1 MB      |
   | Q21   | 102.68 ms     | 87.24 ms           | 1.18x faster   | 714.5 MB     
   | 602.4 MB    | -112.1 MB     |
   | Q22   | 12.09 ms      | 13.67 ms           | 1.13x slower   | 205.6 MB     
   | 205.6 MB    | 0.0 MB        |
   ```
   
   SF10
   
   ```
   | Query | Baseline Time | Hash Pushdown Time | Time Change      | Baseline 
Memory | Hash Memory | Memory Change |
   |-------|---------------|--------------------| 
-----------------|-----------------|-------------|---------------|
   | Q1    | 472.58 ms     | 502.66 ms          | 1.06x slower     | 835.4 MB   
     | 844.5 MB    | +9.1 MB       |
   | Q2    | 131.42 ms     | 109.07 ms          | 1.21x faster     | 734.1 MB   
     | 766.8 MB    | +32.7 MB      |
   | Q3    | 428.75 ms     | 287.98 ms          | 1.49x faster     | 1.3 GB     
     | 1.1 GB      | -194.2 MB     |
   | Q4    | 327.30 ms     | 341.23 ms          | 1.04x slower     | 3.2 GB     
     | 3.4 GB      | +200 MB       |
   | Q5    | 430.55 ms     | 409.65 ms          | 1.05x faster     | 1.3 GB     
     | 1.3 GB      | +75 MB        |
   | Q6    | 172.13 ms     | 169.20 ms          | no change        | 1.0 GB     
     | 1.0 GB      | +8 MB         |
   | Q7    | 423.89 ms     | 284.72 ms          | 1.49x faster     | 1.3 GB     
     | 1.3 GB      | -4.1 MB       |
   | Q8    | 559.76 ms     | 454.55 ms          | 1.23x faster     | 1.8 GB     
     | 1.9 GB      | +45.8 MB      |
   | Q9    | 697.39 ms     | 835.67 ms          | 1.20x slower     | 2.7 GB     
     | 3.4 GB      | +700 MB       |
   | Q10   | 533.91 ms     | 363.03 ms          | 1.47x faster     | 1.8 GB     
     | 1.6 GB      | -138.1 MB     |
   | Q11   | 90.07 ms      | 63.19 ms           | 1.43x faster     | 514.3 MB   
     | 564.8 MB    | +50.5 MB      |
   | Q12   | 469.67 ms     | 451.50 ms          | 1.04x faster     | 1.3 GB     
     | 1.4 GB      | +50.1 MB      |
   | Q13   | 306.60 ms     | 284.17 ms          | 1.08x faster     | 2.8 GB     
     | 2.3 GB      | -500 MB       |
   | Q14   | 220.53 ms     | 225.02 ms          | no change        | 1.1 GB     
     | 1.2 GB      | +111.3 MB     |
   | Q15   | 321.28 ms     | 315.00 ms          | no change        | 969.6 MB   
     | 1.1 GB      | +95.2 MB      |
   | Q16   | 81.82 ms      | 79.00 ms           | no change        | 766.1 MB   
     | 794.2 MB    | +28.1 MB      |
   | Q17   | 694.52 ms     | 512.22 ms          | 1.36x faster     | 1.2 GB     
     | 1.3 GB      | +15.4 MB      |
   | Q18   | 904.41 ms     | 1290.26 ms         | 1.43x slower     | 4.4 GB     
     | 5.1 GB      | +700 MB       |
   | Q19   | 330.76 ms     | 273.10 ms          | 1.21x faster     | 1.6 GB     
     | 1.6 GB      | +26.7 MB      |
   | Q20   | 290.40 ms     | 282.53 ms          | no change        | 1.7 GB     
     | 1.6 GB      | -59.3 MB      |
   | Q21   | 947.20 ms     | 789.62 ms          | 1.20x faster     | 2.0 GB     
     | 2.0 GB      | +28.7 MB      |
   | Q22   | 89.73 ms      | 88.49 ms           | no change        | 615.0 MB   
     | 615.1 MB    | +0.1 MB       |
   ```
   
   Memory measurements are from `Peak RSS` which iiuc is the peak memory used 
by the process, not sure if this is the best way to test memory. Also, I'm not 
sure how we can test huge build sides with ~1B hashes, which makes me wonder 
what would be a good way to track how many hashes are generated per query, 
maybe include it as a metric of the join. 
   
   In any case, with some manual logging for Q18 (which appears to be the 
heaviest join), I'm seeing 1.5M hashes for SF1 and 15M hashes for SF10. The 
correlation between more data and more memory is clear, but IMO, if there's a 
way to measure the total number of hashes across partitions, could we opt out 
of the feature during runtime? and allow the user to configure this based on 
their available resources/ pod size


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to