akoshchiy commented on issue #15607:
URL: https://github.com/apache/datafusion/issues/15607#issuecomment-3801609515

   I've been working on range calculate optimization and have some results in 
#20014.  It uses the arrow's `DynComparator` for comparing array elements and 
arrow's kernels for precomputing bounds with deltas.
   
   Here is a benchmark results on my pc (ryzen 9900x, ubuntu 25.10)
   ```
   # before
   ./target/release-nonlto/deps/window_query_sql-cfc4a16626956623 --bench 
--noplot
   Gnuplot not found, using plotters backend
   window empty over, aggregate functions
                           time:   [6.5153 ms 6.5321 ms 6.5511 ms]
   Found 13 outliers among 100 measurements (13.00%)
     4 (4.00%) high mild
     9 (9.00%) high severe
   
   Benchmarking window empty over, built-in functions: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 12.8s, or reduce sample count to 30.
   window empty over, built-in functions
                           time:   [127.21 ms 128.60 ms 129.98 ms]
   
   Benchmarking window order by, aggregate functions: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 97.7s, or reduce sample count to 10.
   window order by, aggregate functions
                           time:   [984.48 ms 987.61 ms 990.80 ms]
   
   Benchmarking window order by and range offsets, aggregate functions: Warming 
up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 371.7s, or reduce sample count to 10.
   window order by and range offsets, aggregate functions
                           time:   [3.6569 s 3.6602 s 3.6637 s]
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   Benchmarking window order by, built-in functions: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 86.9s, or reduce sample count to 10.
   window order by, built-in functions
                           time:   [868.42 ms 870.71 ms 873.11 ms]
   Found 9 outliers among 100 measurements (9.00%)
     9 (9.00%) high mild
   
   Benchmarking window partition by, u64_wide, aggregate functions: Warming up 
for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 36.3s, or reduce sample count to 10.
   window partition by, u64_wide, aggregate functions
                           time:   [352.45 ms 354.97 ms 357.51 ms]
   
   window partition by, u64_narrow, aggregate functions
                           time:   [9.8960 ms 9.9682 ms 10.045 ms]
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   Benchmarking window partition by, u64_wide, built-in functions: Warming up 
for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 39.1s, or reduce sample count to 10.
   window partition by, u64_wide, built-in functions
                           time:   [388.56 ms 390.62 ms 392.69 ms]
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   window partition by, u64_narrow, built-in functions
                           time:   [36.278 ms 36.612 ms 36.945 ms]
   
   Benchmarking window partition and order by, u64_wide, aggregate functions: 
Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 54.1s, or reduce sample count to 10.
   window partition and order by, u64_wide, aggregate functions
                           time:   [532.08 ms 533.75 ms 535.46 ms]
   
   Benchmarking window partition and order by and range offsets, u64_wide, 
aggregate functions: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 63.5s, or reduce sample count to 10.
   window partition and order by and range offsets, u64_wide, aggregate 
functions
                           time:   [636.52 ms 638.74 ms 641.07 ms]
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   Benchmarking window partition and order by, u64_narrow, aggregate functions: 
Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 17.0s, or reduce sample count to 20.
   window partition and order by, u64_narrow, aggregate functions
                           time:   [168.59 ms 168.99 ms 169.42 ms]
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking window partition and order by and range offsets, u64_narrow, 
aggregate functions: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 58.4s, or reduce sample count to 10.
   window partition and order by and range offsets, u64_narrow, aggregate 
functions
                           time:   [572.87 ms 573.66 ms 574.46 ms]
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   Benchmarking window partition and order by, u64_wide, built-in functions: 
Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 51.4s, or reduce sample count to 10.
   window partition and order by, u64_wide, built-in functions
                           time:   [509.20 ms 511.51 ms 513.82 ms]
   
   Benchmarking window partition and order by, u64_narrow, built-in functions: 
Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 16.2s, or reduce sample count to 30.
   window partition and order by, u64_narrow, built-in functions
                           time:   [161.73 ms 162.18 ms 162.68 ms]
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   Benchmarking window partition and order by and range offsets, u64_wide, 
aggregate functions #2: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 64.2s, or reduce sample count to 10.
   window partition and order by and range offsets, u64_wide, aggregate 
functions #2
                           time:   [635.90 ms 638.16 ms 640.43 ms]
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   
   
   # after
   ./target/release-nonlto/deps/window_query_sql-cfc4a16626956623 --bench 
--noplot
   Gnuplot not found, using plotters backend
   window empty over, aggregate functions
                           time:   [6.4957 ms 6.5093 ms 6.5242 ms]
                           change: [−0.7133% −0.3502% −0.0081%] (p = 0.05 > 
0.05)
                           No change in performance detected.
   Found 16 outliers among 100 measurements (16.00%)
     5 (5.00%) high mild
     11 (11.00%) high severe
   
   Benchmarking window empty over, built-in functions: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 11.6s, or reduce sample count to 40.
   window empty over, built-in functions
                           time:   [113.10 ms 113.88 ms 114.68 ms]
                           change: [−12.647% −11.449% −10.347%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   Benchmarking window order by, aggregate functions: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 26.2s, or reduce sample count to 10.
   window order by, aggregate functions
                           time:   [257.99 ms 258.92 ms 259.91 ms]
                           change: [−73.906% −73.783% −73.653%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     7 (7.00%) high mild
   
   Benchmarking window order by and range offsets, aggregate functions: Warming 
up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 57.4s, or reduce sample count to 10.
   window order by and range offsets, aggregate functions
                           time:   [556.19 ms 557.88 ms 559.59 ms]
                           change: [−84.813% −84.758% −84.709%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   Benchmarking window order by, built-in functions: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 15.6s, or reduce sample count to 30.
   window order by, built-in functions
                           time:   [154.50 ms 155.14 ms 155.79 ms]
                           change: [−82.282% −82.183% −82.098%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   Benchmarking window partition by, u64_wide, aggregate functions: Warming up 
for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 35.7s, or reduce sample count to 10.
   window partition by, u64_wide, aggregate functions
                           time:   [350.09 ms 352.64 ms 355.17 ms]
                           change: [−1.6380% −0.6558% +0.3300%] (p = 0.20 > 
0.05)
                           No change in performance detected.
   Found 5 outliers among 100 measurements (5.00%)
     5 (5.00%) low mild
   
   window partition by, u64_narrow, aggregate functions
                           time:   [9.8499 ms 9.9108 ms 9.9729 ms]
                           change: [−1.5361% −0.5754% +0.4139%] (p = 0.25 > 
0.05)
                           No change in performance detected.
   
   Benchmarking window partition by, u64_wide, built-in functions: Warming up 
for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 38.6s, or reduce sample count to 10.
   window partition by, u64_wide, built-in functions
                           time:   [386.84 ms 389.04 ms 391.20 ms]
                           change: [−1.1424% −0.4060% +0.4229%] (p = 0.31 > 
0.05)
                           No change in performance detected.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   window partition by, u64_narrow, built-in functions
                           time:   [32.804 ms 33.098 ms 33.394 ms]
                           change: [−10.649% −9.5960% −8.4033%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   Benchmarking window partition and order by, u64_wide, aggregate functions: 
Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 61.0s, or reduce sample count to 10.
   window partition and order by, u64_wide, aggregate functions
                           time:   [608.91 ms 610.93 ms 612.95 ms]
                           change: [+13.925% +14.459% +14.983%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) low mild
     2 (2.00%) high mild
   
   Benchmarking window partition and order by and range offsets, u64_wide, 
aggregate functions: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 79.2s, or reduce sample count to 10.
   window partition and order by and range offsets, u64_wide, aggregate 
functions
                           time:   [787.09 ms 788.88 ms 790.68 ms]
                           change: [+22.988% +23.505% +24.037%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
   
   window partition and order by, u64_narrow, aggregate functions
                           time:   [47.286 ms 47.538 ms 47.814 ms]
                           change: [−72.050% −71.869% −71.694%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   Benchmarking window partition and order by and range offsets, u64_narrow, 
aggregate functions: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 12.1s, or reduce sample count to 40.
   window partition and order by and range offsets, u64_narrow, aggregate 
functions
                           time:   [118.38 ms 119.05 ms 119.74 ms]
                           change: [−79.359% −79.247% −79.103%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) low mild
     2 (2.00%) high mild
   
   Benchmarking window partition and order by, u64_wide, built-in functions: 
Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 61.3s, or reduce sample count to 10.
   window partition and order by, u64_wide, built-in functions
                           time:   [612.57 ms 614.95 ms 617.33 ms]
                           change: [+19.513% +20.224% +20.949%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   window partition and order by, u64_narrow, built-in functions
                           time:   [41.364 ms 41.519 ms 41.679 ms]
                           change: [−74.516% −74.399% −74.275%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   Benchmarking window partition and order by and range offsets, u64_wide, 
aggregate functions #2: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 78.6s, or reduce sample count to 10.
   window partition and order by and range offsets, u64_wide, aggregate 
functions #2
                           time:   [783.53 ms 785.16 ms 786.79 ms]
                           change: [+22.528% +23.035% +23.519%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   ```
   So, there is a good performance gain on queries with no partition and 
low-cardinality col partitions, but also here is a regression queries on 
high-cardinality col partitions because of the small batches. Also there is a 
overflow regression because of a lack of saturating operations in arrow kernels.
    
   It's still in progress, now I'm trying to overcome the 20% regression - the 
main idea is use a custom array comparator which combines comparing and delta 
compute, and which could be built at once. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to